West China Medical Publishers
Keyword search for "Large language model": 5 results
  • The application of large language models in the field of evidence-based medicine

    Large language models (LLMs) are highly sophisticated deep learning models pre-trained on massive datasets, and ChatGPT is a prominent generative application of LLMs. Since the release of ChatGPT at the end of 2022, generative chatbots have become widely employed across various medical disciplines. As evidence-based medicine (EBM) is a crucial discipline guiding clinical practice, the use of generative chatbots such as ChatGPT in EBM is gradually increasing. However, the potential, challenges, and intricacies of their application in EBM remain unclear. Through a review of the relevant literature, this paper explores the prospects, challenges, and considerations associated with applying ChatGPT in EBM. The discussion spans four aspects: evidence generation, synthesis, assessment, and dissemination and implementation, providing researchers with insight into the latest developments and suggestions for future research.

  • Application of large language models in sarcopenia diagnosis and treatment: a comparative study with clinical decision-making by physicians

    Objective: To evaluate the quality differences between recommendations generated by large language models (LLMs) and by clinical practitioners for sarcopenia-related questions. Methods: A sarcopenia knowledge base was constructed from the latest domestic and international research and consensus guidelines. Using a Python environment, a locally deployed, sarcopenia-focused hybrid vertical LLM (referred to as LC) was implemented via LangChain-LLM. Eight fixed questions covering etiology, diagnosis, and prevention were selected, along with eight virtual patient cases. The evaluation team assessed the quality of answers generated by LC and those written by clinical practitioners, and quantitative analysis was performed on the precision, recall, and F1 scores (the harmonic mean of precision and recall) of the treatment recommendations. Results: The responses were generally perceived as "possibly written by humans or AI", with a stronger inclination toward being AI-generated, although the accuracy of such judgments was low. Regarding answer quality, LC's responses were superior to those of clinical practitioners in guideline consistency (P<0.01), showed similar acceptability (P>0.05), better practicality (P<0.05), and a lower proportion of "1–2 errors" (P<0.05). Quantitative analysis of treatment recommendations indicated that LC and GPT-4.0 outperformed clinical practitioners in recall and F1 scores (P<0.05), with minimal differences between LC and GPT-4.0. Conclusion: The locally deployed, sarcopenia-focused hybrid vertical LLM demonstrates high accuracy and applicability for sarcopenia-related questions, outperforming clinical practitioners and exhibiting strong clinical decision-support capability.

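The precision/recall/F1 evaluation of treatment recommendations described in the abstract above can be sketched as follows. This is a minimal illustration of the metric, not the study's actual pipeline; the function name and the sample recommendation sets are invented for the example.

```python
# Sketch of scoring a model's treatment recommendations against a reference
# standard using precision, recall, and F1 (the harmonic mean of the two).

def score_recommendations(predicted: set, reference: set) -> dict:
    """Compare a recommendation set against the reference standard."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

# Hypothetical case: the model suggests 3 interventions, 2 of which
# appear in a 4-item reference standard.
model_recs = {"resistance training", "protein supplementation", "vitamin D"}
reference_recs = {"resistance training", "protein supplementation",
                  "nutritional counseling", "balance exercises"}
print(score_recommendations(model_recs, reference_recs))
```

Under this scoring, a model that suggests many interventions inflates recall at the cost of precision; F1 penalizes that imbalance, which is why the abstract reports all three.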
  • Evaluation of the accuracy of the large language model for risk of bias assessment in analytical studies

    Objective: To systematically evaluate the accuracy and consistency of large language models (LLMs) in assessing risk of bias in analytical studies. Methods: Cohort and case-control studies related to COVID-19 were included, drawn from the team's published systematic review of the clinical characteristics of COVID-19. Two researchers independently screened the studies, extracted data, and assessed the risk of bias of the included studies; the LLM-based BiasBee model (version Non-RCT) was used for automated evaluation. Kappa statistics and score differences were used to analyze the agreement between LLM and human evaluations, with subgroup analyses for Chinese- and English-language studies. Results: A total of 210 studies were included. Meta-analysis showed that LLM scores were generally higher than those of human evaluators, particularly for representativeness of exposed cohorts (△=0.764) and selection of external controls (△=0.109). Kappa analysis indicated slight agreement on items such as exposure assessment (κ=0.059) and adequacy of follow-up (κ=0.093), while showing significant discrepancies on more subjective items, such as control selection (κ=−0.112) and non-response rate (κ=−0.115). Subgroup analysis revealed higher scoring consistency for the LLM in English-language studies than in Chinese-language studies. Conclusion: LLMs demonstrate potential in risk of bias assessment; however, notable differences remain on more subjective items. Future research should focus on optimizing prompt engineering and model fine-tuning to enhance LLM accuracy and consistency on complex tasks.

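The kappa statistic used in the study above to quantify LLM–human agreement can be illustrated with a minimal sketch of Cohen's kappa, which corrects raw percent agreement for agreement expected by chance. The rating vectors below are invented for illustration and are not data from the study.

```python
# Minimal Cohen's kappa: chance-corrected agreement between two raters
# assigning categorical risk-of-bias judgments to the same studies.
from collections import Counter


def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """kappa = (observed agreement - expected agreement) / (1 - expected)."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each category's marginal proportions.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical low/high risk-of-bias judgments for 10 studies.
llm = ["low", "low", "high", "low", "high", "low", "low", "high", "low", "low"]
human = ["low", "high", "high", "low", "low", "low", "low", "high", "high", "low"]
print(round(cohens_kappa(llm, human), 3))
```

Values near 0 (or below, as in the abstract's κ=−0.112) mean agreement no better than chance, which is why slight or negative kappa on subjective items signals a real reliability problem even when raw agreement looks reasonable.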
  • Evolution of large language models and their applications in clinical medical education

    Large language models (LLMs), a key component of artificial intelligence (AI), represent a significant breakthrough in natural language processing. As the capabilities of LLMs continue to evolve, their potential applications and future implications in clinical medical education warrant considerable attention. This study systematically reviews the development of LLMs, explores their innovative applications within the context of current challenges in clinical medical education, and critically assesses both the advantages and limitations of their implementation. The objective is to provide a comprehensive reference for the continued integration of AI-driven LLMs into clinical medical education.

  • Interpretation of the TRIPOD-LLM reporting guideline for studies using large language models

    As the volume of medical research using large language models (LLMs) surges, the need for standardized and transparent reporting becomes increasingly critical. In January 2025, Nature Medicine published the “TRIPOD-LLM reporting guideline for studies using large language models”, the first comprehensive reporting framework specifically tailored to studies that develop prediction models based on LLMs. It comprises a checklist with 19 main items (encompassing 50 sub-items), a flowchart, and an abstract checklist (containing 12 items). This article interprets TRIPOD-LLM’s development methods, primary content, scope, and the specific details of its items, aiming to help researchers, clinicians, editors, and healthcare decision-makers understand and correctly apply TRIPOD-LLM, thereby improving the quality and transparency of reporting in LLM medical research and promoting the standardized and ethical integration of LLMs into healthcare.
