1. Department of Thoracic Surgery, Peking Union Medical College Hospital, Beijing, 100010, P. R. China;
2. Department of Information, Peking Union Medical College Hospital, Beijing, 100010, P. R. China.
LI Shanqing, Email: lishanqing@pumch.cn

Objective  To develop an artificial intelligence (AI)-driven lung cancer database that structures and standardizes clinical data, enables advanced data mining for lung cancer research, and provides high-quality data for real-world studies.

Methods  Building on the extensive clinical data resources of the Department of Thoracic Surgery at Peking Union Medical College Hospital, this study used machine learning techniques, particularly natural language processing (NLP), to automatically convert unstructured data from electronic medical records, examination reports, and pathology reports into structured formats. Data governance and automated cleaning methods were applied to ensure data integrity and consistency.

Results  As of September 2024, the database included comprehensive data from 18 811 patients, encompassing inpatient and outpatient records, examination and pathology reports, physician orders, and follow-up information, forming a well-structured, multi-dimensional dataset with rich variables. The database’s real-time querying and multi-layer filtering functions enabled researchers to efficiently retrieve study data meeting specific criteria, speeding up data processing and research. In a real-world application exploring the prognosis of non-small cell lung cancer, the database facilitated rapid analysis of prognostic factors. The findings indicated that factors such as tumor stage and comorbidities had a significant impact on patient survival, further demonstrating the database’s value in clinical big data mining.

Conclusion  The AI-driven lung cancer database improves the efficiency of data management and analysis, providing strong support for large-scale clinical research, retrospective studies, and disease management. With the ongoing integration of large language models and multi-modal data, the database’s precision and analytical capabilities are expected to improve further, providing stronger support for big data mining and real-world research on lung cancer.
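
As a rough illustration of the two ideas summarized above (turning free-text reports into structured fields, then retrieving cases through layered filters), the following minimal Python sketch may help. It is not the authors' pipeline: the study used machine learning based NLP, whereas this sketch substitutes simple regular expressions as a stand-in. The record fields, the term list, the sample report texts, and the names `PathologyRecord`, `structure_report`, and `filter_records` are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical structured record; the real database schema is not described in the abstract.
@dataclass
class PathologyRecord:
    patient_id: str
    histology: Optional[str]
    tumor_size_cm: Optional[float]
    t_stage: Optional[str]
    n_stage: Optional[str]

# Illustrative term list only (an assumption, not the study's vocabulary).
HISTOLOGY_TERMS = ("adenocarcinoma", "squamous cell carcinoma", "large cell carcinoma")

SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*cm")
STAGE_RE = re.compile(r"(T[0-4][a-c]?)(N[0-3])")  # e.g. the "T2aN1" part of "pT2aN1M0"

def structure_report(patient_id: str, text: str) -> PathologyRecord:
    """Extract a few fields from free text with regexes (a stand-in for an NLP model)."""
    lower = text.lower()
    histology = next((term for term in HISTOLOGY_TERMS if term in lower), None)
    size = SIZE_RE.search(lower)
    stage = STAGE_RE.search(text)  # case-sensitive so ordinary words are not matched
    return PathologyRecord(
        patient_id=patient_id,
        histology=histology,
        tumor_size_cm=float(size.group(1)) if size else None,
        t_stage=stage.group(1) if stage else None,
        n_stage=stage.group(2) if stage else None,
    )

def filter_records(
    records: List[PathologyRecord],
    *layers: Callable[[PathologyRecord], bool],
) -> List[PathologyRecord]:
    """Apply filter layers one after another; a record must pass every layer."""
    for layer in layers:
        records = [r for r in records if layer(r)]
    return records

# Made-up report snippets for demonstration only.
reports = {
    "P001": "Invasive adenocarcinoma, tumor size 2.5 cm, pT2aN1M0.",
    "P002": "Squamous cell carcinoma measuring 4 cm. pT2bN0M0",
    "P003": "No malignancy identified.",
}
records = [structure_report(pid, text) for pid, text in reports.items()]

# Layer 1: histology; layer 2: node-positive disease.
selected = filter_records(
    records,
    lambda r: r.histology == "adenocarcinoma",
    lambda r: r.n_stage is not None and r.n_stage != "N0",
)
```

Keeping each filter as a separate predicate mirrors the idea of multi-layer filtering: criteria can be added or removed independently without rewriting the query.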

Copyright © the editorial department of Chinese Journal of Clinical Thoracic and Cardiovascular Surgery of West China Medical Publisher. All rights reserved