PURPOSE: Clinical research in pancreatic cancer (PC) has been limited by a lack of granular data in national data sets. An electronic health record (EHR)–based data set designed specifically for PC has immense potential to advance research. This study describes the creation of an EHR-based data commons for patients with PC.
METHODS: We generated an index cohort of adult patients diagnosed with PC at our institution (International Classification of Diseases for Oncology codes C25.0-C25.9) between January 1, 2010, and December 31, 2023. To develop the Pancreatic Cancer Data Commons (PCDC), we linked six data sources: (1) institutional EHR data, (2) cancer-specific data from the North American Association of Central Cancer Registries, (3) surgical outcomes from the National Surgical Quality Improvement Program, (4) community-level data from the American Community Survey, (5) national mortality data from Obituary.com, and (6) genomic data from the cBioPortal for Cancer Genomics. We evaluated the feasibility of using the Observational Medical Outcomes Partnership common data model. The data set is stored on a cloud-based server that is Health Insurance Portability and Accountability Act–secure and National Institute of Standards and Technology–compliant.
RESULTS: The PCDC currently includes data on 3,542 unique patients. The mean age at diagnosis is 66.6 ± 11.7 years; 53.3% are male, and 92.2% are White. Linkage across the six data sources increased the completeness of cancer-specific data from 31.3% to 71.6%. The most common stage at presentation was stage IV (43.6%), followed by stage I (22.6%). As of the latest update, 1,074 (30.3%) patients were still alive.
CONCLUSION: The PCDC is a centralized resource that addresses a gap in PC research. The ability to securely link and analyze protected patient data is a strategic step toward enhancing clinical research and optimizing care for patients with PC. Our future work includes expanding the PCDC to multiple centers using common data models.