Establishing a Carotid Artery Stenosis Disease Cohort for Comparative Effectiveness Research Using Natural Language Processing.
Investigation of asymptomatic carotid stenosis treatment is hindered by the lack of a contemporary population-based disease cohort. We describe the use of natural language processing (NLP) to identify stenosis in patients undergoing carotid imaging.
Adult patients with carotid imaging between 2008 and 2012 in a large integrated health care system were identified and followed through 2017. An NLP process was developed to characterize carotid stenosis according to the Society of Radiologists in Ultrasound guidelines (for ultrasound) and the North American Symptomatic Carotid Endarterectomy Trial (NASCET) guidelines (for axial imaging). The resulting algorithm assessed text descriptors to categorize each finding as normal/non-hemodynamically significant stenosis, moderate stenosis, severe stenosis, or occlusion in both carotid ultrasound (US) and axial imaging (computed tomography and magnetic resonance angiography [CTA/MRA]). For US reports, internal carotid artery systolic and diastolic velocities and velocity ratios were also assessed and matched for laterality to improve accuracy. To validate the NLP algorithm, positive predictive value (PPV, or precision) and sensitivity (recall) were calculated from simple random samples drawn from the population of all imaging studies. Lastly, all non-normal studies were manually reviewed to confirm the findings used for prevalence estimates and disease cohort assembly.
A total of 95,896 qualifying index studies (76,276 US and 19,620 CTA/MRA) were identified among 94,822 patients, including 1,059 patients who underwent multiple studies on the same day. For arteries that were normal or had non-hemodynamically significant stenosis, the NLP algorithm showed excellent performance, with a PPV of 99% for US and 96.5% for CTA/MRA. For identifying a non-normal artery with correct laterality, PPV and sensitivity were 76.9% (95% CI 74.1-79.5%) and 93.1% (95% CI 91.1-94.8%) in the CTA/MRA sample, and 74.7% (95% CI 69.3-79.5%) and 94% (95% CI 90.2-96.7%) in the US sample. Regarding cohort assembly, 15,522 patients were identified with a diseased carotid artery, including 2,674 exhibiting equal bilateral disease. This resulted in a laterality-specific cohort of 12,828 moderate, 5,283 severe, and 1,895 occluded arteries, plus 326 diseased arteries with an unknown degree of stenosis. During follow-up, 30.1% of these patients underwent 61,107 additional studies.
Use of NLP to detect carotid stenosis or occlusion can accurately exclude normal/non-hemodynamically significant disease states and identifies lesions with more moderate precision, which can substantially reduce the need for manual review. The resulting cohort allows for efficient research and holds promise for similar reporting in other vascular diseases.
Authors: Robert W Chang, Lue-Yen Tucker, Kara A Rothenberg, Elizabeth M Lancaster, Andrew L Avins, Hui C Kuang, Rishad M Faruqi, Mai N Nguyen-Huynh
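The validation metrics named in the abstract (PPV and sensitivity with 95% confidence intervals) can be illustrated with a minimal Python sketch. The abstract does not report the underlying counts or the interval method, so the Wilson score interval used here is an assumption, and the function names and example counts are hypothetical and do not reproduce the study's data.

```python
from math import sqrt


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion.

    Assumption: the abstract does not say which interval method was used;
    Wilson is shown only as one common choice. z=1.96 gives roughly 95%.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def ppv_and_sensitivity(tp, fp, fn):
    """Return PPV (precision) and sensitivity (recall) with intervals.

    tp: studies flagged by the algorithm and confirmed on manual review
    fp: studies flagged by the algorithm but not confirmed
    fn: confirmed studies the algorithm failed to flag
    """
    return {
        "ppv": tp / (tp + fp),
        "ppv_ci": wilson_interval(tp, tp + fp),
        "sensitivity": tp / (tp + fn),
        "sensitivity_ci": wilson_interval(tp, tp + fn),
    }


if __name__ == "__main__":
    # Hypothetical counts chosen only to exercise the functions.
    result = ppv_and_sensitivity(tp=80, fp=20, fn=10)
    for name, value in result.items():
        print(name, value)
```

PPV is computed over everything the algorithm flagged, while sensitivity is computed over everything a reviewer confirmed, which is why the two metrics can diverge as they do in the abstract.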