The paper titled “St. Michael’s Hospital Tuberculosis Database (SMH-TB), a retrospective cohort of electronic health record data and variables extracted using natural language processing”, led by David and Ahmed, was recently accepted for publication in PLOS ONE.
In this work, we created a database by extracting demographic, treatment, and diagnosis variables related to tuberculosis (TB) from retrospective outpatient visits to the TB clinic at St. Michael’s Hospital. We collected data from structured sources (patient demographics, test results) and used text mining methods to extract variables from unstructured sources (physician dictations during patient visits). We evaluated the database for answering questions related to latent TB infection treatment and described future applications of the database, ranging from quality improvement to disease modeling and machine learning research.