Machine learning combines advanced statistical methods with the results of validation studies (“supervised” learning) to construct new case-identifying algorithms. We have found that these methods markedly improve the accuracy with which we can identify patients of interest. They reduce bias from misclassification, which translates into more valid research, and they have also allowed us to identify patients with characteristics for which there are no diagnosis codes in claims, such as cancer stage or biomarker status. In this way, supervised machine learning coupled with validation can expand the utility of claims databases, allowing accurate characterization of difficult-to-measure populations and study endpoints.
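To make the idea of supervised learning from a validation study concrete, here is a minimal sketch. It assumes a small chart-reviewed sample in which each patient has a few binary claims indicators and a confirmed case status. The indicator names, the data, and the choice of logistic regression fitted by gradient descent are all illustrative assumptions; the text above does not say which methods or features are used in practice.

```python
import math

# Synthetic validation sample, invented purely for illustration.
# Each row: ([secondary_site_code, systemic_therapy_claim, hospice_claim],
#            chart_confirmed_case)
# The three claims indicators are hypothetical feature names.
VALIDATION_SAMPLE = [
    ([1, 1, 0], 1),
    ([1, 0, 1], 1),
    ([1, 1, 1], 1),
    ([0, 1, 1], 1),
    ([0, 1, 0], 0),
    ([0, 0, 0], 0),
    ([1, 0, 0], 0),
    ([0, 0, 1], 0),
]


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(weights, bias, features):
    """Estimated probability that a patient with these claims is a true case."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)


def fit_logistic_regression(sample, learning_rate=0.5, n_iterations=2000):
    """Fit weights to the validated labels with batch gradient descent."""
    n_features = len(sample[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(n_iterations):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in sample:
            # Difference between predicted probability and the chart-review label.
            error = predict_probability(weights, bias, features) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / len(sample)
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(sample)
    return weights, bias


def identify_case(weights, bias, features, threshold=0.5):
    """Apply the fitted algorithm to a patient's claims indicators."""
    return predict_probability(weights, bias, features) >= threshold


weights, bias = fit_logistic_regression(VALIDATION_SAMPLE)
```

Once fitted, `identify_case(weights, bias, [1, 1, 0])` applies the learned rule to a patient outside the validation sample. A real study would use far more patients and features, and would assess the algorithm on data held out from fitting.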
Machine learning methods have been used in database studies conducted throughout the drug development process. Before a drug is marketed, information is needed about the target population, including its size, treatment patterns, comorbidities, health resource utilization, and cost of care. By homing in on this population more accurately, our clients are better prepared to understand unmet needs and deliver new therapies to the patients who can benefit most. At approval, regulators seek additional information on rare safety outcomes that can be difficult to identify, and where results depend on accurately identifying small numbers of cases. Identifying too many cases overstates the risks, while failing to identify cases understates them. In safety studies, machine learning algorithms support higher-quality results and better-informed decisions about the risks and benefits of treatment.
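The effect of misclassification on a rare safety outcome can be illustrated with simple arithmetic. The sketch below assumes a hypothetical population of 100,000 patients with 50 true cases, and describes an algorithm by its sensitivity (the share of true cases it finds) and its positive predictive value, or PPV (the share of identified cases that are true cases). All numbers are invented for illustration.

```python
def observed_risk(true_cases, population, sensitivity, ppv):
    """Risk estimate produced by an algorithm with the given sensitivity and PPV."""
    true_positives = true_cases * sensitivity
    # Identified cases include false positives: true positives are a
    # fraction `ppv` of everything the algorithm flags.
    identified_cases = true_positives / ppv
    return identified_cases / population


# Hypothetical figures, not taken from any study.
POPULATION = 100_000
TRUE_CASES = 50
true_risk = TRUE_CASES / POPULATION

# Algorithm that finds every true case, but half of its flagged cases are false.
risk_with_false_positives = observed_risk(
    TRUE_CASES, POPULATION, sensitivity=1.0, ppv=0.5
)

# Algorithm that flags only true cases, but misses 40% of them.
risk_with_missed_cases = observed_risk(
    TRUE_CASES, POPULATION, sensitivity=0.6, ppv=1.0
)

print(f"true risk:            {true_risk:.5f}")
print(f"with false positives: {risk_with_false_positives:.5f}")
print(f"with missed cases:    {risk_with_missed_cases:.5f}")
```

With so few true cases, a handful of misclassified patients in either direction shifts the estimated risk substantially, which is why accurate case identification matters most for rare outcomes.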