Multi-Modal Machine Learning for Precision Medicine
The promise of precision medicine lies in data diversity. More than the sheer size of biomedical data, it is the layering of multiple data modalities, offering complementary perspectives, that is thought to enable the identification of coherent patient subgroups with shared pathophysiology. Here we use autism spectrum disorder (ASD) to test this notion. By combining healthcare claims, electronic health records, familial whole-exome sequences, and neurodevelopmental expression patterns, we identified a robust subgroup of patients with dyslipidemia-associated ASD.
Autism affects 1 in 54 children in the United States, 80% of whom are boys. Autism refers to a highly heterogeneous set of brain development conditions, all characterized by reduced social and communication abilities and by increased repetitive and stereotypical behaviors. Its etiology and mechanisms are poorly understood. It is still diagnosed on the basis of symptoms alone, and by the time symptoms are visible, critical windows of brain development have already passed without any intervention.
Our study is the first precision medicine approach to overlay every type of healthcare data available to us in order to study autism, one of the most complex heritable disorders. The idea is similar to that of today’s maps. To get a true representation of the real world, we overlay different layers of information on one another. For example, cities, streets, parcels, land use, and elevation are often integrated in a digital map such as Google Maps to help us understand the world around us.
Our study is also the first to use state-of-the-art AI algorithms, specifically graph clustering, to aggregate functionally related genetic mutations and to find novel mechanisms of autism. This is like finding a needle in a haystack: thousands of variants in hundreds of genes are thought to underlie autism, and each of those genes is mutated in less than 1% of families. Our intuition is that mutations hitting genes that together perform a certain function will have a shared impact and can be grouped together for assessment. This gives us a magnifying glass for looking under the hood.
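The grouping idea can be illustrated with a small Python sketch. It is not the study's actual algorithm: as a simple stand-in for graph clustering, it takes the connected components of a gene network after dropping weak edges, then counts how many families carry a mutation anywhere in each cluster. The gene names, edge weights, family identifiers, and the 0.5 threshold are all hypothetical toy values.

```python
from collections import defaultdict

def cluster_genes(edges, threshold):
    """Cluster genes as connected components of the graph that keeps
    only edges whose weight is >= threshold (a simple stand-in for
    more sophisticated graph clustering methods)."""
    adjacency = defaultdict(set)
    for gene_a, gene_b, weight in edges:
        adjacency[gene_a]  # touch the key so isolated genes are kept
        adjacency[gene_b]
        if weight >= threshold:
            adjacency[gene_a].add(gene_b)
            adjacency[gene_b].add(gene_a)

    clusters, seen = [], set()
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first search collects every gene reachable from `start`.
        stack, component = [start], set()
        while stack:
            gene = stack.pop()
            if gene in component:
                continue
            component.add(gene)
            stack.extend(adjacency[gene] - component)
        seen |= component
        clusters.append(frozenset(component))
    return clusters

def cluster_burden(clusters, family_mutations):
    """For each cluster, count the families carrying a mutation in at
    least one of its member genes."""
    return {
        cluster: sum(1 for genes in family_mutations.values() if cluster & genes)
        for cluster in clusters
    }

# Hypothetical functional-similarity edges: (gene, gene, weight).
edges = [
    ("GENE_A", "GENE_B", 0.9),
    ("GENE_B", "GENE_C", 0.8),
    ("GENE_C", "GENE_D", 0.2),  # weak link, dropped at threshold 0.5
    ("GENE_D", "GENE_E", 0.7),
]

# Hypothetical mutated genes observed in each family.
family_mutations = {
    "family_1": {"GENE_A"},
    "family_2": {"GENE_C"},
    "family_3": {"GENE_E"},
    "family_4": {"GENE_B", "GENE_D"},
}

clusters = cluster_genes(edges, threshold=0.5)
for cluster, count in cluster_burden(clusters, family_mutations).items():
    print(sorted(cluster), "families affected:", count)
```

In this toy data no single gene is mutated in more than one family, yet the cluster containing GENE_A, GENE_B, and GENE_C is hit in three of the four families. That is the sense in which aggregating rare mutations by shared function can expose a signal that gene-by-gene counting would miss.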
Our study revealed opportunities for early biochemical screening and early intervention in autism. First, it suggests that families with a history of dyslipidemia may be at increased risk of having children with autism, and that they should be counseled and monitored accordingly. Second, common lipid lab tests, including total cholesterol, HDL, LDL, and triglyceride levels, may be informative for screening newborns for increased ASD risk. Third, metabolomic studies, which include fatty acid derivatives, may be used for early screening. Our study also showcased a generalizable way of using multiple data modalities to subtype autism, and many other genetically complex diseases, in order to inform targeted clinical trials.
Select Publication
Media Coverage
- Spectrum News "Blood lipid levels may be altered in some autistic people"
- Neuroscience News "AI-Enhanced Precision Medicine Identifies Novel Autism Subtype"
- Genome Web "Autism Subtype Marked by Abnormal Lipid Levels Uncovered in Analysis of Research, Healthcare Data"
- Health IT Analytics "AI, Precision Medicine Tool May Enable Early Autism Diagnosis"
- YNet "A new study has found a link between high cholesterol levels in pregnancy and the risk of autism"
- Northwestern News "Artificial intelligence tool lays groundwork for autism early diagnosis and intervention"
- Feinberg News "AI-Enhanced Approach Offers New Hope for Earlier Autism Diagnoses"