Published findings demonstrate PrimateAI-3D’s ability to improve genetic risk prediction and drug target discovery using primate DNA and advanced artificial intelligence.

Illumina Inc. (NASDAQ: ILMN), a global leader in DNA sequencing and array-based technologies, today announced the new PrimateAI-3D, an artificial intelligence (AI) algorithm that predicts disease-causing genetic mutations in patients with unprecedented accuracy. The results are published in two papers in the June 2 issue of Science (issue 6648), detailing the training of the algorithm and its application to half a million genomes in the UK Biobank cohort. Two accompanying papers on the primate evolution research that informed the development of PrimateAI-3D were also published in the journal today.

“It is exciting to see PrimateAI-3D and the latest in AI technology combine with the most advanced DNA sequencing capabilities,” said Francis deSouza, chief executive officer of Illumina. “Helping clinicians and researchers keep up with the vast quantities of genomic data now being generated from our platforms holds the potential to exponentially accelerate the critical work underway to better serve patients.”

According to the National Institutes of Health, the amount of genomic data being generated is approaching 40 billion gigabytes each year. The ability to share, analyze, and interpret genomic data is critical to unlocking discoveries that will advance understanding of human health and improve precision medicine.

An AI algorithm trained by natural selection

Each person carries millions of genetic variants that underlie individual differences in health and disease risk, but most of these variants are presently of unknown function. By highlighting disease-causing variants with unparalleled accuracy, PrimateAI-3D addresses a critical challenge facing the successful implementation of personalized genomic medicine. 

To achieve its state-of-the-art performance, PrimateAI-3D utilizes deep neural network architectures similar to those behind ChatGPT and AlphaFold, but is trained on genome sequences rather than human language. However, whereas generative language models such as ChatGPT can draw on existing texts to inform training, the genetic variants that cause disease in the human genome are largely unknown.

To overcome this, PrimateAI-3D effectively uses natural selection to train the parameters of the deep neural network, using millions of benign genetic variants identified through the sequencing of 233 diverse primate species, the largest such sequencing effort of nonhuman primate species to date. Sequencing nonhuman primates can help scientists infer the pathogenicity of human genetic variants, and thus improve clinical variant interpretation on a genome-wide scale.

The result is a deep neural network that has been shown to identify disease-causing variants with superior accuracy in all six clinical cohorts that were tested, and provide individualized predictions of genetic disease risk that have been validated in a cohort of nearly half a million people.

“Because of their closeness to the human genome, nonhuman primate species are uniquely valuable, both for what they can teach us about the genetic basis of human diseases, and in their own right,” said Kyle Farh, vice president of Artificial Intelligence at Illumina and senior author of the publications.

Unlocking precision medicine and genetic-based drug target discovery

As described in the accompanying paper published in Science, Illumina scientists, along with academic collaborators, next applied the PrimateAI-3D algorithm to identify rare pathogenic mutations in nearly half a million individuals in the UK Biobank. They found that the genomes of 97% of otherwise healthy members of the general population harbor highly actionable variants for at least one of 90 different clinical conditions that they surveyed. PrimateAI-3D also greatly improved the accuracy of genetic risk prediction, enabling the first demonstration of polygenic risk scores that were largely unaffected by ancestry bias, a key step toward the equitable implementation of genetic-based precision medicine for diverse, non-European populations.

“The application of the latest advances in AI to genomics opens tremendous opportunities for Illumina in both genetic risk prediction and drug target discovery by decoding the basis of complex genetic diseases such as diabetes, heart disease, and autoimmune diseases,” said Alex Aravanis, chief technology officer of Illumina.

PrimateAI-3D will be made broadly available to the genomics community, integrated across Illumina Connected Software.

Use of forward-looking statements

This release may contain forward-looking statements that involve risks and uncertainties. Among the important factors to which our business is subject that could cause actual results to differ materially from those in any forward-looking statements are: (i) challenges inherent in developing, launching, and maintaining new products and services; (ii) the level of customer uptake of new products and services, together with other factors detailed in our filings with the Securities and Exchange Commission of the United States, including our most recent filings on Forms 10-K and 10-Q, or in information disclosed in public conference calls, the date and time of which are released beforehand. We undertake no obligation, and do not intend, to update these forward-looking statements, to review or confirm analysts’ expectations, or to provide interim reports or updates on the progress of the current quarter.