AI’s role in making rare disease diagnosis more efficient
29 January 2025

Understanding rare genetic conditions

Rare genetic conditions affect roughly 5% of the global population and impose a significant burden on individuals, families, and healthcare institutions worldwide [1,2]. These conditions tend to be heterogeneous and often pediatric, with complex pathophysiological mechanisms and a suite of unique challenges, made worse by the lack of effective treatments for the majority of known rare diseases [1-3].

Over the last twenty years, considerable effort has gone into making genetic testing more accessible and shortening the diagnostic odyssey for patients affected by rare diseases. Even so, genetic testing is not yet accessible to everyone, and in more than a quarter of the cases in which it is accessible, patients with rare diseases have to wait 5 to 30 years before obtaining a correct diagnosis, a journey which is neither ideal nor sustainable [3-5]. Fortunately, with the advent of Next Generation Sequencing and Artificial Intelligence-powered bioinformatic tools, identifying the genetic causes of diseases has become easier, faster, and more cost-effective for healthcare systems. Together, these gains shorten the diagnostic odyssey and give research more data to build on, helping speed up the development of effective treatments [3-7].

Next Generation Sequencing and its impact

Next Generation Sequencing (NGS), which has by now surpassed Sanger Sequencing as the standard sequencing approach, allows for the simultaneous sequencing of millions of DNA fragments. Targeted sequencing and Whole-Exome Sequencing, which cover selected gene panels and all protein-coding regions of the genome, respectively, are the most widely used NGS approaches in both clinical diagnostics and research settings [4-8,11]. Whole Genome Sequencing (WGS), although undoubtedly powerful, is not yet a standard approach in diagnostic settings due to its higher costs and the challenges of interpreting and storing such vast amounts of data. However, recent years have seen a steady push towards WGS, which is unlikely to stop as prices decrease and sequencing data interpretation becomes faster and more accessible [6,8,9,11].

NGS has revolutionized both the throughput and speed of DNA sequencing but comes with its own set of challenges, related by and large to the complex data curation required to interpret the results accurately [6,8,9]. In the last ten years, the field of genetics has seen a steep increase in its need for bioinformatic software and pipelines that make the analysis of vast amounts of genetic data accessible to anyone, from researchers to molecular geneticists to individuals who have never had to rely on bioinformatics before. Luckily, in parallel to the rise of NGS, Artificial Intelligence has become widespread in the healthcare industry as well, giving momentum to progress in genomics by offering easy, efficient ways to extract valuable information from genetic data [3-5,7].

As the mystery that is the human genome unravels at ever-increasing speed, one thing has become clear - the major bottleneck in the journey from sequencing to diagnosis overlaps with what is perhaps the most crucial aspect in genetic data analysis: variant interpretation.

The challenge of variant interpretation

Genetic disorders are caused by mutations, or variations, in the DNA sequence of affected individuals. There are many different types of genetic mutations, ranging from Single Nucleotide Variants (SNVs), which involve the change of a single nucleotide, to Copy Number Variants (CNVs) such as duplications and deletions, all the way to complex Structural Variants (SVs), chromosomal rearrangements, and more [1,2,9-11].

Identifying which mutations an individual carries, which genes they fall in, and how deleterious they are is the first step towards diagnosing an affected individual. Although NGS has revolutionized the playing field by allowing for incredibly fast and highly accurate sequencing, this is only the first piece of the puzzle [9,11,12]. It is interpreting the variant and making sense of its impact on the biology of an individual that constitutes the bottleneck in the diagnostic odyssey, because variant interpretation is complex, time-consuming, and may sometimes lead to more questions than answers [5,12].

To aid geneticists worldwide in this challenge, in 2015 the American College of Medical Genetics and Genomics (ACMG) released what would go on to become internationally recognized guidelines for the interpretation of sequence variants [13]. This document outlines a process for classifying variants into one of five categories (Pathogenic, Likely Pathogenic, Uncertain Significance, Likely Benign and Benign) by relying on different kinds of variant evidence, such as frequency in the population, segregation in families, data from functional studies and more [13]. The ACMG criteria also suggest taking into account data from computational tools that score the damaging effects of a variant in various biological contexts. The 2015 guidelines provide a total of 28 criteria which, when combined according to a set of scoring rules detailed in the framework, indicate the class to which a variant belongs for any given condition [13]. Since then, guidelines for the interpretation of sequence variants have been expanded with gene- and disease-specific refinements, offering comprehensive practice guidelines to clinicians and geneticists [22]. In 2019, similar guidelines for Copy Number Variants (specifically, deletions and duplications) were released, comprising a total of 36 criteria for Copy Number Loss variants and 40 for Copy Number Gains, and additionally providing a semi-quantitative point-based scoring system to classify the variants into one of the five ACMG classes [14]. By the end of 2025, the ACMG is set to release newer, more comprehensive variant interpretation guidelines. These new guidelines are intended to offer an even more refined interpretation framework, so that interpretation can be as accurate as possible.
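
To make the idea of "combining criteria according to scoring rules" concrete, the following is a minimal Python sketch of the combining rules from the 2015 ACMG/AMP guidelines [13]. It is an illustration only and not eVai's rule engine: the function names are hypothetical, evidence strength is inferred solely from the criterion name prefix (so strength modifications are not modeled), and a variant that reaches both a pathogenic and a benign class is treated as Uncertain Significance.

```python
from collections import Counter

# Evidence strength is read from the criterion prefix (2015 ACMG/AMP naming).
PREFIX_TO_STRENGTH = {
    "PVS": "p_very_strong", "PS": "p_strong",
    "PM": "p_moderate", "PP": "p_supporting",
    "BA": "b_stand_alone", "BS": "b_strong", "BP": "b_supporting",
}


def strength_of(criterion: str) -> str:
    """Map a criterion code such as 'PM2' to its default evidence strength."""
    return PREFIX_TO_STRENGTH[criterion.rstrip("0123456789")]


def classify(criteria) -> str:
    """Return one of the five classes for a set of activated criteria (simplified)."""
    n = Counter(strength_of(c) for c in set(criteria))
    pvs, ps = n["p_very_strong"], n["p_strong"]
    pm, pp = n["p_moderate"], n["p_supporting"]
    ba, bs, bp = n["b_stand_alone"], n["b_strong"], n["b_supporting"]

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps == 1 and (pm >= 3 or (pm == 2 and pp >= 2) or (pm == 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs == 1 and pm == 1)
        or (ps == 1 and 1 <= pm <= 2)
        or (ps == 1 and pp >= 2)
        or pm >= 3
        or (pm == 2 and pp >= 2)
        or (pm == 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs >= 1 and bp >= 1) or bp >= 2

    # Assumption: contradictory evidence on both sides falls back to uncertain.
    if (pathogenic or likely_pathogenic) and (benign or likely_benign):
        return "Uncertain Significance"
    if pathogenic:
        return "Pathogenic"
    if likely_pathogenic:
        return "Likely Pathogenic"
    if benign:
        return "Benign"
    if likely_benign:
        return "Likely Benign"
    return "Uncertain Significance"
```

For example, `classify(["PVS1", "PS3"])` falls under the first Pathogenic rule, while `classify(["PM2", "PP3"])` meets no rule and stays at Uncertain Significance. The hard part in practice is not this final combination step but deciding, from databases and literature, which criteria should be activated in the first place.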

However, even while following the guidelines, manual curation of variants is a time-intensive, complex process which requires detailed investigation of available data, literature, and databases [5,12,15]. In a research context, manual curation lengthens an already lengthy process, with the added challenge that research on genetic variants and diseases often involves complex, hard-to-classify variants. In a diagnostic context, the low efficiency of manual variant curation and its costs, both in terms of time and resources, are reflected in a longer wait for diagnosis - which patients can’t afford [12,15,16]. 

So how can geneticists tackle this fundamentally important aspect of sequencing data interpretation in a better way to more accurately and efficiently reach a diagnosis?

One answer lies in automation.

The role of automation in analysis

Automation is one of the most effective ways to increase efficiency in any given process. When automatic liquid handling robots first entered molecular laboratories, the rise in efficiency was steep: they allowed laboratories to carry out more tests in less time, increasing throughput while lowering costs and turnaround times. Applying similar automation to the steps that come after sequencing, chiefly sequencing data processing, filtering, and variant calling, has become indispensable in the last few years, which explains the parallel rise in high-performing bioinformatics pipelines [4,5,7,16].

enGenome saw the need for more efficient genetic data analysis solutions all the way back in 2018 and we were one of the first to propose that automation could be extended to one of the most time-intensive processes in genetics: manual variant curation [17].

The question was simple: by taking the internationally recognized ACMG guidelines and data from the most reliable databases and in-silico tools, is there a way to accurately and automatically activate the ACMG criteria to speed up the variant curation process?

The answer is yes. Our paper in 2018 delineated a first approach to the automatic triggering of ACMG criteria for SNVs and, through the years, we have expanded, developed, and made the workflow better and more efficient, culminating in the pipelines that you can now explore in our variant interpretation platform, eVai [17-19].

Introducing eVai: A solution to facilitate variant interpretation

eVai is built on a Rule Set based on the ACMG guidelines and integrates evidence from varied omic datasets and computational tools to automatically compute, through a unique rule engine and prioritization algorithm, the ACMG class of each assessed variant. Furthermore, eVai provides a unique Pathogenicity Score - a quantitative score that ranks a variant’s pathogenicity in relation to a given disease [17].

eVai boasts an ACMG Classification accuracy of up to 98% and can automatically classify SNV/INDELs, CNV and mitochondrial variants. It provides all the information necessary for users to understand how and why specific criteria have been activated, providing references and links to relevant papers and databases. eVai is also flexible: it allows users to modify the variant classification if they wish to, by activating or deactivating criteria and modifying their level of evidence. 

eVai’s automatic classification and prioritization of variants according to the ACMG criteria can drastically decrease the time it takes for a geneticist to interpret a variant, making the curation process more efficient, but the platform goes further than that. eVai relies on Artificial Intelligence, in the form of a proprietary algorithm called Suggested Diagnosis, to account for further data that may be available to clinicians (such as the patient phenotype and inheritance information) and prioritize the variants by their clinical significance [19].

This additional layer of variant prioritization provides geneticists with a candidate list of clinically relevant variants ranked based on how well they explain the clinical case.


The Suggested Diagnosis algorithm is powerful and, as shown in our benchmark studies and our top performance in the CAGI 6 Challenge funded by the NIH, can consistently identify the causative variants within the top 15 positions of the candidate variant list [20,21]. The impact this has on the variant curation workflow is significant: by automating variant classification and prioritization, eVai can allow geneticists, researchers and clinicians to more rapidly arrive at a likely diagnosis, thereby reducing patient wait times and laboratory costs.
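
As an illustration of what "within the top 15 positions" means as a metric, here is a minimal Python sketch of a top-k hit rate: the fraction of solved cases whose known causative variant appears among the first k entries of the ranked candidate list. This is a generic calculation under assumed inputs, not the evaluation protocol used in the cited benchmarks [20,21]; the function name and the variant identifiers are hypothetical.

```python
def top_k_hit_rate(cases, k=15):
    """Fraction of cases whose causative variant is ranked within the top k.

    `cases` is a list of (ranked_variant_ids, causative_variant_id) pairs,
    where ranked_variant_ids is ordered from most to least relevant.
    """
    if not cases:
        raise ValueError("at least one case is required")
    hits = sum(1 for ranked, causative in cases if causative in ranked[:k])
    return hits / len(cases)


# Hypothetical example: two solved cases with made-up variant identifiers.
example_cases = [
    (["var_a", "var_b", "var_c"], "var_b"),  # causative variant ranked 2nd
    (["var_d", "var_e"], "var_z"),           # causative variant not in the list
]
print(top_k_hit_rate(example_cases, k=15))
```

A smaller k is a stricter test: a geneticist reviewing only the first few candidates benefits most when the causative variant ranks near the very top.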

Conclusion: Advancing diagnosis workflow and treatment through innovation

The diagnostic odyssey will continue to be a complex, multifaceted challenge, even as technology advances. All we can aim to do is make it easier for researchers, geneticists and clinicians to make sense of genetic data by providing cutting-edge solutions that do not sacrifice accuracy for speed. Implementing solutions such as eVai can not only alleviate the bottleneck that is variant curation within the diagnostic workflow, but can, perhaps more critically, give specialists more time to dedicate themselves to an even more fundamental process than variant curation: finding treatments and saving lives.

If you would like to know more about how eVai can increase the efficiency of your lab while meeting the highest standards of interpretation accuracy, get in touch today!

  1. Papaioannou I, Owen JS, Yáñez-Muñoz RJ. Clinical applications of gene therapy for rare diseases: A review [published correction appears in Int J Exp Pathol. 2024 Jun;105(3):114. doi: 10.1111/iep.12505]. Int J Exp Pathol. 2023;104(4):154-176. doi:10.1111/iep.12478

  2. Chung CCY; Hong Kong Genome Project, Chu ATW, Chung BHY. Rare disease emerging as a global public health priority. Front Public Health. 2022;10:1028545. Published 2022 Oct 18. doi:10.3389/fpubh.2022.1028545

  3. Abdallah S, Sharifa M, I Kh Almadhoun MK, et al. The Impact of Artificial Intelligence on Optimizing Diagnosis and Treatment Plans for Rare Genetic Disorders. Cureus. 2023;15(10):e46860. Published 2023 Oct 11. doi:10.7759/cureus.46860

  4. Willmen, T., Völkel, L., Ronicke, S. et al. Health economic benefits through the use of diagnostic support systems and expert knowledge. BMC Health Serv Res 21, 947 (2021). https://doi.org/10.1186/s12913-021-06926-y

  5. Visibelli A, Roncaglia B, Spiga O, Santucci A. The Impact of Artificial Intelligence in the Odyssey of Rare Diseases. Biomedicines. 2023;11(3):887. Published 2023 Mar 13. doi:10.3390/biomedicines11030887

  6. Yadav D, Patil-Takbhate B, Khandagale A, Bhawalkar J, Tripathy S, Khopkar-Kale P. Next-Generation sequencing transforming clinical practice and precision medicine. Clin Chim Acta. 2023;551:117568. doi:10.1016/j.cca.2023.117568

  7. Kitsios F, Kamariotou M, Syngelakis AI, Talias MA. Recent Advances of Artificial Intelligence in Healthcare: A Systematic Literature Review. Applied Sciences. 2023; 13(13):7479. https://doi.org/10.3390/app13137479

  8. Satam H, Joshi K, Mangrolia U, Waghoo S, Zaidi G, Rawool S, Thakare RP, Banday S, Mishra AK, Das G, et al. Next-Generation Sequencing Technology: Current Trends and Advancements. Biology. 2023; 12(7):997. https://doi.org/10.3390/biology12070997

  9. Wojcik MH, Lemire G, Berger E, et al. Genome Sequencing for Diagnosing Rare Diseases. N Engl J Med. 2024;390(21):1985-1997. doi:10.1056/NEJMoa2314761

  10. Claussnitzer M, Cho JH, Collins R, et al. A brief history of human disease genetics. Nature. 2020;577(7789):179-189. doi:10.1038/s41586-019-1879-7

  11. Sullivan JA, Schoch K, Spillmann RC, Shashi V. Exome/Genome Sequencing in Undiagnosed Syndromes. Annu Rev Med. 2023;74:489-502. doi:10.1146/annurev-med-042921-110721

  12. Lappalainen T, MacArthur DG. From variant to function in human disease genetics. Science. 2021;373(6562):1464-1468. doi:10.1126/science.abi8207

  13. Richards S, Aziz N, Bale S, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med. 2015;17(5):405-424. doi:10.1038/gim.2015.30

  14. Riggs ER, Andersen EF, Cherry AM, et al. Technical standards for the interpretation and reporting of constitutional copy-number variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics (ACMG) and the Clinical Genome Resource (ClinGen) [published correction appears in Genet Med. 2021 Nov;23(11):2230. doi: 10.1038/s41436-021-01150-9]. Genet Med. 2020;22(2):245-257. doi:10.1038/s41436-019-0686-8

  15. Di Resta C, Galbiati S, Carrera P, Ferrari M. Next-generation sequencing approach for the diagnosis of human diseases: open challenges and new opportunities. EJIFCC. 2018;29(1):4-14. Published 2018 Apr 30.

  16. Licata L, Via A, Turina P, et al. Resources and tools for rare disease variant interpretation. Front Mol Biosci. 2023;10:1169109. Published 2023 May 10. doi:10.3389/fmolb.2023.1169109

  17. Nicora G, Limongelli I, Gambelli P, et al. CardioVAI: An automatic implementation of ACMG-AMP variant interpretation guidelines in the diagnosis of cardiovascular diseases. Hum Mutat. 2018;39(12):1835-1846. doi:10.1002/humu.23665

  18. Nicora G, Zucca S, Limongelli I, Bellazzi R, Magni P. A machine learning approach based on ACMG/AMP guidelines for genomic variant classification and prioritization. Sci Rep. 2022;12(1):2517. Published 2022 Feb 15. doi:10.1038/s41598-022-06547-3

  19. Zucca, S., Nicora, G., De Paoli, F. et al. An AI-based approach driven by genotypes and phenotypes to uplift the diagnostic yield of genetic diseases. Hum. Genet. (2024). https://doi.org/10.1007/s00439-023-02638-x

  20. Aspromonte MC, Conte AD, Zhu S, et al. CAGI6 ID-Challenge: Assessment of phenotype and variant predictions in 415 children with Neurodevelopmental Disorders (NDDs). Preprint. Res Sq. 2023;rs.3.rs-3209168. Published 2023 Aug 2. doi:10.21203/rs.3.rs-3209168/v1

  21. Stenton, S.L., O’Leary, M.C., Lemire, G. et al. Critical assessment of variant prioritization methods for rare disease diagnosis within the rare genomes project. Hum Genomics 18, 44 (2024). https://doi.org/10.1186/s40246-024-00604-w

  22. Clinical Genome Resource (ClinGen). Sequence Variant Interpretation Working Group. https://clinicalgenome.org/working-groups/sequence-variant-interpretation/