The challenge of variant interpretation
Genetic disorders are caused by mutations, or variations, in the DNA sequence of affected individuals. Genetic mutations come in many types. They range from Single Nucleotide Variants (SNVs), which change a single nucleotide, to Copy Number Variants (CNVs) such as duplications and deletions, and further to complex Structural Variants (SVs), chromosomal rearrangements, and more [1,2,9-11].
Determining which mutations an individual carries, which genes they fall in, and how deleterious they are is the first step toward diagnosing an affected individual. Next-Generation Sequencing (NGS) has revolutionized the field by enabling fast, highly accurate sequencing, but sequencing is only the first piece of the puzzle [9,11,12]. The real bottleneck in the diagnostic odyssey is interpreting each variant and making sense of its impact on the individual's biology. Variant interpretation is complex and time-consuming, and it sometimes raises more questions than it answers [5,12].
To help geneticists worldwide meet this challenge, in 2015 the American College of Medical Genetics and Genomics (ACMG), together with the Association for Molecular Pathology (AMP), released what would become internationally recognized guidelines for the interpretation of sequence variants [13]. The guidelines outline a process for classifying variants into one of five categories (Pathogenic, Likely Pathogenic, Uncertain Significance, Likely Benign and Benign). Classification relies on different kinds of variant evidence, such as population frequency, segregation in families, data from functional studies and more [13]. The ACMG criteria also recommend considering data from computational tools that score the damaging effect of a variant in various biological contexts. The 2015 guidelines define a total of 28 criteria. When combined according to the rules set out in the framework, these criteria indicate the class a variant belongs to for a given condition [13]. Since then, the guidelines for sequence variant interpretation have been expanded with gene- and disease-specific refinements, giving clinicians and geneticists comprehensive practice guidelines [22]. In 2019, similar guidelines were released for Copy Number Variants (specifically, deletions and duplications). They comprise 36 criteria for copy number losses and 40 for copy number gains. They also provide a semi-quantitative, point-based scoring system for classifying a variant into one of the five ACMG classes [14]. By the end of 2025, the ACMG is set to release newer, more comprehensive variant interpretation guidelines. These will offer an even more refined interpretation framework, with the aim of making interpretation as accurate as possible.
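The sketch below illustrates what "combining criteria according to rules" and "point-based scoring" mean in practice. It is a simplified Python illustration, not a clinical tool. It encodes the 2015 combining rules for sequence variants [13] and the final point thresholds of the CNV scoring system [14], as we understand them. Several simplifications are assumptions made for illustration. The function names are hypothetical. Each criterion is counted at its default strength, with no strength modifications and no gene-specific refinements. Conflicting pathogenic and benign evidence is always mapped to Uncertain Significance. The CNV part only converts an already-summed point total into a class; it does not score the individual CNV evidence sections.

```python
from collections import Counter

# Evidence-strength prefixes of the 2015 criterion codes, e.g. "PVS1", "PS3",
# "PM2", "PP3" (pathogenic) and "BA1", "BS1", "BP4" (benign).
STRENGTH_PREFIXES = ("PVS", "PS", "PM", "PP", "BA", "BS", "BP")


def count_strengths(criteria):
    """Count met criteria per strength level (duplicate codes counted once)."""
    counts = Counter()
    for code in set(criteria):
        for prefix in STRENGTH_PREFIXES:
            if code.startswith(prefix):
                counts[prefix] += 1
                break
        else:
            raise ValueError(f"Unrecognized criterion code: {code!r}")
    return counts


def classify_acmg_2015(criteria):
    """Hypothetical helper: apply the 2015 combining rules at default strengths."""
    c = count_strengths(criteria)
    pvs, ps, pm, pp = c["PVS"], c["PS"], c["PM"], c["PP"]
    ba, bs, bp = c["BA"], c["BS"], c["BP"]

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps >= 1 and (pm >= 3 or (pm >= 2 and pp >= 2) or (pm >= 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs >= 1 and pm >= 1)
        or (ps >= 1 and pm >= 1)
        or (ps >= 1 and pp >= 2)
        or pm >= 3
        or (pm >= 2 and pp >= 2)
        or (pm >= 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs >= 1 and bp >= 1) or bp >= 2

    if pathogenic:
        path_call = "Pathogenic"
    elif likely_pathogenic:
        path_call = "Likely Pathogenic"
    else:
        path_call = None

    if benign:
        benign_call = "Benign"
    elif likely_benign:
        benign_call = "Likely Benign"
    else:
        benign_call = None

    # Simplification: contradictory evidence on both sides -> VUS.
    if path_call and benign_call:
        return "Uncertain Significance"
    return path_call or benign_call or "Uncertain Significance"


def classify_cnv_points(total_points):
    """Hypothetical helper: map a summed CNV point total to a class."""
    if total_points >= 0.99:
        return "Pathogenic"
    if total_points >= 0.90:
        return "Likely Pathogenic"
    if total_points > -0.90:
        return "Uncertain Significance"
    if total_points > -0.99:
        return "Likely Benign"
    return "Benign"
```

For example, `classify_acmg_2015(["PVS1", "PM2"])` gives Likely Pathogenic, and adding a strong criterion such as `"PS3"` upgrades the call to Pathogenic. The pathogenic rules are checked before the likely pathogenic ones, so each "likely" rule only needs to cover combinations the stronger tier misses.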
However, even when the guidelines are followed, manual variant curation is a time-intensive, complex process that requires detailed investigation of the available data, literature, and databases [5,12,15]. In research, manual curation lengthens an already lengthy process, and studies of genetic variants and diseases often involve complex, hard-to-classify variants. In diagnostics, the low efficiency of manual curation and its cost in time and resources translate into a longer wait for a diagnosis, a delay patients can’t afford [12,15,16].
So how can geneticists handle this critical step of sequencing data interpretation better, and reach a diagnosis more accurately and efficiently?
One answer lies in automation.