Junk DNA
Non-coding DNA sequences were originally thought to be “junk DNA”, yet scientists could not explain how these “junk” parts, which account for about 98% of the genome, could have no function.
The ENCODE project, which involved 30 research groups and more than 400 scientists, first demonstrated that non-coding DNA is transcribed and produces thousands of different RNAs.
This project and other lines of recent evidence argue for the functionality of non-coding DNA sequences, including intronic DNA, long and short non-coding RNAs and regulatory regions. For example, non-coding DNA has been shown to contain highly conserved sequence elements that are involved in gene expression, fundamental cell processes and embryonic development, and that are necessary for the correct formation of mRNA.
For this and other reasons, scientists have stopped calling it junk DNA.
Among all classes of non-coding DNA, introns have a pivotal role, since the variety of the protein repertoire is greatly enhanced by alternative splicing, in which introns play a fundamental part [1].
New discoveries from genome-wide association studies (GWAS) are fostering a new era of research focused on understanding how variations in intronic sequences affect pre-mRNA splicing and contribute to disease phenotypes [2,3].
Intronic Mutation and Human Genetic Disorders
During transcription, both the exons and the introns of a gene are copied into a precursor messenger RNA (pre-mRNA). The introns are then removed by a process called splicing to obtain the coding sequence, which is translated into protein.
Alterations in pre-mRNA splicing are increasingly recognized as a cause of monogenic disorders. Notably, evidence from mRNA analysis and whole-genome sequencing (WGS) indicates that pathogenic mutations can occur deep within the introns of over 75 disease-associated genes [4].
How intronic mutations affect mRNA (and therefore proteins)
There are several mechanisms by which intronic mutations can alter the canonical splicing process. The inclusion of a pseudo-exon, that is, an intronic sequence flanked by apparent consensus splice sites, is now considered a more frequent cause of disease than previously thought.
This atypical process can be triggered by intronic mutations that activate cryptic splice sites by creating a novel donor splice site or a novel acceptor splice site. These cryptic splice sites, also known as pseudo splice sites, are sequences present throughout introns that are very similar to the consensus motifs of canonical splice sites.
For this reason the spliceosome may recognize these sites as well and splice at them, which leads to improper intron removal. The inclusion of a pseudo-exon generally disrupts the reading frame and introduces a premature termination codon (PTC) that targets the mutant mRNA for rapid degradation by nonsense-mediated decay (NMD).
The degradation of the defective messenger RNA is a protective process that prevents aberrant protein synthesis and has the same effect as a gene deletion or a nonsense mutation [6].
Hence, deep intronic mutations induce pseudo-exon inclusion by creating de novo splice sites, and the resulting premature stop codon leads to loss of gene function.
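The effect of a pseudo-exon on the reading frame can be sketched with a toy example. The sequences below are hypothetical and far shorter than real exons; they only illustrate how an inserted fragment whose length is not a multiple of three shifts the frame and exposes an earlier stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(mrna):
    """Return the 0-based index of the first in-frame stop codon, or None."""
    for i in range(0, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


# Hypothetical toy sequences, written as DNA (T instead of U).
exon1 = "ATGGCTGAAGTC"      # 12 nt, begins with the ATG start codon
exon2 = "CTGAAGGGCACCTAA"   # 15 nt, ends with the normal TAA stop codon
pseudo_exon = "GTTGA"       # 5 nt of intronic sequence, not a multiple of 3

# Normal splicing joins the two exons; the mutant transcript also keeps
# the pseudo-exon between them.
normal_mrna = exon1 + exon2
mutant_mrna = exon1 + pseudo_exon + exon2

print("frame shifted:", len(pseudo_exon) % 3 != 0)
print("normal stop at codon", first_stop_codon(normal_mrna))
print("mutant stop at codon", first_stop_codon(mutant_mrna))
```

In this toy transcript the normal stop codon is at codon index 8, while the shifted frame of the mutant reads a TGA inside the second exon at codon index 6. Whether a real transcript is then degraded by NMD depends on further factors that this sketch does not model.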
Some examples of diseases caused by mutations in intronic regions
Pseudo-exon inclusion was first reported in patients with β-thalassemia, in whom the synthesis of the β-globin protein (HBB) fails because a T>G point mutation in an intron alters the splicing pattern and interrupts the normal processing of the HBB pre-mRNA [5].
One of the best-known deep intronic changes is a C>T variant that is among the most frequent mutations in the CFTR gene, responsible for cystic fibrosis (CF), in the Polish population. This mutation is located within intron 19 and creates a novel donor site that results in the inclusion of an 84-bp pseudo-exon in the mature mRNA. The pseudo-exon contains an in-frame stop codon, so the translated protein is shorter and nonfunctional.
It was also found that the severity of the disease is inversely correlated with the level of correctly spliced transcripts, which suggests that splicing regulation might be an important modifier of the clinical course of CF in the presence of intronic mutations [6].
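How a single C>T change can create a donor-like site, as in the CFTR example above, can be sketched by matching a sequence against the commonly cited donor consensus MAG|GTRAGT. This is a deliberately crude match count on a hypothetical intronic fragment, not the CFTR sequence itself; the function names and the threshold of 8 matches are assumptions made for illustration, and real prediction tools use more refined scoring models.

```python
# 9-nt donor consensus: 3 exonic + 6 intronic positions (MAG|GTRAGT),
# written with IUPAC codes: M = A or C, R = A or G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "R": "AG"}
DONOR_CONSENSUS = "MAGGTRAGT"


def donor_score(window):
    """Count the positions of a 9-nt window that match the donor consensus."""
    return sum(base in IUPAC[code] for base, code in zip(window, DONOR_CONSENSUS))


def donor_like_sites(seq, min_score=8):
    """Return (start, score) for windows that contain the GT dinucleotide
    at the expected position and reach the (assumed) score threshold."""
    hits = []
    for i in range(len(seq) - 8):
        window = seq[i:i + 9]
        if window[3:5] == "GT" and donor_score(window) >= min_score:
            hits.append((i, donor_score(window)))
    return hits


# Hypothetical intronic fragment; index 10 holds the C that is mutated.
reference = "TTCTACAAGGCGAGTCTTAC"
mutant = reference[:10] + "T" + reference[11:]   # C>T substitution

print("reference:", donor_like_sites(reference))
print("mutant:   ", donor_like_sites(mutant))
```

The reference fragment has no window that passes the threshold, whereas the C>T change turns AAGGCGAGT into AAGGTGAGT, which matches all nine consensus positions.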
The longer a gene, the more likely it is to be affected by pathogenic mutations. It is therefore not surprising that numerous deep intronic mutations have been described in particularly long genes, such as those associated with neurofibromatosis and Duchenne muscular dystrophy. Remarkably, deep intronic mutations that promote the inclusion of a pseudo-exon have also been described in several hereditary tumor syndromes. These include neurofibromatosis types 1 and 2, melanoma, ataxia-telangiectasia, retinoblastoma, Lynch syndrome, breast cancer, and familial adenomatous polyposis [4].
The diagnosis of genetic diseases caused by intronic mutations
In a variety of Mendelian disorders, roughly 50-75% of patients sequenced by whole-exome sequencing (WES) do not receive a genetic diagnosis.
For specific diseases, identifying splicing variants that WES misses may increase the diagnostic rate by 10% [7]. WES is usually the routine choice in diagnostics for reasons of cost and turnaround time. However, the recent introduction of WGS approaches in clinically oriented screening studies has led to the identification of an increasing number of pathogenic variants located deep within introns.
It is therefore clear that the study of pathogenic variants located in intronic sequences is one of the major challenges of modern genetics.
The search for deep intronic mutations should be considered when no potentially pathogenic variant has been identified in the coding regions or at the exon/intron boundaries.
As functional testing is challenging, different in silico tools have been developed to predict the effect of variants that may have an impact on pre-mRNA splicing.
Although splicing mutations at canonical splice sites are better characterized, deep intronic variants require additional experimental validation (e.g. transcript analysis), and few computational approaches are available for their characterization.
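The tiered approach described above, in which coding regions and exon/intron boundaries are examined first and deep intronic regions afterwards, can be sketched as a simple positional triage. The exon coordinates below are hypothetical, the function name is invented for illustration, and the 100 bp cutoff for "deep intronic" is an assumed convention rather than a fixed rule; this is not how any particular tool classifies variants.

```python
def classify_variant(pos, exons, deep_threshold=100):
    """Classify a position relative to exons given as sorted
    (start, end) pairs with 1-based inclusive coordinates."""
    if pos < exons[0][0] or pos > exons[-1][1]:
        return "outside transcript"
    for start, end in exons:
        if start <= pos <= end:
            return "exonic"
    # Distance to the nearest exon boundary, in base pairs.
    distance = min(min(abs(pos - start), abs(pos - end)) for start, end in exons)
    if distance <= 2:
        # The first and last two bases of an intron (donor and acceptor sites).
        return "canonical splice site"
    if distance > deep_threshold:
        return "deep intronic"
    return "intronic, near exon"


# Hypothetical exon coordinates for a three-exon transcript.
exons = [(1000, 1200), (5000, 5150), (9000, 9300)]

for pos in (1100, 1202, 1250, 3000):
    print(pos, classify_variant(pos, exons))
```

With these coordinates, position 1100 is exonic, 1202 falls on a canonical splice site, 1250 is intronic but close to an exon, and 3000 is deep intronic. Positions in the last category are the ones that exon-focused analysis is most likely to leave unexamined.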
eVai: the expert variant interpreter
With eVai, enGenome’s platform solution for variant interpretation, it is possible to analyze a VCF containing intronic variants and assess their pathogenicity according to the ACMG/AMP guidelines. eVai evaluates both deep intronic variants and nearby canonical splice site variants, allowing clinicians to diagnose disorders caused by these non-coding variations.
Written by Alessandra Bovio
[1] Jo, Bong-Seok, and Sun Shim Choi. “Introns: The Functional Benefits of Introns in Genomes.” Genomics & informatics vol. 13,4 (2015): 112-8. doi:10.5808/GI.2015.13.4.112
[2] Hsiao YH, Bahn JH, Lin X, Chan TM, Wang R, Xiao X (2016) Alternative splicing modulated by genetic variants demonstrates accelerated evolution regulated by highly conserved proteins. Genome Res 26:440–450.
[3] Yu, Huihui et al. “Genome-wide discovery of natural variation in pre-mRNA splicing and prioritising causal alternative splicing to salt stress response in rice.” The New phytologist vol. 230,3 (2021): 1273-1287. doi:10.1111/nph.17189
[4] Vaz-Drago R, Custódio N, Carmo-Fonseca M. Deep intronic mutations and human disease. Hum Genet 2017, 136: 1093–1111.
[5] Busslinger M, Moschonas N, Flavell RA (1981) Beta+ thalassemia: aberrant splicing results from a single point mutation in an intron. Cell 27:289–298.
[6] Anna A, Monika G. Splicing mutations in human genetic disorders: examples, detection, and confirmation. J Appl Genet. (2018) 59:253–68. doi:10.1007/s13353-018-0444-7
[7] Kremer LS, Bader DM, Mertes C, Kopajtich R, Pichler G, Iuso A, et al. Genetic diagnosis of Mendelian disorders via RNA sequencing. Nat Commun. 2017;8:1–11.