Tandem repeat disorders: from diagnosis to emerging therapeutic strategies
Abstract
Tandem repeat disorders (TRDs) are genetic conditions characterized by the abnormal expansion of repetitive DNA sequences within specific genes. The growing number of identified TRDs highlights their complexity, with varied molecular mechanisms ranging from toxic protein production and repeat-associated non-AUG translation to RNA toxicity and epigenetic modifications. TRDs also exhibit unique clinical features such as reduced penetrance, anticipation, and repeat motif changes. Advances in molecular diagnostics such as long-read sequencing have significantly improved the detection of TRDs, especially for large or complex repeat expansions. Additionally, emerging therapeutic strategies, particularly antisense oligonucleotides (ASOs) and gene editing technologies, are showing great promise. ASOs in particular have demonstrated success through mechanisms like allele-specific knockdown and splice modulation. In this review, we explore the classification of TRDs, advances in diagnostics, molecular mechanisms, clinical features, and innovative therapeutic strategies, highlighting the need for further research to refine treatments and improve outcomes.
Introduction
Tandem repeat disorders (TRDs) are a group of genetic conditions caused by the expansion of repetitive DNA sequences, known as tandem repeats, within specific genes [1]. These sequences, which typically consist of consecutively repeated units of 2 to 6 nucleotides, can expand beyond the normal range, leading to gene dysfunction and a variety of clinical symptoms [2]. The abnormal expansion of these repeats disrupts cellular processes either by altering the structure and function of the encoded proteins or by interfering with gene regulation and RNA processing. The inherent genetic instability of these repetitive sequences poses significant challenges in understanding, diagnosing, and treating associated disorders [3].
Owing to advances in molecular genetics and sequencing technologies, the number of identified TRDs has steadily increased [4]. TRDs are currently recognized as a diverse and complex group of disorders, each with unique molecular mechanisms and clinical presentations [5,6]. These disorders serve as critical models for developing novel diagnostic and therapeutic approaches. Despite significant progress, TRDs continue to present challenges in clinical management, underscoring the need for ongoing research and innovation [7]. In this review, we provide a comprehensive overview of the classification, diagnostic approaches, pathogenesis, and emerging therapeutic strategies for TRDs, highlighting both current knowledge and future directions in the field.
Classification of tandem repeat disorders
Different types of tandem repeat disorders
TRDs can be classified based on the location of the repeat sequence within the gene and the nature of the repeat motif (Table 1) [2]. Tandem repeats can be found either within the coding regions of genes or in noncoding regions [1]. Repeat expansions within coding regions often lead to the production of abnormal proteins that can aggregate and disrupt cellular functions, while repeat expansions in noncoding regions can interfere with gene regulation, RNA processing, and other cellular processes without directly altering protein sequences [8]. For example, Huntington disease (HD) involves a CAG trinucleotide repeat expansion in the coding region of the HTT gene, resulting in the production of a toxic mutant protein that causes neurodegeneration [9]. On the other hand, fragile X syndrome (FXS) is caused by a CGG trinucleotide repeat expansion in the 5' untranslated region (UTR) of the FMR1 gene, leading to gene silencing through methylation [10].
TRDs can also be classified based on the sequence and length of the repeat motif. The most common types of repeat motifs are trinucleotide repeats, such as CAG and CGG, in disorders like HD and FXS [9,11]. Other motifs include tetranucleotide repeats (e.g., CCTG in myotonic dystrophy type 2) [12], pentanucleotide repeats (e.g., ATTCT in spinocerebellar ataxia [SCA] type 10) [13], and hexanucleotide repeats (e.g., GGGGCC in C9ORF72-related disorders) [14]. Understanding the various types of repeat motifs and their role in disease pathogenesis is crucial for developing targeted therapeutic approaches for these disorders.
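The motif-length classification above can be illustrated with a short sketch. The function name and the "other" fallback label are illustrative assumptions; only the motifs and class names come from the text.

```python
# Names for repeat unit lengths covered in the text (2 to 6 nucleotides).
MOTIF_CLASSES = {
    2: "dinucleotide",
    3: "trinucleotide",
    4: "tetranucleotide",
    5: "pentanucleotide",
    6: "hexanucleotide",
}


def classify_motif(motif: str) -> str:
    """Return the repeat class of a motif based only on its length."""
    motif = motif.upper()
    # "N" is allowed so that degenerate motifs such as GCN are accepted.
    if not motif or set(motif) - set("ACGTN"):
        raise ValueError(f"not a DNA motif: {motif!r}")
    # Lengths outside 2-6 get a generic label (an assumption of this sketch).
    return MOTIF_CLASSES.get(len(motif), "other")


# Motifs named in the text: HD, FXS, DM2, SCA10, C9ORF72-related disorders.
for motif in ["CAG", "CGG", "CCTG", "ATTCT", "GGGGCC"]:
    print(motif, classify_motif(motif))
```

This covers only the motif-length axis of the classification. The location of the repeat (coding versus noncoding) would need gene annotation and is not modeled here.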
Recent increases in tandem repeat disorder discovery
Since the first TRD, FXS, was discovered in the 1990s, there has been a steady increase in the identification of new TRDs, with approximately 50 different TRDs identified to date [1]. In the early years, new TRDs were primarily discovered through linkage analysis and microarray studies, which enabled researchers to associate specific genetic regions with disease phenotypes. The advent of next-generation sequencing (NGS) technologies accelerated the discovery of TRDs by allowing more comprehensive exploration of the genome [15]. In recent years, the development of long-read sequencing (LRS) technologies and advanced computational tools and algorithms has revolutionized the field, leading to a rapid acceleration in the discovery of TRDs [16]. Many of the newly discovered TRDs identified through LRS are in noncoding regions, often involving GC-rich motifs, repeats of variable length, or even mixtures of different repeat motifs, which were previously difficult to detect with traditional short-read sequencing methods [17,18]. The diagnostic methods for TRDs are described in detail in the following section.
Diagnostics in tandem repeat disorders
Conventional methods: polymerase chain reaction-based techniques and Southern blotting
The diagnosis of TRDs has traditionally relied on conventional molecular techniques, each suited to detect specific types and sizes of repeat expansions (Table 2). These methods include standard polymerase chain reaction (PCR) with fragment length analysis, repeat-primed PCR (RP-PCR), and Southern blotting.
Conventional PCR followed by fragment length analysis is often sufficient for the diagnosis of TRDs with relatively small to moderate repeat expansions [19]. In this approach, primers are designed to flank the repeat region, allowing for the amplification of the entire repeat sequence. The size of the PCR products is then measured to determine whether a repeat expansion is present. This method is particularly effective if the repeat expansions do not exceed the amplification capabilities of standard PCR, typically up to several hundred base pairs. However, when repeat expansions are very large or GC-rich, standard PCR is not applicable and other methods may be required [20].
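The arithmetic behind fragment length analysis can be sketched as follows. This is a minimal illustration, not a validated assay calculation: the function name is hypothetical, and the 80 bp of non-repeat sequence in the amplicon is an assumed value that in practice depends on the primer design.

```python
def repeat_count_from_amplicon(amplicon_bp: int, flank_bp: int, motif_len: int) -> float:
    """Estimate the number of repeat units from a PCR fragment length.

    amplicon_bp: measured size of the PCR product
    flank_bp:    non-repeat bases in the amplicon (primers plus flanking
                 sequence); assumed known from the primer design
    motif_len:   length of the repeat unit in nucleotides
    """
    if motif_len <= 0:
        raise ValueError("motif_len must be positive")
    repeat_bp = amplicon_bp - flank_bp
    if repeat_bp < 0:
        raise ValueError("amplicon is shorter than its flanking sequence")
    return repeat_bp / motif_len


# Hypothetical trinucleotide assay: 206 bp product with 80 bp of flank.
print(repeat_count_from_amplicon(206, 80, 3))  # 42.0
```

A non-integer result would point to sizing imprecision or to sequence variation in the flanking region rather than to a fractional repeat.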
RP-PCR is commonly used as a screening tool to detect the presence of expanded repeat alleles, particularly when standard PCR may fail due to the large size of the expansion [21,22]. In RP-PCR, one of the primers is specifically designed to bind within the repetitive sequence itself, while the other primers bind to the flanking regions. When a repeat expansion is present, the PCR amplification generates a characteristic “ladder” pattern of products due to the varying lengths of repeats being amplified. This pattern is typically observed as a long tail on fragment length analysis [23]. While RP-PCR is highly sensitive for detecting the presence of repeat expansions, it does not provide precise sizing, especially for very large expansions. Therefore, RP-PCR is primarily used as a preliminary screening method, with positive results often requiring confirmation by more detailed methods like Southern blotting [24].
Southern blotting has long been considered the gold standard for confirming large tandem repeat expansions that cannot be accurately sized by PCR-based methods [25,26]. The process involves the digestion of DNA with restriction enzymes, followed by hybridization with a labeled probe that binds to the region containing the repeat. This technique is particularly useful for the detection of large repeat expansions and provides detailed information on the size of the expanded repeat alleles [27]. However, Southern blotting has significant limitations. It is technically challenging and time-consuming, and it requires a large amount of high-quality DNA, which may not always be available from patient samples. Additionally, Southern blotting must be carefully optimized for each specific disease, which can make it difficult to set up and standardize across different conditions.
Next-generation sequencing: short-read sequencing
NGS has dramatically changed the landscape of genetic diagnostics, offering high-throughput and comprehensive analysis of the genome [28]. However, short-read NGS, which sequences DNA in small fragments of 100 to 300 base pairs, has limitations when it comes to diagnosing TRDs. The reads may not cover the entire repeat region, leading to fragmented and incomplete data, which can cause difficulties in correctly identifying and sizing the expansions.
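The read-length constraint can be made concrete with a rough calculation. The sketch below assumes that a read must cover the whole repeat plus a few bases of unique flanking sequence on each side to size it directly. The 10-base anchor and both function names are illustrative assumptions, not parameters of any particular tool.

```python
def read_can_span(repeat_units: int, motif_len: int, read_len: int,
                  min_anchor: int = 10) -> bool:
    """True if one read can cover the repeat plus min_anchor bases per side."""
    return repeat_units * motif_len + 2 * min_anchor <= read_len


def max_spannable_units(motif_len: int, read_len: int, min_anchor: int = 10) -> int:
    """Largest repeat count that a single read of read_len can fully span."""
    return max(0, (read_len - 2 * min_anchor) // motif_len)


# 150 bp reads and a trinucleotide repeat
print(read_can_span(40, 3, 150))    # True:  120 bp repeat + 20 bp anchors
print(read_can_span(200, 3, 150))   # False: 600 bp repeat exceeds the read
print(max_spannable_units(3, 150))  # 43
```

Under these assumptions, expansions beyond a few dozen trinucleotide units cannot be spanned by a single 150 bp read.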
Despite these challenges, recent advancements in bioinformatics have enabled short-read NGS to diagnose some TRDs. Specialized tools and algorithms, such as ExpansionHunter [29,30], STRetch [31], and GangSTR [32], have been developed to identify repeat expansions from short-read data by analyzing the patterns of sequencing reads that map to repeat regions. These tools have improved the ability to diagnose rare forms of TRDs using NGS. However, the approach remains inadequate for detecting all TRDs, especially those involving large or complex repeat expansions.
The need for long-read sequencing
To overcome the limitations of short-read NGS, there is a growing need for LRS technologies in the diagnosis of TRDs [33]. LRS platforms, offered by PacBio and Oxford Nanopore, enable the sequencing of much longer DNA fragments, often exceeding tens of thousands of base pairs [34]. This capability is particularly valuable for accurately detecting and sizing large repeat expansions that are difficult or impossible to resolve with short-read data [17,18]. The utility of LRS is further enhanced by integrating new technologies, such as clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9. Nanopore Cas9-targeted sequencing (nCATS) enables targeted and focused LRS of specific genomic regions, providing highly accurate sizing and structural characterization, and facilitating the diagnosis of TRDs [35].
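When a single read spans the entire repeat, sizing reduces in principle to counting motif units within that read. The following sketch finds the longest uninterrupted run of a motif using a regular expression. It assumes an error-free read and a pure repeat, which real long-read data rarely provide, and the function name and example read are hypothetical.

```python
import re
from typing import Tuple


def longest_repeat_run(read: str, motif: str) -> Tuple[int, int]:
    """Return (start, units) of the longest uninterrupted run of motif.

    Returns (-1, 0) if the motif does not occur in the read.
    """
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    best_start, best_units = -1, 0
    for match in pattern.finditer(read):
        units = (match.end() - match.start()) // len(motif)
        if units > best_units:
            best_start, best_units = match.start(), units
    return best_start, best_units


# Hypothetical read: 4 bp of flank, 45 CAG units, 4 bp of flank.
read = "TTGA" + "CAG" * 45 + "CCGT"
print(longest_repeat_run(read, "CAG"))  # (4, 45)
```

Dedicated repeat-sizing tools additionally tolerate sequencing errors and interruptions within the repeat, which this sketch does not attempt.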
LRS also offers the advantage of analyzing DNA methylation patterns within repeat regions [36], which is important for understanding the epigenetic factors involved in certain TRDs, such as FXS. As costs decrease and accessibility improves, LRS is set to become a key tool in TRD diagnostics, offering more comprehensive and accurate genetic analysis than currently possible with short-read NGS.
Molecular mechanisms and clinical characteristics of tandem repeat disorders
Molecular mechanism of pathogenesis
The pathogenesis of TRDs can be categorized into gain-of-function (GOF) and loss-of-function mechanisms that contribute to disease development and progression.
Toxic mutant protein production (gain-of-function)
A key GOF mechanism in TRDs is the formation of toxic mutant proteins due to repeat expansions in coding regions [37]. In polyglutamine (polyQ) diseases like HD and SCAs (SCA1, SCA2, SCA3), CAG repeat expansions in genes like HTT [38] and ataxin [39-41] produce proteins with elongated polyQ tracts, leading to misfolding, aggregation, and neurodegeneration. In other disorders, such as oculopharyngeal muscular dystrophy, GCN repeat expansions in the PABPN1 gene create an expanded polyalanine tract, causing protein aggregation in muscle cells and contributing to muscle weakness [42]. In these disorders, the accumulation of misfolded proteins is central to the disease process.
Repeat-associated non-AUG translation (gain-of-function)
Another GOF mechanism involves repeat-associated non-AUG (RAN) translation, where expanded repeats in noncoding regions trigger the synthesis of toxic peptides from normally untranslated regions [43,44]. This occurs without the need for a conventional start codon (AUG). Disorders such as neuronal intranuclear inclusion disease (NIID) exemplify this mechanism. In NIID, the expansion of GGC repeats in the NOTCH2NLC gene leads to the production of toxic polyglycine-containing proteins through RAN translation [17]. The accumulation of these peptides within neurons contributes to cellular dysfunction, ultimately leading to the characteristic intranuclear inclusions seen in the disease.
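Because RAN translation does not depend on an AUG start codon, a repeat tract can in principle be read in any frame. The sketch below translates a pure repeat in the three forward frames using the standard genetic code, restricted to the codons that GGC and GGGGCC repeats produce. It is only an illustration of reading frames: it does not model which frames are actually translated in vivo, antisense transcripts, or the sequence flanking the repeat, and the function name is hypothetical.

```python
# Subset of the standard genetic code: only the codons that arise from
# GGC and GGGGCC repeat tracts in the three forward frames.
CODON_TABLE = {
    "GGC": "G", "GGG": "G",  # glycine
    "GCG": "A", "GCC": "A",  # alanine
    "CGG": "R",              # arginine
    "CCG": "P",              # proline
}


def frame_products(motif: str, units: int = 6) -> dict:
    """Translate a pure repeat tract in each of the three forward frames."""
    tract = motif * units
    products = {}
    for frame in range(3):
        codons = [tract[i:i + 3] for i in range(frame, len(tract) - 2, 3)]
        products[frame] = "".join(CODON_TABLE[codon] for codon in codons)
    return products


print(frame_products("GGC"))
# {0: 'GGGGGG', 1: 'AAAAA', 2: 'RRRRR'}
print(frame_products("GGGGCC"))
# {0: 'GAGAGAGAGAGA', 1: 'GPGPGPGPGPG', 2: 'GRGRGRGRGRG'}
```

A trinucleotide repeat yields a homopolymeric peptide in each frame, whereas a hexanucleotide repeat such as GGGGCC yields a different dipeptide repeat in each frame.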
RNA-binding protein sequestration (gain-of-function)
In some TRDs, the pathogenic mechanism involves RNA toxicity. In myotonic dystrophy, expanded CTG repeats in the noncoding region of the DMPK gene lead to the formation of abnormal RNA structures. These structures sequester RNA-binding proteins, which are essential for normal RNA processing [45]. The sequestration of these proteins disrupts RNA splicing and other critical cellular functions, leading to a toxic GOF effect.
Epigenetic modifications and gene silencing (loss-of-function)
In certain TRDs, repeat expansions in noncoding regions lead to epigenetic modifications, such as DNA methylation, histone modification, and heterochromatin protein 1 (HP1)-mediated silencing, which all contribute to the silencing of gene expression [6]. For example, in FXS, transcriptional silencing of the FMR1 gene is triggered by a combination of DNA methylation and histone modification [46]. Histone modifications have also been observed in HD, where they can compact chromatin and make the gene less accessible for transcription [47]. In Friedreich’s ataxia, transcriptional silencing is known to occur in the absence of DNA methylation, mainly by HP1-mediated gene silencing [48].
R-loop formation (loss-of-function)
R-loops are three-stranded nucleic acid structures that can form when expanded repeats cause RNA to hybridize with complementary DNA strands, leaving the other DNA strand unpaired [49]. This can disrupt the normal process of gene transcription and lead to genomic instability. In Friedreich’s ataxia, GAA repeat expansions in the FXN gene form R-loops that repress gene expression, reducing frataxin and causing mitochondrial dysfunction [50]. Similarly, in FXS, CGG repeat expansions in the FMR1 gene promote R-loops, leading to gene silencing and reduced fragile X mental retardation protein (FMRP) levels [50,51].
Reduced penetrance
In some cases of TRDs, individuals may carry intermediate repeat expansions, also known as premutations, where the repeat length is larger than normal but not sufficient to cause the full-blown disease [2]. These premutations can lead to reduced or incomplete penetrance, meaning that not all individuals with the expansion will develop the clinical symptoms associated with the disorder. For example, in HD, individuals with 36-39 CAG repeats in the HTT gene often show reduced penetrance, where they may not develop symptoms until late in life, or they may not develop symptoms at all [52]. Similarly, in fragile X-associated tremor/ataxia syndrome, individuals with a premutation in the FMR1 gene typically have 55-200 CGG repeats and may exhibit mild or late-onset symptoms, or they may remain asymptomatic throughout their lives [53]. These unstable premutations can expand in subsequent generations, increasing the risk of developing a full mutation and the associated disorder, making genetic screening crucial for early detection, management, and informed family planning.
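The repeat-length ranges described above can be expressed as simple threshold rules. In the sketch below, the 36-39 CAG range for HTT and the 55-200 CGG range for FMR1 come from the text. The categories above those ranges (40 or more CAG repeats, more than 200 CGG repeats) reflect commonly used cutoffs and are labeled as assumptions. The function names are hypothetical, and the output is not a substitute for laboratory reporting guidelines.

```python
def classify_htt_cag(repeats: int) -> str:
    """Classify an HTT CAG repeat count."""
    if repeats >= 40:
        # Assumption: commonly used full-penetrance cutoff, not from the text.
        return "full penetrance"
    if repeats >= 36:
        return "reduced penetrance"  # 36-39, as stated in the text
    return "below reduced-penetrance range"


def classify_fmr1_cgg(repeats: int) -> str:
    """Classify an FMR1 CGG repeat count."""
    if repeats > 200:
        # Assumption: commonly used full-mutation cutoff, not from the text.
        return "full mutation"
    if repeats >= 55:
        return "premutation"  # 55-200, as stated in the text
    return "below premutation range"


print(classify_htt_cag(38))    # reduced penetrance
print(classify_fmr1_cgg(120))  # premutation
```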
Anticipation
Anticipation is a phenomenon observed in many TRDs where the disease severity increases and the age of onset decreases in successive generations [2]. This occurs because repeat expansions tend to increase in length when passed from parent to child, often more prominently through either the maternal or paternal germline. For example, in myotonic dystrophy type 1, the CTG repeat expansion in the DMPK gene frequently increases in size when transmitted through the maternal line, leading to earlier and more severe disease manifestations in the offspring [54]. In contrast, in HD, CAG repeats typically expand more through the paternal line, resulting in a more severe and earlier onset [55]. Although anticipation is a key factor in genetic counseling and management, further studies are needed to fully understand its mechanisms. Additionally, while expansion is common, repeat contractions can also occur, reducing repeat length in subsequent generations [56].
Interruptions and repeat motif changes
In some TRDs, the repeat sequence is not purely repetitive and may contain interruptions, that is, variant units within the repeat sequence that can stabilize the repeat tract and reduce further expansion [57]. For example, in SCA1, CAA interruptions within the CAG repeat sequence are associated with milder disease due to their stabilizing effect on the repeat expansion [58]. Conversely, in HD, CAA interruptions are present in most cases, and the loss of interruptions (LOI) can lead to a more aggressive disease course [59]. LOI in HD can result in the formation of more toxic RNA hairpin structures, worsening the disease course even at lower repeat numbers, such as 36 to 39, or even at 35 or fewer repeats [60]. Additionally, in disorders related to expansions in the RFC1 gene, the most common pathogenic repeat expansion is the AAGGG motif [61], which replaces the normal AAAAG sequence, but other motifs such as ACAGG and AGAGG can also occur, influencing the clinical presentation and disease severity [62]. Understanding sequence composition changes in TRDs is crucial for accurate genetic diagnosis and insights into the variability of clinical outcomes. LRS will be essential for elucidating these underlying mechanisms.
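Detecting interruptions amounts to scanning a repeat tract unit by unit and reporting every unit that differs from the expected motif. The sketch below assumes an already extracted, in-frame tract whose length is a multiple of the motif length. The function name and the example tract are hypothetical.

```python
from typing import List, Tuple


def find_interruptions(tract: str, motif: str = "CAG") -> List[Tuple[int, str]]:
    """Return (unit_index, unit) for every unit that differs from motif."""
    k = len(motif)
    if len(tract) % k:
        raise ValueError("tract length must be a multiple of the motif length")
    units = [tract[i:i + k] for i in range(0, len(tract), k)]
    return [(i, unit) for i, unit in enumerate(units) if unit != motif]


# Hypothetical tract: 10 CAG units, one CAA interruption, 5 more CAG units.
interrupted = "CAG" * 10 + "CAA" + "CAG" * 5
print(find_interruptions(interrupted))  # [(10, 'CAA')]
print(find_interruptions("CAG" * 16))   # [] (pure tract, no interruptions)
```

Because CAA and CAG both encode glutamine, a CAA interruption changes the DNA and RNA sequence without changing the encoded polyQ tract.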
Somatic instability
Recent scientific advancements have significantly broadened our understanding of somatic mutations in TRDs [63]. Somatic instability, where repeat expansions continue to grow in certain tissues after birth, is a critical factor in the progression and severity of many TRDs [64-66]. These mutations are particularly prevalent in tissues such as the brain and muscles, where repeat expansions can accumulate to much higher levels than those found in germline cells. This ongoing expansion can exacerbate disease symptoms over time, contributing to variability in clinical outcomes among individuals with the same genetic mutation. Studies have shown that somatic instability is influenced by factors such as DNA repair pathways, with certain genetic variants either promoting or inhibiting further expansions [63]. This emerging knowledge highlights the importance of considering somatic mutations when assessing disease prognosis and developing targeted therapies. Despite advances in LRS and other genomic technologies, detecting and fully understanding these somatic changes remains challenging, underscoring the need for further technological improvements.
Cutting-edge research on therapeutics
Antisense oligonucleotides and RNA interference
Recent advances in therapeutics for TRDs offer new hope for managing these complex disorders. Among the most promising approaches are antisense oligonucleotides (ASOs) which are attracting attention as a safe and versatile tool for modulating RNA transcripts [67]. ASOs are short, synthetic nucleic acids that bind to specific RNA sequences, allowing precise control over gene expression. The success of U.S. Food and Drug Administration (FDA)-approved ASO drugs, such as nusinersen for spinal muscular atrophy [68] and tofersen for superoxide dismutase 1 (SOD1)-related amyotrophic lateral sclerosis (ALS) [69], has paved the way for ASO development in TRDs.
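At the sequence level, an ASO binds its target through Watson-Crick base pairing, so its base sequence is the reverse complement of the targeted RNA segment. The sketch below shows only this base-pairing step. It ignores chemical modifications, target accessibility, and off-target screening, all of which matter in real ASO design, and the function name and target sequence are hypothetical.

```python
# Watson-Crick complement for the RNA alphabet.
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense_of(target_rna: str) -> str:
    """Return the reverse complement (5'->3') of an RNA target written 5'->3'."""
    target = target_rna.upper().replace("T", "U")
    if set(target) - set("ACGU"):
        raise ValueError(f"unexpected characters in target: {target_rna!r}")
    return target.translate(_COMPLEMENT)[::-1]


# Hypothetical target: three CAG units within a transcript.
print(antisense_of("CAGCAGCAG"))  # CUGCUGCUG
```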
ASOs operate through several mechanisms [70]. One key approach is allele-specific knockdown, where ASOs selectively degrade mutant RNA transcripts while sparing the wild-type allele [71]. This strategy is particularly promising for TRDs driven by toxic GOF mechanisms, such as HD. However, developing allele-specific ASOs presents challenges, including the need to precisely differentiate between the mutant and wild-type alleles, which often differ by only a single nucleotide. Ensuring specificity without off-target effects is crucial, as unintended knockdown of the wild-type allele could result in undesirable side effects.
Another approach is allele-nonspecific knockdown, which targets both mutant and wild-type alleles to reduce overall gene expression [2,72]. This strategy is generally easier to develop than allele-specific knockdown since it does not require distinguishing between the mutant and normal alleles. However, the key challenge is achieving the right level of knockdown, as excessive reduction could lead to insufficient normal protein levels and potential side effects. Fine-tuning knockdown efficiency is crucial for therapeutic success.
Splice modulation is one of the most powerful and unique applications of ASO technology, enabling the correction of RNA splicing defects that other therapies cannot address [73]. This approach may be particularly useful for TRDs caused by aberrant splicing, such as those involving repeat expansions in UTRs or introns. Notably, several of the most successful ASOs for rare diseases, such as the FDA-approved nusinersen [68] and the individualized ASOs milasen [74] and atipeksen [75], are based on splice modulation mechanisms. For instance, in FXS, ASOs can correct the splicing of the FMR1 gene, potentially restoring normal protein levels and alleviating symptoms [76]. The ability to precisely target and modulate splicing is a distinct advantage of ASO technology, offering therapeutic potential for a range of TRDs where splicing errors contribute to disease pathology.
Beyond these traditional methods, ASOs are being explored for novel strategies like blocking R-loop formation or inhibiting RAN translation. These innovative approaches could block R-loop formation in the FMR1 gene [77] or reduce RAN translation and thereby influence fragile X protein synthesis [78]. Further research is needed to determine whether these approaches can be effectively applied to a broader range of TRDs, potentially expanding their therapeutic impact.
Gene editing technologies
Recent advancements in gene editing technologies, particularly CRISPR/Cas9 [79], base editors [80], and prime editors [81], have opened new avenues for treating genetic disorders. Recently, base editors have shown promise in treating spinal muscular atrophies [82]. These approaches can also be applied to TRDs. CRISPR/Cas9 is being explored to reduce the size of repeat expansions in genes like C9ORF72 to mitigate the effects of ALS and frontotemporal dementia [83]. Base editors are now being investigated to introduce CAA interruptions in the CAG repeats of HD, potentially stabilizing the repeat expansion [84]. Prime editors, offering highly precise DNA modifications [85], are being researched to directly remove or correct expanded repeats in TRDs, potentially addressing the root cause of these disorders. While these technologies hold great promise, further research is essential to refine their application and ensure their safety and effectiveness in clinical settings.
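The idea of introducing a CAA interruption can be illustrated in silico. The sketch below replaces one CAG unit with CAA in a hypothetical 42-unit tract, then checks two consequences: the longest uninterrupted CAG run becomes shorter, and the encoded amino acid sequence is unchanged because both codons specify glutamine. It says nothing about editing efficiency, editor targeting windows, or bystander edits, and all names are illustrative.

```python
GLN_CODONS = {"CAG": "Q", "CAA": "Q"}  # both codons encode glutamine


def introduce_caa(tract: str, unit_index: int) -> str:
    """Replace the CAG unit at unit_index with CAA."""
    start = 3 * unit_index
    if tract[start:start + 3] != "CAG":
        raise ValueError("no CAG unit at the requested position")
    return tract[:start] + "CAA" + tract[start + 3:]


def longest_pure_cag(tract: str) -> int:
    """Length, in units, of the longest uninterrupted in-frame CAG run."""
    longest = current = 0
    for i in range(0, len(tract) - 2, 3):
        if tract[i:i + 3] == "CAG":
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def translate(tract: str) -> str:
    """Translate an in-frame tract made only of CAG and CAA codons."""
    return "".join(GLN_CODONS[tract[i:i + 3]] for i in range(0, len(tract), 3))


original = "CAG" * 42                   # hypothetical expanded tract
edited = introduce_caa(original, 20)    # interrupt near the middle
print(longest_pure_cag(original), longest_pure_cag(edited))  # 42 21
print(translate(original) == translate(edited))              # True
```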
Novel therapeutic approaches
Novel therapeutic strategies are being developed to address the molecular disruptions caused by repeat expansions in TRDs. Small molecule therapies [86] aim to target disrupted pathways by modulating protein function, enhancing the clearance of toxic aggregates, or stabilizing RNA structures to mitigate the pathogenic effects of expanded repeats. Meanwhile, gene therapy [87] and stem cell therapy [88] offer the potential to replace or repair damaged cells and restore normal function. Advances in viral vector technology and stem cell engineering are bringing these approaches closer to becoming viable therapeutic options, potentially transforming the treatment landscape for TRDs in the near future.
Conclusion
TRDs, where the expansion of repetitive DNA sequences leads to diverse clinical manifestations and outcomes, exemplify the complexity of genetic diseases. Advances in molecular genetics, particularly diagnostic tools like LRS, have significantly enhanced our ability to identify and understand these disorders. Despite this progress, a clear understanding of the mechanisms behind disease onset and progression remains elusive. Emerging therapies, including ASOs, RNA interference, and gene editing technologies, offer promising therapeutic options, yet careful consideration of safety and efficacy is required during their development. As research continues to uncover the complexities of TRDs, it will be essential to refine these therapeutic strategies to improve patient outcomes and provide hope for those affected by these complex conditions.
Notes
Conflicts of Interest
Jangsup Moon has been an Associate Editor of Encephalitis since October 2020. He was not involved in the review process of this article. No other potential conflict of interest relevant to this article was reported.