Vertebrate internal exons are usually between 50 and 400 nt long; exons outside this size range may require additional exonic and/or intronic sequences to be spliced into the mature mRNA. The mouse polymeric immunoglobulin receptor gene has a 654 nt exon that is efficiently spliced into the mRNA. We have examined this exon to identify features that contribute to its efficient splicing despite its large size; a large constitutive exon has not been studied previously. We found that a strong 5′ splice site is necessary for this exon to be spliced intact, but the splice sites alone were not sufficient to efficiently splice a large exon. At least two exonic sequences and one evolutionarily conserved intronic sequence also contribute to recognition of this exon. However, these elements have redundant activities, as they could only be detected in conjunction with other mutations that reduced splicing efficiency. Several mutations activated cryptic 5′ splice sites that created smaller exons. Thus, the balance between use of these potential sites and the authentic 5′ splice site must be modulated by sequences that repress use of the cryptic sites and enhance use of the authentic site. In addition, sequences that enhance cryptic splice site use must be absent from this large exon.
Cryptic splice sites are used only when use of a natural splice site is disrupted by mutation. To determine the features that distinguish authentic from cryptic 5′ splice sites (5′ss), we systematically analyzed a set of 76 cryptic 5′ss derived from 46 human genes. These cryptic 5′ss have a similar frequency distribution in exons and introns, and are usually located close to the authentic 5′ss. Statistical analysis of the strengths of the 5′ss using the Shapiro and Senapathy matrix revealed that authentic 5′ss have significantly higher score values than cryptic 5′ss, which in turn score higher than the mutant authentic 5′ss. β-Globin provides an interesting exception to this rule, so we chose it for detailed experimental analysis in vitro. We found that the sequences of the β-globin authentic and cryptic 5′ss, but not their surrounding context, determine the correct 5′ss choice, although their respective scores do not reflect this functional difference. Our analysis provides a statistical basis to explain the competitive advantage of authentic over cryptic 5′ss in most cases, and should facilitate the development of tools to reliably predict the effect of disease-associated 5′ss-disrupting mutations at the mRNA level.
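The score comparison above rests on the Shapiro and Senapathy matrix. As that method is commonly summarized, a site scores 100 × (t − min)/(max − min), where t sums the percentage frequency of each observed nucleotide across the site and max and min are the highest and lowest achievable sums. The Python sketch below applies that formula; the frequency table is a hypothetical toy, not the published matrix, so only relative comparisons within this example mean anything.

```python
# Sketch of a Shapiro-Senapathy-style 5' splice site score.
# ASSUMPTION: TOY_PERCENT is a hypothetical table for illustration only;
# it is NOT the published Shapiro-Senapathy frequency matrix.

# Percentage frequencies for a 9-nt 5'ss: exon -3..-1, intron +1..+6.
TOY_PERCENT = [
    {"A": 35, "C": 35, "G": 20, "T": 10},  # -3
    {"A": 60, "C": 15, "G": 10, "T": 15},  # -2
    {"A": 10, "C": 5, "G": 80, "T": 5},    # -1
    {"A": 0, "C": 0, "G": 100, "T": 0},    # +1
    {"A": 0, "C": 0, "G": 0, "T": 100},    # +2
    {"A": 60, "C": 3, "G": 35, "T": 2},    # +3
    {"A": 70, "C": 10, "G": 10, "T": 10},  # +4
    {"A": 10, "C": 5, "G": 80, "T": 5},    # +5
    {"A": 15, "C": 15, "G": 20, "T": 50},  # +6
]


def ss_score(site: str, table=TOY_PERCENT) -> float:
    """Score a 5'ss as 100 * (t - min) / (max - min)."""
    site = site.upper().replace("U", "T")  # accept RNA or DNA letters
    if len(site) != len(table):
        raise ValueError(f"expected {len(table)} nt, got {len(site)}")
    t = sum(column[nt] for column, nt in zip(table, site))
    best = sum(max(column.values()) for column in table)
    worst = sum(min(column.values()) for column in table)
    return 100.0 * (t - worst) / (best - worst)


authentic = "AAGGTAAGT"  # hypothetical consensus-like site
mutant = "AAGGTAAAT"     # the same site with a +5 G>A substitution
print(ss_score(authentic), round(ss_score(mutant), 1))
```

Under this toy table the +5 G>A substitution lowers the score, which is the kind of authentic vs. mutant ordering the abstract reports; real rankings require the published matrix.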
Ehlers-Danlos syndrome (EDS) type I (the classical variety) is a dominantly inherited, genetically heterogeneous connective-tissue disorder. Mutations in the COL5A1 and COL5A2 genes, which encode type V collagen, have been identified in several individuals. Most mutations affect either the triple-helical domain of the protein or the expression of one COL5A1 allele. We identified a novel splice-acceptor mutation (IVS4-2A→G) in the N-propeptide-encoding region of COL5A1, in one patient with EDS type I. The outcome of this mutation was complex: In the major product, both exons 5 and 6 were skipped; other products included a small amount in which only exon 5 was skipped and an even smaller amount in which cryptic acceptor sites within exon 5 were used. All products were in frame. Pro-α1(V) chains with abnormal N-propeptides were secreted and were incorporated into extracellular matrix, and the mutation resulted in dramatic alterations in collagen fibril structure. The two-exon skip occurred in transcripts in which intron 5 was removed rapidly relative to introns 4 and 6, leaving a large (270 nt) composite exon that can be skipped in its entirety. The transcripts in which only exon 5 was skipped were derived from those in which intron 6 was removed prior to intron 5. The use of cryptic acceptor sites in exon 5 occurred in transcripts in which intron 4 was removed subsequent to introns 5 and 6. These findings suggest that the order of intron removal plays an important role in the outcome of splice-site mutations and provide a model that explains why multiple products derive from a mutation at a single splice site.
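Whether a skip keeps the products in frame is modular arithmetic: removing a segment leaves the downstream codons in frame only if its length is a multiple of 3. A minimal sketch applied to the 270 nt composite exon described above (the helper name `skip_effect` is hypothetical):

```python
def skip_effect(removed_nt: int) -> str:
    """Describe the reading-frame effect of removing removed_nt nucleotides.

    This checks frame only; it does not look for stop codons that may be
    created at the new exon-exon junction.
    """
    if removed_nt % 3 == 0:
        return f"in frame, {removed_nt // 3} codons removed"
    return "frameshift"


# Composite exon (exons 5 and 6 joined after intron 5 removal): 270 nt.
print(skip_effect(270))
```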
Despite a growing number of splicing mutations found in hereditary diseases, utilization of aberrant splice sites and their effects on gene expression remain challenging to predict. We compiled sequences of 346 aberrant 5′ splice sites (5′ss) that were activated by mutations in 166 human disease genes. Mutations within the 5′ss consensus accounted for 254 cryptic 5′ss and mutations elsewhere activated 92 de novo 5′ss. Point mutations leading to cryptic 5′ss activation were most common in the first intron nucleotide, followed by the fifth intron nucleotide. Substitutions at position +5 were exclusively G>A transitions, which was largely attributable to the high mutability of C/G>T/A. However, the frequency of point mutations at position +5 was significantly higher than that observed in the Human Gene Mutation Database, suggesting that alterations of this position are particularly prone to aberrant splicing, possibly due to a requirement for sequential interactions with U1 and U6 snRNAs. Cryptic 5′ss were best predicted by computational algorithms that accommodate nucleotide dependencies and not by weight-matrix models. Discrimination of intronic 5′ss from their authentic counterparts was less effective than for exonic sites...
Highly conserved sequences at the 5′ splice site and branch site of U12-dependent introns are important determinants for splicing by U12-dependent spliceosomes. This study investigates the in vivo splicing phenotypes of mutations in the branch site consensus sequence of the U12-dependent intron F from a human NOL1 (P120) minigene. Intron F contains a fully consensus branch site sequence (UUCCUUAAC). Mutations at each position were analyzed for their effects on U12-dependent splicing in vivo. Mutations at most positions resulted in a significant reduction of correct U12-dependent splicing. Defects observed included increased unspliced RNA levels, the activation of cryptic U2-dependent 5′ and 3′ splice sites, and the activation of cryptic U12-dependent branch/3′ splice sites. A strong correlation was observed between the predicted thermodynamic stability of the branch site:U12 snRNA interaction and correct U12-dependent splicing. The lack of a polypyrimidine tract between the branch site and 3′ splice site of U12-dependent introns and the observed reliance on base-pairing interactions for correct U12-dependent splicing emphasize the importance of RNA/RNA interactions during U12-dependent intron recognition and proper splice site selection.
DBASS3 and DBASS5 provide comprehensive repositories of new exon boundaries that were induced by pathogenic mutations in human disease genes. Aberrant 5′- and 3′-splice sites were activated either by mutations in the consensus sequences of natural exon–intron junctions (cryptic sites) or elsewhere (‘de novo’ sites). DBASS3 and DBASS5 currently contain approximately 900 records of cryptic and de novo 3′- and 5′-splice sites that were produced by over a thousand different mutations in approximately 360 genes. DBASS3 and DBASS5 data can be searched by disease phenotype, gene, mutation, location of aberrant splice sites in introns and exons and their distance from authentic counterparts, by bibliographic references and by the splice-site strength estimated with several prediction algorithms. The user can also retrieve reference sequences of both aberrant and authentic splice sites with the underlying mutation. These data will facilitate identification of introns or exons frequently involved in aberrant splicing, mutation analysis of human disease genes and study of germline or somatic mutations that impair RNA processing. Finally, this resource will be useful for fine-tuning splice-site prediction algorithms, better definition of auxiliary splicing signals and design of new reporter assays. DBASS3 and DBASS5 are freely available at http://www.dbass.org.uk/.
In 90% of people with erythropoietic protoporphyria (EPP), the disease results from the inheritance of a common hypomorphic FECH allele, encoding ferrochelatase, in trans to a private deleterious FECH mutation. The activity of the resulting FECH enzyme falls below the critical threshold of 35%, leading to the accumulation of free protoporphyrin IX (PPIX) in bone marrow erythroblasts and in red cells. The mechanism of low expression involves a biallelic polymorphism (c.315−48T>C) localized in intron 3. The 315−48C allele increases usage of the 3′ cryptic splice site between exons 3 and 4, resulting in an unstable mRNA with a premature stop codon, which reduces the abundance of wild-type FECH mRNA and, ultimately, FECH activity. Through a candidate-sequence approach and an antisense-oligonucleotide-tiling method, we identified a sequence that, when targeted by an antisense oligonucleotide (ASO-V1), prevented usage of the cryptic splice site. In lymphoblastoid cell lines derived from symptomatic EPP subjects, transfection of ASO-V1 reduced the usage of the cryptic splice site and efficiently redirected the splicing of intron 3 toward the physiological acceptor site, thereby increasing the amount of functional FECH mRNA. Moreover...
The interpretation of genomic variants has become one of the paramount challenges in the post-genome sequencing era. In this review we summarize nearly 20 years of research on the applications of information theory (IT) to interpret coding and non-coding mutations that alter mRNA splicing in rare and common diseases. We compile and summarize the spectrum of published variants analyzed by IT, to provide a broad perspective of the distribution of deleterious natural and cryptic splice site variants detected, as well as those affecting splicing regulatory sequences. Results for natural splice site mutations can be interrogated dynamically with Splicing Mutation Calculator, a companion software program that computes changes in information content for any splice site substitution, linked to corresponding publications containing these mutations. The accuracy of IT-based analysis was assessed in the context of experimentally validated mutations. Because splice site information quantifies binding affinity, IT-based analyses can discern the differences between variants that account for the observed reduced (leaky) versus abolished mRNA splicing. We extend this principle by comparing predicted mutations in natural, cryptic, and regulatory splice sites with observed deleterious phenotypic and benign effects. Our analysis of 1727 variants revealed a number of general principles useful for ensuring portability of these analyses and accurate input and interpretation of mutations. We offer guidelines for optimal use of IT software for interpretation of mRNA splicing mutations.
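In the information-theory framework, a site's individual information Ri is commonly computed as the sum over positions of 2 + log2 f(b, l), where f(b, l) is the frequency of base b at position l among known sites; a variant's effect is then ΔRi = Ri(variant) − Ri(reference). The sketch below derives frequencies from a few hypothetical aligned sites with a pseudocount and omits the small-sample correction used in published models, so its values are illustrative only and are not output of the Splicing Mutation Calculator.

```python
import math
from collections import Counter

# ASSUMPTION: hypothetical aligned 9-nt 5'ss (-3..+6), not a real training set.
TOY_SITES = ["CAGGTAAGT", "AAGGTAAGT", "CAGGTGAGT",
             "AAGGTAAGA", "GAGGTAAGG", "CAGGTGAGC"]


def frequency_matrix(sites, pseudocount=1.0):
    """Per-position base frequencies; the pseudocount avoids log2(0)."""
    matrix = []
    for pos in range(len(sites[0])):
        counts = Counter(site[pos] for site in sites)  # missing bases count 0
        total = len(sites) + 4 * pseudocount
        matrix.append({b: (counts[b] + pseudocount) / total for b in "ACGT"})
    return matrix


def individual_information(site, matrix):
    """Ri in bits: sum of 2 + log2 f(b, l) over positions (DNA letters)."""
    return sum(2.0 + math.log2(col[b]) for col, b in zip(matrix, site.upper()))


def delta_ri(reference, variant, matrix):
    """Change in information content caused by a substitution."""
    return individual_information(variant, matrix) - individual_information(reference, matrix)


m = frequency_matrix(TOY_SITES)
print(round(delta_ri("CAGGTAAGT", "CAGGTAAAT", m), 3))  # +5 G>A, about -2.807 bits
```

A negative ΔRi means the variant carries less information than the reference; the review's point is that the size of this drop helps separate leaky from abolished splicing.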
Mutations in the splicing factor SF3B1 are found in several cancer types and have been associated with various splicing defects. Using transcriptome sequencing data from chronic lymphocytic leukemia, breast cancer and uveal melanoma tumor samples, we show that hundreds of cryptic 3′ splice sites (3′SSs) are used in cancers with SF3B1 mutations. We define the necessary sequence context for the observed cryptic 3′SSs and propose that cryptic 3′SS selection is a result of SF3B1 mutations causing a shift in the sterically protected region downstream of the branch point. While most cryptic 3′SSs are present at low frequency (<10%) relative to nearby canonical 3′SSs, we identified ten genes that preferred out-of-frame cryptic 3′SSs. We show that cancers with mutations in the SF3B1 HEAT 5-9 repeats use cryptic 3′SSs downstream of the branch point and provide both a mechanistic model consistent with published experimental data and affected targets that will guide further research into the oncogenic effects of SF3B1 mutation.
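Two quantities in this analysis can be stated precisely: the usage of a cryptic 3′SS relative to its canonical neighbor, estimated here from junction read counts as cryptic / (cryptic + canonical), and whether that usage shifts the reading frame, which holds when the distance between the two sites is not a multiple of 3. A minimal sketch; the read counts and coordinates are hypothetical, both positions are assumed to be on the same strand, and the 10% cutoff mirrors the "low frequency" bound quoted above.

```python
def relative_usage(cryptic_reads: int, canonical_reads: int) -> float:
    """Fraction of junction reads that use the cryptic 3'SS."""
    total = cryptic_reads + canonical_reads
    if total == 0:
        raise ValueError("no junction reads")
    return cryptic_reads / total


def is_out_of_frame(cryptic_pos: int, canonical_pos: int) -> bool:
    """True if the cryptic site moves the exon start by a non-multiple of 3."""
    return (canonical_pos - cryptic_pos) % 3 != 0


def classify(cryptic_reads, canonical_reads, cryptic_pos, canonical_pos, low=0.10):
    """Summarize one cryptic 3'SS event (hypothetical helper)."""
    usage = relative_usage(cryptic_reads, canonical_reads)
    return {
        "usage": usage,
        "low_frequency": usage < low,
        "out_of_frame": is_out_of_frame(cryptic_pos, canonical_pos),
    }


# Hypothetical event: 8 cryptic vs 92 canonical reads, cryptic site 16 nt upstream.
print(classify(8, 92, cryptic_pos=1_000, canonical_pos=1_016))
```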
A mutation in intron 26 of CEP290 (c.2991+1655A>G) is the most common genetic cause of Leber congenital amaurosis (LCA), a severe type of inherited retinal degeneration. This mutation creates a cryptic splice donor site, resulting in the insertion of an aberrant exon (exon X) into ~50% of all CEP290 transcripts. A humanized mouse model with this mutation did not recapitulate the aberrant CEP290 splicing observed in LCA patients, suggesting differential recognition of cryptic splice sites between species. To further assess this phenomenon, we generated two CEP290 minigene constructs, with and without the intronic mutation, and transfected these in cell lines of various species. RT-PCR analysis revealed that exon X is well recognized by the splicing machinery in human and non-human primate cell lines. Intriguingly, this recognition decreases in cell lines derived from species such as dog and rodents, and it is completely absent in Drosophila. In addition, other cryptic splicing events corresponding to sequences in intron 26 of CEP290 were observed to varying degrees in the different cell lines. Together, these results highlight the complexity of splice site recognition among different species, and show that care is warranted when generating animal models to mimic splice site mutations in vivo.
The primary transcript from adenovirus 2 early region 1B (E1B) is processed by differential RNA splicing into two overlapping mRNAs, 13S and 22S. The 22S mRNA is the major E1B mRNA during the early phase of infection, whereas the 13S mRNA predominates during the late phase. In previous work, it has been shown that this shift in proportions of the E1B mRNAs is influenced by increased cytoplasmic stability of the 13S mRNA at late times in infection. Two observations presented here demonstrate that the increase in proportion of the 13S mRNA at late times is also regulated by a change in the specificity of RNA splicing. First, the ratio of 13S to 22S nuclear RNA was not constant throughout infection but increased at late times. Second, studies with the mutant adenovirus 2 pm2250 provided evidence that there was an increased propensity to use a 5' splice site in the region of the 13S 5' splice site at late times in infection. Adenovirus 2 pm2250 has a G→C transversion in the first base of the E1B 13S mRNA intron, preventing splicing of the 13S mRNA but not of the 22S mRNA. During the early phase of a pm2250 infection, the E1B primary transcripts were processed into the 22S mRNA only. However, during the late phase...
We have analyzed base pairing interactions between the U5 snRNA and 5' exon sequences during pre-mRNA splicing in a mammalian in vivo system. We constructed synthetic U5 genes with mutations that alter four bases (C3, U4, U5 and U6) within the invariant 9 nt U5 sequence GCCUUUUAC; transient transfection of HeLa cells with these U5 sequences cloned into a U1 expression vector yielded high levels of the mutant snRNAs. To test their function, we cotransfected a rabbit beta-globin gene containing one of two mutations (G1-->A or T2-->A) in the essential GT dinucleotide at the 5' end of the second intron. Certain U5 loop mutants activated novel 5' splice sites only in mutant rabbit beta-globin transcripts. One novel site surprisingly resides in the first exon; its use is invariably coupled to utilization of a particular cryptic 5' splice site in the second exon. All of the newly activated cryptic 5' splice sites exhibit complementarity with the mutant U5 loop in the exon 1-5 nt upstream of the cryptic site, extending previous results in yeast. However, the register of the potential pairing is not identical at the various novel cryptic 5' splice sites, indicating that the interaction between the U5 loop and the 5' exon may be more flexible than previously believed.
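The complementarity argument can be made concrete by sliding a short exon segment along the U5 loop and counting possible base pairs in each register. Because the strands pair antiparallel, the exon segment (read 5′→3′) is compared with the loop read 3′→5′. This minimal sketch counts Watson-Crick and G·U pairs only and ignores stacking energies; the default loop is the invariant GCCUUUUAC sequence given above, mutant loops can be passed in, and the exon segment in the example is hypothetical.

```python
# Watson-Crick and G-U wobble pairs (RNA alphabet).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def to_rna(seq: str) -> str:
    return seq.upper().replace("T", "U")


def best_register(exon_segment: str, loop: str = "GCCUUUUAC"):
    """Return (pair_count, register) for the best antiparallel alignment.

    Register r means exon_segment[i] pairs with the loop (read 3'->5') at r + i.
    Ties are resolved in favor of the smallest register.
    """
    exon = to_rna(exon_segment)
    loop_3to5 = to_rna(loop)[::-1]
    if len(exon) > len(loop_3to5):
        raise ValueError("exon segment longer than loop")
    best = (-1, -1)
    for r in range(len(loop_3to5) - len(exon) + 1):
        count = sum((e, loop_3to5[r + i]) in PAIRS for i, e in enumerate(exon))
        if count > best[0]:
            best = (count, r)
    return best


print(best_register("AAAAG"))  # hypothetical exon segment upstream of a site
```

Reporting the register alongside the count reflects the abstract's observation that the pairing register differs among the activated cryptic sites.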
We have studied the expression of a cloned mutant human beta-globin gene in tissue culture cells. The gene, which was previously isolated from the chromosomal DNA of an individual with a low level of normal beta-globin expression (beta+-thalassemia), contains five mutations inside the large intervening sequence (IVS2), as well as a silent change in codon 2. This beta-thalassemia gene (thal) was inserted into a plasmid that is replicated and transcribed in a line of monkey kidney cells in culture. S1 nuclease mapping of the beta-globin RNA transcribed from this gene indicates that some of the beta-globin RNA is spliced abnormally by using a cryptic 3' splice sequence normally present in IVS2 but not used in processing the normal beta-globin transcript. The cryptic 3' splice site is not the site of a mutation in the thal gene. Because neither the 5' nor the 3' splice junction nor the cryptic site is mutated in this gene, it is most likely that the mutation at position 705 of IVS2, the only nonpolymorphic change in the gene, interferes indirectly with normal processing. These results suggest that certain sequences within an IVS must be conserved to prevent abnormal splicing and loss of gene function.
The rol-6 gene is trans-spliced to the 22 nt leader, SL1, 173 nt downstream of the transcription start. We have analyzed splicing in transformants carrying extrachromosomal arrays of rol-6 with mutations in the trans-splice acceptor site. This site is a close match to the consensus UUUCAG, which is highly conserved in both trans-splice and intron acceptor sites in C. elegans. When the trans-splice site was inactivated by mutating the perfectly conserved AG, trans-splicing still occurred, but at a cryptic site 20 nt upstream. We tested the frequency with which splicing switched from the normal site to the cryptic site when the pyrimidines at this site were changed to A's. Since most C. elegans 3' splice sites lack an obvious polypyrimidine tract, we hypothesized that these four pyrimidines might act as such a tract, and indeed mutation of these bases caused splicing to switch to the cryptic site. We also demonstrated that a major reason the downstream site is normally favored is that it occurs at a boundary between A+U-rich and non-A+U-rich RNA. When the RNA between the two splice sites was made less A+U-rich, splicing occurred preferentially at the upstream site.
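The A+U boundary effect can be quantified by comparing the A+U fraction in a window upstream of a candidate site with the fraction in a window downstream; from the results above, the favored downstream site has the A+U-rich RNA (the region between the two sites) on its upstream side. A minimal sketch; the 20 nt default window and the toy sequence are assumptions, not values from the study.

```python
def au_fraction(seq: str) -> float:
    """Fraction of A or U (T treated as U) in seq; 0.0 for an empty string."""
    seq = seq.upper().replace("T", "U")
    return sum(base in "AU" for base in seq) / len(seq) if seq else 0.0


def au_boundary_contrast(rna: str, site: int, window: int = 20) -> float:
    """A+U fraction upstream of index `site` minus the fraction downstream.

    Positive values mean the A+U-rich side lies upstream of the site.
    """
    upstream = rna[max(0, site - window):site]
    downstream = rna[site:site + window]
    return au_fraction(upstream) - au_fraction(downstream)


toy = "AUUAUAAUUU" + "GCGGCCGCGC"  # hypothetical A+U-rich then G+C-rich RNA
print(au_boundary_contrast(toy, site=10, window=10))
```

Comparing this contrast at the authentic and cryptic sites gives one crude, sequence-only way to rank them by the boundary criterion.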
Large alternatively spliced internal exons are uncommon in vertebrate genes, and the mechanisms governing their usage are unknown. In this report, we examined alternative splicing of a 1-kb internal exon from the human caldesmon gene containing two regulated 5' splice sites that are 687 nucleotides apart. In cell lines normally splicing caldesmon RNA via utilization of the exon-internal 5' splice site, inclusion of the differential exon required a long purine-rich sequence located between the two competing 5' splice sites. This element consisted of four identical 32-nucleotide purine-rich repeats that resemble exon-splicing enhancers (ESE) identified in other genes. One 32-nucleotide repeat supported exon inclusion, repressed usage of the terminal 5' splice site, and functioned in a heterologous exon dependent on exon enhancers for inclusion, indicating that the caldesmon purine-rich sequence can be classified as an ESE. The ESE was required for utilization of the internal 5' splice site only in the presence of the competing 5' splice site and had no effect when placed downstream of the terminal 5' splice site. In the absence of the internal 5' splice site, the ESE activated a normally silent cryptic 5' splice site near the natural internal 5' splice site...
A two-site model for the binding of U1 small nuclear ribonucleoprotein particle (U1 snRNP) was tested in order to understand how exon partners are selected in complex pre-mRNAs containing alternative exons. In this model, it is proposed that two U1 snRNPs define a functional unit of splicing by base pairing to the 3' boundary of the downstream exon as well as the 5' boundary of the intron to be spliced. Three-exon substrates contained the alternatively spliced exon 4 (E4) region of the preprotachykinin gene. Combined 5' splice site mutations at neighboring exons demonstrate that weakened binding of U1 snRNP at the downstream site and improved U1 snRNP binding at the upstream site result in the failure to rescue splicing of the intron between the mutations. These results indicate the stringency of the requirement for binding a second U1 snRNP to the downstream 5' splice site for these substrates as opposed to an alternative model in which a certain threshold level of U1 snRNP can be provided at either site. Further support for the two-site model is provided by single-site mutations in the 5' splice site of the third exon, E5, that weaken base complementarity to U1 RNA. These mutations block E5 branchpoint formation and, surprisingly...
We examined the ability of U1 small nuclear ribonucleoproteins (U1 snRNPs) to recognize mutant and cryptic 5' splice sites on beta-globin pre-mRNA substrates using an RNase T1 protection assay. When U1 snRNPs were prebound to anti-(U1)RNP antibodies, we detected binding to mutant but not to cryptic 5' splice sites on several substrates. By contrast, in a splicing extract at 0 °C, neither the mutated nor the cryptic 5' splice sites of a human beta-globin transcript were selected as protected fragments with the same antibodies. However, after incubation of the transcript in the extract to yield splicing intermediates, fragments that included a cryptic 5' splice site were detected. The results of our analyses suggest that U1 snRNPs are involved in recognizing cryptic 5' splice sites but that interactions with other splicing components are required to stabilize the association.
Rabson-Mendenhall syndrome is one of the most severe forms of insulin resistance syndrome. We analyzed an English patient described elsewhere and found novel mutations in both alleles of the insulin receptor gene. One is a substitution of G for A at the 3' splice acceptor site of intron 4, and the other is an eight-base pair deletion in exon 12. Both decrease mRNA expression in a cis-dominant manner, and are predicted to produce severely truncated proteins. Surprisingly, nearly normal insulin receptor levels were expressed in the patient's lymphocytes, although the level of expression assessed by immunoblot was approximately 10% of that in control cells. Insulin binding affinity was markedly reduced, but insulin-dependent tyrosine kinase activity was present. Analyzing the insulin receptor mRNA of the patient's lymphocytes by reverse transcription PCR, we discovered aberrant splicing caused by activation of a cryptic splice site in exon 5, resulting in a four-amino-acid deletion and one amino acid substitution, but restoring an open reading frame. Skipping of exon 5, another aberrant splicing event, was found in both the patient and the patient's mother, who carried the heterozygous mutation, whereas activation of the cryptic splice site occurred almost exclusively in the patient. Transfection analysis in COS cells revealed that the mutant receptor produced by cryptic site activation has the same characteristics as the receptors expressed in the patient's lymphocytes. We speculate that this mutant receptor may be involved in the relatively long survival of the patient by rescuing otherwise more severe phenotypes resulting from the complete lack of functional insulin receptors.
Recognition of 5' splice sites by group I and group II self-splicing introns involves the interaction of the exon sequences directly preceding the 5' splice site with intronic sequence elements. We show here that the exon binding sequences (EBS) of group II intron aI5c can accept various substitutes for the authentic intron binding sites (IBS) provided in cis or in trans. The efficiency of cleavage at these cryptic 5' splice sites was enhanced by deletion of the authentic IBS2 element. All cryptic 5' cleavage sites studied here were preceded by an IBS1-like sequence, indicating that IBS1/EBS1 pairing alone is sufficient for proper 5' splice site selection by the intronic EBS element. The results are discussed in terms of minimal requirements for 5' cleavage and position effects of IBS sites relative to the intron.
Virtually all mutations causing Hunter syndrome (mucopolysaccharidosis type II) are expected to be new mutations. Therefore, as a means of molecular diagnosis, we developed a rapid method to sequence the entire iduronate-2-sulfatase (IDS) coding region. PCR amplicons representing the IDS cDNA were sequenced with an automated instrument, and output was analyzed by computer-assisted interpretation of tracings, using Staden programs on a Sun computer. Mutations were found in 10 of 11 patients studied. Unique missense mutations were identified in five patients: H229Y (685C-->T, severe phenotype); P358R (1073C-->G, severe); R468W (1402C-->T, mild); P469H (1406C-->A, mild); and Y523C (1568A-->G, mild). Nonsense mutations were identified in two patients: R172X (514C-->T, severe) and Q389X (1165C-->T, severe). Two other patients with severe disease had insertions of 1 and 14 bp, in exons 3 and 6, respectively. In another patient with severe disease, the predominant (>95%) IDS message resulted from aberrant splicing, which skipped exon 3. In this last case, consensus sequences for splice sites in exon 3 were intact, but a 395 C-->G mutation was identified 24 bp upstream from the 3' splice site of exon 3. This mutation created a cryptic 5' splice site with a better consensus sequence for 5' splice sites than the natural 5' splice site of intron 3. A minor population of the IDS message was processed by using this cryptic splice site; however...
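A coarse way to say that one 5′ splice site has a "better consensus" than another is to count its matches to the commonly cited 5′ss consensus MAG|GURAGU (positions −3 to +6; IUPAC M = A or C, R = A or G). This minimal sketch uses hypothetical sequences; a plain match count ignores position weights, so it is cruder than matrix or information-based scores.

```python
# IUPAC codes used by the consensus (M = A or C, R = A or G).
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "M": "AC", "R": "AG"}

# Commonly cited 5'ss consensus, positions -3..-1 (exon) and +1..+6 (intron).
CONSENSUS = "MAGGURAGU"


def consensus_matches(site: str, consensus: str = CONSENSUS) -> int:
    """Count positions of a 9-nt site that match the IUPAC consensus."""
    site = site.upper().replace("T", "U")
    if len(site) != len(consensus):
        raise ValueError(f"expected {len(consensus)} nt, got {len(site)}")
    return sum(nt in IUPAC[code] for nt, code in zip(site, consensus))


# Hypothetical natural and cryptic sites, for illustration only.
natural = "CAGGUGCGU"
cryptic = "AAGGUAAGU"
print(consensus_matches(natural), consensus_matches(cryptic))
```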