Sequence trace data files produced by the Sanger sequencing method were believed to contain
peak heights that vary at random and add no value to base calling. Our study is the first
to comprehensively prove the existence of definable peak height patterns and to develop tools that
allow the characterization of the frequency of such patterns for each sequence frame.
By studying hundreds of mtDNA samples sequenced in two certified forensic laboratories,
in the United States of America and in Portugal, we were able to prove that peak height patterns are
predictable and the same from sample to sample if the chemistry and primer combination is kept
constant within the same laboratory. Moreover, the characterization of these patterns and the ability
to predict their behavior for other samples led us to develop the novel concept of Sequence
Biometrics. Sequence Biometrics defines the characteristics of these peak height patterns for a
given stretch of sequenced DNA; the patterns are independent of sample origin and specific to the
primer/chemistry combination used within the laboratory. Therefore, Sequence Biometrics is a new
quality parameter for sample processing and can be used by novel expert systems in the assessment
of new data.
This work provides the basic informatics tools and workflow mechanisms to build standard
Sequence Biometrics tables...
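As a minimal sketch of the idea that peak height patterns repeat from sample to sample, the following Python fragment scales each trace by its own mean peak height and then summarizes every sequence position across samples. The function names, the normalization choice and the toy peak heights are all assumptions made for illustration; they are not the tools or the data of the study.

```python
from statistics import mean, pstdev

def normalize(heights):
    """Scale one trace so that its mean peak height equals 1."""
    m = mean(heights)
    return [h / m for h in heights]

def pattern_table(samples):
    """Per position, return (mean, population std dev) of normalized heights."""
    normalized = [normalize(s) for s in samples]
    return [(mean(column), pstdev(column)) for column in zip(*normalized)]

# Hypothetical peak heights for the same four positions in three samples.
# Absolute signal differs between samples; the relative pattern does not.
samples = [
    [100, 200, 100, 400],
    [50, 100, 50, 200],
    [200, 400, 200, 800],
]
table = pattern_table(samples)
```

In this toy case every position has zero spread after normalization, which is what a perfectly reproducible pattern would look like; real traces would show a non-zero spread that could serve as a quality measure.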
Granulocyte colony-stimulating factor (G-CSF) acts on precursor hematopoietic cells to control the production and maintenance of neutrophils. Recombinant G-CSF (re-G-CSF) is used clinically to treat patients with neutropenia and has greatly reduced the infection risk associated with bone marrow transplantation. Cyclic hematopoiesis, a stem cell defect characterized by severe recurrent neutropenia, occurs in man and grey collie dogs, and can be treated by administration of re-G-CSF. Availability of the rat G-CSF cDNA would benefit the use of rats as models of gene therapy for the treatment of cyclic hematopoiesis. In preliminary rat experiments, retroviral-mediated expression of canine G-CSF caused neutralizing antibody formation which precluded long-term increases in neutrophil counts. To overcome this problem we cloned the rat G-CSF cDNA from RNA isolated from skin fibroblasts. The rat G-CSF sequence shared a high degree of identity in both the coding and non-coding regions with both the murine G-CSF (85%) and human G-CSF (74%). The signal peptides of murine and human G-CSF both contained 30 amino acids (aa), whereas the deduced signal sequence for rat G-CSF possessed 21 aa. A retrovirus encoding the rat G-CSF cDNA synthesized bioactive G-CSF from transduced vascular smooth muscle cells.
We have determined the structure of the fatty acid-binding protein 6 (fabp6) gene and the tissue-specific distribution of its transcripts in embryos, larvae and adult zebrafish (Danio rerio). Like most members of the vertebrate FABP multigene family, the zebrafish fabp6 gene contains four exons separated by three introns. The coding region of the gene and expressed sequence tags code for a polypeptide of 131 amino acids (14 kDa, pI 6.59). The putative zebrafish Fabp6 protein shared greatest sequence identity with human FABP6 (55.3%) compared to other orthologous mammalian FABPs and paralogous zebrafish Fabps. Phylogenetic analysis showed that the zebrafish Fabp6 formed a distinct clade with the mammalian FABP6s. The zebrafish fabp6 gene was assigned to linkage group (chromosome) 21 by radiation hybrid mapping. Conserved gene synteny was evident between the zebrafish fabp6 gene on chromosome 21 and the FABP6/Fabp6 genes on human chromosome 5, rat chromosome 10 and mouse chromosome 11. Zebrafish fabp6 transcripts were first detected in the distal region of the intestine of embryos at 72 h postfertilization. This spatial distribution remained constant to 7-day-old larvae, the last stage assayed during larval development. In adult zebrafish...
Short single-stranded DNA fragments carrying a GCGAAAGC sequence were found to move unexpectedly faster than other fragments of the same length in electrophoresis on a polyacrylamide gel containing a denaturing agent. The fragments were noted to have a stable structure even in 7 M urea solution, but the stability cannot be explained by base pair formation alone. Physical characterization of the GCGAAAGC fragment indicated that it takes a hairpin-like structure in spite of the short chain length with only two G-C base pairs, comprised of GCG and AAAGC subsegments, each possessing a helical configuration independent of the others. Some biological implications of this unusual structure are discussed.
A 2.5-kilobase fragment of a sex-specific satellite DNA from the Colubrid snake species Elaphe radiata has been cloned, and its sequence has been determined. It contains 26 and 12 copies, respectively, of two base quadruplets, G-A-T-A and G-A-C-A, as its sole highly repetitious elements. Southern hybridization experiments with genomic DNA of the chicken, the mouse, and man indicated male sex-specific conservation of at least parts of this cloned DNA. In situ hybridization experiments with metaphase chromosomes of the mouse showed that elements that can cross-hybridize with parts of the cloned snake DNA are concentrated in the pericentric region of the Y chromosome. In blot hybridization experiments with liver poly(A)+ polysomal RNAs of male and female mice, a probe consisting of the first 1,224 bases of the cloned snake DNA singled out a male-specific RNA of 1,250-1,400 bases. Inasmuch as the proximal end of this probe contained an open reading frame (44 consecutive amino acid-specifying codons), the male-specific putative mRNA so detected may specify H-Y antigen. By contrast, a probe consisting of bases 1,480-1,906, containing the simple repeats of the quadruplets, singled out a shorter (approximately 1,000-base) RNA from males and females alike. Although this RNA is poly(A)+...
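The copy numbers quoted for the two base quadruplets come from counting a four-base motif along a sequence. A minimal Python sketch of such a count follows; the function name and the toy fragment are hypothetical, and the fragment is not the cloned Elaphe radiata sequence.

```python
def count_quadruplet(sequence: str, quadruplet: str) -> int:
    """Count occurrences of a short motif, scanning every start position."""
    sequence = sequence.upper()
    quadruplet = quadruplet.upper()
    width = len(quadruplet)
    return sum(
        1
        for i in range(len(sequence) - width + 1)
        if sequence[i:i + width] == quadruplet
    )

# Hypothetical toy fragment used only to exercise the function.
toy_fragment = "GATAGATAGACAGATATTGACA"
counts = {q: count_quadruplet(toy_fragment, q) for q in ("GATA", "GACA")}
```

Scanning every start position rather than using `str.count` keeps the function correct for motifs that can overlap themselves, although GATA and GACA cannot.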
G·U wobble base pairs are the most common and highly conserved non-Watson–Crick base pairs in RNA. Previous surface maps imply uniformly negative electrostatic potential at the major groove of G·U wobble base pairs embedded in RNA helices, suitable for entrapment of cationic ligands. In this work, we have used a Poisson–Boltzmann approach to gain a more detailed and accurate characterization of the electrostatic profile. We found that the major groove edge of an isolated G·U wobble displays distinctly enhanced negativity compared with standard GC or AU base pairs; however, in the context of different helical motifs, the electrostatic pattern varies. G·U wobbles with distinct widening have similar major groove electrostatic potentials to their canonical counterparts, whereas those with minimal widening exhibit significantly enhanced electronegativity, ranging from 0.8 to 2.5 kT/e, depending upon structural features. We propose that the negativity at the major groove of G·U wobble base pairs is determined by the combined effect of the base atoms and the sugar-phosphate backbone, which is impacted by stacking pattern and groove width as a result of base sequence. These findings are significant in that they provide predictive power with respect to which G·U sites in RNA are most likely to bind cationic ligands.
The question of how NA base sequence influences the yield of DNA strand breaks produced by the direct effect of ionizing radiation was investigated in a series of oligodeoxynucleotides of the form (d(CG)n)2 and (d(GC)n)2. The yields of free base release from X-irradiated DNA films containing 2.5 waters/nucleotide were measured by HPLC as a function of oligomer length. For (d(CG)n)2, the ratio of the Gua yield to Cyt yield, R, was relatively constant at 2.4–2.5 for n = 2–4 and it decreased to 1.2 as n increased from 5 to 10. When Gua was moved to the 5′ end, for example going from d(CG)5 to d(GC)5, R dropped from 1.9 ± 0.1 to 1.1 ± 0.1. These effects are poorly described if the chemistry at the oligomer ends is assumed to be independent of the remainder of the oligomer. A mathematical model incorporating charge transfer through the base stack was derived to explain these effects. In addition, EPR was used to measure the yield of trapped-deoxyribose radicals at 4 K following X-irradiation at 4 K. The yield of free base release was substantially greater, by 50–100 nmol/J, than the yield of trapped-deoxyribose radicals. Therefore, a large fraction of free base release stems from a nonradical intermediate. For this intermediate...
Overlapping clones encoding rat liver pyruvate carboxylase (PC) have been isolated by screening a liver cDNA library and by performing rapid amplification of cDNA ends polymerase chain reaction on total liver RNA. The sequence of rat PC cDNA contains an open reading frame of 3537 nucleotides encoding a polypeptide of 1178 amino acids with a calculated Mr of 129848. This is flanked by a 5´ untranslated region of 66 bp and a 3´ untranslated region of 421 bp including the poly(A) tail. The inferred protein sequence is 96.6% identical with mouse and 96.3% identical with human PCs, 68.4% identical with mosquito PC and 53.5% identical with yeast PC isoenzymes PC1 and PC2. On the basis of partial proteolysis and sequence homology with PC from other organisms (yeast, mosquito, mouse and human) and with other biotin enzymes, three functional domains, namely the biotin carboxylation domain, the transcarboxylation domain and the biotinyl domain, have been identified. Comparison with the known structure of the biotin carboxylase subunit of Escherichia coli acetyl-CoA carboxylase [Waldrop, Rayment and Holden (1994) Biochemistry 33, 10249–10256] highlights the functional importance of 11 highly conserved residues. Northern analysis revealed that PC mRNA is highly expressed in rat liver...
The late-lytic region of the genome of bacteriophage 186 encodes the phage proteins that synthesize the complex viral particle and lyse the bacterial host. We report the completion of the DNA sequence of the late region and the assignment of 18 previously identified genes to open reading frames in the sequence. The 186 late region is similar to the late region of phage P2, sharing 26 genes of known function: the single gene for activation of late gene transcription, 6 genes for construction of DNA-containing heads, 16 for tail morphogenesis, and 3 for cell lysis. We identified two 186 late genes with unknown function; one is homologous to previously unrecognised genes in P2, HP1, and φCTX, and the other may modulate DNA packaging. The 186 late region, like the rest of the genome, lacks the lysogenic conversion genes that are carried by P2, allowing the 186 late region to be transcribed from only three late promoters rather than four. The relative absence of lysogenic conversion genes in 186 suggests that the two phages have evolved to use the lytic and lysogenic reproductive modes to different extents.; Roberto Portelli, Ian B. Dodd, Qing Xue and J. Barry Egan; Copyright 1998 Academic Press
We have cloned a cDNA containing the entire coding sequence of a marsupial (the brushtail possum, Trichosurus vulpecula) zona pellucida protein (ZPB). The open reading frame of 1,581 nt is predicted to encode a ZPB polypeptide of 527 amino acids which contains 20 cysteine residues, 7 potential N-linked glycosylation sites, a potential N-terminal signal peptide and a potential C-terminal trans-membrane domain, preceded by a furin proteolytic processing signal. Sequence comparisons between possum ZPB and orthologous polypeptides from 7 eutherian species and from Xenopus laevis, reveal the existence of a high degree of sequence similarity, particularly in the central portion of the molecule. Cysteine residues are highly conserved, and all nine species possess potential N-terminal signal peptide sequences and C-terminal trans-membrane domains of approximately the same length. In situ hybridisation revealed that expression of ZPB was restricted to oocytes of primordial and primary follicles of adult possums; no expression was detected in the surrounding granulosa cells. The broad conservation of ZPB sequence, structure and expression over a wide range of mammalian species, revealed by our studies, makes it unlikely that these features account for the different properties of the marsupial and eutherian zona pellucidae.; Article first published online: 4 JAN 1999
In most species, the synthesis of ADP-glucose (Glc) by the enzyme ADP-Glc pyrophosphorylase (AGPase) occurs entirely within the plastids in all tissues so far examined. However, in the endosperm of many, if not all grasses, a second form of AGPase synthesizes ADP-Glc outside the plastid, presumably in the cytosol. In this paper, we show that in the endosperm of wheat (Triticum aestivum), the cytosolic form accounts for most of the AGPase activity. Using a combination of molecular and biochemical approaches to identify the cytosolic and plastidial protein components of wheat endosperm AGPase we show that the large and small subunits of the cytosolic enzyme are encoded by genes previously thought to encode plastidial subunits, and that a gene, Ta.AGP.S.1, which encodes the small subunit of the cytosolic form of AGPase, also gives rise to a second transcript by the use of an alternate first exon. This second transcript encodes an AGPase small subunit with a transit peptide. However, we could not find a plastidial small subunit protein corresponding to this transcript. The protein sequence of the purified plastidial small subunit does not match precisely to that encoded by Ta.AGP.S.1 or to the predicted sequences of any other known gene from wheat or barley (Hordeum vulgare). Instead...
Expression of many microbial genes required for the utilisation of less favoured carbon sources is carbon catabolite repressed in the presence of a preferred carbon source such as D-glucose. In Aspergillus nidulans, creC mutants show derepression in the presence of D-glucose of some, but not all, systems normally subject to carbon catabolite repression. These mutants also fail to grow on some carbon sources, and show minor morphological impairment and altered sensitivity to toxic compounds including molybdate and acriflavin. The pleiotropic nature of the phenotype suggests a role for the creC gene product in the carbon regulatory cascade. The creC gene was cloned and found to encode a protein which contains five WD40 motifs. The sequence changes in three mutant alleles were found to lead to production of truncated proteins which lack one or more of the WD40 repeats. The similarity of the phenotypes conferred by these alleles implies that these alleles represent loss of function alleles. Deletion analysis also showed that at least the most C-terminal WD40 motif is required for function. The CreC protein is highly conserved relative to the Schizosaccharomyces pombe protein Yde3 – whose function is unknown – and human and mouse DMR-N9...
In mammals, before fertilization can occur, sperm have to bind to, and penetrate, the zona pellucida (ZP). In the laboratory mouse, which has been used as a model system for fertilization studies, sperm-ZP binding has been found to be mediated by a region at the carboxy terminal, encoded by exon 7 of the Zp3 gene. This region shows considerable interspecific sequence diversity with some evidence of adaptive evolution in mammals, suggesting that it may contribute to species-specific sperm-ZP binding. However, in a previous study of sequence diversity of ZP3 of three species of Australian murine rodents, we found an identical protein sequence of the region encoded by exon 7. Here, we expand this earlier study to determine the sequence diversity of this region in 68 out of the 130 species of Australasian murine rodents. Maximum likelihood analyses, using representatives of both New Guinean and Australian taxa, provide evidence of positive selection at three codons adjacent to, or within, the putative combining-site for sperm of ZP3, but this was not evident when the analysis was restricted to the Australian taxa. The latter group showed low levels of both intra- and inter-generic sequence divergences in the region encoded by exon 7 of Zp3...
tRNA molecules contain more than 80 chemically unique nucleotide base modifications that contribute to the chemical and physical diversity of RNAs as well as add to the overall fitness of the cell. For instance, base modifications have been shown to play a critical role in tRNA molecules by improving the fidelity and efficiency of translation. Most of this work has been carried out in Gram-negative bacteria; however, the role of modified bases in tRNAs as they relate to thermostability, structure, and transcriptional regulation in Gram-positive bacteria, such as Bacillus subtilis and Bacillus anthracis, is not well characterized. Infections by Gram-positive bacteria that have become more resistant to established drug regimens are on the rise, making Gram-positive bacteria a serious threat to public safety.
My thesis work examined what role partial base modification of the tyrosyl-anticodon stem-loops (ASLTyr) of B. subtilis and B. anthracis has on thermostability, structure, and transcriptional regulation. The ASLTyr molecules have three modified residues, which include Queuine (Q34), 2-thiomethyl-N6-dimethylallyl (ms2i6A37), and pseudouridine (Ψ39). Differential Scanning Calorimetry (DSC) and UV melting were employed to examine the thermodynamic effects of partial modification on ASLTyr stability. The DSC and UV data indicated that the Ψ39 and i6A37 modifications improved the molecular stability of the ASL.
To examine the effects of partial base modification on ASLTyr structure...
Several classes of antitumor drugs are known to stabilize topoisomerase complexes in which the enzyme is covalently bound to a terminus of a DNA strand break. The DNA cleavage sites generally are different for each class of drugs. We have determined the DNA sequence locations of a large number of drug-stimulated cleavage sites of topoisomerase II, and find that the results provide a clue to the possible structure of the complexes and the origin of the drug-specific differences. Cleavage enhancements by VM-26 and amsacrine (m-AMSA), which are representative of different classes of topoisomerase II inhibitors, have strong dependence on bases directly at the sites of cleavage. The preferred bases were C at the 3' terminus for VM-26 and A at the 5' terminus for m-AMSA. Also, a region of dyad symmetry of 12 to 16 base pairs was detected about the enzyme cleavage positions. These results are consistent with those obtained with doxorubicin, although in the case of doxorubicin, cleavage requires the presence of an A at the 3' terminus of at least one of the pair of breaks that constitute a double-strand cleavage (Capranico et al., Nucleic Acids Res., 1990, 18: 6611). These findings suggest that topoisomerase II inhibitors may stack with one or the other base pair flanking the enzyme cleavage sites.
We have used nuclear magnetic resonance (NMR) spectroscopy to measure the lifetimes of individual base-pairs in the palindromic DNA oligonucleotide 5'-d(CGCGAATTCGCG)-3' and in three other dodecamers with symmetrical base substitutions in the sites underlined. The resonances of the hydrogen-bonded imino protons in each of the substituted oligomers in the duplex form have been assigned using one dimensional nuclear Overhauser effect (1-D NOE) experiments. The lifetimes have been obtained from the dependence of selective longitudinal relaxation times and linewidths of the imino proton resonances on the concentration of base catalyst (Tris) at 25 degrees C and in the presence of 50 mM NaCl. The lifetimes of the central A.T base-pairs have been found to depend on base sequence. They are greatly increased in the dodecamer 5'-d(CGCAAATTTGCG)-3' which contains an A3T3 tract. The lifetimes of the central A.T base-pairs in 5'-d(CGCGAATTCGCG)-3', 5'-d(CGCTAATTAGCG)-3' and 5'-d(CGCCAATTGGCG)-3' are comparable. In all dodecamers, the lifetime of the A.T base-pair at the 5'-end of the AnTn tract is the shortest. The anomalous opening kinetics of the A.T base-pairs can be correlated to the bending properties of the corresponding sequences.
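The dodecamers above are described as palindromic, meaning that each strand reads the same as its reverse complement and can therefore pair with a second copy of itself. A short Python sketch can check this property for the four sequences quoted in the abstract; the function names are chosen here for illustration only.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def is_self_complementary(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)

# The four dodecamers named in the abstract.
dodecamers = ["CGCGAATTCGCG", "CGCAAATTTGCG", "CGCTAATTAGCG", "CGCCAATTGGCG"]
results = {seq: is_self_complementary(seq) for seq in dodecamers}
```

All four sequences pass the check, consistent with the symmetrical base substitutions described in the text.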
The spatial distribution of DNA base sequence A, C, G and T exhibits
self-similar fractal fluctuations and the corresponding power spectra follow
inverse power law form, which implies the following: (1) A scale invariant eddy
continuum, namely, the amplitudes of component eddies are related to each other
by a scale factor alone. In general, the scale factor is different for
different scale ranges and indicates a multifractal structure for the spatial
distribution of DNA base sequence. (2) Long-range spatial correlations of the
eddy fluctuations. Multifractal structure to space-time fluctuations and the
associated inverse power law form for power spectra is generic to spatially
extended dynamical systems in nature and is a signature of self-organized
criticality. The exact physical mechanism for the observed self-organized
criticality is not yet identified. The author has developed a general systems
theory where quantum mechanical laws emerge as self-consistent explanations for
the observed long-range space-time correlations, i.e. the apparently chaotic
fractal fluctuations are signatures of quantum-like chaos in dynamical systems.
The model provides unique quantification for the observed inverse power law
form for power spectra in terms of the statistical normal distribution. In this
paper it is shown that the frequency distribution of the bases C+G in all
available contiguous sequences for Human chromosome Y DNA exhibits
model-predicted quantum-like chaos.; Comment: 23 pages...
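As a rough illustration of the kind of input such an analysis starts from, the sketch below computes the C+G fraction in consecutive windows of a sequence and then a plain discrete Fourier periodogram of that profile. This is a generic signal-processing sketch, not the author's general systems model; the window size, function names and toy sequence are assumptions made for the example.

```python
import cmath
from typing import List

def gc_fraction_windows(sequence: str, window: int) -> List[float]:
    """C+G fraction in consecutive, non-overlapping windows."""
    sequence = sequence.upper()
    fractions = []
    for start in range(0, len(sequence) - window + 1, window):
        chunk = sequence[start:start + window]
        fractions.append((chunk.count("G") + chunk.count("C")) / window)
    return fractions

def power_spectrum(values: List[float]) -> List[float]:
    """Periodogram of the mean-centered series for k = 1 .. n // 2."""
    n = len(values)
    avg = sum(values) / n
    centered = [v - avg for v in values]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            c * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, c in enumerate(centered)
        )
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum

# Hypothetical toy sequence, not chromosome Y data.
toy = "GCGCATATGGCCAATT"
profile = gc_fraction_windows(toy, 4)
spectrum = power_spectrum(profile)
```

For real data, an inverse power law would show up as an approximately straight line when the logarithm of the spectrum is plotted against the logarithm of the frequency index.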
The taxonomy of Metarhizium has been reassessed using sequence data and RAPD patterns from 123 isolates recognised as M. anisopliae, M. flavoviride or M. album. A high level of genetic diversity was found which was best resolved at the species/variety level by sequence data from the ITS and 28S rDNA D3 regions. RAPD patterns correlated closely with the sequence data and revealed a much greater degree of diversity useful for distinguishing strains within a variety. Ten distinct clades were revealed by the cladogram based on the combined sequence data set. Several major evolutionary lines were revealed, but the taxonomic relationships at the base of the tree are poorly resolved. The data support the monophyly of the M. anisopliae group, and recognise four clades within it. Two correspond with M. anisopliae var. anisopliae and M. anisopliae var. majus. The other two are described as new varieties based on their distinctive ITS sequence data: M. anisopliae var. lepidiotum and M. anisopliae var. acridum vars nov. M. album, M. flavoviride var. flavoviride and M. flavoviride var. minus are recognised and redefined according to ITS sequence data. Three clades represent two new varieties, M. flavoviride var. novazealandicum and M. flavoviride var. pemphigum vars nov....