kdna minicircles sequences: Topics by Science.gov

Sample records for kdna minicircles sequences

The use of kDNA minicircle subclass relative abundance to differentiate between Leishmania (L.) infantum and Leishmania (L.) amazonensis.

PubMed

Ceccarelli, Marcello; Galluzzi, Luca; Diotallevi, Aurora; Andreoni, Francesca; Fowler, Hailie; Petersen, Christine; Vitale, Fabrizio; Magnani, Mauro

2017-05-16

Leishmaniasis is a neglected disease caused by many Leishmania species, belonging to subgenera Leishmania (Leishmania) and Leishmania (Viannia). Several qPCR-based molecular diagnostic approaches have been reported for detection and quantification of Leishmania species. Many of these approaches use the kinetoplast DNA (kDNA) minicircles as the target sequence. These assays had potential cross-species amplification, due to sequence similarity between Leishmania species. Previous works demonstrated discrimination between L. (Leishmania) and L. (Viannia) by SYBR green-based qPCR assays designed on kDNA, followed by melting or high-resolution melt (HRM) analysis. Importantly, these approaches cannot fully distinguish L. (L.) infantum from L. (L.) amazonensis, which can coexist in the same geographical area. DNA from 18 strains/isolates of L. (L.) infantum, L. (L.) amazonensis, L. (V.) braziliensis, L. (V.) panamensis, L. (V.) guyanensis, and 62 clinical samples from L. (L.) infantum-infected dogs was amplified by a previously developed qPCR (qPCR-ML) and subjected to HRM analysis; selected PCR products were sequenced using an ABI PRISM 310 Genetic Analyzer. Based on the obtained sequences, a new SYBR green-based qPCR assay (qPCR-ama) intended to amplify a minicircle subclass more abundant in L. (L.) amazonensis was designed. The qPCR-ML followed by HRM analysis did not allow discrimination between L. (L.) amazonensis and L. (L.) infantum in 53.4% of cases. Hence, the novel SYBR green-based qPCR (qPCR-ama) was tested. This assay achieved a detection limit of 0.1 pg of parasite DNA in samples spiked with host DNA and did not show cross amplification with Trypanosoma cruzi or host DNA. Although the qPCR-ama also amplified L. (L.) infantum strains, the Cq values were dramatically increased compared to qPCR-ML. Therefore, the combined analysis of Cq values from qPCR-ML and qPCR-ama made it possible to distinguish L. (L.) infantum from L. (L.) amazonensis in 100% of tested samples.
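The combined reading of the two assays described in this abstract can be sketched as a difference in Cq values. This is only an illustration: the function names and the cut-off value are hypothetical and are not taken from the record, which gives no numeric threshold.

```python
# Minimal sketch of a combined Cq analysis for two qPCR assays.
# ASSUMPTION: the threshold below is a hypothetical placeholder; a real
# cut-off would have to be derived from reference strains.

def delta_cq(cq_ml: float, cq_ama: float) -> float:
    """Return Cq(qPCR-ama) minus Cq(qPCR-ML) for one sample."""
    return cq_ama - cq_ml


def classify(cq_ml: float, cq_ama: float, threshold: float = 5.0) -> str:
    """Label a sample from the Cq difference between the two assays.

    A large positive difference means qPCR-ama amplified much later than
    qPCR-ML, the pattern the abstract reports for L. (L.) infantum.
    """
    if delta_cq(cq_ml, cq_ama) > threshold:
        return "L. (L.) infantum-like"
    return "L. (L.) amazonensis-like"


if __name__ == "__main__":
    print(classify(cq_ml=20.0, cq_ama=30.0))
    print(classify(cq_ml=20.0, cq_ama=21.0))
```

Working with a difference rather than a raw Cq makes the call less sensitive to the amount of template in each sample, since both assays are run on the same DNA.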
The effect of volume exclusion on the formation of DNA minicircle networks: implications to kinetoplast DNA

NASA Astrophysics Data System (ADS)

Diao, Y.; Hinson, K.; Sun, Y.; Arsuaga, J.

2015-10-01

Kinetoplast DNA (kDNA) is the mitochondrial DNA of disease-causing organisms such as Trypanosoma brucei (T. brucei) and Trypanosoma cruzi (T. cruzi). In most organisms, kDNA is made of thousands of small circular DNA molecules that are highly condensed and topologically linked, forming a gigantic planar network. In our previous work we have developed mathematical and computational models to test the confinement hypothesis, that is, that the formation of kDNA minicircle networks is a product of the high DNA condensation achieved in the mitochondrion of these organisms. In these studies we examined three parameters that characterize the growth of the network topology upon confinement: the critical percolation density, the mean saturation density and the mean valence (i.e. the number of minicircles topologically linked to any chosen minicircle). Experimental results on insect-infecting organisms showed that the mean valence is equal to three, forming a structure similar to that of medieval chain mail. These same studies hypothesized that this value of the mean valence was driven by the DNA excluded volume. Here we extend our previous work on kDNA by characterizing the effects of DNA excluded volume on the three descriptive parameters. Using computer simulations of polymer swelling we found that (1) in agreement with previous studies the linking probability of two minicircles does not decrease linearly with the distance between the two minicircles, (2) the mean valence grows linearly with the density of minicircles and decreases with the thickness of the excluded volume, (3) the critical percolation and mean saturation densities grow linearly with the thickness of the excluded volume. Our results therefore suggest that the swelling of the DNA molecule, due to electrostatic interactions, has relatively mild implications on the overall topology of the network. Our results also validate our topological descriptors since they appear to reflect the changes in the
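The mean valence defined in this abstract is the average number of minicircles linked to a given minicircle, which is the mean degree of the link graph. A minimal sketch follows; the representation of links as pairs of minicircle indices and the toy network are assumptions made for illustration, not the authors' simulation code.

```python
# Minimal sketch: mean valence of a minicircle network stored as a list
# of topological links (pairs of minicircle indices).
from collections import defaultdict


def mean_valence(links, n_minicircles):
    """Average number of link partners per minicircle.

    Unlinked minicircles count with valence 0, so the total number of
    minicircles is passed in explicitly.
    """
    degree = defaultdict(int)
    for a, b in links:
        degree[a] += 1
        degree[b] += 1
    return sum(degree.values()) / n_minicircles


if __name__ == "__main__":
    # Toy network: four minicircles, each linked to the other three.
    toy_links = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    print(mean_valence(toy_links, 4))  # each minicircle has valence 3
```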
Influence of DNA sequence on the structure of minicircles under torsional stress

PubMed Central

Wang, Qian; Irobalieva, Rossitza N.; Chiu, Wah; Schmid, Michael F.; Fogg, Jonathan M.; Zechiedrich, Lynn

2017-01-01

Abstract The sequence dependence of the conformational distribution of DNA under various levels of torsional stress is an important unsolved problem. Combining theory and coarse-grained simulations shows that the DNA sequence and a structural correlation due to topology constraints of a circle are the main factors that dictate the 3D structure of a 336 bp DNA minicircle under torsional stress. We found that DNA minicircle topoisomers can have multiple bend locations under high torsional stress and that the positions of these sharp bends are determined by the sequence, and by a positive mechanical correlation along the sequence. We showed that simulations and theory are able to provide sequence-specific information about individual DNA minicircles observed by cryo-electron tomography (cryo-ET). We provided a sequence-specific cryo-ET tomogram fitting of DNA minicircles, registering the sequence within the geometric features. Our results indicate that the conformational distribution of minicircles under torsional stress can be designed, which has important implications for using minicircle DNA for gene therapy. PMID:28609782
KDNA Genetic Signatures Obtained by LSSP-PCR Analysis of Leishmania (Leishmania) infantum Isolated from the New and the Old World

PubMed Central

Alvarenga, Janaína Sousa Campos; Ligeiro, Carla Maia; Gontijo, Célia Maria Ferreira; Cortes, Sofia; Campino, Lenea; Vago, Annamaria Ravara; Melo, Maria Norma

2012-01-01

Background: Visceral Leishmaniasis (VL) caused by species from the Leishmania donovani complex is the most severe form of the disease, lethal if untreated. VL caused by Leishmania infantum is a zoonosis with an increasing number of human cases and millions of dogs infected in the Old and the New World. In this study, L. infantum (syn. L. chagasi) strains were isolated from human and canine VL cases. The strains were obtained from endemic areas of Brazil and Portugal, and their genetic polymorphism was ascertained using the LSSP-PCR (Low-Stringency Single Specific Primer PCR) technique for analyzing the kinetoplastid DNA (kDNA) minicircle hypervariable region. Principal Findings: KDNA genetic signatures obtained by minicircle LSSP-PCR analysis of forty L. infantum strains allowed the grouping of strains in several clades. Furthermore, LSSP-PCR profiles of L. infantum subpopulations were closely related to the host origin (human or canine). To our knowledge this is the first study to use this technique to compare genetic polymorphisms among strains of L. infantum originating from both the Old and the New World. Conclusions: LSSP-PCR profiles obtained by analysis of the L. infantum kDNA hypervariable region of parasites isolated from human cases and infected dogs from Brazil and Portugal exhibited a genetic correlation among isolates originating from the same reservoir, human or canine. However, no association was detected between the kDNA signatures and the geographical origin of L. infantum strains. PMID:22912862
The kinetoplast DNA of the Australian trypanosome, Trypanosoma copemani, shares features with Trypanosoma cruzi and Trypanosoma lewisi.

PubMed

Botero, Adriana; Kapeller, Irit; Cooper, Crystal; Clode, Peta L; Shlomai, Joseph; Thompson, R C Andrew

2018-05-17

Kinetoplast DNA (kDNA) is the mitochondrial genome of trypanosomatids. It consists of a few dozen maxicircles and several thousand minicircles, all catenated topologically to form a two-dimensional DNA network. Minicircles are heterogeneous in size and sequence among species. They present one or several conserved regions that contain three highly conserved sequence blocks. CSB-1 (10 bp sequence) and CSB-2 (8 bp sequence) present lower interspecies homology, while CSB-3 (12 bp sequence) or the Universal Minicircle Sequence is conserved within most trypanosomatids. The Universal Minicircle Sequence is located at the replication origin of the minicircles, and is the binding site for the UMS binding protein, a protein involved in trypanosomatid survival and virulence. Here, we describe the structure and organisation of the kDNA of Trypanosoma copemani, a parasite that has been shown to infect mammalian cells and has been associated with the drastic decline of the endangered Australian marsupial, the woylie (Bettongia penicillata). Deep genomic sequencing showed that T. copemani presents two classes of minicircles that share sequence identity and organisation in the conserved sequence blocks with those of Trypanosoma cruzi and Trypanosoma lewisi. A 19,257 bp partial region of the maxicircle of T. copemani that contained the entire coding region was obtained. Comparative analysis of the T. copemani entire maxicircle coding region with the coding regions of T. cruzi and T. lewisi showed they share 71.05% and 71.28% identity, respectively. The shared features in the maxicircle/minicircle organisation and sequence between T. copemani and T. cruzi/T. lewisi suggest similarities in their process of kDNA replication, and are of significance in understanding the evolution of Australian trypanosomes. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.
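Locating CSB-3 on a minicircle is a motif search on a circular sequence, so a match may span the arbitrary point where the sequence was linearised. A minimal sketch follows. The abstract does not spell out the motif; the 12-mer used here is the Universal Minicircle Sequence as commonly given in the literature, and the function names and the toy sequence are hypothetical.

```python
# Minimal sketch: find a conserved block on a circular DNA sequence,
# on either strand, including matches that wrap around the origin.
# ASSUMPTION: UMS below is the commonly cited 12-mer, not quoted in the record.
UMS = "GGGGTTGGTGTA"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_on_circle(seq: str, motif: str) -> list:
    """Return 0-based start positions of motif on the circular sequence."""
    if len(motif) > len(seq):
        return []
    # Append the first len(motif) - 1 bases so wrapped matches are visible.
    extended = seq + seq[: len(motif) - 1]
    hits = []
    start = extended.find(motif)
    while start != -1:
        hits.append(start)
        start = extended.find(motif, start + 1)
    return hits


if __name__ == "__main__":
    # Toy circle in which the motif starts near the end and wraps around.
    circle = "GTGTAACACACGGGGTTG"
    print(find_on_circle(circle, UMS))
    print(find_on_circle(circle, revcomp(UMS)))
```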
Pathogenesis of Chagas' Disease: Parasite Persistence and Autoimmunity

PubMed Central

Teixeira, Antonio R. L.; Hecht, Mariana M.; Guimaro, Maria C.; Sousa, Alessandro O.; Nitz, Nadjar

2011-01-01

Summary: Acute Trypanosoma cruzi infections can be asymptomatic, but chronically infected individuals can die of Chagas' disease. The transfer of the parasite mitochondrial kinetoplast DNA (kDNA) minicircle to the genome of chagasic patients can explain the pathogenesis of the disease; in cases of Chagas' disease with evident cardiomyopathy, the kDNA minicircles integrate mainly into retrotransposons at several chromosomes, but the minicircles are also detected in coding regions of genes that regulate cell growth, differentiation, and immune responses. An accurate evaluation of the role played by the genotype alterations in the autoimmune rejection of self-tissues in Chagas' disease is achieved with the cross-kingdom chicken model system, which is refractory to T. cruzi infections. The inoculation of T. cruzi into embryonated eggs prior to incubation generates parasite-free chicks, which retain the kDNA minicircle sequence mainly in the macrochromosome coding genes. Crossbreeding transfers the kDNA mutations to the chicken progeny. The kDNA-mutated chickens develop severe cardiomyopathy in adult life and die of heart failure. The phenotyping of the lesions revealed that cytotoxic CD45, CD8+ γδ, and CD8α+ T lymphocytes carry out the rejection of the chicken heart. These results suggest that the inflammatory cardiomyopathy of Chagas' disease is a genetically driven autoimmune disease. PMID:21734249
A population study of the minicircles in Trypanosoma cruzi: predicting guide RNAs in the absence of empirical RNA editing.

PubMed

Thomas, Sean; Martinez, L L Isadora Trejo; Westenberger, Scott J; Sturm, Nancy R

2007-05-24

The structurally complex network of minicircles and maxicircles comprising the mitochondrial DNA of kinetoplastids mirrors the complexity of the RNA editing process that is required for faithful expression of encrypted maxicircle genes. Although a few of the guide RNAs that direct this editing process have been discovered on maxicircles, guide RNAs are mostly found on the minicircles. The nuclear and maxicircle genomes have been sequenced and assembled for Trypanosoma cruzi, the causative agent of Chagas disease; however, the complement of 1.4-kb minicircles, carrying four guide RNA genes per molecule in this parasite, has been less thoroughly characterised. Fifty-four CL Brener and 53 Esmeraldo strain minicircle sequence reads were extracted from T. cruzi whole genome shotgun sequencing data. With these sequences and all published T. cruzi minicircle sequences, 108 unique guide RNAs from all known T. cruzi minicircle sequences and two guide RNAs from the CL Brener maxicircle were predicted using a local alignment algorithm and mapped onto predicted or experimentally determined sequences of edited maxicircle open reading frames. For half of the sequences no statistically significant guide RNA could be assigned. Likely positions of these unidentified gRNAs in T. cruzi minicircle sequences are estimated using a simple Hidden Markov Model. With the local alignment predictions as a standard, the HMM had an ~85% chance of correctly identifying at least 20 nucleotides of guide RNA from a given minicircle sequence. Inter-minicircle recombination was documented. Variable regions contain species-specific areas of distinct nucleotide preference. Two maxicircle guide RNA genes were found. The identification of new minicircle sequences and the further characterization of all published minicircles are presented, including the first observation of recombination between minicircles. Extrapolation suggests a level of 4% recombinants in the population, supporting a relatively high
Production of DNA minicircles less than 250 base pairs through a novel concentrated DNA circularization assay enabling minicircle design with NF-κB inhibition activity

PubMed Central

Thibault, Thomas; Degrouard, Jeril; Baril, Patrick; Pichon, Chantal; Midoux, Patrick

2017-01-01

Abstract Double-stranded DNA minicircles of less than 1000 bp in length have great interest in both fundamental research and therapeutic applications. Although minicircles have shown promising activity in gene therapy thanks to their good biostability and better intracellular trafficking, minicircles down to 250 bp in size have not yet been investigated from the test tube to the cell for lack of an efficient production method. Herein, we report a novel versatile plasmid-free method for the production of DNA minicircles comprising fewer than 250 bp. We designed a linear nicked DNA double-stranded oligonucleotide blunt-ended substrate for efficient minicircle production in a ligase-mediated and bending protein-assisted circularization reaction at high DNA concentration of 2 μM. This one pot multi-step reaction based-method yields hundreds of micrograms of minicircle with sequences of any base composition and position and containing or not a variety of site-specifically chemical modifications or physiological supercoiling. Biochemical and cellular studies were then conducted to design a 95 bp minicircle capable of binding in vitro two NF-κB transcription factors per minicircle and to efficiently inhibiting NF-κB-dependent transcriptional activity in human cells. Therefore, our production method could pave the way for the design of minicircles as new decoy nucleic acids. PMID:27899652
Production of DNA minicircles less than 250 base pairs through a novel concentrated DNA circularization assay enabling minicircle design with NF-κB inhibition activity.

PubMed

Thibault, Thomas; Degrouard, Jeril; Baril, Patrick; Pichon, Chantal; Midoux, Patrick; Malinge, Jean-Marc

2017-03-17

Double-stranded DNA minicircles of less than 1000 bp in length have great interest in both fundamental research and therapeutic applications. Although minicircles have shown promising activity in gene therapy thanks to their good biostability and better intracellular trafficking, minicircles down to 250 bp in size have not yet been investigated from the test tube to the cell for lack of an efficient production method. Herein, we report a novel versatile plasmid-free method for the production of DNA minicircles comprising fewer than 250 bp. We designed a linear nicked DNA double-stranded oligonucleotide blunt-ended substrate for efficient minicircle production in a ligase-mediated and bending protein-assisted circularization reaction at high DNA concentration of 2 μM. This one pot multi-step reaction based-method yields hundreds of micrograms of minicircle with sequences of any base composition and position and containing or not a variety of site-specifically chemical modifications or physiological supercoiling. Biochemical and cellular studies were then conducted to design a 95 bp minicircle capable of binding in vitro two NF-κB transcription factors per minicircle and to efficiently inhibiting NF-κB-dependent transcriptional activity in human cells. Therefore, our production method could pave the way for the design of minicircles as new decoy nucleic acids. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Enhanced gene disruption by programmable nucleases delivered by a minicircle vector.

PubMed

Dad, A-B K; Ramakrishna, S; Song, M; Kim, H

2014-11-01

Targeted genetic modification using programmable nucleases such as zinc finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs) is of great value in biomedical research, medicine and biotechnology. Minicircle vectors, which lack extraneous bacterial sequences, have several advantages over conventional plasmids for transgene delivery. Here, for the first time, we delivered programmable nucleases into human cells using transient transfection of a minicircle vector and compared the results with those obtained using a conventional plasmid. Surrogate reporter assays and T7 endonuclease analyses revealed that cells in the minicircle vector group displayed significantly higher mutation frequencies at the target sites than those in the conventional plasmid group. Quantitative PCR and reverse transcription-PCR showed higher vector copy number and programmable nuclease transcript levels, respectively, in 293T cells after minicircle versus conventional plasmid vector transfection. In addition, trypan blue staining and flow cytometry after annexin V and propidium iodide staining showed that cell viability was also significantly higher in the minicircle group than in the conventional plasmid group. Taken together, our results show that gene disruption using minicircle vector-mediated delivery of ZFNs and TALENs is a more efficient, safer and less toxic method than using a conventional plasmid, and indicate that the minicircle vector could serve as an advanced delivery method for programmable nucleases.
Demonstration of mRNA editing and localization of guide RNA genes in kinetoplast-mitochondria of the plant trypanosomatid Phytomonas serpens.

PubMed

Maslov, D A; Hollar, L; Haghighat, P; Nawathean, P

1998-06-01

Maxicircle molecules of kDNA in several isolates of Phytomonas were detected by hybridization with the 12S rRNA gene probe from Leishmania tarentolae. The estimated size of maxicircles is isolate-specific and varies from 27 to 36 kb. Fully edited and polyadenylated mRNA for kinetoplast-encoded ribosomal protein S12 (RPS12) was found in the steady-state kinetoplast RNA isolated from Phytomonas serpens strain 1G. Two minicircles (1.45 kb) from this strain were also sequenced. Each minicircle contains two 120 bp conserved regions positioned 180 degrees apart, a region enriched with G and T bases and a variable region. One minicircle encodes a gRNA for the first block of editing of RPS12 mRNA, and the other encodes a gRNA with unknown function. A gRNA gene for the second block of RPS12 was found on a minicircle sequenced previously. On each minicircle, a gRNA gene is located in the variable region in a similar position and orientation with respect to the conserved regions.
Kinetoplast DNA minicircles of phloem-restricted Phytomonas associated with wilt diseases of coconut and oil palms have a two-domain structure.

PubMed

Dollet, M; Sturm, N R; Ahomadegbe, J C; Campbell, D A

2001-11-27

We report the cloning and sequencing of the first minicircle from a phloem-restricted, pathogenic Phytomonas sp. (Hart 1) isolated from a coconut palm with hartrot disease. The minicircle possessed a two-domain structure of two conserved regions, each containing three conserved sequence blocks (CSB). Based on the sequence around CSB 3 from Hart 1, PCR primers were designed to allow specific amplification of Phytomonas minicircles. This primer pair demonstrated specificity for at least six groups of plant trypanosomatids and did not amplify from insect trypanosomatids. The PCR results were consistent with a two-domain structure for other plant trypanosomatids.
Phylogenetic analysis of Bolivian bat trypanosomes of the subgenus schizotrypanum based on cytochrome B sequence and minicircle analyses.

PubMed

García, Lineth; Ortiz, Sylvia; Osorio, Gonzalo; Torrico, Mary Cruz; Torrico, Faustino; Solari, Aldo

2012-01-01

The aim of this study was to establish the phylogenetic relationships of trypanosomes present in blood samples of Bolivian Carollia bats. Eighteen cloned stocks were isolated from 115 bats belonging to Carollia perspicillata (Phyllostomidae) from three Amazonian areas of the Chapare Province of Bolivia and studied by xenodiagnosis using the vectors Rhodnius robustus and Triatoma infestans (Trypanosoma cruzi marinkellei) or haemoculture (Trypanosoma dionisii). The PCR-amplified DNA was analyzed by nucleotide sequences of maxicircles encoding cytochrome b and by means of the molecular size of hypervariable regions of minicircles. Ten samples were classified as Trypanosoma cruzi marinkellei and 8 samples as Trypanosoma dionisii. The two species have a different molecular size profile with respect to the amplified regions of minicircles and also with respect to Trypanosoma cruzi and Trypanosoma rangeli used for comparative purposes. We conclude that two species of bat trypanosomes are present in these samples and can clearly be identified by the methods used in this study. The presence of these trypanosomes in Amazonian bats is discussed.
Phylogenetic Analysis of Bolivian Bat Trypanosomes of the Subgenus Schizotrypanum Based on Cytochrome b Sequence and Minicircle Analyses

PubMed Central

García, Lineth; Ortiz, Sylvia; Osorio, Gonzalo; Torrico, Mary Cruz; Torrico, Faustino; Solari, Aldo

2012-01-01

The aim of this study was to establish the phylogenetic relationships of trypanosomes present in blood samples of Bolivian Carollia bats. Eighteen cloned stocks were isolated from 115 bats belonging to Carollia perspicillata (Phyllostomidae) from three Amazonian areas of the Chapare Province of Bolivia and studied by xenodiagnosis using the vectors Rhodnius robustus and Triatoma infestans (Trypanosoma cruzi marinkellei) or haemoculture (Trypanosoma dionisii). The PCR-amplified DNA was analyzed by nucleotide sequences of maxicircles encoding cytochrome b and by means of the molecular size of hypervariable regions of minicircles. Ten samples were classified as Trypanosoma cruzi marinkellei and 8 samples as Trypanosoma dionisii. The two species have a different molecular size profile with respect to the amplified regions of minicircles and also with respect to Trypanosoma cruzi and Trypanosoma rangeli used for comparative purposes. We conclude that two species of bat trypanosomes are present in these samples and can clearly be identified by the methods used in this study. The presence of these trypanosomes in Amazonian bats is discussed. PMID:22590570
First comparative insight into the architecture of COI mitochondrial minicircle molecules of dicyemids reveals marked inter-species variation.

PubMed

Catalano, Sarah R; Whittington, Ian D; Donnellan, Stephen C; Bertozzi, Terry; Gillanders, Bronwyn M

2015-07-01

Dicyemids, poorly known parasites of benthic cephalopods, are one of the few phyla in which mitochondrial (mt) genome architecture departs from the typical ~16 kb circular metazoan genome. In addition to a putative circular genome, a series of mt minicircles that each comprises the mt encoded units (I-III) of the cytochrome c oxidase complex have been reported. Whether the structure of the mt minicircles is a consistent feature among dicyemid species is unknown. Here we analyse the complete cytochrome c oxidase subunit I (COI) minicircle molecule, containing the COI gene and an associated non-coding region (NCR), for ten dicyemid species, allowing for first time comparisons between species of minicircle architecture, NCR function and inferences of minicircle replication. Divergence in COI nucleotide sequences between dicyemid species was high (average net divergence = 31.6%) while within species diversity was lower (average net divergence = 0.2%). The NCR and putative 5' section of the COI gene were highly divergent between dicyemid species (average net nucleotide divergence of putative 5' COI section = 61.1%). No tRNA genes were found in the NCR, although palindrome sequences with the potential to form stem-loop structures were identified in some species, which may play a role in transcription or other biological processes.
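The divergence figures quoted in this abstract are built on pairwise nucleotide differences between aligned sequences. The sketch below computes the simplest such measure, the uncorrected p-distance. It is not the net divergence reported by the authors, which additionally subtracts within-group diversity; the function name and the toy sequences are illustrative assumptions.

```python
# Minimal sketch: uncorrected pairwise divergence (p-distance) between two
# aligned nucleotide sequences. Columns with a gap or ambiguity code in
# either sequence are skipped.

def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of compared alignment columns at which the bases differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            if x != y:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return differences / compared


if __name__ == "__main__":
    # Two of eight columns differ.
    print(p_distance("ACGTACGT", "ACGTTCGA"))
```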
A Simple And Rapid Minicircle DNA Vector Manufacturing System

PubMed Central

Kay, Mark A; He, Cheng-Yi; Chen, Zhi-Ying

2010-01-01

Minicircle DNA vectors consisting of a circular expression cassette devoid of the bacterial plasmid DNA backbone provide several advantages, including sustained transgene expression in quiescent cells/tissues. Their use has been limited by labor-intensive production. We report on a strategy for making multiple genetic modifications in E. coli to construct a producer strain that stably expresses a set of inducible minicircle-assembly enzymes, the ΦC31-integrase and I-SceI homing-endonuclease. This bacterial strain is capable of producing highly purified minicircle yields in the same time frame as routine plasmid DNA. It is now feasible for minicircle DNA vectors to replace routine plasmids in mammalian transgene expression studies. PMID:21102455
Studies of G-quadruplexes formed within self-assembled DNA mini-circles.

PubMed

Klejevskaja, Beata; Pyne, Alice L B; Reynolds, Matthew; Shivalingam, Arun; Thorogate, Richard; Hoogenboom, Bart W; Ying, Liming; Vilar, Ramon

2016-10-13

We have developed self-assembled DNA mini-circles that contain a G-quadruplex-forming sequence from the c-Myc oncogene promoter and demonstrate by FRET that the G-quadruplex unfolding kinetics are 10-fold slower than for the simpler 24-mer G-quadruplex that is commonly used for FRET experiments.
Accuracy of qPCR for quantifying Leishmania kDNA in different skin layers of patients with American tegumentary leishmaniasis.

PubMed

Sevilha-Santos, L; Dos Santos Júnior, A C M; Medeiros-Silva, V; Bergmann, J O; da Silva, E F; Segato, L F; Arabi, A Y M; de Paula, N A; Sampaio, R N R; Lima, B D; Gomes, C M

2018-05-03

Superficial swab sampling of American tegumentary leishmaniasis (ATL) lesions shows higher amounts of Leishmania than those from biopsy. Subcutaneous involvement is also important in ATL, but parasite quantification according to lesion depth has not been evaluated. We aim to present the best depth at which sampling should be performed for molecular exams of ATL. Patients with a clinical presentation compatible with ATL were allocated to ATL and control groups. Qualitative and quantitative qPCR assays were performed using SYBR Green and primers amplifying the kDNA minicircle of Leishmania spp. in different skin layers, including the epidermis, the superior dermis, the inferior dermis, and the hypodermis. Fifty-nine patients were included in this study, including 40 who had been diagnosed with ATL and 19 controls. The number of parasites was greater in samples of the epidermis and superior dermis (159.1 × 10^6, range 4.0-781.7, and 75.4 × 10^6, range 8.0-244.5, mean Leishmania parasite equivalents per μg of tissue DNA, respectively) than those in samples of the inferior dermis and hypodermis (54.6, range 8.0-256.6, and 16.8 × 10^6, range 8.0-24.1, mean Leishmania parasite equivalents per μg of tissue DNA, respectively). The best diagnostic accuracy was achieved in the superior dermis (77.9%) and was significantly greater than that in the hypodermis (63.3%; p 0.039). We conclude that superficial sampling can retrieve a greater quantity of parasites. Future studies of the role of transepidermal elimination as a mechanism of host defence in ATL must be performed as there is a considerable quantity of Leishmania kDNA in the epidermis. Copyright © 2018 European Society of Clinical Microbiology and Infectious Diseases. Published by Elsevier Ltd. All rights reserved.
Diagnosis of clinical samples spotted on FTA cards using PCR-based methods.

PubMed

Jamjoom, Manal; Sultan, Amal H

2009-04-01

The broad clinical presentation of leishmaniasis makes the diagnosis of current and past cases of this disease rather difficult. Differential diagnosis is important because diseases with other aetiologies and a clinical spectrum similar to that of leishmaniasis (e.g. leprosy, skin cancers and tuberculosis for CL; malaria and schistosomiasis for VL) are often present in endemic areas. Presently, a variety of methods have been developed and tested to aid the identification and diagnosis of Leishmania. The advent of PCR technology has opened new channels for the diagnosis of leishmaniasis in a variety of clinical materials. PCR is a simple, rapid procedure that has been adapted for diagnosis of leishmaniasis. A range of tools is currently available for the diagnosis and identification of leishmaniasis and Leishmania species, respectively. However, none of these diagnostic tools has been examined and tested using samples spotted on FTA cards. Three different PCR-based approaches were examined: kDNA minicircle, Leishmania 18S rRNA gene and PCR-RFLP of the intergenic region of ribosomal protein. PCR primers were designed that sit within the coding sequences of genes (relatively well conserved) but which amplify across the intervening intergenic sequence (relatively variable). These were used in PCR-RFLP on reference isolates of 10 of the most important Leishmania species: L. donovani, L. infantum, L. major and L. tropica. Digestion of PCR products with restriction enzymes produced species-specific restriction patterns that allowed discrimination of reference isolates. The kDNA minicircle primers are highly sensitive in diagnosis of both bone marrow and skin smears from FTA cards. The Leishmania 18S rRNA gene conserved region is sensitive in identification of bone marrow smears but less sensitive in diagnosing skin smears. The intergenic nested PCR-RFLP using P5 and P6 as well as P1 and P2 newly designed primers showed a high level of reproducibility and sensitivity
B-DNA to Z-DNA structural transitions in the SV40 enhancer: stabilization of Z-DNA in negatively supercoiled DNA minicircles

NASA Technical Reports Server (NTRS)

Gruskin, E. A.; Rich, A.

1993-01-01

During replication and transcription, the SV40 control region is subjected to significant levels of DNA unwinding. There are three, alternating purine-pyrimidine tracts within this region that can adopt the Z-DNA conformation in response to negative superhelix density: a single copy of ACACACAT and two copies of ATGCATGC. Since the control region is essential for both efficient transcription and replication, B-DNA to Z-DNA transitions in these vital sequence tracts may have significant biological consequences. We have synthesized DNA minicircles to detect B-DNA to Z-DNA transitions in the SV40 enhancer, and to determine the negative superhelix density required to stabilize the Z-DNA. A variety of DNA sequences, including the entire SV40 enhancer and the two segments of the enhancer with alternating purine-pyrimidine tracts, were incorporated into topologically relaxed minicircles. Negative supercoils were generated, and the resulting topoisomers were resolved by electrophoresis. Using an anti-Z-DNA Fab and an electrophoretic mobility shift assay, Z-DNA was detected in the enhancer-containing minicircles at a superhelix density of -0.05. Fab saturation binding experiments demonstrated that three, independent Z-DNA tracts were stabilized in the supercoiled minicircles. Two other minicircles, each with one of the two alternating purine-pyrimidine tracts, also contained single Z-DNA sites. These results confirm the identities of the Z-DNA-forming sequences within the control region. Moreover, the B-DNA to Z-DNA transitions were detected at superhelix densities observed during normal replication and transcription processes in the SV40 life cycle.
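Two quantities in this abstract lend themselves to a short calculation: whether a tract strictly alternates purines and pyrimidines, and the superhelix density of a minicircle topoisomer, taken here as the linking number difference divided by the relaxed linking number. The sketch below assumes a helical repeat of 10.5 bp per turn, and the 210 bp circle in the example is a hypothetical size chosen for round numbers, not one taken from the record.

```python
# Minimal sketch: alternating purine-pyrimidine check and superhelix density.
# ASSUMPTIONS: 10.5 bp per helical turn; the minicircle size in the example
# is hypothetical.

PURINES = frozenset("AG")


def is_alternating_pur_pyr(tract: str) -> bool:
    """True if every base differs in class (purine/pyrimidine) from its neighbour."""
    classes = [base in PURINES for base in tract.upper()]
    return all(a != b for a, b in zip(classes, classes[1:]))


def superhelix_density(n_bp: int, delta_lk: float, bp_per_turn: float = 10.5) -> float:
    """sigma = delta_Lk / Lk0, with Lk0 = n_bp / bp_per_turn."""
    lk0 = n_bp / bp_per_turn
    return delta_lk / lk0


if __name__ == "__main__":
    for tract in ("ACACACAT", "ATGCATGC"):
        print(tract, is_alternating_pur_pyr(tract))
    # Removing one turn from a hypothetical 210 bp circle (Lk0 = 20).
    print(superhelix_density(210, -1))
```

Because a small circle has a small relaxed linking number, a single-unit change in linking number shifts the superhelix density by a large step, which is why individual topoisomers of a minicircle can be resolved and tested separately.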

Simple methodology to directly genotype Trypanosoma cruzi discrete typing units in single and mixed infections from human blood samples.

PubMed

Bontempi, Iván A; Bizai, María L; Ortiz, Sylvia; Manattini, Silvia; Fabbro, Diana; Solari, Aldo; Diez, Cristina

2016-09-01

Different DNA markers to genotype Trypanosoma cruzi are now available. However, due to the low quantity of parasites present in biological samples, DNA markers with high copy number, like kinetoplast minicircles, are needed. The aim of this study was to complete a DNA assay called minicircle lineage specific-PCR (MLS-PCR), previously developed to genotype the T. cruzi DTUs TcV and TcVI, in order to genotype DTUs TcI and TcII and to improve TcVI detection. We screened kinetoplast minicircle hypervariable sequences from cloned PCR products from reference strains belonging to the mentioned DTUs using specific kDNA probes. With the four highly specific sequences selected, we designed primers to be used in the MLS-PCR to directly genotype T. cruzi from biological samples. High specificity and sensitivity were obtained when we evaluated the new approach for TcI, TcII, TcV and TcVI genotyping in twenty-two T. cruzi reference strains. Afterward, we compared it with hybridization tests using specific kDNA probes in 32 blood samples from chronic chagasic patients from North Eastern Argentina. With both tests we were able to genotype 94% of the samples and the concordance between them was very good (kappa=0.855). The most frequent T. cruzi DTUs detected were TcV and TcVI, followed by TcII and, far less frequently, TcI. A single T. cruzi DTU was detected in 18 samples, whereas more than one was detected in the remaining samples, TcV and TcVI being the most frequent association. A high percentage of mixed detections was obtained with both assays and its impact is discussed. Copyright © 2016 Elsevier B.V. All rights reserved.
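The concordance statistic quoted in this abstract (kappa) is presumably Cohen's kappa, which corrects the observed agreement between two tests for the agreement expected by chance. The abstract does not give the underlying agreement table, so the sketch below uses a made-up 2x2 table purely to show the calculation; the function name is hypothetical.

```python
from typing import Sequence


def cohens_kappa(table: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa for a square agreement table.

    table[i][j] = number of samples placed in category i by test A
    and in category j by test B.
    """
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("agreement table is empty")
    k = len(table)
    # Observed agreement: fraction of samples on the diagonal.
    p_observed = sum(table[i][i] for i in range(k)) / n
    # Chance agreement from the row and column marginals.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    # Note: undefined (division by zero) if p_expected == 1.
    return (p_observed - p_expected) / (1 - p_expected)


# Made-up counts, NOT data from the study.
example = [[20, 5],
           [10, 15]]
kappa = cohens_kappa(example)
```

By the commonly used Landis and Koch scale, values above about 0.8 are read as very good agreement, which matches how the authors describe their result.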
Mitochondrial genome deletions and minicircles are common in lice (Insecta: Phthiraptera)

PubMed Central

2011-01-01

Background The gene composition, gene order and structure of the mitochondrial genome are remarkably stable across bilaterian animals. Lice (Insecta: Phthiraptera) are a major exception to this genomic stability in that the canonical single chromosome with 37 genes found in almost all other bilaterians has been lost in multiple lineages in favour of multiple, minicircular chromosomes with fewer than 37 genes on each chromosome. Results Minicircular mt genomes are found in six of the ten louse species examined to date and three types of minicircles were identified: heteroplasmic minicircles which coexist with full sized mt genomes (type 1); multigene chromosomes with short, simple control regions, from which we infer that the genome consists of several such chromosomes (type 2); and multiple, single to three gene chromosomes with large, complex control regions (type 3). Mapping minicircle types onto a phylogenetic tree of lice fails to show a pattern of their occurrence consistent with an evolutionary series of minicircle types. Analysis of the nuclear-encoded, mitochondrially targeted genes inferred from the body louse, Pediculus, suggests that the loss of mitochondrial single-stranded binding protein (mtSSB) may be responsible for the presence of minicircles, at least in species with the most derived type 3 minicircles (Pediculus, Damalinia). Conclusions Minicircular mt genomes are common in lice and appear to have arisen multiple times within the group. Life history adaptive explanations which attribute minicircular mt genomes in lice to the adoption of blood-feeding in the Anoplura are not supported by this expanded data set, as minicircles are found in multiple non-blood feeding louse groups but are not found in the blood-feeding genus Heterodoxus. In contrast, a mechanistic explanation based on the loss of mtSSB suggests that minicircles may be selectively favoured due to the incapacity of the mt replisome to synthesize long replicative products without mtSSB and thus the
Mitochondrial genome deletions and minicircles are common in lice (Insecta: Phthiraptera).

PubMed

Cameron, Stephen L; Yoshizawa, Kazunori; Mizukoshi, Atsushi; Whiting, Michael F; Johnson, Kevin P

2011-08-04

The gene composition, gene order and structure of the mitochondrial genome are remarkably stable across bilaterian animals. Lice (Insecta: Phthiraptera) are a major exception to this genomic stability in that the canonical single chromosome with 37 genes found in almost all other bilaterians has been lost in multiple lineages in favour of multiple, minicircular chromosomes with fewer than 37 genes on each chromosome. Minicircular mt genomes are found in six of the ten louse species examined to date and three types of minicircles were identified: heteroplasmic minicircles which coexist with full sized mt genomes (type 1); multigene chromosomes with short, simple control regions, from which we infer that the genome consists of several such chromosomes (type 2); and multiple, single to three gene chromosomes with large, complex control regions (type 3). Mapping minicircle types onto a phylogenetic tree of lice fails to show a pattern of their occurrence consistent with an evolutionary series of minicircle types. Analysis of the nuclear-encoded, mitochondrially targeted genes inferred from the body louse, Pediculus, suggests that the loss of mitochondrial single-stranded binding protein (mtSSB) may be responsible for the presence of minicircles, at least in species with the most derived type 3 minicircles (Pediculus, Damalinia). Minicircular mt genomes are common in lice and appear to have arisen multiple times within the group. Life history adaptive explanations which attribute minicircular mt genomes in lice to the adoption of blood-feeding in the Anoplura are not supported by this expanded data set, as minicircles are found in multiple non-blood feeding louse groups but are not found in the blood-feeding genus Heterodoxus. In contrast, a mechanistic explanation based on the loss of mtSSB suggests that minicircles may be selectively favoured due to the incapacity of the mt replisome to synthesize long replicative products without mtSSB and thus the loss of this gene led to the
Genetic diversity of Leishmania donovani that causes cutaneous leishmaniasis in Sri Lanka: a cross sectional study with regional comparisons.

PubMed

Kariyawasam, Udeshika Lakmini; Selvapandiyan, Angamuthu; Rai, Keshav; Wani, Tasaduq Hussain; Ahuja, Kavita; Beg, Mizra Adil; Premathilake, Hasitha Upendra; Bhattarai, Narayan Raj; Siriwardena, Yamuna Deepani; Zhong, Daibin; Zhou, Guofa; Rijal, Suman; Nakhasi, Hira; Karunaweera, Nadira D

2017-12-22

Leishmania donovani is the etiological agent of visceral leishmaniasis (VL) in the Indian subcontinent. However, it is also known to cause cutaneous leishmaniasis (CL) in Sri Lanka. Sri Lankan L. donovani differs from other L. donovani strains, both at the molecular and biochemical level. To investigate species- or strain-specific differences of L. donovani in Sri Lanka, we evaluated sequence variation of the kinetoplastid DNA (kDNA). Parasites isolated from skin lesions of 34 CL patients and bone marrow aspirates from 4 VL patients were genotyped using kDNA minicircle PCR analysis. A total of 301 minicircle sequences that included sequences from Sri Lanka, India, Nepal and six reference species of Leishmania were analyzed. Haplotype diversity of Sri Lankan isolates was high (Hd = 0.757) with strong inter-geographical genetic differentiation (FST > 0.25). In this study, L. donovani isolates clustered according to their geographic origin, while Sri Lankan isolates formed a separate cluster and were clearly distinct from other Leishmania species. Within the Sri Lankan group, three distinct sub-clusters were formed: from CL patients who responded to standard antimony therapy, from CL patients who responded poorly to antimony therapy, and from VL patients. There was no specific clustering of sequences based on geographical origin within Sri Lanka. This study reveals high levels of haplotype diversity of L. donovani in Sri Lanka with a distinct genetic association with clinically relevant phenotypic characteristics. The use of genetic tools to identify clinically relevant features of Leishmania parasites has important therapeutic implications for leishmaniasis.
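For readers unfamiliar with the haplotype diversity statistic, a minimal sketch follows. It assumes the widely used estimator of Nei, Hd = n/(n-1) * (1 - sum of squared haplotype frequencies); the abstract does not state which estimator or software the authors used, and the input labels below are invented.

```python
from collections import Counter
from typing import Hashable, Iterable


def haplotype_diversity(haplotypes: Iterable[Hashable]) -> float:
    """Nei's haplotype (gene) diversity for a sample of haplotype labels.

    Each element is the haplotype assigned to one sampled sequence,
    e.g. a sequence string or a cluster ID.
    """
    counts = Counter(haplotypes)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two sequences")
    # Sum of squared relative frequencies of each distinct haplotype.
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    # n/(n-1) is the small-sample correction.
    return n / (n - 1) * (1.0 - sum_sq)


# Invented labels, NOT data from the study.
hd = haplotype_diversity(["A", "A", "B", "C"])
```

The statistic is the probability that two sequences drawn at random from the sample carry different haplotypes, so it is 0 when every sequence is identical and approaches 1 when nearly every sequence is unique.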
Inhibition of Autoimmune Chagas-Like Heart Disease by Bone Marrow Transplantation

PubMed Central

Guimaro, Maria C.; Alves, Rozeneide M.; Rose, Ester; Sousa, Alessandro O.; de Cássia Rosa, Ana; Hecht, Mariana M.; Sousa, Marcelo V.; Andrade, Rafael R.; Vital, Tamires; Plachy, Jiří; Nitz, Nadjar; Hejnar, Jiří; Gomes, Clever C.; L. Teixeira, Antonio R.

2014-01-01

Background Infection with the protozoan Trypanosoma cruzi manifests in mammals as Chagas heart disease. The treatment available for chagasic cardiomyopathy is unsatisfactory. Methods/Principal Findings To study the disease pathology and its inhibition, we employed a syngeneic chicken model refractory to T. cruzi in which chickens hatched from T. cruzi inoculated eggs retained parasite kDNA (1.4 kb) minicircles. Southern blotting with EcoRI genomic DNA digests revealed main 18 and 20 kb bands by hybridization with a radiolabeled minicircle sequence. Breeding these chickens generated kDNA-mutated F1, F2, and F3 progeny. A targeted-primer TAIL-PCR (tpTAIL-PCR) technique was employed to detect the kDNA integrations. Histocompatible reporter heart grafts were used to detect ongoing inflammatory cardiomyopathy in kDNA-mutated chickens. Fluorochromes were used to label bone marrow CD3+, CD28+, and CD45+ precursors of the thymus-dependent CD8α+ and CD8β+ effector cells that expressed TCRγδ, vβ1 and vβ2 receptors, which infiltrated the adult hearts and the reporter heart grafts. Conclusions/Significance Genome modifications in kDNA-mutated chickens can be associated with disruption of immune tolerance to compatible heart grafts and with rejection of the adult host's heart and reporter graft, as well as tissue destruction by effector lymphocytes. Autoimmune heart rejection was largely observed in chickens with kDNA mutations in retrotransposons and in coding genes with roles in cell structure, metabolism, growth, and differentiation. Moreover, killing the sick kDNA-mutated bone marrow cells with cytostatic and anti-folate drugs and transplanting healthy marrow cells inhibited heart rejection. We report here for the first time that healthy bone marrow cells inhibited heart pathology in kDNA+ chickens and thus prevented the genetically driven clinical manifestations of the disease. PMID:25521296
Rational sub-division of plant trypanosomes (Phytomonas spp.) based on minicircle conserved region analysis.

PubMed

Sturm, Nancy R; Dollet, Michel; Lukes, Julius; Campbell, David A

2007-09-01

The sequences of minicircle conserved regions from various plant trypanosomatids have been determined and analyzed. The goal of this study was to add another tool to the arsenal of molecular probes for distinguishing between the different trypanosomatids occurring in plants: systemic trypanosomatids multiplying in the sap, those from the laticiferous tubes, and those developing in fruits, seeds or flowers but not in the plant itself and that are frequently considered as opportunistic insect trypanosomatids. As some plant intraphloemic trypanosomatids are the causative agents of important diseases, a clear definition of the different types of trypanosomatids is critical. The conserved region of the mitochondrial minicircle provides several specific features in a small sequence region containing three functional elements required for minicircle replication. Trees generated from the analysis recapitulated trees drawn from analyses of isoenzymes, RAPD, and particular gene sequences, supporting the validity of the small region used in this work. Three groups of isolates were significant and in accordance with previous work. The peculiarity of phloem-restricted trypanosomatids associated with wilts of coconut and oil palm in Latin America - group H - is confirmed. In agreement with previous studies on their biological and serological properties, the results highlighted this group, called 'phloemicola'. It always differentiated from all other latex and fruit isolates or opportunistic trypanosomatids, like insect trypanosomatids. We can assert that phloemicola is the only well-defined taxon among all plant trypanosomatids. A group of non-pathogenic latex isolates from South American euphorbs (G), and a heterogeneous group (A) including one fruit, one possible latex and one insect isolate are clearly distinct groups. The group of Mediterranean isolates from latex (D), even with a low bootstrap, stood out well from other groups. The remainder of the isolates fell into a
Polymerase Chain Reaction Detection of Leishmania kDNA from the Urine of Peruvian Patients with Cutaneous and Mucocutaneous Leishmaniasis

PubMed Central

Veland, Nicolas; Espinosa, Diego; Valencia, Braulio Mark; Ramos, Ana Pilar; Calderon, Flor; Arevalo, Jorge; Low, Donald E.; Llanos-Cuentas, Alejandro; Boggild, Andrea K.

2011-01-01

We hypothesized that Leishmania kDNA may be present in urine of patients with cutaneous leishmaniasis (CL). Urine samples and standard diagnostic specimens were collected from patients with skin lesions. kDNA polymerase chain reaction (PCR) was performed on samples from patients and 10 healthy volunteers from non-endemic areas. Eighty-six of 108 patients were diagnosed with CL and 18 (21%) had detectable Leishmania Viannia kDNA in the urine. Sensitivity and specificity were 20.9% (95% confidence interval [CI] 12.3–29.5%) and 100%. Six of 8 patients with mucocutaneous involvement had detectable kDNA in urine versus 12 of 78 patients with isolated cutaneous disease (P < 0.001). L. (V.) braziliensis (N = 3), L. (V.) guyanensis (N = 6), and L. (V.) peruviana (N = 3) were identified from urine. No healthy volunteer or patient with an alternate diagnosis had detectable kDNA in urine. Sensitivity of urine PCR is sub-optimal for diagnosis. On the basis of these preliminary data in a small number of patients, detectable kDNA in urine may identify less localized forms of infection and inform treatment decisions. PMID:21460009
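The sensitivity and confidence interval quoted in this abstract can be checked with a few lines of arithmetic. The abstract does not say which interval method was used; the sketch below assumes the normal-approximation (Wald) interval, which reproduces the reported 12.3–29.5% bounds for 18 positives among 86 confirmed CL patients. The function name is illustrative.

```python
import math
from typing import Tuple


def wald_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Proportion with a normal-approximation (Wald) confidence interval.

    Returns (estimate, lower, upper), with bounds clipped to [0, 1].
    z = 1.96 corresponds to a 95% interval.
    """
    p = successes / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Counts taken from the abstract: 18 urine-positive out of 86 CL patients.
sens, low, high = wald_interval(18, 86)
print(f"sensitivity {sens:.1%} (95% CI {low:.1%} to {high:.1%})")
```

With a proportion this far from 0.5 and a sample of this size, other methods such as the Wilson score interval would give somewhat different, asymmetric bounds, so the match should be read only as consistent with a Wald calculation.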
Novel Minicircle Vector for Gene Therapy in Murine Myocardial Infarction

PubMed Central

Huang, Mei; Chen, ZhiYing; Hu, Shijun; Jia, Fangjun; Li, Zongjin; Hoyt, Grant; Robbins, Robert C.; Kay, Mark A.; Wu, Joseph C.

2011-01-01

Background Conventional plasmids for gene therapy produce low-level and short-term gene expression. In this study, we develop a novel non-viral vector which robustly and persistently expresses the hypoxia inducible factor-1 alpha (HIF-1α) therapeutic gene in the heart, leading to functional benefits following myocardial infarction (MI). Methods and Results We first created minicircles carrying double fusion (MC-DF) reporter gene consisting of firefly luciferase and enhanced green fluorescent protein (Fluc-eGFP) for noninvasive measurement of transfection efficiency. Mouse C2C12 myoblasts and normal FVB mice were used for in vitro and in vivo confirmation, respectively. Bioluminescence imaging (BLI) showed stable minicircle gene expression in the heart for >12 weeks and the activity level was 5.6±1.2 fold stronger than regular plasmid at day 4 (P<0.01). Next, we created minicircles carrying hypoxia inducible factor-1 alpha (MC-HIF-1α) therapeutic gene for treatment of MI. Adult FVB mice underwent LAD ligation and were injected intramyocardially with (1) MC-HIF-1α, (2) regular plasmid carrying HIF-1α (PL-HIF-1α) as positive control, and (3) PBS as negative control (n=10/group). Echocardiographic study showed a significantly greater improvement of left ventricular ejection fraction (LVEF) in the minicircle group (51.3%±3.6%) compared to regular plasmid group (42.3%±4.1%) and saline group (30.5%±2.8%) at week 4 (P<0.05 for both). Histology demonstrated increased neoangiogenesis in both treatment groups. Finally, Western blot showed minicircles express >50% higher HIF-1α level than regular plasmid. Conclusion Taken together, this is the first study to demonstrate that minicircles can significantly improve transfection efficiency, duration of transgene expression, and cardiac contractility. Given the serious drawbacks associated with most viral vectors, we believe this novel non-viral vector can be of great value for cardiac gene therapy protocols. PMID
Detection and Characterization of Leishmania (Leishmania) and Leishmania (Viannia) by SYBR Green-Based Real-Time PCR and High Resolution Melt Analysis Targeting Kinetoplast Minicircle DNA

PubMed Central

Ceccarelli, Marcello; Galluzzi, Luca; Migliazzo, Antonella; Magnani, Mauro

2014-01-01

Leishmaniasis is a neglected disease with a broad clinical spectrum which includes asymptomatic infection. A thorough diagnosis, able to distinguish and quantify Leishmania parasites in a clinical sample, constitutes a key step in choosing an appropriate therapy, making an accurate prognosis and performing epidemiological studies. Several molecular techniques have been shown to be effective in the diagnosis of leishmaniasis. In particular, a number of PCR methods have been developed on various target DNA sequences including kinetoplast minicircle constant regions. The first aim of this study was to develop a SYBR green-based qPCR assay for Leishmania (Leishmania) infantum detection and quantification, using the kinetoplast minicircle constant region as target. To this end, two assays were compared: the first used a previously published primer pair (qPCR1), whereas the second used a nested primer pair generating a shorter PCR product (qPCR2). The second aim of this study was to evaluate the possibility of discriminating between the subgenera Leishmania (Leishmania) and Leishmania (Viannia) using the qPCR2 assay followed by melting or High Resolution Melt (HRM) analysis. Both assays used in this study showed good sensitivity and specificity, and a good correlation with standard IFAT methods in 62 canine clinical samples. However, only the qPCR2 assay allowed discrimination between the Leishmania (Leishmania) and Leishmania (Viannia) subgenera through melting or HRM analysis. In addition to developing assays, we investigated the number and genetic variability of kinetoplast minicircles in the Leishmania (L.) infantum WHO international reference strain (MHOM/TN/80/IPT1), highlighting the presence of minicircle subclasses and sequence heterogeneity. Specifically, the kinetoplast minicircle number per cell was estimated to be 26,566±1,192, while the subclass of minicircles amplifiable by qPCR2 was estimated to be 1,263±115. This heterogeneity, also observed in canine clinical samples
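The two copy-number estimates in this abstract imply that the qPCR2-amplifiable subclass is only a small share of all minicircles. The sketch below works that ratio out, assuming (this is not stated in the abstract) that the ± values are standard deviations and that the two estimates are independent, so that relative errors add in quadrature. The function name is hypothetical.

```python
import math
from typing import Tuple


def ratio_with_sd(a: float, sd_a: float, b: float, sd_b: float) -> Tuple[float, float]:
    """Return a/b and its approximate SD by first-order error propagation.

    Assumes a and b are independent, so the relative variances add.
    """
    ratio = a / b
    relative_sd = math.sqrt((sd_a / a) ** 2 + (sd_b / b) ** 2)
    return ratio, ratio * relative_sd


# Values from the abstract: subclass copies and total minicircles per cell.
fraction, fraction_sd = ratio_with_sd(1263, 115, 26566, 1192)
print(f"amplifiable subclass: {fraction:.1%} +/- {fraction_sd:.1%} of minicircles")
```

Under these assumptions the subclass comes to a little under 5% of the minicircle population.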
The Effect of Angle Restriction on the Topological Characteristics of Minicircle Networks

NASA Astrophysics Data System (ADS)

Arsuaga, J.; Diao, Y.; Hinson, K.

2012-01-01

Networks of topologically linked minicircle polymers are found in diverse natural systems and are a subject of intense research in nanotechnology. In a recent report the authors introduced a new theoretical model to study the effects of polymer density on the formation and on the topological properties of minicircle networks. Three key topological characteristics were identified in the formation and characterization of a network: the critical percolation density, the average saturation density and the mean valence of the network. In this work we report how these characteristics change when an orientation bias is imposed on the minicircles forming the network. We observe that such restrictions have significant effects on the key topological characteristics of the network. In particular, while the effects of restricting the tilting angle can be predicted, we find that restricting the azimuthal angle can have somewhat unexpected results.
Evaluation of nifurtimox treatment of chronic Chagas disease by means of several parasitological methods.

PubMed

Muñoz, Catalina; Zulantay, Inés; Apt, Werner; Ortiz, Sylvia; Schijman, Alejandro G; Bisio, Margarita; Ferrada, Valentina; Herrera, Cinthya; Martínez, Gabriela; Solari, Aldo

2013-09-01

Currently, evaluation of drug efficacy for Chagas disease remains a controversial issue with no consensus. In this work, we evaluated the parasitological efficacy of Nifurtimox treatment in 21 women with chronic Chagas disease from an area of endemicity in Chile who were treated according to current protocols. Under pre- and posttherapy conditions, blood (B) samples and xenodiagnosis (XD) samples from these patients were subjected to analysis by real-time PCR targeting the nuclear satellite DNA of Trypanosoma cruzi (Sat DNA PCR-B, Sat DNA PCR-XD) and by PCR targeting the minicircle of kinetoplast DNA of T. cruzi (kDNA PCR-B, kDNA PCR-XD) and by T. cruzi genotyping using hybridization minicircle tests in blood and fecal samples of Triatoma infestans fed by XD. In pretherapy, kDNA PCR-B and kDNA PCR-XD detected T. cruzi in 12 (57%) and 18 (86%) cases, respectively, whereas Sat DNA quantitative PCR-B (qPCR-B) and Sat DNA qPCR-XD were positive in 18 cases (86%) each. Regarding T. cruzi genotype analysis, it was possible to observe in pretherapy the combination of TcI, TcII, and TcV lineages, including mixtures of T. cruzi strains in most of the cases. At 13 months posttherapy, T. cruzi DNA was detectable in 6 cases (29.6%) and 4 cases (19.1%) by means of Sat DNA PCR-XD and kDNA PCR-XD, respectively, indicating treatment failure with recovery of live parasites refractory to chemotherapy. In 3 cases, it was possible to identify persistence of the baseline genotypes. The remaining 15 baseline PCR-positive cases gave negative results by all molecular and parasitological methods at 13 months posttreatment, suggesting parasite response. Within this follow-up period, kDNA PCR-XD and Sat DNA qPCR-XD proved to be more sensitive tools for the parasitological evaluation of the efficacy of Nifurtimox treatment than the corresponding PCR methods performed directly from blood samples.
Evaluation of Nifurtimox Treatment of Chronic Chagas Disease by Means of Several Parasitological Methods

PubMed Central

Muñoz, Catalina; Zulantay, Inés; Apt, Werner; Ortiz, Sylvia; Schijman, Alejandro G.; Bisio, Margarita; Ferrada, Valentina; Herrera, Cinthya; Martínez, Gabriela

2013-01-01

Currently, evaluation of drug efficacy for Chagas disease remains a controversial issue with no consensus. In this work, we evaluated the parasitological efficacy of Nifurtimox treatment in 21 women with chronic Chagas disease from an area of endemicity in Chile who were treated according to current protocols. Under pre- and posttherapy conditions, blood (B) samples and xenodiagnosis (XD) samples from these patients were subjected to analysis by real-time PCR targeting the nuclear satellite DNA of Trypanosoma cruzi (Sat DNA PCR-B, Sat DNA PCR-XD) and by PCR targeting the minicircle of kinetoplast DNA of T. cruzi (kDNA PCR-B, kDNA PCR-XD) and by T. cruzi genotyping using hybridization minicircle tests in blood and fecal samples of Triatoma infestans fed by XD. In pretherapy, kDNA PCR-B and kDNA PCR-XD detected T. cruzi in 12 (57%) and 18 (86%) cases, respectively, whereas Sat DNA quantitative PCR-B (qPCR-B) and Sat DNA qPCR-XD were positive in 18 cases (86%) each. Regarding T. cruzi genotype analysis, it was possible to observe in pretherapy the combination of TcI, TcII, and TcV lineages, including mixtures of T. cruzi strains in most of the cases. At 13 months posttherapy, T. cruzi DNA was detectable in 6 cases (29.6%) and 4 cases (19.1%) by means of Sat DNA PCR-XD and kDNA PCR-XD, respectively, indicating treatment failure with recovery of live parasites refractory to chemotherapy. In 3 cases, it was possible to identify persistence of the baseline genotypes. The remaining 15 baseline PCR-positive cases gave negative results by all molecular and parasitological methods at 13 months posttreatment, suggesting parasite response. Within this follow-up period, kDNA PCR-XD and Sat DNA qPCR-XD proved to be more sensitive tools for the parasitological evaluation of the efficacy of Nifurtimox treatment than the corresponding PCR methods performed directly from blood samples. PMID:23836179
Analyzing DNA curvature and its impact on the ionic environment: application to molecular dynamics simulations of minicircles

PubMed Central

Pasi, Marco; Zakrzewska, Krystyna; Maddocks, John H.

2017-01-01

We propose a method for analyzing the magnitude and direction of curvature within nucleic acids, based on the curvilinear helical axis calculated by Curves+. The method is applied to analyzing curvature within minicircles constructed with varying degrees of over- or under-twisting. Using the molecular dynamics trajectories of three different minicircles, we are able to quantify how curvature varies locally both in space and in time. We also analyze how curvature influences the local environment of the minicircles, notably via increased heterogeneity in the ionic distributions surrounding the double helix. The approach we propose has been integrated into Curves+ and the utilities Canal (time trajectory analysis) and Canion (environmental analysis) and can be used to study a wide variety of static and dynamic structural data on nucleic acids. PMID:28180333
Tightly-wound miniknot vectors for gene therapy: a potential improvement over supercoiled minicircle DNA.

PubMed

Tolmachov, Oleg E

2010-04-01

Minimized derivatives of bacterial plasmids with removed bacterial backbones are promising vectors for the efficient delivery and for the long-term expression of therapeutic genes. The absence of the bacterial plasmid backbone, a known inducer of innate immune response and a known silencer of transgene expression, provides a partial explanation for the high efficiency of gene transfer using minimized DNA vectors. Supercoiled minicircle DNA is a type of minimized DNA vector obtained via intra-plasmid recombination in bacteria. Minicircle vectors seem to get an additional advantage from their physical compactness, which reduces DNA damage due to the mechanical stress during gene delivery. An independent topological means for DNA compression is knotting, with some knotted DNA isoforms offering superior compactness. I propose that, firstly, knotted DNA can be a suitable compact DNA form for the efficient transfection of a range of human cells with therapeutic genes, and, secondly, that knotted minimized DNA vectors without bacterial backbones ("miniknot" vectors) can surpass supercoiled minicircle DNA vectors in the efficiency of therapeutic gene delivery. Crucially, while the introduction of a single nick to a supercoiled DNA molecule leads to the loss of the compact supercoiled status, the introduction of nicks to knotted DNA does not change knotting. Tight miniknot vectors can be readily produced by the direct action of highly concentrated type II DNA topoisomerase on minicircle DNA or, alternatively, by annealing of the 19-base cohesive ends of the minimized vectors confined within the capsids of Escherichia coli bacteriophage P2 or its satellite bacteriophage P4. After reaching the nucleoplasm of the target cell, the knotted DNA is expected to be unknotted through type II topoisomerase activity and thus to become available for transcription, chromosomal integration or episomal maintenance. The hypothesis can be tested by comparing the gene transfer efficiency
Amplification of a specific repetitive DNA sequence for Trypanosoma rangeli identification and its potential application in epidemiological investigations.

PubMed

Vargas, N; Souto, R P; Carranza, J C; Vallejo, G A; Zingales, B

2000-11-01

Trypanosoma rangeli can infect humans as well as the same domestic and wild animals and triatomine vectors infected by Trypanosoma cruzi in Central and South America. This overlapping distribution complicates the epidemiology of American trypanosomiasis due to the cross-reactivity between T. rangeli and T. cruzi antigens and the presence of conserved DNA sequences in these parasites. We have isolated a T. rangeli-specific DNA repetitive element which is represented in approximately 10^3 copies per parasite genome and is distributed in several chromosomal bands. The 542-bp nucleotide sequence of this element, named P542, was determined and a PCR assay was standardized for its amplification. The sensitivity of the assay is high, allowing the detection of one tenth of the DNA content of a single parasite. The presence of the P542 element was confirmed in 11 T. rangeli isolates from mammalian hosts and insect vectors originating from several countries in Latin America. Negative amplification was observed with different T. cruzi strains and other trypanosomatids. The potential field application of the P542 PCR assay was investigated in simulated samples containing T. rangeli and/or T. cruzi and intestinal tract and feces of Rhodnius prolixus. Epidemiological studies were conducted in DNA preparations obtained from the digestive tracts of 12 Rhodnius colombiensis insects collected in a sylvatic area in Colombia. Positive amplification of the P542 element was obtained in 9/12 insects. We have also compared in the same samples the diagnostic performance of two PCR assays for the amplification of the variable domain of minicircle kinetoplast DNA (kDNA) and of the large subunit (LSU) of the ribosomal RNA gene of T. cruzi and T. rangeli. Data indicate that the kDNA PCR assay does not allow diagnosis of mixed infections in most insects. On the other hand, the PCR assay of the LSU RNA gene showed lower sensitivity in the detection of T. rangeli than the PCR assay of the P542
Detection of Leishmania in Unaffected Mucosal Tissues of Patients with Cutaneous Leishmaniasis Caused by Leishmania (Viannia) Species

PubMed Central

Figueroa, Roger Adrian; Lozano, Leyder Elena; Romero, Ibeth Cristina; Cardona, Maria Teresa; Prager, Martin; Pacheco, Robinson; Diaz, Yira Rosalba; Tellez, Jair Alexander; Saravia, Nancy Gore

2016-01-01

Background Leishmania (Viannia) species are the principal cause of mucosal leishmaniasis. The natural history and pathogenesis of mucosal disease are enigmatic. Parasitological evaluation of mucosal tissues has been constrained by the invasiveness of conventional sampling methods. Methods We evaluated the presence of Leishmania in the mucosa of 26 patients with cutaneous leishmaniasis and 2 patients with mucocutaneous leishmaniasis. Swab samples of the nasal mucosa, tonsils, and conjunctiva were analyzed using polymerase chain reaction with LV-B1 primers and Southern blot hybridization. Results Two patients with mucocutaneous leishmaniasis and 21 (81%) of 26 patients with cutaneous leishmaniasis had Leishmania kinetoplast minicircle DNA (kDNA) in mucosal tissues. kDNA was amplified from swab samples of nasal mucosa from 14 (58%) of 24 patients, tonsils from 13 (46%) of 28 patients, and conjunctiva from 6 (25%) of 24 patients. kDNA was detected in the mucosa of patients with cutaneous disease caused by Leishmania panamensis, Leishmania guyanensis, and Leishmania braziliensis. Conclusion The asymptomatic presence of parasites in mucosal tissues may be common in patients with Leishmania (Viannia) infection. PMID:19569974
Trypanosoma rangeli: RAPD-PCR and LSSP-PCR analyses of isolates from southeast Brazil and Colombia and their relation with KPI minicircles.

PubMed

Marquez, D S; Ramírez, L E; Moreno, J; Pedrosa, A L; Lages-Silva, E

2007-09-01

This study presents the first genetic characterization of five Trypanosoma rangeli isolates from Minas Gerais, in the southeast of Brazil, and their comparison with Colombian populations by minicircle classification, RAPD-PCR and LSSP-PCR analyses. Our results demonstrated a homogeneous T. rangeli population circulating among Didelphis albiventris as reservoir host in Brazil, while heterogeneous populations were found in different regions of Colombia. KP1(+) minicircles were found in 100% of isolates from Brazil and in 36.4% of the Colombian samples, whereas the KP2 and KP3 minicircles were detected in both groups. RAPD-PCR and LSSP-PCR profiles revealed a polymorphism within KP1(+) and KP1(-) T. rangeli populations and allowed the division of T. rangeli into two branches. The Brazilian KP1(+) isolates were more homogeneous than the KP1(+) isolates from Colombia. The RAPD-PCR profiles were entirely consistent with the distribution of KP1 minicircles, while those obtained by LSSP-PCR were associated in 88.9% and 71.4% with KP1(+) and KP1(-) populations, respectively.
Twist-writhe partitioning in a coarse-grained DNA minicircle model

NASA Astrophysics Data System (ADS)

Sayar, Mehmet; Avşaroğlu, Barış; Kabakçıoğlu, Alkan

2010-04-01

Here we present a systematic study of supercoil formation in DNA minicircles under varying linking number by using molecular-dynamics simulations of a two-bead coarse-grained model. Our model is designed with the purpose of simulating long chains without sacrificing the characteristic structural properties of the DNA molecule, such as its helicity, backbone directionality, and the presence of major and minor grooves. The model parameters are extracted directly from full-atomistic simulations of DNA oligomers via Boltzmann inversion; therefore, our results can be interpreted as an extrapolation of those simulations to presently inaccessible chain lengths and simulation times. Using this model, we measure the twist/writhe partitioning in DNA minicircles, in particular its dependence on the chain length and excess linking number. We observe an asymmetric supercoiling transition consistent with experiments. Our results suggest that the fraction of the linking number absorbed as twist and writhe is nontrivially dependent on chain length and excess linking number. Beyond the supercoiling transition, chains of the order of one persistence length carry equal amounts of twist and writhe. For longer chains, an increasing fraction of the linking number is absorbed by the writhe.
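The twist/writhe partitioning discussed in this abstract rests on the standard relation Lk = Tw + Wr for a closed duplex. The following is a minimal bookkeeping sketch, not the authors' simulation code. The function names are hypothetical, and the relaxed helical repeat of 10.5 bp per turn is a commonly quoted value that is assumed here rather than taken from the abstract.

```python
# Bookkeeping for a closed DNA circle: Lk = Tw + Wr.
# ASSUMPTION: relaxed helical repeat of 10.5 bp/turn (not stated in the abstract).

def relaxed_linking_number(n_bp, helical_repeat=10.5):
    """Linking number Lk0 of a torsionally relaxed circle of n_bp base pairs."""
    return n_bp / helical_repeat

def excess_linking_number(lk, n_bp, helical_repeat=10.5):
    """Excess linking number dLk = Lk - Lk0."""
    return lk - relaxed_linking_number(n_bp, helical_repeat)

def writhe_from_twist(delta_lk, delta_tw):
    """Writhe absorbed by the axis once the twist change is known: Wr = dLk - dTw."""
    return delta_lk - delta_tw

# Example: a hypothetical 210-bp circle closed with Lk = 18.
d_lk = excess_linking_number(18, 210)
wr = writhe_from_twist(d_lk, delta_tw=d_lk / 2)  # equal twist/writhe split
```

The last line illustrates the case the abstract reports for chains of about one persistence length beyond the supercoiling transition, where twist and writhe carry equal shares of the excess linking number.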
Sequences with high propensity to form G-quartet structures in kinetoplast DNA from Phytomonas serpens.

PubMed

Sá-Carvalho, D; Traub-Cseko, Y M

1995-06-01

Naturally occurring sequences containing repetitive guanine motifs have the potential to form tetraplex DNA. Phytomonas serpens minicircle DNA shows some regions where one strand is composed mainly of G and T (GT regions). These regions contain several stretches of contiguous guanines. An oligonucleotide was constructed with the sequence corresponding to one of these regions (Phyto-GT). It was demonstrated by native gel electrophoresis and methylation protection that Phyto-GT forms tetramolecular (G4), bimolecular (G'2) and unimolecular (G4') structures stabilized through G-quartets. Tetraplex DNA formation by this sequence could have biological relevance as it can be formed in physiological conditions and GT regions comprise approximately one-third of P. serpens and Crithidia oncopelti minicircles.
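The GT regions described in this abstract are characterized by stretches of contiguous guanines. As an illustration only, the sketch below scans a sequence for such stretches and reports the G+T fraction. The function names, the minimum run length of three, and the example sequence are all hypothetical; none of them comes from the study.

```python
import re

def guanine_runs(seq, min_len=3):
    """Return (start, length) for each run of at least min_len contiguous G's."""
    pattern = re.compile(r"G{%d,}" % min_len)
    return [(m.start(), len(m.group())) for m in pattern.finditer(seq.upper())]

def gt_fraction(seq):
    """Fraction of bases in seq that are G or T (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GT" for base in seq) / len(seq)

# Hypothetical GT-rich fragment, NOT the Phyto-GT oligonucleotide.
example = "TTGGGTGGGGTTGGT"
runs = guanine_runs(example)
```

A run finder of this kind only flags candidate regions. Whether a sequence actually folds into a tetraplex has to be shown experimentally, as the authors did with native gels and methylation protection.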
Implications of the dependence of the elastic properties of DNA on nucleotide sequence.

PubMed

Olson, Wilma K; Swigon, David; Coleman, Bernard D

2004-07-15

Recent advances in structural biochemistry have provided evidence that not only the geometric properties but also the elastic moduli of duplex DNA are strongly dependent on nucleotide sequence in a way that is not accounted for by classical rod models of the Kirchhoff type. A theory of sequence-dependent DNA elasticity is employed here to calculate the dependence of the equilibrium configurations of circular DNA on the binding of ligands that can induce changes in intrinsic twist at a single base-pair step. Calculations are presented of the influence on configurations of the assumed values and distribution along the DNA of intrinsic roll and twist and a modulus coupling roll to twist. Among the results obtained are the following. For minicircles formed from intrinsically straight DNA, the distribution of roll-twist coupling strongly affects the dependence of the total elastic energy Psi on the amount alpha of imposed untwisting, and that dependence can be far from quadratic. (In fact, for a periodic distribution of roll-twist coupling with a period equal to the intrinsic helical repeat length, Psi can be essentially independent of alpha for -90 degrees < alpha <90 degrees.) When the minicircle is homogeneous and without roll-twist coupling, but with uniform positive intrinsic roll, the point at which Psi attains its minimum value shifts towards negative values of alpha. It is remarked that there are cases in which one can relate graphs of Psi versus alpha to the 'effective values' of bending and twisting moduli and helical repeat length obtained from measurements of equilibrium distributions of topoisomers and probabilities of ring closure. For a minicircle formed from DNA that has an 'S' shape when stress-free, the graphs of Psi versus alpha have maxima at alpha = 0. As the binding of a twisting agent to such a minicircle results in a net decrease in Psi, the affinity of the twisting agent for binding to the minicircle is greater than its affinity for binding to

Efficient Sleeping Beauty DNA Transposition From DNA Minicircles

PubMed Central

Sharma, Nynne; Cai, Yujia; Bak, Rasmus O; Jakobsen, Martin R; Schrøder, Lisbeth Dahl; Mikkelsen, Jacob Giehm

2013-01-01

DNA transposon-based vectors have emerged as new potential delivery tools in therapeutic gene transfer. Such vectors are now showing promise in hematopoietic stem cells and primary human T cells, and clinical trials with transposon-engineered cells are under way. However, the use of plasmid DNA as a carrier of the vector raises safety concerns due to the undesirable administration of bacterial sequences. To optimize vectors based on the Sleeping Beauty (SB) DNA transposon for clinical use, we examine here SB transposition from DNA minicircles (MCs) devoid of the bacterial plasmid backbone. Potent DNA transposition, directed by the hyperactive SB100X transposase, is demonstrated from MC donors, and the stable transfection rate is significantly enhanced by expressing the SB100X transposase from MCs. The stable transfection rate is inversely related to the size of the circular donor, suggesting that an MC-based SB transposition system benefits primarily from an increased cellular uptake and/or enhanced expression which can be observed with DNA MCs. DNA transposon and transposase MCs are easily produced, are favorable in size, do not carry irrelevant DNA, and are robust substrates for DNA transposition. Accordingly, DNA MCs should become a standard source of DNA transposons not only in therapeutic settings but also in the daily use of the SB system. PMID:23443502
Second generation codon optimized minicircle (CoMiC) for nonviral reprogramming of human adult fibroblasts.

PubMed

Diecke, Sebastian; Lisowski, Leszek; Kooreman, Nigel G; Wu, Joseph C

2014-01-01

The ability to induce pluripotency in somatic cells is one of the most important scientific achievements in the fields of stem cell research and regenerative medicine. This technique allows researchers to obtain pluripotent stem cells without the controversial use of embryos, providing a novel and powerful tool for disease modeling and drug screening approaches. However, using viruses for the delivery of reprogramming genes and transcription factors may result in integration into the host genome and cause random mutations within the target cell, thus limiting the use of these cells for downstream applications. To overcome this limitation, various non-integrating techniques, including Sendai virus, mRNA, minicircle, and plasmid-based methods, have recently been developed. Utilizing a newly developed codon optimized 4-in-1 minicircle (CoMiC), we were able to reprogram human adult fibroblasts using chemically defined media and without the need for feeder cells.
DNA minicircles clarify the specific role of DNA structure on retroviral integration

PubMed Central

Pasi, Marco; Mornico, Damien; Volant, Stevenn; Juchet, Anna; Batisse, Julien; Bouchier, Christiane; Parissi, Vincent; Ruff, Marc; Lavery, Richard; Lavigne, Marc

2016-01-01

Chromatin regulates the selectivity of retroviral integration into the genome of infected cells. At the nucleosome level, both histones and DNA structure are involved in this regulation. We propose a strategy that makes it possible to study a single factor specifically: the DNA distortion induced by the nucleosome. This strategy relies on mimicking this distortion using DNA minicircles (MCs) having a fixed rotational orientation of DNA curvature, coupled with atomic-resolution modeling. Contrasting MCs with linear DNA fragments having identical sequences enabled us to analyze the impact of DNA distortion on the efficiency and selectivity of integration. We observed a global enhancement of HIV-1 integration in MCs and an enrichment of integration sites in the outward-facing DNA major grooves. Both of these changes are favored by LEDGF/p75, revealing a new, histone-independent role of this integration cofactor. PFV integration is also enhanced in MCs, but is not associated with a periodic redistribution of integration sites, thus highlighting its distinct catalytic properties. MCs help to separate the roles of target DNA structure, histone modifications and integrase (IN) cofactors during retroviral integration and to reveal IN-specific regulation mechanisms. PMID:27439712
Comparison of four PCR methods for efficient detection of Trypanosoma cruzi in routine diagnostics.

PubMed

Seiringer, Peter; Pritsch, Michael; Flores-Chavez, María; Marchisio, Edoardo; Helfrich, Kerstin; Mengele, Carolin; Hohnerlein, Stefan; Bretzel, Gisela; Löscher, Thomas; Hoelscher, Michael; Berens-Riha, Nicole

2017-07-01

Due to increased migration, Chagas disease has become an international health problem. Reliable diagnosis of chronically infected people is crucial for prevention of non-vectorial transmission as well as treatment. This study compared four distinct PCR methods for detection of Trypanosoma cruzi DNA for use in well-equipped routine diagnostic laboratories. DNA was extracted from blood samples of T. cruzi-positive and negative patients and from cultured T. cruzi, T. rangeli and Leishmania spp. One conventional and two real-time PCR methods targeting a repetitive Sat-DNA sequence as well as one conventional PCR method targeting the variable region of the kDNA minicircle were compared for sensitivity, intra- and interassay precision, limit of detection, specificity and cross-reactivity. Considering the performance, costs and ease of use, an algorithm for PCR-diagnosis of patients with a positive serology for T. cruzi antibodies was developed. Copyright © 2017 The Author(s). Published by Elsevier Inc. All rights reserved.
Comparative analysis by polymerase chain reaction amplified minicircles of kinetoplast DNA of a stable strain of Trypanosoma cruzi from São Felipe, Bahia, its clones and subclones: possibility of predominance of a principal clone in this area.

PubMed

Campos, R F; Gonçalves, M S; dos Reis, E A; dos Reis, M G; Andrade, S G

1999-01-01

Molecular characterization of one stable strain of Trypanosoma cruzi, the 21 SF, representative of the pattern of strains isolated from the endemic area of São Felipe, State of Bahia, Brazil, maintained for 15 years in the laboratory by serial passages in mice and classified as biodeme Type II and zymodeme 2, has been investigated. The kinetoplast DNA (kDNA) of the parental strain, 5 clones and 14 subclones was analyzed. The schizodeme was established by comparing the fragments obtained when the 330-bp products amplified by polymerase chain reaction (PCR) from the variable regions of the minicircles were digested with the restriction endonucleases Rsa I and Hinf I. Our results show a high percentage of similarity between the restriction fragment length polymorphism (RFLP) patterns of the parental strain and its clones, and among these individual clones and their subclones, at a level of 80 to 100%. This homology indicates a predominance of the same "principal clone" in the 21SF strain and confirms the homogeneity previously observed in biological and isozymic analyses. These results suggest the possibility that the T. cruzi strains with similar biological and isoenzymic patterns, circulating in this endemic area, are representative of one dominant clone. The presence of "principal clones" could be responsible for a predominant tropism of the parasites for specific organs and tissues and this could contribute to the pattern of clinico-pathological manifestations of Chagas's disease in one geographical area.
Characterization of kinetoplast DNA from Phytomonas serpens.

PubMed

Sá-Carvalho, D; Perez-Morga, D; Traub-Cseko, Y M

1993-01-01

The restriction enzyme digestion of kinetoplast DNA from four Phytomonas serpens isolates shows an overall similar band pattern. One minicircle from isolate 30T was cloned and sequenced, showing low levels of homology but the same general features and organization as described for minicircles of other trypanosomatids. Extensive regions of the minicircle are composed of G and T on the H strand. These regions are very repetitive and similar to regions in a minicircle of Crithidia oncopelti and to telomeric sequences of Saccharomyces cerevisiae. Conserved Sequence Block 3, present in all trypanosomatids, is one nucleotide different from the consensus in P. serpens and provides a basis to differentiate P. serpens from other trypanosomatids. Electron microscopy of kinetoplast DNA revealed a network with organization similar to other trypanosomatids and the measurement of minicircles confirmed the size of about 1.45 kb of the sequenced minicircle.
Lineage Analysis of Circulating Trypanosoma cruzi Parasites and Their Association with Clinical Forms of Chagas Disease in Bolivia

PubMed Central

del Puerto, Ramona; Nishizawa, Juan Eiki; Kikuchi, Mihoko; Iihoshi, Naomi; Roca, Yelin; Avilas, Cinthia; Gianella, Alberto; Lora, Javier; Gutierrez Velarde, Freddy Udalrico; Renjel, Luis Alberto; Miura, Sachio; Higo, Hiroo; Komiya, Norihiro; Maemura, Koji; Hirayama, Kenji

2010-01-01

Background The causative agent of Chagas disease, Trypanosoma cruzi, is divided into 6 Discrete Typing Units (DTU): Tc I, IIa, IIb, IIc, IId and IIe. In order to assess the relative pathogenicities of different DTUs, blood samples from three different clinical groups of chronic Chagas disease patients (indeterminate, cardiac, megacolon) from Bolivia were analyzed for their circulating parasite lineages using minicircle kinetoplast DNA polymorphism. Methods and Findings Between 2000 and 2007, patients sent to the Centro Nacional de Enfermedades Tropicales for diagnosis of Chagas from clinics and hospitals in Santa Cruz, Bolivia, were assessed by serology, cardiology and gastro-intestinal examinations. Additionally, patients who underwent colonectomies due to Chagasic megacolon at the Hospital Universitario Japonés were also included. A total of 306 chronic Chagas patients were defined by their clinical types (81 with cardiopathy, 150 without cardiopathy, 100 with megacolon, 144 without megacolon, 164 with cardiopathy or megacolon, 73 indeterminate and 17 cases with both cardiopathy and megacolon). DNA was extracted from 10 ml of peripheral venous blood for PCR analysis. The kinetoplast minicircle DNA (kDNA) was amplified from 196 out of 306 samples (64.1%), of which 104 (53.3%) were Tc IId, 4 (2.0%) Tc I, 7 (3.6%) Tc IIb, 1 (0.5%) Tc IIe, 26 (13.3%) Tc I/IId, 1 (0.5%) Tc I/IIb/IId, 2 (1.0%) Tc IIb/d and 51 (25.9%) were unidentified. Of the 133 Tc IId samples, three different kDNA hypervariable region patterns were detected; Mn (49.6%), TPK like (48.9%) and Bug-like (1.5%). There was no significant association between Tc types and clinical manifestations of disease. Conclusions None of the identified lineages or sublineages was significantly associated with any particular clinical manifestations in the chronic Chagas patients in Bolivia. PMID:20502516
U-insertion/deletion RNA editing multiprotein complexes and mitochondrial ribosomes in Leishmania tarentolae are located in antipodal nodes adjacent to the kinetoplast DNA.

PubMed

Wong, Richard G; Kazane, Katelynn; Maslov, Dmitri A; Rogers, Kestrel; Aphasizhev, Ruslan; Simpson, Larry

2015-11-01

We studied the intramitochondrial localization of several multiprotein complexes involved in U-insertion/deletion RNA editing in trypanosome mitochondria. The editing complexes are located in one or two antipodal nodes adjacent to the kinetoplast DNA (kDNA) disk, which are distinct from but associated with the minicircle catenation nodes. In some cases the proteins are in a bilateral sheet configuration. We also found that mitoribosomes have a nodal configuration. This type of organization is consistent with evidence for protein and RNA interactions of multiple editing complexes to form an ~40S editosome and also an interaction of editosomes with mitochondrial ribosomes. Copyright © 2015 Elsevier B.V. and Mitochondria Research Society. All rights reserved.
Part I: Minicircle vector technology limits DNA size restrictions on ex vivo gene delivery using nanoparticle vectors: Overcoming a translational barrier in neural stem cell therapy.

PubMed

Fernandes, Alinda R; Chari, Divya M

2016-09-28

Genetically engineered neural stem cell (NSC) transplant populations offer key benefits in regenerative neurology, for release of therapeutic biomolecules in ex vivo gene therapy. NSCs are 'hard-to-transfect' but amenable to 'magnetofection'. Despite the high clinical potential of this approach, the low and transient transfection associated with the large size of therapeutic DNA constructs is a critical barrier to translation. We demonstrate for the first time that DNA minicircles (small DNA vectors encoding essential gene expression components but devoid of a bacterial backbone, thereby reducing construct size versus conventional plasmids) deployed with magnetofection achieve the highest, safe non-viral DNA transfection levels (up to 54%) reported so far for primary NSCs. Minicircle-functionalized magnetic nanoparticle (MNP)-mediated gene delivery also resulted in sustained gene expression for up to four weeks. All daughter cell types of engineered NSCs (neurons, astrocytes and oligodendrocytes) were transfected (in contrast to conventional plasmids which usually yield transfected astrocytes only), offering advantages for targeted cell engineering. In addition to enhancing MNP functionality as gene delivery vectors, minicircle technology provides key benefits from safety/scale up perspectives. Therefore, we consider the proof-of-concept of fusion of technologies used here offers high potential as a clinically translatable genetic modification strategy for cell therapy. Copyright © 2016 Elsevier B.V. All rights reserved.
Double Knockdown of Prolyl Hydroxylase and Factor Inhibiting HIF with Non-Viral Minicircle Gene Therapy Enhances Stem Cell Mobilization and Angiogenesis After Myocardial Infarction

PubMed Central

Huang, Mei; Nguyen, Patricia; Jia, Fangjun; Hu, Shijun; Gong, Yongquan; de Almeida, Patricia E.; Wang, Li; Nag, Divya; Kay, Mark A.; Giaccia, Amato J; Robbins, Robert C.; Wu, Joseph C.

2011-01-01

Background Under normoxic conditions, hypoxia inducible factor-1 alpha (HIF-1α) is rapidly degraded by two hydroxylases, prolyl hydroxylase (PHD) and factor inhibiting HIF-1 (FIH). Because HIF-1α mediates the cardioprotective response to ischemic injury, its up-regulation may be an effective therapeutic option for ischemic heart failure. Methods and Results PHD and FIH were cloned from mouse embryonic stem cells. The best candidate short hairpin sequences for inhibiting PHD isoenzyme 2 (shPHD2) and FIH (shFIH) were inserted into novel non-viral minicircle vectors. In vitro studies after cell transfection of mouse C2C12 myoblasts, HL-1 atrial myocytes, and c-kit+ cardiac progenitor cells (CPCs) demonstrated higher expression of angiogenesis factors in the double knockdown group compared to the single knockdown and shScramble control groups. To confirm in vitro data, shRNA minicircle vectors were injected intramyocardially following LAD ligation in adult FVB mice (n=60). Functional studies using magnetic resonance imaging (MRI), echocardiography, and pressure-volume (PV) loops showed greater improvement in cardiac function in the double knockdown group. To assess mechanism(s) of this functional recovery, we performed a cell trafficking experiment, which demonstrated significantly greater recruitment of bone marrow cells to the ischemic myocardium in the double knockdown group. Fluorescence activated cell sorting (FACS) showed significantly higher activation of endogenous c-kit+ cardiac progenitor cells. Immunostaining showed increased neovascularization and decreased apoptosis in areas of injured myocardium. Finally, western blots and laser capture microdissection (LCM) analysis confirmed up-regulation of HIF-1α protein and angiogenesis genes, respectively. Conclusions We demonstrated that HIF-1α up-regulation by double knockdown of PHD and FIH synergistically increases stem cell mobilization and myocardial angiogenesis, leading to improved cardiac function. PMID
Modulation of cyclobutane thymine photodimer formation in T11-tracts in rotationally phased nucleosome core particles and DNA minicircles.

PubMed

Wang, Kesai; Taylor, John-Stephen A

2017-07-07

Cyclobutane pyrimidine dimers (CPDs) are DNA photoproducts linked to skin cancer, whose mutagenicity depends in part on their frequency of formation and deamination. Nucleosomes modulate CPD formation, favoring outside facing sites and disfavoring inward facing sites. A similar pattern of CPD formation in protein-free DNA loops suggests that DNA bending causes the modulation in nucleosomes. To systematically study the cause and effect of nucleosome structure on CPD formation and deamination, we have developed a circular permutation synthesis strategy for positioning a target sequence at different superhelix locations (SHLs) across a nucleosome in which the DNA has been rotationally phased with respect to the histone octamer by TG motifs. We have used this system to show that the nucleosome dramatically modulates CPD formation in a T11-tract that covers one full turn of the nucleosome helix at seven different SHLs, and that the position of maximum CPD formation at all locations is shifted to the 5′-side of that found in mixed-sequence nucleosomes. We also show that an 80-mer minicircle DNA using the same TG-motifs faithfully reproduces the CPD pattern in the nucleosome, indicating that it is a good model for protein-free rotationally phased bent DNA of the same curvature as in a nucleosome, and that bending is modulating CPD formation. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Self-assembled magnetic theranostic nanoparticles for highly sensitive MRI of minicircle DNA delivery.

PubMed

Wan, Qian; Xie, Lisi; Gao, Lin; Wang, Zhiyong; Nan, Xiang; Lei, Hulong; Long, Xiaojing; Chen, Zhi-Ying; He, Cheng-Yi; Liu, Gang; Liu, Xin; Qiu, Bensheng

2013-01-21

As a versatile gene vector, minicircle DNA (mcDNA) has a great potential for gene therapy. However, some serious challenges remain, such as to effectively deliver mcDNA into targeted cells/tissues and to non-invasively monitor the delivery of the mcDNA. Superparamagnetic iron oxide (SPIO) nanoparticles have been extensively used for both drug/gene delivery and diagnosis. In this study, an MRI visible gene delivery system was developed with a core of SPIO nanocrystals and a shell of biodegradable stearic acid-modified low molecular weight polyethyleneimine (Stearic-LWPEI) via self-assembly. The Stearic-LWPEI-SPIO nanoparticles possess a controlled clustering structure, narrow size distribution and ultrasensitive imaging capacity. Furthermore, the nanoparticle can effectively bind with mcDNA and protect it from enzymatic degradation. In conclusion, the nanoparticle shows synergistic advantages in the effective transfection of mcDNA and non-invasive MRI of gene delivery.
Part II: Functional delivery of a neurotherapeutic gene to neural stem cells using minicircle DNA and nanoparticles: Translational advantages for regenerative neurology.

PubMed

Fernandes, Alinda R; Chari, Divya M

2016-09-28

Both neurotrophin-based therapy and neural stem cell (NSC)-based strategies have progressed to clinical trials for treatment of neurological diseases and injuries. Brain-derived neurotrophic factor (BDNF) in particular can confer neuroprotective and neuro-regenerative effects in preclinical studies, complementing the cell replacement benefits of NSCs. Therefore, combining both approaches by genetically-engineering NSCs to express BDNF is an attractive approach to achieve combinatorial therapy for complex neural injuries. Current genetic engineering approaches almost exclusively employ viral vectors for gene delivery to NSCs though safety and scalability pose major concerns for clinical translation and applicability. Magnetofection, a non-viral gene transfer approach deploying magnetic nanoparticles and DNA with magnetic fields offers a safe alternative but significant improvements are required to enhance its clinical application for delivery of large sized therapeutic plasmids. Here, we demonstrate for the first time the feasibility of using minicircles with magnetofection technology to safely engineer NSCs to overexpress BDNF. Primary mouse NSCs overexpressing BDNF generated increased daughter neuronal cell numbers post-differentiation, with accelerated maturation over a four-week period. Based on our findings we highlight the clinical potential of minicircle/magnetofection technology for therapeutic delivery of key neurotrophic agents. Copyright © 2016 Elsevier B.V. All rights reserved.
Presence of Trypanosoma cruzi in pregnant women and typing of lineages in congenital cases.

PubMed

Ortiz, Sylvia; Zulantay, Inés; Solari, Aldo; Bisio, Margarita; Schijman, Alejandro; Carlier, Yves; Apt, Werner

2012-12-01

The objective of this study was to determine the presence of Trypanosoma cruzi in blood samples of mothers with chronic Chagas disease and their newborns by conventional PCR targeted to minicircle kinetoplastidic DNA (kDNA), and to determine the lineages in mother/newborn pairs of the congenital cases by hybridization assays with probes belonging to the TcII, TcI and TcV Discrete Typing Units (DTU). In 63 (57.2%) of the mothers the presence of circulating T. cruzi was demonstrated by PCR immediately before delivery, and in three newborns (3%) congenital transmission was confirmed by serial PCR and conventional serology between 1 and 16 months of life, at which point treatment was started. The hybridization signals showed that two of the newborns had the same DTUs as their mothers (TcI, TcII and TcV), whilst in the third congenital case only TcV was detected in the cord blood, suggesting that in this infant TcI and TcII did not cross the placenta or the parasite was not present at a detectable level. Levels of T. cruzi DNA were determined by a TaqMan probe-based real-time PCR assay targeted to nuclear satellite sequences in these three pairs of samples. Copyright © 2012 Elsevier B.V. All rights reserved.
Screening and Characterization of RAPD Markers in Viscerotropic Leishmania Parasites

PubMed Central

Mkada-Driss, Imen; Talbi, Chiraz; Guerbouj, Souheila; Driss, Mehdi; Elamine, Elwaleed M.; Cupolillo, Elisa; Mukhtar, Moawia M.; Guizani, Ikram

2014-01-01

Visceral leishmaniasis (VL) is mainly due to the Leishmania donovani complex. VL is endemic in many countries worldwide including East Africa and the Mediterranean region where the epidemiology is complex. The taxonomy of these pathogens is controversial, but there is a correlation between their genetic diversity and geographical origin. Despite the steady increase in genome knowledge, RAPD is still a useful approach to identify and characterize novel DNA markers. Our aim was to identify and characterize polymorphic DNA markers in VL Leishmania parasites in diverse geographic regions using RAPD in order to constitute a pool of PCR targets having the potential to differentiate among the VL parasites. 100 different oligonucleotide decamers having arbitrary DNA sequences were screened for reproducible amplification and a selection of 28 was used to amplify DNA from 12 L. donovani, L. archibaldi and L. infantum strains having diverse origins. A total of 155 bands were amplified of which 60.65% appeared polymorphic. 7 out of 28 primers provided monomorphic patterns. Phenetic analysis allowed clustering the parasites according to their geographical origin. Differentially amplified bands were selected; among them, 22 RAPD products were successfully cloned and sequenced. Bioinformatic analysis allowed mapping of the markers and analysis of the sequences and priming sites. This study was complemented with Southern blot to confirm assignment of markers to the kDNA. The bioinformatic analysis identified 16 nuclear and 3 minicircle markers. Analysis of these markers highlighted polymorphisms at RAPD priming sites with mainly 5′ end transversions, and presence of inter- and intra-taxonomic complex sequence and microsatellite variations; a bias in transitions over transversions and indels between the different sequences compared is observed, which is however less marked between L. infantum and L. donovani. The study delivers a pool of well-documented polymorphic DNA markers, to develop
New primers for the detection of Leishmania species by multiplex polymerase chain reaction.

PubMed

Conter, Carolina Cella; Lonardoni, Maria Valdrinez Campana; Aristides, Sandra Mara Alessi; Cardoso, Rosilene Fressatti; Silveira, Thaís Gomes Verzignassi

2018-02-01

Leishmaniasis is caused by protozoa of the Leishmania genus, which is divided into the subgenera Viannia and Leishmania. In humans, the course of infection largely depends on the host-parasite relationship and primarily on the infective species. The objective of the present study was to design specific primers for the identification of Leishmania species using multiplex PCR. Four primers were designed, based on the GenBank sequences of the kDNA minicircle, amplifying 127 bp for subgenus Viannia, 100 bp for L. amazonensis, and 60 bp for Leishmania donovani complex and L. major. None of the primers amplified Trypanosoma cruzi or L. mexicana. The limit of detection of multiplex PCR was 2 × 10⁻⁵ parasites for L. braziliensis, 2 × 10⁻³ parasites for L. amazonensis, and 1.4 × 10⁻³ parasites for L. infantum. The high sensitivity of multiplex PCR was confirmed by the detection of parasites in different biological samples, including lesion scrapings, spleen imprinting of a hamster, sandflies, and blood. The multiplex PCR that was developed herein presented good performance with regard to detecting and identifying the parasite in different biological samples and may thus be useful for diagnosis, decision making with regard to the proper therapeutic approach, and determining the geographic distribution of Leishmania species.
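The amplicon sizes reported in this abstract (127, 100 and 60 bp) suggest a simple lookup for reading a gel. The sketch below is illustrative only: the function name and the ±5 bp tolerance are assumptions, and the abstract does not describe how the authors scored bands.

```python
# Expected multiplex PCR product sizes, as reported in the abstract.
EXPECTED_AMPLICONS_BP = {
    127: "subgenus Viannia",
    100: "L. amazonensis",
    60: "L. donovani complex or L. major",
}

def assign_band(size_bp, tolerance_bp=5):
    """Map an estimated band size to a target group.

    ASSUMPTION: tolerance_bp is a hypothetical sizing error, not a value
    taken from the study.
    """
    for expected, label in EXPECTED_AMPLICONS_BP.items():
        if abs(size_bp - expected) <= tolerance_bp:
            return label
    return "unassigned"

# Hypothetical gel reading with three estimated band sizes.
calls = [assign_band(size) for size in (126, 61, 80)]
```

Because the 60-bp product is shared by the L. donovani complex and L. major, a band of that size narrows the identification to a group rather than a single species.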
PERFORMANCE OF CONVENTIONAL PCRs BASED ON PRIMERS DIRECTED TO NUCLEAR AND MITOCHONDRIAL GENES FOR THE DETECTION AND IDENTIFICATION OF Leishmania spp.

PubMed Central

LOPES, Estela Gallucci; GERALDO, Carlos Alberto; MARCILI, Arlei; SILVA, Ricardo Duarte; KEID, Lara Borges; OLIVEIRA, Trícia Maria Ferreira da Silva; SOARES, Rodrigo Martins

2016-01-01

In visceral leishmaniasis, the detection of the agent is of paramount importance to identify reservoirs of infection. Here, we evaluated the diagnostic attributes of PCRs based on primers directed to cytochrome-B (cytB), cytochrome-oxidase-subunit II (coxII), cytochrome-C (cytC), and the minicircle-kDNA. Although PCRs directed to cytB, coxII, cytC were able to detect different species of Leishmania, and the nucleotide sequence of their amplicons allowed the unequivocal differentiation of species, the analytical and diagnostic sensitivity of these PCRs were much lower than the analytical and diagnostic sensitivity of the kDNA-PCR. Among the 73 seropositive animals, the asymptomatic dogs had spleen and bone marrow samples collected and tested; only two animals were positive by PCRs based on cytB, coxII, and cytC, whereas 18 were positive by the kDNA-PCR. Considering the kDNA-PCR results, six dogs had positive spleen and bone marrow samples, eight dogs had positive bone marrow results but negative results in spleen samples and, in four dogs, the reverse situation occurred. We concluded that PCRs based on cytB, coxII, and cytC can be useful tools to identify Leishmania species when used in combination with automated sequencing. The discordance between the results of the kDNA-PCR in bone marrow and spleen samples may indicate that conventional PCR lacks sensitivity for the detection of infected dogs. Thus, primers based on the kDNA should be preferred for the screening of infected dogs. PMID:27253743
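The kDNA-PCR counts in this abstract (6 dogs positive in both tissues, 8 in bone marrow only, 4 in spleen only) can be tallied per tissue. The helper below is a hypothetical sketch of that arithmetic and uses only the counts stated above.

```python
def summarize_kdna_pcr(both, marrow_only, spleen_only):
    """Tally per-tissue positives from paired spleen/bone marrow results."""
    return {
        "dogs_positive": both + marrow_only + spleen_only,
        "marrow_positive": both + marrow_only,
        "spleen_positive": both + spleen_only,
        "discordant": marrow_only + spleen_only,
    }

# Counts reported in the abstract for the kDNA-PCR.
summary = summarize_kdna_pcr(both=6, marrow_only=8, spleen_only=4)
```

The three categories sum to the 18 kDNA-PCR-positive dogs reported, and 12 of those 18 gave discordant results between the two tissues, which is the discordance the authors comment on.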
Variola Type IB DNA Topoisomerase: DNA Binding and Supercoil Unwinding Using Engineered DNA Minicircles

PubMed Central

2015-01-01

Type IB topoisomerases unwind positive and negative DNA supercoils and play a key role in removing supercoils that would otherwise accumulate at replication and transcription forks. An interesting question is whether topoisomerase activity is regulated by the topological state of the DNA, thereby providing a mechanism for targeting the enzyme to highly supercoiled DNA domains in genomes. The type IB enzyme from variola virus (vTopo) has proven to be useful in addressing mechanistic questions about topoisomerase function because it forms a reversible 3′-phosphotyrosyl adduct with the DNA backbone at a specific target sequence (5′-CCCTT-3′) from which DNA unwinding can proceed. We have synthesized supercoiled DNA minicircles (MCs) containing a single vTopo target site that provides highly defined substrates for exploring the effects of supercoil density on DNA binding, strand cleavage and ligation, and unwinding. We observed no topological dependence for binding of vTopo to these supercoiled MC DNAs, indicating that affinity-based targeting to supercoiled DNA regions by vTopo is unlikely. Similarly, the cleavage and religation rates of the MCs were not topologically dependent, but topoisomers with low superhelical densities were found to unwind more slowly than highly supercoiled topoisomers, suggesting that reduced torque at low superhelical densities leads to an increased number of cycles of cleavage and ligation before a successful unwinding event. The K271E charge reversal mutant has an impaired interaction with the rotating DNA segment that leads to an increase in the number of supercoils that were unwound per cleavage event. This result provides evidence that interactions of the enzyme with the rotating DNA segment can restrict the number of supercoils that are unwound. We infer that both superhelical density and transient contacts between vTopo and the rotating DNA determine the efficiency of supercoil unwinding. Such determinants are likely to be
Successful isolation of Leishmania infantum from Rhipicephalus sanguineus sensu lato (Acari: Ixodidae) collected from naturally infected dogs.

PubMed

Medeiros-Silva, Viviane; Gurgel-Gonçalves, Rodrigo; Nitz, Nadjar; Morales, Lucia Emilia D' Anduraim; Cruz, Laurício Monteiro; Sobral, Isabele Gonçalves; Boité, Mariana Côrtes; Ferreira, Gabriel Eduardo Melim; Cupolillo, Elisa; Romero, Gustavo Adolfo Sierra

2015-10-09

The main transmission route of Leishmania infantum is through the bites of sand flies. However, alternative mechanisms are being investigated, such as through the bites of ticks, which could have epidemiological relevance. The objective of this work was to verify the presence of Leishmania spp. in Rhipicephalus sanguineus sensu lato collected from naturally infected dogs in the Federal District of Brazil. Ticks were dissected to remove their intestines and salivary glands for DNA extraction and the subsequent amplification of the conserved region of 120 bp of kDNA and 234 bp of the hsp70 gene of Leishmania spp. The amplified kDNA products were digested with endonucleases HaeIII and BstUI and were submitted to DNA sequencing. Isolated Leishmania parasites from these ticks were analyzed by multilocus enzyme electrophoresis, and the DNA obtained from this culture was subjected to microsatellite analyses. Overall, 130 specimens of R. sanguineus were collected from 27 dogs. Leishmania spp. were successfully isolated in culture from five pools of salivary glands and the intestines of ticks collected from four dogs. The amplified kDNA products from the dog blood samples and from the tick cultures, when digested by HaeIII and BstUI, revealed the presence of L. braziliensis and L. infantum. One strain was cultivated and characterized as L. infantum by enzyme electrophoresis. The amplified kDNA products from the blood of one dog showed a sequence homology with L. braziliensis; however, the amplified kDNA from the ticks collected from this dog showed a sequence homology to L. infantum. The results confirm that the specimens of R. sanguineus that feed on dogs naturally infected by L. infantum contain the parasite DNA in their intestines and salivary glands, and viable L. infantum can be successfully isolated from these ectoparasites.
Massive Gene Transfer and Extensive RNA Editing of a Symbiotic Dinoflagellate Plastid Genome

PubMed Central

Mungpakdee, Sutada; Shinzato, Chuya; Takeuchi, Takeshi; Kawashima, Takeshi; Koyanagi, Ryo; Hisata, Kanako; Tanaka, Makiko; Goto, Hiroki; Fujie, Manabu; Lin, Senjie; Satoh, Nori; Shoguchi, Eiichi

2014-01-01

Genome sequencing of Symbiodinium minutum revealed that 95 of 109 plastid-associated genes have been transferred to the nuclear genome and subsequently expanded by gene duplication. Only 14 genes remain in plastids and occur as DNA minicircles. Each minicircle (1.8–3.3 kb) contains one gene and a conserved noncoding region containing putative promoters and RNA-binding sites. Nine types of RNA editing, including a novel G/U type, were discovered in minicircle transcripts but not in genes transferred to the nucleus. In contrast to DNA editing sites in dinoflagellate mitochondria, which tend to be highly conserved across all taxa, editing sites employed in DNA minicircles are highly variable from species to species. Editing is crucial for core photosystem protein function. It restores evolutionarily conserved amino acids and increases peptidyl hydropathy. It also increases protein plasticity necessary to initiate photosystem complex assembly. PMID:24881086

Massive gene transfer and extensive RNA editing of a symbiotic dinoflagellate plastid genome.

PubMed

Mungpakdee, Sutada; Shinzato, Chuya; Takeuchi, Takeshi; Kawashima, Takeshi; Koyanagi, Ryo; Hisata, Kanako; Tanaka, Makiko; Goto, Hiroki; Fujie, Manabu; Lin, Senjie; Satoh, Nori; Shoguchi, Eiichi

2014-05-31

Genome sequencing of Symbiodinium minutum revealed that 95 of 109 plastid-associated genes have been transferred to the nuclear genome and subsequently expanded by gene duplication. Only 14 genes remain in plastids and occur as DNA minicircles. Each minicircle (1.8-3.3 kb) contains one gene and a conserved noncoding region containing putative promoters and RNA-binding sites. Nine types of RNA editing, including a novel G/U type, were discovered in minicircle transcripts but not in genes transferred to the nucleus. In contrast to DNA editing sites in dinoflagellate mitochondria, which tend to be highly conserved across all taxa, editing sites employed in DNA minicircles are highly variable from species to species. Editing is crucial for core photosystem protein function. It restores evolutionarily conserved amino acids and increases peptidyl hydropathy. It also increases protein plasticity necessary to initiate photosystem complex assembly. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
A fusion of minicircle DNA and nanoparticle delivery technologies facilitates therapeutic genetic engineering of autologous canine olfactory mucosal cells.

PubMed

Delaney, Alexander M; Adams, Christopher F; Fernandes, Alinda R; Al-Shakli, Arwa F; Sen, Jon; Carwardine, Darren R; Granger, Nicolas; Chari, Divya M

2017-06-29

Olfactory ensheathing cells (OECs) promote axonal regeneration and improve locomotor function when transplanted into the injured spinal cord. A recent clinical trial demonstrated improved motor function in domestic dogs with spinal injury following autologous OEC transplantation. Their utility in canines offers promise for human translation, as dogs are comparable to humans in terms of clinical management and genetic/environmental variation. Moreover, the autologous, minimally invasive derivation of OECs makes them viable for human spinal injury investigation. Genetic engineering of transplant populations may augment their therapeutic potential, but relies heavily on viral methods which have several drawbacks for clinical translation. We present here the first proof that magnetic particles deployed with applied magnetic fields and advanced DNA minicircle vectors can safely bioengineer OECs to secrete a key neurotrophic factor, with an efficiency approaching that of viral vectors. We suggest that our alternative approach offers high translational potential for the delivery of augmented clinical cell therapies.
Parasitological Confirmation and Analysis of Leishmania Diversity in Asymptomatic and Subclinical Infection following Resolution of Cutaneous Leishmaniasis.

PubMed

Rosales-Chilama, Mariana; Gongora, Rafael E; Valderrama, Liliana; Jojoa, Jimena; Alexander, Neal; Rubiano, Luisa C; Cossio, Alexandra; Adams, Emily R; Saravia, Nancy G; Gomez, María Adelaida

2015-12-01

The contribution of individuals with subclinical infection to the transmission and endemicity of cutaneous leishmaniasis (CL) is unknown. Immunological evidence of exposure to Leishmania in residents of endemic areas has been the basis for defining the human population with asymptomatic infection. However, parasitological confirmation of subclinical infection is lacking. We investigated the presence and viability of Leishmania in blood and non-invasive mucosal tissue samples from individuals with immunological evidence of subclinical infection in endemic areas for CL caused by Leishmania (Viannia) in Colombia. Detection of Leishmania kDNA was conducted by PCR-Southern Blot, and parasite viability was confirmed by amplification of parasite 7SLRNA gene transcripts. A molecular tool for genetic diversity analysis of parasite populations causing persistent subclinical infection based on PCR amplification and sequence analysis of an 82 bp region between kDNA conserved blocks 1 and 2 was developed. Persistent Leishmania infection was demonstrated in 40% (46 of 114) of leishmanin skin test (LST) positive individuals without active disease; parasite viability was established in 59% of these (27 of 46; 24% of total). Parasite burden quantified from circulating blood monocytes, nasal, conjunctival or tonsil mucosal swab samples was comparable, and ranged from 0.2 to 22 parasites per reaction. kDNA sequences were obtained from samples from 2 individuals with asymptomatic infection and from 26 with history of CL, allowing genetic distance analysis that revealed diversity among sequences and clustering within the L. (Viannia) subgenus. Our results provide parasitological confirmation of persistent infection among residents of endemic areas of L. (Viannia) transmission who have experienced asymptomatic infection or recovered from CL, revealing a reservoir of infection that potentially contributes to the endemicity and transmission of disease. kDNA genotyping establishes proof
Scaffold-mediated BMP-2 minicircle DNA delivery accelerated bone repair in a mouse critical-size calvarial defect model.

PubMed

Keeney, Michael; Chung, Michael T; Zielins, Elizabeth R; Paik, Kevin J; McArdle, Adrian; Morrison, Shane D; Ransom, Ryan C; Barbhaiya, Namrata; Atashroo, David; Jacobson, Gunilla; Zare, Richard N; Longaker, Michael T; Wan, Derrick C; Yang, Fan

2016-08-01

Scaffold-mediated gene delivery holds great promise for tissue regeneration. However, previous attempts to induce bone regeneration using scaffold-mediated non-viral gene delivery rarely resulted in satisfactory healing. We report a novel platform with sustained release of minicircle DNA (MC) from PLGA scaffolds to accelerate bone repair. MC was encapsulated inside PLGA scaffolds using supercritical CO2, and the scaffolds showed prolonged release of MC. Skull-derived osteoblasts transfected with BMP-2 MC in vitro showed higher osteocalcin gene expression and mineralized bone formation. When implanted in a critical-size mouse calvarial defect, scaffolds containing luciferase MC led to robust in situ protein production for at least 60 days. Scaffold-mediated BMP-2 MC delivery led to substantially accelerated bone repair as early as two weeks, which continued to progress over 12 weeks. This platform represents an efficient, long-term nonviral gene delivery system, and may be applicable for enhancing repair of a broad range of tissue types. © 2016 Wiley Periodicals, Inc. J Biomed Mater Res Part A: 104A: 2099-2107, 2016.
Detecting cancers through tumor-activatable minicircles that lead to a detectable blood biomarker.

PubMed

Ronald, John A; Chuang, Hui-Yen; Dragulescu-Andrasi, Anca; Hori, Sharon S; Gambhir, Sanjiv S

2015-03-10

Earlier detection of cancers can dramatically improve the efficacy of available treatment strategies. However, despite decades of effort on blood-based biomarker cancer detection, many promising endogenous biomarkers have failed clinically because of intractable problems such as highly variable background expression from nonmalignant tissues and tumor heterogeneity. In this work we present a tumor-detection strategy based on systemic administration of tumor-activatable minicircles that use the pan-tumor-specific Survivin promoter to drive expression of a secretable reporter that is detectable in the blood nearly exclusively in tumor-bearing subjects. After systemic administration we demonstrate a robust ability to differentiate mice bearing human melanoma metastases from tumor-free subjects for up to 2 wk simply by measuring blood reporter levels. Cumulative change in reporter levels also identified tumor-bearing subjects, and a receiver operator-characteristic curve analysis highlighted this test's performance with an area of 0.918 ± 0.084. Lung tumor burden additionally correlated (r² = 0.714; P < 0.05) with cumulative reporter levels, indicating that determination of disease extent was possible. Continued development of our system could improve tumor detectability dramatically because of the temporally controlled, high reporter expression in tumors and nearly zero background from healthy tissues. Our strategy's highly modular nature also allows it to be iteratively optimized over time to improve the test's sensitivity and specificity. We envision this system could be used first in patients at high risk for tumor recurrence, followed by screening high-risk populations before tumor diagnosis, and, if proven safe and effective, eventually may have potential as a powerful cancer-screening tool for the general population.
A 5′ Noncoding Exon Containing Engineered Intron Enhances Transgene Expression from Recombinant AAV Vectors in vivo

PubMed Central

Lu, Jiamiao; Williams, James A.; Luke, Jeremy; Zhang, Feijie; Chu, Kirk; Kay, Mark A.

2017-01-01

We previously developed a mini-intronic plasmid (MIP) expression system in which the essential bacterial elements for plasmid replication and selection are placed within an engineered intron contained within a universal 5′ UTR noncoding exon. Like minicircle DNA plasmids (devoid of bacterial backbone sequences), MIP plasmids overcome transcriptional silencing of the transgene. In addition, MIP plasmids give transgene expression 2 times and often more than 10 times higher than that of minicircle vectors in vivo and in vitro. Based on these findings, we examined the effects of the MIP intronic sequences in a recombinant adeno-associated virus (AAV) vector system. Recombinant AAV vectors containing an intron with a bacterial replication origin and bacterial selectable marker increased transgene expression by 40 to 100 times in vivo when compared with conventional AAV vectors. Therefore, inclusion of this noncoding exon/intron sequence upstream of the coding region can substantially enhance AAV-mediated gene expression in vivo. PMID:27903072
Coexistence of minicircular and a highly rearranged mtDNA molecule suggests that recombination shapes mitochondrial genome organization.

PubMed

Mao, Meng; Austin, Andrew D; Johnson, Norman F; Dowton, Mark

2014-03-01

Recombination has been proposed as a possible mechanism to explain mitochondrial (mt) gene rearrangements, although the issue of whether mtDNA recombination occurs in animals has been controversial. In this study, we sequenced the entire mt genome of the megaspilid wasp Conostigmus sp., which possessed a highly rearranged mt genome. The sequence of the A+T-rich region contained a number of different types of repeats, similar to those reported previously in the nematode Meloidogyne javanica, in which recombination was discovered. In Conostigmus, we detected the end products of recombination: a range of minicircles. However, using isolated (cloned) fragments of the A+T-rich region, we established that some of these minicircles were polymerase chain reaction (PCR) artifacts. It appears that regions with repeats are prone to PCR template switching or PCR jumping. Nevertheless, there is strong evidence that one minicircle is real, as amplification primers that straddle the putative breakpoint junction produce a single strong amplicon from genomic DNA but not from the cloned A+T-rich region. The results provide support for the direct link between recombination and mt gene rearrangement. Furthermore, we developed a model of recombination which is important for our understanding of mtDNA evolution.
Molecular identification and genotyping of Trypanosoma cruzi DNA in autochthonous Chagas disease patients from Texas, USA.

PubMed

Garcia, Melissa N; Burroughs, Hadley; Gorchakov, Rodion; Gunter, Sarah M; Dumonteil, Eric; Murray, Kristy O; Herrera, Claudia P

2017-04-01

The parasitic protozoan Trypanosoma cruzi, the causative agent of Chagas disease, is widely distributed throughout the Americas, from the southern United States (US) to northern Argentina, and infects at least 6 million people in endemic areas. Much remains unknown about the dynamics of T. cruzi transmission among mammals and triatomine vectors in sylvatic and peridomestic eco-epidemiological cycles, as well as of the risk of transmission to humans in the US. Identification of T. cruzi DTUs among locally-acquired cases is necessary for enhancing our diagnostic and clinical prognostic capacities, as well as to understand parasite transmission cycles. Blood samples from a cohort of 15 confirmed locally-acquired Chagas disease patients from Texas were used for genotyping T. cruzi. Conventional PCR using primers specific for the minicircle variable region of the kinetoplastid DNA (kDNA) and the highly repetitive genomic satellite DNA (satDNA) confirmed the presence of T. cruzi in 12/15 patients. Genotyping was based on the amplification of the intergenic region of the miniexon gene of T. cruzi and sequencing. Sequences were analyzed by BLAST and phylogenetic analysis by Maximum Likelihood method allowed the identification of non-TcI DTUs infection in six patients, which corresponded to DTUs TcII, TcV or TcVI, but not to TcIII or TcIV. Two of these six patients were also infected with a TcI DTU, indicating mixed infections in those individuals. Electrocardiographic abnormalities were seen among patients with single non-TcI and mixed infections of non-TcI and TcI DTUs. Our results indicate a greater diversity of T. cruzi DTUs circulating among autochthonous human Chagas disease cases in the southern US, including for the first time DTUs from the TcII-TcV-TcVI group. Furthermore, the DTUs infecting human patients in the US are capable of causing Chagasic cardiac disease, highlighting the importance of parasite detection in the population. Copyright © 2017 Elsevier B
Trypanosomatid protozoa in plants of southeastern Spain: characterization by analysis of isoenzymes, kinetoplast DNA, and metabolic behavior.

PubMed

Sánchez-Moreno, M; Fernández-Becerra, C; Fernández-Ramos, C; Luque, F; Rodriguez-Cabezas, M N; Dollet, M; Osuna, A

1998-05-01

Three flagellates of the family Trypanosomatidae were isolated from mango fruits (Mangifera indica) and from the stems of clover (Trifolium glomeratum) and Amaranth (Amaranthus retroflexus) in southeastern Spain and were adapted to in vitro culture in monophase media. The parasites showed an ultrastructural pattern similar to that of other species of the genus Phytomonas. Mango and clover isolates differed from amaranth isolates in ultrastructural terms. The isolates were characterized by isoenzymatic analysis and by kDNA analysis using five different restriction endonucleases. With eight of the nine enzymatic systems, mango and clover isolates were distinguished from those of amaranth. Nevertheless, with the enzymes malate dehydrogenase and superoxide dismutase, flagellates isolated from clover were differentiated from those isolated from mango. Electrophoretic and restriction-endonuclease analysis of kDNA minicircles showed similar restriction cleavage patterns for the isolates from mango and clover, whereas the patterns of the amaranth isolates differed. The results of the present study confirm that the strains isolated from mango and clover constitute a phylogenetically closely related group of plant trypanosomatids, which is more distantly related to the strain isolated from amaranth. The similarities in the results obtained for isolates from mango and clover foliage, on the one hand, and those obtained from tomato and cherimoya fruits (studied previously), on the other, as well as the geographic proximity of the different plants support the contention that only one strain is involved, albeit one strain that can parasitize different plants. Furthermore, some of the plants appear to act as reservoirs for the parasites. On the other hand, the metabolism studies using [1H]-nuclear magnetic resonance spectroscopy did not reveal that the catabolism of Phytomonas in general follows a pattern common to all the species or isolates. Phytomonas are incapable of
International Study to Evaluate PCR Methods for Detection of Trypanosoma cruzi DNA in Blood Samples from Chagas Disease Patients

PubMed Central

Schijman, Alejandro G.; Bisio, Margarita; Orellana, Liliana; Sued, Mariela; Duffy, Tomás; Mejia Jaramillo, Ana M.; Cura, Carolina; Auter, Frederic; Veron, Vincent; Qvarnstrom, Yvonne; Deborggraeve, Stijn; Hijar, Gisely; Zulantay, Inés; Lucero, Raúl Horacio; Velazquez, Elsa; Tellez, Tatiana; Sanchez Leon, Zunilda; Galvão, Lucia; Nolder, Debbie; Monje Rumi, María; Levi, José E.; Ramirez, Juan D.; Zorrilla, Pilar; Flores, María; Jercic, Maria I.; Crisante, Gladys; Añez, Néstor; De Castro, Ana M.; Gonzalez, Clara I.; Acosta Viana, Karla; Yachelini, Pedro; Torrico, Faustino; Robello, Carlos; Diosque, Patricio; Triana Chavez, Omar; Aznar, Christine; Russomando, Graciela; Büscher, Philippe; Assal, Azzedine; Guhl, Felipe; Sosa Estani, Sergio; DaSilva, Alexandre; Britto, Constança; Luquetti, Alejandro; Ladzins, Janis

2011-01-01

Background: A century after its discovery, Chagas disease still represents a major neglected tropical threat. Accurate diagnostic tools as well as surrogate markers of parasitological response to treatment are research priorities in the field. The purpose of this study was to evaluate the performance of PCR methods in detection of Trypanosoma cruzi DNA by an external quality evaluation. Methodology/Findings: An international collaborative study was launched by expert PCR laboratories from 16 countries. Currently used strategies were challenged against serial dilutions of purified DNA from stocks representing T. cruzi discrete typing units (DTU) I, IV and VI (set A), human blood spiked with parasite cells (set B) and Guanidine Hydrochloride-EDTA blood samples from 32 seropositive and 10 seronegative patients from Southern Cone countries (set C). Forty-eight PCR tests were reported for set A and 44 for sets B and C; 28 targeted minicircle DNA (kDNA), 13 satellite DNA (Sat-DNA) and the remainder low copy number sequences. In set A, commercial master mixes and Sat-DNA Real Time PCR showed better specificity, but kDNA-PCR was more sensitive in detecting DTU I DNA. In set B, commercial DNA extraction kits presented better specificity than solvent extraction protocols. Sat-DNA PCR tests had higher specificity, with sensitivities of 0.05–0.5 parasites/mL whereas specific kDNA tests detected 5 × 10⁻³ par/mL. Sixteen specific and coherent methods had a Good Performance in both sets A and B (10 fg/µl of DNA from all stocks, 5 par/mL spiked blood). The median values of sensitivities, specificities and accuracies obtained in testing the Set C samples with the 16 tests determined to be good performing by analyzing Sets A and B samples varied considerably. Out of them, four methods depicted the best performing parameters in all three sets of samples, detecting at least 10 fg/µl for each DNA stock, 0.5 par/mL and a sensitivity of 83.3–94.4%, specificity of 85–95
Chronic Chagas disease: PCR-xenodiagnosis without previous microscopic observation is a useful tool to detect viable Trypanosoma cruzi.

PubMed

Saavedra, Miguel; Zulantay, Inés; Apt, Werner; Martínez, Gabriela; Rojas, Antonio; Rodríguez, Jorge

2013-01-01

We evaluate the elimination of the microscopic stage of conventional xenodiagnosis (XD) to optimize the parasitological diagnosis of Trypanosoma cruzi in chronic Chagas disease. For this purpose we applied, under informed consent, two XD cages to 150 Chilean chronic chagasic patients. The fecal samples (FS) of the triatomines at 30, 60 and 90 days post feeding were divided into two parts: in one, a microscopic search for mobile trypomastigote and/or epimastigote forms was performed. In the other part, DNA extraction-purification for PCR directed to the conserved region of kDNA minicircles of trypanosomes (PCR-XD) was done without previous microscopic observation. An XD was considered positive when at least one mobile T. cruzi parasite was observed in any one of the three periods of incubation, whereas PCR-XD was considered positive when the 330 bp band specific for T. cruzi was detected. 25 of 26 cases with positive conventional XD were PCR-XD positive (concordance 96.2%), whereas 85 of 124 cases with negative conventional XD were positive by PCR-XD (68.5%). Human chromosome 12, detected by real-time PCR and used as an exogenous internal control of the PCR-XD reaction, allowed PCR inhibition and false negatives to be ruled out in the 40 cases with negative PCR-XD. PCR-XD performed without previous microscopic observation is a useful tool for detection of viable parasites with higher efficiency than conventional XD.
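The percentages in the abstract above follow directly from the reported counts. As a hedged illustration, this short Python sketch recomputes them and derives the overall positivity of each method; the overall rates and the helper name `percent` are computed or chosen here and are not stated in the abstract.

```python
def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Counts as reported in the abstract.
xd_positive = 26              # conventional XD positive
xd_positive_pcr_positive = 25
xd_negative = 124             # conventional XD negative
xd_negative_pcr_positive = 85

total_patients = xd_positive + xd_negative
pcr_positive = xd_positive_pcr_positive + xd_negative_pcr_positive
pcr_negative = total_patients - pcr_positive

print("patients:", total_patients)
print("PCR-XD positive among XD positive (%):",
      percent(xd_positive_pcr_positive, xd_positive))
print("PCR-XD positive among XD negative (%):",
      percent(xd_negative_pcr_positive, xd_negative))
print("PCR-XD negative cases:", pcr_negative)
# Derived here, not stated in the abstract:
print("overall conventional XD positivity (%):",
      percent(xd_positive, total_patients))
print("overall PCR-XD positivity (%):",
      percent(pcr_positive, total_patients))
```

The counts are internally consistent: the two groups sum to the 150 patients enrolled, and the PCR-XD-negative total matches the 40 cases checked with the internal control.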
Dual Functions of α-Ketoglutarate Dehydrogenase E2 in the Krebs Cycle and Mitochondrial DNA Inheritance in Trypanosoma brucei

PubMed Central

Sykes, Steven E.

2013-01-01

The dihydrolipoyl succinyltransferase (E2) of the multisubunit α-ketoglutarate dehydrogenase complex (α-KD) is an essential Krebs cycle enzyme commonly found in the matrices of mitochondria. African trypanosomes developmentally regulate mitochondrial carbohydrate metabolism and lack a functional Krebs cycle in the bloodstream of mammals. We found that despite the absence of a functional α-KD, bloodstream form (BF) trypanosomes express α-KDE2, which localized to the mitochondrial matrix and inner membrane. Furthermore, α-KDE2 fractionated with the mitochondrial genome, the kinetoplast DNA (kDNA), in a complex with the flagellum. A role for α-KDE2 in kDNA maintenance was revealed in α-KDE2 RNA interference (RNAi) knockdowns. Following RNAi induction, bloodstream trypanosomes showed pronounced growth reduction and often failed to equally distribute kDNA to daughter cells, resulting in accumulation of cells devoid of kDNA (dyskinetoplastic) or containing two kinetoplasts. Dyskinetoplastic trypanosomes lacked mitochondrial membrane potential and contained mitochondria of substantially reduced volume. These results indicate that α-KDE2 is bifunctional, both as a metabolic enzyme and as a mitochondrial inheritance factor necessary for the distribution of kDNA networks to daughter cells at cytokinesis. PMID:23125353
Dual functions of α-ketoglutarate dehydrogenase E2 in the Krebs cycle and mitochondrial DNA inheritance in Trypanosoma brucei.

PubMed

Sykes, Steven E; Hajduk, Stephen L

2013-01-01

The dihydrolipoyl succinyltransferase (E2) of the multisubunit α-ketoglutarate dehydrogenase complex (α-KD) is an essential Krebs cycle enzyme commonly found in the matrices of mitochondria. African trypanosomes developmentally regulate mitochondrial carbohydrate metabolism and lack a functional Krebs cycle in the bloodstream of mammals. We found that despite the absence of a functional α-KD, bloodstream form (BF) trypanosomes express α-KDE2, which localized to the mitochondrial matrix and inner membrane. Furthermore, α-KDE2 fractionated with the mitochondrial genome, the kinetoplast DNA (kDNA), in a complex with the flagellum. A role for α-KDE2 in kDNA maintenance was revealed in α-KDE2 RNA interference (RNAi) knockdowns. Following RNAi induction, bloodstream trypanosomes showed pronounced growth reduction and often failed to equally distribute kDNA to daughter cells, resulting in accumulation of cells devoid of kDNA (dyskinetoplastic) or containing two kinetoplasts. Dyskinetoplastic trypanosomes lacked mitochondrial membrane potential and contained mitochondria of substantially reduced volume. These results indicate that α-KDE2 is bifunctional, both as a metabolic enzyme and as a mitochondrial inheritance factor necessary for the distribution of kDNA networks to daughter cells at cytokinesis.
Molecular diagnosis of canine visceral leishmaniasis: a comparative study of three methods using skin and spleen from dogs with natural Leishmania infantum infection.

PubMed

Reis, Levi Eduardo Soares; Coura-Vital, Wendel; Roatt, Bruno Mendes; Bouillet, Leoneide Érica Maduro; Ker, Henrique Gama; Fortes de Brito, Rory Cristiane; Resende, Daniela de Melo; Carneiro, Mariângela; Giunchetti, Rodolfo Cordeiro; Marques, Marcos José; Carneiro, Cláudia Martins; Reis, Alexandre Barbosa

2013-11-08

Polymerase chain reaction (PCR) and its variations represent highly sensitive and specific methods for Leishmania DNA detection and subsequent canine visceral leishmaniasis (CVL) diagnosis. The aim of this work was to compare three different molecular diagnosis techniques (conventional PCR [cPCR], seminested PCR [snPCR], and quantitative PCR [qPCR]) in samples of skin and spleen from 60 seropositive dogs by immunofluorescence antibody test and enzyme-linked immunosorbent assay. Parasitological analysis was conducted by culture of bone marrow aspirate and optical microscopic assessment of ear skin and spleen samples stained with Giemsa, the standard tests for CVL diagnosis. The primers L150/L152 and LINR4/LIN17/LIN19 were used to amplify the conserved region of the Leishmania kDNA minicircle in the cPCR, and snPCR and qPCR were performed using the DNA polymerase gene (DNA pol α) primers from Leishmania infantum. The parasitological analysis revealed parasites in 61.7% of the samples. Sensitivities were 89.2%, 86.5%, and 97.3% in the skin and 81.1%, 94.6%, and 100.0% in spleen samples used for cPCR, snPCR, and qPCR, respectively. We demonstrated that the qPCR method was the best technique to detect L. infantum in both skin and spleen samples. However, we recommend the use of skin due to the high sensitivity and sampling being less invasive. Copyright © 2013 Elsevier B.V. All rights reserved.
Genome Fragmentation Is Not Confined to the Peridinin Plastid in Dinoflagellates

PubMed Central

Espelund, Mari; Minge, Marianne A.; Gabrielsen, Tove M.; Nederbragt, Alexander J.; Shalchian-Tabrizi, Kamran; Otis, Christian; Turmel, Monique; Lemieux, Claude; Jakobsen, Kjetill S.

2012-01-01

When plastids are transferred between eukaryote lineages through a series of endosymbioses, their environment changes dramatically. Comparison of dinoflagellate plastids that originated from different algal groups has revealed convergent evolution, suggesting that the host environment mainly influences the evolution of the newly acquired organelle. Recently the genome from the anomalously pigmented dinoflagellate Karlodinium veneficum plastid was uncovered as a conventional chromosome. To determine if this haptophyte-derived plastid contains additional chromosomal fragments that resemble the mini-circles of the peridinin-containing plastids, we have investigated its genome by in-depth sequencing using 454 pyrosequencing technology, PCR and clone library analysis. Sequence analyses show several genes with significantly higher copy numbers than present in the chromosome. These genes are most likely extrachromosomal fragments, and the ones with highest copy numbers include genes encoding the chaperone DnaK (Hsp70), the rubisco large subunit (rbcL), and two tRNAs (trnE and trnM). In addition, some photosystem genes such as psaB, psaA, psbB and psbD are overrepresented. Most of the dnaK and rbcL sequences are found as shortened or fragmented gene sequences, typically missing the 3′-terminal portion. Both dnaK and rbcL are associated with a common sequence element consisting of about 120 bp of highly conserved AT-rich sequence followed by a trnE gene, possibly serving as a control region. Decatenation assays and Southern blot analysis indicate that the extrachromosomal plastid sequences do not have the same organization or lengths as the minicircles of the peridinin dinoflagellates. The fragmentation of the haptophyte-derived plastid genome of K. veneficum is likely a sign of a host-driven process shaping the plastid genomes of dinoflagellates. PMID:22719952
Short Hairpin RNA Gene Silencing of Prolyl Hydroxylase-2 with a Minicircle Vector Improves Neovascularization of Hindlimb Ischemia

PubMed Central

Lijkwan, Maarten A.; Hellingman, Alwine A.; Bos, Ernst J.; van der Bogt, Koen E.A.; Huang, Mei; Kooreman, Nigel G.; de Vries, Margreet R.; Peters, Hendrika A.B.; Robbins, Robert C.; Quax, Paul H.A.

2014-01-01

Abstract In this study, we target the hypoxia inducible factor-1 alpha (HIF-1-alpha) pathway by short hairpin RNA interference therapy targeting prolyl hydroxylase-2 (shPHD2). We use the minicircle (MC) vector technology as an alternative for conventional nonviral plasmid (PL) vectors in order to improve neovascularization after unilateral hindlimb ischemia in a murine model. Gene expression and transfection efficiency of MC and PL, both in vitro and in vivo, were assessed using bioluminescence imaging (BLI) and firefly luciferase (Luc) reporter gene. C57Bl6 mice underwent unilateral electrocoagulation of the femoral artery and gastrocnemic muscle injection with MC-shPHD2, PL-shPHD2, or phosphate-buffered saline (PBS) as control. Blood flow recovery was monitored using laser Doppler perfusion imaging, and collaterals were visualized by immunohistochemistry and angiography. MC-Luc showed a 4.6-fold higher in vitro BLI signal compared with PL-Luc. BLI signals in vivo were 4.3×10^5 ± 3.3×10^5 (MC-Luc) versus 0.4×10^5 ± 0.3×10^5 (PL-Luc) at day 28 (p=0.016). Compared with PL-shPHD2 or PBS, MC-shPHD2 significantly improved blood flow recovery, up to 50% from day 3 until day 14 after ischemia induction. MC-shPHD2 significantly increased collateral density and capillary density, as monitored by alpha-smooth muscle actin expression and CD31+ expression, respectively. Angiography data confirmed the histological findings. Significant downregulation of PHD2 mRNA levels by MC-shPHD2 was confirmed by quantitative polymerase chain reaction. Finally, Western blot analysis confirmed significantly higher levels of HIF-1-alpha protein by MC-shPHD2, compared with PL-shPHD2 and PBS. This study provides initial evidence of a new potential therapeutic approach for peripheral artery disease. The combination of HIF-1-alpha pathway targeting by shPHD2 with the robust nonviral MC plasmid improved postischemic neovascularization, making this approach a promising potential treatment option for
Development and evaluation of a novel LAMP assay for the diagnosis of Cutaneous and Visceral Leishmaniasis.

PubMed

Adams, Emily Rebecca; Schoone, Gerard; Versteeg, Inge; Gomez, Maria Adelaida; Diro, Ermias; Mori, Yasuyoshi; Perlee, Desiree; Downing, Tim; Saravia, Nancy; Assaye, Ashenafi; Hailu, Asrat; Albertini, Audrey; Ndung'u, Joseph Mathu; Schallig, Henk

2018-04-25

A novel Pan-Leishmania LAMP assay was developed for diagnosis of Cutaneous and Visceral Leishmaniasis (CL & VL) which can be used in near-patient settings. Primers were designed on the 18S rDNA and the conserved region of minicircle kDNA selected on the basis of high copy number. LAMP assays were evaluated for CL in a prospective cohort trial of 105 patients in South-West Colombia. Lesion swab samples from CL suspects were collected and tested using LAMP and compared to a composite reference of microscopy and/or culture to calculate diagnostic accuracy. LAMP assays were tested on 50 VL suspected patients from Ethiopia, including whole blood, peripheral blood mononuclear cells, and buffy coat. Diagnostic accuracy was calculated against a reference standard of microscopy of splenic or bone marrow aspirates. To calculate analytical specificity, 100 clinical samples and isolates with fever-causing pathogens including malaria, arboviruses and bacterial infections were tested. The LAMP assay had a sensitivity of 95% (95% CI: 87.2%-98.5%) and a specificity of 86% (95% CI: 67.3%-95.9%) for the diagnosis of CL. On VL suspects the sensitivity was 92% (95% CI: 74.9%-99.1%) and the specificity was 100% (95% CI: 85.8%-100%) in whole blood. For CL, LAMP is a sensitive tool for diagnosis and requires less equipment, time and expertise than alternative CL diagnostics. For VL, LAMP is sensitive using a minimally invasive sample as compared to the gold standard. The analytical specificity was 100%. Copyright © 2018 Adams et al.
Species delimitation of common reef corals in the genus Pocillopora using nucleotide sequence phylogenies, population genetics and symbiosis ecology.

PubMed

Pinzón, Jorge H; LaJeunesse, Todd C

2011-01-01

Stony corals in the genus Pocillopora are among the most common and widely distributed of Indo-Pacific corals and, as such, are often the subject of physiological and ecological research. In the far Tropical Eastern Pacific (TEP), they are major constituents of shallow coral communities, exhibiting considerable variability in colony shape and branch morphology and marked differences in response to thermal stress. Numerous intermediates occur between morphospecies that may relate to extensive hybridization. The diversity of the Pocillopora genus in the TEP was analysed genetically using nuclear ribosomal (ITS2) and mitochondrial (ORF) sequences, and population genetic markers (seven microsatellite loci). The resident dinoflagellate endosymbiont (Symbiodinium sp.) in each sample was also characterized using sequences of the internal transcribed spacer 2 (ITS2) rDNA and the noncoding region of the chloroplast psbA minicircle. From these analyses, three symbiotically distinct, reproductively isolated, nonhybridizing, evolutionarily divergent animal lineages were identified. Designated types 1, 2 and 3, these groupings were incongruent with traditional morphospecies classification. Type 1 was abundant and widespread throughout the TEP; type 2 was restricted to the Clipperton Atoll; and type 3 was found only in Panama and the Galapagos Islands. Each type harboured a different Symbiodinium 'species lineage' in Clade C, and only type 1 associated with the 'stress-tolerant' Symbiodinium glynni (D1). The accurate delineation of species and implementation of a proper taxonomy may profoundly improve our assessment of Pocillopora's reproductive biology, biogeographic distributions, and resilience to climate warming, information that must be considered when planning for the conservation of reef corals. © 2010 Blackwell Publishing Ltd.
The Trypanosoma cruzi Satellite DNA OligoC-TesT and Trypanosoma cruzi Kinetoplast DNA OligoC-TesT for Diagnosis of Chagas Disease: A Multi-cohort Comparative Evaluation Study

PubMed Central

De Winne, Koen; Büscher, Philippe; Luquetti, Alejandro O.; Tavares, Suelene B. N.; Oliveira, Rodrigo A.; Solari, Aldo; Zulantay, Ines; Apt, Werner; Diosque, Patricio; Monje Rumi, Mercedes; Gironès, Nuria; Fresno, Manuel; Lopez-Velez, Rogelio; Perez-Molina, José A.; Monge-Maillo, Begoña; Garcia, Lineth; Deborggraeve, Stijn

2014-01-01

Background The Trypanosoma cruzi satellite DNA (satDNA) OligoC-TesT is a standardised PCR format for diagnosis of Chagas disease. The sensitivity of the test is lower for discrete typing unit (DTU) TcI than for TcII-VI and the test has not been evaluated in chronic Chagas disease patients. Methodology/Principal Findings We developed a new prototype of the OligoC-TesT based on kinetoplast DNA (kDNA) detection. We evaluated the satDNA and kDNA OligoC-TesTs in a multi-cohort study with 187 chronic Chagas patients and 88 healthy endemic controls recruited in Argentina, Chile and Spain and 26 diseased non-endemic controls from D.R. Congo and Sudan. All specimens were tested in duplicate. The overall specificity in the controls was 99.1% (95% CI 95.2%–99.8%) for the satDNA OligoC-TesT and 97.4% (95% CI 92.6%–99.1%) for the kDNA OligoC-TesT. The overall sensitivity in the patients was 67.9% (95% CI 60.9%–74.2%) for the satDNA OligoC-TesT and 79.1% (95% CI 72.8%–84.4%) for the kDNA OligoC-TesT. Conclusions/Significance Specificities of the two T. cruzi OligoC-TesT prototypes are high on non-endemic and endemic controls. Sensitivities are moderate but significantly (p = 0.0004) higher for the kDNA OligoC-TesT compared to the satDNA OligoC-TesT. PMID:24392177
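The sensitivity and specificity figures in this record can be checked with a short calculation. The following is a minimal Python sketch, assuming a Wilson score interval and counts back-calculated from the reported percentages and cohort sizes (127 and 148 positives among 187 patients; 113 and 111 negatives among 88 + 26 = 114 controls). Neither the interval method nor the raw counts are stated in the abstract; they are assumptions that appear to agree with the published intervals to within about 0.1 percentage point.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Counts below are inferred from the abstract's percentages (an assumption).
cases = {
    "satDNA sensitivity": (127, 187),
    "kDNA sensitivity": (148, 187),
    "satDNA specificity": (113, 114),
    "kDNA specificity": (111, 114),
}

for label, (k, n) in cases.items():
    low, high = wilson_interval(k, n)
    print(f"{label}: {k / n:.1%} (95% CI {low:.1%} to {high:.1%})")
```

The Wilson interval is used here rather than the simpler normal approximation because it stays inside the 0 to 1 range for proportions close to 100%, as with the specificities above.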
Genotype diversity of Trypanosoma cruzi in small rodents and Triatoma sanguisuga from a rural area in New Orleans, Louisiana.

PubMed

Herrera, Claudia P; Licon, Meredith H; Nation, Catherine S; Jameson, Samuel B; Wesson, Dawn M

2015-02-24

Chagas disease is an anthropozoonosis caused by the protozoan parasite Trypanosoma cruzi that represents a major public health problem in Latin America. Although the United States is defined as non-endemic for Chagas disease due to the rarity of human cases, the presence of T. cruzi has now been amply demonstrated as enzootic in different regions of the south of the country from Georgia to California. In southeastern Louisiana, a high T. cruzi infection rate has been demonstrated in Triatoma sanguisuga, the local vector in this area. However, little is known about the role of small mammals in the wild and peridomestic transmission cycles. This study focused on the molecular identification and genotyping of T. cruzi in both small rodents and T. sanguisuga from a rural area of New Orleans, Louisiana. DNA extractions were prepared from rodent heart, liver, spleen and skeletal muscle tissues and from cultures established from vector feces. T. cruzi infection was determined by standard PCR using primers specific for the minicircle variable region of the kinetoplastid DNA (kDNA) and the highly repetitive genomic satellite DNA (satDNA). Genotyping of discrete typing units (DTUs) was performed by amplification of mini-exon and 18S and 24Sα rRNA genes and subsequent sequence analysis. The DTUs TcI, TcIV and, for the first time, TcII, were identified in tissues of mice and rats naturally infected with T. cruzi captured in an area of New Orleans, close to the house where the first human case of Chagas disease was reported in Louisiana. The T. cruzi infection rate in 59 captured rodents was 76%. The frequencies of the detected DTUs in such mammals were TcI 82%, TcII 22% and TcIV 9%; 13% of all infections contained more than one DTU. Our results indicate a probable presence of a considerably greater diversity in T. cruzi DTUs circulating in the southeastern United States than previously reported. Understanding T. cruzi transmission dynamics in sylvatic and peridomestic cycles

Multiple mitochondrial introgression events and heteroplasmy in trypanosoma cruzi revealed by maxicircle MLST and next generation sequencing.

PubMed

Messenger, Louisa A; Llewellyn, Martin S; Bhattacharyya, Tapan; Franzén, Oscar; Lewis, Michael D; Ramírez, Juan David; Carrasco, Hernan J; Andersson, Björn; Miles, Michael A

2012-01-01

Mitochondrial DNA is a valuable taxonomic marker due to its relatively fast rate of evolution. In Trypanosoma cruzi, the causative agent of Chagas disease, the mitochondrial genome has a unique structural organization consisting of 20-50 maxicircles (∼20 kb) and thousands of minicircles (0.5-10 kb). T. cruzi is an early diverging protist displaying remarkable genetic heterogeneity and is recognized as a complex of six discrete typing units (DTUs). The majority of infected humans are asymptomatic for life while 30-35% develop potentially fatal cardiac and/or digestive syndromes. However, the relationship between specific clinical outcomes and T. cruzi genotype remains elusive. The availability of whole genome sequences has driven advances in high resolution genotyping techniques and re-invigorated interest in exploring the diversity present within the various DTUs. To describe intra-DTU diversity, we developed a highly resolutive maxicircle multilocus sequence typing (mtMLST) scheme based on ten gene fragments. A panel of 32 TcI isolates was genotyped using the mtMLST scheme, GPI, mini-exon and 25 microsatellite loci. Comparison of nuclear and mitochondrial data revealed clearly incongruent phylogenetic histories among different geographical populations as well as major DTUs. In parallel, we exploited read depth data, generated by Illumina sequencing of the maxicircle genome from the TcI reference strain Sylvio X10/1, to provide the first evidence of mitochondrial heteroplasmy (heterogeneous mitochondrial genomes in an individual cell) in T. cruzi. mtMLST provides a powerful approach to genotyping at the sub-DTU level. This strategy will facilitate attempts to resolve phenotypic variation in T. cruzi and to address epidemiologically important hypotheses in conjunction with intensive spatio-temporal sampling. The observations of both general and specific incidences of nuclear-mitochondrial phylogenetic incongruence indicate that genetic recombination is
Interference between Triplex and Protein Binding to Distal Sites on Supercoiled DNA.

PubMed

Noy, Agnes; Maxwell, Anthony; Harris, Sarah A

2017-02-07

We have explored the interdependence of the binding of a DNA triplex and a repressor protein to distal recognition sites on supercoiled DNA minicircles using MD simulations. We observe that the interaction between the two ligands through their influence on their DNA template is determined by a subtle interplay of DNA mechanics and electrostatics, that the changes in flexibility induced by ligand binding play an important role and that supercoiling can instigate additional ligand-DNA contacts that would not be possible in simple linear DNA sequences. Copyright © 2017. Published by Elsevier Inc.
Chagas' disease in Aboriginal and Creole communities from the Gran Chaco Region of Argentina: Seroprevalence and molecular parasitological characterization.

PubMed

Lucero, R H; Brusés, B L; Cura, C I; Formichelli, L B; Juiz, N; Fernández, G J; Bisio, M; Deluca, G D; Besuschio, S; Hernández, D O; Schijman, A G

2016-07-01

Most indigenous ethnic groups from Northern Argentina live in rural areas of "the Gran Chaco" region, where Trypanosoma cruzi is endemic. Serological and parasitological features have been poorly characterized in Aboriginal populations and scarce information exists regarding relevant T. cruzi discrete typing units (DTU) and parasitic loads. This study aimed to characterize T. cruzi infection in Qom, Mocoit, Pit'laxá and Wichi ethnic groups (N=604) and Creole communities (N=257) inhabiting rural villages from two highly endemic provinces of the Argentinean Gran Chaco. DNA extracted using Hexadecyltrimethyl Ammonium Bromide reagent from peripheral blood samples was used for conventional PCR targeted to parasite kinetoplastid DNA (kDNA) and identification of DTUs using nuclear genomic markers. In kDNA-PCR positive samples from three rural Aboriginal communities of "Monte Impenetrable Chaqueño", minicircle signatures were characterized by Low stringency single primer-PCR and parasitic loads calculated using Real-Time PCR. Seroprevalence was higher in Aboriginal (47.98%) than in Creole (27.23%) rural communities (Chi square, p=4×10^-8). A low seroprevalence (4.3%) was detected in a Qom settlement at the suburbs of Resistencia city (Fisher Exact test, p=2×10^-21). The kDNA-PCR positivity was 42.15% in Aboriginal communities and 65.71% in Creole populations (Chi square, p=5×10^-4). Among Aboriginal communities kDNA-PCR positivity was heterogeneous (Chi square, p=1×10^-4). Highest kDNA-PCR positivity (79%) was detected in the Qom community of Colonia Aborigen and the lowest PCR positivity in two different surveys at the Wichi community of Misión Nueva Pompeya (33.3% in 2010 and 20.8% in 2014). TcV (or TcII/V/VI) was predominant in both Aboriginal and Creole communities, in agreement with DTU distribution reported for the region. Besides, two subjects were infected with TcVI, one with TcI and four presented mixed infections of TcV plus TcII/VI. Most minicircle signatures
Distinct Leishmania Species Infecting Wild Caviomorph Rodents (Rodentia: Hystricognathi) from Brazil

PubMed Central

Cássia-Pires, Renata; Boité, Mariana C.; D'Andrea, Paulo S.; Herrera, Heitor M.; Cupolillo, Elisa; Jansen, Ana Maria; Roque, André Luiz R.

2014-01-01

Background Caviomorph rodents, some of the oldest Leishmania spp. hosts, are widely dispersed in Brazil. Despite both experimental and field studies having suggested that these rodents are potential reservoirs of Leishmania parasites, not more than 88 specimens were analyzed in the few studies of natural infection. Our hypothesis was that caviomorph rodents are inserted in the transmission cycles of Leishmania in different regions, more so than is currently recognized. Methodology We investigated the Leishmania infection in spleen fragments of 373 caviomorph rodents from 20 different species collected in five Brazilian biomes in a period of 13 years. PCR reactions targeting kDNA of Leishmania sp. were used to diagnose infection, while Leishmania species identification was performed by DNA sequencing of the amplified products obtained in the HSP70 (234) targeting. Serology by IFAT was performed on the available serum of these rodents. Principal findings In 13 caviomorph rodents, DNA sequencing analyses allowed the identification of 4 species of the subgenus L. (Viannia): L. shawi, L. guyanensis, L. naiffi, and L. braziliensis; and 1 species of the subgenus L. (Leishmania): L. infantum. These include the description of parasite species in areas not previously included in their known distribution: L. shawi in Thrichomys inermis from Northeastern Brazil and L. naiffi in T. fosteri from Western Brazil. From the four other positive rodents, two were positive for HSP70 (234) targeting but did not generate sequences that enabled the species identification, and another two were positive only in kDNA targeting. Conclusions/Significance The infection rate demonstrated by the serology (51.3%) points out that the natural Leishmania infection in caviomorph rodents is much higher than that observed in the molecular diagnosis (4.6%), highlighting that, in terms of the host species responsible for maintaining Leishmania species in the wild, our current knowledge represents only the
Molecular model of the mitochondrial genome segregation machinery in Trypanosoma brucei

PubMed Central

Hoffmann, Anneliese; Käser, Sandro; Jakob, Martin; Amodeo, Simona; Peitsch, Camille; Týč, Jiří; Vaughan, Sue; Schneider, André

2018-01-01

In almost all eukaryotes, mitochondria maintain their own genome. Despite the discovery more than 50 y ago, still very little is known about how the genome is correctly segregated during cell division. The protozoan parasite Trypanosoma brucei contains a single mitochondrion with a singular genome, the kinetoplast DNA (kDNA). Electron microscopy studies revealed the tripartite attachment complex (TAC) to physically connect the kDNA to the basal body of the flagellum and to ensure correct segregation of the mitochondrial genome via the basal bodies' movement during the cell cycle. Using superresolution microscopy, we precisely localize each of the currently known TAC components. We demonstrate that the TAC is assembled in a hierarchical order from the base of the flagellum toward the mitochondrial genome and that the assembly is not dependent on the kDNA itself. Based on the biochemical analysis, the TAC consists of several nonoverlapping subcomplexes, suggesting an overall size of the TAC exceeding 2.8 MDa. We furthermore demonstrate that the TAC is required for correct mitochondrial organelle positioning but not for organelle biogenesis or segregation. PMID:29434039
SYBR Green-based Real-Time PCR targeting kinetoplast DNA can be used to discriminate between the main etiologic agents of Brazilian cutaneous and visceral leishmaniases

PubMed Central

2012-01-01

Background Leishmaniases control has been hampered by the unavailability of rapid detection methods and the lack of suitable therapeutic and prophylactic measures. Accurate diagnosis, which can distinguish between Leishmania isolates, is essential for conducting appropriate prognosis, therapy and epidemiology. Molecular methods are currently being employed to detect Leishmania infection and categorize the parasites up to genus, complex or species level. Real-time PCR offers several advantages over traditional PCR, including faster processing time, higher sensitivity and decreased contamination risk. Results A SYBR Green real-time PCR targeting the conserved region of kinetoplast DNA minicircles was able to differentiate between Leishmania subgenera. A panel of reference strains representing subgenera Leishmania and Viannia was evaluated by the derivative dissociation curve analyses of the amplified fragment. Distinct values for the average melting temperature were observed, being 78.95°C ± 0.01 and 77.36°C ± 0.02 for Leishmania and Viannia, respectively (p < 0.05). Using the Neighbor-Joining method and Kimura 2-parameters, the alignment of 12 sequences from the amplified conserved minicircles segment grouped together L. (V.) braziliensis and L. (V.) shawii with a bootstrap value of 100%; while for L. (L.) infantum and L. (L.) amazonensis, two groups were formed with bootstrap values of 100% and 62%, respectively. The lower dissociation temperature observed for the subgenus Viannia amplicons could be due to a lower proportion of guanine/cytosine sites (43.6%) when compared to species from subgenus Leishmania (average of 48.4%). The method was validated with 30 clinical specimens from visceral or cutaneous leishmaniases patients living in Brazil and also with DNA samples from naturally infected Lutzomyia spp. captured in two Brazilian localities. Conclusions For all tested samples, a characteristic amplicon melting profile was evidenced for each Leishmania
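The abstract attributes the lower melting temperature of the Viannia amplicons to a lower proportion of guanine/cytosine sites. As a minimal sketch of how such a proportion is computed, the Python function below counts G and C among the unambiguous bases of a sequence. The function name and the toy sequence are hypothetical illustrations and are not taken from the study's amplicons.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C among the unambiguous A/C/G/T bases."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A/C/G/T bases")
    return sum(base in "GC" for base in counted) / len(counted)


# Hypothetical toy sequence, used only to show the call.
example = "ATGCGCTTAA"
print(f"GC content: {gc_fraction(example):.1%}")
```

Ambiguous symbols such as N are ignored rather than counted as non-GC, which is a design choice of this sketch; other tools may handle them differently.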
Testing the Use of Implicit Solvent in the Molecular Dynamics Modelling of DNA Flexibility

NASA Astrophysics Data System (ADS)

Mitchell, J.; Harris, S.

DNA flexibility controls packaging, looping and in some cases sequence specific protein binding. Molecular dynamics simulations carried out with a computationally efficient implicit solvent model are potentially a powerful tool for studying larger DNA molecules than can be currently simulated when water and counterions are represented explicitly. In this work we compare DNA flexibility at the base pair step level modelled using an implicit solvent model to that previously determined from explicit solvent simulations and database analysis. Although much of the sequence dependent behaviour is preserved in implicit solvent, the DNA is considerably more flexible when the approximate model is used. In addition we test the ability of the implicit solvent to model stress induced DNA disruptions by simulating a series of DNA minicircle topoisomers which vary in size and superhelical density. When compared with previously run explicit solvent simulations, we find that while the levels of DNA denaturation are similar using both computational methodologies, the specific structural form of the disruptions is different.
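Superhelical density, mentioned in this record, is conventionally defined as σ = (Lk − Lk0) / Lk0, where Lk is the linking number of the closed circle and Lk0 = N / h is the value for the relaxed molecule of N base pairs with helical repeat h. The Python sketch below illustrates the definition, assuming h ≈ 10.5 bp per turn for relaxed B-DNA; the minicircle size and linking number in the example are hypothetical and do not come from the study.

```python
def superhelical_density(n_bp: int, linking_number: int,
                         helical_repeat: float = 10.5) -> float:
    """Superhelical density sigma = (Lk - Lk0) / Lk0 for a closed circular DNA."""
    if n_bp <= 0:
        raise ValueError("n_bp must be positive")
    lk0 = n_bp / helical_repeat  # linking number of the relaxed circle
    return (linking_number - lk0) / lk0


# Hypothetical 336 bp minicircle: Lk0 = 336 / 10.5 = 32 turns.
# Removing two turns (Lk = 30) underwinds the circle.
sigma = superhelical_density(336, 30)
print(f"sigma = {sigma:+.4f}")
```

Because Lk can only change in integer steps, small circles can reach only a coarse set of σ values, which is one reason series of topoisomers of different sizes are useful.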
Systemic Upregulation of IL-10 (Interleukin-10) Using a Nonimmunogenic Vector Reduces Growth and Rate of Dissecting Abdominal Aortic Aneurysm.

PubMed

Adam, Matti; Kooreman, Nigel; Jagger, Ann; Wagenhaeuser, Markus U; Mehrkens, Dennis; Wang, Yongming; Kayama, Yosuke; Toyama, Kensuke; Raaz, Uwe; Schellinger, Isabel N; Maegdefessel, Lars; Spin, Joshua M; Hamming, Jaap F; Quax, Paul H A; Baldus, Stephan; Wu, Joseph C; Tsao, Philip S

2018-06-07

Recruitment of immunologic competent cells to the vessel wall is a crucial step in formation of abdominal aortic aneurysms (AAA). Innate immunity effectors (eg, macrophages), as well as mediators of adaptive immunity (eg, T cells), orchestrate a local vascular inflammatory response. IL-10 (interleukin-10) is an immune-regulatory cytokine with a crucial role in suppression of inflammatory processes. We hypothesized that an increase in systemic IL-10-levels would mitigate AAA progression. Using a single intravenous injection protocol, we transfected an IL-10 transcribing nonimmunogenic minicircle vector into the Ang II (angiotensin II)-ApoE−/− infusion mouse model of AAA. IL-10 minicircle transfection significantly reduced average aortic diameter measured via ultrasound at day 28 from 166.1±10.8% (control) to 131.0±5.8% (IL-10 transfected). Rates of dissecting AAA were reduced by IL-10 treatment, with an increase in freedom from dissecting AAA from 21.5% to 62.3%. Using flow cytometry of aortic tissue from minicircle IL-10-treated animals, we found a significantly higher percentage of CD4+/CD25+/Foxp3+ (forkhead box P3) regulatory T cells, with fewer CD8+/Granzyme B+ cytotoxic T cells. Furthermore, isolated aortic macrophages produced less TNF-α (tumor necrosis factor-α), more IL-10, and were more likely to be MRC1 (mannose receptor, C type 1)-positive alternatively activated macrophages. These results concurred with gene expression analysis of LPS-stimulated and Ang II-primed human peripheral blood mononuclear cells. Taken together, we provide an effective gene therapy approach to AAA in mice by enhancing antiinflammatory and dampening proinflammatory pathways through minicircle-induced augmentation of systemic IL-10 expression. © 2018 American Heart Association, Inc.
Angiotensin converting enzyme 2 amplification limited to the circulation does not protect mice from development of diabetic nephropathy

PubMed Central

Wysocki, Jan; Ye, Minghao; Khattab, Ahmed M.; Fogo, Agnes; Martin, Aline; David, Nicolae Valentin; Kanwar, Yashpal; Osborn, Mark; Batlle, Daniel

2016-01-01

Blockers of the renin-angiotensin system are effective in the treatment of experimental and clinical diabetic nephropathy. An approach different from blocking the formation or action of angiotensin II (1-8) that could also be effective involves fostering its degradation. Angiotensin converting enzyme 2 (ACE2) is a monocarboxypeptidase that cleaves angiotensin II (1-8) to form angiotensin (1-7). Therefore, we examined the renal effects of murine recombinant ACE2 in mice with streptozotocin-induced diabetic nephropathy as well as that of amplification of circulating ACE2 using minicircle DNA delivery prior to induction of experimental diabetes. This delivery resulted in a long-term sustained and profound increase in serum ACE2 activity and enhanced ability to metabolize an acute angiotensin II (1-8) load. In mice with streptozotocin-induced diabetes pretreated with minicircle ACE2, ACE2 protein in plasma increased markedly and this was associated with a more than 100-fold increase in serum ACE2 activity. However, minicircle ACE2 did not result in changes in urinary ACE2 activity as compared to untreated diabetic mice. In both diabetic groups, glomerular filtration rate increased significantly and to the same extent as compared to non-diabetic controls. Albuminuria, glomerular mesangial expansion, glomerular cellularity and glomerular size were all increased to a similar extent in minicircle ACE2-treated and untreated diabetic mice, as compared to non-diabetic controls. Recombinant mouse ACE2 given for 4 weeks by intraperitoneal daily injections in mice with streptozotocin-induced diabetic nephropathy also failed to improve albuminuria or kidney pathology. Thus, a profound augmentation of ACE2 confined to the circulation failed to ameliorate the glomerular lesions and hyperfiltration characteristic of early diabetic nephropathy. These findings emphasize the importance of targeting the kidney rather than the circulatory renin angiotensin system to combat diabetic
Leishmania (V.) braziliensis infecting bats from Pantanal wetland, Brazil: First records for Platyrrhinus lineatus and Artibeus planirostris.

PubMed

de Castro Ferreira, Eduardo; Pereira, Agnes Antônio Sampaio; Silveira, Maurício; Margonari, Carina; Marcon, Glaucia Elisete Barbosa; de Oliveira França, Adriana; Castro, Ludiele Souza; Bordignon, Marcelo Oscar; Fischer, Erich; Tomas, Walfrido Moraes; Dorval, Maria Elizabeth Cavalheiros; Gontijo, Célia Maria Ferreira

2017-08-01

In the New World, parasites of the genus Leishmania are etiological agents of neglected zoonoses known as leishmaniases. Their epidemiology is very complex due to the participation of several species of sand fly vectors and mammalian hosts, and man is an accidental host. Control is very difficult because of the different epidemiological patterns of transmission observed. Studies of Leishmania spp. infection in bats are scarce, which represents a large gap in knowledge about the role of these animals in the transmission cycle of these pathogens, especially when considering that Chiroptera is one of the most abundant and diverse orders among mammals. Leishmaniases in Mato Grosso do Sul, Brazil, are remarkably frequent, probably due to the abundance of its regional mastofauna. The recent record of L. braziliensis in bats from this state indicates the need to clarify the role of these mammals in the transmission cycle. In this study we evaluated the presence of Leishmania parasites in the skin of different species of bats, using PCR directed to Leishmania spp. kDNA for screening followed by PCR/RFLP analysis of the hsp70 gene for the identification of parasite species. Leishmania species identification was confirmed by PCR directed to the G6PD gene of L. braziliensis, followed by sequencing of the PCR product. Samples from 47 bats were processed, and Leishmania sp. kDNA was detected in three specimens (6.38%). PCR/RFLP and sequencing identified the species involved in the infection as L. braziliensis in all of them. This is the first report of Leishmania braziliensis in bats from the Pantanal ecosystem and the first record of this species in Platyrrhinus lineatus and Artibeus planirostris, bats with a wide distribution in South America. These results reinforce the need to deepen the knowledge about the possibility that bats act as reservoirs of Leishmania spp., especially considering their capacity for dispersal and occupation of anthropic environments
The sequence of sequencers: The history of sequencing DNA

PubMed Central

Heather, James M.; Chain, Benjamin

2016-01-01

Determining the order of nucleic acid residues in biological samples is an integral component of a wide variety of research applications. Over the last fifty years large numbers of researchers have applied themselves to the production of techniques and technologies to facilitate this feat, sequencing DNA and RNA molecules. This time-scale has witnessed tremendous changes, moving from sequencing short oligonucleotides to millions of bases, from struggling towards the deduction of the coding sequence of a single gene to rapid and widely available whole genome sequencing. This article traverses those years, iterating through the different generations of sequencing technology, highlighting some of the key discoveries, researchers, and sequences along the way. PMID:26554401
PNT1 is a C11 cysteine peptidase essential for replication of the Trypanosome Kinetoplast

DOE PAGES

Grewal, Jaspreet S.; McLuskey, Karen; Das, Debanu; ...

2016-03-03

The structure of a C11 peptidase PmC11 from the gut bacterium, Parabacteroides merdae, has recently been determined, enabling the identification and characterization of a C11 orthologue, PNT1, in the parasitic protozoon Trypanosoma brucei. A phylogenetic analysis identified PmC11 orthologues in bacteria, archaea, Chromerids, Coccidia, and Kinetoplastida, the latter being the most divergent. A primary sequence alignment of PNT1 with clostripain and PmC11 revealed the position of the characteristic His-Cys catalytic dyad (His 99 and Cys 136), and an Asp (Asp 134) in the potential S1 binding site. Immunofluorescence and cryoelectron microscopy revealed that PNT1 localizes to the kinetoplast, an organelle containing the mitochondrial genome of the parasite (kDNA), with an accumulation of the protein at or near the antipodal sites. Depletion of PNT1 by RNAi in the T. brucei bloodstream form was lethal both in in vitro culture and in vivo in mice and the induced population accumulated cells lacking a kinetoplast. In contrast, overexpression of PNT1 led to cells having mislocated kinetoplasts. RNAi depletion of PNT1 in a kDNA independent cell line resulted in kinetoplast loss but was viable, indicating that PNT1 is required exclusively for kinetoplast maintenance. Expression of a recoded wild-type PNT1 allele, but not of an active site mutant restored parasite viability after induction in vitro and in vivo confirming that the peptidase activity of PNT1 is essential for parasite survival. Furthermore, these data provide evidence that PNT1 is a cysteine peptidase that is required exclusively for maintenance of the trypanosome kinetoplast.
Analytical Validation of Quantitative Real-Time PCR Methods for Quantification of Trypanosoma cruzi DNA in Blood Samples from Chagas Disease Patients

PubMed Central

Ramírez, Juan Carlos; Cura, Carolina Inés; Moreira, Otacilio da Cruz; Lages-Silva, Eliane; Juiz, Natalia; Velázquez, Elsa; Ramírez, Juan David; Alberti, Anahí; Pavia, Paula; Flores-Chávez, María Delmans; Muñoz-Calderón, Arturo; Pérez-Morales, Deyanira; Santalla, José; Guedes, Paulo Marcos da Matta; Peneau, Julie; Marcet, Paula; Padilla, Carlos; Cruz-Robles, David; Valencia, Edward; Crisante, Gladys Elena; Greif, Gonzalo; Zulantay, Inés; Costales, Jaime Alfredo; Alvarez-Martínez, Miriam; Martínez, Norma Edith; Villarroel, Rodrigo; Villarroel, Sandro; Sánchez, Zunilda; Bisio, Margarita; Parrado, Rudy; Galvão, Lúcia Maria da Cunha; da Câmara, Antonia Cláudia Jácome; Espinoza, Bertha; de Noya, Belkisyole Alarcón; Puerta, Concepción; Riarte, Adelina; Diosque, Patricio; Sosa-Estani, Sergio; Guhl, Felipe; Ribeiro, Isabela; Aznar, Christine; Britto, Constança; Yadón, Zaida Estela; Schijman, Alejandro G.

2015-01-01

An international study was performed by 26 experienced PCR laboratories from 14 countries to assess the performance of duplex quantitative real-time PCR (qPCR) strategies on the basis of TaqMan probes for detection and quantification of parasitic loads in peripheral blood samples from Chagas disease patients. Two methods were studied: Satellite DNA (SatDNA) qPCR and kinetoplastid DNA (kDNA) qPCR. Both methods included an internal amplification control. Reportable range, analytical sensitivity, limits of detection and quantification, and precision were estimated according to international guidelines. In addition, inclusivity and exclusivity were estimated with DNA from stocks representing the different Trypanosoma cruzi discrete typing units and Trypanosoma rangeli and Leishmania spp. Both methods were challenged against 156 blood samples provided by the participant laboratories, including samples from acute and chronic patients with varied clinical findings, infected by oral route or vectorial transmission. kDNA qPCR showed better analytical sensitivity than SatDNA qPCR with limits of detection of 0.23 and 0.70 parasite equivalents/mL, respectively. Analyses of clinical samples revealed a high concordance in terms of sensitivity and parasitic loads determined by both SatDNA and kDNA qPCRs. This effort is a major step toward international validation of qPCR methods for the quantification of T. cruzi DNA in human blood samples, aiming to provide an accurate surrogate biomarker for diagnosis and treatment monitoring for patients with Chagas disease. PMID:26320872
RIKEN Integrated Sequence Analysis (RISA) System—384-Format Sequencing Pipeline with 384 Multicapillary Sequencer

PubMed Central

Shibata, Kazuhiro; Itoh, Masayoshi; Aizawa, Katsunori; Nagaoka, Sumiharu; Sasaki, Nobuya; Carninci, Piero; Konno, Hideaki; Akiyama, Junichi; Nishi, Katsuo; Kitsunai, Tokuji; Tashiro, Hideo; Itoh, Mari; Sumi, Noriko; Ishii, Yoshiyuki; Nakamura, Shin; Hazama, Makoto; Nishine, Tsutomu; Harada, Akira; Yamamoto, Rintaro; Matsumoto, Hiroyuki; Sakaguchi, Sumito; Ikegami, Takashi; Kashiwagi, Katsuya; Fujiwake, Syuji; Inoue, Kouji; Togawa, Yoshiyuki; Izawa, Masaki; Ohara, Eiji; Watahiki, Masanori; Yoneda, Yuko; Ishikawa, Tomokazu; Ozawa, Kaori; Tanaka, Takumi; Matsuura, Shuji; Kawai, Jun; Okazaki, Yasushi; Muramatsu, Masami; Inoue, Yorinao; Kira, Akira; Hayashizaki, Yoshihide

2000-01-01

The RIKEN high-throughput 384-format sequencing pipeline (RISA system) including a 384-multicapillary sequencer (the so-called RISA sequencer) was developed for the RIKEN mouse encyclopedia project. The RISA system consists of colony picking, template preparation, sequencing reaction, and the sequencing process. A novel high-throughput 384-format capillary sequencer system (RISA sequencer system) was developed for the sequencing process. This system consists of a 384-multicapillary auto sequencer (RISA sequencer), a 384-multicapillary array assembler (CAS), and a 384-multicapillary casting device. The RISA sequencer can simultaneously analyze 384 independent sequencing products. The optical system is a scanning system chosen after careful comparison with an image detection system for the simultaneous detection of the 384-capillary array. This scanning system can be used with any fluorescent-labeled sequencing reaction (chain termination reaction), including transcriptional sequencing based on RNA polymerase, which was originally developed by us, and cycle sequencing based on thermostable DNA polymerase. For long-read sequencing, 380 out of 384 sequences (99.2%) were successfully analyzed and the average read length, with more than 99% accuracy, was 654.4 bp. A single RISA sequencer can analyze 216 kb with >99% accuracy in 2.7 h (90 kb/h). For short-read sequencing to cluster the 3′ end and 5′ end sequencing by reading 350 bp, 384 samples can be analyzed in 1.5 h. We have also developed a RISA inoculator, RISA filtrator and densitometer, RISA plasmid preparator which can handle throughput of 40,000 samples in 17.5 h, and a high-throughput RISA thermal cycler which has four 384-well sites. The combination of these technologies allowed us to construct the RISA system consisting of 16 RISA sequencers, which can process 50,000 DNA samples per day. One haploid genome shotgun sequence of a higher organism, such as human, mouse, rat, domestic animals, and plants, can
Angiotensin-converting enzyme 2 amplification limited to the circulation does not protect mice from development of diabetic nephropathy.

PubMed

Wysocki, Jan; Ye, Minghao; Khattab, Ahmed M; Fogo, Agnes; Martin, Aline; David, Nicolae Valentin; Kanwar, Yashpal; Osborn, Mark; Batlle, Daniel

2017-06-01

Blockers of the renin-angiotensin system are effective in the treatment of experimental and clinical diabetic nephropathy. An approach different from blocking the formation or action of angiotensin II (1-8) that could also be effective involves fostering its degradation. Angiotensin-converting enzyme 2 (ACE2) is a monocarboxypeptidase that cleaves angiotensin II (1-8) to form angiotensin (1-7). Therefore, we examined the renal effects of murine recombinant ACE2 in mice with streptozotocin-induced diabetic nephropathy as well as that of amplification of circulating ACE2 using minicircle DNA delivery prior to induction of experimental diabetes. This delivery resulted in a long-term sustained and profound increase in serum ACE2 activity and enhanced ability to metabolize an acute angiotensin II (1-8) load. In mice with streptozotocin-induced diabetes pretreated with minicircle ACE2, ACE2 protein in plasma increased markedly and this was associated with a more than 100-fold increase in serum ACE2 activity. However, minicircle ACE2 did not result in changes in urinary ACE2 activity as compared to untreated diabetic mice. In both diabetic groups, glomerular filtration rate increased significantly and to the same extent as compared to non-diabetic controls. Albuminuria, glomerular mesangial expansion, glomerular cellularity, and glomerular size were all increased to a similar extent in minicircle ACE2-treated and untreated diabetic mice, as compared to non-diabetic controls. Recombinant mouse ACE2 given for 4 weeks by intraperitoneal daily injections in mice with streptozotocin-induced diabetic nephropathy also failed to improve albuminuria or kidney pathology. Thus, a profound augmentation of ACE2 confined to the circulation failed to ameliorate the glomerular lesions and hyperfiltration characteristic of early diabetic nephropathy. These findings emphasize the importance of targeting the kidney rather than the circulatory renin angiotensin system to combat diabetic
Species-specific markers for the differential diagnosis of Trypanosoma cruzi and Trypanosoma rangeli and polymorphisms detection in Trypanosoma rangeli.

PubMed

Ferreira, Keila Adriana Magalhães; Fajardo, Emanuella Francisco; Baptista, Rodrigo P; Macedo, Andrea Mara; Lages-Silva, Eliane; Ramírez, Luis Eduardo; Pedrosa, André Luiz

2014-06-01

Trypanosoma cruzi and Trypanosoma rangeli are kinetoplastid parasites which are able to infect humans in Central and South America. Misdiagnosis between these trypanosomes can be avoided by targeting barcoding sequences or genes of each organism. This work aims to analyze the feasibility of using species-specific markers for identification of intraspecific polymorphisms and as targets for diagnostic methods by PCR. Accordingly, primers which are able to specifically detect T. cruzi or T. rangeli genomic DNA were characterized. The use of intergenic regions, generally divergent in the trypanosomatids, and the serine carboxypeptidase gene were successful. Using T. rangeli genomic sequences for the identification of group-specific polymorphisms and a polymorphic AT(n) dinucleotide repeat permitted the classification of the strains into two groups, which are entirely coincident with T. rangeli main lineages, KP1 (+) and KP1 (-), previously determined by kinetoplast DNA (kDNA) characterization. The sequences analyzed total 622 bp (382 bp represent a hypothetical protein sequence, and 240 bp represent an anonymous sequence), and of these, 581 (93.3%) are conserved sites and 41 bp (6.7%) are polymorphic, with 9 transitions (21.9%), 2 transversions (4.9%), and 30 (73.2%) insertion/deletion events. Taken together, the species-specific markers analyzed may be useful for the development of new strategies for the accurate diagnosis of infections. Furthermore, the identification of T. rangeli polymorphisms has a direct impact on the understanding of the population structure of this parasite.
Fundamental Bounds for Sequence Reconstruction from Nanopore Sequencers.

PubMed

Magner, Abram; Duda, Jarosław; Szpankowski, Wojciech; Grama, Ananth

2016-06-01

Nanopore sequencers are emerging as promising new platforms for high-throughput sequencing. As with other technologies, sequencer errors pose a major challenge for their effective use. In this paper, we present a novel information theoretic analysis of the impact of insertion-deletion (indel) errors in nanopore sequencers. In particular, we consider the following problems: (i) for given indel error characteristics and rate, what is the probability of accurate reconstruction as a function of sequence length; (ii) using replicated extrusion (the process of passing a DNA strand through the nanopore), what is the number of replicas needed to accurately reconstruct the true sequence with high probability? Our results provide a number of important insights: (i) the probability of accurate reconstruction of a sequence from a single sample in the presence of indel errors tends quickly (i.e., exponentially) to zero as the length of the sequence increases; and (ii) replicated extrusion is an effective technique for accurate reconstruction. We show that for typical distributions of indel errors, the required number of replicas is a slow function (polylogarithmic) of sequence length - implying that through replicated extrusion, we can sequence large reads using nanopore sequencers. Moreover, we show that in certain cases, the required number of replicas can be related to information-theoretic parameters of the indel error distributions.
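As a rough illustration of the first result above, consider a deliberately simplified model (an assumption made here, not the analysis in the paper) in which each base independently suffers an indel with probability p. The chance that a single read of length n contains no indel is then (1 - p)^n, which decays exponentially in n. The function name and the rate 0.05 are hypothetical.

```python
def error_free_probability(indel_rate: float, length: int) -> float:
    """Probability that a read of `length` bases contains no indel.

    Assumes independent per-base indel events with a fixed rate, which is
    a simplification and not the error model analysed in the paper.
    """
    if not 0.0 <= indel_rate <= 1.0:
        raise ValueError("indel_rate must be in [0, 1]")
    if length < 0:
        raise ValueError("length must be non-negative")
    return (1.0 - indel_rate) ** length


if __name__ == "__main__":
    # Hypothetical 5% per-base indel rate, chosen only for illustration.
    for n in (10, 100, 1000):
        print(n, error_free_probability(0.05, n))
```

Under this toy model a single pass is almost never error-free for long reads, which is the motivation for combining several replicas of the same strand.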
First description of Leishmania (Viannia) infection in Evandromyia saulensis, Pressatia sp. and Trichophoromyia auraensis (Psychodidae: Phlebotominae) in a transmission area of cutaneous leishmaniasis in Acre state, Amazon Basin, Brazil.

PubMed

Araujo-Pereira, Thais de; Pita-Pereira, Daniela de; Boité, Mariana Côrtes; Melo, Myllena; Costa-Rego, Taiana Amancio da; Fuzari, Andressa Alencastre; Brazil, Reginaldo Peçanha; Britto, Constança

2017-01-01

Studies on the sandfly fauna to evaluate natural infection indexes are still limited in the Brazilian Amazon, a region with an increasing incidence of cutaneous leishmaniasis. Here, by using a multiplex polymerase chain reaction directed to Leishmania kDNA and hybridisation, we were able to identify L. (Viannia) subgenus in 12 out of 173 sandflies captured in the municipality of Rio Branco, Acre state, revealing a positivity of 6.94%. By sequencing the Leishmania 234 bp-hsp70 amplified products from positive samples, infection by L. (V.) braziliensis was confirmed in five sandflies: one Evandromyia saulensis, three Trichophoromyia auraensis and one Pressatia sp. The finding of L. (Viannia) DNA in two Ev. saulensis corresponds to the first record of possible infection associated with this sandfly. Moreover, our study reveals for the first time in Brazil, Th. auraensis and Pressatia sp. infected by L. (Viannia) parasites.

First description of Leishmania (Viannia) infection in Evandromyia saulensis, Pressatia sp. and Trichophoromyia auraensis (Psychodidae: Phlebotominae) in a transmission area of cutaneous leishmaniasis in Acre state, Amazon Basin, Brazil

PubMed Central

de Araujo-Pereira, Thais; de Pita-Pereira, Daniela; Boité, Mariana Côrtes; Melo, Myllena; da Costa-Rego, Taiana Amancio; Fuzari, Andressa Alencastre; Brazil, Reginaldo Peçanha; Britto, Constança

2017-01-01

Studies on the sandfly fauna to evaluate natural infection indexes are still limited in the Brazilian Amazon, a region with an increasing incidence of cutaneous leishmaniasis. Here, by using a multiplex polymerase chain reaction directed to Leishmania kDNA and hybridisation, we were able to identify L. (Viannia) subgenus in 12 out of 173 sandflies captured in the municipality of Rio Branco, Acre state, revealing a positivity of 6.94%. By sequencing the Leishmania 234 bp-hsp70 amplified products from positive samples, infection by L. (V.) braziliensis was confirmed in five sandflies: one Evandromyia saulensis, three Trichophoromyia auraensis and one Pressatia sp. The finding of L. (Viannia) DNA in two Ev. saulensis corresponds to the first record of possible infection associated with this sandfly. Moreover, our study reveals for the first time in Brazil, Th. auraensis and Pressatia sp. infected by L. (Viannia) parasites. PMID:28076470
Universal sequence map (USM) of arbitrary discrete sequences

PubMed Central

2002-01-01

Background For over a decade the idea of representing biological sequences in a continuous coordinate space has maintained its appeal but not been fully realized. The basic idea is that any sequence of symbols may define trajectories in the continuous space conserving all its statistical properties. Ideally, such a representation would allow scale independent sequence analysis – without the context of fixed memory length. A simple example would consist of being able to infer the homology between two sequences solely by comparing the coordinates of any two homologous units. Results We have successfully identified such an iterative function for bijective mapping ψ of discrete sequences into objects of continuous state space that enable scale-independent sequence analysis. The technique, named Universal Sequence Mapping (USM), is applicable to sequences with an arbitrary length and arbitrary number of unique units and generates a representation where map distance estimates sequence similarity. The novel USM procedure is based on earlier work by these and other authors on the properties of Chaos Game Representation (CGR). The latter enables the representation of 4 unit type sequences (like DNA) as an order free Markov Chain transition table. The properties of USM are illustrated with test data and can be verified for other data by using the accompanying web-based tool: http://bioinformatics.musc.edu/~jonas/usm/. Conclusions USM is shown to enable a statistical mechanics approach to sequence analysis. The scale independent representation frees sequence analysis from the need to assume a memory length in the investigation of syntactic rules. PMID:11895567
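The Chaos Game Representation that USM builds on can be sketched in a few lines. This is a minimal illustration of the classic CGR iteration for DNA, not the USM implementation itself: each base is assigned a corner of the unit square, and each step moves halfway from the current point towards the corner of the next base. The corner assignment and the function name are conventions assumed for this sketch.

```python
from typing import List, Tuple

# Corner assignment is a convention assumed for this sketch.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_coordinates(sequence: str,
                    start: Tuple[float, float] = (0.5, 0.5)) -> List[Tuple[float, float]]:
    """Return the CGR point reached after each base of `sequence`."""
    x, y = start
    points = []
    for base in sequence.upper():
        cx, cy = CORNERS[base]  # raises KeyError on non-ACGT symbols
        # Move halfway from the current position towards the base's corner.
        x = (x + cx) / 2.0
        y = (y + cy) / 2.0
        points.append((x, y))
    return points


if __name__ == "__main__":
    print(cgr_coordinates("AG"))
```

Because every step halves the distance to the previous position, two sequences that share their last k bases end within 2^-k of each other along each axis, which is why map distance can serve as an estimate of local sequence similarity.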
Is sequence awareness mandatory for perceptual sequence learning: An assessment using a pure perceptual sequence learning design.

PubMed

Deroost, Natacha; Coomans, Daphné

2018-02-01

We examined the role of sequence awareness in a pure perceptual sequence learning design. Participants had to react to the target's colour that changed according to a perceptual sequence. By varying the mapping of the target's colour onto the response keys, motor responses changed randomly. The effect of sequence awareness on perceptual sequence learning was determined by manipulating the learning instructions (explicit versus implicit) and assessing the amount of sequence awareness after the experiment. In the explicit instruction condition (n = 15), participants were instructed to intentionally search for the colour sequence, whereas in the implicit instruction condition (n = 15), they were left uninformed about the sequenced nature of the task. Sequence awareness after the sequence learning task was tested by means of a questionnaire and the process-dissociation-procedure. The results showed that the instruction manipulation had no effect on the amount of perceptual sequence learning. Based on their report to have actively applied their sequence knowledge during the experiment, participants were subsequently regrouped in a sequence strategy group (n = 14, of which 4 participants from the implicit instruction condition and 10 participants from the explicit instruction condition) and a no-sequence strategy group (n = 16, of which 11 participants from the implicit instruction condition and 5 participants from the explicit instruction condition). Only participants of the sequence strategy group showed reliable perceptual sequence learning and sequence awareness. These results indicate that perceptual sequence learning depends upon the continuous employment of strategic cognitive control processes on sequence knowledge. Sequence awareness is suggested to be a necessary but not sufficient condition for perceptual learning to take place. Copyright © 2018 Elsevier B.V. All rights reserved.
Contamination of sequence databases with adaptor sequences

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yoshikawa, Takeo; Sanders, A.R.; Detera-Wadleigh, S.D.

Because of the exponential increase in the amount of DNA sequences being added to the public databases on a daily basis, it has become imperative to identify sources of contamination rapidly. Previously, contaminations of sequence databases have been reported to alert the scientific community to the problem. These contaminations can be divided into two categories. The first category comprises host sequences that have been difficult for submitters to manage or control. Examples include anomalous sequences derived from Escherichia coli, which are inserted into the chromosomes (and plasmids) of the bacterial hosts. Insertion sequences are highly mobile and are capable of transposing themselves into plasmids during cloning manipulation. Another example of the first category is the infection with yeast genomic DNA or with bacterial DNA of some commercially available cDNA libraries from Clontech. The second category of database contamination is due to the inadvertent inclusion of nonhost sequences. This category includes incorporation of cloning-vector sequences and multicloning sites in the database submission. M13-derived artifacts have been common, since M13-based vectors have been widely used for subcloning DNA fragments. Recognizing this problem, the National Center for Biotechnology Information (NCBI) started to screen, in April 1994, all sequences directly submitted to GenBank, against a set of vector data retrieved from GenBank by use of key-word searches, such as "vector." In this report, we present evidence for another sequence artifact that is widespread but that, to our knowledge, has not yet been reported. 11 refs., 1 tab.
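The kind of screening described above can be sketched as a search for known adaptor or vector fragments on both strands of a submitted sequence. This is a simplified illustration using exact string matching; real screening pipelines rely on similarity searches that tolerate mismatches. The adaptor sequence and all names below are hypothetical placeholders, not sequences from the report.

```python
from typing import Dict, List, Tuple

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string over the alphabet A, C, G, T, N."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def find_contaminants(sequence: str,
                      contaminants: Dict[str, str]) -> List[Tuple[str, str, int]]:
    """Return (name, strand, 0-based position) for every exact occurrence."""
    seq = sequence.upper()
    hits = []
    for name, probe in contaminants.items():
        probe = probe.upper()
        patterns = [("+", probe)]
        rc = reverse_complement(probe)
        if rc != probe:  # avoid double-reporting palindromic probes
            patterns.append(("-", rc))
        for strand, pattern in patterns:
            pos = seq.find(pattern)
            while pos != -1:
                hits.append((name, strand, pos))
                pos = seq.find(pattern, pos + 1)
    return hits


if __name__ == "__main__":
    # Hypothetical adaptor, for illustration only.
    adaptors = {"hypothetical_adaptor": "GATTACAGG"}
    print(find_contaminants("AAAGATTACAGGTTT", adaptors))
```

Checking the reverse complement matters because an adaptor ligated to either end of an insert can appear on either strand of the deposited sequence.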
Viability and Burden of Leishmania in Extralesional Sites during Human Dermal Leishmaniasis

PubMed Central

Romero, Ibeth; Téllez, Jair; Suárez, Yazmín; Cardona, Maria; Figueroa, Roger; Zelazny, Adrian; Gore Saravia, Nancy

2010-01-01

Background The clinical and epidemiological significance of Leishmania DNA in extralesional sites is obscured by uncertainty of whether the DNA derives from viable parasites. To examine dissemination of Leishmania during active disease and the potential participation of human infection in transmission, Leishmania 7SLRNA was exploited to establish viability and estimate parasite burden in extralesional sites of dermal leishmaniasis patients. Methods The feasibility of discriminating parasite viability by PCR of Leishmania 7SLRNA was evaluated in relation with luciferase activity of luc transfected intracellular amastigotes in dose-response assays of Glucantime cytotoxicity. Monocytes, tonsil swabs, aspirates of normal skin and lesions of 28 cutaneous and 2 mucocutaneous leishmaniasis patients were screened by kDNA amplification/Southern blot. Positive samples were analyzed by quantitative PCR of Leishmania 7SLRNA genes and transcripts. Results 7SLRNA amplification coincided with luciferase activity, confirming discrimination of parasite viability. Of 22 patients presenting kDNA in extralesional samples, Leishmania 7SLRNA genes or transcripts were detected in one or more kDNA positive samples in 100% and 73% of patients, respectively. Gene and transcript copy number amplified from extralesional tissues were comparable to lesions. 7SLRNA transcripts were detected in 13/19 (68%) monocyte samples, 5/12 (42%) tonsil swabs, 4/11 (36%) normal skin aspirates, and 22/25 (88%) lesions; genes were quantifiable in 15/19 (79%) monocyte samples, 12/13 (92%) tonsil swabs, 8/11 (73%) normal skin aspirates. Conclusion Viable parasites are present in extralesional sites, including blood monocytes, tonsils and normal skin of dermal leishmaniasis patients. Leishmania 7SLRNA is an informative target for clinical and epidemiologic investigations of human leishmaniasis. PMID:20856851
Functional and structural analysis of AT-specific minor groove binders that disrupt DNA–protein interactions and cause disintegration of the Trypanosoma brucei kinetoplast

PubMed Central

Millan, Cinthia R.; Acosta-Reyes, Francisco J.; Lagartera, Laura; Ebiloma, Godwin U.; Lemgruber, Leandro; Nué Martínez, J. Jonathan; Saperas, Núria

2017-01-01

Trypanosoma brucei, the causative agent of sleeping sickness (Human African Trypanosomiasis, HAT), contains a kinetoplast with the mitochondrial DNA (kDNA), comprising >70% AT base pairs. This has prompted studies of drugs interacting with AT-rich DNA, such as the N-phenylbenzamide bis(2-aminoimidazoline) derivatives 1 [4-((4,5-dihydro-1H-imidazol-2-yl)amino)-N-(4-((4,5-dihydro-1H-imidazol-2-yl)amino)phenyl)benzamide dihydrochloride] and 2 [N-(3-chloro-4-((4,5-dihydro-1H-imidazol-2-yl)amino)phenyl)-4-((4,5-dihydro-1H-imidazol-2-yl)amino)benzamide] as potential drugs for HAT. Both compounds show in vitro effects against T. brucei and in vivo curative activity in a mouse model of HAT. The main objective was to identify their cellular target inside the parasite. We were able to demonstrate that the compounds have a clear effect on the S-phase of the T. brucei cell cycle by inflicting specific damage on the kinetoplast. Surface plasmon resonance (SPR)–biosensor experiments show that the drug can displace HMG box-containing proteins essential for kDNA function from their kDNA binding sites. The crystal structure of the complex of the oligonucleotide d[AAATTT]2 with compound 1 solved at 1.25 Å (PDB-ID: 5LIT) shows that the drug covers the minor groove of DNA, displaces bound water and interacts with neighbouring DNA molecules as a cross-linking agent. We conclude that 1 and 2 are powerful trypanocides that act directly on the kinetoplast, a structure unique to the order Kinetoplastida. PMID:28637278
Method to amplify variable sequences without imposing primer sequences

DOEpatents

Bradbury, Andrew M.; Zeytun, Ahmet

2006-11-14

The present invention provides methods of amplifying target sequences without including regions flanking the target sequence in the amplified product or imposing amplification primer sequences on the amplified product. Also provided are methods of preparing a library from such amplified target sequences.
Quantum-Sequencing: Fast electronic single DNA molecule sequencing

NASA Astrophysics Data System (ADS)

Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

2014-03-01

A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free, high-throughput and cost-effective, single-molecule sequencing method. Here, we present the first demonstration of unique "electronic fingerprint" of all nucleotides (A, G, T, C), with single-molecule DNA sequencing, using Quantum-tunneling Sequencing (Q-Seq) at room temperature. We show that the electronic state of the nucleobases shifts depending on the pH, with most distinct states identified at acidic pH. We also demonstrate identification of single nucleotide modifications (methylation here). Using these unique electronic fingerprints (or tunneling data), we report a partial sequence of beta lactamase (bla) gene, which encodes resistance to beta-lactam antibiotics, with over 95% success rate. These results highlight the potential of Q-Seq as a robust technique for next-generation sequencing.
Multimodal sequence learning.

PubMed

Kemény, Ferenc; Meier, Beat

2016-02-01

While sequence learning research models complex phenomena, previous studies have mostly focused on unimodal sequences. The goal of the current experiment is to put implicit sequence learning into a multimodal context: to test whether it can operate across different modalities. We used the Task Sequence Learning paradigm to test whether sequence learning varies across modalities, and whether participants are able to learn multimodal sequences. Our results show that implicit sequence learning is very similar regardless of the source modality. However, the presence of correlated task and response sequences was required for learning to take place. The experiment provides new evidence for implicit sequence learning of abstract conceptual representations. In general, the results suggest that correlated sequences are necessary for implicit sequence learning to occur. Moreover, they show that elements from different modalities can be automatically integrated into one unitary multimodal sequence. Copyright © 2015 Elsevier B.V. All rights reserved.
Sequence verification of synthetic DNA by assembly of sequencing reads

PubMed Central

Wilson, Mandy L.; Cai, Yizhi; Hanlon, Regina; Taylor, Samantha; Chevreux, Bastien; Setubal, João C.; Tyler, Brett M.; Peccoud, Jean

2013-01-01

Gene synthesis attempts to assemble user-defined DNA sequences with base-level precision. Verifying the sequences of construction intermediates and the final product of a gene synthesis project is a critical part of the workflow, yet one that has received the least attention. Sequence validation is equally important for other kinds of curated clone collections. Ensuring that the physical sequence of a clone matches its published sequence is a common quality control step performed at least once over the course of a research project. GenoREAD is a web-based application that breaks the sequence verification process into two steps: the assembly of sequencing reads and the alignment of the resulting contig with a reference sequence. GenoREAD can determine if a clone matches its reference sequence. Its sophisticated reporting features help identify and troubleshoot problems that arise during the sequence verification process. GenoREAD has been experimentally validated on thousands of gene-sized constructs from an ORFeome project, and on longer sequences including whole plasmids and synthetic chromosomes. Comparing GenoREAD results with those from manual analysis of the sequencing data demonstrates that GenoREAD tends to be conservative in its diagnostic. GenoREAD is available at www.genoread.org. PMID:23042248
Multilocus sequence typing of total-genome-sequenced bacteria.

PubMed

Larsen, Mette V; Cosentino, Salvatore; Rasmussen, Simon; Friis, Carsten; Hasman, Henrik; Marvig, Rasmus Lykke; Jelsbak, Lars; Sicheritz-Pontén, Thomas; Ussery, David W; Aarestrup, Frank M; Lund, Ole

2012-04-01

Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the "gold standard" of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS) continue to decline, it becomes increasingly available to scientists and routine diagnostic laboratories. Currently, the cost is below that of traditional MLST. The new challenges will be how to extract the relevant information from the large amount of data so as to allow for comparison over time and between laboratories. Ideally, this information should also allow for comparison to historical data. We developed a Web-based method for MLST of 66 bacterial species based on WGS data. As input, the method uses short sequence reads from four sequencing platforms or preassembled genomes. Updates from the MLST databases are downloaded monthly, and the best-matching MLST alleles of the specified MLST scheme are found using a BLAST-based ranking method. The sequence type is then determined by the combination of alleles identified. The method was tested on preassembled genomes from 336 isolates covering 56 MLST schemes, on short sequence reads from 387 isolates covering 10 schemes, and on a small test set of short sequence reads from 29 isolates for which the sequence type had been determined by traditional methods. The method presented here enables investigators to determine the sequence types of their isolates on the basis of WGS data. This method is publicly available at www.cbs.dtu.dk/services/MLST.
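The final step described above, deriving a sequence type from the combination of alleles, amounts to a table lookup once the best-matching allele at each locus is known. The sketch below assumes the allele numbers have already been assigned (the published method uses a BLAST-based ranking for that). The locus names, profile table and function name are hypothetical and do not come from any real MLST scheme.

```python
from typing import Dict, Optional, Tuple

# Hypothetical three-locus scheme; real schemes typically use more loci.
LOCI: Tuple[str, ...] = ("locusA", "locusB", "locusC")

# Hypothetical profile table: allele combination -> sequence type (ST).
PROFILES: Dict[Tuple[int, ...], int] = {
    (1, 1, 1): 1,
    (1, 2, 1): 2,
    (3, 2, 1): 3,
}


def sequence_type(alleles: Dict[str, int]) -> Optional[int]:
    """Return the ST for a set of allele calls, or None if undetermined.

    None is returned when a locus is missing or when the allele
    combination is absent from the profile table (a possible novel ST).
    """
    try:
        profile = tuple(alleles[locus] for locus in LOCI)
    except KeyError:
        return None
    return PROFILES.get(profile)


if __name__ == "__main__":
    print(sequence_type({"locusA": 1, "locusB": 2, "locusC": 1}))
```

Keeping the profile table as plain data separate from the lookup logic mirrors the periodic database updates mentioned in the abstract: only the table needs refreshing.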
Individual sequences in large sets of gene sequences may be distinguished efficiently by combinations of shared sub-sequences

PubMed Central

Gibbs, Mark J; Armstrong, John S; Gibbs, Adrian J

2005-01-01

Background Most current DNA diagnostic tests for identifying organisms use specific oligonucleotide probes that are complementary in sequence to, and hence only hybridise with the DNA of one target species. By contrast, in traditional taxonomy, specimens are usually identified by 'dichotomous keys' that use combinations of characters shared by different members of the target set. Using one specific character for each target is the least efficient strategy for identification. Using combinations of shared bisectionally-distributed characters is much more efficient, and this strategy is most efficient when they separate the targets in a progressively binary way. Results We have developed a practical method for finding minimal sets of sub-sequences that identify individual sequences, and could be targeted by combinations of probes, so that the efficient strategy of traditional taxonomic identification could be used in DNA diagnosis. The sizes of minimal sub-sequence sets depended mostly on sequence diversity and sub-sequence length and interactions between these parameters. We found that 201 distinct cytochrome oxidase subunit-1 (CO1) genes from moths (Lepidoptera) were distinguished using only 15 sub-sequences 20 nucleotides long, whereas only 8–10 sub-sequences 6–10 nucleotides long were required to distinguish the CO1 genes of 92 species from the 9 largest orders of insects. Conclusion The presence/absence of sub-sequences in a set of gene sequences can be used like the questions in a traditional dichotomous taxonomic key; hybridisation probes complementary to such sub-sequences should provide a very efficient means for identifying individual species, subtypes or genotypes. Sequence diversity and sub-sequence length are the major factors that determine the numbers of distinguishing sub-sequences in any set of sequences. PMID:15817134
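The idea of identifying sequences by the presence or absence of shared sub-sequences can be sketched as follows. This is not the authors' method: it is a simple greedy heuristic, assumed here for illustration, that repeatedly picks the sub-sequence separating the most still-confused pairs, and it does not guarantee a minimal set. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct sub-sequences of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_distinguishing_kmers(seqs: Sequence[str], k: int) -> List[str]:
    """Greedily choose k-mers whose presence/absence separates every pair of sequences."""
    present = [kmers(s, k) for s in seqs]
    candidates = sorted(set().union(*present))
    unresolved = set(combinations(range(len(seqs)), 2))
    chosen: List[str] = []
    while unresolved:
        best, best_split = None, set()
        for km in candidates:
            # Pairs that this k-mer tells apart: present in exactly one member.
            split = {(i, j) for i, j in unresolved
                     if (km in present[i]) != (km in present[j])}
            if len(split) > len(best_split):
                best, best_split = km, split
        if best is None:
            raise ValueError("some sequences share identical k-mer content")
        chosen.append(best)
        unresolved -= best_split
    return chosen


def signature(seq: str, chosen: Sequence[str]) -> Tuple[bool, ...]:
    """Presence/absence pattern of the chosen k-mers, used like answers to a key."""
    return tuple(km in seq for km in chosen)
```

For the four toy sequences `ACGTAC`, `ACGTTT`, `GGGTAC` and `GGGTTT` with `k=3`, two sub-sequences suffice because each one halves the set, which mirrors the progressively binary separation described in the abstract.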
Science sequence design

NASA Technical Reports Server (NTRS)

Koskela, P. E.; Bollman, W. E.; Freeman, J. E.; Helton, M. R.; Reichert, R. J.; Travers, E. S.; Zawacki, S. J.

1973-01-01

The activities of the following members of the Navigation Team are recorded: the Science Sequence Design Group, responsible for preparing the final science sequence designs; the Advanced Sequence Planning Group, responsible for sequence planning; and the Science Recommendation Team (SRT) representatives, responsible for conducting the necessary sequence design interfaces with the teams during the mission. The interface task included science support in both advance planning and daily operations. Science sequences designed during the mission are also discussed.
Tidying Up International Nucleotide Sequence Databases: Ecological, Geographical and Sequence Quality Annotation of ITS Sequences of Mycorrhizal Fungi

PubMed Central

Tedersoo, Leho; Abarenkov, Kessy; Nilsson, R. Henrik; Schüssler, Arthur; Grelet, Gwen-Aëlle; Kohout, Petr; Oja, Jane; Bonito, Gregory M.; Veldre, Vilmar; Jairus, Teele; Ryberg, Martin; Larsson, Karl-Henrik; Kõljalg, Urmas

2011-01-01

Sequence analysis of the ribosomal RNA operon, particularly the internal transcribed spacer (ITS) region, provides a powerful tool for identification of mycorrhizal fungi. The sequence data deposited in the International Nucleotide Sequence Databases (INSD) are, however, unfiltered for quality and are often poorly annotated with metadata. To detect chimeric and low-quality sequences and assign the ectomycorrhizal fungi to phylogenetic lineages, fungal ITS sequences were downloaded from INSD, aligned within family-level groups, and examined through phylogenetic analyses and BLAST searches. By combining the fungal sequence database UNITE and the annotation and search tool PlutoF, we also added metadata from the literature to these accessions. Altogether 35,632 sequences belonged to mycorrhizal fungi or originated from ericoid and orchid mycorrhizal roots. Of these sequences, 677 were considered chimeric and 2,174 of low read quality. Information detailing country of collection, geographical coordinates, interacting taxon and isolation source were supplemented to cover 78.0%, 33.0%, 41.7% and 96.4% of the sequences, respectively. These annotated sequences are publicly available via UNITE (http://unite.ut.ee/) for downstream biogeographic, ecological and taxonomic analyses. In European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/), the annotated sequences have a special link-out to UNITE. We intend to expand the data annotation to additional genes and all taxonomic groups and functional guilds of fungi. PMID:21949797
Pulse sequence programming in a dynamic visual environment: SequenceTree.

PubMed

Magland, Jeremy F; Li, Cheng; Langham, Michael C; Wehrli, Felix W

2016-01-01

To describe SequenceTree, an open source, integrated software environment for implementing MRI pulse sequences and, ideally, exporting them to actual MRI scanners. The software is a user-friendly alternative to vendor-supplied pulse sequence design and editing tools and is suited for programmers and nonprogrammers alike. The integrated user interface was programmed using the Qt4/C++ toolkit. As parameters and code are modified, the pulse sequence diagram is automatically updated within the user interface. Several aspects of pulse programming are handled automatically, allowing users to focus on higher-level aspects of sequence design. Sequences can be simulated using a built-in Bloch equation solver and then exported for use on a Siemens MRI scanner. Ideally, other types of scanners will be supported in the future. SequenceTree has been used for 8 years in our laboratory and elsewhere and has contributed to more than 50 peer-reviewed publications in areas such as cardiovascular imaging, solid state and nonproton NMR, MR elastography, and high-resolution structural imaging. SequenceTree is an innovative, open source, visual pulse sequence environment for MRI combining simplicity with flexibility and is ideal both for advanced users and users with limited programming experience. © 2015 Wiley Periodicals, Inc.
[Complete genome sequencing and sequence analysis of BCG Tice].

PubMed

Wang, Zhiming; Pan, Yuanlong; Wu, Jun; Zhu, Baoli

2012-10-04

The objective of this study was to obtain the complete genome sequence of Bacillus Calmette-Guerin Tice (BCG Tice), in order to provide more information about the molecular biology of BCG Tice and to design more reasonable vaccines to prevent tuberculosis. We assembled the data from high-throughput sequencing with SOAPdenovo software, obtaining many contigs and scaffolds. Many sequence gaps and physical gaps remained as a result of regional low coverage and low quality. We designed primers at the ends of contigs and performed PCR amplification in order to link these contigs and scaffolds. With various enzymes for PCR amplification, adjustment of the PCR reaction conditions, and clone construction for sequencing, all the gaps were closed. We obtained the complete genome sequence of BCG Tice and submitted it to GenBank at the National Center for Biotechnology Information (NCBI). The genome of BCG Tice is 4334064 base pairs in length, with a GC content of 65.65%. The problems and strategies encountered during the finishing step of BCG Tice sequencing are described here, with the hope of offering some experience to those who are involved in the finishing step of genome sequencing. The microarray data were verified by our results.
Sequencing, Analysis, and Annotation of Expressed Sequence Tags for Camelus dromedarius

PubMed Central

Al-Swailem, Abdulaziz M.; Shehata, Maher M.; Abu-Duhier, Faisel M.; Al-Yamani, Essam J.; Al-Busadah, Khalid A.; Al-Arawi, Mohammed S.; Al-Khider, Ali Y.; Al-Muhaimeed, Abdullah N.; Al-Qahtani, Fahad H.; Manee, Manee M.; Al-Shomrani, Badr M.; Al-Qhtani, Saad M.; Al-Harthi, Amer S.; Akdemir, Kadir C.; Otu, Hasan H.

2010-01-01

Despite its economic, cultural, and biological importance, there has not been a large-scale sequencing project to date for Camelus dromedarius. With the goal of sequencing the complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, clustering and assembly, we obtained 23,602 putative gene sequences, of which over 4,500 potentially novel or fast-evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full-length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and ∼40% of hits extending to the start codons of full-length cDNAs, suggesting successful characterization of camel genes. Similarity analyses were done separately for different organisms including human, mouse, bovine, and rat. The accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools, with the possibility of adding sequences from the public domain. We anticipate that our results will provide a home base for genomic studies of camel and other comparative studies, enabling a starting point for whole genome sequencing of the organism. PMID:20502665
Sequence to Sequence - Video to Text

DTIC Science & Technology

2015-12-11

Saenko, and S. Guadarrama. Generating natural-language video descriptions using text-mined knowledge. In AAAI, July 2013. 2 [20] P. Kuznetsova, V... Sequence to Sequence – Video to Text Subhashini Venugopalan1 Marcus Rohrbach2,4 Jeff Donahue2 Raymond Mooney1 Trevor Darrell2 Kate Saenko3... 1. Introduction Describing visual content with natural language text has recently received increased interest, especially describing images with a
Feedback shift register sequences versus uniformly distributed random sequences for correlation chromatography

NASA Technical Reports Server (NTRS)

Kaljurand, M.; Valentin, J. R.; Shao, M.

1996-01-01

Two alternative input sequences are commonly employed in correlation chromatography (CC): sequences derived according to the feedback shift register algorithm (i.e., pseudo-random binary sequences (PRBS)) and uniform random binary sequences (URBS). These two types of sequence are compared. By applying the "cleaning" data processing technique to the correlograms that result from these sequences, we show that when the PRBS is used the S/N of the correlogram is much higher than that obtained with URBS.
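For readers unfamiliar with feedback shift register sequences, the sketch below generates a short PRBS and a URBS for comparison. The register length (4 stages), the feedback taps (XOR of stages 3 and 4) and the start state are assumptions chosen for a compact example, not the configuration used in the report; the function names are likewise illustrative.

```python
import random
from typing import List, Sequence


def prbs(taps: Sequence[int], state: Sequence[int], n: int) -> List[int]:
    """Emit n bits from a linear feedback shift register.

    taps are 1-indexed register stages whose XOR is fed back into stage 1;
    the output is taken from the last stage before each shift.
    """
    reg = list(state)
    out = []
    for _ in range(n):
        out.append(reg[-1])
        feedback = 0
        for t in taps:
            feedback ^= reg[t - 1]
        reg = [feedback] + reg[:-1]  # shift by one stage
    return out


def urbs(n: int, seed: int = 0) -> List[int]:
    """Uniform random binary sequence: each bit is drawn independently."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(n)]


def circular_autocorrelation(bits: Sequence[int], lag: int) -> int:
    """Autocorrelation of the +/-1 mapped sequence at a circular lag."""
    n = len(bits)
    signed = [1 if b else -1 for b in bits]
    return sum(signed[i] * signed[(i + lag) % n] for i in range(n))
```

With these taps and the start state `[1, 0, 0, 0]`, the register cycles through all 15 non-zero states, so one period has 15 bits and its circular autocorrelation is 15 at lag 0 and -1 at every other lag. A URBS of the same length generally has no such flat off-peak autocorrelation.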
Sequence Bundles: a novel method for visualising, discovering and exploring sequence motifs

PubMed Central

2014-01-01

Background We introduce Sequence Bundles--a novel data visualisation method for representing multiple sequence alignments (MSAs). We identify and address key limitations of the existing bioinformatics data visualisation methods (i.e. the Sequence Logo) by enabling Sequence Bundles to give salient visual expression to sequence motifs and other data features, which would otherwise remain hidden. Methods For the development of Sequence Bundles we employed research-led information design methodologies. Sequences are encoded as uninterrupted, semi-opaque lines plotted on a 2-dimensional reconfigurable grid. Each line represents a single sequence. The thickness and opacity of the stack at each residue in each position indicates the level of conservation and the lines' curved paths expose patterns in correlation and functionality. Several MSAs can be visualised in a composite image. The Sequence Bundles method is designed to favour a tangible, continuous and intuitive display of information. Results We have developed a software demonstration application for generating a Sequence Bundles visualisation of MSAs provided for the BioVis 2013 redesign contest. A subsequent exploration of the visualised line patterns allowed for the discovery of a number of interesting features in the dataset. Reported features include the extreme conservation of sequences displaying a specific residue and bifurcations of the consensus sequence. Conclusions Sequence Bundles is a novel method for visualisation of MSAs and the discovery of sequence motifs. It can aid in generating new insight and hypothesis making. Sequence Bundles is well disposed for future implementation as an interactive visual analytics software, which can complement existing visualisation tools. PMID:25237395

Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags

PubMed Central

de Souza, Sandro J.; Camargo, Anamaria A.; Briones, Marcelo R. S.; Costa, Fernando F.; Nagai, Maria Aparecida; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; de Fátima Sonati, Maria; Tajara, Eloiza H.; Valentini, Sandro R.; Acencio, Marcio; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Bengtson, Mário Henrique; Carraro, Dirce M.; Carvalho, Alex F.; Carvalho, Lúcia Helena; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Costa, Maria Cristina R.; Curcio, Cyntia; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Leite, Luciana C. C.; Maia, Gustavo; Majumder, Paromita; Marins, Mozart; Matsukuma, Adriana; Melo, Analy S. A.; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana Gilbert; Rahal, Paula; Rainho, Claudia A.; da Ro's, Nancy; de Sá, Renata G.; Sales, Magaly M.; da Silva, Neusa P.; Silva, Tereza C.; da Silva, Wilson; Simão, Daniel F.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Zalcberg, Heloisa; Brentani, Ricardo R.; Reis, Luis F. L.; Dias-Neto, Emmanuel; Simpson, Andrew J. G.

2000-01-01

Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central coding regions of the resulting transcripts. They are termed ORF expressed sequence tags (ORESTES). The 250,000 ORESTES were assembled into 81,429 contigs. Of these, 1,181 (1.45%) were found to match sequences in chromosome 22 with at least one ORESTES contig for 162 (65.6%) of the 247 known genes, for 67 (44.6%) of the 150 related genes, and for 45 of the 148 (30.4%) EST-predicted genes on this chromosome. Using a set of stringent criteria to validate our sequences, we identified a further 219 previously unannotated transcribed sequences on chromosome 22. Of these, 171 were in fact also defined by EST or full length cDNA sequences available in GenBank but not utilized in the initial annotation of the first human chromosome sequence. Thus despite representing less than 15% of all expressed human sequences in the public databases at the time of the present analysis, ORESTES sequences defined 48 transcribed sequences on chromosome 22 not defined by other sequences. All of the transcribed sequences defined by ORESTES coincided with DNA regions predicted as encoding exons by genscan (http://genes.mit.edu/GENSCAN.html). PMID:11070084
Cloning and characterization of a DNA polymerase beta gene from Trypanosoma cruzi.

PubMed

Venegas, Juan A; Aslund, Lena; Solari, Aldo

2009-06-01

A gene coding for a DNA polymerase beta from the Trypanosoma cruzi Miranda clone, belonging to the TcI lineage, was cloned (Miranda Tcpol beta), using the information from eight peptides of the T. cruzi beta-like DNA polymerase purified previously. The gene encodes a protein of 403 amino acids which is very similar to the two published T. cruzi CL Brener (TcIIe lineage) sequences, but has three different residues in highly conserved segments. At the amino acid level, the identity of TcI-pol beta with mitochondrial pol beta and pol beta-PAK from other trypanosomatids was between 68-80% and 22-30%, respectively. Miranda Tc-pol beta protein has an N-terminal sequence similar to that described in the mitochondrial Crithidia fasciculata pol beta, which suggests that the TcI-pol beta plays a role in the organelle. Northern and Western analyses showed that this T. cruzi gene is highly expressed in both proliferative and non-proliferative developmental forms. These results suggest that, in addition to replication of kDNA in proliferative cells, this enzyme may have another function in non-proliferative cells, such as a DNA repair role similar to that which has been extensively described in a vast spectrum of eukaryotic cells.
Molecular Detection of Leishmania major and L. turanica in Phlebotomus papatasi and First Natural Infection of P. salehi to L. major in North-East of Iran.

PubMed

Rafizadeh, Sayena; Saraei, Mehrzad; Abaei, Mohammad Reza; Oshaghi, Mohammad Ali; Mohebali, Mehdi; Peymani, Amir; Naserpour-Farivar, Taghi; Bakhshi, Hassan; Rassi, Yavar

2016-06-01

Leishmaniasis is an important public health disease in many developing countries as well as in Iran. The main objective of this study was to investigate Leishmania infection of wild-caught sand flies in an endemic focus of the disease in Esfarayen district, north-east of Iran. Sand flies were collected by sticky papers and mounted in a drop of Puri's medium for species identification. Polymerase chain reaction techniques targeting kDNA and ITS1-rDNA, followed by restriction fragment length polymorphism, were used for identification of DNA of Leishmania parasites within infected sand flies. Among the collected female sand flies, two species, Phlebotomus papatasi and Phlebotomus salehi, were found naturally infected with Leishmania major. Furthermore, mixed infection of Leishmania turanica and L. major was observed in one specimen of P. papatasi. Sequence analysis revealed two parasite ITS1 haplotypes, including three L. major with accession numbers KJ425408, KJ425407 and KM056403, and one L. turanica (KJ425406). The haplotype of L. major was identical (100%) to several L. major sequences deposited in GenBank, including isolates from Iran (GenBank accession nos. AY573187, KC505421, KJ194178) and Uzbekistan (accession no. FN677357). To our knowledge, this is the first detection of L. major within wild-caught P. salehi in north-east Iran.
Sequence Diversity Diagram for comparative analysis of multiple sequence alignments.

PubMed

Sakai, Ryo; Aerts, Jan

2014-01-01

The sequence logo is a graphical representation of a set of aligned sequences, commonly used to depict conservation of amino acid or nucleotide sequences. Although it effectively communicates the amount of information present at every position, this visual representation falls short when the domain task is to compare between two or more sets of aligned sequences. We present a new visual presentation called a Sequence Diversity Diagram and validate our design choices with a case study. Our software was developed using the open-source program called Processing. It loads multiple sequence alignment FASTA files and a configuration file, which can be modified as needed to change the visualization. The redesigned figure improves on the visual comparison of two or more sets, and it additionally encodes information on sequential position conservation. In our case study of the adenylate kinase lid domain, the Sequence Diversity Diagram reveals unexpected patterns and new insights, for example the identification of subgroups within the protein subfamily. Our future work will integrate this visual encoding into interactive visualization tools to support higher level data exploration tasks.
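The "amount of information present at every position" that a sequence logo displays is commonly computed per alignment column as the maximum possible entropy minus the observed entropy. The sketch below shows that calculation under simplifying assumptions: no small-sample correction is applied, every character (including gaps, if present) is counted as a symbol, and the function name is illustrative rather than part of the authors' software.

```python
import math
from collections import Counter
from typing import List, Sequence


def column_information(alignment: Sequence[str], alphabet_size: int) -> List[float]:
    """Information content in bits for each column of an equal-length alignment."""
    n_cols = len(alignment[0])
    info = []
    for c in range(n_cols):
        counts = Counter(seq[c] for seq in alignment)
        total = sum(counts.values())
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        # A fully conserved column reaches log2(alphabet_size) bits.
        info.append(math.log2(alphabet_size) - entropy)
    return info
```

For the toy nucleotide alignment `["AC", "AG", "AT", "AA"]`, the first column is fully conserved (2 bits) and the second is uniformly variable (0 bits). Comparing two sets of aligned sequences then amounts to comparing two such profiles, which is the task the abstract says a single logo handles poorly.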
MRO Sequence Checking Tool

NASA Technical Reports Server (NTRS)

Fisher, Forest; Gladden, Roy; Khanampornpan, Teerapat

2008-01-01

The MRO Sequence Checking Tool program, mro_check, automates significant portions of the MRO (Mars Reconnaissance Orbiter) sequence checking procedure. Though MRO has similar checks to the ODY's (Mars Odyssey) Mega Check tool, the checks needed for MRO are unique to the MRO spacecraft. The MRO sequence checking tool automates the majority of the sequence validation procedure and check lists that are used to validate the sequences generated by MRO MPST (mission planning and sequencing team). The tool performs more than 50 different checks on the sequence. The automation varies from summarizing data about the sequence needed for visual verification of the sequence, to performing automated checks on the sequence and providing a report for each step. To allow for the addition of new checks as needed, this tool is built in a modular fashion.
Megabase sequencing of human genome by ordered-shotgun-sequencing (OSS) strategy

NASA Astrophysics Data System (ADS)

Chen, Ellson Y.

1997-05-01

So far we have used the OSS strategy to sequence over 2 megabases of DNA in large-insert clones from regions of human X chromosomes with different characteristic levels of GC content. The method starts by randomly fragmenting a BAC, YAC or PAC to 8-12 kb pieces and subcloning those into lambda phage. Insert-ends of these clones are sequenced and overlapped to create a partial map. Complete sequencing is then done on a minimal tiling path of selected subclones, recursively focusing on those at the edges of contigs to facilitate mergers of clones across the entire target. To reduce manual labor, PCR processes have been adapted to prepare sequencing templates throughout the entire operation. The streamlined process can thus lend itself to further automation. The OSS approach is suitable for large-scale genomic sequencing, providing considerable flexibility in the choice of subclones or regions for more or less intensive sequencing. For example, subclones containing contaminating host cell DNA or cloning vector can be recognized and ignored with minimal sequencing effort; regions overlapping a neighboring clone already sequenced need not be redone; and segments containing tandem repeats or long repetitive sequences can be spotted early on and targeted for additional attention.
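Selecting a minimal tiling path can be illustrated with the classic greedy interval-cover algorithm. The sketch assumes that each subclone has already been placed on the target as a (start, end) coordinate pair, which in the OSS strategy would come from the end-sequence overlap map; the coordinates and the function name are hypothetical, and the abstract does not state which selection procedure the author used.

```python
from typing import List, Sequence, Tuple

Clone = Tuple[int, int]  # (start, end) position of a subclone on the target


def minimal_tiling_path(clones: Sequence[Clone], target_start: int,
                        target_end: int) -> List[Clone]:
    """Choose the fewest subclones whose union covers [target_start, target_end]."""
    ordered = sorted(clones)
    path: List[Clone] = []
    covered = target_start
    i = 0
    while covered < target_end:
        best = None
        # Among clones starting inside the covered region, take the one
        # that reaches furthest to the right.
        while i < len(ordered) and ordered[i][0] <= covered:
            if best is None or ordered[i][1] > best[1]:
                best = ordered[i]
            i += 1
        if best is None or best[1] <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best)
        covered = best[1]
    return path
```

Subclones that are skipped by the greedy choice correspond to regions that, as the abstract notes, overlap a neighboring clone and need not be sequenced again.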
Software for pre-processing Illumina next-generation sequencing short read sequences

PubMed Central

2014-01-01

Background When compared to Sanger sequencing technology, next-generation sequencing (NGS) technologies are hindered by shorter sequence read length, higher base-call error rate, non-uniform coverage, and platform-specific sequencing artifacts. These characteristics lower the quality of their downstream analyses, e.g. de novo and reference-based assembly, by introducing sequencing artifacts and errors that may contribute to incorrect interpretation of data. Although many tools have been developed for quality control and pre-processing of NGS data, none of them provide flexible and comprehensive trimming options in conjunction with parallel processing to expedite pre-processing of large NGS datasets. Methods We developed ngsShoRT (next-generation sequencing Short Reads Trimmer), a flexible and comprehensive open-source software package written in Perl that provides a set of algorithms commonly used for pre-processing NGS short read sequences. We compared the features and performance of ngsShoRT with existing tools: CutAdapt, NGS QC Toolkit and Trimmomatic. We also compared the effects of using pre-processed short read sequences generated by different algorithms on de novo and reference-based assembly for three different genomes: Caenorhabditis elegans, Saccharomyces cerevisiae S288c, and Escherichia coli O157 H7. Results Several combinations of ngsShoRT algorithms were tested on publicly available Illumina GA II, HiSeq 2000, and MiSeq eukaryotic and bacteria genomic short read sequences with the focus on removing sequencing artifacts and low-quality reads and/or bases. Our results show that across three organisms and three sequencing platforms, trimming improved the mean quality scores of trimmed sequences. Using trimmed sequences for de novo and reference-based assembly improved assembly quality as well as assembler performance. In general, ngsShoRT outperformed comparable trimming tools in terms of trimming speed and improvement of de novo and reference
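As a rough illustration of what removing low-quality bases involves, the sketch below trims the 3' end of a read until a base meets a minimum Phred score. It is not one of the ngsShoRT algorithms (ngsShoRT itself is written in Perl); the function name, the threshold of 20 and the Phred+33 quality encoding are assumptions made for the example.

```python
from typing import Tuple


def trim_3prime(seq: str, qual: str, min_q: int = 20,
                offset: int = 33) -> Tuple[str, str]:
    """Remove trailing bases whose Phred quality is below min_q.

    qual is the FASTQ quality string, assumed to use the given ASCII offset.
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]
```

A read whose bases all fall below the threshold is trimmed to an empty string, which a pre-processing pipeline would typically discard.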
Coordinate cytokine regulatory sequences

DOEpatents

Frazer, Kelly A.; Rubin, Edward M.; Loots, Gabriela G.

2005-05-10

The present invention provides CNS sequences that regulate the cytokine gene expression, expression cassettes and vectors comprising or lacking the CNS sequences, host cells and non-human transgenic animals comprising the CNS sequences or lacking the CNS sequences. The present invention also provides methods for identifying compounds that modulate the functions of CNS sequences as well as methods for diagnosing defects in the CNS sequences of patients.
Sequence investigation of 34 forensic autosomal STRs with massively parallel sequencing.

PubMed

Zhang, Suhua; Niu, Yong; Bian, Yingnan; Dong, Rixia; Liu, Xiling; Bao, Yun; Jin, Chao; Zheng, Hancheng; Li, Chengtao

2018-05-01

STRs vary not only in the length of the repeat units and the number of repeats but also in the region with which they conform to an incremental repeat pattern. Massively parallel sequencing (MPS) offers new possibilities in the analysis of STRs since it can simultaneously sequence multiple targets in a single reaction and capture potential internal sequence variations. Here, we sequenced 34 STRs applied in the forensic community of China with a custom-designed panel. MPS performance was evaluated through sequencing read analysis, a concordance study and sensitivity testing. High-coverage sequencing data were obtained to determine the constitute ratios and heterozygous balance. No actual inconsistent genotypes were observed between capillary electrophoresis (CE) and MPS, demonstrating the reliability of the panel and the MPS technology. With the sequencing data from the 200 investigated individuals, 346 and 418 alleles were obtained via CE and MPS technologies, respectively, at the 34 STRs, indicating that MPS technology provides higher discrimination than CE detection. The whole study demonstrated that STR genotyping with the custom panel and MPS technology has the potential not only to reveal length and sequence variations but also to satisfy the demands of high throughput and high multiplexing with acceptable sensitivity.
Metagenome assembly through clustering of next-generation sequencing data using protein sequences.

PubMed

Sim, Mikang; Kim, Jaebum

2015-02-01

The study of environmental microbial communities, called metagenomics, has gained a lot of attention because of the recent advances in next-generation sequencing (NGS) technologies. Microbes play a critical role in changing their environments, and the mode of their effect can be solved by investigating metagenomes. However, the difficulty of metagenomes, such as the combination of multiple microbes and different species abundance, makes metagenome assembly tasks more challenging. In this paper, we developed a new metagenome assembly method by utilizing protein sequences, in addition to the NGS read sequences. Our method (i) builds read clusters by using mapping information against available protein sequences, and (ii) creates contig sequences by finding consensus sequences through probabilistic choices from the read clusters. By using simulated NGS read sequences from real microbial genome sequences, we evaluated our method in comparison with four existing assembly programs. We found that our method could generate relatively long and accurate metagenome assemblies, indicating that the idea of using protein sequences, as a guide for the assembly, is promising. Copyright © 2015 Elsevier B.V. All rights reserved.
Pulse Sequence Programming in a Dynamic Visual Environment: SequenceTree

PubMed Central

Magland, Jeremy F.; Li, Cheng; Langham, Michael C.; Wehrli, Felix W.

2015-01-01

Purpose To describe SequenceTree (ST), an open source, integrated software environment for implementing MRI pulse sequences and, ideally, exporting them to actual MRI scanners. The software is a user-friendly alternative to vendor-supplied pulse sequence design and editing tools and is suited for non-programmers and programmers alike. Methods The integrated user interface was programmed using the Qt4/C++ toolkit. As parameters and code are modified, the pulse sequence diagram is automatically updated within the user interface. Several aspects of pulse programming are handled automatically, allowing users to focus on higher-level aspects of sequence design. Sequences can be simulated using a built-in Bloch equation solver and then exported for use on a Siemens MRI scanner. Ideally, other types of scanners will be supported in the future. Results The software has been used for eight years in the authors’ laboratory and elsewhere and has been utilized in more than fifty peer-reviewed publications in areas such as cardiovascular imaging, solid state and non-proton NMR, MR elastography, and high resolution structural imaging. Conclusion ST is an innovative, open source, visual pulse sequence environment for MRI combining simplicity with flexibility and is ideal for both advanced users and those with limited programming experience. PMID:25754837
Probabilistic Motor Sequence Yields Greater Offline and Less Online Learning than Fixed Sequence

PubMed Central

Du, Yue; Prashad, Shikha; Schoenbrun, Ilana; Clark, Jane E.

2016-01-01

It is well acknowledged that motor sequences can be learned quickly through online learning. Subsequently, the initial acquisition of a motor sequence is boosted or consolidated by offline learning. However, little is known whether offline learning can drive the fast learning of motor sequences (i.e., initial sequence learning in the first training session). To examine offline learning in the fast learning stage, we asked four groups of young adults to perform the serial reaction time (SRT) task with either a fixed or probabilistic sequence and with or without preliminary knowledge (PK) of the presence of a sequence. The sequence and PK were manipulated to emphasize either procedural (probabilistic sequence; no preliminary knowledge (NPK)) or declarative (fixed sequence; with PK) memory that were found to either facilitate or inhibit offline learning. In the SRT task, there were six learning blocks with a 2 min break between each consecutive block. Throughout the session, stimuli followed the same fixed or probabilistic pattern except in Block 5, in which stimuli appeared in a random order. We found that PK facilitated the learning of a fixed sequence, but not a probabilistic sequence. In addition to overall learning measured by the mean reaction time (RT), we examined the progressive changes in RT within and between blocks (i.e., online and offline learning, respectively). It was found that the two groups who performed the fixed sequence, regardless of PK, showed greater online learning than the other two groups who performed the probabilistic sequence. The groups who performed the probabilistic sequence, regardless of PK, did not display online learning, as indicated by a decline in performance within the learning blocks. However, they did demonstrate remarkably greater offline improvement in RT, which suggests that they are learning the probabilistic sequence offline. These results suggest that in the SRT task, the fast acquisition of a motor sequence is driven
Probabilistic Motor Sequence Yields Greater Offline and Less Online Learning than Fixed Sequence.

PubMed

Du, Yue; Prashad, Shikha; Schoenbrun, Ilana; Clark, Jane E

2016-01-01

It is well acknowledged that motor sequences can be learned quickly through online learning. Subsequently, the initial acquisition of a motor sequence is boosted or consolidated by offline learning. However, little is known whether offline learning can drive the fast learning of motor sequences (i.e., initial sequence learning in the first training session). To examine offline learning in the fast learning stage, we asked four groups of young adults to perform the serial reaction time (SRT) task with either a fixed or probabilistic sequence and with or without preliminary knowledge (PK) of the presence of a sequence. The sequence and PK were manipulated to emphasize either procedural (probabilistic sequence; no preliminary knowledge (NPK)) or declarative (fixed sequence; with PK) memory that were found to either facilitate or inhibit offline learning. In the SRT task, there were six learning blocks with a 2 min break between each consecutive block. Throughout the session, stimuli followed the same fixed or probabilistic pattern except in Block 5, in which stimuli appeared in a random order. We found that PK facilitated the learning of a fixed sequence, but not a probabilistic sequence. In addition to overall learning measured by the mean reaction time (RT), we examined the progressive changes in RT within and between blocks (i.e., online and offline learning, respectively). It was found that the two groups who performed the fixed sequence, regardless of PK, showed greater online learning than the other two groups who performed the probabilistic sequence. The groups who performed the probabilistic sequence, regardless of PK, did not display online learning, as indicated by a decline in performance within the learning blocks. However, they did demonstrate remarkably greater offline improvement in RT, which suggests that they are learning the probabilistic sequence offline. These results suggest that in the SRT task, the fast acquisition of a motor sequence is driven
Rapid and Accurate Sequencing of Enterovirus Genomes Using MinION Nanopore Sequencer.

PubMed

Wang, Ji; Ke, Yue Hua; Zhang, Yong; Huang, Ke Qiang; Wang, Lei; Shen, Xin Xin; Dong, Xiao Ping; Xu, Wen Bo; Ma, Xue Jun

2017-10-01

Knowledge of an enterovirus genome sequence is very important in epidemiological investigation to identify transmission patterns and ascertain the extent of an outbreak. The MinION sequencer is increasingly used to sequence various viral pathogens in many clinical situations because of its long reads, portability, real-time accessibility of sequenced data, and very low initial costs. However, information is lacking on MinION sequencing of enterovirus genomes. In this proof-of-concept study using Enterovirus 71 (EV71) and Coxsackievirus A16 (CA16) strains as examples, we established an amplicon-based whole genome sequencing method using MinION. We explored the accuracy, minimum sequencing time, discrimination and high-throughput sequencing ability of MinION, and compared its performance with Sanger sequencing. Within the first minute (min) of sequencing, the accuracy of MinION was 98.5% for the single EV71 strain and 94.12%-97.33% for 10 genetically-related CA16 strains. In as little as 14 min, 99% identity was reached for the single EV71 strain, and in 17 min (on average), 99% identity was achieved for 10 CA16 strains in a single run. MinION is suitable for whole genome sequencing of enteroviruses with sufficient accuracy and fine discrimination and has the potential as a fast, reliable and convenient method for routine use. Copyright © 2017 The Editorial Board of Biomedical and Environmental Sciences. Published by China CDC. All rights reserved.
[PCR as a tool in confirming the experimental transmission of Leishmania chagasi to hamsters by Lutzomyia longipalpis (Diptera:Psychodidae)].

PubMed

Cabrera, Olga L; Munstermann, Leonard E; Cárdenas, Rocío; Ferro, Cristina

2003-06-01

The use of PCR (polymerase chain reaction) was evaluated for its effectiveness as a tool in the detection of transmission of Leishmania chagasi to a hamster host, Mesocricetus auratus, by insect vector bite. Two pairs of uninfected and anesthetized hamsters were introduced into cages containing infected females of the typical phlebotomine sand fly vector, Lutzomyia longipalpis. The flies were experimentally infected with Leishmania chagasi and the infection was verified by dissection of subsamples. At 37 and 51 days after exposure to the infected flies, biopsies of each hamster's liver and spleen were subjected to direct histopathological and PCR examination. DNA was extracted with Chelex 100; for PCR amplification, primers specific to Leishmania minicircle DNA were used. PCR product was separated on agarose gels and visualized with UV. A band of approximately 120 base pairs was observed in 3 of the 4 biopsies, corresponding to the expected minicircle size. PCR was the only method that detected presence of the parasite. The results demonstrated that the sensitivity of PCR greatly expedites the confirmation process of a particular phlebotomine species as a vector of leishmaniasis.
Integrating alignment-based and alignment-free sequence similarity measures for biological sequence classification.

PubMed

Borozan, Ivan; Watt, Stuart; Ferretti, Vincent

2015-05-01

Alignment-based sequence similarity searches, while accurate for some type of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed in many bacterial and viral genomes. Here, we propose a classification model that exploits the complementary nature of alignment-based and alignment-free similarity measures with the aim to improve the accuracy with which DNA and protein sequences are characterized. Our model classifies sequences using a combined sequence similarity score calculated by adaptively weighting the contribution of different sequence similarity measures. Weights are determined independently for each sequence in the test set and reflect the discriminatory ability of individual similarity measures in the training set. Because the similarity between some sequences is determined more accurately with one type of measure rather than another, our classifier allows different sets of weights to be associated with different sequences. Using five different similarity measures, we show that our model significantly improves the classification accuracy over the current composition- and alignment-based models, when predicting the taxonomic lineage for both short viral sequence fragments and complete viral sequences. We also show that our model can be used effectively for the classification of reads from a real metagenome dataset as well as protein sequences. All the datasets and the code used in this study are freely available at https://collaborators.oicr.on.ca/vferretti/borozan_csss/csss.html. Contact: ivan.borozan@gmail.com. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
Integrating alignment-based and alignment-free sequence similarity measures for biological sequence classification

PubMed Central

Borozan, Ivan; Watt, Stuart; Ferretti, Vincent

2015-01-01

Motivation: Alignment-based sequence similarity searches, while accurate for some type of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed in many bacterial and viral genomes. Here, we propose a classification model that exploits the complementary nature of alignment-based and alignment-free similarity measures with the aim to improve the accuracy with which DNA and protein sequences are characterized. Results: Our model classifies sequences using a combined sequence similarity score calculated by adaptively weighting the contribution of different sequence similarity measures. Weights are determined independently for each sequence in the test set and reflect the discriminatory ability of individual similarity measures in the training set. Because the similarity between some sequences is determined more accurately with one type of measure rather than another, our classifier allows different sets of weights to be associated with different sequences. Using five different similarity measures, we show that our model significantly improves the classification accuracy over the current composition- and alignment-based models, when predicting the taxonomic lineage for both short viral sequence fragments and complete viral sequences. We also show that our model can be used effectively for the classification of reads from a real metagenome dataset as well as protein sequences. Availability and implementation: All the datasets and the code used in this study are freely available at https://collaborators.oicr.on.ca/vferretti/borozan_csss/csss.html. Contact: ivan.borozan@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25573913
What are Whole Exome Sequencing and Whole Genome Sequencing?

MedlinePlus

... the future. For more information about DNA sequencing technologies and their use: Genetics Home Reference discusses whether ... University in St. Louis describes the different sequencing technologies and what the new technologies have meant for ...
PIMS sequencing extension: a laboratory information management system for DNA sequencing facilities

PubMed Central

2011-01-01

Background: Facilities that provide a service for DNA sequencing typically support large numbers of users and experiment types. The cost of services is often reduced by the use of liquid handling robots but the efficiency of such facilities is hampered because the software for such robots does not usually integrate well with the systems that run the sequencing machines. Accordingly, there is a need for software systems capable of integrating different robotic systems and managing sample information for DNA sequencing services. In this paper, we describe an extension to the Protein Information Management System (PIMS) that is designed for DNA sequencing facilities. The new version of PIMS has a user-friendly web interface and integrates all aspects of the sequencing process, including sample submission, handling and tracking, together with capture and management of the data. Results: The PIMS sequencing extension has been in production since July 2009 at the University of Leeds DNA Sequencing Facility. It has completely replaced manual data handling and simplified the tasks of data management and user communication. Samples from 45 groups have been processed with an average throughput of 10000 samples per month. The current version of the PIMS sequencing extension works with Applied Biosystems 3130XL 96-well plate sequencer and MWG 4204 or Aviso Theonyx liquid handling robots, but is readily adaptable for use with other combinations of robots. Conclusions: PIMS has been extended to provide a user-friendly and integrated data management solution for DNA sequencing facilities that is accessed through a normal web browser and allows simultaneous access by multiple users as well as facility managers. The system integrates sequencing and liquid handling robots, manages the data flow, and provides remote access to the sequencing results. The software is freely available, for academic users, from http://www.pims-lims.org/. PMID:21385349
PIMS sequencing extension: a laboratory information management system for DNA sequencing facilities.

PubMed

Troshin, Peter V; Postis, Vincent Lg; Ashworth, Denise; Baldwin, Stephen A; McPherson, Michael J; Barton, Geoffrey J

2011-03-07

Facilities that provide a service for DNA sequencing typically support large numbers of users and experiment types. The cost of services is often reduced by the use of liquid handling robots but the efficiency of such facilities is hampered because the software for such robots does not usually integrate well with the systems that run the sequencing machines. Accordingly, there is a need for software systems capable of integrating different robotic systems and managing sample information for DNA sequencing services. In this paper, we describe an extension to the Protein Information Management System (PIMS) that is designed for DNA sequencing facilities. The new version of PIMS has a user-friendly web interface and integrates all aspects of the sequencing process, including sample submission, handling and tracking, together with capture and management of the data. The PIMS sequencing extension has been in production since July 2009 at the University of Leeds DNA Sequencing Facility. It has completely replaced manual data handling and simplified the tasks of data management and user communication. Samples from 45 groups have been processed with an average throughput of 10000 samples per month. The current version of the PIMS sequencing extension works with Applied Biosystems 3130XL 96-well plate sequencer and MWG 4204 or Aviso Theonyx liquid handling robots, but is readily adaptable for use with other combinations of robots. PIMS has been extended to provide a user-friendly and integrated data management solution for DNA sequencing facilities that is accessed through a normal web browser and allows simultaneous access by multiple users as well as facility managers. The system integrates sequencing and liquid handling robots, manages the data flow, and provides remote access to the sequencing results. The software is freely available, for academic users, from http://www.pims-lims.org/.

Effects of the Ion PGM™ Hi-Q™ sequencing chemistry on sequence data quality.

PubMed

Churchill, Jennifer D; King, Jonathan L; Chakraborty, Ranajit; Budowle, Bruce

2016-09-01

Massively parallel sequencing (MPS) offers substantial improvements over current forensic DNA typing methodologies such as increased resolution, scalability, and throughput. The Ion PGM™ is a promising MPS platform for analysis of forensic biological evidence. The system employs a sequencing-by-synthesis chemistry on a semiconductor chip that measures a pH change due to the release of hydrogen ions as nucleotides are incorporated into the growing DNA strands. However, implementation of MPS into forensic laboratories requires a robust chemistry. Ion Torrent's Hi-Q™ Sequencing Chemistry was evaluated to determine if it could improve on the quality of the generated sequence data in association with selected genetic marker targets. The whole mitochondrial genome and the HID-Ion STR 10-plex panel were sequenced on the Ion PGM™ system with the Ion PGM™ Sequencing 400 Kit and the Ion PGM™ Hi-Q™ Sequencing Kit. Concordance, coverage, strand balance, noise, and deletion ratios were assessed in evaluating the performance of the Ion PGM™ Hi-Q™ Sequencing Kit. The results indicate that reliable, accurate data are generated and that sequencing through homopolymeric regions can be improved with the use of Ion Torrent's Hi-Q™ Sequencing Chemistry. Overall, the quality of the generated sequencing data supports the potential for use of the Ion PGM™ in forensic genetic laboratories.
Biological sequence compression algorithms.

PubMed

Matsumoto, T; Sadakane, K; Imai, H

2000-01-01

Today, more and more DNA sequences are becoming available. The information about DNA sequences is stored in molecular biology databases. The size and importance of these databases will grow in the future; therefore, this information must be stored or communicated efficiently. Furthermore, sequence compression can be used to define similarities between biological sequences. Standard compression algorithms such as gzip or compress cannot compress DNA sequences, but only expand them in size. On the other hand, CTW (Context Tree Weighting Method) can compress DNA sequences to less than two bits per symbol. These algorithms do not use special structures of biological sequences. Two characteristic structures of DNA sequences are known. One is called palindromes or reverse complements and the other is approximate repeats. Several specific algorithms for DNA sequences that use these structures can compress them to less than two bits per symbol. In this paper, we improve the CTW so that characteristic structures of DNA sequences are available. Before encoding the next symbol, the algorithm searches for an approximate repeat and palindrome using hashing and dynamic programming. If there is a palindrome or an approximate repeat of sufficient length, our algorithm represents it with length and distance. With this preprocessing, a new program achieves a slightly higher compression ratio than existing DNA-oriented compression algorithms. We also describe a new compression algorithm for protein sequences.
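The "two bits per symbol" figure in the abstract above is the cost of a plain fixed-width encoding of the four bases, and the "palindrome" structure is a stretch that equals its own reverse complement. The following is a minimal sketch of those two ideas only. It is not the CTW-based method the record describes, and the function names (pack_2bit, reverse_complement) are hypothetical, chosen for illustration.

```python
# Illustrative sketch: the 2-bit baseline and the reverse-complement notion.
# This is NOT the CTW-based compressor from the abstract; names are hypothetical.

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def pack_2bit(seq: str) -> bytes:
    """Pack an A/C/G/T string into exactly two bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    n_bytes = (2 * len(seq) + 7) // 8  # round up to whole bytes
    return value.to_bytes(n_bytes, "big")


def reverse_complement(seq: str) -> str:
    """Reverse the sequence and swap each base for its complement."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_reverse_palindrome(seq: str) -> bool:
    """True when the sequence equals its own reverse complement."""
    return seq == reverse_complement(seq)
```

A compressor that beats two bits per symbol has to exploit redundancy such as these palindromes or approximate repeats; the fixed-width packing here is only the reference point it is measured against.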
Bellerophon: A program to detect chimeric sequences in multiple sequence alignments

DOE Office of Scientific and Technical Information (OSTI.GOV)

Huber, Thomas; Faulkner, Geoffrey; Hugenholtz, Philip

2003-12-23

Bellerophon is a program for detecting chimeric sequences in multiple sequence datasets by an adaption of partial treeing analysis. Bellerophon was specifically developed to detect 16S rRNA gene chimeras in PCR-clone libraries of environmental samples but can be applied to other nucleotide sequence alignments.
Long sequence correlation coprocessor

NASA Astrophysics Data System (ADS)

Gage, Douglas W.

1994-09-01

A long sequence correlation coprocessor (LSCC) accelerates the bitwise correlation of arbitrarily long digital sequences by calculating in parallel the correlation score for 16, for example, adjacent bit alignments between two binary sequences. The LSCC integrated circuit is incorporated into a computer system with memory storage buffers and a separate general purpose computer processor which serves as its controller. Each of the LSCC's set of sequential counters simultaneously tallies a separate correlation coefficient. During each LSCC clock cycle, computer enable logic associated with each counter compares one bit of a first sequence with one bit of a second sequence to increment the counter if the bits are the same. A shift register assures that the same bit of the first sequence is simultaneously compared to different bits of the second sequence to simultaneously calculate the correlation coefficient by the different counters to represent different alignments of the two sequences.
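The counting scheme in the abstract above can be sketched in software as follows. This is a minimal, sequential illustration under the assumption that each counter k tallies the positions where bit i of the first sequence equals bit i + k of the second; the hardware does this in parallel, and the function name correlation_scores is hypothetical.

```python
# Software sketch of the per-alignment match counters described above.
# Assumption: counter k compares first[i] with second[i + k]; the name
# correlation_scores is illustrative, not taken from the source.

def correlation_scores(first, second, n_alignments=16):
    """Return one match count per adjacent alignment (offset 0..n_alignments-1)."""
    scores = [0] * n_alignments
    for i, bit in enumerate(first):
        # The same bit of the first sequence is compared against a
        # different bit of the second sequence for every counter.
        for k in range(n_alignments):
            j = i + k
            if j < len(second) and second[j] == bit:
                scores[k] += 1
    return scores
```

For example, correlation_scores([1, 0, 1], [0, 1, 0, 1], n_alignments=2) gives one count for the unshifted alignment and one for a shift of one bit.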
Nonparametric Combinatorial Sequence Models

NASA Astrophysics Data System (ADS)

Wauthier, Fabian L.; Jordan, Michael I.; Jojic, Nebojsa

This work considers biological sequences that exhibit combinatorial structures in their composition: groups of positions of the aligned sequences are "linked" and covary as one unit across sequences. If multiple such groups exist, complex interactions can emerge between them. Sequences of this kind arise frequently in biology but methodologies for analyzing them are still being developed. This paper presents a nonparametric prior on sequences which allows combinatorial structures to emerge and which induces a posterior distribution over factorized sequence representations. We carry out experiments on three sequence datasets which indicate that combinatorial structures are indeed present and that combinatorial sequence models can more succinctly describe them than simpler mixture models. We conclude with an application to MHC binding prediction which highlights the utility of the posterior distribution induced by the prior. By integrating out the posterior our method compares favorably to leading binding predictors.
Next-Generation Sequencing Platforms

NASA Astrophysics Data System (ADS)

Mardis, Elaine R.

2013-06-01

Automated DNA sequencing instruments embody an elegant interplay among chemistry, engineering, software, and molecular biology and have built upon Sanger's founding discovery of dideoxynucleotide sequencing to perform once-unfathomable tasks. Combined with innovative physical mapping approaches that helped to establish long-range relationships between cloned stretches of genomic DNA, fluorescent DNA sequencers produced reference genome sequences for model organisms and for the reference human genome. New types of sequencing instruments that permit amazing acceleration of data-collection rates for DNA sequencing have been developed. The ability to generate genome-scale data sets is now transforming the nature of biological inquiry. Here, I provide an historical perspective of the field, focusing on the fundamental developments that predated the advent of next-generation sequencing instruments and providing information about how these instruments work, their application to biological research, and the newest types of sequencers that can extract data from single DNA molecules.
Optimization of sequence alignment for simple sequence repeat regions.

PubMed

Jighly, Abdulqader; Hamwieh, Aladdin; Ogbonnaya, Francis C

2011-07-20

Microsatellites, or simple sequence repeats (SSRs), are tandemly repeated DNA sequences, including tandem copies of specific sequences no longer than six bases, that are distributed in the genome. SSR has been used as a molecular marker because it is easy to detect and is used in a range of applications, including genetic diversity, genome mapping, and marker assisted selection. It is also very mutable because of slipping in the DNA polymerase during DNA replication. This unique mutation increases the insertion/deletion (INDEL) mutation frequency to a high ratio, higher than in other types of molecular markers such as single nucleotide polymorphisms (SNPs). SNPs are more frequent than INDELs. Therefore, all designed algorithms for sequence alignment fit the vast majority of the genomic sequence without considering microsatellite regions as unique sequences that require special consideration. The old algorithm is limited in its application because there are many overlaps between different repeat units which result in false evolutionary relationships. To overcome the limitation of the aligning algorithm when dealing with SSR loci, a new algorithm was developed using PERL script with a Tk graphical interface. This program is based on aligning sequences after determining the repeated units first, and the last SSR nucleotides positions. This results in a shifting process according to the inserted repeated unit type. When studying the phylogenetic relations before and after applying the new algorithm, many differences in the trees were obtained by increasing the SSR length and complexity. However, less distance between different lineages was observed after applying the new algorithm. The new algorithm produces better estimates for aligning SSR loci because it reflects more reliable evolutionary relations between different lineages. It reduces overlapping during SSR alignment, which results in a more realistic phylogenetic relationship.
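The first step the abstract above mentions, determining the repeated units and where each SSR ends, can be illustrated with a small sketch. This is not the authors' PERL/Tk program; it is an assumed regular-expression approach, and the function name find_ssrs and the min_copies threshold are hypothetical choices for illustration.

```python
import re

# Sketch of locating simple sequence repeats (units of 1 to 6 bases).
# Not the published algorithm: find_ssrs and min_copies are illustrative.

def find_ssrs(seq, max_unit=6, min_copies=3):
    """Return (start, end, unit, copies) for each tandem repeat found.

    `end` is exclusive, so seq[start:end] is the whole repeat tract and
    end - 1 is the position of the last SSR nucleotide.
    """
    # Lazy unit match, followed by at least (min_copies - 1) more copies.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        copies = (m.end() - m.start()) // len(unit)
        hits.append((m.start(), m.end(), unit, copies))
    return hits
```

Knowing the unit and tract boundaries is what would let an aligner shift sequences by whole repeat units rather than by single bases.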
Accurate multiple sequence-structure alignment of RNA sequences using combinatorial optimization.

PubMed

Bauer, Markus; Klau, Gunnar W; Reinert, Knut

2007-07-27

The discovery of functional non-coding RNA sequences has led to an increasing interest in algorithms related to RNA analysis. Traditional sequence alignment algorithms, however, fail at computing reliable alignments of low-homology RNA sequences. The spatial conformation of RNA sequences largely determines their function, and therefore RNA alignment algorithms have to take structural information into account. We present a graph-based representation for sequence-structure alignments, which we model as an integer linear program (ILP). We sketch how we compute an optimal or near-optimal solution to the ILP using methods from combinatorial optimization, and present results on a recently published benchmark set for RNA alignments. The implementation of our algorithm yields better alignments in terms of two published scores than the other programs that we tested: This is especially the case with an increasing number of input sequences. Our program LARA is freely available for academic purposes from http://www.planet-lisa.net.
Sequence information signal processor

DOEpatents

Peterson, John C.; Chow, Edward T.; Waterman, Michael S.; Hunkapillar, Timothy J.

1999-01-01

An electronic circuit is used to compare two sequences, such as genetic sequences, to determine which alignment of the sequences produces the greatest similarity. The circuit includes a linear array of series-connected processors, each of which stores a single element from one of the sequences and compares that element with each successive element in the other sequence. For each comparison, the processor generates a scoring parameter that indicates which segment ending at those two elements produces the greatest degree of similarity between the sequences. The processor uses the scoring parameter to generate a similar scoring parameter for a comparison between the stored element and the next successive element from the other sequence. The processor also delivers the scoring parameter to the next processor in the array for use in generating a similar scoring parameter for another pair of elements. The electronic circuit determines which processor and alignment of the sequences produce the scoring parameter with the highest value.
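The per-element scoring described in the abstract above resembles a local-alignment dynamic program, in which each cell holds the best score of any segment ending at that pair of elements. The following is a software sketch under that assumption, using a Smith-Waterman-style recurrence with linear gap penalties; the scoring values (match, mismatch, gap) and the function name best_local_score are illustrative and are not taken from the patent.

```python
# Sketch of a local-alignment score computed row by row. Assumption: a
# Smith-Waterman-style recurrence with linear gaps; scoring values and the
# name best_local_score are hypothetical, not from the source.

def best_local_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the highest score of any locally aligned pair of segments."""
    prev = [0] * (len(b) + 1)  # scores for the previous element of `a`
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # Best segment ending at this pair of elements, floored at 0
            # so a poor prefix never drags down a later good segment.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best
```

In the circuit, each processor in the linear array plays the role of one stored element and passes its scoring parameter to its neighbour; here the inner loop performs the same updates one after another.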
Advantages of genome sequencing by long-read sequencer using SMRT technology in medical area.

PubMed

Nakano, Kazuma; Shiroma, Akino; Shimoji, Makiko; Tamotsu, Hinako; Ashimine, Noriko; Ohki, Shun; Shinzato, Misuzu; Minami, Maiko; Nakanishi, Tetsuhiro; Teruya, Kuniko; Satou, Kazuhito; Hirano, Takashi

2017-07-01

PacBio RS II is the first commercialized third-generation DNA sequencer able to sequence a single molecule DNA in real-time without amplification. PacBio RS II's sequencing technology is novel and unique, enabling the direct observation of DNA synthesis by DNA polymerase. PacBio RS II confers four major advantages compared to other sequencing technologies: long read lengths, high consensus accuracy, a low degree of bias, and simultaneous capability of epigenetic characterization. These advantages surmount the obstacle of sequencing genomic regions such as high/low G+C, tandem repeat, and interspersed repeat regions. Moreover, PacBio RS II is ideal for whole genome sequencing, targeted sequencing, complex population analysis, RNA sequencing, and epigenetics characterization. With PacBio RS II, we have sequenced and analyzed the genomes of many species, from viruses to humans. Herein, we summarize and review some of our key genome sequencing projects, including full-length viral sequencing, complete bacterial genome and almost-complete plant genome assemblies, and long amplicon sequencing of a disease-associated gene region. We believe that PacBio RS II is not only an effective tool for use in the basic biological sciences but also in the medical/clinical setting.
DNA Sequencing

DOEpatents

Tabor, Stanley; Richardson, Charles C.

1995-04-25

A method for sequencing a strand of DNA, including the steps of: providing the strand of DNA; annealing the strand with a primer able to hybridize to the strand to give an annealed mixture; incubating the mixture with four deoxyribonucleoside triphosphates, a DNA polymerase, and at least three deoxyribonucleoside triphosphates in different amounts, under conditions favoring primer extension to form nucleic acid fragments complementary to the DNA to be sequenced; labelling the nucleic acid fragments; separating them and determining the position of the deoxyribonucleoside triphosphates by differences in the intensity of the labels, thereby to determine the DNA sequence.
Denoising DNA deep sequencing data—high-throughput sequencing errors and their correction

PubMed Central

Laehnemann, David; Borkhardt, Arndt

2016-01-01

Characterizing the errors generated by common high-throughput sequencing platforms and telling true genetic variation from technical artefacts are two interdependent steps, essential to many analyses such as single nucleotide variant calling, haplotype inference, sequence assembly and evolutionary studies. Both random and systematic errors can show a specific occurrence profile for each of the six prominent sequencing platforms surveyed here: 454 pyrosequencing, Complete Genomics DNA nanoball sequencing, Illumina sequencing by synthesis, Ion Torrent semiconductor sequencing, Pacific Biosciences single-molecule real-time sequencing and Oxford Nanopore sequencing. There is a large variety of programs available for error removal in sequencing read data, which differ in the error models and statistical techniques they use, the features of the data they analyse, the parameters they determine from them and the data structures and algorithms they use. We highlight the assumptions they make and for which data types these hold, providing guidance which tools to consider for benchmarking with regard to the data properties. While no benchmarking results are included here, such specific benchmarks would greatly inform tool choices and future software development. The development of stand-alone error correctors, as well as single nucleotide variant and haplotype callers, could also benefit from using more of the knowledge about error profiles and from (re)combining ideas from the existing approaches presented here. PMID:26026159
Use of the Minion nanopore sequencer for rapid sequencing of avian influenza virus isolates

USDA-ARS's Scientific Manuscript database

A relatively new sequencing technology, the MinION nanopore sequencer, provides a platform that is smaller, faster, and cheaper than existing Next Generation Sequence (NGS) technologies. The MinION sequences individual strands of DNA and can produce millions of sequencing reads. The cost of the s...
Retrosynthetic Reaction Prediction Using Neural Sequence-to-Sequence Models

PubMed Central

2017-01-01

We describe a fully data driven model that learns to perform a retrosynthetic reaction prediction task, which is treated as a sequence-to-sequence mapping problem. The end-to-end trained model has an encoder–decoder architecture that consists of two recurrent neural networks, which has previously shown great success in solving other sequence-to-sequence prediction tasks such as machine translation. The model is trained on 50,000 experimental reaction examples from the United States patent literature, which span 10 broad reaction types that are commonly used by medicinal chemists. We find that our model performs comparably with a rule-based expert system baseline model, and also overcomes certain limitations associated with rule-based expert systems and with any machine learning approach that contains a rule-based expert system component. Our model provides an important first step toward solving the challenging problem of computational retrosynthetic analysis. PMID:29104927
Unlocking hidden genomic sequence

PubMed Central

Keith, Jonathan M.; Cochran, Duncan A. E.; Lala, Gita H.; Adams, Peter; Bryant, Darryn; Mitchelson, Keith R.

2004-01-01

Despite the success of conventional Sanger sequencing, significant regions of many genomes still present major obstacles to sequencing. Here we propose a novel approach with the potential to alleviate a wide range of sequencing difficulties. The technique involves extracting target DNA sequence from variants generated by introduction of random mutations. The introduction of mutations does not destroy original sequence information, but distributes it amongst multiple variants. Some of these variants lack problematic features of the target and are more amenable to conventional sequencing. The technique has been successfully demonstrated with mutation levels up to an average 18% base substitution and has been used to read previously intractable poly(A), AT-rich and GC-rich motifs. PMID:14973330
Chromosome specific repetitive DNA sequences

DOEpatents

Moyzis, Robert K.; Meyne, Julianne

1991-01-01

A method is provided for determining specific nucleotide sequences useful in forming a probe which can identify specific chromosomes, preferably through in situ hybridization within the cell itself. In one embodiment, chromosome preferential nucleotide sequences are first determined from a library of recombinant DNA clones having families of repetitive sequences. Library clones are identified with a low homology with a sequence of repetitive DNA families to which the first clones respectively belong and variant sequences are then identified by selecting clones having a pattern of hybridization with genomic DNA dissimilar to the hybridization pattern shown by the respective families. In another embodiment, variant sequences are selected from a sequence of a known repetitive DNA family. The selected variant sequence is classified as chromosome specific, chromosome preferential, or chromosome nonspecific. Sequences which are classified as chromosome preferential are further sequenced and regions are identified having a low homology with other regions of the chromosome preferential sequence or with known sequences of other family me This invention is the result of a contract with the Department of Energy (Contract No. W-7405-ENG-36).
Targeted therapy according to next generation sequencing-based panel sequencing.

PubMed

Saito, Motonobu; Momma, Tomoyuki; Kono, Koji

2018-04-17

Targeted therapy against actionable gene mutations shows a significantly higher response rate as well as longer survival compared to conventional chemotherapy, and has become a standard therapy for many cancers. Recent progress in next-generation sequencing (NGS) has made it possible to identify a huge number of genetic aberrations. Based on sequencing results, patients are recommended to undergo targeted therapy or immunotherapy. In cases where no approved drugs are available for the genetic mutations detected in a patient, it is recommended to facilitate registration for clinical trials. For that purpose, NGS-based sequencing panels that can simultaneously target multiple genes in a single investigation have been used in daily clinical practice. To date, various types of sequencing panels have been developed to investigate genetic aberrations with tumor somatic genome variants (gain-of-function or loss-of-function mutations, high-level copy number alterations, and gene fusions) through comprehensive bioinformatics. Because sequencing panels are efficient and cost-effective, they are quickly being adopted outside the lab, in hospitals and clinics, in order to identify personalized targeted therapy for individual cancer patients.
Regulatory sequence analysis tools.

PubMed

van Helden, Jacques

2003-07-01

The web resource Regulatory Sequence Analysis Tools (RSAT) (http://rsat.ulb.ac.be/rsat) offers a collection of software tools dedicated to the prediction of regulatory sites in non-coding DNA sequences. These tools include sequence retrieval, pattern discovery, pattern matching, genome-scale pattern matching, feature-map drawing, random sequence generation and other utilities. Alternative formats are supported for the representation of regulatory motifs (strings or position-specific scoring matrices) and several algorithms are proposed for pattern discovery. RSAT currently holds >100 fully sequenced genomes and these data are regularly updated from GenBank.
Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

PubMed Central

Matochko, Wadim L.; Derda, Ratmir

2013-01-01

Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low abundant clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by the Illumina method. Low theoretical complexity of this phage library, as compared to complexity of long genetic reads and genomes, allowed us to describe this library using a convenient linear vector and operator framework. We describe a phage library as an N × 1 frequency vector n = ||n_i||, where n_i is the copy number of the ith sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation to the library is an operator acting on n. Selection, amplification, or sequencing could be described as a product of an N × N matrix and a stochastic sampling operator (Sa). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq). Sequencing without any bias and errors is Seq = Sa I_N, where I_N is an N × N unity matrix. Any bias in sequencing changes I_N to a nonunity matrix. We identified a diagonal censorship matrix (CEN), which describes elimination, or statistically significant downsampling, of specific reads during the sequencing process. PMID:24416071
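The vector-and-operator picture can be made concrete with a toy library. The sketch below is only an illustration under stated assumptions: sampling is modeled as drawing clones without replacement, the censorship matrix is applied to the library before sampling, and the copy numbers, read depth, and helper names (`sample_counts`, `diagonal_of`, `apply_diagonal`) are all hypothetical rather than taken from the paper.

```python
import random

def sample_counts(n, depth, rng):
    """Draw `depth` clones without replacement from a library with copy numbers n."""
    population = [i for i, copies in enumerate(n) for _ in range(copies)]
    counts = [0] * len(n)
    for i in rng.sample(population, depth):
        counts[i] += 1
    return counts

def diagonal_of(counts, n):
    """Diagonal entries d_i such that d_i * n_i equals the sampled copy number."""
    return [c / x if x else 0.0 for c, x in zip(counts, n)]

def apply_diagonal(diag, vector):
    """Multiply a vector by a diagonal matrix given by its diagonal entries."""
    return [d * v for d, v in zip(diag, vector)]

rng = random.Random(0)          # fixed seed so the sketch is repeatable
n = [40, 30, 20, 10, 0]         # toy library: N = 5 possible sequences

# Unbiased sequencing: the sampling operator acts on the untouched library.
unbiased_reads = sample_counts(n, 50, rng)
sa = diagonal_of(unbiased_reads, n)      # one realization of the random diagonal

# Censored sequencing: a diagonal 0/1 matrix removes sequence 2 before sampling.
cen = [1, 1, 0, 1, 1]
visible = apply_diagonal(cen, n)
censored_reads = sample_counts(visible, 50, rng)

print(unbiased_reads, censored_reads)
```

Whatever the random draw, the censored sequence never appears in `censored_reads`, while the read depth is unchanged. This is the kind of systematic absence a censorship matrix is meant to describe, as opposed to a clone that is missed by chance.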
Children's discrimination of vowel sequences

NASA Astrophysics Data System (ADS)

Coady, Jeffry A.; Kluender, Keith R.; Evans, Julia

2003-10-01

Children's ability to discriminate sequences of steady-state vowels was investigated. Vowels (as in "beet," "bat," "bought," and "boot") were synthesized at durations of 40, 80, 160, 320, 640, and 1280 ms. Four different vowel sequences were created by concatenating different orders of vowels for each duration, separated by 10-ms intervening silence. Thus, sequences differed in vowel order and duration (rate). Sequences were 12 s in duration, with amplitude ramped linearly over the first and last 2 s. Sequence pairs included both same (identical sequences) and different trials (sequences with vowels in different orders). Sequences with vowels of equal duration were presented on individual trials. Children aged 7;0 to 10;6 listened to pairs of sequences (with 100 ms between sequences) and responded whether sequences sounded the same or different. Results indicate that children are best able to discriminate sequences of intermediate-duration vowels, typical of conversational speaking rate. Children were less accurate with both shorter and longer vowels. Results are discussed in terms of auditory processing (shortest vowels) and memory (longest vowels). [Research supported by NIDCD DC-05263, DC-04072, and DC-005650.]

Mixed Sequence Reader: A Program for Analyzing DNA Sequences with Heterozygous Base Calling

PubMed Central

Chang, Chun-Tien; Tsai, Chi-Neu; Tang, Chuan Yi; Chen, Chun-Houh; Lian, Jang-Hau; Hu, Chi-Yu; Tsai, Chia-Lung; Chao, Angel; Lai, Chyong-Huey; Wang, Tzu-Hao; Lee, Yun-Shien

2012-01-01

The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3. PMID:22778697
Long-range barcode labeling-sequencing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Feng; Zhang, Tao; Singh, Kanwar K.

Methods for sequencing single large DNA molecules by clonal multiple displacement amplification using barcoded primers. Sequences are binned based on barcode sequences and sequenced using a microdroplet-based method for sequencing large polynucleotide templates to enable assembly of haplotype-resolved complex genomes and metagenomes.
Bellerophon: a program to detect chimeric sequences in multiple sequence alignments.

PubMed

Huber, Thomas; Faulkner, Geoffrey; Hugenholtz, Philip

2004-09-22

Bellerophon is a program for detecting chimeric sequences in multiple sequence datasets by an adaption of partial treeing analysis. Bellerophon was specifically developed to detect 16S rRNA gene chimeras in PCR-clone libraries of environmental samples but can be applied to other nucleotide sequence alignments. Bellerophon is available as an interactive web server at http://foo.maths.uq.edu.au/~huber/bellerophon.pl
BAC sequencing using pooled methods.

PubMed

Saski, Christopher A; Feltus, F Alex; Parida, Laxmi; Haiminen, Niina

2015-01-01

Shotgun sequencing and assembly of a large, complex genome can be expensive, and accurately reconstructing the true genome sequence is challenging. Repetitive DNA arrays, paralogous sequences, polyploidy, and heterozygosity are the main factors that plague de novo genome sequencing projects, which typically result in highly fragmented assemblies from which it is difficult to extract biological meaning. Targeted, sub-genomic sequencing offers complexity reduction by removing distal segments of the genome and a systematic mechanism for exploring prioritized genomic content through BAC sequencing. If one isolates and sequences the genome fraction that encodes the relevant biological information, then it is possible to reduce overall sequencing costs and efforts that target a genomic segment. This chapter describes the sub-genome assembly protocol for an organism based upon a BAC tiling path derived from a genome-scale physical map or from fine mapping using BACs to target sub-genomic regions. Methods that are described include BAC isolation and mapping, DNA sequencing, and sequence assembly.
Program Synthesizes UML Sequence Diagrams

NASA Technical Reports Server (NTRS)

Barry, Matthew R.; Osborne, Richard N.

2006-01-01

A computer program called "Rational Sequence" generates Unified Modeling Language (UML) sequence diagrams of a target Java program running on a Java virtual machine (JVM). Rational Sequence thereby performs a reverse engineering function that aids in the design documentation of the target Java program. Whereas previously, the construction of sequence diagrams was a tedious manual process, Rational Sequence generates UML sequence diagrams automatically from the running Java code.
Viewing multiple sequence alignments with the JavaScript Sequence Alignment Viewer (JSAV)

PubMed Central

Martin, Andrew C. R.

2014-01-01

The JavaScript Sequence Alignment Viewer (JSAV) is designed as a simple-to-use JavaScript component for displaying sequence alignments on web pages. The display of sequences is highly configurable with options to allow alternative coloring schemes, sorting of sequences and 'dotifying' repeated amino acids. An option is also available to submit selected sequences to another web site, or to other JavaScript code. JSAV is implemented purely in JavaScript making use of the JQuery and JQuery-UI libraries. It does not use any HTML5-specific options to help with browser compatibility. The code is documented using JSDOC and is available from http://www.bioinf.org.uk/software/jsav/. PMID:25653836
Viewing multiple sequence alignments with the JavaScript Sequence Alignment Viewer (JSAV).

PubMed

Martin, Andrew C R

2014-01-01

The JavaScript Sequence Alignment Viewer (JSAV) is designed as a simple-to-use JavaScript component for displaying sequence alignments on web pages. The display of sequences is highly configurable with options to allow alternative coloring schemes, sorting of sequences and 'dotifying' repeated amino acids. An option is also available to submit selected sequences to another web site, or to other JavaScript code. JSAV is implemented purely in JavaScript making use of the JQuery and JQuery-UI libraries. It does not use any HTML5-specific options to help with browser compatibility. The code is documented using JSDOC and is available from http://www.bioinf.org.uk/software/jsav/.
Sequence repeats and protein structure

NASA Astrophysics Data System (ADS)

Hoang, Trinh X.; Trovato, Antonio; Seno, Flavio; Banavar, Jayanth R.; Maritan, Amos

2012-11-01

Repeats are frequently found in known protein sequences. The level of sequence conservation in tandem repeats correlates with their propensities to be intrinsically disordered. We employ a coarse-grained model of a protein with a two-letter amino acid alphabet, hydrophobic (H) and polar (P), to examine the sequence-structure relationship in the realm of repeated sequences. A fraction of repeated sequences comprises a distinct class of bad folders, whose folding temperatures are much lower than those of random sequences. Imperfection in sequence repetition improves the folding properties of the bad folders while deteriorating those of the good folders. Our results may explain why nature has utilized repeated sequences for their versatility and especially to design functional proteins that are intrinsically unstructured at physiological temperatures.
Multilocus sequence analysis and rpoB sequencing of Mycobacterium abscessus (sensu lato) strains.

PubMed

Macheras, Edouard; Roux, Anne-Laure; Bastian, Sylvaine; Leão, Sylvia Cardoso; Palaci, Moises; Sivadon-Tardy, Valérie; Gutierrez, Cristina; Richter, Elvira; Rüsch-Gerdes, Sabine; Pfyffer, Gaby; Bodmer, Thomas; Cambau, Emmanuelle; Gaillard, Jean-Louis; Heym, Beate

2011-02-01

Mycobacterium abscessus, Mycobacterium bolletii, and Mycobacterium massiliense (Mycobacterium abscessus sensu lato) are closely related species that currently are identified by the sequencing of the rpoB gene. However, recent studies show that rpoB sequencing alone is insufficient to discriminate between these species, and some authors have questioned their current taxonomic classification. We studied here a large collection of M. abscessus (sensu lato) strains by partial rpoB sequencing (752 bp) and multilocus sequence analysis (MLSA). The final MLSA scheme developed was based on the partial sequences of eight housekeeping genes: argH, cya, glpK, gnd, murC, pgm, pta, and purH. The strains studied included the three type strains (M. abscessus CIP 104536(T), M. massiliense CIP 108297(T), and M. bolletii CIP 108541(T)) and 120 isolates recovered between 1997 and 2007 in France, Germany, Switzerland, and Brazil. The rpoB phylogenetic tree confirmed the existence of three main clusters, each comprising the type strain of one species. However, divergence values between the M. massiliense and M. bolletii clusters all were below 3% and between the M. abscessus and M. massiliense clusters were from 2.66 to 3.59%. The tree produced using the concatenated MLSA gene sequences (4,071 bp) also showed three main clusters, each comprising the type strain of one species. The M. abscessus cluster had a bootstrap value of 100% and was mostly compact. Bootstrap values for the M. massiliense and M. bolletii branches were much lower (71 and 61%, respectively), with the M. massiliense cluster having a fuzzy aspect. Mean (range) divergence values were 2.17% (1.13 to 2.58%) between the M. abscessus and M. massiliense clusters, 2.37% (1.5 to 2.85%) between the M. abscessus and M. bolletii clusters, and 2.28% (0.86 to 2.68%) between the M. massiliense and M. bolletii clusters. Adding the rpoB sequence to the MLSA-concatenated sequence (total sequence, 4,823 bp) had little effect on the
Multilocus Sequence Analysis and rpoB Sequencing of Mycobacterium abscessus (Sensu Lato) Strains

PubMed Central

Macheras, Edouard; Roux, Anne-Laure; Bastian, Sylvaine; Leão, Sylvia Cardoso; Palaci, Moises; Sivadon-Tardy, Valérie; Gutierrez, Cristina; Richter, Elvira; Rüsch-Gerdes, Sabine; Pfyffer, Gaby; Bodmer, Thomas; Cambau, Emmanuelle; Gaillard, Jean-Louis; Heym, Beate

2011-01-01

Mycobacterium abscessus, Mycobacterium bolletii, and Mycobacterium massiliense (Mycobacterium abscessus sensu lato) are closely related species that currently are identified by the sequencing of the rpoB gene. However, recent studies show that rpoB sequencing alone is insufficient to discriminate between these species, and some authors have questioned their current taxonomic classification. We studied here a large collection of M. abscessus (sensu lato) strains by partial rpoB sequencing (752 bp) and multilocus sequence analysis (MLSA). The final MLSA scheme developed was based on the partial sequences of eight housekeeping genes: argH, cya, glpK, gnd, murC, pgm, pta, and purH. The strains studied included the three type strains (M. abscessus CIP 104536T, M. massiliense CIP 108297T, and M. bolletii CIP 108541T) and 120 isolates recovered between 1997 and 2007 in France, Germany, Switzerland, and Brazil. The rpoB phylogenetic tree confirmed the existence of three main clusters, each comprising the type strain of one species. However, divergence values between the M. massiliense and M. bolletii clusters all were below 3% and between the M. abscessus and M. massiliense clusters were from 2.66 to 3.59%. The tree produced using the concatenated MLSA gene sequences (4,071 bp) also showed three main clusters, each comprising the type strain of one species. The M. abscessus cluster had a bootstrap value of 100% and was mostly compact. Bootstrap values for the M. massiliense and M. bolletii branches were much lower (71 and 61%, respectively), with the M. massiliense cluster having a fuzzy aspect. Mean (range) divergence values were 2.17% (1.13 to 2.58%) between the M. abscessus and M. massiliense clusters, 2.37% (1.5 to 2.85%) between the M. abscessus and M. bolletii clusters, and 2.28% (0.86 to 2.68%) between the M. massiliense and M. bolletii clusters. Adding the rpoB sequence to the MLSA-concatenated sequence (total sequence, 4,823 bp) had little effect on the clustering
Real-Time DNA Sequencing in the Antarctic Dry Valleys Using the Oxford Nanopore Sequencer

PubMed Central

Johnson, Sarah S.; Zaikova, Elena; Goerlitz, David S.; Bai, Yu; Tighe, Scott W.

2017-01-01

The ability to sequence DNA outside of the laboratory setting has enabled novel research questions to be addressed in the field in diverse areas, ranging from environmental microbiology to viral epidemics. Here, we demonstrate the application of offline DNA sequencing of environmental samples using a hand-held nanopore sequencer in a remote field location: the McMurdo Dry Valleys, Antarctica. Sequencing was performed using a MK1B MinION sequencer from Oxford Nanopore Technologies (ONT; Oxford, United Kingdom) that was equipped with software to operate without internet connectivity. One-direction (1D) genomic libraries were prepared using portable field techniques on DNA isolated from desiccated microbial mats. By adequately insulating the sequencer and laptop, it was possible to run the sequencing protocol for up to 2½ h under arduous conditions. PMID:28337073
Contributions from associative and explicit sequence knowledge to the execution of discrete keying sequences.

PubMed

Verwey, Willem B

2015-05-01

Research has provided many indications that highly practiced 6-key sequences are carried out in a chunking mode in which key-specific stimuli past the first are largely ignored. When in such sequences a deviating stimulus occasionally occurs at an unpredictable location, participants fall back to responding to individual stimuli (Verwey & Abrahamse, 2012). The observation that in such a situation execution still benefits from prior practice has been attributed to the possibility to operate in an associative mode. To better understand the contribution to the execution of keying sequences of motor chunks, associative sequence knowledge and also of explicit sequence knowledge, the present study tested three alternative accounts for the earlier finding of an execution rate increase at the end of 6-key sequences performed in the associative mode. The results provide evidence that the earlier observed execution rate increase can be attributed to the use of explicit sequence knowledge. In the present experiment this benefit was limited to sequences that are executed at the moderately fast rates of the associative mode, and occurred at both the earlier and final elements of the sequences. Copyright © 2015 Elsevier B.V. All rights reserved.
Digital RNA sequencing minimizes sequence-dependent bias and amplification noise with optimized single-molecule barcodes

PubMed Central

Shiroguchi, Katsuyuki; Jia, Tony Z.; Sims, Peter A.; Xie, X. Sunney

2012-01-01

RNA sequencing (RNA-Seq) is a powerful tool for transcriptome profiling, but is hampered by sequence-dependent bias and inaccuracy at low copy numbers intrinsic to exponential PCR amplification. We developed a simple strategy for mitigating these complications, allowing truly digital RNA-Seq. Following reverse transcription, a large set of barcode sequences is added in excess, and nearly every cDNA molecule is uniquely labeled by random attachment of barcode sequences to both ends. After PCR, we applied paired-end deep sequencing to read the two barcodes and cDNA sequences. Rather than counting the number of reads, RNA abundance is measured based on the number of unique barcode sequences observed for a given cDNA sequence. We optimized the barcodes to be unambiguously identifiable, even in the presence of multiple sequencing errors. This method allows counting with single-copy resolution despite sequence-dependent bias and PCR-amplification noise, and is analogous to digital PCR but amenable to quantifying a whole transcriptome. We demonstrated transcriptome profiling of Escherichia coli with more accurate and reproducible quantification than conventional RNA-Seq. PMID:22232676
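The counting rule described here, unique barcodes rather than reads, can be sketched in a few lines. This is a minimal illustration, assuming reads have already been assigned to a cDNA and that barcodes were read without error. The error-tolerant barcode identification mentioned in the abstract is not modeled, and the gene names, barcode strings, and the function name `digital_counts` are hypothetical.

```python
from collections import defaultdict

def digital_counts(reads):
    """Count distinct (5' barcode, 3' barcode) pairs per cDNA.

    reads: iterable of (cdna_id, barcode_5p, barcode_3p) tuples.
    PCR duplicates share a barcode pair and therefore collapse to one molecule.
    """
    barcodes = defaultdict(set)
    for cdna, b5, b3 in reads:
        barcodes[cdna].add((b5, b3))
    return {cdna: len(pairs) for cdna, pairs in barcodes.items()}

reads = [
    ("geneA", "ACGT", "TTAG"),   # molecule 1
    ("geneA", "ACGT", "TTAG"),   # PCR duplicate of molecule 1
    ("geneA", "ACGT", "TTAG"),   # PCR duplicate of molecule 1
    ("geneA", "GGCA", "CATC"),   # molecule 2
    ("geneB", "TTGA", "AACC"),   # molecule 3
]
counts = digital_counts(reads)
print(counts)
```

A plain read count would report four reads for geneA and one for geneB. The barcode count reports two molecules and one molecule, so the over-amplified molecule no longer inflates the estimate.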
Sequence Capture versus Restriction Site Associated DNA Sequencing for Shallow Systematics.

PubMed

Harvey, Michael G; Smith, Brian Tilston; Glenn, Travis C; Faircloth, Brant C; Brumfield, Robb T

2016-09-01

Sequence capture and restriction site associated DNA sequencing (RAD-Seq) are two genomic enrichment strategies for applying next-generation sequencing technologies to systematics studies. At shallow timescales, such as within species, RAD-Seq has been widely adopted among researchers, although there has been little discussion of the potential limitations and benefits of RAD-Seq and sequence capture. We discuss a series of issues that may impact the utility of sequence capture and RAD-Seq data for shallow systematics in non-model species. We review prior studies that used both methods, and investigate differences between the methods by re-analyzing existing RAD-Seq and sequence capture data sets from a Neotropical bird (Xenops minutus). We suggest that the strengths of RAD-Seq data sets for shallow systematics are the wide dispersion of markers across the genome, the relative ease and cost of laboratory work, the deep coverage and read overlap at recovered loci, and the high overall information that results. Sequence capture's benefits include flexibility and repeatability in the genomic regions targeted, success using low-quality samples, more straightforward read orthology assessment, and higher per-locus information content. The utility of a method in systematics, however, rests not only on its performance within a study, but on the comparability of data sets and inferences with those of prior work. In RAD-Seq data sets, comparability is compromised by low overlap of orthologous markers across species and the sensitivity of genetic diversity in a data set to an interaction between the level of natural heterozygosity in the samples examined and the parameters used for orthology assessment. In contrast, sequence capture of conserved genomic regions permits interrogation of the same loci across divergent species, which is preferable for maintaining comparability among data sets and studies for the purpose of drawing general conclusions about the impact of
Making sense of deep sequencing

PubMed Central

Goldman, D.; Domschke, K.

2016-01-01

This review, the first of an occasional series, tries to make sense of the concepts and uses of deep sequencing of polynucleic acids (DNA and RNA). Deep sequencing, synonymous with next-generation sequencing, high-throughput sequencing and massively parallel sequencing, includes whole genome sequencing but is more often and diversely applied to specific parts of the genome captured in different ways, for example the highly expressed portion of the genome known as the exome and portions of the genome that are epigenetically marked either by DNA methylation, the binding of proteins including histones, or that are in different configurations and thus more or less accessible to enzymes that cleave DNA. Deep sequencing of RNA (RNASeq) reverse-transcribed to complementary DNA is invaluable for measuring RNA expression and detecting changes in RNA structure. Important concepts in deep sequencing include the length and depth of sequence reads, mapping and assembly of reads, sequencing error, haplotypes, and the propensity of deep sequencing, as with other types of ‘big data’, to generate large numbers of errors, requiring monitoring for methodologic biases and strategies for replication and validation. Deep sequencing yields a unique genetic fingerprint that can be used to identify a person, and a trove of predictors of genetic medical diseases. Deep sequencing to identify epigenetic events including changes in DNA methylation and RNA expression can reveal the history and impact of environmental exposures. Because of the power of sequencing to identify and deliver biomedically significant information about a person and their blood relatives, it creates ethical dilemmas and practical challenges in research and clinical care, for example the decision and procedures to report incidental findings that will increasingly and frequently be discovered. PMID:24925306
MSLICE Sequencing

NASA Technical Reports Server (NTRS)

Crockett, Thomas M.; Joswig, Joseph C.; Shams, Khawaja S.; Norris, Jeffrey S.; Morris, John R.

2011-01-01

MSLICE Sequencing is a graphical tool for writing sequences and integrating them into RML files, as well as for producing SCMF files for uplink. When operated in a testbed environment, it also supports uplinking these SCMF files to the testbed via Chill. This software features a free-form textual sequence editor featuring syntax coloring, automatic content assistance (including command and argument completion proposals), complete with types, value ranges, units, and descriptions from the command dictionary that appear as they are typed. The sequence editor also has a "field mode" that allows tabbing between arguments and displays type/range/units/description for each argument as it is edited. Color-coded error and warning annotations on problematic tokens are included, as well as indications of problems that are not visible in the current scroll range. "Quick Fix" suggestions are made for resolving problems, and all the features afforded by modern source editors are also included such as copy/cut/paste, undo/redo, and a sophisticated find-and-replace system optionally using regular expressions. The software offers a full XML editor for RML files, which features syntax coloring, content assistance and problem annotations as above. There is a form-based, "detail view" that allows structured editing of command arguments and sequence parameters when preferred. The "project view" shows the user's "workspace" as a tree of "resources" (projects, folders, and files) that can subsequently be opened in editors by double-clicking. Files can be added, deleted, dragged-dropped/copied-pasted between folders or projects, and these operations are undoable and redoable. A "problems view" contains a tabular list of all problems in the current workspace. Double-clicking on any row in the table opens an editor for the appropriate sequence, scrolling to the specific line with the problem, and highlighting the problematic characters. From there, one can invoke "quick fix" as described
Statistical properties of filtered pseudorandom digital sequences formed from the sum of maximum-length sequences

NASA Technical Reports Server (NTRS)

Wallace, G. R.; Weathers, G. D.; Graf, E. R.

1973-01-01

The statistics of filtered pseudorandom digital sequences called hybrid-sum sequences, formed from the modulo-two sum of several maximum-length sequences, are analyzed. The results indicate that a relation exists between the statistics of the filtered sequence and the characteristic polynomials of the component maximum length sequences. An analysis procedure is developed for identifying a large group of sequences with good statistical properties for applications requiring the generation of analog pseudorandom noise. By use of the analysis approach, the filtering process is approximated by the convolution of the sequence with a sum of unit step functions. A parameter reflecting the overall statistical properties of filtered pseudorandom sequences is derived. This parameter is called the statistical quality factor. A computer algorithm to calculate the statistical quality factor for the filtered sequences is presented, and the results for two examples of sequence combinations are included. The analysis reveals that the statistics of the signals generated with the hybrid-sum generator are potentially superior to the statistics of signals generated with maximum-length generators. Furthermore, fewer calculations are required to evaluate the statistics of a large group of hybrid-sum generators than are required to evaluate the statistics of the same size group of approximately equivalent maximum-length sequences.
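The construction of a hybrid-sum sequence can be sketched as follows. The abstract does not name the characteristic polynomials used, so the two chosen here, x^3 + x + 1 and x^4 + x + 1, are assumptions picked only because they are small primitive polynomials. The filtering step and the statistical quality factor are not implemented, and the helper names `m_sequence`, `hybrid_sum`, and `smallest_period` are hypothetical.

```python
def m_sequence(taps, degree, length):
    """Bits of the linear recurrence s[k+degree] = XOR of s[k+t] for t in taps.

    With a primitive characteristic polynomial and a nonzero start state this
    is a maximum-length sequence of period 2**degree - 1.
    """
    state = [1] + [0] * (degree - 1)      # any nonzero start state works
    out = []
    for _ in range(length):
        out.append(state[0])
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = state[1:] + [feedback]
    return out

def hybrid_sum(*sequences):
    """Bitwise modulo-two sum of equal-length binary sequences."""
    return [sum(bits) % 2 for bits in zip(*sequences)]

def smallest_period(bits, max_period):
    """Smallest p <= max_period with bits[i] == bits[i + p] over the whole sample."""
    for p in range(1, max_period + 1):
        if all(bits[i] == bits[i + p] for i in range(len(bits) - p)):
            return p
    return None

length = 210
a = m_sequence(taps=(0, 1), degree=3, length=length)   # x^3 + x + 1, period 7
b = m_sequence(taps=(0, 1), degree=4, length=length)   # x^4 + x + 1, period 15
h = hybrid_sum(a, b)

print(smallest_period(a, 105), smallest_period(b, 105), smallest_period(h, 105))
```

With these two component sequences the modulo-two sum repeats only after 105 bits, the least common multiple of 7 and 15, so the hybrid-sum generator yields a much longer sequence than either component from the same two short registers.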
Program for Editing Spacecraft Command Sequences

NASA Technical Reports Server (NTRS)

Gladden, Roy; Waggoner, Bruce; Kordon, Mark; Hashemi, Mahnaz; Hanks, David; Salcedo, Jose

2006-01-01

Sequence Translator, Editor, and Expander Resource (STEER) is a computer program that facilitates construction of sequences and blocks of sequences (hereafter denoted generally as sequence products) for commanding a spacecraft. STEER also provides mechanisms for translating among various sequence product types and quickly expanding activities of a given sequence in chronological order for review and analysis of the sequence. To date, construction of sequence products has generally been done by use of such clumsy mechanisms as text-editor programs, translating among sequence product types has been challenging, and expanding sequences to time-ordered lists has involved arduous processes of converting sequence products to "real" sequences and running them through Class-A software (defined, loosely, as flight and ground software critical to a spacecraft mission). Also, heretofore, generating sequence products in standard formats has been troublesome because precise formatting and syntax are required. STEER alleviates these issues by providing a graphical user interface containing intuitive fields in which the user can enter the necessary information. The STEER expansion function provides a "quick and dirty" means of seeing how a sequence and sequence block would expand into a chronological list, without the need to use Class-A software.
Functional region prediction with a set of appropriate homologous sequences-an index for sequence selection by integrating structure and sequence information with spatial statistics

PubMed Central

2012-01-01

Background The detection of conserved residue clusters on a protein structure is one of the effective strategies for the prediction of functional protein regions. Various methods, such as Evolutionary Trace, have been developed based on this strategy. In such approaches, the conserved residues are identified through comparisons of homologous amino acid sequences. Therefore, the selection of homologous sequences is a critical step. It is empirically known that a certain degree of sequence divergence in the set of homologous sequences is required for the identification of conserved residues. However, the development of a method to select homologous sequences appropriate for the identification of conserved residues has not been sufficiently addressed. An objective and general method to select appropriate homologous sequences is desired for the efficient prediction of functional regions. Results We have developed a novel index to select the sequences appropriate for the identification of conserved residues, and implemented the index within our method to predict the functional regions of a protein. The implementation of the index improved the performance of the functional region prediction. The index represents the degree of conserved residue clustering on the tertiary structure of the protein. For this purpose, the structure and sequence information were integrated within the index by the application of spatial statistics. Spatial statistics is a field of statistics in which not only the attributes but also the geometrical coordinates of the data are considered simultaneously. Higher degrees of clustering generate larger index scores. We adopted the set of homologous sequences with the highest index score, under the assumption that the best prediction accuracy is obtained when the degree of clustering is the maximum. The set of sequences selected by the index led to higher functional region prediction performance than the sets of sequences selected by other sequence
Multiplexed fragaria chloroplast genome sequencing

Treesearch

W. Njuguna; A. Liston; R. Cronn; N.V. Bassil

2010-01-01

A method to sequence multiple chloroplast genomes using ultra high throughput sequencing technologies was recently described. Complete chloroplast genome sequences can resolve phylogenetic relationships at low taxonomic levels and identify informative point mutations and indels. The objective of this research was to sequence multiple Fragaria...

A vision for ubiquitous sequencing

PubMed Central

Erlich, Yaniv

2015-01-01

Genomics has recently celebrated reaching the $1000 genome milestone, making affordable DNA sequencing a reality. With this goal successfully completed, the next goal of the sequencing revolution can be sequencing sensors—miniaturized sequencing devices that are manufactured for real-time applications and deployed in large quantities at low costs. The first part of this manuscript envisions applications that will benefit from moving the sequencers to the samples in a range of domains. In the second part, the manuscript outlines the critical barriers that need to be addressed in order to reach the goal of ubiquitous sequencing sensors. PMID:26430149
Automatic Command Sequence Generation

NASA Technical Reports Server (NTRS)

Fisher, Forest; Gladden, Roy; Khanampompan, Teerapat

2007-01-01

Automatic Sequence Generator (Autogen) Version 3.0 software automatically generates command sequences for the Mars Reconnaissance Orbiter (MRO) and several other JPL spacecraft operated by the multi-mission support team. Autogen uses standard JPL sequencing tools like APGEN, ASP, SEQGEN, and the DOM database to automate the generation of uplink command products, Spacecraft Command Message Format (SCMF) files, and the corresponding ground command products, DSN Keywords Files (DKF). Autogen supports all the major multi-mission mission phases including the cruise, aerobraking, mapping/science, and relay mission phases. Autogen is a Perl script, which functions within the mission operations UNIX environment. It consists of two parts: a set of model files and the autogen Perl script. Autogen encodes the behaviors of the system into a model and encodes algorithms for context sensitive customizations of the modeled behaviors. The model includes knowledge of different mission phases and how the resultant command products must differ for these phases. The executable software portion of Autogen automates the setup and use of APGEN for constructing a spacecraft activity sequence file (SASF). The setup includes file retrieval through the DOM (Distributed Object Manager), an object database used to store project files. This step retrieves all the needed input files for generating the command products. Depending on the mission phase, Autogen also uses the ASP (Automated Sequence Processor) and SEQGEN to generate the command product sent to the spacecraft. Autogen also provides the means for customizing sequences through the use of configuration files. By automating the majority of the sequencing generation process, Autogen eliminates many sequence generation errors commonly introduced by manually constructing spacecraft command sequences. Through the layering of commands into the sequence by a series of scheduling algorithms, users are able to rapidly and reliably construct the
Sequence Matters but How Exactly? A Method for Evaluating Activity Sequences from Data

ERIC Educational Resources Information Center

Doroudi, Shayan; Holstein, Kenneth; Aleven, Vincent; Brunskill, Emma

2016-01-01

How should a wide variety of educational activities be sequenced to maximize student learning? Although some experimental studies have addressed this question, educational data mining methods may be able to evaluate a wider range of possibilities and better handle many simultaneous sequencing constraints. We introduce Sequencing Constraint…
ProfileGrids: a sequence alignment visualization paradigm that avoids the limitations of Sequence Logos.

PubMed

Roca, Alberto I

2014-01-01

The 2013 BioVis Contest provided an opportunity to evaluate different paradigms for visualizing protein multiple sequence alignments. Such data sets are becoming extremely large and thus taxing current visualization paradigms. Sequence Logos represent consensus sequences but have limitations for protein alignments. As an alternative, ProfileGrids are a new protein sequence alignment visualization paradigm that represents an alignment as a color-coded matrix of the residue frequency occurring at every homologous position in the aligned protein family. The JProfileGrid software program was used to analyze the BioVis contest data sets to generate figures for comparison with the Sequence Logo reference images. The ProfileGrid representation allows for the clear and effective analysis of protein multiple sequence alignments. This includes both a general overview of the conservation and diversity sequence patterns as well as the interactive ability to query the details of the protein residue distributions in the alignment. The JProfileGrid software is free and available from http://www.ProfileGrid.org.
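The core quantity behind a ProfileGrid, the residue frequency at every aligned position, can be illustrated with a short sketch. The function name and the toy alignment below are invented for the example; this is not the JProfileGrid implementation, only one plausible way to compute the matrix the abstract describes.

```python
from collections import Counter


def residue_frequencies(alignment):
    """Return one dict per alignment column mapping residue -> fraction of sequences.

    `alignment` is a list of equal-length aligned sequences; gaps ('-') are
    counted like any other symbol in this sketch.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n = len(alignment)
    grid = []
    for column in zip(*alignment):
        counts = Counter(column)
        grid.append({residue: count / n for residue, count in counts.items()})
    return grid


# Toy alignment (made-up sequences, for illustration only).
toy_alignment = ["MKV-A", "MKI-A", "MRVGA", "MKVGS"]
grid = residue_frequencies(toy_alignment)
for position, freqs in enumerate(grid, start=1):
    print(position, freqs)
```

A color-coded display would then map each fraction to a shade, one cell per residue type and position.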
Career Academy Course Sequences.

ERIC Educational Resources Information Center

Markham, Thom; Lenz, Robert

This career academy course sequence guide is designed to give teachers a quick overview of the course sequences of well-known career academy and career pathway programs from across the country. The guide presents a variety of sample course sequences for the following academy themes: (1) arts and communication; (2) business and finance; (3)…
Large-Scale Concatenation cDNA Sequencing

PubMed Central

Yu, Wei; Andersson, Björn; Worley, Kim C.; Muzny, Donna M.; Ding, Yan; Liu, Wen; Ricafrente, Jennifer Y.; Wentland, Meredith A.; Lennon, Greg; Gibbs, Richard A.

1997-01-01

A total of 100 kb of DNA derived from 69 individual human brain cDNA clones of 0.7–2.0 kb were sequenced by concatenated cDNA sequencing (CCS), whereby multiple individual DNA fragments are sequenced simultaneously in a single shotgun library. The method yielded accurate sequences and a similar efficiency compared with other shotgun libraries constructed from single DNA fragments (>20 kb). Computer analyses were carried out on 65 cDNA clone sequences and their corresponding end sequences to examine both nucleic acid and amino acid sequence similarities in the databases. Thirty-seven clones revealed no DNA database matches, 12 clones generated exact matches (≥98% identity), and 16 clones generated nonexact matches (57%–97% identity) to either known human or other species genes. Of those 28 matched clones, 8 had corresponding end sequences that failed to identify similarities. In a protein similarity search, 27 clone sequences displayed significant matches, whereas only 20 of the end sequences had matches to known protein sequences. Our data indicate that full-length cDNA insert sequences provide significantly more nucleic acid and protein sequence similarity matches than expressed sequence tags (ESTs) for database searching. [All 65 cDNA clone sequences described in this paper have been submitted to the GenBank data library under accession nos. U79240–U79304.] PMID:9110174
Is the kinetoplast DNA a percolating network of linked rings at its critical point?

NASA Astrophysics Data System (ADS)

Michieletto, Davide; Marenduzzo, Davide; Orlandini, Enzo

2015-05-01

In this work we present a computational study of the kinetoplast genome, modelled as a large number of semiflexible unknotted loops, which are allowed to link with each other. As the DNA density increases, the system shows a percolation transition between a gas of unlinked rings and a network of linked loops which spans the whole system. Close to the percolation transition, we find that the mean valency of the network, i.e. the average number of loops which are linked to any one loop, is around three, as found experimentally for the kinetoplast DNA (kDNA). Even more importantly, by simulating the digestion of the network by a restriction enzyme, we show that the distribution of oligomers, i.e. structures formed by a few loops which remain linked after digestion, quantitatively matches experimental data obtained from gel electrophoresis, provided that the density is, once again, close to the percolation transition. With respect to previous work, our analysis builds on a reduced number of assumptions, yet can still fully explain the experimental data. Our findings suggest that the kDNA can be viewed as a network of linked loops positioned very close to the percolation transition, and we discuss the possible biological implications of this remarkable fact.
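The two network quantities mentioned above, the mean valency and whether a linked cluster spans the system, can be computed from a list of linked ring pairs. The sketch below assumes such a list is already available (detecting topological links between loops is the hard part and is not shown); the function names and the toy data are hypothetical.

```python
def mean_valency(n_rings, links):
    """Average number of rings linked to any one ring; each link is listed once."""
    return 2 * len(links) / n_rings


def largest_cluster_fraction(n_rings, links):
    """Fraction of rings in the largest linked cluster, found with union-find."""
    parent = list(range(n_rings))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    sizes = {}
    for ring in range(n_rings):
        root = find(ring)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n_rings


# Toy example: six rings, four links (made-up data).
toy_links = [(0, 1), (1, 2), (2, 0), (3, 4)]
print(mean_valency(6, toy_links))
print(largest_cluster_fraction(6, toy_links))
```

In a percolation study one would track the largest-cluster fraction as the ring density is increased; a sharp rise signals the transition.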
Kilo-sequencing: an ordered strategy for rapid DNA sequence data acquisition.

PubMed Central

Barnes, W M; Bevan, M

1983-01-01

A strategy for rapid DNA sequence acquisition in an ordered, nonrandom manner, while retaining all of the conveniences of the dideoxy method with M13 transducing phage DNA template, is described. Target DNA 3 to 14 kb in size can be stably carried by our M13 vectors. Suitable targets are stretches of DNA which lack an enzyme recognition site which is unique on our cloning vectors and adjacent to the sequencing primer; current sites that are so useful when lacking are Pst, Xba, HindIII, BglII, EcoRI. By an in vitro procedure, we cut RF DNA once randomly and once specifically, to create thousands of deletions which start at the unique restriction site adjacent to the dideoxy sequencing primer and extend various distances across the target DNA. Phage carrying a desired size of deletions, whose DNA as template will give rise to DNA sequence data in a desired location along the target DNA, may be purified by electrophoresis alive on agarose gels. Phage running in the same location on the agarose gel thus conveniently give rise to nucleotide sequence data from the same kilobase of target DNA. PMID:6298723
Biomolecule Sequencer: Next-Generation DNA Sequencing Technology for In-Flight Environmental Monitoring, Research, and Beyond

NASA Technical Reports Server (NTRS)

Smith, David J.; Burton, Aaron; Castro-Wallace, Sarah; John, Kristen; Stahl, Sarah E.; Dworkin, Jason Peter; Lupisella, Mark L.

2016-01-01

On the International Space Station (ISS), technologies capable of rapid microbial identification and disease diagnostics are not currently available. NASA still relies upon sample return for comprehensive, molecular-based sample characterization. Next-generation DNA sequencing is a powerful approach for identifying microorganisms in air, water, and surfaces onboard spacecraft. The Biomolecule Sequencer payload, manifested to SpaceX-9 and scheduled on the Increment 4748 research plan (June 2016), will assess the functionality of a commercially-available next-generation DNA sequencer in the microgravity environment of ISS. The MinION device from Oxford Nanopore Technologies (Oxford, UK) measures picoamp changes in electrical current dependent on nucleotide sequences of the DNA strand migrating through nanopores in the system. The hardware is exceptionally small (9.5 x 3.2 x 1.6 cm), lightweight (120 grams), and powered only by a USB connection. For the ISS technology demonstration, the Biomolecule Sequencer will be powered by a Microsoft Surface Pro3. Ground-prepared samples containing lambda bacteriophage, Escherichia coli, and mouse genomic DNA, will be launched and stored frozen on the ISS until experiment initiation. Immediately prior to sequencing, a crew member will collect and thaw frozen DNA samples, connect the sequencer to the Surface Pro3, inject thawed samples into a MinION flow cell, and initiate sequencing. At the completion of the sequencing run, data will be downlinked for ground analysis. Identical, synchronous ground controls will be used for data comparisons to determine sequencer functionality, run-time sequence, current dynamics, and overall accuracy. We will present our latest results from the ISS flight experiment, the first time DNA has ever been sequenced in space, and discuss the many potential applications of the Biomolecule Sequencer for environmental monitoring, medical diagnostics, higher fidelity and more adaptable Space Biology Human
Studying long 16S rDNA sequences with ultrafast-metagenomic sequence classification using exact alignments (Kraken).

PubMed

Valenzuela-González, Fabiola; Martínez-Porchas, Marcel; Villalpando-Canchola, Enrique; Vargas-Albores, Francisco

2016-03-01

Ultrafast-metagenomic sequence classification using exact alignments (Kraken) is a novel approach to classify 16S rDNA sequences. The classifier is based on mapping short sequences to the lowest ancestor and performing alignments to form subtrees with specific weights in each taxon node. This study aimed to evaluate the classification performance of Kraken with long 16S rDNA random environmental sequences produced by cloning and then Sanger sequenced. A total of 480 clones were isolated and expanded, and 264 of these clones formed contigs (1352 ± 153 bp). The same sequences were analyzed using the Ribosomal Database Project (RDP) classifier. Deeper classification performance was achieved by Kraken than by the RDP: 73% of the contigs were classified up to the species or variety levels, whereas 67% of these contigs were classified no further than the genus level by the RDP. The results also demonstrated that unassembled sequences analyzed by Kraken provide similar or even deeper information. Moreover, sequences that did not form contigs, which are usually discarded by other programs, provided meaningful information when analyzed by Kraken. Finally, it appears that the assembly step for Sanger sequences can be eliminated when using Kraken. Kraken accumulates the information of both sequence senses, providing additional elements for the classification. In conclusion, the results demonstrate that Kraken is an excellent choice for use in the taxonomic assignment of sequences obtained by Sanger sequencing or based on third generation sequencing, of which the main goal is to generate larger sequences. Copyright © 2016 Elsevier B.V. All rights reserved.
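The classification idea summarized above (short sequences mapped to a lowest ancestor, then weights accumulated over taxon nodes) can be illustrated with a toy sketch. Everything in it is an assumption made for the example: the taxonomy, the k-mer table, the k-mer length of 4, and the function names. It is not Kraken's code or database format, and it ignores reverse-complement handling and tie resolution.

```python
from collections import Counter

# Hypothetical toy taxonomy: child -> parent (root has no parent).
PARENT = {
    "root": None,
    "genusA": "root",
    "speciesA1": "genusA",
    "speciesA2": "genusA",
}

# Hypothetical k-mer table: each k-mer is pre-assigned to the lowest common
# ancestor of all reference sequences that contain it.
KMER_LCA = {
    "ACGT": "speciesA1",
    "CGTA": "genusA",
    "GTAC": "speciesA1",
    "TTTT": "speciesA2",
}


def path_score(taxon, hits):
    """Sum the k-mer hit counts on the path from `taxon` up to the root."""
    score = 0
    while taxon is not None:
        score += hits.get(taxon, 0)
        taxon = PARENT[taxon]
    return score


def classify(read, k=4):
    """Assign a read to the hit taxon whose root-to-node path has the most hits."""
    hits = Counter()
    for i in range(len(read) - k + 1):
        taxon = KMER_LCA.get(read[i:i + k])
        if taxon is not None:
            hits[taxon] += 1
    if not hits:
        return None  # unclassified: no k-mer was found in the table
    return max(sorted(hits), key=lambda t: path_score(t, hits))


print(classify("ACGTAC"))
print(classify("CGTATT"))
print(classify("GGGGGG"))
```

Because a read is scored k-mer by k-mer, long unassembled reads can be classified directly, which is consistent with the abstract's observation that the assembly step may be unnecessary.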
Analysis and Visualization Tool for Targeted Amplicon Bisulfite Sequencing on Ion Torrent Sequencers

PubMed Central

Pabinger, Stephan; Ernst, Karina; Pulverer, Walter; Kallmeyer, Rainer; Valdes, Ana M.; Metrustry, Sarah; Katic, Denis; Nuzzo, Angelo; Kriegner, Albert; Vierlinger, Klemens; Weinhaeusel, Andreas

2016-01-01

Targeted sequencing of PCR amplicons generated from bisulfite deaminated DNA is a flexible, cost-effective way to study methylation of a sample at single CpG resolution and perform subsequent multi-target, multi-sample comparisons. Currently, no platform specific protocol, support, or analysis solution is provided to perform targeted bisulfite sequencing on a Personal Genome Machine (PGM). Here, we present a novel tool, called TABSAT, for analyzing targeted bisulfite sequencing data generated on Ion Torrent sequencers. The workflow starts with raw sequencing data, performs quality assessment, and uses a tailored version of Bismark to map the reads to a reference genome. The pipeline visualizes results as lollipop plots and is able to deduce specific methylation-patterns present in a sample. The obtained profiles are then summarized and compared between samples. In order to assess the performance of the targeted bisulfite sequencing workflow, 48 samples were used to generate 53 different Bisulfite-Sequencing PCR amplicons from each sample, resulting in 2,544 amplicon targets. We obtained a mean coverage of 282X using 1,196,822 aligned reads. Next, we compared the sequencing results of these targets to the methylation level of the corresponding sites on an Illumina 450k methylation chip. The calculated average Pearson correlation coefficient of 0.91 confirms the sequencing results with one of the industry-leading CpG methylation platforms and shows that targeted amplicon bisulfite sequencing provides an accurate and cost-efficient method for DNA methylation studies, e.g., to provide platform-independent confirmation of Illumina Infinium 450k methylation data. TABSAT offers a novel way to analyze data generated by Ion Torrent instruments and can also be used with data from the Illumina MiSeq platform. It can be easily accessed via the Platomics platform, which offers a web-based graphical user interface along with sample and parameter storage. TABSAT is freely
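Two calculations behind the figures in this abstract can be sketched briefly: the methylation level at a CpG site (after bisulfite treatment, unmethylated cytosines are read as T while methylated cytosines remain C) and the Pearson correlation used to compare sequencing results with array values. The function names and the sample numbers are invented for illustration and are not TABSAT's code or data.

```python
import math


def methylation_level(c_reads, t_reads):
    """Fraction of reads supporting methylation (C) at one CpG site."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return c_reads / total


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Amplicon targets reported in the abstract: 48 samples x 53 amplicons each.
n_targets = 48 * 53
print(n_targets)

# Made-up per-site values, purely to show the comparison step.
sequencing = [methylation_level(30, 10), methylation_level(5, 45), methylation_level(20, 20)]
array_beta = [0.70, 0.15, 0.55]
print(round(pearson(sequencing, array_beta), 3))
```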
DNA Sequencing by Capillary Electrophoresis

PubMed Central

Karger, Barry L.; Guttman, Andras

2009-01-01

Sequencing of human and other genomes has been at the center of interest in the biomedical field over the past several decades and is now leading toward an era of personalized medicine. During this time, DNA sequencing methods have evolved from the labor intensive slab gel electrophoresis, through automated multicapillary electrophoresis systems using fluorophore labeling with multispectral imaging, to the “next generation” technologies of cyclic array, hybridization based, nanopore and single molecule sequencing. Deciphering the genetic blueprint and follow-up confirmatory sequencing of Homo sapiens and other genomes was only possible by the advent of modern sequencing technologies that was a result of step by step advances with a contribution of academics, medical personnel and instrument companies. While next generation sequencing is moving ahead at break-neck speed, the multicapillary electrophoretic systems played an essential role in the sequencing of the Human Genome, the foundation of the field of genomics. In this perspective, we wish to overview the role of capillary electrophoresis in DNA sequencing based in part on several of our articles in this journal. PMID:19517496
Sequence search on a supercomputer.

PubMed

Gotoh, O; Tagashira, Y

1986-01-10

A set of programs was developed for searching nucleic acid and protein sequence data bases for sequences similar to a given sequence. The programs, written in FORTRAN 77, were optimized for vector processing on a Hitachi S810-20 supercomputer. A search of a 500-residue protein sequence against the entire PIR data base Ver. 1.0 (1) (0.5 M residues) is carried out in a CPU time of 45 sec. About 4 min is required for an exhaustive search of a 1500-base nucleotide sequence against all mammalian sequences (1.2M bases) in Genbank Ver. 29.0. The CPU time is reduced to about a quarter with a faster version.
HIV Sequence Compendium 2015

DOE Office of Scientific and Technical Information (OSTI.GOV)

Foley, Brian Thomas; Leitner, Thomas Kenneth; Apetrei, Cristian

This compendium is an annual printed summary of the data contained in the HIV sequence database. We try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2015. Hence, though it is published in 2015 and called the 2015 Compendium, its contents correspond to the 2014 curated alignments on our website. The number of sequences in the HIV database is still increasing. In total, at the end of 2014, there were 624,121 sequences in the HIV Sequence Database, an increase of 7% since the previous year. This is the first year that the number of new sequences added to the database has decreased compared to the previous year. The number of near complete genomes (>7000 nucleotides) increased to 5834 by end of 2014. However, as in previous years, the compendium alignments contain only a fraction of these. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.
ProfileGrids: a sequence alignment visualization paradigm that avoids the limitations of Sequence Logos

PubMed Central

2014-01-01

Background The 2013 BioVis Contest provided an opportunity to evaluate different paradigms for visualizing protein multiple sequence alignments. Such data sets are becoming extremely large and thus taxing current visualization paradigms. Sequence Logos represent consensus sequences but have limitations for protein alignments. As an alternative, ProfileGrids are a new protein sequence alignment visualization paradigm that represents an alignment as a color-coded matrix of the residue frequency occurring at every homologous position in the aligned protein family. Results The JProfileGrid software program was used to analyze the BioVis contest data sets to generate figures for comparison with the Sequence Logo reference images. Conclusions The ProfileGrid representation allows for the clear and effective analysis of protein multiple sequence alignments. This includes both a general overview of the conservation and diversity sequence patterns as well as the interactive ability to query the details of the protein residue distributions in the alignment. The JProfileGrid software is free and available from http://www.ProfileGrid.org. PMID:25237393
GuiTope: an application for mapping random-sequence peptides to protein sequences.

PubMed

Halperin, Rebecca F; Stafford, Phillip; Emery, Jack S; Navalkar, Krupa Arun; Johnston, Stephen Albert

2012-01-03

Random-sequence peptide libraries are a commonly used tool to identify novel ligands for binding antibodies, other proteins, and small molecules. It is often of interest to compare the selected peptide sequences to the natural protein binding partners to infer the exact binding site or the importance of particular residues. The ability to search a set of sequences for similarity to a set of peptides may sometimes enable the prediction of an antibody epitope or a novel binding partner. We have developed a software application designed specifically for this task. GuiTope provides a graphical user interface for aligning peptide sequences to protein sequences. All alignment parameters are accessible to the user including the ability to specify the amino acid frequency in the peptide library; these frequencies often differ significantly from those assumed by popular alignment programs. It also includes a novel feature to align di-peptide inversions, which we have found improves the accuracy of antibody epitope prediction from peptide microarray data and shows utility in analyzing phage display datasets. Finally, GuiTope can randomly select peptides from a given library to estimate a null distribution of scores and calculate statistical significance. GuiTope provides a convenient method for comparing selected peptide sequences to protein sequences, including flexible alignment parameters, novel alignment features, ability to search a database, and statistical significance of results. The software is available as an executable (for PC) at http://www.immunosignature.com/software and ongoing updates and source code will be available at sourceforge.net.
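The last step described above, drawing random peptides from the library to estimate a null distribution of scores, can be sketched as an empirical p-value calculation. The scoring function here is a deliberately simple stand-in (count of identical residues in the best ungapped window), and all names and data are hypothetical; GuiTope's actual alignment scoring, with library-specific amino acid frequencies and di-peptide inversions, is not reproduced.

```python
import random


def best_ungapped_score(peptide, protein):
    """Highest number of identical residues over all ungapped placements."""
    best = 0
    for start in range(len(protein) - len(peptide) + 1):
        window = protein[start:start + len(peptide)]
        matches = sum(a == b for a, b in zip(peptide, window))
        best = max(best, matches)
    return best


def empirical_p_value(observed, library, protein, n_draws=1000, seed=0):
    """Fraction of randomly drawn library peptides scoring at least `observed`.

    The +1 terms keep the estimate away from an impossible p-value of zero.
    """
    rng = random.Random(seed)
    null_scores = [
        best_ungapped_score(rng.choice(library), protein) for _ in range(n_draws)
    ]
    exceed = sum(score >= observed for score in null_scores)
    return (exceed + 1) / (n_draws + 1)


# Made-up protein, selected peptide, and library, for illustration only.
protein = "MAKVLTT"
observed = best_ungapped_score("KVL", protein)
p_value = empirical_p_value(observed, ["AAA", "GGG", "WWW"], protein)
print(observed, p_value)
```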
Targeted sequencing of plant genomes

Treesearch

Mark D. Huynh

2014-01-01

Next-generation sequencing (NGS) has revolutionized the field of genetics by providing a means for fast and relatively affordable sequencing. With the advancement of NGS, whole-genome sequencing (WGS) has become more commonplace. However, sequencing an entire genome is still not cost effective or even beneficial in all cases. In studies that do not require a whole-...
Enrichment of target sequences for next-generation sequencing applications in research and diagnostics.

PubMed

Altmüller, Janine; Budde, Birgit S; Nürnberg, Peter

2014-02-01

Targeted re-sequencing such as gene panel sequencing (GPS) has become very popular in medical genetics, both for research projects and in diagnostic settings. The technical principles of the different enrichment methods have been reviewed several times before; however, new enrichment products are constantly entering the market, and researchers are often puzzled about the requirement to make decisions about long-term commitments, both for the enrichment product and the sequencing technology. This review summarizes important considerations for the experimental design and provides helpful recommendations in choosing the best sequencing strategy for various research projects and diagnostic applications.
Pairwise Sequence Alignment Library

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jeff Daily, PNNL

2015-05-20

Vector extensions, such as SSE, have been part of the x86 CPU since the 1990s, with applications in graphics, signal processing, and scientific applications. Although many algorithms and applications can naturally benefit from automatic vectorization techniques, there are still many that are difficult to vectorize due to their dependence on irregular data structures, dense branch operations, or data dependencies. Sequence alignment, one of the most widely used operations in bioinformatics workflows, has a computational footprint that features complex data dependencies. The trend of widening vector registers adversely affects the state-of-the-art sequence alignment algorithm based on striped data layouts. Therefore, a novel SIMD implementation of a parallel scan-based sequence alignment algorithm that can better exploit wider SIMD units was implemented as part of the Parallel Sequence Alignment Library (parasail). Parasail features: Reference implementations of all known vectorized sequence alignment approaches. Implementations of Smith-Waterman (SW), semi-global (SG), and Needleman-Wunsch (NW) sequence alignment algorithms. Implementations across all modern CPU instruction sets including AVX2 and KNC. Language interfaces for C/C++ and Python.
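The data dependencies mentioned above are easiest to see in a plain scalar version of Smith-Waterman local alignment. The sketch below is a textbook formulation with a linear gap penalty and assumed scoring values; it does not use or imitate the parasail API, and it returns only the best score, not the alignment itself.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a; depends on the cell just computed
            score = max(0, diag, up, left)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

Each cell needs its left, upper, and diagonal neighbours, so the cells of one row cannot simply be computed independently; vectorized schemes such as striped or scan-based layouts exist to work around that dependency.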
Sequence History Update Tool

NASA Technical Reports Server (NTRS)

Khanampompan, Teerapat; Gladden, Roy; Fisher, Forest; DelGuercio, Chris

2008-01-01

The Sequence History Update Tool performs Web-based sequence statistics archiving for Mars Reconnaissance Orbiter (MRO). Using a single UNIX command, the software takes advantage of sequencing conventions to automatically extract the needed statistics from multiple files. This information is then used to populate a PHP database, which is then seamlessly formatted into a dynamic Web page. This tool replaces a previous tedious and error-prone process of manually editing HTML code to construct a Web-based table. Because the tool manages all of the statistics gathering and file delivery to and from multiple data sources spread across multiple servers, there is also a considerable time and effort savings. With the Sequence History Update Tool, what previously took minutes is now done in less than 30 seconds, and the result is a more accurate archival record of the sequence commanding for MRO.

Solid phase sequencing of biopolymers

DOEpatents

Cantor, Charles; Koster, Hubert

2010-09-28

This invention relates to methods for detecting and sequencing target nucleic acid sequences, to mass modified nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probes comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectroscopy, and the sequence of the target determined from the molecular weights of the fragments. Nucleic acids whose sequences can be determined include DNA or RNA in biological samples such as patient biopsies and environmental samples. Probes may be fixed to a solid support such as a hybridization chip to facilitate automated molecular weight analysis and identification of the target sequence.
Database-independent Protein Sequencing (DiPS) Enables Full-length de Novo Protein and Antibody Sequence Determination.

PubMed

Savidor, Alon; Barzilay, Rotem; Elinger, Dalia; Yarden, Yosef; Lindzen, Moshit; Gabashvili, Alexandra; Adiv Tal, Ophir; Levin, Yishai

2017-06-01

Traditional "bottom-up" proteomic approaches use proteolytic digestion, LC-MS/MS, and database searching to elucidate peptide identities and their parent proteins. Protein sequences absent from the database cannot be identified, and even if present in the database, complete sequence coverage is rarely achieved even for the most abundant proteins in the sample. Thus, sequencing of unknown proteins such as antibodies or constituents of metaproteomes remains a challenging problem. To date, there is no available method for full-length protein sequencing, independent of a reference database, in high throughput. Here, we present Database-independent Protein Sequencing, a method for unambiguous, rapid, database-independent, full-length protein sequencing. The method is a novel combination of non-enzymatic, semi-random cleavage of the protein, LC-MS/MS analysis, peptide de novo sequencing, extraction of peptide tags, and their assembly into a consensus sequence using an algorithm named "Peptide Tag Assembler." As proof-of-concept, the method was applied to samples of three known proteins representing three size classes and to a previously un-sequenced, clinically relevant monoclonal antibody. Excluding leucine/isoleucine and glutamic acid/deamidated glutamine ambiguities, end-to-end full-length de novo sequencing was achieved with 99-100% accuracy for all benchmarking proteins and the antibody light chain. Accuracy of the sequenced antibody heavy chain, including the entire variable region, was also 100%, but there was a 23-residue gap in the constant region sequence. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.
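The assembly step named above, joining overlapping peptide tags into a consensus sequence, can be illustrated with a generic greedy overlap merge. This is a stand-in technique chosen for the example and should not be read as the published "Peptide Tag Assembler" algorithm; the function names, the minimum overlap of 3 residues, and the tags are all assumptions. Isoleucine is written as leucine before assembly because the two residues have identical mass, which is the ambiguity the abstract excludes from its accuracy figures.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that is a prefix of b (>= min_len), else 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(tags, min_len=3):
    """Repeatedly merge the pair of tags with the longest suffix/prefix overlap."""
    # Treat I and L as one symbol, drop duplicates, and drop fully contained tags.
    contigs = list(dict.fromkeys(tag.replace("I", "L") for tag in tags))
    contigs = [t for t in contigs if not any(t != u and t in u for u in contigs)]
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Made-up overlapping tags, for illustration only.
print(greedy_assemble(["MKTAYIAK", "AYIAKQRQ", "KQRQISF"]))
```

Tags that share no sufficiently long overlap stay as separate contigs, which is how a gap such as the one reported in the heavy chain constant region would show up in a sketch like this.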
The sequence specificity of UV-induced DNA damage in a systematically altered DNA sequence.

PubMed

Khoe, Clairine V; Chung, Long H; Murray, Vincent

2018-06-01

The sequence specificity of UV-induced DNA damage was investigated in a specifically designed DNA plasmid using two procedures: end-labelling and linear amplification. Absorption of UV photons by DNA leads to dimerisation of pyrimidine bases and produces two major photoproducts, cyclobutane pyrimidine dimers (CPDs) and pyrimidine(6-4)pyrimidone photoproducts (6-4PPs). A previous study had determined that two hexanucleotide sequences, 5'-GCTC*AC and 5'-TATT*AA, were high intensity UV-induced DNA damage sites. The UV clone plasmid was constructed by systematically altering each nucleotide of these two hexanucleotide sequences. One of the main goals of this study was to determine the influence of single nucleotide alterations on the intensity of UV-induced DNA damage. The sequence 5'-GCTC*AC was designed to examine the sequence specificity of 6-4PPs and the highest intensity 6-4PP damage sites were found at 5'-GTTC*CC nucleotides. The sequence 5'-TATT*AA was devised to investigate the sequence specificity of CPDs and the highest intensity CPD damage sites were found at 5'-TTTT*CG nucleotides. It was proposed that the tetranucleotide DNA sequence, 5'-YTC*Y (where Y is T or C), was the consensus sequence for the highest intensity UV-induced 6-4PP adduct sites; while it was 5'-YTT*C for the highest intensity UV-induced CPD damage sites. These consensus tetranucleotides are composed entirely of consecutive pyrimidines and must have a DNA conformation that is highly productive for the absorption of UV photons. Crown Copyright © 2018. Published by Elsevier B.V. All rights reserved.
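As an illustrative aid that is not part of the study, the two proposed consensus tetranucleotides can be written as regular expressions and used to scan a sequence for candidate sites. This is a minimal sketch: the function name `find_motif_sites` is hypothetical, the asterisk in the abstract's notation is dropped, and only the reported strand is scanned.

```python
import re

# Consensus tetranucleotides from the abstract, with Y = T or C.
# The asterisk used in the abstract's notation is omitted here.
SIX_FOUR_PP = "[CT]TC[CT]"   # 5'-YTCY, proposed for 6-4PP sites
CPD = "[CT]TTC"              # 5'-YTTC, proposed for CPD sites

def find_motif_sites(sequence, pattern):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    # A lookahead lets overlapping occurrences be reported as well.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", sequence.upper())]

if __name__ == "__main__":
    # The two highest-intensity hexanucleotides reported in the abstract.
    print(find_motif_sites("GTTCCC", SIX_FOUR_PP))
    print(find_motif_sites("TTTTCG", CPD))
```

Both reported highest-intensity hexanucleotides contain their respective consensus tetranucleotide, which is what the scan checks.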
"First generation" automated DNA sequencing technology.

PubMed

Slatko, Barton E; Kieleczawa, Jan; Ju, Jingyue; Gardner, Andrew F; Hendrickson, Cynthia L; Ausubel, Frederick M

2011-10-01

Beginning in the 1980s, automation of DNA sequencing has greatly increased throughput, reduced costs, and enabled large projects to be completed more easily. The development of automation technology paralleled the development of other aspects of DNA sequencing: better enzymes and chemistry, separation and imaging technology, sequencing protocols, robotics, and computational advancements (including base-calling algorithms with quality scores, database developments, and sequence analysis programs). Despite the emergence of high-throughput sequencing platforms, automated Sanger sequencing technology remains useful for many applications. This unit provides background and a description of the "First-Generation" automated DNA sequencing technology. It also includes protocols for using the current Applied Biosystems (ABI) automated DNA sequencing machines. © 2011 by John Wiley & Sons, Inc.
The recurrence sequences via Sylvester matrices

NASA Astrophysics Data System (ADS)

Karaduman, Erdal; Deveci, Ömür

2017-07-01

In this work, we define the Pell-Jacobsthal-Sylvester sequence and the Jacobsthal-Pell-Sylvester sequence by using the Sylvester matrices obtained from the characteristic polynomials of the Pell and Jacobsthal sequences, and then we study the sequences defined modulo m. We also obtain the cyclic groups and the semigroups from the generating matrices of these sequences when read modulo m, and then derive the relationships among the orders of the cyclic groups and the periods of the sequences. Furthermore, we redefine the Pell-Jacobsthal-Sylvester sequence and the Jacobsthal-Pell-Sylvester sequence by means of the elements of the groups, and then examine them in finite groups.
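The idea of reading a recurrence sequence modulo m and measuring its period can be sketched for the ordinary Pell and Jacobsthal sequences. This sketch does not implement the Sylvester-matrix sequences of the paper, whose exact definition is not given in the abstract. It assumes the standard recurrences P(n) = 2P(n-1) + P(n-2) and J(n) = J(n-1) + 2J(n-2), both starting from 0, 1; the function name `period_mod` is hypothetical.

```python
def period_mod(a, b, m):
    """Return (preperiod, period) of x(n) = a*x(n-1) + b*x(n-2) mod m,
    starting from x(0) = 0, x(1) = 1.

    The pair of consecutive terms is tracked as the state; the sequence
    repeats as soon as a state is seen for the second time.
    """
    state = (0 % m, 1 % m)
    seen = {}
    step = 0
    while state not in seen:
        seen[state] = step
        state = (state[1], (a * state[1] + b * state[0]) % m)
        step += 1
    start = seen[state]
    return start, step - start

if __name__ == "__main__":
    for m in (3, 4, 5):
        print(m, "Pell:", period_mod(2, 1, m), "Jacobsthal:", period_mod(1, 2, m))
```

A preperiod of 0 means the sequence is purely periodic. For the Jacobsthal recurrence with even m the companion matrix is not invertible modulo m, so a nonzero preperiod can appear; this is why the sketch detects a repeated state rather than waiting for a return to the initial pair.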
Trypanosoma cruzi I genotype among isolates from patients with chronic Chagas disease followed at the Evandro Chagas National Institute of Infectious Diseases (FIOCRUZ, Brazil).

PubMed

Oliveira, Tatiana da Silva Fonseca de; Santos, Barbara Neves Dos; Galdino, Tainah Silva; Hasslocher-Moreno, Alejandro Marcel; Bastos, Otilio Machado Pereira; Sousa, Maria Auxiliadora de

2017-01-01

Trypanosoma cruzi is the etiologic agent of Chagas disease in humans, mainly in Latin America. Trypanosome stocks were isolated by hemoculture from patients followed at Evandro Chagas National Institute of Infectious Diseases (FIOCRUZ) and studied using different approaches. For species and genotype identification, the stocks were analyzed by parasitological techniques, polymerase chain reaction assays targeted to specific DNA sequences, and isoenzyme patterns, as well as sequencing of a polymorphic locus of the TcSC5D gene (one stock). The isolates presented typical T. cruzi morphology and usually grew well in routine culture media. Metacyclic trypomastigotes were found in cultures or experimentally infected Triatoma infestans. All isolates were pure T. cruzi cultures, presenting typical 330-bp products from kinetoplast DNA minicircles, and 250 or 200-bp amplicons from the mini-exon non-transcribed spacer. Their genetic type assignment was resolved by their isoenzyme profiles. The finding of TcI in one asymptomatic patient from Paraíba was confirmed by the sequencing assay. TcVI was found in two asymptomatic individuals from Bahia and Rio Grande do Sul. TcII was identified in six patients from Pernambuco, Bahia and Minas Gerais, who presented different clinical forms: cardiac (2), digestive with megaesophagus (1), and indeterminate (3). The main T. cruzi genotypes found in Brazilian chronic patients were identified in this work, including TcI, which is less frequent and usually causes asymptomatic disease, unlike that in other American countries. This study emphasizes the importance of T. cruzi genotyping for possible correlations between the parasite and patients' responses to therapeutic treatment or clinical manifestations of the disease.
Chip-based sequencing nucleic acids

DOEpatents

Beer, Neil Reginald

2014-08-26

A system for fast DNA sequencing by amplification of genetic material within microreactors, denaturing, demulsifying, and then sequencing the material, while retaining it in a PCR/sequencing zone by a magnetic field. One embodiment includes sequencing nucleic acids on a microchip that includes a microchannel flow channel in the microchip. The nucleic acids are isolated and hybridized to magnetic nanoparticles or to magnetic polystyrene-coated beads. Microreactor droplets are formed in the microchannel flow channel. The microreactor droplets containing the nucleic acids and the magnetic nanoparticles are retained in a magnetic trap in the microchannel flow channel and sequenced.
Reversible second-order conditional sequences in incidental sequence learning tasks.

PubMed

Pasquali, Antoine; Cleeremans, Axel; Gaillard, Vinciane

2018-06-01

In sequence learning tasks, participants' sensitivity to the sequential structure of a series of events often overshoots their ability to express relevant knowledge intentionally, as in generation tasks that require participants to produce either the next element of a sequence (inclusion) or a different element (exclusion). Comparing generation performance under inclusion and exclusion conditions makes it possible to assess the respective influences of conscious and unconscious learning. Recently, two main concerns have been expressed concerning such tasks. First, it is often difficult to design control sequences in such a way that they enable clear comparisons with the training material. Second, it is challenging to ask participants to perform appropriately under exclusion instructions, for the requirement to exclude familiar responses often leads them to adopt degenerate strategies (e.g., pushing on the same key all the time), which then need to be specifically singled out as invalid. To overcome both concerns, we introduce reversible second-order conditional (RSOC) sequences and show (a) that they elicit particularly strong transfer effects, (b) that dissociation of implicit and explicit influences becomes possible thanks to the removal of salient transitions in RSOCs, and (c) that exclusion instructions can be greatly simplified without losing sensitivity.
HIV Sequence Compendium 2010

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kuiken, Carla; Foley, Brian; Leitner, Thomas

This compendium is an annual printed summary of the data contained in the HIV sequence database. In these compendia we try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2010. Hence, though it is called the 2010 Compendium, its contents correspond to the 2009 curated alignments on our website. The number of sequences in the HIV database is still increasing exponentially. In total, at the time of printing, there were 339,306 sequences in the HIV Sequence Database, an increase of 45% since last year. The number of near complete genomes (>7000 nucleotides) increased to 2576 by end of 2009, reflecting a smaller increase than in previous years. However, as in previous years, the compendium alignments contain only a small fraction of these. Included in the alignments are a small number of sequences representing each of the subtypes and the more prevalent circulating recombinant forms (CRFs) such as 01 and 02, as well as a few outgroup sequences (group O and N and SIV-CPZ). Of the rarer CRFs we included one representative each. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. Reprints are available from our website in the form of both HTML and PDF files. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.
Complete genome sequence of southern tomato virus identified from China using next generation sequencing

USDA-ARS's Scientific Manuscript database

Complete genome sequence of a double-stranded RNA (dsRNA) virus, southern tomato virus (STV), on tomatoes in China, was elucidated using small RNAs deep sequencing. The identified STV_CN12 shares 99% sequence identity to other isolates from Mexico, France, Spain, and U.S. This is the first report ...
Identification of Trypanosoma cruzi Discrete Typing Units (DTUs) in Latin-American migrants in Barcelona (Spain).

PubMed

Abras, Alba; Gállego, Montserrat; Muñoz, Carmen; Juiz, Natalia A; Ramírez, Juan Carlos; Cura, Carolina I; Tebar, Silvia; Fernández-Arévalo, Anna; Pinazo, María-Jesús; de la Torre, Leonardo; Posada, Elizabeth; Navarro, Ferran; Espinal, Paula; Ballart, Cristina; Portús, Montserrat; Gascón, Joaquim; Schijman, Alejandro G

2017-04-01

Trypanosoma cruzi, the causative agent of Chagas disease, is divided into six Discrete Typing Units (DTUs): TcI-TcVI. We aimed to identify T. cruzi DTUs in Latin-American migrants in the Barcelona area (Spain) and to assess different molecular typing approaches for the characterization of T. cruzi genotypes. Seventy-five peripheral blood samples were analyzed by two real-time PCR methods (qPCR) based on satellite DNA (SatDNA) and kinetoplastid DNA (kDNA). The 20 samples testing positive in both methods, all belonging to Bolivian individuals, were submitted to DTU characterization using two PCR-based flowcharts: multiplex qPCR using TaqMan probes (MTq-PCR), and conventional PCR. These samples were also studied by sequencing the SatDNA and classified as type I (TcI/III), type II (TcII/IV) and type I/II hybrid (TcV/VI). Ten out of the 20 samples gave positive results in the flowcharts: TcV (5 samples), TcII/V/VI (3) and mixed infections by TcV plus TcII (1) and TcV plus TcII/VI (1). By SatDNA sequencing, we classified the 20 samples, 19 as type I/II and one as type I. The most frequent DTU identified by both flowcharts, and suggested by SatDNA sequencing in the remaining samples with low parasitic loads, TcV, is common in Bolivia and predominant in peripheral blood. The mixed infection by TcV-TcII was detected for the first time simultaneously in Bolivian migrants. PCR-based flowcharts are very useful to characterize DTUs during acute infection. SatDNA sequence analysis cannot discriminate T. cruzi populations at the level of a single DTU but it enabled us to increase the number of characterized cases in chronically infected patients. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Nanopore sequencing in microgravity

PubMed Central

McIntyre, Alexa B R; Rizzardi, Lindsay; Yu, Angela M; Alexander, Noah; Rosen, Gail L; Botkin, Douglas J; Stahl, Sarah E; John, Kristen K; Castro-Wallace, Sarah L; McGrath, Ken; Burton, Aaron S; Feinberg, Andrew P; Mason, Christopher E

2016-01-01

Rapid DNA sequencing and analysis has been a long-sought goal in remote research and point-of-care medicine. In microgravity, DNA sequencing can facilitate novel astrobiological research and close monitoring of crew health, but spaceflight places stringent restrictions on the mass and volume of instruments, crew operation time, and instrument functionality. The recent emergence of portable, nanopore-based tools with streamlined sample preparation protocols finally enables DNA sequencing on missions in microgravity. As a first step toward sequencing in space and aboard the International Space Station (ISS), we tested the Oxford Nanopore Technologies MinION during a parabolic flight to understand the effects of variable gravity on the instrument and data. In a successful proof-of-principle experiment, we found that the instrument generated DNA reads over the course of the flight, including the first ever sequenced in microgravity, and additional reads measured after the flight concluded its parabolas. Here we detail modifications to the sample-loading procedures to facilitate nanopore sequencing aboard the ISS and in other microgravity environments. We also evaluate existing analysis methods and outline two new approaches, the first based on a wave-fingerprint method and the second on entropy signal mapping. Computationally light analysis methods offer the potential for in situ species identification, but are limited by the error profiles (stays, skips, and mismatches) of older nanopore data. Higher accuracies attainable with modified sample processing methods and the latest version of flow cells will further enable the use of nanopore sequencers for diagnostics and research in space. PMID:28725742
Quantitative phenotyping via deep barcode sequencing.

PubMed

Smith, Andrew M; Heisler, Lawrence E; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J; Chee, Mark; Roth, Frederick P; Giaever, Guri; Nislow, Corey

2009-10-01

Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or "Bar-seq," outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that approximately 20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene-environment interactions on a genome-wide scale.
Quantitative phenotyping via deep barcode sequencing

PubMed Central

Smith, Andrew M.; Heisler, Lawrence E.; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J.; Chee, Mark; Roth, Frederick P.; Giaever, Guri; Nislow, Corey

2009-01-01

Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or “Bar-seq,” outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that ∼20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene–environment interactions on a genome-wide scale. PMID:19622793
The advantages of SMRT sequencing.

PubMed

Roberts, Richard J; Carneiro, Mauricio O; Schatz, Michael C

2013-07-03

Of the current next-generation sequencing technologies, SMRT sequencing is sometimes overlooked. However, attributes such as long reads, modified base detection and high accuracy make SMRT a useful technology and an ideal approach to the complete sequencing of small genomes.
A safe an easy method for building consensus HIV sequences from 454 massively parallel sequencing data.

PubMed

Fernández-Caballero Rico, Jose Ángel; Chueca Porcuna, Natalia; Álvarez Estévez, Marta; Mosquera Gutiérrez, María Del Mar; Marcos Maeso, María Ángeles; García, Federico

2018-02-01

The aim was to show how to generate, from massively parallel sequencing data obtained in routine HIV antiretroviral resistance studies, a consensus sequence that may be suitable for molecular epidemiology studies. Paired Sanger (Trugene-Siemens) and next-generation sequencing (NGS) (454 GSJunior-Roche) HIV RT and protease sequences from 62 patients were studied. NGS consensus sequences were generated using Mesquite, using 10%, 15%, and 20% thresholds. Molecular evolutionary genetics analysis (MEGA) was used for phylogenetic studies. At a 10% threshold, NGS-Sanger sequences from 17/62 patients were phylogenetically related, with a median bootstrap value of 88% (IQR 83.5-95.5). Association increased to 36/62 sequences, with a median bootstrap value of 94% (IQR 85.5-98), using a 15% threshold. Maximum association was at the 20% threshold, with 61/62 sequences associated, and a median bootstrap value of 99% (IQR 98-100). A safe method is presented to generate consensus sequences from HIV-NGS data at a 20% threshold, which will prove useful for molecular epidemiological studies. Copyright © 2016 Elsevier España, S.L.U. and Sociedad Española de Enfermedades Infecciosas y Microbiología Clínica. All rights reserved.
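How a frequency threshold changes a consensus call can be sketched as follows. This is not the Mesquite procedure used in the study, whose details the abstract does not give. It assumes one common convention: at each aligned position, every base whose frequency reaches the threshold is kept, and several kept bases are written as an IUPAC ambiguity code. The function name `threshold_consensus` and the toy reads are hypothetical.

```python
from collections import Counter

# IUPAC nucleotide codes, keyed by the sorted set of bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}

def threshold_consensus(reads, threshold):
    """Build a consensus from equal-length aligned reads.

    A base is kept at a position if its frequency among A/C/G/T calls
    is at least `threshold`; positions with no qualifying base become N.
    """
    consensus = []
    for column in zip(*reads):
        counts = Counter(base for base in column if base in "ACGT")
        total = sum(counts.values())
        if total == 0:
            consensus.append("N")
            continue
        kept = "".join(sorted(b for b, n in counts.items() if n / total >= threshold))
        consensus.append(IUPAC.get(kept, "N"))
    return "".join(consensus)

if __name__ == "__main__":
    # Toy pileup: the second position carries a 15% minority variant (A).
    reads = ["AG"] * 17 + ["AA"] * 3
    for threshold in (0.10, 0.20):
        print(threshold, threshold_consensus(reads, threshold))
```

In the toy pileup the 15% minority base produces an ambiguity code at the 10% threshold and disappears at 20%. Under this convention a higher threshold gives a consensus with fewer ambiguity codes.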
Studies of a biochemical factory: tomato trichome deep expressed sequence tag sequencing and proteomics.

PubMed

Schilmiller, Anthony L; Miner, Dennis P; Larson, Matthew; McDowell, Eric; Gang, David R; Wilkerson, Curtis; Last, Robert L

2010-07-01

Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces beta-caryophyllene and alpha-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells.
MALDI Top-Down sequencing: calling N- and C-terminal protein sequences with high confidence and speed.

PubMed

Suckau, Detlev; Resemann, Anja

2009-12-01

The ability to match Top-Down protein sequencing (TDS) results by MALDI-TOF to protein sequences by classical protein database searching was evaluated in this work. These analyses yielded the protein identity, the simultaneous assignment of the N- and C-termini, and protein sequences of up to 70 residues from either terminus. In combination with de novo sequencing using the MALDI-TDS data, even fusion proteins were assigned and the detailed sequence around the fusion site was elucidated. MALDI-TDS made it possible to match protein sequences quickly and efficiently and to validate recombinant protein structures (in particular, protein termini) at the level of undigested proteins.
Rapid Diagnostics of Onboard Sequences

NASA Technical Reports Server (NTRS)

Starbird, Thomas W.; Morris, John R.; Shams, Khawaja S.; Maimone, Mark W.

2012-01-01

Keeping track of sequences onboard a spacecraft is challenging. When reviewing Event Verification Records (EVRs) of sequence executions on the Mars Exploration Rover (MER), operators often found themselves wondering which version of a named sequence the EVR corresponded to. The lack of this information drastically impacts the operators' diagnostic capabilities as well as their situational awareness with respect to the commands the spacecraft has executed, since the EVRs do not provide argument values or explanatory comments. Having this information immediately available can be instrumental in diagnosing critical events and can significantly enhance the overall safety of the spacecraft. This software provides auditing capability that can eliminate that uncertainty while diagnosing critical conditions. Furthermore, the RESTful interface provides a simple way for sequencing tools to automatically retrieve binary compiled sequence SCMFs (Space Command Message Files) on demand. It also enables developers to change the underlying database, while maintaining the same interface to the existing applications. The logging capabilities are also beneficial to operators when they are trying to recall how they solved a similar problem many days ago: this software enables automatic recovery of SCMF and RML (Robot Markup Language) sequence files directly from the command EVRs, eliminating the need for people to find and validate the corresponding sequences. To address the lack of auditing capability for sequences onboard a spacecraft during earlier missions, extensive logging support was added on the Mars Science Laboratory (MSL) sequencing server. This server is responsible for generating all MSL binary SCMFs from RML input sequences. The sequencing server logs every SCMF it generates into a MySQL database, as well as the high-level RML file and dictionary name inputs used to create the SCMF. The SCMF is then indexed by a hash value that is automatically included in all command
WebLogo: A Sequence Logo Generator

PubMed Central

Crooks, Gavin E.; Hon, Gary; Chandonia, John-Marc; Brenner, Steven E.

2004-01-01

WebLogo generates sequence logos, graphical representations of the patterns within a multiple sequence alignment. Sequence logos provide a richer and more precise description of sequence similarity than consensus sequences and can rapidly reveal significant features of the alignment otherwise difficult to perceive. Each logo consists of stacks of letters, one stack for each position in the sequence. The overall height of each stack indicates the sequence conservation at that position (measured in bits), whereas the height of symbols within the stack reflects the relative frequency of the corresponding amino or nucleic acid at that position. WebLogo has been enhanced recently with additional features and options, to provide a convenient and highly configurable sequence logo generator. A command line interface and the complete, open WebLogo source code are available for local installation and customization. PMID:15173120
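The stack heights described above can be illustrated with a short calculation. This is a minimal sketch, not WebLogo's own code: it uses the standard information-content formula (log2 of the alphabet size minus the Shannon entropy of the column) and leaves out the small-sample correction that logo generators commonly apply. The function names are hypothetical.

```python
import math
from collections import Counter

def column_information(column, alphabet_size=4):
    """Information content of one alignment column, in bits.

    Computed as log2(alphabet_size) minus the Shannon entropy of the
    observed symbol frequencies (no small-sample correction).
    """
    total = len(column)
    entropy = -sum((n / total) * math.log2(n / total)
                   for n in Counter(column).values())
    return math.log2(alphabet_size) - entropy

def symbol_heights(column, alphabet_size=4):
    """Height of each symbol in the stack: frequency times column information."""
    total = len(column)
    info = column_information(column, alphabet_size)
    return {symbol: (n / total) * info for symbol, n in Counter(column).items()}

if __name__ == "__main__":
    # One column per string: fully conserved, half-and-half, and uniform.
    for column in ("AAAA", "AACC", "ACGT"):
        print(column, column_information(column), symbol_heights(column))
```

A fully conserved DNA column reaches the maximum of 2 bits, while a column with all four bases equally frequent carries 0 bits and so produces an empty stack.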

Characterization and complete genome sequence of a panicovirus from Bermuda grass by high-throughput sequencing.

PubMed

Tahir, Muhammad N; Lockhart, Ben; Grinstead, Samuel; Mollov, Dimitre

2017-04-01

Bermuda grass samples were examined by transmission electron microscopy and 28-30 nm spherical virus particles were observed. Total RNA from these plants was subjected to high-throughput sequencing (HTS). The nearly full genome sequence of a panicovirus was identified from one HTS scaffold. Sanger sequencing was used to confirm the HTS results and complete the genome sequence of 4404 nt. This virus was provisionally named Bermuda grass latent virus (BGLV). Its predicted open reading frames follow the typical arrangement of the genus Panicovirus. Based on sequence comparisons and phylogenetic analyses BGLV differs from other viruses and therefore taxonomically it is a new member of the genus Panicovirus, family Tombusviridae.
Identification of Sequence Specificity of 5-Methylcytosine Oxidation by Tet1 Protein with High-Throughput Sequencing.

PubMed

Kizaki, Seiichiro; Chandran, Anandhakumar; Sugiyama, Hiroshi

2016-03-02

Tet (ten-eleven translocation) family proteins have the ability to oxidize 5-methylcytosine (mC) to 5-hydroxymethylcytosine (hmC), 5-formylcytosine (fC), and 5-carboxycytosine (caC). However, the oxidation reaction of Tet is not understood completely. Evaluation of genomic-level epigenetic changes by Tet protein requires unbiased identification of the highly selective oxidation sites. In this study, we used high-throughput sequencing to investigate the sequence specificity of mC oxidation by Tet1. A 6.6×10^4-member mC-containing random DNA-sequence library was constructed. The library was subjected to Tet-reactive pulldown followed by high-throughput sequencing. Analysis of the obtained sequence data identified the Tet1-reactive sequences. We identified mCpG as a highly reactive sequence of Tet1 protein. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Sequence Data for Clostridium autoethanogenum using Three Generations of Sequencing Technologies

DOE PAGES

Utturkar, Sagar M.; Klingeman, Dawn Marie; Bruno-Barcena, José M.; ...

2015-04-14

During the past decade, DNA sequencing output has been mostly dominated by the second generation sequencing platforms, which are characterized by low cost, high throughput and shorter read lengths (for example, Illumina). The emergence and development of so-called third generation sequencing platforms such as PacBio has permitted exceptionally long reads (over 20 kb) to be generated. Due to read length increases, algorithm improvements and hybrid assembly approaches, the concept of one chromosome, one contig and automated finishing of microbial genomes is now a realistic and achievable task for many microbial laboratories. In this paper, we describe high quality sequence datasets which span three generations of sequencing technologies, containing six types of data from four NGS platforms and originating from a single microorganism, Clostridium autoethanogenum. The dataset reported here will be useful for the scientific community to evaluate upcoming NGS platforms, enabling comparison of existing and novel bioinformatics approaches and will encourage interest in the development of innovative experimental and computational methods for NGS data.
Haplotype estimation using sequencing reads.

PubMed

Delaneau, Olivier; Howie, Bryan; Cox, Anthony J; Zagury, Jean-François; Marchini, Jonathan

2013-10-03

High-throughput sequencing technologies produce short sequence reads that can contain phase information if they span two or more heterozygote genotypes. This information is not routinely used by current methods that infer haplotypes from genotype data. We have extended the SHAPEIT2 method to use phase-informative sequencing reads to improve phasing accuracy. Our model incorporates the read information in a probabilistic model through base quality scores within each read. The method is primarily designed for high-coverage sequence data or data sets that already have genotypes called. One important application is phasing of single samples sequenced at high coverage for use in medical sequencing and studies of rare diseases. Our method can also use existing panels of reference haplotypes. We tested the method by using a mother-father-child trio sequenced at high-coverage by Illumina together with the low-coverage sequence data from the 1000 Genomes Project (1000GP). We found that use of phase-informative reads increases the mean distance between switch errors by 22% from 274.4 kb to 328.6 kb. We also used male chromosome X haplotypes from the 1000GP samples to simulate sequencing reads with varying insert size, read length, and base error rate. When using short 100 bp paired-end reads, we found that using mixtures of insert sizes produced the best results. When using longer reads with high error rates (5-20 kb read with 4%-15% error per base), phasing performance was substantially improved. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Theta oscillations promote temporal sequence learning.

PubMed

Crivelli-Decker, Jordan; Hsieh, Liang-Tien; Clarke, Alex; Ranganath, Charan

2018-05-17

Many theoretical models suggest that neural oscillations play a role in learning or retrieval of temporal sequences, but the extent to which oscillations support sequence representation remains unclear. To address this question, we used scalp electroencephalography (EEG) to examine oscillatory activity over learning of different object sequences. Participants made semantic decisions on each object as they were presented in a continuous stream. For three "Consistent" sequences, the order of the objects was always fixed. Activity during Consistent sequences was compared to "Random" sequences that consisted of the same objects presented in a different order on each repetition. Over the course of learning, participants made faster semantic decisions to objects in Consistent, as compared to objects in Random sequences. Thus, participants were able to use sequence knowledge to predict upcoming items in Consistent sequences. EEG analyses revealed decreased oscillatory power in the theta (4-7 Hz) band at frontal sites following decisions about objects in Consistent sequences, as compared with objects in Random sequences. The theta power difference between Consistent and Random only emerged in the second half of the task, as participants were more effectively able to predict items in Consistent sequences. Moreover, we found increases in parieto-occipital alpha (10-13 Hz) and beta (14-28 Hz) power during the pre-response period for objects in Consistent sequences, relative to objects in Random sequences. Linear mixed effects modeling revealed that single trial theta oscillations were related to reaction time for future objects in a sequence, whereas beta and alpha oscillations were only predictive of reaction time on the current trial. These results indicate that theta and alpha/beta activity preferentially relate to future and current events, respectively. More generally our findings highlight the importance of band-specific neural oscillations in the learning of
77 FR 65537 - Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence...

Federal Register 2010, 2011, 2012, 2013, 2014

2012-10-29

... DEPARTMENT OF COMMERCE Patent and Trademark Office Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence Disclosures ACTION: Proposed collection; comment request... Patent applications that contain nucleotide and/or amino acid sequence disclosures must include a copy of...
Counting Patterns in Degenerated Sequences

NASA Astrophysics Data System (ADS)

Nuel, Grégory

Biological sequences such as DNA or proteins are always obtained through a sequencing process which might produce some uncertainty. As a result, such sequences are usually written in a degenerated alphabet where some symbols may correspond to several possible letters (e.g., the IUPAC DNA alphabet). When counting patterns in such degenerated sequences, the question that naturally arises is: how to deal with degenerated positions? Since most (usually 99%) of the positions are not degenerated, it is considered harmless to discard the degenerated positions in order to get an observation, but the exact consequences of such a practice are unclear. In this paper, we introduce a rigorous method to take into account the uncertainty of sequencing for biological sequences (DNA, proteins). We first introduce a Forward-Backward approach to compute the marginal distribution of the constrained sequence and use it both to perform an Expectation-Maximization estimation of parameters and to derive a heterogeneous Markov distribution for the constrained sequence. This distribution is then used along with known DFA-based pattern approaches to obtain the exact distribution of the pattern count under the constraints. As an illustration, we consider an EST dataset from the EMBL database. Despite the fact that only 1% of the positions in this dataset are degenerated, we show that not taking into account these positions might lead to erroneous observations, further proving the interest of our approach.
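The difference between discarding degenerated positions and accounting for them can be illustrated with a small sketch. It is not the Forward-Backward/DFA method of the abstract: it only computes the expected pattern count under the simplifying assumption that every degenerated symbol is an independent, uniform draw from its IUPAC letter set. The function names `expected_count` and `discard_count` are hypothetical.

```python
# IUPAC nucleotide codes mapped to the letters each symbol may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expected_count(seq, pattern):
    """Expected number of occurrences of `pattern` (plain A/C/G/T) in `seq`.

    Assumption (not from the abstract): each degenerated symbol is an
    independent uniform draw from its IUPAC letter set.
    """
    total = 0.0
    for i in range(len(seq) - len(pattern) + 1):
        prob = 1.0
        for symbol, letter in zip(seq[i:i + len(pattern)], pattern):
            options = IUPAC[symbol]
            if letter not in options:
                prob = 0.0
                break
            prob /= len(options)
        total += prob
    return total


def discard_count(seq, pattern):
    """Count only exact matches, i.e. ignore windows with degenerated symbols."""
    width = len(pattern)
    return sum(seq[i:i + width] == pattern for i in range(len(seq) - width + 1))


print(discard_count("ACNTACGT", "ACGT"))   # 1
print(expected_count("ACNTACGT", "ACGT"))  # 1.25
```

Even a single `N` shifts the observation here (1 versus 1.25), which is the kind of discrepancy the abstract warns about; the exact count distribution requires the method described in the paper.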
The EMBL nucleotide sequence database

PubMed Central

Stoesser, Guenter; Baker, Wendy; van den Broek, Alexandra; Camon, Evelyn; Garcia-Pastor, Maria; Kanz, Carola; Kulikova, Tamara; Lombard, Vincent; Lopez, Rodrigo; Parkinson, Helen; Redaschi, Nicole; Sterk, Peter; Stoehr, Peter; Tuli, Mary Ann

2001-01-01

The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank at the NCBI (USA). Data is exchanged amongst the collaborating databases on a daily basis. The major contributors to the EMBL database are individual authors and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via ftp, email and World Wide Web interfaces. EBI’s Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many specialized databases. For sequence similarity searching a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT. PMID:11125039
Sequencing the Connectome

PubMed Central

Zador, Anthony M.; Dubnau, Joshua; Oyibo, Hassana K.; Zhan, Huiqing; Cao, Gang; Peikon, Ian D.

2012-01-01

Connectivity determines the function of neural circuits. Historically, circuit mapping has usually been viewed as a problem of microscopy, but no current method can achieve high-throughput mapping of entire circuits with single neuron precision. Here we describe a novel approach to determining connectivity. We propose BOINC (“barcoding of individual neuronal connections”), a method for converting the problem of connectivity into a form that can be read out by high-throughput DNA sequencing. The appeal of using sequencing is that its scale—sequencing billions of nucleotides per day is now routine—is a natural match to the complexity of neural circuits. An inexpensive high-throughput technique for establishing circuit connectivity at single neuron resolution could transform neuroscience research. PMID:23109909
Computational analysis of sequence selection mechanisms.

PubMed

Meyerguz, Leonid; Grasso, Catherine; Kleinberg, Jon; Elber, Ron

2004-04-01

Mechanisms leading to gene variations are responsible for the diversity of species and are important components of the theory of evolution. One constraint on gene evolution is that of protein foldability; the three-dimensional shapes of proteins must be thermodynamically stable. We explore the impact of this constraint and calculate properties of foldable sequences using 3660 structures from the Protein Data Bank. We seek a selection function that receives sequences as input, and outputs survival probability based on sequence fitness to structure. We compute the number of sequences that match a particular protein structure with energy lower than the native sequence, the density of the number of sequences, the entropy, and the "selection" temperature. The mechanism of structure selection for sequences longer than 200 amino acids is approximately universal. For shorter sequences, it is not. We speculate on concrete evolutionary mechanisms that show this behavior.
Turtle Graphics of Morphic Sequences

NASA Astrophysics Data System (ADS)

Zantema, Hans

2016-02-01

The simplest infinite sequences that are not ultimately periodic are pure morphic sequences: fixed points of particular morphisms mapping single symbols to strings of symbols. A basic way to visualize a sequence is by a turtle curve: for every alphabet symbol fix an angle, and then consecutively for all sequence elements draw a unit segment and turn the drawing direction by the corresponding angle. This paper investigates turtle curves of pure morphic sequences. In particular, criteria are given for turtle curves being finite (consisting of finitely many segments), and for being fractal or self-similar: it contains an up-scaled copy of itself. Also space-filling turtle curves are considered, and a turtle curve that is dense in the plane. As a particular result we give an exact relationship between the Koch curve and a turtle curve for the Thue-Morse sequence, where until now for such a result only approximations were known.
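The turtle-curve construction described above (draw a unit segment, then turn by the angle assigned to the current symbol) can be sketched as follows for a prefix of the Thue-Morse sequence. The angle assignment in the usage line is an arbitrary illustration, not the one the paper relates to the Koch curve, and the function names are hypothetical.

```python
import math


def thue_morse(n):
    """First n symbols of the Thue-Morse sequence (parity of 1-bits of the index)."""
    return [bin(i).count("1") % 2 for i in range(n)]


def turtle_points(symbols, angles):
    """Return the points visited by the turtle.

    For each symbol: draw a unit segment in the current direction,
    then turn by angles[symbol] (radians).
    """
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for s in symbols:
        x += math.cos(heading)
        y += math.sin(heading)
        points.append((x, y))
        heading += angles[s]
    return points


# Illustrative angles only (an assumption): symbol 0 turns left, symbol 1 turns right.
curve = turtle_points(thue_morse(64), {0: math.pi / 3, 1: -math.pi / 3})
print(len(curve))  # 65 points for 64 segments
```

The returned point list can be passed to any plotting library to inspect whether a given angle choice produces a bounded, self-similar, or space-filling curve.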
Insertion Sequences

PubMed Central

Mahillon, Jacques; Chandler, Michael

1998-01-01

Insertion sequences (ISs) constitute an important component of most bacterial genomes. Over 500 individual ISs have been described in the literature to date, and many more are being discovered in the ongoing prokaryotic and eukaryotic genome-sequencing projects. The last 10 years have also seen some striking advances in our understanding of the transposition process itself. Not least of these has been the development of various in vitro transposition systems for both prokaryotic and eukaryotic elements and, for several of these, a detailed understanding of the transposition process at the chemical level. This review presents a general overview of the organization and function of insertion sequences of eubacterial, archaebacterial, and eukaryotic origins with particular emphasis on bacterial elements and on different aspects of the transposition mechanism. It also attempts to provide a framework for classification of these elements by assigning them to various families or groups. A total of 443 members of the collection have been grouped in 17 families based on combinations of the following criteria: (i) similarities in genetic organization (arrangement of open reading frames); (ii) marked identities or similarities in the enzymes which mediate the transposition reactions, the recombinases/transposases (Tpases); (iii) similar features of their ends (terminal IRs); and (iv) fate of the nucleotide sequence of their target sites (generation of a direct target duplication of determined length). A brief description of the mechanism(s) involved in the mobility of individual ISs in each family and of the structure-function relationships of the individual Tpases is included where available. PMID:9729608
Cellulases and coding sequences

DOEpatents

Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

2001-02-20

The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.
Cellulases and coding sequences

DOEpatents

Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

2001-01-01

The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.
Sequence Factorial and Its Applications

ERIC Educational Resources Information Center

Asiru, Muniru A.

2012-01-01

In this note, we introduce sequence factorial and use this to study generalized M-bonomial coefficients. For the sequence of natural numbers, the twin concepts of sequence factorial and generalized M-bonomial coefficients, respectively, extend the corresponding concepts of factorial of an integer and binomial coefficients. Some latent properties…
SNMR pulse sequence phase cycling

DOEpatents

Walsh, David O; Grunewald, Elliot D

2013-11-12

Technologies applicable to SNMR pulse sequence phase cycling are disclosed, including SNMR acquisition apparatus and methods, SNMR processing apparatus and methods, and combinations thereof. SNMR acquisition may include transmitting two or more SNMR pulse sequences and applying a phase shift to a pulse in at least one of the pulse sequences, according to any of a variety of cycling techniques. SNMR processing may include combining SNMR from a plurality of pulse sequences comprising pulses of different phases, so that desired signals are preserved and undesired signals are canceled.
Protein-DNA interactions define the mechanistic aspects of circle formation and insertion reactions in IS2 transposition.

PubMed

Lewis, Leslie A; Astatke, Mekbib; Umekubo, Peter T; Alvi, Shaheen; Saby, Robert; Afrose, Jehan; Oliveira, Pedro H; Monteiro, Gabriel A; Prazeres, Duarte Mf

2012-01-26

Transposition in IS3, IS30, IS21 and IS256 insertion sequence (IS) families utilizes an unconventional two-step pathway. A figure-of-eight intermediate in Step I, from asymmetric single-strand cleavage and joining reactions, is converted into a double-stranded minicircle whose junction (the abutted left and right ends) is the substrate for symmetrical transesterification attacks on target DNA in Step II, suggesting intrinsically different synaptic complexes (SC) for each step. Transposases of these ISs bind poorly to cognate DNA and comparative biophysical analyses of SC I and SC II have proven elusive. We have prepared a native, soluble, active, GFP-tagged fusion derivative of the IS2 transposase that creates fully formed complexes with single-end and minicircle junction (MCJ) substrates and used these successfully in hydroxyl radical footprinting experiments. In IS2, Step I reactions are physically and chemically asymmetric; the left imperfect, inverted repeat (IRL), the exclusive recipient end, lacks donor function. In SC I, different protection patterns of the cleavage domains (CDs) of the right imperfect inverted repeat (IRR; extensive in cis) and IRL (selective in trans) at the single active cognate IRR catalytic center (CC) are related to their donor and recipient functions. In SC II, extensive binding of the IRL CD in trans and of the abutted IRR CD in cis at this CC represents the first phase of the complex. An MCJ substrate precleaved at the 3' end of IRR revealed a temporary transition state with the IRL CD disengaged from the protein. We propose that in SC II, sequential 3' cleavages at the bound abutted CDs trigger a conformational change, allowing the IRL CD to complex to its cognate CC, producing the second phase. Corroborating data from enhanced residues and curvature propensity plots suggest that CD to CD interactions in SC I and SC II require IRL to assume a bent structure, to facilitate binding in trans. Different transpososomes are assembled in
Chameleon sequences in neurodegenerative diseases.

PubMed

Bahramali, Golnaz; Goliaei, Bahram; Minuchehr, Zarrin; Salari, Ali

2016-03-25

Chameleon sequences can adopt an alpha helix, sheet, or coil conformation. Defining chameleon sequences in the PDB (Protein Data Bank) may yield insight into the peptides and proteins responsible for neurodegeneration. In this research, we took advantage of the large PDB and performed a sequence analysis on chameleons, developing an algorithm to extract peptide segments with identical sequences but different structures. In order to find new chameleon sequences, we extracted a set of 8315 non-redundant protein sequences from the PDB with an identity less than 25%. Our data were classified into "helix to strand (HE)", "helix to coil (HC)" and "strand to coil (CE)" alterations. We also analyzed the occurrence of singlet and doublet amino acids and the solvent accessibility in the chameleon sequences; we then sorted out the proteins with the largest number of chameleon sequences and named them Chameleon Flexible Proteins (CFPs) in our dataset. Our data revealed that Gly, Val, Ile, Tyr and Phe are the major amino acids in chameleons. We also found that proteins such as Insulin Degrading Enzyme (IDE) and GTP-binding nuclear protein Ran (RAN) have the largest number of chameleons (640 and 405, respectively). These proteins have known roles in neurodegenerative diseases. Therefore it can be inferred that other CFPs can serve as key proteins in neurodegeneration, and a study of them can shed light on curing and preventing neurodegenerative diseases. Copyright © 2016 Elsevier Inc. All rights reserved.
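The extraction step (identical peptide segments found in different structures) can be sketched as follows. This is not the authors' algorithm. It assumes each protein comes as a sequence string paired with an equally long three-state secondary-structure string (H = helix, E = strand, C = coil), and it keeps only windows that are in one state throughout. The window length `k` and the function name `find_chameleons` are hypothetical.

```python
from collections import defaultdict


def find_chameleons(entries, k=6):
    """Return {segment: set of structural states} for chameleon candidates.

    entries: iterable of (sequence, structure) string pairs of equal length.
    A length-k segment is recorded only when its whole window has a single
    structural state; it is reported when seen in two or more states.
    """
    seen = defaultdict(set)
    for seq, ss in entries:
        for i in range(len(seq) - k + 1):
            window = ss[i:i + k]
            if len(set(window)) == 1:
                seen[seq[i:i + k]].add(window[0])
    return {segment: states for segment, states in seen.items() if len(states) > 1}


# Toy input (made-up sequences, for illustration only).
toy = [
    ("AAVIGYFKK", "CHHHHHHCC"),
    ("GGVIGYFLL", "CEEEEEECC"),
]
print(find_chameleons(toy, k=5))  # VIGYF is seen as both helix and strand
```

Grouping the reported state sets by pair, for example {H, E} or {H, C}, would give alteration classes like those listed in the abstract.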
Chameleon sequences in neurodegenerative diseases

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bahramali, Golnaz; Goliaei, Bahram, E-mail: goliaei@ut.ac.ir; Minuchehr, Zarrin, E-mail: minuchehr@nigeb.ac.ir

2016-03-25

Chameleon sequences can adopt an alpha helix, sheet, or coil conformation. Defining chameleon sequences in the PDB (Protein Data Bank) may yield insight into the peptides and proteins responsible for neurodegeneration. In this research, we took advantage of the large PDB and performed a sequence analysis on chameleons, developing an algorithm to extract peptide segments with identical sequences but different structures. In order to find new chameleon sequences, we extracted a set of 8315 non-redundant protein sequences from the PDB with an identity less than 25%. Our data were classified into “helix to strand (HE)”, “helix to coil (HC)” and “strand to coil (CE)” alterations. We also analyzed the occurrence of singlet and doublet amino acids and the solvent accessibility in the chameleon sequences; we then sorted out the proteins with the largest number of chameleon sequences and named them Chameleon Flexible Proteins (CFPs) in our dataset. Our data revealed that Gly, Val, Ile, Tyr and Phe are the major amino acids in chameleons. We also found that proteins such as Insulin Degrading Enzyme (IDE) and GTP-binding nuclear protein Ran (RAN) have the largest number of chameleons (640 and 405, respectively). These proteins have known roles in neurodegenerative diseases. Therefore it can be inferred that other CFPs can serve as key proteins in neurodegeneration, and a study of them can shed light on curing and preventing neurodegenerative diseases.
NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins

PubMed Central

Pruitt, Kim D.; Tatusova, Tatiana; Maglott, Donna R.

2005-01-01

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff. PMID:15608248

Rover Sequencing and Visualization Program

NASA Technical Reports Server (NTRS)

Cooper, Brian; Hartman, Frank; Maxwell, Scott; Yen, Jeng; Wright, John; Balacuit, Carlos

2005-01-01

The Rover Sequencing and Visualization Program (RSVP) is the software tool for use in the Mars Exploration Rover (MER) mission for planning rover operations and generating command sequences for accomplishing those operations. RSVP combines three-dimensional (3D) visualization for immersive exploration of the operations area, stereoscopic image display for high-resolution examination of the downlinked imagery, and a sophisticated command-sequence editing tool for analysis and completion of the sequences. RSVP is linked with actual flight-code modules for operations rehearsal to provide feedback on the expected behavior of the rover prior to committing to a particular sequence. Playback tools allow for review of both rehearsed rover behavior and downlinked results of actual rover operations. These can be displayed simultaneously for comparison of rehearsed and actual activities for verification. The primary inputs to RSVP are downlink data products from the Operations Storage Server (OSS) and activity plans generated by the science team. The activity plans are high-level goals for the next day's activities. The downlink data products include imagery, terrain models, and telemetered engineering data on rover activities and state. The Rover Sequence Editor (RoSE) component of RSVP performs activity expansion to command sequences, command creation and editing with setting of command parameters, and viewing and management of rover resources. The HyperDrive component of RSVP performs 2D and 3D visualization of the rover's environment, graphical and animated review of rover-predicted and telemetered state, and creation and editing of command sequences related to mobility and Instrument Deployment Device (IDD) operations. Additionally, RoSE and HyperDrive together evaluate command sequences for potential violations of flight and safety rules. The products of RSVP include command sequences for uplink that are stored in the Distributed Object Manager (DOM) and predicted rover
Full genome sequence of Rocio virus reveal substantial variations from the prototype Rocio virus SPH 34675 sequence.

PubMed

Setoh, Yin Xiang; Amarilla, Alberto A; Peng, Nias Y; Slonchak, Andrii; Periasamy, Parthiban; Figueiredo, Luiz T M; Aquino, Victor H; Khromykh, Alexander A

2018-01-01

Rocio virus (ROCV) is an arbovirus belonging to the genus Flavivirus, family Flaviviridae. We present an updated sequence of ROCV strain SPH 34675 (GenBank: AY632542.4), the only available full genome sequence prior to this study. Using next-generation sequencing of the entire genome, we reveal substantial sequence variation from the prototype sequence, with 30 nucleotide differences amounting to 14 amino acid changes, as well as significant changes to predicted 3'UTR RNA structures. Our results present an updated and corrected sequence of a potential emerging human-virulent flavivirus uniquely indigenous to Brazil (GenBank: MF461639).
Comparison of ZP3 protein sequences among vertebrate species: to obtain a consensus sequence for immunocontraception.

PubMed

Zhu, X; Naz, R K

1999-03-01

The deduced ZP3 amino acid (aa) sequences of 13 vertebrate species, namely mouse, hamster, rabbit, pig, porcine, cow, dog, cat, human, bonnet, marmoset, carp, and frog, were compared using the PILEUP and PRETTY alignment programs (GCG, Wisconsin, USA). The published aa sequences obtained from 13 vertebrate species indicated overall evolutionary conservation in the N-terminus, central region, and C-terminus of the ZP3 polypeptide. More variation in ZP3 polypeptide sequences was seen in the alignments of carp and frog from the 11 mammalian species, making the leader sequence more prominent. The canonical furin proteolytic processing signal at the C-terminus was found in all the ZP3 polypeptide sequences except those of carp and frog. In the central region, the ZP3 deduced aa sequences of all the 13 vertebrate species aligned well, and six relatively conserved sequences were found. There are 11 conserved cysteine residues in the central region across all species including carp and frog, indicating that these residues have a longer evolutionary history. The ZP3 aa sequence similarities were examined using the GAP program (GCG). The highest aa similarities are observed between the members of the same order within the class mammalia, and also (95.4%) between pig (ungulata) and rabbit (lagomorpha). The deduced ZP3 aa sequences per se may not be enough to build a phylogenetic tree.
International interlaboratory study comparing single organism 16S rRNA gene sequencing data: Beyond consensus sequence comparisons

PubMed Central

Olson, Nathan D.; Lund, Steven P.; Zook, Justin M.; Rojas-Cornejo, Fabiola; Beck, Brian; Foy, Carole; Huggett, Jim; Whale, Alexandra S.; Sui, Zhiwei; Baoutina, Anna; Dobeson, Michael; Partis, Lina; Morrow, Jayne B.

2015-01-01

This study presents the results from an interlaboratory sequencing study for which we developed a novel high-resolution method for comparing data from different sequencing platforms for a multi-copy, paralogous gene. The combination of PCR amplification and 16S ribosomal RNA gene (16S rRNA) sequencing has revolutionized bacteriology by enabling rapid identification, frequently without the need for culture. To assess variability between laboratories in sequencing 16S rRNA, six laboratories sequenced the gene encoding the 16S rRNA from Escherichia coli O157:H7 strain EDL933 and Listeria monocytogenes serovar 4b strain NCTC11994. Participants performed sequencing methods and protocols available in their laboratories: Sanger sequencing, Roche 454 pyrosequencing®, or Ion Torrent PGM®. The sequencing data were evaluated on three levels: (1) identity of biologically conserved position, (2) ratio of 16S rRNA gene copies featuring identified variants, and (3) the collection of variant combinations in a set of 16S rRNA gene copies. The same set of biologically conserved positions was identified for each sequencing method. Analytical methods using Bayesian and maximum likelihood statistics were developed to estimate variant copy ratios, which describe the ratio of nucleotides at each identified biologically variable position, as well as the likely set of variant combinations present in 16S rRNA gene copies. Our results indicate that estimated variant copy ratios at biologically variable positions were only reproducible for high throughput sequencing methods. Furthermore, the likely variant combination set was only reproducible with increased sequencing depth and longer read lengths. We also demonstrate novel methods for evaluating variable positions when comparing multi-copy gene sequence data from multiple laboratories generated using multiple sequencing technologies. PMID:27077030
Sequencing Adventure Activities: A New Perspective.

ERIC Educational Resources Information Center

Bisson, Christian

Sequencing in adventure education involves putting activities in an order appropriate to the needs of the group. Contrary to the common assumption that each adventure sequence is unique, a review of literature concerning five sequencing models reveals a certain universality. These models present sequences that move through four phases: group…
Implicit Sequence Learning in Dyslexia: A Within-Sequence Comparison of First- and Higher-Order Information

ERIC Educational Resources Information Center

Du, Wenchong; Kelly, Steve W.

2013-01-01

The present study examines implicit sequence learning in adult dyslexics with a focus on comparing sequence transitions with different statistical complexities. Learning of a 12-item deterministic sequence was assessed in 12 dyslexic and 12 non-dyslexic university students. Both groups showed equivalent standard reaction time increments when the…
Teaching Task Sequencing via Verbal Mediation.

ERIC Educational Resources Information Center

Rusch, Frank R.; And Others

1987-01-01

Verbal sequence training was used to teach a moderately mentally retarded woman to sequence job-related tasks. Learning to say the tasks in the proper sequence resulted in the employee performing her tasks in that sequence, and the employee was capable of mediating her own work behavior when scheduled changes occurred. (Author/JDD)
Compressing DNA sequence databases with coil.

PubMed

White, W Timothy J; Hendy, Michael D

2008-05-20

Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression - an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression - the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work.
Elimination sequence optimization for SPAR

NASA Technical Reports Server (NTRS)

Hogan, Harry A.

1986-01-01

SPAR is a large-scale computer program for finite element structural analysis. The program allows user specification of the order in which the joints of a structure are to be eliminated since this order can have significant influence over solution performance, in terms of both storage requirements and computer time. An efficient elimination sequence can improve performance by over 50% for some problems. Obtaining such sequences, however, requires the expertise of an experienced user and can take hours of tedious effort to effect. Thus, an automatic elimination sequence optimizer would enhance productivity by reducing the analysts' problem definition time and by lowering computer costs. Two possible methods for automating the elimination sequence specifications were examined. Several algorithms based on the graph theory representations of sparse matrices were studied with mixed results. Significant improvement in the program performance was achieved, but sequencing by an experienced user still yields substantially better results. The initial results provide encouraging evidence that the potential benefits of such an automatic sequencer would be well worth the effort.
Multiplex sequencing of plant chloroplast genomes using Solexa sequencing-by-synthesis technology

Treesearch

Richard Cronn; Aaron Liston; Matthew Parks; David S. Gernandt; Rongkun Shen; Todd Mockler

2008-01-01

Organellar DNA sequences are widely used in evolutionary and population genetic studies; however, the conservative nature of chloroplast gene and genome evolution often limits phylogenetic resolution and statistical power. To gain maximal access to the historical record contained within chloroplast genomes, we have adapted multiplex sequencing-by-synthesis (MSBS) to...
Cerebellar activation during motor sequence learning is associated with subsequent transfer to new sequences.

PubMed

Shimizu, Renee E; Wu, Allan D; Knowlton, Barbara J

2016-12-01

Effective learning results not only in improved performance on a practiced task, but also in the ability to transfer the acquired knowledge to novel, similar tasks. Using a modified serial reaction time (RT) task, the authors examined the ability to transfer to novel sequences after practicing sequences in a repetitive order versus a nonrepeating interleaved order. Interleaved practice resulted in better performance on new sequences than repetitive practice. In a second study, participants practiced interleaved sequences in a functional MRI (fMRI) scanner and received a transfer test of novel sequences. Transfer ability was positively correlated with cerebellar blood oxygen level dependent activity during practice, indicating that greater cerebellar engagement during training resulted in better subsequent transfer performance. Interleaved practice may thus result in a more generalized representation that is robust to interference, and the degree of activation in the cerebellum may be a reflection of the instantiation and engagement of internal models. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Two DNA-binding factors recognize specific sequences at silencers, upstream activating sequences, autonomously replicating sequences, and telomeres in Saccharomyces cerevisiae

DOE Office of Scientific and Technical Information (OSTI.GOV)

Buchman, A.R.; Kimmerly, W.J.; Rine, J.

1988-01-01

Two DNA-binding factors from Saccharomyces cerevisiae have been characterized, GRFI (general regulatory factor I) and ABFI (ARS-binding factor I), that recognize specific sequences within diverse genetic elements. GRFI bound to sequences at the negative regulatory elements (silencers) of the silent mating type loci HML E and HMR E and to the upstream activating sequence (UAS) required for transcription of the MATα genes. A putative conserved UAS located at genes involved in translation (RPG box) was also recognized by GRFI. In addition, GRFI bound with high affinity to sequences within the (C1-3A)-repeat region at yeast telomeres. Binding sites for GRFI with the highest affinity appeared to be of the form 5'-(A/G)(A/C)ACCCANNCA(T/C)(T/C)-3', where N is any nucleotide. ABFI-binding sites were located next to autonomously replicating sequences (ARSs) at controlling elements of the silent mating type loci HMR E, HMR I, and HML I and were associated with ARS1, ARS2, and the 2μm plasmid ARS. Two tandem ABFI binding sites were found between the HIS3 and DED1 genes, several kilobase pairs from any ARS, indicating that ABFI-binding sites are not restricted to ARSs. The sequences recognized by ABFI showed partial dyad-symmetry and appeared to be variations of the consensus 5'-TATCATTNNNNACGA-3'. GRFI and ABFI were both abundant DNA-binding factors and did not appear to be encoded by the SIR genes, whose products are required for repression of the silent mating type loci. Together, these results indicate that both GRFI and ABFI play multiple roles within the cell.
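The GRFI consensus 5'-(A/G)(A/C)ACCCANNCA(T/C)(T/C)-3' maps directly onto a regular expression, so candidate sites can be scanned for as in the sketch below. It searches only the given strand and only for exact consensus matches, whereas the abstract describes real sites as variations on the consensus. The function name `find_grfi_sites` is hypothetical.

```python
import re

# (A/G)(A/C)ACCCANNCA(T/C)(T/C), with N = any nucleotide.
# The lookahead lets overlapping candidate sites be reported as well.
GRFI_CONSENSUS = re.compile(r"(?=([AG][AC]ACCCA[ACGT]{2}CA[TC][TC]))")


def find_grfi_sites(dna):
    """Return (0-based start, matched site) pairs on the given strand."""
    dna = dna.upper()
    return [(m.start(), m.group(1)) for m in GRFI_CONSENSUS.finditer(dna)]


print(find_grfi_sites("TTACACCCATACATTTGG"))  # [(2, 'ACACCCATACATT')]
```

Scanning the reverse complement as well, or scoring near-matches with a position weight matrix, would be the natural next step for real genomic sequence.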
Sequencing Technologies Panel at SFAF

DOE Office of Scientific and Technical Information (OSTI.GOV)

Turner, Steve; Fiske, Haley; Knight, Jim

2010-06-02

From left to right: Steve Turner of Pacific Biosciences, Haley Fiske of Illumina, Jim Knight of Roche, Michael Rhodes of Life Technologies and Peter Vander Horn of Life Technologies' Single Molecule Sequencing group discuss new sequencing technologies and applications on June 2, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM
Genomic sequencing of Pleistocene cave bears

DOE Office of Scientific and Technical Information (OSTI.GOV)

Noonan, James P.; Hofreiter, Michael; Smith, Doug

2005-04-01

Despite the information content of genomic DNA, ancient DNA studies to date have largely been limited to amplification of mitochondrial DNA due to technical hurdles such as contamination and degradation of ancient DNAs. In this study, we describe two metagenomic libraries constructed using unamplified DNA extracted from the bones of two 40,000-year-old extinct cave bears. Analysis of approximately 1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8 percent and 1.1 percent of clones in the libraries contain cave bear inserts, yielding 26,861 bp of cave bear genome sequence. Alignment of this sequence to the dog genome, the closest sequenced genome to cave bear in terms of evolutionary distance, revealed roughly the expected ratio of cave bear exons, repeats and conserved noncoding sequences. Only 0.04 percent of all clones sequenced were derived from contamination with modern human DNA. Comparison of cave bear with orthologous sequences from several modern bear species revealed the evolutionary relationship of these lineages. Using the metagenomic approach described here, we have recovered substantial quantities of mammalian genomic sequence more than twice as old as any previously reported, establishing the feasibility of ancient DNA genomic sequencing programs.
Sequence independent amplification of DNA

DOEpatents

Bohlander, S.K.

1998-03-24

The present invention is a rapid sequence-independent amplification procedure (SIA). Even minute amounts of DNA from various sources can be amplified independent of any sequence requirements of the DNA or any a priori knowledge of any sequence characteristics of the DNA to be amplified. This method allows, for example, the sequence independent amplification of microdissected chromosomal material and the reliable construction of high quality fluorescent in situ hybridization (FISH) probes from YACs or from other sources. These probes can be used to localize YACs on metaphase chromosomes but also--with high efficiency--in interphase nuclei. 25 figs.
Sequence independent amplification of DNA

DOEpatents

Bohlander, Stefan K.

1998-01-01

The present invention is a rapid sequence-independent amplification procedure (SIA). Even minute amounts of DNA from various sources can be amplified independent of any sequence requirements of the DNA or any a priori knowledge of any sequence characteristics of the DNA to be amplified. This method allows, for example, the sequence independent amplification of microdissected chromosomal material and the reliable construction of high quality fluorescent in situ hybridization (FISH) probes from YACs or from other sources. These probes can be used to localize YACs on metaphase chromosomes but also--with high efficiency--in interphase nuclei.
Establishing homologies in protein sequences

NASA Technical Reports Server (NTRS)

Dayhoff, M. O.; Barker, W. C.; Hunt, L. T.

1983-01-01

Computer-based statistical techniques used to determine homologies between proteins occurring in different species are reviewed. The technique is based on comparison of two protein sequences, either by relating all segments of a given length in one sequence to all segments of the second or by finding the best alignment of the two sequences. Approaches discussed include selection using printed tabulations, identification of very similar sequences, and computer searches of a database. The use of the SEARCH, RELATE, and ALIGN programs (Dayhoff, 1979) is explained; sample data are presented in graphs, diagrams, and tables and the construction of scoring matrices is considered.
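The first comparison strategy mentioned above, relating every segment of a given length in one sequence to every segment of the other, can be sketched as follows. This is an illustrative simplification, not the RELATE program: it scores segment pairs by counting identical residues, whereas a real implementation would be expected to use a scoring matrix. The function names and the two short sequences are hypothetical.

```python
def segment_scores(seq_a, seq_b, length):
    """Score every segment of seq_a against every segment of seq_b.

    Returns a list of (identities, start_in_a, start_in_b) tuples, where
    identities is the number of matching residues (a stand-in for a
    scoring-matrix sum).
    """
    scores = []
    for i in range(len(seq_a) - length + 1):
        for j in range(len(seq_b) - length + 1):
            seg_a = seq_a[i:i + length]
            seg_b = seq_b[j:j + length]
            identities = sum(a == b for a, b in zip(seg_a, seg_b))
            scores.append((identities, i, j))
    return scores

def best_segment_pair(seq_a, seq_b, length):
    """Return the first highest-scoring (identities, i, j) tuple."""
    return max(segment_scores(seq_a, seq_b, length), key=lambda s: s[0])

# Made-up protein fragments for illustration only.
print(best_segment_pair("MKTAYIAKQR", "GGTAYIAGG", 5))
```

The number of segment pairs grows with the product of the two sequence lengths, which is one reason database searches usually apply a faster filter first.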
A simplified Sanger sequencing method for full genome sequencing of multiple subtypes of human influenza A viruses.

PubMed

Deng, Yi-Mo; Spirason, Natalie; Iannello, Pina; Jelley, Lauren; Lau, Hilda; Barr, Ian G

2015-07-01

Full genome sequencing of influenza A viruses (IAV), including those that arise from annual influenza epidemics, is undertaken to determine if reassorting has occurred or if other pathogenic traits are present. Traditionally IAV sequencing has been biased toward the major surface glycoproteins haemagglutinin and neuraminidase, while the internal genes are often ignored. Despite the development of next generation sequencing (NGS), many laboratories are still reliant on conventional Sanger sequencing to sequence IAV. To develop a minimal and robust set of primers for Sanger sequencing of the full genome of IAV currently circulating in humans. A set of 13 primer pairs was designed that enabled amplification of the six internal genes of multiple human IAV subtypes including the recent avian influenza A(H7N9) virus from China. Specific primers were designed to amplify the HA and NA genes of each IAV subtype of interest. Each of the primers also incorporated a binding site at its 5'-end for either a forward or reverse M13 primer, such that only two M13 primers were required for all subsequent sequencing reactions. This minimal set of primers was suitable for sequencing the six internal genes of all currently circulating human seasonal influenza A subtypes as well as the avian A(H7N9) viruses that have infected humans in China. This streamlined Sanger sequencing protocol could be used to generate full genome sequence data more rapidly and easily than existing influenza genome sequencing protocols. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
Memory and learning with rapid audiovisual sequences

PubMed Central

Keller, Arielle S.; Sekuler, Robert

2015-01-01

We examined short-term memory for sequences of visual stimuli embedded in varying multisensory contexts. In two experiments, subjects judged the structure of the visual sequences while disregarding concurrent, but task-irrelevant auditory sequences. Stimuli were eight-item sequences in which varying luminances and frequencies were presented concurrently and rapidly (at 8 Hz). Subjects judged whether the final four items in a visual sequence identically replicated the first four items. Luminances and frequencies in each sequence were either perceptually correlated (Congruent) or were unrelated to one another (Incongruent). Experiment 1 showed that, despite encouragement to ignore the auditory stream, subjects' categorization of visual sequences was strongly influenced by the accompanying auditory sequences. Moreover, this influence tracked the similarity between a stimulus's separate audio and visual sequences, demonstrating that task-irrelevant auditory sequences underwent a considerable degree of processing. Using a variant of Hebb's repetition design, Experiment 2 compared musically trained subjects and subjects who had little or no musical training on the same task as used in Experiment 1. Test sequences included some that intermittently and randomly recurred, which produced better performance than sequences that were generated anew for each trial. The auditory component of a recurring audiovisual sequence influenced musically trained subjects more than it did other subjects. This result demonstrates that stimulus-selective, task-irrelevant learning of sequences can occur even when such learning is an incidental by-product of the task being performed. PMID:26575193
Memory and learning with rapid audiovisual sequences.

PubMed

Keller, Arielle S; Sekuler, Robert

2015-01-01

We examined short-term memory for sequences of visual stimuli embedded in varying multisensory contexts. In two experiments, subjects judged the structure of the visual sequences while disregarding concurrent, but task-irrelevant auditory sequences. Stimuli were eight-item sequences in which varying luminances and frequencies were presented concurrently and rapidly (at 8 Hz). Subjects judged whether the final four items in a visual sequence identically replicated the first four items. Luminances and frequencies in each sequence were either perceptually correlated (Congruent) or were unrelated to one another (Incongruent). Experiment 1 showed that, despite encouragement to ignore the auditory stream, subjects' categorization of visual sequences was strongly influenced by the accompanying auditory sequences. Moreover, this influence tracked the similarity between a stimulus's separate audio and visual sequences, demonstrating that task-irrelevant auditory sequences underwent a considerable degree of processing. Using a variant of Hebb's repetition design, Experiment 2 compared musically trained subjects and subjects who had little or no musical training on the same task as used in Experiment 1. Test sequences included some that intermittently and randomly recurred, which produced better performance than sequences that were generated anew for each trial. The auditory component of a recurring audiovisual sequence influenced musically trained subjects more than it did other subjects. This result demonstrates that stimulus-selective, task-irrelevant learning of sequences can occur even when such learning is an incidental by-product of the task being performed.

Shotgun Protein Sequencing with Meta-contig Assembly

PubMed Central

Guthals, Adrian; Clauser, Karl R.; Bandeira, Nuno

2012-01-01

Full-length de novo sequencing from tandem mass (MS/MS) spectra of unknown proteins such as antibodies or proteins from organisms with unsequenced genomes remains a challenging open problem. Conventional algorithms designed to individually sequence each MS/MS spectrum are limited by incomplete peptide fragmentation or low signal to noise ratios and tend to result in short de novo sequences at low sequencing accuracy. Our shotgun protein sequencing (SPS) approach was developed to ameliorate these limitations by first finding groups of unidentified spectra from the same peptides (contigs) and then deriving a consensus de novo sequence for each assembled set of spectra (contig sequences). But whereas SPS enables much more accurate reconstruction of de novo sequences longer than can be recovered from individual MS/MS spectra, it still requires error-tolerant matching to homologous proteins to group smaller contig sequences into full-length protein sequences, thus limiting its effectiveness on sequences from poorly annotated proteins. Using low and high resolution CID and high resolution HCD MS/MS spectra, we address this limitation with a Meta-SPS algorithm designed to overlap and further assemble SPS contigs into Meta-SPS de novo contig sequences extending as long as 100 amino acids at over 97% accuracy without requiring any knowledge of homologous protein sequences. We demonstrate Meta-SPS using distinct MS/MS data sets obtained with separate enzymatic digestions and discuss how the remaining de novo sequencing limitations relate to MS/MS acquisition settings. PMID:22798278
Shotgun protein sequencing with meta-contig assembly.

PubMed

Guthals, Adrian; Clauser, Karl R; Bandeira, Nuno

2012-10-01

Full-length de novo sequencing from tandem mass (MS/MS) spectra of unknown proteins such as antibodies or proteins from organisms with unsequenced genomes remains a challenging open problem. Conventional algorithms designed to individually sequence each MS/MS spectrum are limited by incomplete peptide fragmentation or low signal to noise ratios and tend to result in short de novo sequences at low sequencing accuracy. Our shotgun protein sequencing (SPS) approach was developed to ameliorate these limitations by first finding groups of unidentified spectra from the same peptides (contigs) and then deriving a consensus de novo sequence for each assembled set of spectra (contig sequences). But whereas SPS enables much more accurate reconstruction of de novo sequences longer than can be recovered from individual MS/MS spectra, it still requires error-tolerant matching to homologous proteins to group smaller contig sequences into full-length protein sequences, thus limiting its effectiveness on sequences from poorly annotated proteins. Using low and high resolution CID and high resolution HCD MS/MS spectra, we address this limitation with a Meta-SPS algorithm designed to overlap and further assemble SPS contigs into Meta-SPS de novo contig sequences extending as long as 100 amino acids at over 97% accuracy without requiring any knowledge of homologous protein sequences. We demonstrate Meta-SPS using distinct MS/MS data sets obtained with separate enzymatic digestions and discuss how the remaining de novo sequencing limitations relate to MS/MS acquisition settings.
Method and apparatus for biological sequence comparison

DOEpatents

Marr, T.G.; Chang, W.I.

1997-12-23

A method and apparatus are disclosed for comparing biological sequences from a known source of sequences, with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM), and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level, and are long enough to be statistically significant. The invention device filters out fragments from the known sequences that are too short, or have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The determined set of short fragment best matches for the block provide an upper threshold on alignment values. Regions of a certain length from the known sequences that have a mean alignment value upper threshold greater than a target unit score are concatenated to form a union. The current block is compared to the union and provides an indication of best local alignment with the subject sequence. 5 figs.
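The block-division step described above can be illustrated with a short sketch. The abstract does not give block sizes, so the choice below (blocks of twice the minimum alignment length, advanced by one minimum length) is an assumption made only so that every window of the minimum length falls entirely inside at least one block. The function name overlapping_blocks is hypothetical.

```python
def overlapping_blocks(subject, min_len):
    """Split subject into overlapping blocks.

    Assumed layout (not taken from the patent): each block spans
    2 * min_len residues and successive blocks start min_len apart, so any
    window of min_len residues lies wholly within some block.
    """
    last_start = max(len(subject) - min_len, 0)
    return [
        (start, subject[start:start + 2 * min_len])
        for start in range(0, last_start + 1, min_len)
    ]

# Made-up subject sequence for illustration only.
for start, block in overlapping_blocks("ABCDEFGHIJ", 3):
    print(start, block)
```

With this layout each block overlaps its neighbour by min_len residues, so no minimum-length alignment is lost at a block boundary.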
Method and apparatus for biological sequence comparison

DOEpatents

Marr, Thomas G.; Chang, William I-Wei

1997-01-01

A method and apparatus for comparing biological sequences from a known source of sequences, with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM), and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level, and are long enough to be statistically significant. The invention device filters out fragments from the known sequences that are too short, or have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The determined set of short fragment best matches for the block provide an upper threshold on alignment values. Regions of a certain length from the known sequences that have a mean alignment value upper threshold greater than a target unit score are concatenated to form a union. The current block is compared to the union and provides an indication of best local alignment with the subject sequence.
Genetic homogeneity among Leishmania (Leishmania) infantum isolates from dog and human samples in Belo Horizonte Metropolitan Area (BHMA), Minas Gerais, Brazil.

PubMed

da Silva, Thais Almeida Marques; Gomes, Luciana Inácia; Oliveira, Edward; Coura-Vital, Wendel; Silva, Letícia de Azevedo; Pais, Fabiano Sviatopolk-Mirsky; Ker, Henrique Gama; Reis, Alexandre Barbosa; Rabello, Ana; Carneiro, Mariangela

2015-04-15

Certain municipalities in the Belo Horizonte Metropolitan Area (BHMA), Minas Gerais, Brazil, have the highest human visceral leishmaniasis (VL) mortality rates in the country and also demonstrate high canine seropositivity. In Brazil, the etiologic agent of VL is Leishmania (Leishmania) infantum. The aim of this study was to evaluate the intraspecific genetic variability of parasites from humans and from dogs with different clinical forms of VL in five municipalities of BHMA using PCR-RFLP and two target genes: kinetoplast DNA (kDNA) and gp63. In total, 45 samples of DNA extracted from clinical samples (n = 35) or L. infantum culture (n = 10) were evaluated. These samples originated from three groups: adults (with or without Leishmania/HIV co-infection; n = 14), children (n = 18) and dogs (n = 13). The samples were amplified for the kDNA target using the MC1 and MC2 primers (447 bp), while the Sg1 and Sg2 (1330 bp) primers were used for the gp63 glycoprotein target gene. The restriction enzyme patterns of all the samples tested were monomorphic. These findings reveal a high degree of genetic homogeneity for the evaluated gene targets among L. infantum samples isolated from different hosts and representing different clinical forms of VL in the municipalities of BHMA studied.
Compressing DNA sequence databases with coil

PubMed Central

White, W Timothy J; Hendy, Michael D

2008-01-01

Background: Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression, an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results: We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression; the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion: coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work. PMID:18489794
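For orientation, the conventional baseline that this abstract compares against (gzip over a flat file) can be measured as in the sketch below. It does not implement coil or edit-tree coding. The flat file is synthetic random DNA in a FASTA-like layout, so the ratio it prints says nothing about real GenBank data; the function names are hypothetical.

```python
import gzip
import random

def gzip_ratio(data: bytes) -> float:
    """Return uncompressed size divided by gzip-compressed size."""
    return len(data) / len(gzip.compress(data, compresslevel=9))

def make_flat_file(n_records: int, length: int, seed: int = 0) -> bytes:
    """Build a synthetic FASTA-style flat file of random DNA (illustrative only)."""
    rng = random.Random(seed)
    lines = []
    for i in range(n_records):
        lines.append(f">record_{i}")
        lines.append("".join(rng.choice("ACGT") for _ in range(length)))
    return "\n".join(lines).encode("ascii")

flat_file = make_flat_file(n_records=200, length=400)
print(f"gzip compression ratio: {gzip_ratio(flat_file):.2f}")
```

Running the same measurement on a real flat file and on the output of a specialised compressor would give the kind of ratio comparison the abstract reports.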
Application of population sequencing (POPSEQ) for ordering and inputting genotyping-by-sequencing markers in hexaploid wheat

USDA-ARS's Scientific Manuscript database

The advancement of next-generation sequencing technologies in conjunction with new bioinformatics tools enabled fine-tuning of sequence-based high resolution mapping strategies for complex genomes. Although genotyping-by-sequencing (GBS) provides a large number of markers, its application for assoc...
DNA sequencing using polymerase substrate-binding kinetics

PubMed Central

Previte, Michael John Robert; Zhou, Chunhong; Kellinger, Matthew; Pantoja, Rigo; Chen, Cheng-Yao; Shi, Jin; Wang, BeiBei; Kia, Amirali; Etchin, Sergey; Vieceli, John; Nikoomanzar, Ali; Bomati, Erin; Gloeckner, Christian; Ronaghi, Mostafa; He, Molly Min

2015-01-01

Next-generation sequencing (NGS) has transformed genomic research by decreasing the cost of sequencing. However, whole-genome sequencing is still costly and complex for diagnostics purposes. In the clinical space, targeted sequencing has the advantage of allowing researchers to focus on specific genes of interest. Routine clinical use of targeted NGS mandates inexpensive instruments, fast turnaround time and an integrated and robust workflow. Here we demonstrate a version of the Sequencing by Synthesis (SBS) chemistry that potentially can become a preferred targeted sequencing method in the clinical space. This sequencing chemistry uses natural nucleotides and is based on real-time recording of the differential polymerase/DNA-binding kinetics in the presence of correct or mismatch nucleotides. This ensemble SBS chemistry has been implemented on an existing Illumina sequencing platform with integrated cluster amplification. We discuss the advantages of this sequencing chemistry for targeted sequencing as well as its limitations for other applications. PMID:25612848
Comparison of an In Vitro Diagnostic Next-Generation Sequencing Assay with Sanger Sequencing for HIV-1 Genotypic Resistance Testing.

PubMed

Tzou, Philip L; Ariyaratne, Pramila; Varghese, Vici; Lee, Charlie; Rakhmanaliev, Elian; Villy, Carolin; Yee, Meiqi; Tan, Kevin; Michel, Gerd; Pinsky, Benjamin A; Shafer, Robert W

2018-06-01

The ability of next-generation sequencing (NGS) technologies to detect low frequency HIV-1 drug resistance mutations (DRMs) not detected by dideoxynucleotide Sanger sequencing has potential advantages for improved patient outcomes. We compared the performance of an in vitro diagnostic (IVD) NGS assay, the Sentosa SQ HIV genotyping assay for HIV-1 genotypic resistance testing, with Sanger sequencing on 138 protease/reverse transcriptase (RT) and 39 integrase sequences. The NGS assay used a 5% threshold for reporting low-frequency variants. The level of complete plus partial nucleotide sequence concordance between Sanger sequencing and NGS was 99.9%. Among the 138 protease/RT sequences, a mean of 6.4 DRMs was identified by both Sanger and NGS, a mean of 0.5 DRM was detected by NGS alone, and a mean of 0.1 DRM was detected by Sanger sequencing alone. Among the 39 integrase sequences, a mean of 1.6 DRMs was detected by both Sanger sequencing and NGS and a mean of 0.15 DRM was detected by NGS alone. Compared with Sanger sequencing, NGS estimated higher levels of resistance to one or more antiretroviral drugs for 18.2% of protease/RT sequences and 5.1% of integrase sequences. There was little evidence for technical artifacts in the NGS sequences, but the G-to-A hypermutation was detected in three samples. In conclusion, the IVD NGS assay evaluated in this study was highly concordant with Sanger sequencing. At the 5% threshold for reporting minority variants, NGS appeared to attain a modestly increased sensitivity for detecting low-frequency DRMs without compromising sequence accuracy. Copyright © 2018 American Society for Microbiology.
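The effect of a reporting threshold such as the 5% cut-off mentioned above can be shown with a small sketch. The read counts are invented, the function name call_variants is hypothetical, and the 20% value used for contrast is only a commonly quoted rough sensitivity limit for Sanger sequencing, not a figure from this abstract.

```python
def call_variants(counts, reference, threshold=0.05):
    """Return non-reference residues whose read fraction meets the threshold.

    counts maps each observed residue at one position to its read count;
    reference is the residue treated as wild type.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        residue: n / total
        for residue, n in counts.items()
        if residue != reference and n / total >= threshold
    }

# Invented read counts at a single codon: 8% of reads carry a variant.
position_counts = {"M": 920, "V": 80}

print(call_variants(position_counts, reference="M", threshold=0.05))
print(call_variants(position_counts, reference="M", threshold=0.20))
```

In this made-up case the variant is reported at the 5% threshold but not at the stricter one, which is the kind of difference that lets an NGS assay report mutations a less sensitive method misses.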
Lichenase and coding sequences

DOEpatents

Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

2000-08-15

The present invention provides a fungal lichenase, i.e., an endo-1,3-1,4-β-D-glucanohydrolase, its coding sequence, recombinant DNA molecules comprising the lichenase coding sequences, recombinant host cells and methods for producing same. The present lichenase is from Orpinomyces PC-2.
Sequences, Series, and Mathematica.

ERIC Educational Resources Information Center

Mathews, John H.

1992-01-01

Describes how the computer algebra system Mathematica can be used to enhance the teaching of the topics of sequences and series. Examines its capabilities to find exact, approximate, and graphically generated approximate solutions to problems from these topics and to understand proofs about sequences. (MDH)
Metagenomic ventures into outer sequence space.

PubMed

Dutilh, Bas E

Sequencing DNA or RNA directly from the environment often results in many sequencing reads that have no homologs in the database. These are referred to as "unknowns," and reflect the vast unexplored microbial sequence space of our biosphere, also known as "biological dark matter." However, unknowns also exist because metagenomic datasets are not optimally mined. There is a pressure on researchers to publish and move on, and the unknown sequences are often left for what they are, and conclusions drawn based on reads with annotated homologs. This can cause abundant and widespread genomes to be overlooked, such as the recently discovered human gut bacteriophage crAssphage. The unknowns may be enriched for bacteriophage sequences, the most abundant and genetically diverse component of the biosphere and of sequence space. However, it remains an open question, what is the actual size of biological sequence space? The de novo assembly of shotgun metagenomes is the most powerful tool to address this question.
[Target gene sequence capture and next generation sequencing technology to diagnose four children with Alagille syndrome].

PubMed

Gao, M L; Zhong, X M; Ma, X; Ning, H J; Zhu, D; Zou, J Z

2016-06-02

To make genetic diagnosis of Alagille syndrome (ALGS) patients using target gene sequence capture and next generation sequencing technology. Target gene sequence capture and next generation sequencing were used to analyze ALGS genes in 4 patients. They were hospitalized at the Affiliated Hospital, Capital Institute of Pediatrics between January 2014 and December 2015; 2 cases had a clinical diagnosis of typical ALGS and 2 of atypical ALGS. Blood samples were collected from patients and their parents and genomic DNA was extracted from lymphocytes. Target gene sequence capture and next generation sequencing were performed. Sanger sequencing was used to confirm the results in the patients and their parents. Cholestasis, heart defects, inverted triangular face and butterfly vertebrae were the main clinical features in the 4 male patients. Age at first hospital visit ranged from 3 months and 14 days to 3 years and 1 month. The age of onset ranged from 3 days to 42 days (median 23 days). According to the clinical diagnostic criteria of ALGS, patient 1 and patient 2 were considered as typical ALGS. The other 2 patients were considered as atypical ALGS. Four Jagged 1 (JAG1) pathogenic mutations were detected. Three different missense mutations were detected in patient 1 to patient 3 with ALGS (c.839C>T (p.W280X), c.703G>A (p.R235X), c.1720C>T (p.V574M)). The JAG1 mutation of patient 3 was first reported. Patient 4 had one novel insertion mutation (c.1779_1780insA (p.Ile594AsnfsTer23)). Parental analysis verified that the JAG1 missense mutations of 3 patients were de novo. The results of Sanger sequencing were consistent with the results of the next generation sequencing. Target gene sequence capture combined with next generation sequencing can detect two pathogenic genes in ALGS and test genes of other related diseases in infantile cholestatic diseases simultaneously and presents a high throughput, high efficiency and low cost. It may provide molecular
Archaebacterial rhodopsin sequences: Implications for evolution

NASA Technical Reports Server (NTRS)

Lanyi, J. K.

1991-01-01

It was proposed over 10 years ago that the archaebacteria represent a separate kingdom which diverged very early from the eubacteria and eukaryotes. It follows that investigations of archaebacterial characteristics might reveal features of early evolution. So far, two genes, one for bacteriorhodopsin and another for halorhodopsin, both from Halobacterium halobium, have been sequenced. We cloned and sequenced the gene coding for the polypeptide of another one of these rhodopsins, a halorhodopsin in Natronobacterium pharaonis. Peptide sequencing of cyanogen bromide fragments, and immuno-reactions of the protein and synthetic peptides derived from the C-terminal gene sequence, confirmed that the open reading frame was the structural gene for the pharaonis halorhodopsin polypeptide. The flanking DNA sequences of this gene, as well as those of other bacterial rhodopsins, were compared to previously proposed archaebacterial consensus sequences. In pairwise comparisons of the open reading frame with DNA sequences for bacterio-opsin and halo-opsin from Halobacterium halobium, silent divergences were calculated. These indicate very considerable evolutionary distance between each pair of genes, even in the same organism. In spite of this, three protein sequences show extensive similarities, indicating strong selective pressures.
Experimental investigation of an RNA sequence space

NASA Technical Reports Server (NTRS)

Lee, Youn-Hyung; Dsouza, Lisa; Fox, George E.

1993-01-01

Modern rRNAs are the historic consequence of an ongoing evolutionary exploration of a sequence space. These extant sequences belong to a special subset of the sequence space that is comprised only of those primary sequences that can validly perform the biological function(s) required of the particular RNA. If it were possible to readily identify all such valid sequences, stochastic predictions could be made about the relative likelihood of various evolutionary pathways available to an RNA. Herein an experimental system which can assess whether a particular sequence is likely to have validity as a eubacterial 5S rRNA is described. A total of ten naturally occurring, and hence known to be valid, sequences and two point mutants of unknown validity were used to test the usefulness of the approach. Nine of the ten valid sequences tested positive whereas both mutants tested as clearly defective. The tenth valid sequence gave results that would be interpreted as reflecting a borderline status were the answer not known. These results demonstrate that it is possible to experimentally determine which sequences in local regions of the sequence space are potentially valid 5S rRNAs.
Haemagglutinin and neuraminidase sequencing delineate nosocomial influenza outbreaks with accuracy equivalent to whole genome sequencing.

PubMed

Houghton, Rebecca; Ellis, Joanna; Galiano, Monica; Clark, Tristan W; Wyllie, Sarah

2017-04-01

We describe haemagglutinin (HA) and neuraminidase (NA) sequencing in an apparent cross-site influenza A(H1N1) outbreak in renal transplant and haemodialysis patients, confirmed with whole genome sequencing (WGS). Isolates were sequenced from influenza positive individuals. Phylogenetic trees were constructed using HA and NA sequencing and subsequently WGS. Sequence data was analysed to determine genetic relatedness of viruses obtained from inpatient and outpatient cohorts and compared with epidemiological outbreak information. There were 6 patient cases of influenza in the inpatient renal ward cohort (associated with 3 deaths) and 9 patient cases in the outpatient haemodialysis unit cohort (no deaths). WGS confirmed clustered transmission of two genetically different influenza A(H1N1)pdm09 strains initially identified by analysis of HA and NA genes. WGS took longer, and in this case was not required to determine whether or not the two seemingly linked outbreaks were related. Rapid sequencing of HA and NA genes may be sufficient to aid early influenza outbreak investigation, making it appealing for future outbreak investigation. However, as next generation sequencing becomes cheaper and more widely available and bioinformatics software is now freely accessible, next generation whole genome analysis may increasingly become a valuable tool for real-time influenza outbreak investigation. Crown Copyright © 2017. Published by Elsevier Ltd. All rights reserved.
Representations of mechanical assembly sequences

NASA Technical Reports Server (NTRS)

Homem De Mello, Luiz S.; Sanderson, Arthur C.

1991-01-01

Five types of representations for assembly sequences are reviewed: the directed graph of feasible assembly sequences, the AND/OR graph of feasible assembly sequences, the set of establishment conditions, and two types of sets of precedence relationships (precedence relationships between the establishment of one connection between parts and the establishment of another connection, and precedence relationships between the establishment of one connection and states of the assembly process). The mappings of one representation into the others are established. The correctness and completeness of these representations are established. The results presented are needed in the proof of correctness and completeness of algorithms for the generation of mechanical assembly sequences.
Biosensors for DNA sequence detection

NASA Technical Reports Server (NTRS)

Vercoutere, Wenonah; Akeson, Mark

2002-01-01

DNA biosensors are being developed as alternatives to conventional DNA microarrays. These devices couple signal transduction directly to sequence recognition. Some of the most sensitive and functional technologies use fibre optics or electrochemical sensors in combination with DNA hybridization. In a shift from sequence recognition by hybridization, two emerging single-molecule techniques read sequence composition using zero-mode waveguides or electrical impedance in nanoscale pores.
Analysis of Pteridium ribosomal RNA sequences by rapid direct sequencing.

PubMed

Tan, M K

1991-08-01

A total of 864 bases from 5 regions interspersed in the 18S and 26S rRNA molecules from various clones of Pteridium covering the general geographical distribution of the genus was analysed using a rapid rRNA sequencing technique. No base difference has been detected amongst the three major lineages, two of which apparently separated before the breakup of the ancient supercontinent, Pangaea. These regions of the rRNA sequences have thus been conserved for at least 160 million years and are here compared with other eukaryotic, especially plant rRNAs.
DNA sequence analysis of ARS elements from chromosome III of Saccharomyces cerevisiae: identification of a new conserved sequence.

PubMed Central

Palzkill, T G; Oliver, S G; Newlon, C S

1986-01-01

Four fragments of Saccharomyces cerevisiae chromosome III DNA which carry ARS elements have been sequenced. Each fragment contains multiple copies of sequences that have at least 10 out of 11 bases of homology to a previously reported 11 bp core consensus sequence. A survey of these new ARS sequences and previously reported sequences revealed the presence of an additional 11 bp conserved element located on the 3' side of the T-rich strand of the core consensus. Subcloning analysis as well as deletion and transposon insertion mutagenesis of ARS fragments support a role for 3' conserved sequence in promoting ARS activity. PMID:3529036

The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing.

PubMed

Binladen, Jonas; Gilbert, M Thomas P; Bollback, Jonathan P; Panitz, Frank; Bendixen, Christian; Nielsen, Rasmus; Willerslev, Eske

2007-02-14

The invention of the Genome Sequence 20 DNA Sequencing System (454 parallel sequencing platform) has enabled the rapid and high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR) reactions and subsequent sequencing runs have been unable to combine template DNA from multiple individuals, as homologous sequences cannot be subsequently assigned to their original sources. We use conventional PCR with 5'-nucleotide tagged primers to generate homologous DNA amplification products from multiple specimens, followed by sequencing through the high-throughput Genome Sequence 20 DNA Sequencing System (GS20, Roche/454 Life Sciences). Each DNA sequence is subsequently traced back to its individual source through 5' tag analysis. We demonstrate that this new approach enables the assignment of virtually all the generated DNA sequences to the correct source once sequencing anomalies are accounted for (misassignment rate <0.4%). Therefore, the method enables accurate sequencing and assignment of homologous DNA sequences from multiple sources in a single high-throughput GS20 run. We observe a bias in the distribution of the differently tagged primers that is dependent on the 5' nucleotide of the tag. In particular, primers 5' labelled with a cytosine are heavily overrepresented among the final sequences, while those 5' labelled with a thymine are strongly underrepresented. A weaker bias also exists with regard to the distribution of the sequences as sorted by the second nucleotide of the dinucleotide tags. As the results are based on a single GS20 run, the general applicability of the approach requires confirmation. However, our experiments demonstrate that 5' primer tagging is a useful method in which the sequencing power of the GS20 can be applied to PCR-based assays of multiple homologous PCR products. The new approach will be of value to a broad range of research areas, such as those of comparative genomics, complete mitochondrial analyses
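The tag-based assignment described in this record can be illustrated with a minimal sketch. The tag table, the primer sequence and the function name below are hypothetical and are not taken from the paper; the sketch only assumes that each read begins with a dinucleotide tag followed by a shared primer.

```python
from collections import defaultdict

# Hypothetical dinucleotide tags and specimen names (illustrative only).
TAG_TO_SAMPLE = {"AC": "specimen_1", "CA": "specimen_2", "TG": "specimen_3"}
# Assumed shared forward primer; any fixed primer sequence would do here.
PRIMER = "GGTCAACAAATCATAAAGATATTGG"


def demultiplex(reads, tag_to_sample=TAG_TO_SAMPLE, primer=PRIMER, tag_len=2):
    """Assign each read to a source specimen by its 5' tag.

    A read is assigned only if it starts with a known tag that is
    immediately followed by the primer; the tag and primer are stripped.
    Everything else is collected under "unassigned".
    """
    bins = defaultdict(list)
    for read in reads:
        tag, rest = read[:tag_len], read[tag_len:]
        if tag in tag_to_sample and rest.startswith(primer):
            bins[tag_to_sample[tag]].append(rest[len(primer):])
        else:
            bins["unassigned"].append(read)
    return dict(bins)
```

Requiring the primer directly after the tag is one simple way to reduce misassignment from reads that merely happen to start with a tag-like dinucleotide; it does not model the sequencing anomalies the abstract mentions.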
Palindromic Sequence Artifacts Generated during Next Generation Sequencing Library Preparation from Historic and Ancient DNA

PubMed Central

Star, Bastiaan; Nederbragt, Alexander J.; Hansen, Marianne H. S.; Skage, Morten; Gilfillan, Gregor D.; Bradbury, Ian R.; Pampoulie, Christophe; Stenseth, Nils Chr; Jakobsen, Kjetill S.; Jentoft, Sissel

2014-01-01

Degradation-specific processes and variation in laboratory protocols can bias the DNA sequence composition from samples of ancient or historic origin. Here, we identify a novel artifact in sequences from historic samples of Atlantic cod (Gadus morhua), which forms interrupted palindromes consisting of reverse complementary sequence at the 5′ and 3′-ends of sequencing reads. The palindromic sequences themselves have specific properties – the bases at the 5′-end align well to the reference genome, whereas extensive misalignments exist among the bases at the terminal 3′-end. The terminal 3′ bases are artificial extensions likely caused by the occurrence of hairpin loops in single stranded DNA (ssDNA), which can be ligated and amplified in particular library creation protocols. We propose that such hairpin loops allow the inclusion of erroneous nucleotides, specifically at the 3′-end of DNA strands, with the 5′-end of the same strand providing the template. We also find these palindromes in previously published ancient DNA (aDNA) datasets, albeit at varying and substantially lower frequencies. This artifact can negatively affect the yield of endogenous DNA in these types of samples and introduces sequence bias. PMID:24608104
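As a rough illustration of the artifact described in this record, the sketch below checks whether the 3′ end of a read is the reverse complement of its own 5′ end. The function names and the minimum length of 4 bases are assumptions made for the example; this is not the detection procedure used by the authors.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def terminal_palindrome_length(read, min_len=4):
    """Length of the longest 3' terminal stretch that is the reverse
    complement of the 5' start of the same read.

    Returns 0 if no stretch of at least `min_len` bases qualifies.
    The two arms may be separated by any interior sequence, which is
    what makes the palindrome "interrupted".
    """
    for k in range(len(read) // 2, min_len - 1, -1):
        if read[-k:] == reverse_complement(read[:k]):
            return k
    return 0
```

A read flagged this way could have its 3′ arm clipped before alignment, since the abstract reports that the 5′ bases align well while the 3′ extension does not.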
SOBA: sequence ontology bioinformatics analysis.

PubMed

Moore, Barry; Fan, Guozhen; Eilbeck, Karen

2010-07-01

The advent of cheaper, faster sequencing technologies has pushed the task of sequence annotation from the exclusive domain of large-scale multi-national sequencing projects to that of research laboratories and small consortia. The bioinformatics burden placed on these laboratories, some with very little programming experience, can be daunting. Fortunately, there exist software libraries and pipelines designed with these groups in mind, to ease the transition from an assembled genome to an annotated and accessible genome resource. We have developed the Sequence Ontology Bioinformatics Analysis (SOBA) tool to provide a simple statistical and graphical summary of an annotated genome. We envisage its use during annotation jamborees, genome comparison and for use by developers for rapid feedback during annotation software development and testing. SOBA also provides annotation consistency feedback to ensure correct use of terminology within annotations, and guides users to add new terms to the Sequence Ontology when required. SOBA is available at http://www.sequenceontology.org/cgi-bin/soba.cgi.
A measurement of disorder in binary sequences

NASA Astrophysics Data System (ADS)

Gong, Longyan; Wang, Haihong; Cheng, Weiwen; Zhao, Shengmei

2015-03-01

We propose a complex quantity, A_L, to characterize the degree of disorder of L-length binary symbolic sequences. As examples, we apply it to typical random and deterministic sequences. One kind of random sequence is generated from a periodic binary sequence and the other is generated from the logistic map. The deterministic sequences are the Fibonacci and Thue-Morse sequences. In these analyzed sequences, we find that the modulus of A_L, denoted by |A_L|, is a (statistically) equivalent quantity to the Boltzmann entropy, the metric entropy, the conditional block entropy and/or other quantities, so it is a useful quantitative measure of disorder. It can serve as a fruitful index to discern which sequence is more disordered. Moreover, there is one and only one value of |A_L| for the overall disorder characteristics. It requires extremely low computational costs and can be easily realized experimentally. From all of the above, we believe that the proposed measure of disorder is a valuable complement to existing ones in symbolic sequences.
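The abstract does not define the proposed quantity, so it cannot be reproduced here. As a hedged illustration of the comparison it describes, the sketch below generates the two deterministic sequences named above and computes an ordinary Shannon block entropy, one of the conventional measures the record mentions. The function names and the default block length of 3 are assumptions made for the example.

```python
import math
from collections import Counter


def thue_morse(length):
    """First `length` terms of the Thue-Morse sequence (parity of set bits)."""
    return [bin(i).count("1") % 2 for i in range(length)]


def fibonacci_word(length):
    """First `length` symbols of the Fibonacci word (0 -> 01, 1 -> 0)."""
    word = [0]
    while len(word) < length:
        word = [s for c in word for s in ((0, 1) if c == 0 else (0,))]
    return word[:length]


def block_entropy(seq, block=3):
    """Shannon entropy (in bits) of overlapping blocks of length `block`."""
    counts = Counter(tuple(seq[i:i + block]) for i in range(len(seq) - block + 1))
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())
```

Comparing `block_entropy(fibonacci_word(1000))` with `block_entropy(thue_morse(1000))` gives a conventional baseline against which a new disorder index could be checked.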
Prefrontal neural correlates of memory for sequences.

PubMed

Averbeck, Bruno B; Lee, Daeyeol

2007-02-28

The sequence of actions appropriate to solve a problem often needs to be discovered by trial and error and recalled in the future when faced with the same problem. Here, we show that when monkeys had to discover and then remember a sequence of decisions across trials, ensembles of prefrontal cortex neurons reflected the sequence of decisions the animal would make throughout the interval between trials. This signal could reflect either an explicit memory process or a sequence-planning process that begins far in advance of the actual sequence execution. This finding extended to error trials such that, when the neural activity during the intertrial interval specified the wrong sequence, the animal also attempted to execute an incorrect sequence. More specifically, we used a decoding analysis to predict the sequence the monkey was planning to execute at the end of the fore-period, just before sequence execution. When this analysis was applied to error trials, we were able to predict where in the sequence the error would occur, up to three movements into the future. This suggests that prefrontal neural activity can retain information about sequences between trials, and that regardless of whether information is remembered correctly or incorrectly, the prefrontal activity veridically reflects the animal's action plan.
Cosmetology: Scope and Sequence.

ERIC Educational Resources Information Center

Nashville - Davidson County Metropolitan Public Schools, TN.

This scope and sequence guide, developed for a cosmetology vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and the…
Microfluidic droplet enrichment for targeted sequencing

PubMed Central

Eastburn, Dennis J.; Huang, Yong; Pellegrino, Maurizio; Sciambi, Adam; Ptáček, Louis J.; Abate, Adam R.

2015-01-01

Targeted sequence enrichment enables better identification of genetic variation by providing increased sequencing coverage for genomic regions of interest. Here, we report the development of a new target enrichment technology that is highly differentiated from other approaches currently in use. Our method, MESA (Microfluidic droplet Enrichment for Sequence Analysis), isolates genomic DNA fragments in microfluidic droplets and performs TaqMan PCR reactions to identify droplets containing a desired target sequence. The TaqMan positive droplets are subsequently recovered via dielectrophoretic sorting, and the TaqMan amplicons are removed enzymatically prior to sequencing. We demonstrated the utility of this approach by generating an average 31.6-fold sequence enrichment across 250 kb of targeted genomic DNA from five unique genomic loci. Significantly, this enrichment enabled a more comprehensive identification of genetic polymorphisms within the targeted loci. MESA requires low amounts of input DNA, minimal prior locus sequence information and enriches the target region without PCR bias or artifacts. These features make it well suited for the study of genetic variation in a number of research and diagnostic applications. PMID:25873629
Winnowing DNA for Rare Sequences: Highly Specific Sequence and Methylation Based Enrichment

PubMed Central

Thompson, Jason D.; Shibahara, Gosuke; Rajan, Sweta; Pel, Joel; Marziali, Andre

2012-01-01

Rare mutations in cell populations are known to be hallmarks of many diseases and cancers. Similarly, differential DNA methylation patterns arise in rare cell populations with diagnostic potential such as fetal cells circulating in maternal blood. Unfortunately, the frequency of alleles with diagnostic potential, relative to wild-type background sequence, is often well below the frequency of errors in currently available methods for sequence analysis, including very high throughput DNA sequencing. We demonstrate a DNA preparation and purification method that through non-linear electrophoretic separation in media containing oligonucleotide probes, achieves 10,000 fold enrichment of target DNA with single nucleotide specificity, and 100 fold enrichment of unmodified methylated DNA differing from the background by the methylation of a single cytosine residue. PMID:22355378
Orthogonal Polynomials Associated with Complementary Chain Sequences

NASA Astrophysics Data System (ADS)

Behera, Kiran Kumar; Sri Ranga, A.; Swaminathan, A.

2016-07-01

Using the minimal parameter sequence of a given chain sequence, we introduce the concept of complementary chain sequences, which we view as perturbations of chain sequences. Using the relation between these complementary chain sequences and the corresponding Verblunsky coefficients, the para-orthogonal polynomials and the associated Szegő polynomials are analyzed. Two illustrations, one involving Gaussian hypergeometric functions and the other involving Carathéodory functions are also provided. A connection between these two illustrations by means of complementary chain sequences is also observed.
Quantum sequencing: opportunities and challenges

NASA Astrophysics Data System (ADS)

di Ventra, Massimiliano

Personalized or precision medicine refers to the ability of tailoring drugs to the specific genome and transcriptome of each individual. It is, however, not yet feasible due to the high costs and slow speed of present DNA sequencing methods. I will discuss a sequencing protocol that requires the measurement of the distributions of transverse tunneling currents during the translocation of single-stranded DNA into nanochannels. I will show that such a quantum sequencing approach can reach unprecedented speeds, without requiring any chemical preparation, amplification or labeling. I will discuss recent experiments that support these theoretical predictions, the advantages of this approach over other sequencing methods, and stress the challenges that need to be overcome to render it commercially viable.
Integer sequence discovery from small graphs

PubMed Central

Hoppe, Travis; Petrone, Anna

2015-01-01

We have exhaustively enumerated all simple, connected graphs of a finite order and have computed a selection of invariants over this set. Integer sequences were constructed from these invariants and checked against the Online Encyclopedia of Integer Sequences (OEIS). 141 new sequences were added and six sequences were extended. From the graph database, we were able to programmatically suggest relationships among the invariants. It will be shown that we can readily visualize any sequence of graphs with given criteria. The code has been released as an open-source framework for further analysis and the database was constructed to be extensible to invariants not considered in this work. PMID:27034526
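The idea of turning an exhaustive enumeration into an integer sequence can be sketched by brute force for very small orders. This is not the authors' released framework; the function names are hypothetical, and the canonical form used here (minimum edge list over all vertex relabelings) is only practical for a handful of vertices.

```python
from itertools import combinations, permutations


def is_connected(n, edges):
    """Depth-first search from vertex 0 over an undirected edge list."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


def canonical_form(n, edges):
    """Smallest sorted edge tuple over all relabelings of the vertices."""
    best = None
    for perm in permutations(range(n)):
        relabeled = tuple(sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges))
        if best is None or relabeled < best:
            best = relabeled
    return best


def count_connected_graphs(n):
    """Number of simple connected graphs on n vertices, up to isomorphism."""
    possible = list(combinations(range(n), 2))
    classes = set()
    for k in range(len(possible) + 1):
        for edges in combinations(possible, k):
            if is_connected(n, edges):
                classes.add(canonical_form(n, edges))
    return len(classes)
```

Evaluating `count_connected_graphs(n)` for n = 1 to 5 yields 1, 1, 2, 6, 21, the start of the known sequence of connected graph counts; any other invariant computed per graph could be tabulated by order in the same way and looked up in the OEIS.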
Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform.

PubMed

Schirmer, Melanie; Ijaz, Umer Z; D'Amore, Rosalinda; Hall, Neil; Sloan, William T; Quince, Christopher

2015-03-31

With read lengths of currently up to 2 × 300 bp, high throughput and low sequencing costs, Illumina's MiSeq is becoming one of the most utilized sequencing platforms worldwide. The platform is manageable and affordable even for smaller labs. This enables quick turnaround on a broad range of applications such as targeted gene sequencing, metagenomics, small genome sequencing and clinical molecular diagnostics. However, Illumina error profiles are still poorly understood and programs are therefore not designed for the idiosyncrasies of Illumina data. A better knowledge of the error patterns is essential for sequence analysis and vital if we are to draw valid conclusions. Studying true genetic variation in a population sample is fundamental for understanding diseases, evolution and origin. We conducted a large study on the error patterns for the MiSeq based on 16S rRNA amplicon sequencing data. We tested state-of-the-art library preparation methods for amplicon sequencing and showed that the library preparation method and the choice of primers are the most significant sources of bias and cause distinct error patterns. Furthermore, we tested the efficiency of various error correction strategies and identified quality trimming (Sickle) combined with error correction (BayesHammer) followed by read overlapping (PANDAseq) as the most successful approach, reducing substitution error rates on average by 93%. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
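To make the quality trimming step concrete, the following is a minimal sliding-window trimmer. It is a generic sketch and should not be read as Sickle's actual algorithm or defaults; the function names, the Phred+33 encoding, the threshold of 20 and the window of 5 bases are all assumptions chosen for illustration.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(c) - offset for c in quality_string]


def trim_3prime(seq, quality_string, threshold=20, window=5):
    """Cut a read at the start of the first window whose mean quality
    falls below `threshold`; return the read unchanged if none does.

    Reads shorter than the window are returned unchanged in this sketch.
    """
    quals = phred_scores(quality_string)
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < threshold:
            return seq[:start], quality_string[:start]
    return seq, quality_string
```

In a pipeline like the one the abstract describes, a step of this kind would run before error correction and read overlapping, so that low-quality 3′ tails do not propagate into the merged reads.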
A complete Neandertal mitochondrial genome sequence determined by high-throughput sequencing

PubMed Central

Green, Richard E.; Malaspinas, Anna-Sapfo; Krause, Johannes; Briggs, Adrian W.; Johnson, Philip L. F.; Uhler, Caroline; Meyer, Matthias; Good, Jeffrey M.; Maricic, Tomislav; Stenzel, Udo; Prüfer, Kay; Siebauer, Michael; Burbano, Hernán A.; Ronan, Michael; Rothberg, Jonathan M.; Egholm, Michael; Rudan, Pavao; Brajković, Dejana; Kućan, Željko; Gušić, Ivan; Wikström, Mårten; Laakkonen, Liisa; Kelso, Janet; Slatkin, Montgomery; Pääbo, Svante

2008-01-01

Summary A complete mitochondrial (mt) genome sequence was reconstructed from a 38,000-year-old Neandertal individual using 8,341 mtDNA sequences identified among 4.8 Gb of DNA generated from ~0.3 grams of bone. Analysis of the assembled sequence unequivocally establishes that the Neandertal mtDNA falls outside the variation of extant human mtDNAs and allows an estimate of the divergence date between the two mtDNA lineages of 660,000±140,000 years. Of the 13 proteins encoded in the mtDNA, subunit 2 of cytochrome c oxidase of the mitochondrial electron transport chain has experienced the largest number of amino acid substitutions in human ancestors since the separation from Neandertals. There is evidence that purifying selection in the Neandertal mtDNA was reduced compared to other primate lineages suggesting that the effective population size of Neandertals was small. PMID:18692465
Graphene Nanopores for Protein Sequencing.

PubMed

Wilson, James; Sloman, Leila; He, Zhiren; Aksimentiev, Aleksei

2016-07-19

An inexpensive, reliable method for protein sequencing is essential to unraveling the biological mechanisms governing cellular behavior and disease. Current protein sequencing methods suffer from limitations associated with the size of proteins that can be sequenced, the time, and the cost of the sequencing procedures. Here, we report the results of all-atom molecular dynamics simulations that investigated the feasibility of using graphene nanopores for protein sequencing. We focus our study on the biologically significant phenylalanine-glycine repeat peptides (FG-nups), which are parts of the nuclear pore transport machinery. Surprisingly, we found FG-nups to behave similarly to single stranded DNA: the peptides adhere to graphene and exhibit step-wise translocation when subject to a transmembrane bias or a hydrostatic pressure gradient. Reducing the peptide's charge density or increasing the peptide's hydrophobicity was found to decrease the translocation speed. Yet, unidirectional and stepwise translocation driven by a transmembrane bias was observed even when the ratio of charged to hydrophobic amino acids was as low as 1:8. The nanopore transport of the peptides was found to produce stepwise modulations of the nanopore ionic current correlated with the type of amino acids present in the nanopore, suggesting that protein sequencing by measuring ionic current blockades may be possible.
Sequences for Student Investigation

ERIC Educational Resources Information Center

Barton, Jeffrey; Feil, David; Lartigue, David; Mullins, Bernadette

2004-01-01

We describe two classes of sequences that give rise to accessible problems for undergraduate research. These problems may be understood with virtually no prerequisites and are well suited for computer-aided investigation. The first sequence is a variation of one introduced by Stephen Wolfram in connection with his study of cellular automata. The…
Agriculture: Scope and Sequence.

ERIC Educational Resources Information Center

Nashville - Davidson County Metropolitan Public Schools, TN.

This guide, which was written as an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System, outlines the suggested scope and sequence of a 3-year program in agriculture. The guide consists of a course description; general course objectives;…
First complete genome sequence of infectious laryngotracheitis virus

PubMed Central

2011-01-01

Background Infectious laryngotracheitis virus (ILTV) is an alphaherpesvirus that causes acute respiratory disease in chickens worldwide. To date, only one complete genomic sequence of ILTV has been reported. This sequence was generated by concatenating partial sequences from six different ILTV strains. Thus, the full genomic sequence of a single (individual) strain of ILTV has not been determined previously. This study aimed to use high throughput sequencing technology to determine the complete genomic sequence of a live attenuated vaccine strain of ILTV. Results The complete genomic sequence of the Serva vaccine strain of ILTV was determined, annotated and compared to the concatenated ILTV reference sequence. The genome size of the Serva strain was 152,628 bp, with a G + C content of 48%. A total of 80 predicted open reading frames were identified. The Serva strain had 96.5% DNA sequence identity with the concatenated ILTV sequence. Notably, the concatenated ILTV sequence was found to lack four large regions of sequence, including 528 bp and 594 bp of sequence in the UL29 and UL36 genes, respectively, and two copies of a 1,563 bp sequence in the repeat regions. Considerable differences in the size of the predicted translation products of 4 other genes (UL54, UL30, UL37 and UL38) were also identified. More than 530 single-nucleotide polymorphisms (SNPs) were identified. Most SNPs were located within three genomic regions, corresponding to sequence from the SA-2 ILTV vaccine strain in the concatenated ILTV sequence. Conclusions This is the first complete genomic sequence of an individual ILTV strain. This sequence will facilitate future comparative genomic studies of ILTV by providing an appropriate reference sequence for the sequence analysis of other ILTV strains. PMID:21501528
DNA Sequencing Using Capillary Electrophoresis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dr. Barry Karger

2011-05-09

The overall goal of this program was to develop capillary electrophoresis as the tool to be used to sequence for the first time the Human Genome. Our program was part of the Human Genome Project. In this work, we were highly successful and the replaceable polymer we developed, linear polyacrylamide, was used by the DOE sequencing lab in California to sequence a significant portion of the human genome using the MegaBase multiple capillary array electrophoresis instrument. In this final report, we summarize our efforts and success. We began our work by separating by capillary electrophoresis double strand oligonucleotides using cross-linked polyacrylamide gels in fused silica capillaries. This work showed the potential of the methodology. However, preparation of such cross-linked gel capillaries was difficult with poor reproducibility, and even more important, the columns were not very stable. We improved stability by using non-cross linked linear polyacrylamide. Here, the entangled linear chains could move when osmotic pressure (e.g. sample injection) was imposed on the polymer matrix. This relaxation of the polymer dissipated the stress in the column. Our next advance was to use significantly lower concentrations of the linear polyacrylamide so that the polymer could be automatically blown out after each run and replaced with fresh linear polymer solution. In this way, a new column was available for each analytical run. Finally, while testing many linear polymers, we selected linear polyacrylamide as the best matrix as it was the most hydrophilic polymer available. Under our DOE program, we demonstrated initially the success of the linear polyacrylamide to separate double strand DNA. We note that the method is used even today to assay purity of double stranded DNA fragments. Our focus, of course, was on the separation of single stranded DNA for sequencing purposes. In one paper, we demonstrated the success of our approach in sequencing up to 500 bases. Other
Object tracking using plenoptic image sequences

NASA Astrophysics Data System (ADS)

Kim, Jae Woo; Bae, Seong-Joon; Park, Seongjin; Kim, Do Hyung

2017-05-01

Object tracking is a very important problem in computer vision research. Among the difficulties of object tracking, partial occlusion is one of the most serious and challenging. To address the problem, we proposed novel approaches to object tracking on plenoptic image sequences. Our approaches take advantage of the refocusing capability that plenoptic images provide. They take as input the sequences of focal stacks constructed from plenoptic image sequences. The proposed image selection algorithms select, from the sequence of focal stacks, the sequence of optimal images that maximizes the tracking accuracy. A focus measure approach and a confidence measure approach were proposed for image selection, and both were validated by experiments using thirteen plenoptic image sequences that include heavily occluded target objects. The experimental results showed that the proposed approaches performed satisfactorily compared to conventional 2D object tracking algorithms.
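The abstract does not specify which focus measure was used. As one common choice, the sketch below scores each image of a focal stack by the variance of its Laplacian response and picks the sharpest one. The function names and the plain nested-list image representation are assumptions made to keep the example dependency-free.

```python
def laplacian_variance(image):
    """Variance of the 4-neighbour Laplacian over the interior pixels.

    `image` is a list of equal-length rows of numbers, at least 3x3.
    Higher values indicate more high-frequency detail, i.e. sharper focus.
    """
    h, w = len(image), len(image[0])
    responses = [
        image[y - 1][x] + image[y + 1][x] + image[y][x - 1] + image[y][x + 1]
        - 4 * image[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def select_sharpest(focal_stack):
    """Index of the image in the focal stack with the highest focus score."""
    return max(range(len(focal_stack)), key=lambda i: laplacian_variance(focal_stack[i]))
```

For tracking, the score would normally be computed only inside the current target window of each refocused image, so that the selected focal plane is the one on which the target, rather than an occluder, is sharp.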

Construction and characterization of an in-vivo linear covalently closed DNA vector production system.

PubMed

Nafissi, Nafiseh; Slavcev, Roderick

2012-12-06

While safer than their viral counterparts, conventional non-viral gene delivery DNA vectors offer a limited safety profile. They often result in the delivery of unwanted prokaryotic sequences, antibiotic resistance genes, and the bacterial origins of replication to the target, which may lead to the stimulation of unwanted immunological responses due to their chimeric DNA composition. Such vectors may also impart the potential for chromosomal integration, thus potentiating oncogenesis. We sought to engineer an in vivo system for the quick and simple production of safer DNA vector alternatives that were devoid of non-transgene bacterial sequences and would lethally disrupt the host chromosome in the event of an unwanted vector integration event. We constructed a parent eukaryotic expression vector possessing a specialized manufactured multi-target site called "Super Sequence", and engineered E. coli cells (R-cell) that conditionally produce phage-derived recombinase Tel (PY54), TelN (N15), or Cre (P1). Passage of the parent plasmid vector through R-cells under optimized conditions resulted in rapid, efficient, and one-step in vivo generation of mini lcc (linear covalently closed; Tel/TelN-cell) or mini ccc (circular covalently closed; Cre-cell) DNA constructs, separated from the backbone plasmid DNA. Site-specific integration of lcc plasmids into the host chromosome resulted in chromosomal disruption and 10^5-fold lower viability than that seen with the ccc counterpart. We offer a high efficiency mini DNA vector production system that confers simple, rapid and scalable in vivo production of mini lcc DNA vectors that possess all the benefits of "minicircle" DNA vectors and virtually eliminate the potential for undesirable vector integration events.
Reading biological processes from nucleotide sequences

NASA Astrophysics Data System (ADS)

Murugan, Anand

Cellular processes have traditionally been investigated by techniques of imaging and biochemical analysis of the molecules involved. The recent rapid progress in our ability to manipulate and read nucleic acid sequences gives us direct access to the genetic information that directs and constrains biological processes. While sequence data is being used widely to investigate genotype-phenotype relationships and population structure, here we use sequencing to understand biophysical mechanisms. We present work on two different systems. First, in chapter 2, we characterize the stochastic genetic editing mechanism that produces diverse T-cell receptors in the human immune system. We do this by inferring statistical distributions of the underlying biochemical events that generate T-cell receptor coding sequences from the statistics of the observed sequences. This inferred model quantitatively describes the potential repertoire of T-cell receptors that can be produced by an individual, providing insight into its potential diversity and the probability of generation of any specific T-cell receptor. Then in chapter 3, we present work on understanding the functioning of regulatory DNA sequences in both prokaryotes and eukaryotes. Here we use experiments that measure the transcriptional activity of large libraries of mutagenized promoters and enhancers and infer models of the sequence-function relationship from this data. For the bacterial promoter, we infer a physically motivated 'thermodynamic' model of the interaction of DNA-binding proteins and RNA polymerase determining the transcription rate of the downstream gene. For the eukaryotic enhancers, we infer heuristic models of the sequence-function relationship and use these models to find synthetic enhancer sequences that optimize inducibility of expression. Both projects demonstrate the utility of sequence information in conjunction with sophisticated statistical inference techniques for dissecting underlying biophysical
Applications of Single-Cell Sequencing for Multiomics.

PubMed

Xu, Yungang; Zhou, Xiaobo

2018-01-01

Single-cell sequencing interrogates the sequence or chromatin information from individual cells with advanced next-generation sequencing technologies. It provides a higher resolution of cellular differences and a better understanding of the underlying genetic and epigenetic mechanisms of an individual cell in the context of its survival and adaptation to microenvironment. However, it is more challenging to perform single-cell sequencing and downstream data analysis, owing to the minimal amount of starting materials, sample loss, and contamination. In addition, due to the picogram level of the amount of nucleic acids used, heavy amplification is often needed during sample preparation of single-cell sequencing, resulting in the uneven coverage, noise, and inaccurate quantification of sequencing data. All these unique properties raise challenges in and thus high demands for computational methods that specifically fit single-cell sequencing data. We here comprehensively survey the current strategies and challenges for multiple single-cell sequencing, including single-cell transcriptome, genome, and epigenome, beginning with a brief introduction to multiple sequencing techniques for single cells.
SeqLib: a C++ API for rapid BAM manipulation, sequence alignment and sequence assembly

PubMed Central

Wala, Jeremiah; Beroukhim, Rameen

2017-01-01

We present SeqLib, a C++ API and command line tool that provides a rapid and user-friendly interface to BAM/SAM/CRAM files, global sequence alignment operations and sequence assembly. Four C libraries perform core operations in SeqLib: HTSlib for BAM access, BWA-MEM and BLAT for sequence alignment and Fermi for error correction and sequence assembly. Benchmarking indicates that SeqLib has lower CPU and memory requirements than leading C++ sequence analysis APIs. We demonstrate an example of how minimal SeqLib code can extract, error-correct and assemble reads from a CRAM file and then align with BWA-MEM. SeqLib also provides additional capabilities, including chromosome-aware interval queries and read plotting. Command line tools are available for performing integrated error correction, micro-assemblies and alignment. Availability and Implementation: SeqLib is available on Linux and OSX for the C++98 standard and later at github.com/walaj/SeqLib. SeqLib is released under the Apache2 license. Additional capabilities for BLAT alignment are available under the BLAT license. Contact: jwala@broadinstitue.org; rameen@broadinstitute.org PMID:28011768
Single-Cell Semiconductor Sequencing

PubMed Central

Kohn, Andrea B.; Moroz, Tatiana P.; Barnes, Jeffrey P.; Netherton, Mandy; Moroz, Leonid L.

2014-01-01

RNA-seq or transcriptome analysis of individual cells and small-cell populations is essential for virtually any biomedical field. It is especially critical for developmental, aging, and cancer biology as well as neuroscience, where the enormous heterogeneity of cells presents a significant methodological and conceptual challenge. Here we present two methods that allow for fast and cost-efficient transcriptome sequencing from ultra-small amounts of tissue or even from individual cells using semiconductor sequencing technology (Ion Torrent, Life Technologies). The first method is reduced representation sequencing, which maximizes capture of RNAs and preserves transcripts’ directionality. The second, a template-switch protocol, is designed for small mammalian neurons. Both protocols, from cell/tissue isolation to final sequence data, take up to 4 days. The efficiency of these protocols has been validated with single hippocampal neurons and various invertebrate tissues including individually identified neurons within a simpler memory-forming circuit of Aplysia californica and early (1-, 2-, 4-, 8-cells) embryonic and developmental stages from basal metazoans. PMID:23929110
Spaces of ideal convergent sequences.

PubMed

Mursaleen, M; Sharma, Sunil K

2014-01-01

In the present paper, we introduce some sequence spaces using ideal convergence and Musielak-Orlicz function ℳ = (M(k)). We also examine some topological properties of the resulting sequence spaces.
Automated Sequence Generation Process and Software

NASA Technical Reports Server (NTRS)

Gladden, Roy

2007-01-01

"Automated sequence generation" (autogen) signifies both a process and software used to automatically generate sequences of commands to operate various spacecraft. The autogen software comprises the autogen script plus the Activity Plan Generator (APGEN) program. APGEN can be used for planning missions and command sequences.
Optical Processing Techniques For Pseudorandom Sequence Prediction

NASA Astrophysics Data System (ADS)

Gustafson, Steven C.

1983-11-01

Pseudorandom sequences are series of apparently random numbers generated, for example, by linear or nonlinear feedback shift registers. An important application of these sequences is in spread spectrum communication systems, in which, for example, the transmitted carrier phase is digitally modulated rapidly and pseudorandomly and in which the information to be transmitted is incorporated as a slow modulation in the pseudorandom sequence. In this case the transmitted information can be extracted only by a receiver that uses for demodulation the same pseudorandom sequence used by the transmitter, and thus this type of communication system has a very high immunity to third-party interference. However, if a third party can predict in real time the probable future course of the transmitted pseudorandom sequence given past samples of this sequence, then interference immunity can be significantly reduced. In this application effective pseudorandom sequence prediction techniques should be (1) applicable in real time to rapid (e.g., megahertz) sequence generation rates, (2) applicable to both linear and nonlinear pseudorandom sequence generation processes, and (3) applicable to error-prone past sequence samples of limited number and continuity. Certain optical processing techniques that may meet these requirements are discussed in this paper. In particular, techniques based on incoherent optical processors that perform general linear transforms or (more specifically) matrix-vector multiplications are considered. Computer simulation examples are presented which indicate that significant prediction accuracy can be obtained using these transforms for simple pseudorandom sequences. However, the useful prediction of more complex pseudorandom sequences will probably require the application of more sophisticated optical processing techniques.
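
To make the notion of a shift-register pseudorandom sequence concrete, the following is a minimal Python sketch of a Fibonacci linear feedback shift register. It is not taken from the paper and does not model the optical prediction techniques; the 4-bit register width, the tap positions (4, 3), and the function names are illustrative assumptions.

```python
def lfsr_step(state, taps, nbits):
    """Advance a Fibonacci LFSR by one step; return (new_state, output_bit).

    Assumption: taps are 1-based positions and must include `nbits`,
    which keeps the state update invertible.
    """
    out = state & 1                       # output is the least significant bit
    fb = 0
    for t in taps:                        # tap t reads bit (nbits - t) from the LSB
        fb ^= (state >> (nbits - t)) & 1
    state = (state >> 1) | (fb << (nbits - 1))
    return state, out


def lfsr_bits(seed, taps, nbits, count):
    """Return the first `count` output bits for a nonzero seed."""
    bits, state = [], seed
    for _ in range(count):
        state, out = lfsr_step(state, taps, nbits)
        bits.append(out)
    return bits


def lfsr_period(seed, taps, nbits):
    """Count the steps until the register returns to its seed state."""
    state, steps = seed, 0
    while True:
        state, _ = lfsr_step(state, taps, nbits)
        steps += 1
        if state == seed:
            return steps


if __name__ == "__main__":
    print(lfsr_bits(0b0001, (4, 3), 4, 15))
    print(lfsr_period(0b0001, (4, 3), 4))
```

With these illustrative taps the register cycles through all 15 nonzero states before repeating. Short linear recurrences like this one are the "simple" case; the abstract's point is that predicting more complex generators is expected to be harder.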
Sequence invariant state machines

NASA Technical Reports Server (NTRS)

Whitaker, S.; Manjunath, S.

1990-01-01

A synthesis method and new VLSI architecture are introduced to realize sequential circuits that have the ability to implement any state machine having N states and m inputs, regardless of the actual sequence specified in the flow table. A design method is proposed that utilizes BTS logic to implement regular and dense circuits. A given state sequence can be programmed with power supply connections or dynamically reallocated if stored in a register. Arbitrary flow table sequences can be modified or programmed to dynamically alter the function of the machine. This allows VLSI controllers to be designed with the programmability of a general purpose processor but with the compact size and performance of dedicated logic.
Sequence-invariant state machines

NASA Technical Reports Server (NTRS)

Whitaker, Sterling R.; Manjunath, Shamanna K.; Maki, Gary K.

1991-01-01

A synthesis method and an MOS VLSI architecture are presented to realize sequential circuits that have the ability to implement any state machine having N states and m inputs, regardless of the actual sequence specified in the flow table. The design method utilizes binary tree structured (BTS) logic to implement regular and dense circuits. The desired state sequence can be hardwired with power supply connections or can be dynamically reallocated if stored in a register. This allows programmable VLSI controllers to be designed with a compact size and performance approaching that of dedicated logic. Results of ICV implementations are reported and an example sequence-invariant state machine is contrasted with implementations based on traditional methods.
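
As a loose software analogy only (the paper describes a hardware architecture, and nothing below reflects its BTS circuit design), a table-driven state machine shows what it means for the machine structure to stay fixed while the flow table is treated as programmable data. The function name and the example tables are hypothetical.

```python
def run_machine(flow_table, start, inputs):
    """Run a state machine whose behavior is defined entirely by `flow_table`.

    flow_table[state][symbol] gives the next state; the interpreter itself
    never changes, only the table does.
    """
    state = start
    trace = [state]
    for symbol in inputs:
        state = flow_table[state][symbol]
        trace.append(state)
    return trace


# Hypothetical flow table: a modulo-3 counter that advances on input 1.
mod3_counter = {
    0: {0: 0, 1: 1},
    1: {0: 1, 1: 2},
    2: {0: 2, 1: 0},
}

# "Reprogramming" means swapping the table: same states, reversed direction.
mod3_reverse = {
    0: {0: 0, 1: 2},
    1: {0: 1, 1: 0},
    2: {0: 2, 1: 1},
}

if __name__ == "__main__":
    print(run_machine(mod3_counter, 0, [1, 1, 0, 1]))
    print(run_machine(mod3_reverse, 0, [1, 1, 0, 1]))
```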
Molecular beacon sequence design algorithm.

PubMed

Monroe, W Todd; Haselton, Frederick R

2003-01-01

A method based on Web-based tools is presented to design optimally functioning molecular beacons. Molecular beacons, fluorogenic hybridization probes, are a powerful tool for the rapid and specific detection of a particular nucleic acid sequence. However, their synthesis costs can be considerable. Since molecular beacon performance is based on its sequence, it is imperative to rationally design an optimal sequence before synthesis. The algorithm presented here uses simple Microsoft Excel formulas and macros to rank candidate sequences. This analysis is carried out using mfold structural predictions along with other free Web-based tools. For smaller laboratories where molecular beacons are not the focus of research, the public domain algorithm described here may be usefully employed to aid in molecular beacon design.
Performance evaluation of Sanger sequencing for the diagnosis of primary hyperoxaluria and comparison with targeted next generation sequencing

PubMed Central

Williams, Emma L; Bagg, Eleanor A L; Mueller, Michael; Vandrovcova, Jana; Aitman, Timothy J; Rumsby, Gill

2015-01-01

Definitive diagnosis of primary hyperoxaluria (PH) currently utilizes sequential Sanger sequencing of the AGXT, GRHPR, and HOGA1 genes but efficacy is unproven. This analysis is time-consuming, relatively expensive, and delays in diagnosis and inappropriate treatment can occur if not pursued early in the diagnostic work-up. We reviewed testing outcomes of Sanger sequencing in 200 consecutive patient samples referred for analysis. In addition, the Illumina Truseq custom amplicon system was evaluated for paralleled next-generation sequencing (NGS) of AGXT, GRHPR, and HOGA1 in 90 known PH patients. AGXT sequencing was requested in all patients, permitting a diagnosis of PH1 in 50%. All remaining patients underwent targeted exon sequencing of GRHPR and HOGA1 with 8% diagnosed with PH2 and 8% with PH3. Complete sequencing of both GRHPR and HOGA1 was not requested in 25% of patients referred leaving their diagnosis in doubt. NGS analysis showed 98% agreement with Sanger sequencing and both approaches had 100% diagnostic specificity. Diagnostic sensitivity of Sanger sequencing was 98% and for NGS it was 97%. NGS has comparable diagnostic performance to Sanger sequencing for the diagnosis of PH and, if implemented, would screen for all forms of PH simultaneously ensuring prompt diagnosis at decreased cost. PMID:25629080
Modeling genome coverage in single-cell sequencing

PubMed Central

Daley, Timothy; Smith, Andrew D.

2014-01-01

Motivation: Single-cell DNA sequencing is necessary for examining genetic variation at the cellular level, which remains hidden in bulk sequencing experiments. But because they begin with such small amounts of starting material, the amount of information that is obtained from single-cell sequencing experiments is highly sensitive to the choice of protocol employed and variability in library preparation. In particular, the fraction of the genome represented in single-cell sequencing libraries exhibits extreme variability due to quantitative biases in amplification and loss of genetic material. Results: We propose a method to predict the genome coverage of a deep sequencing experiment using information from an initial shallow sequencing experiment mapped to a reference genome. The observed coverage statistics are used in a non-parametric empirical Bayes Poisson model to estimate the gain in coverage from deeper sequencing. This approach allows researchers to know statistical features of deep sequencing experiments without actually sequencing deeply, providing a basis for optimizing and comparing single-cell sequencing protocols or screening libraries. Availability and implementation: The method is available as part of the preseq software package. Source code is available at http://smithlabresearch.org/preseq. Contact: andrewds@usc.edu Supplementary information: Supplementary material is available at Bioinformatics online. PMID:25107873
SNP discovery through de novo deep sequencing using the next generation of DNA sequencers

USDA-ARS's Scientific Manuscript database

The production of high volumes of DNA sequence data using new technologies has permitted more efficient identification of single nucleotide polymorphisms in vertebrate genomes. This chapter presented practical methodology for production and analysis of DNA sequence data for SNP discovery....
Statistical properties of DNA sequences

NASA Technical Reports Server (NTRS)

Peng, C. K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Simons, M.; Stanley, H. E.

1995-01-01

We review evidence supporting the idea that the DNA sequence in genes containing non-coding regions is correlated, and that the correlation is remarkably long range--indeed, nucleotides thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationarity" feature of the sequence of base pairs by applying a new algorithm called detrended fluctuation analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and non-coding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to every DNA sequence (33301 coding and 29453 non-coding) in the entire GenBank database. Finally, we describe briefly some recent work showing that the non-coding sequences have certain statistical features in common with natural and artificial languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts. These statistical properties of non-coding sequences support the possibility that non-coding regions of DNA may carry biological information.
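
The following pure-Python sketch illustrates the general shape of detrended fluctuation analysis applied to a "DNA walk". It is a simplified illustration rather than the authors' implementation: the purine/pyrimidine step rule, the non-overlapping boxes, the linear detrending, and all function names are assumptions made for the example.

```python
import math
import random


def dna_walk(seq):
    """Cumulative walk: +1 for a pyrimidine (C/T), -1 for a purine (A/G)."""
    walk, total = [], 0
    for base in seq.upper():
        total += 1 if base in "CT" else -1
        walk.append(total)
    return walk


def dfa_fluctuation(walk, n):
    """RMS deviation of the walk from per-box least-squares lines (box size n)."""
    nboxes = len(walk) // n
    x_mean = (n - 1) / 2
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sq = 0.0
    for b in range(nboxes):
        box = walk[b * n:(b + 1) * n]
        y_mean = sum(box) / n
        slope = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(box)) / sxx
        sq += sum((y - (y_mean + slope * (x - x_mean))) ** 2
                  for x, y in enumerate(box))
    return math.sqrt(sq / (nboxes * n))


def dfa_exponent(walk, box_sizes):
    """Least-squares slope of log F(n) against log n."""
    lx = [math.log(n) for n in box_sizes]
    ly = [math.log(dfa_fluctuation(walk, n)) for n in box_sizes]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    return (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
            / sum((a - mx) ** 2 for a in lx))


if __name__ == "__main__":
    rng = random.Random(0)
    seq = "".join(rng.choice("ACGT") for _ in range(20000))
    print(dfa_exponent(dna_walk(seq), [8, 16, 32, 64, 128]))
```

For an uncorrelated random sequence the fitted exponent should come out near 0.5; exponents noticeably above 0.5 are the kind of signature the abstract describes as long-range correlation.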
Experimental Design-Based Functional Mining and Characterization of High-Throughput Sequencing Data in the Sequence Read Archive

PubMed Central

Nakazato, Takeru; Ohta, Tazro; Bono, Hidemasa

2013-01-01

High-throughput sequencing technology, also called next-generation sequencing (NGS), has the potential to revolutionize the whole process of genome sequencing, transcriptomics, and epigenetics. Sequencing data is captured in a public primary data archive, the Sequence Read Archive (SRA). As of January 2013, data from more than 14,000 projects have been submitted to SRA, which is double that of the previous year. Researchers can download raw sequence data from the SRA website to perform further analyses and to compare with their own data. However, it is extremely difficult to search entries and download raw sequences of interest with SRA because the data structure is complicated, and experimental conditions along with raw sequences are partly described in natural language. Additionally, some sequences are of inconsistent quality because anyone can submit sequencing data to SRA with no quality check. Therefore, as a criterion of data quality, we focused on SRA entries that were cited in journal articles. We extracted SRA IDs and PubMed IDs (PMIDs) from SRA and full-text versions of journal articles and retrieved 2748 SRA ID-PMID pairs. We constructed a publication list referring to SRA entries. Since one of the main themes of -omics analyses is clarification of disease mechanisms, we also characterized SRA entries by disease keywords, according to the Medical Subject Headings (MeSH) extracted from articles assigned to each SRA entry. We obtained 989 SRA ID-MeSH disease term pairs, and constructed a disease list referring to SRA data. We previously developed feature profiles of diseases in a system called “Gendoo”. We generated hyperlinks between diseases extracted from SRA and their feature profiles. The developed project, publication and disease lists resulting from this study are available at our web service, called “DBCLS SRA” (http://sra.dbcls.jp/). This service will improve accessibility to high-quality data from SRA. PMID:24167589
Comparison of Next-Generation Sequencing Systems

PubMed Central

Liu, Lin; Li, Yinhu; Li, Siliang; Hu, Ni; He, Yimin; Pong, Ray; Lin, Danni; Lu, Lihua; Law, Maggie

2012-01-01

With fast development and wide applications of next-generation sequencing (NGS) technologies, genomic sequence information is within reach to aid the achievement of goals to decode life mysteries, make better crops, detect pathogens, and improve life qualities. NGS systems are typically represented by SOLiD/Ion Torrent PGM from Life Sciences, Genome Analyzer/HiSeq 2000/MiSeq from Illumina, and GS FLX Titanium/GS Junior from Roche. Beijing Genomics Institute (BGI), which possesses the world's biggest sequencing capacity, has multiple NGS systems including 137 HiSeq 2000, 27 SOLiD, one Ion Torrent PGM, one MiSeq, and one 454 sequencer. We have accumulated extensive experience in sample handling, sequencing, and bioinformatics analysis. In this paper, technologies of these systems are reviewed, and first-hand data from extensive experience is summarized and analyzed to discuss the advantages and specifics associated with each sequencing system. Finally, applications of NGS are summarized. PMID:22829749
Multiple alignment-free sequence comparison

PubMed Central

Ren, Jie; Song, Kai; Sun, Fengzhu; Deng, Minghua; Reinert, Gesine

2013-01-01

Motivation: Recently, a range of new statistics have become available for the alignment-free comparison of two sequences based on k-tuple word content. Here, we extend these statistics to the simultaneous comparison of more than two sequences. Our suite of statistics contains, first, two extensions of statistics for pairwise comparison of the joint k-tuple content of all the sequences, and second, three averages of sums of pairwise comparison statistics. The two tasks we consider are, first, to identify sequences that are similar to a set of target sequences, and, second, to measure the similarity within a set of sequences. Results: Our investigation uses both simulated data as well as cis-regulatory module data where the task is to identify cis-regulatory modules with similar transcription factor binding sites. We find that although for real data, all of our statistics show a similar performance, on simulated data the Shepp-type statistics are in some instances outperformed by star-type statistics. The multiple alignment-free statistics are more sensitive to contamination in the data than the pairwise average statistics. Availability: Our implementation of the five statistics is available as R package named ‘multiAlignFree’ at http://www-rcf.usc.edu/∼fsun/Programs/multiAlignFree/multiAlignFreemain.html. Contact: reinert@stats.ox.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23990418
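
To illustrate what comparing sequences by k-tuple word content involves, here is a minimal Python sketch of the basic, unnormalized D2 statistic (the inner product of two k-tuple count vectors) and its average over all pairs in a set. This is a simplified stand-in: the statistics in the paper are normalized variants that this sketch does not implement, and the function names are assumptions.

```python
from collections import Counter
from itertools import combinations


def ktuple_counts(seq, k):
    """Count every overlapping word of length k in `seq`."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Basic D2: sum over words of (count in a) * (count in b)."""
    counts_a, counts_b = ktuple_counts(seq_a, k), ktuple_counts(seq_b, k)
    return sum(n * counts_b[word] for word, n in counts_a.items())


def mean_pairwise_d2(seqs, k):
    """Average D2 over all unordered pairs, a simple multi-sequence summary."""
    pairs = list(combinations(seqs, 2))
    return sum(d2(a, b, k) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    print(d2("ACGT", "ACGT", 2))
    print(mean_pairwise_d2(["ACGT", "ACGT", "TTTT"], 2))
```

Raw D2 grows with sequence length and is influenced by background word frequencies, which is one reason normalized variants are generally preferred in practice.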
Gene finding in metatranscriptomic sequences.

PubMed

Ismail, Wazim Mohammed; Ye, Yuzhen; Tang, Haixu

2014-01-01

Metatranscriptomic sequencing is a highly sensitive bioassay of functional activity in a microbial community, providing complementary information to the metagenomic sequencing of the community. The acquisition of the metatranscriptomic sequences will enable us to refine the annotations of the metagenomes, and to study the gene activities and their regulation in complex microbial communities and their dynamics. In this paper, we present TransGeneScan, a software tool for finding genes in assembled transcripts from metatranscriptomic sequences. By incorporating several features of metatranscriptomic sequencing, including strand-specificity, short intergenic regions, and putative antisense transcripts into a Hidden Markov Model, TransGeneScan can predict a sense transcript containing one or multiple genes (in an operon) or an antisense transcript. We tested TransGeneScan on a mock metatranscriptomic data set containing three known bacterial genomes. The results showed that TransGeneScan performs better than metagenomic gene finders (MetaGeneMark and FragGeneScan) on predicting protein coding genes in assembled transcripts, and achieves comparable or even higher accuracy than gene finders for microbial genomes (Glimmer and GeneMark). These results imply that, with the assistance of metatranscriptomic sequencing, we can obtain a broad and precise picture about the genes (and their functions) in a microbial community. TransGeneScan is available as open-source software on SourceForge at https://sourceforge.net/projects/transgenescan/.
HybPiper: Extracting coding sequence and introns for phylogenetics from high-throughput sequencing reads using target enrichment

PubMed Central

Johnson, Matthew G.; Gardner, Elliot M.; Liu, Yang; Medina, Rafael; Goffinet, Bernard; Shaw, A. Jonathan; Zerega, Nyree J. C.; Wickett, Norman J.

2016-01-01

Premise of the study: Using sequence data generated via target enrichment for phylogenetics requires reassembly of high-throughput sequence reads into loci, presenting a number of bioinformatics challenges. We developed HybPiper as a user-friendly platform for assembly of gene regions, extraction of exon and intron sequences, and identification of paralogous gene copies. We test HybPiper using baits designed to target 333 phylogenetic markers and 125 genes of functional significance in Artocarpus (Moraceae). Methods and Results: HybPiper implements parallel execution of sequence assembly in three phases: read mapping, contig assembly, and target sequence extraction. The pipeline was able to recover nearly complete gene sequences for all genes in 22 species of Artocarpus. HybPiper also recovered more than 500 bp of nontargeted intron sequence in over half of the phylogenetic markers and identified paralogous gene copies in Artocarpus. Conclusions: HybPiper was designed for Linux and Mac OS X and is freely available at https://github.com/mossmatters/HybPiper. PMID:27437175

Predicting disulfide connectivity from protein sequence using multiple sequence feature vectors and secondary structure.

PubMed

Song, Jiangning; Yuan, Zheng; Tan, Hao; Huber, Thomas; Burrage, Kevin

2007-12-01

Disulfide bonds are primary covalent crosslinks between two cysteine residues in proteins that play critical roles in stabilizing the protein structures and are commonly found in extracytoplasmatic or secreted proteins. In protein folding prediction, the localization of disulfide bonds can greatly reduce the search in conformational space. Therefore, there is a great need to develop computational methods capable of accurately predicting disulfide connectivity patterns in proteins that could have potentially important applications. We have developed a novel method to predict disulfide connectivity patterns from protein primary sequence, using a support vector regression (SVR) approach based on multiple sequence feature vectors and predicted secondary structure by the PSIPRED program. The results indicate that our method could achieve a prediction accuracy of 74.4% and 77.9%, respectively, when averaged on proteins with two to five disulfide bridges using 4-fold cross-validation, measured on the protein and cysteine pair on a well-defined non-homologous dataset. We assessed the effects of different sequence encoding schemes on the prediction performance of disulfide connectivity. It has been shown that the sequence encoding scheme based on multiple sequence feature vectors coupled with predicted secondary structure can significantly improve the prediction accuracy, thus enabling our method to outperform most other currently available predictors. Our work provides a complementary approach to the current algorithms that should be useful in computationally assigning disulfide connectivity patterns and helps in the annotation of protein sequences generated by large-scale whole-genome projects. The prediction web server and Supplementary Material are accessible at http://foo.maths.uq.edu.au/~huber/disulfide
Joint Sequence Analysis: Association and Clustering

ERIC Educational Resources Information Center

Piccarreta, Raffaella

2017-01-01

In its standard formulation, sequence analysis aims at finding typical patterns in a set of life courses represented as sequences. Recently, some proposals have been introduced to jointly analyze sequences defined on different domains (e.g., work career, partnership, and parental histories). We introduce measures to evaluate whether a set of…
Recursive sequences in first-year calculus

NASA Astrophysics Data System (ADS)

Krainer, Thomas

2016-02-01

This article provides ready-to-use supplementary material on recursive sequences for a second-semester calculus class. It equips first-year calculus students with a basic methodical procedure based on which they can conduct a rigorous convergence or divergence analysis of many simple recursive sequences on their own without the need to invoke inductive arguments as is typically required in calculus textbooks. The sequences that are accessible to this kind of analysis are predominantly (eventually) monotonic, but also certain recursive sequences that alternate around their limit point as they converge can be considered.
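
As a small numerical companion to the kind of analysis described (the example is chosen for illustration and is not taken from the article), consider the recursion a_{n+1} = sqrt(2 + a_n) with a_0 = 0. It is increasing and bounded above by 2, so it converges, and its limit L satisfies L = sqrt(2 + L), giving L = 2. The Python sketch below, with hypothetical function names, simply iterates the recursion so that the monotonicity and the bound can be inspected.

```python
import math


def iterate(step, a0, n_terms):
    """Return [a_0, a_1, ..., a_n] for the recursion a_{k+1} = step(a_k)."""
    terms = [a0]
    for _ in range(n_terms):
        terms.append(step(terms[-1]))
    return terms


def is_nondecreasing(terms):
    """True if every term is at least as large as its predecessor."""
    return all(x <= y for x, y in zip(terms, terms[1:]))


terms = iterate(lambda a: math.sqrt(2 + a), 0.0, 30)

if __name__ == "__main__":
    for n, a in enumerate(terms[:6]):
        print(n, a)
    print(is_nondecreasing(terms), max(terms) <= 2.0)
```

A numerical check like this only suggests the behavior; the rigorous argument still rests on showing monotonicity and boundedness analytically.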
SeqLib: a C++ API for rapid BAM manipulation, sequence alignment and sequence assembly.

PubMed

Wala, Jeremiah; Beroukhim, Rameen

2017-03-01

We present SeqLib, a C++ API and command line tool that provides a rapid and user-friendly interface to BAM/SAM/CRAM files, global sequence alignment operations and sequence assembly. Four C libraries perform core operations in SeqLib: HTSlib for BAM access, BWA-MEM and BLAT for sequence alignment and Fermi for error correction and sequence assembly. Benchmarking indicates that SeqLib has lower CPU and memory requirements than leading C++ sequence analysis APIs. We demonstrate an example of how minimal SeqLib code can extract, error-correct and assemble reads from a CRAM file and then align with BWA-MEM. SeqLib also provides additional capabilities, including chromosome-aware interval queries and read plotting. Command line tools are available for performing integrated error correction, micro-assemblies and alignment. SeqLib is available on Linux and OSX for the C++98 standard and later at github.com/walaj/SeqLib. SeqLib is released under the Apache2 license. Additional capabilities for BLAT alignment are available under the BLAT license. jwala@broadinstitue.org; rameen@broadinstitute.org. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Transcription blockage by homopurine DNA sequences: role of sequence composition and single-strand breaks

PubMed Central

Belotserkovskii, Boris P.; Neil, Alexander J.; Saleh, Syed Shayon; Shin, Jane Hae Soo; Mirkin, Sergei M.; Hanawalt, Philip C.

2013-01-01

The ability of DNA to adopt non-canonical structures can affect transcription and has broad implications for genome functioning. We have recently reported that guanine-rich (G-rich) homopurine-homopyrimidine sequences cause significant blockage of transcription in vitro in a strictly orientation-dependent manner: when the G-rich strand serves as the non-template strand [Belotserkovskii et al. (2010) Mechanisms and implications of transcription blockage by guanine-rich DNA sequences., Proc. Natl Acad. Sci. USA, 107, 12816–12821]. We have now systematically studied the effect of the sequence composition and single-stranded breaks on this blockage. Although substitution of guanine by any other base reduced the blockage, cytosine and thymine reduced the blockage more significantly than adenine substitutions, affirming the importance of both G-richness and the homopurine-homopyrimidine character of the sequence for this effect. A single-strand break in the non-template strand adjacent to the G-rich stretch dramatically increased the blockage. Breaks in the non-template strand result in much weaker blockage signals extending downstream from the break even in the absence of the G-rich stretch. Our combined data support the notion that transcription blockage at homopurine-homopyrimidine sequences is caused by R-loop formation. PMID:23275544
Overcoming Sequence Misalignments with Weighted Structural Superposition

PubMed Central

Khazanov, Nickolay A.; Damm-Ganamet, Kelly L.; Quang, Daniel X.; Carlson, Heather A.

2012-01-01

An appropriate structural superposition identifies similarities and differences between homologous proteins that are not evident from sequence alignments alone. We have coupled our Gaussian-weighted RMSD (wRMSD) tool with a sequence aligner and seed extension (SE) algorithm to create a robust technique for overlaying structures and aligning sequences of homologous proteins (HwRMSD). HwRMSD overcomes errors in the initial sequence alignment that would normally propagate into a standard RMSD overlay. SE can generate a corrected sequence alignment from the improved structural superposition obtained by wRMSD. HwRMSD’s robust performance and its superiority over standard RMSD are demonstrated over a range of homologous proteins. Its better overlay results in corrected sequence alignments with good agreement to HOMSTRAD. Finally, HwRMSD is compared to established structural alignment methods: FATCAT, SSM, CE, and Dalilite. Most methods are comparable at placing residue pairs within 2 Å, but HwRMSD places many more residue pairs within 1 Å, providing a clear advantage. Such high accuracy is essential in drug design, where small distances can have a large impact on computational predictions. This level of accuracy is also needed to correct sequence alignments in an automated fashion, especially for omics-scale analysis. HwRMSD can align homologs with low sequence identity and large conformational differences, cases where both sequence-based and structural-based methods may fail. The HwRMSD pipeline overcomes the dependency of structural overlays on initial sequence pairing and removes the need to determine the best sequence-alignment method, substitution matrix, and gap parameters for each unique pair of homologs. PMID:22733542
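
The sketch below illustrates the general idea behind Gaussian weighting of an RMSD: atom pairs that are far apart contribute little, so a well-matched core dominates the score. It is an illustration under stated assumptions, not the HwRMSD pipeline: the weight form exp(-d²/c), the scale constant c = 5.0, and the function names are assumptions, and the code only scores coordinates that are already superposed (it performs no fitting, sequence alignment, or seed extension).

```python
import math


def squared_distances(coords_a, coords_b):
    """Squared distance for each paired atom in two equally long coordinate lists."""
    return [sum((p - q) ** 2 for p, q in zip(a, b))
            for a, b in zip(coords_a, coords_b)]


def rmsd(coords_a, coords_b):
    """Standard (unweighted) RMSD over paired atoms."""
    d2 = squared_distances(coords_a, coords_b)
    return math.sqrt(sum(d2) / len(d2))


def gaussian_weighted_rmsd(coords_a, coords_b, c=5.0):
    """RMSD with per-pair weights exp(-d^2 / c); c is an assumed scale constant."""
    d2 = squared_distances(coords_a, coords_b)
    weights = [math.exp(-d / c) for d in d2]
    return math.sqrt(sum(w * d for w, d in zip(weights, d2)) / sum(weights))


if __name__ == "__main__":
    core = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    moved = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 10.0, 0.0)]  # one outlier
    print(rmsd(core, moved), gaussian_weighted_rmsd(core, moved))
```

In this toy case a single displaced atom inflates the standard RMSD, while the weighted value stays close to zero because the outlier is strongly down-weighted.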
Complete genome sequence of Southern tomato virus naturally infecting tomatoes in Bangladesh using small RNA deep sequencing

USDA-ARS's Scientific Manuscript database

The complete genome sequence of a Southern tomato virus (STV) isolate on tomato plants in a seed production field in Bangladesh was obtained for the first time using next generation sequencing. The identified isolate STV_BD-13 shares a high degree of sequence identity (99%) with several known STV isol...
Sequencing the Unrearranged Human Immunoglobin

DOE Office of Scientific and Technical Information (OSTI.GOV)

Warren, Rene

2010-06-03

Rene Warren from Canada's Michael Smith Genome Sciences Centre discusses sequencing and finishing the IgH heavy chain locus on June 3, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.
A Multicultural Sequence of Humanities Electives.

ERIC Educational Resources Information Center

Anderson, Gwendolyn; Ewing, Dessa

In order to promote multi-cultural literacy among its students, Delaware County Community College (DCCC) developed a multi-cultural sequence of humanities electives. The sequence emerged as a response to the predominantly White student body's lack of knowledge or curiosity about other cultures. The first of the four courses in the sequence is…
SEXCMD: Development and validation of sex marker sequences for whole-exome/genome and RNA sequencing.

PubMed

Jeong, Seongmun; Kim, Jiwoong; Park, Won; Jeon, Hongmin; Kim, Namshin

2017-01-01

Over the last decade, a large number of nucleotide sequences have been generated by next-generation sequencing technologies and deposited to public databases. However, most of these datasets do not specify the sex of individuals sampled because researchers typically ignore or hide this information. Male and female genomes in many species have distinctive sex chromosomes, XX/XY and ZW/ZZ, and expression levels of many sex-related genes differ between the sexes. Herein, we describe how to develop sex marker sequences from syntenic regions of sex chromosomes and use them to quickly identify the sex of individuals being analyzed. Array-based technologies routinely use either known sex markers or the B-allele frequency of X or Z chromosomes to deduce the sex of an individual. The same strategy has been used with whole-exome/genome sequence data; however, all reads must be aligned onto a reference genome to determine the B-allele frequency of the X or Z chromosomes. SEXCMD is a pipeline that can extract sex marker sequences from reference sex chromosomes and rapidly identify the sex of individuals from whole-exome/genome and RNA sequencing after training with a known dataset through a simple machine learning approach. The pipeline counts total numbers of hits from sex-specific marker sequences and identifies the sex of the individuals sampled based on the fact that XX/ZZ samples do not have Y or W chromosome hits. We have successfully validated our pipeline with mammalian (Homo sapiens; XY) and avian (Gallus gallus; ZW) genomes. Typical calculation time when applying SEXCMD to human whole-exome or RNA sequencing datasets is a few minutes, and analyzing human whole-genome datasets takes about 10 minutes. Another important application of SEXCMD is as a quality control measure to avoid mixing samples before bioinformatics analysis. SEXCMD comprises simple Python and R scripts and is freely available at https://github.com/lovemun/SEXCMD.
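The hit-counting idea described above can be sketched as follows. This is a minimal illustration, not the SEXCMD implementation: the marker strings, the substring matching, and the `min_y_fraction` threshold are all hypothetical stand-ins for the trained markers and decision rule the pipeline actually uses.

```python
def count_marker_hits(reads, markers):
    """Count reads that contain at least one of the given marker sequences."""
    return sum(1 for read in reads if any(m in read for m in markers))


def call_sex(reads, x_markers, y_markers, min_y_fraction=0.1):
    """Call 'XY' when Y-marker hits make up a non-trivial share of marker hits.

    min_y_fraction is an assumed cut-off chosen for illustration only.
    """
    x_hits = count_marker_hits(reads, x_markers)
    y_hits = count_marker_hits(reads, y_markers)
    total = x_hits + y_hits
    if total == 0:
        return "unknown"  # no marker evidence at all
    return "XY" if y_hits / total >= min_y_fraction else "XX"


if __name__ == "__main__":
    # Toy reads and markers, invented for the example.
    reads = ["AAACCCGGG", "TTTACGTAA", "CCCGGGTTT"]
    print(call_sex(reads, x_markers=["CCCGGG"], y_markers=["ACGTA"]))
```

The same pattern would apply to ZW/ZZ systems by swapping the marker sets and labels.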
Sequence Complexity of Chromosome 3 in Caenorhabditis elegans

PubMed Central

Pierro, Gaetano

2012-01-01

The complexity of the nucleotide sequences in chromosome 3 of Caenorhabditis elegans (C. elegans) is studied and compared with that of random sequences. Moreover, using complexity-related parameters such as fractal dimension and frequency, together with the indicator matrix, a first classification of C. elegans sequences is given. In particular, the sequences with the highest and lowest fractal values are singled out. It is shown that the low fractal dimension sequences share many features with random sequences. PMID:22919380
FRESCO: Referential compression of highly similar sequences.

PubMed

Wandelt, Sebastian; Leser, Ulf

2013-01-01

In many applications, sets of similar texts or sequences are of high importance. Prominent examples are revision histories of documents or genomic sequences. Modern high-throughput sequencing technologies are able to generate DNA sequences at an ever-increasing rate. In parallel to the decreasing experimental time and cost necessary to produce DNA sequences, computational requirements for analysis and storage of the sequences are steeply increasing. Compression is a key technology to deal with this challenge. Recently, referential compression schemes, storing only the differences between a to-be-compressed input and a known reference sequence, gained a lot of interest in this field. In this paper, we propose a general open-source framework to compress large amounts of biological sequence data called Framework for REferential Sequence COmpression (FRESCO). Our basic compression algorithm is shown to be one to two orders of magnitude faster than comparable related work, while achieving similar compression ratios. We also propose several techniques to further increase compression ratios, while still retaining the advantage in speed: 1) selecting a good reference sequence; and 2) rewriting a reference sequence to allow for better compression. In addition, we propose a new way of further boosting the compression ratios by applying referential compression to already referentially compressed files (second-order compression). This technique allows for compression ratios way beyond state of the art, for instance, 4,000:1 and higher for human genomes. We evaluate our algorithms on a large data set from three different species (more than 1,000 genomes, more than 3 TB) and on a collection of versions of Wikipedia pages. Our results show that real-time compression of highly similar sequences at high compression ratios is possible on modern hardware.
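A toy sketch of referential compression, storing only copies from a reference plus literal differences, may help make the idea concrete. It is not FRESCO's algorithm: the naive longest-match search, the `min_match` seed length, and the `("M", start, length)` / `("L", text)` tuple encoding are assumptions made for illustration.

```python
def compress(reference, target, min_match=4):
    """Encode target as copies from reference ('M') and literal runs ('L')."""
    ops, literal, i = [], [], 0
    while i < len(target):
        best_start, best_len = -1, 0
        seed = target[i:i + min_match]
        start = reference.find(seed) if len(seed) == min_match else -1
        while start != -1:  # try every occurrence of the seed, keep the longest
            length = min_match
            while (i + length < len(target)
                   and start + length < len(reference)
                   and target[i + length] == reference[start + length]):
                length += 1
            if length > best_len:
                best_start, best_len = start, length
            start = reference.find(seed, start + 1)
        if best_len >= min_match:
            if literal:  # flush pending literal characters first
                ops.append(("L", "".join(literal)))
                literal = []
            ops.append(("M", best_start, best_len))
            i += best_len
        else:
            literal.append(target[i])
            i += 1
    if literal:
        ops.append(("L", "".join(literal)))
    return ops


def decompress(reference, ops):
    """Rebuild the target from the reference and the operation list."""
    parts = []
    for op in ops:
        if op[0] == "M":
            parts.append(reference[op[1]:op[1] + op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)


if __name__ == "__main__":
    ref = "ACGTACGTTTGACCA"
    tgt = "ACGTACGTATGACCA"  # one substitution relative to ref
    ops = compress(ref, tgt)
    print(ops, decompress(ref, ops) == tgt)
```

For highly similar sequences, most of the target collapses into a few long copy operations, which is where the large compression ratios come from.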
SCPRED: accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences.

PubMed

Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke

2008-05-01

Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are
SCPRED: Accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences

PubMed Central

Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke

2008-01-01

Background: Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. Results: SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. Conclusion: The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of
Comparative performance of the BGISEQ-500 vs Illumina HiSeq2500 sequencing platforms for palaeogenomic sequencing

PubMed Central

Mak, Sarah Siu Tze; Gopalakrishnan, Shyam; Carøe, Christian; Geng, Chunyu; Liu, Shanlin; Sinding, Mikkel-Holger S; Kuderna, Lukas F K; Zhang, Wenwei; Fu, Shujin; Vieira, Filipe G; Germonpré, Mietje; Bocherens, Hervé; Fedorov, Sergey; Petersen, Bent; Sicheritz-Pontén, Thomas; Marques-Bonet, Tomas; Zhang, Guojie; Jiang, Hui; Gilbert, M Thomas P

2017-01-01

Ancient DNA research has been revolutionized following development of next-generation sequencing platforms. Although a number of such platforms have been applied to ancient DNA samples, the Illumina series are the dominant choice today, mainly because of high production capacities and short read production. Recently a potentially attractive alternative platform for palaeogenomic data generation has been developed, the BGISEQ-500, whose sequence output is comparable with the Illumina series. In this study, we modified the standard BGISEQ-500 library preparation specifically for use on degraded DNA, then directly compared the sequencing performance and data quality of the BGISEQ-500 to the Illumina HiSeq2500 platform on DNA extracted from 8 historic and ancient dog and wolf samples. The data generated were largely comparable between sequencing platforms, with no statistically significant difference observed for parameters including level (P = 0.371) and average sequence length (P = 0.718) of endogenous nuclear DNA, sequence GC content (P = 0.311), double-stranded DNA damage rate (P = 0.309), and sequence clonality (P = 0.093). Small significant differences were found in single-strand DNA damage rate (δS; slightly lower for the BGISEQ-500, P = 0.011) and the background rate of difference from the reference genome (θ; slightly higher for BGISEQ-500, P = 0.012). This may result from the differences in amplification cycles used to polymerase chain reaction–amplify the libraries. A significant difference was also observed in the mitochondrial DNA percentages recovered (P = 0.018), although we believe this is likely a stochastic effect relating to the extremely low levels of mitochondria that were sequenced from 3 of the samples with overall very low levels of endogenous DNA. Although we acknowledge that our analyses were limited to animal material, our observations suggest that the BGISEQ-500 holds the potential to represent a valid and potentially valuable
Robot Sequencing and Visualization Program (RSVP)

NASA Technical Reports Server (NTRS)

Cooper, Brian K.; Maxwell, Scott A.; Hartman, Frank R.; Wright, John R.; Yen, Jeng; Toole, Nicholas T.; Gorjian, Zareh; Morrison, Jack C

2013-01-01

The Robot Sequencing and Visualization Program (RSVP) is being used in the Mars Science Laboratory (MSL) mission for downlink data visualization and command sequence generation. RSVP reads and writes downlink data products from the operations data server (ODS) and writes uplink data products to the ODS. The primary users of RSVP are members of the Rover Planner team (part of the Integrated Planning and Execution Team (IPE)), who use it to perform traversability/articulation analyses, take activity plan input from the Science and Mission Planning teams, and create a set of rover sequences to be sent to the rover every sol. The primary inputs to RSVP are downlink data products and activity plans in the ODS database. The primary outputs are command sequences to be placed in the ODS for further processing prior to uplink to each rover. RSVP is composed of two main subsystems. The first, called the Robot Sequence Editor (RoSE), understands the MSL activity and command dictionaries and takes care of converting incoming activity level inputs into command sequences. The Rover Planners use the RoSE component of RSVP to put together command sequences and to view and manage command level resources like time, power, temperature, etc. (via a transparent realtime connection to SEQGEN). The second component of RSVP is called HyperDrive, a set of high-fidelity computer graphics displays of the Martian surface in 3D and in stereo. The Rover Planners can explore the environment around the rover, create commands related to motion of all kinds, and see the simulated result of those commands via its underlying tight coupling with flight navigation, motor, and arm software. This software is the evolutionary replacement for the Rover Sequencing and Visualization software used to create command sequences (and visualize the Martian surface) for the Mars Exploration Rover mission.
Protein Sequence Annotation Tool (PSAT): A centralized web-based meta-server for high-throughput sequence annotations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Leung, Elo; Huang, Amy; Cadag, Eithon

In this study, we introduce the Protein Sequence Annotation Tool (PSAT), a web-based, sequence annotation meta-server for performing integrated, high-throughput, genome-wide sequence analyses. Our goals in building PSAT were to (1) create an extensible platform for integration of multiple sequence-based bioinformatics tools, (2) enable functional annotations and enzyme predictions over large input protein fasta data sets, and (3) provide a web interface for convenient execution of the tools. In this paper, we demonstrate the utility of PSAT by annotating the predicted peptide gene products of Herbaspirillum sp. strain RV1423, importing the results of PSAT into EC2KEGG, and using the resulting functional comparisons to identify a putative catabolic pathway, thereby distinguishing RV1423 from a well annotated Herbaspirillum species. This analysis demonstrates that high-throughput enzyme predictions, provided by PSAT processing, can be used to identify metabolic potential in an otherwise poorly annotated genome. Lastly, PSAT is a meta server that combines the results from several sequence-based annotation and function prediction codes, and is available at http://psat.llnl.gov/psat/. PSAT stands apart from other sequence-based genome annotation systems in providing a high-throughput platform for rapid de novo enzyme predictions and sequence annotations over large input protein sequence data sets in FASTA. PSAT is most appropriately applied in annotation of large protein FASTA sets that may or may not be associated with a single genome.
Protein Sequence Annotation Tool (PSAT): A centralized web-based meta-server for high-throughput sequence annotations

DOE PAGES

Leung, Elo; Huang, Amy; Cadag, Eithon; ...

2016-01-20

In this study, we introduce the Protein Sequence Annotation Tool (PSAT), a web-based, sequence annotation meta-server for performing integrated, high-throughput, genome-wide sequence analyses. Our goals in building PSAT were to (1) create an extensible platform for integration of multiple sequence-based bioinformatics tools, (2) enable functional annotations and enzyme predictions over large input protein fasta data sets, and (3) provide a web interface for convenient execution of the tools. In this paper, we demonstrate the utility of PSAT by annotating the predicted peptide gene products of Herbaspirillum sp. strain RV1423, importing the results of PSAT into EC2KEGG, and using the resulting functional comparisons to identify a putative catabolic pathway, thereby distinguishing RV1423 from a well annotated Herbaspirillum species. This analysis demonstrates that high-throughput enzyme predictions, provided by PSAT processing, can be used to identify metabolic potential in an otherwise poorly annotated genome. Lastly, PSAT is a meta server that combines the results from several sequence-based annotation and function prediction codes, and is available at http://psat.llnl.gov/psat/. PSAT stands apart from other sequence-based genome annotation systems in providing a high-throughput platform for rapid de novo enzyme predictions and sequence annotations over large input protein sequence data sets in FASTA. PSAT is most appropriately applied in annotation of large protein FASTA sets that may or may not be associated with a single genome.
Multiplexed microsatellite recovery using massively parallel sequencing

USGS Publications Warehouse

Jennings, T.N.; Knaus, B.J.; Mullins, T.D.; Haig, S.M.; Cronn, R.C.

2011-01-01

Conservation and management of natural populations requires accurate and inexpensive genotyping methods. Traditional microsatellite, or simple sequence repeat (SSR), marker analysis remains a popular genotyping method because of the comparatively low cost of marker development, ease of analysis and high power of genotype discrimination. With the availability of massively parallel sequencing (MPS), it is now possible to sequence microsatellite-enriched genomic libraries in multiplex pools. To test this approach, we prepared seven microsatellite-enriched, barcoded genomic libraries from diverse taxa (two conifer trees, five birds) and sequenced these on one lane of the Illumina Genome Analyzer using paired-end 80-bp reads. In this experiment, we screened 6.1 million sequences and identified 356958 unique microreads that contained di- or trinucleotide microsatellites. Examination of four species shows that our conversion rate from raw sequences to polymorphic markers compares favourably to Sanger- and 454-based methods. The advantage of multiplexed MPS is that the staggering capacity of modern microread sequencing is spread across many libraries; this reduces sample preparation and sequencing costs to less than $400 (USD) per species. This price is sufficiently low that microsatellite libraries could be prepared and sequenced for all 1373 organisms listed as 'threatened' and 'endangered' in the United States for under $0.5M (USD).
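Screening reads for di- or trinucleotide microsatellites can be sketched with a regular expression. This is only an illustration of the screening step, not the authors' pipeline: the `min_repeats` threshold of 5 and the decision to skip homopolymer runs are assumptions.

```python
import re


def find_microsatellites(seq, min_repeats=5):
    """Return (start, motif, repeat_count) for tandem 2- or 3-base repeats."""
    # ([ACGT]{2,3}?) captures the motif; \1{n,} requires further tandem copies.
    pattern = re.compile(r"([ACGT]{2,3}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq):
        motif = m.group(1)
        if len(set(motif)) == 1:
            continue  # homopolymer run such as AAAAAAAAAA, not a di/tri repeat
        hits.append((m.start(), motif, len(m.group(0)) // len(motif)))
    return hits


if __name__ == "__main__":
    # Toy read: six AC repeats and five ATG repeats separated by spacers.
    read = "GGT" + "AC" * 6 + "CCA" + "ATG" * 5 + "C"
    print(find_microsatellites(read))
```

In practice a read with enough unique flanking sequence on both sides of the repeat would be needed to design primers, which this sketch does not check.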
Sequence analysis of Leukemia DNA

NASA Astrophysics Data System (ADS)

Nacong, Nasria; Lusiyanti, Desy; Irawan, Muhammad Isa

2018-03-01

Cancer is a very deadly disease; one form is leukemia, better known as blood cancer. Cancer cells can be detected by analyzing DNA in a laboratory test. This study focuses on the local alignment of leukemia and non-leukemia DNA sequences obtained from NCBI, using the Smith-Waterman algorithm, which was introduced by T. F. Smith and M. S. Waterman in 1981. The algorithm seeks the region of greatest similarity between a pair of sequences by assigning a negative score to unequal base pairs (mismatches) and a positive score to identical base pairs (matches). The maximum positive score marks the end of the alignment, and the minimum score marks its start. This study uses sequences of leukemia and 3 sequences of non-leukemia.
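A compact sketch of Smith-Waterman local alignment with a linear gap penalty is given here for orientation. The scoring values (`match=2`, `mismatch=-1`, `gap=-2`) and the toy sequences are assumptions for the example, not parameters or data from the study.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # scores are floored at zero
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,  # match or mismatch
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the maximum cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("TTTACGTGGG", "CCACGTAA"))
```

The traceback starts at the highest-scoring cell and stops at the first zero, which is what confines the alignment to the best-matching local region.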

Studies of a Biochemical Factory: Tomato Trichome Deep Expressed Sequence Tag Sequencing and Proteomics

PubMed Central

Schilmiller, Anthony L.; Miner, Dennis P.; Larson, Matthew; McDowell, Eric; Gang, David R.; Wilkerson, Curtis; Last, Robert L.

2010-01-01

Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces β-caryophyllene and α-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells. PMID:20431087
Design of Protein Multi-specificity Using an Independent Sequence Search Reduces the Barrier to Low Energy Sequences

PubMed Central

Sevy, Alexander M.; Jacobs, Tim M.; Crowe, James E.; Meiler, Jens

2015-01-01

Computational protein design has found great success in engineering proteins for thermodynamic stability, binding specificity, or enzymatic activity in a ‘single state’ design (SSD) paradigm. Multi-specificity design (MSD), on the other hand, involves considering the stability of multiple protein states simultaneously. We have developed a novel MSD algorithm, which we refer to as REstrained CONvergence in multi-specificity design (RECON). The algorithm allows each state to adopt its own sequence throughout the design process rather than enforcing a single sequence on all states. Convergence to a single sequence is encouraged through an incrementally increasing convergence restraint for corresponding positions. Compared to MSD algorithms that enforce (constrain) an identical sequence on all states the energy landscape is simplified, which accelerates the search drastically. As a result, RECON can readily be used in simulations with a flexible protein backbone. We have benchmarked RECON on two design tasks. First, we designed antibodies derived from a common germline gene against their diverse targets to assess recovery of the germline, polyspecific sequence. Second, we design “promiscuous”, polyspecific proteins against all binding partners and measure recovery of the native sequence. We show that RECON is able to efficiently recover native-like, biologically relevant sequences in this diverse set of protein complexes. PMID:26147100
Simple and efficient identification of rare recessive pathologically important sequence variants from next generation exome sequence data.

PubMed

Carr, Ian M; Morgan, Joanne; Watson, Christopher; Melnik, Svitlana; Diggle, Christine P; Logan, Clare V; Harrison, Sally M; Taylor, Graham R; Pena, Sergio D J; Markham, Alexander F; Alkuraya, Fowzan S; Black, Graeme C M; Ali, Manir; Bonthron, David T

2013-07-01

Massively parallel ("next generation") DNA sequencing (NGS) has quickly become the method of choice for seeking pathogenic mutations in rare uncharacterized monogenic diseases. Typically, before DNA sequencing, protein-coding regions are enriched from patient genomic DNA, representing either the entire genome ("exome sequencing") or selected mapped candidate loci. Sequence variants, identified as differences between the patient's and the human genome reference sequences, are then filtered according to various quality parameters. Changes are screened against datasets of known polymorphisms, such as dbSNP and the 1000 Genomes Project, in the effort to narrow the list of candidate causative variants. An increasing number of commercial services now offer to both generate and align NGS data to a reference genome. This potentially allows small groups with limited computing infrastructure and informatics skills to utilize this technology. However, the capability to effectively filter and assess sequence variants is still an important bottleneck in the identification of deleterious sequence variants in both research and diagnostic settings. We have developed an approach to this problem comprising a user-friendly suite of programs that can interactively analyze, filter and screen data from enrichment-capture NGS data. These programs ("Agile Suite") are particularly suitable for small-scale gene discovery or for diagnostic analysis. © 2013 WILEY PERIODICALS, INC.
Illumina Synthetic Long Read Sequencing Allows Recovery of Missing Sequences even in the “Finished” C. elegans Genome

PubMed Central

Li, Runsheng; Hsieh, Chia-Ling; Young, Amanda; Zhang, Zhihong; Ren, Xiaoliang; Zhao, Zhongying

2015-01-01

Most next-generation sequencing platforms permit acquisition of high-throughput DNA sequences, but the relatively short read length limits their use in genome assembly or finishing. Illumina has recently released a technology called Synthetic Long-Read Sequencing that can produce reads of unusual length, i.e., predominantly around 10 Kb. However, a systematic assessment of their use in genome finishing and assembly is still lacking. We evaluate the promise and deficiency of the long reads in these aspects using the isogenic C. elegans genome, which has no gaps. First, the reads are highly accurate and capable of recovering most types of repetitive sequences. However, the presence of tandem repetitive sequences prevents pre-assembly of long reads in the relevant genomic region. Second, the reads are able to reliably detect missing but not extra sequences in the C. elegans genome. Third, the reads of smaller size are more capable of recovering repetitive sequences than those of bigger size. Fourth, at least 40 Kbp of missing genomic sequence is recovered in the C. elegans genome using the long reads. Finally, an N50 contig size of at least 86 Kbp can be achieved with 24× reads but with substantial mis-assembly errors, highlighting a need for novel assembly algorithms for the long reads. PMID:26039588
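For readers unfamiliar with the N50 statistic quoted above, a minimal sketch of the usual definition follows: the contig length at which the running total of lengths, sorted longest first, reaches half of the assembly size. The contig lengths in the example are invented.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:  # half of the assembly is now covered
            return length
    return 0


if __name__ == "__main__":
    print(n50([80, 70, 50, 40, 30, 20]))
```

A high N50 says nothing about correctness, which is why the abstract pairs the figure with a warning about mis-assembly errors.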
Incidental Sequence Learning across the Lifespan

ERIC Educational Resources Information Center

Weiermann, Brigitte; Meier, Beat

2012-01-01

The purpose of the present study was to investigate incidental sequence learning across the lifespan. We tested 50 children (aged 7-16), 50 young adults (aged 20-30), and 50 older adults (aged >65) with a sequence learning paradigm that involved both a task and a response sequence. After several blocks of practice, all age groups slowed down…
PLAN-IT: Knowledge-Based Mission Sequencing

NASA Astrophysics Data System (ADS)

Biefeld, Eric W.

1987-02-01

Mission sequencing consumes a large amount of time and manpower during a space exploration effort. Plan-It is a knowledge-based approach to assist in mission sequencing. Plan-It uses a combined frame and blackboard architecture. This paper reports on the approach implemented by Plan-It and the current applications of Plan-It for sequencing at NASA.
Quantitative comparison between a multiecho sequence and a single-echo sequence for susceptibility-weighted phase imaging.

PubMed

Gilbert, Guillaume; Savard, Geneviève; Bard, Céline; Beaudoin, Gilles

2012-06-01

The aim of this study was to investigate the benefits arising from the use of a multiecho sequence for susceptibility-weighted phase imaging using a quantitative comparison with a standard single-echo acquisition. Four healthy adult volunteers were imaged on a clinical 3-T system using a protocol comprising two different three-dimensional susceptibility-weighted gradient-echo sequences: a standard single-echo sequence and a multiecho sequence. Both sequences were repeated twice in order to evaluate the local noise contribution by a subtraction of the two acquisitions. For the multiecho sequence, the phase information from each echo was independently unwrapped, and the background field contribution was removed using either homodyne filtering or the projection onto dipole fields method. The phase information from all echoes was then combined using a weighted linear regression. R2 maps were also calculated from the multiecho acquisitions. The noise standard deviation in the reconstructed phase images was evaluated for six manually segmented regions of interest (frontal white matter, posterior white matter, globus pallidus, putamen, caudate nucleus and lateral ventricle). The use of the multiecho sequence for susceptibility-weighted phase imaging led to a reduction of the noise standard deviation for all subjects and all regions of interest investigated in comparison to the reference single-echo acquisition. On average, the noise reduction ranged from 18.4% for the globus pallidus to 47.9% for the lateral ventricle. In addition, the amount of noise reduction was found to be strongly inversely correlated to the estimated R2 value (R=-0.92). In conclusion, the use of a multiecho sequence is an effective way to decrease the noise contribution in susceptibility-weighted phase images, while preserving both contrast and acquisition time. The proposed approach additionally permits the calculation of R2 maps. Copyright © 2012 Elsevier Inc. All rights reserved.
Early molecular diagnosis of acute Chagas disease after transplantation with organs from Trypanosoma cruzi-infected donors.

PubMed

Cura, C I; Lattes, R; Nagel, C; Gimenez, M J; Blanes, M; Calabuig, E; Iranzo, A; Barcan, L A; Anders, M; Schijman, A G

2013-12-01

Organ transplantation (TX) is a novel transmission modality of Chagas disease. The results of molecular diagnosis and characterization of Trypanosoma cruzi acute infection in naïve TX recipients transplanted with organs from infected deceased donors are reported. Peripheral blood and cerebrospinal fluid samples from the TX recipients of organs from infected donors were prospectively and sequentially studied for detection of T. cruzi by means of kinetoplastid DNA polymerase chain reaction (kDNA-PCR). In positive blood samples, a PCR algorithm for identification of T. cruzi Discrete Typing Units (DTUs) and quantitative real-time PCR (qPCR) to quantify parasitic loads were performed. Minicircle signatures of T. cruzi infecting populations were also analyzed using restriction fragment length polymorphism (RFLP)-PCR. Eight seronegative TX recipients from four infected donors were studied. In five, the infection was detected at 68.4 days post-TX (36-98 days). In one case, it was transmitted to two of three TX recipients. The comparison of the minicircle signatures revealed nearly identical RFLP-PCR profiles, confirming a common source of infection. The five cases were infected by DTU TcV. This report reveals the relevance of systematic monitoring of TX recipients using PCR strategies in order to provide an early diagnosis allowing timely anti-trypanosomal treatment. © Copyright 2013 The American Society of Transplantation and the American Society of Transplant Surgeons.
Marks of Change in Sequences

NASA Astrophysics Data System (ADS)

Jürgensen, H.

2011-12-01

Given a sequence of events, how does one recognize that a change has occurred? We explore potential definitions of the concept of change in a sequence and propose that words in relativized solid codes might serve as indicators of change.
The Biomolecule Sequencer Project: Nanopore Sequencing as a Dual-Use Tool for Crew Health and Astrobiology Investigations

NASA Technical Reports Server (NTRS)

John, K. K.; Botkin, D. S.; Burton, A. S.; Castro-Wallace, S. L.; Chaput, J. D.; Dworkin, J. P.; Lehman, N.; Lupisella, M. L.; Mason, C. E.; Smith, D. J.;

2016-01-01

Human missions to Mars will fundamentally transform how the planet is explored, enabling new scientific discoveries through more sophisticated sample acquisition and processing than can currently be implemented in robotic exploration. The presence of humans also poses new challenges, including ensuring astronaut safety and health and monitoring contamination. Because the capability to transfer materials to Earth will be extremely limited, there is a strong need for in situ diagnostic capabilities. Nucleotide sequencing is a particularly powerful tool because it can be used to: (1) mitigate microbial risks to crew by allowing identification of microbes in water, in air, and on surfaces; (2) identify optimal treatment strategies for infections that arise in crew members; and (3) track how crew members, microbes, and mission-relevant organisms (e.g., farmed plants) respond to conditions on Mars through transcriptomic and genomic changes. Sequencing would also offer benefits for science investigations occurring on the surface of Mars by permitting identification of Earth-derived contamination in samples. If Mars contains indigenous life, and that life is based on nucleic acids or other closely related molecules, sequencing would serve as a critical tool for the characterization of those molecules. Therefore, spaceflight-compatible nucleic acid sequencing would be an important capability for both crew health and astrobiology exploration. Advances in sequencing technology on Earth have been driven largely by needs for higher throughput and read accuracy. Although some reduction in size has been achieved, nearly all commercially available sequencers are not compatible with spaceflight due to size, power, and operational requirements. Exceptions are nanopore-based sequencers that measure changes in current caused by DNA passing through pores; these devices are inherently much smaller and require significantly less power than sequencers using other detection methods

Sequencing at sea: challenges and experiences in Ion Torrent PGM sequencing during the 2013 Southern Line Islands Research Expedition.

PubMed

Lim, Yan Wei; Cuevas, Daniel A; Silva, Genivaldo Gueiros Z; Aguinaldo, Kristen; Dinsdale, Elizabeth A; Haas, Andreas F; Hatay, Mark; Sanchez, Savannah E; Wegley-Kelly, Linda; Dutilh, Bas E; Harkins, Timothy T; Lee, Clarence C; Tom, Warren; Sandin, Stuart A; Smith, Jennifer E; Zgliczynski, Brian; Vermeij, Mark J A; Rohwer, Forest; Edwards, Robert A

2014-01-01

Genomics and metagenomics have revolutionized our understanding of marine microbial ecology and the importance of microbes in global geochemical cycles. However, the process of DNA sequencing has always been an abstract extension of the research expedition, completed once the samples were returned to the laboratory. During the 2013 Southern Line Islands Research Expedition, we started the first effort to bring next generation sequencing to some of the most remote locations on our planet. We successfully sequenced twenty-six marine microbial genomes and two marine microbial metagenomes using the Ion Torrent PGM platform on the Merchant Yacht Hanse Explorer. Onboard sequence assembly, annotation, and analysis enabled us to investigate the role of the microbes in the coral reef ecology of these islands and atolls. This analysis identified phosphonate as an important phosphorus source for microbes growing in the Line Islands and reinforced the importance of L-serine in marine microbial ecosystems. Sequencing in the field allowed us to propose hypotheses and conduct experiments and further sampling based on the sequences generated. By eliminating the delay between sampling and sequencing, we enhanced the productivity of the research expedition. By overcoming the hurdles associated with sequencing on a boat in the middle of the Pacific Ocean we proved the flexibility of the sequencing, annotation, and analysis pipelines.
mirVAFC: A Web Server for Prioritizations of Pathogenic Sequence Variants from Exome Sequencing Data via Classifications.

PubMed

Li, Zhongshan; Liu, Zhenwei; Jiang, Yi; Chen, Denghui; Ran, Xia; Sun, Zhong Sheng; Wu, Jinyu

2017-01-01

Exome sequencing has been widely used to identify the genetic variants underlying human genetic disorders for clinical diagnoses, but identifying pathogenic sequence variants among the huge number of benign ones is complicated and challenging. Here, we describe a new Web server named mirVAFC for prioritizing pathogenic sequence variants from clinical exome sequencing (CES) variant data of a single individual or family. mirVAFC is able to comprehensively annotate sequence variants, filter out most irrelevant variants using custom criteria, classify variants into different categories according to estimated pathogenicity, and lastly prioritize pathogenic variants based on classifications and mutation effects. Case studies using different types of datasets for different diseases, from publications and our in-house data, have shown that mirVAFC can efficiently identify the correct pathogenic candidates reported in the original work in each case. Overall, the mirVAFC Web server is specifically developed for identifying pathogenic sequence variants from family-based CES variants using classification-based prioritization. The mirVAFC Web server is freely accessible at https://www.wzgenomics.cn/mirVAFC/. © 2016 WILEY PERIODICALS, INC.
Phylogenomics of Phrynosomatid Lizards: Conflicting Signals from Sequence Capture versus Restriction Site Associated DNA Sequencing

PubMed Central

Leaché, Adam D.; Chavez, Andreas S.; Jones, Leonard N.; Grummer, Jared A.; Gottscho, Andrew D.; Linkem, Charles W.

2015-01-01

Sequence capture and restriction site associated DNA sequencing (RADseq) are popular methods for obtaining large numbers of loci for phylogenetic analysis. These methods are typically used to collect data at different evolutionary timescales; sequence capture is primarily used for obtaining conserved loci, whereas RADseq is designed for discovering single nucleotide polymorphisms (SNPs) suitable for population genetic or phylogeographic analyses. Phylogenetic questions that span both “recent” and “deep” timescales could benefit from either type of data, but studies that directly compare the two approaches are lacking. We compared phylogenies estimated from sequence capture and double digest RADseq (ddRADseq) data for North American phrynosomatid lizards, a species-rich and diverse group containing nine genera that began diversifying approximately 55 Ma. Sequence capture resulted in 584 loci that provided a consistent and strong phylogeny using concatenation and species tree inference. However, the phylogeny estimated from the ddRADseq data was sensitive to the bioinformatics steps used for determining homology, detecting paralogs, and filtering missing data. The topological conflicts among the SNP trees were not restricted to any particular timescale, but instead were associated with short internal branches. Species tree analysis of the largest SNP assembly, which also included the most missing data, supported a topology that matched the sequence capture tree. This preferred phylogeny provides strong support for the paraphyly of the earless lizard genera Holbrookia and Cophosaurus, suggesting that the earless morphology either evolved twice or evolved once and was subsequently lost in Callisaurus. PMID:25663487
Application of Quaternion in improving the quality of global sequence alignment scores for an ambiguous sequence target in Streptococcus pneumoniae DNA

NASA Astrophysics Data System (ADS)

Lestari, D.; Bustamam, A.; Novianti, T.; Ardaneswari, G.

2017-07-01

A DNA sequence can be defined as a succession of letters representing the order of nucleotides within DNA, using the four DNA base codes adenine (A), guanine (G), cytosine (C), and thymine (T). The precise code of a sequence is determined using DNA sequencing methods and technologies, which have been developed since the 1970s and have now become advanced, high-throughput sequencing technologies. DNA sequencing has greatly accelerated biological and medical research and discovery. However, in some cases DNA sequencing produces ambiguous results, making it difficult to determine whether a given code is A, T, G, or C. To address this problem, in this study we introduce another representation of DNA codes, namely the Quaternion Q = (PA, PT, PG, PC), where PA, PT, PG, PC are the probabilities that the bases A, T, G, C appear in Q, and PA + PT + PG + PC = 1. Using Quaternion representations, we construct an improved scoring matrix for global sequence alignment by applying a dot product method. This scoring matrix produces better-quality match and mismatch scores between two DNA base codes. We implemented the Needleman-Wunsch global sequence alignment algorithm in Octave to analyze our target sequence, which contains some ambiguous sequence data. The subject sequences are DNA sequences of the Streptococcus pneumoniae family obtained from GenBank, while the target DNA sequence was received from our collaborator's database. We found that the Quaternion representation improves the quality of the sequence alignment score, and we conclude that the target DNA sequence has maximum similarity with Streptococcus pneumoniae.
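The probability-vector idea in this abstract can be illustrated with a minimal Python sketch (the study itself used Octave). Everything specific here is an assumption made for illustration: the uniform probabilities assigned to the ambiguity codes R, Y and N, the match, mismatch and gap values, and the way the dot product is turned into a score (an expected match/mismatch score). The abstract does not give the authors' actual scoring matrix.

```python
from typing import Dict, Tuple

Quaternion = Tuple[float, float, float, float]  # (P_A, P_T, P_G, P_C), sums to 1

# Unambiguous bases are unit vectors. The ambiguity codes below are ASSUMED
# to be uniform over the bases they stand for (illustrative choice).
BASES: Dict[str, Quaternion] = {
    "A": (1.0, 0.0, 0.0, 0.0),
    "T": (0.0, 1.0, 0.0, 0.0),
    "G": (0.0, 0.0, 1.0, 0.0),
    "C": (0.0, 0.0, 0.0, 1.0),
    "R": (0.5, 0.0, 0.5, 0.0),      # A or G
    "Y": (0.0, 0.5, 0.0, 0.5),      # T or C
    "N": (0.25, 0.25, 0.25, 0.25),  # any base
}


def dot(p: Quaternion, q: Quaternion) -> float:
    """Dot product of two base-probability vectors."""
    return sum(a * b for a, b in zip(p, q))


def score(x: str, y: str, match: float = 1.0, mismatch: float = -1.0) -> float:
    """Expected match/mismatch score; the dot product is read as the
    probability that both positions hold the same base (an assumption)."""
    d = dot(BASES[x], BASES[y])
    return match * d + mismatch * (1.0 - d)


def needleman_wunsch_score(s: str, t: str, gap: float = -2.0) -> float:
    """Global alignment score with a linear gap penalty (score only)."""
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * gap]
        for j in range(1, len(t) + 1):
            cur.append(max(
                prev[j - 1] + score(s[i - 1], t[j - 1]),  # align two bases
                prev[j] + gap,                            # gap in t
                cur[j - 1] + gap,                         # gap in s
            ))
        prev = cur
    return prev[-1]
```

With these assumed values, an ambiguous N aligned to G scores between a full match and a full mismatch, so a target containing ambiguous calls is penalized less than one containing a definite wrong base.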
Comparative modeling without implicit sequence alignments.

PubMed

Kolinski, Andrzej; Gront, Dominik

2007-10-01

The number of known protein sequences is about a thousand times larger than the number of experimentally solved 3D structures. For more than half of the protein sequences a close or distant structural analog could be identified. The key starting point in classical comparative modeling is to generate the best possible sequence alignment with a template or templates. With decreasing sequence similarity, the number of errors in the alignments increases and these errors are the main causes of the decreasing accuracy of the molecular models generated. Here we propose a new approach to comparative modeling, which does not require the implicit alignment - the model building phase explores geometric, evolutionary and physical properties of a template (or templates). The proposed method requires prior identification of a template, although the initial sequence alignment is ignored. The model is built using a very efficient reduced representation search engine CABS to find the best possible superposition of the query protein onto the template represented as a 3D multi-featured scaffold. The criteria used include: sequence similarity, predicted secondary structure consistency, local geometric features and hydrophobicity profile. For more difficult cases, the new method qualitatively outperforms existing schemes of comparative modeling. The algorithm unifies de novo modeling, 3D threading and sequence-based methods. The main idea is general and could be easily combined with other efficient modeling tools such as Rosetta, UNRES and others.
Method for identifying and quantifying nucleic acid sequence aberrations

DOEpatents

Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

1998-01-01

A method for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe.
Sequencing and comparative genomic analysis of 1227 Felis catus cDNA sequences enriched for developmental, clinical and nutritional phenotypes

PubMed Central

2012-01-01

Background: The feline genome is valuable to the veterinary and model organism genomics communities because the cat is an obligate carnivore and a model for endangered felids. The initial public release of the Felis catus genome assembly provided a framework for investigating the genomic basis of feline biology. However, the entire set of protein coding genes has not been elucidated. Results: We identified and characterized 1227 protein coding feline sequences, of which 913 map to public sequences and 314 are novel. These sequences have been deposited into NCBI's GenBank database and complement public genomic resources by providing additional protein coding sequences that fill in some of the gaps in the feline genome assembly. Through functional and comparative genomic analyses, we gained an understanding of the role of these sequences in feline development, nutrition and health. Specifically, we identified 104 orthologs of human genes associated with Mendelian disorders. We detected negative selection within sequences with gene ontology annotations associated with intracellular trafficking, cytoskeleton and muscle functions. We detected relatively less negative selection on protein sequences encoding extracellular networks, apoptotic pathways and mitochondrial gene ontology annotations. Additionally, we characterized feline cDNA sequences that have mouse orthologs associated with clinical, nutritional and developmental phenotypes. Together, this analysis provides an overview of the value of our cDNA sequences and enhances our understanding of how the feline genome is similar to, and different from other mammalian genomes. Conclusions: The cDNA sequences reported here expand existing feline genomic resources by providing high-quality sequences annotated with comparative genomic information providing functional, clinical, nutritional and orthologous gene information. PMID:22257742
AMPLIFICATION OF RIBOSOMAL RNA SEQUENCES

EPA Science Inventory

This book chapter offers an overview of the use of ribosomal RNA sequences. A history of the technology traces the evolution of techniques to measure bacterial phylogenetic relationships and recent advances in obtaining rRNA sequence information. The manual also describes procedu...
Corruption of genomic databases with anomalous sequence.

PubMed

Lamperti, E D; Kittelberger, J M; Smith, T F; Villa-Komaroff, L

1992-06-11

We describe evidence that DNA sequences from vectors used for cloning and sequencing have been incorporated accidentally into eukaryotic entries in the GenBank database. These incorporations were not restricted to one type of vector or to a single mechanism. Many minor instances may have been the result of simple editing errors, but some entries contained large blocks of vector sequence that had been incorporated by contamination or other accidents during cloning. Some cases involved unusual rearrangements and areas of vector distant from the normal insertion sites. Matches to vector were found in 0.23% of 20,000 sequences analyzed in GenBank Release 63. Although the possibility of anomalous sequence incorporation has been recognized since the inception of GenBank and should be easy to avoid, recent evidence suggests that this problem is increasing more quickly than the database itself. The presence of anomalous sequence may have serious consequences for the interpretation and use of database entries, and will have an impact on issues of database management. The incorporated vector fragments described here may also be useful for a crude estimate of the fidelity of sequence information in the database. In alignments with well-defined ends, the matching sequences showed 96.8% identity to vector; when poorer matches with arbitrary limits were included, the aggregate identity to vector sequence was 94.8%.
DSAP: deep-sequencing small RNA analysis pipeline.

PubMed

Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus

2010-07-01

DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log(2)-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw.
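The first two DSAP steps (cleanup and clustering) can be sketched in Python as follows. This is only an illustration of the idea, not DSAP's implementation: the adapter string, the minimum length, the two-column `sequence<TAB>count` layout and the rule that treats a read as poly-A/T/C/G/N only when it consists of a single repeated letter are all assumptions.

```python
import csv
import io
from collections import Counter
from typing import Optional

ADAPTER = "TCGTATGCCG"  # hypothetical 3' adapter prefix, for illustration only


def clean(seq: str, adapter: str = ADAPTER, min_len: int = 15) -> Optional[str]:
    """Step (i): trim at the first adapter occurrence and drop unusable tags."""
    seq = seq.upper()
    idx = seq.find(adapter)
    if idx != -1:
        seq = seq[:idx]
    if len(seq) < min_len:
        return None              # too short after trimming (assumed cutoff)
    if len(set(seq)) == 1:
        return None              # pure poly-A/T/C/G/N tag
    return seq


def cluster(tab_text: str) -> Counter:
    """Step (ii): group cleaned tags into unique sequences, summing copy numbers."""
    clusters: Counter = Counter()
    for row in csv.reader(io.StringIO(tab_text), delimiter="\t"):
        if len(row) < 2:
            continue             # skip blank or malformed lines
        cleaned = clean(row[0])
        if cleaned is not None:
            clusters[cleaned] += int(row[1])
    return clusters
```

Two input tags that differ only by a trailing adapter collapse into one cluster whose count is the sum of both, which is the quantity later matched against Rfam and miRBase in steps (iii) and (iv).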

Universal Sequence Replication, Reversible Polymerization and Early Functional Biopolymers: A Model for the Initiation of Prebiotic Sequence Evolution

PubMed Central

Walker, Sara Imari; Grover, Martha A.; Hud, Nicholas V.

2012-01-01

Many models for the origin of life have focused on understanding how evolution can drive the refinement of a preexisting enzyme, such as the evolution of efficient replicase activity. Here we present a model for what was, arguably, an even earlier stage of chemical evolution, when polymer sequence diversity was generated and sustained before, and during, the onset of functional selection. The model includes regular environmental cycles (e.g. hydration-dehydration cycles) that drive polymers between times of replication and functional activity, which coincide with times of different monomer and polymer diffusivity. Template-directed replication of informational polymers, which takes place during the dehydration stage of each cycle, is considered to be sequence-independent. New sequences are generated by spontaneous polymer formation, and all sequences compete for a finite monomer resource that is recycled via reversible polymerization. Kinetic Monte Carlo simulations demonstrate that this proposed prebiotic scenario provides a robust mechanism for the exploration of sequence space. Introduction of a polymer sequence with monomer synthetase activity illustrates that functional sequences can become established in a preexisting pool of otherwise non-functional sequences. Functional selection does not dominate system dynamics and sequence diversity remains high, permitting the emergence and spread of more than one functional sequence. It is also observed that polymers spontaneously form clusters in simulations where polymers diffuse more slowly than monomers, a feature that is reminiscent of a previous proposal that the earliest stages of life could have been defined by the collective evolution of a system-wide cooperation of polymer aggregates. Overall, the results presented demonstrate the merits of considering plausible prebiotic polymer chemistries and environments that would have allowed for the rapid turnover of monomer resources and for regularly varying monomer
The sequence measurement system of the IR camera

NASA Astrophysics Data System (ADS)

Geng, Ai-hui; Han, Hong-xia; Zhang, Hai-bo

2011-08-01

IR cameras are now widely used in optoelectronic tracking, optoelectronic measurement, fire control and optoelectronic countermeasures, but the output sequence of most IR cameras currently used in projects is complex, and the sequence documentation supplied by the manufacturer is not detailed. Because continuous image transmission and image processing systems need the detailed sequence of the IR camera, a sequence measurement system for the IR camera was designed and a detailed method for measuring the sequence of the IR camera in use was carried out. FPGA programming combined with SignalTap online observation is applied in the sequence measurement system; the precise sequence of the IR camera's output signal has been obtained, and detailed documentation of the IR camera has been supplied to the continuous image transmission system, the image processing system and others. The sequence measurement system consists of a CameraLink input interface, an LVDS input interface, an FPGA and a CameraLink output interface, among other parts; the FPGA is the key component. The system accepts video signals in both CameraLink and LVDS formats, and because image processing cards and image memory cards typically use CameraLink as their input interface, the output of the sequence measurement system is designed as a CameraLink interface. The sequence measurement system thus measures the IR camera's sequence and, for some cameras, also handles interface transmission. Inside the FPGA, the sequence measurement program, the pixel clock modification, the SignalTap file configuration and the SignalTap online observation are integrated to achieve precise measurement of the IR camera. The sequence measurement
Identifying Group-Specific Sequences for Microbial Communities Using Long k-mer Sequence Signatures

PubMed Central

Wang, Ying; Fu, Lei; Ren, Jie; Yu, Zhaoxia; Chen, Ting; Sun, Fengzhu

2018-01-01

Comparing metagenomic samples is crucial for understanding microbial communities. For different groups of microbial communities, such as human gut metagenomic samples from patients with a certain disease and healthy controls, identifying group-specific sequences offers essential information for potential biomarker discovery. A sequence that is present, or rich, in one group, but absent, or scarce, in another group is considered “group-specific” in our study. Our main purpose is to discover group-specific sequence regions between control and case groups as disease-associated markers. We developed a long k-mer (k ≥ 30 bps)-based computational pipeline to detect group-specific sequences at strain resolution free from reference sequences, sequence alignments, and metagenome-wide de novo assembly. We called our method MetaGO: Group-specific oligonucleotide analysis for metagenomic samples. An open-source pipeline on Apache Spark was developed with parallel computing. We applied MetaGO to one simulated and three real metagenomic datasets to evaluate the discriminative capability of identified group-specific markers. In the simulated dataset, 99.11% of group-specific logical 40-mers covered 98.89% disease-specific regions from the disease-associated strain. In addition, 97.90% of group-specific numerical 40-mers covered 99.61 and 96.39% of differentially abundant genome and regions between two groups, respectively. For a large-scale metagenomic liver cirrhosis (LC)-associated dataset, we identified 37,647 group-specific 40-mer features. Any one of the features can predict disease status of the training samples with the average of sensitivity and specificity higher than 0.8. The random forests classification using the top 10 group-specific features yielded a higher AUC (from ∼0.8 to ∼0.9) than that of previous studies. All group-specific 40-mers were present in LC patients, but not healthy controls. All the assembled 11 LC-specific sequences can be mapped to two
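The presence/absence ("logical") notion of a group-specific k-mer described in this record can be sketched in a few lines of Python. This is a toy, single-machine illustration under stated assumptions, not the MetaGO pipeline: the function names and the `min_case_fraction` parameter are hypothetical, reverse complements and abundance-based ("numerical") features are ignored, no statistical testing is done, and the usage below uses k = 4 only to keep the example readable, whereas the study uses k ≥ 30.

```python
from collections import Counter
from typing import Iterable, List, Set


def kmers(read: str, k: int) -> Set[str]:
    """All k-length substrings of one read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def sample_kmers(reads: Iterable[str], k: int) -> Set[str]:
    """Union of k-mers over all reads of one sample."""
    out: Set[str] = set()
    for read in reads:
        out |= kmers(read, k)
    return out


def group_specific(cases: List[List[str]], controls: List[List[str]],
                   k: int, min_case_fraction: float = 1.0) -> Set[str]:
    """k-mers present in at least min_case_fraction of case samples
    and absent from every control sample."""
    case_sets = [sample_kmers(sample, k) for sample in cases]
    control_union: Set[str] = set()
    for sample in controls:
        control_union |= sample_kmers(sample, k)
    counts = Counter(km for s in case_sets for km in s)
    needed = min_case_fraction * len(case_sets)
    return {km for km, c in counts.items()
            if c >= needed and km not in control_union}
```

For example, `group_specific([["ACGTAC"], ["TTACGTA"]], [["CGTACC"]], k=4)` keeps only the 4-mers shared by both case samples and missing from the control. Lowering `min_case_fraction` relaxes "present" toward "rich in" the case group.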
Sequence dependent aggregation of peptides and fibril formation

NASA Astrophysics Data System (ADS)

Hung, Nguyen Ba; Le, Duy-Manh; Hoang, Trinh X.

2017-09-01

Deciphering the links between amino acid sequence and amyloid fibril formation is key for understanding protein misfolding diseases. Here we use Monte Carlo simulations to study the aggregation of short peptides in a coarse-grained model with hydrophobic-polar (HP) amino acid sequences and correlated side chain orientations for hydrophobic contacts. A significant heterogeneity is observed in the aggregate structures and in the thermodynamics of aggregation for systems of different HP sequences and different numbers of peptides. Fibril-like ordered aggregates are found for several sequences that contain the common HPH pattern, while other sequences may form helix bundles or disordered aggregates. A wide variation of the aggregation transition temperatures among sequences, even among those of the same hydrophobic fraction, indicates that not all sequences undergo aggregation at a presumable physiological temperature. The transition is found to be the most cooperative for sequences forming fibril-like structures. For a fibril-prone sequence, it is shown that fibril formation follows the nucleation and growth mechanism. Interestingly, a binary mixture of peptides of an aggregation-prone and a non-aggregation-prone sequence shows the association and conversion of the latter to the fibrillar structure. Our study highlights the role of a sequence in selecting fibril-like aggregates and also the impact of a structural template on fibril formation by peptides of unrelated sequences.
Nonspatial Sequence Coding in CA1 Neurons

PubMed Central

Allen, Timothy A.; Salz, Daniel M.; McKenzie, Sam

2016-01-01

The hippocampus is critical to the memory for sequences of events, a defining feature of episodic memory. However, the fundamental neuronal mechanisms underlying this capacity remain elusive. While considerable research indicates hippocampal neurons can represent sequences of locations, direct evidence of coding for the memory of sequential relationships among nonspatial events remains lacking. To address this important issue, we recorded neural activity in CA1 as rats performed a hippocampus-dependent sequence-memory task. Briefly, the task involves the presentation of repeated sequences of odors at a single port and requires rats to identify each item as “in sequence” or “out of sequence”. We report that, while the animals' location and behavior remained constant, hippocampal activity differed depending on the temporal context of items—in this case, whether they were presented in or out of sequence. Some neurons showed this effect across items or sequence positions (general sequence cells), while others exhibited selectivity for specific conjunctions of item and sequence position information (conjunctive sequence cells) or for specific probe types (probe-specific sequence cells). We also found that the temporal context of individual trials could be accurately decoded from the activity of neuronal ensembles, that sequence coding at the single-cell and ensemble level was linked to sequence memory performance, and that slow-gamma oscillations (20–40 Hz) were more strongly modulated by temporal context and performance than theta oscillations (4–12 Hz). These findings provide compelling evidence that sequence coding extends beyond the domain of spatial trajectories and is thus a fundamental function of the hippocampus. SIGNIFICANCE STATEMENT The ability to remember the order of life events depends on the hippocampus, but the underlying neural mechanisms remain poorly understood. Here we addressed this issue by recording neural activity in hippocampal
Reporting Differences Between Spacecraft Sequence Files

NASA Technical Reports Server (NTRS)

Khanampompan, Teerapat; Gladden, Roy E.; Fisher, Forest W.

2010-01-01

A suite of computer programs, called seq diff suite, reports differences between the products of other computer programs involved in the generation of sequences of commands for spacecraft. These products consist of files of several types: replacement sequence of events (RSOE), DSN keyword file [DKF (wherein DSN signifies Deep Space Network)], spacecraft activities sequence file (SASF), spacecraft sequence file (SSF), and station allocation file (SAF). These products can include line numbers, request identifications, and other pieces of information that are not relevant when generating command sequence products, though these fields can result in the appearance of many changes to the files, particularly when using the UNIX diff command to inspect file differences. The outputs of prior software tools for reporting differences between such products include differences in these non-relevant pieces of information. In contrast, seq diff suite removes the fields containing the irrelevant pieces of information before processing to extract differences, so that only relevant differences are reported. Thus, seq diff suite is especially useful for reporting changes between successive versions of the various products and, in particular, for flagging differences in fields relevant to the sequence command generation and review process.
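The strategy described here, stripping irrelevant fields before comparing, can be sketched in Python with the standard `difflib` module. The line layout is entirely hypothetical: a leading line number and a `REQ_ID=...` token are assumed stand-ins for the irrelevant fields, and the real RSOE, DKF, SASF, SSF and SAF formats are not reproduced.

```python
import difflib
import re
from typing import List, Tuple

# Hypothetical patterns for fields that should not count as differences.
LINE_NUMBER = re.compile(r"^\s*\d+\s+")     # leading line number
REQUEST_ID = re.compile(r"\bREQ_ID=\S+\s*")  # request identification token


def normalize(line: str) -> str:
    """Remove the irrelevant fields from one line."""
    line = LINE_NUMBER.sub("", line)
    line = REQUEST_ID.sub("", line)
    return line.strip()


def relevant_diff(old: str, new: str) -> List[Tuple[str, str]]:
    """Return ('-', line) / ('+', line) pairs for lines that still differ
    after normalization."""
    a = [normalize(l) for l in old.splitlines()]
    b = [normalize(l) for l in new.splitlines()]
    changes: List[Tuple[str, str]] = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        changes.extend(("-", l) for l in a[i1:i2])
        changes.extend(("+", l) for l in b[j1:j2])
    return changes
```

A plain `diff` of two file versions whose line numbers and request IDs were regenerated would flag every line; after normalization only lines whose command content changed are reported.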
Method for identifying and quantifying nucleic acid sequence aberrations

DOEpatents

Lucas, J.N.; Straume, T.; Bogen, K.T.

1998-07-21

A method is disclosed for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe. 11 figs.
WebPrInSeS: automated full-length clone sequence identification and verification using high-throughput sequencing data.

PubMed

Massouras, Andreas; Decouttere, Frederik; Hens, Korneel; Deplancke, Bart

2010-07-01

High-throughput sequencing (HTS) is revolutionizing our ability to obtain cheap, fast and reliable sequence information. Many experimental approaches are expected to benefit from the incorporation of such sequencing features in their pipeline. Consequently, software tools that facilitate such an incorporation should be of great interest. In this context, we developed WebPrInSeS, a web server tool allowing automated full-length clone sequence identification and verification using HTS data. WebPrInSeS encompasses two separate software applications. The first is WebPrInSeS-C which performs automated sequence verification of user-defined open-reading frame (ORF) clone libraries. The second is WebPrInSeS-E, which identifies positive hits in cDNA or ORF-based library screening experiments such as yeast one- or two-hybrid assays. Both tools perform de novo assembly using HTS data from any of the three major sequencing platforms. Thus, WebPrInSeS provides a highly integrated, cost-effective and efficient way to sequence-verify or identify clones of interest. WebPrInSeS is available at http://webprinses.epfl.ch/ and is open to all users.
WebPrInSeS: automated full-length clone sequence identification and verification using high-throughput sequencing data

PubMed Central

Massouras, Andreas; Decouttere, Frederik; Hens, Korneel; Deplancke, Bart

2010-01-01

High-throughput sequencing (HTS) is revolutionizing our ability to obtain cheap, fast and reliable sequence information. Many experimental approaches are expected to benefit from the incorporation of such sequencing features in their pipeline. Consequently, software tools that facilitate such an incorporation should be of great interest. In this context, we developed WebPrInSeS, a web server tool allowing automated full-length clone sequence identification and verification using HTS data. WebPrInSeS encompasses two separate software applications. The first is WebPrInSeS-C which performs automated sequence verification of user-defined open-reading frame (ORF) clone libraries. The second is WebPrInSeS-E, which identifies positive hits in cDNA or ORF-based library screening experiments such as yeast one- or two-hybrid assays. Both tools perform de novo assembly using HTS data from any of the three major sequencing platforms. Thus, WebPrInSeS provides a highly integrated, cost-effective and efficient way to sequence-verify or identify clones of interest. WebPrInSeS is available at http://webprinses.epfl.ch/ and is open to all users. PMID:20501601
Sequencing technologies - the next generation.

PubMed

Metzker, Michael L

2010-01-01

Demand has never been greater for revolutionary technologies that deliver fast, inexpensive and accurate genome information. This challenge has catalysed the development of next-generation sequencing (NGS) technologies. The inexpensive production of large volumes of sequence data is the primary advantage over conventional methods. Here, I present a technical review of template preparation, sequencing and imaging, genome alignment and assembly approaches, and recent advances in current and near-term commercially available NGS instruments. I also outline the broad range of applications for NGS technologies, in addition to providing guidelines for platform selection to address biological questions of interest.
Single-primer fluorescent sequencing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ruth, J.L.; Morgan, C.A.; Middendorf, L.R.

Modified linker arm oligonucleotides complementary to standard M13 priming sites were synthesized, labelled with either one, two, or three fluoresceins, and purified by reverse-phase HPLC. When used as primers in standard dideoxy M13 sequencing with ³²P-dNTPs, normal autoradiographic patterns were obtained. To eliminate the radioactivity, direct on-line fluorescence detection was achieved by the use of a scanning 10 mW argon laser emitting 488 nm light. Fluorescent bands were detected directly in standard 0.2 or 0.35 mm thick polyacrylamide gels at a distance of 24 cm from the loading wells by a photomultiplier tube filtered at 520 nm. Horizontal and temporal location of each band was displayed by computer as a band in real time, providing visual appearance similar to normal 4-lane autoradiograms. Using a single primer labelled with two fluoresceins, sequences of between 500 and 600 bases have been read in a single loading with better than 98% accuracy; up to 400 bases can be read reproducibly with no errors. More than 50 sequences have been determined by this method. This approach requires only 1-2 µg of cloned template, and produces continuous sequence data at about one band per minute.
Local alignment of two-base encoded DNA sequence

PubMed Central

Homer, Nils; Merriman, Barry; Nelson, Stanley F

2009-01-01

Background DNA sequence comparison is based on optimal local alignment of two sequences using a similarity score. However, some new DNA sequencing technologies do not directly measure the base sequence, but rather an encoded form, such as the two-base encoding considered here. In order to compare such data to a reference sequence, the data must be decoded into sequence. The decoding is deterministic, but the possibility of measurement errors requires searching among all possible error modes and resulting alignments to achieve an optimal balance of fewer errors versus greater sequence similarity. Results We present an extension of the standard dynamic programming method for local alignment, which simultaneously decodes the data and performs the alignment, maximizing a similarity score based on a weighted combination of errors and edits, and allowing an affine gap penalty. We also present simulations that demonstrate the performance characteristics of our two base encoded alignment method and contrast those with standard DNA sequence alignment under the same conditions. Conclusion The new local alignment algorithm for two-base encoded data has substantial power to properly detect and correct measurement errors while identifying underlying sequence variants, and facilitating genome re-sequencing efforts based on this form of sequence data. PMID:19508732
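As a rough illustration of why decoding and alignment are best done together, the sketch below encodes and decodes a two-base ("colour-space") read. It assumes the widely used scheme in which each colour is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3); the abstract does not spell out the code table, and the function names are hypothetical. It is not the authors' alignment algorithm, only the encoding that algorithm has to cope with.

```python
# Assumed 2-bit base codes; the colour of a base pair is the XOR of its codes.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_colors(seq):
    """Return one colour (0-3) per adjacent base pair of seq."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]


def decode_colors(first_base, colors):
    """Decode a colour list back to bases, given the known first base."""
    bases = [first_base]
    for color in colors:
        # Each base is determined by the previous base and the current colour.
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ color])
    return "".join(bases)


colors = encode_colors("ACGT")            # [1, 3, 1]
print(decode_colors("A", colors))         # ACGT
# A single mis-measured colour changes every base downstream of it:
print(decode_colors("A", [1, 0, 1]))      # ACCA
```

Because one colour error corrupts all later bases when decoding is done naively, a decoder that is unaware of the reference cannot tell a measurement error from a true variant, which is the motivation the abstract gives for decoding and aligning in one dynamic programme.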
Geoseq: a tool for dissecting deep-sequencing datasets.

PubMed

Gurtowski, James; Cancio, Anthony; Shah, Hardik; Levovitz, Chaya; George, Ajish; Homann, Robert; Sachidanandam, Ravi

2010-10-12

Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO), the Sequence Read Archive (SRA) hosted by the NCBI, or the DNA Data Bank of Japan (DDBJ). Despite being rich data sources, they have not been used much due to the difficulty in locating and analyzing datasets of interest. Geoseq (http://geoseq.mssm.edu) provides a new method of analyzing short reads from deep-sequencing experiments. Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based, and holds pre-computed data from public libraries. The analysis reduces the input sequence to tiles and measures the coverage of each tile in a sequence library through the use of suffix arrays. The user can upload custom target sequences or use gene/miRNA names for the search and get back results as plots and spreadsheet files. Geoseq organizes the public sequencing data using a controlled vocabulary, allowing identification of relevant libraries by organism, tissue and type of experiment. Analysis of small sets of sequences against deep-sequencing datasets, as well as identification of public datasets of interest, is simplified by Geoseq. We applied Geoseq to (a) identify differential isoform expression in mRNA-seq datasets, (b) identify miRNAs (microRNAs) in libraries and identify mature and star sequences in miRNAs, and (c) identify potentially mis-annotated miRNAs. The ease of using Geoseq for these analyses suggests its utility and uniqueness as an analysis tool.
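The tiling idea in this abstract can be sketched in a few lines: cut the reference into fixed-size tiles and count how often each tile occurs in a read library indexed by a suffix array. This is a minimal toy under stated assumptions, not Geoseq's implementation: the suffix array is built naively (fine for a handful of reads, far too slow for real libraries), reads are joined with a `$` separator so no tile can match across a read boundary, and all function names are hypothetical.

```python
def tiles(seq, size, step=1):
    """Cut seq into overlapping tiles of the given size."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, step)]


def build_suffix_array(text):
    """Naive suffix array: start positions sorted by the suffix they begin."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_occurrences(text, sa, pattern):
    """Count occurrences of pattern in text by binary search on the suffix array."""
    def bound(strict):
        lo, hi = 0, len(sa)
        while lo < hi:
            mid = (lo + hi) // 2
            prefix = text[sa[mid]:sa[mid] + len(pattern)]
            if prefix < pattern or (strict and prefix == pattern):
                lo = mid + 1
            else:
                hi = mid
        return lo
    # Suffixes starting with pattern form one contiguous block in the array.
    return bound(True) - bound(False)


def tile_coverage(reference, reads, tile_size):
    """Return, for each tile of reference, its number of exact hits in reads."""
    text = "$".join(reads)  # '$' never occurs in a tile, so hits stay within a read
    sa = build_suffix_array(text)
    return [count_occurrences(text, sa, t) for t in tiles(reference, tile_size)]


print(tile_coverage("ACGTA", ["ACGTAC", "GTACGT"], 4))  # [2, 1]
```

Querying a small reference against a pre-indexed library, rather than mapping every read to a genome, is what makes the approach cheap for "small sets of sequences against deep-sequencing datasets".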
Sequencing at sea: challenges and experiences in Ion Torrent PGM sequencing during the 2013 Southern Line Islands Research Expedition

PubMed Central

Lim, Yan Wei; Cuevas, Daniel A.; Silva, Genivaldo Gueiros Z.; Aguinaldo, Kristen; Dinsdale, Elizabeth A.; Haas, Andreas F.; Hatay, Mark; Sanchez, Savannah E.; Wegley-Kelly, Linda; Dutilh, Bas E.; Harkins, Timothy T.; Lee, Clarence C.; Tom, Warren; Sandin, Stuart A.; Smith, Jennifer E.; Zgliczynski, Brian; Vermeij, Mark J.A.; Rohwer, Forest

2014-01-01

Genomics and metagenomics have revolutionized our understanding of marine microbial ecology and the importance of microbes in global geochemical cycles. However, the process of DNA sequencing has always been an abstract extension of the research expedition, completed once the samples were returned to the laboratory. During the 2013 Southern Line Islands Research Expedition, we started the first effort to bring next generation sequencing to some of the most remote locations on our planet. We successfully sequenced twenty-six marine microbial genomes and two marine microbial metagenomes using the Ion Torrent PGM platform on the Merchant Yacht Hanse Explorer. Onboard sequence assembly, annotation, and analysis enabled us to investigate the role of the microbes in the coral reef ecology of these islands and atolls. This analysis identified phosphonate as an important phosphorus source for microbes growing in the Line Islands and reinforced the importance of L-serine in marine microbial ecosystems. Sequencing in the field allowed us to propose hypotheses and conduct experiments and further sampling based on the sequences generated. By eliminating the delay between sampling and sequencing, we enhanced the productivity of the research expedition. By overcoming the hurdles associated with sequencing on a boat in the middle of the Pacific Ocean we proved the flexibility of the sequencing, annotation, and analysis pipelines. PMID:25177534
Discrete sequence prediction and its applications

NASA Technical Reports Server (NTRS)

Laird, Philip

1992-01-01

Learning from experience to predict sequences of discrete symbols is a fundamental problem in machine learning with many applications. We apply sequence prediction using a simple and practical sequence-prediction algorithm, called TDAG. The TDAG algorithm is first tested by comparing its performance with some common data compression algorithms. Then it is adapted to the detailed requirements of dynamic program optimization, with excellent results.
Next-Generation Sequencing in the Mycology Lab.

PubMed

Zoll, Jan; Snelders, Eveline; Verweij, Paul E; Melchers, Willem J G

New state-of-the-art sequencing techniques offer valuable tools both for detecting mycobiota and for understanding the molecular mechanisms of virulence and of resistance against antifungal compounds. The introduction of new sequencing platforms with enhanced capacity and reduced costs for sequence analysis provides a potentially powerful tool in mycological diagnosis and research. In this review, we summarize the applications of next-generation sequencing techniques in mycology.
GASP: Gapped Ancestral Sequence Prediction for proteins

PubMed Central

Edwards, Richard J; Shields, Denis C

2004-01-01

Background The prediction of ancestral protein sequences from multiple sequence alignments is useful for many bioinformatics analyses. Predicting ancestral sequences is not a simple procedure and relies on accurate alignments and phylogenies. Several algorithms exist based on Maximum Parsimony or Maximum Likelihood methods but many current implementations are unable to process residues with gaps, which may represent insertion/deletion (indel) events or sequence fragments. Results Here we present a new algorithm, GASP (Gapped Ancestral Sequence Prediction), for predicting ancestral sequences from phylogenetic trees and the corresponding multiple sequence alignments. Alignments may be of any size and contain gaps. GASP first assigns the positions of gaps in the phylogeny before using a likelihood-based approach centred on amino acid substitution matrices to assign ancestral amino acids. Important outgroup information is used by first working down from the tips of the tree to the root, using descendant data only to assign probabilities, and then working back up from the root to the tips using descendant and outgroup data to make predictions. GASP was tested on a number of simulated datasets based on real phylogenies. Prediction accuracy for ungapped data was similar to three alternative algorithms tested, with GASP performing better in some cases and worse in others. Adding simple insertions and deletions to the simulated data did not have a detrimental effect on GASP accuracy. Conclusions GASP (Gapped Ancestral Sequence Prediction) will predict ancestral sequences from multiple protein alignments of any size. Although not as accurate in all cases as some of the more sophisticated maximum likelihood approaches, it can process a wide range of input phylogenies and will predict ancestral sequences for gapped and ungapped residues alike. PMID:15350199
Project Report: Automatic Sequence Processor Software Analysis

NASA Technical Reports Server (NTRS)

Benjamin, Brandon

2011-01-01

The Mission Planning and Sequencing (MPS) element of Multi-Mission Ground System and Services (MGSS) provides space missions with multi-purpose software to plan spacecraft activities, sequence spacecraft commands, and then integrate these products and execute them on spacecraft. Jet Propulsion Laboratory (JPL) is currently flying many missions. The processes for building, integrating, and testing the multi-mission uplink software need to be improved to meet the needs of the missions and the operations teams that command the spacecraft. The Multi-Mission Sequencing Team is responsible for collecting and processing the observations, experiments and engineering activities that are to be performed on a selected spacecraft. The collection of these activities is called a sequence, and ultimately a sequence becomes a sequence of spacecraft commands. The operations teams check the sequence to make sure that no constraints are violated. The workflow process involves sending a program start command, which activates the Automatic Sequence Processor (ASP). The ASP is currently a file-based system composed of scripts written in Perl, C shell and awk. Once this start process is complete, the system checks for errors and aborts if there are any; otherwise the system converts the commands to binary, and then sends the resultant information to be radiated to the spacecraft.
Multiplex De Novo Sequencing of Peptide Antibiotics

NASA Astrophysics Data System (ADS)

Mohimani, Hosein; Liu, Wei-Ting; Yang, Yu-Liang; Gaudêncio, Susana P.; Fenical, William; Dorrestein, Pieter C.; Pevzner, Pavel A.

Proliferation of drug-resistant diseases raises the challenge of searching for new, more efficient antibiotics. Currently, some of the most effective antibiotics (e.g., Vancomycin and Daptomycin) are cyclic peptides produced by non-ribosomal biosynthetic pathways. The isolation and sequencing of cyclic peptide antibiotics, unlike the same activity with linear peptides, is time-consuming and error-prone. The dominant technique for sequencing cyclic peptides is NMR-based and requires large amounts (milligrams) of purified materials that, for most compounds, are not possible to obtain. Given these facts, there is a need for new tools to sequence cyclic NRPs using picograms of material. Since nearly all cyclic NRPs are produced along with related analogs, we develop a mass spectrometry approach for sequencing all related peptides at once (in contrast to the existing approach that analyzes individual peptides). Our results suggest that instead of attempting to isolate and NMR-sequence the most abundant compound, one should acquire spectra of many related compounds and sequence all of them simultaneously using tandem mass spectrometry. We illustrate applications of this approach by sequencing new variants of cyclic peptide antibiotics from Bacillus brevis, as well as sequencing a previously unknown family of cyclic NRPs produced by marine bacteria.
A Comprehensive, Automatically Updated Fungal ITS Sequence Dataset for Reference-Based Chimera Control in Environmental Sequencing Efforts.

PubMed

Nilsson, R Henrik; Tedersoo, Leho; Ryberg, Martin; Kristiansson, Erik; Hartmann, Martin; Unterseher, Martin; Porter, Teresita M; Bengtsson-Palme, Johan; Walker, Donald M; de Sousa, Filipe; Gamper, Hannes Andres; Larsson, Ellen; Larsson, Karl-Henrik; Kõljalg, Urmas; Edgar, Robert C; Abarenkov, Kessy

2015-01-01

The nuclear ribosomal internal transcribed spacer (ITS) region is the most commonly chosen genetic marker for the molecular identification of fungi in environmental sequencing and molecular ecology studies. Several analytical issues complicate such efforts, one of which is the formation of chimeric (artificially joined) DNA sequences during PCR amplification or sequence assembly. Several software tools are currently available for chimera detection, but rely to various degrees on the presence of a chimera-free reference dataset for optimal performance. However, no such dataset is available for use with the fungal ITS region. This study introduces a comprehensive, automatically updated reference dataset for fungal ITS sequences based on the UNITE database for the molecular identification of fungi. This dataset supports chimera detection throughout the fungal kingdom and for full-length ITS sequences as well as partial (ITS1 or ITS2 only) datasets. The performance of the dataset on a large set of artificial chimeras was above 99.5%, and we subsequently used the dataset to remove nearly 1,000 compromised fungal ITS sequences from public circulation. The dataset is available at http://unite.ut.ee/repository.php and is subject to web-based third-party curation.

A Comprehensive, Automatically Updated Fungal ITS Sequence Dataset for Reference-Based Chimera Control in Environmental Sequencing Efforts

PubMed Central

Nilsson, R. Henrik; Tedersoo, Leho; Ryberg, Martin; Kristiansson, Erik; Hartmann, Martin; Unterseher, Martin; Porter, Teresita M.; Bengtsson-Palme, Johan; Walker, Donald M.; de Sousa, Filipe; Gamper, Hannes Andres; Larsson, Ellen; Larsson, Karl-Henrik; Kõljalg, Urmas; Edgar, Robert C.; Abarenkov, Kessy

2015-01-01

The nuclear ribosomal internal transcribed spacer (ITS) region is the most commonly chosen genetic marker for the molecular identification of fungi in environmental sequencing and molecular ecology studies. Several analytical issues complicate such efforts, one of which is the formation of chimeric—artificially joined—DNA sequences during PCR amplification or sequence assembly. Several software tools are currently available for chimera detection, but rely to various degrees on the presence of a chimera-free reference dataset for optimal performance. However, no such dataset is available for use with the fungal ITS region. This study introduces a comprehensive, automatically updated reference dataset for fungal ITS sequences based on the UNITE database for the molecular identification of fungi. This dataset supports chimera detection throughout the fungal kingdom and for full-length ITS sequences as well as partial (ITS1 or ITS2 only) datasets. The performance of the dataset on a large set of artificial chimeras was above 99.5%, and we subsequently used the dataset to remove nearly 1,000 compromised fungal ITS sequences from public circulation. The dataset is available at http://unite.ut.ee/repository.php and is subject to web-based third-party curation. PMID:25786896
Sequence and Structure Dependent DNA-DNA Interactions

NASA Astrophysics Data System (ADS)

Kopchick, Benjamin; Qiu, Xiangyun

Molecular forces between dsDNA strands are largely dominated by electrostatics and have been extensively studied. Quantitative knowledge has been accumulated on how DNA-DNA interactions are modulated by varied biological constituents such as ions, cationic ligands, and proteins. Despite its central role in biology, the sequence of DNA has not received substantial attention, and "random" DNA sequences are typically used in biophysical studies. However, ~50% of the human genome is composed of non-random-sequence DNA, particularly repetitive sequences. Furthermore, covalent modifications of DNA such as methylation play key roles in gene function. Such DNAs with specific sequences or modifications often take on structures other than the canonical B-form. Here we present a series of quantitative measurements of DNA-DNA forces with the osmotic stress method on different DNA sequences, from short repeats to the most frequent sequences in the genome, and on modifications such as bromination and methylation. We observe peculiar behaviors that appear to be strongly correlated with the incurred structural changes. We speculate on the causes in terms of differences in the hydration shell and DNA surface structures.
Mutation detection using automated fluorescence-based sequencing.

PubMed

Montgomery, Kate T; Iartchouck, Oleg; Li, Li; Perera, Anoja; Yassin, Yosuf; Tamburino, Alex; Loomis, Stephanie; Kucherlapati, Raju

2008-04-01

The development of high-throughput DNA sequencing techniques has made direct DNA sequencing of PCR-amplified genomic DNA a rapid and economical approach to the identification of polymorphisms that may play a role in disease. Point mutations as well as small insertions or deletions are readily identified by DNA sequencing. The mutations may be heterozygous (occurring in one allele while the other allele retains the normal sequence) or homozygous (occurring in both alleles). Sequencing alone cannot discriminate between true homozygosity and apparent homozygosity due to the loss of one allele due to a large deletion. In this unit, strategies are presented for using PCR amplification and automated fluorescence-based sequencing to identify sequence variation. The size of the project and laboratory preference and experience will dictate how the data is managed and which software tools are used for analysis. A high-throughput protocol is given that has been used to search for mutations in over 200 different genes at the Harvard Medical School - Partners Center for Genetics and Genomics (HPCGG, http://www.hpcgg.org/). Copyright 2008 by John Wiley & Sons, Inc.
Sequence-Mandated, Distinct Assembly of Giant Molecules

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Wei; Lu, Xinlin; Mao, Jialin

Although controlling the primary structure of synthetic polymers is itself a great challenge, the potential of sequence control for tailoring hierarchical structures remains to be exploited, especially in the creation of new and unconventional phases. A series of model amphiphilic chain-like giant molecules was designed and synthesized by interconnecting both hydrophobic and hydrophilic molecular nanoparticles in precisely defined sequence and composition to investigate their sequence-dependent phase structures. Not only did compositional variation change the self-assembled supramolecular phases, but specific sequences also induced unconventional phase formation, including Frank-Kasper phases. The formation mechanism was attributed to the conformational change driven by the collective hydrogen bonding and the sequence-mandated topology of the molecules. Lastly, these results show that sequence control in synthetic polymers can have a dramatic impact on polymer properties and self-assembly.
Sequence-Mandated, Distinct Assembly of Giant Molecules

DOE PAGES

Zhang, Wei; Lu, Xinlin; Mao, Jialin; ...

2017-10-24

Although controlling the primary structure of synthetic polymers is itself a great challenge, the potential of sequence control for tailoring hierarchical structures remains to be exploited, especially in the creation of new and unconventional phases. A series of model amphiphilic chain-like giant molecules was designed and synthesized by interconnecting both hydrophobic and hydrophilic molecular nanoparticles in precisely defined sequence and composition to investigate their sequence-dependent phase structures. Not only did compositional variation change the self-assembled supramolecular phases, but specific sequences also induced unconventional phase formation, including Frank-Kasper phases. The formation mechanism was attributed to the conformational change driven by the collective hydrogen bonding and the sequence-mandated topology of the molecules. Lastly, these results show that sequence control in synthetic polymers can have a dramatic impact on polymer properties and self-assembly.
Sequencing intractable DNA to close microbial genomes.

PubMed

Hurt, Richard A; Brown, Steven D; Podar, Mircea; Palumbo, Anthony V; Elias, Dwayne A

2012-01-01

Advances in high-throughput DNA sequencing technologies have supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult-to-sequence regions in microbial genomes are ruled "intractable", resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC-rich secondary structures, making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.
Single-cell genomic sequencing using Multiple Displacement Amplification.

PubMed

Lasken, Roger S

2007-10-01

Single microbial cells can now be sequenced using DNA amplified by the Multiple Displacement Amplification (MDA) reaction. The few femtograms of DNA in a bacterium are amplified into micrograms of high molecular weight DNA suitable for DNA library construction and Sanger sequencing. The MDA-generated DNA also performs well when used directly as template for pyrosequencing by the 454 Life Sciences method. While MDA from single cells loses some of the genomic sequence, this approach will greatly accelerate the pace of sequencing from uncultured microbes. The genetically linked sequences from single cells are also a powerful tool to be used in guiding genomic assembly of shotgun sequences of multiple organisms from environmental DNA extracts (metagenomic sequences).
Draft Genome Sequence, and a Sequence-Defined Genetic Linkage Map of the Legume Crop Species Lupinus angustifolius L

PubMed Central

Zheng, Zequn; Zhang, Qisen; Zhou, Gaofeng; Sweetingham, Mark W.; Howieson, John G.; Li, Chengdao

2013-01-01

Lupin (Lupinus angustifolius L.) is the most recently domesticated crop in major agricultural cultivation. Its seeds are high in protein and dietary fibre, but low in oil and starch. Medical and dietetic studies have shown that consuming lupin-enriched food has significant health benefits. We report the draft assembly from a whole genome shotgun sequencing dataset for this legume species with 26.9x coverage of the genome, which is predicted to contain 57,807 genes. Analysis of the annotated genes with metabolic pathways provided a partial understanding of some key features of lupin, such as the amino acid profile of storage proteins in seeds. Furthermore, we applied the NGS-based RAD-sequencing technology to obtain 8,244 sequence-defined markers for anchoring the genomic sequences. A total of 4,214 scaffolds from the genome sequence assembly were aligned into the genetic map. The combination of the draft assembly and a sequence-defined genetic map made it possible to locate and study functional genes of agronomic interest. The identification of co-segregating SNP markers, scaffold sequences and gene annotation facilitated the identification of a candidate R gene associated with resistance to the major lupin disease anthracnose. We demonstrated that the combination of medium-depth genome sequencing and a high-density genetic linkage map by application of NGS technology is a cost-effective approach to generating genome sequence data and a large number of molecular markers to study the genomics, genetics and functional genes of lupin, and to apply them to molecular plant breeding. This strategy does not require prior genome knowledge, which potentiates its application to a wide range of non-model species. PMID:23734219
Draft genome sequence, and a sequence-defined genetic linkage map of the legume crop species Lupinus angustifolius L.

PubMed

Yang, Huaan; Tao, Ye; Zheng, Zequn; Zhang, Qisen; Zhou, Gaofeng; Sweetingham, Mark W; Howieson, John G; Li, Chengdao

2013-01-01

Lupin (Lupinus angustifolius L.) is the most recently domesticated crop in major agricultural cultivation. Its seeds are high in protein and dietary fibre, but low in oil and starch. Medical and dietetic studies have shown that consuming lupin-enriched food has significant health benefits. We report the draft assembly from a whole genome shotgun sequencing dataset for this legume species with 26.9x coverage of the genome, which is predicted to contain 57,807 genes. Analysis of the annotated genes with metabolic pathways provided a partial understanding of some key features of lupin, such as the amino acid profile of storage proteins in seeds. Furthermore, we applied the NGS-based RAD-sequencing technology to obtain 8,244 sequence-defined markers for anchoring the genomic sequences. A total of 4,214 scaffolds from the genome sequence assembly were aligned into the genetic map. The combination of the draft assembly and a sequence-defined genetic map made it possible to locate and study functional genes of agronomic interest. The identification of co-segregating SNP markers, scaffold sequences and gene annotation facilitated the identification of a candidate R gene associated with resistance to the major lupin disease anthracnose. We demonstrated that the combination of medium-depth genome sequencing and a high-density genetic linkage map by application of NGS technology is a cost-effective approach to generating genome sequence data and a large number of molecular markers to study the genomics, genetics and functional genes of lupin, and to apply them to molecular plant breeding. This strategy does not require prior genome knowledge, which potentiates its application to a wide range of non-model species.
In silico Analysis of 3′-End-Processing Signals in Aspergillus oryzae Using Expressed Sequence Tags and Genomic Sequencing Data

PubMed Central

Tanaka, Mizuki; Sakai, Yoshifumi; Yamada, Osamu; Shintani, Takahiro; Gomi, Katsuya

2011-01-01

To investigate 3′-end-processing signals in Aspergillus oryzae, we created a nucleotide sequence data set of the 3′-untranslated region (3′ UTR) plus 100 nucleotides (nt) sequence downstream of the poly(A) site using A. oryzae expressed sequence tags and genomic sequencing data. This data set comprised 1065 sequences derived from 1042 unique genes. The average 3′ UTR length in A. oryzae was 241 nt, which is greater than that in yeast but similar to that in plants. The 3′ UTR and 100 nt sequence downstream of the poly(A) site is notably U-rich, while the region located 15–30 nt upstream of the poly(A) site is markedly A-rich. The most frequently found hexanucleotide in this A-rich region is AAUGAA, although this sequence accounts for only 6% of all transcripts. These data suggested that A. oryzae has no highly conserved sequence element equivalent to AAUAAA, a mammalian polyadenylation signal. We identified that putative 3′-end-processing signals in A. oryzae, while less well conserved than those in mammals, comprised four sequence elements: the furthest upstream U-rich element, A-rich sequence, cleavage site, and downstream U-rich element flanking the cleavage site. Although these putative 3′-end-processing signals are similar to those in yeast and plants, some notable differences exist between them. PMID:21586533
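The hexanucleotide tally described in this abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' pipeline: it assumes each input string is a 3′ UTR written in RNA letters that ends exactly at the poly(A) site, it counts each hexamer at most once per transcript (so the tally can be read as "number of transcripts containing it"), and the function name and toy sequences are hypothetical.

```python
from collections import Counter


def hexamer_transcript_counts(utrs, window=(15, 30), k=6):
    """Count, per k-mer, how many UTRs contain it 15-30 nt upstream of the poly(A) site."""
    near, far = window
    counts = Counter()
    for utr in utrs:
        if len(utr) < near + k:
            continue  # too short to hold a full k-mer inside the window
        start = max(len(utr) - far, 0)
        end = len(utr) - near
        region = utr[start:end]
        # A set ensures each k-mer is counted once per transcript.
        counts.update({region[i:i + k] for i in range(len(region) - k + 1)})
    return counts


toy_utrs = [
    "U" * 20 + "AAUGAA" + "C" * 9 + "G" * 15,  # AAUGAA sits inside the window
    "U" * 50,                                  # no A-rich element at all
]
counts = hexamer_transcript_counts(toy_utrs)
print(counts["AAUGAA"] / len(toy_utrs))  # 0.5
```

Dividing a hexamer's count by the number of transcripts gives the kind of frequency quoted in the abstract (AAUGAA in about 6% of transcripts).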
Protein Interaction Profile Sequencing (PIP-seq).

PubMed

Foley, Shawn W; Gregory, Brian D

2016-10-10

Every eukaryotic RNA transcript undergoes extensive post-transcriptional processing from the moment of transcription up through degradation. This regulation is performed by a distinct cohort of RNA-binding proteins which recognize their target transcript by both its primary sequence and secondary structure. Here, we describe protein interaction profile sequencing (PIP-seq), a technique that uses ribonuclease-based footprinting followed by high-throughput sequencing to globally assess both protein-bound RNA sequences and RNA secondary structure. PIP-seq utilizes single- and double-stranded RNA-specific nucleases in the absence of proteins to infer RNA secondary structure. These libraries are also compared to samples that undergo nuclease digestion in the presence of proteins in order to find enriched protein-bound sequences. Combined, these four libraries provide a comprehensive, transcriptome-wide view of RNA secondary structure and RNA protein interaction sites from a single experimental technique. © 2016 by John Wiley & Sons, Inc.
Predicting DNA hybridization kinetics from sequence

NASA Astrophysics Data System (ADS)

Zhang, Jinny X.; Fang, John Z.; Duan, Wei; Wu, Lucia R.; Zhang, Angela W.; Dalchau, Neil; Yordanov, Boyan; Petersen, Rasmus; Phillips, Andrew; Zhang, David Yu

2018-01-01

Hybridization is a key molecular process in biology and biotechnology, but so far there is no predictive model for accurately determining hybridization rate constants based on sequence information. Here, we report a weighted neighbour voting (WNV) prediction algorithm, in which the hybridization rate constant of an unknown sequence is predicted based on its similarity to reactions with known rate constants. To construct this algorithm we first performed 210 fluorescence kinetics experiments to observe the hybridization kinetics of 100 different DNA target and probe pairs (36 nt sub-sequences of the CYCS and VEGF genes) at temperatures ranging from 28 to 55 °C. Automated feature selection and weighting optimization resulted in a final six-feature WNV model, which can predict hybridization rate constants of new sequences to within a factor of 3 with ∼91% accuracy, based on leave-one-out cross-validation. Accurate prediction of hybridization kinetics allows the design of efficient probe sequences for genomics research.
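The general shape of a weighted neighbour voting predictor can be sketched as below. This is only a generic illustration of the idea named in the abstract: the exponential distance-to-vote mapping, the `scale` parameter, the feature values and the function name are all assumptions made here, and the published six-feature model may weight neighbours quite differently.

```python
import math


def wnv_predict(query, known, feature_weights, scale=1.0):
    """Predict log10(rate constant) for query as a similarity-weighted average.

    known is a list of (feature_vector, log10_rate_constant) pairs for
    reactions whose kinetics were measured; closer neighbours in the
    weighted feature space cast larger votes.
    """
    weighted_sum = 0.0
    total_vote = 0.0
    for features, log_k in known:
        distance = math.sqrt(
            sum(w * (q - f) ** 2 for w, q, f in zip(feature_weights, query, features))
        )
        vote = math.exp(-distance / scale)  # assumed vote function
        weighted_sum += vote * log_k
        total_vote += vote
    return weighted_sum / total_vote


# Two hypothetical measured reactions described by a single feature.
known = [((0.0,), 6.0), ((1.0,), 7.0)]
print(wnv_predict((0.5,), known, feature_weights=(1.0,)))  # equidistant neighbours -> 6.5
```

Leave-one-out cross-validation, as mentioned in the abstract, would call such a predictor once per measured reaction with that reaction removed from `known`.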
Ion Torrent Semiconductor Sequencing Allows Rapid, Low Cost Sequencing of the Human Exome (7th Annual SFAF Meeting, 2012)

ScienceCinema

Jenkins, David

2018-01-10

David Jenkins on "Ion Torrent semiconductor sequencing allows rapid, low-cost sequencing of the human exome" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.
Ion Torrent Semiconductor Sequencing Allows Rapid, Low Cost Sequencing of the Human Exome (7th Annual SFAF Meeting, 2012)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jenkins, David

David Jenkins on "Ion Torrent semiconductor sequencing allows rapid, low-cost sequencing of the human exome" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.
Unlocking Short Read Sequencing for Metagenomics

DOE PAGES

Rodrigue, Sébastien; Materna, Arne C.; Timberlake, Sonia C.; ...

2010-07-28

We describe an experimental and computational pipeline yielding millions of reads that can exceed 200 bp with quality scores approaching that of traditional Sanger sequencing. The method combines an automatable gel-less library construction step with paired-end sequencing on a short-read instrument. With appropriately sized library inserts, mate-pair sequences can overlap, and we describe the SHERA software package that joins them to form a longer composite read.
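A minimal sketch of how two overlapping paired-end reads can be joined into one composite read. This is not the SHERA algorithm itself, whose scoring is not described in the abstract; the function names, the minimum overlap, and the mismatch tolerance are illustrative assumptions. The mate is reverse-complemented onto the strand of the first read, and the longest acceptable suffix-prefix overlap is used to merge the two.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def merge_pair(read1, read2, min_overlap=10, max_mismatch_frac=0.1):
    """Join a read and its mate into one composite read, or return None.

    read2 is assumed to come from the opposite strand, as in paired-end data.
    """
    mate = reverse_complement(read2)  # put the mate on the same strand as read1
    # Try the longest overlap first: suffix of read1 against prefix of mate.
    for olap in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        tail, head = read1[-olap:], mate[:olap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * olap:
            return read1 + mate[olap:]
    return None  # no acceptable overlap; the pair cannot be joined
```

A production tool would typically also use base quality scores to resolve mismatches inside the overlap; this sketch simply keeps the bases of the first read.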
Composition for nucleic acid sequencing

DOEpatents

Korlach, Jonas [Ithaca, NY]; Webb, Watt W [Ithaca, NY]; Levene, Michael [Ithaca, NY]; Turner, Stephen [Ithaca, NY]; Craighead, Harold G [Ithaca, NY]; Foquet, Mathieu [Ithaca, NY]

2008-08-26

The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.
A disruptive sequencer meets disruptive publishing.

PubMed

Loman, Nick; Goodwin, Sarah; Jansen, Hans; Loose, Matt

2015-01-01

Nanopore sequencing was recently made available to users in the form of the Oxford Nanopore MinION. Released to users through an early access programme, the MinION is made unique by its tiny form factor and ability to generate very long sequences from single DNA molecules. The platform is undergoing rapid evolution with three distinct nanopore types and five updates to library preparation chemistry in the last 18 months. To keep pace with the rapid evolution of this sequencing platform, and to provide a space where new analysis methods can be openly discussed, we present a new F1000Research channel devoted to updates to and analysis of nanopore sequence data.
The limits of protein sequence comparison?

PubMed Central

Pearson, William R; Sierk, Michael L

2010-01-01

Modern sequence alignment algorithms are used routinely to identify homologous proteins, proteins that share a common ancestor. Homologous proteins always share similar structures and often have similar functions. Over the past 20 years, sequence comparison has become both more sensitive, largely because of profile-based methods, and more reliable, because of more accurate statistical estimates. As sequence and structure databases become larger, and comparison methods become more powerful, reliable statistical estimates will become even more important for distinguishing similarities that are due to homology from those that are due to analogy (convergence). The newest sequence alignment methods are more sensitive than older methods, but more accurate statistical estimates are needed for their full power to be realized. PMID:15919194
Preliminary Classification of Novel Hemorrhagic Fever-Causing Viruses Using Sequence-Based PAirwise Sequence Comparison (PASC) Analysis.

PubMed

Bào, Yīmíng; Kuhn, Jens H

2018-01-01

During the last decade, genome sequence-based classification of viruses has become increasingly prominent. Viruses can even be classified based on coding-complete genome sequence data alone. Nevertheless, classification remains arduous as experts are required to establish phylogenetic trees to depict the evolutionary relationships of such sequences for preliminary taxonomic placement. Pairwise sequence comparison (PASC) of genomes is one of several novel methods for establishing relationships among viruses. This method, provided by the US National Center for Biotechnology Information as an open-access tool, circumvents phylogenetics, and yet PASC results are often in agreement with those of phylogenetic analyses. Computationally inexpensive, PASC can be easily performed by non-taxonomists. Here we describe how to use the PASC tool for the preliminary classification of novel viral hemorrhagic fever-causing viruses.
BAC-pool 454-sequencing: A rapid and efficient approach to sequence complex tetraploid cotton genomes

USDA-ARS's Scientific Manuscript database

New and emerging next generation sequencing technologies have been promising in reducing sequencing costs, but not significantly for complex polyploid plant genomes such as cotton. Large and highly repetitive genome of G. hirsutum (~2.5GB) is less amenable and cost-intensive with traditional BAC-by...

Animal selection for whole genome sequencing by quantifying the unique contribution of homozygous haplotypes sequenced

USDA-ARS's Scientific Manuscript database

Major whole genome sequencing projects promise to identify rare and causal variants within livestock species; however, the efficient selection of animals for sequencing remains a major problem within these surveys. The goal of this project was to develop a library of high accuracy genetic variants f...
Update on Rover Sequencing and Visualization Program

NASA Technical Reports Server (NTRS)

Cooper, Brian; Hartman, Frank; Maxwell, Scott; Yen, Jeng; Wright, John; Balacuit, Carlos

2005-01-01

The Rover Sequencing and Visualization Program (RSVP) has been updated. RSVP was reported in Rover Sequencing and Visualization Program (NPO-30845), NASA Tech Briefs, Vol. 29, No. 4 (April 2005), page 38. To recapitulate: The Rover Sequencing and Visualization Program (RSVP) is the software tool to be used in the Mars Exploration Rover (MER) mission for planning rover operations and generating command sequences for accomplishing those operations. RSVP combines three-dimensional (3D) visualization for immersive exploration of the operations area, stereoscopic image display for high-resolution examination of the downlinked imagery, and a sophisticated command-sequence editing tool for analysis and completion of the sequences. RSVP is linked with actual flight code modules for operations rehearsal to provide feedback on the expected behavior of the rover prior to committing to a particular sequence. Playback tools allow for review of both rehearsed rover behavior and downlinked results of actual rover operations. These can be displayed simultaneously for comparison of rehearsed and actual activities for verification. The primary inputs to RSVP are downlink data products from the Operations Storage Server (OSS) and activity plans generated by the science team. The activity plans are high-level goals for the next day's activities. The downlink data products include imagery, terrain models, and telemetered engineering data on rover activities and state. The Rover Sequence Editor (RoSE) component of RSVP performs activity expansion to command sequences, command creation and editing with setting of command parameters, and viewing and management of rover resources. The HyperDrive component of RSVP performs 2D and 3D visualization of the rover's environment, graphical and animated review of rover predicted and telemetered state, and creation and editing of command sequences related to mobility and Instrument Deployment Device (robotic arm) operations. Additionally, RoSE and
Processing Translational Motion Sequences.

DTIC Science & Technology

1982-10-01

the initial ROADSIGN image using a (del)**2g mask with a width of 5 pixels. The distinctiveness values were computed using features which were 5x5 pixel...the initial step size of the local search quite large. 4. EXPERIMENTS The following experiments were performed using the roadsign and industrial...the initial image of the sequence. The third experiment involves processing the roadsign image sequence using the features extracted at the positions
Efficient generation of complete sequences of MDR-encoding plasmids by rapid assembly of MinION barcoding sequencing data.

PubMed

Li, Ruichao; Xie, Miaomiao; Dong, Ning; Lin, Dachuan; Yang, Xuemei; Wong, Marcus Ho Yin; Chan, Edward Wai-Chi; Chen, Sheng

2018-03-01

Multidrug resistance (MDR)-encoding plasmids are considered major molecular vehicles responsible for transmission of antibiotic resistance genes among bacteria of the same or different species. Delineating the complete sequences of such plasmids could provide valuable insight into the evolution and transmission mechanisms underlying bacterial antibiotic resistance development. However, due to the presence of multiple repeats of mobile elements, complete sequencing of MDR plasmids remains technically complicated, expensive, and time-consuming. Here, we demonstrate a rapid and efficient approach to obtaining multiple MDR plasmid sequences through the use of the MinION nanopore sequencing platform, which is incorporated in a portable device. By assembling the long sequencing reads generated by a single MinION run according to a rapid barcoding sequencing protocol, we obtained the complete sequences of 20 plasmids harbored by multiple bacterial strains. Importantly, single long reads covering a plasmid end-to-end were recorded, indicating that de novo assembly may be unnecessary if the single reads exhibit high accuracy. This workflow represents a convenient and cost-effective approach for systematic assessment of MDR plasmids responsible for treatment failure of bacterial infections, offering the opportunity to perform detailed molecular epidemiological studies to probe the evolutionary and transmission mechanisms of MDR-encoding elements.
Social and behavioral research in genomic sequencing: approaches from the Clinical Sequencing Exploratory Research Consortium Outcomes and Measures Working Group.

PubMed

Gray, Stacy W; Martins, Yolanda; Feuerman, Lindsay Z; Bernhardt, Barbara A; Biesecker, Barbara B; Christensen, Kurt D; Joffe, Steven; Rini, Christine; Veenstra, David; McGuire, Amy L

2014-10-01

The routine use of genomic sequencing in clinical medicine has the potential to dramatically alter patient care and medical outcomes. To fully understand the psychosocial and behavioral impact of sequencing integration into clinical practice, it is imperative that we identify the factors that influence sequencing-related decision making and patient outcomes. In an effort to develop a collaborative and conceptually grounded approach to studying sequencing adoption, members of the National Human Genome Research Institute's Clinical Sequencing Exploratory Research Consortium formed the Outcomes and Measures Working Group. Here we highlight the priority areas of investigation and psychosocial and behavioral outcomes identified by the Working Group. We also review some of the anticipated challenges to measurement in social and behavioral research related to genomic sequencing; opportunities for instrument development; and the importance of qualitative, quantitative, and mixed-method approaches. This work represents the early, shared efforts of multiple research teams as we strive to understand individuals' experiences with genomic sequencing. The resulting body of knowledge will guide recommendations for the optimal use of sequencing in clinical practice.
High-throughput sequence alignment using Graphics Processing Units

PubMed Central

Schatz, Michael C; Trapnell, Cole; Delcher, Arthur L; Varshney, Amitabh

2007-01-01

Background The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU. PMID:18070356
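The core operation described above, exact matching of many independent queries against one indexed reference, can be sketched on the CPU. Note the substitution: MUMmerGPU stores the reference as a suffix tree and runs queries in parallel with CUDA, whereas this sketch uses a plain suffix array with binary search, chosen only because it is short. The function names are hypothetical and none of this is MUMmerGPU code.

```python
def build_suffix_array(ref):
    """Return the start positions of all suffixes of ref in sorted order."""
    # Simple O(n^2 log n) construction; adequate for a small illustration.
    return sorted(range(len(ref)), key=lambda i: ref[i:])

def find_exact(ref, sa, query):
    """Return sorted start positions of every exact occurrence of query."""
    m = len(query)
    lo, hi = 0, len(sa)
    while lo < hi:  # lower bound: first suffix whose m-prefix is >= query
        mid = (lo + hi) // 2
        if ref[sa[mid]:sa[mid] + m] < query:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:  # upper bound: first suffix whose m-prefix is > query
        mid = (lo + hi) // 2
        if ref[sa[mid]:sa[mid] + m] <= query:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])

ref = "GATTACAGATTACA"
sa = build_suffix_array(ref)
# Each query is looked up independently, which is what makes the
# workload easy to spread across many parallel threads.
hits = {q: find_exact(ref, sa, q) for q in ["TTAC", "ACA", "GGG"]}
```

Because no query depends on the result of another, the dictionary comprehension at the end could be replaced by any parallel map without changing the result.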
Auto Mechanics: Scope and Sequence.

ERIC Educational Resources Information Center

Nashville - Davidson County Metropolitan Public Schools, TN.

This scope and sequence guide, developed for an auto mechanics vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and the…
Diesel Mechanics: Scope and Sequence.

ERIC Educational Resources Information Center

Nashville - Davidson County Metropolitan Public Schools, TN.

This scope and sequence guide, developed for a diesel mechanics vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and the…
Venturia carpophila draft genome sequence

USDA-ARS's Scientific Manuscript database

Venturia carpophila causes peach scab, a disease that renders peach fruit unmarketable. We report a high-quality draft genome sequence (36.9 Mb) of V. carpophila from an isolate collected from a peach tree in central Georgia in the United States. The genome sequence described will be a useful resour...
Combining and Sequencing Games Skills

ERIC Educational Resources Information Center

Belka, David E.

2004-01-01

This article discusses the combination of skills into sequences. Combining skills into usable, challenging, and meaningful sequences is often neglected or under-used in many school and community game programs. Reasons for this under-use are discussed. Combinations of skills build on proficiency in performing separate skills and serve as…
Commercial Art: Scope and Sequence.

ERIC Educational Resources Information Center

Nashville - Davidson County Metropolitan Public Schools, TN.

This scope and sequence guide, developed for a commercial art vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and the…
Aircraft Mechanics: Scope and Sequence.

ERIC Educational Resources Information Center

Nashville - Davidson County Metropolitan Public Schools, TN.

This scope and sequence guide, developed for an aircraft mechanics vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and…
Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications

PubMed Central

Yilmaz, Pelin; Kottmann, Renzo; Field, Dawn; Knight, Rob; Cole, James R; Amaral-Zettler, Linda; Gilbert, Jack A; Karsch-Mizrachi, Ilene; Johnston, Anjanette; Cochrane, Guy; Vaughan, Robert; Hunter, Christopher; Park, Joonhong; Morrison, Norman; Rocca-Serra, Philippe; Sterk, Peter; Arumugam, Manimozhiyan; Bailey, Mark; Baumgartner, Laura; Birren, Bruce W; Blaser, Martin J; Bonazzi, Vivien; Booth, Tim; Bork, Peer; Bushman, Frederic D; Buttigieg, Pier Luigi; Chain, Patrick S G; Charlson, Emily; Costello, Elizabeth K; Huot-Creasy, Heather; Dawyndt, Peter; DeSantis, Todd; Fierer, Noah; Fuhrman, Jed A; Gallery, Rachel E; Gevers, Dirk; Gibbs, Richard A; Gil, Inigo San; Gonzalez, Antonio; Gordon, Jeffrey I; Guralnick, Robert; Hankeln, Wolfgang; Highlander, Sarah; Hugenholtz, Philip; Jansson, Janet; Kau, Andrew L; Kelley, Scott T; Kennedy, Jerry; Knights, Dan; Koren, Omry; Kuczynski, Justin; Kyrpides, Nikos; Larsen, Robert; Lauber, Christian L; Legg, Teresa; Ley, Ruth E; Lozupone, Catherine A; Ludwig, Wolfgang; Lyons, Donna; Maguire, Eamonn; Methé, Barbara A; Meyer, Folker; Muegge, Brian; Nakielny, Sara; Nelson, Karen E; Nemergut, Diana; Neufeld, Josh D; Newbold, Lindsay K; Oliver, Anna E; Pace, Norman R; Palanisamy, Giriprakash; Peplies, Jörg; Petrosino, Joseph; Proctor, Lita; Pruesse, Elmar; Quast, Christian; Raes, Jeroen; Ratnasingham, Sujeevan; Ravel, Jacques; Relman, David A; Assunta-Sansone, Susanna; Schloss, Patrick D; Schriml, Lynn; Sinha, Rohini; Smith, Michelle I; Sodergren, Erica; Spor, Aymé; Stombaugh, Jesse; Tiedje, James M; Ward, Doyle V; Weinstock, George M; Wendel, Doug; White, Owen; Whiteley, Andrew; Wilke, Andreas; Wortman, Jennifer R; Yatsunenko, Tanya; Glöckner, Frank Oliver

2012-01-01

Here we present a standard developed by the Genomic Standards Consortium (GSC) for reporting marker gene sequences—the minimum information about a marker gene sequence (MIMARKS). We also introduce a system for describing the environment from which a biological sample originates. The ‘environmental packages’ apply to any genome sequence of known origin and can be used in combination with MIMARKS and other GSC checklists. Finally, to establish a unified standard for describing sequence data and to provide a single point of entry for the scientific community to access and learn about GSC checklists, we present the minimum information about any (x) sequence (MIxS). Adoption of MIxS will enhance our ability to analyze natural genetic diversity documented by massive DNA sequencing efforts from myriad ecosystems in our ever-changing biosphere. PMID:21552244
Comparison of next generation sequencing technologies for transcriptome characterization

PubMed Central

2009-01-01

Background We have developed a simulation approach to help determine the optimal mixture of sequencing methods for most complete and cost effective transcriptome sequencing. We compared simulation results for traditional capillary sequencing with "Next Generation" (NG) ultra high-throughput technologies. The simulation model was parameterized using mappings of 130,000 cDNA sequence reads to the Arabidopsis genome (NCBI Accession SRA008180.19). We also generated 454-GS20 sequences and de novo assemblies for the basal eudicot California poppy (Eschscholzia californica) and the magnoliid avocado (Persea americana) using a variety of methods for cDNA synthesis. Results The Arabidopsis reads tagged more than 15,000 genes, including new splice variants and extended UTR regions. Of the total 134,791 reads (13.8 MB), 119,518 (88.7%) mapped exactly to known exons, while 1,117 (0.8%) mapped to introns, 11,524 (8.6%) spanned annotated intron/exon boundaries, and 3,066 (2.3%) extended beyond the end of annotated UTRs. Sequence-based inference of relative gene expression levels correlated significantly with microarray data. As expected, NG sequencing of normalized libraries tagged more genes than non-normalized libraries, although non-normalized libraries yielded more full-length cDNA sequences. The Arabidopsis data were used to simulate additional rounds of NG and traditional EST sequencing, and various combinations of each. Our simulations suggest a combination of FLX and Solexa sequencing for optimal transcriptome coverage at modest cost. We have also developed ESTcalc http://fgp.huck.psu.edu/NG_Sims/ngsim.pl, an online webtool, which allows users to explore the results of this study by specifying individualized costs and sequencing characteristics. Conclusion NG sequencing technologies are a highly flexible set of platforms that can be scaled to suit different project goals. In terms of sequence coverage alone, the NG sequencing is a dramatic advance over capillary
Ancient DNA sequence revealed by error-correcting codes.

PubMed

Brandão, Marcelo M; Spoladore, Larissa; Faria, Luzinete C B; Rocha, Andréa S L; Silva-Filho, Marcio C; Palazzo, Reginaldo

2015-07-10

A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code.
Ancient DNA sequence revealed by error-correcting codes

PubMed Central

Brandão, Marcelo M.; Spoladore, Larissa; Faria, Luzinete C. B.; Rocha, Andréa S. L.; Silva-Filho, Marcio C.; Palazzo, Reginaldo

2015-01-01

A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code. PMID:26159228
Vander Lugt correlation of DNA sequence data

NASA Astrophysics Data System (ADS)

Christens-Barry, William A.; Hawk, James F.; Martin, James C.

1990-12-01

DNA, the molecule containing the genetic code of an organism, is a linear chain of subunits. It is the sequence of subunits, of which there are four kinds, that constitutes the unique blueprint of an individual. This sequence is the focus of a large number of analyses performed by an army of geneticists, biologists, and computer scientists. Most of these analyses entail searches for specific subsequences within the larger set of sequence data. Thus, most analyses are essentially pattern recognition or correlation tasks. Yet, there are special features to such analysis that influence the strategy and methods of an optical pattern recognition approach. While the serial processing employed in digital electronic computers remains the main engine of sequence analyses, there is no fundamental reason that more efficient parallel methods cannot be used. We describe an approach using optical pattern recognition (OPR) techniques based on matched spatial filtering. This allows parallel comparison of large blocks of sequence data. In this study we have simulated a Vander Lugt architecture implementing our approach. Searches for specific target sequence strings within a block of DNA sequence from the ColE1 plasmid are performed.
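A digital analogue of the matched-filter search described above can be sketched as a sliding correlation. This is not the authors' optical simulation; the function names and the example sequence are assumptions. Counting matching positions at each offset is equivalent to summing the cross-correlations of four per-base indicator (one-hot) signals, and a score equal to the target length marks an exact occurrence.

```python
def correlate(block, target):
    """For each offset, count positions where target agrees with block.

    This equals the sum over A, C, G, T of the cross-correlation of the
    corresponding 0/1 indicator sequences of block and target.
    """
    n, m = len(block), len(target)
    return [sum(block[i + j] == target[j] for j in range(m))
            for i in range(n - m + 1)]

def find_matches(block, target):
    """Return the offsets at which the correlation reaches its maximum value."""
    scores = correlate(block, target)
    return [i for i, score in enumerate(scores) if score == len(target)]

block = "GATTACAGATTACA"
print(find_matches(block, "TTAC"))  # prints [2, 9]
```

In an optical correlator all offsets are evaluated at once; the list comprehension here evaluates them one after another but produces the same correlation profile.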
Rényi continuous entropy of DNA sequences.

PubMed

Vinga, Susana; Almeida, Jonas S

2004-12-07

Entropy measures of DNA sequences estimate their randomness or, inversely, their repeatability. L-block Shannon discrete entropy accounts for the empirical distribution of all length-L words and has convergence problems for finite sequences. A new entropy measure that extends Shannon's formalism is proposed. Rényi's quadratic entropy, calculated with the Parzen window density estimation method applied to CGR/USM continuous maps of DNA sequences, constitutes a novel technique to evaluate sequence global randomness without some of the former method's drawbacks. The asymptotic behaviour of this new measure was analytically deduced and the calculation of entropies for several synthetic and experimental biological sequences was performed. The results obtained were compared with the distributions of the null model of randomness obtained by simulation. The biological sequences have shown a different p-value according to the kernel resolution of Parzen's method, which might indicate an unknown level of organization of their patterns. This new technique can be very useful in the study of DNA sequence complexity and provide additional tools for DNA entropy estimation. The main MATLAB applications developed and additional material are available at the webpage . Specialized functions can be obtained from the authors.
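The calculation described above can be sketched in a few lines, under stated assumptions. The corner assignment of the chaos game representation (CGR), the isotropic Gaussian kernel, and integration over the whole plane rather than the unit square are illustrative choices, not details taken from the paper, and the authors' MATLAB code is not reproduced. For Gaussian Parzen windows of width sigma, the integral of the squared density estimate has a standard closed form: the average over all point pairs of a Gaussian with variance 2*sigma^2 evaluated at the pair's separation. Rényi's quadratic entropy is minus the logarithm of that average.

```python
import math

# Assumed corner assignment for the CGR unit square (conventions vary).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr_points(seq):
    """Map a DNA string to CGR coordinates, one point per base."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in seq:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0  # move halfway to the corner
        points.append((x, y))
    return points

def renyi_quadratic_entropy(points, sigma):
    """Return -ln of the integral of the squared Parzen density estimate."""
    n = len(points)
    var = 2.0 * sigma ** 2             # two Gaussian kernels convolved
    norm = 1.0 / (2.0 * math.pi * var)  # 2-D Gaussian normalization
    total = 0.0
    for x1, y1 in points:
        for x2, y2 in points:
            d2 = (x1 - x2) ** 2 + (y1 - y2) ** 2
            total += norm * math.exp(-d2 / (2.0 * var))
    return -math.log(total / n ** 2)
```

Because this is a differential entropy, negative values are possible. What matters is the comparison: a homopolymer collapses onto one corner and scores lower than a sequence whose points spread over the square, and the result depends on the kernel width sigma, in line with the resolution dependence noted in the abstract.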
Sequence Polishing Library (SPL) v10.0

DOE Office of Scientific and Technical Information (OSTI.GOV)

Oberortner, Ernst

The Sequence Polishing Library (SPL) is a suite of software tools for automating "Design for Synthesis and Assembly" workflows. Specifically: The SPL "Converter" tool converts files among the following sequence data exchange formats: CSV, FASTA, GenBank, and Synthetic Biology Open Language (SBOL); The SPL "Juggler" tool optimizes the codon usage of DNA coding sequences according to an optimization strategy, a user-specific codon usage table, and a genetic code. In addition, the SPL "Juggler" can translate amino acid sequences into DNA sequences; The SPL "Polisher" verifies DNA sequences against DNA synthesis constraints, such as GC content, repeating k-mers, and restriction sites. In case of violations, the "Polisher" reports the violations in a comprehensive manner. The "Polisher" tool can also modify the violating regions according to an optimization strategy, a user-specific codon usage table, and a genetic code; The SPL "Partitioner" decomposes large DNA sequences into smaller building blocks with partial overlaps that enable an efficient assembly. The "Partitioner" enables the user to configure the characteristics of the overlaps, such as length, GC content, or melting temperature, which are mostly determined by the utilized assembly protocol.
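The kinds of synthesis-constraint checks attributed to the "Polisher" (GC content, repeating k-mers, restriction sites) can be illustrated with a small sketch. This is not the SPL API: every function name, threshold, and default below is hypothetical. The recognition sequences used in the usage example, GAATTC for EcoRI and GGATCC for BamHI, are the standard ones.

```python
from collections import Counter

def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def repeated_kmers(seq, k):
    """Return the k-mers that occur more than once, in sorted order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return sorted(kmer for kmer, c in counts.items() if c > 1)

def find_sites(seq, sites):
    """Return (name, position) for every occurrence of each motif."""
    hits = []
    for name, motif in sites.items():
        start = seq.find(motif)
        while start != -1:
            hits.append((name, start))
            start = seq.find(motif, start + 1)
    return hits

def check_constraints(seq, gc_range=(0.3, 0.7), k=8, sites=None):
    """Collect human-readable violations; thresholds are assumed defaults."""
    violations = []
    gc = gc_content(seq)
    if not gc_range[0] <= gc <= gc_range[1]:
        violations.append(f"GC content {gc:.2f} outside {gc_range}")
    for kmer in repeated_kmers(seq, k):
        violations.append(f"repeated {k}-mer {kmer}")
    for name, pos in find_sites(seq, sites or {}):
        violations.append(f"{name} site at position {pos}")
    return violations
```

For example, `check_constraints("ATGGAATTCGC", sites={"EcoRI": "GAATTC", "BamHI": "GGATCC"})` reports a single EcoRI site at position 3. Only the given strand is searched, which is sufficient for palindromic motifs such as these two; non-palindromic sites would also need a reverse-complement search.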
Health Occupations: Scope and Sequence.

ERIC Educational Resources Information Center

Nashville - Davidson County Metropolitan Public Schools, TN.

This guide, which was written as an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System, outlines the suggested scope and sequence of a 3-year program in health occupations. The guide consists of a course description; general course…

Sequence Learning and Selection Difficulty

ERIC Educational Resources Information Center

Rowland, Lee A.; Shanks, David R.

2006-01-01

The authors studied the role of attention as a selection mechanism in implicit learning by examining the effect on primary sequence learning of performing a demanding target-selection task. Participants were trained on probabilistic sequences in a novel version of the serial reaction time (SRT) task, with dual- and triple-stimulus participants…
Urban Horticulture: Scope and Sequence.

ERIC Educational Resources Information Center

Nashville - Davidson County Metropolitan Public Schools, TN.

This guide, which was written as an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System, outlines the suggested scope and sequence of a 4-year program in urban horticulture. The guide consists of a course description; general course…
VOE Accounting: Scope and Sequence.

ERIC Educational Resources Information Center

Nashville - Davidson County Metropolitan Public Schools, TN.

This guide, which was written as an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System, outlines the suggested scope and sequence of a 2-year program in accounting. The guide consists of a course description; general course objectives;…
Simple sequence repeat marker development from bacterial artificial chromosome end sequences and expressed sequence tags of flax (Linum usitatissimum L.).

PubMed

Cloutier, Sylvie; Miranda, Evelyn; Ward, Kerry; Radovanovic, Natasa; Reimer, Elsa; Walichnowski, Andrzej; Datla, Raju; Rowland, Gordon; Duguid, Scott; Ragupathy, Raja

2012-08-01

Flax is an important oilseed crop in North America and is mostly grown as a fibre crop in Europe. As a self-pollinated diploid with a small estimated genome size of ~370 Mb, flax is well suited for fast progress in genomics. In the last few years, important genetic resources have been developed for this crop. Here, we describe the assessment and comparative analyses of 1,506 putative simple sequence repeats (SSRs) of which, 1,164 were derived from BAC-end sequences (BESs) and 342 from expressed sequence tags (ESTs). The SSRs were assessed on a panel of 16 flax accessions with 673 (58 %) and 145 (42 %) primer pairs being polymorphic in the BESs and ESTs, respectively. With 818 novel polymorphic SSR primer pairs reported in this study, the repertoire of available SSRs in flax has more than doubled from the combined total of 508 of all previous reports. Among nucleotide motifs, trinucleotides were the most abundant irrespective of the class, but dinucleotides were the most polymorphic. SSR length was also positively correlated with polymorphism. Two dinucleotide (AT/TA and AG/GA) and two trinucleotide (AAT/ATA/TAA and GAA/AGA/AAG) motifs and their iterations, different from those reported in many other crops, accounted for more than half of all the SSRs and were also more polymorphic (63.4 %) than the rest of the markers (42.7 %). This improved resource promises to be useful in genetic, quantitative trait loci (QTL) and association mapping as well as for anchoring the physical/genetic map with the whole genome shotgun reference sequence of flax.
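A minimal sketch of how di- and trinucleotide SSRs of the kind counted above can be located in a sequence. The abstract does not name the SSR-detection software or its thresholds, so the regular-expression approach, the minimum of five repeats, and the function name are assumptions for illustration.

```python
import re

def find_ssrs(seq, motif_lengths=(2, 3), min_repeats=5):
    """Return sorted (start, motif, repeat_count) for tandem repeats."""
    hits = []
    for k in motif_lengths:
        # A k-base motif followed by at least (min_repeats - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAAAAAA
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)

seq = "GGC" + "AT" * 6 + "CCG" + "GAA" * 5 + "TTC"
print(find_ssrs(seq))  # prints [(3, 'AT', 6), (18, 'GAA', 5)]
```

The percentages quoted in the abstract follow directly from its counts: 673 of 1,164 BES-derived primer pairs is about 58 %, 145 of 342 EST-derived pairs is about 42 %, and 673 + 145 gives the 818 polymorphic SSRs reported.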
A 28,000 Years Old Cro-Magnon mtDNA Sequence Differs from All Potentially Contaminating Modern Sequences

PubMed Central

Caramelli, David; Milani, Lucio; Vai, Stefania; Modi, Alessandra; Pecchioli, Elena; Girardi, Matteo; Pilli, Elena; Lari, Martina; Lippi, Barbara; Ronchitelli, Annamaria; Mallegni, Francesco; Casoli, Antonella; Bertorelle, Giorgio; Barbujani, Guido

2008-01-01

Background: DNA sequences from ancient specimens may in fact result from undetected contamination of the ancient specimens by modern DNA, and the problem is particularly challenging in studies of human fossils. Doubts on the authenticity of the available sequences have so far hampered genetic comparisons between anatomically archaic (Neandertal) and early modern (Cro-Magnoid) Europeans. Methodology/Principal Findings: We typed the mitochondrial DNA (mtDNA) hypervariable region I in a 28,000 years old Cro-Magnoid individual from the Paglicci cave, in Italy (Paglicci 23) and in all the people who had contact with the sample since its discovery in 2003. The Paglicci 23 sequence, determined through the analysis of 152 clones, is the Cambridge reference sequence, and cannot possibly reflect contamination because it differs from all potentially contaminating modern sequences. Conclusions/Significance: The Paglicci 23 individual carried a mtDNA sequence that is still common in Europe, and which radically differs from those of the almost contemporary Neandertals, demonstrating a genealogical continuity across 28,000 years, from Cro-Magnoid to modern Europeans. Because all potential sources of modern DNA contamination are known, the Paglicci 23 sample will offer a unique opportunity to get insight for the first time into the nuclear genes of early modern Europeans. PMID:18628960
Identification of optimum sequencing depth especially for de novo genome assembly of small genomes using next generation sequencing data.

PubMed

Desai, Aarti; Marwah, Veer Singh; Yadav, Akshay; Jha, Vineet; Dhaygude, Kishor; Bangar, Ujwala; Kulkarni, Vivek; Jere, Abhay

2013-01-01

Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing have encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads, and de novo genome assembly using these short reads is computationally very intensive. Due to lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, currently no report is available highlighting the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms; however, these evaluations were performed using simulated datasets. One limitation of using simulated datasets is that variables such as error rates, read length and coverage, which are known to impact genome assembly, are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly for different sized genomes using graph based assembly algorithms and real datasets. Illumina reads for E. coli (4.6 Mb), S. kudriavzevii (11.18 Mb) and C. elegans (100 Mb) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous, which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6-40 GB RAM depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing, which will enable optimum utilization of sequencing as well as analysis resources.
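To make the depth figures concrete, the sketch below uses the standard relation depth = (number of reads × read length) / genome size to estimate how many reads a 50X run would need for the three genome sizes quoted above (4.6, 11.18 and 100 megabases). The 100 bp read length and the function name are assumptions for illustration only; the abstract does not state the read length used.

```python
import math

def reads_for_depth(genome_size_bp: int, depth: int, read_length_bp: int) -> int:
    """Reads needed so that reads * read_length / genome_size >= depth."""
    return math.ceil(depth * genome_size_bp / read_length_bp)

# Genome sizes from the abstract, in base pairs.
genomes = {
    "E. coli": 4_600_000,
    "S. kudriavzevii": 11_180_000,
    "C. elegans": 100_000_000,
}

READ_LENGTH_BP = 100  # assumed read length, not given in the abstract

for name, size in genomes.items():
    print(f"{name}: {reads_for_depth(size, 50, READ_LENGTH_BP):,} reads for 50X")
```

Doubling the target depth (for example, the 100X reported for Meraculous) doubles the read count under the same assumptions.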
Chromosomal localization and sequence analysis of a human episomal sequence with in vitro differentiating activity

DOE Office of Scientific and Technical Information (OSTI.GOV)

Boccaccio, C.; Deshatrette, J.; Meunier-Rotival, M.

1994-05-01

The genomic fragment carrying the human activator of liver function, previously described as an episome capable of inducing differentiation upon transfection into a dedifferentiated rat hepatoma cell line, was mapped on human chromosome 12q24.2-12q24.3. This chromosomal location was indistinguishable by in situ hybridization from that of the gene coding for the hepatic transcription factor HNF1. The sequence of the integrated form of the episome as well as its flanking sequences show that it is rich in retroposons. It contains a human ribosomal protein L21 processed pseudogene, one truncated L1Hs sequence, and 10 Alu repeats, which belong to different subfamilies.
Identification of Genomic Insertion and Flanking Sequence of G2-EPSPS and GAT Transgenes in Soybean Using Whole Genome Sequencing Method.

PubMed

Guo, Bingfu; Guo, Yong; Hong, Huilong; Qiu, Li-Juan

2016-01-01

Molecular characterization of sequence flanking exogenous fragment insertion is essential for safety assessment and labeling of genetically modified organism (GMO). In this study, the T-DNA insertion sites and flanking sequences were identified in two newly developed transgenic glyphosate-tolerant soybeans GE-J16 and ZH10-6 based on whole genome sequencing (WGS) method. More than 22.4 Gb sequence data (∼21 × coverage) for each line was generated on Illumina HiSeq 2500 platform. The junction reads mapped to boundaries of T-DNA and flanking sequences in these two events were identified by comparing all sequencing reads with soybean reference genome and sequence of transgenic vector. The putative insertion loci and flanking sequences were further confirmed by PCR amplification, Sanger sequencing, and co-segregation analysis. All these analyses supported that exogenous T-DNA fragments were integrated in positions of Chr19: 50543767-50543792 and Chr17: 7980527-7980541 in these two transgenic lines. Identification of genomic insertion sites of G2-EPSPS and GAT transgenes will facilitate the utilization of their glyphosate-tolerant traits in soybean breeding program. These results also demonstrated that WGS was a cost-effective and rapid method for identifying sites of T-DNA insertions and flanking sequences in soybean.
Single-cell sequencing technologies: current and future.

PubMed

Liang, Jialong; Cai, Wanshi; Sun, Zhongsheng

2014-10-20

Intensively developed in the last few years, single-cell sequencing technologies now present numerous advantages over traditional sequencing methods for solving the problems of biological heterogeneity and low quantities of available biological materials. The application of single-cell sequencing technologies has profoundly changed our understanding of a series of biological phenomena, including gene transcription, embryo development, and carcinogenesis. However, before single-cell sequencing technologies can be used extensively, researchers face the serious challenge of overcoming inherent issues of high amplification bias, low accuracy and reproducibility. Here, we simply summarize the techniques used for single-cell isolation, and review the current technologies used in single-cell genomic, transcriptomic, and epigenomic sequencing. We discuss the merits, defects, and scope of application of single-cell sequencing technologies and then speculate on the direction of future developments. Copyright © 2014 Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, and Genetics Society of China. Published by Elsevier Ltd. All rights reserved.
Sequence to Structure (S2S): display, manipulate and interconnect RNA data from sequence to structure.

PubMed

Jossinet, Fabrice; Westhof, Eric

2005-08-01

Efficient RNA sequence manipulations (such as multiple alignments) need to be constrained by rules of RNA structure folding. The structural knowledge has increased dramatically in recent years with the accumulation of several large RNA structures similar to those of the bacterial ribosome subunits. However, no tool in the RNA community provides an easy way to link and integrate progress made at the sequence level using the available three-dimensional information. Sequence to Structure (S2S) proposes a framework in which a user can easily display, manipulate and interconnect heterogeneous RNA data, such as multiple sequence alignments, secondary and tertiary structures. S2S has been implemented using the Java language and has been developed and tested under UNIX systems, such as Linux and MacOSX. S2S is available at http://bioinformatics.org/S2S/.
MIPS: a database for genomes and protein sequences.

PubMed Central

Mewes, H W; Heumann, K; Kaps, A; Mayer, K; Pfeiffer, F; Stocker, S; Frishman, D

1999-01-01

The Munich Information Center for Protein Sequences (MIPS-GSF), Martinsried near Munich, Germany, develops and maintains genome oriented databases. It is commonplace that the amount of sequence data available increases rapidly, but not the capacity of qualified manual annotation at the sequence databases. Therefore, our strategy aims to cope with the data stream by the comprehensive application of analysis tools to sequences of complete genomes, the systematic classification of protein sequences and the active support of sequence analysis and functional genomics projects. This report describes the systematic and up-to-date analysis of genomes (PEDANT), a comprehensive database of the yeast genome (MYGD), a database reflecting the progress in sequencing the Arabidopsis thaliana genome (MATD), the database of assembled, annotated human EST clusters (MEST), and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). MIPS provides access through its WWW server (http://www.mips.biochem.mpg.de) to a spectrum of generic databases, including the above mentioned as well as a database of protein families (PROTFAM), the MITOP database, and the all-against-all FASTA database. PMID:9847138
Sequencing BPS spectra

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gukov, Sergei; Nawata, Satoshi; Saberi, Ingmar

In this article, we provide both a detailed study of color-dependence of link homologies, as realized in physics as certain spaces of BPS states, and a broad study of the behavior of BPS states in general. We consider how the spectrum of BPS states varies as continuous parameters of a theory are perturbed. This question can be posed in a wide variety of physical contexts, and we answer it by proposing that the relationship between unperturbed and perturbed BPS spectra is described by a spectral sequence. These general considerations unify previous applications of spectral sequence techniques to physics, and explain from a physical standpoint the appearance of many spectral sequences relating various link homology theories to one another. We also study structural properties of colored HOMFLY homology for links and evaluate Poincaré polynomials in numerous examples. Among these structural properties is a novel "sliding" property, which can be explained by using (refined) modular S-matrix. This leads to the identification of modular transformations in Chern-Simons theory and 3d N = 2 theory via the 3d/3d correspondence. In conclusion, we introduce the notion of associated varieties as classical limits of recursion relations of colored superpolynomials of links, and study their properties.
Sequencing BPS spectra

DOE PAGES

Gukov, Sergei; Nawata, Satoshi; Saberi, Ingmar; ...

2016-03-02

In this article, we provide both a detailed study of color-dependence of link homologies, as realized in physics as certain spaces of BPS states, and a broad study of the behavior of BPS states in general. We consider how the spectrum of BPS states varies as continuous parameters of a theory are perturbed. This question can be posed in a wide variety of physical contexts, and we answer it by proposing that the relationship between unperturbed and perturbed BPS spectra is described by a spectral sequence. These general considerations unify previous applications of spectral sequence techniques to physics, and explain from a physical standpoint the appearance of many spectral sequences relating various link homology theories to one another. We also study structural properties of colored HOMFLY homology for links and evaluate Poincaré polynomials in numerous examples. Among these structural properties is a novel "sliding" property, which can be explained by using (refined) modular S-matrix. This leads to the identification of modular transformations in Chern-Simons theory and 3d N = 2 theory via the 3d/3d correspondence. In conclusion, we introduce the notion of associated varieties as classical limits of recursion relations of colored superpolynomials of links, and study their properties.
Advances in high throughput DNA sequence data compression.

PubMed

Sardaraz, Muhammad; Tahir, Muhammad; Ikram, Ataul Aziz

2016-06-01

Advances in high throughput sequencing technologies and reduction in cost of sequencing have led to exponential growth in high throughput DNA sequence data. This growth has posed challenges such as storage, retrieval, and transmission of sequencing data. Data compression is used to cope with these challenges. Various methods have been developed to compress genomic and sequencing data. In this article, we present a comprehensive review of compression methods for genome and reads compression. Algorithms are categorized as referential or reference free. Experimental results and comparative analysis of various methods for data compression are presented. Finally, key challenges and research directions in DNA sequence data compression are highlighted.
Low-Energy Electron-Induced Strand Breaks in Telomere-Derived DNA Sequences-Influence of DNA Sequence and Topology.

PubMed

Rackwitz, Jenny; Bald, Ilko

2018-03-26

During cancer radiation therapy high-energy radiation is used to reduce tumour tissue. The irradiation produces a shower of secondary low-energy (<20 eV) electrons, which are able to damage DNA very efficiently by dissociative electron attachment. Recently, it was suggested that low-energy electron-induced DNA strand breaks strongly depend on the specific DNA sequence with a high sensitivity of G-rich sequences. Here, we use DNA origami platforms to expose G-rich telomere sequences to low-energy (8.8 eV) electrons to determine absolute cross sections for strand breakage and to study the influence of sequence modifications and topology of telomeric DNA on the strand breakage. We find that the telomeric DNA 5'-(TTA GGG)₂ is more sensitive to low-energy electrons than an intermixed sequence 5'-(TGT GTG A)₂, confirming the unique electronic properties resulting from G-stacking. With increasing length of the oligonucleotide (i.e., going from 5'-(GGG ATT)₂ to 5'-(GGG ATT)₄), both the variety of topology and the electron-induced strand break cross sections increase. Addition of K⁺ ions decreases the strand break cross section for all sequences that are able to fold G-quadruplexes or G-intermediates, whereas the strand break cross section for the intermixed sequence remains unchanged. These results indicate that telomeric DNA is rather sensitive towards low-energy electron-induced strand breakage, suggesting significant telomere shortening that can also occur during cancer radiation therapy. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Genome sequence of Phytophthora ramorum: implications for management

Treesearch

Brett Tyler; Sucheta Tripathy; Nik Grunwald; Kurt Lamour; Kelly Ivors; Matteo Garbelotto; Daniel Rokhsar; Nik Putnam; Igor Grigoriev; Jeffrey Boore

2006-01-01

A draft genome sequence has been determined for Phytophthora ramorum, together with a draft sequence of the soybean pathogen Phytophthora sojae. The P. ramorum genome was sequenced to a depth of 7-fold coverage, while the P. sojae genome was sequenced to a depth of 9-fold coverage. The genome...
RSAT 2015: Regulatory Sequence Analysis Tools

PubMed Central

Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A.; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M.; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

2015-01-01

RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. PMID:25904632
Identifying functionally informative evolutionary sequence profiles.

PubMed

Gil, Nelson; Fiser, Andras

2018-04-15

Multiple sequence alignments (MSAs) can provide essential input to many bioinformatics applications, including protein structure prediction and functional annotation. However, the optimal selection of sequences to obtain biologically informative MSAs for such purposes is poorly explored, and has traditionally been performed manually. We present Selection of Alignment by Maximal Mutual Information (SAMMI), an automated, sequence-based approach to objectively select an optimal MSA from a large set of alternatives sampled from a general sequence database search. The hypothesis of this approach is that the mutual information among MSA columns will be maximal for those MSAs that contain the most diverse set possible of the most structurally and functionally homogeneous protein sequences. SAMMI was tested to select MSAs for functional site residue prediction by analysis of conservation patterns on a set of 435 proteins obtained from protein-ligand (peptides, nucleic acids and small substrates) and protein-protein interaction databases. Availability and implementation: A freely accessible program, including source code, implementing SAMMI is available at https://github.com/nelsongil92/SAMMI.git. Contact: andras.fiser@einstein.yu.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
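The quantity this hypothesis rests on, mutual information between two alignment columns, can be sketched as follows. This is a minimal illustration of the textbook definition only; it is not the SAMMI implementation, and how SAMMI aggregates or normalizes column-pair values is not described in the abstract. The function name and the toy columns are hypothetical.

```python
from collections import Counter
from math import log2

def column_mutual_information(col_x: str, col_y: str) -> float:
    """Mutual information (in bits) between two equally long alignment columns."""
    if len(col_x) != len(col_y):
        raise ValueError("columns must contain one residue per sequence")
    n = len(col_x)
    count_x = Counter(col_x)
    count_y = Counter(col_y)
    count_xy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        p_ab = c / n
        # p(a, b) * log2( p(a, b) / (p(a) * p(b)) )
        mi += p_ab * log2(p_ab / ((count_x[a] / n) * (count_y[b] / n)))
    return mi

# Toy columns: each string holds the residue of every sequence at one position.
covarying = column_mutual_information("AACC", "GGTT")    # perfectly coupled
independent = column_mutual_information("AACC", "GTGT")  # no coupling
print(covarying, independent)
```

Perfectly coupled columns give one bit here, while independent columns give zero, which is the contrast the selection criterion relies on.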
Attentional load and implicit sequence learning.

PubMed

Shanks, David R; Rowland, Lee A; Ranger, Mandeep S

2005-06-01

A widely employed conceptualization of implicit learning hypothesizes that it makes minimal demands on attentional resources. This conjecture was investigated by comparing learning under single-task and dual-task conditions in the sequential reaction time (SRT) task. Participants learned probabilistic sequences, with dual-task participants additionally having to perform a counting task using stimuli that were targets in the SRT display. Both groups were then tested for sequence knowledge under single-task (Experiments 1 and 2) or dual-task (Experiment 3) conditions. Participants also completed a free generation task (Experiments 2 and 3) under inclusion or exclusion conditions to determine if sequence knowledge was conscious or unconscious in terms of its access to intentional control. The experiments revealed that the secondary task impaired sequence learning and that sequence knowledge was consciously accessible. These findings disconfirm both the notion that implicit learning is able to proceed normally under conditions of divided attention, and that the acquired knowledge is inaccessible to consciousness. A unitary framework for conceptualizing implicit and explicit learning is proposed.
Snake Genome Sequencing: Results and Future Prospects

PubMed Central

Kerkkamp, Harald M. I.; Kini, R. Manjunatha; Pospelov, Alexey S.; Vonk, Freek J.; Henkel, Christiaan V.; Richardson, Michael K.

2016-01-01

Snake genome sequencing is in its infancy—very much behind the progress made in sequencing the genomes of humans, model organisms and pathogens relevant to biomedical research, and agricultural species. We provide here an overview of some of the snake genome projects in progress, and discuss the biological findings, with special emphasis on toxinology, from the small number of draft snake genomes already published. We discuss the future of snake genomics, pointing out that new sequencing technologies will help overcome the problem of repetitive sequences in assembling snake genomes. Genome sequences are also likely to be valuable in examining the clustering of toxin genes on the chromosomes, in designing recombinant antivenoms and in studying the epigenetic regulation of toxin gene expression. PMID:27916957

Snake Genome Sequencing: Results and Future Prospects.

PubMed

Kerkkamp, Harald M I; Kini, R Manjunatha; Pospelov, Alexey S; Vonk, Freek J; Henkel, Christiaan V; Richardson, Michael K

2016-12-01

Snake genome sequencing is in its infancy, very much behind the progress made in sequencing the genomes of humans, model organisms and pathogens relevant to biomedical research, and agricultural species. We provide here an overview of some of the snake genome projects in progress, and discuss the biological findings, with special emphasis on toxinology, from the small number of draft snake genomes already published. We discuss the future of snake genomics, pointing out that new sequencing technologies will help overcome the problem of repetitive sequences in assembling snake genomes. Genome sequences are also likely to be valuable in examining the clustering of toxin genes on the chromosomes, in designing recombinant antivenoms and in studying the epigenetic regulation of toxin gene expression.
Sequencing and comparing whole mitochondrial genomes of animals

DOE Office of Scientific and Technical Information (OSTI.GOV)

Boore, Jeffrey L.; Macey, J. Robert; Medina, Monica

2005-04-22

Comparing complete animal mitochondrial genome sequences is becoming increasingly common for phylogenetic reconstruction and as a model for genome evolution. Not only are they much more informative than shorter sequences of individual genes for inferring evolutionary relatedness, but these data also provide sets of genome-level characters, such as the relative arrangements of genes, that can be especially powerful. We describe here the protocols commonly used for physically isolating mtDNA, for amplifying these by PCR or RCA, for cloning, sequencing, assembly, validation, and gene annotation, and for comparing both sequences and gene arrangements. On several topics, we offer general observations based on our experiences to date with determining and comparing complete mtDNA sequences.
Sequence-Selective Formation of Synthetic H-Bonded Duplexes

PubMed Central

2017-01-01

Oligomers equipped with a sequence of phenol and pyridine N-oxide groups form duplexes via H-bonding interactions between these recognition units. Reductive amination chemistry was used to synthesize all possible 3-mer sequences: AAA, AAD, ADA, DAA, ADD, DAD, DDA, and DDD. Pairwise interactions between the oligomers were investigated using NMR titration and dilution experiments in toluene. The measured association constants vary by 3 orders of magnitude (10² to 10⁵ M⁻¹). Antiparallel sequence-complementary oligomers generally form more stable complexes than mismatched duplexes. Mismatched duplexes that have an excess of H-bond donors are stabilized by the interaction of two phenol donors with one pyridine N-oxide acceptor. Oligomers that have a H-bond donor and acceptor on the ends of the chain can fold to form intramolecular H-bonds in the free state. The 1,3-folding equilibrium competes with duplex formation and lowers the stability of duplexes involving these sequences. As a result, some of the mismatch duplexes are more stable than some of the sequence-complementary duplexes. However, the most stable mismatch duplexes contain DDD and compete with the most stable sequence-complementary duplex, AAA·DDD, so in mixtures that contain all eight sequences, sequence-complementary duplexes dominate. Even higher fidelity sequence selectivity can be achieved if alternating donor–acceptor sequences are avoided. PMID:28857551
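The eight 3-mers and their antiparallel sequence-complementary partners can be enumerated mechanically. The sketch below assumes A stands for an H-bond acceptor unit and D for an H-bond donor unit, and that an antiparallel complement is obtained by reversing the sequence and swapping A and D; the function names are illustrative and not from the paper.

```python
from itertools import product

# Assumed pairing rule: a donor (D) pairs with an acceptor (A) and vice versa.
COMPLEMENT = {"A": "D", "D": "A"}

def all_sequences(length: int = 3) -> list[str]:
    """Every sequence of the given length over the two recognition units."""
    return ["".join(units) for units in product("AD", repeat=length)]

def antiparallel_complement(seq: str) -> str:
    """Reverse the strand direction, then swap donors and acceptors."""
    return "".join(COMPLEMENT[unit] for unit in reversed(seq))

for seq in all_sequences():
    print(f"{seq} pairs with {antiparallel_complement(seq)}")
```

With three units no sequence is its own antiparallel complement, because the middle unit would have to pair with itself.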
High-Throughput Next-Generation Sequencing of Polioviruses

PubMed Central

Montmayeur, Anna M.; Schmidt, Alexander; Zhao, Kun; Magaña, Laura; Iber, Jane; Castro, Christina J.; Chen, Qi; Henderson, Elizabeth; Ramos, Edward; Shaw, Jing; Tatusov, Roman L.; Dybdahl-Sissoko, Naomi; Endegue-Zanga, Marie Claire; Adeniji, Johnson A.; Oberste, M. Steven; Burns, Cara C.

2016-01-01

ABSTRACT The poliovirus (PV) is currently targeted for worldwide eradication and containment. Sanger-based sequencing of the viral protein 1 (VP1) capsid region is currently the standard method for PV surveillance. However, the whole-genome sequence is sometimes needed for higher resolution global surveillance. In this study, we optimized whole-genome sequencing protocols for poliovirus isolates and FTA cards using next-generation sequencing (NGS), aiming for high sequence coverage, efficiency, and throughput. We found that DNase treatment of poliovirus RNA followed by random reverse transcription (RT), amplification, and the use of the Nextera XT DNA library preparation kit produced significantly better results than other preparations. The average viral reads per total reads, a measurement of efficiency, was as high as 84.2% ± 15.6%. PV genomes covering >99 to 100% of the reference length were obtained and validated with Sanger sequencing. A total of 52 PV genomes were generated, multiplexing as many as 64 samples in a single Illumina MiSeq run. This high-throughput, sequence-independent NGS approach facilitated the detection of a diverse range of PVs, especially for those in vaccine-derived polioviruses (VDPV), circulating VDPV, or immunodeficiency-related VDPV. In contrast to results from previous studies on other viruses, our results showed that filtration and nuclease treatment did not discernibly increase the sequencing efficiency of PV isolates. However, DNase treatment after nucleic acid extraction to remove host DNA significantly improved the sequencing results. This NGS method has been successfully implemented to generate PV genomes for molecular epidemiology of the most recent PV isolates. Additionally, the ability to obtain full PV genomes from FTA cards will aid in facilitating global poliovirus surveillance. PMID:27927929
Sequence Segmentation with changeptGUI.

PubMed

Tasker, Edward; Keith, Jonathan M

2017-01-01

Many biological sequences have a segmental structure that can provide valuable clues to their content, structure, and function. The program changept is a tool for investigating the segmental structure of a sequence, and can also be applied to multiple sequences in parallel to identify a common segmental structure, thus providing a method for integrating multiple data types to identify functional elements in genomes. In the previous edition of this book, a command line interface for changept is described. Here we present a graphical user interface for this package, called changeptGUI. This interface also includes tools for pre- and post-processing of data and results to facilitate investigation of the number and characteristics of segment classes.
A Main Sequence For Quasars

NASA Astrophysics Data System (ADS)

Marziani, Paola; Sulentic, J. W.; Dultzin, D.; Negrete, A.; del Olmo, A.; Martínez-Carballo, M. A.; Stirpe, G. M.; D'Onofrio, M.; Perea, J.

2016-10-01

The 4D eigenvector 1 parameter space defined by Sulentic et al. may be seen as a surrogate H-R diagram for quasars. As in the stellar H-R diagram, a source sequence can be easily identified. In the case of quasars, the main sequence appears to be mainly driven by Eddington ratio. A transition Eddington ratio may in part explain the striking observational differences between quasars at opposite ends of the main sequence. The eigenvector-1 approach opens the door towards properly contextualized models of quasar physics, geometry and kinematics. We review some of the progress that has been made over the past 15 years, and point out still unsolved issues.
On the Modularity of Implicit Sequence Learning: Independent Acquisition of Spatial, Symbolic, and Manual Sequences

ERIC Educational Resources Information Center

Goschke, Thomas; Bolte, Annette

2012-01-01

Learning sequential structures is of fundamental importance for a wide variety of human skills. While it has long been debated whether implicit sequence learning is perceptual or response-based, here we propose an alternative framework that cuts across this dichotomy and assumes that sequence learning rests on associative changes that can occur…
Sequence information gain based motif analysis.

PubMed

Maynou, Joan; Pairó, Erola; Marco, Santiago; Perera, Alexandre

2015-11-09

The detection of regulatory regions in candidate sequences is essential for the understanding of the regulation of a particular gene and the mechanisms involved. This paper proposes a novel methodology based on information theoretic metrics for finding regulatory sequences in promoter regions. This methodology (SIGMA) has been tested on genomic sequence data for Homo sapiens and Mus musculus. SIGMA has been compared with different publicly available alternatives for motif detection, such as MEME/MAST, Biostrings (Bioconductor package), MotifRegressor, and previous work such as Qresiduals projections or information theoretic based detectors. Comparative results, in the form of Receiver Operating Characteristic curves, show that, in 70% of the studied Transcription Factor Binding Sites, the SIGMA detector has a better performance and behaves more robustly than the methods compared, while having a similar computational time. The performance of SIGMA can be explained by its parametric simplicity in the modelling of the non-linear co-variability in the binding motif positions. Sequence Information Gain based Motif Analysis is a generalisation of a non-linear model of the cis-regulatory sequences detection based on Information Theory. This generalisation allows us to detect transcription factor binding sites with maximum performance disregarding the covariability observed in the positions of the training set of sequences. SIGMA is freely available to the public at http://b2slab.upc.edu.
HPV-QUEST: A highly customized system for automated HPV sequence analysis capable of processing Next Generation sequencing data set.

PubMed

Yin, Li; Yao, Jiqiang; Gardner, Brent P; Chang, Kaifen; Yu, Fahong; Goodenow, Maureen M

2012-01-01

Next Generation sequencing (NGS) applied to human papilloma viruses (HPV) can provide sensitive methods to investigate the molecular epidemiology of multiple type HPV infection. Currently a genotyping system with a comprehensive collection of updated HPV reference sequences and a capacity to handle NGS data sets is lacking. HPV-QUEST was developed as an automated and rapid HPV genotyping system. The web-based HPV-QUEST subtyping algorithm was developed using HTML, PHP, Perl scripting language, and MySQL as the database backend. HPV-QUEST includes a database of annotated HPV reference sequences with updated nomenclature covering 5 genera, 14 species and 150 mucosal and cutaneous types to genotype BLASTed query sequences. HPV-QUEST processes up to 10 megabases of sequences within 1 to 2 minutes. Results are reported in HTML, text and Excel formats and display e-value, BLAST score, and local and coverage identities; provide genus, species, type, infection site and risk for the best matched reference HPV sequence; and produce results ready for additional analyses.
Spatio-temporal alignment of pedobarographic image sequences.

PubMed

Oliveira, Francisco P M; Sousa, Andreia; Santos, Rubim; Tavares, João Manuel R S

2011-07-01

This article presents a methodology to align plantar pressure image sequences simultaneously in time and space. The spatial position and orientation of a foot in a sequence are changed to match the foot represented in a second sequence. Simultaneously with the spatial alignment, the temporal scale of the first sequence is transformed with the aim of synchronizing the two input footsteps. Consequently, the spatial correspondence of the foot regions along the sequences as well as the temporal synchronizing is automatically attained, making the study easier and more straightforward. In terms of spatial alignment, the methodology can use one of four possible geometric transformation models: rigid, similarity, affine, or projective. In the temporal alignment, a polynomial transformation up to the 4th degree can be adopted in order to model linear and curved time behaviors. Suitable geometric and temporal transformations are found by minimizing the mean squared error (MSE) between the input sequences. The methodology was tested on a set of real image sequences acquired from a common pedobarographic device. When used in experimental cases generated by applying geometric and temporal control transformations, the methodology revealed high accuracy. In addition, the intra-subject alignment tests from real plantar pressure image sequences showed that the curved temporal models produced better MSE results (P < 0.001) than the linear temporal model. This article represents an important step forward in the alignment of pedobarographic image data, since previous methods can only be applied on static images.
Protecting genomic sequence anonymity with generalization lattices.

PubMed

Malin, B A

2005-01-01

Current genomic privacy technologies assume the identity of genomic sequence data is protected if personal information, such as demographics, is obscured, removed, or encrypted. While demographic features can directly compromise an individual's identity, recent research demonstrates such protections are insufficient because sequence data itself is susceptible to re-identification. To counteract this problem, we introduce an algorithm for anonymizing a collection of person-specific DNA sequences. The technique is termed DNA lattice anonymization (DNALA), and is based upon the formal privacy protection schema of k-anonymity. Under this model, it is impossible to observe or learn features that distinguish one genetic sequence from k-1 other entries in a collection. To maximize information retained in protected sequences, we incorporate a concept generalization lattice to learn the distance between two residues in a single nucleotide region. The lattice provides the most similar generalized concept for two residues (e.g. adenine and guanine are both purines). The method is tested and evaluated with several publicly available human population datasets ranging in size from 30 to 400 sequences. Our findings imply the anonymization schema is feasible for the protection of sequence privacy. The DNALA method is the first computational disclosure control technique for general DNA sequences. Given the computational nature of the method, guarantees of anonymity can be formally proven. There is room for improvement and validation, though this research provides the groundwork from which future researchers can construct genomics anonymization schemas tailored to specific datasharing scenarios.
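The generalization step described in this abstract can be illustrated with a small sketch. It is not the DNALA implementation: it assumes the standard IUPAC nucleotide ambiguity codes as the generalization hierarchy, and the function names (`generalize`, `generalize_sequences`, `positions_generalized`) are hypothetical. Two aligned sequences are merged position by position into the most specific code covering both residues, so the pair becomes indistinguishable (k = 2).

```python
# Assumption: IUPAC ambiguity codes stand in for the paper's generalization lattice.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}
SETS = {code: frozenset(bases) for code, bases in IUPAC.items()}
CODE_OF = {bases: code for code, bases in SETS.items()}


def generalize(x, y):
    """Return the most specific IUPAC code that covers both residues."""
    return CODE_OF[SETS[x] | SETS[y]]


def generalize_sequences(s1, s2):
    """Merge two aligned, equal-length sequences into one generalized sequence."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return "".join(generalize(a, b) for a, b in zip(s1, s2))


def positions_generalized(s1, s2):
    """Hypothetical cost: number of positions that lose specificity."""
    return sum(1 for a, b in zip(s1, s2) if a != b)


merged = generalize_sequences("ACGT", "GCAT")   # A/G generalize to R (purine)
cost = positions_generalized("ACGT", "GCAT")
```

Pairing the sequences with the lowest cost first is one plausible way to retain information, but the pairing strategy actually used by the authors is not stated in the abstract.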
Discovery of common sequences absent in the human reference genome using pooled samples from next generation sequencing.

PubMed

Liu, Yu; Koyutürk, Mehmet; Maxwell, Sean; Xiang, Min; Veigl, Martina; Cooper, Richard S; Tayo, Bamidele O; Li, Li; LaFramboise, Thomas; Wang, Zhenghe; Zhu, Xiaofeng; Chance, Mark R

2014-08-16

Sequences up to several megabases in length have been found to be present in individual genomes but absent in the human reference genome. These sequences may be common in populations, and their absence in the reference genome may indicate rare variants in the genomes of individuals who served as donors for the human genome project. As the reference genome is used in probe design for microarray technology and mapping short reads in next generation sequencing (NGS), this missing sequence could be a source of bias in functional genomic studies and variant analysis. One End Anchor (OEA) and/or orphan reads from paired-end sequencing have been used to identify novel sequences that are absent in the reference genome. However, no study has investigated the distribution, evolution and functionality of those sequences in human populations. To systematically identify and study the missing common sequences (micSeqs), we extended the previous method by pooling OEA reads from a large number of individuals and applying strict filtering methods to remove false sequences. The pipeline was applied to data from phase 1 of the 1000 Genomes Project. We identified 309 micSeqs that are present in at least 1% of the human population, but absent in the reference genome. We confirmed 76% of these 309 micSeqs by comparison to other primate genomes, individual human genomes, and gene expression data. Furthermore, we randomly selected fifteen micSeqs and confirmed their presence using PCR validation in 38 additional individuals. Functional analysis using published RNA-seq and ChIP-seq data showed that eleven micSeqs are highly expressed in human brain and three micSeqs contain transcription factor (TF) binding regions, suggesting they are functional elements. In addition, the identified micSeqs are absent in non-primates and show dynamic acquisition during primate evolution culminating with most micSeqs being present in Africans, suggesting some micSeqs may be important sources of human
Recently published protein sequences. I.

NASA Technical Reports Server (NTRS)

Jukes, T. H.; Holmquist, R.

1972-01-01

Some polypeptide sequences that have been published in the 1972 scientific literature are listed. Only selected sequences are included. The compilation has two objectives. Current information between periods when more comprehensive compilations are published is to be assembled and the use of data that do not include arrangements of unsequenced peptides for 'maximum homology' is to be encouraged.
Computational identification of CDR3 sequence archetypes among immunoglobulin sequences in chronic lymphocytic leukemia.

PubMed

Messmer, Bradley T; Raphael, Benjamin J; Aerni, Sarah J; Widhopf, George F; Rassenti, Laura Z; Gribben, John G; Kay, Neil E; Kipps, Thomas J

2009-03-01

The leukemia cells of unrelated patients with chronic lymphocytic leukemia (CLL) display a restricted repertoire of immunoglobulin (Ig) gene rearrangements with preferential usage of certain Ig gene segments. We developed a computational method to rigorously quantify biases in Ig sequence similarity in large patient databases and to identify groups of patients with unusual levels of sequence similarity. We applied our method to sequences from 1577 CLL patients through the CLL Research Consortium (CRC), and identified 67 similarity groups into which roughly 20% of all patients could be assigned. Immunoglobulin light chain class was highly correlated within all groups and light chain gene usage was similar within sets. Surprisingly, over 40% of the identified groups were composed of somatically mutated genes. This study significantly expands the evidence that antigen selection shapes the Ig repertoire in CLL.
Inferring Short-Range Linkage Information from Sequencing Chromatograms

PubMed Central

Beggel, Bastian; Neumann-Fraune, Maria; Kaiser, Rolf; Verheyen, Jens; Lengauer, Thomas

2013-01-01

Direct Sanger sequencing of viral genome populations yields multiple ambiguous sequence positions. It is not straightforward to derive linkage information from sequencing chromatograms, which in turn hampers the correct interpretation of the sequence data. We present a method for determining the variants existing in a viral quasispecies in the case of two nearby ambiguous sequence positions by exploiting the effect of sequence context-dependent incorporation of dideoxynucleotides. The computational model was trained on data from sequencing chromatograms of clonal variants and was evaluated on two test sets of in vitro mixtures. The approach achieved high accuracies in identifying the mixture components of 97.4% on a test set in which the positions to be analyzed are only one base apart from each other, and of 84.5% on a test set in which the ambiguous positions are separated by three bases. In silico experiments suggest two major limitations of our approach in terms of accuracy. First, due to a basic limitation of Sanger sequencing, it is not possible to reliably detect minor variants with a relative frequency of no more than 10%. Second, the model cannot distinguish between mixtures of two or four clonal variants, if one of two sets of linear constraints is fulfilled. Furthermore, the approach requires repetitive sequencing of all variants that might be present in the mixture to be analyzed. Nevertheless, the effectiveness of our method on the two in vitro test sets shows that short-range linkage information of two ambiguous sequence positions can be inferred from Sanger sequencing chromatograms without any further assumptions on the mixture composition. Additionally, our model provides new insights into the established and widely used Sanger sequencing technology. The source code of our method is made available at http://bioinf.mpi-inf.mpg.de/publications/beggel/linkageinformation.zip. PMID:24376502
Folding and Stabilization of Native-Sequence-Reversed Proteins

PubMed Central

Zhang, Yuanzhao; Weber, Jeffrey K; Zhou, Ruhong

2016-01-01

Though the problem of sequence-reversed protein folding is largely unexplored, one might speculate that reversed native protein sequences should be significantly more foldable than purely random heteropolymer sequences. In this article, we investigate how the reverse-sequences of native proteins might fold by examining a series of small proteins of increasing structural complexity (α-helix, β-hairpin, α-helix bundle, and α/β-protein). Employing a tandem protein structure prediction algorithmic and molecular dynamics simulation approach, we find that the ability of reverse sequences to adopt native-like folds is strongly influenced by protein size and the flexibility of the native hydrophobic core. For β-hairpins with reverse-sequences that fail to fold, we employ a simple mutational strategy for guiding stable hairpin formation that involves the insertion of amino acids into the β-turn region. This systematic look at reverse sequence duality sheds new light on the problem of protein sequence-structure mapping and may serve to inspire new protein design and protein structure prediction protocols. PMID:27113844
Folding and Stabilization of Native-Sequence-Reversed Proteins

NASA Astrophysics Data System (ADS)

Zhang, Yuanzhao; Weber, Jeffrey K.; Zhou, Ruhong

2016-04-01

Though the problem of sequence-reversed protein folding is largely unexplored, one might speculate that reversed native protein sequences should be significantly more foldable than purely random heteropolymer sequences. In this article, we investigate how the reverse-sequences of native proteins might fold by examining a series of small proteins of increasing structural complexity (α-helix, β-hairpin, α-helix bundle, and α/β-protein). Employing a tandem protein structure prediction algorithmic and molecular dynamics simulation approach, we find that the ability of reverse sequences to adopt native-like folds is strongly influenced by protein size and the flexibility of the native hydrophobic core. For β-hairpins with reverse-sequences that fail to fold, we employ a simple mutational strategy for guiding stable hairpin formation that involves the insertion of amino acids into the β-turn region. This systematic look at reverse sequence duality sheds new light on the problem of protein sequence-structure mapping and may serve to inspire new protein design and protein structure prediction protocols.
The diploid genome sequence of an Asian individual

PubMed Central

Wang, Jun; Wang, Wei; Li, Ruiqiang; Li, Yingrui; Tian, Geng; Goodman, Laurie; Fan, Wei; Zhang, Junqing; Li, Jun; Zhang, Juanbin; Guo, Yiran; Feng, Binxiao; Li, Heng; Lu, Yao; Fang, Xiaodong; Liang, Huiqing; Du, Zhenglin; Li, Dong; Zhao, Yiqing; Hu, Yujie; Yang, Zhenzhen; Zheng, Hancheng; Hellmann, Ines; Inouye, Michael; Pool, John; Yi, Xin; Zhao, Jing; Duan, Jinjie; Zhou, Yan; Qin, Junjie; Ma, Lijia; Li, Guoqing; Yang, Zhentao; Zhang, Guojie; Yang, Bin; Yu, Chang; Liang, Fang; Li, Wenjie; Li, Shaochuan; Li, Dawei; Ni, Peixiang; Ruan, Jue; Li, Qibin; Zhu, Hongmei; Liu, Dongyuan; Lu, Zhike; Li, Ning; Guo, Guangwu; Zhang, Jianguo; Ye, Jia; Fang, Lin; Hao, Qin; Chen, Quan; Liang, Yu; Su, Yeyang; san, A.; Ping, Cuo; Yang, Shuang; Chen, Fang; Li, Li; Zhou, Ke; Zheng, Hongkun; Ren, Yuanyuan; Yang, Ling; Gao, Yang; Yang, Guohua; Li, Zhuo; Feng, Xiaoli; Kristiansen, Karsten; Wong, Gane Ka-Shu; Nielsen, Rasmus; Durbin, Richard; Bolund, Lars; Zhang, Xiuqing; Li, Songgang; Yang, Huanming; Wang, Jian

2009-01-01

Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual's genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J. D. Watson and J. C. Venter), and structural variation identification. These variations were considered for their potential biological impact. Our sequence data and analyses demonstrate the potential usefulness of next-generation sequencing technologies for personal genomics. PMID:18987735
Coarse-grained sequences for protein folding and design

PubMed Central

Brown, Scott; Fawzi, Nicolas J.; Head-Gordon, Teresa

2003-01-01

We present the results of sequence design on our off-lattice minimalist model in which no specification of native-state tertiary contacts is needed. We start with a sequence that adopts a target topology and build on it through sequence mutation to produce new sequences that comprise distinct members within a target fold class. In this work, we use the α/β ubiquitin fold class and design two new sequences that, when characterized through folding simulations, reproduce the differences in folding mechanism seen experimentally for proteins L and G. The primary implication of this work is that patterning of hydrophobic and hydrophilic residues is the physical origin for the success of relative contact-order descriptions of folding, and that these physics-based potentials provide a predictive connection between free energy landscapes and amino acid sequence (the original protein folding problem). We present results of the sequence mapping from a 20- to the three-letter code for determining a sequence that folds into the WW domain topology to illustrate future extensions to protein design. PMID:12963815
Coarse-grained sequences for protein folding and design.

PubMed

Brown, Scott; Fawzi, Nicolas J; Head-Gordon, Teresa

2003-09-16

We present the results of sequence design on our off-lattice minimalist model in which no specification of native-state tertiary contacts is needed. We start with a sequence that adopts a target topology and build on it through sequence mutation to produce new sequences that comprise distinct members within a target fold class. In this work, we use the alpha/beta ubiquitin fold class and design two new sequences that, when characterized through folding simulations, reproduce the differences in folding mechanism seen experimentally for proteins L and G. The primary implication of this work is that patterning of hydrophobic and hydrophilic residues is the physical origin for the success of relative contact-order descriptions of folding, and that these physics-based potentials provide a predictive connection between free energy landscapes and amino acid sequence (the original protein folding problem). We present results of the sequence mapping from a 20- to the three-letter code for determining a sequence that folds into the WW domain topology to illustrate future extensions to protein design.

FASH: A web application for nucleotides sequence search.

PubMed

Veksler-Lublinksy, Isana; Barash, Danny; Avisar, Chai; Troim, Einav; Chew, Paul; Kedem, Klara

2008-05-27

FASH (Fourier Alignment Sequence Heuristics) is a web application, based on the Fast Fourier Transform, for finding remote homologs within a long nucleic acid sequence. Given a query sequence and a long text-sequence (e.g., the human genome), FASH detects subsequences within the text that are remotely similar to the query. FASH offers an alternative approach to Blast/Fasta for querying long RNA/DNA sequences. FASH differs from these other approaches in that it does not depend on the existence of contiguous seed-sequences in its initial detection phase. The FASH web server is user friendly and very easy to operate. FASH can be accessed at https://fash.bgu.ac.il:8443/fash/default.jsp (secured website).
Aftershock occurrence rate decay for individual sequences and catalogs

NASA Astrophysics Data System (ADS)

Nyffenegger, Paul A.

One of the earliest observations of the Earth's seismicity is that the rate of aftershock occurrence decays with time according to a power law commonly known as modified Omori-law (MOL) decay. However, the physical reasons for aftershock occurrence and the empirical decay in rate remain unclear despite numerous models that yield similar rate decay behavior. Key problems in relating the observed empirical relationship to the physical conditions of the mainshock and fault are the lack of studies including small magnitude mainshocks and the lack of uniformity between studies. We use simulated aftershock sequences to investigate the factors which influence the maximum likelihood (ML) estimate of the Omori-law p value, the parameter describing aftershock occurrence rate decay, for both individual aftershock sequences and "stacked" or superposed sequences. Generally the ML estimate of p is accurate, but since the ML estimated uncertainty is unaffected by whether the sequence resembles an MOL model, a goodness-of-fit test such as the Anderson-Darling statistic is necessary. While stacking aftershock sequences permits the study of entire catalogs and sequences with small aftershock populations, stacking introduces artifacts. The p value for stacked sequences is approximately equal to the mean of the individual sequence p values. We apply single-link cluster analysis to identify all aftershock sequences from eleven regional seismicity catalogs. We observe two new mathematically predictable empirical relationships for the distribution of aftershock sequence populations. The average properties of aftershock sequences are not correlated with tectonic environment, but aftershock populations and p values do show a depth dependence. The p values show great variability with time, and large values or changes in p sometimes precede major earthquakes. Studies of teleseismic earthquake catalogs over the last twenty years have led seismologists to question seismicity models and
Gelada vocal sequences follow Menzerath's linguistic law.

PubMed

Gustison, Morgan L; Semple, Stuart; Ferrer-I-Cancho, Ramon; Bergman, Thore J

2016-05-10

Identifying universal principles underpinning diverse natural systems is a key goal of the life sciences. A powerful approach in addressing this goal has been to test whether patterns consistent with linguistic laws are found in nonhuman animals. Menzerath's law is a linguistic law that states that, the larger the construct, the smaller the size of its constituents. Here, to our knowledge, we present the first evidence that Menzerath's law holds in the vocal communication of a nonhuman species. We show that, in vocal sequences of wild male geladas (Theropithecus gelada), construct size (sequence size in number of calls) is negatively correlated with constituent size (duration of calls). Call duration does not vary significantly with position in the sequence, but call sequence composition does change with sequence size and most call types are abbreviated in larger sequences. We also find that intercall intervals follow the same relationship with sequence size as do calls. Finally, we provide formal mathematical support for the idea that Menzerath's law reflects compression-the principle of minimizing the expected length of a code. Our findings suggest that a common principle underpins human and gelada vocal communication, highlighting the value of exploring the applicability of linguistic laws in vocal systems outside the realm of language.
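The negative relationship between construct size and constituent size reported in this abstract can be checked with a rank correlation. The sketch below is an illustration only: the abstract does not say which statistic the authors used, the Spearman implementation is a plain-Python assumption, and the toy numbers are made up rather than taken from the gelada data.

```python
from statistics import mean


def ranks(values):
    """Assign 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Made-up toy data: sequence size (number of calls) and mean call duration (s).
sequence_sizes = [1, 2, 3, 4, 5, 6]
mean_call_durations = [0.90, 0.85, 0.70, 0.72, 0.60, 0.55]

rho = spearman(sequence_sizes, mean_call_durations)
```

A value of `rho` near -1 would be consistent with Menzerath's law for such data, since longer sequences would then tend to contain shorter calls.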
Unified Deep Learning Architecture for Modeling Biology Sequence.

PubMed

Wu, Hongjie; Cao, Chengyuan; Xia, Xiaoyan; Lu, Qiang

2017-10-09

Prediction of the spatial structure or function of biological macromolecules based on their sequence remains an important challenge in bioinformatics. When modeling biological sequences using traditional sequencing models, characteristics, such as long-range interactions between basic units, the complicated and variable output of labeled structures, and the variable length of biological sequences, usually lead to different solutions on a case-by-case basis. This study proposed the use of bidirectional recurrent neural networks based on long short-term memory or a gated recurrent unit to capture long-range interactions by designing the optional reshape operator to adapt to the diversity of the output labels and implementing a training algorithm to support the training of sequence models capable of processing variable-length sequences. Additionally, the merge and pooling operators enhanced the ability to capture short-range interactions between basic units of biological sequences. The proposed deep-learning model and its training algorithm might be capable of solving currently known biological sequence-modeling problems through the use of a unified framework. We validated our model on one of the most difficult biological sequence-modeling problems currently known, with our results indicating the ability of the model to obtain predictions of protein residue interactions that exceeded the accuracy of current popular approaches by 10% based on multiple benchmarks.
Protein-DNA interactions define the mechanistic aspects of circle formation and insertion reactions in IS2 transposition

PubMed Central

2012-01-01

Background Transposition in IS3, IS30, IS21 and IS256 insertion sequence (IS) families utilizes an unconventional two-step pathway. A figure-of-eight intermediate in Step I, from asymmetric single-strand cleavage and joining reactions, is converted into a double-stranded minicircle whose junction (the abutted left and right ends) is the substrate for symmetrical transesterification attacks on target DNA in Step II, suggesting intrinsically different synaptic complexes (SC) for each step. Transposases of these ISs bind poorly to cognate DNA and comparative biophysical analyses of SC I and SC II have proven elusive. We have prepared a native, soluble, active, GFP-tagged fusion derivative of the IS2 transposase that creates fully formed complexes with single-end and minicircle junction (MCJ) substrates and used these successfully in hydroxyl radical footprinting experiments. Results In IS2, Step I reactions are physically and chemically asymmetric; the left imperfect, inverted repeat (IRL), the exclusive recipient end, lacks donor function. In SC I, different protection patterns of the cleavage domains (CDs) of the right imperfect inverted repeat (IRR; extensive in cis) and IRL (selective in trans) at the single active cognate IRR catalytic center (CC) are related to their donor and recipient functions. In SC II, extensive binding of the IRL CD in trans and of the abutted IRR CD in cis at this CC represents the first phase of the complex. An MCJ substrate precleaved at the 3' end of IRR revealed a temporary transition state with the IRL CD disengaged from the protein. We propose that in SC II, sequential 3' cleavages at the bound abutted CDs trigger a conformational change, allowing the IRL CD to complex to its cognate CC, producing the second phase. Corroborating data from enhanced residues and curvature propensity plots suggest that CD to CD interactions in SC I and SC II require IRL to assume a bent structure, to facilitate binding in trans. Conclusions Different
New Sequences with Low Correlation and Large Family Size

NASA Astrophysics Data System (ADS)

Zeng, Fanxin

In direct-sequence code-division multiple-access (DS-CDMA) communication systems and direct-sequence ultra wideband (DS-UWB) radios, sequences with low correlation and large family size are important for reducing multiple access interference (MAI) and accepting more active users, respectively. In this paper, a new collection of families of sequences of length $p^n-1$, which includes three constructions, is proposed. The maximum number of cyclically distinct families without GMW sequences in each construction is $\frac{\phi(p^n-1)}{n}\cdot\frac{\phi(p^m-1)}{m}$, where $p$ is a prime number, $n$ is an even number, and $n=2m$, and these sequences can be binary or polyphase depending upon choice of the parameter $p$. In Construction I, there are $p^n$ distinct sequences within each family and the new sequences have at most $d+2$ nontrivial periodic correlation values $\{-p^m-1, -1, p^m-1, 2p^m-1, \ldots, dp^m-1\}$. In Construction II, the new sequences have large family size $p^{2n}$ and possibly take the nontrivial correlation values in $\{-p^m-1, -1, p^m-1, 2p^m-1, \ldots, (3d-4)p^m-1\}$. In Construction III, the new sequences possess the largest family size $p^{(d-1)n}$ and have at most $2d$ correlation levels $\{-p^m-1, -1, p^m-1, 2p^m-1, \ldots, (2d-2)p^m-1\}$. Three constructions are near-optimal with respect to the Welch bound because the values of their Welch-Ratios are moderate, $WR \approx d$, $WR \approx 3d-4$ and $WR \approx 2d-2$, respectively. Each family in Constructions I, II and III contains a GMW sequence. In addition, Helleseth sequences and Niho sequences are special cases in Constructions I and III, and their restriction conditions to the integers $m$ and $n$, $p^m \not\equiv 2 \pmod 3$ and $n \equiv 0 \pmod 4$, respectively, are removed in our sequences. Our sequences in Construction III include the sequences with Niho type decimation $3\cdot 2^m-2$, too. Finally, some open questions are pointed out and an example that illustrates the performance of these sequences is given.
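For readers unfamiliar with the term, the periodic correlation that this abstract bounds can be computed directly. The sketch below is an assumption-laden illustration, not one of the paper's constructions: it uses a binary m-sequence of length 7 (the case p = 2 with period 2^3 - 1), maps bits to +1/-1, and evaluates the periodic autocorrelation at every cyclic shift. The helper names are hypothetical.

```python
def to_signs(bits):
    """Map binary symbols to +1/-1 (0 -> +1, 1 -> -1)."""
    return [1 - 2 * b for b in bits]


def periodic_correlation(u, v):
    """Periodic cross-correlation of two equal-length +1/-1 sequences
    at every cyclic shift of v."""
    n = len(u)
    return [sum(u[i] * v[(i + shift) % n] for i in range(n)) for shift in range(n)]


# One period of a length-7 binary m-sequence (illustrative choice).
m_seq = [1, 1, 1, 0, 1, 0, 0]
signs = to_signs(m_seq)

autocorr = periodic_correlation(signs, signs)
max_sidelobe = max(abs(c) for c in autocorr[1:])
```

The in-phase value equals the sequence length and every out-of-phase value is -1, which is why -1 appears among the nontrivial correlation values listed in the abstract. Families with many sequences necessarily admit larger magnitudes, which is the trade-off the Welch bound quantifies.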
The first genome sequences of human bocaviruses from Vietnam

PubMed Central

Thanh, Tran Tan; Van, Hoang Minh Tu; Hong, Nguyen Thi Thu; Nhu, Le Nguyen Truc; Anh, Nguyen To; Tuan, Ha Manh; Hien, Ho Van; Tuong, Nguyen Manh; Kien, Trinh Trung; Khanh, Truong Huu; Nhan, Le Nguyen Thanh; Hung, Nguyen Thanh; Chau, Nguyen Van Vinh; Thwaites, Guy; van Doorn, H. Rogier; Tan, Le Van

2017-01-01

As part of an ongoing effort to generate complete genome sequences of hand, foot and mouth disease-causing enteroviruses directly from clinical specimens, two complete coding sequences and two partial genomic sequences of human bocavirus 1 (n=3) and 2 (n=1) were co-amplified and sequenced, representing the first genome sequences of human bocaviruses from Vietnam. The sequences may aid future study aiming at understanding the evolution of the virus. PMID:28090592
Burrow-generated false facies and phantom sequences

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wanless, H.R.; Tagett, M.

Callianassa (=Ophiomorpha) and other burrowers deeply rework shallow marine sequences. Through in-situ reworking, they create false sedimentary facies and stratigraphic sequences. Callianassa's key to effectiveness is that it expels sand and mud from burrow excavations but concentrates coarse material at the base of the burrow complex. Coarse material can be derived by falling into the burrow entrance, by reworking the existing sediment sequence, or by a combination of both. Examples come from shallow marine carbonate environments of south Florida and the Turks and Caicos Islands, British West Indies. Many mudbanks in south Florida are formed as stacks of layered mudstone units 20-100 cm thick. Between events, seagrasses may recolonize, and a burrowing benthic community may repopulate the substrate. The layered mudstone beneath older areas of mudbank flats can gradually be converted to a bioturbated skeletal wackestone by the deep burrowing community. Burrowing also causes mixing of faunal assemblages. On Caicos Bank, an extensive carbonate tidal flat (3-4 m thick) is slowly being transgressed. About 1 m of tidal-flat sequence is eroded at the shoreline. The remaining 2-3 m could be preserved as part of the transgressive sequence. Callianassa burrowing, however, quickly reworks the sequence, replacing tidal-flat sands and muds with marine peloidal and skeletal sediment. Within 100 m of the shoreline, the only evidence of the tidal-flat sequence is a concentration of high-spired gastropods in Callianassa burrows at the base of the Holocene sequence and a few patches of tidal-flat sediment that burrowers missed. What looks like a basal transgressive lag is in fact a biogenic concentrate from in-situ reworking of a now phantom sequence.
Robust temporal alignment of multimodal cardiac sequences

NASA Astrophysics Data System (ADS)

Perissinotto, Andrea; Queirós, Sandro; Morais, Pedro; Baptista, Maria J.; Monaghan, Mark; Rodrigues, Nuno F.; D'hooge, Jan; Vilaça, João. L.; Barbosa, Daniel

2015-03-01

Given the dynamic nature of cardiac function, correct temporal alignment of pre-operative models and intraoperative images is crucial for augmented reality in cardiac image-guided interventions. As such, the current study focuses on the development of an image-based strategy for temporal alignment of multimodal cardiac imaging sequences, such as cine Magnetic Resonance Imaging (MRI) or 3D Ultrasound (US). First, we derive a robust, modality-independent signal from the image sequences, estimated by computing the normalized cross-correlation between each frame in the temporal sequence and the end-diastolic frame. This signal acts as a surrogate for the left-ventricle (LV) volume curve over time, whose variation indicates different temporal landmarks of the cardiac cycle. We then perform the temporal alignment of these surrogate signals derived from MRI and US sequences of the same patient through Dynamic Time Warping (DTW), which allows both sequences to be synchronized. The proposed framework was evaluated in 98 patients who had undergone both 3D+t MRI and US scans. The end-systolic frame could be accurately estimated as the minimum of the image-derived surrogate signal, presenting a relative error of 1.6 +/- 1.9% and 4.0 +/- 4.2% for the MRI and US sequences, respectively, thus supporting its association with key temporal instants of the cardiac cycle. The use of DTW reduces the desynchronization of the cardiac events in MRI and US sequences, allowing multimodal cardiac imaging sequences to be temporally aligned. Overall, a generic, fast and accurate method for temporal synchronization of MRI and US sequences of the same patient was introduced. This approach could be straightforwardly used for the correct temporal alignment of pre-operative MRI information and intra-operative US images.
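The two steps described in this abstract, a normalized cross-correlation (NCC) signal against a reference frame followed by Dynamic Time Warping, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: frames are flattened lists of pixel intensities, the reference (end-diastolic) frame is assumed to be frame 0, the local DTW cost is the absolute difference, and all function names are hypothetical.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equally sized, flattened frames."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0


def surrogate_signal(frames, ref_index=0):
    """NCC of every frame against the reference frame (assumed end-diastolic)."""
    ref = frames[ref_index]
    return [ncc(frame, ref) for frame in frames]


def dtw(s, t):
    """Classic DTW between two 1-D signals; returns (total cost, warping path)."""
    n, m = len(s), len(t)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(s[i - 1] - t[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover which frames are matched to each other.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


# Toy surrogate signals standing in for an MRI and a US sequence (made-up values).
mri_signal = [1.0, 0.8, 0.5, 0.8, 1.0]
us_signal = [1.0, 0.9, 0.8, 0.5, 0.5, 0.8, 1.0]

total_cost, path = dtw(mri_signal, us_signal)
# The abstract estimates end-systole as the minimum of the surrogate signal.
end_systole_mri = min(range(len(mri_signal)), key=mri_signal.__getitem__)
```

Each `(i, j)` pair in `path` says that frame `i` of the first sequence corresponds to frame `j` of the second, which is the frame-to-frame synchronization the abstract refers to.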
Discours polemique, refutation et resolution des sequences conversationnelles (Argumentative Discourse, Refutation and Outcome of Conversational Sequences).

ERIC Educational Resources Information Center

Moeschler, Jacques

1981-01-01

Analyzes the strategies employed in terminating conversational exchanges, with particular attention to argumentative sequences. Examines the features that distinguish these sequences from those that have a transactional character, and discusses the patterns of verbal interaction attendant to negative responses. Societe Nouvelle Didier Erudition,…
Investigation into the sequence structure of 23 Y chromosomal STR loci using massively parallel sequencing.

PubMed

Kwon, So Yeun; Lee, Hwan Young; Kim, Eun Hye; Lee, Eun Young; Shin, Kyoung-Jin

2016-11-01

Next-generation sequencing (NGS) can produce massively parallel sequencing (MPS) data for many targeted regions with a high depth of coverage, suggesting its successful application to the amplicons of forensic genetic markers. In the present study, we evaluated the practical utility of MPS in Y-chromosome short tandem repeat (Y-STR) analysis using a multiplex polymerase chain reaction (PCR) system. The multiplex PCR system simultaneously amplified 24 Y-chromosomal markers, including the PowerPlex® Y23 loci (DYS19, DYS385ab, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439, DYS448, DYS456, DYS458, DYS481, DYS533, DYS549, DYS570, DYS576, DYS635, DYS643, and YGATAH4) and the M175 marker with the small-sized amplicons ranging from 85 to 253 bp. The barcoded libraries for the amplicons of the 24 Y-chromosomal markers were produced using a simplified PCR-based library preparation method and successfully sequenced using MPS on a MiSeq® System with samples from 250 unrelated Korean males. The genotyping concordance between MPS and the capillary electrophoresis (CE) method, as well as the sequence structure of the 23 Y-STRs, were investigated. Three samples exhibited discordance between the MPS and CE results at DYS385, DYS439, and DYS576. There were 12 Y-STR loci that showed sequence variations in the alleles by a fragment size determination, and the most varied alleles occurred in DYS389II with a different sequence structure in the repeat region. The largest increase in gene diversity between the CE and MPS results was in DYS437 at +34.41%. Single nucleotide polymorphisms (SNPs), insertions, and deletions (indels) were observed in the flanking regions of DYS481, DYS576, and DYS385, respectively. Stutter and noise ratios of the 23 Y-STRs using the developed MPS system were also investigated. Based on these results, the MPS analysis system used in this study could facilitate the investigation into the sequences of the 23 Y-STRs in forensic
Discriminative prediction of mammalian enhancers from DNA sequence

PubMed Central

Lee, Dongwon; Karchin, Rachel; Beer, Michael A.

2011-01-01

Accurately predicting regulatory sequences and enhancers in entire genomes is an important but difficult problem, especially in large vertebrate genomes. With the advent of ChIP-seq technology, experimental detection of genome-wide EP300/CREBBP bound regions provides a powerful platform to develop predictive tools for regulatory sequences and to study their sequence properties. Here, we develop a support vector machine (SVM) framework which can accurately identify EP300-bound enhancers using only genomic sequence and an unbiased set of general sequence features. Moreover, we find that the predictive sequence features identified by the SVM classifier reveal biologically relevant sequence elements enriched in the enhancers, but we also identify other features that are significantly depleted in enhancers. The predictive sequence features are evolutionarily conserved and spatially clustered, providing further support of their functional significance. Although our SVM is trained on experimental data, we also predict novel enhancers and show that these putative enhancers are significantly enriched in both ChIP-seq signal and DNase I hypersensitivity signal in the mouse brain and are located near relevant genes. Finally, we present results of comparisons between other EP300/CREBBP data sets using our SVM and uncover sequence elements enriched and/or depleted in the different classes of enhancers. Many of these sequence features play a role in specifying tissue-specific or developmental-stage-specific enhancer activity, but our results indicate that some features operate in a general or tissue-independent manner. In addition to providing a high confidence list of enhancer targets for subsequent experimental investigation, these results contribute to our understanding of the general sequence structure of vertebrate enhancers. PMID:21875935
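The abstract does not list the sequence features fed to the SVM. As an illustration only, the sketch below assumes that normalized k-mer frequencies serve as the "general sequence features" and shows how a trained linear model would score them; `kmer_features`, `linear_score` and the weights are hypothetical, and no SVM training is performed here.

```python
from itertools import product

def kmer_features(seq, k=3):
    """Return normalized k-mer frequencies in lexicographic ACGT order."""
    seq = seq.upper()
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0.0] * len(index)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:  # skip windows containing N or other symbols
            counts[index[kmer]] += 1.0
            total += 1
    return [c / total for c in counts] if total else counts

def linear_score(features, weights, bias=0.0):
    """Decision value of a linear classifier; weights would come from training."""
    return sum(f * w for f, w in zip(features, weights)) + bias

# Usage: 2-mer features of a short toy sequence, scored with made-up weights.
features = kmer_features("ACGTAC", k=2)
score = linear_score(features, [0.1] * len(features))
```

With a linear kernel, large positive or negative weights point to features enriched or depleted in the positive class, which is one way the enriched and depleted elements mentioned in the abstract could be read off a model.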
Neutrality and evolvability of designed protein sequences

NASA Astrophysics Data System (ADS)

Bhattacherjee, Arnab; Biswas, Parbati

2010-07-01

The effect of foldability on a protein's evolvability is analyzed by a two-pronged approach consisting of a self-consistent mean-field theory and Monte Carlo simulations. Theory and simulation models representing protein sequences with binary patterning of amino acid residues compatible with a particular foldability criterion are used. This generalized foldability criterion is derived using the high-temperature cumulant expansion approximating the free energy of folding. The effect of cumulative point mutations on these designed proteins is studied under neutral conditions. Robustness, a protein's ability to tolerate random point mutations, is determined with a selective pressure of stability (ΔΔG) for the theory-designed sequences, which are found to be more robust than the Monte Carlo and mean-field-biased Monte Carlo generated sequences. The results show that this foldability criterion selects viable protein sequences more effectively than the Monte Carlo method, which has a marked effect on how the selective pressure shapes the evolutionary sequence space. These observations may impact de novo sequence design and its applications in protein engineering.
SequenceCEROSENE: a computational method and web server to visualize spatial residue neighborhoods at the sequence level.

PubMed

Heinke, Florian; Bittrich, Sebastian; Kaiser, Florian; Labudde, Dirk

2016-01-01

To understand the molecular function of biopolymers, studying their structural characteristics is of central importance. Graphics programs are often utilized to conceive these properties, but with the increasing number of available structures in databases or structure models produced by automated modeling frameworks this process requires assistance from tools that allow automated structure visualization. In this paper a web server and its underlying method for generating graphical sequence representations of molecular structures is presented. The method, called SequenceCEROSENE (color encoding of residues obtained by spatial neighborhood embedding), retrieves the sequence of each amino acid or nucleotide chain in a given structure and produces a color coding for each residue based on three-dimensional structure information. From this, color-highlighted sequences are obtained, where residue coloring represents three-dimensional residue locations in the structure. This color encoding thus provides a one-dimensional representation, from which spatial interactions, proximity and relations between residues or entire chains can be deduced quickly and solely from color similarity. Furthermore, additional heteroatoms and chemical compounds bound to the structure, like ligands or coenzymes, are processed and reported as well. To provide free access to SequenceCEROSENE, a web server has been implemented that allows generating color codings for structures deposited in the Protein Data Bank or structure models uploaded by the user. Besides retrieving visualizations in popular graphic formats, underlying raw data can be downloaded as well. In addition, the server provides user interactivity with generated visualizations and the three-dimensional structure in question. Color encoded sequences generated by SequenceCEROSENE can help to quickly perceive the general characteristics of a structure of interest (or entire sets of complexes), thus supporting the researcher in the initial
Prediction of Protein Structural Classes for Low-Similarity Sequences Based on Consensus Sequence and Segmented PSSM.

PubMed

Liang, Yunyun; Liu, Sanyang; Zhang, Shengli

2015-01-01

Prediction of protein structural classes for low-similarity sequences is useful for understanding fold patterns, regulation, functions, and interactions of proteins. It is well known that feature extraction is significant to prediction of protein structural class and it mainly uses protein primary sequence, predicted secondary structure sequence, and position-specific scoring matrix (PSSM). Currently, prediction solely based on the PSSM has played a key role in improving the prediction accuracy. In this paper, we propose a novel method called CSP-SegPseP-SegACP by fusing consensus sequence (CS), segmented PsePSSM, and segmented autocovariance transformation (ACT) based on PSSM. Three widely used low-similarity datasets (1189, 25PDB, and 640) are adopted in this paper. Then a 700-dimensional (700D) feature vector is constructed and the dimension is decreased to 224D by using principal component analysis (PCA). To verify the performance of our method, rigorous jackknife cross-validation tests are performed on the 1189, 25PDB, and 640 datasets. Comparison of our results with the existing PSSM-based methods demonstrates that our method achieves favorable and competitive performance. This will offer an important complement to other PSSM-based methods for prediction of protein structural classes for low-similarity sequences.
Impact of sequencing depth and read length on single cell RNA sequencing data of T cells.

PubMed

Rizzetto, Simone; Eltahla, Auda A; Lin, Peijie; Bull, Rowena; Lloyd, Andrew R; Ho, Joshua W K; Venturi, Vanessa; Luciani, Fabio

2017-10-06

Single cell RNA sequencing (scRNA-seq) provides great potential in measuring the gene expression profiles of heterogeneous cell populations. In immunology, scRNA-seq allowed the characterisation of transcript sequence diversity of functionally relevant T cell subsets, and the identification of the full length T cell receptor (TCRαβ), which defines the specificity against cognate antigens. Several factors, e.g. RNA library capture, cell quality, and sequencing output affect the quality of scRNA-seq data. We studied the effects of read length and sequencing depth on the quality of gene expression profiles, cell type identification, and TCRαβ reconstruction, utilising 1,305 single cells from 8 publicly available scRNA-seq datasets, and simulation-based analyses. Gene expression was characterised by an increased number of unique genes identified with short read lengths (<50 bp), but these featured higher technical variability compared to profiles from longer reads. Successful TCRαβ reconstruction was achieved for 6 datasets (81% - 100%) with at least 0.25 million paired-end (PE) reads of length >50 bp, while it failed for datasets with <30 bp reads. Sufficient read length and sequencing depth can control technical noise to enable accurate identification of TCRαβ and gene expression profiles from scRNA-seq data of T cells.
Compact rotary sequencer

NASA Technical Reports Server (NTRS)

Appleberry, W. T.

1980-01-01

Rotary sequencer is assembled from conventional planetary differential gearset and latching mechanism utilizing inputs and outputs which are coaxial. Applications include automated production-line equipment in home appliances and in vehicles.
Numerical classification of coding sequences

NASA Technical Reports Server (NTRS)

Collins, D. W.; Liu, C. C.; Jukes, T. H.

1992-01-01

DNA sequences coding for protein may be represented by counts of nucleotides or codons. A complete reading frame may be abbreviated by its base count, e.g. A76C158G121T74, or with the corresponding codon table, e.g. (AAA)0(AAC)1(AAG)9 ... (TTT)0. We propose that these numerical designations be used to augment current methods of sequence annotation. Because base counts and codon tables do not require revision as knowledge of function evolves, they are well-suited to act as cross-references, for example to identify redundant GenBank entries. These descriptors may be compared, in place of DNA sequences, to extract homologous genes from large databases. This approach permits rapid searching with good selectivity.
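The two numerical designations quoted in this abstract can be produced in a few lines. The sketch below follows the formats shown in the text (A76C158G121T74 and (AAA)0(AAC)1 ... (TTT)0); the function names and the toy reading frame are assumptions for illustration.

```python
from collections import Counter
from itertools import product

def base_count_label(seq):
    """Base-count designation in the A..C..G..T.. format quoted above."""
    counts = Counter(seq.upper())
    return "".join(f"{base}{counts.get(base, 0)}" for base in "ACGT")

def codon_table_label(seq):
    """Codon-count designation, codons listed in lexicographic ACGT order."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    codons = ["".join(p) for p in product("ACGT", repeat=3)]
    return "".join(f"({codon}){counts.get(codon, 0)}" for codon in codons)

# Usage on a toy reading frame (ATG AAG TTT TAA).
frame = "ATGAAGTTTTAA"
base_label = base_count_label(frame)
codon_label = codon_table_label(frame)
```

Because both labels depend only on the sequence itself, two database entries with identical labels are candidates for being redundant copies, which is the cross-referencing use the abstract proposes.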
Simultaneous phylogeny reconstruction and multiple sequence alignment

PubMed Central

Yue, Feng; Shi, Jian; Tang, Jijun

2009-01-01

Background: A phylogeny is the evolutionary history of a group of organisms. To date, sequence data is still the most used data type for phylogenetic reconstruction. Before any sequences can be used for phylogeny reconstruction, they must be aligned, and the quality of the multiple sequence alignment has been shown to affect the quality of the inferred phylogeny. At the same time, all the current multiple sequence alignment programs use a guide tree to produce the alignment and experiments showed that good guide trees can significantly improve the multiple alignment quality. Results: We devise a new algorithm to simultaneously align multiple sequences and search for the phylogenetic tree that leads to the best alignment. We also implemented the algorithm as a C program package, which can handle both DNA and protein data and can take a simple cost model as well as complex substitution matrices, such as PAM250 or BLOSUM62. The performance of the new method is compared with that of other popular multiple sequence alignment tools, including widely used programs such as ClustalW and T-Coffee. Experimental results suggest that this method has good performance in terms of both phylogeny accuracy and alignment quality. Conclusion: We present an algorithm to align multiple sequences and reconstruct the phylogenies that minimize the alignment score, which is based on an efficient algorithm to solve the median problems for three sequences. Our extensive experiments suggest that this method is very promising and can produce high quality phylogenies and alignments. PMID:19208110
Deep Sequencing to Identify the Causes of Viral Encephalitis

PubMed Central

Chan, Benjamin K.; Wilson, Theodore; Fischer, Kael F.; Kriesel, John D.

2014-01-01

Deep sequencing allows for a rapid, accurate characterization of microbial DNA and RNA sequences in many types of samples. Deep sequencing (also called next generation sequencing or NGS) is being developed to assist with the diagnosis of a wide variety of infectious diseases. In this study, seven frozen brain samples from deceased subjects with recent encephalitis were investigated. RNA from each sample was extracted, randomly reverse transcribed and sequenced. The sequence analysis was performed in a blinded fashion and confirmed with pathogen-specific PCR. This analysis successfully identified measles virus sequences in two brain samples and herpes simplex virus type-1 sequences in three brain samples. No pathogen was identified in the other two brain specimens. These results were concordant with pathogen-specific PCR and partially concordant with prior neuropathological examinations, demonstrating that deep sequencing can accurately identify viral infections in frozen brain tissue. PMID:24699691

Sequence conservation on the Y chromosome

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gibson, L.H.; Yang-Feng, L.; Lau, C.

The Y chromosome is present in all mammals and is considered to be essential to sex determination. Despite intense genomic research, only a few genes have been identified and mapped to this chromosome in humans. Several of them, such as SRY and ZFY, have been demonstrated to be conserved and Y-located in other mammals. In order to address the issue of sequence conservation on the Y chromosome, we performed fluorescence in situ hybridization (FISH) with DNA from a human Y cosmid library as a probe to study the Y chromosomes from other mammalian species. Total DNA from 3,000-4,500 cosmid pools was labeled with biotinylated-dUTP and hybridized to metaphase chromosomes. For human and primate preparations, human cot1 DNA was included in the hybridization mixture to suppress the hybridization from repeat sequences. FISH signals were detected on the Y chromosomes of human, gorilla, orangutan and baboon (Old World monkey) and were absent on those of squirrel monkey (New World monkey), Indian muntjac, wood lemming, Chinese hamster, rat and mouse. Since sequence analysis suggested that specific genes, e.g. SRY and ZFY, are conserved between these two groups, the lack of detectable hybridization in the latter group implies either that conservation of the human Y sequences is limited to the Y chromosomes of the great apes and Old World monkeys, or that the size of the syntenic segment is too small to be detected under the resolution of FISH, or that homologous sequences have undergone considerable divergence. Further studies with reduced hybridization stringency are currently being conducted. Our results provide some clues as to Y-sequence conservation across species and demonstrate the limitations of FISH across species with total DNA sequences from a particular chromosome.
Fluorescent signatures for variable DNA sequences

PubMed Central

Rice, John E.; Reis, Arthur H.; Rice, Lisa M.; Carver-Brown, Rachel K.; Wangh, Lawrence J.

2012-01-01

Life abounds with genetic variations writ in sequences that are often only a few hundred nucleotides long. Rapid detection of these variations for identification of genetic diseases, pathogens and organisms has become the mainstay of molecular science and medicine. This report describes a new, highly informative closed-tube polymerase chain reaction (PCR) strategy for analysis of both known and unknown sequence variations. It combines efficient quantitative amplification of single-stranded DNA targets through LATE-PCR with sets of Lights-On/Lights-Off probes that hybridize to their target sequences over a broad temperature range. Contiguous pairs of Lights-On/Lights-Off probes of the same fluorescent color are used to scan hundreds of nucleotides for the presence of mutations. Sets of probes in different colors can be combined in the same tube to analyze even longer single-stranded targets. Each set of hybridized Lights-On/Lights-Off probes generates a composite fluorescent contour, which is mathematically converted to a sequence-specific fluorescent signature. The versatility and broad utility of this new technology are illustrated in this report by characterization of variant sequences in three different DNA targets: the rpoB gene of Mycobacterium tuberculosis, a sequence in the mitochondrial cytochrome C oxidase subunit 1 gene of nematodes and the V3 hypervariable region of the bacterial 16S ribosomal RNA gene. We anticipate widespread use of these technologies for diagnostics, species identification and basic research. PMID:22879378
Computational Identification Of CDR3 Sequence Archetypes Among Immunoglobulin Sequences in Chronic Lymphocytic Leukemia

PubMed Central

Messmer, Bradley T; Raphael, Benjamin J; Aerni, Sarah J; Widhopf, George F; Rassenti, Laura Z; Gribben, John G; Kay, Neil E; Kipps, Thomas J

2009-01-01

The leukemia cells of unrelated patients with chronic lymphocytic leukemia (CLL) display a restricted repertoire of immunoglobulin (Ig) gene rearrangements with preferential usage of certain Ig gene segments. We developed a computational method to rigorously quantify biases in Ig sequence similarity in large patient databases and to identify groups of patients with unusual levels of sequence similarity. We applied our method to sequences from 1577 CLL patients through the CLL Research Consortium (CRC), and identified 67 similarity groups into which roughly 20% of all patients could be assigned. Immunoglobulin light chain class was highly correlated within all groups and light chain gene usage was similar within sets. Surprisingly, over 40% of the identified groups were composed of somatically mutated genes. This study significantly expands the evidence that antigen selection shapes the Ig repertoire in CLL. PMID:18640719
Sequence memory based on coherent spin-interaction neural networks.

PubMed

Xia, Min; Wong, W K; Wang, Zhijie

2014-12-01

Sequence information processing, for instance sequence memory, plays an important role in many functions of the brain. In the workings of the human brain, the steady-state period is alterable. However, in the existing sequence memory models using heteroassociations, the steady-state period cannot be changed during sequence recall. In this work, a novel neural network model for sequence memory with a controllable steady-state period, based on coherent spin-interaction, is proposed. In the proposed model, neurons fire collectively in a phase-coherent manner, which lets a neuron group respond differently to different patterns and also lets different neuron groups respond differently to one pattern. Simulation results demonstrating the performance of the sequence memory are presented. By introducing a new coherent spin-interaction sequence memory model, the steady-state period can be controlled by dimension parameters and the overlap between the input pattern and the stored patterns. The sequence storage capacity is enlarged by coherent spin interaction compared with the existing sequence memory models. Furthermore, the sequence storage capacity has an exponential relationship to the dimension of the neural network.
Towards predicting the encoding capability of MR fingerprinting sequences.

PubMed

Sommer, K; Amthor, T; Doneva, M; Koken, P; Meineke, J; Börnert, P

2017-09-01

Sequence optimization and appropriate sequence selection remain an unmet need in magnetic resonance fingerprinting (MRF). The main challenge in MRF sequence design is the lack of an appropriate measure of the sequence's encoding capability. To find such a measure, three different candidates for judging the encoding capability have been investigated: local and global dot-product-based measures judging dictionary entry similarity as well as a Monte Carlo method that evaluates the noise propagation properties of an MRF sequence. Consistency of these measures for different sequence lengths as well as the capability to predict actual sequence performance in both phantom and in vivo measurements was analyzed. While the dot-product-based measures yielded inconsistent results for different sequence lengths, the Monte Carlo method was in good agreement with phantom experiments. In particular, the Monte Carlo method could accurately predict the performance of different flip angle patterns in actual measurements. The proposed Monte Carlo method provides an appropriate measure of MRF sequence encoding capability and may be used for sequence optimization. Copyright © 2017 Elsevier Inc. All rights reserved.
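The idea of a Monte Carlo noise-propagation measure can be sketched as follows. This is a toy under stated assumptions, not the published method: the "dictionary" here is a set of normalized exponential decays rather than simulated MRF signal evolutions, matching uses the largest absolute dot product, and all names and parameter values are hypothetical.

```python
import math
import random

def normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def toy_dictionary(params, times):
    """One normalized signal per parameter value (toy stand-in for an MRF dictionary)."""
    return [normalize([math.exp(-t / p) for t in times]) for p in params]

def match(signal, dictionary):
    """Index of the dictionary entry with the largest absolute dot product."""
    scores = [abs(sum(a * b for a, b in zip(signal, entry))) for entry in dictionary]
    return max(range(len(dictionary)), key=scores.__getitem__)

def monte_carlo_error(dictionary, params, sigma, trials=200, seed=0):
    """Relative RMS parameter error after adding Gaussian noise and re-matching."""
    rng = random.Random(seed)
    squared_error = 0.0
    for _ in range(trials):
        k = rng.randrange(len(dictionary))
        noisy = [x + rng.gauss(0.0, sigma) for x in dictionary[k]]
        estimate = params[match(noisy, dictionary)]
        squared_error += ((estimate - params[k]) / params[k]) ** 2
    return math.sqrt(squared_error / trials)

# Usage: compare the matching error at two noise levels (hypothetical values).
params = [100.0 * k for k in range(1, 11)]
times = [10.0 * k for k in range(1, 51)]
dictionary = toy_dictionary(params, times)
error_clean = monte_carlo_error(dictionary, params, sigma=0.0)
error_noisy = monte_carlo_error(dictionary, params, sigma=0.05)
```

Under this reading, a sequence design with better encoding capability would yield a smaller matching error at the same noise level, so two candidate designs could be ranked by comparing their errors.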
On the Delta Sequence of the Thue-Morse Sequence

DTIC Science & Technology

2007-02-27

S. Plouffe, B.E. Sagan, A relative of the Thue-Morse sequence, in Formal power series and algebraic combinatorics (Montreal, PQ, 1992), Discrete ... Math. 139, 455–461, 1995. [2] J.-P. Allouche, J. Shallit, The ubiquitous Prouhet-Thue-Morse sequence, In C. Ding, T. Helleseth, and H. Niederreiter
SeqCompress: an algorithm for biological sequence compression.

PubMed

Sardaraz, Muhammad; Tahir, Muhammad; Ikram, Ataul Aziz; Bajwa, Hassan

2014-10-01

The growth of Next Generation Sequencing technologies presents significant research challenges, specifically to design bioinformatics tools that handle massive amounts of data efficiently. Biological sequence data storage cost has become a noticeable proportion of the total cost of data generation and analysis. In particular, the increase in DNA sequencing rate is significantly outstripping the rate of increase in disk storage capacity, so that data volumes may exceed available storage. It is essential to develop algorithms that handle large data sets via better memory management. This article presents a DNA sequence compression algorithm, SeqCompress, that copes with the space complexity of biological sequences. The algorithm is based on lossless data compression and uses a statistical model as well as arithmetic coding to compress DNA sequences. The proposed algorithm is compared with recent specialized compression tools for biological sequences. Experimental results show that the proposed algorithm has better compression gain than other existing algorithms. Copyright © 2014 Elsevier Inc. All rights reserved.
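The abstract names a statistical model combined with arithmetic coding but gives no details of either. The sketch below is therefore a generic illustration, not SeqCompress itself: it uses an adaptive order-k context model with add-one smoothing (an assumption) and reports the ideal code length, the sum of -log2 p over all symbols, which an arithmetic coder driven by the same model would approach to within a few bits.

```python
import math
from collections import defaultdict

def ideal_code_length_bits(seq, order=2, alphabet="ACGT"):
    """Bits needed under an adaptive order-k model (input must use only `alphabet`)."""
    # Every context starts with a count of 1 per symbol (add-one smoothing).
    counts = defaultdict(lambda: {symbol: 1 for symbol in alphabet})
    bits = 0.0
    for i, symbol in enumerate(seq):
        context = seq[max(0, i - order):i]
        table = counts[context]
        probability = table[symbol] / sum(table.values())
        bits += -math.log2(probability)
        table[symbol] += 1  # update the model after coding the symbol
    return bits

# Usage: a highly repetitive toy sequence versus the 2 bits/base baseline.
repetitive = "ACGT" * 50
model_bits = ideal_code_length_bits(repetitive, order=2)
baseline_bits = 2 * len(repetitive)
```

The 2 bits per base baseline corresponds to a fixed code for a four-letter alphabet; the more predictable the sequence is under the model, the further the modeled length falls below it.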
Solid phase sequencing of double-stranded nucleic acids

DOEpatents

Fu, Dong-Jing; Cantor, Charles R.; Koster, Hubert; Smith, Cassandra L.

2002-01-01

This invention relates to methods for detecting and sequencing of target double-stranded nucleic acid sequences, to nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probes comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectroscopy, and the sequence of the target determined from the molecular weights of the fragments. Nucleic acids whose sequences can be determined include nucleic acids in biological samples such as patient biopsies and environmental samples. Probes may be fixed to a solid support such as a hybridization chip to facilitate automated determination of molecular weights and identification of the target sequence.
RSAT 2015: Regulatory Sequence Analysis Tools.

PubMed

Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

2015-07-01

RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Continuous aesthetic judgment of image sequences.

PubMed

Khaw, Mel W; Freedberg, David

2018-05-18

Perceptual judgments are said to be reference-dependent as they change on the basis of recent experiences. Here we quantify sequence effects within two types of aesthetic judgments: (i) individual ratings of single images (during self-paced trials) and (ii) continuous ratings of image sequences. As in the case of known contrast effects, trial-by-trial aesthetic responses are negatively correlated with judgments made toward the preceding image. During continuous judgment, a different type of bias is observed. The onset of change within a sequence introduces a persistent increase in ratings (relative to when the same images are judged in isolation). Furthermore, subjects indicate adjustment patterns and choices that selectively favor sequences that are rich in change. Sequence effects in aesthetic judgments thus differ greatly depending on the continuity and arrangement of presented stimuli. The effects highlighted here are important in understanding sustained aesthetic responses over time, such as those elicited during choreographic and musical arrangements. In contrast, standard measurements of aesthetic responses (over trials) may represent a series of distinct aesthetic experiences (e.g., viewing artworks in a museum). Copyright © 2018 Elsevier B.V. All rights reserved.
SNAD: Sequence Name Annotation-based Designer.

PubMed

Sidorov, Igor A; Reshetov, Denis A; Gorbalenya, Alexander E

2009-08-14

A growing diversity of biological data is tagged with unique identifiers (UIDs) associated with polynucleotides and proteins to ensure efficient computer-mediated data storage, maintenance, and processing. These identifiers, which are not informative for most people, are often substituted by biologically meaningful names in various presentations to facilitate utilization and dissemination of sequence-based knowledge. This substitution is commonly done manually, which can be a tedious exercise prone to mistakes and omissions. Here we introduce SNAD (Sequence Name Annotation-based Designer), which mediates automatic conversion of sequence UIDs (associated with a multiple alignment or phylogenetic tree, or supplied as a plain text list) into biologically meaningful names and acronyms. This conversion is directed by precompiled or user-defined templates that exploit the wealth of annotation available in cognate entries of external databases. Using examples, we demonstrate how this tool can be used to generate names for practical purposes, particularly in virology. A tool for controllable annotation-based conversion of sequence UIDs into biologically meaningful names and acronyms has been developed and placed into service, fostering links between quality of sequence annotation, and efficiency of communication and knowledge dissemination among researchers.
Social intuition as a form of implicit learning: Sequences of body movements are learned less explicitly than letter sequences

PubMed Central

Norman, Elisabeth; Price, Mark C.

2012-01-01

In the current paper, we first evaluate the suitability of traditional serial reaction time (SRT) and artificial grammar learning (AGL) experiments for measuring implicit learning of social signals. We then report the results of a novel sequence learning task which combines aspects of the SRT and AGL paradigms to meet our suggested criteria for how implicit learning experiments can be adapted to increase their relevance to situations of social intuition. The sequences followed standard finite-state grammars. Sequence learning and consciousness of acquired knowledge were compared between 2 groups of 24 participants viewing either sequences of individually presented letters or sequences of body-posture pictures, which were described as series of yoga movements. Participants in both conditions showed above-chance classification accuracy, indicating that sequence learning had occurred in both stimulus conditions. This shows that sequence learning can still be found when learning procedures reflect the characteristics of social intuition. Rule awareness was measured using trial-by-trial evaluation of decision strategy (Dienes & Scott, 2005; Scott & Dienes, 2008). For letters, sequence classification was best on trials where participants reported responding on the basis of explicit rules or memory, indicating some explicit learning in this condition. For body-posture, classification was not above chance on these types of trial, but instead showed a trend to be best on those trials where participants reported that their responses were based on intuition, familiarity, or random choice, suggesting that learning was more implicit. Results therefore indicate that the use of traditional stimuli in research on sequence learning might underestimate the extent to which learning is implicit in domains such as social learning, contributing to ongoing debate about levels of conscious awareness in implicit learning. PMID:22679467
AlignMe—a membrane protein sequence alignment web server

PubMed Central

Stamm, Marcus; Staritzbichler, René; Khafizov, Kamil; Forrest, Lucy R.

2014-01-01

We present a web server for pair-wise alignment of membrane protein sequences, using the program AlignMe. The server makes available two operational modes of AlignMe: (i) sequence to sequence alignment, taking two sequences in fasta format as input, combining information about each sequence from multiple sources and producing a pair-wise alignment (PW mode); and (ii) alignment of two multiple sequence alignments to create family-averaged hydropathy profile alignments (HP mode). For the PW sequence alignment mode, four different optimized parameter sets are provided, each suited to pairs of sequences with a specific similarity level. These settings utilize different types of inputs: (position-specific) substitution matrices, secondary structure predictions and transmembrane propensities from transmembrane predictions or hydrophobicity scales. In the second (HP) mode, each input multiple sequence alignment is converted into a hydrophobicity profile averaged over the provided set of sequence homologs; the two profiles are then aligned. The HP mode enables qualitative comparison of transmembrane topologies (and therefore potentially of 3D folds) of two membrane proteins, which can be useful if the proteins have low sequence similarity. In summary, the AlignMe web server provides user-friendly access to a set of tools for analysis and comparison of membrane protein sequences. Access is available at http://www.bioinfo.mpg.de/AlignMe PMID:24753425
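The HP mode described above can be illustrated with a minimal sketch, assuming the Kyte-Doolittle hydropathy scale and a simple rule that ignores gaps when averaging each alignment column. The function name, the gap handling and the choice of scale are assumptions made for illustration only; the abstract does not say which scale or averaging rule AlignMe applies.

```python
# Kyte-Doolittle hydropathy values (assumed scale; AlignMe may use another).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def averaged_hydropathy_profile(alignment, gap="-"):
    """Return the mean hydropathy of the scored residues in each column.

    `alignment` is a list of equal-length aligned sequences (hypothetical
    input format). Gaps and unknown symbols are skipped; a column with no
    scored residue yields None.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        values = [KD[row[col]] for row in alignment
                  if row[col] != gap and row[col] in KD]
        profile.append(sum(values) / len(values) if values else None)
    return profile


# Toy three-sequence alignment, invented for the example.
msa = ["LIVA-", "LLVAK", "IIVGK"]
print(averaged_hydropathy_profile(msa))
```

Two such family-averaged profiles would then be aligned to each other; that alignment step is not sketched here.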
Alternation blindness in the representation of binary sequences.

PubMed

Yu, Ru Qi; Osherson, Daniel; Zhao, Jiaying

2018-03-01

Binary information is prevalent in the environment and contains 2 distinct outcomes. Binary sequences consist of a mixture of alternation and repetition. Understanding how people perceive such sequences would contribute to a general theory of information processing. In this study, we examined how people process alternation and repetition in binary sequences. Across 4 paradigms involving estimation, working memory, change detection, and visual search, we found that the number of alternations is underestimated compared with repetitions (Experiment 1). Moreover, recall for binary sequences deteriorates as the sequence alternates more (Experiment 2). Changes in bits are also harder to detect as the sequence alternates more (Experiment 3). Finally, visual targets superimposed on bits of a binary sequence take longer to process as alternation increases (Experiment 4). Overall, our results indicate that compared with repetition, alternation in a binary sequence is less salient in the sense of requiring more attention for successful encoding. The current study thus reveals the cognitive constraints in the representation of alternation and provides a new explanation for the overalternation bias in randomness perception. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
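As a small illustration of the two quantities contrasted above, the following sketch counts alternations (adjacent bits that differ) and repetitions (adjacent bits that match) in a binary sequence. The function names and the definition of alternation rate as alternations divided by the number of adjacent pairs are assumptions for illustration, not taken from the study.

```python
def count_transitions(bits):
    """Return (alternations, repetitions) over adjacent pairs of `bits`."""
    alternations = sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    repetitions = max(len(bits) - 1, 0) - alternations
    return alternations, repetitions


def alternation_rate(bits):
    """Fraction of adjacent pairs that alternate, or None if there are no pairs."""
    alternations, repetitions = count_transitions(bits)
    total = alternations + repetitions
    return alternations / total if total else None


for seq in ("0101", "0011", "0000"):
    print(seq, count_transitions(seq), alternation_rate(seq))
```

Under this definition a perfectly alternating sequence has rate 1 and a constant sequence has rate 0.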
Comparison of MR imaging sequences for liver and head and neck interventions: is there a single optimal sequence for all purposes?

PubMed

Boll, Daniel T; Lewin, Jonathan S; Duerk, Jeffrey L; Aschoff, Andrik J; Merkle, Elmar M

2004-05-01

To compare the appropriate pulse sequences for interventional device guidance during magnetic resonance (MR) imaging at 0.2 T and to evaluate the dependence of sequence selection on the anatomic region of the procedure. Using a C-arm 0.2 T system, four interventional MR sequences were applied in 23 liver cases and during MR-guided neck interventions in 13 patients. The imaging protocol consisted of: multislice turbo spin echo (TSE) T2w, sequential-slice fast imaging with steady precession (FISP), a time-reversed version of FISP (PSIF), and FISP with balanced gradients in all spatial directions (True-FISP) sequences. Vessel conspicuity was rated, contrast-to-noise ratio (CNR) was calculated for each sequence, and a differential receiver operating characteristic analysis was performed. Liver findings were detected in 96% using the TSE sequence. PSIF, FISP, and True-FISP imaging showed lesions in 91%, 61%, and 65%, respectively. The TSE sequence offered the best CNR, followed by PSIF imaging. Differential receiver operating characteristic analysis also rated TSE and PSIF to be the superior sequences. Lesions in the head and neck were detected in all cases by TSE and FISP, in 92% using True-FISP, and in 84% using PSIF. True-FISP offered the best CNR, followed by TSE imaging. Vessels appeared bright on FISP and True-FISP imaging and dark on the other sequences. In interventional MR imaging, no single sequence fits all purposes. Image guidance for interventional MR during liver procedures is best achieved by PSIF or TSE, whereas biopsies in the head and neck are best performed using FISP or True-FISP sequences.
Application of Stochastic Labeling with Random-Sequence Barcodes for Simultaneous Quantification and Sequencing of Environmental 16S rRNA Genes.

PubMed

Hoshino, Tatsuhiko; Inagaki, Fumio

2017-01-01

Next-generation sequencing (NGS) is a powerful tool for analyzing environmental DNA and provides a comprehensive molecular view of microbial communities. To obtain the copy number of particular sequences in the NGS library, however, additional quantitative analysis such as quantitative PCR (qPCR) or digital PCR (dPCR) is required. Furthermore, the number of sequences in a sequence library does not always reflect the original copy number of a target gene because of biases caused by PCR amplification, making it difficult to convert the proportion of particular sequences in the NGS library to the copy number using the mass of input DNA. To address this issue, we applied a stochastic labeling approach with random-tag sequences and developed an NGS-based quantification protocol, which enables simultaneous sequencing and quantification of the targeted DNA. This quantitative sequencing (qSeq) is initiated from single-primer extension (SPE) using a primer with a random tag adjacent to the 5' end of the target-specific sequence. During SPE, each DNA molecule is stochastically labeled with the random tag. Subsequently, first-round PCR is conducted, specifically targeting the SPE product, followed by second-round PCR to index for NGS. The number of random tags is determined only during the SPE step and is therefore not affected by the two rounds of PCR that may introduce amplification biases. In the case of 16S rRNA genes, after NGS sequencing and taxonomic classification, the absolute number of each target phylotype's 16S rRNA gene copies can be estimated by Poisson statistics by counting the random tags incorporated at the end of the sequence. To test the feasibility of this approach, the 16S rRNA gene of Sulfolobus tokodaii was subjected to qSeq, which resulted in accurate quantification of 5.0 × 10^3 to 5.0 × 10^4 copies of the 16S rRNA gene. Furthermore, qSeq was applied to mock microbial communities and environmental samples, and the results were comparable to those obtained using digital PCR and
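The Poisson-based estimate mentioned in this record can be sketched as follows. The abstract does not give the formula, so this uses the standard occupancy correction for stochastic labeling as an assumption: if n molecules each draw one tag at random from a pool of M equally likely tags, the expected number of distinct tags observed is about M(1 - exp(-n/M)), which inverts to n = -M ln(1 - U/M) for U observed distinct tags. The function names and the tag length of 8 nucleotides are hypothetical.

```python
import math


def expected_unique_tags(molecules, tag_length):
    """Expected number of distinct tags when `molecules` draw from 4**tag_length tags."""
    pool = 4 ** tag_length
    return pool * (1.0 - math.exp(-molecules / pool))


def estimate_molecules(unique_tags, tag_length):
    """Estimate the number of labeled molecules from the observed distinct tags.

    Assumes every tag in the pool is equally likely (an idealization).
    """
    pool = 4 ** tag_length
    if not 0 <= unique_tags < pool:
        raise ValueError("unique_tags must be in [0, 4**tag_length)")
    # log1p(-x) computes ln(1 - x) accurately for small x.
    return -pool * math.log1p(-unique_tags / pool)


# Hypothetical numbers: 8-nt random tag, 4800 distinct tags observed.
print(round(estimate_molecules(4800, 8)))
```

The estimate exceeds the raw tag count because some molecules are expected to receive the same tag by chance; the correction grows as the number of molecules approaches the size of the tag pool.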
Flavitrack: an annotated database of flavivirus sequences

PubMed Central

Misra, Milind

2009-01-01

Motivation: Properly annotated sequence data for flaviviruses, which cause diseases, such as tick-borne encephalitis (TBE), dengue fever (DF), West Nile (WN) and yellow fever (YF), can aid in the design of antiviral drugs and vaccines to prevent their spread. Flavitrack was designed to help identify conserved sequence motifs, interpret mutational and structural data and track evolution of phenotypic properties. Summary: Flavitrack contains over 590 complete flavivirus genome/protein sequences and information on known mutations and literature references. Each sequence has been manually annotated according to its date and place of isolation, phenotype and lethality. Internal tools are provided to rapidly determine relationships between viruses in Flavitrack and sequences provided by the user. Availability: http://carnot.utmb.edu/flavitrack Contact: chschein@utmb.edu Supplementary information: http://carnot.utmb.edu/flavitrack/B1S1.html PMID:17660525
Initial retrieval sequence and blending strategy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pemwell, D.L.; Grenard, C.E.

1996-09-01

This report documents the initial retrieval sequence and the methodology used to select it. Waste retrieval, storage, pretreatment and vitrification were modeled for candidate single-shell tank retrieval sequences. Performance of the sequences was measured by a set of metrics (for example, high-level waste glass volume, relative risk and schedule). Computer models were used to evaluate estimated glass volumes, process rates, retrieval dates, and blending strategy effects. The models were based on estimates of component inventories and concentrations, sludge wash factors and timing, retrieval annex limitations, etc.
Adenine specific DNA chemical sequencing reaction.

PubMed Central

Iverson, B L; Dervan, P B

1987-01-01

Reaction of DNA with K2PdCl4 at pH 2.0 followed by a piperidine workup produces specific cleavage at adenine (A) residues. Product analysis revealed the K2PdCl4 reaction involves selective depurination at adenine, affording an excision reaction analogous to the other chemical DNA sequencing reactions. Adenine residues methylated at the exocyclic amine (N6) react with lower efficiency than unmethylated adenine in an identical sequence. This simple protocol specific for A may be a useful addition to current chemical sequencing reactions. PMID:3671067
Spreadsheet macros for coloring sequence alignments.

PubMed

Haygood, M G

1993-12-01

This article describes a set of Microsoft Excel macros designed to color amino acid and nucleotide sequence alignments for review and preparation of visual aids. The colored alignments can then be modified to emphasize features of interest. Procedures for importing and coloring sequences are described. The macro file adds a new menu to the menu bar containing sequence-related commands to enable users unfamiliar with Excel to use the macros more readily. The macros were designed for use with Macintosh computers but will also run with the DOS version of Excel.
