Science.gov

Sample records for activity sequence analysis

  1. High-resolution mapping and transcriptional activity analysis of chicken centromere sequences on giant lampbrush chromosomes.

    PubMed

    Krasikova, Alla; Fukagawa, Tatsuo; Zlotina, Anna

    2012-12-01

    Exploration into morphofunctional organisation of centromere DNA sequences is important for understanding the mechanisms of kinetochore specification and assembly. In-depth epigenetic analysis of DNA fragments associated with centromeric nucleosome proteins has demonstrated unique features of centromere organisation in the chicken karyotype: there are both mature centromeres, which comprise chromosome-specific homogeneous arrays of tandem repeats, and recently evolved primitive centromeres, which consist of non-tandemly organised DNA sequences. In this work, we describe the arrangement and transcriptional activity of chicken centromere repeats for Cen1, Cen2, Cen3, Cen4, Cen7, Cen8, and Cen11 and non-repetitive centromere sequences of chromosomes 5, 27, and Z using highly elongated lampbrush chromosomes, which are characteristic of the diplotene stage of oogenesis. The degree of chromatin packaging and the fine spatial organisation of tandemly repetitive and non-tandemly repetitive centromeric sequences differ significantly at the lampbrush stage. Using DNA/RNA FISH, we have demonstrated that during the lampbrush stage, DNA sequences are transcribed within the centromere regions of chromosomes that lack centromere-specific tandem repeats. In contrast, chromosome-specific centromeric repeats Cen1, Cen2, Cen3, Cen4, Cen7, Cen8, and Cen11 do not demonstrate any transcriptional activity during the lampbrush stage. In addition, we found that the CNM repeat cluster localises adjacent to non-repetitive centromeric sequences in chicken microchromosome 27, indicating that the centromere region in this chromosome is repeat-rich. Cross-species FISH allowed localisation of the sequences homologous to centromeric DNA of chicken chromosomes 5 and 27 in centromere regions of quail orthologous chromosomes.

  2. Analysis of the relationship between ribosomal DNA ITS sequences and active components in Rhodiola plants.

    PubMed

    Zhang, D J; Yuan, W T; Li, M T; Zhang, Y H

    2016-12-23

    Rhodiola plants are a valuable resource in traditional Chinese medicine. The objective of this study was to evaluate the correlation between ribosomal DNA internal transcribed spacer (ITS) sequences and the three active components in Rhodiola plants. For this, we determined ITS sequence polymorphisms and the concentrations of the active components salidroside, tyrosol, and gallic acid in different Rhodiola species from the Tibetan Plateau. In a total of 23 Rhodiola samples, 16 different haplotypes were defined based on their ITS sequences. Analysis of the active components in these same samples revealed that salidroside was not detected in species with haplotypes H4, H5, or H10, tyrosol was not detected with haplotypes H3, H5, H7, H10, H14, or H15, and gallic acid was detected with all haplotypes except H14 and H15. In addition, the concentrations of salidroside, tyrosol and gallic acid varied between samples with different haplotypes as well as between those with the same haplotype, implying that no significant correlation exists between haplotype and salidroside, tyrosol or gallic acid concentrations. However, a statistically significant positive correlation was observed among these three active components.
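
The abstract above does not say how haplotypes were defined from the ITS sequences. As a minimal sketch, assuming that samples with identical aligned sequences share a haplotype, the grouping could look like the following. The function name `assign_haplotypes`, the sample names, and the toy sequences are all hypothetical and are not taken from the study.

```python
def assign_haplotypes(sequences: dict[str, str]) -> dict[str, str]:
    """Label each sample with a haplotype; identical sequences share a label.

    Assumption (not from the abstract): a haplotype is one distinct aligned
    sequence, compared case-insensitively, labelled H1, H2, ... in the order
    in which new sequences are first encountered.
    """
    labels: dict[str, str] = {}   # distinct sequence -> haplotype label
    result: dict[str, str] = {}   # sample name -> haplotype label
    for sample, seq in sequences.items():
        key = seq.upper()
        if key not in labels:
            labels[key] = f"H{len(labels) + 1}"
        result[sample] = labels[key]
    return result


# Toy data, invented for illustration only.
toy_samples = {"sample_a": "ACGT", "sample_b": "ACGA", "sample_c": "acgt"}
print(assign_haplotypes(toy_samples))
```

With such a mapping in hand, component concentrations can be compared within and between haplotype groups, which is the kind of comparison the abstract reports.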

  3. Sequence analysis, expression, and binding activity of recombinant major outer sheath protein (Msp) of Treponema denticola.

    PubMed Central

    Fenno, J C; Müller, K H; McBride, B C

    1996-01-01

    The gene encoding the major outer sheath protein (Msp) of the oral spirochete Treponema denticola ATCC 35405 was cloned, sequenced, and expressed in Escherichia coli. Preliminary sequence analysis showed that the 5' end of the msp gene was not present on the 5.5-kb cloned fragment described in a recent study (M. Haapasalo, K. H. Müller, V. J. Uitto, W. K. Leung, and B. C. McBride, Infect. Immun. 60:2058-2065,1992). The 5' end of msp was obtained by PCR amplification from a T. denticola genomic library, and an open reading frame of 1,629 bp was identified as the coding region for Msp by combining overlapping sequences. The deduced peptide consisted of 543 amino acids and had a molecular mass of 58,233 Da. The peptide had a typical prokaryotic signal sequence with a potential cleavage site for signal peptidase 1. Northern (RNA) blot analysis showing the msp transcript to be approximately 1.7 kb was consistent with the identification of a promoter consensus sequence located optimally upstream of msp and a transcription termination signal found downstream of the stop codon. The entire msp sequence was amplified from T. denticola genomic DNA and cloned in E. coli by using a tightly regulated T7 RNA polymerase vector system. Expression of Msp was toxic to E. coli when the entire msp gene was present. High levels of Msp were produced as inclusion bodies when the putative signal peptide sequence was deleted and replaced by a vector-encoded T7 peptide sequence. Recombinant Msp purified to homogeneity from a clone containing the full-length msp gene adhered to immobilized laminin and fibronectin but not to bovine serum albumin. Attachment of recombinant Msp was decreased in the presence of soluble substrate. Attachment of T. denticola to immobilized laminin and fibronectin was increased by pretreatment of the substrate with recombinant Msp. These studies lend further support to the hypothesis that Msp mediates the extracellular matrix binding activity of T. denticola. 
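
The reading-frame figures quoted above can be checked with simple arithmetic: an open reading frame of 1,629 bp holds 1,629 / 3 = 543 codons, which matches the 543 amino acids of the deduced peptide. A minimal sketch of that check follows; `codon_count` is a hypothetical helper name, not something from the paper.

```python
def codon_count(orf_length_bp: int) -> int:
    """Return the number of complete codons in an ORF of the given length.

    Raises ValueError if the length is not a whole number of codons.
    """
    if orf_length_bp <= 0 or orf_length_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    return orf_length_bp // 3


# Figures taken from the abstract: 1,629 bp and 543 amino acids.
print(codon_count(1629) == 543)
```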

  4. Qualitative analysis of sequence specific binding of flavones to DNA using restriction endonuclease activity assays.

    PubMed

    Duran, Elizabeth; Ramsauer, Victoria P; Ballester, Maria; Torrenegra, Ruben D; Rodriguez, Oscar E; Winkle, Stephen A

    2013-08-01

    Flavones, found in nature as secondary plant metabolites, have shown efficacy as anti-cancer agents. We have examined the binding of two flavones, 5,7-dihydroxy-3,6,8-trimethoxy-2-phenyl-4H-chromen-4-one (5,7-dihydroxy-3,6,8-trimethoxy flavone; FlavA) and 3,5-dihydroxy-6,7,8-trimethoxy-2-phenyl-4H-chromen-4-one (3,5-dihydroxy-6,7,8-trimethoxy flavone; FlavB), to phiX174 RF DNA using restriction enzyme activity assays employing the restriction enzymes Alw44, AvaII, BssHII, DraI, MluI, NarI, NciI, NruI, PstI, and XhoI. These enzymes possess differing target and flanking sequences, allowing sequence specificity to be observed. Using restriction enzymes that cleave once, together with a mixture of supercoiled and relaxed DNA substrates, allows topological effects on binding to be observed. FlavA and FlavB show differing sequence specificities in their respective binding to phiX. For example, with relaxed DNA, FlavA shows inhibition of cleavage with DraI (reaction site (5') TTTAAA) but not BssHII ((5') GCGCGC) while FlavB shows the opposite results. Evidence for topological specificity is also observed. Molecular modeling and conformational analysis of the flavones suggest that the phenyl ring of FlavB is coplanar with the flavonoid ring while the phenyl ring of FlavA is at an angle relative to the flavonoid ring. This may account for aspects of the observed sequence and topological specificities in the effects on restriction enzyme activity.
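
To make the two recognition sites quoted above concrete, the sketch below locates DraI (TTTAAA) and BssHII (GCGCGC) sites in a DNA string. It is an illustration only: the toy sequence is invented and is not phiX174, `find_sites` is a hypothetical helper, and the search treats the DNA as linear, so on a circular molecule it would miss a site spanning the origin. Both sites are palindromic, so searching one strand is sufficient.

```python
def find_sites(sequence: str, site: str) -> list[int]:
    """Return 0-based start positions of every occurrence of site.

    Overlapping occurrences are reported. The sequence is treated as linear.
    """
    sequence = sequence.upper()
    site = site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


# Recognition sequences as quoted in the abstract.
RECOGNITION_SITES = {"DraI": "TTTAAA", "BssHII": "GCGCGC"}

# Invented toy sequence, for illustration only.
toy_dna = "ACGTTTAAAGGCGCGCGCTT"
for enzyme, site in RECOGNITION_SITES.items():
    print(enzyme, find_sites(toy_dna, site))
```

In an assay like the one described, an enzyme with exactly one hit in the substrate is the convenient case, because loss of that single cut is easy to read out.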

  5. Cloning and sequence analysis of a cDNA clone coding for the mouse GM2 activator protein.

    PubMed Central

    Bellachioma, G; Stirling, J L; Orlacchio, A; Beccari, T

    1993-01-01

    A cDNA (1.1 kb) containing the complete coding sequence for the mouse GM2 activator protein was isolated from a mouse macrophage library using a cDNA for the human protein as a probe. There was a single ATG located 12 bp from the 5' end of the cDNA clone followed by an open reading frame of 579 bp. Northern blot analysis of mouse macrophage RNA showed that there was a single band with a mobility corresponding to a size of 2.3 kb. We deduce from this that the mouse mRNA, in common with the mRNA for the human GM2 activator protein, has a long 3' untranslated sequence of approx. 1.7 kb. Alignment of the mouse and human deduced amino acid sequences showed 68% identity overall and 75% identity for the sequence on the C-terminal side of the first 31 residues, which in the human GM2 activator protein contains the signal peptide. Hydropathicity plots showed great similarity between the mouse and human sequences even in regions of low sequence similarity. There is a single N-glycosylation site in the mouse GM2 activator protein sequence (Asn151-Phe-Thr) which differs in its location from the single site reported in the human GM2 activator protein sequence (Asn63-Val-Thr). PMID:7689829

  6. Teacher Learning through Reciprocal Peer Coaching: An Analysis of Activity Sequences

    ERIC Educational Resources Information Center

    Zwart, R. C.; Wubbels, Th.; Bolhuis, S.; Bergen, Th. C. M.

    2008-01-01

    Just what and how eight experienced teachers in four coaching dyads learned during a 1-year reciprocal peer coaching trajectory was examined in the present study. The learning processes were mapped by providing a detailed description of reported learning activities, reported learning outcomes, and the relations between these two. The sequences of…

  7. Temporal Sequence of Hemispheric Network Activation during Semantic Processing: A Functional Network Connectivity Analysis

    ERIC Educational Resources Information Center

    Assaf, Michal; Jagannathan, Kanchana; Calhoun, Vince; Kraut, Michael; Hart, John, Jr.; Pearlson, Godfrey

    2009-01-01

    To explore the temporal sequence of, and the relationship between, the left and right hemispheres (LH and RH) during semantic memory (SM) processing we identified the neural networks involved in the performance of functional MRI semantic object retrieval task (SORT) using group independent component analysis (ICA) in 47 healthy individuals. SORT…

  8. Advances in sequence analysis.

    PubMed

    Califano, A

    2001-06-01

    In its early days, the entire field of computational biology revolved almost entirely around biological sequence analysis. Over the past few years, however, a number of new non-sequence-based areas of investigation have become mainstream, from the analysis of gene expression data from microarrays, to whole-genome association discovery, and to the reverse engineering of gene regulatory pathways. Nonetheless, with the completion of private and public efforts to map the human genome, as well as those of other organisms, sequence data continue to be a veritable mother lode of valuable biological information that can be mined in a variety of contexts. Furthermore, the integration of sequence data with a variety of alternative information is providing valuable and fundamentally new insight into biological processes, as well as an array of new computational methodologies for the analysis of biological data.

  9. Sequence Analysis and Characterization of Active Human Alu Subfamilies Based on the 1000 Genomes Pilot Project.

    PubMed

    Konkel, Miriam K; Walker, Jerilyn A; Hotard, Ashley B; Ranck, Megan C; Fontenot, Catherine C; Storer, Jessica; Stewart, Chip; Marth, Gabor T; Batzer, Mark A

    2015-08-29

    The goal of the 1000 Genomes Consortium is to characterize human genome structural variation (SV), including forms of copy number variations such as deletions, duplications, and insertions. Mobile element insertions, particularly Alu elements, are major contributors to genomic SV among humans. During the pilot phase of the project we experimentally validated 645 (611 intergenic and 34 exon targeted) polymorphic "young" Alu insertion events, absent from the human reference genome. Here, we report high resolution sequencing of 343 (322 unique) recent Alu insertion events, along with their respective target site duplications, precise genomic breakpoint coordinates, subfamily assignment, percent divergence, and estimated A-rich tail lengths. All the sequenced Alu loci were derived from the AluY lineage with no evidence of retrotransposition activity involving older Alu families (e.g., AluJ and AluS). AluYa5 is currently the most active Alu subfamily in the human lineage, followed by AluYb8, and many others including three newly identified subfamilies we have termed AluYb7a3, AluYb8b1, and AluYa4a1. This report provides the structural details of 322 unique Alu variants from individual human genomes collectively adding about 100 kb of genomic variation. Many Alu subfamilies are currently active in human populations, including a surprising level of AluY retrotransposition. Human Alu subfamilies exhibit continuous evolution with potential drivers sprouting new Alu lineages.

  10. Sequence Analysis and Characterization of Active Human Alu Subfamilies Based on the 1000 Genomes Pilot Project

    PubMed Central

    Konkel, Miriam K.; Walker, Jerilyn A.; Hotard, Ashley B.; Ranck, Megan C.; Fontenot, Catherine C.; Storer, Jessica; Stewart, Chip; Marth, Gabor T.; Batzer, Mark A.

    2015-01-01

    The goal of the 1000 Genomes Consortium is to characterize human genome structural variation (SV), including forms of copy number variations such as deletions, duplications, and insertions. Mobile element insertions, particularly Alu elements, are major contributors to genomic SV among humans. During the pilot phase of the project we experimentally validated 645 (611 intergenic and 34 exon targeted) polymorphic “young” Alu insertion events, absent from the human reference genome. Here, we report high resolution sequencing of 343 (322 unique) recent Alu insertion events, along with their respective target site duplications, precise genomic breakpoint coordinates, subfamily assignment, percent divergence, and estimated A-rich tail lengths. All the sequenced Alu loci were derived from the AluY lineage with no evidence of retrotransposition activity involving older Alu families (e.g., AluJ and AluS). AluYa5 is currently the most active Alu subfamily in the human lineage, followed by AluYb8, and many others including three newly identified subfamilies we have termed AluYb7a3, AluYb8b1, and AluYa4a1. This report provides the structural details of 322 unique Alu variants from individual human genomes collectively adding about 100 kb of genomic variation. Many Alu subfamilies are currently active in human populations, including a surprising level of AluY retrotransposition. Human Alu subfamilies exhibit continuous evolution with potential drivers sprouting new Alu lineages. PMID:26319576

  11. Magnetic Activity Analysis for a Sample of G-type Main Sequence Kepler Targets

    NASA Astrophysics Data System (ADS)

    Mehrabi, Ahmad; He, Han; Khosroshahi, Habib

    2017-01-01

    The variation of a stellar light curve owing to rotational modulation by magnetic features (starspots and faculae) on the star’s surface can be used to investigate the magnetic properties of the host star. In this paper, we use the periodicity and magnitude of the light-curve variation as two proxies to study the stellar magnetic properties for a large sample of G-type main sequence Kepler targets, for which the rotation periods were recently determined. By analyzing the correlation between the two magnetic proxies, it is found that: (1) the two proxies are positively correlated for most of the stars in our sample, and the percentages of negative, zero, and positive correlations are 4.27%, 6.81%, and 88.91%, respectively; (2) negative correlation stars cannot have a large magnitude of light-curve variation; and (3) with the increase of rotation period, the relative number of positive correlation stars decreases and the negative correlation one increases. These results indicate that stars with shorter rotation period tend to have positive correlation between the two proxies, and a good portion of the positive correlation stars have a larger magnitude of light-curve variation (and hence more intense magnetic activities) than negative correlation stars.

  12. An epistemological analysis of the evolution of didactical activities in teaching-learning sequences: the case of fluids

    NASA Astrophysics Data System (ADS)

    Psillos, D.

    2004-05-01

    In the present paper we propose a theoretical framework for an epistemological modelling of teaching-learning (didactical) activities, which draws on recent studies of scientific practice. We present and analyse the framework, which includes three categories: namely, Cosmos-Evidence-Ideas (CEI). We also apply this framework in order to model a posteriori the didactical activities included in three successive teaching-learning sequences in the field of fluids, developed gradually by the same researchers over several years under evolving dominant approaches to science teaching and learning (transmission, discovery, constructivist). For each sequence we analyse the planned activities included in student and teacher documents in terms of the CEI model. We deduce the suggested links (or lack of them) between the three categories and discuss the opportunities that students would have during science teaching to link in each sequence the world of theories with real things.

  13. Crystallographic Analysis of Rotavirus NSP2-RNA Complex Reveals Specific Recognition of 5′ GG Sequence for RTPase Activity

    PubMed Central

    Hu, Liya; Chow, Dar-Chone; Patton, John T.; Palzkill, Timothy; Estes, Mary K.

    2012-01-01

    Rotavirus nonstructural protein NSP2, a functional octamer, is critical for the formation of viroplasms, which are exclusive sites for replication and packaging of the segmented double-stranded RNA (dsRNA) rotavirus genome. As a component of replication intermediates, NSP2 is also implicated in various replication-related activities. In addition to sequence-independent single-stranded RNA-binding and helix-destabilizing activities, NSP2 exhibits monomer-associated nucleoside and 5′ RNA triphosphatase (NTPase/RTPase) activities that are mediated by a conserved H225 residue within a narrow enzymatic cleft. Lack of a 5′ γ-phosphate is a common feature of the negative-strand RNA [(−)RNA] of the packaged dsRNA segments in rotavirus. Strikingly, all (−)RNAs (of group A rotaviruses) have a 5′ GG dinucleotide sequence. As the only rotavirus protein with 5′ RTPase activity, NSP2 is implicated in the removal of the γ-phosphate from the rotavirus (−)RNA. To understand how NSP2, despite its sequence-independent RNA-binding property, recognizes (−)RNA to hydrolyze the γ-phosphate within the catalytic cleft, we determined a crystal structure of NSP2 in complex with the 5′ consensus sequence of minus-strand rotavirus RNA. Our studies show that the 5′ GG of the bound oligoribonucleotide interacts extensively with highly conserved residues in the NSP2 enzymatic cleft. Although these residues provide GG-specific interactions, surface plasmon resonance studies suggest that the C-terminal helix and other basic residues outside the enzymatic cleft account for sequence-independent RNA binding of NSP2. A novel observation from our studies, which may have implications in viroplasm formation, is that the C-terminal helix of NSP2 exhibits two distinct conformations and engages in domain-swapping interactions, which result in the formation of NSP2 octamer chains. PMID:22811529

  14. Image analysis for DNA sequencing

    NASA Astrophysics Data System (ADS)

    Palaniappan, Kannappan; Huang, Thomas S.

    1991-07-01

    There is a great deal of interest in automating the process of DNA (deoxyribonucleic acid) sequencing to support the analysis of genomic DNA such as the Human and Mouse Genome projects. In one class of gel-based sequencing protocols autoradiograph images are generated in the final step and usually require manual interpretation to reconstruct the DNA sequence represented by the image. The need to handle a large volume of sequence information necessitates automation of the manual autoradiograph reading step through image analysis in order to reduce the length of time required to obtain sequence data and reduce transcription errors. Various adaptive image enhancement, segmentation and alignment methods were applied to autoradiograph images. The methods are adaptive to the local characteristics of the image such as noise, background signal, or presence of edges. Once the two-dimensional data is converted to a set of aligned one-dimensional profiles waveform analysis is used to determine the location of each band which represents one nucleotide in the sequence. Different classification strategies including a rule-based approach are investigated to map the profile signals, augmented with the original two-dimensional image data as necessary, to textual DNA sequence information.
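
The band-location step described above, finding where each band sits in a one-dimensional lane profile, can be illustrated with a very simple peak picker. This is a minimal sketch under stated assumptions, not the adaptive waveform analysis of the paper: the function name `find_bands`, the fixed intensity threshold, and the toy profile are all hypothetical.

```python
def find_bands(profile: list[float], threshold: float) -> list[int]:
    """Return indices of local maxima in a 1-D profile that exceed threshold.

    A point counts as a band if it is strictly higher than its left
    neighbour and at least as high as its right neighbour, so a flat-topped
    peak is reported once, at its left edge.
    """
    bands = []
    for i in range(1, len(profile) - 1):
        is_peak = profile[i] > profile[i - 1] and profile[i] >= profile[i + 1]
        if is_peak and profile[i] > threshold:
            bands.append(i)
    return bands


# Invented lane profile: two clear bands and one bump below the threshold.
lane = [0.1, 0.2, 0.9, 0.3, 0.1, 0.7, 0.8, 0.2, 0.1, 0.15, 0.1]
print(find_bands(lane, threshold=0.5))
```

A real autoradiograph reader would also have to cope with noise, uneven background and lane misalignment, which is where the adaptive enhancement and alignment steps mentioned in the abstract come in.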

  15. Cathepsin B from the white shrimp Litopenaeus vannamei: cDNA sequence analysis, tissues-specific expression and biological activity.

    PubMed

    Stephens, A; Rojo, L; Araujo-Bernal, S; Garcia-Carreño, F; Muhlia-Almazan, A

    2012-01-01

    Cathepsin B is a cysteine proteinase scarcely studied in crustaceans. Its function has not been clearly described in shrimp species belonging to the sub-order Dendrobranchiata, which includes the white shrimp Litopenaeus vannamei and other species from the Penaeidae family. Studies on vertebrates suggest that these lysosomal enzymes hydrolyze protein intracellularly, like other cysteine proteinases. However, the expression of the gene encoding the shrimp cathepsin B in the midgut gland was affected by starvation in a similar way to other digestive proteinases which hydrolyze food protein extracellularly. In this study the white shrimp L. vannamei cathepsin B (LvCathB) cDNA was sequenced and characterized. Its gene expression was evaluated in various shrimp tissues, and changes in the mRNA amounts were compared with those observed for other digestive proteinases from the midgut gland during starvation. By using qRT-PCR it was found that LvCathB is expressed in most shrimp tissues except pleopods and eye stalk. Changes in LvCathB mRNA during starvation suggest that the enzyme participates in intracellular protein hydrolysis but also, after food ingestion, in hydrolyzing food proteins extracellularly, as confirmed by the high activity levels we found in the gastric juice and midgut gland of the white shrimp.

  16. Genome sequence and analysis of Lactobacillus helveticus

    PubMed Central

    Cremonesi, Paola; Chessa, Stefania; Castiglioni, Bianca

    2013-01-01

    The microbiological characterization of lactobacilli is historically well developed, but the genomic analysis is recent. Because of the widespread use of Lactobacillus helveticus in cheese technology, information concerning the heterogeneity in this species is accumulating rapidly. Recently, the genomes of five L. helveticus strains were sequenced to completion and compared with other genomically characterized lactobacilli. The genomic analysis of the first sequenced strain, L. helveticus DPC 4571, isolated from cheese and selected for its characteristics of rapid lysis and high proteolytic activity, has revealed a plethora of genes with industrial potential including those responsible for key metabolic functions such as proteolysis, lipolysis, and cell lysis. These genes and their derived enzymes can facilitate the production of cheese and cheese derivatives with potential for use as ingredients in consumer foods. In addition, L. helveticus has the potential to produce peptides with a biological function, such as angiotensin converting enzyme (ACE) inhibitory activity, in fermented dairy products, demonstrating the therapeutic value of this species. A most intriguing feature of the genome of L. helveticus is the remarkable similarity in gene content with many intestinal lactobacilli. Comparative genomics has allowed the identification of key gene sets that facilitate a variety of lifestyles including adaptation to food matrices or the gastrointestinal tract. As genome sequence and functional genomic information continue to accumulate rapidly, key features of the genomes of L. helveticus strains continue to be discovered, answering many questions but also raising many new ones. PMID:23335916

  17. Comparative Analysis of Genome Sequences with VISTA

    DOE Data Explorer

    Dubchak, Inna

    VISTA is a comprehensive suite of programs and databases developed by and hosted at the Genomics Division of Lawrence Berkeley National Laboratory. They provide information and tools designed to facilitate comparative analysis of genomic sequences. Users have two ways to interact with the suite of applications at the VISTA portal. They can submit their own sequences and alignments for analysis (VISTA servers) or examine pre-computed whole-genome alignments of different species. A key menu option is the Enhancer Browser and Database at http://enhancer.lbl.gov/. The VISTA Enhancer Browser is a central resource for experimentally validated human noncoding fragments with gene enhancer activity as assessed in transgenic mice. Most of these noncoding elements were selected for testing based on their extreme conservation with other vertebrates. The results of this enhancer screen are provided through this publicly available website. The browser also features relevant results by external contributors and a large collection of additional genome-wide conserved noncoding elements which are candidate enhancer sequences. The LBL developers invite external groups to submit computational predictions of developmental enhancers. As of 10/19/2009 the database contains information on 1109 in vivo tested elements - 508 elements with enhancer activity.

  18. Dissociation behavior of a bifunctional tempo-active ester reagent for peptide structure analysis by free radical initiated peptide sequencing (FRIPS) mass spectrometry.

    PubMed

    Ihling, Christian; Falvo, Francesco; Kratochvil, Isabel; Sinz, Andrea; Schäfer, Mathias

    2015-02-01

    We have synthesized a homobifunctional active ester cross-linking reagent containing a TEMPO (2,2,6,6-tetramethylpiperidine-1-oxy) moiety connected to a benzyl group (Bz), termed TEMPO-Bz-linker. The aim in designing this novel cross-linker was to facilitate MS analysis of cross-linked products by free radical initiated peptide sequencing (FRIPS). The TEMPO-Bz-linker was reacted with all 20 proteinogenic amino acids as well as with model peptides to gain detailed insights into its fragmentation mechanism upon collision activation. The final goal of this proof-of-principle study was to evaluate the potential of the TEMPO-Bz-linker for chemical cross-linking studies to derive 3D-structure information of proteins. Our studies were motivated by the well documented instability of the central NO-C bond of TEMPO-Bz reagents upon collision activation. The fragmentation of this specific bond was investigated with respect to the charge states and amino acid composition of a large set of precursor ions, resulting in the identification of two distinct fragmentation pathways. Molecular ions with highly basic residues are able to keep the charge carriers (i.e., protons or sodium cations) localized, and consequently decompose via a homolytic cleavage of the NO-C bond of the TEMPO-Bz-linker. This leads to the formation of complementary open-shell peptide radical cations, while precursor ions that are protonated at the TEMPO-Bz-linker itself exhibit a charge-driven formation of even-electron product ions upon collision activation. MS(3) product ion experiments provided amino acid sequence information and allowed the cross-linking site to be determined. Our study fully characterizes the CID behavior of the TEMPO-Bz-linker and demonstrates its potential, but also its limitations, for chemical cross-linking applications utilizing the special features of open-shell peptide ions on the basis of selective tandem MS analysis.

  19. Clonality Analysis of Immunoglobulin Gene Rearrangement by Next-Generation Sequencing in Endemic Burkitt Lymphoma Suggests Antigen Drive Activation of BCR as Opposed to Sporadic Burkitt Lymphoma

    PubMed Central

    Amato, Teresa; Abate, Francesco; Piccaluga, Pierpaolo; Iacono, Michele; Fallerini, Chiara; Renieri, Alessandra; De Falco, Giulia; Ambrosio, Maria Raffaella; Mourmouras, Vaselious; Ogwang, Martin; Calbi, Valeria; Rabadan, Roul; Hummel, Michael; Pileri, Stefano; Bellan, Cristiana

    2016-01-01

    Objectives: Recent studies using next-generation sequencing (NGS) analysis disclosed the importance of the intrinsic activation of the B-cell receptor (BCR) pathway in the pathogenesis of sporadic Burkitt lymphoma (sBL) due to mutations of TCF3/ID3 genes. Since no definitive data are available on the genetic landscape of endemic Burkitt (eBL), we first assessed the mutation frequency of TCF3/ID3 in eBL compared with sBL and subsequently the somatic hypermutation status of the BCR to answer whether an extrinsic activation of BCR signaling could also be demonstrated in Burkitt lymphoma. Methods: We assessed the mutations of TCF3/ID3 by RNAseq and the BCR status by NGS analysis of the immunoglobulin genes (IGs). Results: We detected mutations of TCF3/ID3 in about 30% of the eBL cases. This rate is significantly lower than that detected in sBL (64%). The NGS analysis of IGs revealed intraclonal diversity, suggesting an active targeted somatic hypermutation process in eBL compared with sBL. Conclusions: These findings support the view that the antigenic pressure plays a key role in the pathogenetic pathways of eBL, which may be partially distinct from those driving sBL development. PMID:26712879

  20. RNA sequence analysis using covariance models.

    PubMed Central

    Eddy, S R; Durbin, R

    1994-01-01

    We describe a general approach to several RNA sequence analysis problems using probabilistic models that flexibly describe the secondary structure and primary sequence consensus of an RNA sequence family. We call these models 'covariance models'. A covariance model of tRNA sequences is an extremely sensitive and discriminative tool for searching for additional tRNAs and tRNA-related sequences in sequence databases. A model can be built automatically from an existing sequence alignment. We also describe an algorithm for learning a model and hence a consensus secondary structure from initially unaligned example sequences and no prior structural information. Models trained on unaligned tRNA examples correctly predict tRNA secondary structure and produce high-quality multiple alignments. The approach may be applied to any family of small RNA sequences. PMID:8029015
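
The abstract above does not spell out how covariation between positions is measured. One common way to quantify it, shown here purely as an illustration and not as the algorithm of the paper, is the mutual information between two columns of a multiple alignment: base-paired positions tend to vary together and score high, while conserved or independent positions score near zero. The function name and the toy columns below are hypothetical.

```python
import math
from collections import Counter


def column_mutual_information(col_i: str, col_j: str) -> float:
    """Mutual information, in bits, between two equal-length alignment columns.

    Each string holds the residues observed at one alignment position,
    one character per sequence. Frequencies are plain observed counts
    (no pseudocounts or gap handling; this is a sketch).
    """
    if len(col_i) != len(col_j) or not col_i:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_i)
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi


# Toy columns: perfectly covarying Watson-Crick partners vs. two conserved columns.
print(column_mutual_information("GCAU", "CGUA"))
print(column_mutual_information("GGGG", "CCCC"))
```

For four equally frequent, perfectly paired residues the score reaches the 2-bit maximum for a four-letter alphabet, whereas two fully conserved columns carry no covariation signal at all.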

  1. An Epistemological Analysis of the Evolution of Didactical Activities in Teaching-Learning Sequences: The Case of Fluids. Special Issue

    ERIC Educational Resources Information Center

    Psillos, D.; Tselfes, Vassilis; Kariotoglou, Petros

    2004-01-01

    In the present paper we propose a theoretical framework for an epistemological modelling of teaching-learning (didactical) activities, which draws on recent studies of scientific practice. We present and analyse the framework, which includes three categories: namely, Cosmos-Evidence-Ideas (CEI). We also apply this framework in order to model a…

  2. Characterization and potential of three temperature ranges for hydrogen fermentation of cellulose by means of activity test and 16s rRNA sequence analysis.

    PubMed

    Gadow, Samir I; Jiang, Hongyu; Li, Yu-You

    2016-06-01

    A series of standardized activity experiments was performed to characterize three different temperature ranges of hydrogen fermentation from different carbon sources. 16S rRNA sequence analysis showed that the bacteria were close to the Enterobacter genus in the mesophilic mixed culture (MMC) and to the Thermoanaerobacterium genus in the thermophilic and hyper-thermophilic mixed cultures (TMC and HMC). The MMC was able to utilize glucose and cellulose to produce methane gas within a temperature range between 25 and 45 °C and hydrogen gas from 35 to 60 °C. In contrast, the TMC and HMC produced only hydrogen gas at all temperature ranges, and the highest activity, 521.4 ml H2/(gVSS·d), was obtained by the TMC. The thermodynamic analysis showed that more energy is consumed by hydrogen production from cellulose than from glucose. The experimental results could help to improve the economic feasibility of cellulosic biomass energy using three-phase technology to produce hythane.

  3. Sequencing and comparative analysis of the gorilla MHC genomic sequence.

    PubMed

    Wilming, Laurens G; Hart, Elizabeth A; Coggill, Penny C; Horton, Roger; Gilbert, James G R; Clee, Chris; Jones, Matt; Lloyd, Christine; Palmer, Sophie; Sims, Sarah; Whitehead, Siobhan; Wiley, David; Beck, Stephan; Harrow, Jennifer L

    2013-01-01

    Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC.

  4. Analysis and Annotation of Nucleic Acid Sequence

    SciTech Connect

    States, David J.

    2004-07-28

    The aims of this project were to develop improved methods for computational genome annotation and to apply these methods to improve the annotation of genomic sequence data with a specific focus on human genome sequencing. The project resulted in a substantial body of published work. Notable contributions of this project were the identification of basecalling and lane tracking as error processes in genome sequencing and contributions to improved methods for these steps in genome sequencing. This technology improved the accuracy and throughput of genome sequence analysis. Probabilistic methods for physical map construction were developed. Improved methods for sequence alignment, alternative splicing analysis, promoter identification and NF kappa B response gene prediction were also developed.

  5. Fractal analysis of DNA sequence data

    SciTech Connect

    Berthelsen, C.L.

    1993-01-01

    DNA sequence databases are growing at an almost exponential rate. New analysis methods are needed to extract knowledge about the organization of nucleotides from this vast amount of data. Fractal analysis is a new scientific paradigm that has been used successfully in many domains including the biological and physical sciences. Biological growth is a nonlinear dynamic process and some have suggested that to consider fractal geometry as a biological design principle may be most productive. This research is an exploratory study of the application of fractal analysis to DNA sequence data. A simple random fractal, the random walk, is used to represent DNA sequences. The fractal dimension of these walks is then estimated using the "sandbox method". Analysis of 164 human DNA sequences compared to three types of control sequences (random, base-content matched, and dimer-content matched) reveals that long-range correlations are present in DNA that are not explained by base or dimer frequencies. The study also revealed that the fractal dimension of coding sequences was significantly lower than that of sequences that were primarily noncoding, indicating the presence of longer-range correlations in functional sequences. The multifractal spectrum is used to analyze fractals that are heterogeneous and have a different fractal dimension for subsets with different scalings. The multifractal spectrum of the random walks of twelve mitochondrial genome sequences was estimated. Eight vertebrate mtDNA sequences had uniformly lower spectra values than did four invertebrate mtDNA sequences. Thus, vertebrate mitochondria show significantly longer-range correlations than do invertebrate mitochondria. The higher multifractal spectra values for invertebrate mitochondria suggest a more random organization of the sequences. This research also includes considerable theoretical work on the effects of finite size, embedding dimension, and scaling ranges.
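
    The random-walk representation mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not state which step rule was used, so the purine/pyrimidine mapping below (A and G step up, C and T step down) is an assumption, and the function names are hypothetical. The sketch covers only the walk itself, not the sandbox estimate of fractal dimension.

```python
from itertools import accumulate

# Assumed step rule (not given in the abstract): purines +1, pyrimidines -1.
STEP = {"A": 1, "G": 1, "C": -1, "T": -1}


def dna_walk(seq):
    """Return the cumulative positions of a one-dimensional DNA walk."""
    steps = [STEP[base] for base in seq.upper()]
    return list(accumulate(steps))


def walk_range(seq):
    """Span (max - min) of the walk, a crude measure of how far it wanders."""
    walk = dna_walk(seq)
    return max(walk) - min(walk) if walk else 0


if __name__ == "__main__":
    print(dna_walk("AGCTTA"))
    print(walk_range("AGCTTA"))
```

    A walk that drifts far from zero reflects a local excess of purines or pyrimidines; comparing such walks against shuffled controls is one way to look for correlations beyond base composition.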

  6. Whole-Genome Sequencing in Outbreak Analysis

    PubMed Central

    Turner, Stephen D.; Riley, Margaret F.; Petri, William A.; Hewlett, Erik L.

    2015-01-01

    SUMMARY In addition to the ever-present concern of medical professionals about epidemics of infectious diseases, the relative ease of access and low cost of obtaining, producing, and disseminating pathogenic organisms or biological toxins mean that bioterrorism activity should also be considered when facing a disease outbreak. Utilization of whole-genome sequencing (WGS) in outbreak analysis facilitates the rapid and accurate identification of virulence factors of the pathogen and can be used to identify the path of disease transmission within a population and provide information on the probable source. Molecular tools such as WGS are being refined and advanced at a rapid pace to provide robust and higher-resolution methods for identifying, comparing, and classifying pathogenic organisms. If these methods of pathogen characterization are properly applied, they will enable an improved public health response whether a disease outbreak was initiated by natural events or by accidental or deliberate human activity. The current application of next-generation sequencing (NGS) technology to microbial WGS and microbial forensics is reviewed. PMID:25876885

  7. Scale-PC shielding analysis sequences

    SciTech Connect

    Bowman, S.M.

    1996-05-01

    The SCALE computational system is a modular code system for analyses of nuclear fuel facility and package designs. With the release of SCALE-PC Version 4.3, the radiation shielding analysis community now has the capability to execute the SCALE shielding analysis sequences contained in the control modules SAS1, SAS2, SAS3, and SAS4 on an MS-DOS personal computer (PC). In addition, SCALE-PC includes two new sequences, QADS and ORIGEN-ARP. The capabilities of each sequence are presented, along with example applications.

  8. Next generation sequencing analysis of miRNAs: MiR-127-3p inhibits glioblastoma proliferation and activates TGF-β signaling by targeting SKI.

    PubMed

    Jiang, Huawei; Jin, Chengmeng; Liu, Jie; Hua, Dasong; Zhou, Fan; Lou, Xiaoyan; Zhao, Na; Lan, Qing; Huang, Qiang; Yoon, Jae-Geun; Zheng, Shu; Lin, Biaoyang

    2014-03-01

    Glioblastoma (GBM) proliferation is a multistep process during which the expression levels of many genes that control cell proliferation, cell death, and genetic stability are altered. MicroRNAs (miRNAs) are emerging as important modulators of cellular signaling, including cell proliferation in cancer. In this study, using next generation sequencing analysis of miRNAs, we found that miR-127-3p was downregulated in GBM tissues compared with normal brain tissues; we validated this result by RT-PCR. We further showed that DNA demethylation and histone deacetylase inhibition resulted in downregulation of miR-127-3p. We demonstrated that miR-127-3p overexpression inhibited GBM cell growth by inducing G1-phase arrest both in vitro and in vivo. We showed that miR-127-3p targeted SKI (v-ski sarcoma viral oncogene homolog [avian]), RGMA (RGM domain family, member A), ZWINT (ZW10 interactor, kinetochore protein), SERPINB9 (serpin peptidase inhibitor, clade B [ovalbumin], member 9), and SFRP1 (secreted frizzled-related protein 1). Finally, we found that miR-127-3p suppressed GBM cell growth by inhibiting tumor-promoting SKI and activating the tumor suppression effect of transforming growth factor-β (TGF-β) signaling. This study showed, for the first time, that miR-127-3p and its target gene SKI play important roles in GBM and may serve as potential targets for GBM therapy.

  9. Optimal rotation sequences for active perception

    NASA Astrophysics Data System (ADS)

    Nakath, David; Rachuy, Carsten; Clemens, Joachim; Schill, Kerstin

    2016-05-01

    One major objective of autonomous systems navigating in dynamic environments is gathering information needed for self localization, decision making, and path planning. To account for this, such systems are usually equipped with multiple types of sensors. As these sensors often have a limited field of view and a fixed orientation, the task of active perception reduces to the problem of calculating alignment sequences which maximize the information gain regarding expected measurements. Action sequences that rotate the system according to the calculated optimal patterns then have to be generated. In this paper we present an approach for calculating these sequences for an autonomous system equipped with multiple sensors. We use a particle filter for multi-sensor fusion and state estimation. The planning task is modeled as a Markov decision process (MDP), where the system decides in each step what actions to perform next. The optimal control policy, which provides the best action depending on the current estimated state, maximizes the expected cumulative reward. The latter is computed from the expected information gain of all sensors over time using value iteration. The algorithm is applied to a manifold representation of the joint space of rotation and time. We show the performance of the approach in a spacecraft navigation scenario where the information gain is changing over time, caused by the dynamic environment and the continuous movement of the spacecraft.
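
    As a rough illustration of the value iteration step named in this abstract, the sketch below solves a tiny two-state MDP. Everything concrete here is an assumption made for illustration: the states "A" and "B" stand in for two sensor orientations, the rewards stand in for expected information gain, and the discount factor is arbitrary. It is not the authors' formulation, which works on a manifold of rotation and time.

```python
def value_iteration(states, actions, transition, reward, gamma=0.9, tol=1e-8):
    """Generic value iteration.

    transition[s][a] is a list of (probability, next_state) pairs and
    reward[s][a] is the immediate reward for taking action a in state s.
    Returns the state values and a greedy policy.
    """
    values = {s: 0.0 for s in states}

    def q(s, a):
        # Expected return of taking action a in state s, then following values.
        return reward[s][a] + gamma * sum(p * values[s2] for p, s2 in transition[s][a])

    while True:
        delta = 0.0
        for s in states:
            best = max(q(s, a) for a in actions)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break

    policy = {s: max(actions, key=lambda a: q(s, a)) for s in states}
    return values, policy


# Hypothetical toy problem: orientation "B" yields more information than "A".
STATES = ["A", "B"]
ACTIONS = ["stay", "rotate"]
TRANSITION = {
    "A": {"stay": [(1.0, "A")], "rotate": [(1.0, "B")]},
    "B": {"stay": [(1.0, "B")], "rotate": [(1.0, "A")]},
}
REWARD = {
    "A": {"stay": 0.0, "rotate": 0.5},
    "B": {"stay": 1.0, "rotate": 0.0},
}

if __name__ == "__main__":
    values, policy = value_iteration(STATES, ACTIONS, TRANSITION, REWARD, gamma=0.5)
    print(values, policy)
```

    In this toy setting the greedy policy rotates from "A" to "B" and then stays there, which is the behaviour one would expect when one orientation is simply more informative than the other.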

  10. Auditory sequence analysis and phonological skill

    PubMed Central

    Grube, Manon; Kumar, Sukhbinder; Cooper, Freya E.; Turton, Stuart; Griffiths, Timothy D.

    2012-01-01

    This work tests the relationship between auditory and phonological skill in a non-selected cohort of 238 school students (age 11) with the specific hypothesis that sound-sequence analysis would be more relevant to phonological skill than the analysis of basic, single sounds. Auditory processing was assessed across the domains of pitch, time and timbre; a combination of six standard tests of literacy and language ability was used to assess phonological skill. A significant correlation between general auditory and phonological skill was demonstrated, plus a significant, specific correlation between measures of phonological skill and the auditory analysis of short sequences in pitch and time. The data support a limited but significant link between auditory and phonological ability with a specific role for sound-sequence analysis, and provide a possible new focus for auditory training strategies to aid language development in early adolescence. PMID:22951739

  11. Optimizing cancer genome sequencing and analysis

    PubMed Central

    Griffith, Malachi; Miller, Christopher A.; Griffith, Obi L.; Krysiak, Kilannin; Skidmore, Zachary L.; Ramu, Avinash; Walker, Jason R.; Dang, Ha X.; Trani, Lee; Larson, David E.; Demeter, Ryan T.; Wendl, Michael C.; McMichael, Joshua F.; Austin, Rachel E.; Magrini, Vincent; McGrath, Sean D.; Ly, Amy; Kulkarni, Shashikant; Cordes, Matthew G.; Fronick, Catrina C.; Fulton, Robert S.; Maher, Christopher A.; Ding, Li; Klco, Jeffery M.; Mardis, Elaine R.; Ley, Timothy J.; Wilson, Richard K.

    2015-01-01

    Summary Tumors are typically sequenced to depths of 75–100× (exome) or 30–50× (whole genome). We demonstrate that current sequencing paradigms are inadequate for tumors that are impure, aneuploid or clonally heterogeneous. To reassess optimal sequencing strategies, we performed ultra-deep (up to ~312×) whole genome sequencing (WGS) and exome capture (up to ~433×) of a primary acute myeloid leukemia, its subsequent relapse, and a matched normal skin sample. We tested multiple alignment and variant calling algorithms and validated ~200,000 putative SNVs by sequencing them to depths of ~1,000×. Additional targeted sequencing provided over 10,000× coverage and ddPCR assays provided up to ~250,000× sampling of selected sites. We evaluated the effects of different library generation approaches, depth of sequencing, and analysis strategies on the ability to effectively characterize a complex tumor. This dataset, representing the most comprehensively sequenced tumor described to date, will serve as an invaluable community resource (dbGaP accession id phs000159). PMID:26645048

  12. RSAT 2015: Regulatory Sequence Analysis Tools

    PubMed Central

    Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A.; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M.; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2015-01-01

    RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. PMID:25904632

  13. RSAT 2015: Regulatory Sequence Analysis Tools.

    PubMed

    Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2015-07-01

    RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/.

  14. Phylogenetic analysis of adenovirus sequences.

    PubMed

    Harrach, Balázs; Benko, Mária

    2007-01-01

    Members of the family Adenoviridae have been isolated from a large variety of hosts, including representatives from every major vertebrate class from fish to mammals. The high prevalence, together with the fairly conserved organization of the central part of their genomes, make the adenoviruses one of (if not the) best models for studying viral evolution on a larger time scale. Phylogenetic calculation can infer the evolutionary distance among adenovirus strains on serotype, species, and genus levels, thus helping the establishment of a correct taxonomy on the one hand, and speeding up the process of typing new isolates on the other. Initially, four major lineages corresponding to four genera were recognized. Later, the demarcation criteria of lower taxon levels, such as species or types, could also be defined with phylogenetic calculations. A limited number of possible host switches have been hypothesized and convincingly supported. Application of the web-based BLAST and MultAlin programs and the freely available PHYLIP package, along with the TreeView program, enables everyone to make correct calculations. In addition to step-by-step instruction on how to perform phylogenetic analysis, critical points where typical mistakes or misinterpretation of the results might occur will be identified and hints for their avoidance will be provided.

  15. Sequence analysis by iterated maps, a review.

    PubMed

    Almeida, Jonas S

    2014-05-01

    Among alignment-free methods, Iterated Maps (IMs) are on a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, 'Chaos Game Representation'. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in the same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized for non-genomic alphabets as well as an interest in its use for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there is a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage appears to be now set for a recasting of IMs with a central role in processing nextgen sequencing results.
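
    The Chaos Game Representation named in this abstract can be sketched in a few lines: each nucleotide owns a corner of the unit square, and every step moves halfway from the current point toward the corner of the next nucleotide. The corner assignment and the starting point at the centre below are one common convention and should be read as assumptions; the helper names are hypothetical.

```python
# Assumed corner layout (one common convention for the unit square).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq, start=(0.5, 0.5)):
    """Return the list of CGR coordinates visited while reading seq."""
    x, y = start
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        # Move halfway toward the corner of the current nucleotide.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def grid_cell(point, k):
    """Index of the cell containing point on a 2^k x 2^k grid."""
    size = 2 ** k
    return int(point[0] * size), int(point[1] * size)


if __name__ == "__main__":
    print(cgr_points("AG"))
    # Sequences sharing the suffix "GT" end in the same cell at resolution k=2.
    print(grid_cell(cgr_points("ACGT")[-1], 2), grid_cell(cgr_points("TTGT")[-1], 2))
```

    Because each step halves the distance to a corner, the last k letters of a sequence fix the cell of a 2^k by 2^k grid in which the final point falls, which is the sense in which the map encodes suffixes at every scale.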

  16. Dissociation Behavior of a TEMPO-Active Ester Cross-Linker for Peptide Structure Analysis by Free Radical Initiated Peptide Sequencing (FRIPS) in Negative ESI-MS

    NASA Astrophysics Data System (ADS)

    Hage, Christoph; Ihling, Christian H.; Götze, Michael; Schäfer, Mathias; Sinz, Andrea

    2017-01-01

    We have synthesized a homobifunctional amine-reactive cross-linking reagent, containing a TEMPO (2,2,6,6-tetramethylpiperidine-1-oxy) and a benzyl group (Bz), termed TEMPO-Bz-linker, to derive three-dimensional structural information of proteins. The aim for designing this novel cross-linker was to facilitate the mass spectrometric analysis of cross-linked products by free radical initiated peptide sequencing (FRIPS). In an initial study, we had investigated the fragmentation behavior of TEMPO-Bz-derivatized peptides upon collision activation in (+)-electrospray ionization collision-induced dissociation tandem mass spectrometry (ESI-CID-MS/MS) experiments. In addition to the homolytic NO-C bond cleavage FRIPS pathway delivering the desired odd-electron product ions, an alternative heterolytic NO-C bond cleavage mechanism, resulting in even-electron product ions, was found to be relevant. The latter fragmentation route clearly depends on the protonation of the TEMPO-Bz-moiety itself, which motivated us to conduct (-)-ESI-MS, CID-MS/MS, and MS3 experiments of TEMPO-Bz-cross-linked peptides to further clarify the fragmentation behavior of TEMPO-Bz-peptide molecular ions. We show that the TEMPO-Bz-linker is highly beneficial for conducting FRIPS in negative ionization mode as the desired homolytic cleavage of the NO-C bond is the major fragmentation pathway. Based on characteristic fragments, the isomeric amino acids leucine and isoleucine could be discriminated. Interestingly, we observed pronounced amino acid side chain losses in cross-linked peptides if the cross-linked peptides contain a high number of acidic amino acids.

  17. Dissociation Behavior of a TEMPO-Active Ester Cross-Linker for Peptide Structure Analysis by Free Radical Initiated Peptide Sequencing (FRIPS) in Negative ESI-MS.

    PubMed

    Hage, Christoph; Ihling, Christian H; Götze, Michael; Schäfer, Mathias; Sinz, Andrea

    2017-01-01

    We have synthesized a homobifunctional amine-reactive cross-linking reagent, containing a TEMPO (2,2,6,6-tetramethylpiperidine-1-oxy) and a benzyl group (Bz), termed TEMPO-Bz-linker, to derive three-dimensional structural information of proteins. The aim for designing this novel cross-linker was to facilitate the mass spectrometric analysis of cross-linked products by free radical initiated peptide sequencing (FRIPS). In an initial study, we had investigated the fragmentation behavior of TEMPO-Bz-derivatized peptides upon collision activation in (+)-electrospray ionization collision-induced dissociation tandem mass spectrometry (ESI-CID-MS/MS) experiments. In addition to the homolytic NO-C bond cleavage FRIPS pathway delivering the desired odd-electron product ions, an alternative heterolytic NO-C bond cleavage mechanism, resulting in even-electron product ions, was found to be relevant. The latter fragmentation route clearly depends on the protonation of the TEMPO-Bz-moiety itself, which motivated us to conduct (-)-ESI-MS, CID-MS/MS, and MS(3) experiments of TEMPO-Bz-cross-linked peptides to further clarify the fragmentation behavior of TEMPO-Bz-peptide molecular ions. We show that the TEMPO-Bz-linker is highly beneficial for conducting FRIPS in negative ionization mode as the desired homolytic cleavage of the NO-C bond is the major fragmentation pathway. Based on characteristic fragments, the isomeric amino acids leucine and isoleucine could be discriminated. Interestingly, we observed pronounced amino acid side chain losses in cross-linked peptides if the cross-linked peptides contain a high number of acidic amino acids.

  18. Anatomy of a microearthquake sequence on an active normal fault

    PubMed Central

    Stabile, T. A.; Satriano, C.; Orefice, A.; Festa, G.; Zollo, A.

    2012-01-01

    The analysis of similar earthquakes, such as events in a seismic sequence, is an effective tool with which to monitor and study source processes and to understand the mechanical and dynamic states of active fault systems. We are observing seismicity that is primarily concentrated in very limited regions along the 1980 Irpinia earthquake fault zone in Southern Italy, which is a complex system characterised by an extensional stress regime. These zones of weakness produce repeated earthquakes and swarm-like microearthquake sequences, which are concentrated in a few specific zones of the fault system. In this study, we focused on a sequence that occurred along the main fault segment of the 1980 Irpinia earthquake to understand its characteristics and its relation to the loading-unloading mechanisms of the fault system. PMID:22606366

  19. Sequence analysis of the AAA protein family.

    PubMed Central

    Beyer, A.

    1997-01-01

    The AAA protein family, a recently recognized group of Walker-type ATPases, has been subjected to an extensive sequence analysis. Multiple sequence alignments revealed the existence of a region of sequence similarity, the so-called AAA cassette. The borders of this cassette were localized and within it, three boxes of a high degree of conservation were identified. Two of these boxes could be assigned to substantial parts of the ATP binding site (namely, to Walker motifs A and B); the third may be a portion of the catalytic center. Phylogenetic trees were calculated to obtain insights into the evolutionary history of the family. Subfamilies with varying degrees of intra-relatedness could be discriminated; these relationships are also supported by analysis of sequences outside the canonical AAA boxes: within the cassette are regions that are strongly conserved within each subfamily, whereas little or even no similarity between different subfamilies can be observed. These regions are well suited to define fingerprints for subfamilies. A secondary structure prediction utilizing all available sequence information was performed and the result was fitted to the general 3D structure of a Walker A/GTPase. The agreement was unexpectedly high and strongly supports the conclusion that the AAA family belongs to the Walker superfamily of A/GTPases. PMID:9336829

  20. Information theory applications for biological sequence analysis.

    PubMed

    Vinga, Susana

    2014-05-01

    Information theory (IT) addresses the analysis of communication systems and has been widely applied in molecular biology. In particular, alignment-free sequence analysis and comparison greatly benefited from concepts derived from IT, such as entropy and mutual information. This review covers several aspects of IT applications, ranging from genome global analysis and comparison, including block-entropy estimation and resolution-free metrics based on iterative maps, to local analysis, comprising the classification of motifs, prediction of transcription factor binding sites and sequence characterization based on linguistic complexity and entropic profiles. IT has also been applied to high-level correlations that combine DNA, RNA or protein features with sequence-independent properties, such as gene mapping and phenotype analysis, and has also provided models based on communication systems theory to describe information transmission channels at the cell level and also during evolutionary processes. While not exhaustive, this review attempts to categorize existing methods and to indicate their relation with broader transversal topics such as genomic signatures, data compression and complexity, time series analysis and phylogenetic classification, providing a resource for future developments in this promising area.
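
    The entropy concepts this review refers to can be made concrete with a short sketch. The code below computes the Shannon entropy of the symbol distribution in a sequence and a simple block entropy over overlapping k-mers. The function names are hypothetical, and these plug-in estimates are only the most basic form of the estimators a review of this kind would cover.

```python
from collections import Counter
from math import log2


def shannon_entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    counts = Counter(items)
    total = len(items)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def block_entropy(seq, k):
    """Entropy of the overlapping k-mers (blocks of length k) in seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    return shannon_entropy(kmers)


if __name__ == "__main__":
    print(shannon_entropy("ACGT"))       # four equally frequent symbols
    print(shannon_entropy("AAAA"))       # a single repeated symbol
    print(block_entropy("ACACAC", 2))    # only two distinct 2-mers occur
```

    For a four-letter alphabet the single-symbol entropy is at most 2 bits; values well below that, or block entropies that grow slowly with k, point to compositional bias or repetitive structure.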

  1. GeneQuiz: A workbench for sequence analysis

    SciTech Connect

    Scharf, M.; Schneider, R.; Casari, G.; Bork, P.

    1994-12-31

    We present the prototype of a software system, called GeneQuiz, for large-scale biological sequence analysis. The system was designed to meet the needs that arise in computational sequence analysis and our past experience with the analysis of 171 protein sequences of yeast chromosome III. We explain the cognitive challenges associated with this particular research activity and present our model of the sequence analysis process. The prototype system consists of two parts: (i) the database update and search system (driven by perl programs and rdb, a simple relational database engine also written in perl) and (ii) the visualization and browsing system (developed under C++/ET++). The principal design requirement for the first part was the complete automation of all repetitive actions: database updates, efficient sequence similarity searches and sampling of results in a uniform fashion. The user is then presented with "hit-lists" that summarize the results from heterogeneous database searches. The expert's primary task now simply becomes the further analysis of the candidate entries, where the problem is to extract adequate information about functional characteristics of the query protein rapidly. This second task is tremendously accelerated by a simple combination of the heterogeneous output into uniform relational tables and the provision of browsing mechanisms that give access to database records, sequence entries and alignment views. Indexing of molecular sequence databases provides fast retrieval of individual entries with the use of unique identifiers as well as browsing through databases using pre-existing cross-references. The presentation here covers an overview of the architecture of the system prototype and our experiences on its applicability in sequence analysis.

  2. Sequence analysis by iterated maps, a review

    PubMed Central

    2014-01-01

    Among alignment-free methods, Iterated Maps (IMs) are on a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, ‘Chaos Game Representation’. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in the same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized for non-genomic alphabets as well as an interest in its use for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there is a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage appears to be now set for a recasting of IMs with a central role in processing nextgen sequencing results. PMID:24162172

  3. The 2012 August 11 MW 6.5, 6.4 Ahar-Varzghan earthquakes, NW Iran: aftershock sequence analysis and evidence for activity migration

    NASA Astrophysics Data System (ADS)

    Rezapour, Mehdi

    2016-02-01

    The Ahar-Varzghan doublet earthquakes with magnitudes MW 6.5 and 6.4 occurred on 2012 August 11 in northwest Iran and were followed by many aftershocks. In this paper, we analyse ~5 months of aftershocks of these events. The Ahar-Varzghan earthquakes occurred along complex faults and provide a new constraint on the earthquake hazard in northwest Iran. The general pattern of relocated aftershocks defines a complex seismic zone covering an area of approximately 25 × 10 km². The Ahar-Varzghan aftershock sequence shows a secondary activity which started on November 7, approximately 3 months after the main shocks, with a significant increase in activity, regarding both number of events and their magnitude. This stage was characterized by a seismic zone that propagated to the west of the main shocks. The catalogue of aftershocks for the doublet earthquake has a magnitude completeness of Mc 2.0. A below-average b-value for the Ahar-Varzghan sequence indicates a structural heterogeneity in the fault plane and the compressive stress state of the region. Relocated aftershocks occupy a broad zone clustering east-west with near-vertical dip which we interpret as the fault plane of the first of the doublet main shocks (MW 6.5). The dominant depth range of the aftershocks is from 3 to about 20 km, and the focal depths decrease toward the western part of the fault. The aftershock activity has its highest concentration in the eastern and middle parts of the active fault, and tapers off toward the western part of the active fault segment, indicating mainly a unilateral rupture toward the west.
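
    The b-value and completeness magnitude mentioned in this abstract are commonly related through a maximum-likelihood estimate of the Gutenberg-Richter slope. The abstract does not say how the b-value was obtained, so the estimator below (the Aki form with a half-bin correction) is an assumption, the function name is hypothetical, and the magnitudes in the usage example are invented toy values rather than data from the study.

```python
from math import e, log10


def b_value(magnitudes, mc, bin_width=0.1):
    """Maximum-likelihood b-value for events at or above completeness mc.

    b = log10(e) / (mean(M) - (mc - bin_width / 2)),
    where the half-bin term corrects for magnitudes rounded to bin_width.
    """
    complete = [m for m in magnitudes if m >= mc]
    if not complete:
        raise ValueError("no events at or above the completeness magnitude")
    mean_mag = sum(complete) / len(complete)
    return log10(e) / (mean_mag - (mc - bin_width / 2))


if __name__ == "__main__":
    # Invented toy catalogue; the event below mc = 2.0 is ignored.
    toy_magnitudes = [1.5, 2.0, 2.5, 3.0]
    print(b_value(toy_magnitudes, mc=2.0, bin_width=0.0))
```

    A lower b-value means relatively more large events among those above the completeness magnitude, which is why the choice of Mc matters for the estimate.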

  4. Analysis of cis-sequence of subgenomic transcript promoter from the Figwort mosaic virus and comparison of promoter activity with the cauliflower mosaic virus promoters in monocot and dicot cells.

    PubMed

    Bhattacharyya, Somnath; Dey, Nrisingha; Maiti, Indu B

    2002-12-01

    A sub-genomic transcript (Sgt) promoter was isolated from the Figwort mosaic virus (FMV) genomic clone. The FMV Sgt promoter was linked to heterologous coding sequences to form a chimeric gene construct. The 5'-3'-boundaries required for maximal activity and involvement of cis-sequences for optimal expression in plants were defined by 5'-, 3'-end deletion and internal deletion analysis of FMV Sgt promoter fragments coupled with a beta-glucuronidase reporter gene in both transient protoplast expression experiments and in transgenic plants. A 301 bp FMV Sgt promoter fragment (sequence -270 to +31 from the transcription start site; TSS) provided maximum promoter activity. The TSS of the FMV Sgt promoter was determined by primer extension analysis using total RNA from transgenic plants developed for FMV Sgt promoter: uidA fusion gene. An activator domain located upstream of the TATA box at -70 to -100 from TSS is absolutely required for promoter activity and its function is critically position-dependent with respect to TATA box. Two sequence motifs AGATTTTAAT (coordinates -100 to -91) and GTAAGCGC (coordinates -80 to -73) were found to be essential for promoter activity. The FMV Sgt promoter is less active in monocot cells; FMV Sgt promoter expression level was about 27.5-fold higher in tobacco cells compared to that in maize cells. Comparative expression analysis of FMV Sgt promoter with cauliflower mosaic virus (CaMV) 35S promoter showed that the FMV Sgt promoter is about 2-fold stronger than the CaMV 35S promoter. The FMV Sgt promoter is a constitutive promoter; expression level in seedlings was in the order: root>leaf>stem.
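
    The fragment and motif coordinates in this abstract are consistent with the common convention in which the transcription start site is +1 and there is no position 0. That convention is assumed here, since the record does not state it. A small sketch of the arithmetic, with a hypothetical helper `span_length`:

```python
def span_length(start: int, end: int) -> int:
    """Number of bases from start to end inclusive, in TSS-relative
    coordinates where the TSS is +1 and position 0 does not exist."""
    if start == 0 or end == 0:
        raise ValueError("position 0 does not exist in this convention")
    if start > end:
        raise ValueError("start must not exceed end")
    length = end - start + 1
    if start < 0 < end:
        length -= 1  # the span crosses the TSS, so skip the missing position 0
    return length


if __name__ == "__main__":
    print(span_length(-270, 31))    # promoter fragment
    print(span_length(-100, -91))   # AGATTTTAAT motif
    print(span_length(-80, -73))    # GTAAGCGC motif
```

    Under this convention the fragment from -270 to +31 spans 301 bases, and the two motif ranges span 10 and 8 bases, matching the lengths of the motif strings quoted in the abstract.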

  5. Multilevel analysis of sports video sequences

    NASA Astrophysics Data System (ADS)

    Han, Jungong; Farin, Dirk; de With, Peter H. N.

    2006-01-01

    We propose a fully automatic and flexible framework for analysis and summarization of tennis broadcast video sequences, using visual features and specific game-context knowledge. Our framework can analyze a tennis video sequence at three levels, which provides a broad range of different analysis results. The proposed framework includes novel pixel-level and object-level tennis video processing algorithms, such as a moving-player detection taking both the color and the court (playing-field) information into account, and a player-position tracking algorithm based on a 3-D camera model. Additionally, we employ scene-level models for detecting events, like service, base-line rally and net-approach, based on a number of real-world visual features. The system can summarize three forms of information: (1) all court-view playing frames in a game, (2) the moving trajectory and real-speed of each player, as well as relative position between the player and the court, (3) the semantic event segments in a game. The proposed framework is flexible in choosing the level of analysis that is desired. It is effective because the framework makes use of several visual cues obtained from the real-world domain to model important events like service, thereby increasing the accuracy of the scene-level analysis. The paper presents attractive experimental results highlighting the system efficiency and analysis capabilities.

  6. An RNA Sequencing Transcriptome Analysis Reveals Novel Insights into Molecular Aspects of the Nitrate Impact on the Nodule Activity of Medicago truncatula

    PubMed Central

    Cabeza, Ricardo; Koester, Beke; Liese, Rebecca; Lingner, Annika; Baumgarten, Vanessa; Dirks, Jan; Salinas-Riester, Gabriela; Pommerenke, Claudia; Dittert, Klaus; Schulze, Joachim

    2014-01-01

    The mechanism through which nitrate reduces the activity of legume nodules is controversial. The objective of the study was to follow Medicago truncatula nodule activity after nitrate provision continuously and to identify molecular mechanisms, which down-regulate the activity of the nodules. Nodule H2 evolution started to decline after about 4 h of nitrate application. At that point in time, a strong shift in nodule gene expression (RNA sequencing) had occurred (1,120 differentially expressed genes). The most pronounced effect was the down-regulation of 127 genes for nodule-specific cysteine-rich peptides. Various other nodulins were also strongly down-regulated, in particular all the genes for leghemoglobins. In addition, shifts in the expression of genes involved in cellular iron allocation and mitochondrial ATP synthesis were observed. Furthermore, the expression of numerous genes for the formation of proteins and glycoproteins with no obvious function in nodules (e.g. germins, patatin, and thaumatin) was strongly increased. This occurred in conjunction with an up-regulation of genes for proteinase inhibitors, in particular those containing the Kunitz domain. The additionally formed proteins might possibly be involved in reducing nodule oxygen permeability. Between 4 and 28 h of nitrate exposure, a further reduction in nodule activity occurred, and the number of differentially expressed genes almost tripled. In particular, there was a differential expression of genes connected with emerging senescence. It is concluded that nitrate exerts rapid and manifold effects on nitrogenase activity. A certain degree of nitrate tolerance might be achieved when the down-regulatory effect on late nodulins can be alleviated. PMID:24285852

  7. An RNA sequencing transcriptome analysis reveals novel insights into molecular aspects of the nitrate impact on the nodule activity of Medicago truncatula.

    PubMed

    Cabeza, Ricardo; Koester, Beke; Liese, Rebecca; Lingner, Annika; Baumgarten, Vanessa; Dirks, Jan; Salinas-Riester, Gabriela; Pommerenke, Claudia; Dittert, Klaus; Schulze, Joachim

    2014-01-01

    The mechanism through which nitrate reduces the activity of legume nodules is controversial. The objective of the study was to follow Medicago truncatula nodule activity after nitrate provision continuously and to identify molecular mechanisms, which down-regulate the activity of the nodules. Nodule H2 evolution started to decline after about 4 h of nitrate application. At that point in time, a strong shift in nodule gene expression (RNA sequencing) had occurred (1,120 differentially expressed genes). The most pronounced effect was the down-regulation of 127 genes for nodule-specific cysteine-rich peptides. Various other nodulins were also strongly down-regulated, in particular all the genes for leghemoglobins. In addition, shifts in the expression of genes involved in cellular iron allocation and mitochondrial ATP synthesis were observed. Furthermore, the expression of numerous genes for the formation of proteins and glycoproteins with no obvious function in nodules (e.g. germins, patatin, and thaumatin) was strongly increased. This occurred in conjunction with an up-regulation of genes for proteinase inhibitors, in particular those containing the Kunitz domain. The additionally formed proteins might possibly be involved in reducing nodule oxygen permeability. Between 4 and 28 h of nitrate exposure, a further reduction in nodule activity occurred, and the number of differentially expressed genes almost tripled. In particular, there was a differential expression of genes connected with emerging senescence. It is concluded that nitrate exerts rapid and manifold effects on nitrogenase activity. A certain degree of nitrate tolerance might be achieved when the down-regulatory effect on late nodulins can be alleviated.

  8. RNA sequencing analysis of the developing chicken retina

    PubMed Central

    Langouet-Astrie, Christophe J.; Meinsen, Annamarie L.; Grunwald, Emily R.; Turner, Stephen D.; Enke, Raymond A.

    2016-01-01

    RNA sequencing transcriptome analysis using massively parallel next generation sequencing technology provides the capability to understand global changes in gene expression throughout a range of tissue samples. Development of the vertebrate retina requires complex temporal orchestration of transcriptional activation and repression. The chicken embryo (Gallus gallus) is a classic model system for studying developmental biology and retinogenesis. Existing retinal transcriptome projects have been critical to the vision research community for studying aspects of murine and human retinogenesis, however, there are currently no publicly available data sets describing the developing chicken retinal transcriptome. Here we used Illumina RNA sequencing (RNA-seq) analysis to characterize the mRNA transcriptome of the developing chicken retina in an effort to identify genes critical for retinal development in this important model organism. These data will be valuable to the vision research community for characterizing global changes in gene expression between ocular tissues and critical developmental time points during retinogenesis in the chicken retina. PMID:27996968

  9. Sequence analysis of myostatin promoter in cattle.

    PubMed

    Crisà, A; Marchitelli, C; Savarese, M C; Valentini, A

    2003-01-01

    Myostatin (GDF8) acts as a negative regulator of muscle growth. Mutations in the gene are responsible for the double muscling phenotype in several European cattle breeds. Here we describe the sequence of the upstream 5' region of the myostatin gene. The sequence analysis was carried out on three animals of nine European cattle breeds, with the aim of searching for polymorphisms. A T/A polymorphism at -371 and a G/C polymorphism at -805 (relative to ATG) were found. PCR-RFLP was used to further screen 353 animals of the nine breeds studied and to assess the frequencies of the SNPs. The promoter region of the gene contains several binding sites for transcription factors found also in other myogenic genes. This may play an important role in the regulation of the protein and consequently on muscular development.

  10. Nucleotide sequence analysis of the gene specifying the bifunctional 6'-aminoglycoside acetyltransferase 2"-aminoglycoside phosphotransferase enzyme in Streptococcus faecalis and identification and cloning of gene regions specifying the two activities.

    PubMed

    Ferretti, J J; Gilmore, K S; Courvalin, P

    1986-08-01

    The gene specifying the bifunctional 6'-aminoglycoside acetyltransferase [AAC(6')] 2"-aminoglycoside phosphotransferase [APH(2")] enzyme from the Streptococcus faecalis plasmid pIP800 was cloned in Escherichia coli. A single protein with an apparent molecular weight of 56,000 was specified by this cloned determinant as detected in minicell experiments. Nucleotide sequence analysis revealed the presence of an open reading frame capable of specifying a protein of 479 amino acids and with a molecular weight of 56,850. The deduced amino acid sequence of the bifunctional AAC(6')-APH(2") gene product possessed two regions of homology with other sequenced resistance proteins. The N-terminal region contained a sequence that was homologous to the chloramphenicol acetyltransferase of Bacillus pumilus, and the C-terminal region contained a sequence homologous to the aminoglycoside phosphotransferase of Streptomyces fradiae. Subcloning experiments were performed with the AAC(6')-APH(2") resistance determinant, and it was possible to obtain gene segments independently specifying the acetyltransferase and phosphotransferase activities. These data suggest that the gene specifying the AAC(6')-APH(2") resistance enzyme arose as a result of a gene fusion.

  11. Sequence Analysis of LRPPRC and Its SEC1 Domain Interaction Partners Suggests Roles in Cytoskeletal Organization, Vesicular Trafficking, Nucleocytosolic Shuttling and Chromosome Activity

    PubMed Central

    Liu, Leyuan; McKeehan, Wallace L.

    2011-01-01

    LRPPRC (originally called LRP130) is an intracellular 130-kDa leucine-rich protein that co-purifies with the FGF receptor from liver cell extracts and has been detected in diverse multi-protein complexes from the cell membrane, cytoskeleton and nucleus. Here we report results of a sequence homology analysis of LRPPRC and its SEC1 domain interactive partners. Twenty-three copies of tandem repeats that are similar to PPR, TPR and HEAT repeats characterize the LRPPRC sequence. The N-terminus exhibits multiple copies of leucine-rich nuclear transport signals followed by ENTH, DUF28 and SEC1 homology domains. We used the SEC1 domain to trap interactive partners expressed from a human liver cDNA library. Interactive C19ORF5 (XP_038600) exhibited a strong homology to microtubule-associated proteins (MAP) and a potential arginine-rich mRNA binding motif. UXT (XP_033860) exhibited α-helical properties homologous to the actin-associated spectrin repeat and L/I heptad repeats in mobile transcription factors. C6ORF34 (XP_004305) was homologous to the non-DNA binding C-terminus of the E. coli Rob transcription factor. CECR2 (AAK15343) exhibited a transcription factor AT-hook motif next to two bromodomains and a homology to guanylate-binding protein 1. Taken together these features suggest a regulatory role of LRPPRC and its SEC1 domain-interactive partners in integration of cytoskeletal networks with vesicular trafficking, nucleocytosolic shuttling, chromosome remodeling and transcription. PMID:11827465

  12. Integrating Sequence Evolution into Probabilistic Orthology Analysis.

    PubMed

    Ullah, Ikram; Sjöstrand, Joel; Andersson, Peter; Sennblad, Bengt; Lagergren, Jens

    2015-11-01

    Orthology analysis, that is, finding out whether a pair of homologous genes are orthologs - stemming from a speciation - or paralogs - stemming from a gene duplication - is of central importance in computational biology, genome annotation, and phylogenetic inference. In particular, an orthologous relationship makes functional equivalence of the two genes highly likely. A major approach to orthology analysis is to reconcile a gene tree to the corresponding species tree (most commonly performed using the most parsimonious reconciliation, MPR). However, most such phylogenetic orthology methods infer the gene tree without considering the constraints implied by the species tree and, perhaps even more importantly, only allow the gene sequences to influence the orthology analysis through the a priori reconstructed gene tree. We propose a sound, comprehensive Bayesian Markov chain Monte Carlo-based method, DLRSOrthology, to compute orthology probabilities. It efficiently sums over the possible gene trees and jointly takes into account the current gene tree, all possible reconciliations to the species tree, and the, typically strong, signal conveyed by the sequences. We compare our method with PrIME-GEM, a probabilistic orthology approach built on a probabilistic duplication-loss model, and MrBayesMPR, a probabilistic orthology approach that is based on conventional Bayesian inference coupled with MPR. We find that DLRSOrthology outperforms these competing approaches on synthetic data as well as on biological data sets and is robust to incomplete taxon sampling artifacts.

  13. Multilocus sequence analysis (MLSA) in prokaryotic taxonomy.

    PubMed

    Glaeser, Stefanie P; Kämpfer, Peter

    2015-06-01

    To obtain a higher resolution of the phylogenetic relationships of species within a genus or genera within a family, multilocus sequence analysis (MLSA) is currently a widely used method. In MLSA studies, partial sequences of genes coding for proteins with conserved functions ('housekeeping genes') are used to generate phylogenetic trees and subsequently deduce phylogenies. However, MLSA is not only suggested as a phylogenetic tool to support and clarify the resolution of bacterial species with a higher resolution, as in 16S rRNA gene-based studies, but has also been discussed as a replacement for DNA-DNA hybridization (DDH) in species delineation. Nevertheless, despite the fact that MLSA has become an accepted and widely used method in prokaryotic taxonomy, no common generally accepted recommendations have been devised to date for either the whole area of microbial taxonomy or for taxa-specific applications of individual MLSA schemes. The different ways MLSA is performed can vary greatly for the selection of genes, their number, and the calculation method used when comparing the sequences obtained. Here, we provide an overview of the historical development of MLSA and critically review its current application in prokaryotic taxonomy by highlighting the advantages and disadvantages of the method's numerous variations. This provides a perspective for its future use in forthcoming genome-based genotypic taxonomic analyses.

  14. Analysis of transcriptional and upstream regulatory sequence activity of two environmental stress-inducible genes, NBS-Str1 and BLEC-Str8, of rice.

    PubMed

    Ray, Swatismita; Kapoor, Sanjay; Tyagi, Akhilesh K

    2012-04-01

    Two abiotic stress-inducible upstream regulatory sequences (URSs) from rice have been identified and functionally characterized in rice. NBS-Str1 and BLEC-Str8 genes have been identified, by analysing the transcriptome data of cold, salt and desiccation stress-treated 7-day-old rice (Oryza sativa L. var. IR64) seedling, to be preferentially responsive to desiccation and salt stress, respectively. NBS-Str1 and BLEC-Str8 genes code for putative NBS (nucleotide binding site)-LRR (leucine rich repeat) and β-lectin domain protein, respectively. NBS-Str1 URS is induced in root tissue, preferentially in vascular bundle, during 3 and 24 h of desiccation stress condition in transgenic 7-day-old rice seedling. In mature transgenic plants, this URS shows induction in root and shoot tissue under desiccation stress as well as under prolonged (1 and 2 day) salt stress. BLEC-Str8 URS shows basal activity under un-stressed condition, however, it is inducible under salt stress condition in both root and leaf tissues in young seedling and mature plants. Activity of BLEC-Str8 URS has been found to be vascular tissue preferential, however, under salt stress condition its activity is also found in the mesophyll tissue. NBS-Str1 and BLEC-Str8 URSs are inducible by heavy metal, copper and manganese. Interestingly, both the URSs have been found to be non responsive to ABA treatment, implying them to be part of ABA-independent abiotic stress response pathway. These URSs could prove useful for expressing a transgene in a stress responsive manner for development of stress tolerant transgenic systems.

  15. Mesoscopic Patterns of Neural Activity Support Songbird Cortical Sequences

    PubMed Central

    Guitchounts, Grigori; Velho, Tarciso; Lois, Carlos; Gardner, Timothy J.

    2015-01-01

    Time-locked sequences of neural activity can be found throughout the vertebrate forebrain in various species and behavioral contexts. From “time cells” in the hippocampus of rodents to cortical activity controlling movement, temporal sequence generation is integral to many forms of learned behavior. However, the mechanisms underlying sequence generation are not well known. Here, we describe a spatial and temporal organization of the songbird premotor cortical microcircuit that supports sparse sequences of neural activity. Multi-channel electrophysiology and calcium imaging reveal that neural activity in premotor cortex is correlated with a length scale of 100 µm. Within this length scale, basal-ganglia–projecting excitatory neurons, on average, fire at a specific phase of a local 30 Hz network rhythm. These results show that premotor cortical activity is inhomogeneous in time and space, and that a mesoscopic dynamical pattern underlies the generation of the neural sequences controlling song. PMID:26039895

  16. Mesoscopic patterns of neural activity support songbird cortical sequences.

    PubMed

    Markowitz, Jeffrey E; Liberti, William A; Guitchounts, Grigori; Velho, Tarciso; Lois, Carlos; Gardner, Timothy J

    2015-06-01

    Time-locked sequences of neural activity can be found throughout the vertebrate forebrain in various species and behavioral contexts. From "time cells" in the hippocampus of rodents to cortical activity controlling movement, temporal sequence generation is integral to many forms of learned behavior. However, the mechanisms underlying sequence generation are not well known. Here, we describe a spatial and temporal organization of the songbird premotor cortical microcircuit that supports sparse sequences of neural activity. Multi-channel electrophysiology and calcium imaging reveal that neural activity in premotor cortex is correlated with a length scale of 100 µm. Within this length scale, basal-ganglia-projecting excitatory neurons, on average, fire at a specific phase of a local 30 Hz network rhythm. These results show that premotor cortical activity is inhomogeneous in time and space, and that a mesoscopic dynamical pattern underlies the generation of the neural sequences controlling song.

  17. Draft genome sequence analysis of a Pseudomonas putida W15Oct28 strain with antagonistic activity to Gram-positive and Pseudomonas sp. pathogens.

    PubMed

    Ye, Lumeng; Hildebrand, Falk; Dingemans, Jozef; Ballet, Steven; Laus, George; Matthijs, Sandra; Berendsen, Roeland; Cornelis, Pierre

    2014-01-01

    Pseudomonas putida is a member of the fluorescent pseudomonads known to produce the yellow-green fluorescent pyoverdine siderophore. P. putida W15Oct28, isolated from a stream in Brussels, was found to produce compound(s) with antimicrobial activity against the opportunistic pathogens Staphylococcus aureus, Pseudomonas aeruginosa, and the plant pathogen Pseudomonas syringae, an unusual characteristic for P. putida. The active compound production only occurred in media with low iron content and without organic nitrogen sources. Transposon mutants which lost their antimicrobial activity had the majority of insertions in genes involved in the biosynthesis of pyoverdine, although purified pyoverdine was not responsible for the antagonism. Separation of compounds present in culture supernatants revealed the presence of two fractions containing highly hydrophobic molecules active against P. aeruginosa. Analysis of the draft genome confirmed the presence of putisolvin biosynthesis genes and the corresponding lipopeptides were found to contribute to the antimicrobial activity. One cluster of ten genes was detected, comprising a NAD-dependent epimerase, an acetylornithine aminotransferase, an acyl CoA dehydrogenase, a short chain dehydrogenase, a fatty acid desaturase and three genes for a RND efflux pump. P. putida W15Oct28 genome also contains 56 genes encoding TonB-dependent receptors, conferring a high capacity to utilize pyoverdines from other pseudomonads. One unique feature of W15Oct28 is also the presence of different secretion systems including a full set of genes for type IV secretion, and several genes for type VI secretion and their VgrG effectors.

  18. Bayesian Correlation Analysis for Sequence Count Data

    PubMed Central

    Lau, Nelson; Perkins, Theodore J.

    2016-01-01

    Evaluating the similarity of different measured variables is a fundamental task of statistics, and a key part of many bioinformatics algorithms. Here we propose a Bayesian scheme for estimating the correlation between different entities’ measurements based on high-throughput sequencing data. These entities could be different genes or miRNAs whose expression is measured by RNA-seq, different transcription factors or histone marks whose expression is measured by ChIP-seq, or even combinations of different types of entities. Our Bayesian formulation accounts for both measured signal levels and uncertainty in those levels, due to varying sequencing depth in different experiments and to varying absolute levels of individual entities, both of which affect the precision of the measurements. In comparison with a traditional Pearson correlation analysis, we show that our Bayesian correlation analysis retains high correlations when measurement confidence is high, but suppresses correlations when measurement confidence is low—especially for entities with low signal levels. In addition, we consider the influence of priors on the Bayesian correlation estimate. Perhaps surprisingly, we show that naive, uniform priors on entities’ signal levels can lead to highly biased correlation estimates, particularly when different experiments have widely varying sequencing depths. However, we propose two alternative priors that provably mitigate this problem. We also prove that, like traditional Pearson correlation, our Bayesian correlation calculation constitutes a kernel in the machine learning sense, and thus can be used as a similarity measure in any kernel-based machine learning algorithm. We demonstrate our approach on two RNA-seq datasets and one miRNA-seq dataset. PMID:27701449

  19. Bayesian Correlation Analysis for Sequence Count Data.

    PubMed

    Sánchez-Taltavull, Daniel; Ramachandran, Parameswaran; Lau, Nelson; Perkins, Theodore J

    2016-01-01

    Evaluating the similarity of different measured variables is a fundamental task of statistics, and a key part of many bioinformatics algorithms. Here we propose a Bayesian scheme for estimating the correlation between different entities' measurements based on high-throughput sequencing data. These entities could be different genes or miRNAs whose expression is measured by RNA-seq, different transcription factors or histone marks whose expression is measured by ChIP-seq, or even combinations of different types of entities. Our Bayesian formulation accounts for both measured signal levels and uncertainty in those levels, due to varying sequencing depth in different experiments and to varying absolute levels of individual entities, both of which affect the precision of the measurements. In comparison with a traditional Pearson correlation analysis, we show that our Bayesian correlation analysis retains high correlations when measurement confidence is high, but suppresses correlations when measurement confidence is low, especially for entities with low signal levels. In addition, we consider the influence of priors on the Bayesian correlation estimate. Perhaps surprisingly, we show that naive, uniform priors on entities' signal levels can lead to highly biased correlation estimates, particularly when different experiments have widely varying sequencing depths. However, we propose two alternative priors that provably mitigate this problem. We also prove that, like traditional Pearson correlation, our Bayesian correlation calculation constitutes a kernel in the machine learning sense, and thus can be used as a similarity measure in any kernel-based machine learning algorithm. We demonstrate our approach on two RNA-seq datasets and one miRNA-seq dataset.

  20. Whole genome sequence analysis of Mycobacterium suricattae.

    PubMed

    Dippenaar, Anzaan; Parsons, Sven David Charles; Sampson, Samantha Leigh; van der Merwe, Ruben Gerhard; Drewe, Julian Ashley; Abdallah, Abdallah Musa; Siame, Kabengele Keith; Gey van Pittius, Nicolaas Claudius; van Helden, Paul David; Pain, Arnab; Warren, Robin Mark

    2015-12-01

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  1. Los Alamos sequence analysis package for nucleic acids and proteins.

    PubMed Central

    Kanehisa, M I

    1982-01-01

    An interactive system for computer analysis of nucleic acid and protein sequences has been developed for the Los Alamos DNA Sequence Database. It provides a convenient way to search or verify various sequence features, e.g., restriction enzyme sites, protein coding frames, and properties of coded proteins. Further, the comprehensive analysis package on a large-scale database can be used for comparative studies on sequence and structural homologies in order to find unnoted information stored in nucleic acid sequences. PMID:6174934

  2. A stochastic model for EEG microstate sequence analysis.

    PubMed

    Gärtner, Matthias; Brodbeck, Verena; Laufs, Helmut; Schneider, Gaby

    2015-01-01

    The analysis of spontaneous resting state neuronal activity is assumed to give insight into the brain function. One noninvasive technique to study resting state activity is electroencephalography (EEG) with a subsequent microstate analysis. This technique reduces the recorded EEG signal to a sequence of prototypical topographical maps, which is hypothesized to capture important spatio-temporal properties of the signal. In a statistical EEG microstate analysis of healthy subjects in wakefulness and three stages of sleep, we observed a simple structure in the microstate transition matrix. It can be described with a first order Markov chain in which the transition probability from the current state (i.e., map) to a different map does not depend on the current map. The resulting transition matrix shows a high agreement with the observed transition matrix, requiring only about 2% of mass transport (1/2 L1-distance). In the second part, we introduce an extended framework in which the simple Markov chain is used to make inferences on a potential underlying time continuous process. This process cannot be directly observed and is therefore usually estimated from discrete sampling points of the EEG signal given by the local maxima of the global field power. Therefore, we propose a simple stochastic model called sampled marked intervals (SMI) model that relates the observed sequence of microstates to an assumed underlying process of background intervals and thus, complements approaches that focus on the analysis of observable microstate sequences.
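
    The comparison of an observed and a modelled transition matrix by "mass transport (1/2 L1-distance)" can be sketched as below. The sketch tabulates the relative frequency of each consecutive pair of microstate labels and then takes half the summed absolute difference between two such tables. Normalising over all pairs (rather than per row) and the function names are assumptions made here; the record does not specify these details, and the label sequences are made up.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple

Pair = Tuple[Hashable, Hashable]


def transition_frequencies(states: Sequence[Hashable]) -> Dict[Pair, float]:
    """Relative frequency of each (current, next) pair in a label sequence."""
    pairs = Counter(zip(states, states[1:]))
    total = sum(pairs.values())
    if total == 0:
        raise ValueError("need at least two states to form a transition")
    return {pair: count / total for pair, count in pairs.items()}


def mass_transport(p: Dict[Pair, float], q: Dict[Pair, float]) -> float:
    """Half the L1 distance between two transition-frequency tables."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


if __name__ == "__main__":
    observed = transition_frequencies("AABACADABCA")  # made-up labels
    model = transition_frequencies("ABACADABACA")     # made-up labels
    print(mass_transport(observed, model))
```

    With both tables summing to one, the result lies between 0 (identical tables) and 1 (no shared transitions), so a value near 0.02 would correspond to the roughly 2% reported in the abstract.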

  3. Automated carboxy-terminal sequence analysis of peptides.

    PubMed Central

    Bailey, J. M.; Shenoy, N. R.; Ronk, M.; Shively, J. E.

    1992-01-01

    Proteins and peptides can be sequenced from the carboxy-terminus with isothiocyanate reagents to produce amino acid thiohydantoin derivatives. Previous studies in our laboratory have focused on solution phase conditions for formation of the peptidylthiohydantoins with trimethylsilylisothiocyanate (TMS-ITC) and for hydrolysis of these peptidylthiohydantoins into an amino acid thiohydantoin derivative and a new shortened peptide capable of continued degradation (Bailey, J. M. & Shively, J. E., 1990, Biochemistry 29, 3145-3156). The current study is a continuation of this work and describes the construction of an instrument for automated C-terminal sequencing, the application of the thiocyanate chemistry to peptides covalently coupled to a novel polyethylene solid support (Shenoy, N. R., Bailey, J. M., & Shively, J. E., 1992, Protein Sci. 1, 58-67), the use of sodium trimethylsilanolate as a novel reagent for the specific cleavage of the derivatized C-terminal amino acid, and the development of methodology to sequence through the difficult amino acid, aspartate. Automated programs are described for the C-terminal sequencing of peptides covalently attached to carboxylic acid-modified polyethylene. The chemistry involves activation with acetic anhydride, derivatization with TMS-ITC, and cleavage of the derivatized C-terminal amino acid with sodium trimethylsilanolate. The thiohydantoin amino acid is identified by on-line high performance liquid chromatography using a Phenomenex Ultracarb 5 ODS(30) column and a triethylamine/phosphoric acid buffer system containing pentanesulfonic acid. The generality of our automated C-terminal sequencing methodology was examined by sequencing model peptides containing all 20 of the common amino acids. All of the amino acids were found to sequence in high yield (90% or greater) except for asparagine and aspartate, which could be only partially removed, and proline, which was found not to be capable of derivatization. In spite of these

  4. RIKEN Integrated Sequence Analysis (RISA) System—384-Format Sequencing Pipeline with 384 Multicapillary Sequencer

    PubMed Central

    Shibata, Kazuhiro; Itoh, Masayoshi; Aizawa, Katsunori; Nagaoka, Sumiharu; Sasaki, Nobuya; Carninci, Piero; Konno, Hideaki; Akiyama, Junichi; Nishi, Katsuo; Kitsunai, Tokuji; Tashiro, Hideo; Itoh, Mari; Sumi, Noriko; Ishii, Yoshiyuki; Nakamura, Shin; Hazama, Makoto; Nishine, Tsutomu; Harada, Akira; Yamamoto, Rintaro; Matsumoto, Hiroyuki; Sakaguchi, Sumito; Ikegami, Takashi; Kashiwagi, Katsuya; Fujiwake, Syuji; Inoue, Kouji; Togawa, Yoshiyuki; Izawa, Masaki; Ohara, Eiji; Watahiki, Masanori; Yoneda, Yuko; Ishikawa, Tomokazu; Ozawa, Kaori; Tanaka, Takumi; Matsuura, Shuji; Kawai, Jun; Okazaki, Yasushi; Muramatsu, Masami; Inoue, Yorinao; Kira, Akira; Hayashizaki, Yoshihide

    2000-01-01

    The RIKEN high-throughput 384-format sequencing pipeline (RISA system) including a 384-multicapillary sequencer (the so-called RISA sequencer) was developed for the RIKEN mouse encyclopedia project. The RISA system consists of colony picking, template preparation, sequencing reaction, and the sequencing process. A novel high-throughput 384-format capillary sequencer system (RISA sequencer system) was developed for the sequencing process. This system consists of a 384-multicapillary auto sequencer (RISA sequencer), a 384-multicapillary array assembler (CAS), and a 384-multicapillary casting device. The RISA sequencer can simultaneously analyze 384 independent sequencing products. The optical system is a scanning system chosen after careful comparison with an image detection system for the simultaneous detection of the 384-capillary array. This scanning system can be used with any fluorescent-labeled sequencing reaction (chain termination reaction), including transcriptional sequencing based on RNA polymerase, which was originally developed by us, and cycle sequencing based on thermostable DNA polymerase. For long-read sequencing, 380 out of 384 sequences (99.2%) were successfully analyzed and the average read length, with more than 99% accuracy, was 654.4 bp. A single RISA sequencer can analyze 216 kb with >99% accuracy in 2.7 h (90 kb/h). For short-read sequencing to cluster the 3′ end and 5′ end sequencing by reading 350 bp, 384 samples can be analyzed in 1.5 h. We have also developed a RISA inoculator, RISA filtrator and densitometer, RISA plasmid preparator which can handle throughput of 40,000 samples in 17.5 h, and a high-throughput RISA thermal cycler which has four 384-well sites. The combination of these technologies allowed us to construct the RISA system consisting of 16 RISA sequencers, which can process 50,000 DNA samples per day. One haploid genome shotgun sequence of a higher organism, such as human, mouse, rat, domestic animals, and plants, can

  5. Entropy analysis of substitutive sequences revisited

    NASA Astrophysics Data System (ADS)

    Karamanos, K.

    2001-11-01

    A given finite sequence of letters over a finite alphabet can always be algorithmically generated, in particular by a Turing machine. This fact is at the heart of complexity theory in the sense of Kolmogorov and Chaitin. A relevant question in this context is whether, given a statistically 'sufficiently long' sequence, there exists a deterministic finite automaton that generates it. In this paper we propose a simple criterion, based on measuring block entropies by lumping, which is satisfied by all automatic sequences. On the basis of this, one can determine that a given sequence is not automatic and obtain interesting information when the sequence is automatic. Following previous work on the Feigenbaum sequence, we give a necessary entropy-based condition valid for all automatic sequences read by lumping. Applications of these ideas to representative examples are discussed. In particular, we establish new entropic decimation schemes for the Thue-Morse, the Rudin-Shapiro and the paperfolding sequences read by lumping.
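
    The measurement this abstract refers to, block entropy "by lumping", can be illustrated with a minimal sketch. Lumping is taken here to mean reading the sequence in non-overlapping blocks of length n. The function names `thue_morse` and `block_entropy_lumping` are hypothetical, and the code illustrates only the measurement, not the paper's criterion or its decimation schemes.

```python
import math
from collections import Counter


def thue_morse(length):
    """First `length` symbols of the Thue-Morse sequence as a string of 0s and 1s.

    Symbol i is the parity of the number of 1 bits in the binary form of i.
    """
    return "".join(str(bin(i).count("1") % 2) for i in range(length))


def block_entropy_lumping(seq, n):
    """Shannon entropy, in bits, of the non-overlapping length-n blocks of seq."""
    # Lumping: step through the sequence n symbols at a time (no sliding window).
    blocks = [seq[i:i + n] for i in range(0, len(seq) - n + 1, n)]
    counts = Counter(blocks)
    total = len(blocks)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    seq = thue_morse(2 ** 12)
    for n in (1, 2, 4, 8):
        print(n, block_entropy_lumping(seq, n))
```

    Because the Thue-Morse sequence is the fixed point of the substitution 0 → 01, 1 → 10, lumped blocks of length 2^k take only two distinct values, so the entropy computed this way stays at one bit for those block lengths.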

  6. Direct Chloroplast Sequencing: Comparison of Sequencing Platforms and Analysis Tools for Whole Chloroplast Barcoding

    PubMed Central

    Brozynska, Marta; Furtado, Agnelo; Henry, Robert James

    2014-01-01

    Direct sequencing of total plant DNA using next generation sequencing technologies generates a whole chloroplast genome sequence that has the potential to provide a barcode for use in plant and food identification. Advances in DNA sequencing platforms may make this an attractive approach for routine plant identification. The HiSeq (Illumina) and Ion Torrent (Life Technology) sequencing platforms were used to sequence total DNA from rice to identify polymorphisms in the whole chloroplast genome sequence of a wild rice plant relative to cultivated rice (cv. Nipponbare). Consensus chloroplast sequences were produced by mapping sequence reads to the reference rice chloroplast genome or by de novo assembly and mapping of the resulting contigs to the reference sequence. A total of 122 polymorphisms (SNPs and indels) between the wild and cultivated rice chloroplasts were predicted by these different sequencing and analysis methods. Of these, a total of 102 polymorphisms including 90 SNPs were predicted by both platforms. Indels were more variable with different sequencing methods, with almost all discrepancies found in homopolymers. The Ion Torrent platform gave no apparent false SNP but was less reliable for indels. The methods should be suitable for routine barcoding using appropriate combinations of sequencing platform and data analysis. PMID:25329378

  7. Protegrin structure-activity relationships: using homology models of synthetic sequences to determine structural characteristics important for activity.

    PubMed

    Ostberg, Nathan; Kaznessis, Yiannis

    2005-02-01

    The protegrin family of antimicrobial peptides is among the shortest in sequence length while remaining very active against a variety of microorganisms. The major goal of this study is to characterize easily calculated molecular properties, which quantitatively show high correlation with antibacterial activity. The peptides studied have high sequence similarity but vary in activity over more than an order of magnitude. Hence, sequence analysis alone cannot be used to predict activity for these peptides. We calculate structural properties of 62 protegrin and protegrin-analogue peptides and correlate them to experimental activities against six microbe species, as well as hemolytic and cytotoxic activities. Natural protegrin structures were compared with synthetic derivatives using homology modeling, and property descriptors were calculated to determine the characteristics that confer their antimicrobial activity. A structure-activity relationship study of all these peptides provides information about the structural properties that affect activity against different microbial species.

  8. Whole exome sequence analysis of Peters anomaly

    PubMed Central

    Weh, Eric; Reis, Linda M.; Happ, Hannah C.; Levin, Alex V.; Wheeler, Patricia G.; David, Karen L.; Carney, Erin; Angle, Brad; Hauser, Natalie

    2015-01-01

    Peters anomaly is a rare form of anterior segment ocular dysgenesis, which can also be associated with additional systemic defects. At this time, the majority of cases of Peters anomaly lack a genetic diagnosis. We performed whole exome sequencing of 27 patients with syndromic or isolated Peters anomaly to search for pathogenic mutations in currently known ocular genes. Among the eight previously recognized Peters anomaly genes, we identified a de novo missense mutation in PAX6, c.155G>A, p.(Cys52Tyr), in one patient. Analysis of 691 additional genes currently associated with a different ocular phenotype identified a heterozygous splicing mutation c.1025+2T>A in TFAP2A, a de novo heterozygous nonsense mutation c.715C>T, p.(Gln239*) in HCCS, a hemizygous mutation c.385G>A, p.(Glu129Lys) in NDP, a hemizygous mutation c.3446C>T, p.(Pro1149Leu) in FLNA, and compound heterozygous mutations c.1422T>A, p.(Tyr474*) and c.2544G>A, p.(Met848Ile) in SLC4A11; all mutations, except for the FLNA and SLC4A11 c.2544G>A alleles, are novel. This is the first study to use whole exome sequencing to discern the genetic etiology of a large cohort of patients with syndromic or isolated Peters anomaly. We report five new genes associated with this condition and suggest screening of TFAP2A and FLNA in patients with Peters anomaly and relevant syndromic features and HCCS, NDP and SLC4A11 in patients with isolated Peters anomaly. PMID:25182519

  9. Coupling sequencing by hybridization (SBH) with gel sequencing for an inexpensive analysis of genes and genomes

    SciTech Connect

    Drmanac, S.; Labat, I.; Hauser, B.; Drmanac, R.

    1996-11-01

    The speed and cost of DNA sequencing are bottlenecks in the analysis of genes and genomes. Sequencing by hybridization (SBH) is a versatile method with several applications which can accelerate DNA screening, mapping and sequencing. Requirements, achievements and problems in the development of the SBH format 1 (DNA samples arrayed) are presented and schemes for its synergetic coupling with gel sequencing techniques are discussed. It appears that with one hybridization machine with 24 boxes and four ABI gel sequencers, 100-300 Mb of DNA sequence can be determined per year. Various genetic studies based on computer assisted analysis of large collections of partial or complete DNA sequences ('sequenetics') may be achieved in this century.

  10. Microfluidic devices for DNA sequencing: sample preparation and electrophoretic analysis.

    PubMed

    Paegel, Brian M; Blazej, Robert G; Mathies, Richard A

    2003-02-01

    Modern DNA sequencing 'factories' have revolutionized biology by completing the human genome sequence, but in the race to completion we are left with inefficient, cumbersome, and costly macroscale processes and supporting facilities. During the same period, microfabricated DNA sequencing, sample processing and analysis devices have advanced rapidly toward the goal of a 'sequencing lab-on-a-chip'. Integrated microfluidic processing dramatically reduces analysis time and reagent consumption, and eliminates costly and unreliable macroscale robotics and laboratory apparatus. A microfabricated device for high-throughput DNA sequencing that couples clone isolation, template amplification, Sanger extension, purification, and electrophoretic analysis in a single microfluidic circuit is now attainable.

  11. Sequencing, analysis, and annotation of expressed sequence tags for Camelus dromedarius.

    PubMed

    Al-Swailem, Abdulaziz M; Shehata, Maher M; Abu-Duhier, Faisel M; Al-Yamani, Essam J; Al-Busadah, Khalid A; Al-Arawi, Mohammed S; Al-Khider, Ali Y; Al-Muhaimeed, Abdullah N; Al-Qahtani, Fahad H; Manee, Manee M; Al-Shomrani, Badr M; Al-Qhtani, Saad M; Al-Harthi, Amer S; Akdemir, Kadir C; Inan, Mehmet S; Otu, Hasan H

    2010-05-19

    Despite its economical, cultural, and biological importance, there has not been a large scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, cluster and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and approximately 40% hits extending to the start codons of full length cDNAs suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. Accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools with possibility to add sequences from public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies enabling a starting point for whole genome sequencing of the organism.
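
    The ORF length check mentioned in this abstract (contigs with an ORF longer than 300 bp) can be illustrated with a minimal sketch. It scans only the three forward reading frames for an ATG-to-stop span. The function name `longest_orf` and the example contig are hypothetical, and the authors' actual pipeline is not described in the abstract, so this is an illustration under those assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Longest ATG-to-stop ORF (stop codon included) in the three forward frames.

    Returns an empty string when no complete ORF is found. The reverse strand
    is not scanned in this simplified sketch.
    """
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first start codon
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # close it and look for the next start codon
    return best


if __name__ == "__main__":
    # Hypothetical contig: one start codon, 100 alanine codons, one stop codon.
    contig = "CC" + "ATG" + "GCT" * 100 + "TAA" + "GG"
    orf = longest_orf(contig)
    print(len(orf), len(orf) > 300)
```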

  12. Sequencing, Analysis, and Annotation of Expressed Sequence Tags for Camelus dromedarius

    PubMed Central

    Al-Swailem, Abdulaziz M.; Shehata, Maher M.; Abu-Duhier, Faisel M.; Al-Yamani, Essam J.; Al-Busadah, Khalid A.; Al-Arawi, Mohammed S.; Al-Khider, Ali Y.; Al-Muhaimeed, Abdullah N.; Al-Qahtani, Fahad H.; Manee, Manee M.; Al-Shomrani, Badr M.; Al-Qhtani, Saad M.; Al-Harthi, Amer S.; Akdemir, Kadir C.; Otu, Hasan H.

    2010-01-01

    Despite its economical, cultural, and biological importance, there has not been a large scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, cluster and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and ∼40% hits extending to the start codons of full length cDNAs suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. Accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools with possibility to add sequences from public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies enabling a starting point for whole genome sequencing of the organism. PMID:20502665

  13. RED: the analysis, management and dissemination of expressed sequence tags.

    PubMed

    Everitt, R; Minnema, S E; Wride, M A; Koster, C S; Hance, J E; Mansergh, F C; Rancourt, D E

    2002-12-01

    The Rancourt EST Database (RED) is a web-based system for the analysis, management, and dissemination of expressed sequence tags (ESTs). RED represents a flexible template DNA sequence database that can be easily manipulated to suit the needs of other laboratories undertaking mid-size sequencing projects.

  14. An analysis of the feasibility of short read sequencing

    PubMed Central

    Whiteford, Nava; Haslam, Niall; Weber, Gerald; Prügel-Bennett, Adam; Essex, Jonathan W.; Roach, Peter L.; Bradley, Mark; Neylon, Cameron

    2005-01-01

    Several methods for ultra high-throughput DNA sequencing are currently under investigation. Many of these methods yield very short blocks of sequence information (reads). Here we report on an analysis showing the level of genome sequencing possible as a function of read length. It is shown that re-sequencing and de novo sequencing of the majority of a bacterial genome is possible with read lengths of 20–30 nt, and that reads of 50 nt can provide reconstructed contigs (a contiguous fragment of sequence data) of 1000 nt and greater that cover 80% of human chromosome 1. PMID:16275781
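
    The dependence on read length can be illustrated with a toy measurement: the fraction of start positions whose read occurs exactly once in the sequence. This is a minimal sketch, not the analysis reported in the paper, which covers real genomes and contig reconstruction. The function name `unique_read_fraction` and the example string are hypothetical.

```python
from collections import Counter


def unique_read_fraction(genome, read_length):
    """Fraction of start positions whose read of `read_length` occurs exactly once."""
    if not 0 < read_length <= len(genome):
        raise ValueError("read_length must be between 1 and len(genome)")
    reads = [genome[i:i + read_length]
             for i in range(len(genome) - read_length + 1)]
    counts = Counter(reads)
    return sum(1 for read in reads if counts[read] == 1) / len(reads)


if __name__ == "__main__":
    toy_genome = "ACGTACGTTT"  # hypothetical sequence containing a short repeat
    for k in (2, 3, 5):
        print(k, unique_read_fraction(toy_genome, k))
```

    Longer reads span the repeat and become unique, which is the intuition behind relating the achievable level of sequencing to read length.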

  15. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, M.S.

    1998-08-18

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device. 27 figs.

  16. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    1999-10-26

    A computer system (1) for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments may be improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area (814) and sample sequences in another area (816) on a display device (3).

  17. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    1998-08-18

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  18. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    2001-06-05

    A computer system (1) for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments may be improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area (814) and sample sequences in another area (816) on a display device (3).

  19. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.; Wang, Chunwei; Jevons, Luis C.; Bernhart, Derek H.; Lipshutz, Robert J.

    2004-05-11

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  20. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    2003-08-19

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments may be improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  1. Active site amino acid sequence of human factor D.

    PubMed

    Davis, A E

    1980-08-01

    Factor D was isolated from human plasma by chromatography on CM-Sephadex C50, Sephadex G-75, and hydroxylapatite. Digestion of reduced, S-carboxymethylated factor D with cyanogen bromide resulted in three peptides which were isolated by chromatography on Sephadex G-75 (superfine) equilibrated in 20% formic acid. NH2-Terminal sequences were determined by automated Edman degradation with a Beckman 890C sequencer using a 0.1 M Quadrol program. The smallest peptide (CNBr III) consisted of the NH2-terminal 14 amino acids. The other two peptides had molecular weights of 17,000 (CNBr I) and 7000 (CNBr II). Overlap of the NH2-terminal sequence of factor D with the NH2-terminal sequence of CNBr I established the order of the peptides. The NH2-terminal 53 residues of factor D are somewhat more homologous with the group-specific protease of rat intestine than with other serine proteases. The NH2-terminal sequence of CNBr II revealed the active site serine of factor D. The typical serine protease active site sequence (Gly-Asp-Ser-Gly-Gly-Pro) was found at residues 12-17. The region surrounding the active site serine does not appear to be more highly homologous with any one of the other serine proteases. The structural data obtained point out the similarities between factor D and the other proteases. However, complete definition of the degree of relationship between factor D and other proteases will require determination of the remainder of the primary structure.

  2. Microbial Contamination in Next Generation Sequencing: Implications for Sequence-Based Analysis of Clinical Samples

    PubMed Central

    Strong, Michael J.; Xu, Guorong; Morici, Lisa; Splinter Bon-Durant, Sandra; Baddoo, Melody; Lin, Zhen; Fewell, Claire; Taylor, Christopher M.; Flemington, Erik K.

    2014-01-01

    The high level of accuracy and sensitivity of next generation sequencing for quantifying genetic material across organismal boundaries gives it tremendous potential for pathogen discovery and diagnosis in human disease. Despite this promise, substantial bacterial contamination is routinely found in existing human-derived RNA-seq datasets that likely arises from environmental sources. This raises the need for stringent sequencing and analysis protocols for studies investigating sequence-based microbial signatures in clinical samples. PMID:25412476

  3. Expressed sequence tags analysis of Blattella germanica.

    PubMed

    Chung, Hyang Suk; Yu, Tai Hyun; Kim, Bong Jin; Kim, Sun Mi; Kim, Joo Yeong; Yu, Hak Sun; Jeong, Hae Jin; Ock, Mee Sun

    2005-12-01

    Four hundred and sixty five randomly selected clones from a cDNA library of Blattella germanica were partially sequenced and searched using BLAST as a means of analyzing the transcribed sequences of its genome. A total of 363 expressed sequence tags (ESTs) were generated from 465 clones after editing and trimming the vector and ambiguous sequences. About 42% (154/363) of these clones showed significant homology with other data base registered genes. These new B. germanica genes constituted a broad range of transcripts distributed among ribosomal proteins, energy metabolism, allergens, proteases, protease inhibitors, enzymes, translation, cell signaling pathways, and proteins of unknown function. Eighty clones were not well-matched by database searches, and these represent new B. germanica-specific ESTs. Some genes which drew our attention are discussed. The information obtained increases our understanding of the B. germanica genome.

  4. Expressed sequence tags analysis of Blattella germanica

    PubMed Central

    Chung, Hyang Suk; Yu, Tai Hyun; Kim, Bong Jin; Kim, Sun Mi; Kim, Joo Yeong; Yu, Hak Sun; Jeong, Hae Jin

    2005-01-01

    Four hundred and sixty five randomly selected clones from a cDNA library of Blattella germanica were partially sequenced and searched using BLAST as a means of analyzing the transcribed sequences of its genome. A total of 363 expressed sequence tags (ESTs) were generated from 465 clones after editing and trimming the vector and ambiguous sequences. About 42% (154/363) of these clones showed significant homology with other data base registered genes. These new B. germanica genes constituted a broad range of transcripts distributed among ribosomal proteins, energy metabolism, allergens, proteases, protease inhibitors, enzymes, translation, cell signaling pathways, and proteins of unknown function. Eighty clones were not well-matched by database searches, and these represent new B. germanica-specific ESTs. Some genes which drew our attention are discussed. The information obtained increases our understanding of the B. germanica genome. PMID:16340304

  5. Scalable Kernel Methods and Algorithms for General Sequence Analysis

    ERIC Educational Resources Information Center

    Kuksa, Pavel

    2011-01-01

    Analysis of large-scale sequential data has become an important task in machine learning and pattern recognition, inspired in part by numerous scientific and technological applications such as the document and text classification or the analysis of biological sequences. However, current computational methods for sequence comparison still lack…

  6. [Tabular excel editor for analysis of aligned nucleotide sequences].

    PubMed

    Demkin, V V

    2010-01-01

    The Excel platform was used to convert multiple nucleotide sequence alignments obtained from the BLAST network service into a form suitable for visual analysis and editing. Two macros for MS Excel 2007 were constructed. The array of aligned sequences, once transformed into an Excel table and processed with these macros, is more convenient to analyze than the initial HTML data.

  7. Relationships among genera of the Saccharomycotina from multigene sequence analysis

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Most known species of the subphylum Saccharomycotina (budding ascomycetous yeasts) have now been placed in phylogenetically defined clades following multigene sequence analysis. Terminal clades, which are usually well supported from bootstrap analysis, are viewed as phylogenetically circumscribed ge...

  8. Rare variant detection using family-based sequencing analysis.

    PubMed

    Peng, Gang; Fan, Yu; Palculict, Timothy B; Shen, Peidong; Ruteshouser, E Cristy; Chi, Aung-Kyaw; Davis, Ronald W; Huff, Vicki; Scharfe, Curt; Wang, Wenyi

    2013-03-05

    Next-generation sequencing is revolutionizing genomic analysis, but this analysis can be compromised by high rates of missing true variants. To develop a robust statistical method capable of identifying variants that would otherwise not be called, we conducted sequence data simulations and both whole-genome and targeted sequencing data analysis of 28 families. Our method (Family-Based Sequencing Program, FamSeq) integrates Mendelian transmission information and raw sequencing reads. Sequence analysis using FamSeq reduced the number of false negative variants by 14-33% as assessed by HapMap sample genotype confirmation. In a large family affected with Wilms tumor, 84% of variants uniquely identified by FamSeq were confirmed by Sanger sequencing. In children with early-onset neurodevelopmental disorders from 26 families, de novo variant calls in disease candidate genes were corrected by FamSeq as mendelian variants, and the number of uniquely identified variants in affected individuals increased proportionally as additional family members were included in the analysis. To gain insight into maximizing variant detection, we studied factors impacting actual improvements of family-based calling, including pedigree structure, allele frequency (common vs. rare variants), prior settings of minor allele frequency, sequence signal-to-noise ratio, and coverage depth (∼20× to >200×). These data will help guide the design, analysis, and interpretation of family-based sequencing studies to improve the ability to identify new disease-associated genes.
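
    The idea of combining Mendelian transmission with evidence from reads can be illustrated with a minimal Bayesian sketch for a single site in a parent-child trio. It treats the parental genotypes as known and ignores de novo mutation, which is a deliberate simplification and not a description of FamSeq itself. Genotypes are coded as the number of alternate alleles (0, 1 or 2); all function names and the likelihood values are hypothetical.

```python
def transmit_prob(parent_genotype):
    """Probability that a parent carrying 0, 1 or 2 alt alleles passes on the alt allele."""
    return parent_genotype / 2


def mendelian_prior(mother, father):
    """Prior over the child's genotype (0, 1, 2 alt alleles) given both parents."""
    pm, pf = transmit_prob(mother), transmit_prob(father)
    return [(1 - pm) * (1 - pf),            # neither parent transmits alt
            pm * (1 - pf) + (1 - pm) * pf,  # exactly one parent transmits alt
            pm * pf]                        # both parents transmit alt


def child_posterior(read_likelihoods, mother, father):
    """Posterior over the child's genotype: read likelihood times Mendelian prior."""
    prior = mendelian_prior(mother, father)
    joint = [lik * p for lik, p in zip(read_likelihoods, prior)]
    total = sum(joint)
    if total == 0:
        raise ValueError("reads are incompatible with Mendelian transmission")
    return [j / total for j in joint]


if __name__ == "__main__":
    # Hypothetical low-coverage site: the reads alone slightly favour genotype 0,
    # but a 0 x 2 parental pair only permits a heterozygous child.
    print(child_posterior([0.5, 0.4, 0.1], mother=0, father=2))
```

    In this toy case the pedigree information overrides weak read evidence, which is one way a variant that would otherwise be missed could be recovered.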

  9. Establishing a framework for comparative analysis of genome sequences

    SciTech Connect

    Bansal, A.K.

    1995-06-01

    This paper describes a framework and a high-level language toolkit for comparative analysis of genome sequence alignment. The framework integrates the information derived from multiple sequence alignment and phylogenetic tree (hypothetical tree of evolution) to derive new properties about sequences. Multiple sequence alignments are treated as an abstract data type. Abstract operations have been described to manipulate a multiple sequence alignment and to derive mutation related information from a phylogenetic tree by superimposing parsimonious analysis. The framework has been applied on protein alignments to derive constrained columns (in a multiple sequence alignment) that exhibit evolutionary pressure to preserve a common property in a column despite mutation. A Prolog toolkit based on the framework has been implemented and demonstrated on alignments containing 3000 sequences and 3904 columns.
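
    The notion of a constrained column can be illustrated with a minimal sketch, written in Python rather than the Prolog used by the toolkit. A column is reported when its residues differ yet all fall in one property class. The grouping in `PROPERTY_CLASS` is an assumed, simplified classification chosen for illustration, the function name is hypothetical, and the phylogenetic-tree and parsimony steps of the framework are not modelled.

```python
# Assumed, simplified amino acid property classes (illustrative only).
PROPERTY_CLASS = {
    **dict.fromkeys("AVLIMFWC", "hydrophobic"),
    **dict.fromkeys("STNQYGP", "polar"),
    **dict.fromkeys("KRH", "positive"),
    **dict.fromkeys("DE", "negative"),
}


def constrained_columns(alignment):
    """Indices of columns whose residues vary but share a single property class.

    `alignment` is a list of equal-length strings; '-' marks a gap and is ignored.
    """
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for col in range(width):
        residues = {row[col] for row in alignment if row[col] != "-"}
        # Residues outside the table form their own singleton class.
        classes = {PROPERTY_CLASS.get(r, r) for r in residues}
        if len(residues) > 1 and len(classes) == 1:
            result.append(col)
    return result


if __name__ == "__main__":
    toy_alignment = ["MKVD", "MRLE", "MKIS"]  # hypothetical three-sequence alignment
    print(constrained_columns(toy_alignment))
```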

  10. Design and Analysis of Single-Cell Sequencing Experiments.

    PubMed

    Grün, Dominic; van Oudenaarden, Alexander

    2015-11-05

    Recent advances in single-cell sequencing hold great potential for exploring biological systems with unprecedented resolution. Sequencing the genome of individual cells can reveal somatic mutations and allows the investigation of clonal dynamics. Single-cell transcriptome sequencing can elucidate the cell type composition of a sample. However, single-cell sequencing comes with major technical challenges and yields complex data output. In this Primer, we provide an overview of available methods and discuss experimental design and single-cell data analysis. We hope that these guidelines will enable a growing number of researchers to leverage the power of single-cell sequencing.

  11. Modern Computational Techniques for the HMMER Sequence Analysis

    PubMed Central

    2013-01-01

    This paper focuses on the latest research and critical reviews on modern computing architectures, software and hardware accelerated algorithms for bioinformatics data analysis with an emphasis on one of the most important sequence analysis applications—hidden Markov models (HMM). We show the detailed performance comparison of sequence analysis tools on various computing platforms recently developed in the bioinformatics society. The characteristics of the sequence analysis, such as data and compute-intensive natures, make it very attractive to optimize and parallelize by using both traditional software approach and innovated hardware acceleration technologies. PMID:25937944

  12. Stratigraphic sequence analysis of the Antler foreland

    SciTech Connect

    Silberling, N.J.; Nichols, K.M.; Macke, D.L.

    1993-04-01

    Mid-Upper Devonian to Upper Mississippian strata in western Utah were deposited in the distal Antler foreland. They record lateral and vertical changes in depositional environments that define five successive stratigraphic sequences, each representing a third-order transgressive-regressive cycle. In ascending order, these sequences are informally named the Langenheim (LA) of late Frasnian to mid-Famennian age, the Gutschick (GU) of late Famennian to early Kinderhookian age, the Morris (MO) of late Kinderhookian age, the Sadlick (SA) of Osagean to early Meramecian age, and the Maughan (MA) of mid-Meramecian to Chesterian age. MO is widespread and recognized within carbonate rocks of the Fitchville Formation and Joana Limestone. SA formed in concert with and to the east and south of the Wendover foreland high; the Delle phosphatic event marks maximum marine flooding during SA deposition. The transgressive systems tract of MA includes rhythmic-bedded limestone in the upper part of the Deseret Limestone in west-central Utah and, farther west, the hypoxic limestone and black shale of the Skunk Spring Limestone Bed and part of the overlying Chainman Shale. Traced westward into Nevada, MA first oversteps SA and then MO. Lithostratigraphic correlation of these sequences still farther west into the Eureka thrust belt (ETB) could mean that the youngest strata truncated by the Roberts Mountains thrust belong to the MA and that this thrust is simply part of the post-Mississippian ETB. However, some strata in central Nevada that lithically resemble those of the MA are paleontologically dated as Early Mississippian, the age of sequences overstepped by MA not far to the east. Thus, at least some imbricates of the ETB may contain a sequence stratigraphy which reflects local tectonic control.

  13. Learning Behavior Characterization with Multi-Feature, Hierarchical Activity Sequences

    ERIC Educational Resources Information Center

    Ye, Cheng; Segedy, James R.; Kinnebrew, John S.; Biswas, Gautam

    2015-01-01

    This paper discusses Multi-Feature Hierarchical Sequential Pattern Mining, MFH-SPAM, a novel algorithm that efficiently extracts patterns from students' learning activity sequences. This algorithm extends an existing sequential pattern mining algorithm by dynamically selecting the level of specificity for hierarchically-defined features…

  14. Evolution of Pre-Main Sequence Stellar Activity

    NASA Astrophysics Data System (ADS)

    Kuin, N. Paul

    The best known low mass pre-main sequence stars, the T Tauri stars, have very active chromospheres and show a variable emission line spectrum. There is a large variety in activity amongst them. Recently another type of pre-main sequence star was identified which may be post T Tauri stars (PTTS) and were named 'naked T Tauri stars' (NTTS). These PTTS/NTTS are strong X-ray sources, and do not have a T Tauri type spectrum. The age of these sources is comparable to that of the T Tauri stars, which implies that either the activity evolves much faster in pre-main sequence stars than the timescale in which they move towards the main sequence, or that an external source is responsible for their activity. We propose to obtain low resolution long- and short wavelength RE spectra of two of the PTTS/NTTS and of a T Tauri star of comparable activity. Archival data will be used from three other T Tauri stars having larger activity. The data will enable us to derive models for the chromosphere and transition region. Those will clarify whether there is a gradual change in activity, or a sudden one as may be caused by the closing up of an open magnetic field. Because they cover a large range in chromospheric activity the models will enable us to improve the Alfvén wave wind models used for T Tau and RU Lup. In particular the question of the wave dissipation scale length and wave leakage will be considered.

  15. Genome-wide analysis of short interspersed nuclear elements SINES revealed high sequence conservation, gene association and retrotranspositional activity in wheat

    PubMed Central

    Ben-David, Smadar; Yaakov, Beery; Kashkush, Khalil

    2013-01-01

    Short interspersed nuclear elements (SINEs) are non-autonomous non-LTR retroelements that are present in most eukaryotic species. While SINEs have been intensively investigated in humans and other animal systems, they are poorly studied in plants, especially in wheat (Triticum aestivum). We used quantitative PCR of various wheat species to determine the copy number of a wheat SINE family, termed Au SINE, combined with computer-assisted analyses of the publicly available 454 pyrosequencing database of T. aestivum. In addition, we utilized site-specific PCR on 57 Au SINE insertions, transposon methylation display and transposon display on newly formed wheat polyploids to assess retrotranspositional activity, epigenetic status and genetic rearrangements in Au SINE, respectively. We retrieved 3706 different insertions of Au SINE from the 454 pyrosequencing database of T. aestivum, and found that most of the elements are inserted in A/T-rich regions, while approximately 38% of the insertions are associated with transcribed regions, including known wheat genes. We observed typical retrotransposition of Au SINE in the second generation of a newly formed wheat allohexaploid, and massive hypermethylation in CCGG sites surrounding Au SINE in the third generation. Finally, we observed huge differences in the copy numbers in diploid Triticum and Aegilops species, and a significant increase in the copy numbers in natural wheat polyploids, but no significant increase in the copy number of Au SINE in the first four generations for two of three newly formed allopolyploid species used in this study. Our data indicate that SINEs may play a prominent role in the genomic evolution of wheat through stress-induced activation. PMID:23855320

  16. Detailed Analysis of a Multiplet Earthquake Sequence

    NASA Astrophysics Data System (ADS)

    Iglesias, A.; Singh, S. K.; Garduño, V. H.

    2014-12-01

    The Mexican National Seismological Service reported a sequence of four small earthquakes (2.5 < M < 3.0) occurring in Morelia, a city of 1,000,000, which is the capital city of Michoacán State. A careful revision of the records from a three-component broad-band station, located ~10 km from the earthquakes, showed a sequence of 7 earthquakes in a period of about 36 hours. The waveforms are remarkably similar to one another, and the events may be considered a "multiplet". In this work, we use the records from the broad-band station and a methodology based on coda wave interferometry to obtain the relative distance between pairs of events. The 21 inter-event distances obtained are treated as an over-determined system for the relative positions of the events. A non-linear damped scheme is used to solve the over-determined system and to obtain the spatial distribution of the 7 earthquakes. Results show that (1) distances between events are < 200 m, and (2) the sequence has an approximately linear distribution.
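
    The inversion step in this abstract can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes the 21 inter-event distances are already available (here they are synthesised from hypothetical coordinates), and it uses a generic damped Gauss-Newton update with adaptive damping as a stand-in for the "non-linear damped scheme" named in the abstract. The names `pairwise_distances` and `locate_relative`, and all parameter values, are assumptions made for the example.

```python
import itertools
import numpy as np


def pairwise_distances(positions):
    """Return {(i, j): distance} for every unordered pair of events."""
    return {
        (i, j): float(np.linalg.norm(positions[i] - positions[j]))
        for i, j in itertools.combinations(range(len(positions)), 2)
    }


def locate_relative(dist, n_events, n_iter=200, damping=1e-2, seed=0):
    """Estimate relative 3-D positions from inter-event distances.

    Returns the positions and the history of the squared misfit.
    Distances constrain the geometry only up to a rigid motion or reflection.
    """
    pairs = sorted(dist)
    d_obs = np.array([dist[p] for p in pairs])
    rng = np.random.default_rng(seed)
    x = rng.normal(scale=d_obs.mean(), size=(n_events, 3))  # random start

    def residual(pos):
        sep = np.array([np.linalg.norm(pos[i] - pos[j]) for i, j in pairs])
        return d_obs - sep

    r = residual(x)
    history = [float(r @ r)]
    for _ in range(n_iter):
        # Jacobian of each separation with respect to all coordinates.
        J = np.zeros((len(pairs), 3 * n_events))
        for k, (i, j) in enumerate(pairs):
            u = (x[i] - x[j]) / np.linalg.norm(x[i] - x[j])
            J[k, 3 * i:3 * i + 3] = u
            J[k, 3 * j:3 * j + 3] = -u
        # Damped normal equations: (J^T J + damping * I) step = J^T r.
        lhs = J.T @ J + damping * np.eye(3 * n_events)
        step = np.linalg.solve(lhs, J.T @ r)
        trial = x + step.reshape(n_events, 3)
        r_trial = residual(trial)
        if r_trial @ r_trial < r @ r:
            x, r = trial, r_trial              # accept and relax the damping
            damping = max(damping / 10.0, 1e-12)
        else:
            damping *= 10.0                    # reject and damp more strongly
        history.append(float(r @ r))
    return x, history


# Hypothetical, roughly linear cluster of 7 events (coordinates in metres).
truth = np.array([[25.0 * k, 3.0 * (k % 2), 2.0 * (k % 3)] for k in range(7)])
estimated, misfit = locate_relative(pairwise_distances(truth), n_events=7)
```

    Seven events give 7 × 6 / 2 = 21 distances, while the relative geometry has only 3 × 7 − 6 = 15 free parameters once translation and rotation are removed, which is why the system is over-determined.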

  17. Characterisation and Next-generation Sequencing Analysis of Unknown Arboviruses

    DTIC Science & Technology

    2012-09-01

    using techniques such as PCR-select subtraction and next-generation sequencing. Preliminary analysis of the four sequenced viruses has shown that they...HOJV) and Harrison Dam virus (HARDV), and two unknown bunyaviruses, Buffalo Creek Virus (BCV) and Maprik virus (MPKV). It describes the techniques such...unknown viruses with greater speed and at lower cost. The rapid advancement of new generation sequencing techniques allows for highly specific acquisition

  18. Nonlinear multiscale analysis of three-dimensional echocardiographic sequences

    SciTech Connect

    Sarti, A.; Mikula, K.; Sgallari, F.

    1999-06-01

    The authors introduce a new model for multiscale analysis of space-time echocardiographic sequences. The proposed nonlinear partial differential equation, representing the multiscale analysis, filters the sequence while keeping the space-time coherent structures. It combines the ideas of regularized Perona-Malik anisotropic diffusion and the Galilean invariant movie multiscale analysis of Alvarez, Guichard, Lions and Morel. A numerical method for solving the proposed partial differential equation is suggested and its stability is shown. Computational results on synthesized and real sequences are provided. A qualitative and quantitative evaluation of the accuracy of the method is presented.
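
    One ingredient named in this abstract, regularized Perona-Malik diffusion, can be illustrated on a single 2-D frame. The sketch below is not the authors' space-time model: it leaves out the Galilean invariant movie analysis entirely, replaces the Gaussian pre-smoothing with a small binomial filter, and evaluates the edge-stopping function only from the difference across each cell face. The function names, the contrast parameter `k` and the time step `dt` are assumptions.

```python
import numpy as np


def smooth(u):
    """Binomial [1, 2, 1] / 4 smoothing along both axes (stand-in for a Gaussian)."""
    p = np.pad(u, 1, mode="edge")
    rows = (p[:-2, :] + 2.0 * p[1:-1, :] + p[2:, :]) / 4.0
    return (rows[:, :-2] + 2.0 * rows[:, 1:-1] + rows[:, 2:]) / 4.0


def perona_malik_step(u, k=0.1, dt=0.2):
    """One explicit, flux-form step of regularized Perona-Malik diffusion.

    Diffusivity g = 1 / (1 + (d / k)^2) is computed from the pre-smoothed image,
    so diffusion is slowed across strong edges. Zero-flux boundaries.
    """
    us = smooth(u)
    gx = 1.0 / (1.0 + (np.diff(us, axis=1) / k) ** 2)  # faces between columns
    gy = 1.0 / (1.0 + (np.diff(us, axis=0) / k) ** 2)  # faces between rows
    fx = gx * np.diff(u, axis=1)                        # flux across each face
    fy = gy * np.diff(u, axis=0)
    out = u.astype(float).copy()
    out[:, :-1] += dt * fx   # each face moves intensity between its two cells,
    out[:, 1:] -= dt * fx    # so the total intensity is conserved
    out[:-1, :] += dt * fy
    out[1:, :] -= dt * fy
    return out


# Hypothetical noisy frame with one vertical edge.
rng = np.random.default_rng(0)
frame = np.zeros((32, 32))
frame[:, 16:] = 1.0
frame += 0.05 * rng.standard_normal(frame.shape)
filtered = frame
for _ in range(10):
    filtered = perona_malik_step(filtered)
```

    Because the diffusivity never exceeds 1 and each cell has at most four faces, a time step of at most 0.25 keeps every updated value a weighted average of its neighbours, which is the usual stability argument for this explicit scheme.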

  19. Error analysis of deep sequencing of phage libraries: peptides censored in sequencing.

    PubMed

    Matochko, Wadim L; Derda, Ratmir

    2013-01-01

    Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low-abundance clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by the Illumina method. The low theoretical complexity of this phage library, as compared to the complexity of long genetic reads and genomes, allowed us to describe this library using a convenient linear vector and operator framework. We describe a phage library as an N × 1 frequency vector n = ||n_i||, where n_i is the copy number of the i-th sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation of the library is an operator acting on n. Selection, amplification, or sequencing could be described as a product of an N × N matrix and a stochastic sampling operator (Sa). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq). Sequencing without any bias and errors is Seq = Sa I_N, where I_N is an N × N unity matrix. Any bias in sequencing changes I_N to a nonunity matrix. We identified a diagonal censorship matrix (CEN), which describes elimination, or statistically significant downsampling, of specific reads during the sequencing process.

  20. Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

    PubMed Central

    Matochko, Wadim L.; Derda, Ratmir

    2013-01-01

    Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low-abundance clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by the Illumina method. The low theoretical complexity of this phage library, as compared to the complexity of long genetic reads and genomes, allowed us to describe this library using a convenient linear vector and operator framework. We describe a phage library as an N × 1 frequency vector n = ||n_i||, where n_i is the copy number of the i-th sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation of the library is an operator acting on n. Selection, amplification, or sequencing could be described as a product of an N × N matrix and a stochastic sampling operator (Sa). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq). Sequencing without any bias and errors is Seq = Sa I_N, where I_N is an N × N unity matrix. Any bias in sequencing changes I_N to a nonunity matrix. We identified a diagonal censorship matrix (CEN), which describes elimination, or statistically significant downsampling, of specific reads during the sequencing process. PMID:24416071
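
    The operator picture in this abstract can be made concrete with a toy library. The sketch below is an illustration under stated assumptions, not the authors' implementation: the abstract says only that the sampling operator is a random diagonal matrix, so building it from a multinomial draw is an assumption, as are the function name `sampling_operator`, the four-sequence library and the censorship values. Unbiased sequencing is the sampling operator times the N × N unity (identity) matrix; biased sequencing replaces the identity with a diagonal censorship matrix.

```python
import numpy as np


def sampling_operator(n, depth, rng):
    """Random diagonal matrix Sa such that Sa @ n is a sample of `depth` reads.

    Assumption: reads are drawn multinomially in proportion to copy number,
    and the i-th diagonal entry is (reads drawn for i) / (copy number of i).
    """
    n = np.asarray(n, dtype=float)
    drawn = rng.multinomial(depth, n / n.sum()).astype(float)
    diag = np.divide(drawn, n, out=np.zeros_like(n), where=n > 0)
    return np.diag(diag)


rng = np.random.default_rng(42)
n = np.array([50.0, 30.0, 0.0, 20.0])    # hypothetical copy numbers, N = 4
identity = np.eye(len(n))                 # the N x N unity matrix

Sa = sampling_operator(n, depth=1000, rng=rng)
seq_unbiased = Sa @ identity              # Seq without bias or errors
reads_unbiased = seq_unbiased @ n

# Hypothetical censorship: sequence 1 is eliminated, sequence 3 is halved.
CEN = np.diag([1.0, 0.0, 1.0, 0.5])
seq_biased = Sa @ CEN                     # identity replaced by a nonunity matrix
reads_biased = seq_biased @ n
```

    Because every matrix here is diagonal, the operators commute and each acts on one sequence at a time, which is what makes a per-sequence censorship factor identifiable in this framework.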

  1. Initial sequencing and analysis of the human genome.

    PubMed

    Lander, E S; Linton, L M; Birren, B; Nusbaum, C; Zody, M C; Baldwin, J; Devon, K; Dewar, K; Doyle, M; FitzHugh, W; Funke, R; Gage, D; Harris, K; Heaford, A; Howland, J; Kann, L; Lehoczky, J; LeVine, R; McEwan, P; McKernan, K; Meldrim, J; Mesirov, J P; Miranda, C; Morris, W; Naylor, J; Raymond, C; Rosetti, M; Santos, R; Sheridan, A; Sougnez, C; Stange-Thomann, Y; Stojanovic, N; Subramanian, A; Wyman, D; Rogers, J; Sulston, J; Ainscough, R; Beck, S; Bentley, D; Burton, J; Clee, C; Carter, N; Coulson, A; Deadman, R; Deloukas, P; Dunham, A; Dunham, I; Durbin, R; French, L; Grafham, D; Gregory, S; Hubbard, T; Humphray, S; Hunt, A; Jones, M; Lloyd, C; McMurray, A; Matthews, L; Mercer, S; Milne, S; Mullikin, J C; Mungall, A; Plumb, R; Ross, M; Shownkeen, R; Sims, S; Waterston, R H; Wilson, R K; Hillier, L W; McPherson, J D; Marra, M A; Mardis, E R; Fulton, L A; Chinwalla, A T; Pepin, K H; Gish, W R; Chissoe, S L; Wendl, M C; Delehaunty, K D; Miner, T L; Delehaunty, A; Kramer, J B; Cook, L L; Fulton, R S; Johnson, D L; Minx, P J; Clifton, S W; Hawkins, T; Branscomb, E; Predki, P; Richardson, P; Wenning, S; Slezak, T; Doggett, N; Cheng, J F; Olsen, A; Lucas, S; Elkin, C; Uberbacher, E; Frazier, M; Gibbs, R A; Muzny, D M; Scherer, S E; Bouck, J B; Sodergren, E J; Worley, K C; Rives, C M; Gorrell, J H; Metzker, M L; Naylor, S L; Kucherlapati, R S; Nelson, D L; Weinstock, G M; Sakaki, Y; Fujiyama, A; Hattori, M; Yada, T; Toyoda, A; Itoh, T; Kawagoe, C; Watanabe, H; Totoki, Y; Taylor, T; Weissenbach, J; Heilig, R; Saurin, W; Artiguenave, F; Brottier, P; Bruls, T; Pelletier, E; Robert, C; Wincker, P; Smith, D R; Doucette-Stamm, L; Rubenfield, M; Weinstock, K; Lee, H M; Dubois, J; Rosenthal, A; Platzer, M; Nyakatura, G; Taudien, S; Rump, A; Yang, H; Yu, J; Wang, J; Huang, G; Gu, J; Hood, L; Rowen, L; Madan, A; Qin, S; Davis, R W; Federspiel, N A; Abola, A P; Proctor, M J; Myers, R M; Schmutz, J; Dickson, M; Grimwood, J; Cox, D R; Olson, M V; Kaul, R; Raymond, C; Shimizu, N; 
Kawasaki, K; Minoshima, S; Evans, G A; Athanasiou, M; Schultz, R; Roe, B A; Chen, F; Pan, H; Ramser, J; Lehrach, H; Reinhardt, R; McCombie, W R; de la Bastide, M; Dedhia, N; Blöcker, H; Hornischer, K; Nordsiek, G; Agarwala, R; Aravind, L; Bailey, J A; Bateman, A; Batzoglou, S; Birney, E; Bork, P; Brown, D G; Burge, C B; Cerutti, L; Chen, H C; Church, D; Clamp, M; Copley, R R; Doerks, T; Eddy, S R; Eichler, E E; Furey, T S; Galagan, J; Gilbert, J G; Harmon, C; Hayashizaki, Y; Haussler, D; Hermjakob, H; Hokamp, K; Jang, W; Johnson, L S; Jones, T A; Kasif, S; Kaspryzk, A; Kennedy, S; Kent, W J; Kitts, P; Koonin, E V; Korf, I; Kulp, D; Lancet, D; Lowe, T M; McLysaght, A; Mikkelsen, T; Moran, J V; Mulder, N; Pollara, V J; Ponting, C P; Schuler, G; Schultz, J; Slater, G; Smit, A F; Stupka, E; Szustakowki, J; Thierry-Mieg, D; Thierry-Mieg, J; Wagner, L; Wallis, J; Wheeler, R; Williams, A; Wolf, Y I; Wolfe, K H; Yang, S P; Yeh, R F; Collins, F; Guyer, M S; Peterson, J; Felsenfeld, A; Wetterstrand, K A; Patrinos, A; Morgan, M J; de Jong, P; Catanese, J J; Osoegawa, K; Shizuya, H; Choi, S; Chen, Y J; Szustakowki, J

    2001-02-15

    The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

  2. Analysis of expressed sequence tags (ESTs) from Agrostis species obtained using sequence related amplified polymorphism.

    PubMed

    Dinler, Gizem; Budak, Hikmet

    2008-10-01

    Bentgrass (Agrostis spp.), a genus of the Poaceae family, consists of more than 200 species and is mainly used in athletic fields and golf courses. Creeping bentgrass (A. stolonifera L.) is the most commonly used species in maintaining golf courses, followed by colonial bentgrass (A. capillaris L.) and velvet bentgrass (A. canina L.). The presence and nature of sequence related amplified polymorphism (SRAP) at the cDNA level were investigated. We isolated 80 unique cDNA fragment bands from these species using 56 SRAP primer combinations. Sequence analysis of cDNA clones and analysis of putative translation products revealed that some encoded amino acid sequences were similar to proteins involved in DNA synthesis, transcription, and signal transduction. The cytosolic glyceraldehyde-3-phosphate dehydrogenase (GAPDH) gene (GenBank accession no. EB812822) was also identified from velvet bentgrass, and the corresponding protein sequence is further analyzed due to its critical role in many cellular processes. The partial peptide sequence obtained was 112 amino acids long, presenting a high degree of homology to parts of the N-terminal and C-terminal regions of cytosolic phosphorylating GAPDH (GapC). The existence of common expressed sequence tags (ESTs) revealed by a minimum evolutionary dendrogram among the Agrostis ESTs indicated the usefulness of SRAP for comparative genome analysis of transcribed genes in the grass species.

  3. Transcriptomic sequencing reveals a set of unique genes activated by butyrate-induced histone modification

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Butyrate is a nutritional element with strong epigenetic regulatory activity as an inhibitor of histone deacetylases (HDACs). Based on the analysis of differentially expressed genes induced by butyrate in the bovine epithelial cell using deep RNA-sequencing technology (RNA-seq), a set of unique gen...

  4. Deep sequencing and human antibody repertoire analysis

    PubMed Central

    Boyd, Scott D; Crowe, James E

    2016-01-01

    In the past decade, high-throughput DNA sequencing (HTS) methods and improved approaches for isolating antigen-specific B cells and their antibody genes have been applied in many areas of human immunology. This work has greatly increased our understanding of human antibody repertoires and the specific clones responsible for protective immunity or immune-mediated pathogenesis. Although the principles underlying selection of individual B cell clones in the intact immune system are still under investigation, the combination of more powerful genetic tracking of antibody lineage development and functional testing of the encoded proteins promises to transform therapeutic antibody discovery and optimization. Here, we highlight recent advances in this fast-moving field. PMID:27065089

  5. Initial sequencing and comparative analysis of the mouse genome

    SciTech Connect

    Waterston, Robert H.; Lindblad-Toh, Kerstin; Birney, Ewan; Rogers, Jane; Abril, Josep F.; Agarwal, Pankaj; Agarwala, Richa; Ainscough, Rachel; Alexandersson, Marina; An, Peter; Antonarakis, Stylianos E.; Attwood, John; Baertsch, Robert; Bailey, Jonathon; Barlow, Karen; Beck, Stephan; Berry, Eric; Birren, Bruce; Bloom, Toby; Bork, Peer; Botcherby, Marc; Bray, Nicolas; Brent, Michael R.; Brown, Daniel G.; Brown, Stephen D.; Bult, Carol; Burton, John; Butler, Jonathan; Campbell, Robert D.; Carninci, Piero; Cawley, Simon; Chiaromonte, Francesca; Chinwalla, Asif T.; Church, Deanna M.; Clamp, Michele; Clee, Christopher; Collins, Francis S.; Cook, Lisa L.; Copley, Richard R.; Coulson, Alan; Couronne, Olivier; Cuff, James; Curwen, Val; Cutts, Tim; Daly, Mark; David, Robert; Davies, Joy; Delehaunty, Kimberly D.; Deri, Justin; Dermitzakis, Emmanouil T.; Dewey, Colin; Dickens, Nicholas J.; Diekhans, Mark; Dodge, Sheila; Dubchak, Inna; Dunn, Diane M.; Eddy, Sean R.; Elnitski, Laura; Emes, Richard D.; Eswara, Pallavi; Eyras, Eduardo; Felsenfeld, Adam; Fewell, Ginger A.; Flicek, Paul; Foley, Karen; Frankel, Wayne N.; Fulton, Lucinda A.; Fulton, Robert S.; Furey, Terrence S.; Gage, Diane; Gibbs, Richard A.; Glusman, Gustavo; Gnerre, Sante; Goldman, Nick; Goodstadt, Leo; Grafham, Darren; Graves, Tina A.; Green, Eric D.; Gregory, Simon; Guigo, Roderic; Guyer, Mark; Hardison, Ross C.; Haussler, David; Hayashizaki, Yoshihide; Hillier, LaDeana W.; Hinrichs, Angela; Hlavina, Wratko; Holzer, Timothy; Hsu, Fan; Hua, Axin; Hubbard, Tim; Hunt, Adrienne; Jackson, Ian; Jaffe, David B.; Johnson, L. Steven; Jones, Matthew; Jones, Thomas A.; Joy, Ann; Kamal, Michael; Karlsson, Elinor K.; Karolchik, Donna; Kasprzyk, Arkadiusz; Kawai, Jun; Keibler, Evan; Kells, Cristyn; Kent, W. James; Kirby, Andrew; Kolbe, Diana L.; Korf, Ian; Kucherlapati, Raju S.; Kulbokas III, Edward J.; Kulp, David; Landers, Tom; Leger, J.P.; Leonard, Steven; Letunic, Ivica; Levine, Rosie; et al.

    2002-12-15

    The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

  6. Laser Desorption Mass Spectrometry for DNA Sequencing and Analysis

    NASA Astrophysics Data System (ADS)

    Chen, C. H. Winston; Taranenko, N. I.; Golovlev, V. V.; Isola, N. R.; Allman, S. L.

    1998-03-01

    Rapid DNA sequencing and/or analysis is critically important for biomedical research. In the past, gel electrophoresis has been the primary tool to achieve DNA analysis and sequencing. However, gel electrophoresis is a time-consuming and labor-intensive process. Recently, we have developed and used laser desorption mass spectrometry (LDMS) to achieve sequencing of ss-DNA longer than 100 nucleotides. With LDMS, we succeeded in sequencing DNA in seconds instead of the hours or days required by gel electrophoresis. In addition to sequencing, we also applied LDMS for the detection of DNA probes for hybridization. LDMS was also used to detect short tandem repeats for forensic applications. Clinical applications for disease diagnosis, such as cystic fibrosis caused by base deletion and point mutation, have also been demonstrated. Experimental details will be presented at the meeting.

  7. Purification and sequencing of the active site tryptic peptide from penicillin-binding protein 1b of Escherichia coli

    SciTech Connect

    Nicholas, R.A.; Suzuki, H.; Hirota, Y.; Strominger, J.L.

    1985-07-02

    This paper reports the sequence of the active site peptide of penicillin-binding protein 1b from Escherichia coli. Purified penicillin-binding protein 1b was labeled with [14C]penicillin G, digested with trypsin, and partially purified by gel filtration. Upon further purification by high-pressure liquid chromatography, two radioactive peaks were observed, and the major peak, representing over 75% of the applied radioactivity, was submitted to amino acid analysis and sequencing. The sequence Ser-Ile-Gly-Ser-Leu-Ala-Lys was obtained. The active site nucleophile was identified by digesting the purified peptide with aminopeptidase M and separating the radioactive products on high-pressure liquid chromatography. Amino acid analysis confirmed that the serine residue in the middle of the sequence was covalently bonded to the [14C]penicilloyl moiety. A comparison of this sequence to active site sequences of other penicillin-binding proteins and beta-lactamases is presented.

  8. Sequencing and Analysis of Neanderthal Genomic DNA

    SciTech Connect

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith, Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Paabo, Svante; Pritchard, Jonathan K.; Rubin, Edward M.

    2006-06-13

    Recovery and analysis of multiple Neanderthal autosomal sequences using a metagenomic approach reveals that modern humans and Neanderthals split ~400,000 years ago, without significant evidence of subsequent admixture.

  9. N-terminal sequence analysis of proteins and peptides.

    PubMed

    Reim, D F; Speicher, D W

    2001-05-01

    Amino-terminal (N-terminal) sequence analysis is used to identify the order of amino acids of proteins or peptides, starting at their N-terminal end. This unit describes the sequence analysis of protein or peptide samples in solution or bound to PVDF membranes using a Perkin-Elmer Procise Sequencer. Sequence analysis of protein or peptide samples in solution or bound to PVDF membranes using a Hewlett-Packard Model G1005A sequencer is also described. Methods are provided for optimizing separation of PTH amino acid derivatives on Perkin-Elmer instruments and for increasing the proportion of sample injected onto the PTH analyzer on older Perkin-Elmer instruments by installing a modified sample loop. The amount of data obtained from a single sequencer run is substantial, and careful interpretation of this data by an experienced scientist familiar with the current operation performance of the instrument used for this analysis is critically important. A discussion of data interpretation is therefore provided. Finally, discussion of optimization of sequencer performance as well as possible solutions to frequently encountered problems is included.

  10. Temporal Sequencing of Brain Activations During Naturally Occurring Thermoregulatory Events

    PubMed Central

    Diwadkar, Vaibhav A.; Murphy, Eric R.; Freedman, Robert R.

    2014-01-01

    Thermoregulatory events are associated with activity in the constituents of the spinothalamic tract. Whereas studies have assessed activity within constituents of this pathway, in vivo functional magnetic resonance imaging (fMRI) studies have not determined if neuronal activity in the constituents of the tract is temporally ordered. Ordered activity would be expected in naturally occurring thermal events, such as menopausal hot flashes (HFs), which occur in physiological sequence. The origins of HFs may lie in brainstem structures where neuronal activity may occur earlier than in interoceptive centers, such as the insula and the prefrontal cortex. To study such time ordering, we conducted blood oxygen level-dependent-based fMRI in a group of postmenopausal women to measure neuronal activity in the brainstem, insula, and prefrontal cortex around the onset of an HF (detected using synchronously acquired skin conductance responses). Rise in brainstem activity occurred before the detectable onset of an HF. Activity in the insula and prefrontal cortex trailed that in the brainstem, appearing following the onset of the HF. Additional activations associated with HFs were observed in the anterior cingulate cortex and the basal ganglia. Pre-HF brainstem responses may reflect the functional origins of internal thermoregulatory events. By comparison, insular, prefrontal and striatal activity may be associated with the phenomenological correlates of HFs. PMID:23787950

  11. Computer analysis of HIV epitope sequences

    SciTech Connect

    Gupta, G.; Myers, G.

    1990-01-01

    Phylogenetic tree analyses provide us with important general information regarding the extent and rate of HIV variation. Currently we are attempting to extend computer analysis and modeling to the V3 loop of the type 2 virus and its simian homologues, especially in light of the prominent role the latter will play in animal model studies. Moreover, it might be possible to attack the slightly similar V4 loop by this approach. However, the strategy relies very heavily upon "natural" information and constraints; thus there exist severe limitations upon the general applicability, in addition to uncertainties with regard to long-range residue interactions. 5 refs., 3 figs.

  12. Sequencing the extrachromosomal circular mobilome reveals retrotransposon activity in plants

    PubMed Central

    Llauro, Christel; Jobet, Edouard; Robakowska-Hyzorek, Dagmara; Lasserre, Eric; Ghesquière, Alain; Panaud, Olivier

    2017-01-01

    Retrotransposons are mobile genetic elements abundant in plant and animal genomes. While efficiently silenced by the epigenetic machinery, they can be reactivated upon stress or during development. Because their level of transcription does not reflect their transposition ability, it is difficult to evaluate their contribution to the active mobilome. Here we applied a simple methodology based on the high throughput sequencing of extrachromosomal circular DNA (eccDNA) forms of active retrotransposons to characterize the repertoire of mobile retrotransposons in plants. This method successfully identified known active retrotransposons in both Arabidopsis and rice material where the epigenome is destabilized. When applying mobilome-seq to developmental stages in wild type rice, we identified PopRice as a highly active retrotransposon producing eccDNA forms in the wild type endosperm. The mobilome-seq strategy opens new routes for the characterization of a yet unexplored fraction of plant genomes. PMID:28212378

  13. DSAP: deep-sequencing small RNA analysis pipeline.

    PubMed

    Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus

    2010-07-01

    DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log(2)-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw.
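
    The first two steps of the pipeline described above (cleanup and clustering) can be sketched on the tab-delimited input format the abstract mentions. This is an illustration only and not DSAP's code: the adaptor string, the minimum homopolymer run, the minimum read length and the function names `cleanup` and `cluster` are all assumptions, and the Rfam and miRBase matching steps are not shown.

```python
import re
from collections import Counter

ADAPTOR = "TCGTATGCCGTCTTCTGCTTG"   # placeholder 3' adaptor, assumed for the example


def cleanup(seq, adaptor=ADAPTOR, min_run=8):
    """Trim the adaptor and any trailing poly-A/T/C/G/N run of at least min_run bases."""
    seq = seq.upper()
    pos = seq.find(adaptor)
    if pos != -1:
        seq = seq[:pos]
    # A base followed by at least (min_run - 1) copies of itself at the 3' end.
    homopolymer_tail = re.compile(r"([ACGTN])\1{%d,}$" % (min_run - 1))
    return homopolymer_tail.sub("", seq)


def cluster(lines, min_len=15):
    """Group cleaned tags into unique sequences and sum their copy numbers.

    Each input line is '<sequence><TAB><count>'.
    """
    clusters = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        seq, count = line.split("\t")
        cleaned = cleanup(seq)
        if len(cleaned) >= min_len:       # drop tags too short to be informative
            clusters[cleaned] += int(count)
    return clusters


# Hypothetical input: two tags that collapse to one cluster, one that is discarded.
example = [
    "TGAGGTAGTAGGTTGTATAGTTTCGTATGCCGTCTTCTGCTTG\t10",
    "TGAGGTAGTAGGTTGTATAGTT\t5",
    "ACGTAAAAAAAAAAAA\t3",
]
clusters = cluster(example)
```

    Summing counts per cleaned sequence keeps the expression estimate for each cluster, which is what the later matching steps would report against.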

  14. Molecular epidemiology of malaria in Cameroon. XXV. In vitro activity of fosmidomycin and its derivatives against fresh clinical isolates of Plasmodium falciparum and sequence analysis of 1-deoxy-D-xylulose 5-phosphate reductoisomerase.

    PubMed

    Tahar, Rachida; Basco, Leonardo K

    2007-08-01

    The in vitro activities of fosmidomycin derivatives, chloroquine, and pyrimethamine were assessed by the radioisotopic assay in clinical isolates of Plasmodium falciparum. In a series of experiments with RPMI 1640 medium-10% fetal bovine serum, the geometric mean 50% inhibitory concentrations (IC(50)s) (n = 34) for fosmidomycin and FR900098 were 301 nM and 118 nM, respectively. In another series of experiments, the geometric mean IC(50)s (n = 33) for fosmidomycin and TH II46 were 413 nM and 249 nM, respectively. The IC(50)s were 2-3 times lower with RPMI-10% fetal bovine serum than the IC(50)s obtained with RPMI-10% human serum. FR900098 and TH II46 were 2.6 and 1.7 times more potent, respectively, than fosmidomycin. There was no correlation between chloroquine or pyrimethamine and fosmidomycin, which suggested the absence of in vitro cross-resistance. Sequence analysis showed five amino acid substitutions, but their possible relationship with the response to fosmidomycin is not clear. Fosmidomycin derivatives are promising candidates for further development.
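
    The relative potencies quoted in this abstract follow directly from the reported geometric mean IC50 values, as the short check below shows. The four mean values are taken from the abstract; the list of individual isolate values used to demonstrate a geometric mean is hypothetical, and the helper name `potency_ratio` is an assumption.

```python
from statistics import geometric_mean   # available in Python 3.8+


def potency_ratio(reference_ic50, derivative_ic50):
    """How many times more potent the derivative is (a lower IC50 means more potent)."""
    return reference_ic50 / derivative_ic50


# Geometric mean IC50 values in nM, as reported in the abstract.
series_1 = {"fosmidomycin": 301.0, "FR900098": 118.0}   # n = 34
series_2 = {"fosmidomycin": 413.0, "TH II46": 249.0}    # n = 33

fr900098_fold = potency_ratio(series_1["fosmidomycin"], series_1["FR900098"])
th_ii46_fold = potency_ratio(series_2["fosmidomycin"], series_2["TH II46"])
print(round(fr900098_fold, 1), round(th_ii46_fold, 1))   # prints 2.6 1.7

# Hypothetical isolate-level IC50s, only to show how a geometric mean is formed.
hypothetical_isolates_nm = [100.0, 400.0]
print(geometric_mean(hypothetical_isolates_nm))
```

    A geometric mean is the usual summary for IC50 data because the values tend to be spread multiplicatively, so ratios of geometric means are a natural measure of fold-potency.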

  15. Applying machine learning techniques to DNA sequence analysis

    SciTech Connect

    Shavlik, J.W.

    1992-01-01

    We are developing a machine learning system that modifies existing knowledge about specific types of biological sequences. It does this by considering sample members and nonmembers of the sequence motif being learned. Using this information (which we call a "domain theory"), our learning algorithm produces a more accurate representation of the knowledge needed to categorize future sequences. Specifically, the KBANN algorithm maps inference rules, such as consensus sequences, into a neural (connectionist) network. Neural network training techniques then use the training examples to refine these inference rules. We have been applying this approach to several problems in DNA sequence analysis and have also been extending the capabilities of our learning system along several dimensions.

  16. Applying machine learning techniques to DNA sequence analysis

    SciTech Connect

    Shavlik, J.W. (Dept. of Computer Sciences); Noordewier, M.O. (Dept. of Computer Science)

    1992-01-01

    We are primarily developing a machine learning (ML) system that modifies existing knowledge about specific types of biological sequences. It does this by considering sample members and nonmembers of the sequence motif being learned. Using this information, our learning algorithm produces a more accurate representation of the knowledge needed to categorize future sequences. Specifically, our KBANN algorithm maps inference rules about a given recognition task into a neural network. Neural network training techniques then use the training examples to refine these inference rules. We call these rules a domain theory, following the convention in the machine learning community. We have been applying this approach to several problems in DNA sequence analysis. In addition, we have been extending the capabilities of our learning system along several dimensions. We have also been investigating parallel algorithms that perform sequence alignments in the presence of frameshift errors.

  17. Chromospheric Activity in Pre-Main-Sequence Stars

    NASA Astrophysics Data System (ADS)

    Simon, Theodore

    IUE observations of solar-type stars show a decline of chromospheric and transition-region (TR) emission with age. For main-sequence stars older than 100 million yr, this decay is exponential from a plateau defined by the youngest stars. At an age of ~1 million yr, the pre-main-sequence T Tauri stars have UV emission line fluxes some 2 orders of magnitude above the plateau for main-sequence stars. This suggests that chromospheric activity in the T Tauri stars falls to the levels of the older stars by a separate decay scheme. The decline in pre-main-sequence activity may be caused by the evolutionary shallowing of the convection zone, while on the main sequence it is due to the star's spindown. This hypothesis needs confirmation, but relatively few T Tauri stars have been observed by IUE. Since the majority of the T Tauri stars thus far observed are probably more massive than the Sun, it may be inappropriate to compare their UV emission with that of the older 1 M(sun) dwarf stars. We propose here to observe the ultraviolet chromospheric and TR lines of pre-main-sequence stars we believe to be of ~1 M(sun). We have chosen a sample of low-luminosity M-type T Tauri stars from the T-associations in Lupus; if evolutionary tracks have any validity, a large fraction of those stars should be close to 1 M(sun) in mass. In order to place the stars more accurately on the H-R diagram and to determine their rotation rates (for comparison with the main-sequence stars), we plan concurrent visual spectroscopy and visual-infrared photometry.

  18. [DNA analysis for the post genome-sequencing era].

    PubMed

    Kambara, Hideki

    2002-05-01

    With the completion of human genome sequencing, the new post genome-sequencing era has started. The major tasks are to clarify the functions of genes and to apply this information in medicine and in various industrial fields. Various DNA analysis methods and instruments for gene expression profiling and for genetic diversity analysis, including SNP typing, are required and have been developed. Here, the history and technologies related to DNA analysis, including the Wada project in the early 1980s and the Human Genome Project from 1990, are described. Various new technologies have been developed in this decade. They include a capillary gel array DNA sequencer, DNA chips, bead probe arrays, a new DNA sequencing method using pyrosequencing and an efficient SNP typing method by BAMPER.

  19. Structure and sequence based analysis of alpha-amylase evolution.

    PubMed

    Singh, Swati; Guruprasad, Lalitha

    2014-01-01

    α-Amylases hydrolyze α-1,4-glycosidic bonds during assimilation of biological macromolecules. The amino acid sequences of these enzymes in thousands of diverse organisms are known and the 3D structures of several proteins have been solved. The 3D structures of these universal enzymes from diverse organisms have been studied by the generation of phylogenetic trees and structure based sequence analysis to generate a metric for the degree of conservation that is responsible for individual speciation. Greater similarities are observed between the reference NCBI tree and the structure based phylogenetic tree than with the sequence based phylogenetic tree, indicating that structures represent the functional aspects of proteins more faithfully than sequence information alone. We report differences in the profile specific conserved and insertion/deletion regions, factors responsible for the Ca(2+) and Cl(-) ion binding and the disulfide connectivity pattern that discriminate the enzymes over evolution.

  20. Basic Sequence Analysis Techniques for Use with Audit Trail Data

    ERIC Educational Resources Information Center

    Judd, Terry; Kennedy, Gregor

    2008-01-01

    Audit trail analysis can provide valuable insights to researchers and evaluators interested in comparing and contrasting designers' expectations of use and students' actual patterns of use of educational technology environments (ETEs). Sequence analysis techniques are particularly effective but have been neglected to some extent because of real…

  1. Food Fish Identification from DNA Extraction through Sequence Analysis

    ERIC Educational Resources Information Center

    Hallen-Adams, Heather E.

    2015-01-01

    This experiment exposed 3rd- and 4th-year undergraduates and graduate students taking a course in advanced food analysis to DNA extraction, polymerase chain reaction (PCR), and DNA sequence analysis. Students provided their own fish sample, purchased from local grocery stores, and the class as a whole extracted DNA, which was then subjected to PCR,…

  2. Applications of recursive segmentation to the analysis of DNA sequences.

    PubMed

    Li, Wentian; Bernaola-Galván, Pedro; Haghighi, Fatameh; Grosse, Ivo

    2002-07-01

    Recursive segmentation is a procedure that partitions a DNA sequence into domains with a homogeneous composition of the four nucleotides A, C, G and T. This procedure can also be applied to any sequence converted from a DNA sequence, such as to a binary strong(G + C)/weak(A + T) sequence, to a binary sequence indicating the presence or absence of the dinucleotide CpG, or to a sequence indicating both the base and the codon position information. We apply various conversion schemes in order to address the following five DNA sequence analysis problems: isochore mapping, CpG island detection, locating the origin and terminus of replication in bacterial genomes, finding complex repeats in telomere sequences, and delineating coding and noncoding regions. We find that the recursive segmentation procedure can successfully detect isochore borders, CpG islands, and the origin and terminus of replication, but it needs improvement for detecting complex repeats as well as borders between coding and noncoding regions.
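
    The procedure described in this abstract can be illustrated with a minimal sketch. It assumes the Jensen-Shannon divergence as the splitting criterion and a fixed divergence threshold as the stopping rule; the abstract specifies neither, and the names to_strong_weak, best_split, segment, threshold and min_len are hypothetical.

```python
import math
from collections import Counter


def to_strong_weak(dna):
    """Convert a DNA string to the binary strong (G/C) / weak (A/T) alphabet."""
    return "".join("S" if base in "GC" else "W" for base in dna.upper())


def entropy(counts):
    """Shannon entropy (bits) of a symbol-count table."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c > 0)


def best_split(seq):
    """Return (position, divergence) of the cut maximizing the
    Jensen-Shannon divergence between the left and right parts."""
    n = len(seq)
    total = Counter(seq)
    h_total = entropy(total)
    left = Counter()
    best_pos, best_div = None, 0.0
    for i in range(1, n):
        left[seq[i - 1]] += 1
        right = total - left  # Counter subtraction keeps positive counts only
        div = h_total - (i / n) * entropy(left) - ((n - i) / n) * entropy(right)
        if div > best_div:
            best_pos, best_div = i, div
    return best_pos, best_div


def segment(seq, threshold=0.1, min_len=4, offset=0):
    """Recursively split seq; return the sorted list of domain boundaries.
    The fixed threshold is an assumed stand-in for a significance test."""
    if len(seq) < 2 * min_len:
        return []
    pos, div = best_split(seq)
    if pos is None or div < threshold:
        return []
    return (segment(seq[:pos], threshold, min_len, offset)
            + [offset + pos]
            + segment(seq[pos:], threshold, min_len, offset + pos))
```

    For example, segment(to_strong_weak("AT" * 10 + "GC" * 10)) finds a single boundary at position 20, where the composition switches from weak to strong bases. Other conversion schemes (CpG presence, codon position) would only change the conversion function.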

  3. An ORFome assembly approach to metagenomics sequences analysis.

    PubMed

    Ye, Yuzhen; Tang, Haixu

    2009-06-01

    Metagenomics is an emerging methodology for the direct genomic analysis of a mixed community of uncultured microorganisms. The current analyses of metagenomics data largely rely on the computational tools originally designed for microbial genomics projects. The challenge of assembling metagenomic sequences arises mainly from the short reads and the high species complexity of the community. Alternatively, individual (short) reads will be searched directly against databases of known genes (or proteins) to identify homologous sequences. The latter approach may have low sensitivity and specificity in identifying homologous sequences, which may further bias the subsequent diversity analysis. In this paper, we present a novel approach to metagenomic data analysis, called Metagenomic ORFome Assembly (MetaORFA). The whole computational framework consists of three steps. Each read from a metagenomics project will first be annotated with putative open reading frames (ORFs) that likely encode proteins. Next, the predicted ORFs are assembled into a collection of peptides using an EULER assembly method. Finally, the assembled peptides (i.e. ORFome) are used for database searching of homologs and subsequent diversity analysis. We applied MetaORFA approach to several metagenomics datasets with low coverage short reads. The results show that MetaORFA can produce long peptides even when the sequence coverage of reads is extremely low. Hence, the ORFome assembly significantly increases the sensitivity of homology searching, and may potentially improve the diversity analysis of the metagenomic data. This improvement is especially useful for metagenomic projects when the genome assembly does not work because of the low sequence coverage.
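
    The first of the three steps above (annotating each read with putative ORFs) can be sketched as follows. This is an illustrative assumption, not the MetaORFA implementation: it treats any stop-free stretch in the six reading frames as a putative (possibly partial) ORF, uses the standard genetic code, and does not attempt the EULER assembly step. The names putative_orfs and min_aa are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Reverse complement of an upper-case A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate codon by codon; unknown codons (e.g. containing N) become X."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def putative_orfs(read, min_aa=10):
    """Return stop-free peptide fragments of at least min_aa residues
    found in any of the six reading frames of a read."""
    peptides = []
    for strand in (read, reverse_complement(read)):
        for frame in range(3):
            for piece in translate(strand[frame:]).split("*"):
                if len(piece) >= min_aa:
                    peptides.append(piece)
    return peptides
```

    Because short reads rarely contain a full gene, the sketch does not require a start codon; the resulting peptide fragments are the kind of input that a peptide-level assembler would then merge.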

  4. Subsalt risk reduction using seismic sequence stratigraphic analysis

    SciTech Connect

    Wornardt, W.W. Jr.

    1994-12-31

    Several recent projects involving detailed seismic sequence stratigraphic analysis of existing wells near subsalt prospects in the south additions of the offshore Louisiana area in the Gulf of Mexico have demonstrated the utility of using seismic sequence stratigraphic analysis to reduce risk when drilling subsalt plays. First, the thick section of sedimentary rocks that was thought to be above and below the salt was penetrated in the area away from the salt. These sedimentary rocks were accurately dated using maximum flooding surface first occurrence downhole of important bioevent, condensed sections, abundance and diversity histograms, and high-resolution biostratigraphy while the wells were being drilled. Potential reservoir sandstones within specific Vail sequences in these wells were projected using seismic data up to the subsalt and non-subsalt sediment interface. The systems tract above and below the maximum flooding surface and the type of reservoir sandstones that were to be encountered were predictable based on the paleobathymetry, increase and decrease of fauna and flora, recognition of the bottom-set turbidite, slope fan and basin floor fan condensed sections, and superpositional relationship of the Vail sequences and systems tracts to provide a detailed sequence stratigraphic analysis of the well. Subsequently, wells drilled through the salt could be accurately correlated with Vail sequences and systems tracts in wells that were previously correlated away from the salt layer with seismic reflection profiles.

  5. Subsalt risk reduction using seismic sequence-stratigraphic analysis

    SciTech Connect

    Wornardt, W.W. Jr.

    1994-09-01

    Several recent projects involving detailed seismic-sequence stratigraphic analysis of existing wells near subsalt prospects in the south additions of the offshore Louisiana area in the Gulf of Mexico have demonstrated the utility of using seismic sequence-stratigraphic analysis to reduce risk when drilling subsalt plays. First, the thick section of sediments that was thought to be above and below the salt was penetrated in the area away from the salt. These sediments were accurately dated using maximum flooding surface first occurrence downhole of important bioevent, condensed sections, abundance and diversity histograms, and high-resolution biostratigraphy while the wells were being drilled. Potential reservoir sands within specific Vail sequences in these wells were projected on seismic up to the subsalt and non-subsalt sediment interface. The systems tract above and below the maximum flooding surface and the type of reservoir sands that were to be encountered were predictable based on the paleobathymetry, increase and decrease of fauna and flora abundance, recognition of the bottom-set turbidite, slope fan and basin floor fan condensed sections, and superpositional relationship of the Vail sequences and systems tracts to provide a detailed sequence-stratigraphic analysis of the well in question. Subsequently, the wells drilled through the salt could be accurately correlated with the Vail sequences and systems tracts in wells that were previously correlated with seismic reflection profiles away from the salt layer.

  6. The Main Sequence of Explosive Solar Active Regions: Comparison of Emerging and Mature Active Regions

    NASA Technical Reports Server (NTRS)

    Falconer, David; Moore, Ron

    2011-01-01

    For mature active regions, an active region's magnetic flux content determines the maximum free energy the active region can have. Most large flares and CMEs occur in active regions that are near their free-energy limit. Active-region flare power radiated in the GOES 1-8 Å band increases steeply as the free-energy limit is approached. We infer that the free-energy limit is set by the rate of release of an active region's free magnetic energy by flares, CMEs and coronal heating balancing the maximum rate at which the Sun can put free energy into the active region's magnetic field. This balance of maximum power results in explosive active regions residing in a "main sequence" in active-region (flux content, free energy content) phase space, a sequence analogous to the main sequence of hydrogen-burning stars in (mass, luminosity) phase space.

  7. High Throughput Plasmid Sequencing with Illumina and CLC Bio (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Athavale, Ajay [Monsanto]

    2016-07-12

    Ajay Athavale (Monsanto) presents "High Throughput Plasmid Sequencing with Illumina and CLC Bio" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  8. High Throughput Plasmid Sequencing with Illumina and CLC Bio (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    SciTech Connect

    Athavale, Ajay

    2012-06-01

    Ajay Athavale (Monsanto) presents "High Throughput Plasmid Sequencing with Illumina and CLC Bio" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  9. Induction of homologous recombination between sequence repeats by the activation induced cytidine deaminase (AID) protein.

    PubMed

    Buerstedde, Jean-Marie; Lowndes, Noel; Schatz, David G

    2014-07-08

    The activation induced cytidine deaminase (AID) protein is known to initiate somatic hypermutation, gene conversion or switch recombination by cytidine deamination within the immunoglobulin loci. Using chromosomally integrated fluorescence reporter transgenes, we demonstrate a new recombinogenic activity of AID leading to intra- and intergenic deletions via homologous recombination of sequence repeats. Repeat recombination occurs at high frequencies even when the homologous sequences are hundreds of bases away from the positions of AID-mediated cytidine deamination, suggesting DNA end resection before strand invasion. Analysis of recombinants between homeologous repeats yielded evidence for heteroduplex formation and preferential migration of the Holliday junctions to the boundaries of sequence homology. These findings broaden the target and off-target mutagenic potential of AID and establish a novel system to study induced homologous recombination in vertebrate cells. DOI: http://dx.doi.org/10.7554/eLife.03110.001.

  10. Deep sequencing analysis of phage libraries using Illumina platform.

    PubMed

    Matochko, Wadim L; Chu, Kiki; Jin, Bingjie; Lee, Sam W; Whitesides, George M; Derda, Ratmir

    2012-09-01

    This paper presents an analysis of phage-displayed libraries of peptides using Illumina sequencing. We describe steps for the preparation of short DNA fragments for deep sequencing and MATLAB software for the analysis of the results. Screening of peptide libraries displayed on the surface of bacteriophage (phage display) can be used to discover peptides that bind to any target. The key step in this discovery is the analysis of peptide sequences present in the library. This analysis is usually performed by Sanger sequencing, which is labor intensive and limited to examination of a few hundred phage clones. On the other hand, Illumina deep-sequencing technology can characterize over 10(7) reads in a single run. We applied Illumina sequencing to analyze phage libraries. Using PCR, we isolated the variable regions from M13KE phage vectors from a phage display library. The PCR primers contained (i) sequences flanking the variable region, (ii) barcodes, and (iii) variable 5'-terminal region. We used this approach to examine how diversity of peptides in phage display libraries changes as a result of amplification of libraries in bacteria. Using HiSeq single-end Illumina sequencing of these fragments, we acquired over 2×10(7) reads, 57 base pairs (bp) in length. Each read contained information about the barcode (6 bp), one complementary region (12 bp) and a variable region (36 bp). We applied this sequencing to a model library of 10(6) unique clones and observed that amplification enriches ∼150 clones, which dominate ∼20% of the library. Deep sequencing, for the first time, characterized the collapse of diversity in phage libraries. The results suggest that screens based on repeated amplification and small-scale sequencing identify a few binding clones and miss thousands of useful clones. The deep sequencing approach described here could identify under-represented clones in phage screens. It could also be instrumental in developing new screening strategies, which can preserve

  11. Analysis and Visualization Tool for Targeted Amplicon Bisulfite Sequencing on Ion Torrent Sequencers

    PubMed Central

    Pabinger, Stephan; Ernst, Karina; Pulverer, Walter; Kallmeyer, Rainer; Valdes, Ana M.; Metrustry, Sarah; Katic, Denis; Nuzzo, Angelo; Kriegner, Albert; Vierlinger, Klemens; Weinhaeusel, Andreas

    2016-01-01

    Targeted sequencing of PCR amplicons generated from bisulfite deaminated DNA is a flexible, cost-effective way to study methylation of a sample at single CpG resolution and perform subsequent multi-target, multi-sample comparisons. Currently, no platform specific protocol, support, or analysis solution is provided to perform targeted bisulfite sequencing on a Personal Genome Machine (PGM). Here, we present a novel tool, called TABSAT, for analyzing targeted bisulfite sequencing data generated on Ion Torrent sequencers. The workflow starts with raw sequencing data, performs quality assessment, and uses a tailored version of Bismark to map the reads to a reference genome. The pipeline visualizes results as lollipop plots and is able to deduce specific methylation-patterns present in a sample. The obtained profiles are then summarized and compared between samples. In order to assess the performance of the targeted bisulfite sequencing workflow, 48 samples were used to generate 53 different Bisulfite-Sequencing PCR amplicons from each sample, resulting in 2,544 amplicon targets. We obtained a mean coverage of 282X using 1,196,822 aligned reads. Next, we compared the sequencing results of these targets to the methylation level of the corresponding sites on an Illumina 450k methylation chip. The calculated average Pearson correlation coefficient of 0.91 confirms the sequencing results with one of the industry-leading CpG methylation platforms and shows that targeted amplicon bisulfite sequencing provides an accurate and cost-efficient method for DNA methylation studies, e.g., to provide platform-independent confirmation of Illumina Infinium 450k methylation data. TABSAT offers a novel way to analyze data generated by Ion Torrent instruments and can also be used with data from the Illumina MiSeq platform. It can be easily accessed via the Platomics platform, which offers a web-based graphical user interface along with sample and parameter storage. TABSAT is freely

  12. Sequence analysis of chromatin immunoprecipitation data for transcription factors

    PubMed Central

    Fraenkel, Ernest

    2013-01-01

    Chromatin immunoprecipitation (ChIP) experiments allow the location of transcription factors to be determined across the genome. Subsequent analysis of the sequences of the identified regions allows binding to be localized at a higher resolution than can be achieved by current high-throughput experiments without sequence analysis, and may provide important insight into the regulatory programs enacted by the protein of interest. In this chapter we review the tools, workflow, and common pitfalls of such analyses, and recommend strategies for effective motif discovery from these data. PMID:20827592

  13. Validation of Genotyping-By-Sequencing Analysis in Populations of Tetraploid Alfalfa by 454 Sequencing.

    PubMed

    Rocher, Solen; Jean, Martine; Castonguay, Yves; Belzile, François

    2015-01-01

    Genotyping-by-sequencing (GBS) is a relatively low-cost high throughput genotyping technology based on next generation sequencing and is applicable to orphan species with no reference genome. A combination of genome complexity reduction and multiplexing with DNA barcoding provides a simple and affordable way to resolve allelic variation between plant samples or populations. GBS was performed on ApeKI libraries using DNA from 48 genotypes each of two heterogeneous populations of tetraploid alfalfa (Medicago sativa spp. sativa): the synthetic cultivar Apica (ATF0) and a derived population (ATF5) obtained after five cycles of recurrent selection for superior tolerance to freezing (TF). Nearly 400 million reads were obtained from two lanes of an Illumina HiSeq 2000 sequencer and analyzed with the Universal Network-Enabled Analysis Kit (UNEAK) pipeline designed for species with no reference genome. Following the application of whole dataset-level filters, 11,694 single nucleotide polymorphism (SNP) loci were obtained. About 60% had a significant match on the Medicago truncatula syntenic genome. The accuracy of allelic ratios and genotype calls based on GBS data was directly assessed using 454 sequencing on a subset of SNP loci scored in eight plant samples. Sequencing depth in this study was not sufficient for accurate tetraploid allelic dosage, but reliable genotype calls based on diploid allelic dosage were obtained when using additional quality filtering. Principal Component Analysis of SNP loci in plant samples revealed that a small proportion (<5%) of the genetic variability assessed by GBS is able to differentiate ATF0 and ATF5. Our results confirm that analysis of GBS data using UNEAK is a reliable approach for genome-wide discovery of SNP loci in outcrossed polyploids.

  14. Transcriptome analysis of blueberry using 454 EST sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Blueberry (Vaccinium corymbosum) is a major berry crop in the United States, and one that has great nutritional and economic value. Next generation sequencing methodologies, such as 454, have been demonstrated to be successful and efficient in producing a snap-shot of transcriptional activities du...

  15. Sequence and Comparative Genomic Analysis of Actin-related Proteins

    PubMed Central

    Muller, Jean; Oma, Yukako; Vallar, Laurent; Friederich, Evelyne; Poch, Olivier; Winsor, Barbara

    2005-01-01

    Actin-related proteins (ARPs) are key players in cytoskeleton activities and nuclear functions. Two complexes, ARP2/3 and ARP1/11, also known as dynactin, are implicated in actin dynamics and in microtubule-based trafficking, respectively. ARP4 to ARP9 are components of many chromatin-modulating complexes. Conventional actins and ARPs codefine a large family of homologous proteins, the actin superfamily, with a tertiary structure known as the actin fold. Because ARPs and actin share high sequence conservation, clear family definition requires distinct features to easily and systematically identify each subfamily. In this study we performed an in-depth sequence and comparative genomic analysis of ARP subfamilies. A high-quality multiple alignment of ∼700 complete protein sequences homologous to actin, including 148 ARP sequences, allowed us to extend the ARP classification to new organisms. Sequence alignments revealed conserved residues, motifs, and inserted sequence signatures to define each ARP subfamily. These discriminative characteristics allowed us to develop ARPAnno (http://bips.u-strasbg.fr/ARPAnno), a new web server dedicated to the annotation of ARP sequences. Analyses of sequence conservation among actins and ARPs highlight part of the actin fold and suggest interactions between ARPs and actin-binding proteins. Finally, analysis of ARP distribution across eukaryotic phyla emphasizes the central importance of nuclear ARPs, particularly the multifunctional ARP4. PMID:16195354

  16. Sequence Analysis of the Genome of Carnation (Dianthus caryophyllus L.)

    PubMed Central

    Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

    2014-01-01

    The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. ‘Francesco’ was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568 887 315 bp, consisting of 45 088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16 644 bp and 60 737 bp, respectively, and the longest scaffold was 1 287 144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43 266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. PMID:24344172
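
    The assembly statistics quoted above (N50 and genome coverage) follow from simple arithmetic, sketched here. The function name n50 and the toy length list are illustrative assumptions; only the two totals in the last line come from the abstract.

```python
def n50(lengths):
    """N50: the length L such that sequences of length >= L
    together contain at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example (made-up lengths, not the carnation scaffolds).
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_lengths)

# Coverage figure from the abstract: assembled bases / estimated genome size.
coverage = 568_887_315 / 622_000_000
```

    With the toy list the two longest sequences already hold half of the 32 total bases, so toy_n50 is 8; the coverage ratio evaluates to about 0.91, matching the 91% reported.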

  17. Motion sequence analysis in the presence of figural cues

    PubMed Central

    Sinha, Pawan; Vaina, Lucia M.

    2015-01-01

    The perception of 3D structure in dynamic sequences is believed to be subserved primarily through the use of motion cues. However, real-world sequences contain many figural shape cues besides the dynamic ones. We hypothesize that if figural cues are perceptually significant during sequence analysis, then inconsistencies in these cues over time would lead to percepts of non-rigidity in sequences showing physically rigid objects in motion. We develop an experimental paradigm to test this hypothesis and present results with two patients with impairments in motion perception due to focal neurological damage, as well as two control subjects. Consistent with our hypothesis, the data suggest that figural cues strongly influence the perception of structure in motion sequences, even to the extent of inducing non-rigid percepts in sequences where motion information alone would yield rigid structures. Beyond helping to probe the issue of shape perception, our experimental paradigm might also serve as a possible perceptual assessment tool in a clinical setting. PMID:26028822

  18. Sequence analysis of the genome of carnation (Dianthus caryophyllus L.).

    PubMed

    Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

    2014-06-01

    The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. 'Francesco' was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568,887,315 bp, consisting of 45,088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16,644 bp and 60,737 bp, respectively, and the longest scaffold was 1,287,144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43,266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp.

  19. Molecular characterization of Giardia psittaci by multilocus sequence analysis.

    PubMed

    Abe, Niichiro; Makino, Ikuko; Kojima, Atsushi

    2012-12-01

    Multilocus sequence analyses targeting small subunit ribosomal DNA (SSU rDNA), elongation factor 1 alpha (ef1α), glutamate dehydrogenase (gdh), and beta giardin (β-giardin) were performed on Giardia psittaci isolates from three Budgerigars (Melopsittacus undulatus) and four Barred parakeets (Bolborhynchus lineola) kept in individual households or imported from overseas. Nucleotide differences and phylogenetic analyses at four loci indicate the distinction of G. psittaci from the other known Giardia species: Giardia muris, Giardia microti, Giardia ardeae, and Giardia duodenalis assemblages. Furthermore, G. psittaci was related more closely to G. duodenalis than to the other known Giardia species, except for G. microti. Conflicting signals regarded as "double peaks" were found at the same nucleotide positions of the ef1α in all isolates. However, the sequences of the other three loci, including gdh and β-giardin, which are known to be highly variable, from all isolates were also mutually identical at every locus. They showed no double peaks. These results suggest that double peaks found in the ef1α sequences are caused not by mixed infection with genetically different G. psittaci isolates but by allelic sequence heterogeneity (ASH), which is observed in diplomonad lineages including G. duodenalis. No sequence difference was found in any G. psittaci isolates at the gdh and β-giardin, suggesting that G. psittaci is indeed not more diverse genetically than other Giardia species. This report is the first to provide evidence related to the genetic characteristics of G. psittaci obtained using multilocus sequence analysis.

  20. Improved Algorithm for Analysis of DNA Sequences Using Multiresolution Transformation

    PubMed Central

    Inbamalar, T. M.; Sivakumar, R.

    2015-01-01

    Bioinformatics and genomic signal processing use computational techniques to solve various biological problems. They aim to study the information allied with genetic materials such as deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and proteins. Fast and precise identification of the protein coding regions in a DNA sequence is one of the most important tasks in analysis. Existing digital signal processing (DSP) methods provide less accurate and computationally complex solutions with greater background noise. Hence, improvements in accuracy and computational complexity, and a reduction in background noise, are essential in the identification of protein coding regions in DNA sequences. In this paper, a new DSP based method is introduced to detect the protein coding regions in DNA sequences. Here, the DNA sequences are converted into numeric sequences using the electron ion interaction potential (EIIP) representation. Then the discrete wavelet transform is taken. The absolute value of the energy is computed, followed by a suitable threshold. The test is conducted using the databases available on the National Center for Biotechnology Information (NCBI) site. The comparative analysis confirms the efficiency of the proposed system. PMID:26000337
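
    The processing chain named in this abstract (EIIP mapping, wavelet transform, energy, threshold) can be sketched in a few lines. The EIIP values are the ones commonly cited in the genomic signal processing literature; the choice of a single-level Haar wavelet and the threshold argument are assumptions made here for illustration, since the abstract does not name the wavelet or the threshold rule.

```python
import math

# Commonly cited EIIP values for the four nucleotides.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def to_eiip(dna):
    """Map a DNA string to its numeric EIIP sequence."""
    return [EIIP[base] for base in dna.upper()]


def haar_step(signal):
    """One level of the orthonormal Haar DWT.
    Returns (approximation, detail) coefficient lists;
    a trailing unpaired sample is ignored."""
    root2 = math.sqrt(2)
    starts = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in starts]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in starts]
    return approx, detail


def detail_energy(dna):
    """Energy (squared magnitude) of the Haar detail coefficients."""
    _, detail = haar_step(to_eiip(dna))
    return [d * d for d in detail]


def above_threshold(energy, threshold):
    """Flag coefficient positions whose energy reaches the threshold."""
    return [e >= threshold for e in energy]
```

    A practical detector would probably use a sliding window and several decomposition levels; the sketch only shows how the numeric mapping feeds the transform and how thresholding turns energies into candidate positions.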

  1. Halvade: scalable sequence analysis with MapReduce

    PubMed Central

    Decap, Dries; Reumers, Joke; Herzeel, Charlotte; Costanza, Pascal; Fostier, Jan

    2015-01-01

    Motivation: Post-sequencing DNA analysis typically consists of read mapping followed by variant calling. Especially for whole genome sequencing, this computational step is very time-consuming, even when using multithreading on a multi-core machine. Results: We present Halvade, a framework that enables sequencing pipelines to be executed in parallel on a multi-node and/or multi-core compute infrastructure in a highly efficient manner. As an example, a DNA sequencing analysis pipeline for variant calling has been implemented according to the GATK Best Practices recommendations, supporting both whole genome and whole exome sequencing. Using a 15-node computer cluster with 360 CPU cores in total, Halvade processes the NA12878 dataset (human, 100 bp paired-end reads, 50× coverage) in <3 h with very high parallel efficiency. Even on a single, multi-core machine, Halvade attains a significant speedup compared with running the individual tools with multithreading. Availability and implementation: Halvade is written in Java and uses the Hadoop MapReduce 2.0 API. It supports a wide range of distributions of Hadoop, including Cloudera and Amazon EMR. Its source is available at http://bioinformatics.intec.ugent.be/halvade under GPL license. Contact: jan.fostier@intec.ugent.be Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25819078

  2. Exploration of phylogenetic data using a global sequence analysis method

    PubMed Central

    Chapus, Charles; Dufraigne, Christine; Edwards, Scott; Giron, Alain; Fertil, Bernard; Deschavanne, Patrick

    2005-01-01

    Background Molecular phylogenetic methods are based on alignments of nucleic or peptidic sequences. The tremendous increase in molecular data permits phylogenetic analyses of very long sequences and of many species, but also requires methods to help manage large datasets. Results Here we explore the phylogenetic signal present in molecular data by genomic signatures, defined as the set of frequencies of short oligonucleotides present in DNA sequences. Although violating many of the standard assumptions of traditional phylogenetic analyses – in particular explicit statements of homology inherent in character matrices – the use of the signature does permit the analysis of very long sequences, even those that are unalignable, and is therefore most useful in cases where alignment is questionable. We compare the results obtained by traditional phylogenetic methods to those inferred by the signature method for two genes: RAG1, which is easily alignable, and 18S RNA, where alignments are often ambiguous for some regions. We also apply this method to a multigene data set of 33 genes for 9 bacteria and one archea species as well as to the whole genome of a set of 16 γ-proteobacteria. In addition to delivering phylogenetic results comparable to traditional methods, the comparison of signatures for the sequences involved in the bacterial example identified putative candidates for horizontal gene transfers. Conclusion The signature method is therefore a fast tool for exploring phylogenetic data, providing not only a pretreatment for discovering new sequence relationships, but also for identifying cases of sequence evolution that could confound traditional phylogenetic analysis. PMID:16280081

  3. Sequence analysis of inv (16) breakpoints associated with acute leukemia

    SciTech Connect

    Wilmenga, C.; Liu, P.; Haira, A.

    1994-09-01

    The inv(16)(p13;q22), associated with acute myelocytic leukemia FAB type M4 Eo, results in an in-frame fusion between core binding factor {beta} (CBF{beta}) and the tail region of the smooth muscle myosin heavy chain (SMMHC). Using Southern blot analysis, we found a random distribution of the 16q breakpoints throughout intron 5 of the CBF{beta} gene, which is about 15 kb. The 16p breakpoints have been found in at least 4 different introns of the SMMHC gene. However, the majority of 16p breakpoints occur in an intron that is only 370 bp in size. Since inversions appear to be an unusual form of mutation, we were interested in unraveling the possible mechanism underlying this specific inversion. The sequences around two breakpoints, as well as the corresponding germline sequences, have been determined. Patient 1 had the common breakpoint in 16p, in the 370 bp intron. In patient 2, the 16p breakpoint was more than 8 kb away from the common site. Examination of the normal sequences at the inversion sites revealed that the fused sequences were precisely joined. In patient 2 extensive sequence homology was noted between the 16p and 16q breakpoints. Currently, sequences are being analyzed for the presence of repeats that might explain the nature of the observed rearrangements by virtue of the stabilization of the folding pattern long enough for the recombination to occur. Sequence analysis of the breakpoints of 3 additional patients with the common 16p breakpoint but different 16q breakpoints is underway.

  4. A human cellular sequence implicated in trk oncogene activation is DNA damage inducible

    SciTech Connect

    Ben-Ishai, R.; Scharf, R.; Sharon, R.; Kapten, I.

    1990-08-01

    Xeroderma pigmentosum cells, which are deficient in the repair of UV light-induced DNA damage, have been used to clone DNA-damage-inducible transcripts in human cells. The cDNA clone designated pC-5 hybridizes on RNA gel blots to a 1-kilobase transcript, which is moderately abundant in nontreated cells and whose synthesis is enhanced in human cells following UV irradiation or treatment with several other DNA-damaging agents. UV-enhanced transcription of C-5 RNA is transient and occurs at lower fluences and to a greater extent in DNA-repair-deficient than in DNA-repair-proficient cells. Southern blot analysis indicates that the C-5 gene belongs to a multigene family. A cDNA clone containing the complete coding sequence of C-5 was isolated. Sequence analysis revealed that it is homologous to a human cellular sequence encoding the amino-terminal activating sequence of the trk-2h chimeric oncogene. The presence of DNA-damage-responsive sequences at the 5' end of a chimeric oncogene could result in enhanced expression of the oncogene in response to carcinogens.

  5. Deep sequencing reveals the complete genome and evidence for transcriptional activity of the first virus-like sequences identified in Aristotelia chilensis (Maqui Berry).

    PubMed

    Villacreses, Javier; Rojas-Herrera, Marcelo; Sánchez, Carolina; Hewstone, Nicole; Undurraga, Soledad F; Alzate, Juan F; Manque, Patricio; Maracaja-Coutinho, Vinicius; Polanco, Victor

    2015-04-03

    Here, we report the genome sequence and evidence for transcriptional activity of a virus-like element in the native Chilean berry tree Aristotelia chilensis. We propose to name the endogenous sequence as Aristotelia chilensis Virus 1 (AcV1). High-throughput sequencing of the genome of this tree uncovered an endogenous viral element, with a size of 7122 bp, corresponding to the complete genome of AcV1. Its sequence contains three open reading frames (ORFs): ORFs 1 and 2 share 66%-73% amino acid similarity with members of the Caulimoviridae virus family, especially the Petunia vein clearing virus (PVCV), Petuvirus genus. ORF1 encodes a movement protein (MP); ORF2 a Reverse Transcriptase (RT) and a Ribonuclease H (RNase H) domain; and ORF3 showed no amino acid sequence similarity with any other known virus proteins. Analogous to other known endogenous pararetrovirus sequences (EPRVs), AcV1 is integrated in the genome of Maqui Berry and showed low viral transcriptional activity, which was detected by deep sequencing technology (DNA and RNA-seq). Phylogenetic analysis of AcV1 and other pararetroviruses revealed a closer resemblance with Petuvirus. Overall, our data suggest that AcV1 could be a new member of the Caulimoviridae family, genus Petuvirus, and the first evidence of this kind of virus in a fruit plant.

  6. Deep Sequencing Reveals the Complete Genome and Evidence for Transcriptional Activity of the First Virus-Like Sequences Identified in Aristotelia chilensis (Maqui Berry)

    PubMed Central

    Villacreses, Javier; Rojas-Herrera, Marcelo; Sánchez, Carolina; Hewstone, Nicole; Undurraga, Soledad F.; Alzate, Juan F.; Manque, Patricio; Maracaja-Coutinho, Vinicius; Polanco, Victor

    2015-01-01

    Here, we report the genome sequence and evidence for transcriptional activity of a virus-like element in the native Chilean berry tree Aristotelia chilensis. We propose to name the endogenous sequence as Aristotelia chilensis Virus 1 (AcV1). High-throughput sequencing of the genome of this tree uncovered an endogenous viral element, with a size of 7122 bp, corresponding to the complete genome of AcV1. Its sequence contains three open reading frames (ORFs): ORFs 1 and 2 share 66%–73% amino acid similarity with members of the Caulimoviridae virus family, especially the Petunia vein clearing virus (PVCV), Petuvirus genus. ORF1 encodes a movement protein (MP); ORF2 a Reverse Transcriptase (RT) and a Ribonuclease H (RNase H) domain; and ORF3 showed no amino acid sequence similarity with any other known virus proteins. Analogous to other known endogenous pararetrovirus sequences (EPRVs), AcV1 is integrated in the genome of Maqui Berry and showed low viral transcriptional activity, which was detected by deep sequencing technology (DNA and RNA-seq). Phylogenetic analysis of AcV1 and other pararetroviruses revealed a closer resemblance with Petuvirus. Overall, our data suggest that AcV1 could be a new member of the Caulimoviridae family, genus Petuvirus, and the first evidence of this kind of virus in a fruit plant. PMID:25855242

  7. Amino acid sequence of homologous rat atrial peptides: natriuretic activity of native and synthetic forms.

    PubMed Central

    Seidah, N G; Lazure, C; Chrétien, M; Thibault, G; Garcia, R; Cantin, M; Genest, J; Nutt, R F; Brady, S F; Lyle, T A

    1984-01-01

    A substance called atrial natriuretic factor (ANF), localized in secretory granules of atrial cardiocytes, was isolated as four homologous natriuretic peptides from homogenates of rat atria. The complete sequence of the longest form showed that it is composed of 33 amino acids. The three other shorter forms (2-33, 3-33, and 8-33) represent amino-terminally truncated versions of the 33 amino acid parent molecule as shown by analysis of sequence, amino acid composition, or both. The proposed primary structure agrees entirely with the amino acid composition and reveals no significant sequence homology with any known protein or segment of protein. The short form ANF-(8-33) was synthesized by a multi-fragment condensation approach and the synthetic product was shown to exhibit specific activity comparable to that of the natural ANF-(3-33). PMID:6232612

  8. The amino acid sequence of monal pheasant lysozyme and its activity.

    PubMed

    Araki, T; Matsumoto, T; Torikata, T

    1998-10-01

    The amino acid sequence of monal pheasant lysozyme and its activity were analyzed. Carboxymethylated lysozyme was digested with trypsin and the resulting peptides were sequenced. The established amino acid sequence had one amino acid substitution at position 102 (Arg to Gly) compared with Indian peafowl lysozyme and four amino acid substitutions at positions 3 (Phe to Tyr), 15 (His to Leu), 41 (Gln to His), and 121 (Gln to His) compared with chicken lysozyme. Analysis of the time-courses of reaction using N-acetylglucosamine pentamer as a substrate showed a difference of binding free energy change (-0.4 kcal/mol) at subsite A between monal pheasant and Indian peafowl lysozyme. This was assumed to be caused by the amino acid substitution at subsite A with loss of a positive charge at position 102 (Arg102 to Gly).

  9. Functional analysis of bipartite begomovirus coat protein promoter sequences

    SciTech Connect

    Lacatus, Gabriela; Sunter, Garry

    2008-06-20

    We demonstrate that the AL2 gene of Cabbage leaf curl virus (CaLCuV) activates the CP promoter in mesophyll and acts to derepress the promoter in vascular tissue, similar to that observed for Tomato golden mosaic virus (TGMV). Binding studies indicate that sequences mediating repression and activation of the TGMV and CaLCuV CP promoter specifically bind different nuclear factors common to Nicotiana benthamiana, spinach and tomato. However, chromatin immunoprecipitation demonstrates that TGMV AL2 can interact with both sequences independently. Binding of nuclear protein(s) from different crop species to viral sequences conserved in both bipartite and monopartite begomoviruses, including TGMV, CaLCuV, Pepper golden mosaic virus and Tomato yellow leaf curl virus suggests that bipartite begomoviruses bind common host factors to regulate the CP promoter. This is consistent with a model in which AL2 interacts with different components of the cellular transcription machinery that bind viral sequences important for repression and activation of begomovirus CP promoters.

  10. Construction of an integrated database to support genomic sequence analysis

    SciTech Connect

    Gilbert, W.; Overbeek, R.

    1994-11-01

    The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

  11. Patterns in protein primary sequences: classification, display and analysis.

    PubMed Central

    Saurugger, P. N.; Metfessel, B. A.

    1991-01-01

    The protein folding code, which is contained in the amino acid chain of a protein, has so far eluded elucidation. However, patterns of hydrophobic residues have previously been identified which show a specificity towards certain secondary structural elements. We are developing an analysis toolkit to find, visualize, and analyze patterns in primary sequences. Preliminary results show that there exist patterns in primary sequences which are useful for predicting the structural class of amino acid chains, performing especially well for the all-alpha helix and all-beta sheet classes. PMID:1807631

  12. Iterated Function System and Multifractal Analysis of Biological Sequences

    NASA Astrophysics Data System (ADS)

    Yu, Zu-Guo; Anh, Vo; Lau, Ka-Sing

    The fractal method has been used successfully to study many problems in physics, mathematics, engineering, finance, and even biology. In the past decade or so there has been a groundswell of interest in unravelling the mysteries of DNA. How to extract more biological information from DNA sequences is a challenging problem. The classification and evolutionary relationships of organisms are central problems in bioinformatics. It is also very hard to predict the secondary and spatial structure of a protein from its amino acid sequence. In this paper, some recent results related to these problems, obtained through multifractal analysis and the iterated function system (IFS) model, are introduced.

  13. Transcriptome Sequencing of Gracilariopsis lemaneiformis to Analyze the Genes Related to Optically Active Phycoerythrin Synthesis

    PubMed Central

    Huang, Xiaoyun; Zang, Xiaonan; Wu, Fei; Jin, Yuming; Wang, Haitao; Liu, Chang; Ding, Yating; He, Bangxiang; Xiao, Dongfang; Song, Xinwei; Liu, Zhu

    2017-01-01

    Gracilariopsis lemaneiformis (aka Gracilaria lemaneiformis) is a red macroalga rich in phycoerythrin, which can capture light efficiently and transfer it to photosystem II. However, little is known about the synthesis of optically active phycoerythrin in G. lemaneiformis at the molecular level. With the advent of high-throughput sequencing technology, analysis of genetic information for G. lemaneiformis by transcriptome sequencing is an effective means to get a deeper insight into the molecular mechanism of phycoerythrin synthesis. Illumina technology was employed to sequence the transcriptome of two strains of G. lemaneiformis: the wild type and a green-pigmented mutant. We obtained a total of 86915 assembled unigenes as a reference gene set, and 42884 unigenes were annotated in at least one public database. Taking the above transcriptome sequencing as a reference gene set, 4041 differentially expressed genes were screened to analyze and compare the gene expression profiles of the wild type and green mutant. By GO and KEGG pathway analysis, we concluded that three factors, including a reduction in the expression level of apo-phycoerythrin, an increase of chlorophyll light-harvesting complex synthesis, and reduction of phycoerythrobilin by competitive inhibition, caused the reduction of optically active phycoerythrin in the green-pigmented mutant. PMID:28135287

  14. Neutron activation analysis system

    DOEpatents

    Taylor, M.C.; Rhodes, J.R.

    1973-12-25

    A neutron activation analysis system for monitoring generally fluid media, such as slurries, solutions, and fluidized powders, including two separate conduit loops for circulating fluid samples within the range of radiation sources and detectors is described. Associated with the first loop is a neutron source that emits a high flux of slow and thermal neutrons. The second loop employs a fast neutron source, the flux from which is substantially free of thermal neutrons. Adjacent to both loops are gamma counters for spectrographic determination of the fluid constituents. Other gamma sources and detectors are arranged across a portion of each loop for determining the fluid density. (Official Gazette)

  15. Giant panda ribosomal protein S14: cDNA, genomic sequence cloning, sequence analysis, and overexpression.

    PubMed

    Wu, G-F; Hou, Y-L; Hou, W-R; Song, Y; Zhang, T

    2010-10-13

    RPS14 is a component of the 40S ribosomal subunit encoded by the RPS14 gene and is required for its maturation. The cDNA and the genomic sequence of RPS14 were cloned successfully from the giant panda (Ailuropoda melanoleuca) using RT-PCR technology and touchdown-PCR, respectively; they were both sequenced and analyzed. The length of the cloned cDNA fragment was 492 bp; it contained an open-reading frame of 456 bp, encoding 151 amino acids. The length of the genomic sequence is 3421 bp; it contains four exons and three introns. Alignment analysis indicates that the nucleotide sequence shares a high degree of homology with those of Homo sapiens, Bos taurus, Mus musculus, Rattus norvegicus, Gallus gallus, Xenopus laevis, and Danio rerio (93.64, 83.37, 92.54, 91.89, 87.28, 84.21, and 84.87%, respectively). Comparison of the deduced amino acid sequences of the giant panda with those of these other species revealed that the RPS14 of giant panda is highly homologous with those of B. taurus, R. norvegicus and D. rerio (85.99, 99.34 and 99.34%, respectively), and is 100% identical with the others. This degree of conservation of RPS14 suggests evolutionary selection. Topology prediction shows that there are two N-glycosylation sites, three protein kinase C phosphorylation sites, two casein kinase II phosphorylation sites, four N-myristoylation sites, two amidation sites, and one ribosomal protein S11 signature in the RPS14 protein of the giant panda. The RPS14 gene can be readily expressed in Escherichia coli. When it was fused with the N-terminally His-tagged protein, it gave rise to accumulation of an expected 22-kDa polypeptide, in good agreement with the predicted molecular weight. The expression product obtained can be purified for studies of its function.
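
    The lengths quoted in this abstract can be checked with simple codon arithmetic: a 456 bp open reading frame holds 152 codons, and if the last of these is the stop codon (an assumption, although it is the reading that matches the reported 151 amino acids), 151 residues remain. A minimal sketch with a hypothetical helper name:

```python
def orf_protein_length(orf_bp, includes_stop=True):
    """Number of amino acids encoded by an ORF of the given length in bp.

    Assumes the ORF length counts the stop codon unless told otherwise.
    """
    if orf_bp % 3:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


# Figures from the abstract: 492 bp cDNA fragment, 456 bp ORF.
protein_length = orf_protein_length(456)   # 151
flanking_bp = 492 - 456                    # 36 bp outside the ORF
```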

  16. Light-generated oligonucleotide arrays for rapid DNA sequence analysis.

    PubMed Central

    Pease, A C; Solas, D; Sullivan, E J; Cronin, M T; Holmes, C P; Fodor, S P

    1994-01-01

    In many areas of molecular biology there is a need to rapidly extract and analyze genetic information; however, current technologies for DNA sequence analysis are slow and labor intensive. We report here how modern photolithographic techniques can be used to facilitate sequence analysis by generating miniaturized arrays of densely packed oligonucleotide probes. These probe arrays, or DNA chips, can then be applied to parallel DNA hybridization analysis, directly yielding sequence information. In a preliminary experiment, a 1.28 x 1.28 cm array of 256 different octanucleotides was produced in 16 chemical reaction cycles, requiring 4 hr to complete. The hybridization pattern of fluorescently labeled oligonucleotide targets was then detected by epifluorescence microscopy. The fluorescence signals from complementary probes were 5-35 times stronger than those with single or double base-pair hybridization mismatches, demonstrating specificity in the identification of complementary sequences. This method should prove to be a powerful tool for rapid investigations in human genetics and diagnostics, pathogen detection, and DNA molecular recognition. PMID:8197176
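
    The abstract does not explain how 256 probes arise from only 16 reaction cycles. One combinatorial reading that is consistent with both figures, offered here purely as an assumption, is that each variable position costs one masked coupling cycle per base, so n variable positions yield 4^n probes in 4n cycles (256 = 4^4 probes, 16 = 4 x 4 cycles). The function name below is hypothetical.

```python
def combinatorial_array(variable_positions, alphabet_size=4):
    """Probe count and coupling cycles for a combinatorial synthesis scheme.

    Assumes one masked coupling cycle per base per variable position.
    """
    probes = alphabet_size ** variable_positions
    cycles = alphabet_size * variable_positions
    return probes, cycles


# Four variable positions reproduce the figures quoted in the abstract.
probes, cycles = combinatorial_array(4)   # (256, 16)
```

    Under the same assumption, the probe count grows exponentially while the cycle count grows only linearly, which is why such arrays scale well.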

  17. Natural recombination in alphaherpesviruses: Insights into viral evolution through full genome sequencing and sequence analysis.

    PubMed

    Loncoman, Carlos A; Vaz, Paola K; Coppo, Mauricio Jc; Hartley, Carol A; Morera, Francisco J; Browning, Glenn F; Devlin, Joanne M

    2017-04-01

    Recombination in alphaherpesviruses was first described more than sixty years ago. Since then, different techniques have been used to detect recombination in natural (field) and experimental settings. Over the last ten years, next-generation sequencing (NGS) technologies and bioinformatic analyses have greatly increased the accuracy of recombination detection, particularly in field settings, thus contributing greatly to the study of natural alphaherpesvirus recombination in both human and veterinary medicine. Such studies have highlighted the important role that natural recombination plays in the evolution of many alphaherpesviruses. These studies have also shown that recombination can be a safety concern for attenuated alphaherpesvirus vaccines, particularly in veterinary medicine where such vaccines are used extensively, but also potentially in human medicine where attenuated varicella zoster virus vaccines are in use. This review focuses on the contributions that NGS and sequence analysis have made over the last ten years to our understanding of recombination in mammalian and avian alphaherpesviruses, with particular focus on attenuated live vaccine use.

  18. Approaches to sequence analysis of 125I-labeled RNA.

    PubMed Central

    Dickson, E; Pape, L K; Robertson, H D

    1979-01-01

    A method is described for the initial steps of sequence analysis of RNase T1- and pancreatic RNase-resistant oligonucleotides of RNA containing cytidylate residues labeled in vitro with 125I. In many cases an oligonucleotide sequence can be deduced from a consideration of (i) its relative position in the two-dimensional fingerprint (with DEAE thin layer homochromatographic second dimension), (ii) its electrophoretic mobility on DEAE paper at pH 1.9, and (iii) identification of its products of further enzymatic digestion by comparison with a set of marker oligonucleotides. Additional methods including analysis of oligonucleotides following chemical blocking of uridylate residues with CMCT and analysis of products of incomplete enzymatic digestion are also discussed. PMID:106369

  19. A biostratigraphic sequence analysis in Cretaceous sediments from Eastern Venezuela

    SciTech Connect

    Paredes, I.; Carillo, M.; Fasola, A.; Luna, F.

    1993-02-01

    This paper presents the results of a high-resolution biostratigraphic study, integrated with petrophysical analyses, of the Late Cretaceous sequence in several wells from the Maturin Sub-Basin, Eastern Venezuela. The main objective of this study is to integrate the different faunal and floral assemblages with the sedimentological evolution of the basin using sequential analysis techniques. This technique was applied using mainly terrestrial and marine palynomorphs, which were relatively abundant and diverse compared with the scarce foraminifera and nannofossils. Based on the percentages of abundance and the diversity of the different groups of microfossils, it was possible to establish the maximum flooding surfaces and condensation levels, which allowed the definition of the possible candidates for the sequence boundaries. In addition, the identified bioevents made possible the definition of the chronostratigraphic datums of the sequence under study. The results obtained will help to optimize the exploration and development programs of the oil fields in Eastern Venezuela.

  20. DNA sequence analysis using hierarchical ART-based classification networks

    SciTech Connect

    LeBlanc, C.; Hruska, S.I.; Katholi, C.R.; Unnasch, T.R.

    1994-12-31

    Adaptive resonance theory (ART) describes a class of artificial neural network architectures that act as classification tools which self-organize, work in real-time, and require no retraining to classify novel sequences. We have adapted ART networks to provide support to scientists attempting to categorize tandem repeat DNA fragments from Onchocerca volvulus. In this approach, sequences of DNA fragments are presented to multiple ART-based networks which are linked together into two (or more) tiers; the first provides coarse sequence classification while the subsequent tiers refine the classifications as needed. The overall rating of the resulting classification of fragments is measured using statistical techniques based on those introduced to validate results from traditional phylogenetic analysis. Tests of the Hierarchical ART-based Classification Network, or HABclass network, indicate its value as a fast, easy-to-use classification tool which adapts to new data without retraining on previously classified data.

  1. Genome sequencing and analysis of the model grass Brachypodium distachyon.

    PubMed

    2010-02-11

    Three subfamilies of grasses, the Ehrhartoideae, Panicoideae and Pooideae, provide the bulk of human nutrition and are poised to become major sources of renewable energy. Here we describe the genome sequence of the wild grass Brachypodium distachyon (Brachypodium), which is, to our knowledge, the first member of the Pooideae subfamily to be sequenced. Comparison of the Brachypodium, rice and sorghum genomes shows a precise history of genome evolution across a broad diversity of the grasses, and establishes a template for analysis of the large genomes of economically important pooid grasses such as wheat. The high-quality genome sequence, coupled with ease of cultivation and transformation, small size and rapid life cycle, will help Brachypodium reach its potential as an important model system for developing new energy and food crops.

  2. Genome sequencing and analysis of the model grass Brachypodium distachyon

    SciTech Connect

    Yang, Xiaohan; Kalluri, Udaya C; Tuskan, Gerald A

    2010-01-01

    Three subfamilies of grasses, the Ehrhartoideae, Panicoideae and Pooideae, provide the bulk of human nutrition and are poised to become major sources of renewable energy. Here we describe the genome sequence of the wild grass Brachypodium distachyon (Brachypodium), which is, to our knowledge, the first member of the Pooideae subfamily to be sequenced. Comparison of the Brachypodium, rice and sorghum genomes shows a precise history of genome evolution across a broad diversity of the grasses, and establishes a template for analysis of the large genomes of economically important pooid grasses such as wheat. The high-quality genome sequence, coupled with ease of cultivation and transformation, small size and rapid life cycle, will help Brachypodium reach its potential as an important model system for developing new energy and food crops.

  3. Strategy for microbiome analysis using 16S rRNA gene sequence analysis on the Illumina sequencing platform.

    PubMed

    Ram, Jeffrey L; Karim, Aos S; Sendler, Edward D; Kato, Ikuko

    2011-06-01

    Understanding the identity and changes of organisms in the urogenital and other microbiomes of the human body may be key to discovering causes and new treatments of many ailments, such as vaginosis. High-throughput sequencing technologies have recently enabled discovery of the great diversity of the human microbiome. The cost per base of many of these sequencing platforms remains high (thousands of dollars per sample); however, the Illumina Genome Analyzer (IGA) is estimated to have a cost per base less than one-fifth of its nearest competitor. The main disadvantage of the IGA for sequencing PCR-amplified 16S rRNA genes is that the maximum read-length of the IGA is only 100 bases, whereas at least 300 bases are needed to obtain phylogenetically informative data down to the genus and species level. In this paper we describe and conduct a pilot test of a multiplex sequencing strategy suitable for achieving total reads of > 300 bases per extracted DNA molecule on the IGA. Results show that all proposed primers produce products of the expected size and that correct sequences can be obtained with all proposed forward primers. Various bioinformatic optimizations of the Illumina Bustard analysis pipeline proved necessary to extract the correct sequence from IGA image data, and these modifications of the data files indicate that further optimization of the analysis pipeline may improve the quality rankings of the data and enable more sequence to be correctly analyzed. The successful application of this method could result in an unprecedentedly deep description (800,000 taxonomic identifications per sample) of the urogenital and other microbiomes in a large number of samples at a reasonable cost per sample.

  4. Evolution Analysis of Simple Sequence Repeats in Plant Genome.

    PubMed

    Qin, Zhen; Wang, Yanping; Wang, Qingmei; Li, Aixian; Hou, Fuyun; Zhang, Liming

    2015-01-01

    Simple sequence repeats (SSRs) are widespread units in genome sequences and play many important roles in plants. In order to reveal the evolution of plant genomes, we investigated the evolutionary regularities of SSRs during the evolution of plant species and the plant kingdom by analysis of twelve sequenced plant genomes. First, in the twelve studied plant genomes, the main SSRs were those containing repeats of 1-3 nucleotide combinations. Second, in mononucleotide SSRs, the A/T percentage gradually increased along with the evolution of plants (except for P. patens). With the increase of SSR repeat number, the percentage of A/T in C. reinhardtii showed no significant change, while the percentage of A/T in terrestrial plant species gradually declined. Third, in dinucleotide SSRs, the percentage of AT/TA increased along with the evolution of the plant kingdom and with increasing repeat number in terrestrial plant species. This trend was more obvious in dicotyledons than in monocotyledons. The percentage of CG/GC showed the opposite pattern to AT/TA. Fourth, in trinucleotide SSRs, the percentages of combinations including two or three A/T showed a rising trend along with the evolution of the plant kingdom; meanwhile, with the increase of SSR repeat number in plant species, different species chose different combinations as dominant SSRs. SSRs in C. reinhardtii, P. patens, Z. mays and A. thaliana showed specific patterns related to evolutionary position or specific changes of genome sequences. The results showed that SSRs not only followed a general pattern in the evolution of the plant kingdom, but were also associated with the evolution of the specific genome sequence. The study of the evolutionary regularities of SSRs provided new insights for the analysis of plant genome evolution.
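
    As a rough illustration of what counting mono-, di- and trinucleotide SSRs involves, the sketch below scans a sequence for tandem repeats of 1-3 nt units with a regular expression. It is not the authors' pipeline: the minimum repeat count, the function names, and the choice to report the shortest repeating unit are assumptions made for this example.

```python
import re
from collections import Counter


def find_ssrs(seq, max_unit=3, min_repeats=4):
    """Return (start, unit, repeat_count) for tandem repeats of short units.

    The lazy quantifier makes the regex try the shortest unit first, so a run
    such as AAAAAA is reported as unit "A" rather than "AA" or "AAA".
    The min_repeats default is an illustrative assumption.
    """
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


def unit_length_counts(hits):
    """Tally SSRs by repeat-unit length (1 = mono-, 2 = di-, 3 = trinucleotide)."""
    return Counter(len(unit) for _, unit, _ in hits)
```

    Calling `find_ssrs("GGATATATATCC")` reports a single dinucleotide repeat, `(2, "AT", 4)`. Composition statistics such as the A/T share of mononucleotide SSRs can then be derived from the returned units.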

  5. Noncoding RNA gene detection using comparative sequence analysis

    PubMed Central

    Rivas, Elena; Eddy, Sean R

    2001-01-01

    Background Noncoding RNA genes produce transcripts that exert their function without ever producing proteins. Noncoding RNA gene sequences do not have strong statistical signals, unlike protein coding genes. A reliable general purpose computational genefinder for noncoding RNA genes has been elusive. Results We describe a comparative sequence analysis algorithm for detecting novel structural RNA genes. The key idea is to test the pattern of substitutions observed in a pairwise alignment of two homologous sequences. A conserved coding region tends to show a pattern of synonymous substitutions, whereas a conserved structural RNA tends to show a pattern of compensatory mutations consistent with some base-paired secondary structure. We formalize this intuition using three probabilistic "pair-grammars": a pair stochastic context free grammar modeling alignments constrained by structural RNA evolution, a pair hidden Markov model modeling alignments constrained by coding sequence evolution, and a pair hidden Markov model modeling a null hypothesis of position-independent evolution. Given an input pairwise sequence alignment (e.g. from a BLASTN comparison of two related genomes) we classify the alignment into the coding, RNA, or null class according to the posterior probability of each class. Conclusions We have implemented this approach as a program, QRNA, which we consider to be a prototype structural noncoding RNA genefinder. Tests suggest that this approach detects noncoding RNA genes with a fair degree of reliability. PMID:11801179
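
    The final classification step can be sketched as a Bayesian comparison of three model likelihoods. In this minimal Python sketch the log-likelihood values, the uniform prior, and the function name are assumptions for illustration; computing real scores requires the pair-SCFG and pair-HMM models described in the abstract, and this is not QRNA's implementation.

```python
import math

def classify_alignment(log_likelihoods, priors=None):
    """Return (best_class, posteriors) from per-model log-likelihoods (natural log)."""
    classes = list(log_likelihoods)
    if priors is None:
        # Assumed uniform prior over the classes.
        priors = {c: 1.0 / len(classes) for c in classes}
    # Log of the joint probability P(alignment, class) for each class.
    log_joint = {c: log_likelihoods[c] + math.log(priors[c]) for c in classes}
    # Log-sum-exp keeps the normalisation numerically stable.
    m = max(log_joint.values())
    log_norm = m + math.log(sum(math.exp(v - m) for v in log_joint.values()))
    posteriors = {c: math.exp(log_joint[c] - log_norm) for c in classes}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors

# Hypothetical log-likelihoods for one pairwise alignment.
scores = {"coding": -1050.0, "RNA": -1042.0, "null": -1047.0}
best, post = classify_alignment(scores)
print(best, post)
```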

  6. Molecular cloning, sequence analysis, and functional expression of a novel growth regulator, oncostatin M.

    PubMed Central

    Malik, N; Kallestad, J C; Gunderson, N L; Austin, S D; Neubauer, M G; Ochs, V; Marquardt, H; Zarling, J M; Shoyab, M; Wei, C M

    1989-01-01

    Oncostatin M is a polypeptide of Mr approximately 28,000 that acts as a growth regulator for many cultured mammalian cells. We report the cDNA and genomic cloning, sequence analysis, and functional expression in heterologous cells of oncostatin M. cDNA clones were isolated from mRNA of U937 cells that had been induced to differentiate into macrophagelike cells by treatment with phorbol 12-myristate 13-acetate, and a genomic clone was also isolated from human brain DNA. Sequence analysis of these clones established the 1,814-base-pair cDNA sequence as well as exon boundaries. This sequence predicted that oncostatin M is synthesized as a 252-amino-acid polypeptide, with a 25-residue hydrophobic sequence resembling a signal peptide at the N terminus. The predicted oncostatin M amino acid sequence shared no homology with other known proteins, but the sequence of the 3' noncoding region of the cDNA contained an A + T-rich stretch with sequence motifs found in the 3' untranslated regions of many cytokine and lymphokine cDNAs. Oncostatin M mRNA of approximately 2 kilobase pairs was detected in phorbol 12-myristate 13-acetate-treated U937 cells and in activated human T cells. Transfection of cDNA encoding the oncostatin M precursor into COS cells resulted in the secretion of proteins with the structural and functional properties of oncostatin M. The unique amino acid sequence, expression by lymphoid cells, and growth-regulatory activities of oncostatin M suggest that it is a novel cytokine. PMID:2779549

  7. Analysis of sequence variation in Gnathostoma spinigerum mitochondrial DNA by single-strand conformation polymorphism analysis and DNA sequence.

    PubMed

    Ngarmamonpirat, Charinthon; Waikagul, Jitra; Petmitr, Songsak; Dekumyoy, Paron; Rojekittikhun, Wichit; Anantapruti, Malinee T

    2005-03-01

    Morphological variations were observed in the advanced third-stage larvae of Gnathostoma spinigerum collected from swamp eel (Fluta alba), the second intermediate host. Larvae of the typical type and of three atypical types were chosen for partial cytochrome c oxidase subunit I (COI) gene sequence analysis. A 450 bp polymerase chain reaction product of the COI gene was amplified from mitochondrial DNA. The variations were analyzed by single-strand conformation polymorphism and DNA sequencing. The nucleotide variations of the COI gene in the four types of larvae indicated the presence of an intra-specific variation of mitochondrial DNA in the G. spinigerum population.

  8. Congruence analysis of point clouds from unstable stereo image sequences

    NASA Astrophysics Data System (ADS)

    Jepping, C.; Bethmann, F.; Luhmann, T.

    2014-06-01

    This paper deals with the correction of exterior orientation parameters of stereo image sequences over deformed free-form surfaces without control points. Such an imaging situation can occur, for example, during photogrammetric car crash test recordings where onboard high-speed stereo cameras are used to measure 3D surfaces. As a result of such measurements, 3D point clouds of deformed surfaces are generated for a complete stereo sequence. The first objective of this research focusses on the development and investigation of methods for the detection of corresponding spatial and temporal tie points within the stereo image sequences (by stereo image matching and 3D point tracking) that are robust enough for a reliable handling of occlusions and other disturbances that may occur. The second objective of this research is the analysis of object deformations in order to detect stable areas (congruence analysis). For this purpose a RANSAC-based method for congruence analysis has been developed. This process is based on the sequential transformation of randomly selected point groups from one epoch to another by using a 3D similarity transformation. The paper gives a detailed description of the congruence analysis. The approach has been tested successfully on synthetic and real image data.

  9. Castor bean organelle genome sequencing and worldwide genetic diversity analysis.

    PubMed

    Rivarola, Maximo; Foster, Jeffrey T; Chan, Agnes P; Williams, Amber L; Rice, Danny W; Liu, Xinyue; Melake-Berhan, Admasu; Huot Creasy, Heather; Puiu, Daniela; Rosovitz, M J; Khouri, Hoda M; Beckstrom-Sternberg, Stephen M; Allan, Gerard J; Keim, Paul; Ravel, Jacques; Rabinowicz, Pablo D

    2011-01-01

    Castor bean is an important oil-producing plant in the Euphorbiaceae family. Its high-quality oil contains up to 90% of the unusual fatty acid ricinoleate, which has many industrial and medical applications. Castor bean seeds also contain ricin, a highly toxic Type 2 ribosome-inactivating protein, which has gained relevance in recent years due to biosafety concerns. In order to gain knowledge on global genetic diversity in castor bean and to ultimately help the development of breeding and forensic tools, we carried out an extensive chloroplast sequence diversity analysis. Taking advantage of the recently published genome sequence of castor bean, we assembled the chloroplast and mitochondrion genomes extracting selected reads from the available whole genome shotgun reads. Using the chloroplast reference genome we used the methylation filtration technique to readily obtain draft genome sequences of 7 geographically and genetically diverse castor bean accessions. These sequence data were used to identify single nucleotide polymorphism markers and phylogenetic analysis resulted in the identification of two major clades that were not apparent in previous population genetic studies using genetic markers derived from nuclear DNA. Two distinct sub-clades could be defined within each major clade and large-scale genotyping of castor bean populations worldwide confirmed previously observed low levels of genetic diversity and showed a broad geographic distribution of each sub-clade.

  10. Castor Bean Organelle Genome Sequencing and Worldwide Genetic Diversity Analysis

    PubMed Central

    Chan, Agnes P.; Williams, Amber L.; Rice, Danny W.; Liu, Xinyue; Melake-Berhan, Admasu; Huot Creasy, Heather; Puiu, Daniela; Rosovitz, M. J.; Khouri, Hoda M.; Beckstrom-Sternberg, Stephen M.; Allan, Gerard J.; Keim, Paul; Ravel, Jacques; Rabinowicz, Pablo D.

    2011-01-01

    Castor bean is an important oil-producing plant in the Euphorbiaceae family. Its high-quality oil contains up to 90% of the unusual fatty acid ricinoleate, which has many industrial and medical applications. Castor bean seeds also contain ricin, a highly toxic Type 2 ribosome-inactivating protein, which has gained relevance in recent years due to biosafety concerns. In order to gain knowledge on global genetic diversity in castor bean and to ultimately help the development of breeding and forensic tools, we carried out an extensive chloroplast sequence diversity analysis. Taking advantage of the recently published genome sequence of castor bean, we assembled the chloroplast and mitochondrion genomes extracting selected reads from the available whole genome shotgun reads. Using the chloroplast reference genome we used the methylation filtration technique to readily obtain draft genome sequences of 7 geographically and genetically diverse castor bean accessions. These sequence data were used to identify single nucleotide polymorphism markers and phylogenetic analysis resulted in the identification of two major clades that were not apparent in previous population genetic studies using genetic markers derived from nuclear DNA. Two distinct sub-clades could be defined within each major clade and large-scale genotyping of castor bean populations worldwide confirmed previously observed low levels of genetic diversity and showed a broad geographic distribution of each sub-clade. PMID:21750729

  11. [The Mycobacterium leprae genome: from sequence analysis to therapeutic implications].

    PubMed

    Honore, N

    2002-01-01

    The genome of Mycobacterium leprae, the causative agent of leprosy, was analyzed by rapid sequencing of cosmids and plasmids prepared from DNA isolated from one patient's strain. Results showed that the bacillus possesses a single circular chromosome that differs from other known mycobacterium chromosomes with regard to size (3.2 Mb) and G + C content (57.8%). Computer analysis demonstrated that only half of the sequence contains protein-coding genes. The other half contains pseudogenes and non-coding sequences. These findings indicate that M. leprae has undergone a major reductive evolution leaving a minimal set of functional genes for survival. Study of the coding region of the sequence provides evidence accounting for the particular pathogenic properties of M. leprae which is an obligate intracellular parasite. Disappearance of numerous enzymatic pathways in comparison with M. tuberculosis, an intracellular pathogen comparable to M. leprae, could explain the differences observed between the two organisms. Genomic analysis of the leprosy bacillus also provided insight into the molecular basis for resistance to various antibiotics and allowed identification of several potential targets for new drug treatments.

  12. Whole-genome sequence-based analysis of thyroid function

    PubMed Central

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby; Campbell, Purdey J.; Traglia, Michela; Brown, Suzanne J.; Mullin, Benjamin H.; Shihab, Hashem A.; Min, Josine; Walter, Klaudia; Memari, Yasin; Huang, Jie; Barnes, Michael R.; Beilby, John P.; Charoen, Pimphen; Danecek, Petr; Dudbridge, Frank; Forgetta, Vincenzo; Greenwood, Celia; Grundberg, Elin; Johnson, Andrew D.; Hui, Jennie; Lim, Ee M.; McCarthy, Shane; Muddyman, Dawn; Panicker, Vijay; Perry, John R.B.; Bell, Jordana T.; Yuan, Wei; Relton, Caroline; Gaunt, Tom; Schlessinger, David; Abecasis, Goncalo; Cucca, Francesco; Surdulescu, Gabriela L.; Woltersdorf, Wolfram; Zeggini, Eleftheria; Zheng, Hou-Feng; Toniolo, Daniela; Dayan, Colin M.; Naitza, Silvia; Walsh, John P.; Spector, Tim; Davey Smith, George; Durbin, Richard; Brent Richards, J.; Sanna, Serena; Soranzo, Nicole; Timpson, Nicholas J.; Wilson, Scott G.; Turki, Saeed Al; Anderson, Carl; Anney, Richard; Antony, Dinu; Artigas, Maria Soler; Ayub, Muhammad; Balasubramaniam, Senduran; Barrett, Jeffrey C.; Barroso, Inês; Beales, Phil; Bentham, Jamie; Bhattacharya, Shoumo; Birney, Ewan; Blackwood, Douglas; Bobrow, Martin; Bochukova, Elena; Bolton, Patrick; Bounds, Rebecca; Boustred, Chris; Breen, Gerome; Calissano, Mattia; Carss, Keren; Chatterjee, Krishna; Chen, Lu; Ciampi, Antonio; Cirak, Sebhattin; Clapham, Peter; Clement, Gail; Coates, Guy; Collier, David; Cosgrove, Catherine; Cox, Tony; Craddock, Nick; Crooks, Lucy; Curran, Sarah; Curtis, David; Daly, Allan; Day-Williams, Aaron; Day, Ian N.M.; Down, Thomas; Du, Yuanping; Dunham, Ian; Edkins, Sarah; Ellis, Peter; Evans, David; Faroogi, Sadaf; Fatemifar, Ghazaleh; Fitzpatrick, David R.; Flicek, Paul; Flyod, James; Foley, A. Reghan; Franklin, Christopher S.; Futema, Marta; Gallagher, Louise; Geihs, Matthias; Geschwind, Daniel; Griffin, Heather; Grozeva, Detelina; Guo, Xueqin; Guo, Xiaosen; Gurling, Hugh; Hart, Deborah; Hendricks, Audrey; Holmans, Peter; Howie, Bryan; Huang, Liren; Hubbard, Tim; Humphries, Steve E.; Hurles, Matthew E.; Hysi, Pirro; Jackson, David K.; Jamshidi, Yalda; Jing, Tian; Joyce, Chris; Kaye, Jane; Keane, Thomas; Keogh, Julia; Kemp, John; Kennedy, Karen; Kolb-Kokocinski, Anja; Lachance, Genevieve; Langford, Cordelia; Lawson, Daniel; Lee, Irene; Lek, Monkol; Liang, Jieqin; Lin, Hong; Li, Rui; Li, Yingrui; Liu, Ryan; Lönnqvist, Jouko; Lopes, Margarida; Lotchkova, Valentina; MacArthur, Daniel; Marchini, Jonathan; Maslen, John; Massimo, Mangino; Mathieson, Iain; Marenne, Gaëlle; McGuffin, Peter; McIntosh, Andrew; McKechanie, Andrew G.; McQuillin, Andrew; Metrustry, Sarah; Mitchison, Hannah; Moayyeri, Alireza; Morris, James; Muntoni, Francesco; Northstone, Kate; O'Donnovan, Michael; Onoufriadis, Alexandros; O'Rahilly, Stephen; Oualkacha, Karim; Owen, Michael J.; Palotie, Aarno; Panoutsopoulou, Kalliope; Parker, Victoria; Parr, Jeremy R.; Paternoster, Lavinia; Paunio, Tiina; Payne, Felicity; Pietilainen, Olli; Plagnol, Vincent; Quaye, Lydia; Quai, Michael A.; Raymond, Lucy; Rehnström, Karola; Richards, Brent; Ring, Susan; Ritchie, Graham R.S.; Roberts, Nicola; Savage, David B.; Scambler, Peter; Schiffels, Stephen; Schmidts, Miriam; Schoenmakers, Nadia; Semple, Robert K.; Serra, Eva; Sharp, Sally I.; Shin, So-Youn; Skuse, David; Small, Kerrin; Southam, Lorraine; Spasic-Boskovic, Olivera; Clair, David St; Stalker, Jim; Stevens, Elizabeth; Pourcian, Beate St; Sun, Jianping; Suvisaari, Jaana; Tachmazidou, Ionna; Tobin, Martin D.; Valdes, Ana; Kogelenberg, Margriet Van; Vijayarangakannan, Parthiban; Visscher, Peter M.; Wain, Louise V.; Walters, James T.R.; Wang, Guangbiao; Wang, Jun; Wang, Yu; Ward, Kirsten; Wheeler, Elanor; Whyte, Tamieka; Williams, Hywel; Williamson, Kathleen A.; Wilson, Crispian; Wong, Kim; Xu, ChangJiang; Yang, Jian; Zhang, Fend; Zhang, Pingbo

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N=2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF≥1%) associated with TSH and FT4 (N=16,335). For TSH, we identify a novel variant in SYN2 (MAF=23.5%, P=6.15 × 10^-9) and a new independent variant in PDE8B (MAF=10.4%, P=5.94 × 10^-14). For FT4, we report a low-frequency variant near B4GALT6/SLC25A52 (MAF=3.2%, P=1.27 × 10^-9) tagging a rare TTR variant (MAF=0.4%, P=2.14 × 10^-11). All common variants explain ≥20% of the variance in TSH and FT4. Analysis of rare variants (MAF<1%) using sequence kernel association testing reveals a novel association with FT4 in NRG1. Our results demonstrate that increased coverage in whole-genome sequence association studies identifies novel variants associated with thyroid function. PMID:25743335

  13. Infrared thermal facial image sequence registration analysis and verification

    NASA Astrophysics Data System (ADS)

    Chen, Chieh-Li; Jian, Bo-Lin

    2015-03-01

    To study the emotional responses of subjects to the International Affective Picture System (IAPS), the infrared thermal facial image sequence is preprocessed for registration before further analysis so that the variance caused by minor and irregular subject movements is reduced. Without affecting the comfort level and while inducing minimal harm, this study proposes an infrared thermal facial image sequence registration process that reduces the deviations caused by the unconscious head shaking of the subjects. A fixed image for registration is produced through the localization of the centroid of the eye region as well as image translation and rotation processes. The thermal image sequence is then automatically registered using the proposed two-stage genetic algorithm. The deviation before and after image registration is demonstrated by image quality indices. The results show that the infrared thermal image sequence registration process proposed in this study is effective in localizing facial images accurately, which will be beneficial to the correlation analysis of psychological information related to the facial area.

  14. Whole-genome sequence-based analysis of thyroid function.

    PubMed

    Taylor, Peter N; Porcu, Eleonora; Chew, Shelby; Campbell, Purdey J; Traglia, Michela; Brown, Suzanne J; Mullin, Benjamin H; Shihab, Hashem A; Min, Josine; Walter, Klaudia; Memari, Yasin; Huang, Jie; Barnes, Michael R; Beilby, John P; Charoen, Pimphen; Danecek, Petr; Dudbridge, Frank; Forgetta, Vincenzo; Greenwood, Celia; Grundberg, Elin; Johnson, Andrew D; Hui, Jennie; Lim, Ee M; McCarthy, Shane; Muddyman, Dawn; Panicker, Vijay; Perry, John R B; Bell, Jordana T; Yuan, Wei; Relton, Caroline; Gaunt, Tom; Schlessinger, David; Abecasis, Goncalo; Cucca, Francesco; Surdulescu, Gabriela L; Woltersdorf, Wolfram; Zeggini, Eleftheria; Zheng, Hou-Feng; Toniolo, Daniela; Dayan, Colin M; Naitza, Silvia; Walsh, John P; Spector, Tim; Davey Smith, George; Durbin, Richard; Richards, J Brent; Sanna, Serena; Soranzo, Nicole; Timpson, Nicholas J; Wilson, Scott G

    2015-03-06

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N=2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF≥1%) associated with TSH and FT4 (N=16,335). For TSH, we identify a novel variant in SYN2 (MAF=23.5%, P=6.15 × 10(-9)) and a new independent variant in PDE8B (MAF=10.4%, P=5.94 × 10(-14)). For FT4, we report a low-frequency variant near B4GALT6/SLC25A52 (MAF=3.2%, P=1.27 × 10(-9)) tagging a rare TTR variant (MAF=0.4%, P=2.14 × 10(-11)). All common variants explain ≥20% of the variance in TSH and FT4. Analysis of rare variants (MAF<1%) using sequence kernel association testing reveals a novel association with FT4 in NRG1. Our results demonstrate that increased coverage in whole-genome sequence association studies identifies novel variants associated with thyroid function.

  15. Now and Next-Generation Sequencing Techniques: Future of Sequence Analysis Using Cloud Computing

    PubMed Central

    Thakur, Radhe Shyam; Bandopadhyay, Rajib; Chaudhary, Bratati; Chatterjee, Sourav

    2012-01-01

    Advances in the field of sequencing techniques have resulted in the greatly accelerated production of huge sequence datasets. This presents immediate challenges in database maintenance at datacenters. It provides additional computational challenges in data mining and sequence analysis. Together these represent a significant overburden on traditional stand-alone computer resources, and to reach effective conclusions quickly and efficiently, the virtualization of the resources and computation on a pay-as-you-go concept (together termed “cloud computing”) has recently appeared. The collective resources of the datacenter, including both hardware and software, can be available publicly, being then termed a public cloud, the resources being provided in a virtual mode to the clients who pay according to the resources they employ. Examples of public companies providing these resources include Amazon, Google, and Joyent. The computational workload is shifted to the provider, which also implements required hardware and software upgrades over time. A virtual environment is created in the cloud corresponding to the computational and data storage needs of the user via the internet. The task is then performed, the results transmitted to the user, and the environment finally deleted after all tasks are completed. In this discussion, we focus on the basics of cloud computing, and go on to analyze the prerequisites and overall working of clouds. Finally, the applications of cloud computing in biological systems, particularly in comparative genomics, genome informatics, and SNP detection are discussed with reference to traditional workflows. PMID:23248640

  16. Now and next-generation sequencing techniques: future of sequence analysis using cloud computing.

    PubMed

    Thakur, Radhe Shyam; Bandopadhyay, Rajib; Chaudhary, Bratati; Chatterjee, Sourav

    2012-01-01

    Advances in the field of sequencing techniques have resulted in the greatly accelerated production of huge sequence datasets. This presents immediate challenges in database maintenance at datacenters. It provides additional computational challenges in data mining and sequence analysis. Together these represent a significant overburden on traditional stand-alone computer resources, and to reach effective conclusions quickly and efficiently, the virtualization of the resources and computation on a pay-as-you-go concept (together termed "cloud computing") has recently appeared. The collective resources of the datacenter, including both hardware and software, can be available publicly, being then termed a public cloud, the resources being provided in a virtual mode to the clients who pay according to the resources they employ. Examples of public companies providing these resources include Amazon, Google, and Joyent. The computational workload is shifted to the provider, which also implements required hardware and software upgrades over time. A virtual environment is created in the cloud corresponding to the computational and data storage needs of the user via the internet. The task is then performed, the results transmitted to the user, and the environment finally deleted after all tasks are completed. In this discussion, we focus on the basics of cloud computing, and go on to analyze the prerequisites and overall working of clouds. Finally, the applications of cloud computing in biological systems, particularly in comparative genomics, genome informatics, and SNP detection are discussed with reference to traditional workflows.

  17. Characterization of the promoter and upstream activating sequence from the Pseudomonas alcaligenes lipase gene.

    PubMed

    Cox, M; Gerritse, G; Dankmeyer, L; Quax, W J

    2001-03-09

    Pseudomonas alcaligenes secretes a lipase with a high pH optimum, which has interesting properties for application in detergents. The expression of the lipase is strongly dependent on the presence of lipids in the growth medium such as soybean oil. The promoter of the gene was characterized and found to have resemblance to sigma54 controlled promoters, which are known to be tightly regulated. The transcription start was mapped precisely downstream of a sequence with close similarity to the -12/-24 consensus sequence of sigma54 controlled promoters. Interestingly, a hyperproducer mutant strain was isolated and found to have a C to T mutation in the -12/-24 promoter consensus region. In addition an Upstream Activating Sequence (UAS) with homology to sigma54 UAS consensus sequences was identified. It was demonstrated that an increase of the distance from the UAS to the transcription start or the deletion of the UAS results in significantly lower expression levels of lipase. A systematic mutational analysis of the UAS sequence has resulted in a variant with an increased lipase expression.

  18. Integrative analysis of environmental sequences using MEGAN4

    PubMed Central

    Huson, Daniel H.; Mitra, Suparna; Ruscheweyh, Hans-Joachim; Weber, Nico; Schuster, Stephan C.

    2011-01-01

    A major challenge in the analysis of environmental sequences is data integration. The question is how to analyze different types of data in a unified approach, addressing both the taxonomic and functional aspects. To facilitate such analyses, we have substantially extended MEGAN, a widely used taxonomic analysis program. The new program, MEGAN4, provides an integrated approach to the taxonomic and functional analysis of metagenomic, metatranscriptomic, metaproteomic, and rRNA data. While taxonomic analysis is performed based on the NCBI taxonomy, functional analysis is performed using the SEED classification of subsystems and functional roles or the KEGG classification of pathways and enzymes. A number of examples illustrate how such analyses can be performed, and show that one can also import and compare classification results obtained using others' tools. MEGAN4 is freely available for academic purposes, and installers for all three major operating systems can be downloaded from www-ab.informatik.uni-tuebingen.de/software/megan. PMID:21690186

  19. MOLECULAR CLONING, SEQUENCING, EXPRESSION AND BIOLOGICAL ACTIVITY OF GIANT PANDA (AILUROPODA MELANOLEUCA) INTERFERON-GAMMA.

    PubMed

    Zhu, Hui; Wang, Wen-Xiu; Wang, Bao-Qin; Zhu, Xiao-Fu; Wu, Xu-Jin; Ma, Qing-Yi; Chen, De-Kun

    2012-06-29

    The giant panda (Ailuropoda melanoleuca) is an endangered species indigenous to China. Interferon-gamma (IFN-γ) is the only member of type II IFN and is vital for the regulation of host adapted immunity and inflammatory response. Little is known about the IFN-γ gene and its roles in the giant panda. In this study, the IFN-γ gene of the Qinling giant panda was amplified from total blood RNA by RT-PCR, cloned, sequenced and analysed. The open reading frame (ORF) of Qinling giant panda IFN-γ encodes 152 amino acids and is highly similar to that of the Sichuan giant panda, with an identity of 99.3% in cDNA sequence. The IFN-γ cDNA sequence was ligated to the pET32a vector and transformed into E. coli BL21 competent cells. Expression of recombinant IFN-γ protein of Qinling giant panda in E. coli was confirmed by SDS-PAGE and Western blot analysis. Biological activity assay indicated that the recombinant IFN-γ protein at concentrations of 4-10 µg/ml activated giant panda peripheral blood lymphocytes, while at 12 µg/ml it inhibited the activation of the lymphocytes. These findings provide insights into the evolution of giant panda IFN-γ and information regarding amino acid residues essential for its biological activity.

  20. Environmental impact analysis for the main accidental sequences of ignitor

    SciTech Connect

    Carpignano, A.; Francabandiera, S.; Vella, R.; Zucchetti, M.

    1996-12-31

    A safety analysis study has been applied to the Ignitor machine using Probabilistic Safety Assessment. The main initiating events have been identified, and accident sequences have been studied by means of traditional methods such as Failure Mode and Effect Analysis (FMEA), Fault Trees (FT) and Event Trees (ET). The consequences of the radioactive environmental releases have been assessed in terms of Effective Dose Equivalents (EDEs) to the Most Exposed Individuals (MEI) of the chosen site, by means of a population dose code. Results point out the low environmental impact of the machine. 13 refs., 1 fig., 3 tabs.

  1. The Design and Analysis of Transposon-Insertion Sequencing Experiments

    PubMed Central

    Chao, Michael C.; Abel, Sören; Davis, Brigid M.; Waldor, Matthew K.

    2016-01-01

    Preface Transposon-insertion sequencing (TIS) is a powerful approach that can be widely applied to genome-wide definition of loci that are required for growth in diverse conditions. However, experimental design choices and stochastic biological processes can heavily influence the results of TIS experiments and affect downstream statistical analysis. Here, we discuss TIS experimental parameters and how these factors relate to the benefits and limitations of the various statistical frameworks that can be applied to computational analysis of TIS data. PMID:26775926

  2. An editing environment for DNA sequence analysis and annotation

    SciTech Connect

    Uberbacher, E.C.; Xu, Y.; Shah, M.B.; Olman, V.; Parang, M.; Mural, R.

    1998-12-31

    This paper presents a computer system for analyzing and annotating large-scale genomic sequences. The core of the system is a multiple-gene structure identification program, which predicts the most probable gene structures based on the given evidence, including pattern recognition, EST and protein homology information. A graphics-based user interface provides an environment which allows the user to interactively control the evidence to be used in the gene identification process. To overcome the computational bottleneck in the database similarity search used in the gene identification process, the authors have developed an effective way to partition a database into a set of sub-databases of related sequences, and reduced the search problem on a large database to a signature identification problem and a search problem on a much smaller sub-database. This reduces the number of sequences to be searched from N to O(√N) on average, and hence greatly reduces the search time, where N is the number of sequences in the original database. The system provides the user with the ability to facilitate and modify the analysis and modeling in real time.
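
    The O(√N) figure follows from simple arithmetic if the partition is assumed to be even. The Python sketch below assumes, purely for illustration, that N sequences are split into about √N sub-databases of about √N sequences each, with one signature per sub-database; the abstract does not state how the authors actually partition the database, and the function name is hypothetical.

```python
import math

def two_stage_comparisons(n):
    """Comparisons for one query: one per signature, plus one per sequence in the chosen sub-database."""
    k = max(1, math.isqrt(n))     # assumed number of sub-databases (about sqrt(N))
    per_sub = math.ceil(n / k)    # sequences per sub-database under an even split
    return k + per_sub

for n in (10_000, 1_000_000):
    print(n, two_stage_comparisons(n))
```

    Under these assumptions a query against one million sequences needs about two thousand comparisons rather than one million.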

  3. Ichnofabric and siliciclastic depositional systems: Integration for sequence stratigraphic analysis

    SciTech Connect

    Bottjer, D.J.; Droser, M.L.

    1991-03-01

    Much previous research on biogenic sedimentary structures has established how ichnofacies (assemblages of discrete trace fossils) vary within marine depositional systems. However, studies aimed at understanding the distribution of ichnofabric (sedimentary rock fabric resulting from biogenic reworking) have only recently been attempted. Because ichnofabric can be recorded using a semi-quantitative series of ichnofabric indices (ii), its distribution in marine sedimentary rocks can be easily recorded through vertical sequence analysis. Thicknesses of strata recording different ichnofabric indices can be logged from stratigraphic sections or cores. These data are best displayed in histograms as percent of ii recorded from the total thickness measured. These ichnofabric histograms (ichnograms) show variable but distinctive distributions for genetic units such as facies within systems tracts of siliciclastic depositional sequences. An average ichnofabric index for any genetic sedimentary unit can also be computed from the data used to construct ichnograms. Because skeletal fossils are typically much less commonly preserved in siliciclastic than in carbonate depositional systems, such ichnofabric analyses have the potential of providing an important new line of evidence for depositional systems and sequence stratigraphic analysis of siliciclastic strata. In petroleum exploration, results from analyses of ichnofabric distribution could provide important information, including: (1) systems tracts with fine-grained facies that have relatively low ichnofabric values are potential source beds; and (2) petroleum reservoirs that occur in coarse, episodically deposited beds are more likely to form in systems tracts with facies that have low rather than high ichnofabric values.
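
    The ichnogram and average index described in this abstract reduce to a thickness-based tally. The Python sketch below assumes a hypothetical log of (thickness, ichnofabric index) intervals and a thickness-weighted mean; the abstract does not give the averaging formula, so the weighting, the function names, and the example values are assumptions.

```python
from collections import defaultdict

def ichnogram(intervals):
    """Percent of total logged thickness recorded for each ichnofabric index (ii)."""
    totals = defaultdict(float)
    for thickness, ii in intervals:
        totals[ii] += thickness
    total = sum(totals.values())
    return {ii: 100.0 * t / total for ii, t in sorted(totals.items())}

def average_ii(intervals):
    """Thickness-weighted mean ichnofabric index (assumed weighting)."""
    total = sum(t for t, _ in intervals)
    return sum(t * ii for t, ii in intervals) / total

# Hypothetical logged section: (thickness in metres, ii).
section = [(2.0, 1), (1.0, 3), (4.0, 2), (1.0, 1), (2.0, 5)]
print(ichnogram(section))
print(average_ii(section))
```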

  4. Comparative Analysis of Genome Sequences Covering the Seven Cronobacter Species

    PubMed Central

    Cummings, Craig A.; Shih, Rita; Degoricija, Lovorka; Rico, Alain; Brzoska, Pius; Hamby, Stephen E.; Masood, Naqash; Hariri, Sumyya; Sonbol, Hana; Chuzhanova, Nadia; McClelland, Michael; Furtado, Manohar R.; Forsythe, Stephen J.

    2012-01-01

    Background Species of Cronobacter are widespread in the environment and are occasional food-borne pathogens associated with serious neonatal diseases, including bacteraemia, meningitis, and necrotising enterocolitis. The genus is composed of seven species: C. sakazakii, C. malonaticus, C. turicensis, C. dublinensis, C. muytjensii, C. universalis, and C. condimenti. Clinical cases are associated with three species, C. malonaticus, C. turicensis and, in particular, with C. sakazakii multilocus sequence type 4. Thus, it is plausible that virulence determinants have evolved in certain lineages. Methodology/Principal Findings We generated high quality sequence drafts for eleven Cronobacter genomes representing the seven Cronobacter species, including an ST4 strain of C. sakazakii. Comparative analysis of these genomes together with the two publicly available genomes revealed Cronobacter has over 6,000 genes in one or more strains and over 2,000 genes shared by all Cronobacter. Considerable variation in the presence of traits such as type six secretion systems, metal resistance (tellurite, copper and silver), and adhesins was found. C. sakazakii is unique in the Cronobacter genus in encoding genes enabling the utilization of exogenous sialic acid which may have clinical significance. The C. sakazakii ST4 strain 701 contained additional genes as compared to other C. sakazakii but none of them were known specific virulence-related genes. Conclusions/Significance Genome comparison revealed that pair-wise DNA sequence identity varies between 89 and 97% in the seven Cronobacter species, and also suggested various degrees of divergence. Sets of universal core genes and accessory genes unique to each strain were identified. These gene sequences can be used for designing genus/species specific detection assays. Genes encoding adhesins, T6SS, and metal resistance genes as well as prophages are found in only subsets of genomes and have contributed considerably to the variation of…

  5. Initial genome sequencing and analysis of multiple myeloma

    PubMed Central

    Chapman, Michael A.; Lawrence, Michael S.; Keats, Jonathan J.; Cibulskis, Kristian; Sougnez, Carrie; Schinzel, Anna C.; Harview, Christina L.; Brunet, Jean-Philippe; Ahmann, Gregory J.; Adli, Mazhar; Anderson, Kenneth C.; Ardlie, Kristin G.; Auclair, Daniel; Baker, Angela; Bergsagel, P. Leif; Bernstein, Bradley E.; Drier, Yotam; Fonseca, Rafael; Gabriel, Stacey B.; Hofmeister, Craig C.; Jagannath, Sundar; Jakubowiak, Andrzej J.; Krishnan, Amrita; Levy, Joan; Liefeld, Ted; Lonial, Sagar; Mahan, Scott; Mfuko, Bunmi; Monti, Stefano; Perkins, Louise M.; Onofrio, Robb; Pugh, Trevor J.; Vincent Rajkumar, S.; Ramos, Alex H.; Siegel, David S.; Sivachenko, Andrey; Trudel, Suzanne; Vij, Ravi; Voet, Douglas; Winckler, Wendy; Zimmerman, Todd; Carpten, John; Trent, Jeff; Hahn, William C.; Garraway, Levi A.; Meyerson, Matthew; Lander, Eric S.; Getz, Gad; Golub, Todd R.

    2013-01-01

    Multiple myeloma is an incurable malignancy of plasma cells, and its pathogenesis is poorly understood. Here we report the massively parallel sequencing of 38 tumor genomes and their comparison to matched normal DNAs. Several new and unexpected oncogenic mechanisms were suggested by the pattern of somatic mutation across the dataset. These include the mutation of genes involved in protein translation (seen in nearly half of the patients), genes involved in histone methylation, and genes involved in blood coagulation. In addition, a broader than anticipated role of NF-κB signaling was suggested by mutations in 11 members of the NF-κB pathway. Of potential immediate clinical relevance, activating mutations of the kinase BRAF were observed in 4% of patients, suggesting the evaluation of BRAF inhibitors in multiple myeloma clinical trials. These results indicate that cancer genome sequencing of large collections of samples will yield new insights into cancer not anticipated by existing knowledge. PMID:21430775

  6. Cloning and sequence analysis of myostatin promoter in sheep.

    PubMed

    Du, Rong; Chen, Yong-Fu; An, Xiao-Rong; Yang, Xing-Yuan; Ma, Yi; Zhang, Lei; Yuan, Xiao-Li; Chen, Li-Mei; Qin, Jian

    2005-12-01

    To better understand the structure and function of the myostatin gene promoter region in sheep, we cloned and sequenced a 1.517 kb fragment containing the 5'-regulatory region of the sheep myostatin gene (GenBank accession number AY918121). The promoter sequence consists of three TATA boxes, one CAAT box, and eight putative E-boxes. Some putative muscle growth response elements for Octamer-binding factor 1 (Octamer), Activator protein 1 (AP1), Growth factor independence 1 zinc finger protein (Gfi-1B), Myocyte enhancer factor 2 (MEF2), Muscle-specific Mt binding site (MTBF), Glucocorticoid response elements (GRE) and Progesterone receptor binding site (PRE) were detected. Some of the motifs are conserved compared with those in the goat, bovine and porcine myostatin promoters. However, some differences were also found.

  7. Efficient Algorithms for Sequence Analysis with Entropic Profiles.

    PubMed

    Pizzi, Cinzia; Ornamenti, Mattia; Spangaro, Simone; Rombo, Simona E; Parida, Laxmi

    2016-10-21

    Entropy, being closely related to repetitiveness and compressibility, is a widely used information-related measure to assess the degree of predictability of a sequence. Entropic profiles are based on information theory principles, and can be used to study the under-/over-representation of subwords, by also providing information about the scale of conserved DNA regions. Here we focus on the algorithmic aspects related to entropic profiles. In particular, we propose linear time algorithms for their computation that rely on suffix-based data structures, more specifically on the truncated suffix tree (TST) and on the enhanced suffix array (ESA). We performed an extensive experimental campaign showing that our algorithms, besides being faster, make possible the analysis of longer sequences, even for high degrees of resolution, than state-of-the-art algorithms.

  8. Sequence Analysis of the Direct Repeat Region in Mycobacterium bovis

    PubMed Central

    Caimi, Karina; Romano, Maria I.; Alito, Alicia; Zumarraga, Martin; Bigi, Fabiana; Cataldi, Angel

    2001-01-01

    Spoligotyping is a major tool for molecular typing of Mycobacterium bovis. This technique is based on the polymorphism of spacers that separate direct repeats (DRs) in the M. tuberculosis complex DR region. Numerous M. bovis strains show a lack of several spacers which appears as a gap in the spoligotyping pattern. To determine whether these gaps contain alternative spacers not included in the spoligotyping membrane, PCRs using primers that hybridize to the spacers adjacent to the gaps were performed. Comparing the sizes of products obtained by PCR with those deduced from spoligotyping patterns, fragments were selected and sequenced to look for alternative spacers. Upon analysis of the sequences, five alternative spacers were detected, although deletions of spacers are mainly responsible for the observed gaps. The alternative spacers, which are more frequent in M. bovis than in M. tuberculosis, may contribute to increased M. bovis differentiation. PMID:11230428

  9. Nonlinear analysis of correlations in Alu repeat sequences in DNA

    NASA Astrophysics Data System (ADS)

    Xiao, Yi; Huang, Yanzhao; Li, Mingfeng; Xu, Ruizhen; Xiao, Saifeng

    2003-12-01

    We report on a nonlinear analysis of deterministic structures in Alu repeats, one of the richest repetitive DNA sequences in the human genome. Alu repeats contain the recognition sites for the restriction endonuclease AluI, which is what gives them their name. Using the nonlinear prediction method developed in chaos theory, we find that all Alu repeats have novel deterministic structures and show strong nonlinear correlations that are absent from exon and intron sequences. Furthermore, the deterministic structures of Alus of younger subfamilies show panlike shapes. As young Alus can be seen as mutation free copies from the “master genes,” it may be suggested that the deterministic structures of the older subfamilies are results of an evolution from a “panlike” structure to a more diffuse correlation pattern due to mutation.

  10. Cloning and sequence analysis of candidate human natural killer-enhancing factor genes

    SciTech Connect

    Shau, H.; Butterfield, L.H.; Chiu, R.; Kim, A.

    1994-12-31

    A cytosol factor from human red blood cells enhances natural killer (NK) activity. This factor, termed NK-enhancing factor (NKEF), is a protein of Mr 44,000 consisting of two subunits of equal size linked by disulfide bonds. NKEF is expressed in the NK-sensitive erythroleukemic cell line K562. Using an antibody specific for NKEF as a probe for immunoblot screening, we isolated several clones from a λgt11 cDNA library of K562. Additional subcloning and sequencing revealed that the candidate NKEF cDNAs fell into one of two categories of closely related but non-identical genes, referred to as NKEF A and B. They are 88% identical in amino acid sequence and 71% identical in nucleotide sequence. Southern blot analysis suggests that there are two to three NKEF family members in the genome. Analysis of predicted amino acid sequences indicates that both NKEF A and B are cytosol proteins with several phosphorylation sites each, but that they have no glycosylation sites. They are significantly homologous to several other proteins from a wide variety of organisms ranging from prokaryotes to mammals, especially with regard to several well-conserved motifs within the amino acid sequences. The biological functions of these proteins in other species are mostly unknown, but some of them were reported to be induced by oxidative stress. Therefore, as well as for immunoregulation of NK activity, NKEF may be important for cells in coping with oxidative insults. 32 refs., 3 figs.

  11. Statistical analysis of dynamic sequences for functional imaging

    NASA Astrophysics Data System (ADS)

    Kao, Chien-Min; Chen, Chin-Tu; Wernick, Miles N.

    2000-04-01

    Factor analysis of medical image sequences (FAMIS), which concerns the simultaneous identification of homogeneous regions (factor images) and the characteristic temporal variations (factors) inside these regions from a temporal sequence of images by statistical analysis, is one of the major challenges in medical imaging. In this research, we contribute to this important area of research by proposing a two-step approach. First, we study the use of the noise-adjusted principal component (NAPC) analysis developed by Lee et al. for identifying the characteristic temporal variations in dynamic scans acquired by PET and MRI. NAPC allows us to effectively reject data noise and substantially reduce data dimension based on signal-to-noise ratio consideration. Subsequently, a simple spatial analysis based on the criteria of minimum spatial overlapping and non-negativity of the factor images is applied for extraction of the factors and factor images. In our simulation study, our preliminary results indicate that the proposed approach can accurately identify the factor images. However, the factors are not completely separated.

  12. Heterogeneity of mammalian DNA ligase detected on activity and DNA sequencing gels.

    PubMed Central

    Mezzina, M; Sarasin, A; Politi, N; Bertazzoni, U

    1984-01-01

    A new method to detect DNA ligase activity in situ after NaDodSO4 polyacrylamide gel electrophoresis has been developed. After renaturation of active polypeptides the ligase reaction occurs in situ by incubating the intact gel in the presence of Mg++ and ATP. Further treatment with alkaline phosphatase removes the unligated 5'-32P-end of oligo (dT) used as a substrate and active polypeptides having ligase activity are identified by autoradiography. Analysis on DNA sequencing gels of the oligo (dT) reaction products present in the activity bands ensures that the radioactive material detected in activity gels or in standard in vitro ligase assays corresponds unambiguously to a ligase activity. Using these methods, we have analysed the purified phage T4 DNA ligase, and the activities present in crude extracts and in purified fractions from monkey kidney (CV1-P) cells. The purified T4 enzyme yields one or two active peptides with Mr values of 60,000 and 70,000. Crude extracts from CV1-P cells contain several polypeptides having DNA ligase activity. Partial purification of these extracts shows that DNA ligase I isolated from hydroxylapatite column is enriched in polypeptides with Mr 200,000, 150,000 and 120,000, while DNA ligase II is enriched in those with Mr 60,000 and 70,000. PMID:6377238

  13. Mutational analysis of DBD*--a unique antileukemic gene sequence.

    PubMed

    Ji, Yan-shan; Johnson, Betty H; Webb, M Scott; Thompson, E Brad

    2002-01-01

    DBD* is a novel gene encoding an 89 amino acid peptide that is constitutively lethal to leukemic cells. DBD* was derived from the DNA binding domain of the human glucocorticoid receptor by a frameshift that replaces the final 21 C-terminal amino acids of the domain. Previous studies suggested that DBD* no longer acted as the natural DNA binding domain. To confirm and extend these results, we mutated DBD* in 29 single amino acid positions, critical for the function in the native domain or of possible functional significance in the novel 21 amino acid C-terminal sequence. Steroid-resistant leukemic ICR-27-4 cells were transiently transfected by electroporation with each of the 29 mutants. Cell kill was evaluated by trypan blue dye exclusion, a WST-1 tetrazolium-based assay for cell respiration, propidium iodide exclusion, and Hoechst 33258 staining of chromatin. Eleven of the 29 point mutants increased, whereas four decreased antileukemic activity. The remainder had no effect on activity. The nonconcordances between these effects and native DNA binding domain function strongly suggest that the lethality of DBD* is distinct from that of the glucocorticoid receptor. Transfections of fragments of DBD* showed that optimal activity localized to the sequence for its C-terminal 32 amino acids.

  14. The application of m-sequences to bi-static active sonar

    NASA Astrophysics Data System (ADS)

    Deferrari, Harry A.

    2003-10-01

    The m-sequences are ideal pulse compression signals that combine the energy of CW with the resolution of a pulse. Successful applications include numerous acoustic propagation experiments and the Global Positioning System. Yet, early attempts (circa 1960) to apply m-sequences to mono-static active sonar were unsuccessful. Through the years, Birdsall, Metzger and others have developed a body of theory, numerical methods and at-sea demonstrations that establish the feasibility of a novel bi-static approach, one that holds promise in high reverberation shallow water environs. An analysis is presented here. The approach includes (1) continuous transmission of long m-sequences; (2) synchronous sampling to form a CON (Complete Ortho-Normal) data set; (3) direct blast removal by HCCO (Hyperspace Cancellation by Coordinate Zeroing); and (4) a full range waveform Doppler search. Ultra-fast Hadamard Transforms speed up the direct waveform pulse m-sequence pulse compression and the inverse pulse waveform transform and thereby allow timely execution of the intensive computational burden. The result is a demonstrable approach that produces a gain of 30 dB over a simple pulse and 10 dB over other sonar signals. In the end, the approach requires continuous transmission and reception as opposed to ping and listen, an awkward concept at first.
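
    The pulse-compression property that makes m-sequences attractive can be checked on a toy example. The sketch below generates one period of a degree-3 m-sequence from the linear recurrence a[k+3] = a[k+1] XOR a[k] (characteristic polynomial x^3 + x + 1, which is primitive) and computes its circular autocorrelation directly. The function names are hypothetical, and the direct correlation stands in for the fast Hadamard transform mentioned in the abstract, which is not reproduced here.

```python
def m_sequence(degree, taps):
    """One period (2**degree - 1 bits) of a[k + degree] = XOR of a[k + t], t in taps.

    The period is maximal only when the taps correspond to a primitive polynomial.
    """
    state = [1] + [0] * (degree - 1)  # any non-zero seed gives a shifted copy
    bits = []
    for _ in range(2 ** degree - 1):
        bits.append(state[0])
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = state[1:] + [feedback]
    return bits


def circular_autocorrelation(bits):
    """Circular autocorrelation after mapping bit 0 to +1 and bit 1 to -1."""
    chips = [1 - 2 * b for b in bits]
    n = len(chips)
    return [sum(chips[i] * chips[(i + lag) % n] for i in range(n))
            for lag in range(n)]


sequence = m_sequence(3, (0, 1))            # [1, 0, 0, 1, 0, 1, 1]
correlation = circular_autocorrelation(sequence)  # [7, -1, -1, -1, -1, -1, -1]
```

    The single peak of height 7 against a flat floor of -1 is the two-valued autocorrelation characteristic of m-sequences; longer sequences sharpen the peak relative to the floor in the same way.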

  15. Total chemical synthesis of a 77-nucleotide-long RNA sequence having methionine-acceptance activity.

    PubMed Central

    Ogilvie, K K; Usman, N; Nicoghosian, K; Cedergren, R J

    1988-01-01

    Chemical synthesis is described of a 77-nucleotide-long RNA molecule that has the sequence of an Escherichia coli Ado-47-containing tRNA(fMet) species in which the modified nucleosides have been substituted by their unmodified parent nucleosides. The sequence was assembled on a solid-phase, controlled-pore glass support in a stepwise manner with an automated DNA synthesizer. The ribonucleotide building blocks used were fully protected 5'-monomethoxytrityl-2'-silyl-3'-N,N-diisopropylaminophosphoramidites. p-Nitrophenylethyl groups were used to protect the O6 of guanine residues. The fully deprotected tRNA analogue was characterized by polyacrylamide gel electrophoresis (sizing), terminal nucleotide analysis, sequencing, and total enzyme degradation, all of which indicated that the sequence was correct and contained only 3'-5' linkages. The 77-mer was then assayed for amino acid acceptor activity by using E. coli methionyl-tRNA synthetase. The results indicated that the synthetic product, lacking modified bases, is a substrate for the enzyme and has an amino acid acceptance 11% of that of the major native species, tRNA(fMet) containing 7-methylguanosine at position 47. PMID:3413059

  16. Sequence analysis of mutations and translocations across breast cancer subtypes

    PubMed Central

    Banerji, Shantanu; Cibulskis, Kristian; Rangel-Escareno, Claudia; Brown, Kristin K.; Carter, Scott L.; Frederick, Abbie M.; Lawrence, Michael S.; Sivachenko, Andrey Y.; Sougnez, Carrie; Zou, Lihua; Cortes, Maria L.; Fernandez-Lopez, Juan C.; Peng, Shouyong; Ardlie, Kristin G.; Auclair, Daniel; Bautista-Piña, Veronica; Duke, Fujiko; Francis, Joshua; Jung, Joonil; Maffuz-Aziz, Antonio; Onofrio, Robert C.; Parkin, Melissa; Pho, Nam H.; Quintanar-Jurado, Valeria; Ramos, Alex H.; Rebollar-Vega, Rosa; Rodriguez-Cuevas, Sergio; Romero-Cordoba, Sandra L.; Schumacher, Steven E.; Stransky, Nicolas; Thompson, Kristin M.; Uribe-Figueroa, Laura; Baselga, Jose; Beroukhim, Rameen; Polyak, Kornelia; Sgroi, Dennis C.; Richardson, Andrea L.; Jimenez-Sanchez, Gerardo; Lander, Eric S.; Gabriel, Stacey B.; Garraway, Levi A.; Golub, Todd R.; Melendez-Zajgla, Jorge; Toker, Alex; Getz, Gad; Hidalgo-Miranda, Alfredo; Meyerson, Matthew

    2014-01-01

    Breast carcinoma is the leading cause of cancer-related mortality in women worldwide with an estimated 1.38 million new cases and 458,000 deaths in 2008 alone. This malignancy represents a heterogeneous group of tumours with characteristic molecular features, prognosis, and responses to available therapy. Recurrent somatic alterations in breast cancer have been described including mutations and copy number alterations, notably ERBB2 amplifications, the first successful therapy target defined by a genomic aberration. Prior DNA sequencing studies of breast cancer genomes have revealed additional candidate mutations and gene rearrangements. Here we report the whole-exome sequences of DNA from 103 human breast cancers of diverse subtypes from patients in Mexico and Vietnam compared to matched-normal DNA, together with whole-genome sequences of 22 breast cancer/normal pairs. Beyond confirming recurrent somatic mutations in PIK3CA, TP53, AKT1, GATA3, and MAP3K1, we discovered recurrent mutations in the CBFB transcription factor gene and deletions of its partner RUNX1. Furthermore, we have identified a recurrent MAGI3-AKT3 fusion enriched in triple-negative breast cancer lacking estrogen and progesterone receptors and ERBB2 expression. The Magi3-Akt3 fusion leads to constitutive activation of Akt kinase, which is abolished by treatment with an ATP-competitive Akt small-molecule inhibitor. PMID:22722202

  17. Experience using web services for biological sequence analysis

    PubMed Central

    Attwood, Teresa; Chohan, Shahid Nadeem; Côté, Richard; Cudré-Mauroux, Philippe; Falquet, Laurent; Fernandes, Pedro; Finn, Robert D.; Hupponen, Taavi; Korpelainen, Eija; Labarga, Alberto; Laugraud, Aurelie; Lima, Tania; Pafilis, Evangelos; Pagni, Marco; Pettifer, Steve; Phan, Isabelle; Rahman, Nazim

    2008-01-01

    Programmatic access to data and tools through the web using so-called web services has an important role to play in bioinformatics. In this article, we discuss the most popular approaches based on SOAP/WS-I and REST and describe our experiences, as a cross section of the community, with providing and using web services in the context of biological sequence analysis. We briefly review the main technological approaches as well as best practice hints that are useful for both users and developers. Finally, syntactic and semantic data integration issues with multiple web services are discussed. PMID:18621748

  18. Determining physical constraints in transcriptional initiation complexes using DNA sequence analysis

    SciTech Connect

    Shultzaberger, Ryan K.; Chiang, Derek Y.; Moses, Alan M.; Eisen, Michael B.

    2007-07-01

    Eukaryotic gene expression is often under the control of cooperatively acting transcription factors whose binding is limited by structural constraints. By determining these structural constraints, we can understand the "rules" that define functional cooperativity. Conversely, by understanding the rules of binding, we can infer structural characteristics. We have developed an information theory based method for approximating the physical limitations of cooperative interactions by comparing sequence analysis to microarray expression data. When applied to the coordinated binding of the sulfur amino acid regulatory protein Met4 by Cbf1 and Met31, we were able to create a combinatorial model that can correctly identify Met4 regulated genes.

  19. Cloning and sequence analysis of an actin gene in aloe.

    PubMed

    Wen, S S; He, D W; Liao, C M; Li, J; Wen, G Q; Liu, X H

    2014-07-04

    Aloe (Aloe spp), containing abundant polysaccharides and numerous bioactive ingredients, has remarkable medical, ornamental, calleidic, and edible values. In the present study, the total RNA was extracted from aloe leaf tissue. The isolated high-quality RNA was further used to clone actin gene by using reverse transcription-polymerase chain reaction (RT-PCR). The result of sequence analysis for the amplified fragment revealed that the cloned actin gene was 1012 bp in length (GenBank accession No. KC751541.1) and contained a 924-bp coding region and encoded a protein consisting of 307 amino acids. Homologous alignment showed that it shared over 80 and 96% identity with the nucleotide and amino acid sequences of actin from other plants, respectively. In addition, the cloned gene was used for phylogenetic analyses based on the deduced amino acid sequences, and the results suggested that the actin gene is highly conserved in evolution. The findings of this study will be useful for investigating the expression patterns of other genes in Aloe.

  20. Evolutionary insights from suffix array-based genome sequence analysis.

    PubMed

    Poddar, Anindya; Chandra, Nagasuma; Ganapathiraju, Madhavi; Sekar, K; Klein-Seetharaman, Judith; Reddy, Raj; Balakrishnan, N

    2007-08-01

    Gene and protein sequence analyses, central components of studies in modern biology, are easily amenable to string matching and pattern recognition algorithms. The growing need to analyse whole genome sequences more efficiently and thoroughly has led to the emergence of new computational methods. Suffix trees and suffix arrays are data structures that are well known in many other areas and are highly suited for sequence analysis too. Here we report an improvement to the design of construction of suffix arrays. Enhancement in versatility and scalability, enabled by this approach, is demonstrated through the use of real-life examples. The scalability of the algorithm to whole genomes renders it suitable to address many biologically interesting problems. One example is the evolutionary insight gained by analysing unigrams, bi-grams and higher n-grams, indicating that the genetic code has a direct influence on the overall composition of the genome. Further, different proteomes have been analysed for the coverage of the possible peptide space, which indicates that as much as a quarter of the total space at the tetra-peptide level is left un-sampled in prokaryotic organisms, although almost all tri-peptides can be seen in one protein or another in a proteome. Besides, distinct patterns begin to emerge for the counts of particular tetra and higher peptides, indicative of a 'meaning' for tetra and higher n-grams. The toolkit has also been used to demonstrate the usefulness of identifying repeats in whole proteomes efficiently. As an example, 16 members of one COG, coded by the genome of Mycobacterium tuberculosis H37Rv, have been found to contain a repeating sequence of 300 amino acids.
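
    To show why a suffix array suits n-gram counting, the sketch below builds one naively and counts occurrences of a pattern by binary search. This is a textbook construction for illustration only: it is not the improved construction reported in the abstract, its sort is far too slow for whole genomes, and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def suffix_array(text):
    """Start positions of all suffixes of text, in lexicographic order (naive build)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_occurrences(text, sa, pattern):
    """Count occurrences of pattern (an n-gram) in text using its suffix array.

    Suffixes truncated to len(pattern) stay in sorted order, so all matches
    form one contiguous block that two binary searches can delimit.
    """
    k = len(pattern)
    # Materialised for clarity; a production version would compare lazily.
    prefixes = [text[i:i + k] for i in sa]
    return bisect_right(prefixes, pattern) - bisect_left(prefixes, pattern)


text = "banana"
sa = suffix_array(text)                      # [5, 3, 1, 0, 4, 2]
ana_count = count_occurrences(text, sa, "ana")  # 2 (overlapping matches count)
```

    Because every occurrence of an n-gram is a prefix of some suffix, the same array answers counts for any n without rebuilding, which is the property that makes the structure convenient for the unigram, bi-gram and higher n-gram surveys the abstract describes.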

  1. Sequence analysis of the Choristoneura occidentalis granulovirus genome.

    PubMed

    Escasa, Shannon R; Lauzon, Hilary A M; Mathur, Amanda C; Krell, Peter J; Arif, Basil M

    2006-07-01

    The genome of the Choristoneura occidentalis granulovirus (ChocGV) isolated from the western spruce budworm, Choristoneura occidentalis, was sequenced completely. It was 104,710 bp long, with a 67.3% A+T content and contained 116 potential open reading frames (ORFs) covering 88.4% of the genome. Of these, 29 ORFs were conserved in all fully sequenced baculovirus genomes, 30 were GV-specific, 53 were present in some nucleopolyhedroviruses (NPVs) and/or GVs, three were common to ChocGV and Choristoneura fumiferana GV (ChfuGV) and one was so far unique. To date, ChocGV is the only GV identified that contains a homologue of the apoptosis inhibitor protein P35/P49, present in some group I NPVs. It is also the first GV without a Xestia c-nigrum GV ORF 26 homologue. Five homologous regions (hrs)/repeat regions, lacking typical NPV hr palindromes were identified. ChocGV hrs were similar to each other but not to other GV hrs. A 1.8 kb repeat region with a high A+T content (81%) and multiple repeats of 21-210 bp was found between choc36 and 37. This area resembled the non-homologous region origin of DNA replication (non-hr ori) identified in Cryptophlebia leucotreta GV (CrleGV) and Cydia pomonella GV (CpGV). Based on the mean amino acid identities of homologous proteins, ChocGV was closest to fully sequenced genomes CpGV (52.3%) and CrleGV (52.1%). The closest amino acid identity was to individual ORFs from the partially sequenced ChfuGV genome (97.2% in 38 ORFs). Phylogenetic analysis placed ChocGV in a clade with CrleGV and CpGV.

  2. Gene expression profile of human bone marrow stromal cells: high-throughput expressed sequence tag sequencing analysis.

    PubMed

    Jia, Libin; Young, Marian F; Powell, John; Yang, Liming; Ho, Nicola C; Hotchkiss, Robert; Robey, Pamela Gehron; Francomano, Clair A

    2002-01-01

    Human bone marrow stromal cells (HBMSC) are pluripotent cells with the potential to differentiate into osteoblasts, chondrocytes, myelosupportive stroma, and marrow adipocytes. We used high-throughput DNA sequencing analysis to generate 4258 single-pass sequencing reactions (known as expressed sequence tags, or ESTs) obtained from the 5' (97) and 3' (4161) ends of human cDNA clones from a HBMSC cDNA library. Our goal was to obtain tag sequences from the maximum number of possible genes and to deposit them in the publicly accessible database for ESTs (dbEST of the National Center for Biotechnology Information). Comparisons of our EST sequencing data with nonredundant human mRNA and protein databases showed that the ESTs represent 1860 gene clusters. The EST sequencing data analysis showed 60 novel genes found only in this cDNA library after BLAST analysis against 3.0 million ESTs in NCBI's dbEST database. The BLAST search also showed the identified ESTs that have close homology to known genes, which suggests that these may be newly recognized members of known gene families. The gene expression profile of this cell type is revealed by analyzing both the frequency with which a message is encountered and the functional categorization of expressed sequences. Comparing an EST sequence with the human genomic sequence database enables assignment of an EST to a specific chromosomal region (a process called digital gene localization) and often enables immediate partial determination of intron/exon boundaries within the genomic structure. It is expected that high-throughput EST sequencing and data mining analysis will greatly promote our understanding of gene expression in these cells and of growth and development of the skeleton.

  3. In Vivo Enhancer Analysis Chromosome 16 Conserved Noncoding Sequences

    SciTech Connect

    Pennacchio, Len A.; Ahituv, Nadav; Moses, Alan M.; Nobrega, Marcelo; Prabhakar, Shyam; Shoukry, Malak; Minovitsky, Simon; Visel, Axel; Dubchak, Inna; Holt, Amy; Lewis, Keith D.; Plajzer-Frick, Ingrid; Akiyama, Jennifer; De Val, Sarah; Afzal, Veena; Black, Brian L.; Couronne, Olivier; Eisen, Michael B.; Rubin, Edward M.

    2006-02-01

    The identification of enhancers with predicted specificities in vertebrate genomes remains a significant challenge that is hampered by a lack of experimentally validated training sets. In this study, we leveraged extreme evolutionary sequence conservation as a filter to identify putative gene regulatory elements and characterized the in vivo enhancer activity of human-fish conserved and ultraconserved noncoding elements on human chromosome 16 as well as such elements from elsewhere in the genome. We initially tested 165 of these extremely conserved sequences in a transgenic mouse enhancer assay and observed that 48 percent (79/165) functioned reproducibly as tissue-specific enhancers of gene expression at embryonic day 11.5. While driving expression in a broad range of anatomical structures in the embryo, the majority of the 79 enhancers drove expression in various regions of the developing nervous system. Studying a set of DNA elements that specifically drove forebrain expression, we identified DNA signatures specifically enriched in these elements and used these parameters to rank all ~3,400 human-fugu conserved noncoding elements in the human genome. The testing of the top predictions in transgenic mice resulted in a three-fold enrichment for sequences with forebrain enhancer activity. These data dramatically expand the catalogue of in vivo-characterized human gene enhancers and illustrate the future utility of such training sets for a variety of biological applications including decoding the regulatory vocabulary of the human genome.

  4. Congenic mapping and sequence analysis of the Renin locus

    PubMed Central

    Flister, Michael J.; Hoffman, Matthew J.; Reddy, Prajwal; Jacob, Howard J.; Moreno, Carol

    2013-01-01

    Renin was the first blood pressure (BP) quantitative trait locus (QTL) mapped by linkage analysis in the rat. Subsequent BP linkage and congenic studies capturing different portions of the renin region have returned conflicting results, suggesting that multiple interdependent BP loci may be residing in the chromosome 13 BP QTL that includes Renin. We used SS-13BN congenic strains to map 2 BP loci in the Renin region (chr13:45.2–49.0 Mb). We identified a 1.1 Mb protective Brown Norway (BN) region around Renin (chr13:46.1–47.2 Mb) that significantly decreased BP by 32 mmHg. The Renin protective BP locus was offset by an adjacent hypertensive locus (chr13:47.2–49.0 Mb) that significantly increased BP by 29 mmHg. Sequence analysis of the protective and hypertensive BP loci revealed 1,433 and 2,063 variants between Dahl salt-sensitive/Mcwi (SS) and BN rats, respectively. To further reduce the list of candidate variants, we re-genotyped an overlapping SS-13SR congenic strain (S/renrr) with a previously reported BP phenotype. Sequence comparison between SS, Dahl R (SR), and BN reduced the number of candidate variants in the 2 BP loci by 42% for further study. Combined with previous studies, these data suggest that at least 4 BP loci reside within the 30 cM chromosome 13 BP QTL that includes Renin. PMID:23460292

  5. Developmental Sequences of Perceptual-Motor Tasks, Movement Activities for Neurologically Handicapped and Retarded Children and Youth.

    ERIC Educational Resources Information Center

    Cratty, Bryant J.

    Intended for special education and physical education teachers, the handbook presents selected developmental sequences of activities based on the analysis of perceptual motor characteristics of groups of retarded and neurologically handicapped children. Four classifications of children and their perceptual motor characteristics are discussed: the…

  6. Species-Specific Minimal Sequence Motif for Oligodeoxyribonucleotides Activating Mouse TLR9.

    PubMed

    Pohar, Jelka; Lainšček, Duško; Fukui, Ryutaro; Yamamoto, Chikako; Miyake, Kensuke; Jerala, Roman; Benčina, Mojca

    2015-11-01

    Synthetic oligodeoxyribonucleotides (ODNs) containing unmethylated CpG recapitulate the activation of TLR9 by microbial DNA. ODNs are potent stimulators of the immune response in cells expressing TLR9. Despite extensive use of mice as experimental animals in basic and applied immunological research, the key sequence determinants that govern the activation of mouse TLR9 by ODNs have not been well defined. We performed a systematic investigation of the sequence motif of B class phosphodiester ODNs to identify the sequence properties that govern mouse TLR9 activation. In contrast to ODNs activating human TLR9, where the minimal sequence motif for the receptor activation comprises a pair of closely positioned CpGs, we found that the mouse TLR9 requires a single CpG positioned 4-6 nt from the 5'-end. Activation is augmented by a 5'TCC sequence one to three nucleotides from the CG. The distance of the CG dinucleotide of four to six nucleotides from the 5'-end and the ODN's length fine-tune activation of mouse macrophages. An ODN length of <23 or >29 nt decreases activation of dendritic cells. The ODNs with minimal sequence induce Th1-type cytokine synthesis in dendritic cells and confirm the expression of cell surface markers in B cells. Identification of the minimal sequence provides an insight into the sequence selectivity of mouse TLR9 and points to the differences in the receptor selectivity between species, probably as a result of differences in the receptor binding sites.

  7. A method for high-performance sequence analysis using polyvinylidene difluoride membranes with a biphasic reaction column sequencer.

    PubMed

    Reim, D F; Speicher, D W

    1994-01-01

    Methods have been developed for high-sensitivity sequence analysis of proteins electroblotted onto polyvinylidene difluoride (PVDF) membranes using a Hewlett-Packard G1005A protein sequencer. This sequencer normally uses a biphasic (hydrophobic/hydrophilic) reaction column which was designed to accommodate loading and cleanup of samples from diverse solutions. However, the standard column, programs, and chemistry were not designed to accommodate PVDF, which has become a common sequencing support. In this study, a systematic evaluation of the suitability of this sequencer for analysis using PVDF bound samples was performed and included evaluation of: different wash and extraction solvents, multiple programming changes, two alternative formulations of coupling reagents, and the effect of direction for solvent and reagent deliveries. High-performance analysis of PVDF bound samples was achieved by: using a modified reaction column with an empty hydrophobic (top) half of the column module, program modifications for the reaction column and converter, substitution of ethyl acetate for the standard S2/3 extraction solvent and using prototype Version 2.0 formulations of the coupling reagents, R1 and R2. High-performance sequence analyses of experimental samples electroblotted from either 1D or 2D gels onto high-retention PVDF membranes were obtained with a 41-min cycle time, including experimental samples with initial coupling yields < 2 pmol. Routine sequencer performance was comparable to, or slightly better than, a conventional gas-phase sequencer which had been previously optimized by us for high-performance sequence analysis of electroblotted samples in the low pmol range.

  8. Fine mapping of genome activation in bovine embryos by RNA sequencing

    PubMed Central

    Graf, Alexander; Krebs, Stefan; Zakhartchenko, Valeri; Schwalb, Björn; Blum, Helmut; Wolf, Eckhard

    2014-01-01

    During maternal-to-embryonic transition, control of embryonic development gradually switches from maternal RNAs and proteins stored in the oocyte to gene products generated after embryonic genome activation (EGA). Detailed insight into the onset of embryonic transcription is obscured by the presence of maternal transcripts. Using the bovine model system, we established by RNA sequencing a comprehensive catalogue of transcripts in germinal vesicle and metaphase II oocytes, and in embryos at the four-cell, eight-cell, 16-cell, and blastocyst stages. These were produced by in vitro fertilization of Bos taurus taurus oocytes with sperm from a Bos taurus indicus bull to facilitate parent-specific transcriptome analysis. Transcripts from 12.4 to 13.7 × 10^3 different genes were detected in the various developmental stages. EGA was analyzed by (i) detection of embryonic transcripts, which are not present in oocytes; (ii) detection of transcripts from the paternal allele; and (iii) detection of primary transcripts with intronic sequences. These strategies revealed (i) 220, (ii) 937, and (iii) 6,848 genes to be activated from the four-cell to the blastocyst stage. The largest proportion of gene activation [i.e., (i) 59%, (ii) 42%, and (iii) 58%] was found in eight-cell embryos, indicating major EGA at this stage. Gene ontology analysis of genes activated at the four-cell stage identified categories related to RNA processing, translation, and transport, consistent with preparation for major EGA. Our study provides the largest transcriptome data set of bovine oocyte maturation and early embryonic development and detailed insight into the timing of embryonic activation of specific genes. PMID:24591639

  9. Protein evolution analysis of S-hydroxynitrile lyase by complete sequence design utilizing the INTMSAlign software.

    PubMed

    Nakano, Shogo; Asano, Yasuhisa

    2015-02-03

    Development of software and methods for design of complete sequences of functional proteins could contribute to studies of protein engineering and protein evolution. To this end, we developed the INTMSAlign software, and used it to design functional proteins and evaluate their usefulness. The software could assign both consensus and correlation residues of target proteins. We generated three protein sequences with S-selective hydroxynitrile lyase (S-HNL) activity, which we call designed S-HNLs; these proteins folded as efficiently as the native S-HNL. Sequence and biochemical analysis of the designed S-HNLs suggested that accumulation of neutral mutations occurs during the process of S-HNLs evolution from a low-activity form to a high-activity (native) form. Taken together, our results demonstrate that our software and the associated methods could be applied not only to design of complete sequences, but also to predictions of protein evolution, especially within families such as esterases and S-HNLs.

  10. DNA Sequence Analysis of SLC26A5, Encoding Prestin, in a Patient-Control Cohort: Identification of Fourteen Novel DNA Sequence Variations

    PubMed Central

    Minor, Jacob S.; Tang, Hsiao-Yuan; Pereira, Fred A.; Alford, Raye Lynn

    2009-01-01

    Background Prestin, encoded by the gene SLC26A5, is a transmembrane protein of the cochlear outer hair cell (OHC). Prestin is required for the somatic electromotile activity of OHCs, which is absent in OHCs and causes severe hearing impairment in mice lacking prestin. In humans, the role of sequence variations in SLC26A5 in hearing loss is less clear. Although prestin is expected to be required for functional human OHCs, the clinical significance of reported putative mutant alleles in humans is uncertain. Methodology/Principal Findings To explore the hypothesis that SLC26A5 may act as a modifier gene, affecting the severity of hearing loss caused by an independent etiology, a patient-control cohort was screened for DNA sequence variations in SLC26A5 using sequencing and allele specific methods. Patients in this study carried known pathogenic or controversial sequence variations in GJB2, encoding Connexin 26, or confirmed or suspected sequence variations in SLC26A5; controls included four ethnic populations. Twenty-three different DNA sequence variations in SLC26A5, 14 of which are novel, were observed: 4 novel sequence variations were found exclusively among patients; 7 novel sequence variations were found exclusively among controls; and, 12 sequence variations, 3 of which are novel, were found in both patients and controls. Twenty-one of the 23 DNA sequence variations were located in non-coding regions of SLC26A5. Two coding sequence variations, both novel, were observed only in patients and predict a silent change, p.S434S, and an amino acid substitution, p.I663V. In silico analysis of the p.I663V amino acid variation suggested this variant might be benign. Using Fisher's exact test, no statistically significant difference was observed between patients and controls in the frequency of the identified DNA sequence variations. Haplotype analysis using HaploView 4.0 software revealed the same predominant haplotype in patients and controls and derived haplotype blocks

  11. Efficient DNA fingerprinting based on the targeted sequencing of active retrotransposon insertion sites using a bench-top high-throughput sequencing platform.

    PubMed

    Monden, Yuki; Yamamoto, Ayaka; Shindo, Akiko; Tahara, Makoto

    2014-10-01

    In many crop species, DNA fingerprinting is required for the precise identification of cultivars to protect the rights of breeders. Many families of retrotransposons have multiple copies throughout the eukaryotic genome and their integrated copies are inherited genetically. Thus, their insertion polymorphisms among cultivars are useful for DNA fingerprinting. In this study, we conducted a DNA fingerprinting based on the insertion polymorphisms of active retrotransposon families (Rtsp-1 and LIb) in sweet potato. Using 38 cultivars, we identified 2,024 insertion sites in the two families with an Illumina MiSeq sequencing platform. Of these insertion sites, 91.4% appeared to be polymorphic among the cultivars and 376 cultivar-specific insertion sites were identified, which were converted directly into cultivar-specific sequence-characterized amplified region (SCAR) markers. A phylogenetic tree was constructed using these insertion sites, which corresponded well with known pedigree information, thereby indicating their suitability for genetic diversity studies. Thus, the genome-wide comparative analysis of active retrotransposon insertion sites using the bench-top MiSeq sequencing platform is highly effective for DNA fingerprinting without any requirement for whole genome sequence information. This approach may facilitate the development of practical polymerase chain reaction-based cultivar diagnostic system and could also be applied to the determination of genetic relationships.

  12. Efficient DNA Fingerprinting Based on the Targeted Sequencing of Active Retrotransposon Insertion Sites Using a Bench-Top High-Throughput Sequencing Platform

    PubMed Central

    Monden, Yuki; Yamamoto, Ayaka; Shindo, Akiko; Tahara, Makoto

    2014-01-01

    In many crop species, DNA fingerprinting is required for the precise identification of cultivars to protect the rights of breeders. Many families of retrotransposons have multiple copies throughout the eukaryotic genome and their integrated copies are inherited genetically. Thus, their insertion polymorphisms among cultivars are useful for DNA fingerprinting. In this study, we conducted a DNA fingerprinting based on the insertion polymorphisms of active retrotransposon families (Rtsp-1 and LIb) in sweet potato. Using 38 cultivars, we identified 2,024 insertion sites in the two families with an Illumina MiSeq sequencing platform. Of these insertion sites, 91.4% appeared to be polymorphic among the cultivars and 376 cultivar-specific insertion sites were identified, which were converted directly into cultivar-specific sequence-characterized amplified region (SCAR) markers. A phylogenetic tree was constructed using these insertion sites, which corresponded well with known pedigree information, thereby indicating their suitability for genetic diversity studies. Thus, the genome-wide comparative analysis of active retrotransposon insertion sites using the bench-top MiSeq sequencing platform is highly effective for DNA fingerprinting without any requirement for whole genome sequence information. This approach may facilitate the development of practical polymerase chain reaction-based cultivar diagnostic system and could also be applied to the determination of genetic relationships. PMID:24935865

  13. ASSET: Analysis of Sequences of Synchronous Events in Massively Parallel Spike Trains

    PubMed Central

    Canova, Carlos; Denker, Michael; Gerstein, George; Helias, Moritz

    2016-01-01

    With the ability to observe the activity from large numbers of neurons simultaneously using modern recording technologies, the chance to identify sub-networks involved in coordinated processing increases. Sequences of synchronous spike events (SSEs) constitute one type of such coordinated spiking that propagates activity in a temporally precise manner. The synfire chain was proposed as one potential model for such network processing. Previous work introduced a method for visualization of SSEs in massively parallel spike trains, based on an intersection matrix that contains in each entry the degree of overlap of active neurons in two corresponding time bins. Repeated SSEs are reflected in the matrix as diagonal structures of high overlap values. The method as such, however, leaves the task of identifying these diagonal structures to visual inspection rather than to a quantitative analysis. Here we present ASSET (Analysis of Sequences of Synchronous EvenTs), an improved, fully automated method which determines diagonal structures in the intersection matrix by a robust mathematical procedure. The method consists of a sequence of steps that i) assess which entries in the matrix potentially belong to a diagonal structure, ii) cluster these entries into individual diagonal structures and iii) determine the neurons composing the associated SSEs. We employ parallel point processes generated by stochastic simulations as test data to demonstrate the performance of the method under a wide range of realistic scenarios, including different types of non-stationarity of the spiking activity and different correlation structures. Finally, the ability of the method to discover SSEs is demonstrated on complex data from large network simulations with embedded synfire chains. Thus, ASSET represents an effective and efficient tool to analyze massively parallel spike data for temporal sequences of synchronous activity. PMID:27420734
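    The intersection matrix described in this abstract can be illustrated with a minimal sketch. It assumes the spike trains have already been binned into a boolean neurons-by-bins array and takes the raw count of shared active neurons as the "degree of overlap"; the function name and toy data are hypothetical, and the published method may normalize the overlap differently.

```python
import numpy as np

def intersection_matrix(binned):
    """Count the neurons that are active in both bin i and bin j.

    binned: array of shape (n_neurons, n_bins), nonzero where a neuron
    fired in a bin. Using the raw overlap count is an assumption made
    for this sketch, not necessarily the paper's exact definition.
    """
    b = (np.asarray(binned) != 0).astype(int)
    return b.T @ b  # entry (i, j) = number of neurons active in both bins

# Hypothetical toy data: 4 neurons, 6 time bins. Neurons 0 and 1 fire
# together in bins 0 and 3; neurons 2 and 3 fire together in bins 1 and 4,
# so the two-step sequence of synchronous events occurs twice.
binned = np.array([
    [1, 0, 0, 1, 0, 0],
    [1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
], dtype=bool)

imat = intersection_matrix(binned)
print(imat)
```

    In this toy case the repeated sequence shows up as high values along an off-diagonal line (entries (0, 3) and (1, 4)), which is the kind of diagonal structure the abstract says ASSET detects automatically.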

  14. ASSET: Analysis of Sequences of Synchronous Events in Massively Parallel Spike Trains.

    PubMed

    Torre, Emiliano; Canova, Carlos; Denker, Michael; Gerstein, George; Helias, Moritz; Grün, Sonja

    2016-07-01

    With the ability to observe the activity from large numbers of neurons simultaneously using modern recording technologies, the chance to identify sub-networks involved in coordinated processing increases. Sequences of synchronous spike events (SSEs) constitute one type of such coordinated spiking that propagates activity in a temporally precise manner. The synfire chain was proposed as one potential model for such network processing. Previous work introduced a method for visualization of SSEs in massively parallel spike trains, based on an intersection matrix that contains in each entry the degree of overlap of active neurons in two corresponding time bins. Repeated SSEs are reflected in the matrix as diagonal structures of high overlap values. The method as such, however, leaves the task of identifying these diagonal structures to visual inspection rather than to a quantitative analysis. Here we present ASSET (Analysis of Sequences of Synchronous EvenTs), an improved, fully automated method which determines diagonal structures in the intersection matrix by a robust mathematical procedure. The method consists of a sequence of steps that i) assess which entries in the matrix potentially belong to a diagonal structure, ii) cluster these entries into individual diagonal structures and iii) determine the neurons composing the associated SSEs. We employ parallel point processes generated by stochastic simulations as test data to demonstrate the performance of the method under a wide range of realistic scenarios, including different types of non-stationarity of the spiking activity and different correlation structures. Finally, the ability of the method to discover SSEs is demonstrated on complex data from large network simulations with embedded synfire chains. Thus, ASSET represents an effective and efficient tool to analyze massively parallel spike data for temporal sequences of synchronous activity.

  15. Cloning and Sequence Analysis of Two Pseudomonas Flavoprotein Xenobiotic Reductases

    PubMed Central

    Blehert, David S.; Fox, Brian G.; Chambliss, Glenn H.

    1999-01-01

    The genes encoding flavin mononucleotide-containing oxidoreductases, designated xenobiotic reductases, from Pseudomonas putida II-B and P. fluorescens I-C that removed nitrite from nitroglycerin (NG) by cleavage of the nitroester bond were cloned, sequenced, and characterized. The P. putida gene, xenA, encodes a 39,702-Da monomeric, NAD(P)H-dependent flavoprotein that removes either the terminal or central nitro groups from NG and that reduces 2-cyclohexen-1-one but did not readily reduce 2,4,6-trinitrotoluene (TNT). The P. fluorescens gene, xenB, encodes a 37,441-Da monomeric, NAD(P)H-dependent flavoprotein that exhibits fivefold regioselectivity for removal of the central nitro group from NG and that transforms TNT but did not readily react with 2-cyclohexen-1-one. Heterologous expression of xenA and xenB was demonstrated in Escherichia coli DH5α. The transcription initiation sites of both xenA and xenB were identified by primer extension analysis. BLAST analyses conducted with the P. putida xenA and the P. fluorescens xenB sequences demonstrated that these genes are similar to several other bacterial genes that encode broad-specificity flavoprotein reductases. The prokaryotic flavoprotein reductases described herein likely shared a common ancestor with old yellow enzyme of yeast, a broad-specificity enzyme which may serve a detoxification role in antioxidant defense systems. PMID:10515912

  16. Whale song analyses using bioinformatics sequence analysis approaches

    NASA Astrophysics Data System (ADS)

    Chen, Yian A.; Almeida, Jonas S.; Chou, Lien-Siang

    2005-04-01

    Animal songs are frequently analyzed using discrete hierarchical units, such as units, themes and songs. Because animal songs and bio-sequences may be understood as analogous, bioinformatics analysis tools (DNA/protein sequence alignment and alignment-free methods) are proposed to quantify the theme similarities of the songs of false killer whales recorded off northeast Taiwan. The eighteen themes with discrete units that were identified in an earlier study [Y. A. Chen, master's thesis, University of Charleston, 2001] were compared quantitatively using several distance metrics. These metrics included the scores calculated using the Smith-Waterman algorithm with the repeated procedure; the standardized Euclidean distance and the angle metrics based on word frequencies. The theme classifications based on different metrics were summarized and compared in dendrograms using cluster analyses. The results agree qualitatively with earlier classifications derived by human observation. These methods further quantify the similarities among themes. These methods could be applied to the analyses of other animal songs on a larger scale. For instance, these techniques could be used to investigate song evolution and cultural transmission by quantifying the dissimilarities of humpback whale songs across different seasons, years, populations, and geographic regions. [Work supported by SC Sea Grant, and Ilan County Government, Taiwan.]
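    As a rough illustration of the alignment-based metric named in this abstract, the sketch below computes a Smith-Waterman local alignment score between two themes. It assumes each song unit is coded as a single character and uses an arbitrary scoring scheme (match +2, mismatch -1, linear gap -1); the theme strings are hypothetical, and the "repeated procedure" mentioned in the abstract is not covered.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between sequences a and b.

    The scoring values are illustrative assumptions, not taken from
    the study.
    """
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]  # dynamic-programming table
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(
                0,                    # start a new local alignment
                h[i - 1][j - 1] + step,  # align a[i-1] with b[j-1]
                h[i - 1][j] + gap,    # gap in b
                h[i][j - 1] + gap,    # gap in a
            )
            best = max(best, h[i][j])
    return best

# Hypothetical themes, one letter per song unit.
theme_1 = "XXABCYY"
theme_2 = "ZZABCQQ"
print(smith_waterman_score(theme_1, theme_2))
```

    A higher score means the two themes share a longer or better-matching run of units; converting scores to distances for clustering would need an additional normalization step that is not shown here.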

  17. Harmonic Analysis of Sedimentary Cyclic Sequences in Kansas, Midcontinent, USA

    USGS Publications Warehouse

    Merriam, D.F.; Robinson, J.E.

    1997-01-01

    Several stratigraphic sequences in the Upper Carboniferous (Pennsylvanian) in Kansas (Midcontinent, USA) were analyzed quantitatively for periodic repetitions. The sequences were coded by lithologic type into strings of datasets. The strings then were analyzed by an adaptation of a one-dimensional Fourier transform analysis and examined for evidence of periodicity. The method was tested using different states in coding to determine the robustness of the method and data. The most persistent response is in multiples of 8-10 ft (2.5-3.0 m) and probably is dependent on the depositional thickness of the original lithologic units. Other cyclicities occurred in multiples of the basic frequency of 8-10 with persistent ones at 22 and 30 feet (6.5-9.0 m) and large ones at 80 and 160 feet (25-50 m). These levels of thickness relate well to the basic cyclothem and megacyclothem as measured on outcrop. We propose that this approach is a suitable one for analyzing cyclic events in the stratigraphic record.
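    The general idea of this abstract (code a lithologic column as a numeric string, then look for periodicity with a one-dimensional Fourier transform) can be sketched as follows. The lithology-to-number coding, the 1 ft sampling interval, and the synthetic column are all assumptions for illustration; the authors' adaptation of the transform is not reproduced.

```python
import numpy as np

# Hypothetical coding of lithologic types into numeric states.
LITHOLOGY_CODE = {"shale": 0, "limestone": 1, "sandstone": 2}

def dominant_period(codes, spacing):
    """Return the wavelength (in units of `spacing`) with the most power."""
    x = np.asarray(codes, dtype=float)
    x = x - x.mean()                         # remove the zero-frequency offset
    power = np.abs(np.fft.rfft(x)) ** 2      # one-sided power spectrum
    freqs = np.fft.rfftfreq(x.size, d=spacing)
    k = 1 + np.argmax(power[1:])             # skip the zero-frequency bin
    return 1.0 / freqs[k]

# Synthetic column sampled every 1 ft: an 8 ft cycle repeated 8 times.
cycle = ["shale", "shale", "shale", "limestone",
         "limestone", "sandstone", "sandstone", "limestone"]
column = cycle * 8
codes = [LITHOLOGY_CODE[name] for name in column]

period_ft = dominant_period(codes, spacing=1.0)
print(period_ft)
```

    Re-running the analysis with a different numeric coding of the same lithologies is one simple way to check, as the abstract describes, whether a detected period is robust to the choice of states.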

  18. A DNA sequence analysis program for the Apple Macintosh.

    PubMed Central

    Gross, R H

    1986-01-01

    This paper describes a new set of programs for analyzing DNA sequences using the Apple Macintosh computer, a computer ideally suited for this kind of analysis. Because of the Macintosh interface and the availability of high quality software-only speech synthesis, these programs are truly easy to use. Instead of typing in commands, the user directs the program by making selections with the mouse, thereby eliminating most typographical and syntax errors. Output options are selected by "pressing buttons" and then clicking "OK" with the mouse. DNA sequences are confirmed by having the program speak them. The high resolution graphics on the Macintosh not only allow for explanatory diagrams to be used to aid in deciding on input parameters, but can be used to produce slides for presentations and figures for papers. Because of the clipboard and the ability of the Macintosh to readily share data among different applications, data can be saved for use directly in word processing documents (e.g. manuscripts). PMID:3003685

  19. Power analysis of single-cell RNA-sequencing experiments.

    PubMed

    Svensson, Valentine; Natarajan, Kedar Nath; Ly, Lam-Ha; Miragaia, Ricardo J; Labalette, Charlotte; Macaulay, Iain C; Cvejic, Ana; Teichmann, Sarah A

    2017-04-01

    Single-cell RNA sequencing (scRNA-seq) has become an established and powerful method to investigate transcriptomic cell-to-cell variation, thereby revealing new cell types and providing insights into developmental processes and transcriptional stochasticity. A key question is how the variety of available protocols compare in terms of their ability to detect and accurately quantify gene expression. Here, we assessed the protocol sensitivity and accuracy of many published data sets, on the basis of spike-in standards and uniform data processing. For our workflow, we developed a flexible tool for counting the number of unique molecular identifiers (https://github.com/vals/umis/). We compared 15 protocols computationally and 4 protocols experimentally for batch-matched cell populations, in addition to investigating the effects of spike-in molecular degradation. Our analysis provides an integrated framework for comparing scRNA-seq protocols.
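    Counting unique molecular identifiers (UMIs), as mentioned in this abstract, amounts to counting distinct UMI sequences per cell and gene rather than counting reads. The sketch below shows only that core idea on hypothetical records; it is not the interface of the tool linked in the abstract and it ignores UMI sequencing-error correction.

```python
from collections import defaultdict

def count_umis(reads):
    """Count distinct UMIs for each (cell barcode, gene) pair.

    reads: iterable of (cell_barcode, gene, umi) tuples. This record
    layout is an assumption made for the sketch.
    """
    seen = defaultdict(set)
    for cell, gene, umi in reads:
        seen[(cell, gene)].add(umi)  # duplicate reads of one molecule collapse
    return {key: len(umis) for key, umis in seen.items()}

# Hypothetical aligned reads: the first two are PCR duplicates of one molecule.
reads = [
    ("CELL_A", "GeneX", "AACG"),
    ("CELL_A", "GeneX", "AACG"),
    ("CELL_A", "GeneX", "TTGC"),
    ("CELL_B", "GeneX", "AACG"),
    ("CELL_B", "GeneY", "GGAT"),
]

counts = count_umis(reads)
print(counts)
```

    Because amplification duplicates share a UMI, the resulting counts estimate the number of captured molecules, which is what makes comparison against spike-in standards meaningful.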

  20. Sequence and comparative analysis of Leuconostoc dairy bacteriophages.

    PubMed

    Kot, Witold; Hansen, Lars H; Neve, Horst; Hammer, Karin; Jacobsen, Susanne; Pedersen, Per D; Sørensen, Søren J; Heller, Knut J; Vogensen, Finn K

    2014-04-17

    Bacteriophages attacking Leuconostoc species may significantly influence the quality of the final product. There is however limited knowledge of this group of phages in the literature. We have determined the complete genome sequences of nine Leuconostoc bacteriophages virulent to either Leuconostoc mesenteroides or Leuconostoc pseudomesenteroides strains. The phages have dsDNA genomes with sizes ranging from 25.7 to 28.4 kb. Comparative genomics analysis helped classify the 9 phages into two classes, which correlates with the host species. High percentage of similarity within the classes on both nucleotide and protein levels was observed. Genome comparison also revealed very high conservation of the overall genomic organization between the classes. The genes were organized in functional modules responsible for replication, packaging, head and tail morphogenesis, cell lysis and regulation and modification, respectively. No lysogeny modules were detected. To our knowledge this report provides the first comparative genomic work done on Leuconostoc dairy phages.

  1. Sequencing and comparative genomic analysis of 1227 Felis catus cDNA sequences enriched for developmental, clinical and nutritional phenotypes

    PubMed Central

    2012-01-01

    Background The feline genome is valuable to the veterinary and model organism genomics communities because the cat is an obligate carnivore and a model for endangered felids. The initial public release of the Felis catus genome assembly provided a framework for investigating the genomic basis of feline biology. However, the entire set of protein coding genes has not been elucidated. Results We identified and characterized 1227 protein coding feline sequences, of which 913 map to public sequences and 314 are novel. These sequences have been deposited into NCBI's GenBank database and complement public genomic resources by providing additional protein coding sequences that fill in some of the gaps in the feline genome assembly. Through functional and comparative genomic analyses, we gained an understanding of the role of these sequences in feline development, nutrition and health. Specifically, we identified 104 orthologs of human genes associated with Mendelian disorders. We detected negative selection within sequences with gene ontology annotations associated with intracellular trafficking, cytoskeleton and muscle functions. We detected relatively less negative selection on protein sequences encoding extracellular networks, apoptotic pathways and mitochondrial gene ontology annotations. Additionally, we characterized feline cDNA sequences that have mouse orthologs associated with clinical, nutritional and developmental phenotypes. Together, this analysis provides an overview of the value of our cDNA sequences and enhances our understanding of how the feline genome is similar to, and different from, other mammalian genomes. Conclusions The cDNA sequences reported here expand existing feline genomic resources by providing high-quality sequences annotated with comparative genomic information providing functional, clinical, nutritional and orthologous gene information. PMID:22257742

  2. Accident Sequence Evaluation Program: Human reliability analysis procedure

    SciTech Connect

    Swain, A.D.

    1987-02-01

    This document presents a shortened version of the procedure, models, and data for human reliability analysis (HRA) which are presented in the Handbook of Human Reliability Analysis with Emphasis on Nuclear Power Plant Applications (NUREG/CR-1278, August 1983). This shortened version was prepared and tried out as part of the Accident Sequence Evaluation Program (ASEP) funded by the US Nuclear Regulatory Commission and managed by Sandia National Laboratories. The intent of this new HRA procedure, called the "ASEP HRA Procedure," is to enable systems analysts, with minimal support from experts in human reliability analysis, to make estimates of human error probabilities and other human performance characteristics which are sufficiently accurate for many probabilistic risk assessments. The ASEP HRA Procedure consists of a Pre-Accident Screening HRA, a Pre-Accident Nominal HRA, a Post-Accident Screening HRA, and a Post-Accident Nominal HRA. The procedure in this document includes changes made after tryout and evaluation of the procedure in four nuclear power plants by four different systems analysts and related personnel, including human reliability specialists. The changes consist of some additional explanatory material (including examples), and more detailed definitions of some of the terms. 42 refs.

  3. Activity-rotation relations for lower main sequence stars

    NASA Astrophysics Data System (ADS)

    Dobson-Hockey, Andrea Kay

    It has been known for some time that stellar rotation and activity are related, both for chromospheric activity and coronal activity. Younger, more rapidly rotating stars of a given spectral type generally show higher levels of activity than do older, more slowly rotating stars. On the Sun, activity is distinctly related to magnetic fields. This leads to the suggestion that activity, at least in solar-type stars, is traceable to a magnetic dynamo which results from the interaction of rotation and differential rotation with convection. The more efficient the coriolis forces are at introducing helicity into convective motions, the more the magnetic field will be amplified and the more activity that is expected. The precise nature of the relationship between magnetic fields, rotation, and activity remains to be well-defined. The purpose is to examine the relationship between activity and rotation in order to better define and express such a relation (or relations). To meet this goal, a comprehensive sample of stars was collected from the published literature having two or more of the following: chromospheric Ca II H and K emission indices; coronal soft X-ray illumination; rotation rates; and where possible, ages. It is seen that the use of normalized activity units and Rossby number generally improves the correlation between activity and rotation. The use of the convective turnover time further permits a possible explanation for the distribution of stars in an activity-color diagram. A large and homogeneous data set permits better definition of previously examined functional dependencies such as the time decay of activity and the relationship between chromospheric and coronal activity indicators.
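    The two quantities this abstract relies on can be written down in a short sketch. The Rossby number is the rotation period divided by the convective turnover time. Taking "normalized activity units" to mean an activity luminosity divided by the bolometric luminosity is an assumption here, and the input values below are hypothetical round numbers, not data from the thesis.

```python
def rossby_number(rotation_period_days, turnover_time_days):
    """Rossby number: rotation period over convective turnover time."""
    return rotation_period_days / turnover_time_days

def normalized_activity(activity_luminosity, bolometric_luminosity):
    """Activity luminosity as a fraction of the star's total output.

    Treating this ratio as the 'normalized activity unit' is an
    assumption made for this sketch.
    """
    return activity_luminosity / bolometric_luminosity

# Hypothetical star: 25-day rotation period, 12.5-day turnover time,
# X-ray luminosity 1e27 erg/s, bolometric luminosity 4e33 erg/s.
ro = rossby_number(25.0, 12.5)
r_x = normalized_activity(1e27, 4e33)
print(ro, r_x)
```

    Plotting the normalized activity against the Rossby number, rather than raw luminosity against rotation period, removes much of the dependence on spectral type, which is consistent with the improved correlation the abstract reports.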

  4. Complete Genome Sequence and Methylome Analysis of Acinetobacter calcoaceticus 65

    PubMed Central

    Fomenkov, Alexey; Vincze, Tamas; Degtyarev, Sergey K.

    2017-01-01

    ABSTRACT Acinetobacter calcoaceticus 65 is the original source strain for the restriction enzyme Acc65I. Its complete sequence and full methylome were determined using single-molecule real-time (SMRT) sequencing. PMID:28336599

  5. Activity-rotation relations for lower main-sequence stars

    SciTech Connect

    Dobson-Hockey, A.K.

    1987-01-01

    It has been known for some time that stellar rotation and activity are related, both for chromospheric activity (e.g., Noyes et al. 1984) and coronal activity (e.g., Pallavicini et al. 1981; Maggio et al. 1987). Younger, more rapidly rotating stars of a given spectral type generally show higher levels of activity than do older, more slowly rotating stars. On the Sun, activity is distinctly related to magnetic fields. This leads to the suggestion that activity, at least in solar-type stars, is traceable to a magnetic dynamo which results from the interaction of rotation and differential rotation with convection. The more efficient the coriolis forces are at introducing helicity into convective motions, the more the magnetic field will be amplified and the more activity we may expect to see. The precise nature of the relationship between magnetic fields, rotation, and activity remains to be well-defined. This thesis examines the relationship between activity (both chromospheric and coronal) and rotation in order to better define and express such a relation (or relations).

  6. Patterns of Educational Activities: Discontinuities and Sequences. Report No. 222.

    ERIC Educational Resources Information Center

    Karweit, Nancy

    Using a retrospective life history sample (LHS), the educational activities of white and black men from age 14 to age 30 were determined. A lack of association of family background characteristics with resumption of schooling activities after labor force entry was found for both blacks and whites. Attainment level was related to the likelihood of…

  7. Analysis of pre-main-sequence delta-Scuti stars

    NASA Astrophysics Data System (ADS)

    Casey, Michael Patrick

    Information on 72 confirmed or candidate pre-main-sequence delta-Scuti stars is collected and analysed to varying degrees of sophistication and completeness. A systematic asteroseismic analysis of around 40 of these stars is performed, putting significant luminosity constraints on many of them simply by comparing the pulsation spectra of the stars to the fundamental and acoustic cut-off frequencies of a dense grid of stellar models. One star in particular, V1366 Ori, appears to be pulsating at or near the acoustic cut-off frequency. Many stars are found to otherwise defy proper asteroseismic analysis, in that matches between observed pulsation spectra and computed values cannot be found. A simple test reveals that the most likely cause of these problems is the high stellar-rotation rates typically found in this class of star, with v sin i most typically between 60 and 200 km/s. The high rotation rates are found to significantly modify the pulsation spectrum of a star compared to a non-rotating star. These collective results reveal the richness and variety of phenomena within this group of stars, with stars pulsating anywhere from the lowest to the highest possible radial orders, including radial orders just below the acoustic cut-off frequency of some stars. Pulsation in non-radial orders is the normal case, not the exception to the rule, with all stars displaying low-amplitude delta-Scuti variability only.

  8. Transcriptome Sequencing and Positive Selected Genes Analysis of Bombyx mandarina

    PubMed Central

    Wu, Yuqian; Long, Renwen; Liu, Chun; Xia, Qingyou

    2015-01-01

    The wild silkworm Bombyx mandarina is widely believed to be an ancestor of the domesticated silkworm, Bombyx mori. Silkworms are often used as a model for studying the mechanism of species domestication. Here, we performed transcriptome sequencing of the wild silkworm using an Illumina HiSeq2000 platform. We produced 100,004,078 high-quality reads and assembled them into 50,773 contigs with an N50 length of 1764 bp and a mean length of 941.62 bp. A total of 33,759 unigenes were identified, with 12,805 annotated in the Nr database, 8273 in the Pfam database, and 9093 in the Swiss-Prot database. Expression profile analysis found significant differential expression of 1308 unigenes between the middle silk gland (MSG) and posterior silk gland (PSG). Three sericin genes (sericin 1, sericin 2, and sericin 3) were expressed specifically in the MSG and three fibroin genes (fibroin-H, fibroin-L, and fibroin/P25) were expressed specifically in the PSG. In addition, 32,297 single-nucleotide polymorphisms (SNPs) and 361 insertion-deletions (INDELs) were detected. Comparison with the domesticated silkworm p50/Dazao identified 5,295 orthologous genes, among which 400 might have experienced or be experiencing positive selection according to Ka/Ks analysis. The data and analyses presented here provide insights into silkworm domestication and an invaluable resource for wild silkworm genomics research. PMID:25806526
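
    The abstract above reports an N50 length alongside a mean contig length. As a minimal sketch of how such assembly statistics are usually computed, the code below assumes the standard definition of N50 (the contig length at which the cumulative length of the longest contigs first reaches half of the assembly total). The function name and the toy lengths are hypothetical and are not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; N50 is the length of
    the contig at which the running total first reaches half of the
    summed length of all contigs.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly, used only to exercise the function.
toy_contigs = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_contigs)
toy_mean = sum(toy_contigs) / len(toy_contigs)
```

    Because N50 is weighted toward long contigs, it is typically larger than the mean length, which is consistent with the two figures quoted in the abstract.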

  9. Automated carboxy-terminal sequence analysis of peptides and proteins using diphenyl phosphoroisothiocyanatidate.

    PubMed Central

    Bailey, J. M.; Nikfarjam, F.; Shenoy, N. R.; Shively, J. E.

    1992-01-01

    Proteins and peptides can be sequenced from the carboxy-terminus with isothiocyanate reagents to produce amino acid thiohydantoin derivatives. Previous studies in our laboratory have focused on the automation of the thiocyanate chemistry using acetic anhydride and trimethylsilylisothiocyanate (TMS-ITC) to derivatize the C-terminal amino acid to a thiohydantoin and sodium trimethylsilanolate for specific hydrolysis of the derivatized C-terminal amino acid (Bailey, J.M., Shenoy, N.R., Ronk, M., & Shively, J.E., 1992, Protein Sci. 1, 68-80). A major limitation of this approach was the need to activate the C-terminus with acetic anhydride. We now describe the use of a new reagent, diphenyl phosphoroisothiocyanatidate (DPP-ITC) and pyridine, which combines the activation and derivatization steps to produce peptidylthiohydantoins. Previous work by Kenner et al. (Kenner, G.W., Khorana, H.G., & Stedman, R.J., 1953, Chem. Soc. J., 673-678) with this reagent demonstrated slow kinetics. Several days were required for complete reaction. We show here that the inclusion of pyridine promotes the formation of C-terminal thiohydantoins by DPP-ITC, resulting in complete conversion of the C-terminal amino acid to a thiohydantoin in less than 1 h. Reagents such as imidazole, triazine, and tetrazole were also found to promote the reaction with DPP-ITC as effectively as pyridine. General base catalysts, such as triethylamine, do not promote the reaction, but are required to convert the C-terminal carboxylic acid to a salt prior to the reaction with DPP-ITC and pyridine. By introducing the DPP-ITC reagent and pyridine in separate steps in an automated sequencer, we observed improved sequencing yields for amino acids normally found difficult to derivatize with acetic anhydride/TMS-ITC. This was particularly true for aspartic acid, which now can be sequenced in yields comparable to most of the other amino acids.
Automated programs are described for the C-terminal sequencing of

  10. Exome and genome sequencing of nasopharynx cancer identifies NF-κB pathway activating mutations

    PubMed Central

    Li, Yvonne Y; Chung, Grace T. Y.; Lui, Vivian W. Y.; To, Ka-Fai; Ma, Brigette B. Y.; Chow, Chit; Woo, John K. S.; Yip, Kevin Y.; Seo, Jeongsun; Hui, Edwin P.; Mak, Michael K. F.; Rusan, Maria; Chau, Nicole G.; Or, Yvonne Y. Y.; Law, Marcus H. N.; Law, Peggy P. Y.; Liu, Zoey W. Y.; Ngan, Hoi-Lam; Hau, Pok-Man; Verhoeft, Krista R.; Poon, Peony H. Y.; Yoo, Seong-Keun; Shin, Jong-Yeon; Lee, Sau-Dan; Lun, Samantha W. M.; Jia, Lin; Chan, Anthony W. H.; Chan, Jason Y. K.; Lai, Paul B. S.; Fung, Choi-Yi; Hung, Suet-Ting; Wang, Lin; Chang, Ann Margaret V.; Chiosea, Simion I.; Hedberg, Matthew L.; Tsao, Sai-Wah; van Hasselt, Andrew C.; Chan, Anthony T. C.; Grandis, Jennifer R.; Hammerman, Peter S.; Lo, Kwok-Wai

    2017-01-01

    Nasopharyngeal carcinoma (NPC) is an aggressive head and neck cancer characterized by Epstein-Barr virus (EBV) infection and dense lymphocyte infiltration. The scarcity of NPC genomic data hinders the understanding of NPC biology, disease progression and rational therapy design. Here we performed whole-exome sequencing (WES) on 111 micro-dissected EBV-positive NPCs, with 15 cases subjected to further whole-genome sequencing (WGS), to determine its mutational landscape. We identified enrichment for genomic aberrations of multiple negative regulators of the NF-κB pathway, including CYLD, TRAF3, NFKBIA and NLRC5, in a total of 41% of cases. Functional analysis confirmed inactivating CYLD mutations as drivers for NPC cell growth. The EBV oncoprotein latent membrane protein 1 (LMP1) functions to constitutively activate NF-κB signalling, and we observed mutual exclusivity among tumours with somatic NF-κB pathway aberrations and LMP1-overexpression, suggesting that NF-κB activation is selected for by both somatic and viral events during NPC pathogenesis. PMID:28098136

  11. Cloning of two glutamate dehydrogenase cDNAs from Asparagus officinalis: sequence analysis and evolutionary implications.

    PubMed

    Pavesi, A; Ficarelli, A; Tassi, F; Restivo, F M

    2000-04-01

    Two different amplification products, termed c1 and c2, showing a high similarity to glutamate dehydrogenase sequences from plants, were obtained from Asparagus officinalis using two degenerate primers and RT-PCR (reverse transcriptase polymerase chain reaction). The genes corresponding to these cDNA clones were designated aspGDHA and aspGDHB. Screening of a cDNA library resulted in the isolation of cDNA clones for aspGDHB only. Analysis of the deduced amino acid (aa) sequence from the full-length cDNA suggests that the gene product contains all regions associated with metabolic function of NAD glutamate dehydrogenase (NAD-GDH). A first phylogenetic analysis including only GDHs from plants suggested that the two GDH genes of A. officinalis arose by an ancient duplication event, pre-dating the divergence of monocots and dicots. Codon usage analysis showed a bias towards A/T-ending codons. This tendency is likely due to the biased nucleotide composition of the asparagus genome, rather than to translational selection for specific codons. Using principal coordinate analysis, the evolutionary relatedness of plant GDHs with homologous sequences from a large spectrum of organisms was investigated. The results showed a closer affinity of plant GDHs to GDHs of thermophilic archaebacterial and eubacterial species, when compared to those of unicellular eukaryotic fungi. Sequence analysis at specific amino acid signatures, known to affect the thermal stability of GDH, and assays of enzyme activity at non-physiological temperatures, showed a greater adaptation to heat-stress conditions for the asparagus and tobacco enzymes compared with the Saccharomyces cerevisiae enzyme.
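
    The abstract above mentions a codon usage bias towards A/T-ending codons. One simple way to quantify such a bias, offered here only as an illustrative sketch and not as the authors' method, is the fraction of codons whose third position is A or T. The function name and the toy sequence are hypothetical.

```python
def at3_fraction(coding_sequence):
    """Fraction of codons in an in-frame coding sequence ending in A or T."""
    seq = coding_sequence.upper()
    if not seq or len(seq) % 3 != 0:
        raise ValueError("sequence must be non-empty and a multiple of 3 long")
    # Split the sequence into consecutive, non-overlapping codons.
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    at_ending = sum(1 for codon in codons if codon[2] in "AT")
    return at_ending / len(codons)


# Hypothetical toy sequence (ATG GCT GAA AAA), not from the study.
toy_fraction = at3_fraction("ATGGCTGAAAAA")
```

    A value well above 0.5 across many genes would point to a preference for A/T-ending codons; separating compositional bias from translational selection, as the abstract does, needs further analysis beyond this count.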

  12. Genome sequencing and analysis of the biomass-degrading fungus Trichoderma reesei (syn. Hypocrea jecorina).

    PubMed

    Martinez, Diego; Berka, Randy M; Henrissat, Bernard; Saloheimo, Markku; Arvas, Mikko; Baker, Scott E; Chapman, Jarod; Chertkov, Olga; Coutinho, Pedro M; Cullen, Dan; Danchin, Etienne G J; Grigoriev, Igor V; Harris, Paul; Jackson, Melissa; Kubicek, Christian P; Han, Cliff S; Ho, Isaac; Larrondo, Luis F; de Leon, Alfredo Lopez; Magnuson, Jon K; Merino, Sandy; Misra, Monica; Nelson, Beth; Putnam, Nicholas; Robbertse, Barbara; Salamov, Asaf A; Schmoll, Monika; Terry, Astrid; Thayer, Nina; Westerholm-Parvinen, Ann; Schoch, Conrad L; Yao, Jian; Barabote, Ravi; Barbote, Ravi; Nelson, Mary Anne; Detter, Chris; Bruce, David; Kuske, Cheryl R; Xie, Gary; Richardson, Paul; Rokhsar, Daniel S; Lucas, Susan M; Rubin, Edward M; Dunn-Coleman, Nigel; Ward, Michael; Brettin, Thomas S

    2008-05-01

    Trichoderma reesei is the main industrial source of cellulases and hemicellulases used to depolymerize biomass to simple sugars that are converted to chemical intermediates and biofuels, such as ethanol. We assembled 89 scaffolds (sets of ordered and oriented contigs) to generate 34 Mbp of nearly contiguous T. reesei genome sequence comprising 9,129 predicted gene models. Unexpectedly, considering the industrial utility and effectiveness of the carbohydrate-active enzymes of T. reesei, its genome encodes fewer cellulases and hemicellulases than any other sequenced fungus able to hydrolyze plant cell wall polysaccharides. Many T. reesei genes encoding carbohydrate-active enzymes are distributed nonrandomly in clusters that lie between regions of synteny with other Sordariomycetes. Numerous genes encoding biosynthetic pathways for secondary metabolites may promote survival of T. reesei in its competitive soil habitat, but genome analysis provided little mechanistic insight into its extraordinary capacity for protein secretion. Our analysis, coupled with the genome sequence data, provides a roadmap for constructing enhanced T. reesei strains for industrial applications such as biofuel production.

  13. PCR-Activated Cell Sorting for Cultivation-Free Enrichment and Sequencing of Rare Microbes

    PubMed Central

    Lim, Shaun W.; Tran, Tuan M.; Abate, Adam R.

    2015-01-01

    Microbial systems often exhibit staggering diversity, making the study of rare, interesting species challenging. For example, metagenomic analyses of mixed-cell populations are often dominated by the sequences of the most abundant organisms, while those of rare microbes are detected only at low levels, if at all. To overcome this, selective cultivation or fluorescence-activated cell sorting (FACS) can be used to enrich for the target species prior to sequence analysis; however, since most microbes cannot be grown in the lab, cultivation strategies often fail, while cell sorting requires techniques to uniquely label the cell type of interest, which is often not possible with uncultivable microbes. Here, we introduce a culture-independent strategy for sorting microbial cells based on genomic content, which we term PCR-activated cell sorting (PACS). This technology, which utilizes the power of droplet-based microfluidics, is similar to FACS in that it uses a fluorescent signal to uniquely identify and sort target species. However, PACS differs importantly from FACS in that the signal is generated by performing PCR assays on the cells in microfluidic droplets, allowing target cells to be identified with high specificity with suitable design of PCR primers and TaqMan probes. The PACS assay is general, requires minimal optimization and, unlike antibody methods, can be developed without access to microbial antigens. Compared to non-specific methods in which cells are sorted based on size, granularity, or the ability to take up dye, PACS enables genetic sequence-specific sorting and recovery of the cell genomes. In addition to sorting microbes, PACS can be applied to eukaryotic cells, viruses, and naked nucleic acids. PMID:25629401

  14. Whole genome sequence analysis of unidentified genetically modified papaya for development of a specific detection method.

    PubMed

    Nakamura, Kosuke; Kondo, Kazunari; Akiyama, Hiroshi; Ishigaki, Takumi; Noguchi, Akio; Katsumata, Hiroshi; Takasaki, Kazuto; Futo, Satoshi; Sakata, Kozue; Fukuda, Nozomi; Mano, Junichi; Kitta, Kazumi; Tanaka, Hidenori; Akashi, Ryo; Nishimaki-Mogami, Tomoko

    2016-08-15

    Identification of transgenic sequences in an unknown genetically modified (GM) papaya (Carica papaya L.) by whole genome sequence analysis was demonstrated. Whole genome sequence data were generated for a GM-positive fresh papaya fruit commodity detected in monitoring using real-time polymerase chain reaction (PCR). The sequences obtained were mapped against an open database for papaya genome sequence. The transgenic construct- and event-specific sequences identified were those of a GM papaya developed to resist infection by Papaya ringspot virus. Based on the transgenic sequences, a specific real-time PCR detection method for GM papaya applicable to various food commodities was developed. Whole genome sequence analysis enabled the identification of unknown transgenic construct- and event-specific sequences in GM papaya and the development of a reliable method for detecting them in papaya food commodities.

  15. Evolution of activity signatures during the main sequence phase

    NASA Astrophysics Data System (ADS)

    Skumanich, A.; MacGregor, K.

    Recent work on the decay of magnetic activity signatures, such as chromospheric/transition region/coronal emission as well as mean flare emission, with age for solar and later type stars is reviewed. In terms of magnetic flux, as measured by excess chromospheric Ca II luminosity, it is shown that a simple dynamo-rotation relation that incorporates both a saturated state with its characteristic critical rotation as well as an asymptotic linear power law, i.e., a scale free relation, fits the extant data that includes the dMe stars. Introducing the saturated dynamo state, as exemplified by the dMe stars, into activity power-power diagrams, allows for not only specification of the saturated state, but for definition of evolutionary tracks that represent the decay from the saturated state. Using the quiescent coronal X-ray power (luminosity) as a basic measure of magnetic activity, simple monomial relations for both the saturated state (linear) and for the evolutionary tracks governing both quiescent activity and mean flare activity are found. In particular, the coronal power loss is found to vary quadratically with the chromospheric power loss, hence with magnetic flux.

  16. The European Classical Swine Fever Virus Database: Blueprint for a Pathogen-Specific Sequence Database with Integrated Sequence Analysis Tools.

    PubMed

    Postel, Alexander; Schmeiser, Stefanie; Zimmermann, Bernd; Becher, Paul

    2016-11-07

    Molecular epidemiology has become an indispensable tool in the diagnosis of diseases and in tracing the infection routes of pathogens. Due to advances in conventional sequencing and the development of high throughput technologies, the field of sequence determination is in the process of being revolutionized. Platforms for sharing sequence information and providing standardized tools for phylogenetic analyses are becoming increasingly important. The database (DB) of the European Union (EU) and World Organisation for Animal Health (OIE) Reference Laboratory for classical swine fever offers one of the world's largest semi-public virus-specific sequence collections combined with a module for phylogenetic analysis. The classical swine fever (CSF) DB (CSF-DB) became a valuable tool for supporting diagnosis and epidemiological investigations of this highly contagious disease in pigs with high socio-economic impacts worldwide. The DB has been re-designed and now allows for the storage and analysis of traditionally used, well established genomic regions and of larger genomic regions including complete viral genomes. We present an application example for the analysis of highly similar viral sequences obtained in an endemic disease situation and introduce the new geographic "CSF Maps" tool. The concept of this standardized and easy-to-use DB with an integrated genetic typing module is suited to serve as a blueprint for similar platforms for other human or animal viruses.

  17. The European Classical Swine Fever Virus Database: Blueprint for a Pathogen-Specific Sequence Database with Integrated Sequence Analysis Tools

    PubMed Central

    Postel, Alexander; Schmeiser, Stefanie; Zimmermann, Bernd; Becher, Paul

    2016-01-01

    Molecular epidemiology has become an indispensable tool in the diagnosis of diseases and in tracing the infection routes of pathogens. Due to advances in conventional sequencing and the development of high throughput technologies, the field of sequence determination is in the process of being revolutionized. Platforms for sharing sequence information and providing standardized tools for phylogenetic analyses are becoming increasingly important. The database (DB) of the European Union (EU) and World Organisation for Animal Health (OIE) Reference Laboratory for classical swine fever offers one of the world’s largest semi-public virus-specific sequence collections combined with a module for phylogenetic analysis. The classical swine fever (CSF) DB (CSF-DB) became a valuable tool for supporting diagnosis and epidemiological investigations of this highly contagious disease in pigs with high socio-economic impacts worldwide. The DB has been re-designed and now allows for the storage and analysis of traditionally used, well established genomic regions and of larger genomic regions including complete viral genomes. We present an application example for the analysis of highly similar viral sequences obtained in an endemic disease situation and introduce the new geographic “CSF Maps” tool. The concept of this standardized and easy-to-use DB with an integrated genetic typing module is suited to serve as a blueprint for similar platforms for other human or animal viruses. PMID:27827988

  18. Gene cloning, sequence analysis, purification, and characterization of a thermostable aminoacylase from Bacillus stearothermophilus.

    PubMed Central

    Sakanyan, V; Desmarez, L; Legrain, C; Charlier, D; Mett, I; Kochikyan, A; Savchenko, A; Boyen, A; Falmagne, P; Pierard, A

    1993-01-01

    A genomic DNA fragment encoding aminoacylase activity of the eubacterium Bacillus stearothermophilus was cloned into Escherichia coli. Transformants expressing aminoacylase activity were selected by their ability to complement E. coli mutants defective in acetylornithine deacetylase activity, the enzyme that converts N-acetylornithine to ornithine in the arginine biosynthetic pathway. The 2.3-kb cloned fragment has been entirely sequenced. Analysis of the sequence revealed two open reading frames, one of which encoded the aminoacylase. B. stearothermophilus aminoacylase, produced in E. coli, was purified to near homogeneity in three steps, one of which took advantage of the intrinsic thermostability of the enzyme. The enzyme exists as a homotetramer of 43-kDa subunits as shown by cross-linking experiments. The deacetylating capacity of purified aminoacylase varies considerably depending on the nature of the amino acid residue in the substrate. The enzyme hydrolyzes N-acyl derivatives of aromatic amino acids most efficiently. Comparison of the predicted amino acid sequence of B. stearothermophilus aminoacylase with those of eubacterial acetylornithine deacylase, succinyldiaminopimelate desuccinylase, carboxypeptidase G2, and eukaryotic aminoacylase I suggests a common origin for these enzymes. PMID:8285691

  19. Non-invasive Estimation of Global Activation Sequence using the Extended Kalman Filter

    PubMed Central

    Liu, Chenguang; He, Bin

    2011-01-01

    A new algorithm for three-dimensional (3D) imaging of the activation sequence from noninvasive body surface potentials is proposed. After formulating the nonlinear relationship between the 3D activation sequence and the body surface recordings during activation, the extended Kalman filter (EKF) is utilized to estimate the activation sequence in a recursive way. The state vector containing the activation sequence is optimized during iteration by updating the error covariance matrix. A new regularization scheme is incorporated into the “predict” procedure of EKF to tackle the ill-posedness of the inverse problem. The EKF based algorithm shows good performance in simulation under single-site pacing. Between the estimated activation sequences and true values, the average correlation coefficient (CC) is 0.95, and the relative error (RE) is 0.13. The average localization error (LE) when localizing the pacing site is 3.0 mm. Good results are also obtained under dual-site pacing (CC = 0.93, RE = 0.16, LE = 4.3 mm). Furthermore, the algorithm shows robustness to noise. The present promising results demonstrate that the proposed EKF-based inverse approach can noninvasively estimate the 3D activation sequence with good accuracy and the new algorithm shows good features due to the application of EKF. PMID:20716498
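
    The abstract above evaluates the estimated activation sequence with a correlation coefficient (CC) and a relative error (RE) but does not give their formulas. The sketch below assumes the usual definitions, Pearson correlation for CC and the ratio of Euclidean norms ||estimated - true|| / ||true|| for RE; these definitions, the function names, and the sample activation times are assumptions for illustration, not details confirmed by the record.

```python
import math


def correlation_coefficient(estimated, true):
    """Pearson correlation between estimated and true activation times."""
    n = len(true)
    if n == 0 or len(estimated) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_e = sum(estimated) / n
    mean_t = sum(true) / n
    cov = sum((e - mean_e) * (t - mean_t) for e, t in zip(estimated, true))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_t = sum((t - mean_t) ** 2 for t in true)
    return cov / math.sqrt(var_e * var_t)


def relative_error(estimated, true):
    """Assumed definition: ||estimated - true|| / ||true|| (Euclidean norms)."""
    if not true or len(estimated) != len(true):
        raise ValueError("inputs must be non-empty and of equal length")
    diff_norm = math.sqrt(sum((e - t) ** 2 for e, t in zip(estimated, true)))
    true_norm = math.sqrt(sum(t ** 2 for t in true))
    return diff_norm / true_norm


# Hypothetical activation times in ms at four sites, not data from the study.
true_times = [10.0, 20.0, 30.0, 40.0]
estimated_times = [12.0, 18.0, 33.0, 41.0]
cc = correlation_coefficient(estimated_times, true_times)
re = relative_error(estimated_times, true_times)
```

    The two measures are complementary: CC captures agreement in the overall activation pattern, while RE also penalizes offsets and scaling errors in the estimated times.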

  20. Functional annotation of proteomic sequences based on consensus of sequence and structural analysis.

    PubMed

    Kitson, David H; Badretdinov, Azat; Zhu, Zhan-yang; Velikanov, Mikhail; Edwards, David J; Olszewski, Krzysztof; Szalma, Sándor; Yan, Lisa

    2002-03-01

    To maximise the assignment of function of the proteins encoded by a genome and to aid the search for novel drug targets, there is an emerging need for sensitive methods of predicting protein function on a genome-wide basis. GeneAtlas is an automated, high-throughput pipeline for the prediction of protein structure and function using sequence similarity detection, homology modelling and fold recognition methods. GeneAtlas is described in detail here. To test GeneAtlas, a 'virtual' genome was used, a subset of PDB structures from the SCOP database, in which the functional relationships are known. GeneAtlas detects additional relationships by building 3D models in comparison with the sequence searching method PSI-BLAST. Functionally related proteins with sequence identity below the twilight zone can be recognised correctly.

  1. Cloning and sequence analysis of banana streak virus DNA.

    PubMed

    Harper, G; Hull, R

    1998-01-01

    Banana streak virus (BSV), a member of the Badnavirus group of plant viruses, causes severe problems in banana cultivation, reducing fruit yield and restricting plant breeding and the movement of germplasm. Current detection methods are relatively insensitive. In order to develop a PCR-based diagnostic method that is both reliable and sensitive, the genome of a Nigerian isolate of BSV has been sequenced and shown to comprise 7389 bp and to be organized in a manner characteristic of badnaviruses. Comparison of this sequence with those of other badnaviruses showed that BSV is a distinct virus. PCR with primers based on sequence data indicated that BSV sequences are present in the banana genome.

  2. Sequencing and computational analysis of complete genome sequences of Citrus yellow mosaic badna virus from acid lime and pummelo.

    PubMed

    Borah, Basanta K; Johnson, A M Anthony; Sai Gopal, D V R; Dasgupta, Indranil

    2009-08-01

    Citrus yellow mosaic badna virus (CMBV), a member of the Family Caulimoviridae, Genus Badnavirus, is the causative agent of Citrus mosaic disease in India. Although the virus has been detected in several citrus species, only two full-length genomes, one each from Sweet orange and Rangpur lime, are available in publicly accessible databases. In order to obtain a better understanding of the genetic variability of the virus in other citrus mosaic-affected citrus species, we performed the cloning and sequence analysis of complete genomes of CMBV from two additional citrus species, Acid lime and Pummelo. We show that CMBV genomes from the two hosts share high homology with previously reported CMBV sequences and hence conclude that the new isolates represent variants of the virus present in these species. Based on in silico sequence analysis, we predict the possible function of the protein encoded by one of the five ORFs.

  3. Genetic Prediction in the Genetic Analysis Workshop 18 Sequencing Data

    PubMed Central

    Ziegler, Andreas; Bohossian, Nora; Diego, Vincent P.; Yao, Chen

    2015-01-01

    High-throughput sequencing data can be used to predict phenotypes from genotypes, and this corresponds to establishing a prognostic model. In extended pedigrees the relatedness of subjects provides additional information so that genetic values, fixed or random genetic components, and heritability can be estimated. At the Genetic Analysis Workshop 18 the working group on genetic prediction dealt with both establishing a prognostic model and, in one contribution, comparing standard logistic regression with robust logistic regression in a sample of unrelated affected or unaffected individuals. Results of both logistic regression approaches were similar. All other contributions to this group used extended family data, in general using the quantitative trait blood pressure. The individual contributions varied in several important aspects, such as the estimation of the kinship matrix and the estimation method. Contributors chose various approaches for model validation, including different versions of cross-validation or within-family validation. Within-family validation included model building in the upper generations and validation in later generations. The choice of the statistical model and the computational algorithm had substantial effects on computation time. If decorrelation approaches were applied, the computational burden was substantially reduced. Some software packages estimated negative eigenvalues, although eigenvalues of correlation matrices should be nonnegative. Most statistical models and software packages have been developed for experimental crosses and planned breeding programs. With their specialized pedigree structures, they are not sufficiently flexible to accommodate the variability of human pedigrees in general, and improved implementations are required. PMID:25112190

  4. Analysis of expressed sequence tags from the Ulva prolifera (Chlorophyta)

    NASA Astrophysics Data System (ADS)

    Niu, Jianfeng; Hu, Haiyan; Hu, Songnian; Wang, Guangce; Peng, Guang; Sun, Song

    2010-01-01

    In 2008, a green tide broke out before the sailing competition of the 29th Olympic Games in Qingdao. The causative species was determined to be Enteromorpha prolifera (Ulva prolifera O. F. Müller), a familiar green macroalga along the coastline of China. Rapid accumulation of a large biomass of floating U. prolifera prompted research on different aspects of this species. In this study, we constructed a non-normalized cDNA library from the thalli of U. prolifera and acquired 10 072 high-quality expressed sequence tags (ESTs). These ESTs were assembled into 3 519 nonredundant gene groups, including 1 446 clusters and 2 073 singletons. After annotation with the nr database, a large number of genes were found to be related to chloroplast and ribosomal proteins. GO functional classification showed that 1 418 ESTs participated in photosynthesis and 1 359 ESTs were responsible for the generation of precursor metabolites and energy. In addition, rather comprehensive carbon fixation pathways were found in U. prolifera using KEGG. Some stress-related and signal transduction-related genes were also found in this study. All the evidence indicated that U. prolifera has the substance and energy foundation for intense photosynthesis and rapid proliferation. Phylogenetic analysis of cytochrome c oxidase subunit I revealed that this green-tide causative species is most closely affiliated to Pseudendoclonium akinetum (Ulvophyceae).

  5. The MPI Bioinformatics Toolkit for protein sequence analysis

    PubMed Central

    Biegert, Andreas; Mayer, Christian; Remmert, Michael; Söding, Johannes; Lupas, Andrei N.

    2006-01-01

    The MPI Bioinformatics Toolkit is an interactive web service which offers access to a great variety of public and in-house bioinformatics tools. They are grouped into different sections that support sequence searches, multiple alignment, secondary and tertiary structure prediction and classification. Several public tools are offered in customized versions that extend their functionality. For example, PSI-BLAST can be run against regularly updated standard databases, customized user databases or selectable sets of genomes. Another tool, Quick2D, integrates the results of various secondary structure, transmembrane and disorder prediction programs into one view. The Toolkit provides a friendly and intuitive user interface with an online help facility. As a key feature, various tools are interconnected so that the results of one tool can be forwarded to other tools. One could run PSI-BLAST, parse out a multiple alignment of selected hits and send the results to a cluster analysis tool. The Toolkit framework and the tools developed in-house will be packaged and freely available under the GNU Lesser General Public Licence (LGPL). The Toolkit can be accessed at http://toolkit.tuebingen.mpg.de. PMID:16845021

  6. The MPI Bioinformatics Toolkit for protein sequence analysis.

    PubMed

    Biegert, Andreas; Mayer, Christian; Remmert, Michael; Söding, Johannes; Lupas, Andrei N

    2006-07-01

    The MPI Bioinformatics Toolkit is an interactive web service which offers access to a great variety of public and in-house bioinformatics tools. They are grouped into different sections that support sequence searches, multiple alignment, secondary and tertiary structure prediction and classification. Several public tools are offered in customized versions that extend their functionality. For example, PSI-BLAST can be run against regularly updated standard databases, customized user databases or selectable sets of genomes. Another tool, Quick2D, integrates the results of various secondary structure, transmembrane and disorder prediction programs into one view. The Toolkit provides a friendly and intuitive user interface with an online help facility. As a key feature, various tools are interconnected so that the results of one tool can be forwarded to other tools. One could run PSI-BLAST, parse out a multiple alignment of selected hits and send the results to a cluster analysis tool. The Toolkit framework and the tools developed in-house will be packaged and freely available under the GNU Lesser General Public Licence (LGPL). The Toolkit can be accessed at http://toolkit.tuebingen.mpg.de.

  7. Production, characterization, cloning and sequence analysis of a monofunctional catalase from Serratia marcescens SYBC08.

    PubMed

    Zeng, Hua-Wei; Cai, Yu-Jie; Liao, Xiang-Ru; Zhang, Feng; Zhang, Da-Bing

    2011-04-01

    A monofunctional catalase from Serratia marcescens SYBC08 produced by liquid state fermentation in a 7-liter fermenter was isolated and purified by ammonium sulfate precipitation (ASP), ion exchange chromatography (IEC), and gel filtration (GF) and characterized. Its sequence was analyzed by LC-MS/MS and gene cloning. The highest catalase production (20,289 U · ml(-1)) was achieved after incubation for 40 h. The purified catalase had an estimated molecular mass of 230 kDa, consisting of four identical subunits of 58 kDa. The specific activity of the catalase (199,584 U · mg(-1) protein) was 3.44 times higher than that of Halomonas sp. Sk1 catalase (57,900 U · mg(-1) protein). The enzyme, which lacked peroxidase activity, showed an atypical electronic spectrum for a monofunctional catalase. The apparent K(m) and V(max) were 78 mM and 188, 212 per µM H(2)O(2) µM heme(-1) s(-1), respectively. The enzyme displayed a broad pH activity range (pH 5.0-11.0), with an optimal pH range of 7.0-9.0. It was most active at 20 °C and had 78% activity at 0 °C. Its thermostability was slightly higher than that of commercial catalase from bovine liver. LC-MS/MS analysis confirmed that the deduced amino acid sequence of the cloned gene was the catalase sequence from Serratia marcescens SYBC08. The sequence was compared with that of 23 related catalases. Although most of the active-site residues, NADPH-binding residues, proximal residues of the heme, distal residues of the heme and residues interacting with a water molecule in the enzyme were well conserved in the 23 related catalases, weakly conserved residues were found. Its sequence was closely related to that of catalases from pathogenic bacteria in the family Enterobacteriaceae. This result implies that the enzyme, with its high specific activity, plays a significant role in protecting microorganisms of the family Enterobacteriaceae against cellular damage caused by hydrogen peroxide. Catalase yield by Serratia

  8. Advanced accident sequence precursor analysis level 1 models

    SciTech Connect

    Sattison, M.B.; Thatcher, T.A.; Knudsen, J.K.; Schroeder, J.A.; Siu, N.O.

    1996-03-01

    INEL has been involved in the development of plant-specific Accident Sequence Precursor (ASP) models for the past two years. These models were developed for use with the SAPHIRE suite of PRA computer codes. They contained event tree/linked fault tree Level 1 risk models for the following initiating events: general transient, loss-of-offsite-power, steam generator tube rupture, small loss-of-coolant-accident, and anticipated transient without scram. Early in 1995 the ASP models were revised based on review comments from the NRC and an independent peer review. These models were released as Revision 1. The Office of Nuclear Regulatory Research has sponsored several projects at the INEL this fiscal year to further enhance the capabilities of the ASP models. Revision 2 models incorporate more detailed plant information concerning plant response to station blackout conditions, information on battery life, and other unique features gleaned from an Office of Nuclear Reactor Regulation quick review of the Individual Plant Examination submittals. These models are currently being delivered to the NRC as they are completed. A related project is a feasibility study and model development of low power/shutdown (LP/SD) and external event extensions to the ASP models. This project will establish criteria for selection of LP/SD and external initiator operational events for analysis within the ASP program. Prototype models for each pertinent initiating event (loss of shutdown cooling, loss of inventory control, fire, flood, seismic, etc.) will be developed. A third project concerns development of enhancements to SAPHIRE. In relation to the ASP program, a new SAPHIRE module, GEM, was developed as a specific user interface for performing ASP evaluations. This module greatly simplifies the analysis process for determining the conditional core damage probability for a given combination of initiating events and equipment failures or degradations.

  9. The C. elegans apoptotic nuclease NUC-1 is related in sequence and activity to mammalian DNase II.

    PubMed

    Lyon, C J; Evans, C J; Bill, B R; Otsuka, A J; Aguilera, R J

    2000-07-11

    The Caenorhabditis elegans nuc-1 gene has previously been implicated in programmed cell death due to the presence of persistent undegraded apoptotic DNA in nuc-1 mutant animals. In this report, we describe the cloning and characterization of nuc-1, which encodes an acidic nuclease with significant sequence similarity to mammalian DNase II. Database searches performed with human DNase II protein sequence revealed a significant similarity with the predicted C. elegans C07B5.5 ORF. Subsequent analysis of crude C. elegans protein extracts revealed that wild-type animals contained a potent endonuclease activity with a cleavage preference similar to DNase II, while nuc-1 mutant worms demonstrated a marked reduction in this nuclease activity. Sequence analysis of C07B5.5 DNA and mRNA also revealed that nuc-1(e1392), but not wild-type animals, contained a nonsense mutation within the C07B5.5 coding region. Furthermore, nuc-1 transgenic lines carrying the wild-type C07B5.5 locus demonstrated a complete complementation of the nuc-1 mutant phenotype. Our results therefore provide compelling evidence that the C07B5.5 gene encodes the NUC-1 apoptotic nuclease and that this nuclease is related in sequence and activity to DNase II.

  10. Sequence- and Structure-Based Analysis of Tissue-Specific Phosphorylation Sites

    PubMed Central

    Karabulut, Nermin Pinar; Frishman, Dmitrij

    2016-01-01

    Phosphorylation is the most widespread and well studied reversible posttranslational modification. Discovering tissue-specific preferences of phosphorylation sites is important as phosphorylation plays a role in regulating almost every cellular activity and disease state. Here we present a comprehensive analysis of global and tissue-specific sequence and structure properties of phosphorylation sites utilizing recent proteomics data. We identified tissue-specific motifs in both sequence and spatial environments of phosphorylation sites. Target site preferences of kinases across tissues indicate that, while many kinases mediate phosphorylation in all tissues, there are also kinases that exhibit more tissue-specific preferences which, notably, are not caused by tissue-specific kinase expression. We also demonstrate that many metabolic pathways are differentially regulated by phosphorylation in different tissues. PMID:27332813

  11. Computational analysis of conserved coil functional residues in the mitochondrial genomic sequences of dermatophytes

    PubMed Central

    Gupta, Bulbul; Kaur, Jaspreet

    2016-01-01

    Dermatophytes are a group of closely related fungi that have the capacity to invade keratinized tissue of humans and other animals. The infection known as dermatophytosis, caused by members of the genera Microsporum, Trichophyton, and Epidermophyton, includes infection of the groin (tinea cruris), beard (tinea barbae), scalp (tinea capitis), feet (tinea pedis), glabrous skin (tinea corporis), nail (tinea unguium), and hand (tinea manuum). Identifying the evolutionary relationship between these three genera of dermatophytes is epidemiologically important for understanding their pathogenicity. Mitochondrial DNA evolves more rapidly than nuclear DNA due to a higher rate of mutation but is much less affected by genetic recombination, making it an important tool for phylogenetic studies. Thus, here we present a novel scheme to identify the conserved coil functional residues of Trichophyton rubrum, Trichophyton mentagrophytes, Epidermophyton floccosum and Microsporum canis. Protein coding sequences of the mitochondrial genome were aligned for their similar sequences and homology modelling was performed for structure and pocket identification. The results obtained from comparative analysis of the protein sequences revealed the presence of functionally active sites in all the species of the genera Trichophyton and Microsporum. However, in Epidermophyton floccosum they were observed in only three of the five protein sequences studied. The absence of these conserved coil functional residues in E. floccosum may be correlated with the lesser infectivity of this organism. The functional residues identified in the present study could be responsible for the disease and thus can act as putative target sites for drug design. PMID:28149055

  12. Genome Sequencing and Analysis of the Biomass-Degrading Fungus Trichoderma reesei (syn. Hypocrea jecorina)

    SciTech Connect

    Martinez, Antonio D.; Berka, Randy; Henrissat, Bernard; Saloheimo, Markku; Arvas, Mikko; Baker, Scott E.; Chapman, Jaro d; Chertkov, Olga; Coutinho, Pedro M.; Cullen, Dan; Danchin, Etienne G.; Grigoriev, Igor V.; Harris, Paul; Jackson, Melissa ?.; Kubicek, Christian P.; Han, Cliff F.; Ho, Isaac; Larrando, Luis F.; Lopez de Leon, Alfredo; Magnuson, Jon K.; Merino, Sandy; Misra, Monica; Nelson, Beth; Putnam, Nicholas; Robbertse, Barbara; Salamov, Asaf; Schmoll, Monika; Terry, Astrid ?.; Thayer, Nina; Westerholm-Parvinen, Ann; Schoch, Conrad L.; Yao, Jian ?.; Barbote, Ravi; Nelson, Mary Anne; Detter, Chris J.; Bruce, David; Kuske, Cheryl; Xie, Gary; Richardson, P. M.; Rokhsar, Daniel S.; Lucas, Susan; Rubin, Eddie M.; Dunn-Coleman, Nigel; Ward, Michael ?.; Brettin, T.

    2008-05-01

    A major thrust of the white biotechnology movement involves the development of enzyme systems which depolymerize biomass to simple sugars which are subsequently converted to sustainable biofuels (e.g., ethanol) and chemical intermediates. The fungus Trichoderma reesei (syn. Hypocrea jecorina) represents a paradigm for the industrial production of highly efficient cellulases and hemicellulases needed for hydrolysis of biomass polysaccharides. Herein we describe intriguing attributes of the T. reesei genome in relation to the future of fuel biotechnology. The T. reesei genome sequence was derived using a whole genome shotgun approach combined with finishing work to generate an assembly comprising 89 scaffolds totaling 34 Mbp with few gaps. In total, 9,130 gene models were predicted using a combination of ab initio and sequence similarity-based methods and EST data. Considering the industrial utility and effectiveness of its enzymes, the T. reesei genome surprisingly encodes the fewest cellulases and hemicellulases of any fungus having the ability to hydrolyze plant cell wall polysaccharides and whose genome has been sequenced. Many genes encoding carbohydrate active enzymes are distributed non-randomly in groups or clusters that interestingly lie between regions of synteny with other Sordariomycetes. Additionally, the T. reesei genome contains a multitude of genes encoding biosynthetic pathways for secondary metabolites (possible antibacterial and antifungal compounds) which may promote successful competition and survival in the crowded and competitive soil habitat occupied by T. reesei. Our analysis coupled with the availability of genome sequence data provides a roadmap for construction of enhanced T. reesei strains for industrial applications.

  13. Conversational Analysis: A Review of the Literature from the Perspective of Sequencing.

    ERIC Educational Resources Information Center

    Piazza, Roberta

    1987-01-01

    Reviews literature relating to conversational analysis, identifying two major approaches which exhibit chronological and methodological differentiations and focusing from a perspective of sequencing. (CB)

  14. Synthetic muscle promoters: activities exceeding naturally occurring regulatory sequences

    NASA Technical Reports Server (NTRS)

    Li, X.; Eastman, E. M.; Schwartz, R. J.; Draghia-Akli, R.

    1999-01-01

    Relatively low levels of expression from naturally occurring promoters have limited the use of muscle as a gene therapy target. Myogenic restricted gene promoters display complex organization usually involving combinations of several myogenic regulatory elements. By random assembly of E-box, MEF-2, TEF-1, and SRE sites into synthetic promoter recombinant libraries, and screening of hundreds of individual clones for transcriptional activity in vitro and in vivo, several artificial promoters were isolated whose transcriptional potencies greatly exceed those of natural myogenic and viral gene promoters.

  15. LINE-1 elements: analysis by fluorescence in-situ hybridization and nucleotide sequences.

    PubMed

    Waters, Paul D; Dobigny, Gauthier; Waddell, Peter J; Robinson, Terence J

    2008-01-01

    Long-interspersed nuclear element-1 (LINE-1) is a non-terminal repeat transposon that constitutes a major component of the mammalian genome. LINE-1 has a dynamic evolutionary history characterized by the rise, fall, and replacement of subfamilies. The distribution of LINE-1 elements can be viewed from a chromosomal perspective using fluorescence in-situ hybridization (FISH), as well as at the sequence level. We have designed LINE-1 primers from regions conserved among mouse, rat, rabbit, and human L1, which were able to amplify part of ORF2 from all eutherian (placental) mammals tested thus far. The product generated can be used as a FISH painting probe to examine the genomic distribution of L1 in different species. It can also be cloned and sequenced for phylogenetic analysis. Although FISH patterns resulting from LINE-1 chromosome painting and bioinformatic analyses have shown that this element accumulates in AT-rich regions of the genomes of mouse and human, our PCR amplified LINE-1 probe suggests that this is not a universal phenomenon, and that the patterns displayed in laurasiatherian, afrotherian and xenarthran species are less prominent. The "banding" like distribution of LINE-1 observed in human and mouse, therefore, appears to reflect aspects of genome architecture unique to Euarchontoglires (Supraprimates), the superordinal clade to which they belong. By sequencing the cloned amplicons used for FISH experiments and supplementing these with L1 sequences obtained from public databases, analysis by parsimony, distance-based, maximum likelihood, and "hierarchical Bayesian" or "marginal likelihood" methods provides a powerful adjunct to the FISH data. Using this approach, relatively intact LINE-1 from most placental orders tend to reflect accepted eutherian evolutionary relationships. This suggests that there were often only closely related copies active near branch points in the tree, that inactive copies tended to become extinct quite readily, and that for

  16. The Main Sequences of Star-forming Galaxies and Active Galactic Nuclei at High Redshift

    NASA Astrophysics Data System (ADS)

    Mancuso, C.; Lapi, A.; Shi, J.; Cai, Z.-Y.; Gonzalez-Nuevo, J.; Béthermin, M.; Danese, L.

    2016-12-01

    We provide a novel, unifying physical interpretation of the origin, average shape, scatter, and cosmic evolution of the main sequences of star-forming galaxies and active galactic nuclei (AGNs) at high redshift z ≳ 1. We achieve this goal in a model-independent way by exploiting: (i) the redshift-dependent star formation rate functions based on the latest UV/far-IR data from HST/Herschel, and related statistics of strong gravitationally lensed sources; (ii) deterministic evolutionary tracks for the history of star formation and black hole accretion, gauged on a wealth of multiwavelength observations including the observed Eddington ratio distribution. We further validate these ingredients by showing their consistency with the observed galaxy stellar mass functions and AGN bolometric luminosity functions at different redshifts via the continuity equation approach. Our analysis of the main sequence for high-redshift galaxies and AGNs highlights that the present data are consistently interpreted in terms of an in situ coevolution scenario for star formation and black hole accretion, envisaging these as local, time-coordinated processes.

  17. Compressive biological sequence analysis and archival in the era of high-throughput sequencing technologies.

    PubMed

    Giancarlo, Raffaele; Rombo, Simona E; Utro, Filippo

    2014-05-01

    High-throughput sequencing technologies produce large collections of data, mainly DNA sequences with additional information, requiring the design of efficient and effective methodologies for both their compression and storage. In this context, we first provide a classification of the main techniques that have been proposed, according to three specific research directions that have emerged from the literature and, for each, we provide an overview of the current techniques. Finally, to make this review useful to researchers and technicians applying the existing software and tools, we include a synopsis of the main characteristics of the described approaches, including details on their implementation and availability. Performance of the various methods is also highlighted, although the state of the art does not lend itself to a consistent and coherent comparison among all the methods presented here.

  18. GCAT-SEEKquence: genome consortium for active teaching of undergraduates through increased faculty access to next-generation sequencing data.

    PubMed

    Buonaccorsi, Vincent P; Boyle, Michael D; Grove, Deborah; Praul, Craig; Sakk, Eric; Stuart, Ash; Tobin, Tammy; Hosler, Jay; Carney, Susan L; Engle, Michael J; Overton, Barry E; Newman, Jeffrey D; Pizzorno, Marie; Powell, Jennifer R; Trun, Nancy

    2011-01-01

    To transform undergraduate biology education, faculty need to provide opportunities for students to engage in the process of science. The rise of research approaches using next-generation (NextGen) sequencing has been impressive, but incorporation of such approaches into the undergraduate curriculum remains a major challenge. In this paper, we report proceedings of a National Science Foundation-funded workshop held July 11-14, 2011, at Juniata College. The purpose of the workshop was to develop a regional research coordination network for undergraduate biology education (RCN/UBE). The network is collaborating with a genome-sequencing core facility located at Pennsylvania State University (University Park) to enable undergraduate students and faculty at small colleges to access state-of-the-art sequencing technology. We aim to create a database of references, protocols, and raw data related to NextGen sequencing, and to find innovative ways to reduce costs related to sequencing and bioinformatics analysis. It was agreed that our regional network for NextGen sequencing could operate more effectively if it were partnered with the Genome Consortium for Active Teaching (GCAT) as a new arm of that consortium, entitled GCAT-SEEK(quence). This step would also permit the approach to be replicated elsewhere.

  19. Identification of selectivity determinants in CYP monooxygenases by modelling and systematic analysis of sequence and structure.

    PubMed

    Seifert, Alexander; Pleiss, Jurgen

    2012-02-01

    Cytochrome P450 monooxygenases (CYPs) form a large, ubiquitous enzyme family and are of great interest in red and white biotechnology. To investigate the effect of protein structure on selectivity, the binding of substrate molecules near to the active site was modelled by molecular dynamics simulations. From a comprehensive and systematic comparison of more than 6300 CYP sequences and 31 structures using the Cytochrome P450 Engineering Database (CYPED), residues were identified which are predicted to point close to the heme centre and thus restrict accessibility for substrates. As a result, sequence-structure-function relationships are described that can be used to predict selectivity-determining positions from CYP sequences and structures. Based on this analysis, a minimal library consisting of bacterial CYP102A1 (P450(BM3)) and 24 variants was constructed. All variants were functionally expressed in E. coli, and the library was screened with four terpene substrates. Only 3 variants showed no activity towards all 4 terpenes, while 11 variants demonstrated either a strong shift or improved regio- or stereoselectivity during oxidation of at least one substrate as compared to CYP102A1 wild type. The minimal library also contains variants that show interesting side products which are not generated by the wild type enzyme. By two additional rounds of molecular modelling, diversification, and screening, the selectivity of one of these variants for a new product was optimised with a minimal screening effort. We propose this as a generic approach for other CYP substrates.

  20. Poisson approach to clustering analysis of regulatory sequences.

    PubMed

    Wang, Haiying; Zheng, Huiru; Hu, Jinglu

    2008-01-01

    The presence of similar patterns in regulatory sequences may aid users in identifying co-regulated genes or inferring regulatory modules. By modelling pattern occurrences in regulatory regions with Poisson statistics, this paper presents a log likelihood ratio statistics-based distance measure to calculate pair-wise similarities between regulatory sequences. We employed it within three clustering algorithms: hierarchical clustering, Self-Organising Map, and a self-adaptive neural network. The results indicate that, in comparison to traditional clustering algorithms, the incorporation of the log likelihood ratio statistics-based distance into the learning process may offer considerable improvements in the process of regulatory sequence-based classification of genes.
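
    The abstract above does not state the distance formula, so the following is only a minimal sketch of the general idea. It assumes a standard two-sample Poisson log likelihood ratio: the occurrences of each pattern in two regulatory regions are compared under a pooled rate (same rate in both) versus separate rates, and the per-pattern statistics are summed. The function names (`poisson_llr`, `llr_distance`), the pattern strings, and the toy counts are hypothetical and are not taken from the paper.

```python
import math

def poisson_llr(x1, n1, x2, n2):
    """Two-sample Poisson log likelihood ratio statistic.

    x1, x2: occurrence counts of one pattern in two sequences.
    n1, n2: sequence lengths (exposures), both assumed > 0.
    Returns 2 * log(L_separate / L_pooled), which is >= 0.
    """
    total = x1 + x2
    if total == 0:
        return 0.0  # pattern absent from both sequences: no evidence of difference
    pooled_rate = total / (n1 + n2)
    stat = 0.0
    for x, n in ((x1, n1), (x2, n2)):
        if x > 0:  # the term x * log(...) is taken as 0 when x == 0
            stat += x * math.log((x / n) / pooled_rate)
    return 2.0 * stat

def llr_distance(counts_a, len_a, counts_b, len_b):
    """Sum the per-pattern statistic over every pattern seen in either sequence."""
    patterns = set(counts_a) | set(counts_b)
    return sum(
        poisson_llr(counts_a.get(p, 0), len_a, counts_b.get(p, 0), len_b)
        for p in patterns
    )

# Toy pattern counts for two hypothetical 500-bp upstream regions.
seq_a = {"TATAAT": 4, "TTGACA": 2}
seq_b = {"TATAAT": 1, "GGCGCC": 3}
distance = llr_distance(seq_a, 500, seq_b, 500)
```

    A pairwise matrix of such values could then be passed to any clustering routine that accepts precomputed distances; whether the authors did it this way is not stated in the abstract.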

  1. Compilation and analysis of Escherichia coli promoter DNA sequences.

    PubMed Central

    Hawley, D K; McClure, W R

    1983-01-01

    The DNA sequence of 168 promoter regions (-50 to +10) for Escherichia coli RNA polymerase were compiled. The complete listing was divided into two groups depending upon whether or not the promoter had been defined by genetic (promoter mutations) or biochemical (5' end determination) criteria. A consensus promoter sequence based on homologies among 112 well-defined promoters was determined that was in substantial agreement with previous compilations. In addition, we have tabulated 98 promoter mutations. Nearly all of the altered base pairs in the mutants conform to the following general rule: down-mutations decrease homology and up-mutations increase homology to the consensus sequence. PMID:6344016
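
    The rule quoted above (down-mutations decrease homology to the consensus, up-mutations increase it) can be illustrated with a short sketch. It assumes pre-aligned, equal-length sequences and uses a simple majority vote per column. The helper names (`consensus`, `homology`) and the four toy hexamers are hypothetical and are not data from the compilation.

```python
from collections import Counter

def consensus(aligned):
    """Most frequent base in each column of equal-length, pre-aligned sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))

def homology(seq, cons):
    """Number of positions at which seq matches the consensus."""
    return sum(a == b for a, b in zip(seq, cons))

# Toy aligned hexamers (illustrative only).
toy_sites = ["TATAAT", "TATGAT", "TACAAT", "GATAAT"]
cons = consensus(toy_sites)

wild_type = "TATGAT"
up_mutant = "TATAAT"    # G->A at position 4 moves the site toward the consensus
down_mutant = "TCTGAT"  # A->C at position 2 moves the site away from the consensus

scores = {name: homology(s, cons) for name, s in
          [("wild_type", wild_type), ("up", up_mutant), ("down", down_mutant)]}
```

    In this toy case the up-mutant scores higher and the down-mutant lower than the wild type, which is the pattern the abstract reports for nearly all of the 98 tabulated mutations.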

  2. DNA sequence analysis with droplet-based microfluidics

    PubMed Central

    Abate, Adam R.; Hung, Tony; Sperling, Ralph A.; Mary, Pascaline; Rotem, Assaf; Agresti, Jeremy J.; Weiner, Michael A.; Weitz, David A.

    2014-01-01

    Droplet-based microfluidic techniques can form and process micrometer scale droplets at thousands per second. Each droplet can house an individual biochemical reaction, allowing millions of reactions to be performed in minutes with small amounts of total reagent. This versatile approach has been used for engineering enzymes, quantifying concentrations of DNA in solution, and screening protein crystallization conditions. Here, we use it to read the sequences of DNA molecules with a FRET-based assay. Using probes of different sequences, we interrogate a target DNA molecule for polymorphisms. With a larger probe set, additional polymorphisms can be interrogated as well as targets of arbitrary sequence. PMID:24185402

  3. Molecular Analysis of Methanogen Richness in Landfill and Marshland Targeting 16S rDNA Sequences

    PubMed Central

    Yadav, Shailendra; Kundu, Sharbadeb; Ghosh, Sankar K.; Maitra, S. S.

    2015-01-01

    Methanogens, a key contributor in global carbon cycling, methane emission, and alternative energy production, generate methane gas via anaerobic digestion of organic matter. The methane emission potential depends upon methanogenic diversity and activity. Since they are anaerobes and difficult to isolate and culture, their diversity present in the landfill sites of Delhi and marshlands of Southern Assam, India, was analyzed using molecular techniques like 16S rDNA sequencing, DGGE, and qPCR. The sequencing results indicated the presence of methanogens belonging to the seventh order and also the order Methanomicrobiales in the Ghazipur and Bhalsawa landfill sites of Delhi. Sequences, related to the phyla Crenarchaeota (thermophilic) and Thaumarchaeota (mesophilic), were detected from marshland sites of Southern Assam, India. Jaccard analysis of DGGE gel using Gel2K showed three main clusters depending on the number and similarity of band patterns. The copy number analysis of hydrogenotrophic methanogens using qPCR indicates higher abundance in landfill sites of Delhi as compared to the marshlands of Southern Assam. The knowledge about “methanogenic archaea composition” and “abundance” in the contrasting ecosystems like “landfill” and “marshland” may reorient our understanding of the Archaea inhabitants. This study could shed light on the relationship between methane-dynamics and the global warming process. PMID:26568700
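
    The Jaccard comparison of DGGE banding patterns mentioned above can be sketched as follows. This is a generic illustration of the Jaccard index on band presence/absence and does not reproduce what Gel2K computes internally. The lane names and band positions are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity: shared bands divided by all bands seen in either lane."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Hypothetical band positions (arbitrary gel units) for three lanes.
lanes = {
    "landfill_1": {10, 22, 35, 41},
    "landfill_2": {10, 22, 35, 50},
    "marshland_1": {5, 22, 60},
}

# Pairwise similarity between every pair of lanes.
names = sorted(lanes)
similarity = {
    (x, y): jaccard(lanes[x], lanes[y])
    for i, x in enumerate(names) for y in names[i + 1:]
}
```

    Lanes with high pairwise similarity would fall into the same cluster when such a matrix is fed to a hierarchical clustering method.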

  4. Analysis of the sequence and embryonic expression of chicken neurofibromin mRNA.

    PubMed

    Schafer, G L; Ciment, G; Stocker, K M; Baizer, L

    1993-04-01

    Neurofibromatosis type 1 (NF1) is a common inherited disorder that primarily affects tissues derived from the neural crest. Recent identification and characterization of the human NF1 gene has revealed that it encodes a protein (now called neurofibromin) that is similar in sequence to the ras-GTPase activator protein (or ras-GAP), suggesting that neurofibromin may be a component of cellular signal transduction pathways regulating cellular proliferation and/or differentiation. To initiate investigations on the role of the NF1 gene product in embryonic development, we have isolated a partial cDNA for chicken neurofibromin. Sequence analysis reveals that the predicted amino acid sequence is highly conserved between chick and human. The chicken cDNA hybridizes to a 12.5-kb transcript on RNA blots, a mol wt similar to that reported for the human and murine mRNAs. Ribonuclease protection assays indicate that NF1 mRNA is expressed in a variety of tissues in the chick embryo; this is confirmed by in situ hybridization analysis. NF1 mRNA expression is detectable as early as embryonic stage 18 in the neural plate. This pattern of expression may suggest a role for neurofibromin during normal development, including that of the nervous system.

  5. Whole genome sequencing analysis of lung adenocarcinoma in Xuanwei, China

    PubMed Central

    Wang, Xiao; Li, Jing; Duan, Yong; Wu, Huifei; Xu, Qiuyue

    2017-01-01

    Background: The lung cancer mortality rate in Xuanwei city is among the highest in China and adenocarcinoma is the major histological type. Lung cancer has been associated with exposure to indoor smoky coal emissions that contain high levels of polycyclic aromatic hydrocarbons; however, the pathogenesis of lung cancer has not yet been fully elucidated. Methods: We performed whole genome sequencing with lung adenocarcinoma and corresponding non‐tumor tissue to explore the genomic features of Xuanwei lung cancer. We used the Molecule Annotation System to determine and plot alterations in genes and signaling pathways. Results: A total of 3 428 060 and 3 416 989 single nucleotide variants were detected in tumor and normal genomes, respectively. After comparison of these two genomes, 977 high‐confidence somatic single nucleotide variants were identified. We observed a remarkably high proportion of C·G‐A·T transversions. HECTD4, RCBTB2, KLF15, and CACNA1C may be cancer‐related genes. Nine copy number variations increased in chromosome 5 and one in chromosome 7. The novel junctions were detected via clustered discordant paired ends and 1955 structural variants were discovered. Among these, we found 44 novel chromosome structural variations. In addition, EGFR and CACNA1C in the mitogen‐activated protein kinase signaling pathway were mutated or amplified in lung adenocarcinoma tumor tissue. Conclusion: We obtained a comprehensive view of somatic alterations of Xuanwei lung adenocarcinoma. These findings provide insight into the genomic landscape in order to further learn about the progress and development of Xuanwei lung adenocarcinoma. PMID:28083984

  6. KNIME4NGS: a comprehensive toolbox for Next Generation Sequencing analysis.

    PubMed

    Hastreiter, Maximilian; Jeske, Tim; Hoser, Jonathan; Kluge, Michael; Ahomaa, Kaarin; Friedl, Marie-Sophie; Kopetzky, Sebastian J; Quell, Jan-Dominik; Mewes, H-Werner; Küffner, Robert

    2017-01-09

    Analysis of Next Generation Sequencing (NGS) data requires the processing of large datasets by chaining various tools with complex input and output formats. In order to automate data analysis, we propose to standardize NGS tasks into modular workflows. This simplifies reliable handling and processing of NGS data, and corresponding solutions become substantially more reproducible and easier to maintain. Here, we present a documented, Linux-based toolbox of 42 processing modules that are combined to construct workflows facilitating a variety of tasks such as DNAseq and RNAseq analysis. We also describe important technical extensions. The high throughput executor (HTE) helps to increase the reliability and to reduce manual interventions when processing complex datasets. We also provide a dedicated binary manager that assists users in obtaining the modules' executables and keeping them up to date. As the basis for this actively developed toolbox we use the workflow management software KNIME.

  7. In vivo "photofootprint" changes at sequences between the yeast GAL1 upstream activating sequence and "TATA" element require activated GAL4 protein but not a functional TATA element.

    PubMed Central

    Selleck, S B; Majors, J

    1988-01-01

    Transcription of the yeast GAL1 and GAL10 genes is induced by growth on galactose. Using the technique of photofootprinting in vivo, we previously documented equivalent transcription-dependent footprints within the putative "TATA" elements of both genes. To explore the functional significance of these observations, we created a 3-base-pair substitution mutation within the GAL1 promoter TATA element, which disrupted the ATATAA consensus sequence but left intact the photomodification targets. The mutation reduced galactose-induced RNA levels by a factor of 100. The mutant promoter no longer displayed the characteristic TATA sequence footprint, supporting the hypothesis that transcription activation involves the binding of a TATA box factor. We also observed a collection of transcription-correlated alterations in the modification pattern at sites between the UASG and the GAL1 TATA element, within sequences that are not required for inducible transcription. These patterns, characteristic of the induced wild-type GAL1 gene, were still galactose inducible with the TATA mutant GAL1 promoter, despite the low level of transcription from this promoter. We conclude that the GAL4-dependent protein/DNA structure responsible for the altered pattern within nonessential sequences is therefore not strictly coupled to an active TATA element or to high levels of expression. Nonetheless, the patterns probably reflect a stable protein-dependent structure that accompanies assembly of the transcription initiation complex. PMID:3041409

  8. Sequence analysis of the complete mitochondrial genome of Youxian sheldrake.

    PubMed

    He, Shao-Ping; Liu, Li-Li; Yu, Qi-Fang; Li, Si; He, Jian-Hua

    2016-01-01

    The Youxian sheldrake is an excellent native breed of Hunan province, China. The complete mitochondrial (mt) genome sequence plays an important role in the accurate determination of phylogenetic relationships among metazoans. This is the first study to determine the complete mitochondrial genome sequence of the Youxian sheldrake, using PCR-based amplification and Sanger sequencing. The characteristics of the entire mitochondrial genome were analyzed in detail. The total length of the mitogenome is 16,605 bp, with a base composition of 29.21% A, 22.18% T, 32.84% C, and 15.77% G. It contains 2 ribosomal RNA genes, 13 protein-coding genes, 22 transfer RNA genes, and a major non-coding control region (D-loop region). The complete mitochondrial genome sequence of the Youxian sheldrake provides important data for further study of poultry phylogenetics, and for genetics and breeding.
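
    The base composition reported in the record above is a per-base percentage over the whole sequence. A minimal sketch of that calculation follows; the function name and the toy sequence are illustrative assumptions and are not taken from the study.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, T, C and G in a DNA sequence."""
    seq = seq.upper()
    # Ignore ambiguity codes such as N so the percentages sum to 100.
    counts = Counter(base for base in seq if base in "ATCG")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/T/C/G bases")
    return {base: 100.0 * counts[base] / total for base in "ATCG"}


# Toy sequence (hypothetical; not the Youxian sheldrake mitogenome).
toy = "ATGCCCTAAC"
composition = base_composition(toy)
```

    Applied to a full mitogenome, the four values should sum to 100%, as the reported 29.21%, 22.18%, 32.84%, and 15.77% do.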

  9. Deep Sequencing Analysis of Apple Infecting Viruses in Korea

    PubMed Central

    Cho, In-Sook; Igori, Davaajargal; Lim, Seungmo; Choi, Gug-Seoun; Hammond, John; Lim, Hyoun-Sub; Moon, Jae Sun

    2016-01-01

    Deep sequencing has generated 52 contigs derived from five viruses; Apple chlorotic leaf spot virus (ACLSV), Apple stem grooving virus (ASGV), Apple stem pitting virus (ASPV), Apple green crinkle associated virus (AGCaV), and Apricot latent virus (ApLV) were identified from eight apple samples showing small leaves and/or growth retardation. Nucleotide (nt) sequence identity of the assembled contigs was from 68% to 99% compared to the reference sequences of the five respective viral genomes. Sequences of ASPV and ASGV were the most abundantly represented by the 52 contigs assembled. The presence of the five viruses in the samples was confirmed by RT-PCR using specific primers based on the sequences of each assembled contig. All five viruses were detected in three of the samples, whereas all samples had mixed infections with at least two viruses. The most frequently detected virus was ASPV, followed by ASGV, ApLV, ACLSV, and AGCaV which were withal found in mixed infections in the tested samples. AGCaV was identified in assembled contigs ID 1012480 and 93549, which showed 82% and 78% nt sequence identity with ORF1 of AGCaV isolate Aurora-1. ApLV was identified in three assembled contigs, ID 65587, 1802365, and 116777, which showed 77%, 78%, and 76% nt sequence identity respectively with ORF1 of ApLV isolate LA2. Deep sequencing assay was shown to be a valuable and powerful tool for detection and identification of known and unknown virome in infected apple trees, here identifying ApLV and AGCaV in commercial orchards in Korea for the first time. PMID:27721694

  10. Deep Sequencing Analysis of Apple Infecting Viruses in Korea.

    PubMed

    Cho, In-Sook; Igori, Davaajargal; Lim, Seungmo; Choi, Gug-Seoun; Hammond, John; Lim, Hyoun-Sub; Moon, Jae Sun

    2016-10-01

    Deep sequencing has generated 52 contigs derived from five viruses; Apple chlorotic leaf spot virus (ACLSV), Apple stem grooving virus (ASGV), Apple stem pitting virus (ASPV), Apple green crinkle associated virus (AGCaV), and Apricot latent virus (ApLV) were identified from eight apple samples showing small leaves and/or growth retardation. Nucleotide (nt) sequence identity of the assembled contigs was from 68% to 99% compared to the reference sequences of the five respective viral genomes. Sequences of ASPV and ASGV were the most abundantly represented by the 52 contigs assembled. The presence of the five viruses in the samples was confirmed by RT-PCR using specific primers based on the sequences of each assembled contig. All five viruses were detected in three of the samples, whereas all samples had mixed infections with at least two viruses. The most frequently detected virus was ASPV, followed by ASGV, ApLV, ACLSV, and AGCaV which were withal found in mixed infections in the tested samples. AGCaV was identified in assembled contigs ID 1012480 and 93549, which showed 82% and 78% nt sequence identity with ORF1 of AGCaV isolate Aurora-1. ApLV was identified in three assembled contigs, ID 65587, 1802365, and 116777, which showed 77%, 78%, and 76% nt sequence identity respectively with ORF1 of ApLV isolate LA2. Deep sequencing assay was shown to be a valuable and powerful tool for detection and identification of known and unknown virome in infected apple trees, here identifying ApLV and AGCaV in commercial orchards in Korea for the first time.

  11. Feature Extraction From DNA Sequences by Multifractal Analysis

    DTIC Science & Technology

    2007-11-02

    genome may lead to an understanding of the genome and to the understanding of life. Recently a draft sequence of the human genome ...which covers 96% of the entire human genome containing base pairs, has been published by the Human Genome Project (HGP) and Celera Genomics. However...time series model based on the global structure of the complete genome, and showed long-range correlations in the bacteria DNA sequences. Although

  12. Production of biologically active recombinant goose FSH in a single chain form with a CTP linker sequence.

    PubMed

    Li, Hui; Zhu, Huanxi; Qin, Qinming; Lei, Mingming; Shi, Zhendan

    2017-02-01

    FSH is a glycoprotein hormone secreted by the pituitary gland that is essential for gonadal development and reproductive function. Research on avian reproduction, especially on avian reproductive hormones, is hindered by the lack of biologically active FSH. In order to overcome this shortcoming, we prepared recombinant goose FSH as a single chain molecule and tested its biological activities in the present study. Coding sequences for mature peptides of goose FSH α and β subunits were amplified from goose pituitary cDNA. A chimeric gene containing α and β subunit sequences linked by the hCG carboxyl terminal peptide coding sequence was constructed. The recombinant gene was inserted into the pcDNA3.1-Fc eukaryotic expression vector to form pcDNA-Fc-gFSHβ-CTP-α and then transfected into 293-F cells. A recombinant, single chain goose FSH was expressed and verified by SDS-PAGE and western blot analysis, and was purified using Protein A agarose affinity and gel filtration chromatography. Biological activity analysis showed that the recombinant, chimeric goose FSH stimulates estradiol secretion and cell proliferation in cultured chicken granulosa cells. These results indicated that bioactive, recombinant goose FSH has been successfully prepared in vitro. The recombinant goose FSH has the potential to be used as a research tool for studying avian reproductive activities, and as a standard for developing avian FSH bioassays.

  13. EST sequencing of Onychophora and phylogenomic analysis of Metazoa.

    PubMed

    Roeding, Falko; Hagner-Holler, Silke; Ruhberg, Hilke; Ebersberger, Ingo; von Haeseler, Arndt; Kube, Michael; Reinhardt, Richard; Burmester, Thorsten

    2007-12-01

    Onychophora (velvet worms) represent a small animal taxon considered to be related to Euarthropoda. We have obtained 1873 5' cDNA sequences (expressed sequence tags, ESTs) from the velvet worm Epiperipatus sp., which were assembled into 833 contigs. BLAST similarity searches revealed that 51.9% of the contigs had matches in the protein databases with expectation values lower than 10^-4. Most ESTs had the best hit with proteins from either Chordata or Arthropoda (approximately 40% each). The ESTs included sequences of 27 ribosomal proteins. The orthologous sequences from 28 other species of a broad range of phyla were obtained from the databases, including other EST projects. A concatenated amino acid alignment comprising 5021 positions was constructed, which covers 4259 positions when problematic regions were removed. Bayesian and maximum likelihood methods place Epiperipatus within the monophyletic Ecdysozoa (Onychophora, Arthropoda, Tardigrada and Nematoda), but its exact relation to the Euarthropoda remained unresolved. The "Articulata" concept was not supported. Tardigrada and Nematoda formed a well-supported monophylum, suggesting that Tardigrada are actually Cycloneuralia. In agreement with previous studies, we have demonstrated that random sequencing of cDNAs results in sequence information suitable for phylogenomic approaches to resolve metazoan relationships.

  14. Analysis of human immunodeficiency virus type 1 nef gene sequences present in vivo.

    PubMed Central

    Shugars, D C; Smith, M S; Glueck, D H; Nantermet, P V; Seillier-Moiseiwitsch, F; Swanstrom, R

    1993-01-01

    The nef genes of the human immunodeficiency viruses type 1 and 2 (HIV-1 and HIV-2) and the related simian immunodeficiency viruses (SIVs) encode a protein (Nef) whose role in virus replication and cytopathicity remains uncertain. As an attempt to elucidate the function of nef, we characterized the nucleotide and corresponding protein sequences of naturally occurring nef genes obtained from several HIV-1-infected individuals. A consensus Nef sequence was derived and used to identify several features that were highly conserved among the Nef sequences. These features included a nearly invariant myristylation signal, regions of sequence polymorphism and variable duplication, a region with an acidic charge, a (Pxx)4 repeat sequence, and a potential protein kinase C phosphorylation site. Clustering of premature stop codons at position 124 was noted in 6 of the 54 Nef sequences. Further analysis revealed four stretches of residues that were highly conserved not only among the patient-derived HIV-1 Nef sequences, but also among the Nef sequences of HIV-2 and the SIVs, suggesting that Nef proteins expressed by these retroviruses are functionally equivalent. The "Nef-defining" sequences were used to evaluate the sequence alignments of known proteins reported to share sequence similarity with Nef sequences and to conduct additional computer-based searches for similar protein sequences. A gene encoding the consensus Nef sequence was also generated. This gene encodes a full-length Nef protein that should be a valuable tool in further studies of Nef function. PMID:8043040
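
    The record above derives a consensus sequence from a set of aligned protein sequences. One common way to do this is a column-wise majority vote, sketched below. The abstract does not state the rule the authors used, so the majority rule, the tie-breaking behaviour, and the toy alignment are all assumptions made for illustration.

```python
from collections import Counter


def consensus(aligned, gap="-"):
    """Column-wise majority consensus of equal-length aligned sequences."""
    if not aligned:
        raise ValueError("no sequences given")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for column in zip(*aligned):
        residues = [r for r in column if r != gap]
        if not residues:
            result.append(gap)  # column consists only of gaps
            continue
        # most_common breaks ties by first occurrence in the column.
        result.append(Counter(residues).most_common(1)[0][0])
    return "".join(result)


# Toy alignment (hypothetical; not patient-derived Nef sequences).
toy_alignment = ["MAGKWSK", "MAGKWSK", "MANKWSR", "MAGK-SK"]
toy_consensus = consensus(toy_alignment)
```

    Positions where every sequence matches the consensus would then be candidates for the highly conserved features the abstract describes.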

  15. Characterization and complete genome sequence analysis of novel bacteriophage IME-EFm1 infecting Enterococcus faecium.

    PubMed

    Wang, Yahui; Wang, Wei; Lv, Yongqiang; Zheng, Wangliang; Mi, Zhiqiang; Pei, Guangqian; An, Xiaoping; Xu, Xiaomeng; Han, Chuanyin; Liu, Jie; Zhou, Changlin; Tong, Yigang

    2014-11-01

    We isolated and characterized a novel virulent bacteriophage, IME-EFm1, specifically infecting multidrug-resistant Enterococcus faecium. IME-EFm1 is morphologically similar to members of the family Siphoviridae. It was found capable of lysing a wide range of our E. faecium collections, including two strains resistant to vancomycin. One-step growth tests revealed the host lysis activity of phage IME-EFm1, with a latent time of 30 min and a large burst size of 116 p.f.u. per cell. These biological characteristics suggested that IME-EFm1 has the potential to be used as a therapeutic agent. The complete genome of IME-EFm1 was 42 597 bp, and was linear, with terminally non-redundant dsDNA and a G+C content of 35.2 mol%. The termini of the phage genome were determined with next-generation sequencing and were further confirmed by nuclease digestion analysis. To our knowledge, this is the first report of a complete genome sequence of a bacteriophage infecting E. faecium. IME-EFm1 exhibited a low similarity to other phages in terms of genome organization and structural protein amino acid sequences. The coding region corresponded to 90.7 % of the genome; 70 putative ORFs were deduced and, of these, 29 could be functionally identified based on their homology to previously characterized proteins. A predicted metallo-β-lactamase gene was detected in the genome sequence. The identification of an antibiotic resistance gene emphasizes the necessity for complete genome sequencing of a phage to ensure it is free of any undesirable genes before use as a therapeutic agent against bacterial pathogens.

  16. Immune gene discovery by expressed sequence tag (EST) analysis of hemocytes in the ridgetail white prawn Exopalaemon carinicauda

    PubMed Central

    Duan, Yafei; Liu, Ping; Li, Jitao; Li, Jian; Chen, Ping

    2013-01-01

    The ridgetail white prawn Exopalaemon carinicauda is one of the most important commercial species in eastern China. However, little information on immune genes in E. carinicauda has been reported. To identify distinctive genes associated with immunity, an expressed sequence tag (EST) library was constructed from hemocytes of E. carinicauda. A total of 3411 clones were sequenced, yielding 2853 ESTs with an average sequence length of 436 bp. The cluster and assembly analysis yielded 1053 unique sequences including 329 contigs and 724 singletons. Blast analysis identified 593 (56.3%) of the unique sequences as orthologs of genes from other organisms (E-value < 1e-5). Based on the COG and Gene Ontology (GO), 593 unique sequences were classified. Through comparison with previous studies, 153 genes assembled from 367 ESTs have been identified as possibly involved in defense or immune functions. These genes are grouped into seven categories according to their putative functions in the shrimp immune system: antimicrobial peptides, prophenoloxidase activating system, antioxidant defense systems, chaperone proteins, clottable proteins, pattern recognition receptors and other immune-related genes. According to EST abundance, the major immune-related genes were thioredoxin (141, 4.94% of all ESTs) and calmodulin (14, 0.49% of all ESTs). The EST sequences of E. carinicauda hemocytes provide important information on the immune system and lay the groundwork for development of molecular markers related to disease resistance in prawn species. PMID:23092732
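
    The counts and percentages in the record above can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the abstract; the variable names are illustrative.

```python
# Figures quoted in the abstract.
total_ests = 2853
contigs = 329
singletons = 724
annotated = 593          # unique sequences with a Blast ortholog
thioredoxin_ests = 141
calmodulin_ests = 14

# Unique sequences are contigs plus singletons.
unique_sequences = contigs + singletons

# Percentages, rounded the way the abstract reports them.
pct_annotated = round(100 * annotated / unique_sequences, 1)
pct_thioredoxin = round(100 * thioredoxin_ests / total_ests, 2)
pct_calmodulin = round(100 * calmodulin_ests / total_ests, 2)
```

    Note that the annotation rate is expressed relative to the unique sequences, whereas the gene abundances are expressed relative to all ESTs.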

  17. Student Activity Ideas for the Technology Sequence Systems and Foundation Courses.

    ERIC Educational Resources Information Center

    New York State Education Dept., Albany.

    This publication provides single-page outlines of brief ideas for high school student activities in each of the System and Foundation Courses of the New York State technology sequence. The idea outlines are provided as a resource to assist teachers in the development of student learning activities. The six courses for which ideas are presented are…

  18. Rapid Conversion of Traditional Introductory Physics Sequences to an Activity-Based Format

    ERIC Educational Resources Information Center

    Yoder, Garett; Cook, Jerry

    2014-01-01

    The Department of Physics at EKU [Eastern Kentucky University] with support from the National Science Foundation's Course Curriculum and Laboratory Improvement Program has successfully converted our entire introductory physics sequence, both algebra-based and calculus-based courses, to an activity-based format where laboratory activities,…

  19. Draft genome sequence of Bacillus thuringiensis strain DAR 81934, which exhibits molluscicidal activity.

    PubMed

    Wang, Aisuo; Pattemore, Julie; Ash, Gavin; Williams, Angela; Hane, James

    2013-03-21

    Bacillus thuringiensis has been widely used as a biopesticide for a long time. Its molluscicidal activity, however, is rarely realized. Here, we report the genome sequence of B. thuringiensis strain DAR 81934, a strain with molluscicidal activity against the pest snail Cernuella virgata.

  20. Correlations between prefrontal neurons form a small-world network that optimizes the generation of multineuron sequences of activity.

    PubMed

    Luongo, Francisco J; Zimmerman, Chris A; Horn, Meryl E; Sohal, Vikaas S

    2016-05-01

    Sequential patterns of prefrontal activity are believed to mediate important behaviors, e.g., working memory, but it remains unclear exactly how they are generated. In accordance with previous studies of cortical circuits, we found that prefrontal microcircuits in young adult mice spontaneously generate many more stereotyped sequences of activity than expected by chance. However, the key question of whether these sequences depend on a specific functional organization within the cortical microcircuit, or emerge simply as a by-product of random interactions between neurons, remains unanswered. We observed that correlations between prefrontal neurons do follow a specific functional organization-they have a small-world topology. However, until now it has not been possible to directly link small-world topologies to specific circuit functions, e.g., sequence generation. Therefore, we developed a novel analysis to address this issue. Specifically, we constructed surrogate data sets that have identical levels of network activity at every point in time but nevertheless represent various network topologies. We call this method shuffling activity to rearrange correlations (SHARC). We found that only surrogate data sets based on the actual small-world functional organization of prefrontal microcircuits were able to reproduce the levels of sequences observed in actual data. As expected, small-world data sets contained many more sequences than surrogate data sets with randomly arranged correlations. Surprisingly, small-world data sets also outperformed data sets in which correlations were maximally clustered. Thus the small-world functional organization of cortical microcircuits, which effectively balances the random and maximally clustered regimes, is optimal for producing stereotyped sequential patterns of activity.
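
    The key constraint on the surrogate data sets described above is that the level of network activity at every point in time stays identical to the real recording. The sketch below illustrates only that invariant, by reassigning which neurons are active within each time bin. It is not the authors' SHARC procedure, which additionally arranges the correlations to follow a chosen topology; the raster layout and the function name are assumptions.

```python
import random


def shuffle_within_time_bins(raster, seed=0):
    """Shuffle neuron identities independently inside each time bin.

    raster is a list of time bins; each bin is a list of 0/1 values,
    one per neuron. The number of active neurons per bin is preserved,
    while pairwise correlations between neurons are randomised.
    """
    rng = random.Random(seed)
    surrogate = []
    for bin_activity in raster:
        shuffled = list(bin_activity)  # copy so the input is untouched
        rng.shuffle(shuffled)
        surrogate.append(shuffled)
    return surrogate


# Toy raster (hypothetical): 3 time bins, 4 neurons.
raster = [[1, 0, 0, 1], [0, 0, 1, 0], [1, 1, 1, 0]]
surrogate = shuffle_within_time_bins(raster, seed=1)
```

    A surrogate of this kind corresponds most closely to the "randomly arranged correlations" condition; the small-world and maximally clustered surrogates would need a constrained reassignment rather than a uniform shuffle.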

  1. Electromyographic Patterns during Golf Swing: Activation Sequence Profiling and Prediction of Shot Effectiveness

    PubMed Central

    Verikas, Antanas; Vaiciukynas, Evaldas; Gelzinis, Adas; Parker, James; Olsson, M. Charlotte

    2016-01-01

    This study analyzes muscle activity, recorded in an eight-channel electromyographic (EMG) signal stream, during the golf swing using a 7-iron club and exploits information extracted from EMG dynamics to predict the success of the resulting shot. Muscles of the arm and shoulder on both the left and right sides, namely flexor carpi radialis, extensor digitorum communis, rhomboideus and trapezius, are considered for 15 golf players (∼5 shots each). The method using Gaussian filtering is outlined for EMG onset time estimation in each channel and activation sequence profiling. Shots of each player revealed a persistent pattern of muscle activation. Profiles were plotted and insights with respect to player effectiveness were provided. Inspection of EMG dynamics revealed a pair of highest peaks in each channel as the hallmark of golf swing, and a custom application of peak detection for automatic extraction of swing segment was introduced. Various EMG features, encompassing 22 feature sets, were constructed. Feature sets were used individually and also in decision-level fusion for the prediction of shot effectiveness. The prediction of the target attribute, such as club head speed or ball carry distance, was investigated using random forest as the learner in detection and regression tasks. Detection evaluates the personal effectiveness of a shot with respect to the player-specific average, whereas regression estimates the value of target attribute, using EMG features as predictors. Fusion after decision optimization provided the best results: the equal error rate in detection was 24.3% for the speed and 31.7% for the distance; the mean absolute percentage error in regression was 3.2% for the speed and 6.4% for the distance. Proposed EMG feature sets were found to be useful, especially when used in combination. Rankings of feature sets indicated statistics for muscle activity in both the left and right body sides, correlation-based analysis of EMG dynamics and features

  2. Expressed sequence tag analysis in tef (Eragrostis tef (Zucc) Trotter).

    PubMed

    Yu, Ju-Kyung; Sun, Qi; Rota, Mauricio La; Edwards, Hugh; Tefera, Hailu; Sorrells, Mark E

    2006-04-01

    Tef (Eragrostis tef (Zucc.) Trotter) is the most important cereal crop in Ethiopia; however, there is very little DNA sequence information available for this species. Expressed sequence tags (ESTs) were generated from 4 cDNA libraries: seedling leaf, seedling root, and inflorescence of E. tef and seedling leaf of Eragrostis pilosa, a wild relative of E. tef. Clustering of 3603 sequences produced 530 clusters and 1890 singletons, resulting in 2420 tef unigenes. Approximately 3/4 of tef unigenes matched protein or nucleotide sequences in public databases. Annotation of unigenes associated 68% of the putative tef genes with gene ontology categories. Identification of the translated unigenes for conserved protein domains revealed 389 protein family domains (Pfam), the most frequent of which was protein kinase. A total of 170 ESTs containing simple sequence repeats (EST-SSRs) were identified and 80 EST-SSR markers were developed. In addition, 19 single-nucleotide polymorphism (SNP) and (or) insertion-deletion (indel) and 34 intron fragment length polymorphism (IFLP) markers were developed. The EST database and molecular markers generated in this study will be valuable resources for further tef genetic research.
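
    EST-SSRs such as those counted above are usually found by scanning each sequence for a short motif repeated in tandem. The sketch below uses a regular expression for this. The abstract does not give the search criteria, so the motif lengths (2 to 6 bases) and the minimum of five repeats are assumptions, as are the function name and the toy sequence.

```python
import re

MIN_REPEATS = 5  # assumed threshold; not stated in the abstract

# A motif of 2-6 bases (lazy, so the shortest motif is tried first)
# followed by at least MIN_REPEATS - 1 further copies of itself.
SSR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (MIN_REPEATS - 1))


def find_ssrs(seq):
    """Return (start, motif, repeat_count) for each tandem repeat found."""
    hits = []
    for match in SSR_PATTERN.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAAAA
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits


# Toy EST (hypothetical): an (AG)6 repeat with short flanks.
toy_est = "TTC" + "AG" * 6 + "CCT"
ssrs = find_ssrs(toy_est)
```

    The unigene total in the abstract is the sum of clusters and singletons (530 + 1890 = 2420).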

  3. Complete Sequence and Genomic Analysis of Rhesus Cytomegalovirus

    PubMed Central

    Hansen, Scott G.; Strelow, Lisa I.; Franchi, David C.; Anders, David G.; Wong, Scott W.

    2003-01-01

    The complete DNA sequence of rhesus cytomegalovirus (RhCMV) strain 68-1 was determined with the whole-genome shotgun approach on virion DNA. The RhCMV genome is 221,459 bp in length and possesses a 49% G+C base composition. The genome contains 230 potential open reading frames (ORFs) of 100 or more codons that are arranged colinearly with counterparts of previously sequenced betaherpesviruses such as human cytomegalovirus (HCMV). Of the 230 RhCMV ORFs, 138 (60%) are homologous to known HCMV proteins. The conserved ORFs include the structural, replicative, and transcriptional regulatory proteins, immune evasion elements, G protein-coupled receptors, and immunoglobulin homologues. Interestingly, the RhCMV genome also contains sequences with homology to cyclooxygenase-2, an enzyme associated with inflammatory processes. Closer examination identified a series of candidate exons with the capacity to encode a full-length cyclooxygenase-2 protein. Counterparts of cyclooxygenase-2 have not been found in other sequenced herpesviruses. The availability of the complete RhCMV sequence along with the ability to grow RhCMV in vitro will facilitate the construction of recombinant viral strains for identifying viral determinants of CMV pathogenicity in the experimentally infected rhesus macaque and to the development of CMV as a vaccine vector. PMID:12767982
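
    The record above reports a G+C composition and counts open reading frames of 100 or more codons. The sketch below shows one simple way to compute both. It scans only the three forward frames and treats an ORF as ATG to the next in-frame stop; the authors' actual ORF-calling criteria are not given in the abstract, so these choices and the function names are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_percent(seq):
    """Percentage of G and C bases in a sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def forward_orfs(seq, min_codons=100):
    """Return (start, end) of ATG..stop ORFs in the three forward frames.

    end is exclusive and includes the stop codon; min_codons counts
    the codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


# Toy sequence (hypothetical): one 4-codon ORF starting at index 2.
toy = "CCATGAAACCCGGGTAGTT"
toy_orfs = forward_orfs(toy, min_codons=3)
```

    A genome-scale analysis would also scan the reverse complement. The quoted homology rate follows directly from the counts: 138 of 230 ORFs is 60%.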

  4. Experimental design, preprocessing, normalization and differential expression analysis of small RNA sequencing experiments

    PubMed Central

    2011-01-01

    Prior to the advent of new, deep sequencing methods, small RNA (sRNA) discovery was dependent on Sanger sequencing, which was time-consuming and limited knowledge to only the most abundant sRNA. The innovation of large-scale, next-generation sequencing has exponentially increased knowledge of the biology, diversity and abundance of sRNA populations. In this review, we discuss issues involved in the design of sRNA sequencing experiments, including choosing a sequencing platform, inherent biases that affect sRNA measurements and replication. We outline the steps involved in preprocessing sRNA sequencing data and review both the principles behind and the current options for normalization. Finally, we discuss differential expression analysis in the absence and presence of biological replicates. While our focus is on sRNA sequencing experiments, many of the principles discussed are applicable to the sequencing of other RNA populations. PMID:21356093

  5. Exome sequencing coupled with mRNA analysis identifies NDUFAF6 as a Leigh gene.

    PubMed

    Bianciardi, Laura; Imperatore, Valentina; Fernandez-Vizarra, Erika; Lopomo, Angela; Falabella, Micol; Furini, Simone; Galluzzi, Paolo; Grosso, Salvatore; Zeviani, Massimo; Renieri, Alessandra; Mari, Francesca; Frullanti, Elisa

    2016-11-01

    We report here the case of a young male who started to show verbal fluency disturbance, clumsiness and gait anomalies at the age of 3.5 years and presented bilateral striatal necrosis. Clinically, the diagnosis was compatible with Leigh syndrome but the underlying molecular defect remained elusive even after exome analysis using autosomal/X-linked recessive or de novo models. Dosage of respiratory chain activity on fibroblasts, but not in muscle, underlined a deficit in complex I. Re-analysis of heterozygous probably pathogenic variants, inherited from one healthy parent, identified the p.Ala178Pro in NDUFAF6, a complex I assembly factor. RNA analysis showed an almost mono-allelic expression of the mutated allele in blood and fibroblasts and puromycin treatment on cultured fibroblasts did not lead to the rescue of the maternal allele expression, not supporting the involvement of nonsense-mediated RNA decay mechanism. Complementation assay underlined a recovery of complex I activity after transduction of the wild-type gene. Since the second mutation was not detected and promoter methylation analysis gave normal results, we hypothesized a non-exonic event in the maternal allele affecting a regulatory element that, in conjunction with the paternal mutation, leads to the autosomal recessive disorder and the different allele expression in various tissues. This paper confirms NDUFAF6 as a genuine morbid gene and proposes the coupling of exome sequencing with mRNA analysis as a method useful for enhancing the exome sequencing detection rate when the simple application of classical inheritance models fails.

  6. Integrated bioinformatics analysis of chromatin regulator EZH2 in regulating mRNA and lncRNA expression by ChIP sequencing and RNA sequencing

    PubMed Central

    Li, Yuan; Luo, Mei; Shi, Xuejiao; Lu, Zhiliang; Sun, Shouguo; Huang, Jianbing; Chen, Zhaoli; He, Jie

    2016-01-01

    Enhancer of zeste homolog 2 (EZH2), a dynamic chromatin regulator in cancer, represents a potential therapeutic target showing early signs of promise in clinical trials. EZH2 ChIP sequencing data in 19 cell lines and RNA sequencing data in ten cancer types were downloaded from GEO and TCGA, respectively. Integrated ChIP sequencing analysis and co-expressing analysis were conducted and both mRNA and long noncoding RNA (lncRNA) targets were detected. We detected a median of 4,672 mRNA targets and 4,024 lncRNA targets regulated by EZH2 in 19 cell lines. 20 mRNA targets and 27 lncRNA targets were found in all 19 cell lines. These mRNA targets were enriched in pathways in cancer, Hippo, Wnt, MAPK and PI3K-Akt pathways. Co-expression analysis confirmed numerous targets, mRNA genes (RRAS, TGFBR2, NUF2 and PRC1) and lncRNA genes (lncRNA LINC00261, DIO3OS, RP11-307C12.11 and RP11-98D18.9) were potential targets and were significantly correlated with EZH2. We predicted genome-wide potential targets and the role of EZH2 in regulating as a transcriptional suppressor or activator which could pave the way for mechanism studies and the targeted therapy of EZH2 in cancer. PMID:27835578

  7. SxtA gene sequence analysis of dinoflagellate Alexandrium minutum

    NASA Astrophysics Data System (ADS)

    Norshaha, Safida Anira; Latib, Norhidayu Abdul; Usup, Gires; Yusof, Nurul Yuziana Mohd

    2015-09-01

    The dinoflagellate Alexandrium minutum is typically known for the production of potent neurotoxins such as saxitoxin, affecting the health of human seafood consumers via paralytic shellfish poisoning (PSP). This phenomenon is related to harmful algal blooms (HABs), which are believed to be influenced by environmental and nutritional factors. A previous study revealed that the SxtA gene is a starting gene involved in the saxitoxin production pathway. The aim of this study was to analyse the sequence of the sxtA gene in A. minutum. The dinoflagellate culture was grown at 26°C with a 16:8-hour light:dark photocycle. After the samples were harvested, RNA was extracted, and complementary DNA (cDNA) was synthesised and amplified by polymerase chain reaction (PCR). The PCR products were then purified and cloned before being sequenced. The SxtA sequence obtained was then analyzed in order to identify the presence of the SxtA gene in Alexandrium minutum.

  8. Data repository mapping for influenza protein sequence analysis

    NASA Astrophysics Data System (ADS)

    Pellegrino, Donald; Chen, Chaomei

    2011-01-01

    This paper introduces a new method for creating an interactive sequence similarity map of all known influenza virus protein sequences and integrating the map with existing general purpose analytical tools. The NCBI data model was designed to provide a high degree of interconnectedness amongst data objects. Substantial and continuous increase in data volume has led to a large and highly connected information space. Researchers seeking to explore this space are challenged to identify a starting point. They often choose data that is popular in the literature. References in the literature follow a power law distribution and popular data points may bias explorers toward paths that lead only to a dead-end of what is already known. To help discover the unexpected we developed an interactive visual analytics system to map the information space of influenza protein sequence data. The design is motivated by the needs of eScience researchers.

  9. Laser desorption mass spectrometry for DNA analysis and sequencing

    SciTech Connect

    Chen, C.H.; Taranenko, N.I.; Tang, K.; Allman, S.L.

    1995-03-01

    Laser desorption mass spectrometry has been considered as a potential new method for fast DNA sequencing. Our approach is to use matrix-assisted laser desorption to produce parent ions of DNA segments and a time-of-flight mass spectrometer to identify the sizes of DNA segments. Thus, the approach is similar to gel electrophoresis sequencing using Sanger's enzymatic method. However, gel, radioactive tagging, and dye labeling are not required. In addition, the sequencing process can possibly be finished within a few hundred microseconds instead of hours and days. In order to use mass spectrometry for fast DNA sequencing, the following three criteria need to be satisfied. They are (1) detection of large DNA segments, (2) sensitivity reaching the femtomole region, and (3) mass resolution good enough to separate DNA segments of a single nucleotide difference. It has previously been very difficult to detect large DNA segments by mass spectrometry due to the fragile chemical properties of DNA and low detection sensitivity of DNA ions. We discovered several new matrices to increase the production of DNA ions. By innovative design of a mass spectrometer, we can increase the ion energy up to 45 keV to enhance the detection sensitivity. Recently, we succeeded in detecting a DNA segment with 500 nucleotides. The sensitivity was 100 femtomole. Thus, we have fulfilled two key criteria for using mass spectrometry for fast DNA sequencing. The major effort in the near future is to improve the resolution. Different approaches are being pursued. When high resolution of mass spectrometry can be achieved and automation of sample preparation is developed, a sequencing speed of 500 megabases per year may become feasible.
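
    Two of the quantities in the record above can be estimated with back-of-the-envelope physics: the mass resolving power needed to separate fragments differing by one nucleotide, and the flight time of a large DNA ion. The sketch below is only an order-of-magnitude illustration. The average nucleotide mass (about 330 Da), the 1 m flight path, and the use of the full 45 keV as kinetic energy during flight are assumptions that are not stated in the abstract.

```python
import math

ELEMENTARY_CHARGE_C = 1.602176634e-19   # joules per electronvolt
DALTON_KG = 1.66053906660e-27
AVG_NT_MASS_DA = 330.0                  # assumed average nucleotide mass


def resolving_power_needed(n_nt):
    """m / delta-m required to separate an n-mer from an (n+1)-mer."""
    mass = n_nt * AVG_NT_MASS_DA
    return mass / AVG_NT_MASS_DA


def flight_time_s(n_nt, energy_ev, path_m):
    """Field-free flight time of an ion with the given kinetic energy."""
    mass_kg = n_nt * AVG_NT_MASS_DA * DALTON_KG
    kinetic_j = energy_ev * ELEMENTARY_CHARGE_C
    velocity = math.sqrt(2.0 * kinetic_j / mass_kg)
    return path_m / velocity


power = resolving_power_needed(500)
time_s = flight_time_s(500, 45_000.0, 1.0)   # 1 m path is hypothetical
```

    Under these assumptions a 500-nucleotide segment needs a resolving power of about 500, and its flight time comes out on the order of a hundred microseconds, which is consistent in scale with the time quoted in the abstract.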

  10. Sequence-structure analysis of FAD-containing proteins.

    PubMed

    Dym, O; Eisenberg, D

    2001-09-01

    We have analyzed structure-sequence relationships in 32 families of flavin adenine dinucleotide (FAD)-binding proteins, to prepare for genomic-scale analyses of this family. Four different FAD-family folds were identified, each containing at least two or more protein families. Three of these families, exemplified by glutathione reductase (GR), ferredoxin reductase (FR), and p-cresol methylhydroxylase (PCMH) were previously defined, and a family represented by pyruvate oxidase (PO) is newly defined. For each of the families, several conserved sequence motifs have been characterized. Several newly recognized sequence motifs are reported here for the PO, GR, and PCMH families. Each FAD fold can be uniquely identified by the presence of distinctive conserved sequence motifs. We also analyzed cofactor properties, some of which are conserved within a family fold while others display variability. Among the conserved properties is cofactor directionality: in some FAD-structural families, the adenine ring of the FAD points toward the FAD-binding domain, whereas in others the isoalloxazine ring points toward this domain. In contrast, the FAD conformation and orientation are conserved in some families while in others it displays some variability. Nevertheless, there are clear correlations among the FAD-family fold, the shape of the pocket, and the FAD conformation. Our general findings are as follows: (a) no single protein 'pharmacophore' exists for binding FAD; (b) in every FAD-binding family, the pyrophosphate moiety binds to the most strongly conserved sequence motif, suggesting that pyrophosphate binding is a significant component of molecular recognition; and (c) sequence motifs can identify proteins that bind phosphate-containing ligands.

  11. Efficient analysis of mouse genome sequences reveal many nonsense variants

    PubMed Central

    Steeland, Sophie; Timmermans, Steven; Van Ryckeghem, Sara; Hulpiau, Paco; Saeys, Yvan; Van Montagu, Marc; Vandenbroucke, Roosmarijn E.; Libert, Claude

    2016-01-01

    Genetic polymorphisms in coding genes play an important role when using mouse inbred strains as research models. They have been shown to influence research results, explain phenotypical differences between inbred strains, and increase the amount of interesting gene variants present in the many available inbred lines. SPRET/Ei is an inbred strain derived from Mus spretus that has ∼1% sequence difference with the C57BL/6J reference genome. We obtained a listing of all SNPs and insertions/deletions (indels) present in SPRET/Ei from the Mouse Genomes Project (Wellcome Trust Sanger Institute) and processed these data to obtain an overview of all transcripts having nonsynonymous coding sequence variants. We identified 8,883 unique variants affecting 10,096 different transcripts from 6,328 protein-coding genes, which is about 28% of all coding genes. Because only a subset of these variants results in drastic changes in proteins, we focused on variations that are nonsense mutations that ultimately resulted in a gain of a stop codon. These genes were identified by in silico changing the C57BL/6J coding sequences to the SPRET/Ei sequences, converting them to amino acid (AA) sequences, and comparing the AA sequences. All variants and transcripts affected were also stored in a database, which can be browsed using a SPRET/Ei M. spretus variants web tool (www.spretus.org), including a manual. We validated the tool by demonstrating the loss of function of three proteins predicted to be severely truncated, namely Fas, IRAK2, and IFNγR1. PMID:27147605
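
    The in silico step described in this abstract (substitute the strain's variants into the reference coding sequence, translate both, and compare the amino acid sequences for a gained stop codon) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the 0-based SNP coordinates, and the toy sequence are assumptions made for illustration, only single-nucleotide substitutions are handled, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence codon by codon; '*' marks a stop."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def apply_snps(cds, snps):
    """Return cds with each (0-based position, alternate base) substituted."""
    seq = list(cds)
    for pos, alt in snps:
        seq[pos] = alt
    return "".join(seq)


def gained_stop(ref_cds, snps):
    """Return the codon index of a newly gained, earlier stop, or None."""
    ref_stop = translate(ref_cds).find("*")
    alt_stop = translate(apply_snps(ref_cds, snps)).find("*")
    if alt_stop != -1 and (ref_stop == -1 or alt_stop < ref_stop):
        return alt_stop
    return None


# Toy example (hypothetical sequence): TGG (Trp) becomes TGA (stop).
reference = "ATGTGGAAATAA"
print(gained_stop(reference, [(5, "A")]))  # nonsense variant
print(gained_stop(reference, [(8, "G")]))  # synonymous variant
```

    Indels would shift the reading frame and need separate handling, which this sketch deliberately leaves out.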

  12. Sequence-specific apolipoprotein A-I effects on lecithin:cholesterol acyltransferase activity.

    PubMed

    Dergunov, Alexander D

    2013-06-01

    Existing kinetic data of cholesteryl ester formation by lecithin:cholesterol acyltransferase in discoidal high-density lipoproteins with 34 mutations of apoA-I that involved all putative helices were grouped by cluster analysis into four noncoincident regions with mutations both without any functional impairment and with profound isolated (V- and K-mutations) or common (VK-mutations) effect on V(max)(app) and K(m)(app). Data were analyzed with a new kinetic model of LCAT activity at interface that exploits the efficiency of LCAT binding to the particle, particle dimensions, and surface concentrations of phosphatidylcholine and cholesterol. V-mutations with major location in the central part and C-domain affected the second-order rate constant of cholesteryl ester formation at the solvolysis of acyl-enzyme intermediate by cholesterol as nucleophile. The central region in apoA-I sequence is suggested to influence the proper positioning of cholesterol molecule toward LCAT active center with major contribution of arginine residue(s). K-mutations with major location in N-domain may affect binding and stability of enzyme-phosphatidylcholine complex. VK-mutations may possess mixed effects; the independent binding measurement may segregate individual steps.

  13. Mapping membrane activity in undiscovered peptide sequence space using machine learning.

    PubMed

    Lee, Ernest Y; Fulan, Benjamin M; Wong, Gerard C L; Ferguson, Andrew L

    2016-11-29

    There are some ∼1,100 known antimicrobial peptides (AMPs), which permeabilize microbial membranes but have diverse sequences. Here, we develop a support vector machine (SVM)-based classifier to investigate α-helical AMPs and the interrelated nature of their functional commonality and sequence homology. SVM is used to search the undiscovered peptide sequence space and identify Pareto-optimal candidates that simultaneously maximize the distance σ from the SVM hyperplane (thus maximize its "antimicrobialness") and its α-helicity, but minimize mutational distance to known AMPs. By calibrating SVM machine learning results with killing assays and small-angle X-ray scattering (SAXS), we find that the SVM metric σ correlates not with a peptide's minimum inhibitory concentration (MIC), but rather its ability to generate negative Gaussian membrane curvature. This surprising result provides a topological basis for membrane activity common to AMPs. Moreover, we highlight an important distinction between the maximal recognizability of a sequence to a trained AMP classifier (its ability to generate membrane curvature) and its maximal antimicrobial efficacy. As mutational distances are increased from known AMPs, we find AMP-like sequences that are increasingly difficult for nature to discover via simple mutation. Using the sequence map as a discovery tool, we find an unexpectedly diverse taxonomy of sequences that are just as membrane-active as known AMPs, but with a broad range of primary functions distinct from AMP functions, including endogenous neuropeptides, viral fusion proteins, topogenic peptides, and amyloids. The SVM classifier is useful as a general detector of membrane activity in peptide sequences.
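
    The Pareto-optimal selection mentioned in this abstract can be sketched as follows. The sketch assumes each candidate has already been scored as a tuple (sigma, helicity, distance), where sigma and helicity are to be maximized and mutational distance is to be minimized; the scoring itself, the function name, and the numbers are hypothetical and are not taken from the paper.

```python
def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates.

    Each candidate is (sigma, helicity, distance): higher sigma and
    helicity are better, lower mutational distance is better.
    """
    def dominates(a, b):
        no_worse = a[0] >= b[0] and a[1] >= b[1] and a[2] <= b[2]
        strictly_better = a[0] > b[0] or a[1] > b[1] or a[2] < b[2]
        return no_worse and strictly_better

    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical scores for four candidate peptides.
scored = [(1.0, 0.9, 2), (0.5, 0.8, 3), (1.2, 0.7, 5), (1.0, 0.9, 4)]
print(pareto_front(scored))
```

    The pairwise comparison is quadratic in the number of candidates, which is acceptable for a sketch but would need a faster method for a large sequence space.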

  14. Monoclonal antibodies raised against 167-180 aa sequence of human carbonic anhydrase XII inhibit its enzymatic activity.

    PubMed

    Dekaminaviciute, Dovile; Kairys, Visvaldas; Zilnyte, Milda; Petrikaite, Vilma; Jogaite, Vaida; Matuliene, Jurgita; Gudleviciene, Zivile; Vullo, Daniela; Supuran, Claudiu T; Zvirbliene, Aurelija

    2014-12-01

    Human carbonic anhydrase XII (CA XII) is a single-pass transmembrane protein with an extracellular catalytic domain. This enzyme is being recognized as a potential biomarker for different tumours. The current study aimed to generate monoclonal antibodies (MAbs) neutralizing the enzymatic activity of CA XII. Bioinformatics analysis of CA XII structure revealed surface-exposed sequences located in proximity to its catalytic centre. Two MAbs against the selected antigenic peptide spanning 167-180 aa sequence of CA XII were generated. The MAbs were reactive with recombinant catalytic domain of CA XII expressed either in E. coli or mammalian cells. Inhibitory activity of the MAbs was demonstrated by a stopped flow CO2 hydration assay. The study provides new data on the surface-exposed linear CA XII epitope that may serve as a target for inhibitory antibodies with a potential immunotherapeutic application.

  15. Analysis of the complete DNA sequence of murine cytomegalovirus.

    PubMed Central

    Rawlinson, W D; Farrell, H E; Barrell, B G

    1996-01-01

    The complete DNA sequence of the Smith strain of murine cytomegalovirus (MCMV) was determined from virion DNA by using a whole-genome shotgun approach. The genome has an overall G+C content of 58.7%, consists of 230,278 bp, and is arranged as a single unique sequence with short (31-bp) terminal direct repeats and several short internal repeats. Significant similarity to the genome of the sequenced human cytomegalovirus (HCMV) strain AD169 is evident, particularly for 78 open reading frames encoded by the central part of the genome. There is a very similar distribution of G+C content across the two genomes. Sequences toward the ends of the MCMV genome encode tandem arrays of homologous glycoproteins (gps) arranged as two gene families. The left end encodes 15 gps that represent one family, and the right end encodes a different family of 11 gps. A homolog (m144) of cellular major histocompatibility complex (MHC) class I genes is located at the end of the genome opposite the HCMV MHC class I homolog (UL18). G protein-coupled receptor (GCR) homologs (M33 and M78) occur in positions congruent with two (UL33 and UL78) of the four putative HCMV GCR homologs. Counterparts of all of the known enzyme homologs in HCMV are present in the MCMV genome, including the phosphotransferase gene (M97), whose product phosphorylates ganciclovir in HCMV-infected cells, and the assembly protein (M80). PMID:8971012

  16. Exome Sequence Analysis of 14 Families With High Myopia

    PubMed Central

    Kloss, Bethany A.; Tompson, Stuart W.; Whisenhunt, Kristina N.; Quow, Krystina L.; Huang, Samuel J.; Pavelec, Derek M.; Rosenberg, Thomas; Young, Terri L.

    2017-01-01

    Purpose: To identify causal gene mutations in 14 families with autosomal dominant (AD) high myopia using exome sequencing. Methods: Select individuals from 14 large Caucasian families with high myopia were exome sequenced. Gene variants were filtered to identify potential pathogenic changes. Sanger sequencing was used to confirm variants in original DNA, and to test for disease cosegregation in additional family members. Candidate genes and chromosomal loci previously associated with myopic refractive error and its endophenotypes were comprehensively screened. Results: In 14 high myopia families, we identified 73 rare and 31 novel gene variants as candidates for pathogenicity. In seven of these families, two of the novel and eight of the rare variants were within known myopia loci. A total of 104 heterozygous nonsynonymous rare variants in 104 genes were identified in 10 out of 14 probands. Each variant cosegregated with affection status. No rare variants were identified in genes known to cause myopia or in genes closest to published genome-wide association study association signals for refractive error or its endophenotypes. Conclusions: Whole exome sequencing was performed to determine gene variants implicated in the pathogenesis of AD high myopia. This study provides new genes for consideration in the pathogenesis of high myopia, and may aid in the development of genetic profiling of those at greatest risk for attendant ocular morbidities of this disorder. PMID:28384719

  17. Learning Progressions and Teaching Sequences: A Review and Analysis

    ERIC Educational Resources Information Center

    Duschl, Richard; Maeng, Seungho; Sezen, Asli

    2011-01-01

    Our paper is an analytical review of the design, development and reporting of learning progressions and teaching sequences. Research questions are: (1) what criteria are being used to propose a "hypothetical learning progression/trajectory" and (2) what measurements/evidence are being used to empirically define and refine a "hypothetical learning…

  18. DNA sequence and analysis of human chromosome 8.

    PubMed

    Nusbaum, Chad; Mikkelsen, Tarjei S; Zody, Michael C; Asakawa, Shuichi; Taudien, Stefan; Garber, Manuel; Kodira, Chinnappa D; Schueler, Mary G; Shimizu, Atsushi; Whittaker, Charles A; Chang, Jean L; Cuomo, Christina A; Dewar, Ken; FitzGerald, Michael G; Yang, Xiaoping; Allen, Nicole R; Anderson, Scott; Asakawa, Teruyo; Blechschmidt, Karin; Bloom, Toby; Borowsky, Mark L; Butler, Jonathan; Cook, April; Corum, Benjamin; DeArellano, Kurt; DeCaprio, David; Dooley, Kathleen T; Dorris, Lester; Engels, Reinhard; Glöckner, Gernot; Hafez, Nabil; Hagopian, Daniel S; Hall, Jennifer L; Ishikawa, Sabine K; Jaffe, David B; Kamat, Asha; Kudoh, Jun; Lehmann, Rüdiger; Lokitsang, Tashi; Macdonald, Pendexter; Major, John E; Matthews, Charles D; Mauceli, Evan; Menzel, Uwe; Mihalev, Atanas H; Minoshima, Shinsei; Murayama, Yuji; Naylor, Jerome W; Nicol, Robert; Nguyen, Cindy; O'Leary, Sinéad B; O'Neill, Keith; Parker, Stephen C J; Polley, Andreas; Raymond, Christina K; Reichwald, Kathrin; Rodriguez, Joseph; Sasaki, Takashi; Schilhabel, Markus; Siddiqui, Roman; Smith, Cherylyn L; Sneddon, Tam P; Talamas, Jessica A; Tenzin, Pema; Topham, Kerri; Venkataraman, Vijay; Wen, Gaiping; Yamazaki, Satoru; Young, Sarah K; Zeng, Qiandong; Zimmer, Andrew R; Rosenthal, Andre; Birren, Bruce W; Platzer, Matthias; Shimizu, Nobuyoshi; Lander, Eric S

    2006-01-19

    The International Human Genome Sequencing Consortium (IHGSC) recently completed a sequence of the human genome. As part of this project, we have focused on chromosome 8. Although some chromosomes exhibit extreme characteristics in terms of length, gene content, repeat content and fraction segmentally duplicated, chromosome 8 is distinctly typical in character, being very close to the genome median in each of these aspects. This work describes a finished sequence and gene catalogue for the chromosome, which represents just over 5% of the euchromatic human genome. A unique feature of the chromosome is a vast region of approximately 15 megabases on distal 8p that appears to have a strikingly high mutation rate, which has accelerated in the hominids relative to other sequenced mammals. This fast-evolving region contains a number of genes related to innate immunity and the nervous system, including loci that appear to be under positive selection--these include the major defensin (DEF) gene cluster and MCPH1, a gene that may have contributed to the evolution of expanded brain size in the great apes. The data from chromosome 8 should allow a better understanding of both normal and disease biology and genome evolution.

  19. Genome sequence and analysis of the tuber crop potato.

    PubMed

    Xu, Xun; Pan, Shengkai; Cheng, Shifeng; Zhang, Bo; Mu, Desheng; Ni, Peixiang; Zhang, Gengyun; Yang, Shuang; Li, Ruiqiang; Wang, Jun; Orjeda, Gisella; Guzman, Frank; Torres, Michael; Lozano, Roberto; Ponce, Olga; Martinez, Diana; De la Cruz, Germán; Chakrabarti, S K; Patil, Virupaksh U; Skryabin, Konstantin G; Kuznetsov, Boris B; Ravin, Nikolai V; Kolganova, Tatjana V; Beletsky, Alexey V; Mardanov, Andrei V; Di Genova, Alex; Bolser, Daniel M; Martin, David M A; Li, Guangcun; Yang, Yu; Kuang, Hanhui; Hu, Qun; Xiong, Xingyao; Bishop, Gerard J; Sagredo, Boris; Mejía, Nilo; Zagorski, Wlodzimierz; Gromadka, Robert; Gawor, Jan; Szczesny, Pawel; Huang, Sanwen; Zhang, Zhonghua; Liang, Chunbo; He, Jun; Li, Ying; He, Ying; Xu, Jianfei; Zhang, Youjun; Xie, Binyan; Du, Yongchen; Qu, Dongyu; Bonierbale, Merideth; Ghislain, Marc; Herrera, Maria del Rosario; Giuliano, Giovanni; Pietrella, Marco; Perrotta, Gaetano; Facella, Paolo; O'Brien, Kimberly; Feingold, Sergio E; Barreiro, Leandro E; Massa, Gabriela A; Diambra, Luis; Whitty, Brett R; Vaillancourt, Brieanne; Lin, Haining; Massa, Alicia N; Geoffroy, Michael; Lundback, Steven; DellaPenna, Dean; Buell, C Robin; Sharma, Sanjeev Kumar; Marshall, David F; Waugh, Robbie; Bryan, Glenn J; Destefanis, Marialaura; Nagy, Istvan; Milbourne, Dan; Thomson, Susan J; Fiers, Mark; Jacobs, Jeanne M E; Nielsen, Kåre L; Sønderkær, Mads; Iovene, Marina; Torres, Giovana A; Jiang, Jiming; Veilleux, Richard E; Bachem, Christian W B; de Boer, Jan; Borm, Theo; Kloosterman, Bjorn; van Eck, Herman; Datema, Erwin; Hekkert, Bas te Lintel; Goverse, Aska; van Ham, Roeland C H J; Visser, Richard G F

    2011-07-10

    Potato (Solanum tuberosum L.) is the world's most important non-grain food crop and is central to global food security. It is clonally propagated, highly heterozygous, autotetraploid, and suffers acute inbreeding depression. Here we use a homozygous doubled-monoploid potato clone to sequence and assemble 86% of the 844-megabase genome. We predict 39,031 protein-coding genes and present evidence for at least two genome duplication events indicative of a palaeopolyploid origin. As the first genome sequence of an asterid, the potato genome reveals 2,642 genes specific to this large angiosperm clade. We also sequenced a heterozygous diploid clone and show that gene presence/absence variants and other potentially deleterious mutations occur frequently and are a likely cause of inbreeding depression. Gene family expansion, tissue-specific expression and recruitment of genes to new pathways contributed to the evolution of tuber development. The potato genome sequence provides a platform for genetic improvement of this vital crop.

  20. Transcriptome analysis of the phytopathogenic fungus Rhizoctonia solani AG1-IB 7/3/14 applying high-throughput sequencing of expressed sequence tags (ESTs).

    PubMed

    Wibberg, Daniel; Jelonek, Lukas; Rupp, Oliver; Kröber, Magdalena; Goesmann, Alexander; Grosch, Rita; Pühler, Alfred; Schlüter, Andreas

    2014-01-01

    Rhizoctonia solani is a soil-borne plant pathogenic fungus of the phylum Basidiomycota. It affects a wide range of agriculturally important crops and hence is responsible for economically relevant crop losses. Transcriptome analysis of the bottom rot pathogen R. solani AG1-IB (isolate 7/3/14) by applying high-throughput sequencing and bioinformatics methods addressing Expressed Sequence Tag (EST) data interpretation provided new insights in expressed genes of this fungus. Two normalized cDNA libraries representing different cultivation conditions of the fungus were sequenced on the 454 FLX (Roche) system. Subsequent to cDNA sequence assembly and quality control, ESTs were analysed applying advanced bioinformatics methods. More than 14 000 transcript isoforms originating from approximately 10 000 predictable R. solani AG1-IB 7/3/14 genes are represented in each dataset. Comparative analyses revealed several differentially expressed genes depending on the growth conditions applied. Determinants with predicted functions in recognition processes between the fungus and the host plant were identified. Moreover, many R. solani AG1-IB ESTs were predicted to encode putative cellulose, pectin, and lignin degrading enzymes. Furthermore, genes playing a possible role in mitogen-activated protein (MAP) kinase cascades, 4-aminobutyric acid (GABA) metabolism, melanin synthesis, plant defence antagonism, phytotoxin, and mycotoxin synthesis were detected.

  1. Mercury: Next-gen Data Analysis and Annotation Pipeline (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    SciTech Connect

    Sexton, David

    2012-06-01

    David Sexton (Baylor) gives a talk titled "Mercury: Next-gen Data Analysis and Annotation Pipeline" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  2. Mercury: Next-gen Data Analysis and Annotation Pipeline (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Sexton, David [Baylor]

    2016-07-12

    David Sexton (Baylor) gives a talk titled "Mercury: Next-gen Data Analysis and Annotation Pipeline" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  3. Novel technologies applied to the nucleotide sequencing and comparative sequence analysis of the genomes of infectious agents in veterinary medicine.

    PubMed

    Granberg, F; Bálint, Á; Belák, S

    2016-04-01

    Next-generation sequencing (NGS), also referred to as deep, high-throughput or massively parallel sequencing, is a powerful new tool that can be used for the complex diagnosis and intensive monitoring of infectious disease in veterinary medicine. NGS technologies are also being increasingly used to study the aetiology, genomics, evolution and epidemiology of infectious disease, as well as host-pathogen interactions and other aspects of infection biology. This review briefly summarises recent progress and achievements in this field by first introducing a range of novel techniques and then presenting examples of NGS applications in veterinary infection biology. Various work steps and processes for sampling and sample preparation, sequence analysis and comparative genomics, and improving the accuracy of genomic prediction are discussed, as are bioinformatics requirements. Examples of sequencing-based applications and comparative genomics in veterinary medicine are then provided. This review is based on novel references selected from the literature and on experiences of the World Organisation for Animal Health (OIE) Collaborating Centre for the Biotechnology-based Diagnosis of Infectious Diseases in Veterinary Medicine, Uppsala, Sweden.

  4. FASTAptamer: A Bioinformatic Toolkit for High-throughput Sequence Analysis of Combinatorial Selections

    PubMed Central

    Alam, Khalid K; Chang, Jonathan L; Burke, Donald H

    2015-01-01

    High-throughput sequence (HTS) analysis of combinatorial selection populations accelerates lead discovery and optimization and offers dynamic insight into selection processes. An underlying principle is that selection enriches high-fitness sequences as a fraction of the population, whereas low-fitness sequences are depleted. HTS analysis readily provides the requisite numerical information by tracking the evolutionary trajectory of individual sequences in response to selection pressures. Unlike genomic data, for which a number of software solutions exist, user-friendly tools are not readily available for the combinatorial selections field, leading many users to create custom software. FASTAptamer was designed to address the sequence-level analysis needs of the field. The open source FASTAptamer toolkit counts, normalizes and ranks read counts in a FASTQ file, compares populations for sequence distribution, generates clusters of sequence families, calculates fold-enrichment of sequences throughout the course of a selection and searches for degenerate sequence motifs. While originally designed for aptamer selections, FASTAptamer can be applied to any selection strategy that can utilize next-generation DNA sequencing, such as ribozyme or deoxyribozyme selections, in vivo mutagenesis and various surface display technologies (peptide, antibody fragment, mRNA, etc.). FASTAptamer software, sample data and a user's guide are available for download at http://burkelab.missouri.edu/fastaptamer.html. PMID:25734917
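
    The counting, normalization, ranking and fold-enrichment steps listed in this abstract can be illustrated with a short sketch. This is not FASTAptamer's code or interface: the function names are hypothetical, reads are assumed to be already extracted from the FASTQ file as plain strings, and normalization to reads per million (RPM) is an assumption made for the example.

```python
from collections import Counter


def count_rpm(reads):
    """Count identical reads and normalize each count to reads per million."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {seq: n * 1_000_000 / total for seq, n in counts.items()}


def rank(rpm):
    """Order sequences from most to least abundant (ties broken by sequence)."""
    return sorted(rpm, key=lambda seq: (-rpm[seq], seq))


def fold_enrichment(rpm_early, rpm_late):
    """Ratio of late-round to early-round RPM for sequences seen in both."""
    return {seq: rpm_late[seq] / rpm_early[seq]
            for seq in rpm_early if seq in rpm_late}


# Hypothetical reads from two selection rounds.
round_1 = ["AAA"] * 1 + ["CCC"] * 3
round_2 = ["AAA"] * 3 + ["CCC"] * 1
rpm_1, rpm_2 = count_rpm(round_1), count_rpm(round_2)
print(rank(rpm_2))
print(fold_enrichment(rpm_1, rpm_2))
```

    Sequences absent from either round are skipped here; a real analysis would have to decide how to treat them, for example with a pseudocount.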

  5. Network Analysis of Sequence-Function Relationships and Exploration of Sequence Space of TEM β-Lactamases.

    PubMed

    Zeil, Catharina; Widmann, Michael; Fademrecht, Silvia; Vogel, Constantin; Pleiss, Jürgen

    2016-05-01

    The Lactamase Engineering Database (www.LacED.uni-stuttgart.de) was developed to facilitate the classification and analysis of TEM β-lactamases. The current version contains 474 TEM variants. Two hundred fifty-nine variants form a large scale-free network of highly connected point mutants. The network was divided into three subnetworks which were enriched by single phenotypes: one network with predominantly 2be and two networks with 2br phenotypes. Fifteen positions were found to be highly variable, contributing to the majority of the observed variants. Since it is expected that a considerable fraction of the theoretical sequence space is functional, the currently sequenced 474 variants represent only the tip of the iceberg of functional TEM β-lactamase variants which form a huge natural reservoir of highly interconnected variants. Almost 50% of the variants are part of a quartet. Thus, two single mutations that result in functional enzymes can be combined into a functional protein. Most of these quartets consist of the same phenotype, or the mutations are additive with respect to the phenotype. By predicting quartets from triplets, 3,916 unknown variants were constructed. Eighty-seven variants complement multiple quartets and therefore have a high probability of being functional. The construction of a TEM β-lactamase network and subsequent analyses by clustering and quartet prediction are valuable tools to gain new insights into the viable sequence space of TEM β-lactamases and to predict their phenotype. The highly connected sequence space of TEM β-lactamases is ideally suited to network analysis and demonstrates the strengths of network analysis over tree reconstruction methods.
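
    The idea of predicting quartets from triplets can be sketched as below, under simplifying assumptions that are not taken from the paper: each variant is represented as a frozenset of mutation labels relative to a reference sequence, a quartet is a parent, two single mutants of that parent, and the double mutant, and only the case where the double mutant is the missing corner is handled. The mutation labels in the example are illustrative only.

```python
from itertools import combinations


def predict_missing_corners(variants):
    """Predict double mutants that would complete a quartet.

    A triplet is a known parent plus two known variants that each add
    exactly one mutation to it; the predicted variant carries both.
    """
    known = set(variants)
    predicted = set()
    for parent in known:
        singles = [v for v in known
                   if parent < v and len(v - parent) == 1]
        for v1, v2 in combinations(singles, 2):
            corner = v1 | v2
            if corner not in known:
                predicted.add(corner)
    return predicted


# Illustrative variant set: the reference, three single mutants, one double.
known_variants = [
    frozenset(),
    frozenset({"E104K"}),
    frozenset({"G238S"}),
    frozenset({"M182T"}),
    frozenset({"E104K", "G238S"}),
]
for variant in sorted(predict_missing_corners(known_variants), key=sorted):
    print(sorted(variant))
```

    The sketch does not check whether two mutations affect the same position, which a real implementation would have to exclude.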

  6. The Pollino Seismic Sequence: Activated Graben Structures in a Seismic Gap

    NASA Astrophysics Data System (ADS)

    Rößler, Dirk; Passarelli, Luigi; Govoni, Aladino; Bindi, Dino; Cesca, Simone; Hainzl, Sebastian; Maccaferri, Francesco; Rivalta, Eleonora; Woith, Heiko; Dahm, Torsten

    2015-04-01

    mapped for the area. Consistent with mapped faults, the seismicity involved both eastward- and westward-dipping normal faults that define the geometry of seismically active graben-like structures. At least one cluster shows an additional spatio-temporal migration with spreading hypocentres similar to other swarm areas with fluid-triggering mechanisms. The static Coulomb stress change transferred by the largest shock onto the swarm area and on the CF cannot explain the observed high seismicity rate. We study the evolution of the frequency-size distribution of the events and the seismicity rate changes. We find that the majority of the earthquakes cannot be justified as aftershocks (directly related to the tectonics or to earthquake-earthquake interaction) and are best explained by an additional forcing active over the entire sequence. Our findings are consistent with the action of fluids (e.g. pore-pressure diffusion) triggering seismicity on pre-loaded faults. Additional aseismic release of tectonic strain by transient, slow slip is also consistent with our analysis. Analysis of deformation time series may clarify this point in future studies.

  7. Determining structure and function of steroid dehydrogenase enzymes by sequence analysis, homology modeling, and rational mutational analysis.

    PubMed

    Duax, William L; Thomas, James; Pletnev, Vladimir; Addlagatta, Anthony; Huether, Robert; Habegger, Lukas; Weeks, Charles M

    2005-12-01

    The short-chain oxidoreductase (SCOR) family of enzymes includes over 6,000 members identified in sequenced genomes. Of these enzymes, approximately 300 have been characterized functionally, and the three-dimensional crystal structures of approximately 40 have been reported. Since some SCOR enzymes are steroid dehydrogenases involved in hypertension, diabetes, breast cancer, and polycystic kidney disease, it is important to characterize the other members of the family for which the biological functions are currently unknown and to determine their three-dimensional structure and mechanism of action. Although the SCOR family appears to have only a single fully conserved residue, it was possible, using bioinformatics methods, to determine characteristic fingerprints composed of 30-40 residues that are conserved at the 70% or greater level in SCOR subgroups. These fingerprints permit reliable prediction of several important structure-function features including cofactor preference, catalytic residues, and substrate specificity. Human type 1 3beta-hydroxysteroid dehydrogenase isomerase (3beta-HSDI) has 30% sequence identity with a human UDP galactose 4-epimerase (UDPGE), a SCOR family enzyme for which an X-ray structure has been reported. Both UDPGE and 3beta-HSDI appear to trace their origins back to bacterial 3alpha,20beta-HSD. Combining three-dimensional structural information and sequence data on the 3alpha,20beta-HSD, UDPGE, and 3beta-HSDI subfamilies with mutational analysis, we were able to identify the residues critical to the dehydrogenase function of 3beta-HSDI. We also identified the residues most probably responsible for the isomerase activity of 3beta-HSDI. We test our predictions by specific mutations based on sequence analysis and our structure-based model.

  8. Digital fragment analysis of short tandem repeats by high-throughput amplicon sequencing.

    PubMed

    Darby, Brian J; Erickson, Shay F; Hervey, Samuel D; Ellis-Felege, Susan N

    2016-07-01

    High-throughput sequencing has been proposed as a method to genotype microsatellites and overcome the four main technical drawbacks of capillary electrophoresis: amplification artifacts, imprecise sizing, length homoplasy, and limited multiplex capability. The objective of this project was to test a high-throughput amplicon sequencing approach to fragment analysis of short tandem repeats and characterize its advantages and disadvantages against traditional capillary electrophoresis. We amplified and sequenced 12 muskrat microsatellite loci from 180 muskrat specimens and analyzed the sequencing data for precision of allele calling, propensity for amplification or sequencing artifacts, and for evidence of length homoplasy. Of the 294 total alleles we detected by sequencing, only 164 would have been detected by capillary electrophoresis, as the remaining 130 alleles (44%) would have been hidden by length homoplasy. The ability to detect a greater number of unique alleles resulted in the ability to resolve greater population genetic structure. The primary advantages of fragment analysis by sequencing are the ability to precisely size fragments, resolve length homoplasy, multiplex many individuals and many loci into a single high-throughput run, and compare data across projects and across laboratories (present and future) with minimal technical calibration. A significant disadvantage of fragment analysis by sequencing is that the method is only practical and cost-effective when performed on batches of several hundred samples with multiple loci. Future work is needed to optimize throughput while minimizing costs and to update existing microsatellite allele calling and analysis programs to accommodate sequence-aware microsatellite data.
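
    Length homoplasy, as used in this abstract, means that alleles with different sequences share the same fragment length and so cannot be told apart by sizing alone. The sketch below counts how many sequence-defined alleles at one locus would be hidden in that way; the function name and the toy alleles are hypothetical, and the abstract's own figures (294 alleles by sequence, 164 by length, 130 hidden) are summed across loci rather than reproduced here.

```python
def homoplasy_summary(allele_sequences):
    """Return (alleles by sequence, alleles by length, alleles hidden).

    Sizing alone distinguishes one allele per distinct length, so every
    additional distinct sequence of the same length is hidden.
    """
    distinct_sequences = set(allele_sequences)
    distinct_lengths = {len(seq) for seq in distinct_sequences}
    hidden = len(distinct_sequences) - len(distinct_lengths)
    return len(distinct_sequences), len(distinct_lengths), hidden


# Toy locus: two 6-bp alleles differ in sequence but not in length.
print(homoplasy_summary(["ACACAC", "ACACAT", "ACAC", "ACACAC"]))
```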

  9. Human Immunodeficiency Virus Reverse Transcriptase and Protease Sequence Database: an expanded data model integrating natural language text and sequence analysis programs.

    PubMed

    Kantor, R; Machekano, R; Gonzales, M J; Dupnik, K; Schapiro, J M; Shafer, R W

    2001-01-01

    The HIV Reverse Transcriptase and Protease Sequence Database is an on-line relational database that catalogs evolutionary and drug-related sequence variation in the human immunodeficiency virus (HIV) reverse transcriptase (RT) and protease enzymes, the molecular targets of anti-HIV therapy (http://hivdb.stanford.edu). The database contains a compilation of nearly all published HIV RT and protease sequences, including submissions from International Collaboration databases and sequences published in journal articles. Sequences are linked to data about the source of the sequence sample and the antiretroviral drug treatment history of the individual from whom the isolate was obtained. During the past year 3500 sequences have been added and the data model has been expanded to include drug susceptibility data on sequenced isolates. Database content has also been integrated with didactic text and the output of two sequence analysis programs.

  10. Shared investment projects and forecasting errors: setting framework conditions for coordination and sequencing data quality activities.

    PubMed

    Leitner, Stephan; Brauneis, Alexander; Rausch, Alexandra

    2015-01-01

    In this paper, we investigate the impact of inaccurate forecasting on the coordination of distributed investment decisions. In particular, by setting up a computational multi-agent model of a stylized firm, we investigate the case of investment opportunities that are mutually carried out by organizational departments. The forecasts of concern pertain to the initial amount of money necessary to launch and operate an investment opportunity, to the expected intertemporal distribution of cash flows, and the departments' efficiency in operating the investment opportunity at hand. We propose a budget allocation mechanism for coordinating such distributed decisions. The paper provides guidance on how to set framework conditions, in terms of the number of investment opportunities considered in one round of funding and the number of departments operating one investment opportunity, so that the coordination mechanism is highly robust to forecasting errors. Furthermore, we show that, in some setups, a certain extent of misforecasting is desirable from the firm's point of view as it supports the achievement of the corporate objective of value maximization. We then address the question of how to improve forecasting quality in the best possible way, and provide policy advice on how to sequence activities for improving forecasting quality so that the robustness of the coordination mechanism to errors increases in the best possible way. At the same time, we show that wrong decisions regarding the sequencing can lead to a decrease in robustness. Finally, we conduct a comprehensive sensitivity analysis and prove that, in particular for relatively good forecasters, most of our results are robust to changes in setting the parameters of our multi-agent simulation model.

  11. Gardnerella vaginalis Subgroups Defined by cpn60 Sequencing and Sialidase Activity in Isolates from Canada, Belgium and Kenya.

    PubMed

    Schellenberg, John J; Paramel Jayaprakash, Teenus; Withana Gamage, Niradha; Patterson, Mo H; Vaneechoutte, Mario; Hill, Janet E

    2016-01-01

    Increased abundance of Gardnerella vaginalis and sialidase activity in vaginal fluid is associated with bacterial vaginosis (BV), a common but poorly understood clinical entity associated with poor reproductive health outcomes. Since most women are colonized with G. vaginalis, its status as a normal member of the vaginal microbiota or pathogen causing BV remains controversial, and numerous classification schemes have been described. Since 2005, sequencing of the chaperonin-60 universal target (cpn60 UT) has distinguished four subgroups in isolate collections, clone libraries and deep sequencing datasets. To clarify potential clinical and diagnostic significance of cpn60 subgroups, we undertook phenotypic and molecular characterization of 112 G. vaginalis isolates from three continents. A total of 36 subgroup A, 33 B, 35 C and 8 D isolates were identified through phylogenetic analysis of cpn60 sequences as corresponding to four "clades" identified in a recently published study, based on sequencing 473 genes across 17 isolates. cpn60 subgroups were compared with other previously described molecular methods for classification of Gardnerella subgroups, including amplified ribosomal DNA restriction analysis (ARDRA) and real-time PCR assays designed to quantify subgroups in vaginal samples. Although two ARDRA patterns were observed in isolates, each was observed in three cpn60 subgroups (A/B/D and B/C/D). Real-time PCR assays corroborated cpn60 subgroups overall, but 13 isolates from subgroups A, B and D were negative in all assays. A putative sialidase gene was detected in all subgroup B, C and D isolates, but only in a single subgroup A isolate. In contrast, sialidase activity was observed in all subgroup B isolates, 3 (9%) subgroup C isolates and no subgroup A or D isolates. These observations suggest distinct roles for G. vaginalis subgroups in BV pathogenesis. We conclude that cpn60 UT sequencing is a robust approach for defining G. vaginalis subgroups within the

  12. Gardnerella vaginalis Subgroups Defined by cpn60 Sequencing and Sialidase Activity in Isolates from Canada, Belgium and Kenya

    PubMed Central

    Schellenberg, John J.; Paramel Jayaprakash, Teenus; Withana Gamage, Niradha; Patterson, Mo H.; Vaneechoutte, Mario; Hill, Janet E.

    2016-01-01

    Increased abundance of Gardnerella vaginalis and sialidase activity in vaginal fluid is associated with bacterial vaginosis (BV), a common but poorly understood clinical entity associated with poor reproductive health outcomes. Since most women are colonized with G. vaginalis, its status as a normal member of the vaginal microbiota or pathogen causing BV remains controversial, and numerous classification schemes have been described. Since 2005, sequencing of the chaperonin-60 universal target (cpn60 UT) has distinguished four subgroups in isolate collections, clone libraries and deep sequencing datasets. To clarify potential clinical and diagnostic significance of cpn60 subgroups, we undertook phenotypic and molecular characterization of 112 G. vaginalis isolates from three continents. A total of 36 subgroup A, 33 B, 35 C and 8 D isolates were identified through phylogenetic analysis of cpn60 sequences as corresponding to four “clades” identified in a recently published study, based on sequencing 473 genes across 17 isolates. cpn60 subgroups were compared with other previously described molecular methods for classification of Gardnerella subgroups, including amplified ribosomal DNA restriction analysis (ARDRA) and real-time PCR assays designed to quantify subgroups in vaginal samples. Although two ARDRA patterns were observed in isolates, each was observed in three cpn60 subgroups (A/B/D and B/C/D). Real-time PCR assays corroborated cpn60 subgroups overall, but 13 isolates from subgroups A, B and D were negative in all assays. A putative sialidase gene was detected in all subgroup B, C and D isolates, but only in a single subgroup A isolate. In contrast, sialidase activity was observed in all subgroup B isolates, 3 (9%) subgroup C isolates and no subgroup A or D isolates. These observations suggest distinct roles for G. vaginalis subgroups in BV pathogenesis. We conclude that cpn60 UT sequencing is a robust approach for defining G. vaginalis subgroups within

  13. Active populations of rare microbes in oceanic environments as revealed by bromodeoxyuridine incorporation and 454 tag sequencing.

    PubMed

    Hamasaki, Koji; Taniguchi, Akito; Tada, Yuya; Kaneko, Ryo; Miki, Takeshi

    2016-02-01

    The "rare biosphere" consisting of thousands of low-abundance microbial taxa is important as a seed bank or a gene pool to maintain microbial functional redundancy and robustness of the ecosystem. Here we investigated contemporaneous growth of diverse microbial taxa including rare taxa and determined their variability in environmentally distinctive locations along a north-south transect in the Pacific Ocean in order to assess which taxa were actively growing and how environmental factors influenced bacterial community structures. A bromodeoxyuridine-labeling technique in combination with PCR amplicon pyrosequencing of 16S rRNA genes gave 215-793 OTUs from 1200 to 3500 unique sequences in the total communities and 175-299 OTUs from 860 to 1800 sequences in the active communities. Unexpectedly, many of the active OTUs were not detected in the total fractions. Among these active but rare OTUs, some taxa (2-4% of rare OTUs) showed much higher abundance (>0.10% of total reads) in the active fraction than in the total fraction, suggesting that their contribution to bacterial community productivity or growth was much larger than that expected from their standing stocks at each location. An ordination plot from principal component analysis showed that bacterial community compositions differed among the 4 sampling locations and between the total and active fractions. A redundancy analysis revealed that the variability of community compositions correlated significantly with seawater temperature and dissolved oxygen concentration. Also, a variation partitioning analysis showed that the environmental factors explained 49% of the variability of community compositions and the distance only explained 4.0% of its variability. These results implied very dynamic change of community structures due to environmental filtering. The active bacterial populations are more diverse and spread further into the rare biosphere than previously seen. This study implied that rare

  14. The DNA Sequence And Comparative Analysis Of Human Chromosome 5

    SciTech Connect

    Schmutz, Jeremy; Martin, Joel; Terry, Astrid; Couronne, Olivier; Grimwood, Jane; Lowry, Steve; Gordon, Laurie A.; Scott, Duncan; Xie, Gary; Huang, Wayne; Hellsten, Uffe; Tran-Gyamfi, Mary; She, Xinwei; Prabhakar, Shyam; Aerts, Andrea; Altherr, Michael; Bajorek, Eva; Black, Stacey; Branscomb, Elbert; Caoile, Chenier; Challacombe, Jean F.; Chan, Yee Man; Denys, Mirian; Detter, John C.; Escobar, Julio; Flowers, Dave; Fotopulos, Dea; Glavina, Tijana; Gomez, Maria; Gonzales, Eidelyn; Goodstein, David; Grigoriev, Igor; Groza, Matthew; Hammon, Nancy; Hawkins, Trevor; Haydu, Lauren; Israni, Sanjay; Jett, Jamie; Kadner, Kristen; Kimball, Heather; Kobayashi, Arthur; Lopez, Frederick; Lou, Yunian; Martinez, Diego; Medina, Catherine; Morgan, Jenna; Nandkeshwar, Richard; Noonan, James P.; Pitluck, Sam; Pollard, Martin; Predki, Paul; Priest, James; Ramirez, Lucia; Retterer, James; Rodriguez, Alex; Rogers, Stephanie; Salamov, Asaf; Salazar, Angelica; Thayer, Nina; Tice, Hope; Tsai, Ming; Ustaszewska, Anna; Vo, Nu; Wheeler, Jeremy; Wu, Kevin; Yang, Joan; Dickson, Mark; Cheng, Jan-Fang; Eichler, Evan E.; Olsen, Anne; Pennacchio, Len A.; Rokhsar, Daniel S.; Richardson, Paul; Lucas, Susan M.; Myers, Richard M.; Rubin, Edward M.

    2004-08-01

    Chromosome 5 is one of the largest human chromosomes and contains numerous intrachromosomal duplications, yet it has one of the lowest gene densities. This is partially explained by numerous gene-poor regions that display a remarkable degree of noncoding conservation with non-mammalian vertebrates, suggesting that they are functionally constrained. In total, we compiled 177.7 million base pairs of highly accurate finished sequence containing 923 manually curated protein-coding genes including the protocadherin and interleukin gene families. We also completely sequenced versions of the large chromosome-5-specific internal duplications. These duplications are very recent evolutionary events and probably have a mechanistic role in human physiological variation, as deletions in these regions are the cause of debilitating disorders including spinal muscular atrophy.

  15. The sequence and analysis of duplication rich human chromosome 16

    SciTech Connect

    Martin, J; Han, C; Gordon, L A; Terry, A; Prabhakar, S; She, X; Xie, G; Hellsten, U; Chan, Y M; Altherr, M; Couronne, O; Aerts, A; Bajorek, E; Black, S; Blumer, H; Branscomb, E; Brown, N; Bruno, W J; Buckingham, J; Callen, D F; Campbell, C S; Campbell, M L; Campbell, E W; Caoile, C; Challacombe, J F; Chasteen, L A; Chertkov, O; Chi, H C; Christensen, M; Clark, L M; Cohn, J D; Denys, M; Detter, J C; Dickson, M; Dimitrijevic-Bussod, M; Escobar, J; Fawcett, J J; Flowers, D; Fotopulos, D; Glavina, T; Gomez, M; Gonzales, E; Goodstein, D; Goodwin, L A; Grady, D L; Grigoriev, I; Groza, M; Hammon, N; Hawkins, T; Haydu, L; Hildebrand, C E; Huang, W; Israni, S; Jett, J; Jewett, P B; Kadner, K; Kimball, H; Kobayashi, A; Krawczyk, M; Leyba, T; Longmire, J L; Lopez, F; Lou, Y; Lowry, S; Ludeman, T; Manohar, C F; Mark, G A; McMurray, K L; Meincke, L J; Morgan, J; Moyzis, R K; Mundt, M O; Munk, A C; Nandkeshwar, R D; Pitluck, S; Pollard, M; Predki, P; Parson-Quintana, B; Ramirez, L; Rash, S; Retterer, J; Ricke, D O; Robinson, D; Rodriguez, A; Salamov, A; Saunders, E H; Scott, D; Shough, T; Stallings, R L; Stalvey, M; Sutherland, R D; Tapia, R; Tesmer, J G; Thayer, N; Thompson, L S; Tice, H; Torney, D C; Tran-Gyamfi, M; Tsai, M; Ulanovsky, L E; Ustaszewska, A; Vo, N; White, P S; Williams, A L; Wills, P L; Wu, J; Wu, K; Yang, J; DeJong, P; Bruce, D; Doggett, N A; Deaven, L; Schmutz, J; Grimwood, J; Richardson, P; Rokhsar, D S; Eichler, E E; Gilna, P; Lucas, S M; Myers, R M; Rubin, E M; Pennacchio, L A

    2005-04-06

    Human chromosome 16 features one of the highest levels of segmentally duplicated sequence among the human autosomes. We report here the 78,884,754 base pairs of finished chromosome 16 sequence, representing over 99.9% of its euchromatin. Manual annotation revealed 880 protein-coding genes confirmed by 1,637 aligned transcripts, 19 tRNA genes, 341 pseudogenes, and 3 RNA pseudogenes. These genes include metallothionein, cadherin, and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukemia. Several large-scale structural polymorphisms spanning hundreds of kilobase pairs were identified and result in gene content differences among humans. While the segmental duplications of chromosome 16 are enriched in the relatively gene poor pericentromere of the p-arm, some are involved in recent gene duplication and conversion events likely to have had an impact on the evolution of primates and human disease susceptibility.

  16. Comparative analysis of isotropic diffusion weighted imaging sequences

    NASA Astrophysics Data System (ADS)

    Vellmer, Sebastian; Stirnberg, Rüdiger; Edelhoff, Daniel; Suter, Dieter; Stöcker, Tony; Maximov, Ivan I.

    2017-02-01

    Visualisation of living tissue structure and function is a challenging problem of modern imaging techniques. Diffusion MRI allows one to probe in vivo structures on a micrometer scale. However, conventional diffusion measurements are time-consuming procedures, because they require several measurements with different gradient directions. Considerable time savings are therefore possible by measurement schemes that generate an isotropic diffusion weighting in a single shot. Multiple approaches for generating isotropic diffusion weighting are known and have become very popular as useful tools in clinical research. Thus, there is a strong need for a comprehensive comparison of different isotropic weighting approaches. In the present work we introduce two new sequences based on simple (co)sine modulations and compare their performance to established q-space magic-angle spinning sequences and conventional DTI, using a diffusion phantom assembled from microcapillaries and in vivo experiments at 7 T. The advantages and disadvantages of all compared schemes are demonstrated and discussed.

  17. mitoSAVE: mitochondrial sequence analysis of variants in Excel.

    PubMed

    King, Jonathan L; Sajantila, Antti; Budowle, Bruce

    2014-09-01

    The mitochondrial genome (mtGenome) contains genetic information amenable to numerous applications such as medical research, population and evolutionary studies, and human identity testing. However, inconsistent nomenclature assignment makes haplotype comparison difficult and can lead to false exclusion of potentially useful profiles. Massively Parallel Sequencing (MPS) is a platform for sequencing large datasets and potentially whole populations with relative ease. However, the data generated are not easily parsed and interpreted. With this in mind, mitoSAVE has been developed to enable fast conversion of Variant Call Format (VCF) files. mitoSAVE is an Excel-based workbook that converts data within the VCF into mtDNA haplotypes using phylogenetically-established nomenclature as well as rule-based alignments consistent with current forensic standards. mitoSAVE is formatted for the human mitochondrial genome; however, it can easily be adapted to support other reasonably small genomes.
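
    The VCF-to-haplotype conversion described above can be illustrated with a minimal Python sketch. This is not mitoSAVE's implementation (which is an Excel workbook): the function name `vcf_to_haplotype`, the sample records, and the simplified labelling (substitutions written as position plus variant base, everything else passed through as `POS REF>ALT`) are assumptions made for illustration, and the phylogenetic and rule-based alignment steps are left out entirely.

```python
def vcf_to_haplotype(vcf_text):
    """Turn VCF records into short variant labels, sorted by position.

    Illustrative only: single-base substitutions become e.g. "73G";
    insertions and deletions are passed through unnormalised.
    """
    labels = []
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/meta lines
        fields = line.split("\t")
        pos, ref, alt = int(fields[1]), fields[3], fields[4]
        for allele in alt.split(","):  # ALT may list several alleles
            if len(ref) == 1 and len(allele) == 1:
                labels.append((pos, f"{pos}{allele}"))
            else:
                labels.append((pos, f"{pos}{ref}>{allele}"))
    return [label for _, label in sorted(labels)]


# Hypothetical two-record VCF, deliberately out of order.
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]),
    "\t".join(["chrM", "263", ".", "A", "G", ".", "PASS", "."]),
    "\t".join(["chrM", "73", ".", "A", "G", ".", "PASS", "."]),
])

print(vcf_to_haplotype(EXAMPLE_VCF))  # ['73G', '263G']
```

    A real conversion would also need to normalise indels against the reference, which is where nomenclature conventions start to matter.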

  18. K-mer natural vector and its application to the phylogenetic analysis of genetic sequences

    PubMed Central

    Wen, Jia; Chan, Raymond H.; Yau, Shek-Chung; He, Rong L.; Yau, Stephen S. T.

    2014-01-01

    Based on the well-known k-mer model, we propose a k-mer natural vector model for representing a genetic sequence based on the numbers and distributions of k-mers in the sequence. We show that there exists a one-to-one correspondence between a genetic sequence and its associated k-mer natural vector. The k-mer natural vector method can be easily and quickly used to perform phylogenetic analysis of genetic sequences without requiring evolutionary models or human intervention. Whole or partial genomes can be handled more effectively with our proposed method. It is applied to the phylogenetic analysis of genetic sequences, and the results obtained fully demonstrate that the k-mer natural vector method is a very powerful tool for analysing and annotating genetic sequences and determining evolutionary relationships both in terms of accuracy and efficiency. PMID:24858075
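
    As a rough illustration of representing a sequence by the "numbers and distributions of k-mers", the sketch below records, for each k-mer present, its count, its mean start position, and a scaled second central moment of its positions. The function name and the scaling of the second moment by count times number of windows are assumptions borrowed from the usual natural-vector construction; the abstract does not give the exact definition, and k-mers that never occur are simply omitted here.

```python
from collections import defaultdict


def kmer_natural_vector(seq, k):
    """Return {kmer: (count, mean_position, scaled_second_moment)}."""
    positions = defaultdict(list)
    n_windows = len(seq) - k + 1
    for i in range(n_windows):
        positions[seq[i:i + k]].append(i + 1)  # 1-based start position
    vector = {}
    for kmer, pos in positions.items():
        n = len(pos)
        mu = sum(pos) / n
        # Assumed normalisation: divide by count * number of windows.
        d2 = sum((p - mu) ** 2 for p in pos) / (n * n_windows)
        vector[kmer] = (n, mu, d2)
    return vector


vec = kmer_natural_vector("ACGTAC", 2)
print(vec["AC"])  # count 2, mean position 3.0, second moment 0.8
```

    Two sequences could then be compared by a distance between their vectors, with absent k-mers treated as zero entries.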

  19. Sequence analysis and structural implications of rotavirus capsid proteins.

    PubMed

    Parbhoo, N; Dewar, J B; Gildenhuys, S

    Rotavirus is the major cause of severe virus-associated gastroenteritis worldwide in children aged 5 and younger. Many children lose their lives annually due to this infection and the impact is particularly pronounced in developing countries. The mature rotavirus is a non-enveloped triple-layered nucleocapsid containing 11 double stranded RNA segments. Here a global view on the sequence and structure of the three main capsid proteins, VP2, VP6 and VP7 is shown by generating a consensus sequence for each of these rotavirus proteins, for each species obtained from published data of representative rotavirus genotypes from across the world and across species. Degree of conservation between species was represented on homology models for each of the proteins. VP7 shows the highest level of variation with 14-45 amino acids showing conservation of less than 60%. These changes are localised to the outer surface, pointing to a possible mechanism for evading the immune system. The middle layer, VP6 shows lower variability with only 14-32 sites having lower than 70% conservation. The inner structural layer made up of VP2 showed the lowest variability with only 1-16 sites having less than 70% conservation across species. The results correlate with each protein's multiple structural roles in the infection cycle. Thus, although the nucleotide sequences vary due to the error-prone nature of replication and lack of proofreading, the corresponding amino acid sequences of VP2, VP6 and VP7 remain relatively conserved. Benefits of this knowledge about the conservation include the ability to target proteins at sites that cannot undergo mutational changes without influencing viral fitness, as well as the possibility to study systems that are highly evolved for structure and function in order to determine how to generate and manipulate such systems for use in various biotechnological applications.
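
    The consensus-and-conservation step can be sketched in a few lines of Python. This is a generic illustration, not the authors' pipeline: the function names and the four toy peptides are made up, the input is assumed to be an already aligned set of equal-length sequences, and conservation is taken to mean the fraction of sequences sharing the most common residue at a site.

```python
from collections import Counter


def column_conservation(alignment):
    """Return (consensus, per-site fraction matching the consensus residue)."""
    consensus, conservation = [], []
    for column in zip(*alignment):  # iterate over alignment columns
        residue, count = Counter(column).most_common(1)[0]
        consensus.append(residue)
        conservation.append(count / len(column))
    return "".join(consensus), conservation


def variable_sites(conservation, threshold):
    """1-based positions whose conservation falls below the threshold."""
    return [i + 1 for i, c in enumerate(conservation) if c < threshold]


# Toy aligned peptides (hypothetical, not rotavirus sequences).
toy = ["MKVL", "MKIL", "MRIL", "MKAL"]
consensus, cons = column_conservation(toy)
print(consensus)                   # MKIL
print(variable_sites(cons, 0.70))  # [3]
```

    Counting the positions returned for a 60% or 70% threshold gives the kind of per-protein site tallies quoted in the abstract.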

  20. Sarment: Python modules for HMM analysis and partitioning of sequences.

    PubMed

    Guéguen, Laurent

    2005-08-15

    Sarment is a package of Python modules for easy building and manipulation of sequence segmentations. It provides efficient implementation of usual algorithms for hidden Markov Model computation, as well as for maximal predictive partitioning. Owing to its very large variety of criteria for computing segmentations, Sarment can handle many kinds of models. Because of object-oriented programming, the results of the segmentation are very easy to manipulate.
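
    For readers unfamiliar with HMM-based segmentation, the following plain-Python Viterbi sketch shows the general idea of labelling each position of a sequence with its most probable hidden state. It does not use or reproduce Sarment's API; the function, the two-state "AT-rich"/"GC-rich" model, and all probabilities are hypothetical values chosen for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable state path for a non-empty observation sequence (log-space)."""
    score = {s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}
    back = []  # one dict of back-pointers per step after the first
    for obs in observations[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + math.log(trans_p[r][s]))
            score[s] = (prev[best] + math.log(trans_p[best][s])
                        + math.log(emit_p[s][obs]))
            pointers[s] = best
        back.append(pointers)
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):  # trace back from the final state
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical two-state model: states tend to persist (0.9) and
# each favours its own pair of bases (0.4 vs 0.1).
STATES = ("AT", "GC")
START = {"AT": 0.5, "GC": 0.5}
TRANS = {"AT": {"AT": 0.9, "GC": 0.1}, "GC": {"AT": 0.1, "GC": 0.9}}
EMIT = {"AT": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
        "GC": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4}}

print(viterbi("AATTGGCC", STATES, START, TRANS, EMIT))
# ['AT', 'AT', 'AT', 'AT', 'GC', 'GC', 'GC', 'GC']
```

    Runs of identical states in the returned path correspond to segments of the sequence.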

  1. Analysis of the 2012 Oct 27 Haida Gwaii Aftershock Sequence

    NASA Astrophysics Data System (ADS)

    Mulder, T.; Brillon, C.; Bentkowski, W.; White, M.; Rosenberger, A.; Rogers, G. C.; Vernon, F.; Kao, H.

    2013-12-01

    The magnitude 7.7 thrust earthquake that occurred on 2012 Oct 28 offshore of Haida Gwaii (formerly the Queen Charlotte Islands), in British Columbia, Canada, produced a rich and on-going aftershock sequence. Ten months of aftershock events are determined from analyst reviewed solutions and automatic detectors and locators. For automated solutions, rotating the waveforms and running P and S wave filters (Rosenberger, 2010) over them produced phase arrivals for an improved catalogue of aftershocks compared to using a traditional signal to noise ratio detector on standard vertical and horizontal component seismograms. The automated aftershock locations from the rotated waveforms are compared to the automated locations from the standard vertical and horizontal waveforms and to analyst locations (which are generally M>2.5). The best of the automated solutions are comparable in quality to analyst solutions and much more numerous making this a viable method of processing extensive aftershock sequences. They outline a region approximately 50 km wide and 100 km long, with the aftershocks in two parallel bands. Most of the aftershocks are not on the rupture surface but are in the overlying or underlying plates. It is thought that this earthquake represents the Pacific plate thrusting underneath the North America plate with the rupture surface lying beneath the sedimentary Queen Charlotte terrace and terminating to the east in the vicinity of the Queen Charlotte fault. Due to the one-sided station distribution on land, depth trades off with distance offshore, resulting in poor depth determinations. However, using ocean bottom seismometers deployed early in the aftershock sequence, depth resolution was significantly improved. First motion focal mechanisms for a portion of the aftershock sequence are compared

  2. Hydrophobic-cluster analysis of plant protein sequences. A domain homology between storage and lipid-transfer proteins.

    PubMed Central

    Henrissat, B; Popineau, Y; Kader, J C

    1988-01-01

    Hydrophobic-cluster analysis was used to characterize a conserved domain located near the C-terminal amino acid sequence of wheat (Triticum aestivum) storage proteins. This domain was transformed into a linear template for a global search for similarities in over 5200 protein sequences. In addition to proteins that had already been found to exhibit homology to wheat storage proteins, a previously unreported homology was found with non-specific lipid-transfer proteins from castor bean (Ricinus communis) and from spinach (Spinacia oleracea) leaf. Hydrophobic-cluster analysis of various members of the present protein group clearly shows a typical domain structure where (i) variable and conserved domains are located along the sequence at precise positions, (ii) the conserved domains probably reflect a common ancestor, and (iii) the unique properties of a given protein (chain cut into subunits, repetitive domains, trypsin-inhibitor active site) are associated with the variable domains. PMID:3214430

  3. Large-scale analysis of intrinsic disorder flavors and associated functions in the protein sequence universe.

    PubMed

    Necci, Marco; Piovesan, Damiano; Tosatto, Silvio C E

    2016-12-01

    Intrinsic disorder (ID) in proteins has been extensively described for the last decade, yet a large-scale classification of ID in proteins is mostly missing. Here, we provide an extensive analysis of ID in the protein universe on the UniProt database derived from sequence-based predictions in MobiDB. Almost half the sequences contain an ID region of at least five residues. About 9% of proteins have a long ID region of over 20 residues; such regions are more abundant in Eukaryotic organisms and most frequently cover less than 20% of the sequence. A small subset of about 67,000 (out of over 80 million) proteins is fully disordered and mostly found in Viruses. Most proteins have only one ID, with short ID evenly distributed along the sequence and long ID overrepresented in the center. The charged residue composition of Das and Pappu was used to classify ID proteins by structural propensities and corresponding functional enrichment. Swollen Coils seem to be used mainly as structural components and in biosynthesis in both Prokaryotes and Eukaryotes. In Bacteria, they are confined in the nucleoid and in Viruses provide DNA binding function. Coils & Hairpins seem to be specialized in ribosome binding and methylation activities. Globules & Tadpoles bind antigens in Eukaryotes but are involved in killing other organisms and cytolysis in Bacteria. The Undefined class is used by Bacteria to bind toxic substances and mediate transport and movement between and within organisms in Viruses. Fully disordered proteins behave similarly, but are enriched for glycine residues and extracellular structures.
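
    The region statistics quoted above (ID regions of at least five residues, long regions of over 20 residues, fraction of the sequence covered, fully disordered proteins) can be derived from a per-residue prediction with a short Python sketch. The mask encoding (`D` for disordered, `S` for structured), the function names, and the example mask are assumptions for illustration and do not reflect MobiDB's actual data format.

```python
from itertools import groupby


def disorder_regions(mask, min_len=5):
    """1-based inclusive (start, end) spans of 'D' runs of at least min_len."""
    regions, pos = [], 0
    for flag, run in groupby(mask):
        length = len(list(run))
        if flag == "D" and length >= min_len:
            regions.append((pos + 1, pos + length))
        pos += length
    return regions


def summarize(mask, long_len=20):
    """Summary statistics for one protein's disorder mask."""
    regions = disorder_regions(mask)
    covered = sum(end - start + 1 for start, end in regions)
    return {
        "n_regions": len(regions),
        "has_long_region": any(e - s + 1 > long_len for s, e in regions),
        "fraction_disordered": covered / len(mask),
        "fully_disordered": covered == len(mask),
    }


# Hypothetical 50-residue mask: one 6-residue region, one 3-residue
# run that is too short to count as a region.
mask = "S" * 10 + "D" * 6 + "S" * 30 + "D" * 3 + "S"
print(disorder_regions(mask))  # [(11, 16)]
print(summarize(mask)["fraction_disordered"])  # 0.12
```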

  4. A systematic review and meta-analysis of experimental clinical evidence on initial aligning archwires and archwire sequences.

    PubMed

    Papageorgiou, S N; Konstantinidis, I; Papadopoulou, K; Jäger, A; Bourauel, C

    2014-11-01

    The aim of the study was to assess treatment effects and potential side effects of different archwires used on patients receiving orthodontic therapy. Electronic and manual unrestricted searches were conducted in 19 databases including MEDLINE, Cochrane Library, and Google Scholar until April 2012 to identify randomized controlled trials (RCTs) and quasi-RCTs. After duplicate study selection, data extraction, risk of bias assessment with the Cochrane risk of bias tool, and narrative analysis, mean differences (MDs) with confidence intervals (CIs) of similar studies were pooled using a random-effects model and evaluated with GRADE. A total of 16 RCTs were included assessing different archwire characteristics on 1108 patients. Regarding initial archwires, meta-analysis of two trials found slightly greater irregularity correction with an austenitic-active nickel-titanium (NiTi) compared with a martensitic-stabilized NiTi archwire (corresponding to MD: 1.11 mm, 95% CI: -0.38 to 2.61). Regarding archwire sequences, meta-analysis of two trials found it took patients treated with a sequence of martensitic-active copper-nickel-titanium (CuNiTi) slightly longer to reach the working archwire (MD: 0.54 months, 95% CI: -0.87 to 1.95) compared with a martensitic-stabilized NiTi sequence. However, patients treated with a sequence of martensitic-active CuNiTi archwires reported generally greater pain intensity on the Likert scale 4 h and 1 day after placement of each archwire, compared with a martensitic-stabilized NiTi sequence. Although confidence in effect estimates ranged from moderate to high, meta-analyses could be performed only for limited comparisons, while inconsistency might pose a threat to some of them. At this point, there is insufficient data to make recommendations about the majority of initial archwires or for a specific archwire sequence.
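
    Pooling mean differences under a random-effects model can be sketched as follows. The abstract does not say which estimator was used, so the DerSimonian-Laird method shown here is an assumption, as are the function name and the input numbers, which are invented and are not the trial data.

```python
import math


def random_effects_pool(effects, variances):
    """DerSimonian-Laird pooling: returns (estimate, 95% CI, tau squared)."""
    w = [1.0 / v for v in variances]  # inverse-variance weights
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2


# Two hypothetical trials: mean differences 0.0 and 2.0, each with variance 0.5.
pooled, ci, tau2 = random_effects_pool([0.0, 2.0], [0.5, 0.5])
print(pooled, tau2)  # 1.0 1.5
```

    With only two trials, as in the comparisons above, the between-study variance is poorly estimated, which is one reason a pooled interval can be wide enough to include zero.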

  5. Molecular Identification of Two Strains of Phellinus sp. by Internal Transcribed Spacer Sequence Analysis

    PubMed Central

    2011-01-01

    Two species of cultivated Phellinus sp. were identified as P. baumii by internal transcribed spacer (ITS) sequence analysis. The fruit bodies of the examined strains were similar to those of naturally occurring strains, having a bracket-like form, yellow-to-orange color, and poroid hymenial surfaces. The DNA sequences of ITS region of both strains showed a homology of 99% with ITS1 to ITS2 sequences of P. (Inonotus) baumii strain PB0806. PMID:22783119

  6. Analysis of xylem formation in pine by cDNA sequencing

    NASA Technical Reports Server (NTRS)

    Allona, I.; Quinn, M.; Shoop, E.; Swope, K.; St Cyr, S.; Carlis, J.; Riedl, J.; Retzel, E.; Campbell, M. M.; Sederoff, R.; Whetten, R. W.; Davies, E. (Principal Investigator)

    1998-01-01

    Secondary xylem (wood) formation is likely to involve some genes expressed rarely or not at all in herbaceous plants. Moreover, environmental and developmental stimuli influence secondary xylem differentiation, producing morphological and chemical changes in wood. To increase our understanding of xylem formation, and to provide material for comparative analysis of gymnosperm and angiosperm sequences, ESTs were obtained from immature xylem of loblolly pine (Pinus taeda L.). A total of 1,097 single-pass sequences were obtained from 5' ends of cDNAs made from gravistimulated tissue from bent trees. Cluster analysis detected 107 groups of similar sequences, ranging in size from 2 to 20 sequences. A total of 361 sequences fell into these groups, whereas 736 sequences were unique. About 55% of the pine EST sequences show similarity to previously described sequences in public databases. About 10% of the recognized genes encode factors involved in cell wall formation. Sequences similar to cell wall proteins, most known lignin biosynthetic enzymes, and several enzymes of carbohydrate metabolism were found. A number of putative regulatory proteins also are represented. Expression patterns of several of these genes were studied in various tissues and organs of pine. Sequencing novel genes expressed during xylem formation will provide a powerful means of identifying mechanisms controlling this important differentiation pathway.

  7. ALVIS: interactive non-aggregative visualization and explorative analysis of multiple sequence alignments.

    PubMed

    Schwarz, Roland F; Tamuri, Asif U; Kultys, Marek; King, James; Godwin, James; Florescu, Ana M; Schultz, Jörg; Goldman, Nick

    2016-05-05

    Sequence Logos and their variants are the most commonly used methods for visualization of multiple sequence alignments (MSAs) and sequence motifs. They provide consensus-based summaries of the sequences in the alignment. Consequently, individual sequences cannot be identified in the visualization and covariant sites are not easily discernible. We recently proposed Sequence Bundles, a motif visualization technique that maintains a one-to-one relationship between sequences and their graphical representation and visualizes covariant sites. We here present Alvis, an open-source platform for the joint explorative analysis of MSAs and phylogenetic trees, employing Sequence Bundles as its main visualization method. Alvis combines the power of the visualization method with an interactive toolkit allowing detection of covariant sites, annotation of trees with synapomorphies and homoplasies, and motif detection. It also offers numerical analysis functionality, such as dimension reduction and classification. Alvis is user-friendly, highly customizable and can export results in publication-quality figures. It is available as a full-featured standalone version (http://www.bitbucket.org/rfs/alvis) and its Sequence Bundles visualization module is further available as a web application (http://science-practice.com/projects/sequence-bundles).

  8. Mutational and nucleotide sequence analysis of S-adenosyl-L-homocysteine hydrolase from Rhodobacter capsulatus.

    PubMed Central

    Sganga, M W; Aksamit, R R; Cantoni, G L; Bauer, C E

    1992-01-01

    The genetic locus ahcY, encoding the enzyme S-adenosyl-L-homocysteine hydrolase (EC 3.3.1.1) from the bacterium Rhodobacter capsulatus, has been mapped by mutational analysis to within a cluster of genes involved in regulating the induction and maintenance of the bacterial photosynthetic apparatus. Sequence analysis demonstrates that ahcY encodes a 51-kDa polypeptide that displays 64% sequence identity to its human homolog. Insertion mutants in ahcY lack detectable S-adenosyl-L-homocysteine hydrolase activity and, as a consequence, S-adenosyl-L-homocysteine accumulates in the cells, resulting in a 16-fold decrease in the intracellular ratio of S-adenosyl-L-methionine to S-adenosyl-L-homocysteine as compared to wild-type cells. The ahcY disrupted strain fails to grow in minimal medium; however, growth is restored in minimal medium supplemented with methionine or homocysteine or in a complex medium, thereby indicating that the hydrolysis of S-adenosyl-L-homocysteine plays a key role in the metabolism of sulfur-containing amino acids. The ahcY mutant, when grown in supplemented medium, synthesizes significantly reduced levels of bacteriochlorophyll, indicating that modulation of the intracellular ratio of S-adenosyl-L-methionine to S-adenosyl-L-homocysteine may be an important factor in regulating bacteriochlorophyll biosynthesis. PMID:1631127

  9. Prediction of human rotavirus serotype by nucleotide sequence analysis of the VP7 protein gene.

    PubMed Central

    Green, K Y; Sears, J F; Taniguchi, K; Midthun, K; Hoshino, Y; Gorziglia, M; Nishikawa, K; Urasawa, S; Kapikian, A Z; Chanock, R M

    1988-01-01

    Human rotavirus field isolates were characterized by direct sequence analysis of the gene encoding the serotype-specific major neutralization protein (VP7). Single-stranded RNA transcripts were prepared from virus particles obtained directly from stool specimens or after two or three passages in MA-104 cells. Two regions of the gene (nucleotides 307 through 351 and 670 through 711) which had previously been shown to contain regions of sequence divergence among rotavirus serotypes were sequenced by the dideoxynucleotide method with two different synthetic oligonucleotide primers. The resulting nucleotide sequences were compared with the corresponding sequences from rotaviruses of known serotype (serotype 1, 2, 3, or 4). A total of 25 field isolates and 10 laboratory strains examined by this method exhibited marked sequence identity in both areas of the gene with the corresponding regions of 1 of the 4 reference strains. In addition, the predicted serotype from the sequence analysis correlated in each case with the serotype determined when the rotaviruses were examined by plaque reduction neutralization or reactivity with serotype-specific monoclonal antibodies. These data suggest that as a result of the high degree of sequence conservation observed among rotaviruses of the same serotype, it is possible to predict the serotype of a rotavirus isolate by direct sequence analysis of its VP7 gene. PMID:2833626
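
    The comparison logic described above, matching two divergent gene regions of an isolate against the same regions of reference strains of known serotype, can be sketched in Python. The function names are hypothetical and the short sequences below are made-up placeholders, not real VP7 sequences; the sketch assumes the regions are already aligned and of equal length, and it uses mean percent identity as the score, which is a simplification of "marked sequence identity in both areas".

```python
def percent_identity(a, b):
    """Percentage of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def predict_serotype(isolate_regions, references):
    """Reference serotype with the highest mean identity over all regions."""
    def mean_identity(ref_regions):
        scores = [percent_identity(i, r)
                  for i, r in zip(isolate_regions, ref_regions)]
        return sum(scores) / len(scores)
    return max(references, key=lambda s: mean_identity(references[s]))


# Placeholder reference regions keyed by serotype (invented sequences).
REFERENCES = {
    1: ("ACGTACGTAC", "TTGACCGTAA"),
    2: ("ACGAACTTAC", "TTCACGGTTA"),
}
isolate = ("ACGTACGTAC", "TTGACCGTAT")

print(predict_serotype(isolate, REFERENCES))  # 1
```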

  10. Integrated next-generation sequencing analysis of whole exome and 409 cancer-related genes.

    PubMed

    Shimoda, Yuji; Nagashima, Takeshi; Urakami, Kenichi; Tanabe, Tomoe; Saito, Junko; Naruoka, Akane; Serizawa, Masakuni; Mochizuki, Tohru; Ohshima, Keiichi; Ohnami, Sumiko; Ohnami, Shumpei; Kusuhara, Masatoshi; Yamaguchi, Ken

    2016-01-01

    The use of next-generation sequencing (NGS) techniques to analyze the genomes of cancer cells has identified numerous genomic alterations, including single-base substitutions, small insertions and deletions, amplification, recombination, and epigenetic modifications. NGS contributes to the clinical management of patients as well as new discoveries that identify the mechanisms of tumorigenesis. Moreover, analysis of gene panels targeting actionable mutations enhances efforts to optimize the selection of chemotherapeutic regimens. However, whole genome sequencing takes several days and costs at least $10,000, depending on sequence coverage. Therefore, laboratories with relatively limited resources must employ a more economical approach. For this purpose, we conducted an integrated nucleotide sequence analysis of a panel of 409 cancer-related genes (409-CRG) combined with whole exome sequencing (WES). Analysis of the 409-CRG panel detected low-frequency variants with high sensitivity, and WES identified moderate and high frequency somatic variants as well as germline variants.

  11. Hydrophobic cluster analysis: procedures to derive structural and functional information from 2-D-representation of protein sequences.

    PubMed

    Lemesle-Varloot, L; Henrissat, B; Gaboriaud, C; Bissery, V; Morgat, A; Mornon, J P

    1990-08-01

    Hydrophobic cluster analysis (HCA) is a very efficient method to analyse and compare protein sequences. Despite its effectiveness, this method is not widely used because it relies in part on the experience and training of the user. In this article, detailed guidelines as to the use of HCA are presented and include discussions on: the definition of the hydrophobic clusters and their relationships with secondary and tertiary structures; the length of the clusters; the amino acid classification used for HCA; the HCA plot programs; and the working strategies. Various procedures for the analysis of a single sequence are presented: structural segmentation, structural domains and secondary structure evaluation. Like most sequence analysis methods, HCA is more efficient when several homologous sequences are compared. Procedures for the detection and alignment of distantly related proteins by HCA are described through several published examples along with 2 previously unreported cases: the beta-glucosidase from Ruminococcus albus is clearly related to the beta-glucosidases from Clostridium thermocellum and Hansenula anomala although they display a reverse organization of their constitutive domains; the alignment of the sequence of human GTPase activating protein with that of the Crk oncogene is presented. Finally, the pertinence of HCA in the identification of important residues for structure/function as well as in the preparation of homology modelling is discussed.

  12. DNA sequences that activate isocitrate lyase gene expression during late embryogenesis and during postgerminative growth.

    PubMed Central

    Zhang, J Z; Santes, C M; Engel, M L; Gasser, C S; Harada, J J

    1996-01-01

    We analyzed DNA sequences that regulate the expression of an isocitrate lyase gene from Brassica napus L. during late embryogenesis and during postgerminative growth to determine whether glyoxysomal function is induced by a common mechanism at different developmental stages. beta-Glucuronidase constructs were used both in transient expression assays in B. napus and in transgenic Arabidopsis thaliana to identify the segments of the isocitrate lyase 5' flanking region that influence promoter activity. DNA sequences that play the principal role in activating the promoter during post-germinative growth are located more than 1,200 bp upstream of the gene. Distinct DNA sequences that were sufficient for high-level expression during late embryogenesis but only low-level expression during postgerminative growth were also identified. Other parts of the 5' flanking region increased promoter activity both in developing seed and in seedlings. We conclude that a combination of elements is involved in regulating the isocitrate lyase gene and that distinct DNA sequences play primary roles in activating the gene in embryos and in seedlings. These findings suggest that different signals contribute to the induction of glyoxysomal function during these two developmental stages. We also showed that some of the constructs were expressed differently in transient expression assays and in transgenic plants. PMID:8934622

  13. Object relations theory and activity theory: a proposed link by way of the procedural sequence model.

    PubMed

    Ryle, A

    1991-12-01

    An account of object relations theory (ORT), represented in terms of the procedural sequence model (PSM), is compared to the ideas of Vygotsky and activity theory (AT). The two models are seen to be compatible and complementary and their combination offers a satisfactory account of human psychology, appropriate for the understanding and integration of psychotherapy.

  14. Complete Genome Sequence of a Staphylococcus epidermidis Strain with Exceptional Antimicrobial Activity

    PubMed Central

    Lassen, Simon B.; Lomholt, Hans B.

    2017-01-01

    ABSTRACT Staphylococcus epidermidis is a Gram-positive bacterium that is prevalent on human skin. The species is associated with skin health, as well as with opportunistic infections. Here, we report the complete genome sequence of S. epidermidis 14.1.R1, isolated from human skin. In bacterial interference assays, the strain showed exceptional antimicrobial activity. PMID:28280007

  15. Functional Brain Activation Differences in Stuttering Identified with a Rapid fMRI Sequence

    ERIC Educational Resources Information Center

    Loucks, Torrey; Kraft, Shelly Jo; Choo, Ai Leen; Sharma, Harish; Ambrose, Nicoline G.

    2011-01-01

    The purpose of this study was to investigate whether brain activity related to the presence of stuttering can be identified with rapid functional MRI (fMRI) sequences that involved overt and covert speech processing tasks. The long-term goal is to develop sensitive fMRI approaches with developmentally appropriate tasks to identify deviant speech…

  16. Prompt-Gamma Activation Analysis.

    PubMed

    Lindstrom, Richard M

    1993-01-01

    A permanent, full-time instrument for prompt-gamma activation analysis is nearing completion as part of the Cold Neutron Research Facility (CNRF). The design of the analytical system has been optimized for high gamma detection efficiency and low background, particularly for hydrogen. Because of the purity of the neutron beam, shielding requirements are modest and the scatter-capture background is low. As a result of a compact sample-detector geometry, the sensitivity (counting rate per gram of analyte) is a factor of four better than the existing Maryland-NIST thermal-neutron instrument at this reactor. Hydrogen backgrounds of a few micrograms have already been achieved, which promises to be of value in numerous applications where quantitative nondestructive analysis of small quantities of hydrogen in materials is necessary.

  17. Non-invasive analysis of acquired resistance to cancer therapy by sequencing of plasma DNA.

    PubMed

    Murtaza, Muhammed; Dawson, Sarah-Jane; Tsui, Dana W Y; Gale, Davina; Forshew, Tim; Piskorz, Anna M; Parkinson, Christine; Chin, Suet-Feung; Kingsbury, Zoya; Wong, Alvin S C; Marass, Francesco; Humphray, Sean; Hadfield, James; Bentley, David; Chin, Tan Min; Brenton, James D; Caldas, Carlos; Rosenfeld, Nitzan

    2013-05-02

    Cancers acquire resistance to systemic treatment as a result of clonal evolution and selection. Repeat biopsies to study genomic evolution as a result of therapy are difficult, invasive and may be confounded by intra-tumour heterogeneity. Recent studies have shown that genomic alterations in solid cancers can be characterized by massively parallel sequencing of circulating cell-free tumour DNA released from cancer cells into plasma, representing a non-invasive liquid biopsy. Here we report sequencing of cancer exomes in serial plasma samples to track genomic evolution of metastatic cancers in response to therapy. Six patients with advanced breast, ovarian and lung cancers were followed over 1-2 years. For each case, exome sequencing was performed on 2-5 plasma samples (19 in total) spanning multiple courses of treatment, at selected time points when the allele fraction of tumour mutations in plasma was high, allowing improved sensitivity. For two cases, synchronous biopsies were also analysed, confirming genome-wide representation of the tumour genome in plasma. Quantification of allele fractions in plasma identified increased representation of mutant alleles in association with emergence of therapy resistance. These included an activating mutation in PIK3CA (phosphatidylinositol-4,5-bisphosphate 3-kinase, catalytic subunit alpha) following treatment with paclitaxel; a truncating mutation in RB1 (retinoblastoma 1) following treatment with cisplatin; a truncating mutation in MED1 (mediator complex subunit 1) following treatment with tamoxifen and trastuzumab, and following subsequent treatment with lapatinib, a splicing mutation in GAS6 (growth arrest-specific 6) in the same patient; and a resistance-conferring mutation in EGFR (epidermal growth factor receptor; T790M) following treatment with gefitinib. These results establish proof of principle that exome-wide analysis of circulating tumour DNA could complement current invasive biopsy approaches to identify

  18. A Markovian analysis of bacterial genome sequence constraints

    PubMed Central

    Skewes, Aaron D.

    2013-01-01

    The arrangement of nucleotides within a bacterial chromosome is influenced by numerous factors. The degeneracy of the third codon within each reading frame allows some flexibility of nucleotide selection; however, the third nucleotide in the triplet of each codon is at least partly determined by the preceding two. This is most evident in organisms with a strong G + C bias, as the degenerate codon must contribute disproportionately to maintaining that bias. Therefore, a correlation exists between the first two nucleotides and the third in all open reading frames. If the arrangement of nucleotides in a bacterial chromosome is represented as a Markov process, we would expect that the correlation would be completely captured by a second-order Markov model and an increase in the order of the model (e.g., third-, fourth-…order) would not capture any additional uncertainty in the process. In this manuscript, we present the results of a comprehensive study of the Markov property that exists in the DNA sequences of 906 bacterial chromosomes. All of the 906 bacterial chromosomes studied exhibit a statistically significant Markov property that extends beyond second-order, and therefore cannot be fully explained by codon usage. An unrooted tree containing all 906 bacterial chromosomes based on their transition probability matrices of third-order shares ∼25% similarity to a tree based on sequence homologies of 16S rRNA sequences. This congruence to the 16S rRNA tree is greater than for trees based on lower-order models (e.g., second-order), and higher-order models result in diminishing improvements in congruence. A nucleotide correlation most likely exists within every bacterial chromosome that extends past three nucleotides. This correlation places significant limits on the number of nucleotide sequences that can represent probable bacterial chromosomes. Transition matrix usage is largely conserved by taxa, indicating that this property is likely inherited, however some
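    The idea of a k-th order Markov model of a DNA sequence, as used in the abstract above, can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `transition_probabilities` and the toy sequence are assumptions made for illustration, and a real analysis would run over whole chromosomes and add a statistical test for the order.

```python
from collections import Counter, defaultdict


def transition_probabilities(seq: str, order: int) -> dict:
    """Estimate P(next base | preceding `order` bases) from one sequence.

    Hypothetical helper for illustration; not taken from the paper.
    """
    counts = defaultdict(Counter)
    for i in range(len(seq) - order):
        context = seq[i:i + order]      # the preceding `order` nucleotides
        nxt = seq[i + order]            # the nucleotide that follows them
        counts[context][nxt] += 1
    # Normalise each row of counts into a probability distribution.
    return {
        ctx: {base: n / sum(row.values()) for base, n in row.items()}
        for ctx, row in counts.items()
    }


# Toy sequence (made up); second-order model, i.e. two bases of context.
toy = "ATGGCGATGGCCATGGCG"
probs = transition_probabilities(toy, 2)
print(probs["AT"])   # "AT" is always followed by G in the toy sequence
print(probs["GC"])   # "GC" is followed by G twice and by C once
```

    Raising `order` to 3 or more gives the higher-order transition matrices that the abstract compares against the second-order model.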

  19. Object-oriented data handler for sequence analysis software development.

    PubMed

    Ptitsyn, A A; Grigorovich, D A

    1995-12-01

    We report an object-oriented data handler and supplementary tools for the development of molecular genetics application software for various sequence analyses. Our data handler has a flexible and expandable format that supports the most common data types for molecular genetic software. New data types can be constructed in an object-oriented manner from the basic units. The data handler includes an object library, a format-converting program and a viewer that can visualize simultaneously the data contained in several files to construct a general picture from separate data. This software has been implemented on an IBM PC-compatible personal computer.

  20. Pentaprobe: a comprehensive sequence for the one-step detection of DNA-binding activities.

    PubMed

    Kwan, Ann H Y; Czolij, Robert; Mackay, Joel P; Crossley, Merlin

    2003-10-15

    The rapid increase in the number of novel proteins identified in genome projects necessitates simple and rapid methods for assigning function. We describe a strategy for determining whether novel proteins possess typical sequence-specific DNA-binding activity. Many proteins bind recognition sequences of 5 bp or less. Given that there are 4^5 possible 5 bp sites, one might expect the length of sequence required to cover all possibilities would be 4^5 × 5, or 5120 nt. But by allowing overlaps, utilising both strands and using a computer algorithm to generate the minimum sequence, we find the length required is only 516 base pairs. We generated this sequence as six overlapping double-stranded oligonucleotides, termed pentaprobe, and used it in gel retardation experiments to assess DNA binding by both known and putative DNA-binding proteins from several protein families. We have confirmed binding by the zinc finger proteins BKLF, Eos and Pegasus, the Ets domain protein PU.1 and the treble clef N- and C-terminal fingers of GATA-1. We also showed that the N-terminal zinc finger domain of FOG-1 does not behave as a typical DNA-binding domain. Our results suggest that pentaprobe, and related sequences such as hexaprobe, represent useful tools for probing protein function.
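    The arithmetic in the abstract above can be checked with a short Python sketch. It only counts sites and derives a simple lower bound; it does not construct the pentaprobe sequence, and the helper names are assumptions made for illustration. The reasoning: there are 4^5 = 1024 five-base sites, each site and its reverse complement can be covered by the same double-stranded stretch, and a linear sequence of length L contains L - 4 overlapping 5-base windows.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(site: str) -> str:
    """Return the site as read on the opposite strand."""
    return site.translate(COMPLEMENT)[::-1]


def count_sites(k: int = 5):
    """Count all k-mers and the classes left after merging both strands."""
    all_sites = ["".join(p) for p in product("ACGT", repeat=k)]
    # A site and its reverse complement fall into the same class.
    classes = {min(s, reverse_complement(s)) for s in all_sites}
    return len(all_sites), len(classes)


total, distinct = count_sites(5)
naive_length = total * 5          # every site laid end to end, no overlaps
lower_bound = distinct + 5 - 1    # one new site per base after the first window
print(total, naive_length, distinct, lower_bound)   # 1024 5120 512 516
```

    The bound of 516 agrees with the length reported in the abstract; whether a sequence reaching the bound exists is what the authors' algorithm establishes, not this count.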

  1. Analysis of Comparative Sequence and Genomic Data to Verify Phylogenetic Relationship and Explore a New Subfamily of Bacterial Lipases

    PubMed Central

    Salleh, Abu Bakar; Basri, Mahiran

    2016-01-01

    Thermostable and organic solvent-tolerant enzymes have significant potential in a wide range of synthetic reactions in industry due to their inherent stability at high temperatures and their ability to endure harsh organic solvents. In this study, a novel gene encoding a true lipase was isolated by construction of a genomic DNA library of thermophilic Aneurinibacillus thermoaerophilus strain HZ into Escherichia coli plasmid vector. Sequence analysis revealed that HZ lipase had 62% identity to putative lipase from Bacillus pseudomycoides. The most closely related characterized lipases are thermostable Bacillus and Geobacillus lipases belonging to subfamily I.5, with ≤ 57% identity to HZ lipase. The amino acid sequence analysis of HZ lipase identified a conserved pentapeptide containing the active serine, GHSMG, and a Ca2+-binding motif, GCYGSD, in the enzyme. Protein structure modeling showed that HZ lipase consisted of an α/β hydrolase fold and a lid domain. Protein sequence alignment, conserved regions analysis, clustal distance matrix and amino acid composition illustrated differences between HZ lipase and other thermostable lipases. Phylogenetic analysis revealed that this lipase represented a new subfamily of family I of bacterial true lipases, classified as family I.9. The HZ lipase was expressed under promoter Plac using IPTG and was characterized. The recombinant enzyme showed optimal activity at 65°C and retained ≥ 97% activity after incubation at 50°C for 1 h. The HZ lipase was stable in various polar and non-polar organic solvents. PMID:26934700

  2. Identification of Medically Important Yeast Species by Sequence Analysis of the Internal Transcribed Spacer Regions

    PubMed Central

    Leaw, Shiang Ning; Chang, Hsien Chang; Sun, Hsiao Fang; Barton, Richard; Bouchara, Jean-Philippe; Chang, Tsung Chain

    2006-01-01

    Infections caused by yeasts have increased in previous decades due primarily to the increasing population of immunocompromised patients. In addition, infections caused by less common species such as Pichia, Rhodotorula, Trichosporon, and Saccharomyces spp. have been widely reported. This study extensively evaluated the feasibility of sequence analysis of the rRNA gene internal transcribed spacer (ITS) regions for the identification of yeasts of clinical relevance. Both the ITS1 and ITS2 regions of 373 strains (86 species), including 299 reference strains and 74 clinical isolates, were amplified by PCR and sequenced. The sequences were compared to reference data available at the GenBank database by using BLAST (basic local alignment search tool) to determine if species identification was possible by ITS sequencing. Since the GenBank database currently lacks ITS sequence entries for some yeasts, the ITS sequences of type (or reference) strains of 15 species were submitted to GenBank to facilitate identification of these species. Strains producing discrepant identifications between the conventional methods and ITS sequence analysis were further analyzed by sequencing of the D1-D2 domain of the large-subunit rRNA gene for species clarification. The rates of correct identification by ITS1 and ITS2 sequence analysis were 96.8% (361/373) and 99.7% (372/373), respectively. Of the 373 strains tested, only 1 strain (Rhodotorula glutinis BCRC 20576) could not be identified by ITS2 sequence analysis. In conclusion, identification of medically important yeasts by ITS sequencing, especially using the ITS2 region, is reliable and can be used as an accurate alternative to conventional identification methods. PMID:16517841

  3. Substrate specificity and sequence-dependent activity of the Saccharomyces cerevisiae 3-methyladenine DNA glycosylase (Mag).

    PubMed

    Lingaraju, Gondichatnahalli M; Kartalou, Maria; Meira, Lisiane B; Samson, Leona D

    2008-06-01

    DNA glycosylases initiate base excision repair by first binding, then excising aberrant DNA bases. Saccharomyces cerevisiae encodes a 3-methyladenine (3MeA) DNA glycosylase, Mag, that recognizes 3MeA and various other DNA lesions including 1,N6-ethenoadenine (epsilon A), hypoxanthine (Hx) and abasic (AP) sites. In the present study, we explore the relative substrate specificity of Mag for these lesions and in addition, show that Mag also recognizes cisplatin cross-linked adducts, but does not catalyze their excision. Through competition binding and activity studies, we show that in the context of a random DNA sequence Mag binds epsilon A and AP-sites the most tightly, followed by the cross-linked 1,2-d(ApG) cisplatin adduct. While epsilon A binding and excision by Mag was robust in this sequence context, binding and excision of Hx was extremely poor. We further studied the recognition of epsilon A and Hx by Mag, when these lesions are present at different positions within A:T and G:C tracts. Overall, epsilon A was slightly less well excised from each position within the A:T and G:C tracts compared to excision from the random sequence, whereas Hx excision was greatly increased in these sequence contexts (by up to 7-fold) compared to the random sequence. However, given most sequence contexts, Mag had a clear preference for epsilon A relative to Hx, except in the TTXTT (X=epsilon A or Hx) sequence context from which Mag removed both lesions with almost equal efficiency. We discuss how DNA sequence context affects base excision by various 3MeA DNA glycosylases.

  4. Sequencing and analysis of the gene-rich space of cowpea

    PubMed Central

    Timko, Michael P; Rushton, Paul J; Laudeman, Thomas W; Bokowiec, Marta T; Chipumuro, Edmond; Cheung, Foo; Town, Christopher D; Chen, Xianfeng

    2008-01-01

    Background Cowpea, Vigna unguiculata (L.) Walp., is one of the most important food and forage legumes in the semi-arid tropics because of its drought tolerance and ability to grow on poor quality soils. Approximately 80% of cowpea production takes place in the dry savannahs of tropical West and Central Africa, mostly by poor subsistence farmers. Despite its economic and social importance in the developing world, cowpea remains to a large extent an underexploited crop. Among the major goals of cowpea breeding and improvement programs is the stacking of desirable agronomic traits, such as disease and pest resistance and response to abiotic stresses. Implementation of marker-assisted selection and breeding programs is severely limited by a paucity of trait-linked markers and a general lack of information on gene structure and organization. With a nuclear genome size estimated at ~620 Mb, the cowpea genome is an ideal target for reduced representation sequencing. Results We report here the sequencing and analysis of the gene-rich, hypomethylated portion of the cowpea genome selectively cloned by methylation filtration (MF) technology. Over 250,000 gene-space sequence reads (GSRs) with an average length of 610 bp were generated, yielding ~160 Mb of sequence information. The GSRs were assembled, annotated by BLAST homology searches of four public protein annotation databases and four plant proteomes (A. thaliana, M. truncatula, O. sativa, and P. trichocarpa), and analyzed using various domain and gene modeling tools. A total of 41,260 GSR assemblies and singletons were annotated, of which 19,786 have unique GenBank accession numbers. Within the GSR dataset, 29% of the sequences were annotated using the Arabidopsis Gene Ontology (GO) with the largest categories of assigned function being catalytic activity and metabolic processes, groups that include the majority of cellular enzymes and components of amino acid, carbohydrate and lipid metabolism. A total of 5,888 GSRs had

  5. Factoring local sequence composition in motif significance analysis.

    PubMed

    Ng, Patrick; Keich, Uri

    2008-01-01

    We recently introduced a biologically realistic and reliable significance analysis of the output of a popular class of motif finders. In this paper we further improve our significance analysis by incorporating local base composition information. Relying on realistic biological data simulation, as well as on FDR analysis applied to real data, we show that our method is significantly better than the increasingly popular practice of using the normal approximation to estimate the significance of a finder's output. Finally we turn to leveraging our reliable significance analysis to improve the actual motif finding task. Specifically, endowing a variant of the Gibbs Sampler with our improved significance analysis we demonstrate that de novo finders can perform better than has been perceived. Significantly, our new variant outperforms all the finders reviewed in a recently published comprehensive analysis of the Harbison genome-wide binding location data. Interestingly, many of these finders incorporate additional information such as nucleosome positioning and the significance of binding data.

  6. Analysis of sequencing and scheduling methods for arrival traffic

    NASA Technical Reports Server (NTRS)

    Neuman, Frank; Erzberger, Heinz

    1990-01-01

    The air traffic control subsystem that performs scheduling is discussed. The function of the scheduling algorithms is to plan automatically the most efficient landing order and to assign optimally spaced landing times to all arrivals. Several important scheduling algorithms are described and the statistical performance of the scheduling algorithms is examined. Scheduling brings order to an arrival sequence for aircraft. First-come-first-served scheduling (FCFS) establishes a fair order, based on estimated times of arrival, and determines proper separations. Because of the randomness of the traffic, gaps will remain in the scheduled sequence of aircraft. These gaps are filled, or partially filled, by time-advancing the leading aircraft after a gap while still preserving the FCFS order. Tightly scheduled groups of aircraft remain with a mix of heavy and large aircraft. Separation requirements differ for different types of aircraft trailing each other. Advantage is taken of this fact through mild reordering of the traffic, thus shortening the groups and reducing average delays. Actual delays for different samples with the same statistical parameters vary widely, especially for heavy traffic.
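    The first-come-first-served step described above can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm from the report: the separation values in `SEPARATION_S`, the aircraft names and the function `fcfs_schedule` are all hypothetical, and the time-advance and reordering refinements mentioned in the abstract are left out.

```python
# Hypothetical minimum separations in seconds, keyed by (leader, follower) type.
# The values are placeholders for illustration, not operational figures.
SEPARATION_S = {
    ("heavy", "heavy"): 90,
    ("heavy", "large"): 120,
    ("large", "heavy"): 60,
    ("large", "large"): 60,
}


def fcfs_schedule(arrivals):
    """Assign landing times in order of estimated time of arrival (ETA).

    `arrivals` is a list of (name, aircraft_type, eta_seconds) tuples.
    Each aircraft lands at its ETA unless that would violate the required
    separation behind the previous aircraft, in which case it is delayed.
    """
    schedule = []
    prev_type, prev_time = None, None
    for name, kind, eta in sorted(arrivals, key=lambda a: a[2]):
        if prev_time is None:
            slot = eta
        else:
            slot = max(eta, prev_time + SEPARATION_S[(prev_type, kind)])
        schedule.append((name, slot, slot - eta))   # (name, landing time, delay)
        prev_type, prev_time = kind, slot
    return schedule


arrivals = [("A1", "large", 0), ("A2", "heavy", 30),
            ("A3", "large", 40), ("A4", "large", 400)]
for name, slot, delay in fcfs_schedule(arrivals):
    print(name, slot, delay)
```

    In this toy run A4 arrives long after A3 has landed, so a gap remains in the schedule; the abstract's time-advance step is what would fill such gaps.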

  7. Accident sequence precursor analysis level 2/3 model development

    SciTech Connect

    Lui, C.H.; Galyean, W.J.; Brownson, D.A.

    1997-02-01

    The US Nuclear Regulatory Commission's Accident Sequence Precursor (ASP) program currently uses simple Level 1 models to assess the conditional core damage probability for operational events occurring in commercial nuclear power plants (NPP). Since not all accident sequences leading to core damage will result in the same radiological consequences, it is necessary to develop simple Level 2/3 models that can be used to analyze the response of the NPP containment structure in the context of a core damage accident, estimate the magnitude of the resulting radioactive releases to the environment, and calculate the consequences associated with these releases. The simple Level 2/3 model development work was initiated in 1995, and several prototype models have been completed. Once developed, these simple Level 2/3 models are linked to the simple Level 1 models to provide risk perspectives for operational events. This paper describes the methods implemented for the development of these simple Level 2/3 ASP models, and the linkage process to the existing Level 1 models.

  8. Analysis of the dermatophyte Trichophyton rubrum expressed sequence tags

    PubMed Central

    Wang, Lingling; Ma, Li; Leng, Wenchuan; Liu, Tao; Yu, Lu; Yang, Jian; Yang, Li; Zhang, Wenliang; Zhang, Qian; Dong, Jie; Xue, Ying; Zhu, Yafang; Xu, Xingye; Wan, Zhe; Ding, Guohui; Yu, Fudong; Tu, Kang; Li, Yixue; Li, Ruoyu; Shen, Yan; Jin, Qi

    2006-01-01

    Background Dermatophytes are the primary causative agent of dermatophytoses, a disease that affects billions of individuals worldwide. Trichophyton rubrum is the most common of the superficial fungi. Although T. rubrum is a recognized pathogen for humans, little is known about how its transcriptional pattern is related to development of the fungus and establishment of disease. It is therefore necessary to identify genes whose expression is relevant to growth, metabolism and virulence of T. rubrum. Results We generated 10 cDNA libraries covering nearly the entire growth phase and used them to isolate 11,085 unique expressed sequence tags (ESTs), including 3,816 contigs and 7,269 singletons. Comparisons with the GenBank non-redundant (NR) protein database revealed putative functions or matched homologs from other organisms for 7,764 (70%) of the ESTs. The remaining 3,321 (30%) of ESTs were only weakly similar or not similar to known sequences, suggesting that these ESTs represent novel genes. Conclusion The present data provide a comprehensive view of fungal physiological processes including metabolism, sexual and asexual growth cycles, signal transduction and pathogenic mechanisms. PMID:17032460

  9. Identification and sequence analysis of potyviruses infecting crops in Vietnam.

    PubMed

    Ha, C; Revill, P; Harding, R M; Vu, M; Dale, J L

    2008-01-01

    Fifty-two virus isolates from 13 distinct potyvirus species infecting crops in Vietnam were identified and the 3' region of each genome was sequenced. The viruses were: bean common mosaic virus (BCMV), potato virus Y (PVY), sugarcane mosaic virus (SCMV), sorghum mosaic virus (SrMV), chilli veinal mottle virus (ChiVMV), zucchini yellow mosaic virus (ZYMV), leek yellow stripe virus (LYSV), shallot yellow stripe virus (SYSV), onion yellow dwarf virus (OYDV), turnip mosaic virus (TuMV), dasheen mosaic virus (DsMV), sweet potato feathery mottle virus (SPFMV) and a novel potyvirus infecting chilli, tentatively named chilli ringspot virus (ChiRSV). With the exception of BCMV and PVY, this is the first report of these viruses in Vietnam. Further, rabbit bell (Crotalaria anagyroides) and typhonia (Typhonium trilobatum) were identified as new natural hosts of the peanut stunt virus (PStV) strain of BCMV and of DsMV, respectively. Sequence and phylogenetic analyses of the entire CP-coding region revealed considerable variability in BCMV, SCMV, PVY, ZYMV and DsMV.

  10. Comparative Analysis of Single-Cell RNA Sequencing Methods.

    PubMed

    Ziegenhain, Christoph; Vieth, Beate; Parekh, Swati; Reinius, Björn; Guillaumet-Adkins, Amy; Smets, Martha; Leonhardt, Heinrich; Heyn, Holger; Hellmann, Ines; Enard, Wolfgang

    2017-02-16

    Single-cell RNA sequencing (scRNA-seq) offers new possibilities to address biological and medical questions. However, systematic comparisons of the performance of diverse scRNA-seq protocols are lacking. We generated data from 583 mouse embryonic stem cells to evaluate six prominent scRNA-seq methods: CEL-seq2, Drop-seq, MARS-seq, SCRB-seq, Smart-seq, and Smart-seq2. While Smart-seq2 detected the most genes per cell and across cells, CEL-seq2, Drop-seq, MARS-seq, and SCRB-seq quantified mRNA levels with less amplification noise due to the use of unique molecular identifiers (UMIs). Power simulations at different sequencing depths showed that Drop-seq is more cost-efficient for transcriptome quantification of large numbers of cells, while MARS-seq, SCRB-seq, and Smart-seq2 are more efficient when analyzing fewer cells. Our quantitative comparison offers the basis for an informed choice among six prominent scRNA-seq methods, and it provides a framework for benchmarking further improvements of scRNA-seq protocols.

  11. Expressed sequence tags (ESTs) analysis of Acanthamoeba healyi

    PubMed Central

    Kong, Hyun-Hee; Hwang, Mee-Yeul; Kim, Hyo-Kyung

    2001-01-01

    A total of 435 randomly selected clones from an Acanthamoeba healyi cDNA library were sequenced, and 387 expressed sequence tags (ESTs) were generated. Based on the results of BLAST search, 130 clones (34.4%) were identified as the genes encoding surface proteins, enzymes for DNA, energy production or other metabolism, kinases and phosphatases, protease, proteins for signal transduction, structural and cytoskeletal proteins, cell cycle related proteins, transcription factors, transcription and translational machineries, and transporter proteins. Most of the genes (88.5%) are newly identified in the genus Acanthamoeba. Although 15 clones matched Acanthamoeba genes in the public databases, twelve clones corresponded to the actin gene, which was the most frequently expressed gene in this study. These ESTs of Acanthamoeba would give valuable information to study the organism as a model system for biological investigations such as cytoskeleton or cell movement, signal transduction, transcriptional and translational regulations. These results would also provide clues to elucidate factors for pathogenesis in human granulomatous amoebic encephalitis or keratitis by Acanthamoeba. PMID:11441502

  12. Earthquake Sequences Identification, from Analysis of Earthquake Occurrence Time Series Considering Multiple Semi-Periodic Sequences and Extraneous Events. Parkfield et al

    NASA Astrophysics Data System (ADS)

    Nava Pichardo, F. A.; Quinteros, C. B.; Glowacka, E.; Frez, J.

    2013-05-01

    We present a new method, based on the analytic Fourier transform, to identify semi-periodic sequences in regional large earthquake occurrence time series, which allows for the presence of multiple semi-periodic sequences and/or events not belonging to any identifiable sequence in the time series. The method yields estimates of the departure from periodicity of an observed sequence, and of the probability that the sequence is not due to chance. These estimates are used to make and to evaluate forecasts of future events belonging to each sequence. The method is surprisingly capable of correctly identifying sequences, unidentifiable by eye, in complicated time series. An example of application of the method to real seismicity data is analysis of the Parkfield event series, which correctly aftcasts the September 2004 earthquake. Other examples of sequence identification in earthquakes from central Japan and Venezuela are also shown.

  13. The role of integrated databases in microbial genome sequence analysis and metabolic reconstruction

    SciTech Connect

    Gaasterland, T., Maltsev, N., Overbeek, R.

    1997-02-01

    This paper provides an overview of the PUMA system which provides access to data about metabolic pathways, enzymes, compounds, organisms, encoded activity, and assay condition information for enzymes in particular organisms and multiple sequence alignments.

  14. IMSA: integrated metagenomic sequence analysis for identification of exogenous reads in a host genomic background.

    PubMed

    Dimon, Michelle T; Wood, Henry M; Rabbitts, Pamela H; Arron, Sarah T

    2013-01-01

    Metagenomics, the study of microbial genomes within diverse environments, is a rapidly developing field. The identification of microbial sequences within a host organism enables the study of human intestinal, respiratory, and skin microbiota, and has allowed the identification of novel viruses in diseases such as Merkel cell carcinoma. There are few publicly available tools for metagenomic high throughput sequence analysis. We present Integrated Metagenomic Sequence Analysis (IMSA), a flexible, fast, and robust computational analysis pipeline that is available for public use. IMSA takes input sequence from high throughput datasets and uses a user-defined host database to filter out host sequence. IMSA then aligns the filtered reads to a user-defined universal database to characterize exogenous reads within the host background. IMSA assigns a score to each node of the taxonomy based on read frequency, and can output this as a taxonomy report suitable for cluster analysis or as a taxonomy map (TaxMap). IMSA also outputs the specific sequence reads assigned to a taxon of interest for downstream analysis. We demonstrate the use of IMSA to detect pathogens and normal flora within sequence data from a primary human cervical cancer carrying HPV16, a primary human cutaneous squamous cell carcinoma carrying HPV16, the CaSki cell line carrying HPV16, and the HeLa cell line carrying HPV18.
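
    The two steps described above (host filtering, then per-node scoring by read frequency) can be sketched as follows. This is a minimal toy under stated assumptions, not IMSA's code: the function names, the read IDs, the small taxonomy, and the scoring rule (each read adds 1 to its assigned taxon and to every ancestor) are all hypothetical, since the abstract does not specify how scores are computed.

```python
from collections import Counter


def filter_host(reads, host_hits):
    """Keep only reads whose IDs did not align to the host database."""
    return {rid: seq for rid, seq in reads.items() if rid not in host_hits}


def score_taxonomy(read_to_taxon, parent):
    """Count reads per taxonomy node, propagating each read to all ancestors.

    `parent` maps a node to its parent node; the root maps to None.
    """
    scores = Counter()
    for taxon in read_to_taxon.values():
        node = taxon
        while node is not None:
            scores[node] += 1
            node = parent.get(node)
    return scores


# Hypothetical input: four reads, one of which aligned to the host.
reads = {"r1": "ACGT", "r2": "GGCA", "r3": "TTAG", "r4": "CCGA"}
host_hits = {"r1"}

# Hypothetical alignment results against a "universal" database.
assignments = {"r2": "HPV16", "r3": "HPV16", "r4": "HPV18"}

# Toy taxonomy for illustration only.
parent = {
    "HPV16": "Alphapapillomavirus",
    "HPV18": "Alphapapillomavirus",
    "Alphapapillomavirus": "Papillomaviridae",
    "Papillomaviridae": None,
}

exogenous = filter_host(reads, host_hits)
hits = {rid: assignments[rid] for rid in exogenous if rid in assignments}
scores = score_taxonomy(hits, parent)
for node, count in scores.most_common():
    print(node, count)
```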

  15. A rapid whole genome sequencing and analysis system supporting genomic epidemiology (7th Annual SFAF Meeting, 2012)

    ScienceCinema

    FitzGerald, Michael [Broad Institute]

    2016-07-12

    Michael FitzGerald on "A rapid whole genome sequencing and analysis system supporting genomic epidemiology" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  16. A rapid whole genome sequencing and analysis system supporting genomic epidemiology (7th Annual SFAF Meeting, 2012)

    SciTech Connect

    FitzGerald, Michael

    2012-06-01

    Michael FitzGerald on "A rapid whole genome sequencing and analysis system supporting genomic epidemiology" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  17. Analysis of common k-mers for whole genome sequences using SSB-tree.

    PubMed

    Choi, Jeong-Hyeon; Cho, Hwan-Gue

    2002-01-01

    As sequenced genomes become larger and the sequencing process becomes faster, there is a need to develop a tool to analyze sequences at the whole-genome scale. However, in-memory algorithms such as suffix trees and suffix arrays are not applicable to the analysis of whole genome sequence sets, since the size of an individual whole genome ranges from several million base pairs to hundreds of billions of base pairs. In order to manipulate such huge sequence data effectively, it is necessary to use an indexed data structure for external memory. In this paper, we introduce a workbench called SequeX for the analysis and visualization of whole genome sequences using the SSB-tree (Static SB-tree). It consists of two parts: the analysis query subsystem and the visualization subsystem. The query subsystem supports various transactions such as pattern matching, k-occurrence, and k-mer analysis. The visualization subsystem helps biologists to easily understand whole genome structure and features through a sequence viewer, annotation viewer, CGR (Chaos Game Representation) viewer, and k-mer viewer. The system also supports a user-friendly programming interface based on Java script for batch processing and for user-specific extensions. SequeX can be used to identify conserved genes or sequences by the analysis of the common k-mers and annotation. We analyze the common k-mers for 72 microbial genomes announced by Entrez, and find an interesting biological fact that the longest common k-mer for the 72 sequences is an 11-mer, and only 11 such sequences exist. Finally we note that many common k-mers occur in conserved regions such as CDS, rRNA, and tRNA.
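
    To make the notion of a common k-mer concrete, here is a small in-memory sketch using plain set intersection. It deliberately swaps in a naive technique for the external-memory SSB-tree index the abstract describes, so it only suits short toy sequences; the function names and the three example sequences are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def common_kmers(seqs, k):
    """Return the k-mers that occur in every sequence."""
    sets = [kmers(s, k) for s in seqs]
    return set.intersection(*sets) if sets else set()


def longest_common_k(seqs, max_k):
    """Find the largest k <= max_k for which some k-mer is shared by all.

    Returns (k, shared_kmers), or (0, set()) when nothing is shared.
    """
    for k in range(max_k, 0, -1):
        shared = common_kmers(seqs, k)
        if shared:
            return k, shared
    return 0, set()


# Hypothetical toy sequences, not real genomes.
seqs = ["ACGTACGGA", "TTACGGAC", "GGACGGAT"]
k, shared = longest_common_k(seqs, max_k=8)
print(k, sorted(shared))  # -> 5 ['ACGGA']
```

    Scanning k downward from max_k is fine for a demonstration, but the memory cost grows with genome size, which is the limitation that motivates a disk-based index in the first place.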

  18. Data Analysis of Transcriptomic Sequences and qPCR Validations for Microbial Communities during Algal Blooms

    EPA Pesticide Factsheets

    A training opportunity is open to a student highly motivated in microbial research to conduct sequence analysis, explore novel genes and metabolic pathways, validate the resulting findings using qPCR/RT-qPCR, and summarize the findings.

  19. Signature Peptide-Enabled Metagenomics (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    McMahon, Ben [LANL]

    2016-07-12

    Ben McMahon of Los Alamos National Laboratory (LANL) presents "Signature Peptide-Enabled Metagenomics" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  20. Identification of the sequences recognized by phage phi 29 transcriptional activator: possible interaction between the activator and the RNA polymerase.

    PubMed

    Nuez, B; Rojo, F; Barthelemy, I; Salas, M

    1991-05-11

    Expression of Bacillus subtilis phage phi 29 late genes requires the transcriptional activator protein p4. This activator binds to a region of the late A3 promoter spanning nucleotides -56 to -102 relative to the transcription start site, generating a strong bend in the DNA. In this work the target sequences recognized by protein p4 in the phage phi 29 late A3 promoter have been characterized. The binding of protein p4 to derivatives of the late A3 promoter harbouring deletions in the protein p4 binding site has been studied. When protein p4 recognition sequences were altered, the activator could only bind to the promoter in the presence of RNA polymerase. This strong cooperativity in the binding of protein p4 and RNA polymerase to the promoter suggests the presence of direct protein-protein contacts between them.

  1. Regulation of vitellogenin gene expression in transgenic Caenorhabditis elegans: short sequences required for activation of the vit-2 promoter.

    PubMed Central

    MacMorris, M; Broverman, S; Greenspoon, S; Lea, K; Madej, C; Blumenthal, T; Spieth, J

    1992-01-01

    The Caenorhabditis elegans vitellogenin genes are subject to sex-, stage-, and tissue-specific regulation: they are expressed solely in the adult hermaphrodite intestine. Comparative sequence analysis of the DNA immediately upstream of these genes revealed the presence of two repeated heptameric elements, vit promoter element 1 (VPE1) and VPE2. VPE1 has the consensus sequence TGTCAAT, while VPE2, CTGATAA, shares the recognition sequence of the GATA family of transcription factors. We report here a functional analysis of the VPEs within the 5'-flanking region of the vit-2 gene using stable transgenic lines. The 247 upstream bp containing the VPEs was sufficient for high-level, regulated expression. Furthermore, none of the four deletion mutations or eight point mutations tested resulted in expression of the reporter gene in larvae, males, or inappropriate hermaphrodite tissues. Mutation of the VPE1 closest to the TATA box inactivated the promoter, in spite of the fact that four additional close matches to the VPE1 consensus sequence are present within the 5'-flanking 200 bp. Each of these upstream VPE1-like sequences could be mutated without loss of high-level transgene expression, suggesting that if these VPE1 sequences play a role in regulating vit-2, their effects are more subtle. A site-directed mutation in the overlapping VPE1 and VPE2 at -98 was sufficient to inactivate the promoter, indicating that one or both of these VPEs must be present for activation of vit-2 transcription. Similarly, a small perturbation of the VPE2 at -150 resulted in reduction of fp155 expression, while a more extensive mutation in this element eliminated expression. On the other hand, deletion of this VPE2 and all upstream DNA still permitted correctly regulated expression, although at a very low level, suggesting that this VPE2 performs an important role in activation of vit-2 expression but may not be absolutely required. The results, taken together, demonstrate that both VPE1 and

  2. Combined DECS Analysis and Next-Generation Sequencing Enable Efficient Detection of Novel Plant RNA Viruses.

    PubMed

    Yanagisawa, Hironobu; Tomita, Reiko; Katsu, Koji; Uehara, Takuya; Atsumi, Go; Tateda, Chika; Kobayashi, Kappei; Sekine, Ken-Taro

    2016-03-07

    The presence of high molecular weight double-stranded RNA (dsRNA) within plant cells is an indicator of infection with RNA viruses as these possess genomic or replicative dsRNA. DECS (dsRNA isolation, exhaustive amplification, cloning, and sequencing) analysis has been shown to be capable of detecting unknown viruses. We postulated that a combination of DECS analysis and next-generation sequencing (NGS) would improve detection efficiency and usability of the technique. Here, we describe a model case in which we efficiently detected the presumed genome sequence of Blueberry shoestring virus (BSSV), a member of the genus Sobemovirus, which has not so far been reported. dsRNAs were isolated from BSSV-infected blueberry plants using the dsRNA-binding protein, reverse-transcribed, amplified, and sequenced using NGS. A contig of 4,020 nucleotides (nt) that shared similarities with sequences from other Sobemovirus species was obtained as a candidate of the BSSV genomic sequence. Reverse transcription (RT)-PCR primer sets based on sequences from this contig enabled the detection of BSSV in all BSSV-infected plants tested but not in healthy controls. A recombinant protein encoded by the putative coat protein gene was bound by the BSSV-antibody, indicating that the candidate sequence was that of BSSV itself. Our results suggest that a combination of DECS analysis and NGS, designated here as "DECS-C," is a powerful method for detecting novel plant viruses.

  3. Analysis and Functional Annotation of an Expressed Sequence Tag Collection for Tropical Crop Sugarcane

    PubMed Central

    Vettore, André L.; da Silva, Felipe R.; Kemper, Edson L.; Souza, Glaucia M.; da Silva, Aline M.; Ferro, Maria Inês T.; Henrique-Silva, Flavio; Giglioti, Éder A.; Lemos, Manoel V.F.; Coutinho, Luiz L.; Nobrega, Marina P.; Carrer, Helaine; França, Suzelei C.; Bacci, Maurício; Goldman, Maria Helena S.; Gomes, Suely L.; Nunes, Luiz R.; Camargo, Luis E.A.; Siqueira, Walter J.; Van Sluys, Marie-Anne; Thiemann, Otavio H.; Kuramae, Eiko E.; Santelli, Roberto V.; Marino, Celso L.; Targon, Maria L.P.N.; Ferro, Jesus A.; Silveira, Henrique C.S.; Marini, Danyelle C.; Lemos, Eliana G.M.; Monteiro-Vitorello, Claudia B.; Tambor, José H.M.; Carraro, Dirce M.; Roberto, Patrícia G.; Martins, Vanderlei G.; Goldman, Gustavo H.; de Oliveira, Regina C.; Truffi, Daniela; Colombo, Carlos A.; Rossi, Magdalena; de Araujo, Paula G.; Sculaccio, Susana A.; Angella, Aline; Lima, Marleide M.A.; de Rosa, Vicente E.; Siviero, Fábio; Coscrato, Virginia E.; Machado, Marcos A.; Grivet, Laurent; Di Mauro, Sonia M.Z.; Nobrega, Francisco G.; Menck, Carlos F.M.; Braga, Marilia D.V.; Telles, Guilherme P.; Cara, Frank A.A.; Pedrosa, Guilherme; Meidanis, João; Arruda, Paulo

    2003-01-01

    To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After the processing of the sequences, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% presented no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed a 22% redundancy in the first assembling. This indicated that possibly 33,620 unique genes had been identified and indicated that >90% of the sugarcane expressed genes were tagged. PMID:14613979

  4. Comprehensive Primer Design for Analysis of Population Genetics in Non-Sequenced Organisms

    PubMed Central

    Tezuka, Ayumi; Matsushima, Noe; Nemoto, Yoriko; Akashi, Hiroshi D.; Kawata, Masakado; Makino, Takashi

    2012-01-01

    Nuclear sequence markers are useful tools for the study of the history of populations and adaptation. However, it is not easy to obtain multiple nuclear primers for organisms with poor or no genomic sequence information. Here we used the genomes of organisms that have been fully sequenced to design comprehensive sets of primers to amplify polymorphic genomic fragments of multiple nuclear genes in non-sequenced organisms. First, we identified a large number of candidate polymorphic regions that were flanked on each side by conserved regions in the reference genomes. We then designed primers based on these conserved sequences and examined whether the primers could be used to amplify sequences in target species, montane brown frog (Rana ornativentris), anole lizard (Anolis sagrei), guppy (Poecilia reticulata), and fruit fly (Drosophila melanogaster), for population genetic analysis. We successfully obtained polymorphic markers for all target species studied. In addition, we found that sequence identities of the regions between the primer sites in the reference genomes affected the experimental success of DNA amplification and identification of polymorphic loci in the target genomes, and that exonic primers had a higher success rate than intronic primers in amplifying readable sequences. We conclude that this comparative genomic approach is a time- and cost-effective way to obtain polymorphic markers for non-sequenced organisms, and that it will contribute to the further development of evolutionary ecology and population genetics for non-sequenced organisms, aiding in the understanding of the genetic basis of adaptation. PMID:22393396

  5. An analysis of amino acid sequences surrounding archaeal glycoprotein sequons.

    PubMed

    Abu-Qarn, Mehtap; Eichler, Jerry

    2007-05-01

    Although Archaea provided the first example of a prokaryal glycoprotein, little is known of the rules governing the N-glycosylation process in these organisms. As in Eukarya and Bacteria, archaeal N-glycosylation takes place at the Asn residues of Asn-X-Ser/Thr sequons. Since not all sequons are utilized, it is clear that other factors, including the context in which a sequon exists, affect glycosylation efficiency. As yet, the contribution to N-glycosylation made by sequon-bordering residues and other related factors in Archaea remains unaddressed. In the following, the surroundings of Asn residues confirmed by experiment as modified were analyzed in an attempt to define sequence rules and requirements for archaeal N-glycosylation.
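
    Locating Asn-X-Ser/Thr sequons and collecting their neighbouring residues is the natural first step for this kind of survey, and can be sketched as below. This is an illustrative sketch rather than the authors' procedure: the function name, the flank width, and the peptide are hypothetical, and X is taken to be any residue except proline, which is the usual convention for the sequon but is an assumption here because the abstract does not state it.

```python
import re

# Zero-width lookahead so that overlapping sequons (e.g. "NNTS") are all found.
# Assumption: X may be any residue except proline (P).
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein, flank=3):
    """Return (1-based Asn position, sequon plus flanking residues) pairs."""
    hits = []
    for match in SEQUON.finditer(protein):
        i = match.start()
        start = max(0, i - flank)
        end = min(len(protein), i + 3 + flank)
        hits.append((i + 1, protein[start:end]))
    return hits


# Hypothetical peptide: NVS, NNT and NTS are sequons, NPT is excluded.
peptide = "MKNVSAANPTGGNNTSK"
for position, context in find_sequons(peptide):
    print(position, context)
```

    Tabulating the residues in each returned context by their offset from the Asn would then give the positional frequencies needed to look for sequence rules.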

  6. The Sequence and Analysis of Duplication Rich Human Chromosome 16

    DOE R&D Accomplishments Database

    Martin, Joel; Han, Cliff; Gordon, Laurie A.; Terry, Astrid; Prabhakar, Shyam; She, Xinwei; Xie, Gary; Hellsten, Uffe; Man Chan, Yee; Altherr, Michael; Couronne, Olivier; Aerts, Andrea; Bajorek, Eva; Black, Stacey; Blumer, Heather; Branscomb, Elbert; Brown, Nancy C.; Bruno, William J.; Buckingham, Judith M.; Callen, David F.; Campbell, Connie S.; Campbell, Mary L.; Campbell, Evelyn W.; Caoile, Chenier; Challacombe, Jean F.; Chasteen, Leslie A.; Chertkov, Olga; Chi, Han C.; Christensen, Mari; Clark, Lynn M.; Cohn, Judith D.; Denys, Mirian; Detter, John C.; Dickson, Mark; Dimitrijevic-Bussod, Mira; Escobar, Julio; Fawcett, Joseph J.; Flowers, Dave; Fotopulos, Dea; Glavina, Tijana; Gomez, Maria; Gonzales, Eidelyn; Goodstein, David; Goodwin, Lynne A.; Grady, Deborah L.; Grigoriev, Igor; Groza, Matthew; Hammon, Nancy; Hawkins, Trevor; Haydu, Lauren; Hildebrand, Carl E.; Huang, Wayne; Israni, Sanjay; Jett, Jamie; Jewett, Phillip E.; Kadner, Kristen; Kimball, Heather; Kobayashi, Arthur; Krawczyk, Marie-Claude; Leyba, Tina; Longmire, Jonathan L.; Lopez, Frederick; Lou, Yunian; Lowry, Steve; Ludeman, Thom; Mark, Graham A.; Mcmurray, Kimberly L.; Meincke, Linda J.; Morgan, Jenna; Moyzis, Robert K.; Mundt, Mark O.; Munk, A. Christine; Nandkeshwar, Richard D.; Pitluck, Sam; Pollard, Martin; Predki, Paul; Parson-Quintana, Beverly; Ramirez, Lucia; Rash, Sam; Retterer, James; Ricke, Darryl O.; Robinson, Donna L.; Rodriguez, Alex; Salamov, Asaf; Saunders, Elizabeth H.; Scott, Duncan; Shough, Timothy; Stallings, Raymond L.; Stalvey, Malinda; Sutherland, Robert D.; Tapia, Roxanne; Tesmer, Judith G.; Thayer, Nina; Thompson, Linda S.; Tice, Hope; Torney, David C.; Tran-Gyamfi, Mary; Tsai, Ming; Ulanovsky, Levy E.; Ustaszewska, Anna; Vo, Nu; White, P. Scott; Williams, Albert L.; Wills, Patricia L.; Wu, Jung-Rung; Wu, Kevin; Yang, Joan; DeJong, Pieter; Bruce, David; Doggett, Norman; Deaven, Larry; Schmutz, Jeremy; Grimwood, Jane; Richardson, Paul; et al.

    2004-01-01

    We report here the 78,884,754 base pairs of finished human chromosome 16 sequence, representing over 99.9 percent of its euchromatin. Manual annotation revealed 880 protein coding genes confirmed by 1,637 aligned transcripts, 19 tRNA genes, 341 pseudogenes and 3 RNA pseudogenes. These genes include metallothionein, cadherin and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukemia. Several large-scale structural polymorphisms spanning hundreds of kilobasepairs were identified and result in gene content differences across humans. One of the unique features of chromosome 16 is its high level of segmental duplication, ranked among the highest of the human autosomes. While the segmental duplications are enriched in the relatively gene poor pericentromere of the p-arm, some are involved in recent gene duplication and conversion events which are likely to have had an impact on the evolution of primates and human disease susceptibility.

  7. The sequence and analysis of duplication rich human chromosome 16

    SciTech Connect

    Martin, Joel; Han, Cliff; Gordon, Laurie A.; Terry, Astrid; Prabhakar, Shyam; She, Xinwei; Xie, Gary; Hellsten, Uffe; Man Chan, Yee; Altherr, Michael; Couronne, Olivier; Aerts, Andrea; Bajorek, Eva; Black, Stacey; Blumer, Heather; Branscomb, Elbert; Brown, Nancy C.; Bruno, William J.; Buckingham, Judith M.; Callen, David F.; Campbell, Connie S.; Campbell, Mary L.; Campbell, Evelyn W.; Caoile, Chenier; Challacombe, Jean F.; Chasteen, Leslie A.; Chertkov, Olga; Chi, Han C.; Christensen, Mari; Clark, Lynn M.; Cohn, Judith D.; Denys, Mirian; Detter, John C.; Dickson, Mark; Dimitrijevic-Bussod, Mira; Escobar, Julio; Fawcett, Joseph J.; Flowers, Dave; Fotopulos, Dea; Glavina, Tijana; Gomez, Maria; Gonzales, Eidelyn; Goodstein, David; Goodwin, Lynne A.; Grady, Deborah L.; Grigoriev, Igor; Groza, Matthew; Hammon, Nancy; Hawkins, Trevor; Haydu, Lauren; Hildebrand, Carl E.; Huang, Wayne; Israni, Sanjay; Jett, Jamie; Jewett, Phillip E.; Kadner, Kristen; Kimball, Heather; Kobayashi, Arthur; Krawczyk, Marie-Claude; Leyba, Tina; Longmire, Jonathan L.; Lopez, Frederick; Lou, Yunian; Lowry, Steve; Ludeman, Thom; Mark, Graham A.; Mcmurray, Kimberly L.; Meincke, Linda J.; Morgan, Jenna; Moyzis, Robert K.; Mundt, Mark O.; Munk, A. Christine; Nandkeshwar, Richard D.; Pitluck, Sam; Pollard, Martin; Predki, Paul; Parson-Quintana, Beverly; Ramirez, Lucia; Rash, Sam; Retterer, James; Ricke, Darryl O.; Robinson, Donna L.; Rodriguez, Alex; Salamov, Asaf; Saunders, Elizabeth H.; Scott, Duncan; Shough, Timothy; Stallings, Raymond L.; Stalvey, Malinda; Sutherland, Robert D.; Tapia, Roxanne; Tesmer, Judith G.; Thayer, Nina; Thompson, Linda S.; Tice, Hope; Torney, David C.; Tran-Gyamfi, Mary; Tsai, Ming; Ulanovsky, Levy E.; Ustaszewska, Anna; Vo, Nu; White, P. Scott; Williams, Albert L.; Wills, Patricia L.; Wu, Jung-Rung; Wu, Kevin; Yang, Joan; DeJong, Pieter; Bruce, David; Doggett, Norman; Deaven, Larry; Schmutz, Jeremy; Grimwood, Jane; Richardson, Paul; et al.

    2004-08-01

    We report here the 78,884,754 base pairs of finished human chromosome 16 sequence, representing over 99.9 percent of its euchromatin. Manual annotation revealed 880 protein coding genes confirmed by 1,637 aligned transcripts, 19 tRNA genes, 341 pseudogenes and 3 RNA pseudogenes. These genes include metallothionein, cadherin and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukemia. Several large-scale structural polymorphisms spanning hundreds of kilobasepairs were identified and result in gene content differences across humans. One of the unique features of chromosome 16 is its high level of segmental duplication, ranked among the highest of the human autosomes. While the segmental duplications are enriched in the relatively gene poor pericentromere of the p-arm, some are involved in recent gene duplication and conversion events which are likely to have had an impact on the evolution of primates and human disease susceptibility.

  8. Cloud-scale RNA-sequencing differential expression analysis with Myrna

    PubMed Central

    2010-01-01

    As sequencing throughput approaches dozens of gigabases per day, there is a growing need for efficient software for analysis of transcriptome sequencing (RNA-Seq) data. Myrna is a cloud-computing pipeline for calculating differential gene expression in large RNA-Seq datasets. We apply Myrna to the analysis of publicly available data sets and assess the goodness of fit of standard statistical models. Myrna is available from http://bowtie-bio.sf.net/myrna. PMID:20701754

  9. Sequence analysis of two cosmids from Schizosaccharomyces pombe chromosome III.

    PubMed

    Lucas, M; Gwillam, R; Lepingle, A; Lyne, M; Rajandream, M A; Rochet, M; Wood, V; Gaillardin, C

    2000-12-01

    We report the complete sequence of two cosmids, SPCC895 (38457 bp insert, EMBL Accession No. AL035247) and SPCC1322 (42068 bp insert, EMBL Accession No. AL035259), localized on chromosome III of the Schizosaccharomyces pombe genome. Fourteen coding DNA sequences (CDSs) were identified in SPCC895 and 17 in SPCC1322. Two known genes were found in each cosmid: map2 and gms1 on SPCC895, encoding the mating type P-factor precursor and a UDP-galactose transporter, respectively, and bub1 and ade6 in SPCC1322, encoding a protein kinase and a phosphoribosylaminoimidazole carboxylase, respectively. The fission yeast K RNA gene has been localized to SPCC895. Three ribosomal proteins have been predicted among these two cosmids. Nine CDSs similar to known proteins were found on SPCC895, and seven on SPCC1322. They include putative genes for a uridylate kinase, a proteasome catalytic component, an ion transporter, a checkpoint protein, a translation initiation protein, a SNARE complex protein, a protein involved in cytoskeletal organization, a spindle pole body-associating protein, a pre-mRNA splicing factor RNA helicase, a 3'-5' exonuclease for RNA 3' ss-tail, a UTP-glucose-1-phosphate uridylyltransferase, a leukotriene A(4) hydrolase, a member of the RanBP7-importin beta-Cse1p superfamily, a Ca(++)-calmodulin-dependent serine/threonine protein kinase and a prohibitin antiproliferative protein. One CDS is predicted to be an integral membrane protein. One CDS from SPCC895 is similar to a CDS of unknown function from Saccharomyces cerevisiae and three from SPCC1322 are similar to CDSs of unknown function from Candida albicans, S. cerevisiae and Sz. pombe, respectively. Finally, one CDS of SPCC895 and three of SPCC1322 correspond to orphan genes.

  10. Genome sequence and comparative analysis of Avibacterium paragallinarum

    PubMed Central

    Requena, David; Chumbe, Ana; Torres, Michael; Alzamora, Ofelia; Ramirez, Manuel; Valdivia-Olarte, Hugo; Gutierrez, Andres Hazaet; Izquierdo-Lara, Ray; Saravia, Luis Enrique; Zavaleta, Milagros; Tataje-Lavanda, Luis; Best, Ivan; Fernández-Sánchez, Manolo; Icochea, Eliana; Zimic, Mirko; Fernández-Díaz, Manolo

    2013-01-01

    Background: Avibacterium paragallinarum is the causative agent of infectious coryza, a highly contagious acute respiratory disease of poultry that affects commercial chickens, laying hens and broilers worldwide. Methodology: In this study, we performed the whole genome sequencing, assembly and annotation of a Peruvian isolate of A. paragallinarum. The genome was sequenced on a 454 GS FLX Titanium system. De novo assembly was performed and annotation was completed with GS De Novo Assembler 2.6 using the H. influenzae str. F3031 gene model. Manual curation of the genome was performed with Artemis. Putative function of genes was predicted with Blast2GO. Virulence factors were identified by comparison with the Virulence Factor Database. Results: The genome obtained has a length of 2.47 Mb with a GC content of 40.66%. Seventy-five large contigs (>500 nt) were obtained, which comprised 1,204 predicted genes. All the contigs are available in Genbank [GenBank: PRJNA64665]. A total of 103 virulence factors, reported in the Virulence Factor Database, were found in A. paragallinarum. Forty-four of them are present in 7 species of Haemophilus and are related to pathogenesis, virulence and host immune system evasion. A tetracycline-resistance associated transposon (Tn10) was found in A. paragallinarum, possibly acting as a defense mechanism. Discussion and conclusion: The availability of the A. paragallinarum genome represents an important source of information for the development of diagnostic tests, genotyping, and novel antigens for potential vaccines against infectious coryza. Identification of virulence factors contributes to a better understanding of the pathogenesis, and to planning efforts for prevention and control of the disease. PMID:23861570
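
    Summary figures of the kind reported in the Results (number of large contigs, total length, GC content) are simple to compute from a set of contig sequences. The sketch below shows one way to do it; the function name and the toy contigs are hypothetical, the 500 nt cutoff is the threshold quoted in the abstract, and nothing here reproduces the authors' actual tooling.

```python
def assembly_stats(contigs, min_len=500):
    """Summarise contigs longer than min_len: count, total length, GC percent."""
    large = [c.upper() for c in contigs if len(c) > min_len]
    total = sum(len(c) for c in large)
    gc = sum(c.count("G") + c.count("C") for c in large)
    return {
        "n_contigs": len(large),
        "length": total,
        "gc_percent": 100.0 * gc / total if total else 0.0,
    }


# Hypothetical toy contigs: two of 600 nt and one short contig that is dropped.
contigs = ["GC" * 300, "AT" * 300, "ACGT" * 10]
stats = assembly_stats(contigs)
print(stats)
```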

  11. Gene activation properties of a mouse DNA sequence isolated by expression selection.

    PubMed Central

    von Hoyningen-Huene, V; Norbury, C; Griffiths, M; Fried, M

    1986-01-01

    The MES-1 element was previously isolated from restricted total mouse cellular DNA by "expression selection"--the ability to reactivate expression of a test gene devoid of its 5' enhancer sequences. MES-1 has been tested in long-term transformation and short-term CAT expression assays. In both assays MES-1 is active independent of orientation and at a distance when placed 5' to the test gene. The element is active with heterologous promoters and functions efficiently in both rat and mouse cells. MES-1 activates expression by increasing transcription from the test gene's own start (cap) site. Thus the expression selection technique can be used for the isolation of DNA sequences with enhancer-like properties from total cellular DNA. PMID:3016657

  12. Sequences flanking the core-binding site modulate glucocorticoid receptor structure and activity

    PubMed Central

    Schöne, Stefanie; Jurk, Marcel; Helabad, Mahdi Bagherpoor; Dror, Iris; Lebars, Isabelle; Kieffer, Bruno; Imhof, Petra; Rohs, Remo; Vingron, Martin; Thomas-Chollier, Morgane; Meijsing, Sebastiaan H.

    2016-01-01

    The glucocorticoid receptor (GR) binds as a homodimer to genomic response elements, which have particular sequence and shape characteristics. Here we show that the nucleotides directly flanking the core-binding site differ depending on the strength of GR-dependent activation of nearby genes. Our study indicates that these flanking nucleotides change the three-dimensional structure of the DNA-binding site, the DNA-binding domain of GR and the quaternary structure of the dimeric complex. Functional studies in a defined genomic context show that sequence-induced changes in GR activity cannot be explained by differences in GR occupancy. Rather, mutating the dimerization interface mitigates DNA-induced changes in both activity and structure, arguing for a role of DNA-induced structural changes in modulating GR activity. Together, our study shows that DNA sequence identity of genomic binding sites modulates GR activity downstream of binding, which may play a role in achieving regulatory specificity towards individual target genes. PMID:27581526

  13. Sequence of the lid affects activity and specificity of Candida rugosa lipase isoenzymes.

    PubMed

    Brocca, Stefania; Secundo, Francesco; Ossola, Mattia; Alberghina, Lilia; Carrea, Giacomo; Lotti, Marina

    2003-10-01

    The fungus Candida rugosa produces multiple lipase isoenzymes (CRLs) with distinct differences in substrate specificity, in particular with regard to selectivity toward the fatty acyl chain length. Moreover, isoform CRL3 displays high activity towards cholesterol esters. Lipase isoenzymes share over 80% sequence identity but diverge in the sequence of the lid, a mobile loop that modulates access to the active site. In the active enzyme conformation, the open lid participates in the substrate-binding site and contributes to substrate recognition. To address the role of the lid in CRL activity and specificity, we substituted the lid sequences from isoenzymes CRL3 and CRL4 in recombinant rCRL1, thus obtaining enzymes differing only in this stretch of residues. Swapping the CRL3 lid was sufficient to confer to CRL1 cholesterol esterase activity. On the other hand, a specific shift in the chain-length specificity was not observed. Chimeric proteins displayed different sensitivity to detergents in the reaction medium.

  14. A convolutional code-based sequence analysis model and its application.

    PubMed

    Liu, Xiao; Geng, Xiaoli

    2013-04-16

    A new approach for encoding DNA sequences as input for DNA sequence analysis is proposed using the error correction coding theory of communication engineering. The encoder was designed as a convolutional code model whose generator matrix is designed based on the degeneracy of codons, with a codon treated in the model as an informational unit. The utility of the proposed model was demonstrated through the analysis of twelve prokaryote and nine eukaryote DNA sequences having different GC contents. Distinct differences in code distances were observed near the initiation and termination sites in the open reading frame, which provided a well-regulated characterization of the DNA sequences. Clearly distinguished period-3 features appeared in the coding regions, and the characteristic average code distances of the analyzed sequences were approximately proportional to their GC contents, particularly in the selected prokaryotic organisms, presenting the potential utility as an added taxonomic characteristic for use in studying the relationships of living organisms.

  15. Sequence and phylogenetic analysis of M-class genome segments of novel duck reovirus NP03

    PubMed Central

    Wang, Shao; Chen, Shilong; Cheng, Xiaoxia; Chen, Shaoying; Lin, FengQiang; Jiang, Bing; Zhu, Xiaoli; Li, Zhaolong; Wang, Jinxiang

    2015-01-01

    We report the sequence and phylogenetic analysis of the entire M1, M2, and M3 genome segments of the novel duck reovirus (NDRV) NP03. Alignment between the newly determined nucleotide sequences as well as their deduced amino acid sequences and the published sequences of avian reovirus (ARV) was carried out with DNASTAR software. Sequence comparison showed that the M2 gene had the most variability among the M-class genes of DRV. Phylogenetic analysis of the M-class genes of ARV strains revealed different lineages and clusters within DRVs. The 5 NDRV strains used in this study fall into a well-supported lineage that includes chicken ARV strains, whereas Muscovy DRV (MDRV) strains are separate from NDRV strains and form a distinct genetic lineage in the M2 gene tree. However, the MDRV and NDRV strains are closely related and located in a common lineage in the M1 and M3 gene trees, respectively. PMID:25852231

  16. Household Clustering of Escherichia coli Sequence Type 131 Clinical and Fecal Isolates According to Whole Genome Sequence Analysis

    PubMed Central

    Johnson, James R.; Davis, Gregg; Clabots, Connie; Johnston, Brian D.; Porter, Stephen; DebRoy, Chitrita; Pomputius, William; Ender, Peter T.; Cooperstock, Michael; Slater, Billie Savvas; Banerjee, Ritu; Miller, Sybille; Kisiela, Dagmara; Sokurenko, Evgeni V.; Aziz, Maliha; Price, Lance B.

    2016-01-01

    Background. Within-household sharing of strains from the resistance-associated H30R1 and H30Rx subclones of Escherichia coli sequence type 131 (ST131) has been inferred based on conventional typing data, but it has been assessed minimally using whole genome sequence (WGS) analysis. Methods. Thirty-three clinical and fecal isolates of ST131-H30R1 and ST131-H30Rx, from 20 humans and pets in 6 households, underwent WGS analysis for comparison with 52 published ST131 genomes. Phylogenetic relationships were inferred using a bootstrapped maximum likelihood tree based on core genome sequence polymorphisms. Accessory traits were compared between phylogenetically similar isolates. Results. In the WGS-based phylogeny, isolates clustered strictly by household, in clades that were distributed widely across the phylogeny, interspersed between H30R1 and H30Rx comparison genomes. For only 1 household did the core genome phylogeny place epidemiologically unlinked isolates together with household isolates, but even there multiple differences in accessory genome content clearly differentiated these 2 groups. The core genome phylogeny supported within-household strain sharing, fecal-urethral urinary tract infection pathogenesis (with the entire household potentially providing the fecal reservoir), and instances of host-specific microevolution. In 1 instance, the household's index strain persisted for 6 years before causing a new infection in a different household member. Conclusions. Within-household sharing of E coli ST131 strains was confirmed extensively at the genome level, as was long-term colonization and repeated infections due to an ST131-H30Rx strain. Future efforts toward surveillance and decolonization may need to address not just the affected patient but also other human and animal household members. PMID:27703993

  17. Analysis of the 2003-2004 microseismic sequence in the western part of the Corinth Rift

    NASA Astrophysics Data System (ADS)

    Godano, Maxime; Bernard, Pascal; Dublanchet, Pierre; Canitano, Alexandre; Marsan, David

    2013-04-01

    -east to north-west characterized by the successive activation of the multiplets. We next perform a spectral analysis to determine source parameters for each multiplet in order to estimate the size of the asperities and the cumulative coseismic slip. From the preceding observations and results we finally try to reproduce the 2003-2004 microseismic sequence using a rate-and-state 3D asperity model (Dublanchet et al., submitted). The deformation measured during the crisis by the strainmeter installed on Trizonia island is used in the modeling to constrain the maximum slip amplitude.

  18. Easy Bioinformatics Analysis (EBiAn): a package for manipulating and analysis of short biological sequences

    PubMed Central

    Bertucci Barbosa, Luiz Carlos; Garrido, Saulo Santesso; Garcia, Anderson; Delfino, Davi Barbosa; Gonçalves, Rodrigo Duarte; Marchetto, Reinaldo

    2010-01-01

    The work of biochemists and molecular biologists often depends on, or is greatly aided by, a preliminary computer analysis. Thus, the development of an efficient and user-friendly computational tool is very important. In this work, we developed a package of programs in the JavaScript language which can be used online or locally. The programs depend exclusively on Web browsers and are compatible with Internet Explorer, Opera, Mozilla Firefox and Google Chrome. With the EBiAn package it is possible to perform the main analyses and manipulations of DNA, RNA, protein and peptide sequences. The programs can be freely accessed and adapted or modified to generate new programs. Availability http://www.iq.unesp.br/EXTENSAO/EBiAn/html/ebian.html PMID:21346860

  19. Sequence-selective DNA detection using multiple laminar streams: a novel microfluidic analysis method.

    PubMed

    Yamashita, Kenichi; Yamaguchi, Yoshiko; Miyazaki, Masaya; Nakamura, Hiroyuki; Shimizu, Hazime; Maeda, Hideaki

    2004-02-01

    On-site detection methods for DNA have been demanded in the pathophysiology field. Such analysis requires a simple and accurate method, rather than high-throughput. This report describes a novel microfluidic analysis method and its application for simple sequence-selective DNA detection. The method uses a microchannel device with a serpentine structure. Sequence-specific binding of probe DNA can be detected at one side of the microchannel. This method is capable of sequence-specific detection of DNA with high accuracy. Single base mutations can also be analyzed. The combination of laminar stream and laminar secondary flow in the microchannel enables specific detection of probe-bound DNA.

  20. Structural characterization and biological activity of recombinant human epidermal growth factor proteins with different N-terminal sequences.

    PubMed

    Svoboda, M; Bauhofer, A; Schwind, P; Bade, E; Rasched, I; Przybylski, M

    1994-05-18

    The primary structures and molecular homogeneity of recombinant human epidermal growth factors from different suppliers were characterized and their biological activities evaluated by a standard DNA synthesis assay. Molecular weight determinations using 252Cf-plasma-desorption and electrospray mass spectrometry in combination with N- and C-terminal sequence analysis and determination of intramolecular disulfide bridges revealed that one recombinant protein had the correct human-identical structure (54 aa residues; 6347 Da). In contrast, a second recombinant protein (7020 Da) was found to contain a pentapeptide (KKYPR) insert following its N-terminal methionine. This structural variant showed a significant reduction in its capacity to stimulate DNA synthesis.

  1. Survey Sequencing and Comparative Analysis of the Elephant Shark (Callorhinchus milii) Genome

    PubMed Central

    Venkatesh, Byrappa; Kirkness, Ewen F; Loh, Yong-Hwee; Halpern, Aaron L; Lee, Alison P; Johnson, Justin; Dandona, Nidhi; Viswanathan, Lakshmi D; Tay, Alice; Venter, J. Craig; Strausberg, Robert L; Brenner, Sydney

    2007-01-01

    Owing to their phylogenetic position, cartilaginous fishes (sharks, rays, skates, and chimaeras) provide a critical reference for our understanding of vertebrate genome evolution. The relatively small genome of the elephant shark, Callorhinchus milii, a chimaera, makes it an attractive model cartilaginous fish genome for whole-genome sequencing and comparative analysis. Here, the authors describe survey sequencing (1.4× coverage) and comparative analysis of the elephant shark genome, one of the first cartilaginous fish genomes to be sequenced to this depth. Repetitive sequences, represented mainly by a novel family of short interspersed element–like and long interspersed element–like sequences, account for about 28% of the elephant shark genome. Fragments of approximately 15,000 elephant shark genes reveal specific examples of genes that have been lost differentially during the evolution of tetrapod and teleost fish lineages. Interestingly, the degree of conserved synteny and conserved sequences between the human and elephant shark genomes is higher than that between human and teleost fish genomes. The elephant shark genome contains four putative Hox clusters, indicating that, unlike teleost fish genomes, it has not experienced an additional whole-genome duplication. These findings underscore the importance of the elephant shark as a critical reference vertebrate genome for comparative analysis of the human and other vertebrate genomes. This study also demonstrates that a survey-sequencing approach can be applied productively for comparative analysis of distantly related vertebrate genomes. PMID:17407382

  2. Probabilistic topic modeling for the analysis and classification of genomic sequences

    PubMed Central

    2015-01-01

    Background Studies on genomic sequences for classification and taxonomic identification have a leading role in the biomedical field and in the analysis of biodiversity. These studies focus on the so-called barcode genes, representing a well defined region of the whole genome. Recently, alignment-free techniques have been gaining importance because they are able to overcome the drawbacks of sequence alignment techniques. In this paper a new alignment-free method for DNA sequence clustering and classification is proposed. The method is based on k-mer representation and text mining techniques. Methods The presented method is based on Probabilistic Topic Modeling, a statistical technique originally proposed for text documents. Probabilistic topic models are able to find in a document corpus the topics (recurrent themes) characterizing classes of documents. This technique, applied to DNA sequences representing the documents, exploits the frequency of fixed-length k-mers and builds a generative model for a training group of sequences. This generative model, obtained through the Latent Dirichlet Allocation (LDA) algorithm, is then used to classify a large set of genomic sequences. Results and conclusions We performed classification of over 7000 16S DNA barcode sequences taken from the Ribosomal Database Project (RDP) repository, training probabilistic topic models. The proposed method is compared to the RDP tool and the Support Vector Machine (SVM) classification algorithm in an extensive set of trials using both complete sequences and short sequence snippets (from 400 bp to 25 bp). Our method reaches very similar results to the RDP classifier and SVM for complete sequences. The most interesting results are obtained when short sequence snippets are considered. In these conditions the proposed method outperforms RDP and SVM with ultra short sequences and it exhibits a smooth decrease of performance, at every taxonomic level, when the sequence length is decreased. PMID:25916734
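
    The k-mer representation that the abstract treats as the "words" of a document can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the choice to skip windows containing ambiguous bases, and the space-separated "document" format are all assumptions. The topic model itself (LDA) is left out and would be fitted to these counts with a separate library.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def kmers(seq, k):
    """Yield overlapping k-mers, skipping windows with non-ACGT symbols."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= VALID_BASES:
            yield word


def kmer_counts(seq, k):
    """Count occurrences of each k-mer in one sequence."""
    return Counter(kmers(seq, k))


def kmer_frequencies(seq, k):
    """Relative k-mer frequencies; empty dict if the sequence is too short."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def to_document(seq, k):
    """Render a sequence as a space-separated 'text' of k-mer words,
    the form a bag-of-words topic model would typically consume."""
    return " ".join(kmers(seq, k))
```

    For instance, `to_document("ACGTAC", 2)` gives the five overlapping 2-mers of the sequence as words; the value of k used in the paper is not stated in the abstract.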

  3. Next-generation sequencing in NSCLC and melanoma patients: a cost and budget impact analysis

    PubMed Central

    van Amerongen, Rosa A; Retèl, Valesca P; Coupé, Veerle MH; Nederlof, Petra M; Vogel, Maartje J; van Harten, Wim H

    2016-01-01

    Next-generation sequencing (NGS) has reached the molecular diagnostic laboratories. Although the NGS technology aims to improve the effectiveness of therapies by selecting the most promising therapy, concerns are that NGS testing is expensive and that the ‘benefits’ are not yet in proportion to these costs. In this study, we give an estimation of the costs and an institutional and national budget impact of various types of NGS tests in non-small-cell lung cancer (NSCLC) and melanoma patients within The Netherlands. First, an activity-based costing (ABC) analysis has been conducted on the costs of two examples of NGS panels (small- and medium-targeted gene panel (TGP)) based on data of The Netherlands Cancer Institute (NKI). Second, we performed a budget impact analysis (BIA) to estimate the current (2015) and future (2020) budget impact of NGS on molecular diagnostics for NSCLC and melanoma patients in The Netherlands. Literature, expert opinions, and a data set of patients within the NKI (n = 172) have been included in the BIA. Based on our analysis, we expect that the NGS test cost concerns will be limited. In the current situation, NGS can indeed result in higher diagnostic test costs, which is mainly related to required additional tests besides the small TGP. However, in the future, we expect that the use of whole-genome sequencing (WGS) will increase, for which it is expected that additional tests can be (partly) avoided. Although the current clinical benefits are expected to be limited, the research potential of NGS is already an important advantage. PMID:27899957

  4. Full Genome Sequence and sfRNA Interferon Antagonist Activity of Zika Virus from Recife, Brazil

    PubMed Central

    Rezelj, Veronica V.; Clark, Jordan J.; Cordeiro, Marli T.; Freitas de Oliveira França, Rafael; Pena, Lindomar J.; Wilkie, Gavin S.; Da Silva Filipe, Ana; Davis, Christopher; Hughes, Joseph; Varjak, Margus; Selinger, Martin; Zuvanov, Luíza; Owsianka, Ania M.; Patel, Arvind H.; McLauchlan, John; Lindenbach, Brett D.; Fall, Gamou; Sall, Amadou A.; Biek, Roman; Rehwinkel, Jan; Schnettler, Esther; Kohl, Alain

    2016-01-01

    Background The outbreak of Zika virus (ZIKV) in the Americas has transformed a previously obscure mosquito-transmitted arbovirus of the Flaviviridae family into a major public health concern. Little is currently known about the evolution and biology of ZIKV and the factors that contribute to the associated pathogenesis. Determining genomic sequences of clinical viral isolates and characterization of elements within these are an important prerequisite to advance our understanding of viral replicative processes and virus-host interactions. Methodology/Principal findings We obtained a ZIKV isolate from a patient who presented with classical ZIKV-associated symptoms, and used high throughput sequencing and other molecular biology approaches to determine its full genome sequence, including non-coding regions. Genome regions were characterized and compared to the sequences of other isolates where available. Furthermore, we identified a subgenomic flavivirus RNA (sfRNA) in ZIKV-infected cells that has antagonist activity against RIG-I induced type I interferon induction, with a lesser effect on MDA-5 mediated action. Conclusions/Significance The full-length genome sequence including non-coding regions of a South American ZIKV isolate from a patient with classical symptoms will support efforts to develop genetic tools for this virus. Detection of sfRNA that counteracts interferon responses is likely to be important for further understanding of pathogenesis and virus-host interactions. PMID:27706161

  5. Targeted DNA methylation analysis by high throughput sequencing in porcine peri-attachment embryos.

    PubMed

    Morrill, Benson H; Cox, Lindsay; Ward, Anika; Heywood, Sierra; Prather, Randall S; Isom, S Clay

    2013-01-01

    The purpose of this experiment was to implement and evaluate the effectiveness of a next-generation sequencing-based method for DNA methylation analysis in porcine embryonic samples. Fourteen discrete genomic regions were amplified by PCR using bisulfite-converted genomic DNA derived from day 14 in vivo-derived (IVV) and parthenogenetic (PA) porcine embryos as template DNA. Resulting PCR products were subjected to high-throughput sequencing using the Illumina Genome Analyzer IIx platform. The average depth of sequencing coverage was 14,611 for IVV and 17,068 for PA. Quantitative analysis of the methylation profiles of both input samples for each genomic locus showed distinct differences in methylation profiles between IVV and PA samples for six of the target loci, and subtle differences in four loci. It was concluded that high throughput sequencing technologies can be effectively applied to provide a powerful, cost-effective approach to targeted DNA methylation analysis of embryonic and other reproductive tissues.

  6. Advanced accident sequence precursor analysis level 2 models

    SciTech Connect

    Galyean, W.J.; Brownson, D.A.; Rempe, J.L.

    1996-03-01

    The U.S. Nuclear Regulatory Commission Accident Sequence Precursor program pursues the ultimate objective of performing risk significant evaluations on operational events (precursors) occurring in commercial nuclear power plants. To achieve this objective, the Office of Nuclear Regulatory Research is supporting the development of simple probabilistic risk assessment models for all commercial nuclear power plants (NPP) in the U.S. Presently, only simple Level 1 plant models have been developed which estimate core damage frequencies. In order to provide a true risk perspective, the consequences associated with postulated core damage accidents also need to be considered. With the objective of performing risk evaluations in an integrated and consistent manner, a linked event tree approach which propagates the front end results to back end was developed. This approach utilizes simple plant models that analyze the response of the NPP containment structure in the context of a core damage accident, estimate the magnitude and timing of a radioactive release to the environment, and calculate the consequences for a given release. Detailed models and results from previous studies, such as the NUREG-1150 study, are used to quantify these simple models. These simple models are then linked to the existing Level 1 models, and are evaluated using the SAPHIRE code. To demonstrate the approach, prototypic models have been developed for a boiling water reactor, Peach Bottom, and a pressurized water reactor, Zion.

  7. Individual sequence variability and functional activities of fibrinogen-related proteins (FREPs) in the Mediterranean mussel (Mytilus galloprovincialis) suggest ancient and complex immune recognition models in invertebrates.

    PubMed

    Romero, Alejandro; Dios, Sonia; Poisa-Beiro, Laura; Costa, Maria M; Posada, David; Figueras, Antonio; Novoa, Beatriz

    2011-03-01

    In this paper, we describe sequences of fibrinogen-related proteins (FREPs) in the Mediterranean mussel Mytilus galloprovincialis (MuFREPs) with the fibrinogen domain probably involved in antigen recognition, but without the additional collagen-like domain of ficolins, molecules responsible for complement activation by the lectin pathway. Although they do not seem to be true or primitive ficolins, since the phylogenetic analyses are not conclusive enough, their expression is increased after bacterial infection or PAMP treatment and they present opsonic activities similar to mammalian ficolins. The most remarkable aspect of these sequences was the existence of a very diverse set of FREP sequences among and within individuals (different mussels do not share any identical sequence), which parallels the extraordinary complexity of the immune system, suggesting the existence of a primitive system with a potential capacity to recognize and eliminate different kinds of pathogens.

  8. Nonstationary analysis of geomagnetic time sequences from Mount Etna and North Palm Springs earthquake

    NASA Astrophysics Data System (ADS)

    Fedi, M.; La Manna, M.; Palmieri, F.

    2003-10-01

    Volcanomagnetic and/or seismomagnetic effects are geomagnetic variations generated before eruptions and/or seismic events. Our aim is to analyze geomagnetic time series to detect the volcanomagnetic and/or seismomagnetic effects among a number of other variations. Two advanced signal-processing techniques are proposed to analyze the geomagnetic time series. The first technique, called Continuous Wavelet Transform Singularity Analysis (CWTSA), is based on the Continuous Wavelet Transform; the second, called Time-Variant Statistical Analysis of Nonstationary Signals (TVANS), is based on a time-varying adaptive algorithm (Recursive Least Squares). Both techniques are very effective in detecting the geomagnetic variations at the time instants likely linked to volcanic and/or seismic activity. The application of these methodologies to geomagnetic time sequences, respectively, recorded on Mount Etna during the volcanic activity of 1981 and in North Palm Springs during the seismic events of 8 July 1986 yields a good correspondence between events detected by both techniques and volcanic and seismic events. The statistical significance of the geomagnetic time series was also assessed to verify the results obtained from CWTSA and TVANS. It was defined at a significance level of 95% in the wavelet power spectrum for the difference of the geomagnetic time series, with the aim of distinguishing the most "significant" events as those that exceed this level.

  9. Data analysis of HLA sequencing using Assign-SBT v3.6+ from Conexio.

    PubMed

    Wirtz, Carla; Sayer, David

    2012-01-01

    DNA sequencing is now a standard frontline high-throughput HLA typing procedure, with some unrelated bone marrow donor registry typing laboratories performing tens of thousands of tests per year. The advantage of DNA sequencing is that, by definition, sequencing directly identifies all bases in the DNA template. Alternative molecular-based assays such as the use of sequence-specific PCR primers (PCR-SSP) and oligonucleotide probes (PCR-SSO) provide information only for those regions to which the oligos are designed, and no information is obtained for the regions between primers and probes. The era of routine high-throughput sequencing-based typing (SBT) was made possible by the development of locus-specific PCR-based assays and the development of the HLA sequencing-based typing software, Assign-SBT v3.2.7 by Conexio Genomics. A single PCR per locus simplified the template preparation stage of the test and Assign-SBT simplified the sequence analysis and allele assignment stage. Together these developments dramatically simplified the SBT procedure, making SBT cost effective. This chapter provides a comprehensive description of Assign-SBT sequence analysis software for use in an HLA typing laboratory.

  10. LOESS correction for length variation in gene set-based genomic sequence analysis

    PubMed Central

    Aboukhalil, Anton; Bulyk, Martha L.

    2012-01-01

    Motivation: Sequence analysis algorithms are often applied to sets of DNA, RNA or protein sequences to identify common or distinguishing features. Controlling for sequence length variation is critical to properly score sequence features and identify true biological signals rather than length-dependent artifacts. Results: Several cis-regulatory module discovery algorithms exhibit a substantial dependence between DNA sequence score and sequence length. Our newly developed LOESS method is flexible in capturing diverse score-length relationships and is more effective in correcting DNA sequence scores for length-dependent artifacts, compared with four other approaches. Application of this method to genes co-expressed during Drosophila melanogaster embryonic mesoderm development or neural development scored by the Lever motif analysis algorithm resulted in successful recovery of their biologically validated cis-regulatory codes. The LOESS length-correction method is broadly applicable, and may be useful not only for more accurate inference of cis-regulatory codes, but also for detection of other types of patterns in biological sequences. Availability: Source code and compiled code are available from http://thebrain.bwh.harvard.edu/LM_LOESS/ Contact: mlbulyk@receptor.med.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22492312
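
    The general idea of a LOESS length correction can be sketched as below: fit a locally weighted linear regression of score against sequence length, then subtract the fitted trend so that the residual is the length-corrected score. This is a minimal sketch under stated assumptions (tricube weights, local degree 1, a nearest-neighbour span of 0.5, residuals as the corrected score, hypothetical function names); the authors' released implementation at the URL above may differ in all of these.

```python
def loess_fit(xs, ys, x0, span=0.5):
    """Locally weighted linear fit of ys on xs, evaluated at x0.

    Uses tricube weights over the nearest `span` fraction of points.
    """
    n = len(xs)
    k = min(n, max(2, int(round(span * n))))
    # Bandwidth: distance from x0 to its k-th nearest neighbour.
    h = sorted(abs(x - x0) for x in xs)[k - 1]
    if h == 0:
        # All k nearest points sit exactly at x0: average their scores.
        same = [y for x, y in zip(xs, ys) if x == x0]
        return sum(same) / len(same)
    sw = swx = swy = swxx = swxy = 0.0
    for x, y in zip(xs, ys):
        d = abs(x - x0) / h
        if d >= 1.0:
            continue
        w = (1.0 - d ** 3) ** 3  # tricube kernel
        sw += w
        swx += w * x
        swy += w * y
        swxx += w * x * x
        swxy += w * x * y
    denom = sw * swxx - swx * swx
    if abs(denom) <= 1e-12 * sw * swxx:
        return swy / sw  # degenerate case: fall back to weighted mean
    slope = (sw * swxy - swx * swy) / denom
    intercept = (swy - slope * swx) / sw
    return intercept + slope * x0


def length_corrected_scores(lengths, scores, span=0.5):
    """Residual of each score after removing the LOESS trend in length."""
    return [s - loess_fit(lengths, scores, length, span)
            for length, s in zip(lengths, scores)]
```

    With this sketch, a score that is purely a function of length is corrected to roughly zero, so that only deviations from the length trend remain.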

  11. A base composition analysis of natural patterns for the preprocessing of metagenome sequences

    PubMed Central

    2013-01-01

    Background On the premise that sequence reads and contigs often exhibit the same kinds of base usage that is also observed in the sequences from which they are derived, we offer a base composition analysis tool. Our tool uses these natural patterns to determine relatedness across sequence data. We introduce spectrum sets (sets of motifs) which are permutations of bacterial restriction sites and the base composition analysis framework to measure their proportional content in sequence data. We suggest that this framework will increase the efficiency during the pre-processing stages of metagenome sequencing and assembly projects. Results Our method is able to differentiate organisms and their reads or contigs. The framework shows how to successfully determine the relatedness between these reads or contigs by comparison of base composition. In particular, we show that two types of organismal-sequence data are fundamentally different by analyzing their spectrum set motif proportions (coverage). By the application of one of the four possible spectrum sets, encompassing all known restriction sites, we provide the evidence to claim that each set has a different ability to differentiate sequence data. Furthermore, we show that the spectrum set selection having relevance to one organism, but not to the others of the data set, will greatly improve performance of sequence differentiation even if the fragment size of the read, contig or sequence is not lengthy. Conclusions We show the proof of concept of our method by its application to ten trials of two or three freshly selected sequence fragments (reads and contigs) for each experiment across the six organisms of our set. Here we describe a novel and computationally effective pre-processing step for metagenome sequencing and assembly tasks. Furthermore, our base composition method has applications in phylogeny where it can be used to infer evolutionary distances between organisms based on the notion that related organisms
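
    One way to picture the "proportional content (coverage)" of a motif set is the fraction of bases in a read or contig that fall inside at least one motif occurrence. The sketch below computes that quantity; the definition of coverage, the function names and the example motifs (the EcoRI and BamHI recognition sites) are assumptions for illustration, since the abstract does not define the spectrum sets or the exact coverage measure.

```python
def motif_coverage(seq, motifs):
    """Fraction of positions in `seq` covered by any occurrence of any motif.

    Overlapping occurrences are counted once per covered base.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    covered = [False] * len(seq)
    for motif in motifs:
        motif = motif.upper()
        if not motif:
            continue  # ignore empty motifs
        start = seq.find(motif)
        while start != -1:
            for i in range(start, start + len(motif)):
                covered[i] = True
            # Advance by one so overlapping occurrences are also found.
            start = seq.find(motif, start + 1)
    return sum(covered) / len(seq)


def coverage_profile(seq, motif_sets):
    """Coverage of `seq` under each named motif set (hypothetical 'spectrum sets')."""
    return {name: motif_coverage(seq, motifs)
            for name, motifs in motif_sets.items()}
```

    Comparing such profiles between two fragments, for example `coverage_profile(read, {"set1": ["GAATTC"], "set2": ["GGATCC"]})`, gives a simple numeric basis for judging whether they show similar motif usage.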

  12. Direct mutation analysis by high-throughput sequencing: from germline to low-abundant, somatic variants

    PubMed Central

    Gundry, Michael; Vijg, Jan

    2011-01-01

    DNA mutations are the source of genetic variation within populations. The majority of mutations with observable effects are deleterious. In humans mutations in the germ line can cause genetic disease. In somatic cells multiple rounds of mutations and selection lead to cancer. The study of genetic variation has progressed rapidly since the completion of the draft sequence of the human genome. Recent advances in sequencing technology, most importantly the introduction of massively parallel sequencing (MPS), have resulted in more than a hundred-fold reduction in the time and cost required for sequencing nucleic acids. These improvements have greatly expanded the use of sequencing as a practical tool for mutation analysis. While in the past the high cost of sequencing limited mutation analysis to selectable markers or small forward mutation targets assumed to be representative for the genome overall, current platforms allow whole genome sequencing for less than $5,000. This has already given rise to direct estimates of germline mutation rates in multiple organisms including humans by comparing whole genome sequences between parents and offspring. Here we present a brief history of the field of mutation research, with a focus on classical tools for the measurement of mutation rates. We then review MPS, how it is currently applied and the new insight into human and animal mutation frequencies and spectra that has been obtained from whole genome sequencing. While great progress has been made, we note that the single most important limitation of current MPS approaches for mutation analysis is the inability to address low-abundance mutations that turn somatic tissues into mosaics of cells. Such mutations are at the basis of intra-tumor heterogeneity, with important implications for clinical diagnosis, and could also contribute to somatic diseases other than cancer, including aging. Some possible approaches to gain access to low-abundance mutations are discussed, with a

  13. Direct mutation analysis by high-throughput sequencing: from germline to low-abundant, somatic variants.

    PubMed

    Gundry, Michael; Vijg, Jan

    2012-01-03

    DNA mutations are the source of genetic variation within populations. The majority of mutations with observable effects are deleterious. In humans mutations in the germ line can cause genetic disease. In somatic cells multiple rounds of mutations and selection lead to cancer. The study of genetic variation has progressed rapidly since the completion of the draft sequence of the human genome. Recent advances in sequencing technology, most importantly the introduction of massively parallel sequencing (MPS), have resulted in more than a hundred-fold reduction in the time and cost required for sequencing nucleic acids. These improvements have greatly expanded the use of sequencing as a practical tool for mutation analysis. While in the past the high cost of sequencing limited mutation analysis to selectable markers or small forward mutation targets assumed to be representative for the genome overall, current platforms allow whole genome sequencing for less than $5000. This has already given rise to direct estimates of germline mutation rates in multiple organisms including humans by comparing whole genome sequences between parents and offspring. Here we present a brief history of the field of mutation research, with a focus on classical tools for the measurement of mutation rates. We then review MPS, how it is currently applied and the new insight into human and animal mutation frequencies and spectra that has been obtained from whole genome sequencing. While great progress has been made, we note that the single most important limitation of current MPS approaches for mutation analysis is the inability to address low-abundance mutations that turn somatic tissues into mosaics of cells. Such mutations are at the basis of intra-tumor heterogeneity, with important implications for clinical diagnosis, and could also contribute to somatic diseases other than cancer, including aging. Some possible approaches to gain access to low-abundance mutations are discussed, with a brief

  14. Phylogenetic distribution of phenotypic traits in Bacillus thuringiensis determined by multilocus sequence analysis.

    PubMed

    Blackburn, Michael B; Martin, Phyllis A W; Kuhar, Daniel; Farrar, Robert R; Gundersen-Rindal, Dawn E

    2013-01-01

    Diverse isolates from a world-wide collection of Bacillus thuringiensis were classified based on phenotypic profiles resulting from six biochemical tests: production of amylase (T), lecithinase (L), urease (U), acid from sucrose (S) and salicin (A), and the hydrolysis of esculin (E). Eighty-two isolates representing the 15 most common phenotypic profiles were subjected to phylogenetic analysis by multilocus sequence typing; these were found to be distributed among 19 sequence types, 8 of which were novel. Approximately 70% of the isolates belonged to sequence types corresponding to the classical B. thuringiensis varieties kurstaki (20 isolates), finitimus (15 isolates), morrisoni (11 isolates) and israelensis (11 isolates). Generally, there was little apparent correlation between phenotypic traits and phylogenetic position, and phenotypic variation was often substantial within a sequence type. Isolates of the sequence type corresponding to kurstaki displayed the greatest apparent phenotypic variation with 6 of the 15 phenotypic profiles represented. Despite the phenotypic variation often observed within a given sequence type, certain phenotypes appeared highly correlated with particular sequence types. Isolates with the phenotypic profiles TLUAE and LSAE were found to be exclusively associated with sequence types associated with varieties kurstaki and finitimus, respectively, and 7 of 8 TS isolates were found to be associated with the morrisoni sequence type. Our results suggest that the B. thuringiensis varieties israelensis and kurstaki represent the most abundant varieties of Bt in soil.

  15. Insights into a dinoflagellate genome through expressed sequence tag analysis

    PubMed Central

    Hackett, Jeremiah D; Scheetz, Todd E; Yoon, Hwan Su; Soares, Marcelo B; Bonaldo, Maria F; Casavant, Thomas L; Bhattacharya, Debashish

    2005-01-01

    Background: Dinoflagellates are important marine primary producers and grazers and cause toxic "red tides". These taxa are characterized by many unique features such as immense genomes, the absence of nucleosomes, and photosynthetic organelles (plastids) that have been gained and lost multiple times. We generated EST sequences from non-normalized and normalized cDNA libraries from a culture of the toxic species Alexandrium tamarense to elucidate dinoflagellate evolution. Previous analyses of these data have clarified plastid origin and here we study the gene content, annotate the ESTs, and analyze the genes that are putatively involved in DNA packaging. Results: Approximately 20% of the 6,723 unique ESTs (11,171 total 3'-reads) could be annotated using Blast searches against GenBank. Several putative dinoflagellate-specific mRNAs were identified, including one novel plastid protein. Dinoflagellate genes, similar to other eukaryotes, have a high GC-content that is reflected in the amino acid codon usage. Highly represented transcripts include histone-like (HLP) and luciferin binding proteins and several genes occur in families that encode nearly identical proteins. We also identified rare transcripts encoding a predicted protein highly similar to histone H2A.X. We speculate this histone may be retained for its role in DNA double-strand break repair. Conclusion: This is the most extensive collection to date of ESTs from a toxic dinoflagellate. These data will be instrumental to future research to understand the unique and complex cell biology of these organisms and for potentially identifying the genes involved in toxin production. PMID:15921535

  16. Complete genome sequence of Bacillus pumilus W3: A strain exhibiting high laccase activity.

    PubMed

    Guan, Zheng-Bing; Cai, Yu-Jie; Zhang, Yan-Zhou; Zhao, Hong; Liao, Xiang-Ru

    2015-08-10

    Here we report the full genome sequence of Bacillus pumilus W3, a strain showing high CotA-laccase activity that was isolated from raw gallnut honey in Nandan County, Guangxi Province, China. The W3 genome comprises 3,745,123 bp with a GC content of 41.39% and contains 3695 protein-coding genes, 21 rRNAs and 70 tRNAs.

  17. Biases during DNA extraction of activated sludge samples revealed by high throughput sequencing.

    PubMed

    Guo, Feng; Zhang, Tong

    2013-05-01

    Standardization of DNA extraction is a fundamental issue of fidelity and comparability in investigations of environmental microbial communities. Commercial kits for soil or feces are often adopted for studies of activated sludge because of a lack of specific kits, but they have never been evaluated regarding their effectiveness and potential biases based on high throughput sequencing. In this study, seven common DNA extraction kits were evaluated, based on not only yield/purity but also sequencing results, using two activated sludge samples (two sub-samples each, i.e. ethanol-fixed and fresh, as-is). The results indicate that the bead-beating step is necessary for DNA extraction from activated sludge. The two kits without the bead-beating step yielded very low amounts of DNA, and the least abundant operational taxonomic units (OTUs), and significantly underestimated the Gram-positive Actinobacteria, Nitrospirae, Chloroflexi, and Alphaproteobacteria and overestimated Gammaproteobacteria, Deltaproteobacteria, Bacteroidetes, and the rare phyla whose cell walls might have been readily broken. Among the other five kits, FastDNA® SPIN Kit for Soil extracted the most and the purest DNA. Although the number of total OTUs obtained using this kit was not the highest, the abundant OTUs and abundance of Actinobacteria demonstrated its efficiency. The three MoBio kits and one ZR kit produced fair results, but had a relatively low DNA yield and/or less Actinobacteria-related sequences. Moreover, the 50% ethanol fixation increased the DNA yield, but did not change the sequenced microbial community in a significant way. Based on the present study, the FastDNA SPIN kit for Soil is recommended for DNA extraction of activated sludge samples. More importantly, the selection of the DNA extraction kit must be done carefully if the samples contain dominant lysing-resistant groups, such as Actinobacteria and Nitrospirae.

  18. Draft Genome Sequence of Lactobacillus crispatus EM-LC1, an Isolate with Antimicrobial Activity Cultured from an Elderly Subject.

    PubMed

    Power, Susan E; Harris, Hugh M B; Bottacini, Francesca; Ross, R Paul; O'Toole, Paul W; Fitzgerald, Gerald F

    2013-12-19

    Here we report the 1.86-Mb draft genome sequence of Lactobacillus crispatus EM-LC1, a fecal isolate with antimicrobial activity. This genome sequence is expected to provide insights into the antimicrobial activity of L. crispatus and improve our knowledge of its potential probiotic traits.

  19. Applying machine learning techniques to DNA sequence analysis. Progress report, February 14, 1991--February 13, 1992

    SciTech Connect

    Shavlik, J.W.

    1992-04-01

    We are developing a machine learning system that modifies existing knowledge about specific types of biological sequences. It does this by considering sample members and nonmembers of the sequence motif being learned. Using this information (which we call a "domain theory"), our learning algorithm produces a more accurate representation of the knowledge needed to categorize future sequences. Specifically, the KBANN algorithm maps inference rules, such as consensus sequences, into a neural (connectionist) network. Neural network training techniques then use the training examples to refine these inference rules. We have been applying this approach to several problems in DNA sequence analysis and have also been extending the capabilities of our learning system along several dimensions.
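
    As a rough illustration of how a consensus-style inference rule might be mapped into a single network unit, the sketch below follows the weight-setting scheme commonly described for KBANN: a link weight of +ω for each positive antecedent, −ω for each negated one, and a bias of −(P − 1/2)ω for a conjunctive rule with P positive antecedents. The rule, the feature names and ω = 4 are illustrative assumptions and are not taken from the progress report.

```python
import math

OMEGA = 4.0  # assumed link weight; a commonly quoted default, not from the report


def rule_to_unit(positive, negated, omega=OMEGA):
    """Translate a conjunctive rule into initial link weights and a bias.

    `positive` and `negated` are lists of antecedent names (hypothetical features).
    """
    weights = {name: omega for name in positive}
    weights.update({name: -omega for name in negated})
    bias = -(len(positive) - 0.5) * omega
    return weights, bias


def activation(weights, bias, inputs):
    """Sigmoid activation of the unit for a dict of 0/1 input features."""
    net = bias + sum(w * inputs.get(name, 0.0) for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-net))


# Hypothetical rule: motif :- T at -10, A at -9, not G at -8
weights, bias = rule_to_unit(positive=["T@-10", "A@-9"], negated=["G@-8"])

# All antecedents satisfied: the unit should be "on" (activation above 0.5).
satisfied = activation(weights, bias, {"T@-10": 1, "A@-9": 1, "G@-8": 0})
# One positive antecedent missing: the unit should be "off".
violated = activation(weights, bias, {"T@-10": 1, "A@-9": 0, "G@-8": 0})
```

    Because the initial weights only encode the rule approximately, ordinary gradient-based training on the labelled examples can then adjust them, which is the refinement step the abstract refers to.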

  20. Combined sequence-based and genetic mapping analysis of complex traits in outbred rats.

    PubMed

    Baud, Amelie; Hermsen, Roel; Guryev, Victor; Stridh, Pernilla; Graham, Delyth; McBride, Martin W; Foroud, Tatiana; Calderari, Sophie; Diez, Margarita; Ockinger, Johan; Beyeen, Amennai D; Gillett, Alan; Abdelmagid, Nada; Guerreiro-Cacais, Andre Ortlieb; Jagodic, Maja; Tuncel, Jonatan; Norin, Ulrika; Beattie, Elisabeth; Huynh, Ngan; Miller, William H; Koller, Daniel L; Alam, Imranul; Falak, Samreen; Osborne-Pellegrin, Mary; Martinez-Membrives, Esther; Canete, Toni; Blazquez, Gloria; Vicens-Costa, Elia; Mont-Cardona, Carme; Diaz-Moran, Sira; Tobena, Adolf; Hummel, Oliver; Zelenika, Diana; Saar, Kathrin; Patone, Giannino; Bauerfeind, Anja; Bihoreau, Marie-Therese; Heinig, Matthias; Lee, Young-Ae; Rintisch, Carola; Schulz, Herbert; Wheeler, David A; Worley, Kim C; Muzny, Donna M; Gibbs, Richard A; Lathrop, Mark; Lansu, Nico; Toonen, Pim; Ruzius, Frans Paul; de Bruijn, Ewart; Hauser, Heidi; Adams, David J; Keane, Thomas; Atanur, Santosh S; Aitman, Tim J; Flicek, Paul; Malinauskas, Tomas; Jones, E Yvonne; Ekman, Diana; Lopez-Aumatell, Regina; Dominiczak, Anna F; Johannesson, Martina; Holmdahl, Rikard; Olsson, Tomas; Gauguier, Dominique; Hubner, Norbert; Fernandez-Teruel, Alberto; Cuppen, Edwin; Mott, Richard; Flint, Jonathan

    2013-07-01

    Genetic mapping on fully sequenced individuals is transforming understanding of the relationship between molecular variation and variation in complex traits. Here we report a combined sequence and genetic mapping analysis in outbred rats that maps 355 quantitative trait loci for 122 phenotypes. We identify 35 causal genes involved in 31 phenotypes, implicating new genes in models of anxiety, heart disease and multiple sclerosis. The relationship between sequence and genetic variation is unexpectedly complex: at approximately 40% of quantitative trait loci, a single sequence variant cannot account for the phenotypic effect. Using comparable sequence and mapping data from mice, we show that the extent and spatial pattern of variation in inbred rats differ substantially from those of inbred mice and that the genetic variants in orthologous genes rarely contribute to the same phenotype in both species.

  1. Combined sequence-based and genetic mapping analysis of complex traits in outbred rats

    PubMed Central

    Baud, Amelie; Hermsen, Roel; Guryev, Victor; Stridh, Pernilla; Graham, Delyth; McBride, Martin W.; Foroud, Tatiana; Calderari, Sophie; Diez, Margarita; Ockinger, Johan; Beyeen, Amennai D.; Gillett, Alan; Abdelmagid, Nada; Guerreiro-Cacais, Andre Ortlieb; Jagodic, Maja; Tuncel, Jonatan; Norin, Ulrika; Beattie, Elisabeth; Huynh, Ngan; Miller, William H.; Koller, Daniel L.; Alam, Imranul; Falak, Samreen; Osborne-Pellegrin, Mary; Martinez-Membrives, Esther; Canete, Toni; Blazquez, Gloria; Vicens-Costa, Elia; Mont-Cardona, Carme; Diaz-Moran, Sira; Tobena, Adolf; Hummel, Oliver; Zelenika, Diana; Saar, Kathrin; Patone, Giannino; Bauerfeind, Anja; Bihoreau, Marie-Therese; Heinig, Matthias; Lee, Young-Ae; Rintisch, Carola; Schulz, Herbert; Wheeler, David A.; Worley, Kim C.; Muzny, Donna M.; Gibbs, Richard A.; Lathrop, Mark; Lansu, Nico; Toonen, Pim; Ruzius, Frans Paul; de Bruijn, Ewart; Hauser, Heidi; Adams, David J.; Keane, Thomas; Atanur, Santosh S.; Aitman, Tim J.; Flicek, Paul; Malinauskas, Tomas; Jones, E. Yvonne; Ekman, Diana; Lopez-Aumatell, Regina; Dominiczak, Anna F; Johannesson, Martina; Holmdahl, Rikard; Olsson, Tomas; Gauguier, Dominique; Hubner, Norbert; Fernandez-Teruel, Alberto; Cuppen, Edwin; Mott, Richard; Flint, Jonathan

    2013-01-01

    Genetic mapping on fully sequenced individuals is transforming our understanding of the relationship between molecular variation and variation in complex traits. Here we report a combined sequence and genetic mapping analysis in outbred rats that maps 355 quantitative trait loci for 122 phenotypes. We identify 35 causal genes involved in 31 phenotypes, implicating novel genes in models of anxiety, heart disease and multiple sclerosis. The relation between sequence and genetic variation is unexpectedly complex: at approximately 40% of quantitative trait loci a single sequence variant cannot account for the phenotypic effect. Using comparable sequence and mapping data from mice, we show the extent and spatial pattern of variation in inbred rats differ significantly from those of inbred mice, and that the genetic variants in orthologous genes rarely contribute to the same phenotype in both species. PMID:23708188

  2. Full-genome sequencing and phylogenetic analysis of four neurovirulent Mexican isolates of porcine rubulavirus.

    PubMed

    Garcia-Barrera, Ali A; Del Valle, Alberto; Montaño-Hirose, Juan A; Barrón, Blanca Lilia; Salinas-Trujano, Juana; Torres-Flores, Jesus

    2017-02-09

    We report the complete genome sequences of four neurovirulent isolates of porcine rubulavirus (PorPV) from 2015 and one historical PorPV isolate from 1984 obtained by next-generation sequencing. A phylogenetic tree constructed using the individual sequences of the complete HN genes of the 2015 isolates and other historical sequences deposited in the GenBank database revealed that several recent neurovirulent isolates of PorPV (2008-2015) cluster together in a separate clade. Phylogenetic analysis of the complete genome sequences revealed that the neurovirulent strains of PorPV that circulated in Mexico during 2015 are genetically different from the PorPV strains that circulated during the 1980s.

  3. Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Patel, Kamlesh D (Ken) [SNL]

    2016-07-12

    Kamlesh (Ken) Patel from Sandia National Laboratories (Livermore, California) presents "Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June 2012 in Santa Fe, NM.

  4. Partial order optimum likelihood (POOL): maximum likelihood prediction of protein active site residues using 3D Structure and sequence properties.

    PubMed

    Tong, Wenxu; Wei, Ying; Murga, Leonel F; Ondrechen, Mary Jo; Williams, Ronald J

    2009-01-01

    A new monotonicity-constrained maximum likelihood approach, called Partial Order Optimum Likelihood (POOL), is presented and applied to the problem of functional site prediction in protein 3D structures, an important current challenge in genomics. The input consists of electrostatic and geometric properties derived from the 3D structure of the query protein alone. Sequence-based conservation information, where available, may also be incorporated. Electrostatics features from THEMATICS are combined with multidimensional isotonic regression to form maximum likelihood estimates of probabilities that specific residues belong to an active site. This allows likelihood ranking of all ionizable residues in a given protein based on THEMATICS features. The corresponding ROC curves and statistical significance tests demonstrate that this method outperforms prior THEMATICS-based methods, which in turn have been shown previously to outperform other 3D-structure-based methods for identifying active site residues. Then it is shown that the addition of one simple geometric property, the size rank of the cleft in which a given residue is contained, yields improved performance. Extension of the method to include predictions of non-ionizable residues is achieved through the introduction of environment variables. This extension results in even better performance than THEMATICS alone and constitutes to date the best functional site predictor based on 3D structure only, achieving nearly the same level of performance as methods that use both 3D structure and sequence alignment data. Finally, the method also easily incorporates such sequence alignment data, and when this information is included, the resulting method is shown to outperform the best current methods using any combination of sequence alignments and 3D structures. 
Included is an analysis demonstrating that when THEMATICS features, cleft size rank, and alignment-based conservation scores are used individually or in combination

  5. Generation and analysis of expressed sequence tags from the ciliate protozoan parasite Ichthyophthirius multifiliis

    PubMed Central

    Abernathy, Jason W; Xu, Peng; Li, Ping; Xu, De-Hai; Kucuktas, Huseyin; Klesius, Phillip; Arias, Covadonga; Liu, Zhanjiang

    2007-01-01

    Background: The ciliate protozoan Ichthyophthirius multifiliis (Ich) is an important parasite of freshwater fish that causes 'white spot disease' leading to significant losses. A genomic resource for large-scale studies of this parasite has been lacking. To study gene expression involved in Ich pathogenesis and virulence, our goal was to generate expressed sequence tags (ESTs) for the development of a powerful microarray platform for the analysis of global gene expression in this species. Here, we initiated a project to sequence and analyze over 10,000 ESTs. Results: We sequenced 10,368 EST clones using a normalized cDNA library made from pooled samples of the trophont, tomont, and theront life-cycle stages, and generated 9,769 sequences (94.2% success rate). Post-sequencing processing led to 8,432 high quality sequences. Clustering analysis of these ESTs allowed identification of 4,706 unique sequences containing 976 contigs and 3,730 singletons. These unique sequences represent over two million base pairs (~10% of Plasmodium falciparum genome, a phylogenetically related protozoan). BLASTX searches produced 2,518 significant (E-value < 10^-5) hits and further Gene Ontology (GO) analysis annotated 1,008 of these genes. The ESTs were analyzed comparatively against the genomes of the related protozoa Tetrahymena thermophila and P. falciparum, allowing putative identification of additional genes. All the EST sequences were deposited by dbEST in GenBank (GenBank: EG957858–EG966289). Gene discovery and annotations are presented and discussed. Conclusion: This set of ESTs represents a significant proportion of the Ich transcriptome, and provides a material basis for the development of microarrays useful for gene expression studies concerning Ich development, pathogenesis, and virulence. PMID:17577414

  6. Genetic characterization of three novel chicken parvovirus strains based on analysis of their coding sequences.

    PubMed

    Koo, Bon-Sang; Lee, Hae-Rim; Jeon, Eun-Ok; Han, Moo-Sung; Min, Kyeong-Cheol; Lee, Seung-Baek; Bae, Yeon-Ji; Cho, Sun-Hyung; Mo, Jong-Suk; Kwon, Hyuk Moo; Sung, Haan Woo; Kim, Jong-Nyeo; Mo, In-Pil

    2015-01-01

    Chicken parvovirus (ChPV) is one of the causative agents of viral enteritis. Recently, the genome of the ABU-P1 strain of ChPV was fully sequenced and determined to have a distinct genomic composition compared with that of vertebrate parvoviruses. However, no comparative sequence analysis of coding regions of ChPVs was possible because of the lack of other sequence information. In this study, we obtained the nucleotide sequences of all genomic coding regions of three ChPVs by polymerase chain reaction using 13 primer sets, and deduced the amino acid sequences from the nucleotide sequences. The non-structural protein 1 (NS1) gene of the three ChPVs showed 95.0 to 95.5% nucleotide sequence identity and 96.5 to 98.1% amino acid sequence identity to those of NS1 from the ABU-P1 strain, respectively, and even higher nucleotide and amino acid similarities to one another. The viral proteins (VP) gene was more divergent between the three ChPV Korean strains and ABU-P1, with 88.1 to 88.3% nucleotide identity and 93.0% amino acid identity. Analysis of the putative tertiary structure of the ChPV VP2 protein showed that variable regions with less than 80% nucleotide similarity between the three Korean strains and ABU-P1 occurred in large loops of the VP2 protein believed to be involved in antigenicity, pathogenicity, and tissue tropism in other parvoviruses. Based on our analysis of full-length coding sequences, we discovered greater variation in ChPV strains than reported previously, especially in partial regions of the VP2 protein.

  7. Basics of Genome Sequence Analysis in Bioinformatics -- its Fundamental Ideas and Problems

    NASA Astrophysics Data System (ADS)

    Suzuki, Tomonori; Miyazaki, Satoru

    2009-02-01

    Genome sequences are among the most fundamental data in omics analyses, and basic bioinformatics tools have been developed to handle them. The first step of genome sequence analysis is to predict or assign "genes" on the genome sequence. In eukaryotes, genes can be identified by aligning full-length cDNA sequences with local alignment tools such as search, BLAST and FASTA. In prokaryotes, however, mRNAs (transcripts) are difficult to capture, so computational gene prediction is the first choice for starting genome sequence analysis. In this review, we first survey methods for computational gene prediction. Once genes are predicted, the next step is to assign functions to the proteins or RNAs they encode. How the distance between gene sequences is defined is then very important for further analysis, so we describe the basic mathematical concepts for gene comparison. We also introduce our novel concept for biological sequence comparison from the viewpoint of information theory. In the post-genome era, many researchers are interested not only in gene functions but also in gene regulation, the information for which is also carried on genome sequences. Cis-regulatory elements, however, are too short for mathematical rules to be found easily, so computationally predicted cis-elements tend to include many false positives. To reduce the proportion of false positives, we need a reliable database of the sets of cis-regulatory elements for a gene, called cis-regulatory modules. We are therefore developing the Cis-Regulatory Elements Module Reference Database. In the third section, we introduce the procedure for constructing the Cis-Regulatory Elements Module Reference Database and its user interfaces.
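
    One simple way to put a number on the distance between two aligned gene sequences is the proportion of differing sites (the p-distance), optionally corrected for multiple substitutions with the Jukes-Cantor formula d = −(3/4) ln(1 − 4p/3). The sketch below is a generic illustration with made-up sequences; it is not the information-theoretic measure the authors propose.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of aligned, ungapped sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance; undefined once p reaches 0.75."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical aligned fragments: 10 sites, 2 of which differ.
a = "ATGGCCATTG"
b = "ATGACCATCG"
p = p_distance(a, b)   # observed fraction of differing sites
d = jukes_cantor(p)    # estimated substitutions per site
```

    The corrected distance is always at least as large as the raw p-distance, because it accounts for sites that may have changed more than once.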

  8. FourCSeq: analysis of 4C sequencing data

    PubMed Central

    Klein, Felix A.; Pakozdi, Tibor; Anders, Simon; Ghavi-Helm, Yad; Furlong, Eileen E. M.; Huber, Wolfgang

    2015-01-01

    Motivation: Circularized Chromosome Conformation Capture (4C) is a powerful technique for studying the spatial interactions of a specific genomic region called the ‘viewpoint’ with the rest of the genome, both in a single condition or comparing different experimental conditions or cell types. Observed ligation frequencies typically show a strong, regular dependence on genomic distance from the viewpoint, on top of which specific interaction peaks are superimposed. Here, we address the computational task to find these specific peaks and to detect changes between different biological conditions. Results: We model the overall trend of decreasing interaction frequency with genomic distance by fitting a smooth monotonically decreasing function to suitably transformed count data. Based on the fit, z-scores are calculated from the residuals, and high z-scores are interpreted as peaks providing evidence for specific interactions. To compare different conditions, we normalize fragment counts between samples, and call for differential contact frequencies using the statistical method DESeq2 adapted from RNA-Seq analysis. Availability and implementation: A full end-to-end analysis pipeline is implemented in the R package FourCSeq available at www.bioconductor.org. Contact: felix.klein@embl.de or whuber@embl.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26034064
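
    The idea of scoring peaks as residuals from a distance-dependent trend can be sketched in a few lines. This is not the FourCSeq implementation: a straight-line fit in log-log space stands in for the package's smooth monotone fit, the log10(count + 1) transform and the MAD-based scale are assumptions, and the fragment distances and counts are invented.

```python
import math
import statistics


def loglog_trend(distances, counts):
    """Least-squares line through (log10 distance, log10(count + 1))."""
    xs = [math.log10(d) for d in distances]
    ys = [math.log10(c + 1) for c in counts]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return xs, ys, slope, intercept


def z_scores(distances, counts):
    """Residuals from the trend, centred on their median and scaled by MAD."""
    xs, ys, slope, intercept = loglog_trend(distances, counts)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    med = statistics.median(residuals)
    mad = statistics.median([abs(r - med) for r in residuals])
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    return [(r - med) / scale for r in residuals], slope


# Invented fragments: distance from the viewpoint (bp) and ligation counts.
distances = [5e3, 1e4, 2e4, 4e4, 8e4, 1.6e5, 3.2e5, 6.4e5, 1.28e6, 2.56e6]
counts = [1000, 560, 340, 190, 105, 65, 400, 20, 12, 6]

z, slope = z_scores(distances, counts)
peak_index = max(range(len(z)), key=lambda i: z[i])
```

    In this toy data the fragment at 320 kb carries far more counts than the decay trend predicts, so it receives the largest z-score; a robust scale is used so that the peak itself does not inflate the noise estimate.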

  9. Multifractal detrended cross-correlation analysis of genome sequences using chaos-game representation

    NASA Astrophysics Data System (ADS)

    Pal, Mayukha; Kiran, V. Satya; Rao, P. Madhusudana; Manimaran, P.

    2016-08-01

    We characterized the multifractal nature and power law cross-correlation between any pair of genome sequences through an integrative approach combining 2D multifractal detrended cross-correlation analysis and chaos game representation. In this paper, we have analyzed the genomes of some prokaryotes and calculated the fractal spectra h(q) and f(α). From our analysis, we observed the existence of multifractal nature and power law cross-correlation behavior between any pair of genome sequences. Cluster analysis was performed on the calculated scaling exponents to identify class affiliation, and the result is represented as a dendrogram. We suggest this approach may find applications in next generation sequence analysis, big data analytics, etc.
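
    For readers unfamiliar with chaos game representation, the following minimal sketch shows how a DNA sequence is turned into points in the unit square and then into a count grid of the kind a 2D analysis could take as input. The corner assignment (A bottom-left, C top-left, G top-right, T bottom-right) is one common convention and is assumed here, as are the example sequences and grid resolution.

```python
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def chaos_game_points(sequence):
    """Return the CGR trajectory of a DNA sequence in the unit square."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in sequence.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        # move halfway from the current point towards the base's corner
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def cgr_grid(sequence, k=2):
    """Count CGR points in a 2^k x 2^k grid (a frequency CGR image)."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    for x, y in chaos_game_points(sequence):
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid


points = chaos_game_points("ACGT")
grid = cgr_grid("ACGTACGTTTGA", k=2)
```

    Each cell of the grid corresponds to a distinct k-mer suffix, which is why the resulting image summarises the oligonucleotide composition of the genome.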

  10. Coupling detrended fluctuation analysis for multiple warehouse-out behavioral sequences

    NASA Astrophysics Data System (ADS)

    Yao, Can-Zhong; Lin, Ji-Nan; Zheng, Xu-Zhou

    2017-01-01

    Interaction patterns among different warehouses could make the warehouse-out behavioral sequences less predictable. We first apply a coupling detrended fluctuation analysis to the warehouse-out quantity and find that the multivariate sequences exhibit significant coupling multifractal characteristics regardless of the type of steel product. Second, we track the sources of multifractality in the warehouse-out sequences by shuffling and surrogating the original ones, and find that the fat-tailed distribution contributes more to the multifractal features than long-term memory does, regardless of the type of steel product. From the perspective of warehouse contribution, some warehouses steadily contribute more to multifractality than others. Finally, based on multiscale multifractal analysis, we propose a Hurst surface structure to investigate coupling multifractality, and show that multiple behavioral sequences exhibit significant coupling multifractal features that emerge within, and are usually restricted to, relatively large time-scale intervals.
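
    The coupling and multifractal variants used in the paper build on ordinary detrended fluctuation analysis (DFA). As background only, here is a minimal sketch of plain first-order DFA on a single series: integrate the mean-centred series, detrend it linearly in non-overlapping windows, and read the scaling exponent from the slope of log F(s) against log s. The synthetic white-noise input, the window sizes and the random seed are assumptions chosen for illustration.

```python
import math
import random


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def fluctuation(series, scale):
    """Root-mean-square fluctuation F(s) after linear detrending per window."""
    mean = sum(series) / len(series)
    profile, total = [], 0.0
    for value in series:
        total += value - mean
        profile.append(total)  # cumulative sum of deviations from the mean
    idx = list(range(scale))
    squared = []
    for start in range(0, len(profile) - scale + 1, scale):
        window = profile[start:start + scale]
        slope, intercept = linear_fit(idx, window)
        squared.extend((w - (intercept + slope * i)) ** 2
                       for i, w in zip(idx, window))
    return math.sqrt(sum(squared) / len(squared))


def dfa_exponent(series, scales):
    """Slope of log F(s) versus log s across the given window sizes."""
    log_s = [math.log(s) for s in scales]
    log_f = [math.log(fluctuation(series, s)) for s in scales]
    slope, _ = linear_fit(log_s, log_f)
    return slope


random.seed(0)
white_noise = [random.gauss(0.0, 1.0) for _ in range(2048)]
alpha = dfa_exponent(white_noise, scales=[8, 16, 32, 64, 128])
```

    For uncorrelated noise the exponent should come out near 0.5; values well above 0.5 would indicate long-term memory, which is one of the two sources of multifractality the shuffling and surrogate tests are designed to separate.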

  11. Analysis of loss of decay-heat-removal sequences at Browns Ferry Unit One

    SciTech Connect

    Harrington, R.M.

    1983-01-01

    This paper summarizes the Oak Ridge National Laboratory (ORNL) report Loss of DHR Sequences at Browns Ferry Unit One - Accident Sequence Analysis (NUREG/CR-2973). The Loss of DHR investigation is the third in a series of accident studies concerning the BWR 4 - MK I containment plant design. These studies, sponsored by the Nuclear Regulatory Commission Severe Accident Sequence Analysis (SASA) program, have been conducted at ORNL with the full cooperation of the Tennessee Valley Authority (TVA). The purpose of the SASA studies is to predetermine the probable course of postulated severe accidents so as to establish the timing and the sequence of events. The SASA studies also produce recommendations concerning the implementation of better system design and better emergency operating instructions and operator training. The ORNL studies also include a detailed, best-estimate calculation of the release and transport of radioactive fission products following postulated severe accidents.

  12. A convenient and adaptable microcomputer environment for DNA and protein sequence manipulation and analysis.

    PubMed Central

    Pustell, J; Kafatos, F C

    1986-01-01

    We describe the further development of a widely used package of DNA and protein sequence analysis programs for microcomputers (1,2,3). The package now provides a screen oriented user interface, and an enhanced working environment with powerful formatting, disk access, and memory management tools. The new GenBank floppy disk database is supported transparently to the user and a similar version of the NBRF protein database is provided. The programs can use sequence file annotation to automatically annotate printouts and translate or extract specified regions from sequences by name. The sequence comparison programs can now perform a 5000 X 5000 bp analysis in 12 minutes on an IBM PC. A program to locate potential protein coding regions in nucleic acids, a digitizer interface, and other additions are also described. PMID:3753784

  13. Respiratory syncytial virus fusion glycoprotein: nucleotide sequence of mRNA, identification of cleavage activation site and amino acid sequence of N-terminus of F1 subunit.

    PubMed Central

    Elango, N; Satake, M; Coligan, J E; Norrby, E; Camargo, E; Venkatesan, S

    1985-01-01

    The amino acid sequence of respiratory syncytial virus fusion protein (Fo) was deduced from the sequence of a partial cDNA clone of mRNA and from the 5' mRNA sequence obtained by primer extension and dideoxysequencing. The encoded protein of 574 amino acids is extremely hydrophobic and has a molecular weight of 63371 daltons. The site of proteolytic cleavage within this protein was accurately mapped by determining a partial amino acid sequence of the N-terminus of the larger subunit (F1) purified by radioimmunoprecipitation using monoclonal antibodies. Alignment of the N-terminus of the F1 subunit within the deduced amino acid sequence of Fo permitted us to identify a sequence of lys-lys-arg-lys-arg-arg at the C-terminus of the smaller N-terminal F2 subunit that appears to represent the cleavage/activation domain. Five potential sites of glycosylation, four within the F2 subunit, were also identified. Three extremely hydrophobic domains are present in the protein; a) the N-terminal signal sequence, b) the N-terminus of the F1 subunit that is analogous to the N-terminus of the paramyxovirus F1 subunit and the HA2 subunit of influenza virus hemagglutinin, and c) the putative membrane anchorage domain near the C-terminus of F1. PMID:2987829

  14. The nuclear genome of Brachypodium distachyon: analysis of BAC end sequences.

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Due in part to its small genome (~350 Mb), Brachypodium distachyon is emerging as a model system for temperate grasses, including important crops like wheat and barley. We present the analysis of 10.9% of the Brachypodium genome based on 64,696 BAC end sequences (BES). Analysis of repeat DNA content...

  15. Mutational analysis of the gene start sequences of pneumonia virus of mice.

    PubMed

    Dibben, Oliver; Easton, Andrew J

    2007-12-01

    The transcriptional start sequence of pneumonia virus of mice is more variable than that of the other pneumoviruses, with five different nine-base gene start (GS) sequences found in the PVM genome. The sequence requirements of the PVM gene start signal, and the efficiency of transcriptional initiation of the different virus genes, were investigated using a reverse genetics approach with a minigenome construct containing two reporter genes. A series of GS mutants were created, where each of the nine bases of the gene start consensus sequence of a reporter gene was changed to every other possible base, and the resulting effect on initiation of transcription was assayed. Nucleotide positions 1, 2 and 7 were found to be most sensitive to mutation whilst positions 4, 5 and 9 were relatively insensitive. The L gene GS sequence was found to have only 20% of the activity of the consensus sequence whilst the published M2 gene start sequence was found to be non-functional. A minigenome construct in which the two reporter genes were separated by the F-M2 gene junction of PVM was used to confirm the presence of two alternative, functional, GS sequences that could both drive the transcription of the PVM M2 gene.
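
    Changing each of nine positions to each of the three other bases gives 9 × 3 = 27 single-base mutants. The short sketch below enumerates such a panel; the nine-base string used is a placeholder chosen for illustration and is not the PVM gene start consensus, which the abstract does not give.

```python
BASES = "ACGT"


def single_base_mutants(consensus):
    """Yield (position, original, replacement, mutant) for every point mutant."""
    for index, original in enumerate(consensus):
        for replacement in BASES:
            if replacement != original:
                mutant = consensus[:index] + replacement + consensus[index + 1:]
                yield index + 1, original, replacement, mutant  # 1-based position


# Placeholder nine-base sequence; NOT the PVM gene start consensus.
PLACEHOLDER_GS = "AGGACAAGT"

mutants = list(single_base_mutants(PLACEHOLDER_GS))
by_position = {}
for position, original, replacement, mutant in mutants:
    by_position.setdefault(position, []).append(mutant)
```

    Grouping the mutants by position, as in `by_position`, mirrors how the assay results are summarised in the abstract, where sensitivity is reported per nucleotide position.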

  16. Plastome Sequence Determination and Comparative Analysis for Members of the Lolium-Festuca Grass Species Complex

    PubMed Central

    Hand, Melanie L.; Spangenberg, German C.; Forster, John W.; Cogan, Noel O. I.

    2013-01-01

    Chloroplast genome sequences are of broad significance in plant biology, due to frequent use in molecular phylogenetics, comparative genomics, population genetics, and genetic modification studies. The present study used a second-generation sequencing approach to determine and assemble the plastid genomes (plastomes) of four representatives from the agriculturally important Lolium-Festuca species complex of pasture grasses (Lolium multiflorum, Festuca pratensis, Festuca altissima, and Festuca ovina). Total cellular DNA was extracted from either roots or leaves, was sequenced, and the output was filtered for plastome-related reads. A comparison between sources revealed fewer plastome-related reads from root-derived template but an increase in incidental bacterium-derived sequences. Plastome assembly and annotation indicated high levels of sequence identity and a conserved organization and gene content between species. However, frequent deletions within the F. ovina plastome appeared to contribute to a smaller plastid genome size. Comparative analysis with complete plastome sequences from other members of the Poaceae confirmed conservation of most grass-specific features. Detailed analysis of the rbcL–psaI intergenic region, however, revealed a “hot-spot” of variation characterized by independent deletion events. The evolutionary implications of this observation are discussed. The complete plastome sequences are anticipated to provide the basis for potential organelle-specific genetic modification of pasture grasses. PMID:23550121

  17. Viral population analysis and minority-variant detection using short read next-generation sequencing

    PubMed Central

    Watson, Simon J.; Welkers, Matthijs R. A.; Depledge, Daniel P.; Coulter, Eve; Breuer, Judith M.; de Jong, Menno D.; Kellam, Paul

    2013-01-01

    RNA viruses within infected individuals exist as a population of evolutionary-related variants. Owing to evolutionary change affecting the constitution of this population, the frequency and/or occurrence of individual viral variants can show marked or subtle fluctuations. Since the development of massively parallel sequencing platforms, such viral populations can now be investigated to unprecedented resolution. A critical problem with such analyses is the presence of sequencing-related errors that obscure the identification of true biological variants present at low frequency. Here, we report the development and assessment of the Quality Assessment of Short Read (QUASR) Pipeline (http://sourceforge.net/projects/quasr) specific for virus genome short read analysis that minimizes sequencing errors from multiple deep-sequencing platforms, and enables post-mapping analysis of the minority variants within the viral population. QUASR significantly reduces the error-related noise in deep-sequencing datasets, resulting in increased mapping accuracy and reduction of erroneous mutations. Using QUASR, we have determined influenza virus genome dynamics in sequential samples from an in vitro evolution of 2009 pandemic H1N1 (A/H1N1/09) influenza from samples sequenced on both the Roche 454 GSFLX and Illumina GAIIx platforms. Importantly, concordance between the 454 and Illumina sequencing allowed unambiguous minority-variant detection and accurate determination of virus population turnover in vitro. PMID:23382427
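
    The idea of separating low-frequency biological variants from sequencing noise can be illustrated with a per-position frequency filter. The sketch below is not the QUASR implementation; the function name, the 1% frequency cut-off, and the minimum depth of 100 reads are hypothetical values chosen only for illustration.

```python
def minority_variants(base_counts, min_freq=0.01, min_depth=100):
    """Return non-majority bases at one position whose frequency is >= min_freq.

    base_counts maps a base to its read count at a single genome position.
    Both thresholds are illustrative assumptions, not values from the abstract.
    """
    depth = sum(base_counts.values())
    if depth < min_depth:
        return {}  # too few reads to estimate frequencies reliably
    major = max(base_counts, key=base_counts.get)
    return {
        base: count / depth
        for base, count in base_counts.items()
        if base != major and count / depth >= min_freq
    }


# Hypothetical pileup at one position: 1000 reads in total.
print(minority_variants({"A": 960, "G": 35, "T": 5}))
```

    With these made-up counts, G (3.5% of reads) is reported as a minority variant, while T (0.5%) falls below the assumed cut-off and is treated as noise.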

  18. Viral population analysis and minority-variant detection using short read next-generation sequencing.

    PubMed

    Watson, Simon J; Welkers, Matthijs R A; Depledge, Daniel P; Coulter, Eve; Breuer, Judith M; de Jong, Menno D; Kellam, Paul

    2013-03-19

    RNA viruses within infected individuals exist as a population of evolutionary-related variants. Owing to evolutionary change affecting the constitution of this population, the frequency and/or occurrence of individual viral variants can show marked or subtle fluctuations. Since the development of massively parallel sequencing platforms, such viral populations can now be investigated to unprecedented resolution. A critical problem with such analyses is the presence of sequencing-related errors that obscure the identification of true biological variants present at low frequency. Here, we report the development and assessment of the Quality Assessment of Short Read (QUASR) Pipeline (http://sourceforge.net/projects/quasr) specific for virus genome short read analysis that minimizes sequencing errors from multiple deep-sequencing platforms, and enables post-mapping analysis of the minority variants within the viral population. QUASR significantly reduces the error-related noise in deep-sequencing datasets, resulting in increased mapping accuracy and reduction of erroneous mutations. Using QUASR, we have determined influenza virus genome dynamics in sequential samples from an in vitro evolution of 2009 pandemic H1N1 (A/H1N1/09) influenza from samples sequenced on both the Roche 454 GSFLX and Illumina GAIIx platforms. Importantly, concordance between the 454 and Illumina sequencing allowed unambiguous minority-variant detection and accurate determination of virus population turnover in vitro.

  19. Impregnating unconsolidated pyroclastic sequences: A tool for detailed facies analysis

    NASA Astrophysics Data System (ADS)

    Klapper, Daniel; Kueppers, Ulrich; Castro, Jon M.; Pacheco, Jose M. R.; Dingwell, Donald B.

    2010-05-01

    The interpretation of volcanic eruptions is usually derived from direct observation and the thorough analysis of the deposits. Processes in vent-proximal areas are usually not directly accessible or likely to be obscured. Hence, our understanding of proximal deposits is often limited as they were produced by the simultaneous events stemming from primary eruptive, transportative, and meteorological conditions. Here we present a method that permits a direct and detailed quasi in-situ investigation of loose pyroclastic units that are usually analysed in the laboratory for their 1) grain-size distribution, 2) componentry, and 3) grain morphology. As the clast assembly is altered during sampling, the genesis of a stratigraphic unit and the relative importance of the above-mentioned deposit characteristics are hard to establish. In an attempt to overcome the possible loss of information during conventional sampling techniques, we impregnated the cleaned surfaces of proximal, unconsolidated units of the 1957-58 Capelinhos eruption on Faial, Azores. During this basaltic, emergent eruption, fluxes in magma rise rate led to a repeated build-up and collapse of tuff cones and consequently to a shift between phreatomagmatic and magmatic eruptive style. The deposits are a succession of generally parallel bedded, cm- to dm-thick layers with a predominantly ashy matrix. The lapilli content varies gradually; the content of bombs is enriched in discrete layers without clear bomb sags. The sample areas have been cleaned and impregnated with a two-component glue (EPOTEK 301). For approx. 10 × 10 cm, a volume of mixed glue of 20 ml was required. Using a syringe, this low-viscosity, transparent glue could be easily applied on the target area. We found that the glue permeated the deposit as deep as 5 mm. After > 24 h, the glue was sufficiently dry to enable the sample to be laid open. This impregnation method renders it possible to cut and polish the sample and investigate grain

  20. Impregnating unconsolidated pyroclastic sequences: A tool for detailed facies analysis

    NASA Astrophysics Data System (ADS)

    Klapper, D.; Kueppers, U.; Castro, J. M.

    2009-12-01

    The interpretation of volcanic eruptions is usually derived from direct observation and the thorough analysis of the deposits. Processes in vent-proximal areas are usually not directly accessible or likely to be obscured. Hence, our understanding of proximal deposits is often limited as they were produced by the simultaneous events stemming from primary eruptive, transportative, and meteorological conditions. Here we present a method that permits a direct and detailed quasi in-situ investigation of loose pyroclastic units that are usually analysed in the laboratory for their 1) grain-size distribution, 2) componentry, and 3) grain morphology. As the clast assembly is altered during sampling, the genesis of a stratigraphic unit and the relative importance of the above-mentioned deposit characteristics are hard to establish. In an attempt to overcome the possible loss of information during conventional sampling techniques, we impregnated the cleaned surfaces of proximal, unconsolidated units of the 1957-58 Capelinhos eruption on Faial, Azores. During this basaltic, emergent eruption, fluxes in magma rise rate led to a repeated build-up and collapse of tuff cones and consequently to a shift between phreatomagmatic and magmatic eruptive style. The deposits are a succession of generally parallel bedded, cm- to dm-thick layers with a predominantly ashy matrix. The lapilli content varies gradually; the content of bombs is enriched in discrete layers without clear bomb sags. The sample areas have been cleaned and impregnated with a two-component glue (EPOTEK 301). For approx. 10 × 10 cm, a volume of mixed glue of 20 ml was required. This low-viscosity, transparent glue allowed for an easy application on the target area by means of a syringe and permeated the deposit as deep as 5 mm. After > 24 h, the glue was sufficiently dry to enable the sample to be laid open. This impregnation method renders it possible to cut and polish the sample and investigate grain

  1. Complete nucleotide sequence analysis of a Dengue-1 virus isolated on Easter Island, Chile.

    PubMed

    Cáceres, C; Yung, V; Araya, P; Tognarelli, J; Villagra, E; Vera, L; Fernández, J

    2008-01-01

    Dengue-1 viruses responsible for the dengue fever outbreak in Easter Island in 2002 were isolated from acute-phase sera of dengue fever patients. In order to analyze the complete genome sequence, we designed primers to amplify contiguous segments across the entire sequence of the viral genome. RT-PCR products obtained were cloned, and complete nucleotide and deduced amino acid sequences were determined. This report constitutes the first complete genetic characterization of a DENV-1 isolate from Chile. Phylogenetic analysis shows that an Easter Island isolate is most closely related to Pacific DENV-1 genotype IV viruses.

  2. Chromatin Isolation and DNA Sequence Analysis in Large Undergraduate Laboratory Sections

    NASA Astrophysics Data System (ADS)

    Hagerman, Ann E.

    1999-10-01

    A pair of exercises that introduce undergraduate students to basic techniques and concepts of molecular biology and that are appropriate for classes with large enrollments are described. One exercise is a simple laboratory experiment in which chromatin is isolated from chicken liver and is resolved into histone proteins and DNA by ion-exchange chromatography. The other is a series of computer simulations that introduce DNA sequencing, mapping, and sequence analysis to the students. The final step of the simulation is submission of a sequence to a database on the World Wide Web for identification of the protein product of the gene.

  3. Regional association analysis delineates a sequenced chromosome region influencing antinutritive seed meal compounds in oilseed rape.

    PubMed

    Snowdon, R J; Wittkop, B; Rezaidad, A; Hasan, M; Lipsa, F; Stein, A; Friedt, W

    2010-11-01

    This study describes the use of regional association analyses to delineate a sequenced region of a Brassica napus chromosome with a significant effect on antinutritive seed meal compounds in oilseed rape. A major quantitative trait locus (QTL) influencing seed colour, fibre content, and phenolic compounds was mapped to the same position on B. napus chromosome A9 in biparental mapping populations from two different yellow-seeded × black-seeded B. napus crosses. Sequences of markers spanning the QTL region identified synteny to a sequence contig from the corresponding chromosome A9 in Brassica rapa. Remapping of sequence-derived markers originating from the B. rapa sequence contig confirmed their position within the QTL. One of these markers also mapped to a seed colour and fibre QTL on the same chromosome in a black-seeded × black-seeded B. napus cross. Consequently, regional association analysis was performed in a genetically diverse panel of dark-seeded, winter-type oilseed rape accessions. For this we used closely spaced simple sequence repeat (SSR) markers spanning the sequence contig covering the QTL region. Correction for population structure was performed using a set of genome-wide SSR markers. The identification of QTL-derived markers with significant associations to seed colour, fibre content, and phenolic compounds in the association panel enabled the identification of positional and functional candidate genes for B. napus seed meal quality within a small segment of the B. rapa genome sequence.

  4. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis.

    PubMed

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-11-20

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled.
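
    The N50 statistic mentioned above is the contig length L such that contigs of length L or longer together cover at least half of the total assembled bases. A minimal sketch follows; the contig lengths are made up for illustration and are not marmoset data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the
        # assembly size defines the N50.
        if running >= half_total:
            return length


# Hypothetical contig lengths (total 300, so the half-way mark is 150).
contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(contigs))  # 80 + 70 reaches 150, so the N50 is 70
```

    Merging short contigs into longer ones raises this value, which is why a doubled N50 is reported as an improvement in assembly contiguity.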

  5. CyMATE: a new tool for methylation analysis of plant genomic DNA after bisulphite sequencing.

    PubMed

    Hetzl, Jennifer; Foerster, Andrea M; Raidl, Günther; Mittelsten Scheid, Ortrun

    2007-08-01

    Cytosine methylation is a hallmark of epigenetic information in the DNA of many fungi, vertebrates and plants. The technique of bisulphite genomic sequencing reveals the methylation state of every individual cytosine in a sequence, and thereby provides high-resolution data on epigenetic diversity; however, the manual evaluation and documentation of large amounts of data is laborious and error-prone. While some software is available for facilitating the analysis of mammalian DNA methylation, which is found nearly exclusively at CG sites, there is no software optimally suited for data from DNA with significant non-CG methylation. We describe CyMATE (Cytosine Methylation Analysis Tool for Everyone) for in silico analysis of DNA sequences after bisulphite conversion of plant DNA, in which methylation is more divergent with respect to sequence context and biological relevance. From aligned sequences, CyMATE includes and distinguishes methylation at CG, CHG and CHH (where H = A, C or T), and can extract both quantitative and qualitative data regarding general and pattern-specific methylation per sequence and per position, i.e. data for individual sites in a sequence and the epigenetic divergence within a sample. In addition, it can provide graphical output from alignments in either an overview or a 'zoom-in' view as pdf files. Detailed information, including a quality control of the sequencing data, is provided in text format. We applied CyMATE to the analysis of DNA methylation at transcriptionally silenced promoters in diploid and polyploid Arabidopsis and found significant hypermethylation, high stability of the methylated state independent of chromosome number, and non-redundant patterns of mC distribution. CyMATE is freely available for non-commercial use at http://www.gmi.oeaw.ac.at/CyMATE.
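
    The three sequence contexts (CG, CHG, CHH, with H = A, C or T) and the bisulphite read-out can be sketched as follows. This is not CyMATE's code: the function names are hypothetical, only the top strand is considered, and the read is assumed to be aligned to the reference without gaps. After bisulphite conversion an unmethylated cytosine reads as T, while a methylated cytosine remains C.

```python
def cytosine_context(seq, i):
    """Classify the cytosine at index i as 'CG', 'CHG' or 'CHH'.

    Returns None if seq[i] is not C, if the context runs off the end of
    the sequence, or if it contains a base other than A, C, G, T.
    """
    seq = seq.upper()
    if seq[i] != "C":
        return None
    nxt = seq[i + 1:i + 3]
    if len(nxt) >= 1 and nxt[0] == "G":
        return "CG"
    if len(nxt) < 2 or nxt[0] not in "ACT" or nxt[1] not in "ACGT":
        return None
    return "CHG" if nxt[1] == "G" else "CHH"


def methylation_calls(reference, read):
    """List (position, context, is_methylated) for each reference cytosine."""
    calls = []
    pairs = zip(reference.upper(), read.upper())
    for i, (ref_base, read_base) in enumerate(pairs):
        if ref_base != "C":
            continue
        context = cytosine_context(reference, i)
        if context is None or read_base not in "CT":
            continue  # undefined context or uninformative read base
        calls.append((i, context, read_base == "C"))
    return calls


# Hypothetical reference and bisulphite-converted read of equal length.
print(methylation_calls("ACGTCAGCTTC", "ACGTTAGTTTC"))
```

    In this toy example the CG cytosine at position 1 is called methylated, the CHG and CHH cytosines at positions 4 and 7 are called unmethylated, and the final cytosine is skipped because its context is incomplete.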

  6. Identification of genetic recombination between Acinetobacter species based on multilocus sequence analysis.

    PubMed

    Kim, Dae Hun; Park, Young Kyoung; Choi, Ji Young; Ko, Kwan Soo

    2012-07-01

    During multilocus sequence analysis of Acinetobacter calcoaceticus-Acinetobacter baumannii complex, we identified evidence of recent genetic recombination between 2 Acinetobacter species. While 3 isolates belonged to A. nosocomialis based on 16S rRNA, gyrB, fusA, gdhB, and rplB gene sequences, they showed close relationships with Acinetobacter genomic species 'close to 13TU' in rpoB, recA, cpn60, rpoD, and gltA gene trees.

  7. A conserved sequence extending motif III of the motor domain in the Snf2-family DNA translocase Rad54 is critical for ATPase activity.

    PubMed

    Zhang, Xiao-Ping; Janke, Ryan; Kingsley, James; Luo, Jerry; Fasching, Clare; Ehmsen, Kirk T; Heyer, Wolf-Dietrich

    2013-01-01

    Rad54 is a dsDNA-dependent ATPase that translocates on duplex DNA. Its ATPase function is essential for homologous recombination, a pathway critical for meiotic chromosome segregation, repair of complex DNA damage, and recovery of stalled or broken replication forks. In recombination, Rad54 cooperates with Rad51 protein and is required to dissociate Rad51 from heteroduplex DNA to allow access by DNA polymerases for recombination-associated DNA synthesis. Sequence analysis revealed that Rad54 contains a perfect match to the consensus PIP box sequence, a widely spread PCNA interaction motif. Indeed, Rad54 interacts directly with PCNA, but this interaction is not mediated by the Rad54 PIP box-like sequence. This sequence is located as an extension of motif III of the Rad54 motor domain and is essential for full Rad54 ATPase activity. Mutations in this motif render Rad54 non-functional in vivo and severely compromise its activities in vitro. Further analysis demonstrated that such mutations affect dsDNA binding, consistent with the location of this sequence motif on the surface of the cleft formed by two RecA-like domains, which likely forms the dsDNA binding site of Rad54. Our study identified a novel sequence motif critical for Rad54 function and showed that even perfect matches to the PIP box consensus may not necessarily identify PCNA interaction sites.

  8. GT-2: in vivo transcriptional activation activity and definition of novel twin DNA binding domains with reciprocal target sequence selectivity.

    PubMed

    Ni, M; Dehesh, K; Tepperman, J M; Quail, P H

    1996-06-01

    GT-2 is a novel DNA binding protein that interacts with a triplet of functionally defined, positively acting GT-box motifs (GT1-bx, GT2-bx, and GT3-bx) in the rice phytochrome A gene (PHYA) promoter. Data from a transient transfection assay used here show that recombinant GT-2 enhanced transcription from both homologous and heterologous GT-box-containing promoters, thereby indicating that this protein can function as a transcriptional activator in vivo. Previously, we have shown that GT-2 contains separate DNA binding determinants in its N- and C-terminal halves, with binding site preferences for the GT3-bx and GT2-bx promoter motifs, respectively. Here, we demonstrate that the minimal DNA binding domains reside within dual 90-amino acid polypeptide segments encompassing duplicated sequences, termed trihelix regions, in each half of the molecule, plus 15 additional immediately adjacent amino acids downstream. These minimal binding domains retained considerable target sequence selectivity for the different GT-box motifs, but this selectivity was enhanced by a separate polypeptide segment farther downstream on the C-terminal side of each trihelix region. Therefore, the data indicate that the twin DNA binding domains of GT-2 each consist of a general GT-box recognition core with intrinsic differential binding activity toward closely related target motifs and a modified sequence conferring higher resolution reciprocal selectivity between these motifs.

  9. Analysis of noise-induced temporal correlations in neuronal spike sequences

    NASA Astrophysics Data System (ADS)

    Reinoso, José A.; Torrent, M. C.; Masoller, Cristina

    2016-11-01

    We investigate temporal correlations in sequences of noise-induced neuronal spikes, using a symbolic method of time-series analysis. We focus on the sequence of time-intervals between consecutive spikes (inter-spike-intervals, ISIs). The analysis method, known as ordinal analysis, transforms the ISI sequence into a sequence of ordinal patterns (OPs), which are defined in terms of the relative ordering of consecutive ISIs. The ISI sequences are obtained from extensive simulations of two neuron models (FitzHugh-Nagumo, FHN, and integrate-and-fire, IF), with correlated noise. We find that, as the noise strength increases, temporal order gradually emerges, revealed by the existence of more frequent ordinal patterns in the ISI sequence. While in the FHN model the most frequent OP depends on the noise strength, in the IF model it is independent of the noise strength. In both models, the correlation time of the noise affects the OP probabilities but does not modify the most probable pattern.
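
    The ordinal transformation described above maps each window of consecutive ISIs to the permutation that sorts it, and then estimates how often each permutation occurs. A minimal sketch follows, assuming patterns of length 3 (the abstract does not state the length used) and a short made-up ISI list instead of simulated neuron output; ties are broken by position, which is also an assumption.

```python
from collections import Counter
from itertools import permutations


def ordinal_pattern_probabilities(values, order=3):
    """Estimate the probability of each ordinal pattern of the given order.

    A pattern is the tuple of window indices sorted by increasing value,
    e.g. (0, 1, 2) for a strictly increasing window.
    """
    if order < 2:
        raise ValueError("order must be at least 2")
    n_windows = len(values) - order + 1
    if n_windows < 1:
        raise ValueError("sequence is shorter than the pattern order")
    counts = Counter()
    for start in range(n_windows):
        window = values[start:start + order]
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        counts[pattern] += 1
    # Report every possible pattern, including those never observed.
    return {p: counts[p] / n_windows for p in permutations(range(order))}


# Hypothetical inter-spike intervals (arbitrary units).
isis = [1.0, 2.0, 3.0, 2.5, 4.0]
for pattern, prob in ordinal_pattern_probabilities(isis).items():
    print(pattern, round(prob, 3))
```

    For an uncorrelated sequence all order-3 patterns would be expected with probability close to 1/6, so patterns that are clearly more frequent than that indicate temporal ordering of the kind the abstract reports.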

  10. Motor sequence learning in the elderly: differential activity patterns as a function of hand modality.

    PubMed

    Eudave, Luis; Aznárez-Sanado, Maite; Luis, Elkin O; Martínez, Martín; Fernández-Seara, María A; Pastor, María A

    2016-07-21

    Previous research on motor sequence learning (MSL) in the elderly has focused mainly on unilateral tasks, even though bilateral coordination might be impaired in this age group. In this fMRI study, 28 right-handed elderly subjects were recruited. The paradigm consisted of a Novel and a simple Control sequence executed with the right (R), left (L) and both hands (B). Behavioral performance (Accuracy[AC], Inter-tap Interval[ITI]) and associated brain activity were assessed during early learning. Behavioral performance in the Novel task was similar between unilateral conditions whereas in the bimanual condition more errors and slower motor execution were observed. Brain activity increases during learning showed differences between Conditions: R showed increased activity in pre-SMA, basal ganglia and left hippocampus while B showed activity increments mainly in posterior parietal cortex and cerebellum. L did not show any activity modulation during learning. Performance correlates for AC (related to spatial success) and ITI (related to accurate timing) shared a cortico-basal-cerebellar network. However, it was found that the ITI regressor presented additional significant correlations with activity in SMA and basal ganglia in R. The AC regressor showed additional significant correlations with activity in more extended thalamic and cerebellar areas in B. The present findings suggest that, behaviorally, the spatial and temporal components of MSL are impaired in elderly subjects when using both hands. Additionally, differential brain activity patterns were found across hand modalities. The results obtained reveal the existence of a highly specialized network in the dominant hand and identify areas specifically involved in bimanual coordination.

  11. cDNA sequence and chromosomal localization of human enterokinase, the proteolytic activator of trypsinogen.

    PubMed

    Kitamoto, Y; Veile, R A; Donis-Keller, H; Sadler, J E

    1995-04-11

    Enterokinase is a serine protease of the duodenal brush border membrane that cleaves trypsinogen and produces active trypsin, thereby leading to the activation of many pancreatic digestive enzymes. Overlapping cDNA clones that encode the complete human enterokinase amino acid sequence were isolated from a human intestine cDNA library. Starting from the first ATG codon, the composite 3696 nt cDNA sequence contains an open reading frame of 3057 nt that encodes a 784 amino acid heavy chain followed by a 235 amino acid light chain; the two chains are linked by at least one disulfide bond. The heavy chain contains a potential N-terminal myristoylation site, a potential signal anchor sequence near the amino terminus, and six structural motifs that are found in otherwise unrelated proteins. These domains resemble motifs of the LDL receptor (two copies), complement component C1r (two copies), the metalloprotease meprin (one copy), and the macrophage scavenger receptor (one copy). The enterokinase light chain is homologous to the trypsin-like serine proteinases. These structural features are conserved among human, bovine, and porcine enterokinase. By Northern blotting, a 4.4 kb enterokinase mRNA was detected only in small intestine. The enterokinase gene was localized to human chromosome 21q21 by fluorescence in situ hybridization.
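
    The reported chain lengths and open reading frame size can be cross-checked with simple codon arithmetic. The short sketch below uses only the figures quoted in the abstract; the variable names are illustrative.

```python
HEAVY_CHAIN_AA = 784   # heavy chain length from the abstract
LIGHT_CHAIN_AA = 235   # light chain length from the abstract
ORF_NT = 3057          # open reading frame length from the abstract

total_residues = HEAVY_CHAIN_AA + LIGHT_CHAIN_AA
coding_nt = 3 * total_residues  # three nucleotides per codon

print(total_residues)        # 1019 residues in the two chains combined
print(coding_nt == ORF_NT)   # True: 1019 codons account for all 3057 nt
```

    The 1019 codons account for the full 3057 nt, which suggests that the quoted reading frame length does not include a stop codon.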

  12. Analysis of cloned cDNA and genomic sequences for phytochrome: complete amino acid sequences for two gene products expressed in etiolated Avena.

    PubMed Central

    Hershey, H P; Barker, R F; Idler, K B; Lissemore, J L; Quail, P H

    1985-01-01

    Cloned cDNA and genomic sequences have been analyzed to deduce the amino acid sequence of phytochrome from etiolated Avena. Restriction endonuclease site polymorphism between clones indicates that at least four phytochrome genes are expressed in this tissue. Sequence analysis of two complete and one partial coding region shows approximately 98% homology at both the nucleotide and amino acid levels, with the majority of amino acid changes being conservative. High sequence homology is also found in the 5'-untranslated region but significant divergence occurs in the 3'-untranslated region. The phytochrome polypeptides are 1128 amino acid residues long corresponding to a molecular mass of 125 kdaltons. The known protein sequence at the chromophore attachment site occurs only once in the polypeptide, establishing that phytochrome has a single chromophore per monomer covalently linked to Cys-321. Computer analyses of the amino acid sequences have provided predictions regarding a number of structural features of the phytochrome molecule. PMID:3001642

  13. Interlaboratory concordance of DNA sequence analysis to detect reverse transcriptase mutations in HIV-1 proviral DNA. ACTG Sequencing Working Group. AIDS Clinical Trials Group.

    PubMed

    Demeter, L M; D'Aquila, R; Weislow, O; Lorenzo, E; Erice, A; Fitzgibbon, J; Shafer, R; Richman, D; Howard, T M; Zhao, Y; Fisher, E; Huang, D; Mayers, D; Sylvester, S; Arens, M; Sannerud, K; Rasheed, S; Johnson, V; Kuritzkes, D; Reichelderfer, P; Japour, A

    1998-11-01

    Thirteen laboratories evaluated the reproducibility of sequencing methods to detect drug resistance mutations in HIV-1 reverse transcriptase (RT). Blinded, cultured peripheral blood mononuclear cell pellets were distributed to each laboratory. Each laboratory used its preferred method for sequencing proviral DNA. Differences in protocols included: DNA purification; number of PCR amplifications; PCR product purification; sequence/location of PCR/sequencing primers; sequencing template; sequencing reaction label; sequencing polymerase; and use of manual versus automated methods to resolve sequencing reaction products. Five unknowns were evaluated. Thirteen laboratories submitted 39043 nucleotide assignments spanning codons 10-256 of HIV-1 RT. A consensus nucleotide assignment (defined as agreement among ≥ 75% of laboratories) could be made in over 99% of nucleotide positions, and was more frequent in the three laboratory isolates. The overall rate of discrepant nucleotide assignments was 0.29%. A consensus nucleotide assignment could not be made at RT codon 41 in the clinical isolate tested. Clonal analysis revealed that this was due to the presence of a mixture of wild-type and mutant genotypes. These observations suggest that sequencing methodologies currently in use in ACTG laboratories to sequence HIV-1 RT yield highly concordant results for laboratory strains; however, more discrepancies among laboratories may occur when clinical isolates are tested.
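
    The consensus rule stated above (agreement among at least 75% of laboratories) can be written as a short function. This is a sketch under the assumption that each reporting laboratory contributes exactly one base call per position; the example calls are invented, not study data.

```python
from collections import Counter


def consensus_base(calls, threshold=0.75):
    """Return the consensus base, or None if agreement is below threshold.

    calls is a list with one base call per reporting laboratory.
    """
    if not calls:
        return None
    base, count = Counter(calls).most_common(1)[0]
    return base if count / len(calls) >= threshold else None


# Hypothetical position where 10 of 13 laboratories agree (about 77%).
print(consensus_base(["A"] * 10 + ["G"] * 3))

# Hypothetical mixed position where only 9 of 13 agree (about 69%).
print(consensus_base(["A"] * 9 + ["G"] * 4))
```

    The second case returns None, which mirrors the situation described for RT codon 41, where a mixture of wild-type and mutant genotypes prevented a consensus assignment.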

  14. Sequence Analysis of the Human Virome in Febrile and Afebrile Children

    PubMed Central

    Wylie, Kristine M.; Mihindukulasuriya, Kathie A.; Sodergren, Erica; Weinstock, George M.; Storch, Gregory A.

    2012-01-01

    Unexplained fever (UF) is a common problem in children under 3 years old. Although virus infection is suspected to be the cause of most of these fevers, a comprehensive analysis of viruses in samples from children with fever and healthy controls is important for establishing a relationship between viruses and UF. We used unbiased, deep sequencing to analyze 176 nasopharyngeal swabs (NP) and plasma samples from children with UF and afebrile controls, generating an average of 4.6 million sequences per sample. An analysis pipeline was developed to detect viral sequences, which resulted in the identification of sequences from 25 viral genera. These genera included expected pathogens, such as adenoviruses, enteroviruses, and roseoloviruses, plus viruses with unknown pathogenicity. Viruses that were unexpected in NP and plasma samples, such as the astrovirus MLB-2, were also detected. Sequencing allowed identification of virus subtype for some viruses, including roseoloviruses. Highly sensitive PCR assays detected low levels of viruses that were not detected in approximately 5 million sequences, but greater sequencing depth improved sensitivity. On average NP and plasma samples from febrile children contained 1.5- to 5-fold more viral sequences, respectively, than samples from afebrile children. Samples from febrile children contained a broader range of viral genera and contained multiple viral genera more frequently than samples from children without fever. Differences between febrile and afebrile groups were most striking in the plasma samples, where detection of viral sequence may be associated with a disseminated infection. These data indicate that virus infection is associated with UF. Further studies are important in order to establish the range of viral pathogens associated with fever and to understand the role of viral infection in fever. Ultimately these studies may improve the medical treatment of children with UF by helping avoid antibiotic therapy for

  15. Human and Tree Shrew Alpha-synuclein: Comparative cDNA Sequence and Protein Structure Analysis.

    PubMed

    Wu, Zheng-Cun; Huang, Zhang-Qiong; Jiang, Qin-Fang; Dai, Jie-Jie; Zhang, Ying; Gao, Jia-Hong; Sun, Xiao-Mei; Chen, Nai-Hong; Yuan, Yu-He; Li, Cong; Han, Yuan-Yuan; Li, Yun; Ma, Kai-Li

    2015-10-01

    The synaptic protein alpha-synuclein (α-syn) is associated with a number of neurodegenerative diseases, and homology analyses among many species have been reported. Nevertheless, little is known about the cDNA sequence and protein structure of α-syn in tree shrews, and this information might contribute to our understanding of its role in both health and disease. We designed primers to the human α-syn cDNA sequence; then, tree shrew α-syn cDNA was obtained by RT-PCR and sequenced. Based on the acquired tree shrew α-syn cDNA sequence, both the amino acid sequence and the spatial structure of α-syn were predicted and analyzed. The homology analysis results showed that the tree shrew cDNA sequence matches the human cDNA sequence exactly except at nucleotide positions 45, 60, 65, 69, 93, 114, 147, 150, 157, 204, 252, 270, 284, 298, 308, and 324. Further protein sequence analysis revealed that the tree shrew α-syn protein sequence is 97.1 % identical to that of human α-syn. The secondary protein structure of tree shrew α-syn based on random coils and α-helices is the same as that of the human structure. The phosphorylation sites are highly conserved, except the site at position 103 of tree shrew α-syn. The predicted spatial structure of tree shrew α-syn is identical to that of human α-syn. Thus, α-syn might have a similar function in tree shrew and in human, and tree shrew might be a potential animal model for studying the pathogenesis of α-synucleinopathies.

  16. Generation and analysis of expressed sequence tags from the bone marrow of Chinese Sika deer.

    PubMed

    Yao, Baojin; Zhao, Yu; Zhang, Mei; Li, Juan

    2012-03-01

    Sika deer is one of the best-known and highly valued animals of China. Despite its economic, cultural, and biological importance, there has not been a large-scale sequencing project for Sika deer to date. With the ultimate goal of sequencing the complete genome of this organism, we first established a bone marrow cDNA library for Sika deer and generated a total of 2,025 reads. After processing the sequences, 2,017 high-quality expressed sequence tags (ESTs) were obtained. These ESTs were assembled into 1,157 unigenes, including 238 contigs and 919 singletons. Comparative analyses indicated that 888 (76.75%) of the unigenes had significant matches to sequences in the non-redundant protein database. In addition to highly expressed genes, such as stearoyl-CoA desaturase, cytochrome c oxidase, adipocyte-type fatty acid-binding protein, adiponectin and thymosin beta-4, we also obtained vascular endothelial growth factor-A and heparin-binding growth-associated molecule, both of which are of great importance for angiogenesis research. There were 244 (21.09%) unigenes with no significant match to any sequence in current protein or nucleotide databases, and these sequences may represent genes with unknown function in Sika deer. Open reading frame analysis of the sequences was performed using the getorf program. In addition, the sequences were functionally classified using the gene ontology hierarchy, clusters of orthologous groups of proteins and Kyoto encyclopedia of genes and genomes databases. Analysis of ESTs described in this paper provides an important resource for the transcriptome exploration of Sika deer, and will also facilitate further studies on functional genomics, gene discovery and genome annotation of Sika deer.
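
    The unigene counts and percentages quoted above are internally consistent, as the following short check shows. It uses only numbers from the abstract; the helper name percent is illustrative.

```python
def percent(part, whole):
    """Return part as a percentage of whole, rounded to two decimals."""
    return round(100 * part / whole, 2)


contigs, singletons = 238, 919
unigenes = contigs + singletons

print(unigenes)                # 1157 unigenes in total
print(percent(888, unigenes))  # 76.75 (unigenes with protein database matches)
print(percent(244, unigenes))  # 21.09 (unigenes with no significant match)
```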

  17. Introduction to the analysis of the intracellular sorting information in protein sequences: from molecular biology to artificial neural networks.

    PubMed

    Aguilar, R Claudio

    2015-01-01

    A precise spatial-temporal organization of cell components is required for basic cellular activities such as proliferation and for complex multicellular processes such as embryo development. Particularly important is the maintenance and control of the cellular distribution of proteins, as these components fulfill crucial structural and catalytic functions. Membrane protein localization within the cell is determined and maintained by intracellular elements known as adaptors that interpret sorting information encoded in the amino acid sequence of cargoes. Understanding the sorting sequence code of cargo proteins would have a profound impact on many areas of the life sciences. For example, it would shed light onto the molecular mechanisms of several genetic diseases and would eventually allow us to control the fate of proteins. This chapter constitutes a primer on protein-sorting information analysis and localization/trafficking prediction. We provide the rationale for and a discussion of a simple basic protocol for protein sequence dissection looking for sorting signals, from simple sequence inspection techniques to more sophisticated artificial neural networks analysis of sorting signal recognition data.

  18. Sequence-controlled RNA self-processing: computational design, biochemical analysis, and visualization by AFM.

    PubMed

    Petkovic, Sonja; Badelt, Stefan; Block, Stephan; Flamm, Christoph; Delcea, Mihaela; Hofacker, Ivo; Müller, Sabine

    2015-07-01

    Reversible chemistry allowing for assembly and disassembly of molecular entities is important for biological self-organization. Thus, ribozymes that support both cleavage and formation of phosphodiester bonds may have contributed to the emergence of functional diversity and increasing complexity of regulatory RNAs in early life. We have previously engineered a variant of the hairpin ribozyme that shows how ribozymes may have circularized or extended their own length by forming concatemers. Using the Vienna RNA package, we now optimized this hairpin ribozyme variant and selected four different RNA sequences that were expected to circularize more efficiently or form longer concatemers upon transcription. (Two-dimensional) PAGE analysis confirms that (i) all four selected ribozymes are catalytically active and (ii) high yields of cyclic species are obtained. AFM imaging in combination with RNA structure prediction enabled us to calculate the distributions of monomers and self-concatenated dimers and trimers. Our results show that computationally optimized molecules do form reasonable amounts of trimers, which has not been observed for the original system so far, and we demonstrate that the combination of theoretical prediction, biochemical and physical analysis is a promising approach toward accurate prediction of ribozyme behavior and design of ribozymes with predefined functions.

  19. Sequence-controlled RNA self-processing: computational design, biochemical analysis, and visualization by AFM

    PubMed Central

    Petkovic, Sonja; Badelt, Stefan; Flamm, Christoph; Delcea, Mihaela

    2015-01-01

    Reversible chemistry allowing for assembly and disassembly of molecular entities is important for biological self-organization. Thus, ribozymes that support both cleavage and formation of phosphodiester bonds may have contributed to the emergence of functional diversity and increasing complexity of regulatory RNAs in early life. We have previously engineered a variant of the hairpin ribozyme that shows how ribozymes may have circularized or extended their own length by forming concatemers. Using the Vienna RNA package, we now optimized this hairpin ribozyme variant and selected four different RNA sequences that were expected to circularize more efficiently or form longer concatemers upon transcription. (Two-dimensional) PAGE analysis confirms that (i) all four selected ribozymes are catalytically active and (ii) high yields of cyclic species are obtained. AFM imaging in combination with RNA structure prediction enabled us to calculate the distributions of monomers and self-concatenated dimers and trimers. Our results show that computationally optimized molecules do form reasonable amounts of trimers, which has not been observed for the original system so far, and we demonstrate that the combination of theoretical prediction, biochemical and physical analysis is a promising approach toward accurate prediction of ribozyme behavior and design of ribozymes with predefined functions. PMID:25999318

  20. Gene profiling of bone around orthodontic mini-implants by RNA-sequencing analysis.

    PubMed

    Nahm, Kyung-Yen; Heo, Jung Sun; Lee, Jae-Hyung; Lee, Dong-Yeol; Chung, Kyu-Rhim; Ahn, Hyo-Won; Kim, Seong-Hun

    2015-01-01

    This study aimed to evaluate the genes that were expressed in the healing bones around SLA-treated titanium orthodontic mini-implants in a beagle at early (1-week) and late (4-week) stages with RNA-sequencing (RNA-Seq). Samples from sites of surgical defects were used as controls. Total RNA was extracted from the tissue around the implants, and an RNA-Seq analysis was performed with Illumina TruSeq. In the 1-week group, genes in the gene ontology (GO) categories of cell growth and the extracellular matrix (ECM) were upregulated, while genes in the categories of the oxidation-reduction process, intermediate filaments, and structural molecule activity were downregulated. In the 4-week group, the genes upregulated included ECM binding, stem cell fate specification, and intramembranous ossification, while genes in the oxidation-reduction process category were downregulated. GO analysis revealed an upregulation of genes that were related to significant mechanisms, including those with roles in cell proliferation, the ECM, growth factors, and osteogenic-related pathways, which are associated with bone formation. From these results, implant-induced bone formation progressed considerably during the times examined in this study. The upregulation or downregulation of selected genes was confirmed with real-time reverse transcription polymerase chain reaction. The RNA-Seq strategy was useful for defining the biological responses to orthodontic mini-implants and identifying the specific genetic networks for targeted evaluations of successful peri-implant bone remodeling.

  1. Genome sequence analysis of potential probiotic strain Leuconostoc lactis EFEL005 isolated from kimchi.

    PubMed

    Moon, Jin Seok; Choi, Hye Sun; Shin, So Yeon; Noh, Sol Ji; Jeon, Che Ok; Han, Nam Soo

    2015-05-01

    Leuconostoc lactis EFEL005 (KACC 91922) isolated from kimchi showed promising probiotic attributes: resistance against acid and bile salts, absence of transferable genes for antibiotic resistance, broad utilization of prebiotics, and no hemolytic activity. To expand our understanding of the species, we generated a draft genome sequence of the strain and analyzed its genomic features related to the aforementioned probiotic properties. Genome assembly resulted in 35 contigs, and the draft genome has 1,688,202 base pairs (bp) with a G+C content of 43.43%, containing 1,644 protein-coding genes and 50 RNA genes. The average nucleotide identity analysis showed high homology (≥ 96%) to the type strain L. lactis KCTC3528, but low homology (≤ 95%) to L. lactis KCTC3773 (formerly L. argentinum). Genomic analysis revealed the presence of various genes for sucrose metabolism (glucansucrases, invertases, sucrose phosphorylases, and mannitol dehydrogenase), acid tolerance (F1F0 ATPases, cation transport ATPase, branched-chain amino acid permease, and lysine decarboxylase), vancomycin response regulator, and antibacterial peptide (Lactacin F). No gene for production of biogenic amines (histamine and tyramine) was found. This report will facilitate the understanding of probiotic properties of this strain as a starter for fermented foods.
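
    A G+C content such as the 43.43% quoted above is the percentage of G and C bases among the unambiguous bases of the assembly. The sketch below illustrates the calculation on a toy string; the function name is hypothetical, and skipping ambiguous bases such as N is an assumption rather than something the record states.

```python
def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = sum(base in "GC" for base in counted)
    return 100.0 * gc / len(counted)


# Toy sequence for illustration only; a real draft genome would be
# the concatenation of all contigs.
print(round(gc_content("ATGCGCNN"), 2))
```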

  2. Third-Generation Sequencing and Analysis of Four Complete Pig Liver Esterase Gene Sequences in Clones Identified by Screening BAC Library

    PubMed Central

    Zhou, Qiongqiong; Sun, Wenjuan; Liu, Xiyan; Wang, Xiliang; Xiao, Yuncai; Bi, Dingren; Yin, Jingdong; Shi, Deshi

    2016-01-01

    Aim Pig liver carboxylesterase (PLE) gene sequences in GenBank are incomplete, which has led to difficulties in studying the genetic structure and regulation mechanisms of gene expression of PLE family genes. The aim of this study was to obtain and analyse complete gene sequences of the PLE family by screening a Rongchang pig BAC library and third-generation PacBio gene sequencing. Methods After a number of existing incomplete PLE isoform gene sequences were analysed, primers were designed based on conserved regions in PLE exons, and the whole pig genome was used as a template for polymerase chain reaction (PCR) amplification. Specific primers were then selected based on the PCR amplification results. A three-step PCR screening method was used to identify PLE-positive clones by screening a Rongchang pig BAC library and PacBio third-generation sequencing was performed. BLAST comparisons and other bioinformatics methods were applied for sequence analysis. Results Five PLE-positive BAC clones, designated BAC-10, BAC-70, BAC-75, BAC-119 and BAC-206, were identified. Sequence analysis yielded the complete sequences of four PLE genes, PLE1, PLE-B9, PLE-C4, and PLE-G2. Complete PLE gene sequences were defined as those containing regulatory sequences, exons, and introns. It was found not only that the PLE exon sequences of the four genes showed a high degree of homology, but also that the intron sequences were highly similar. Additionally, the regulatory region of the genes contained two 720-bp reverse complement sequences that may have an important function in the regulation of PLE gene expression. Significance This is the first report to confirm the complete sequences of four PLE genes. In addition, the study demonstrates that each PLE isoform is encoded by a single gene and that the various genes exhibit a high degree of sequence homology, suggesting that the PLE family evolved from a single ancestral gene. Obtaining the complete sequences of these PLE genes

  3. Structurally complex and highly active RNA ligases derived from random RNA sequences

    NASA Technical Reports Server (NTRS)

    Ekland, E. H.; Szostak, J. W.; Bartel, D. P.

    1995-01-01

    Seven families of RNA ligases, previously isolated from random RNA sequences, fall into three classes on the basis of secondary structure and regiospecificity of ligation. Two of the three classes of ribozymes have been engineered to act as true enzymes, catalyzing the multiple-turnover transformation of substrates into products. The most complex of these ribozymes has a minimal catalytic domain of 93 nucleotides. An optimized version of this ribozyme has a kcat exceeding one per second, a value far greater than that of most natural RNA catalysts and approaching that of comparable protein enzymes. The fact that such a large and complex ligase emerged from a very limited sampling of sequence space implies the existence of a large number of distinct RNA structures of equivalent complexity and activity.

  4. DNA sequence, structure, and tyrosine kinase activity of the Drosophila melanogaster abelson proto-oncogene homolog

    SciTech Connect

    Henkemeyer, M.J.; Bennett, R.L.; Gertler, F.B.; Hoffmann, F.M.

    1988-02-01

    The authors report their molecular characterization of the Drosophila melanogaster Abelson gene (abl), a gene in which recessive loss-of-function mutations result in lethality at the pupal stage of development. This essential gene consists of 10 exons extending over 26 kilobase pairs of genomic DNA. The DNA sequence encodes a protein of 1,520 amino acids with strong sequence similarity to the human c-abl proto-oncogene beginning in the type 1b 5' exon and extending through the region essential for tyrosine kinase activity. When the tyrosine kinase homologous region was expressed in Escherichia coli, phosphorylation of proteins on tyrosine residues was observed with an antiphosphotyrosine antibody. These results show that the abl gene is highly conserved through evolution and encodes a functional tyrosine protein kinase required for Drosophila development.

  5. Dual-Tracer PET Using Generalized Factor Analysis of Dynamic Sequences

    PubMed Central

    Fakhri, Georges El; Trott, Cathryn M.; Sitek, Arkadiusz; Bonab, Ali; Alpert, Nathaniel M.

    2013-01-01

    Purpose With single-photon emission computed tomography, simultaneous imaging of two physiological processes relies on discrimination of the energy of the emitted gamma rays, whereas the application of dual-tracer imaging to positron emission tomography (PET) imaging has been limited by the characteristic 511-keV emissions. Procedures To address this limitation, we developed a novel approach based on generalized factor analysis of dynamic sequences (GFADS) that exploits spatio-temporal differences between radiotracers and applied it to near-simultaneous imaging of 2-deoxy-2-[18F]fluoro-D-glucose (FDG) (brain metabolism) and 11C-raclopride (D2) with simulated human data and experimental rhesus monkey data. We show theoretically and verify by simulation and measurement that GFADS can separate FDG and raclopride measurements that are made nearly simultaneously. Results The theoretical development shows that GFADS can decompose the studies at several levels: (1) It decomposes the FDG and raclopride study so that they can be analyzed as though they were obtained separately. (2) If additional physiologic/anatomic constraints can be imposed, further decomposition is possible. (3) For the example of raclopride, specific and nonspecific binding can be determined on a pixel-by-pixel basis. We found good agreement between the estimated GFADS factors and the simulated ground truth time activity curves (TACs), and between the GFADS factor images and the corresponding ground truth activity distributions with errors less than 7.3±1.3 %. Biases in estimation of specific D2 binding and relative metabolism activity were within 5.9±3.6 % compared to the ground truth values. We also evaluated our approach in simultaneous dual-isotope brain PET studies in a rhesus monkey and obtained accuracy of better than 6 % in a mid-striatal volume, for striatal activity estimation. Conclusions Dynamic image sequences acquired following near-simultaneous injection of two PET radiopharmaceuticals

  6. Multilocus Sequence Analysis of Clinical “Candidatus Neoehrlichia mikurensis” Strains from Europe

    PubMed Central

    Grankvist, Anna; Moore, Edward R. B.; Svensson Stadler, Liselott; Pekova, Sona; Bogdan, Christian; Geißdörfer, Walter; Grip-Lindén, Jenny; Brandström, Kenny; Marsal, Jan; Andréasson, Kristofer; Lewerin, Catharina; Welinder-Olsson, Christina

    2015-01-01

    “Candidatus Neoehrlichia mikurensis” is the tick-borne agent of neoehrlichiosis, an infectious disease that primarily affects immunocompromised patients. So far, the genetic variability of “Ca. Neoehrlichia” has been studied only by comparing 16S rRNA genes and groEL operon sequences. We describe the development and use of a multilocus sequence analysis (MLSA) protocol to characterize the genetic diversity of clinical “Ca. Neoehrlichia” strains in Europe and their relatedness to other species within the Anaplasmataceae family. Six genes were selected: ftsZ, clpB, gatB, lipA, groEL, and 16S rRNA. Each MLSA locus was amplified by real-time PCR, and the PCR products were sequenced. Phylogenetic trees of MLSA locus relatedness were constructed from aligned sequences. Blood samples from 12 patients with confirmed “Ca. Neoehrlichia” infection from Sweden (n = 9), the Czech Republic (n = 2), and Germany (n = 1) were analyzed with the MLSA protocol. Three of the Swedish strains exhibited identical lipA sequences, while the lipA sequences of the strains from the other nine patients were identical to each other. One of the Czech strains had one differing nucleotide in the clpB sequence from the sequences of the other 11 strains. All 12 strains had identical sequences for the genes 16S rRNA, ftsZ, gatB, and groEL. According to the MLSA, among the Anaplasmataceae, “Ca. Neoehrlichia” is most closely related to Ehrlichia ruminantium, less so to Anaplasma phagocytophilum, and least to Wolbachia endosymbionts. To conclude, three sequence types of infectious “Ca. Neoehrlichia” were identified: one in the west of Sweden, one in the Czech Republic, and one spread throughout Europe. PMID:26157152

  7. Whole-Genome sequencing and genetic variant analysis of a quarter Horse mare

    PubMed Central

    2012-01-01

    Background The catalog of genetic variants in the horse genome originates from a few select animals, the majority originating from the Thoroughbred mare used for the equine genome sequencing project. The purpose of this study was to identify genetic variants, including single nucleotide polymorphisms (SNPs), insertion/deletion polymorphisms (INDELs), and copy number variants (CNVs) in the genome of an individual Quarter Horse mare sequenced by next-generation sequencing. Results Using massively parallel paired-end sequencing, we generated 59.6 Gb of DNA sequence from a Quarter Horse mare resulting in an average of 24.7X sequence coverage. Reads were mapped to approximately 97% of the reference Thoroughbred genome. Unmapped reads were de novo assembled resulting in 19.1 Mb of new genomic sequence in the horse. Using a stringent filtering method, we identified 3.1 million SNPs, 193 thousand INDELs, and 282 CNVs. Genetic variants were annotated to determine their impact on gene structure and function. Additionally, we genotyped this Quarter Horse for mutations of known diseases and for variants associated with particular traits. Functional clustering analysis of genetic variants revealed that most of the genetic variation in the horse's genome was enriched in sensory perception, signal transduction, and immunity and defense pathways. Conclusions This is the first sequencing of a horse genome by next-generation sequencing and the first genomic sequence of an individual Quarter Horse mare. We have increased the catalog of genetic variants for use in equine genomics by the addition of novel SNPs, INDELs, and CNVs. The genetic variants described here will be a useful resource for future studies of genetic variation regulating performance traits and diseases in equids. PMID:22340285
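
    The coverage figure above can be related to the sequencing yield with a one-line calculation, assuming (the abstract does not say how coverage was computed) that mean coverage is simply total sequenced bases divided by genome size. The function name below is hypothetical.

```python
def implied_genome_size(total_bases: float, mean_coverage: float) -> float:
    """Genome size implied by coverage = total_bases / genome_size."""
    if mean_coverage <= 0:
        raise ValueError("coverage must be positive")
    return total_bases / mean_coverage


# Figures quoted in the abstract: 59.6 Gb of sequence at 24.7X average coverage.
size = implied_genome_size(59.6e9, 24.7)
print(f"{size / 1e9:.2f} Gb")
```

    Under this assumption the two reported numbers correspond to a genome of roughly 2.4 Gb.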

  8. Cloning and nucleotide sequence of the gene coding for enzymatically active fragments of the Bacillus polymyxa beta-amylase.

    PubMed

    Kawazu, T; Nakanishi, Y; Uozumi, N; Sasaki, T; Yamagata, H; Tsukagoshi, N; Udaka, S

    1987-04-01

    The gene encoding beta-amylase was cloned from Bacillus polymyxa 72 into Escherichia coli HB101 by inserting HindIII-generated DNA fragments into the HindIII site of pBR322. The 4.8-kilobase insert was shown to direct the synthesis of beta-amylase. A 1.8-kilobase AccI-AccI fragment of the donor strain DNA was sufficient for the beta-amylase synthesis. Homologous DNA was found by Southern blot analysis to be present only in B. polymyxa 72 and not in other bacteria such as E. coli or B. subtilis. B. polymyxa, as well as E. coli harboring the cloned DNA, was found to produce enzymatically active fragments of beta-amylases (70,000, 56,000, or 58,000, and 42,000 daltons), which were detected in situ by sodium dodecyl sulfate-polyacrylamide gel electrophoresis. Nucleotide sequence analysis of the cloned 3.1-kilobase DNA revealed that it contains one open reading frame of 2,808 nucleotides without a translational stop codon. The deduced amino acid sequence for these 2,808 nucleotides encoding a secretory precursor of the beta-amylase protein is 936 amino acids including a signal peptide of 33 or 35 residues at its amino-terminal end. The existence of a beta-amylase of larger than 100,000 daltons, which was predicted on the basis of the results of nucleotide sequence analysis of the gene, was confirmed by examining culture supernatants after various cultivation periods. It existed only transiently during cultivation, but the multiform beta-amylases described above existed for a long time. The large beta-amylase (approximately 160,000 daltons) existed for longer in the presence of a protease inhibitor such as chymostatin, suggesting that proteolytic cleavage is the cause of the formation of multiform beta-amylases.

  9. The Pathogenomic Sequence Analysis of B. cereus and B. thuringiensis isolates closely related to Bacillus anthracis

    SciTech Connect

    Han, C S; Xie, G; Challacombe, J F; Altherr, M R; Bhotika, S S; Bruce, D; Campbell, C S; Campbell, M L; Chen, J; Chertkov, O; Cleland, C; Dimitrijevic-Bussod, M; Doggett, N A; Fawcett, J J; Glavina, T; Goodwin, L A; Hill, K K; Hitchcock, P; Jackson, P J; Keim, P; Kewalramani, A R; Longmire, J; Lucas, S; Malfatti, S; McMurry, K; Meincke, L J; Misra, M; Moseman, B L; Mundt, M; Munk, A C; Okinaka, R T; Parson-Quintana, B; Reilly, L P; Richardson, P; Robinson, D L; Rubin, E; Saunders, E; Tapia, R; Tesmer, J G; Thayer, N; Thompson, L S; Tice, H; Ticknor, L O; Wills, P L; Gilna, P; Brettin, T S

    2005-10-12

    The sequencing and analysis of two close relatives of Bacillus anthracis are reported. AFLP analysis of over 300 isolates of B. cereus, B. thuringiensis and B. anthracis identified two isolates as being very closely related to B. anthracis. One, a B. cereus, BcE33L, was isolated from a zebra carcass in Namibia; the second, a B. thuringiensis, 97-27, was isolated from a necrotic human wound. The B. cereus appears to be the closest anthracis relative sequenced to date. A core genome of over 3,900 genes was compiled for the Bacillus cereus group, including B. anthracis. Comparative analysis of these two genomes with other members of the B. cereus group provides insight into the evolutionary relationships among these organisms. Evidence is presented that differential regulation modulates virulence, rather than simple acquisition of virulence factors. These genome sequences provide insight into the molecular mechanisms contributing to the host range and virulence of this group of organisms.

  10. Sequence and phylogenetic analysis of chicken anaemia virus obtained from backyard and commercial chickens in Nigeria.

    PubMed

    Oluwayelu, D O; Todd, D; Olaleye, O D

    2008-12-01

    This work reports the first molecular analysis study of chicken anaemia virus (CAV) in backyard chickens in Africa using molecular cloning and sequence analysis to characterize CAV strains obtained from commercial chickens and Nigerian backyard chickens. Partial VP1 gene sequences were determined for three CAVs from commercial chickens and for six CAV variants present in samples from a backyard chicken. Multiple alignment analysis revealed that the 6% and 4% nucleotide diversity obtained respectively for the commercial and backyard chicken strains translated to only 2% amino acid diversity for each breed. Overall, the amino acid composition of Nigerian CAVs was found to be highly conserved. Since the partial VP1 gene sequence of two backyard chicken cloned CAV strains (NGR/CI-8 and NGR/CI-9) were almost identical and evolutionarily closely related to the commercial chicken strains NGR-1, and NGR-4 and NGR-5, respectively, we concluded that CAV infections had crossed the farm boundary.

  11. EthoSeq: a tool for phylogenetic analysis and data mining in behavioral sequences.

    PubMed

    Japyassú, Hilton F; Alberts, Carlos C; Izar, Patrícia; Sato, Takechi

    2006-11-01

    This article introduces the software program called EthoSeq, which is designed to extract probabilistic behavioral sequences (tree-generated sequences, or TGSs) from observational data and to prepare a TGS-species matrix for phylogenetic analysis. The program uses Graph Theory algorithms to automatically detect behavioral patterns within the observational sessions. It includes filtering tools to adjust the search procedure to user-specified statistical needs. Preliminary analyses of data sets, such as grooming sequences in birds and foraging tactics in spiders, uncover a large number of TGSs which together yield single phylogenetic trees. An example of the use of the program is our analysis of felid grooming sequences, in which we have obtained 1,386 felid grooming TGSs for seven species, resulting in a single phylogeny. These results show that behavior is definitely useful in phylogenetic analysis. EthoSeq simplifies and automates such analyses, uncovers much of the hidden patterns of long behavioral sequences, and prepares this data for further analysis with standard phylogenetic programs. We hope it will encourage many empirical studies on the evolution of behavior.

  12. Heterologous C-terminal sequences disrupt transcriptional activation and oncogenesis by p59v-rel.

    PubMed Central

    Diehl, J A; Hannink, M

    1993-01-01

    Members of the NF-kappa B/rel family of transcription factors are regulated through a trans association with members of a family of inhibitor proteins, collectively known as I kappa B proteins, that contain five to eight copies of a 33-amino-acid repeat sequence (ankyrin repeat). Certain NF-kappa B/rel proteins are also regulated by cis-acting ankyrin repeat-containing domains. The C terminus of p105NF-kappa B, the precursor of the 50-kDa subunit of NF-kappa B, contains a series of ankyrin repeats; proteolytic removal of this ankyrin domain is necessary for the manifestation of sequence-specific DNA binding and nuclear translocation of the N-terminal product. To investigate the structural requirements important for regulation of different NF-kappa B/rel family members by polypeptides containing ankyrin repeat domains, we have constructed a p59v-rel:p105NF-kappa B chimeric protein (p110v-rel-ank). The presence of C-terminal p105NF-kappa B-derived sequences in p110v-rel-ank inhibited nuclear translocation, sequence-specific DNA binding, pp40I kappa B-alpha association, and oncogenic transformation. Sequential truncation of the C-terminal ankyrin domain of p110v-rel-ank resulted in the restoration of nuclear translocation, DNA binding, and pp40I kappa B-alpha association but did not restore the oncogenic properties of p59v-rel. The presence of 67 C-terminal p105NF-kappa B-derived amino acids was sufficient to inhibit both transcriptional activation and oncogenic transformation by p59v-rel. These results support a model in which activation of gene expression by p59v-rel is required for its ability to induce oncogenic transformation. PMID:8230438

  13. 16S Ribosomal DNA Sequence Analysis of a Large Collection of Environmental and Clinical Unidentifiable Bacterial Isolates

    PubMed Central

    Drancourt, Michel; Bollet, Claude; Carlioz, Antoine; Martelin, Rolland; Gayral, Jean-Pierre; Raoult, Didier

    2000-01-01

    Some bacteria are difficult to identify with phenotypic identification schemes commonly used outside reference laboratories. 16S ribosomal DNA (rDNA)-based identification of bacteria potentially offers a useful alternative when phenotypic characterization methods fail. However, as yet, the usefulness of 16S rDNA sequence analysis in the identification of conventionally unidentifiable isolates has not been evaluated with a large collection of isolates. In this study, we evaluated the utility of 16S rDNA sequencing as a means to identify a collection of 177 such isolates obtained from environmental, veterinary, and clinical sources. For 159 isolates (89.8%) there was at least one sequence in GenBank that yielded a similarity score of ≥97%, and for 139 isolates (78.5%) there was at least one sequence in GenBank that yielded a similarity score of ≥99%. These similarity score values were used to define identification at the genus and species levels, respectively. For isolates identified to the species level, conventional identification failed to produce accurate results because of inappropriate biochemical profile determination in 76 isolates (58.7%), Gram staining in 16 isolates (11.6%), oxidase and catalase activity determination in 5 isolates (3.6%) and growth requirement determination in 2 isolates (1.5%). Eighteen isolates (10.2%) remained unidentifiable by 16S rDNA sequence analysis but were probably prototype isolates of new species. These isolates originated mainly from environmental sources (P = 0.07). The 16S rDNA approach failed to identify Enterobacter and Pantoea isolates to the species level (P = 0.04; odds ratio = 0.32 [95% confidence interval, 0.10 to 1.14]). Elsewhere, the usefulness of 16S rDNA sequencing was compromised by the presence of 16S rDNA sequences with >1% undetermined positions in the databases. Unlike phenotypic identification, which can be modified by the variability of expression of characters, 16S rDNA sequencing provides

  14. Resources and Costs for Microbial Sequence Analysis Evaluated Using Virtual Machines and Cloud Computing

    PubMed Central

    Angiuoli, Samuel V.; White, James R.; Matalka, Malcolm; White, Owen; Fricke, W. Florian

    2011-01-01

    Background The widespread popularity of genomic applications is threatened by the “bioinformatics bottleneck” resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. Results We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers. Conclusions Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggest that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S r

  15. Development of self-compressing BLSOM for comprehensive analysis of big sequence data.

    PubMed

    Kikuchi, Akihito; Ikemura, Toshimichi; Abe, Takashi

    2015-01-01

    With the remarkable increase in genomic sequence data from various organisms, novel tools are needed for comprehensive analyses of available big sequence data. We previously developed a Batch-Learning Self-Organizing Map (BLSOM), which can cluster genomic fragment sequences according to phylotype based solely on oligonucleotide composition, and applied it to genome and metagenomic studies. BLSOM is suitable for high-performance parallel computing and can analyze big data simultaneously, but a large-scale BLSOM needs a large computational resource. We have developed Self-Compressing BLSOM (SC-BLSOM) for reduction of computation time, which allows us to carry out comprehensive analysis of big sequence data without the use of high-performance supercomputers. The strategy of SC-BLSOM is to hierarchically construct BLSOMs according to data class, such as phylotype. The first-layer BLSOM was constructed with each of the divided input data pieces that represents the data subclass, such as phylotype division, resulting in compression of the number of data pieces. The second BLSOM was constructed with a total of weight vectors obtained in the first-layer BLSOMs. We compared SC-BLSOM with the conventional BLSOM by analyzing bacterial genome sequences. SC-BLSOM could be constructed faster than BLSOM and cluster the sequences according to phylotype with high accuracy, showing the method's suitability for efficient knowledge discovery from big sequence data.
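
    The input to a composition-based map of this kind is a vector of oligonucleotide frequencies for each genomic fragment. The sketch below shows one plausible way to build such a vector; the function name, the choice of k = 4, and the handling of ambiguous bases are assumptions, not details given in the abstract, and the map training itself is not shown.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq: str, k: int = 4) -> list[float]:
    """Relative frequency of every A/C/G/T k-mer, in lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count all overlapping windows; windows containing other symbols
    # (e.g. N) are counted here but never looked up below.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    if total == 0:
        raise ValueError("sequence has no unambiguous k-mers")
    return [counts[m] / total for m in kmers]


# Toy fragment for illustration; real inputs would be kilobase-scale fragments.
vector = kmer_frequencies("ACGTACGT", k=2)
print(len(vector), round(sum(vector), 6))
```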

  16. A comparison of several similarity indices used in the classification of protein sequences: a multivariate analysis.

    PubMed Central

    Landès, C; Hénaut, A; Risler, J L

    1992-01-01

    The present work describes an attempt to identify reliable criteria which could be used as distance indices between protein sequences. Seven different criteria have been tested: i and ii) the scores of the alignments as given by the BESTFIT and the FASTA programs; iii) the ratio parameter, i.e. the BESTFIT score divided by the length of the aligned peptides; iv and v) the statistical significance (Z-scores) of the scores calculated by BESTFIT and FASTA, as obtained by comparison with shuffled sequences; vi) the Z-scores provided by the program RELATE which performs a segment-by-segment comparison of 2 sequences, and vii) an original distance index calculated by the program DOCMA from all the pairwise dotplots between the sequences. These 7 criteria have been tested against the amino acid sequences of 39 globins and those of the 20 aminoacyl-tRNA synthetases from E. coli. The distances between the sequences were analyzed by multivariate analysis techniques. The results show that the distances calculated from the scores of the pairwise alignments are not adequately sensitive. The Z-score from RELATE is not selective enough and too demanding in computer time. Three criteria gave a classification consistent with the known similarities between the sequences in the sets, namely the Z-scores from BESTFIT and FASTA and the multiple dotplot comparison distance index from DOCMA. PMID:1641329
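
    The idea of a Z-score obtained by comparison with shuffled sequences can be sketched as follows. This is only an illustration under stated assumptions: a plain Smith-Waterman local alignment with a match/mismatch/linear-gap scheme stands in for the BESTFIT and FASTA scores, and the function names, scoring values, and number of shuffles are hypothetical choices, not those of the original study.

```python
import random
import statistics


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def shuffled_z_score(a, b, n_shuffles=100, seed=0):
    """Z-score of the observed score against scores for shuffled copies of b."""
    rng = random.Random(seed)
    observed = local_alignment_score(a, b)
    letters = list(b)
    null_scores = []
    for _ in range(n_shuffles):
        # Shuffling keeps the composition of b but destroys its order.
        rng.shuffle(letters)
        null_scores.append(local_alignment_score(a, "".join(letters)))
    mean = statistics.mean(null_scores)
    sd = statistics.pstdev(null_scores)
    # The Z-score is undefined when the null scores do not vary; return 0.0.
    return (observed - mean) / sd if sd > 0 else 0.0
```

    Because the shuffled copies keep the amino acid composition of the original, a large Z-score indicates similarity beyond what composition alone would produce.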

  17. Highly-multiplexed barcode sequencing: an efficient method for parallel analysis of pooled samples

    PubMed Central

    Smith, Andrew M.; Heisler, Lawrence E.; St.Onge, Robert P.; Farias-Hesson, Eveline; Wallace, Iain M.; Bodeau, John; Harris, Adam N.; Perry, Kathleen M.; Giaever, Guri; Pourmand, Nader; Nislow, Corey

    2010-01-01

    Next-generation sequencing has proven an extremely effective technology for molecular counting applications where the number of sequence reads provides a digital readout for RNA-seq, ChIP-seq, Tn-seq and other applications. The extremely large number of sequence reads that can be obtained per run permits the analysis of increasingly complex samples. For lower complexity samples, however, a point of diminishing returns is reached when the number of counts per sequence results in oversampling with no increase in data quality. A solution to making next-generation sequencing as efficient and affordable as possible involves assaying multiple samples in a single run. Here, we report the successful 96-plexing of complex pools of DNA barcoded yeast mutants and show that such ‘Bar-seq’ assessment of these samples is comparable with data provided by barcode microarrays, the current benchmark for this application. The cost reduction and increased throughput permitted by highly multiplexed sequencing will greatly expand the scope of chemogenomics assays and, equally importantly, the approach is suitable for other sequence counting applications that could benefit from massive parallelization. PMID:20460461
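
    The counting step behind a multiplexed barcode run can be sketched as below. The read layout (a fixed-length sample tag followed directly by the strain barcode), the function name `count_barcodes`, and the exact-match lookup are assumptions made for illustration; the abstract does not describe the actual read structure or matching rules.

```python
from collections import Counter


def count_barcodes(reads, sample_tags, strain_barcodes, tag_len):
    """Count reads per (sample, strain) pair.

    reads           -- iterable of read strings
    sample_tags     -- dict mapping multiplex tag sequence -> sample name
    strain_barcodes -- dict mapping strain barcode sequence -> strain name
    tag_len         -- length of the multiplex tag at the start of each read
    """
    counts = Counter()
    for read in reads:
        tag, barcode = read[:tag_len], read[tag_len:]
        sample = sample_tags.get(tag)
        strain = strain_barcodes.get(barcode)
        # Reads with an unknown tag or barcode are discarded.
        if sample is not None and strain is not None:
            counts[(sample, strain)] += 1
    return counts
```

    The resulting table of counts is the digital readout that would be compared across samples.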

  18. Application of discrete wavelet transform for analysis of genomic sequences of Mycobacterium tuberculosis.

    PubMed

    Saini, Shiwani; Dewan, Lillie

    2016-01-01

    This paper highlights the potential of discrete wavelet transforms in the analysis and comparison of genomic sequences of Mycobacterium tuberculosis (MTB) with different resistance characteristics. Graphical representations of wavelet coefficients and statistical estimates of their parameters have been used to determine the extent of similarity between different sequences of MTB without the use of conventional methods such as Basic Local Alignment Search Tool. Based on the calculation of the energy of wavelet decomposition coefficients of complete genomic sequences, a broad classification by type of resistance can be made. All the given genomic sequences can be grouped into two broad categories wherein the drug resistant and drug susceptible sequences form one group while the multidrug resistant and extensive drug resistant sequences form the other group. This method of segregation of the sequences is faster than conventional laboratory methods which require 3-4 weeks of culture of sputum samples. Thus the proposed method can be used as a tool to enhance clinical diagnostic investigations in near real-time.
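
    To make the notion of "energy of wavelet decomposition coefficients" concrete, here is a minimal sketch using the Haar wavelet. The abstract does not state which wavelet or which numeric encoding of nucleotides was used, so the Haar basis, the G/C indicator mapping, and the function names are all assumptions chosen for simplicity.

```python
import math

# Hypothetical numeric encoding (an assumption): G/C -> 1.0, A/T -> 0.0.
GC_INDICATOR = {"A": 0.0, "T": 0.0, "G": 1.0, "C": 1.0}


def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail).

    A trailing unpaired sample in an odd-length signal is dropped.
    """
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def detail_energies(seq, levels=3):
    """Energy (sum of squares) of the detail coefficients at each level."""
    signal = [GC_INDICATOR[b] for b in seq.upper() if b in GC_INDICATOR]
    energies = []
    for _ in range(levels):
        if len(signal) < 2:
            break  # Nothing left to decompose.
        signal, detail = haar_step(signal)
        energies.append(sum(d * d for d in detail))
    return energies
```

    Comparing such per-level energy profiles between genomes is one plausible way to group sequences without alignment.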

  19. Cloning and sequence analysis of the muramidase-2 gene from Enterococcus hirae.

    PubMed Central

    Chu, C P; Kariyama, R; Daneo-Moore, L; Shockman, G D

    1992-01-01

    Extracellular muramidase-2 of Enterococcus hirae ATCC 9790 was purified to homogeneity by substrate binding, guanidine-HCl extraction, and reversed-phase chromatography. A monoclonal antibody, 2F8, which specifically recognizes muramidase-2, was used to screen a genomic library of E. hirae ATCC 9790 DNA in bacteriophage lambda gt11. A positive phage clone containing a 4.5-kb DNA insert was isolated and analyzed. The EcoRI-digested 4.5-kb fragment was cut into 2.3-, 1.0-, and 1.5-kb pieces by using restriction enzymes KpnI, Sau3AI, and PstI, and each fragment was subcloned into plasmid pJDC9 or pUC19. The nucleotide sequence of each subclone was determined. The sequence data indicated an open reading frame encoding a polypeptide of 666 amino acid residues, with a calculated molecular mass of 70,678 Da. The first 24 N-terminal amino acids of purified extracellular muramidase-2 were in very good agreement with the deduced amino acid sequence after a 49-amino-acid putative signal sequence. Analysis of the deduced amino acid sequence showed the presence at the C-terminal region of the protein of six highly homologous repeat units separated by nonhomologous intervening sequences that are highly enriched in serine and threonine. The overall sequence showed a high degree of homology with a recently cloned Streptococcus faecalis autolysin. PMID:1347040

  20. Development of Self-Compressing BLSOM for Comprehensive Analysis of Big Sequence Data

    PubMed Central

    Kikuchi, Akihito; Ikemura, Toshimichi; Abe, Takashi

    2015-01-01

    With the remarkable increase in genomic sequence data from various organisms, novel tools are needed for comprehensive analyses of available big sequence data. We previously developed a Batch-Learning Self-Organizing Map (BLSOM), which can cluster genomic fragment sequences according to phylotype solely dependent on oligonucleotide composition, and applied it to genome and metagenomic studies. BLSOM is suitable for high-performance parallel-computing and can analyze big data simultaneously, but a large-scale BLSOM needs a large computational resource. We have developed Self-Compressing BLSOM (SC-BLSOM) for reduction of computation time, which allows us to carry out comprehensive analysis of big sequence data without the use of high-performance supercomputers. The strategy of SC-BLSOM is to hierarchically construct BLSOMs according to data class, such as phylotype. The first-layer BLSOM was constructed with each of the divided input data pieces that represents the data subclass, such as phylotype division, resulting in compression of the number of data pieces. The second BLSOM was constructed with a total of weight vectors obtained in the first-layer BLSOMs. We compared SC-BLSOM with the conventional BLSOM by analyzing bacterial genome sequences. SC-BLSOM could be constructed faster than BLSOM and cluster the sequences according to phylotype with high accuracy, showing the method's suitability for efficient knowledge discovery from big sequence data. PMID:26495297

  1. Identification and sequence analysis of grain softness protein in selected wheat, rye and triticale.

    PubMed

    Kharrazi, M A S; Bobojonov, V

    2012-08-16

    Grain softness protein (GSP) is an important protein for overcoming milling and grain defenses in the innate immunity systems of cereals. The objective of this study was to evaluate and understand GSP sequences in selected wheat, rye and triticale. Using sequences for this gene from a sequence database, we performed clustering analysis to compare the sequences obtained from 3 germplasms with other studied sequences for GSP. The maximum difference between the Hirmand GSP genotype in wheat and the database sequences was 23% in EF109396 and EF109399. Most amino acid variation between the GSP sequences involved the same amino acids. The Nikita rye GSP gene showed 64% identity with DQ269918 and AY667063. The isoelectric point in the GSP of wheat and Lasko triticale was significantly higher than that of rye GSP. In addition, parameters such as optical density, grand average of hydrophobicity, percentage of hydrophobic and hydrophilic amino acids, and number of alpha helices and beta sheets in GSP were similar in wheat and triticale but not in wheat and rye.
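
    Two of the sequence parameters mentioned above can be computed directly from a protein sequence, as in the sketch below. It assumes the Kyte-Doolittle hydropathy scale, which is the scale conventionally used for the grand average of hydropathy but is not named in the abstract; the function names and the definition of "hydrophobic" as a positive hydropathy value are likewise assumptions for illustration.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _standard_residues(protein):
    """Keep only the standard amino acids; raise if none remain."""
    residues = [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("no standard amino acids in sequence")
    return residues


def gravy(protein):
    """Grand average of hydropathy: mean hydropathy over all residues."""
    residues = _standard_residues(protein)
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def hydrophobic_fraction(protein):
    """Fraction of residues with a positive hydropathy value."""
    residues = _standard_residues(protein)
    return sum(KYTE_DOOLITTLE[aa] > 0 for aa in residues) / len(residues)
```

    Applied to aligned GSP sequences from different cereals, such values would give a quick numerical basis for the kind of comparison the abstract reports.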

  2. Combinational usage of next generation sequencing and qPCR for the analysis of tumor samples.

    PubMed

    Loewe, Robert P

    2