Science.gov

Sample records for addition sequence analysis

  1. Multifractal analysis of the irregular set for almost-additive sequences via large deviations

    NASA Astrophysics Data System (ADS)

    Bomfim, Thiago; Varandas, Paulo

    2015-10-01

    In this paper we introduce a notion of free energy and large deviations rate function for asymptotically additive sequences of potentials via an approximation method by families of continuous potentials. We provide estimates for the topological pressure of the set of points whose non-additive sequences are far from the limit described through Kingman’s sub-additive ergodic theorem and give some applications in the context of Lyapunov exponents for diffeomorphisms and cocycles, and the Shannon-McMillan-Breiman theorem for Gibbs measures.

  2. RSAT: regulatory sequence analysis tools.

    PubMed

    Thomas-Chollier, Morgane; Sand, Olivier; Turatsinze, Jean-Valéry; Janky, Rekin's; Defrance, Matthieu; Vervisch, Eric; Brohée, Sylvain; van Helden, Jacques

    2008-07-01

    The regulatory sequence analysis tools (RSAT, http://rsat.ulb.ac.be/rsat/) is a software suite that integrates a wide collection of modular tools for the detection of cis-regulatory elements in genome sequences. The suite includes programs for sequence retrieval, pattern discovery, phylogenetic footprint detection, pattern matching, genome scanning and feature map drawing. Random controls can be performed with random gene selections or by generating random sequences according to a variety of background models (Bernoulli, Markov). Beyond the original word-based pattern-discovery tools (oligo-analysis and dyad-analysis), we recently added a battery of tools for matrix-based detection of cis-acting elements, with some original features (adaptive background models, Markov-chain estimation of P-values) that do not exist in other matrix-based scanning tools. The web server offers an intuitive interface, where each program can be accessed either separately or connected to the other tools. In addition, the tools are now available as web services, enabling their integration in programmatic workflows. Genomes are regularly updated from various genome repositories (NCBI and EnsEMBL) and 682 organisms are currently supported. Since 1998, the tools have been used by several hundreds of researchers from all over the world. Several predictions made with RSAT were validated experimentally and published. PMID:18495751

  3. Sequence analysis on microcomputers.

    PubMed

    Cannon, G C

    1987-10-01

    Overall, each of the program packages performed their tasks satisfactorily. For analyses where there was a well-defined answer, such as a search for a restriction site, there were few significant differences between the program sets. However, for tasks in which a degree of flexibility is desirable, such as homology or similarity determinations and database searches, DNASTAR consistently afforded the user more options in conducting the required analysis than did the other two packages. However, for laboratories where sequence analysis is not a major effort and the expense of a full sequence analysis workstation cannot be justified, MicroGenie and IBI-Pustell offer a satisfactory alternative. MicroGenie is a polished program system. Many may find that its user interface is more "user friendly" than the standard menu-driven interfaces. Its system of filing sequences under individual passwords facilitates use by more than one person. MicroGenie uses a hardware device for software protection that occupies a card slot in the computer on which it is used. Although I am sympathetic to the problem of software piracy, I feel that a less drastic solution is in order for a program likely to be sharing limited computer space with other software packages. The IBI-Pustell package performs the required analysis functions as accurately and quickly as MicroGenie but it lacks the clearness and ease of use. The menu system seems disjointed, and new or infrequent users often find themselves at apparent "dead-end menus" where the only clear alternative is to restart the entire program package. It is suggested from published accounts that the user interface is going to be upgraded and perhaps when that version is available, use of the system will be improved. The documentation accompanying each package was relatively clear as to how to run the programs, but all three packages assumed that the user was familiar with the computational techniques employed. 
MicroGenie and IBI-Pustell further

  4. ISHAN: sequence homology analysis package.

    PubMed

    Shil, Pratip; Dudani, Niraj; Vidyasagar, Pandit B

    2006-01-01

    Sequence-based homology studies play an important role in evolutionary tracing and classification of proteins. Various methods are available to analyze biological sequence information. However, with the advent of the proteomics era, there is a growing demand for analysis of huge amounts of biological sequence information, and it has become necessary to have programs that would provide speedy analysis. ISHAN has been developed as a homology analysis package, built on various sequence analysis tools, viz. FASTA, ALIGN, CLUSTALW, PHYLIP and CODONW (for DNA sequences). This JAVA application offers the user a choice of analysis tools. For testing, ISHAN was applied to perform phylogenetic analysis for sets of Caspase 3 DNA sequences and NF-kappaB p105 amino acid sequences. By integrating several tools it has made analysis much faster and reduced manual intervention. PMID:17274766

  5. RNA sequence analysis using covariance models.

    PubMed Central

    Eddy, S R; Durbin, R

    1994-01-01

    We describe a general approach to several RNA sequence analysis problems using probabilistic models that flexibly describe the secondary structure and primary sequence consensus of an RNA sequence family. We call these models 'covariance models'. A covariance model of tRNA sequences is an extremely sensitive and discriminative tool for searching for additional tRNAs and tRNA-related sequences in sequence databases. A model can be built automatically from an existing sequence alignment. We also describe an algorithm for learning a model and hence a consensus secondary structure from initially unaligned example sequences and no prior structural information. Models trained on unaligned tRNA examples correctly predict tRNA secondary structure and produce high-quality multiple alignments. The approach may be applied to any family of small RNA sequences. PMID:8029015

  6. Twin Mitochondrial Sequence Analysis.

    PubMed

    Bouhlal, Yosr; Martinez, Selena; Gong, Henry; Dumas, Kevin; Shieh, Joseph T C

    2013-09-01

    When applying genome-wide sequencing technologies to disease investigation, it is increasingly important to resolve sequence variation in regions of the genome that may have homologous sequences. The human mitochondrial genome challenges interpretation given the potential for heteroplasmy, somatic variation, and homologous nuclear mitochondrial sequences (numts). Identical twins share the same mitochondrial DNA (mtDNA) from early life, but whether the mitochondrial sequence remains similar is unclear. We compared an adult monozygotic twin pair using high-throughput sequencing and evaluated variants with primer extension and mitochondrial pre-enrichment. Thirty-seven variants were shared between the twin individuals, and the variants were verified on the original genomic DNA. These studies support highly identical genetic sequence in this case. Certain low-level variant calls were of high quality and homology to the mitochondrial DNA, and they were further evaluated. When we assessed calls in pre-enriched mitochondrial DNA templates, we found that these may represent numts, which can be differentiated from mtDNA variation. We conclude that twin identity extends to mitochondrial DNA, and it is critical to differentiate between numts and mtDNA in genome sequencing, particularly since significant heteroplasmy could influence genome interpretation. Further studies on mtDNA and numts will aid in understanding how variation occurs and persists. PMID:24040623

  7. Image analysis for DNA sequencing

    NASA Astrophysics Data System (ADS)

    Palaniappan, Kannappan; Huang, Thomas S.

    1991-07-01

    There is a great deal of interest in automating the process of DNA (deoxyribonucleic acid) sequencing to support the analysis of genomic DNA such as the Human and Mouse Genome projects. In one class of gel-based sequencing protocols autoradiograph images are generated in the final step and usually require manual interpretation to reconstruct the DNA sequence represented by the image. The need to handle a large volume of sequence information necessitates automation of the manual autoradiograph reading step through image analysis in order to reduce the length of time required to obtain sequence data and reduce transcription errors. Various adaptive image enhancement, segmentation and alignment methods were applied to autoradiograph images. The methods are adaptive to the local characteristics of the image such as noise, background signal, or presence of edges. Once the two-dimensional data is converted to a set of aligned one-dimensional profiles waveform analysis is used to determine the location of each band which represents one nucleotide in the sequence. Different classification strategies including a rule-based approach are investigated to map the profile signals, augmented with the original two-dimensional image data as necessary, to textual DNA sequence information.

  8. Sequencing and comparative analysis of the gorilla MHC genomic sequence.

    PubMed

    Wilming, Laurens G; Hart, Elizabeth A; Coggill, Penny C; Horton, Roger; Gilbert, James G R; Clee, Chris; Jones, Matt; Lloyd, Christine; Palmer, Sophie; Sims, Sarah; Whitehead, Siobhan; Wiley, David; Beck, Stephan; Harrow, Jennifer L

    2013-01-01

    Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC. PMID:23589541

  9. Statistical analysis of nucleotide sequences.

    PubMed Central

    Stückle, E E; Emmrich, C; Grob, U; Nielsen, P J

    1990-01-01

    In order to scan nucleic acid databases for potentially relevant but as yet unknown signals, we have developed an improved statistical model for pattern analysis of nucleic acid sequences by modifying previous methods based on Markov chains. We demonstrate the importance of selecting the appropriate parameters in order for the method to function at all. The model allows the simultaneous analysis of several short sequences with unequal base frequencies and Markov order k not equal to 0 as is usually the case in databases. As a test of these modifications, we show that in E. coli sequences there is a bias against palindromic hexamers which correspond to known restriction enzyme recognition sites. PMID:2251125
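
    As an illustration of the kind of Markov-chain word statistics this abstract refers to, the sketch below computes the expected count of a word under an order-k Markov model estimated from the sequence itself. This is the textbook estimator, not the authors' modified model; the function names `count_overlapping` and `expected_count` are assumptions introduced here.

```python
def count_overlapping(seq, word):
    """Count overlapping occurrences of `word` in `seq`.

    By convention the empty word occurs len(seq) times, which makes the
    order-0 case fall out of the same formula as higher orders.
    """
    if not word:
        return len(seq)
    return sum(
        1 for i in range(len(seq) - len(word) + 1) if seq.startswith(word, i)
    )


def expected_count(seq, word, order):
    """Expected count of `word` under an order-`order` Markov model.

    Textbook estimator: the count of the first (order+1)-mer of the word,
    multiplied by the estimated transition probability for each further
    letter, N(w[i:i+order+1]) / N(w[i:i+order]).
    """
    m = len(word)
    if not 0 <= order <= m - 2:
        raise ValueError("order must satisfy 0 <= order <= len(word) - 2")
    expected = float(count_overlapping(seq, word[: order + 1]))
    for i in range(1, m - order):
        denom = count_overlapping(seq, word[i : i + order])
        if denom == 0:
            return 0.0  # context never observed, so the word is not expected
        expected *= count_overlapping(seq, word[i : i + order + 1]) / denom
    return expected


if __name__ == "__main__":
    seq = "ACGTACGTACGT"
    observed = count_overlapping(seq, "ACGT")
    print(observed, expected_count(seq, "ACGT", 0), expected_count(seq, "ACGT", 2))
```

    Comparing the observed count with the expected count at a chosen order is one simple way to flag over- or under-represented words, such as the palindromic hexamers mentioned above.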

  10. [Multilocus sequence typing (MLST) analysis].

    PubMed

    Matsumura, Yasufumi

    2013-12-01

    Multilocus sequence typing (MLST) analysis has been emerging as a powerful tool for genotyping specific bacterial species. MLST utilizes internal fragments of multiple housekeeping genes and the combination of each allele defines the sequence type for each isolate. MLST databases contain reference data and are freely accessible via internet websites. The standard method for investigating short-term hospital outbreaks is still pulsed-field gel electrophoresis and MLST analysis is not a substitute. However, analysis of sequence types and clonal complexes (closely related sequence types) enables identification and understanding of a specific clone that is widely spreading among drug-resistant organisms, or a key clone that is important for evolution of the organism. In the case of Escherichia coli, CTX-M-15 or CTX-M-14 extended-spectrum beta-lactamase producing ST131 clone has emerged and spread globally in the last 10 years. MLST analysis is an unambiguous procedure and is becoming a common typing method to characterize isolates. PMID:24605545
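
    The statement that the combination of alleles defines the sequence type can be pictured as a table lookup. The sketch below is a toy illustration only: the locus names, allele numbers, and sequence-type numbers are hypothetical placeholders, not entries from any real MLST database.

```python
# Hypothetical three-locus scheme; real schemes typically use about seven
# housekeeping genes and take their profiles from a curated database.
SCHEME = ("locusA", "locusB", "locusC")

# Hypothetical profile table: allele combination -> sequence type (ST).
ST_TABLE = {
    (1, 1, 1): 1,
    (1, 2, 1): 2,
    (3, 2, 1): 3,
}


def assign_sequence_type(alleles, scheme=SCHEME, table=ST_TABLE):
    """Return the ST for an isolate's allele numbers, or None if unknown.

    `alleles` maps each locus name in `scheme` to the allele number that
    was assigned to the isolate's sequenced fragment at that locus.
    """
    profile = tuple(alleles[locus] for locus in scheme)
    return table.get(profile)


if __name__ == "__main__":
    isolate = {"locusA": 1, "locusB": 2, "locusC": 1}
    print(assign_sequence_type(isolate))
```

    A profile that is absent from the table returns `None`, which in practice would correspond to a candidate novel sequence type.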

  11. Genome Sequences of Five Additional Brevibacillus laterosporus Bacteriophages

    PubMed Central

    Merrill, Bryan D.; Berg, Jordan A.; Graves, Kiel A.; Ward, Andy T.; Hilton, Jared A.; Wake, Braden N.; Grose, Julianne H.; Breakwell, Donald P.

    2015-01-01

    Brevibacillus laterosporus has been isolated from many different environments, including beehives, and produces compounds that are toxic to many organisms. Five B. laterosporus phages have been isolated previously. Here, we announce five additional phages that infect this bacterium, including the first B. laterosporus siphoviruses to be discovered. PMID:26494658

  12. Biological Sequence Analysis with Multivariate String Kernels.

    PubMed

    Kuksa, Pavel P

    2013-03-01

    String kernel-based machine learning methods have yielded great success in practical tasks of structured/sequential data analysis. They often exhibit state-of-the-art performance on many practical tasks of sequence analysis such as biological sequence classification, remote homology detection, or protein superfamily and fold prediction. However, typical string kernel methods rely on analysis of discrete one-dimensional (1D) string data (e.g., DNA or amino acid sequences). In this work we address the multi-class biological sequence classification problems using multivariate representations in the form of sequences of feature vectors (as in biological sequence profiles, or sequences of individual amino acid physico-chemical descriptors) and a class of multivariate string kernels that exploit these representations. On a number of protein sequence classification tasks, the proposed multivariate representations and kernels show significant 15-20% improvements compared to existing state-of-the-art sequence classification methods. PMID:23509193
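
    For readers unfamiliar with string kernels on discrete 1D data, the sketch below shows the k-spectrum kernel, arguably the simplest member of the family: the inner product of two sequences' k-mer count vectors. It is not the multivariate kernel proposed in this record; the function names are assumptions introduced here.

```python
from collections import Counter


def spectrum(seq, k):
    """Return the k-mer count vector of `seq` as a Counter."""
    return Counter(seq[i : i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k):
    """k-spectrum kernel: dot product of the k-mer count vectors of a and b."""
    counts_a = spectrum(a, k)
    counts_b = spectrum(b, k)
    # Counter returns 0 for missing keys, so only shared k-mers contribute.
    return sum(count * counts_b[kmer] for kmer, count in counts_a.items())


if __name__ == "__main__":
    print(spectrum_kernel("ACGT", "ACGA", 2))
```

    The kernel value can be passed to any kernel method (for example a support vector machine with a precomputed kernel matrix) without ever building the full k-mer feature space explicitly.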

  13. Comparative Analysis of Genome Sequences with VISTA

    DOE Data Explorer

    Dubchak, Inna

    VISTA is a comprehensive suite of programs and databases developed by and hosted at the Genomics Division of Lawrence Berkeley National Laboratory. They provide information and tools designed to facilitate comparative analysis of genomic sequences. Users have two ways to interact with the suite of applications at the VISTA portal. They can submit their own sequences and alignments for analysis (VISTA servers) or examine pre-computed whole-genome alignments of different species. A key menu option is the Enhancer Browser and Database at http://enhancer.lbl.gov/. The VISTA Enhancer Browser is a central resource for experimentally validated human noncoding fragments with gene enhancer activity as assessed in transgenic mice. Most of these noncoding elements were selected for testing based on their extreme conservation with other vertebrates. The results of this enhancer screen are provided through this publicly available website. The browser also features relevant results by external contributors and a large collection of additional genome-wide conserved noncoding elements which are candidate enhancer sequences. The LBL developers invite external groups to submit computational predictions of developmental enhancers. As of 10/19/2009 the database contains information on 1109 in vivo tested elements - 508 elements with enhancer activity.

  14. Genome sequence and analysis of Lactobacillus helveticus

    PubMed Central

    Cremonesi, Paola; Chessa, Stefania; Castiglioni, Bianca

    2013-01-01

    The microbiological characterization of lactobacilli is historically well developed, but the genomic analysis is recent. Because of the widespread use of Lactobacillus helveticus in cheese technology, information concerning the heterogeneity in this species is accumulating rapidly. Recently, the genome of five L. helveticus strains was sequenced to completion and compared with other genomically characterized lactobacilli. The genomic analysis of the first sequenced strain, L. helveticus DPC 4571, isolated from cheese and selected for its characteristics of rapid lysis and high proteolytic activity, has revealed a plethora of genes with industrial potential including those responsible for key metabolic functions such as proteolysis, lipolysis, and cell lysis. These genes and their derived enzymes can facilitate the production of cheese and cheese derivatives with potential for use as ingredients in consumer foods. In addition, L. helveticus has the potential to produce peptides with a biological function, such as angiotensin converting enzyme (ACE) inhibitory activity, in fermented dairy products, demonstrating the therapeutic value of this species. A most intriguing feature of the genome of L. helveticus is the remarkable similarity in gene content with many intestinal lactobacilli. Comparative genomics has allowed the identification of key gene sets that facilitate a variety of lifestyles including adaptation to food matrices or the gastrointestinal tract. As genome sequence and functional genomic information continues to explode, key features of the genomes of L. helveticus strains continue to be discovered, answering many questions but also raising many new ones. PMID:23335916

  15. Phylogenetic analysis of Ostreococcus virus sequences from the Patagonian Coast.

    PubMed

    Manrique, Julieta M; Calvo, Andrea Y; Jones, Leandro R

    2012-10-01

    A phylogenetic analysis of new Ostreococcus virus (OV) sequences from the Patagonian Coast, Argentina, and homologous sequences from public databases was performed. This analysis showed that the Patagonian sequences represented a divergent viral clade and that the rest of OV sequences analyzed here were clustered into six additional phylogenetic groups. Analyses of 18S gene libraries supported a close relationship of the Patagonian Ostreococcus host with clade A sequences described elsewhere, corroborating previous studies indicating that clade A strains are ubiquitous. Besides the Patagonian OV sequences, several phylogenetic groupings were linked to particular geographic locations, suggesting a role for allopatric cladogenesis in viral diversification. However, and in agreement with previous observations, other viral lineages included sequences with diverse geographic origins. These findings, together with analyses of ancestral trait trajectories performed here, are consistent with an evolutionary dynamics in which geographical isolation has a role in OV diversification but can be followed by rapid dispersion to remote places. PMID:22674355

  16. Phylogenetic Analysis of Poliovirus Sequences.

    PubMed

    Jorba, Jaume

    2016-01-01

    Comparative genomic sequencing is a major surveillance tool in the Polio Laboratory Network. Due to the rapid evolution of polioviruses (~1% per year), pathways of virus transmission can be reconstructed from the pathways of genomic evolution. Here, we describe three main phylogenetic methods: estimation of genetic distances, reconstruction of a maximum-likelihood (ML) tree, and estimation of substitution rates using Bayesian Markov chain Monte Carlo (MCMC). The data set used consists of complete capsid sequences from a survey of poliovirus sequences available in GenBank. PMID:26983737
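
    The first of the three methods, estimation of genetic distances, can be sketched with the uncorrected p-distance and the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3). The record does not say which distance model the chapter uses, so the choice of Jukes-Cantor here is an assumption made for illustration, as are the function names.

```python
import math


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or ambiguity code are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (x, y) for x, y in zip(a.upper(), b.upper())
        if x in "ACGT" and y in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed p-distance."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    p = p_distance("ACGTACGT", "ACGTACGA")
    print(p, jukes_cantor(p))
```

    The corrected distance is always at least as large as the p-distance because it accounts for multiple substitutions at the same site.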

  17. Genome Sequencing and Analysis Conference IV

    SciTech Connect

    Not Available

    1993-12-31

    J. Craig Venter and C. Thomas Caskey co-chaired Genome Sequencing and Analysis Conference IV held at Hilton Head, South Carolina from September 26-30, 1992. Venter opened the conference by noting that approximately 400 researchers from 16 nations were present, four times as many participants as at Genome Sequencing Conference I in 1989. Venter also introduced the Data Fair, a new component of the conference allowing exchange and on-site computer analysis of unpublished sequence data.

  18. Phylogenetic analysis of adenovirus sequences.

    PubMed

    Harrach, Balázs; Benko, Mária

    2007-01-01

    Members of the family Adenoviridae have been isolated from a large variety of hosts, including representatives from every major vertebrate class from fish to mammals. The high prevalence, together with the fairly conserved organization of the central part of their genomes, make the adenoviruses one of (if not the) best models for studying viral evolution on a larger time scale. Phylogenetic calculation can infer the evolutionary distance among adenovirus strains on serotype, species, and genus levels, thus helping the establishment of a correct taxonomy on the one hand, and speeding up the process of typing new isolates on the other. Initially, four major lineages corresponding to four genera were recognized. Later, the demarcation criteria of lower taxon levels, such as species or types, could also be defined with phylogenetic calculations. A limited number of possible host switches have been hypothesized and convincingly supported. Application of the web-based BLAST and MultAlin programs and the freely available PHYLIP package, along with the TreeView program, enables everyone to make correct calculations. In addition to step-by-step instruction on how to perform phylogenetic analysis, critical points where typical mistakes or misinterpretation of the results might occur will be identified and hints for their avoidance will be provided. PMID:17656792

  19. Analysis and Annotation of Nucleic Acid Sequence

    SciTech Connect

    States, David J.

    2004-07-28

    The aims of this project were to develop improved methods for computational genome annotation and to apply these methods to improve the annotation of genomic sequence data with a specific focus on human genome sequencing. The project resulted in a substantial body of published work. Notable contributions of this project were the identification of basecalling and lane tracking as error processes in genome sequencing and contributions to improved methods for these steps in genome sequencing. This technology improved the accuracy and throughput of genome sequence analysis. Probabilistic methods for physical map construction were developed. Improved methods for sequence alignment, alternative splicing analysis, promoter identification and NF kappa B response gene prediction were also developed.

  20. Analysis and Annotation of Nucleic Acid Sequence

    SciTech Connect

    David J. States

    1998-08-01

    The aims of this project were to develop improved methods for computational genome annotation and to apply these methods to improve the annotation of genomic sequence data with a specific focus on human genome sequencing. The project resulted in a substantial body of published work. Notable contributions of this project were the identification of basecalling and lane tracking as error processes in genome sequencing and contributions to improved methods for these steps in genome sequencing. This technology improved the accuracy and throughput of genome sequence analysis. Probabilistic methods for physical map construction were developed. Improved methods for sequence alignment, alternative splicing analysis, promoter identification and NF kappa B response gene prediction were also developed.

  1. Fractal analysis of DNA sequence data

    SciTech Connect

    Berthelsen, C.L.

    1993-01-01

    DNA sequence databases are growing at an almost exponential rate. New analysis methods are needed to extract knowledge about the organization of nucleotides from this vast amount of data. Fractal analysis is a new scientific paradigm that has been used successfully in many domains including the biological and physical sciences. Biological growth is a nonlinear dynamic process and some have suggested that to consider fractal geometry as a biological design principle may be most productive. This research is an exploratory study of the application of fractal analysis to DNA sequence data. A simple random fractal, the random walk, is used to represent DNA sequences. The fractal dimension of these walks is then estimated using the "sandbox method". Analysis of 164 human DNA sequences compared to three types of control sequences (random, base-content matched, and dimer-content matched) reveals that long-range correlations are present in DNA that are not explained by base or dimer frequencies. The study also revealed that the fractal dimension of coding sequences was significantly lower than sequences that were primarily noncoding, indicating the presence of longer-range correlations in functional sequences. The multifractal spectrum is used to analyze fractals that are heterogeneous and have a different fractal dimension for subsets with different scalings. The multifractal spectrum of the random walks of twelve mitochondrial genome sequences was estimated. Eight vertebrate mtDNA sequences had uniformly lower spectra values than did four invertebrate mtDNA sequences. Thus, vertebrate mitochondria show significantly longer-range correlations than do invertebrate mitochondria. The higher multifractal spectra values for invertebrate mitochondria suggest a more random organization of the sequences. This research also includes considerable theoretical work on the effects of finite size, embedding dimension, and scaling ranges.
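
    To make the random-walk representation concrete, the sketch below maps a DNA sequence to a one-dimensional walk. The abstract does not state the mapping used in the study; the purine/pyrimidine rule here (A or G steps up, C or T steps down) is a common convention assumed for illustration, and `dna_walk` is a name introduced here. Estimating the fractal dimension with the sandbox method is not shown.

```python
def dna_walk(seq):
    """Return the cumulative positions of a 1D DNA walk.

    Assumed rule: purines (A, G) step +1, pyrimidines (C, T) step -1.
    Any other symbol (N, gaps) is skipped. The walk starts at 0.
    """
    position = 0
    walk = [position]
    for base in seq.upper():
        if base in "AG":
            position += 1
        elif base in "CT":
            position -= 1
        else:
            continue  # ignore ambiguous or gap characters
        walk.append(position)
    return walk


if __name__ == "__main__":
    print(dna_walk("AGCT"))
```

    Long excursions of the walk away from zero reflect runs of compositional bias, which is the kind of long-range structure the fractal analysis is meant to quantify.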

  2. Fractal Analysis of DNA Sequence Data

    NASA Astrophysics Data System (ADS)

    Berthelsen, Cheryl Lynn

    DNA sequence databases are growing at an almost exponential rate. New analysis methods are needed to extract knowledge about the organization of nucleotides from this vast amount of data. Fractal analysis is a new scientific paradigm that has been used successfully in many domains including the biological and physical sciences. Biological growth is a nonlinear dynamic process and some have suggested that to consider fractal geometry as a biological design principle may be most productive. This research is an exploratory study of the application of fractal analysis to DNA sequence data. A simple random fractal, the random walk, is used to represent DNA sequences. The fractal dimension of these walks is then estimated using the "sandbox method." Analysis of 164 human DNA sequences compared to three types of control sequences (random, base-content matched, and dimer-content matched) reveals that long-range correlations are present in DNA that are not explained by base or dimer frequencies. The study also revealed that the fractal dimension of coding sequences was significantly lower than sequences that were primarily noncoding, indicating the presence of longer-range correlations in functional sequences. The multifractal spectrum is used to analyze fractals that are heterogeneous and have a different fractal dimension for subsets with different scalings. The multifractal spectrum of the random walks of twelve mitochondrial genome sequences was estimated. Eight vertebrate mtDNA sequences had uniformly lower spectra values than did four invertebrate mtDNA sequences. Thus, vertebrate mitochondria show significantly longer-range correlations than do invertebrate mitochondria. The higher multifractal spectra values for invertebrate mitochondria suggest a more random organization of the sequences. This research also includes considerable theoretical work on the effects of finite size, embedding dimension, and scaling ranges.

  3. Whole-genome sequencing in outbreak analysis.

    PubMed

    Gilchrist, Carol A; Turner, Stephen D; Riley, Margaret F; Petri, William A; Hewlett, Erik L

    2015-07-01

    In addition to the ever-present concern of medical professionals about epidemics of infectious diseases, the relative ease of access and low cost of obtaining, producing, and disseminating pathogenic organisms or biological toxins mean that bioterrorism activity should also be considered when facing a disease outbreak. Utilization of whole-genome sequencing (WGS) in outbreak analysis facilitates the rapid and accurate identification of virulence factors of the pathogen and can be used to identify the path of disease transmission within a population and provide information on the probable source. Molecular tools such as WGS are being refined and advanced at a rapid pace to provide robust and higher-resolution methods for identifying, comparing, and classifying pathogenic organisms. If these methods of pathogen characterization are properly applied, they will enable an improved public health response whether a disease outbreak was initiated by natural events or by accidental or deliberate human activity. The current application of next-generation sequencing (NGS) technology to microbial WGS and microbial forensics is reviewed. PMID:25876885

  4. Whole-Genome Sequencing in Outbreak Analysis

    PubMed Central

    Turner, Stephen D.; Riley, Margaret F.; Petri, William A.; Hewlett, Erik L.

    2015-01-01

    In addition to the ever-present concern of medical professionals about epidemics of infectious diseases, the relative ease of access and low cost of obtaining, producing, and disseminating pathogenic organisms or biological toxins mean that bioterrorism activity should also be considered when facing a disease outbreak. Utilization of whole-genome sequencing (WGS) in outbreak analysis facilitates the rapid and accurate identification of virulence factors of the pathogen and can be used to identify the path of disease transmission within a population and provide information on the probable source. Molecular tools such as WGS are being refined and advanced at a rapid pace to provide robust and higher-resolution methods for identifying, comparing, and classifying pathogenic organisms. If these methods of pathogen characterization are properly applied, they will enable an improved public health response whether a disease outbreak was initiated by natural events or by accidental or deliberate human activity. The current application of next-generation sequencing (NGS) technology to microbial WGS and microbial forensics is reviewed. PMID:25876885

  5. Additional EIPC Study Analysis. Final Report

    SciTech Connect

    Hadley, Stanton W; Gotham, Douglas J.; Luciani, Ralph L.

    2014-12-01

    Between 2010 and 2012 the Eastern Interconnection Planning Collaborative (EIPC) conducted a major long-term resource and transmission study of the Eastern Interconnection (EI). With guidance from a Stakeholder Steering Committee (SSC) that included representatives from the Eastern Interconnection States Planning Council (EISPC) among others, the project was conducted in two phases. Phase 1 involved a long-term capacity expansion analysis that involved creation of eight major futures plus 72 sensitivities. Three scenarios were selected for more extensive transmission-focused evaluation in Phase 2. Five power flow analyses, nine production cost model runs (including six sensitivities), and three capital cost estimations were developed during this second phase. The results from Phase 1 and 2 provided a wealth of data that could be examined further to address energy-related questions. A list of 14 topics was developed for further analysis. This paper brings together the earlier interim reports of the first 13 topics plus one additional topic into a single final report.

  6. Expressed sequence tags: analysis and annotation.

    PubMed

    Parkinson, John; Blaxter, Mark

    2004-01-01

    Expressed sequence tags (ESTs) present a special set of problems for bioinformatic analysis. They are partial and error-prone, and large datasets can have significant internal redundancy. To facilitate analysis of small EST datasets from in-house projects, we present an integrated "pipeline" of tools that take EST data from sequence trace to database submission. These tools also can be used to provide clustering of ESTs into putative genes and to annotate these genes with preliminary sequence similarity searches. The systems are written to use the public-domain LINUX environment and other openly available analytical tools. PMID:15153624

  7. Laser Desorption Mass Spectrometry for DNA Sequencing and Analysis

    NASA Astrophysics Data System (ADS)

    Chen, C. H. Winston; Taranenko, N. I.; Golovlev, V. V.; Isola, N. R.; Allman, S. L.

    1998-03-01

    Rapid DNA sequencing and/or analysis is critically important for biomedical research. In the past, gel electrophoresis has been the primary tool to achieve DNA analysis and sequencing. However, gel electrophoresis is a time-consuming and labor-intensive process. Recently, we have developed and used laser desorption mass spectrometry (LDMS) to achieve sequencing of ss-DNA longer than 100 nucleotides. With LDMS, we succeeded in sequencing DNA in seconds instead of the hours or days required by gel electrophoresis. In addition to sequencing, we also applied LDMS to the detection of DNA probes for hybridization. LDMS was also used to detect short tandem repeats for forensic applications. Clinical applications for disease diagnosis, such as cystic fibrosis caused by base deletion and point mutation, have also been demonstrated. Experimental details will be presented at the meeting.

  8. The DNA sequence and comparative analysis of human chromosome 20.

    PubMed

    Deloukas, P; Matthews, L H; Ashurst, J; Burton, J; Gilbert, J G; Jones, M; Stavrides, G; Almeida, J P; Babbage, A K; Bagguley, C L; Bailey, J; Barlow, K F; Bates, K N; Beard, L M; Beare, D M; Beasley, O P; Bird, C P; Blakey, S E; Bridgeman, A M; Brown, A J; Buck, D; Burrill, W; Butler, A P; Carder, C; Carter, N P; Chapman, J C; Clamp, M; Clark, G; Clark, L N; Clark, S Y; Clee, C M; Clegg, S; Cobley, V E; Collier, R E; Connor, R; Corby, N R; Coulson, A; Coville, G J; Deadman, R; Dhami, P; Dunn, M; Ellington, A G; Frankland, J A; Fraser, A; French, L; Garner, P; Grafham, D V; Griffiths, C; Griffiths, M N; Gwilliam, R; Hall, R E; Hammond, S; Harley, J L; Heath, P D; Ho, S; Holden, J L; Howden, P J; Huckle, E; Hunt, A R; Hunt, S E; Jekosch, K; Johnson, C M; Johnson, D; Kay, M P; Kimberley, A M; King, A; Knights, A; Laird, G K; Lawlor, S; Lehvaslaiho, M H; Leversha, M; Lloyd, C; Lloyd, D M; Lovell, J D; Marsh, V L; Martin, S L; McConnachie, L J; McLay, K; McMurray, A A; Milne, S; Mistry, D; Moore, M J; Mullikin, J C; Nickerson, T; Oliver, K; Parker, A; Patel, R; Pearce, T A; Peck, A I; Phillimore, B J; Prathalingam, S R; Plumb, R W; Ramsay, H; Rice, C M; Ross, M T; Scott, C E; Sehra, H K; Shownkeen, R; Sims, S; Skuce, C D; Smith, M L; Soderlund, C; Steward, C A; Sulston, J E; Swann, M; Sycamore, N; Taylor, R; Tee, L; Thomas, D W; Thorpe, A; Tracey, A; Tromans, A C; Vaudin, M; Wall, M; Wallis, J M; Whitehead, S L; Whittaker, P; Willey, D L; Williams, L; Williams, S A; Wilming, L; Wray, P W; Hubbard, T; Durbin, R M; Bentley, D R; Beck, S; Rogers, J

    The finished sequence of human chromosome 20 comprises 59,187,298 base pairs (bp) and represents 99.4% of the euchromatic DNA. A single contig of 26 megabases (Mb) spans the entire short arm, and five contigs separated by gaps totalling 320 kb span the long arm of this metacentric chromosome. An additional 234,339 bp of sequence has been determined within the pericentromeric region of the long arm. We annotated 727 genes and 168 pseudogenes in the sequence. About 64% of these genes have a 5' and a 3' untranslated region and a complete open reading frame. Comparative analysis of the sequence of chromosome 20 to whole-genome shotgun-sequence data of two other vertebrates, the mouse Mus musculus and the puffer fish Tetraodon nigroviridis, provides an independent measure of the efficiency of gene annotation, and indicates that this analysis may account for more than 95% of all coding exons and almost all genes. PMID:11780052

  9. Whole exome sequence analysis of Peters anomaly.

    PubMed

    Weh, Eric; Reis, Linda M; Happ, Hannah C; Levin, Alex V; Wheeler, Patricia G; David, Karen L; Carney, Erin; Angle, Brad; Hauser, Natalie; Semina, Elena V

    2014-12-01

    Peters anomaly is a rare form of anterior segment ocular dysgenesis, which can also be associated with additional systemic defects. At this time, the majority of cases of Peters anomaly lack a genetic diagnosis. We performed whole exome sequencing of 27 patients with syndromic or isolated Peters anomaly to search for pathogenic mutations in currently known ocular genes. Among the eight previously recognized Peters anomaly genes, we identified a de novo missense mutation in PAX6, c.155G>A, p.(Cys52Tyr), in one patient. Analysis of 691 additional genes currently associated with a different ocular phenotype identified a heterozygous splicing mutation c.1025+2T>A in TFAP2A, a de novo heterozygous nonsense mutation c.715C>T, p.(Gln239*) in HCCS, a hemizygous mutation c.385G>A, p.(Glu129Lys) in NDP, a hemizygous mutation c.3446C>T, p.(Pro1149Leu) in FLNA, and compound heterozygous mutations c.1422T>A, p.(Tyr474*) and c.2544G>A, p.(Met848Ile) in SLC4A11; all mutations, except for the FLNA and SLC4A11 c.2544G>A alleles, are novel. This is the first study to use whole exome sequencing to discern the genetic etiology of a large cohort of patients with syndromic or isolated Peters anomaly. We report five new genes associated with this condition and suggest screening of TFAP2A and FLNA in patients with Peters anomaly and relevant syndromic features and HCCS, NDP and SLC4A11 in patients with isolated Peters anomaly. PMID:25182519

  10. Whole exome sequence analysis of Peters anomaly

    PubMed Central

    Weh, Eric; Reis, Linda M.; Happ, Hannah C.; Levin, Alex V.; Wheeler, Patricia G.; David, Karen L.; Carney, Erin; Angle, Brad; Hauser, Natalie

    2015-01-01

    Peters anomaly is a rare form of anterior segment ocular dysgenesis, which can also be associated with additional systemic defects. At this time, the majority of cases of Peters anomaly lack a genetic diagnosis. We performed whole exome sequencing of 27 patients with syndromic or isolated Peters anomaly to search for pathogenic mutations in currently known ocular genes. Among the eight previously recognized Peters anomaly genes, we identified a de novo missense mutation in PAX6, c.155G>A, p.(Cys52Tyr), in one patient. Analysis of 691 additional genes currently associated with a different ocular phenotype identified a heterozygous splicing mutation c.1025+2T>A in TFAP2A, a de novo heterozygous nonsense mutation c.715C>T, p.(Gln239*) in HCCS, a hemizygous mutation c.385G>A, p.(Glu129Lys) in NDP, a hemizygous mutation c.3446C>T, p.(Pro1149Leu) in FLNA, and compound heterozygous mutations c.1422T>A, p.(Tyr474*) and c.2544G>A, p.(Met848Ile) in SLC4A11; all mutations, except for the FLNA and SLC4A11 c.2544G>A alleles, are novel. This is the first study to use whole exome sequencing to discern the genetic etiology of a large cohort of patients with syndromic or isolated Peters anomaly. We report five new genes associated with this condition and suggest screening of TFAP2A and FLNA in patients with Peters anomaly and relevant syndromic features and HCCS, NDP and SLC4A11 in patients with isolated Peters anomaly. PMID:25182519

  11. Auditory sequence analysis and phonological skill.

    PubMed

    Grube, Manon; Kumar, Sukhbinder; Cooper, Freya E; Turton, Stuart; Griffiths, Timothy D

    2012-11-01

    This work tests the relationship between auditory and phonological skill in a non-selected cohort of 238 school students (age 11) with the specific hypothesis that sound-sequence analysis would be more relevant to phonological skill than the analysis of basic, single sounds. Auditory processing was assessed across the domains of pitch, time and timbre; a combination of six standard tests of literacy and language ability was used to assess phonological skill. A significant correlation between general auditory and phonological skill was demonstrated, plus a significant, specific correlation between measures of phonological skill and the auditory analysis of short sequences in pitch and time. The data support a limited but significant link between auditory and phonological ability with a specific role for sound-sequence analysis, and provide a possible new focus for auditory training strategies to aid language development in early adolescence. PMID:22951739

  12. Sequencing and Analysis of Neanderthal Genomic DNA

    PubMed Central

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith, Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Pääbo, Svante; Pritchard, Jonathan K.; Rubin, Edward M.

    2008-01-01

    Our knowledge of Neanderthals is based on a limited number of remains and artifacts from which we must make inferences about their biology, behavior, and relationship to ourselves. Here, we describe the characterization of these extinct hominids from a new perspective, based on the development of a Neanderthal metagenomic library and its high-throughput sequencing and analysis. Several lines of evidence indicate that the 65,250 base pairs of hominid sequence so far identified in the library are of Neanderthal origin, the strongest being the ascertainment of sequence identities between Neanderthal and chimpanzee at sites where the human genomic sequence is different. These results enabled us to calculate the human-Neanderthal divergence time based on multiple randomly distributed autosomal loci. Our analyses suggest that on average the Neanderthal genomic sequence we obtained and the reference human genome sequence share a most recent common ancestor ~706,000 years ago, and that the human and Neanderthal ancestral populations split ~370,000 years ago, before the emergence of anatomically modern humans. Our finding that the Neanderthal and human genomes are at least 99.5% identical led us to develop and successfully implement a targeted method for recovering specific ancient DNA sequences from metagenomic libraries. This initial analysis of the Neanderthal genome advances our understanding of the evolutionary relationship of Homo sapiens and Homo neanderthalensis and signifies the dawn of Neanderthal genomics. PMID:17110569

  13. Genomic sequence analysis tools: a user's guide.

    PubMed

    Fortna, A; Gardiner, K

    2001-03-01

    The wealth of information from various genome sequencing projects provides the biologist with a new perspective from which to analyze, and design experiments with, mammalian systems. The complexity of the information, however, requires new software tools, and numerous such tools are now available. Which type and which specific system is most effective depends, in part, upon how much sequence is to be analyzed and with what level of experimental support. Here we survey a number of mammalian genomic sequence analysis systems with respect to the data they provide and the ease of their use. The hope is to aid the experimental biologist in choosing the most appropriate tool for their analyses. PMID:11226611

  14. RSAT 2015: Regulatory Sequence Analysis Tools.

    PubMed

    Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2015-07-01

    RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. PMID:25904632

  15. RSAT 2015: Regulatory Sequence Analysis Tools

    PubMed Central

    Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A.; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M.; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2015-01-01

    RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. PMID:25904632

  16. Applying machine learning techniques to DNA sequence analysis

    SciTech Connect

    Shavlik, J.W. (Dept. of Computer Sciences); Noordewier, M.O. (Dept. of Computer Science)

    1992-01-01

    We are primarily developing a machine learning (ML) system that modifies existing knowledge about specific types of biological sequences. It does this by considering sample members and nonmembers of the sequence motif being learned. Using this information, our learning algorithm produces a more accurate representation of the knowledge needed to categorize future sequences. Specifically, our KBANN algorithm maps inference rules about a given recognition task into a neural network. Neural network training techniques then use the training examples to refine these inference rules. We call these rules a domain theory, following the convention in the machine learning community. We have been applying this approach to several problems in DNA sequence analysis. In addition, we have been extending the capabilities of our learning system along several dimensions. We have also been investigating parallel algorithms that perform sequence alignments in the presence of frameshift errors.

  17. Sequence analysis by iterated maps, a review.

    PubMed

    Almeida, Jonas S

    2014-05-01

    Among alignment-free methods, Iterated Maps (IMs) are on a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, 'Chaos Game Representation'. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in the same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized for non-genomic alphabets as well as an interest in its use for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there is a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage appears to be now set for a recasting of IMs with a central role in processing nextgen sequencing results. PMID:24162172
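
    As an illustration of the iterated-map idea named in this abstract, the following is a minimal sketch of a Chaos Game Representation for DNA. The corner assignment, the starting point, and the function name cgr_points are assumptions chosen for illustration and are not taken from the review; each base moves the current point halfway toward that base's corner of the unit square.

```python
# Minimal Chaos Game Representation (CGR) sketch for DNA sequences.
# The corner layout below is one common convention, assumed here;
# other layouts are possible and equally valid.
CORNERS = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr_points(seq, start=(0.5, 0.5)):
    """Return the list of CGR coordinates, one per recognized base."""
    x, y = start
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the base's corner.
        x = (x + cx) / 2.0
        y = (y + cy) / 2.0
        points.append((x, y))
    return points


if __name__ == "__main__":
    for point in cgr_points("ACGT"):
        print(point)
```

    Because every step halves the distance to a corner, the last k bases confine a point to a sub-square of side 2^-k, which is one way to read the scale-free property mentioned above.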

  18. Acid Rain Analysis by Standard Addition Titration.

    ERIC Educational Resources Information Center

    Ophardt, Charles E.

    1985-01-01

    The standard addition titration is a precise and rapid method for the determination of the acidity in rain or snow samples. The method requires use of a standard buret, a pH meter, and Gran's plot to determine the equivalence point. Experimental procedures used and typical results obtained are presented. (JN)
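
    The abstract does not give the procedure, so the following is only a generic sketch of the Gran-plot step it mentions, assuming a strong acid titrated with a strong base and ignoring activity corrections. Before the equivalence point the quantity (V0 + V) * 10^(-pH) falls linearly with titrant volume V, and the x-intercept of the fitted line estimates the equivalence volume. The function name and the synthetic data are hypothetical.

```python
import math


def gran_equivalence_volume(v0, volumes, ph_values):
    """Estimate the equivalence volume from pre-equivalence titration data.

    v0        -- initial sample volume (same unit as the titrant volumes)
    volumes   -- titrant volumes added before the equivalence point
    ph_values -- measured pH at each titrant volume
    """
    # Gran function: proportional to the moles of acid still unneutralized.
    gran = [(v0 + v) * 10.0 ** (-ph) for v, ph in zip(volumes, ph_values)]

    # Ordinary least-squares line through (volume, gran) pairs.
    n = len(volumes)
    mean_v = sum(volumes) / n
    mean_g = sum(gran) / n
    sxx = sum((v - mean_v) ** 2 for v in volumes)
    sxy = sum((v - mean_v) * (g - mean_g) for v, g in zip(volumes, gran))
    slope = sxy / sxx
    intercept = mean_g - slope * mean_v

    # The line crosses zero where all the acid has been neutralized.
    return -intercept / slope


if __name__ == "__main__":
    # Synthetic, idealized data: 50 mL of 1e-4 M acid, 1e-3 M base.
    v0 = 50.0
    vols = [0.0, 1.0, 2.0, 3.0, 4.0]
    phs = [-math.log10((1e-4 * v0 - 1e-3 * v) / (v0 + v)) for v in vols]
    print(gran_equivalence_volume(v0, vols, phs))
```

    With real rain or snow samples the points nearest the equivalence point usually deviate from the line, so a fit would normally be restricted to the clearly linear region.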

  19. Addition of wsp sequences to the Wolbachia phylogenetic tree and stability of the classification.

    PubMed

    Pintureau, B; Chaudier, S; Lassablière, F; Charles, H; Grenier, S

    2000-10-01

    Wolbachia are symbiotic bacteria altering reproductive characters of numerous arthropods. Their most recent phylogeny and classification are based on sequences of the wsp gene. We sequenced the wsp gene from six Wolbachia strains infecting six Trichogramma species that live as egg parasitoids on many insects. This allows us to test the effect of the addition of sequences on the Wolbachia phylogeny and to check the classification of Wolbachia infecting Trichogramma. The six Wolbachia studied are classified in the B supergroup. They confirm the monophyletic structure of the B Wolbachia in Trichogramma but introduce small differences in the Wolbachia classification. Modifications include the definition of a new group, Sem, for Wolbachia of T. semblidis and the merging of the two closely related groups, Sib and Kay. Specific primers were determined and tested for the Sem group. PMID:11040288

  20. Sequence analysis of the AAA protein family.

    PubMed Central

    Beyer, A.

    1997-01-01

    The AAA protein family, a recently recognized group of Walker-type ATPases, has been subjected to an extensive sequence analysis. Multiple sequence alignments revealed the existence of a region of sequence similarity, the so-called AAA cassette. The borders of this cassette were localized and within it, three boxes of a high degree of conservation were identified. Two of these boxes could be assigned to substantial parts of the ATP binding site (namely, to Walker motifs A and B); the third may be a portion of the catalytic center. Phylogenetic trees were calculated to obtain insights into the evolutionary history of the family. Subfamilies with varying degrees of intra-relatedness could be discriminated; these relationships are also supported by analysis of sequences outside the canonical AAA boxes: within the cassette are regions that are strongly conserved within each subfamily, whereas little or even no similarity between different subfamilies can be observed. These regions are well suited to define fingerprints for subfamilies. A secondary structure prediction utilizing all available sequence information was performed and the result was fitted to the general 3D structure of a Walker A/GTPase. The agreement was unexpectedly high and strongly supports the conclusion that the AAA family belongs to the Walker superfamily of A/GTPases. PMID:9336829

  1. Engineering of Schroedinger cat states by a sequence of displacements and photon additions or subtractions

    SciTech Connect

    Podoshvedov, S. A.

    2011-04-15

    A method to generate Schroedinger cat states in free propagating optical fields based on the use of displaced states (or displacement operators) is developed. Some optical schemes with photon-added coherent states are studied. The schemes are modifications of the general method based on a sequence of displacements and photon additions or subtractions adjusted to generate Schroedinger cat states of a larger size. The effects of detection inefficiency are taken into account.

  2. Sequence analysis by iterated maps, a review

    PubMed Central

    2014-01-01

    Among alignment-free methods, Iterated Maps (IMs) are on a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, ‘Chaos Game Representation’. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in the same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized for non-genomic alphabets as well as an interest in its use for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there is a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage appears to be now set for a recasting of IMs with a central role in processing nextgen sequencing results. PMID:24162172

  3. Additives

    NASA Technical Reports Server (NTRS)

    Smalheer, C. V.

    1973-01-01

    The chemistry of lubricant additives is discussed to show what the additives are chemically and what functions they perform in the lubrication of various kinds of equipment. Current theories regarding the mode of action of lubricant additives are presented. The additive groups discussed include the following: (1) detergents and dispersants, (2) corrosion inhibitors, (3) antioxidants, (4) viscosity index improvers, (5) pour point depressants, and (6) antifouling agents.

  4. In vivo generation of linear plasmids with addition of telomeric sequences by Histoplasma capsulatum.

    PubMed

    Woods, J P; Goldman, W E

    1992-12-01

    Histoplasma capsulatum is a dimorphic pathogenic fungus that is a major cause of respiratory and systemic mycosis. We previously developed a transformation system for Histoplasma and demonstrated chromosomal integration of transforming plasmid sequences. In this study, we describe another Histoplasma mechanism for maintaining transforming DNA, i.e., the generation of modified, multicopy linear plasmids carrying DNA from the transforming Escherichia coli plasmid. Under selective conditions, these linear plasmids were stable and capable of retransforming Histoplasma without further modification. In vivo modification of the transforming DNA included duplication of plasmid sequence and telomeric addition at the termini of linear DNA. Apparently Histoplasma telomerase, like that of other organisms such as humans and Tetrahymena, is able to act on non-telomeric substrates. The terminus of a Histoplasma linear plasmid was cloned and shown to contain multiple repeats of GGGTTA, the telomeric repeat unit also found in vertebrates, trypanosomes, and slime moulds. PMID:1474902

  5. NexGen Production – Sequencing and Analysis

    SciTech Connect

    Muzny, Donna

    2010-06-02

    Donna Muzny of the Baylor College of Medicine Human Genome Sequencing Center discusses next generation sequencing platforms and evaluating pipeline performance on June 2, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.

  6. DNAApp: a mobile application for sequencing data analysis

    PubMed Central

    Nguyen, Phi-Vu; Verma, Chandra Shekhar; Gan, Samuel Ken-En

    2014-01-01

    Summary: There have been numerous applications developed for decoding and visualization of ab1 DNA sequencing files for Windows and MAC platforms, yet none exists for the increasingly popular smartphone operating systems. The ability to decode sequencing files cannot easily be carried out using browser accessed Web tools. To overcome this hurdle, we have developed a new native app called DNAApp that can decode and display ab1 sequencing files on Android and iOS. In addition to in-built analysis tools such as reverse complementation, protein translation and searching for specific sequences, we have incorporated convenient functions that would facilitate the harnessing of online Web tools for a full range of analysis. Given the high usage of Android/iOS tablets and smartphones, such bioinformatics apps would raise productivity and facilitate the high demand for analyzing sequencing data in biomedical research. Availability and implementation: The Android version of DNAApp is available in Google Play Store as ‘DNAApp’, and the iOS version is available in the App Store. More details on the app can be found at www.facebook.com/APDLab; www.bii.a-star.edu.sg/research/trd/apd.php. The DNAApp user guide is available at http://tinyurl.com/DNAAppuser, and a video tutorial is available on Google Play Store and App Store, as well as on the Facebook page. Contact: samuelg@bii.a-star.edu.sg PMID:25095882
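
    To make two of the in-built operations named in this abstract concrete, here is a minimal sketch of reverse complementation and protein translation. It is not DNAApp's code; the function names are hypothetical, only unambiguous A/C/G/T input is handled, and the standard genetic code is assumed.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate reading frame 1; '*' marks stop codons.

    Trailing bases that do not fill a codon are ignored.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(translate("ATGGCCTAA"))
```

    Translating the reverse strand is then a matter of calling translate(reverse_complement(seq)); handling ambiguity codes such as N would need an extended table.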

  7. Integrating Sequence Evolution into Probabilistic Orthology Analysis.

    PubMed

    Ullah, Ikram; Sjöstrand, Joel; Andersson, Peter; Sennblad, Bengt; Lagergren, Jens

    2015-11-01

    Orthology analysis, that is, finding out whether a pair of homologous genes are orthologs - stemming from a speciation - or paralogs - stemming from a gene duplication - is of central importance in computational biology, genome annotation, and phylogenetic inference. In particular, an orthologous relationship makes functional equivalence of the two genes highly likely. A major approach to orthology analysis is to reconcile a gene tree to the corresponding species tree (most commonly performed using the most parsimonious reconciliation, MPR). However, most such phylogenetic orthology methods infer the gene tree without considering the constraints implied by the species tree and, perhaps even more importantly, only allow the gene sequences to influence the orthology analysis through the a priori reconstructed gene tree. We propose a sound, comprehensive Bayesian Markov chain Monte Carlo-based method, DLRSOrthology, to compute orthology probabilities. It efficiently sums over the possible gene trees and jointly takes into account the current gene tree, all possible reconciliations to the species tree, and the typically strong signal conveyed by the sequences. We compare our method with PrIME-GEM, a probabilistic orthology approach built on a probabilistic duplication-loss model, and MrBayesMPR, a probabilistic orthology approach that is based on conventional Bayesian inference coupled with MPR. We find that DLRSOrthology outperforms these competing approaches on synthetic data as well as on biological data sets and is robust to incomplete taxon sampling artifacts. PMID:26130236

  8. Validation of Genotyping-By-Sequencing Analysis in Populations of Tetraploid Alfalfa by 454 Sequencing

    PubMed Central

    Rocher, Solen; Jean, Martine; Castonguay, Yves; Belzile, François

    2015-01-01

    Genotyping-by-sequencing (GBS) is a relatively low-cost high throughput genotyping technology based on next generation sequencing and is applicable to orphan species with no reference genome. A combination of genome complexity reduction and multiplexing with DNA barcoding provides a simple and affordable way to resolve allelic variation between plant samples or populations. GBS was performed on ApeKI libraries using DNA from 48 genotypes each of two heterogeneous populations of tetraploid alfalfa (Medicago sativa spp. sativa): the synthetic cultivar Apica (ATF0) and a derived population (ATF5) obtained after five cycles of recurrent selection for superior tolerance to freezing (TF). Nearly 400 million reads were obtained from two lanes of an Illumina HiSeq 2000 sequencer and analyzed with the Universal Network-Enabled Analysis Kit (UNEAK) pipeline designed for species with no reference genome. Following the application of whole dataset-level filters, 11,694 single nucleotide polymorphism (SNP) loci were obtained. About 60% had a significant match on the Medicago truncatula syntenic genome. The accuracy of allelic ratios and genotype calls based on GBS data was directly assessed using 454 sequencing on a subset of SNP loci scored in eight plant samples. Sequencing depth in this study was not sufficient for accurate tetraploid allelic dosage, but reliable genotype calls based on diploid allelic dosage were obtained when using additional quality filtering. Principal Component Analysis of SNP loci in plant samples revealed that a small proportion (<5%) of the genetic variability assessed by GBS is able to differentiate ATF0 and ATF5. Our results confirm that analysis of GBS data using UNEAK is a reliable approach for genome-wide discovery of SNP loci in outcrossed polyploids. PMID:26115486

  9. Exploration of phylogenetic data using a global sequence analysis method

    PubMed Central

    Chapus, Charles; Dufraigne, Christine; Edwards, Scott; Giron, Alain; Fertil, Bernard; Deschavanne, Patrick

    2005-01-01

    Background: Molecular phylogenetic methods are based on alignments of nucleic or peptidic sequences. The tremendous increase in molecular data permits phylogenetic analyses of very long sequences and of many species, but also requires methods to help manage large datasets. Results: Here we explore the phylogenetic signal present in molecular data by genomic signatures, defined as the set of frequencies of short oligonucleotides present in DNA sequences. Although violating many of the standard assumptions of traditional phylogenetic analyses – in particular explicit statements of homology inherent in character matrices – the use of the signature does permit the analysis of very long sequences, even those that are unalignable, and is therefore most useful in cases where alignment is questionable. We compare the results obtained by traditional phylogenetic methods to those inferred by the signature method for two genes: RAG1, which is easily alignable, and 18S RNA, where alignments are often ambiguous for some regions. We also apply this method to a multigene data set of 33 genes for 9 bacteria and one archaeal species as well as to the whole genome of a set of 16 γ-proteobacteria. In addition to delivering phylogenetic results comparable to traditional methods, the comparison of signatures for the sequences involved in the bacterial example identified putative candidates for horizontal gene transfers. Conclusion: The signature method is therefore a fast tool for exploring phylogenetic data, providing not only a pretreatment for discovering new sequence relationships, but also for identifying cases of sequence evolution that could confound traditional phylogenetic analysis. PMID:16280081
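
    The definition of a genomic signature given in this abstract (the set of frequencies of short oligonucleotides) can be sketched as follows. The word length k=4, single-strand counting, and the Euclidean distance used to compare two signatures are assumptions made for illustration; the abstract does not state which choices the authors made, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def signature(seq, k=4):
    """Return the frequency of every k-letter A/C/G/T word in seq.

    The result is a list ordered by the lexicographic order of the words.
    Windows containing other symbols (for example N) are ignored.
    """
    seq = seq.upper()
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[w] for w in words)
    if total == 0:
        return [0.0] * len(words)
    return [counts[w] / total for w in words]


def signature_distance(sig_a, sig_b):
    """Euclidean distance between two signatures of equal word length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sig_a, sig_b)))


if __name__ == "__main__":
    first = signature("ACGTACGTACGTTTGACA", k=2)
    second = signature("GGGCCCGGGCCCATAT", k=2)
    print(signature_distance(first, second))
```

    A matrix of such pairwise distances could then be passed to any distance-based tree-building method, without aligning the sequences first.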

  10. FAST: FAST Analysis of Sequences Toolbox.

    PubMed

    Lawrence, Travis J; Kauffman, Kyle T; Amrine, Katherine C H; Carper, Dana L; Lee, Raymond S; Becich, Peter J; Canales, Claudia J; Ardell, David H

    2015-01-01

    FAST (FAST Analysis of Sequences Toolbox) provides simple, powerful open source command-line tools to filter, transform, annotate and analyze biological sequence data. Modeled after the GNU (GNU's Not Unix) Textutils such as grep, cut, and tr, FAST tools such as fasgrep, fascut, and fastr make it easy to rapidly prototype expressive bioinformatic workflows in a compact and generic command vocabulary. Compact combinatorial encoding of data workflows with FAST commands can simplify the documentation and reproducibility of bioinformatic protocols, supporting better transparency in biological data science. Interface self-consistency and conformity with conventions of GNU, Matlab, Perl, BioPerl, R, and GenBank help make FAST easy and rewarding to learn. FAST automates numerical, taxonomic, and text-based sorting, selection and transformation of sequence records and alignment sites based on content, index ranges, descriptive tags, annotated features, and in-line calculated analytics, including composition and codon usage. Automated content- and feature-based extraction of sites and support for molecular population genetic statistics make FAST useful for molecular evolutionary analysis. FAST is portable, easy to install and secure thanks to the relative maturity of its Perl and BioPerl foundations, with stable releases posted to CPAN. Development as well as a publicly accessible Cookbook and Wiki are available on the FAST GitHub repository at https://github.com/tlawrence3/FAST. The default data exchange format in FAST is Multi-FastA (specifically, a restriction of BioPerl FastA format). Sanger and Illumina 1.8+ FastQ formatted files are also supported. FAST makes it easier for non-programmer biologists to interactively investigate and control biological data at the speed of thought. PMID:26042145
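
    The idea of a grep-like tool that works on whole sequence records rather than on lines can be illustrated with a short sketch. This is not FAST's code (FAST is built on Perl and BioPerl) and it does not reproduce the options of any FAST command; the Python function names `read_fasta` and `grep_records` are hypothetical and exist only for this example.

```python
import re


def read_fasta(lines):
    """Yield (description, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def grep_records(lines, pattern):
    """Keep the records whose description line matches a regular expression."""
    regex = re.compile(pattern)
    return [(h, s) for h, s in read_fasta(lines) if regex.search(h)]
```

    Matching on the record as a unit is what lets such filters be chained in a pipeline without breaking multi-line sequences apart.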

  11. FAST: FAST Analysis of Sequences Toolbox

    PubMed Central

    Lawrence, Travis J.; Kauffman, Kyle T.; Amrine, Katherine C. H.; Carper, Dana L.; Lee, Raymond S.; Becich, Peter J.; Canales, Claudia J.; Ardell, David H.

    2015-01-01

    FAST (FAST Analysis of Sequences Toolbox) provides simple, powerful open source command-line tools to filter, transform, annotate and analyze biological sequence data. Modeled after the GNU (GNU's Not Unix) Textutils such as grep, cut, and tr, FAST tools such as fasgrep, fascut, and fastr make it easy to rapidly prototype expressive bioinformatic workflows in a compact and generic command vocabulary. Compact combinatorial encoding of data workflows with FAST commands can simplify the documentation and reproducibility of bioinformatic protocols, supporting better transparency in biological data science. Interface self-consistency and conformity with conventions of GNU, Matlab, Perl, BioPerl, R, and GenBank help make FAST easy and rewarding to learn. FAST automates numerical, taxonomic, and text-based sorting, selection and transformation of sequence records and alignment sites based on content, index ranges, descriptive tags, annotated features, and in-line calculated analytics, including composition and codon usage. Automated content- and feature-based extraction of sites and support for molecular population genetic statistics make FAST useful for molecular evolutionary analysis. FAST is portable, easy to install and secure thanks to the relative maturity of its Perl and BioPerl foundations, with stable releases posted to CPAN. Development as well as a publicly accessible Cookbook and Wiki are available on the FAST GitHub repository at https://github.com/tlawrence3/FAST. The default data exchange format in FAST is Multi-FastA (specifically, a restriction of BioPerl FastA format). Sanger and Illumina 1.8+ FastQ formatted files are also supported. FAST makes it easier for non-programmer biologists to interactively investigate and control biological data at the speed of thought. PMID:26042145

  12. Genome sequencing and analysis conference grant

    SciTech Connect

    Venter, J.C.

    1995-10-01

    The 14 plenary session presentations focused on nematode; yeast; fruit fly; plants; mycobacteria; and man. In addition there were presentations on a variety of technical innovations including database developments and refinements, bioelectronic genesensors, computer-assisted multiplex techniques, and hybridization analysis with DNA chip technology. This document includes a list of exhibitors and abstracts of sessions.

  13. Integrative visual analysis of protein sequence mutations

    PubMed Central

    2014-01-01

    Background: An important aspect of studying the relationship between protein sequence, structure and function is the molecular characterization of the effect of protein mutations. To understand the functional impact of amino acid changes, the multiple biological properties of protein residues have to be considered together. Results: Here, we present a novel visual approach for analyzing residue mutations. It combines different biological visualizations and integrates them with molecular data derived from external resources. To show various aspects of the biological information on different scales, our approach includes one-dimensional sequence views, three-dimensional protein structure views and two-dimensional views of residue interaction networks as well as aggregated views. The views are linked tightly and synchronized to reduce the cognitive load of the user when switching between them. In particular, the protein mutations are mapped onto the views together with further functional and structural information. We also assess the impact of individual amino acid changes by the detailed analysis and visualization of the involved residue interactions. We demonstrate the effectiveness of our approach and the developed software on the data provided for the BioVis 2013 data contest. Conclusions: Our visual approach and software greatly facilitate the integrative and interactive analysis of protein mutations based on complementary visualizations. The different data views offered to the user are enriched with information about molecular properties of amino acid residues and further biological knowledge. PMID:25237389

  14. Whole genome sequence analysis of Mycobacterium suricattae.

    PubMed

    Dippenaar, Anzaan; Parsons, Sven David Charles; Sampson, Samantha Leigh; van der Merwe, Ruben Gerhard; Drewe, Julian Ashley; Abdallah, Abdallah Musa; Siame, Kabengele Keith; Gey van Pittius, Nicolaas Claudius; van Helden, Paul David; Pain, Arnab; Warren, Robin Mark

    2015-12-01

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi. PMID:26542221

  15. Complete nucleotide sequence of a Spanish isolate of alfalfa mosaic virus: evidence for additional genetic variability.

    PubMed

    Parrella, Giuseppe; Acanfora, Nadia; Orílio, Anelise F; Navas-Castillo, Jesús

    2011-06-01

    Alfalfa mosaic virus (AMV) is a plant virus that is distributed worldwide and can induce necrosis and/or yellow mosaic on a large variety of plant species, including commercially important crops. It is the only virus of the genus Alfamovirus in the family Bromoviridae. AMV isolates can be clustered into two genetic groups that correlate with their geographic origin. Here, we report for the first time the complete nucleotide sequence of a Spanish isolate of AMV found infecting Cape honeysuckle (Tecoma capensis) and named Tec-1. The tripartite genome of Tec-1 is composed of 3643 nucleotides (nt) for RNA1, 2594 nt for RNA2 and 2037 nt for RNA3. Comparative sequence analysis of the coat protein gene revealed that the isolate Tec-1 is distantly related to subgroup I of AMV and more closely related to subgroup II, although forming a distinct phylogenetic clade. Therefore, we propose to split subgroup II of AMV into two subgroups, namely IIA, comprising isolates previously included in subgroup II, and IIB, including the novel Spanish isolate Tec-1. PMID:21327783

  16. Whole-genome sequence-based analysis of thyroid function

    PubMed Central

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby; Campbell, Purdey J.; Traglia, Michela; Brown, Suzanne J.; Mullin, Benjamin H.; Shihab, Hashem A.; Min, Josine; Walter, Klaudia; Memari, Yasin; Huang, Jie; Barnes, Michael R.; Beilby, John P.; Charoen, Pimphen; Danecek, Petr; Dudbridge, Frank; Forgetta, Vincenzo; Greenwood, Celia; Grundberg, Elin; Johnson, Andrew D.; Hui, Jennie; Lim, Ee M.; McCarthy, Shane; Muddyman, Dawn; Panicker, Vijay; Perry, John R.B.; Bell, Jordana T.; Yuan, Wei; Relton, Caroline; Gaunt, Tom; Schlessinger, David; Abecasis, Goncalo; Cucca, Francesco; Surdulescu, Gabriela L.; Woltersdorf, Wolfram; Zeggini, Eleftheria; Zheng, Hou-Feng; Toniolo, Daniela; Dayan, Colin M.; Naitza, Silvia; Walsh, John P.; Spector, Tim; Davey Smith, George; Durbin, Richard; Brent Richards, J.; Sanna, Serena; Soranzo, Nicole; Timpson, Nicholas J.; Wilson, Scott G.; Turki, Saeed Al; Anderson, Carl; Anney, Richard; Antony, Dinu; Artigas, Maria Soler; Ayub, Muhammad; Balasubramaniam, Senduran; Barrett, Jeffrey C.; Barroso, Inês; Beales, Phil; Bentham, Jamie; Bhattacharya, Shoumo; Birney, Ewan; Blackwood, Douglas; Bobrow, Martin; Bochukova, Elena; Bolton, Patrick; Bounds, Rebecca; Boustred, Chris; Breen, Gerome; Calissano, Mattia; Carss, Keren; Chatterjee, Krishna; Chen, Lu; Ciampi, Antonio; Cirak, Sebhattin; Clapham, Peter; Clement, Gail; Coates, Guy; Collier, David; Cosgrove, Catherine; Cox, Tony; Craddock, Nick; Crooks, Lucy; Curran, Sarah; Curtis, David; Daly, Allan; Day-Williams, Aaron; Day, Ian N.M.; Down, Thomas; Du, Yuanping; Dunham, Ian; Edkins, Sarah; Ellis, Peter; Evans, David; Faroogi, Sadaf; Fatemifar, Ghazaleh; Fitzpatrick, David R.; Flicek, Paul; Flyod, James; Foley, A. Reghan; Franklin, Christopher S.; Futema, Marta; Gallagher, Louise; Geihs, Matthias; Geschwind, Daniel; Griffin, Heather; Grozeva, Detelina; Guo, Xueqin; Guo, Xiaosen; Gurling, Hugh; Hart, Deborah; Hendricks, Audrey; Holmans, Peter; Howie, Bryan; Huang, Liren; Hubbard, Tim; Humphries, Steve E.; Hurles, Matthew E.; Hysi, Pirro; Jackson, David K.; Jamshidi, Yalda; Jing, Tian; Joyce, Chris; Kaye, Jane; Keane, Thomas; Keogh, Julia; Kemp, John; Kennedy, Karen; Kolb-Kokocinski, Anja; Lachance, Genevieve; Langford, Cordelia; Lawson, Daniel; Lee, Irene; Lek, Monkol; Liang, Jieqin; Lin, Hong; Li, Rui; Li, Yingrui; Liu, Ryan; Lönnqvist, Jouko; Lopes, Margarida; Lotchkova, Valentina; MacArthur, Daniel; Marchini, Jonathan; Maslen, John; Massimo, Mangino; Mathieson, Iain; Marenne, Gaëlle; McGuffin, Peter; McIntosh, Andrew; McKechanie, Andrew G.; McQuillin, Andrew; Metrustry, Sarah; Mitchison, Hannah; Moayyeri, Alireza; Morris, James; Muntoni, Francesco; Northstone, Kate; O'Donnovan, Michael; Onoufriadis, Alexandros; O'Rahilly, Stephen; Oualkacha, Karim; Owen, Michael J.; Palotie, Aarno; Panoutsopoulou, Kalliope; Parker, Victoria; Parr, Jeremy R.; Paternoster, Lavinia; Paunio, Tiina; Payne, Felicity; Pietilainen, Olli; Plagnol, Vincent; Quaye, Lydia; Quai, Michael A.; Raymond, Lucy; Rehnström, Karola; Richards, Brent; Ring, Susan; Ritchie, Graham R.S.; Roberts, Nicola; Savage, David B.; Scambler, Peter; Schiffels, Stephen; Schmidts, Miriam; Schoenmakers, Nadia; Semple, Robert K.; Serra, Eva; Sharp, Sally I.; Shin, So-Youn; Skuse, David; Small, Kerrin; Southam, Lorraine; Spasic-Boskovic, Olivera; Clair, David St; Stalker, Jim; Stevens, Elizabeth; Pourcian, Beate St; Sun, Jianping; Suvisaari, Jaana; Tachmazidou, Ionna; Tobin, Martin D.; Valdes, Ana; Kogelenberg, Margriet Van; Vijayarangakannan, Parthiban; Visscher, Peter M.; Wain, Louise V.; Walters, James T.R.; Wang, Guangbiao; Wang, Jun; Wang, Yu; Ward, Kirsten; Wheeler, Elanor; Whyte, Tamieka; Williams, Hywel; Williamson, Kathleen A.; Wilson, Crispian; Wong, Kim; Xu, ChangJiang; Yang, Jian; Zhang, Fend; Zhang, Pingbo

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N=2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF≥1%) associated with TSH and FT4 (N=16,335). For TSH, we identify a novel variant in SYN2 (MAF=23.5%, P=6.15 × 10^-9) and a new independent variant in PDE8B (MAF=10.4%, P=5.94 × 10^-14). For FT4, we report a low-frequency variant near B4GALT6/SLC25A52 (MAF=3.2%, P=1.27 × 10^-9) tagging a rare TTR variant (MAF=0.4%, P=2.14 × 10^-11). All common variants explain ≥20% of the variance in TSH and FT4. Analysis of rare variants (MAF<1%) using sequence kernel association testing reveals a novel association with FT4 in NRG1. Our results demonstrate that increased coverage in whole-genome sequence association studies identifies novel variants associated with thyroid function. PMID:25743335
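
    The abstract separates common variants (MAF≥1%) from rare ones (MAF<1%). A minimal sketch of how a minor allele frequency is computed and classified against that cut-off follows. The input encoding (alternate-allele counts of 0, 1 or 2 per diploid individual, `None` for a missing call) and the function names are assumptions for illustration and are not taken from the study's pipeline.

```python
def minor_allele_frequency(genotypes):
    """Compute MAF from per-individual alternate-allele counts (0, 1 or 2).

    Missing calls are passed as None and skipped.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    # Each diploid individual contributes two alleles.
    alt_freq = sum(called) / (2 * len(called))
    # The minor allele is whichever of the two alleles is less frequent.
    return min(alt_freq, 1 - alt_freq)


def frequency_class(maf, threshold=0.01):
    """Label a variant using the 1% cut-off quoted in the abstract."""
    return "common" if maf >= threshold else "rare"
```

    With this cut-off, the TTR variant reported at MAF=0.4% falls in the rare class, while the variant near B4GALT6/SLC25A52 at MAF=3.2% counts as common.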

  17. Whole-genome sequence-based analysis of thyroid function.

    PubMed

    Taylor, Peter N; Porcu, Eleonora; Chew, Shelby; Campbell, Purdey J; Traglia, Michela; Brown, Suzanne J; Mullin, Benjamin H; Shihab, Hashem A; Min, Josine; Walter, Klaudia; Memari, Yasin; Huang, Jie; Barnes, Michael R; Beilby, John P; Charoen, Pimphen; Danecek, Petr; Dudbridge, Frank; Forgetta, Vincenzo; Greenwood, Celia; Grundberg, Elin; Johnson, Andrew D; Hui, Jennie; Lim, Ee M; McCarthy, Shane; Muddyman, Dawn; Panicker, Vijay; Perry, John R B; Bell, Jordana T; Yuan, Wei; Relton, Caroline; Gaunt, Tom; Schlessinger, David; Abecasis, Goncalo; Cucca, Francesco; Surdulescu, Gabriela L; Woltersdorf, Wolfram; Zeggini, Eleftheria; Zheng, Hou-Feng; Toniolo, Daniela; Dayan, Colin M; Naitza, Silvia; Walsh, John P; Spector, Tim; Davey Smith, George; Durbin, Richard; Richards, J Brent; Sanna, Serena; Soranzo, Nicole; Timpson, Nicholas J; Wilson, Scott G

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N=2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF≥1%) associated with TSH and FT4 (N=16,335). For TSH, we identify a novel variant in SYN2 (MAF=23.5%, P=6.15 × 10^-9) and a new independent variant in PDE8B (MAF=10.4%, P=5.94 × 10^-14). For FT4, we report a low-frequency variant near B4GALT6/SLC25A52 (MAF=3.2%, P=1.27 × 10^-9) tagging a rare TTR variant (MAF=0.4%, P=2.14 × 10^-11). All common variants explain ≥20% of the variance in TSH and FT4. Analysis of rare variants (MAF<1%) using sequence kernel association testing reveals a novel association with FT4 in NRG1. Our results demonstrate that increased coverage in whole-genome sequence association studies identifies novel variants associated with thyroid function. PMID:25743335

  18. Time fluctuation analysis of forest fire sequences

    NASA Astrophysics Data System (ADS)

    Vega Orozco, Carmen D.; Kanevski, Mikhaïl; Tonini, Marj; Golay, Jean; Pereira, Mário J. G.

    2013-04-01

    Forest fires are complex events involving both space and time fluctuations. Understanding their dynamics and pattern distribution is of great importance for improving resource allocation and supporting fire management actions at local and global levels. This study aims at characterizing the temporal fluctuations of forest fire sequences observed in Portugal, which is the country that holds the largest wildfire land dataset in Europe. This research applies several exploratory data analysis measures to 302,000 forest fires that occurred from 1980 to 2007. The applied clustering measures are: Morisita clustering index, fractal and multifractal dimensions (box-counting), Ripley's K-function, Allan Factor, and variography. These algorithms enable a global time structural analysis describing the degree of clustering of a point pattern and defining whether the observed events occur randomly, in clusters or in a regular pattern. The considered methods are of general importance and can be used for other spatio-temporal events (e.g. crime, epidemiology, biodiversity, geomarketing, etc.). An important contribution of this research deals with the analysis and estimation of local measures of clustering that help in understanding their temporal structure. Each measure is described and executed for the raw data (forest fires geo-database) and results are compared to reference patterns generated under the null hypothesis of randomness (Poisson processes) embedded in the same time period of the raw data. This comparison enables estimating the degree of the deviation of the real data from a Poisson process. Generalizations to functional measures of these clustering methods, taking into account the phenomena, were also applied and adapted to detect time dependences in a measured variable (i.e. burned area). The time clustering of the raw data is compared several times with the Poisson processes at different thresholds of the measured function. Then, the clustering measure value
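
    Of the clustering measures listed in this record, the Allan Factor is the simplest to sketch. For a window length T it is the mean squared difference between event counts in adjacent windows, divided by twice the mean count; values near 1 are expected for a Poisson process and larger values indicate temporal clustering. The code below uses this standard textbook definition; the function name, its arguments and the handling of edge cases are assumptions for illustration and are not taken from the study.

```python
def allan_factor(event_times, window, t_start, t_end):
    """Allan Factor of a point process for one window length.

    event_times: iterable of event times; window: window length T;
    t_start, t_end: bounds of the observation period.
    """
    n_windows = int((t_end - t_start) // window)
    if n_windows < 2:
        raise ValueError("need at least two complete windows")
    counts = [0] * n_windows
    for t in event_times:
        idx = int((t - t_start) // window)
        if 0 <= idx < n_windows:  # drop events outside complete windows
            counts[idx] += 1
    mean_count = sum(counts) / n_windows
    if mean_count == 0:
        raise ValueError("no events inside the observation period")
    # Squared differences between counts in adjacent windows.
    squared_diffs = [(counts[i + 1] - counts[i]) ** 2 for i in range(n_windows - 1)]
    return (sum(squared_diffs) / len(squared_diffs)) / (2 * mean_count)
```

    Evaluating the function over a range of window lengths, and on simulated Poisson sequences covering the same period, gives the kind of comparison against a random reference that the abstract describes.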

  19. Direct Chloroplast Sequencing: Comparison of Sequencing Platforms and Analysis Tools for Whole Chloroplast Barcoding

    PubMed Central

    Brozynska, Marta; Furtado, Agnelo; Henry, Robert James

    2014-01-01

    Direct sequencing of total plant DNA using next generation sequencing technologies generates a whole chloroplast genome sequence that has the potential to provide a barcode for use in plant and food identification. Advances in DNA sequencing platforms may make this an attractive approach for routine plant identification. The HiSeq (Illumina) and Ion Torrent (Life Technology) sequencing platforms were used to sequence total DNA from rice to identify polymorphisms in the whole chloroplast genome sequence of a wild rice plant relative to cultivated rice (cv. Nipponbare). Consensus chloroplast sequences were produced by mapping sequence reads to the reference rice chloroplast genome or by de novo assembly and mapping of the resulting contigs to the reference sequence. A total of 122 polymorphisms (SNPs and indels) between the wild and cultivated rice chloroplasts were predicted by these different sequencing and analysis methods. Of these, a total of 102 polymorphisms including 90 SNPs were predicted by both platforms. Indels were more variable with different sequencing methods, with almost all discrepancies found in homopolymers. The Ion Torrent platform gave no apparent false SNP but was less reliable for indels. The methods should be suitable for routine barcoding using appropriate combinations of sequencing platform and data analysis. PMID:25329378

  20. The DNA sequence and analysis of human chromosome 13

    PubMed Central

    Dunham, A.; Matthews, L. H.; Burton, J.; Ashurst, J. L.; Howe, K. L.; Ashcroft, K. J.; Beare, D. M.; Burford, D. C.; Hunt, S. E.; Griffiths-Jones, S.; Jones, M. C.; Keenan, S. J.; Oliver, K.; Scott, C. E.; Ainscough, R.; Almeida, J. P.; Ambrose, K. D.; Andrews, D. T.; Ashwell, R. I. S.; Babbage, A. K.; Bagguley, C. L.; Bailey, J.; Bannerjee, R.; Barlow, K. F.; Bates, K.; Beasley, H.; Bird, C. P.; Bray-Allen, S.; Brown, A. J.; Brown, J. Y.; Burrill, W.; Carder, C.; Carter, N. P.; Chapman, J. C.; Clamp, M. E.; Clark, S. Y.; Clarke, G.; Clee, C. M.; Clegg, S. C. M.; Cobley, V.; Collins, J. E.; Corby, N.; Coville, G. J.; Deloukas, P.; Dhami, P.; Dunham, I.; Dunn, M.; Earthrowl, M. E.; Ellington, A. G.; Faulkner, L.; Frankish, A. G.; Frankland, J.; French, L.; Garner, P.; Garnett, J.; Gilbert, J. G. R.; Gilson, C. J.; Ghori, J.; Grafham, D. V.; Gribble, S. M.; Griffiths, C.; Hall, R. E.; Hammond, S.; Harley, J. L.; Hart, E. A.; Heath, P. D.; Howden, P. J.; Huckle, E. J.; Hunt, P. J.; Hunt, A. R.; Johnson, C.; Johnson, D.; Kay, M.; Kimberley, A. M.; King, A.; Laird, G. K.; Langford, C. J.; Lawlor, S.; Leongamornlert, D. A.; Lloyd, D. M.; Lloyd, C.; Loveland, J. E.; Lovell, J.; Martin, S.; Mashreghi-Mohammadi, M.; McLaren, S. J.; McMurray, A.; Milne, S.; Moore, M. J. F.; Nickerson, T.; Palmer, S. A.; Pearce, A. V.; Peck, A. I.; Pelan, S.; Phillimore, B.; Porter, K. M.; Rice, C. M.; Searle, S.; Sehra, H. K.; Shownkeen, R.; Skuce, C. D.; Smith, M.; Steward, C. A.; Sycamore, N.; Tester, J.; Thomas, D. W.; Tracey, A.; Tromans, A.; Tubby, B.; Wall, M.; Wallis, J. M.; West, A. P.; Whitehead, S. L.; Willey, D. L.; Wilming, L.; Wray, P. W.; Wright, M. W.; Young, L.; Coulson, A.; Durbin, R.; Hubbard, T.; Sulston, J. E.; Beck, S.; Bentley, D. R.; Rogers, J.; Ross, M. T.

    2009-01-01

    Chromosome 13 is the largest acrocentric human chromosome. It carries genes involved in cancer including the breast cancer type 2 (BRCA2) and retinoblastoma (RB1) genes, is frequently rearranged in B-cell chronic lymphocytic leukaemia, and contains the DAOA locus associated with bipolar disorder and schizophrenia. We describe completion and analysis of 95.5 megabases (Mb) of sequence from chromosome 13, which contains 633 genes and 296 pseudogenes. We estimate that more than 95.4% of the protein-coding genes of this chromosome have been identified, on the basis of comparison with other vertebrate genome sequences. Additionally, 105 putative non-coding RNA genes were found. Chromosome 13 has one of the lowest gene densities (6.5 genes per Mb) among human chromosomes, and contains a central region of 38 Mb where the gene density drops to only 3.1 genes per Mb. PMID:15057823

  1. The DNA sequence and analysis of human chromosome 13.

    PubMed

    Dunham, A; Matthews, L H; Burton, J; Ashurst, J L; Howe, K L; Ashcroft, K J; Beare, D M; Burford, D C; Hunt, S E; Griffiths-Jones, S; Jones, M C; Keenan, S J; Oliver, K; Scott, C E; Ainscough, R; Almeida, J P; Ambrose, K D; Andrews, D T; Ashwell, R I S; Babbage, A K; Bagguley, C L; Bailey, J; Bannerjee, R; Barlow, K F; Bates, K; Beasley, H; Bird, C P; Bray-Allen, S; Brown, A J; Brown, J Y; Burrill, W; Carder, C; Carter, N P; Chapman, J C; Clamp, M E; Clark, S Y; Clarke, G; Clee, C M; Clegg, S C M; Cobley, V; Collins, J E; Corby, N; Coville, G J; Deloukas, P; Dhami, P; Dunham, I; Dunn, M; Earthrowl, M E; Ellington, A G; Faulkner, L; Frankish, A G; Frankland, J; French, L; Garner, P; Garnett, J; Gilbert, J G R; Gilson, C J; Ghori, J; Grafham, D V; Gribble, S M; Griffiths, C; Hall, R E; Hammond, S; Harley, J L; Hart, E A; Heath, P D; Howden, P J; Huckle, E J; Hunt, P J; Hunt, A R; Johnson, C; Johnson, D; Kay, M; Kimberley, A M; King, A; Laird, G K; Langford, C J; Lawlor, S; Leongamornlert, D A; Lloyd, D M; Lloyd, C; Loveland, J E; Lovell, J; Martin, S; Mashreghi-Mohammadi, M; McLaren, S J; McMurray, A; Milne, S; Moore, M J F; Nickerson, T; Palmer, S A; Pearce, A V; Peck, A I; Pelan, S; Phillimore, B; Porter, K M; Rice, C M; Searle, S; Sehra, H K; Shownkeen, R; Skuce, C D; Smith, M; Steward, C A; Sycamore, N; Tester, J; Thomas, D W; Tracey, A; Tromans, A; Tubby, B; Wall, M; Wallis, J M; West, A P; Whitehead, S L; Willey, D L; Wilming, L; Wray, P W; Wright, M W; Young, L; Coulson, A; Durbin, R; Hubbard, T; Sulston, J E; Beck, S; Bentley, D R; Rogers, J; Ross, M T

    2004-04-01

    Chromosome 13 is the largest acrocentric human chromosome. It carries genes involved in cancer including the breast cancer type 2 (BRCA2) and retinoblastoma (RB1) genes, is frequently rearranged in B-cell chronic lymphocytic leukaemia, and contains the DAOA locus associated with bipolar disorder and schizophrenia. We describe completion and analysis of 95.5 megabases (Mb) of sequence from chromosome 13, which contains 633 genes and 296 pseudogenes. We estimate that more than 95.4% of the protein-coding genes of this chromosome have been identified, on the basis of comparison with other vertebrate genome sequences. Additionally, 105 putative non-coding RNA genes were found. Chromosome 13 has one of the lowest gene densities (6.5 genes per Mb) among human chromosomes, and contains a central region of 38 Mb where the gene density drops to only 3.1 genes per Mb. PMID:15057823

  2. Now and Next-Generation Sequencing Techniques: Future of Sequence Analysis Using Cloud Computing

    PubMed Central

    Thakur, Radhe Shyam; Bandopadhyay, Rajib; Chaudhary, Bratati; Chatterjee, Sourav

    2012-01-01

    Advances in the field of sequencing techniques have resulted in the greatly accelerated production of huge sequence datasets. This presents immediate challenges in database maintenance at datacenters. It provides additional computational challenges in data mining and sequence analysis. Together these represent a significant overburden on traditional stand-alone computer resources, and to reach effective conclusions quickly and efficiently, the virtualization of the resources and computation on a pay-as-you-go concept (together termed “cloud computing”) has recently appeared. The collective resources of the datacenter, including both hardware and software, can be available publicly, being then termed a public cloud, the resources being provided in a virtual mode to the clients who pay according to the resources they employ. Examples of public companies providing these resources include Amazon, Google, and Joyent. The computational workload is shifted to the provider, which also implements required hardware and software upgrades over time. A virtual environment is created in the cloud corresponding to the computational and data storage needs of the user via the internet. The task is then performed, the results transmitted to the user, and the environment finally deleted after all tasks are completed. In this discussion, we focus on the basics of cloud computing, and go on to analyze the prerequisites and overall working of clouds. Finally, the applications of cloud computing in biological systems, particularly in comparative genomics, genome informatics, and SNP detection are discussed with reference to traditional workflows. PMID:23248640

  3. Sequencing and comparative genomic analysis of 1227 Felis catus cDNA sequences enriched for developmental, clinical and nutritional phenotypes

    PubMed Central

    2012-01-01

    Background: The feline genome is valuable to the veterinary and model organism genomics communities because the cat is an obligate carnivore and a model for endangered felids. The initial public release of the Felis catus genome assembly provided a framework for investigating the genomic basis of feline biology. However, the entire set of protein coding genes has not been elucidated. Results: We identified and characterized 1227 protein coding feline sequences, of which 913 map to public sequences and 314 are novel. These sequences have been deposited into NCBI's GenBank database and complement public genomic resources by providing additional protein coding sequences that fill in some of the gaps in the feline genome assembly. Through functional and comparative genomic analyses, we gained an understanding of the role of these sequences in feline development, nutrition and health. Specifically, we identified 104 orthologs of human genes associated with Mendelian disorders. We detected negative selection within sequences with gene ontology annotations associated with intracellular trafficking, cytoskeleton and muscle functions. We detected relatively less negative selection on protein sequences encoding extracellular networks, apoptotic pathways and mitochondrial gene ontology annotations. Additionally, we characterized feline cDNA sequences that have mouse orthologs associated with clinical, nutritional and developmental phenotypes. Together, this analysis provides an overview of the value of our cDNA sequences and enhances our understanding of how the feline genome is similar to, and different from other mammalian genomes. Conclusions: The cDNA sequences reported here expand existing feline genomic resources by providing high-quality sequences annotated with comparative genomic information providing functional, clinical, nutritional and orthologous gene information. PMID:22257742

  4. Time-dependent accident sequence analysis

    SciTech Connect

    Chu, T.L.

    1983-01-01

    One problem of the current event tree methodology is that the transitions between accident sequences are not modeled. The transitions are mostly caused by operator actions during an accident. A model for such transitions is presented. A generalized algorithm is used for quantification. In the more realistic accident analysis, the progression of the physical processes, which determines the time available for proper operator response, is modeled. Furthermore, the uncertainty associated with the physical modeling is considered. As an example, the approach is applied to analyze TMI-type accidents. Statistical evidence is collected and used in assessing the frequency of stuck-open pressure operated relief valve at B and W plants as well as the frequency of misdiagnosis. Statistical data are also used in modeling the timing of operator actions during the accident. A thermal code (CUT) is developed to determine the time at which the core uncovery occurs. A response surface is used to propagate the uncertainty associated with the thermal code.

  5. An analysis of the feasibility of short read sequencing

    PubMed Central

    Whiteford, Nava; Haslam, Niall; Weber, Gerald; Prügel-Bennett, Adam; Essex, Jonathan W.; Roach, Peter L.; Bradley, Mark; Neylon, Cameron

    2005-01-01

    Several methods for ultra high-throughput DNA sequencing are currently under investigation. Many of these methods yield very short blocks of sequence information (reads). Here we report on an analysis showing the level of genome sequencing possible as a function of read length. It is shown that re-sequencing and de novo sequencing of the majority of a bacterial genome is possible with read lengths of 20–30 nt, and that reads of 50 nt can provide reconstructed contigs (a contiguous fragment of sequence data) of 1000 nt and greater that cover 80% of human chromosome 1. PMID:16275781
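
    The dependence of sequencing feasibility on read length can be explored with a simple proxy: the fraction of read-length windows in a sequence that occur exactly once. The sketch below is a simplification for illustration only. It looks at a single strand, ignores sequencing errors and assembly, and is not the analysis performed in the paper; the function name `unique_read_fraction` is hypothetical.

```python
from collections import Counter


def unique_read_fraction(genome, read_length):
    """Fraction of start positions whose read-length window is unique in genome."""
    n_positions = len(genome) - read_length + 1
    if n_positions <= 0:
        return 0.0
    counts = Counter(genome[i:i + read_length] for i in range(n_positions))
    # A window seen exactly once corresponds to exactly one start position.
    unique = sum(1 for c in counts.values() if c == 1)
    return unique / n_positions
```

    On repetitive sequence the fraction rises as the read length grows, which mirrors the abstract's observation that longer reads allow more of a genome to be reconstructed.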

  6. Sequencing, analysis, and annotation of expressed sequence tags for Camelus dromedarius.

    PubMed

    Al-Swailem, Abdulaziz M; Shehata, Maher M; Abu-Duhier, Faisel M; Al-Yamani, Essam J; Al-Busadah, Khalid A; Al-Arawi, Mohammed S; Al-Khider, Ali Y; Al-Muhaimeed, Abdullah N; Al-Qahtani, Fahad H; Manee, Manee M; Al-Shomrani, Badr M; Al-Qhtani, Saad M; Al-Harthi, Amer S; Akdemir, Kadir C; Inan, Mehmet S; Otu, Hasan H

    2010-01-01

    Despite its economic, cultural, and biological importance, there has not been a large scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, cluster and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and approximately 40% hits extending to the start codons of full length cDNAs suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. The accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools with the possibility of adding sequences from the public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies enabling a starting point for whole genome sequencing of the organism. PMID:20502665
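
    The ORF criterion quoted above (an ORF longer than 300 bp) can be illustrated with a minimal scan. The sketch assumes the simplest definition of an ORF, an ATG followed in frame by the first stop codon, and searches only the three forward frames. The abstract does not describe the ORF finder that was actually used, so the function name and these choices are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq, min_length=300):
    """Return the longest forward-strand ORF longer than min_length bp, else ""."""
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]  # include the stop codon
                if len(orf) > len(best):
                    best = orf
                start = None
    return best if len(best) > min_length else ""
```

    A fuller version would also scan the reverse complement, since an assembled contig may represent either strand of the transcript.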

  7. Sequencing, Analysis, and Annotation of Expressed Sequence Tags for Camelus dromedarius

    PubMed Central

    Al-Swailem, Abdulaziz M.; Shehata, Maher M.; Abu-Duhier, Faisel M.; Al-Yamani, Essam J.; Al-Busadah, Khalid A.; Al-Arawi, Mohammed S.; Al-Khider, Ali Y.; Al-Muhaimeed, Abdullah N.; Al-Qahtani, Fahad H.; Manee, Manee M.; Al-Shomrani, Badr M.; Al-Qhtani, Saad M.; Al-Harthi, Amer S.; Akdemir, Kadir C.; Otu, Hasan H.

    2010-01-01

    Despite its economical, cultural, and biological importance, there has not been a large scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, cluster and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and ∼40% hits extending to the start codons of full length cDNAs suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. Accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools with possibility to add sequences from public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies enabling a starting point for whole genome sequencing of the organism. PMID:20502665

  8. Project Report: Automatic Sequence Processor Software Analysis

    NASA Technical Reports Server (NTRS)

    Benjamin, Brandon

    2011-01-01

    The Mission Planning and Sequencing (MPS) element of Multi-Mission Ground System and Services (MGSS) provides space missions with multi-purpose software to plan spacecraft activities, sequence spacecraft commands, and then integrate these products and execute them on spacecraft. Jet Propulsion Laboratory (JPL) is currently flying many missions. The processes for building, integrating, and testing the multi-mission uplink software need to be improved to meet the needs of the missions and the operations teams that command the spacecraft. The Multi-Mission Sequencing Team is responsible for collecting and processing the observations, experiments and engineering activities that are to be performed on a selected spacecraft. The collection of these activities is called a sequence, and ultimately a sequence becomes a sequence of spacecraft commands. The operations teams check the sequence to make sure that no constraints are violated. The workflow process involves sending a program start command, which activates the Automatic Sequence Processor (ASP). The ASP is currently a file-based system composed of scripts written in Perl, C shell and awk. Once this start process is complete, the system checks for errors and aborts if there are any; otherwise the system converts the commands to binary, and then sends the resultant information to be radiated to the spacecraft.

  9. Automated shielding analysis sequences for spent fuel casks

    SciTech Connect

    Tang, J.S.; Parks, C.V.; Hermann, O.W.

    1987-01-01

    Two important Shielding Analysis Sequences (SAS) have recently been developed within the SCALE computational system. These sequences significantly enhance the existing SCALE system capabilities for evaluating radiation doses exterior to spent fuel casks. These new control module sequences (SAS1 and SAS4) and their capabilities are discussed and demonstrated, together with the existing SAS2 sequence that is used to generate radiation sources for spent fuel. Particular attention is given to the new SAS4 sequence which provides an automated scheme for generating and using biasing parameters in a subsequent Monte Carlo analysis of a cask.

  10. Organocatalytic Asymmetric 1,6-Addition/1,4-Addition Sequence to 2,4-Dienals for the Synthesis of Chiral Chromans.

    PubMed

    Poulsen, Pernille H; Feu, Karla Santos; Paz, Bruno Matos; Jensen, Frank; Jørgensen, Karl Anker

    2015-07-01

    A novel asymmetric organocatalytic 1,6-addition/1,4-addition sequence to 2,4-dienals is described. Based on a 1,6-Friedel-Crafts/1,4-oxa-Michael cascade, the organocatalyst directs the reaction of hydroxyarenes with a vinylogous iminium-ion intermediate to give only one out of four possible regioisomers, thus providing optically active chromans in high yields and 94-99 % ee. Furthermore, several transformations are presented, including the formation of an optically active macrocyclic lactam. Finally, the mechanism for the novel reaction is discussed based on computational studies. PMID:26015328

  11. Sequencing and analysis of a genomic fragment provide an insight into the Dunaliella viridis genomic sequence.

    PubMed

    Sun, Xiao-Ming; Tang, Yuan-Ping; Meng, Xiang-Zong; Zhang, Wen-Wen; Li, Shan; Deng, Zhi-Rui; Xu, Zheng-Kai; Song, Ren-Tao

    2006-11-01

    Dunaliella is a genus of wall-less unicellular eukaryotic green algae. Its exceptional resistance to salt and various other stresses has made it an ideal model for stress tolerance studies. However, very little is known about its genome and genomic sequences. In this study, we sequenced and analyzed a 29,268 bp genomic fragment from Dunaliella viridis. The fragment showed low sequence homology to the GenBank database. At the nucleotide level, only a segment with significant sequence homology to 18S rRNA was found. The fragment contained six putative genes, but only one gene showed significant homology at the protein level to the GenBank database. The average GC content of this sequence was 51.1%, which was much lower than that of the closely related green alga Chlamydomonas (65.7%). Significant segmental duplications were found within this fragment. The duplicated sequences accounted for about 35.7% of the entire region. Large amounts of simple sequence repeats (microsatellites) were found, with a strong bias towards the (AC)n type (76%). Analysis of other Dunaliella genomic sequences in the GenBank database (total 25,749 bp) was in agreement with these findings. These sequence features made it difficult to sequence Dunaliella genomic sequences. Further investigation should be made to reveal the biological significance of these unique sequence features. PMID:17091199
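
    Two of the sequence features reported above, GC content and (AC)n microsatellites, are simple to compute. The following is a minimal sketch, not the authors' pipeline: the function names and the `min_repeats` threshold are assumptions chosen for illustration, and GC content is taken over the full sequence length, including any ambiguous bases.

```python
import re


def gc_content(seq):
    """Return GC content as a percentage of the total sequence length."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def find_repeats(seq, unit="AC", min_repeats=4):
    """Return (start, repeat_count) for each run of `unit` repeated
    at least `min_repeats` times in tandem (threshold is an assumption)."""
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_repeats},}}")
    return [
        (m.start(), (m.end() - m.start()) // len(unit))
        for m in pattern.finditer(seq.upper())
    ]
```

    Summing the lengths of the reported runs per repeat unit would give the share of each microsatellite type, comparable in spirit to the 76% (AC)n bias quoted in the abstract.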

  12. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    2001-06-05

    A computer system (1) for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments may be improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area (814) and sample sequences in another area (816) on a display device (3).

  13. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    1999-10-26

    A computer system (1) for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments may be improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area (814) and sample sequences in another area (816) on a display device (3).

  14. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, M.S.

    1998-08-18

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device. 27 figs.

  15. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    2003-08-19

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments may be improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  16. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    1998-08-18

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  17. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.; Wang, Chunwei; Jevons, Luis C.; Bernhart, Derek H.; Lipshutz, Robert J.

    2004-05-11

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  18. Computed Tomography Inspection and Analysis for Additive Manufacturing Components

    NASA Technical Reports Server (NTRS)

    Beshears, Ronald D.

    2016-01-01

    Computed tomography (CT) inspection was performed on test articles additively manufactured from metallic materials. Metallic AM and machined wrought alloy test articles with programmed flaws were inspected using a 2 MeV linear accelerator based CT system. Performance of CT inspection on identically configured wrought and AM components and programmed flaws was assessed using standard image analysis techniques to determine the impact of additive manufacturing on inspectability of objects with complex geometries.

  19. Analysis of human immunodeficiency virus type 1 nef gene sequences present in vivo.

    PubMed Central

    Shugars, D C; Smith, M S; Glueck, D H; Nantermet, P V; Seillier-Moiseiwitsch, F; Swanstrom, R

    1993-01-01

    The nef genes of the human immunodeficiency viruses type 1 and 2 (HIV-1 and HIV-2) and the related simian immunodeficiency viruses (SIVs) encode a protein (Nef) whose role in virus replication and cytopathicity remains uncertain. As an attempt to elucidate the function of nef, we characterized the nucleotide and corresponding protein sequences of naturally occurring nef genes obtained from several HIV-1-infected individuals. A consensus Nef sequence was derived and used to identify several features that were highly conserved among the Nef sequences. These features included a nearly invariant myristylation signal, regions of sequence polymorphism and variable duplication, a region with an acidic charge, a (Pxx)4 repeat sequence, and a potential protein kinase C phosphorylation site. Clustering of premature stop codons at position 124 was noted in 6 of the 54 Nef sequences. Further analysis revealed four stretches of residues that were highly conserved not only among the patient-derived HIV-1 Nef sequences, but also among the Nef sequences of HIV-2 and the SIVs, suggesting that Nef proteins expressed by these retroviruses are functionally equivalent. The "Nef-defining" sequences were used to evaluate the sequence alignments of known proteins reported to share sequence similarity with Nef sequences and to conduct additional computer-based searches for similar protein sequences. A gene encoding the consensus Nef sequence was also generated. This gene encodes a full-length Nef protein that should be a valuable tool in further studies of Nef function. PMID:8043040

  20. Sequence analysis of styrenic copolymers by tandem mass spectrometry.

    PubMed

    Yol, Aleer M; Janoski, Jonathan; Quirk, Roderic P; Wesdemiotis, Chrys

    2014-10-01

    Styrene and smaller molar amounts of either m-dimethylsilylstyrene (m-DMSS) or p-dimethylsilylstyrene (p-DMSS) were copolymerized under living anionic polymerization conditions, and the compositions, architectures, and sequences of the resulting copolymers were characterized by matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF-MS) and tandem mass spectrometry (MS(2)). MS analysis revealed that linear copolymer chains containing phenyl-Si(CH3)2H pendants were the major product for both DMSS comonomers. In addition, two-armed architectures with phenyl-Si(CH3)2-benzyl branches were detected as minor products. The comonomer sequence in the linear chains was established by MS(2) experiments on lithiated oligomers, based on the DMSS content of fragments generated by backbone C-C bond scissions and with the help of reference MS(2) spectra obtained from a polystyrene homopolymer and polystyrene end-capped with a p-DMSS block. The MS(2) data provided conclusive evidence that copolymerization of styrene/DMSS mixtures leads to chains with a rather random distribution of the silylated comonomer when m-DMSS is used, but to chains with tapered block structures, with the silylated units near the initiator, when p-DMSS is used. Hence, MS(2) fragmentation patterns permit not only differentiation of the sequences generated in the synthesis, but also the determination of specific comonomer locations along the polymer chain. PMID:25181590

  1. Scalable Kernel Methods and Algorithms for General Sequence Analysis

    ERIC Educational Resources Information Center

    Kuksa, Pavel

    2011-01-01

    Analysis of large-scale sequential data has become an important task in machine learning and pattern recognition, inspired in part by numerous scientific and technological applications such as the document and text classification or the analysis of biological sequences. However, current computational methods for sequence comparison still lack…

  2. Relationships among genera of the Saccharomycotina from multigene sequence analysis

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Most known species of the subphylum Saccharomycotina (budding ascomycetous yeasts) have now been placed in phylogenetically defined clades following multigene sequence analysis. Terminal clades, which are usually well supported from bootstrap analysis, are viewed as phylogenetically circumscribed ge...

  3. Establishing a framework for comparative analysis of genome sequences

    SciTech Connect

    Bansal, A.K.

    1995-06-01

    This paper describes a framework and a high-level language toolkit for comparative analysis of genome sequence alignment. The framework integrates the information derived from multiple sequence alignment and phylogenetic tree (hypothetical tree of evolution) to derive new properties about sequences. Multiple sequence alignments are treated as an abstract data type. Abstract operations have been described to manipulate a multiple sequence alignment and to derive mutation related information from a phylogenetic tree by superimposing parsimonious analysis. The framework has been applied to protein alignments to derive constrained columns (in a multiple sequence alignment) that exhibit evolutionary pressure to preserve a common property in a column despite mutation. A Prolog toolkit based on the framework has been implemented and demonstrated on alignments containing 3000 sequences and 3904 columns.

  4. Optimal Multicomponent Analysis Using the Generalized Standard Addition Method.

    ERIC Educational Resources Information Center

    Raymond, Margaret; And Others

    1983-01-01

    Describes an experiment on the simultaneous determination of chromium and magnesium by spectrophotometry modified to include the Generalized Standard Addition Method computer program, a multivariate calibration method that provides optimal multicomponent analysis in the presence of interference and matrix effects. Provides instructions for…

  5. Wavelet Analysis on Symbolic Sequences and Two-Fold de Bruijn Sequences

    NASA Astrophysics Data System (ADS)

    Osipov, V. Al.

    2016-05-01

    The concept of symbolic sequences plays an important role in the study of complex systems. In this work we are interested in the ultrametric structure of the set of cyclic sequences naturally arising in the theory of dynamical systems. Aiming at the construction of analytic and numerical methods for the investigation of clusters, we introduce an operator language on the space of symbolic sequences and propose an approach based on wavelet analysis for the study of the cluster hierarchy. The analytic power of the approach is demonstrated by the derivation of a formula for counting two-fold de Bruijn sequences, an extension of the notion of de Bruijn sequences. Possible advantages of the developed description are also discussed in the context of the applied problem of constructing efficient DNA sequence assembly algorithms.
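
    For readers unfamiliar with de Bruijn sequences, the sketch below generates an ordinary de Bruijn sequence with the standard Lyndon-word construction and counts how often each word of length n occurs cyclically. It does not reproduce the wavelet approach or the counting formula of the paper. The reading of "two-fold" as "every word of length n occurs exactly twice in the cyclic sequence" is an assumption made here for illustration, and the function names are hypothetical.

```python
from collections import Counter


def de_bruijn(k, n):
    """Return a de Bruijn sequence over the alphabet {0, ..., k-1} in which
    every word of length n occurs exactly once cyclically."""
    a = [0] * (n + 1)
    sequence = []

    def db(t, p):
        if t > n:
            # Append the Lyndon word prefix when its length divides n.
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return sequence


def cyclic_word_counts(seq, n):
    """Count each length-n word of `seq`, reading the sequence cyclically."""
    length = len(seq)
    return Counter(
        tuple(seq[(i + j) % length] for j in range(n)) for i in range(length)
    )
```

    Under the assumed definition, a cyclic sequence is two-fold when every value returned by `cyclic_word_counts` equals 2; concatenating an ordinary de Bruijn sequence with itself is one trivial example.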

  6. Wavelet Analysis on Symbolic Sequences and Two-Fold de Bruijn Sequences

    NASA Astrophysics Data System (ADS)

    Osipov, V. Al.

    2016-07-01

    The concept of symbolic sequences plays an important role in the study of complex systems. In this work we are interested in the ultrametric structure of the set of cyclic sequences naturally arising in the theory of dynamical systems. Aiming at the construction of analytic and numerical methods for the investigation of clusters, we introduce an operator language on the space of symbolic sequences and propose an approach based on wavelet analysis for the study of the cluster hierarchy. The analytic power of the approach is demonstrated by the derivation of a formula for counting two-fold de Bruijn sequences, an extension of the notion of de Bruijn sequences. Possible advantages of the developed description are also discussed in the context of the applied problem of constructing efficient DNA sequence assembly algorithms.

  7. Modern Computational Techniques for the HMMER Sequence Analysis

    PubMed Central

    2013-01-01

    This paper focuses on the latest research and critical reviews on modern computing architectures and on software and hardware accelerated algorithms for bioinformatics data analysis, with an emphasis on one of the most important sequence analysis applications, hidden Markov models (HMM). We show a detailed performance comparison of sequence analysis tools on various computing platforms recently developed in the bioinformatics community. The characteristics of sequence analysis, such as its data- and compute-intensive nature, make it very attractive to optimize and parallelize using both traditional software approaches and innovative hardware acceleration technologies. PMID:25937944

  8. DNA sequence-based analysis of the Pseudomonas species.

    PubMed

    Mulet, Magdalena; Lalucat, Jorge; García-Valdés, Elena

    2010-06-01

    Partial sequences of four core 'housekeeping' genes (16S rRNA, gyrB, rpoB and rpoD) of the type strains of 107 Pseudomonas species were analysed in order to obtain a comprehensive view regarding the phylogenetic relationships within the Pseudomonas genus. Gene trees allowed the discrimination of two lineages or intrageneric groups (IG), called IG P. aeruginosa and IG P. fluorescens. The first, IG P. aeruginosa, was divided into three main groups, represented by the species P. aeruginosa, P. stutzeri and P. oleovorans. The second IG was divided into six groups, represented by the species P. fluorescens, P. syringae, P. lutea, P. putida, P. anguilliseptica and P. straminea. The P. fluorescens group was the most complex and included nine subgroups, represented by the species P. fluorescens, P. gessardi, P. fragi, P. mandelii, P. jesseni, P. koreensis, P. corrugata, P. chlororaphis and P. asplenii. Pseudomonas rhizospherae was affiliated with the P. fluorescens IG in the phylogenetic analysis but was independent of any group. Some species were located on phylogenetic branches that were distant from defined clusters, such as those represented by the P. oryzihabitans group and the type strains P. pachastrellae, P. pertucinogena and P. luteola. Additionally, 17 strains of P. aeruginosa, 'P. entomophila', P. fluorescens, P. putida, P. syringae and P. stutzeri, for which genome sequences have been determined, have been included to compare the results obtained in the analysis of four housekeeping genes with those obtained from whole genome analyses. PMID:20192968

  9. Stratigraphic sequence analysis of the Antler foreland

    SciTech Connect

    Silberling, N.J.; Nichols, K.M.; Macke, D.L.

    1993-04-01

    Mid-Upper Devonian to Upper Mississippian strata in western Utah were deposited in the distal Antler foreland. They record lateral and vertical changes in depositional environments that define five successive stratigraphic sequences, each representing a third-order transgressive-regressive cycle. In ascending order, these sequences are informally named the Langenheim (LA) of late Frasnian to mid-Famennian age, the Gutschick (GU) of late Famennian to early Kinderhookian age, the Morris (MO) of late Kinderhookian age, the Sadlick (SA) of Osagean to early Meramecian age, and the Maughan (MA) of mid-Meramecian to Chesterian age. MO is widespread and recognized within carbonate rocks of the Fitchville Formation and Joana Limestone. SA formed in concert with and to the east and south of the Wendover foreland high; the Delle phosphatic event marks maximum marine flooding during SA deposition. The transgressive systems tract of MA includes rhythmic-bedded limestone in the upper part of the Deseret Limestone in west-central Utah and, farther west, the hypoxic limestone and black shale of the Skunk Spring Limestone Bed and part of the overlying Chainman Shale. Traced westward into Nevada, MA first oversteps SA and then MO. Lithostratigraphic correlation of these sequences still farther west into the Eureka thrust belt (ETB) could mean that the youngest strata truncated by the Roberts Mountains thrust belong to the MA and that this thrust is simply part of the post-Mississippian ETB. However, some strata in central Nevada that lithically resemble those of the MA are paleontologically dated as Early Mississippian, the age of sequences overstepped by MA not far to the east. Thus, at least some imbricates of the ETB may contain a sequence stratigraphy which reflects local tectonic control.

  10. Sustainable nutrients recovery and recycling by optimizing the chemical addition sequence for struvite precipitation from raw swine slurries.

    PubMed

    Taddeo, Raffaele; Kolppo, Kari; Lepistö, Raghida

    2016-09-15

    Livestock farming contributes heavily to nitrogen (N) and phosphorus (P) flows into the environment, a major cause of eutrophication of coastal and freshwater systems. Furthermore, the growing demand for N-P fertilizers is increasing the emission of anthropogenic reactive N into the atmosphere and the depletion of the current P reserves. Therefore, it is essential to minimize the anthropogenic impact on the environment and recycle the wasted N-P for agricultural reuse. This study focused on enhancing struvite (MgNH4PO4·6H2O) precipitation from raw swine slurries in batch and laboratory-scale reactors. Different chemical addition sequences were evaluated, and the best removal efficiency (E%) was obtained when the chemicals were mixed before the precipitation process. Struvite was detected at a pH as low as 6 (E%N-P∼50%), and high E%N-P was found at pH 7-9.5 (80-95%). Furthermore, air stripping was used in place of NaOH to adjust pH, returning the same efficiency as if only alkali had been used. XRD and FE-SEM analysis of the precipitate showed that the recovered struvite was of high purity with orthorhombic crystalline structure and only trace amounts of impurities from matrix organics, co-precipitation products (CaO and amorphous calcium-phosphates), and residuals of added chemicals (MgO). PMID:27208994

  11. Analysis and Evaluation of Supersonic Underwing Heat Addition

    NASA Technical Reports Server (NTRS)

    Luidens, Roger W.; Flaherty, Richard J.

    1959-01-01

    The linearized theory for heat addition under a wing has been developed to optimize wing geometry, heat addition, and angle of attack. The optimum wing has all of the thickness on the underside of the airfoil, with maximum-thickness point well downstream, has a moderate thickness ratio, and operates at an optimum angle of attack. The heat addition is confined between the fore Mach waves from under the trailing surface of the wing. By linearized theory, a wing at optimum angle of attack may have a range efficiency about twice that of a wing at zero angle of attack. More rigorous calculations using the method of characteristics for particular flow models were made for heating under a flat-plate wing and for several wings with thickness, both with heat additions concentrated near the wing. The more rigorous calculations yield in practical cases efficiencies about half those estimated by linear theory. An analysis indicates that distributing the heat addition between the fore waves from the undertrailing portion of the wing is a way of improving the performance, and further calculations appear desirable. A comparison of the conventional ramjet-plus wing with underwing heat addition when the heat addition is concentrated near the wing shows the ramjet to be superior on a range basis up to Mach number of about 8. The heat distribution under the wing and the assumed ramjet and airframe performance may have a marked effect on this conclusion. Underwing heat addition can be useful in providing high-altitude maneuver capability at high flight Mach numbers for an airplane powered by conventional ramjets during cruise.

  12. Sequence analysis and compositional properties of untranslated regions of human mRNAs.

    PubMed

    Pesole, G; Fiormarino, G; Saccone, C

    1994-03-25

    A detailed computer analysis of the untranslated regions, 5'-UTR and 3'-UTR, of human mRNA sequences is reported. The compositional properties of these regions, compared with those of the corresponding coding regions, indicate that 5'-UTR and 3'-UTR are less affected by the isochore compartmentalization than the corresponding third codon positions of mRNAs. The presence of higher functional constraints in 5'-UTR is also reported. Dinucleotide analysis shows a depletion of CpG and TpA in both sequences. A search for significant sequence motifs using the WORDUP algorithm reveals the patterns already known to have a functional role in the mRNA UTR, and several other motifs whose functional roles remain to be demonstrated. This type of analysis may be particularly useful for guiding site-directed mutagenesis experiments. In addition, it can be used for assessing the nature of anonymous sequences now produced in large amounts in megabase sequencing projects. PMID:8144029
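
    Depletion of a dinucleotide such as CpG or TpA is commonly expressed as the ratio of its observed frequency to the frequency expected from the individual base frequencies. The sketch below computes that ratio; it is an assumption that a statistic of this kind matches the one used in the paper, and the function name `dinucleotide_odds` is hypothetical.

```python
from collections import Counter


def dinucleotide_odds(seq):
    """Return observed/expected ratios for every dinucleotide present in seq.

    Expected frequency is the product of the two single-base frequencies;
    ratios well below 1 indicate depletion.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    ratios = {}
    for pair, count in di.items():
        observed = count / (n - 1)
        expected = (mono[pair[0]] / n) * (mono[pair[1]] / n)
        ratios[pair] = observed / expected
    return ratios
```

    Dinucleotides that never occur are absent from the returned dictionary, so a lookup such as `ratios.get("CG", 0.0)` treats them as fully depleted.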

  13. HIVE-Hexagon: High-Performance, Parallelized Sequence Alignment for Next-Generation Sequencing Data Analysis

    PubMed Central

    Santana-Quintero, Luis; Dingerdissen, Hayley; Thierry-Mieg, Jean; Mazumder, Raja; Simonyan, Vahan

    2014-01-01

    Due to the size of Next-Generation Sequencing data, the computational challenge of sequence alignment has been vast. Inexact alignments can take up to 90% of total CPU time in bioinformatics pipelines. High-performance Integrated Virtual Environment (HIVE), a cloud-based environment optimized for storage and analysis of extra-large data, presents an algorithmic solution: the HIVE-hexagon DNA sequence aligner. HIVE-hexagon implements novel approaches to exploit both characteristics of sequence space and CPU, RAM and Input/Output (I/O) architecture to quickly compute accurate alignments. Key components of HIVE-hexagon include non-redundification and sorting of sequences; floating diagonals of linearized dynamic programming matrices; and consideration of cross-similarity to minimize computations. Availability https://hive.biochemistry.gwu.edu/hive/ PMID:24918764

  14. Initial sequencing and analysis of the human genome.

    PubMed

    Lander, E S; Linton, L M; Birren, B; Nusbaum, C; Zody, M C; Baldwin, J; Devon, K; Dewar, K; Doyle, M; FitzHugh, W; Funke, R; Gage, D; Harris, K; Heaford, A; Howland, J; Kann, L; Lehoczky, J; LeVine, R; McEwan, P; McKernan, K; Meldrim, J; Mesirov, J P; Miranda, C; Morris, W; Naylor, J; Raymond, C; Rosetti, M; Santos, R; Sheridan, A; Sougnez, C; Stange-Thomann, Y; Stojanovic, N; Subramanian, A; Wyman, D; Rogers, J; Sulston, J; Ainscough, R; Beck, S; Bentley, D; Burton, J; Clee, C; Carter, N; Coulson, A; Deadman, R; Deloukas, P; Dunham, A; Dunham, I; Durbin, R; French, L; Grafham, D; Gregory, S; Hubbard, T; Humphray, S; Hunt, A; Jones, M; Lloyd, C; McMurray, A; Matthews, L; Mercer, S; Milne, S; Mullikin, J C; Mungall, A; Plumb, R; Ross, M; Shownkeen, R; Sims, S; Waterston, R H; Wilson, R K; Hillier, L W; McPherson, J D; Marra, M A; Mardis, E R; Fulton, L A; Chinwalla, A T; Pepin, K H; Gish, W R; Chissoe, S L; Wendl, M C; Delehaunty, K D; Miner, T L; Delehaunty, A; Kramer, J B; Cook, L L; Fulton, R S; Johnson, D L; Minx, P J; Clifton, S W; Hawkins, T; Branscomb, E; Predki, P; Richardson, P; Wenning, S; Slezak, T; Doggett, N; Cheng, J F; Olsen, A; Lucas, S; Elkin, C; Uberbacher, E; Frazier, M; Gibbs, R A; Muzny, D M; Scherer, S E; Bouck, J B; Sodergren, E J; Worley, K C; Rives, C M; Gorrell, J H; Metzker, M L; Naylor, S L; Kucherlapati, R S; Nelson, D L; Weinstock, G M; Sakaki, Y; Fujiyama, A; Hattori, M; Yada, T; Toyoda, A; Itoh, T; Kawagoe, C; Watanabe, H; Totoki, Y; Taylor, T; Weissenbach, J; Heilig, R; Saurin, W; Artiguenave, F; Brottier, P; Bruls, T; Pelletier, E; Robert, C; Wincker, P; Smith, D R; Doucette-Stamm, L; Rubenfield, M; Weinstock, K; Lee, H M; Dubois, J; Rosenthal, A; Platzer, M; Nyakatura, G; Taudien, S; Rump, A; Yang, H; Yu, J; Wang, J; Huang, G; Gu, J; Hood, L; Rowen, L; Madan, A; Qin, S; Davis, R W; Federspiel, N A; Abola, A P; Proctor, M J; Myers, R M; Schmutz, J; Dickson, M; Grimwood, J; Cox, D R; Olson, M V; Kaul, R; Raymond, C; Shimizu, N; 
Kawasaki, K; Minoshima, S; Evans, G A; Athanasiou, M; Schultz, R; Roe, B A; Chen, F; Pan, H; Ramser, J; Lehrach, H; Reinhardt, R; McCombie, W R; de la Bastide, M; Dedhia, N; Blöcker, H; Hornischer, K; Nordsiek, G; Agarwala, R; Aravind, L; Bailey, J A; Bateman, A; Batzoglou, S; Birney, E; Bork, P; Brown, D G; Burge, C B; Cerutti, L; Chen, H C; Church, D; Clamp, M; Copley, R R; Doerks, T; Eddy, S R; Eichler, E E; Furey, T S; Galagan, J; Gilbert, J G; Harmon, C; Hayashizaki, Y; Haussler, D; Hermjakob, H; Hokamp, K; Jang, W; Johnson, L S; Jones, T A; Kasif, S; Kaspryzk, A; Kennedy, S; Kent, W J; Kitts, P; Koonin, E V; Korf, I; Kulp, D; Lancet, D; Lowe, T M; McLysaght, A; Mikkelsen, T; Moran, J V; Mulder, N; Pollara, V J; Ponting, C P; Schuler, G; Schultz, J; Slater, G; Smit, A F; Stupka, E; Szustakowki, J; Thierry-Mieg, D; Thierry-Mieg, J; Wagner, L; Wallis, J; Wheeler, R; Williams, A; Wolf, Y I; Wolfe, K H; Yang, S P; Yeh, R F; Collins, F; Guyer, M S; Peterson, J; Felsenfeld, A; Wetterstrand, K A; Patrinos, A; Morgan, M J; de Jong, P; Catanese, J J; Osoegawa, K; Shizuya, H; Choi, S; Chen, Y J; Szustakowki, J

    2001-02-15

    The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence. PMID:11237011

  15. High Throughput Sequence Analysis for Disease Resistance in Maize

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Preliminary results of a computational analysis of high throughput sequencing data from Zea mays and the fungus Aspergillus are reported. The Illumina Genome Analyzer was used to sequence RNA samples from two strains of Z. mays (Va35 and Mp313) collected over a time course as well as several specie...

  16. ANALYSIS OF MPC ACCESS REQUIREMENTS FOR ADDITION OF FILLER MATERIALS

    SciTech Connect

    W. Wallin

    1996-09-03

    This analysis is prepared by the Mined Geologic Disposal System (MGDS) Waste Package Development Department (WPDD) in response to a request received via a QAP-3-12 Design Input Data Request (Ref. 5.1) from WAST Design (formerly MRSMPC Design). The request is to provide: Specific MPC access requirements for the addition of filler materials at the MGDS (i.e., location and size of access required). The objective of this analysis is to provide a response to the foregoing request. The purpose of this analysis is to provide a documented record of the basis for the response. The response is stated in Section 8 herein. The response is based upon requirements from an MGDS perspective.

  17. Error analysis of deep sequencing of phage libraries: peptides censored in sequencing.

    PubMed

    Matochko, Wadim L; Derda, Ratmir

    2013-01-01

    Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low-abundance clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by the Illumina method. The low theoretical complexity of this phage library, as compared to the complexity of long genetic reads and genomes, allowed us to describe this library using a convenient linear vector and operator framework. We describe a phage library as an N × 1 frequency vector n = ||n_i||, where n_i is the copy number of the ith sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation to the library is an operator acting on n. Selection, amplification, or sequencing could be described as a product of an N × N matrix and a stochastic sampling operator (Sa). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq). Sequencing without any bias and errors is Seq = Sa I_N, where I_N is an N × N unity matrix. Any bias in sequencing changes I_N to a nonunity matrix. We identified a diagonal censorship matrix (CEN), which describes elimination, or statistically significant downsampling, of specific reads during the sequencing process. PMID:24416071
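
    The vector-and-operator description in this abstract can be illustrated with a minimal sketch. It assumes a toy library of N = 4 sequences, models the sampling operator Sa as a diagonal matrix of binomial sampling fractions, and represents CEN as a diagonal 0/1 matrix. The function names, copy numbers, sampling probability, and the binomial model are illustrative assumptions, not the authors' implementation.

```python
import random

def sampling_operator(n, p, rng):
    """Return the diagonal of a random sampling operator Sa (assumed binomial model).

    Entry i is the fraction of the n[i] copies of sequence i that were drawn
    when each copy is sampled independently with probability p.
    """
    diag = []
    for copies in n:
        drawn = sum(1 for _ in range(copies) if rng.random() < p)
        diag.append(drawn / copies if copies else 0.0)
    return diag

def apply_diagonal(diag, vec):
    """Multiply a diagonal matrix (stored as its diagonal) by a vector."""
    return [d * v for d, v in zip(diag, vec)]

# Toy library: N = 4 possible sequences with hypothetical copy numbers n_i.
n = [1000, 500, 200, 0]

rng = random.Random(0)
sa = sampling_operator(n, p=0.1, rng=rng)

# Unbiased sequencing: Seq = Sa * I, so the identity leaves n unchanged before sampling.
identity = [1.0] * len(n)
unbiased = apply_diagonal(sa, apply_diagonal(identity, n))

# Biased sequencing: replacing the identity with a diagonal censorship matrix
# (assumed 0/1 entries here) removes sequence 1 from the observed reads.
cen = [1.0, 0.0, 1.0, 1.0]
censored = apply_diagonal(sa, apply_diagonal(cen, n))
```

    Storing each diagonal matrix as a plain list keeps the sketch dependency-free; because diagonal matrices commute, the order in which Sa and CEN are applied does not change the result in this toy model.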

  18. MESSA: MEta-Server for protein Sequence Analysis

    PubMed Central

    2012-01-01

    Background: Computational sequence analysis, that is, prediction of local sequence properties, homologs, spatial structure and function from the sequence of a protein, offers an efficient way to obtain needed information about proteins under study. Since reliable prediction is usually based on the consensus of many computer programs, meta-servers have been developed to fit such needs. Most meta-servers focus on one aspect of sequence analysis, while others incorporate more information, such as PredictProtein for local sequence feature predictions, SMART for domain architecture and sequence motif annotation, and GeneSilico for secondary and spatial structure prediction. However, as predictions of local sequence properties, three-dimensional structure and function are usually intertwined, it is beneficial to address them together. Results: We developed a MEta-Server for protein Sequence Analysis (MESSA) to facilitate comprehensive protein sequence analysis and gather structural and functional predictions for a protein of interest. For an input sequence, the server exploits a number of select tools to predict local sequence properties, such as secondary structure, structurally disordered regions, coiled coils, signal peptides and transmembrane helices; detect homologous proteins and assign the query to a protein family; identify three-dimensional structure templates and generate structure models; and provide predictive statements about the protein's function, including functional annotations, Gene Ontology terms, enzyme classification and possible functionally associated proteins. We tested MESSA on the proteome of Candidatus Liberibacter asiaticus. Manual curation shows that three-dimensional structure models generated by MESSA covered around 75% of all the residues in this proteome and the function of 80% of all proteins could be predicted. Availability: MESSA is free for non-commercial use at http://prodata.swmed.edu/MESSA/ PMID:23031578

  19. Designing novel kinases using evolutionary sequence analysis

    NASA Astrophysics Data System (ADS)

    Mody, Areez; Weiner, Joan; Iyer, Lakshman; Ramanathan, Sharad

    2006-03-01

    Cellular pathways with new functions are thought to arise from the duplication and divergence of proteins in existing pathways. The MAP kinase pathways in eukaryotes provide one example of this. These pathways consist of the MAP kinase proteins which are responsible for evoking the correct response to external stimuli. In the yeast Saccharomyces cerevisiae these pathways detect pheromones, osmolar stresses and nutrient levels, leading the cell into dramatic changes of morphology. Despite being homologous to each other, the MAP kinase proteins show specificity of function. We investigate the nature of the amino acid sequences conferring this specificity. To this end, we i) search the sequences of similar proteins in other eukaryotic species, ii) study simple theoretical models exploring the constraints felt by these protein segments and iii) experimentally construct a large suite of hybrid proteins made of segments taken from the homologous proteins. These are then expressed in yeast cells to see what function they are able to perform. In particular, we also ask whether it is possible to design a new kinase protein possessing new function and specificity.

  20. Analysis of Metagenomic Sequences: From Megabases to Terabases

    SciTech Connect

    Kyrpides, Nikos

    2010-06-04

    Nikos Kyrpides of the DOE Joint Genome Institute discusses metagenomics and the challenge of dealing with terabases of data on June 4, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.

  1. Analysis of expressed sequence tags (ESTs) from Agrostis species obtained using sequence related amplified polymorphism.

    PubMed

    Dinler, Gizem; Budak, Hikmet

    2008-10-01

    Bentgrass (Agrostis spp.), a genus of the Poaceae family, consists of more than 200 species and is mainly used in athletic fields and golf courses. Creeping bentgrass (A. stolonifera L.) is the most commonly used species in maintaining golf courses, followed by colonial bentgrass (A. capillaris L.) and velvet bentgrass (A. canina L.). The presence and nature of sequence related amplified polymorphism (SRAP) at the cDNA level were investigated. We isolated 80 unique cDNA fragment bands from these species using 56 SRAP primer combinations. Sequence analysis of cDNA clones and analysis of putative translation products revealed that some encoded amino acid sequences were similar to proteins involved in DNA synthesis, transcription, and signal transduction. The cytosolic glyceraldehyde-3-phosphate dehydrogenase (GAPDH) gene (GenBank accession no. EB812822) was also identified from velvet bentgrass, and the corresponding protein sequence is further analyzed due to its critical role in many cellular processes. The partial peptide sequence obtained was 112 amino acids long, presenting a high degree of homology to parts of the N-terminal and C-terminal regions of cytosolic phosphorylating GAPDH (GapC). The existence of common expressed sequence tags (ESTs) revealed by a minimum evolutionary dendrogram among the Agrostis ESTs indicated the usefulness of SRAP for comparative genome analysis of transcribed genes in the grass species. PMID:18726683

  2. Spectral Envelopes and Additive + Residual Analysis/Synthesis

    NASA Astrophysics Data System (ADS)

    Rodet, Xavier; Schwarz, Diemo

    The subject of this chapter is the estimation, representation, modification, and use of spectral envelopes in the context of sinusoidal-additive-plus-residual analysis/synthesis. A spectral envelope is an amplitude-vs-frequency function, which may be obtained from the envelope of a short-time spectrum (Rodet et al., 1987; Schwarz, 1998). [Precise definitions of such an envelope and short-time spectrum (STS) are given in Section 2.] The additive-plus-residual analysis/synthesis method is based on a representation of signals in terms of a sum of time-varying sinusoids and of a non-sinusoidal residual signal [e.g., see Serra (1989), Laroche et al. (1993), McAulay and Quatieri (1995), and Ding and Qian (1997)]. Many musical sound signals may be described as a combination of a nearly periodic waveform and colored noise. The nearly periodic part of the signal can be viewed as a sum of sinusoidal components, called partials, with time-varying frequency and amplitude. Such sinusoidal components are easily observed on a spectral analysis display (Fig. 5.1) as obtained, for instance, from a discrete Fourier transform.
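
    The sinusoids-plus-residual representation described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions: the partial frequencies, amplitude envelopes, and sample rate are hypothetical, and the function names are chosen for illustration rather than taken from the chapter.

```python
import math

def additive_synth(partials, sample_rate, num_samples):
    """Sum time-varying sinusoids (partials).

    partials: list of (freq_fn, amp_fn) pairs; each is a function of time in
    seconds returning instantaneous frequency (Hz) and linear amplitude.
    """
    out = [0.0] * num_samples
    for freq_fn, amp_fn in partials:
        phase = 0.0
        for i in range(num_samples):
            t = i / sample_rate
            out[i] += amp_fn(t) * math.sin(phase)
            # Accumulate phase by integrating the instantaneous frequency.
            phase += 2.0 * math.pi * freq_fn(t) / sample_rate
    return out

def residual(signal, sinusoidal_model):
    """Residual = original signal minus its sinusoidal model, sample by sample."""
    return [s - m for s, m in zip(signal, sinusoidal_model)]

# Two hypothetical partials: a decaying 220 Hz fundamental and a second
# partial near 440 Hz whose frequency glides slightly upward.
partials = [
    (lambda t: 220.0, lambda t: 0.8 * math.exp(-3.0 * t)),
    (lambda t: 440.0 + 5.0 * t, lambda t: 0.3 * math.exp(-5.0 * t)),
]
model = additive_synth(partials, sample_rate=8000, num_samples=800)
```

    Accumulating phase, rather than evaluating sin(2πft) directly, keeps each partial continuous when its frequency changes over time.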

  3. Synthesis of a Fluorescent Acridone Using a Grignard Addition, Oxidation, and Nucleophilic Aromatic Substitution Reaction Sequence

    ERIC Educational Resources Information Center

    Goodrich, Samuel; Patel, Miloni; Woydziak, Zachary R.

    2015-01-01

    A three-pot synthesis oriented for an undergraduate organic chemistry laboratory was developed to construct a fluorescent acridone molecule. This laboratory experiment utilizes Grignard addition to an aldehyde, alcohol oxidation, and iterative nucleophilic aromatic substitution steps to produce the final product. Each of the intermediates and the…

  4. The DNA sequence and comparative analysis of human chromosome 10.

    PubMed

    Deloukas, P; Earthrowl, M E; Grafham, D V; Rubenfield, M; French, L; Steward, C A; Sims, S K; Jones, M C; Searle, S; Scott, C; Howe, K; Hunt, S E; Andrews, T D; Gilbert, J G R; Swarbreck, D; Ashurst, J L; Taylor, A; Battles, J; Bird, C P; Ainscough, R; Almeida, J P; Ashwell, R I S; Ambrose, K D; Babbage, A K; Bagguley, C L; Bailey, J; Banerjee, R; Bates, K; Beasley, H; Bray-Allen, S; Brown, A J; Brown, J Y; Burford, D C; Burrill, W; Burton, J; Cahill, P; Camire, D; Carter, N P; Chapman, J C; Clark, S Y; Clarke, G; Clee, C M; Clegg, S; Corby, N; Coulson, A; Dhami, P; Dutta, I; Dunn, M; Faulkner, L; Frankish, A; Frankland, J A; Garner, P; Garnett, J; Gribble, S; Griffiths, C; Grocock, R; Gustafson, E; Hammond, S; Harley, J L; Hart, E; Heath, P D; Ho, T P; Hopkins, B; Horne, J; Howden, P J; Huckle, E; Hynds, C; Johnson, C; Johnson, D; Kana, A; Kay, M; Kimberley, A M; Kershaw, J K; Kokkinaki, M; Laird, G K; Lawlor, S; Lee, H M; Leongamornlert, D A; Laird, G; Lloyd, C; Lloyd, D M; Loveland, J; Lovell, J; McLaren, S; McLay, K E; McMurray, A; Mashreghi-Mohammadi, M; Matthews, L; Milne, S; Nickerson, T; Nguyen, M; Overton-Larty, E; Palmer, S A; Pearce, A V; Peck, A I; Pelan, S; Phillimore, B; Porter, K; Rice, C M; Rogosin, A; Ross, M T; Sarafidou, T; Sehra, H K; Shownkeen, R; Skuce, C D; Smith, M; Standring, L; Sycamore, N; Tester, J; Thorpe, A; Torcasso, W; Tracey, A; Tromans, A; Tsolas, J; Wall, M; Walsh, J; Wang, H; Weinstock, K; West, A P; Willey, D L; Whitehead, S L; Wilming, L; Wray, P W; Young, L; Chen, Y; Lovering, R C; Moschonas, N K; Siebert, R; Fechtel, K; Bentley, D; Durbin, R; Hubbard, T; Doucette-Stamm, L; Beck, S; Smith, D R; Rogers, J

    2004-05-27

    The finished sequence of human chromosome 10 comprises a total of 131,666,441 base pairs. It represents 99.4% of the euchromatic DNA and includes one megabase of heterochromatic sequence within the pericentromeric region of the short and long arm of the chromosome. Sequence annotation revealed 1,357 genes, of which 816 are protein coding, and 430 are pseudogenes. We observed widespread occurrence of overlapping coding genes (either strand) and identified 67 antisense transcripts. Our analysis suggests that both inter- and intrachromosomal segmental duplications have impacted on the gene count on chromosome 10. Multispecies comparative analysis indicated that we can readily annotate the protein-coding genes with current resources. We estimate that over 95% of all coding exons were identified in this study. Assessment of single base changes between the human chromosome 10 and chimpanzee sequence revealed nonsense mutations in only 21 coding genes with respect to the human sequence. PMID:15164054

  5. Initial sequencing and comparative analysis of the mouse genome

    SciTech Connect

    Waterston, Robert H.; Lindblad-Toh, Kerstin; Birney, Ewan; Rogers, Jane; Abril, Josep F.; Agarwal, Pankaj; Agarwala, Richa; Ainscough, Rachel; Alexandersson, Marina; An, Peter; Antonarakis, Stylianos E.; Attwood, John; Baertsch, Robert; Bailey, Jonathon; Barlow, Karen; Beck, Stephan; Berry, Eric; Birren, Bruce; Bloom, Toby; Bork, Peer; Botcherby, Marc; Bray, Nicolas; Brent, Michael R.; Brown, Daniel G.; Brown, Stephen D.; Bult, Carol; Burton, John; Butler, Jonathan; Campbell, Robert D.; Carninci, Piero; Cawley, Simon; Chiaromonte, Francesca; Chinwalla, Asif T.; Church, Deanna M.; Clamp, Michele; Clee, Christopher; Collins, Francis S.; Cook, Lisa L.; Copley, Richard R.; Coulson, Alan; Couronne, Olivier; Cuff, James; Curwen, Val; Cutts, Tim; Daly, Mark; David, Robert; Davies, Joy; Delehaunty, Kimberly D.; Deri, Justin; Dermitzakis, Emmanouil T.; Dewey, Colin; Dickens, Nicholas J.; Diekhans, Mark; Dodge, Sheila; Dubchak, Inna; Dunn, Diane M.; Eddy, Sean R.; Elnitski, Laura; Emes, Richard D.; Eswara, Pallavi; Eyras, Eduardo; Felsenfeld, Adam; Fewell, Ginger A.; Flicek, Paul; Foley, Karen; Frankel, Wayne N.; Fulton, Lucinda A.; Fulton, Robert S.; Furey, Terrence S.; Gage, Diane; Gibbs, Richard A.; Glusman, Gustavo; Gnerre, Sante; Goldman, Nick; Goodstadt, Leo; Grafham, Darren; Graves, Tina A.; Green, Eric D.; Gregory, Simon; Guigo, Roderic; Guyer, Mark; Hardison, Ross C.; Haussler, David; Hayashizaki, Yoshihide; Hillier, LaDeana W.; Hinrichs, Angela; Hlavina, Wratko; Holzer, Timothy; Hsu, Fan; Hua, Axin; Hubbard, Tim; Hunt, Adrienne; Jackson, Ian; Jaffe, David B.; Johnson, L. Steven; Jones, Matthew; Jones, Thomas A.; Joy, Ann; Kamal, Michael; Karlsson, Elinor K.; Karolchik, Donna; Kasprzyk, Arkadiusz; Kawai, Jun; Keibler, Evan; Kells, Cristyn; Kent, W. James; Kirby, Andrew; Kolbe, Diana L.; Korf, Ian; Kucherlapati, Raju S.; Kulbokas III, Edward J.; Kulp, David; Landers, Tom; Leger, J.P.; Leonard, Steven; Letunic, Ivica; Levine, Rosie; et al.

    2002-12-15

    The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

  6. Deep sequencing and human antibody repertoire analysis.

    PubMed

    Boyd, Scott D; Crowe, James E

    2016-06-01

    In the past decade, high-throughput DNA sequencing (HTS) methods and improved approaches for isolating antigen-specific B cells and their antibody genes have been applied in many areas of human immunology. This work has greatly increased our understanding of human antibody repertoires and the specific clones responsible for protective immunity or immune-mediated pathogenesis. Although the principles underlying selection of individual B cell clones in the intact immune system are still under investigation, the combination of more powerful genetic tracking of antibody lineage development and functional testing of the encoded proteins promises to transform therapeutic antibody discovery and optimization. Here, we highlight recent advances in this fast-moving field. PMID:27065089

  7. Inference of Splicing Regulatory Activities by Sequence Neighborhood Analysis

    PubMed Central

    Stadler, Michael B; Shomron, Noam; Yeo, Gene W; Schneider, Aniket; Xiao, Xinshu; Burge, Christopher B

    2006-01-01

    Sequence-specific recognition of nucleic-acid motifs is critical to many cellular processes. We have developed a new and general method called Neighborhood Inference (NI) that predicts sequences with activity in regulating a biochemical process based on the local density of known sites in sequence space. Applied to the problem of RNA splicing regulation, NI was used to predict hundreds of new exonic splicing enhancer (ESE) and silencer (ESS) hexanucleotides from known human ESEs and ESSs. These predictions were supported by cross-validation analysis, by analysis of published splicing regulatory activity data, by sequence-conservation analysis, and by measurement of the splicing regulatory activity of 24 novel predicted ESEs, ESSs, and neutral sequences using an in vivo splicing reporter assay. These results demonstrate the ability of NI to accurately predict splicing regulatory activity and show that the scope of exonic splicing regulatory elements is substantially larger than previously anticipated. Analysis of orthologous exons in four mammals showed that the NI score of ESEs, a measure of function, is much more highly conserved above background than ESE primary sequence. This observation indicates a high degree of selection for ESE activity in mammalian exons, with surprisingly frequent interchangeability between ESE sequences. PMID:17121466

  8. Accident Sequence Evaluation Program: Human reliability analysis procedure

    SciTech Connect

    Swain, A.D.

    1987-02-01

    This document presents a shortened version of the procedure, models, and data for human reliability analysis (HRA) which are presented in the Handbook of Human Reliability Analysis with Emphasis on Nuclear Power Plant Applications (NUREG/CR-1278, August 1983). This shortened version was prepared and tried out as part of the Accident Sequence Evaluation Program (ASEP) funded by the US Nuclear Regulatory Commission and managed by Sandia National Laboratories. The intent of this new HRA procedure, called the "ASEP HRA Procedure," is to enable systems analysts, with minimal support from experts in human reliability analysis, to make estimates of human error probabilities and other human performance characteristics which are sufficiently accurate for many probabilistic risk assessments. The ASEP HRA Procedure consists of a Pre-Accident Screening HRA, a Pre-Accident Nominal HRA, a Post-Accident Screening HRA, and a Post-Accident Nominal HRA. The procedure in this document includes changes made after tryout and evaluation of the procedure in four nuclear power plants by four different systems analysts and related personnel, including human reliability specialists. The changes consist of some additional explanatory material (including examples), and more detailed definitions of some of the terms. 42 refs.

  9. Sequencing and Analysis of Neanderthal Genomic DNA

    SciTech Connect

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith, Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Paabo, Svante; Pritchard, Jonathan K.; Rubin, Edward M.

    2006-06-13

    Recovery and analysis of multiple Neanderthal autosomal sequences using a metagenomic approach reveals that modern humans and Neanderthals split ~400,000 years ago, without significant evidence of subsequent admixture.

  10. Sequence and comparative genomic analysis of actin-related proteins.

    PubMed

    Muller, Jean; Oma, Yukako; Vallar, Laurent; Friederich, Evelyne; Poch, Olivier; Winsor, Barbara

    2005-12-01

    Actin-related proteins (ARPs) are key players in cytoskeleton activities and nuclear functions. Two complexes, ARP2/3 and ARP1/11, also known as dynactin, are implicated in actin dynamics and in microtubule-based trafficking, respectively. ARP4 to ARP9 are components of many chromatin-modulating complexes. Conventional actins and ARPs codefine a large family of homologous proteins, the actin superfamily, with a tertiary structure known as the actin fold. Because ARPs and actin share high sequence conservation, clear family definition requires distinct features to easily and systematically identify each subfamily. In this study we performed an in depth sequence and comparative genomic analysis of ARP subfamilies. A high-quality multiple alignment of approximately 700 complete protein sequences homologous to actin, including 148 ARP sequences, allowed us to extend the ARP classification to new organisms. Sequence alignments revealed conserved residues, motifs, and inserted sequence signatures to define each ARP subfamily. These discriminative characteristics allowed us to develop ARPAnno (http://bips.u-strasbg.fr/ARPAnno), a new web server dedicated to the annotation of ARP sequences. Analyses of sequence conservation among actins and ARPs highlight part of the actin fold and suggest interactions between ARPs and actin-binding proteins. Finally, analysis of ARP distribution across eukaryotic phyla emphasizes the central importance of nuclear ARPs, particularly the multifunctional ARP4. PMID:16195354

  11. DNA sequence analysis with droplet-based microfluidics

    PubMed Central

    Abate, Adam R.; Hung, Tony; Sperling, Ralph A.; Mary, Pascaline; Rotem, Assaf; Agresti, Jeremy J.; Weiner, Michael A.; Weitz, David A.

    2014-01-01

    Droplet-based microfluidic techniques can form and process micrometer scale droplets at thousands per second. Each droplet can house an individual biochemical reaction, allowing millions of reactions to be performed in minutes with small amounts of total reagent. This versatile approach has been used for engineering enzymes, quantifying concentrations of DNA in solution, and screening protein crystallization conditions. Here, we use it to read the sequences of DNA molecules with a FRET-based assay. Using probes of different sequences, we interrogate a target DNA molecule for polymorphisms. With a larger probe set, additional polymorphisms can be interrogated as well as targets of arbitrary sequence. PMID:24185402

  12. DSAP: deep-sequencing small RNA analysis pipeline.

    PubMed

    Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus

    2010-07-01

    DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log(2)-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw. PMID:20478825
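
    The first two DSAP steps described in this abstract (cleanup and clustering of a tab-delimited tag file) can be sketched as follows. This is a minimal illustration, not DSAP's code: the adaptor sequence, the minimum run length of four, the minimum tag length, and the function names are all assumptions, and the Rfam and miRBase matching steps are not covered.

```python
import re
from collections import Counter

# Hypothetical 3' adaptor sequence, used here for illustration only.
ADAPTOR = "TCGTATGCCGTCTTCTGCTTG"

def cleanup(tag, adaptor=ADAPTOR, min_len=4):
    """Trim the adaptor and a trailing poly-A/T/C/G/N run; return None if too short."""
    pos = tag.find(adaptor)
    if pos != -1:
        tag = tag[:pos]
    # Remove a trailing run of four or more identical nucleotides (assumed threshold).
    tag = re.sub(r"(A{4,}|T{4,}|C{4,}|G{4,}|N{4,})$", "", tag)
    return tag if len(tag) >= min_len else None

def cluster(lines):
    """Group cleaned tags into unique sequence clusters, summing their copy numbers."""
    clusters = Counter()
    for line in lines:
        tag, count = line.rstrip("\n").split("\t")
        cleaned = cleanup(tag)
        if cleaned is not None:
            clusters[cleaned] += int(count)
    return clusters

# Tab-delimited input: unique sequence read (tag) and its number of copies.
lines = [
    "TGAGGTAGTAGGTTGTATAGTT" + ADAPTOR + "\t120",
    "TGAGGTAGTAGGTTGTATAGTTAAAAAA\t30",
    "NNNNNNNN\t5",
    "TGAGGTAGTAGGTTGTATAGTT\t10",
]
clusters = cluster(lines)
```

    In this toy input, the three reads that share the same insert collapse into a single cluster after trimming, and the all-N read is discarded.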

  13. Effect of solvent addition sequence on lycopene extraction efficiency from membrane neutralized caustic peeled tomato waste.

    PubMed

    Phinney, David M; Frelka, John C; Cooperstone, Jessica L; Schwartz, Steven J; Heldman, Dennis R

    2017-01-15

    Lycopene is a high value nutraceutical and its isolation from waste streams is often desirable to maximize profits. This research investigated solvent addition order and composition on lycopene extraction efficiency from a commercial tomato waste stream (pH 12.5, solids ∼5%) that was neutralized using membrane filtration. Constant volume dilution (CVD) was used to desalinate the caustic salt to neutralize the waste. Acetone, ethanol and hexane were used as direct or blended additions. Extraction efficiency was defined as the amount of lycopene extracted divided by the total lycopene in the sample. The CVD operation reduced the active alkali of the waste from 0.66 to <0.01M and the moisture content of the pulp increased from 93% to 97% (wet basis), showing the removal of caustic salts from the waste. Extraction efficiency varied from 32.5% to 94.5%. This study demonstrates a lab scale feasibility to extract lycopene efficiently from tomato processing byproducts. PMID:27542486

  14. A Software System for Data Analysis in Automated DNA Sequencing

    PubMed Central

    Giddings, Michael C.; Severin, Jessica; Westphall, Michael; Wu, Jiazhen; Smith, Lloyd M.

    1998-01-01

    Software for gel image analysis and base-calling in fluorescence-based sequencing consisting of two primary programs, BaseFinder and GelImager, is described. BaseFinder is a framework for trace processing, analysis, and base-calling. BaseFinder is highly extensible, allowing the addition of trace analysis and processing modules without recompilation. Powerful scripting capabilities combined with modularity and multilane handling allow the user to customize BaseFinder to virtually any type of trace processing. We have developed an extensive set of data processing and analysis modules for use with the program in fluorescence-based sequencing. GelImager is a framework for gel image manipulation. It can be used for gel visualization, lane retracking, and as a front end to the Washington University Getlanes program. The programs were designed using a cross-platform development environment, currently allowing them to run in Windows NT, Windows 95, Openstep/Mach, and Rhapsody. Work is ongoing to deploy the software on additional platforms, including Solaris, Linux, and MacOS. This software has been thoroughly tested and debugged in the analysis of >2 million bp of raw sequence data from human chromosome 19 region q13. Overall sequencing accuracy was measured using a significant subset of these data, consisting of ∼600 sequences, by comparing the individual shotgun sequences against the final assembled contigs. Also, results are reported from experiments that analyzed the accuracy of the software and two other well-known base-calling programs for sequencing the M13mp18 vector sequence. [The sequence data described in this paper have been submitted to the GenBank data library under accession no. AF025422] PMID:9647639

  15. Genomic-scale comparison of sequence- and structure-based methods of function prediction: Does structure provide additional insight?

    PubMed Central

    Fetrow, Jacquelyn S.; Siew, Naomi; Di Gennaro, Jeannine A.; Martinez-Yamout, Maria; Dyson, H. Jane; Skolnick, Jeffrey

    2001-01-01

    A function annotation method using the sequence-to-structure-to-function paradigm is applied to the identification of all disulfide oxidoreductases in the Saccharomyces cerevisiae genome. The method identifies 27 sequences as potential disulfide oxidoreductases. All previously known thioredoxins, glutaredoxins, and disulfide isomerases are correctly identified. Three of the 27 predictions are probable false-positives. Three novel predictions, which subsequently have been experimentally validated, are presented. Two additional novel predictions suggest a disulfide oxidoreductase regulatory mechanism for two subunits (OST3 and OST6) of the yeast oligosaccharyltransferase complex. Based on homology, this prediction can be extended to a potential tumor suppressor gene, N33, in humans, whose biochemical function was not previously known. Attempts to obtain a folded, active N33 construct to test the prediction were unsuccessful. The results show that structure prediction coupled with biochemically relevant structural motifs is a powerful method for the function annotation of genome sequences and can provide more detailed, robust predictions than function prediction methods that rely on sequence comparison alone. PMID:11316881

  16. Applying machine learning techniques to DNA sequence analysis

    SciTech Connect

    Shavlik, J.W.

    1992-01-01

    We are developing a machine learning system that modifies existing knowledge about specific types of biological sequences. It does this by considering sample members and nonmembers of the sequence motif being learned. Using this information (which we call a "domain theory"), our learning algorithm produces a more accurate representation of the knowledge needed to categorize future sequences. Specifically, the KBANN algorithm maps inference rules, such as consensus sequences, into a neural (connectionist) network. Neural network training techniques then use the training examples to refine these inference rules. We have been applying this approach to several problems in DNA sequence analysis and have also been extending the capabilities of our learning system along several dimensions.

  17. Identification of Medically Important Yeast Species by Sequence Analysis of the Internal Transcribed Spacer Regions

    PubMed Central

    Leaw, Shiang Ning; Chang, Hsien Chang; Sun, Hsiao Fang; Barton, Richard; Bouchara, Jean-Philippe; Chang, Tsung Chain

    2006-01-01

    Infections caused by yeasts have increased in previous decades due primarily to the increasing population of immunocompromised patients. In addition, infections caused by less common species such as Pichia, Rhodotorula, Trichosporon, and Saccharomyces spp. have been widely reported. This study extensively evaluated the feasibility of sequence analysis of the rRNA gene internal transcribed spacer (ITS) regions for the identification of yeasts of clinical relevance. Both the ITS1 and ITS2 regions of 373 strains (86 species), including 299 reference strains and 74 clinical isolates, were amplified by PCR and sequenced. The sequences were compared to reference data available at the GenBank database by using BLAST (basic local alignment search tool) to determine if species identification was possible by ITS sequencing. Since the GenBank database currently lacks ITS sequence entries for some yeasts, the ITS sequences of type (or reference) strains of 15 species were submitted to GenBank to facilitate identification of these species. Strains producing discrepant identifications between the conventional methods and ITS sequence analysis were further analyzed by sequencing of the D1-D2 domain of the large-subunit rRNA gene for species clarification. The rates of correct identification by ITS1 and ITS2 sequence analysis were 96.8% (361/373) and 99.7% (372/373), respectively. Of the 373 strains tested, only 1 strain (Rhodotorula glutinis BCRC 20576) could not be identified by ITS2 sequence analysis. In conclusion, identification of medically important yeasts by ITS sequencing, especially using the ITS2 region, is reliable and can be used as an accurate alternative to conventional identification methods. PMID:16517841

  18. Transcriptome Sequencing and Positive Selected Genes Analysis of Bombyx mandarina

    PubMed Central

    Wu, Yuqian; Long, Renwen; Liu, Chun; Xia, Qingyou

    2015-01-01

    The wild silkworm Bombyx mandarina is widely believed to be an ancestor of the domesticated silkworm, Bombyx mori. Silkworms are often used as a model for studying the mechanism of species domestication. Here, we performed transcriptome sequencing of the wild silkworm using an Illumina HiSeq2000 platform. We produced 100,004,078 high-quality reads and assembled them into 50,773 contigs with an N50 length of 1764 bp and a mean length of 941.62 bp. A total of 33,759 unigenes were identified, with 12,805 annotated in the Nr database, 8273 in the Pfam database, and 9093 in the Swiss-Prot database. Expression profile analysis found significant differential expression of 1308 unigenes between the middle silk gland (MSG) and posterior silk gland (PSG). Three sericin genes (sericin 1, sericin 2, and sericin 3) were expressed specifically in the MSG and three fibroin genes (fibroin-H, fibroin-L, and fibroin/P25) were expressed specifically in the PSG. In addition, 32,297 single-nucleotide polymorphisms (SNPs) and 361 insertion-deletions (INDELs) were detected. Comparison with the domesticated silkworm p50/Dazao identified 5,295 orthologous genes, among which 400 might have experienced or be experiencing positive selection according to Ka/Ks analysis. The data and analyses presented here provide insights into silkworm domestication and an invaluable resource for wild silkworm genomics research. PMID:25806526
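
    The N50 statistic quoted for the assembly above can be illustrated with a short sketch. This is not the authors' pipeline; the function name and the toy contig lengths are assumptions chosen only to show the usual definition (the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembled bases).

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    ordered = sorted(lengths, reverse=True)   # longest contig first
    half_total = sum(ordered) / 2             # half of all assembled bases
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:             # crossed the halfway point
            return length
    raise ValueError("no contig lengths given")


# Toy example with hypothetical contig lengths, not data from the study.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))
```

    Because N50 weights long contigs more heavily than the mean does, an N50 (1764 bp) well above the mean length (941.62 bp) is the expected pattern for an assembly with many short contigs.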

  19. Transcriptome sequencing and positive selected genes analysis of Bombyx mandarina.

    PubMed

    Cheng, Tingcai; Fu, Bohua; Wu, Yuqian; Long, Renwen; Liu, Chun; Xia, Qingyou

    2015-01-01

    The wild silkworm Bombyx mandarina is widely believed to be an ancestor of the domesticated silkworm, Bombyx mori. Silkworms are often used as a model for studying the mechanism of species domestication. Here, we performed transcriptome sequencing of the wild silkworm using an Illumina HiSeq2000 platform. We produced 100,004,078 high-quality reads and assembled them into 50,773 contigs with an N50 length of 1764 bp and a mean length of 941.62 bp. A total of 33,759 unigenes were identified, with 12,805 annotated in the Nr database, 8273 in the Pfam database, and 9093 in the Swiss-Prot database. Expression profile analysis found significant differential expression of 1308 unigenes between the middle silk gland (MSG) and posterior silk gland (PSG). Three sericin genes (sericin 1, sericin 2, and sericin 3) were expressed specifically in the MSG and three fibroin genes (fibroin-H, fibroin-L, and fibroin/P25) were expressed specifically in the PSG. In addition, 32,297 single-nucleotide polymorphisms (SNPs) and 361 insertion-deletions (INDELs) were detected. Comparison with the domesticated silkworm p50/Dazao identified 5,295 orthologous genes, among which 400 might have experienced or be experiencing positive selection according to Ka/Ks analysis. The data and analyses presented here provide insights into silkworm domestication and an invaluable resource for wild silkworm genomics research. PMID:25806526

  20. Comprehensive analysis of sequences of a protein switch.

    PubMed

    Chen, Szu-Hua; Meller, Jaroslaw; Elber, Ron

    2016-01-01

    Switches form a special class of proteins that dramatically change their three-dimensional structures upon a small perturbation. One possible perturbation that we explore is that of a single point mutation. Building on the pioneering experimental work of Alexander et al. (Alexander et al. PNAS, 2007; 104,11963-11968) that determines switch sequences between α and α+β folds we conduct a comprehensive sequence sampling by a Markov Chain with multiple fitness criteria to identify new switches given the experimental folds. We screen for switch sequences using a combination of contact potential, secondary structure prediction, and finally molecular dynamics simulations. Statistical properties of switch sequences are discussed and illustrated to be most sensitive to mutation at the N- and C- termini of the switch protein. Based on this analysis, a particularly stable putative switch pair is identified and proposed for further experimental analysis. PMID:26073558

  1. Deep Sequencing Analysis of Nucleolar Small RNAs: Bioinformatics.

    PubMed

    Bai, Baoyan; Laiho, Marikki

    2016-01-01

    Small RNAs (size 20-30 nt) of various types have been actively investigated in recent years, and their subcellular compartmentalization and relative concentrations are likely to be of importance to their cellular and physiological functions. Comprehensive data on this subset of the transcriptome can only be obtained by application of high-throughput sequencing, which yields data that are inherently complex and multidimensional, as sequence composition, length, and abundance all inform small RNA function. Subsequent data analysis, hypothesis testing, and presentation/visualization of the results are correspondingly challenging. We have constructed small RNA libraries derived from different cellular compartments, including the nucleolus, and asked whether small RNAs exist in the nucleolus and whether they are distinct from cytoplasmic and nuclear small RNAs, the miRNAs. Here, we present a workflow for analysis of small RNA sequencing data generated by the Ion Torrent PGM sequencer from samples derived from different cellular compartments. PMID:27576724

  2. Food Fish Identification from DNA Extraction through Sequence Analysis

    ERIC Educational Resources Information Center

    Hallen-Adams, Heather E.

    2015-01-01

    This experiment exposed 3rd- and 4th-year undergraduates and graduate students taking a course in advanced food analysis to DNA extraction, polymerase chain reaction (PCR), and DNA sequence analysis. Students provided their own fish sample, purchased from local grocery stores, and the class as a whole extracted DNA, which was then subjected to PCR,…

  3. Basic Sequence Analysis Techniques for Use with Audit Trail Data

    ERIC Educational Resources Information Center

    Judd, Terry; Kennedy, Gregor

    2008-01-01

    Audit trail analysis can provide valuable insights to researchers and evaluators interested in comparing and contrasting designers' expectations of use and students' actual patterns of use of educational technology environments (ETEs). Sequence analysis techniques are particularly effective but have been neglected to some extent because of real…

  4. Streamlined analysis of duplex sequencing data with Du Novo.

    PubMed

    Stoler, Nicholas; Arbeithuber, Barbara; Guiblet, Wilfried; Makova, Kateryna D; Nekrutenko, Anton

    2016-01-01

    Duplex sequencing was originally developed to detect rare nucleotide polymorphisms normally obscured by the noise of high-throughput sequencing. Here we describe a new, streamlined, reference-free approach for the analysis of duplex sequencing data. We show the approach performs well on simulated data and precisely reproduces previously published results and apply it to a newly produced dataset, enabling us to type low-frequency variants in human mitochondrial DNA. Finally, we provide all necessary tools as stand-alone components as well as integrate them into the Galaxy platform. All analyses performed in this manuscript can be repeated exactly as described at http://usegalaxy.org/duplex . PMID:27566673

  5. Complete sequence and genomic analysis of murine gammaherpesvirus 68.

    PubMed Central

    Virgin, H W; Latreille, P; Wamsley, P; Hallsworth, K; Weck, K E; Dal Canto, A J; Speck, S H

    1997-01-01

    Murine gammaherpesvirus 68 (gammaHV68) infects mice, thus providing a tractable small-animal model for analysis of the acute and chronic pathogenesis of gammaherpesviruses. To facilitate molecular analysis of gammaHV68 pathogenesis, we have sequenced the gammaHV68 genome. The genome contains 118,237 bp of unique sequence flanked by multiple copies of a 1,213-bp terminal repeat. The GC content of the unique portion of the genome is 46%, while the GC content of the terminal repeat is 78%. The unique portion of the genome is estimated to encode at least 80 genes and is largely colinear with the genomes of Kaposi's sarcoma herpesvirus (KSHV; also known as human herpesvirus 8), herpesvirus saimiri (HVS), and Epstein-Barr virus (EBV). We detected 63 open reading frames (ORFs) homologous to HVS and KSHV ORFs and used the HVS/KSHV numbering system to designate these ORFs. gammaHV68 shares with HVS and KSHV ORFs homologous to a complement regulatory protein (ORF 4), a D-type cyclin (ORF 72), and a G-protein-coupled receptor with close homology to the interleukin-8 receptor (ORF 74). One ORF (K3) was identified in gammaHV68 as homologous to both ORFs K3 and K5 of KSHV and contains a domain found in a bovine herpesvirus 4 major immediate-early protein. We also detected 16 methionine-initiated ORFs predicted to encode proteins at least 100 amino acids in length that are unique to gammaHV68 (ORFs M1 to 14). ORF M1 has striking homology to poxvirus serpins, while ORF M11 encodes a potential homolog of Bcl-2-like molecules encoded by other gammaherpesviruses (gene 16 of HVS and KSHV and the BHRF1 gene of EBV). In addition, clustered at the left end of the unique region are eight sequences with significant homology to bacterial tRNAs. The unique region of the genome contains two internal repeats: a 40-bp repeat located between bp 26778 and 28191 in the genome and a 100-bp repeat located between bp 98981 and 101170. 
Analysis of the gammaHV68, HVS, EBV, and KSHV genomes demonstrated
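
    The GC-content figures in the record above (46% for the unique region, 78% for the terminal repeat) are simple base-composition ratios. A minimal sketch follows, assuming the common convention of ignoring ambiguous bases; the function name and toy sequences are hypothetical and unrelated to the gammaHV68 genome.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")  # skip N and other codes
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return (seq.count("G") + seq.count("C")) / counted


# Toy sequences only: a balanced one and a GC-only one with an ambiguous base.
print(gc_content("GGCCAATT"))
print(gc_content("ggcn"))
```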

  6. Sensitivity analysis of geometric errors in additive manufacturing medical models.

    PubMed

    Pinto, Jose Miguel; Arrieta, Cristobal; Andia, Marcelo E; Uribe, Sergio; Ramos-Grez, Jorge; Vargas, Alex; Irarrazaval, Pablo; Tejos, Cristian

    2015-03-01

    Additive manufacturing (AM) models are used in medical applications for surgical planning, prosthesis design and teaching. For these applications, the accuracy of the AM models is essential. Unfortunately, this accuracy is compromised due to errors introduced by each of the building steps: image acquisition, segmentation, triangulation, printing and infiltration. However, the contribution of each step to the final error remains unclear. We performed a sensitivity analysis comparing errors obtained from a reference with those obtained modifying parameters of each building step. Our analysis considered global indexes to evaluate the overall error, and local indexes to show how this error is distributed along the surface of the AM models. Our results show that the standard building process tends to overestimate the AM models, i.e. models are larger than the original structures. They also show that the triangulation resolution and the segmentation threshold are critical factors, and that the errors are concentrated at regions with high curvatures. Errors could be reduced choosing better triangulation and printing resolutions, but there is an important need for modifying some of the standard building processes, particularly the segmentation algorithms. PMID:25649961

  7. Mapping and Initial Analysis of Human Subtelomeric Sequence Assemblies

    PubMed Central

    Riethman, Harold; Ambrosini, Anthony; Castaneda, Carlos; Finklestein, Jeffrey; Hu, Xue-Lan; Mudunuri, Uma; Paul, Sheila; Wei, Jun

    2004-01-01

    Physical mapping data were combined with public draft and finished sequences to derive subtelomeric sequence assemblies for each of the 41 genetically distinct human telomere regions. Sequence gaps that remain on the reference telomeres are generally small, well-defined, and, for the most part, restricted to regions directly adjacent to the terminal (TTAGGG)n tract. Of the 20.66 Mb of subtelomeric DNA analyzed, 3.01 Mb are subtelomeric repeat sequences (Srpt), and an additional 2.11 Mb are segmental duplications. The subtelomeric sequence assemblies are enriched >25-fold in short, internal (TTAGGG)n-like sequences relative to the rest of the genome; a total of 114 (TTAGGG)n-like islands were found, 55 within Srpt regions, 35 within one-copy regions, 11 at one-copy/Srpt or Srpt/segmental duplication boundaries, and 13 at the telomeric ends of assemblies. Transcripts were annotated in each assembly, noting their mapping coordinates relative to their respective telomere and whether they originate in duplicated DNA or single-copy DNA. A total of 697 transcripts were found in 15.53 Mb of one-copy DNA, 76 transcripts in 2.11 Mb of segmentally duplicated DNA, and 168 transcripts in 3.01 Mb of Srpt sequence. This overall transcript density is similar (within ∼10%) to that found genome-wide. Zinc finger-containing genes and olfactory receptor genes are duplicated within and between multiple telomere regions. PMID:14707167
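
    The transcript counts and region sizes quoted above are enough to work out the per-region and overall transcript densities. The sketch below only redoes that arithmetic from the figures in the abstract; the variable names are illustrative assumptions.

```python
# (transcripts, size in Mb) for each class of subtelomeric DNA, from the abstract.
regions = {
    "one-copy": (697, 15.53),
    "segmental duplication": (76, 2.11),
    "Srpt": (168, 3.01),
}

# Transcripts per Mb within each class.
density = {name: count / size_mb for name, (count, size_mb) in regions.items()}

# Overall density across all three classes combined.
total_transcripts = sum(count for count, _ in regions.values())
total_mb = sum(size_mb for _, size_mb in regions.values())
overall = total_transcripts / total_mb

for name, value in density.items():
    print(f"{name}: {value:.1f} transcripts/Mb")
print(f"overall: {overall:.1f} transcripts/Mb")
```

    The three sizes sum to 20.65 Mb, which agrees with the 20.66 Mb total in the abstract to within rounding, and the 941 transcripts work out to roughly 45.6 per Mb overall.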

  8. High Throughput Plasmid Sequencing with Illumina and CLC Bio (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    SciTech Connect

    Athavale, Ajay

    2012-06-01

    Ajay Athavale (Monsanto) presents "High Throughput Plasmid Sequencing with Illumina and CLC Bio" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  9. High Throughput Plasmid Sequencing with Illumina and CLC Bio (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Athavale, Ajay [Monsanto]

    2013-01-25

    Ajay Athavale (Monsanto) presents "High Throughput Plasmid Sequencing with Illumina and CLC Bio" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  10. An Organocatalytic Asymmetric Friedel-Crafts Addition/Fluorination Sequence: Construction of Oxindole-Pyrazolone Conjugates Bearing Vicinal Tetrasubstituted Stereocenters.

    PubMed

    Bao, Xiaoze; Wang, Baomin; Cui, Longchen; Zhu, Guodong; He, Yuli; Qu, Jingping; Song, Yuming

    2015-11-01

    A highly efficient and practical one-pot sequential process, consisting of an organocatalytic enantioselective Friedel-Crafts-type addition of 4-nonsubstituted pyrazolones to isatin-derived N-Boc ketimines and a subsequent diastereoselective fluorination of the pyrazolone moiety, is developed. This reaction sequence delivers novel oxindole-pyrazolone adducts featuring vicinal tetrasubstituted stereocenters with a 0.5 mol % catalyst loading in high yield with excellent enantio- and diastereocontrol. Notably, chloro, bromo, and thioether functionalities can be readily incorporated, rendering a broad diversity of the product. PMID:26473513

  11. Biotechnological Strains of Komagataella (Pichia) pastoris are Komagataella phaffii as Determined from Multigene Sequence Analysis

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Pichia pastoris was reassigned earlier to the genus Komagataella following phylogenetic analysis of gene sequences. Since that time, two additional species of Komagataella have been described, K. pseudopastoris and K. phaffii. Because these three species are unlikely to be resolved from the standa...

  12. Analysis and Visualization Tool for Targeted Amplicon Bisulfite Sequencing on Ion Torrent Sequencers

    PubMed Central

    Pabinger, Stephan; Ernst, Karina; Pulverer, Walter; Kallmeyer, Rainer; Valdes, Ana M.; Metrustry, Sarah; Katic, Denis; Nuzzo, Angelo; Kriegner, Albert; Vierlinger, Klemens; Weinhaeusel, Andreas

    2016-01-01

    Targeted sequencing of PCR amplicons generated from bisulfite deaminated DNA is a flexible, cost-effective way to study methylation of a sample at single CpG resolution and perform subsequent multi-target, multi-sample comparisons. Currently, no platform specific protocol, support, or analysis solution is provided to perform targeted bisulfite sequencing on a Personal Genome Machine (PGM). Here, we present a novel tool, called TABSAT, for analyzing targeted bisulfite sequencing data generated on Ion Torrent sequencers. The workflow starts with raw sequencing data, performs quality assessment, and uses a tailored version of Bismark to map the reads to a reference genome. The pipeline visualizes results as lollipop plots and is able to deduce specific methylation-patterns present in a sample. The obtained profiles are then summarized and compared between samples. In order to assess the performance of the targeted bisulfite sequencing workflow, 48 samples were used to generate 53 different Bisulfite-Sequencing PCR amplicons from each sample, resulting in 2,544 amplicon targets. We obtained a mean coverage of 282X using 1,196,822 aligned reads. Next, we compared the sequencing results of these targets to the methylation level of the corresponding sites on an Illumina 450k methylation chip. The calculated average Pearson correlation coefficient of 0.91 confirms the sequencing results with one of the industry-leading CpG methylation platforms and shows that targeted amplicon bisulfite sequencing provides an accurate and cost-efficient method for DNA methylation studies, e.g., to provide platform-independent confirmation of Illumina Infinium 450k methylation data. TABSAT offers a novel way to analyze data generated by Ion Torrent instruments and can also be used with data from the Illumina MiSeq platform. It can be easily accessed via the Platomics platform, which offers a web-based graphical user interface along with sample and parameter storage. TABSAT is freely
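
    The agreement between sequencing-derived and array-derived methylation levels is summarized above as an average Pearson correlation coefficient. A minimal pure-Python sketch of that coefficient follows; it is not TABSAT code, and the function name and the toy methylation values are assumptions for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-CpG methylation fractions from two platforms.
sequencing = [0.10, 0.35, 0.50, 0.80, 0.95]
array_chip = [0.12, 0.30, 0.55, 0.78, 0.90]
print(round(pearson(sequencing, array_chip), 3))
```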

  13. Analysis and Visualization Tool for Targeted Amplicon Bisulfite Sequencing on Ion Torrent Sequencers.

    PubMed

    Pabinger, Stephan; Ernst, Karina; Pulverer, Walter; Kallmeyer, Rainer; Valdes, Ana M; Metrustry, Sarah; Katic, Denis; Nuzzo, Angelo; Kriegner, Albert; Vierlinger, Klemens; Weinhaeusel, Andreas

    2016-01-01

    Targeted sequencing of PCR amplicons generated from bisulfite deaminated DNA is a flexible, cost-effective way to study methylation of a sample at single CpG resolution and perform subsequent multi-target, multi-sample comparisons. Currently, no platform specific protocol, support, or analysis solution is provided to perform targeted bisulfite sequencing on a Personal Genome Machine (PGM). Here, we present a novel tool, called TABSAT, for analyzing targeted bisulfite sequencing data generated on Ion Torrent sequencers. The workflow starts with raw sequencing data, performs quality assessment, and uses a tailored version of Bismark to map the reads to a reference genome. The pipeline visualizes results as lollipop plots and is able to deduce specific methylation-patterns present in a sample. The obtained profiles are then summarized and compared between samples. In order to assess the performance of the targeted bisulfite sequencing workflow, 48 samples were used to generate 53 different Bisulfite-Sequencing PCR amplicons from each sample, resulting in 2,544 amplicon targets. We obtained a mean coverage of 282X using 1,196,822 aligned reads. Next, we compared the sequencing results of these targets to the methylation level of the corresponding sites on an Illumina 450k methylation chip. The calculated average Pearson correlation coefficient of 0.91 confirms the sequencing results with one of the industry-leading CpG methylation platforms and shows that targeted amplicon bisulfite sequencing provides an accurate and cost-efficient method for DNA methylation studies, e.g., to provide platform-independent confirmation of Illumina Infinium 450k methylation data. TABSAT offers a novel way to analyze data generated by Ion Torrent instruments and can also be used with data from the Illumina MiSeq platform. It can be easily accessed via the Platomics platform, which offers a web-based graphical user interface along with sample and parameter storage. TABSAT is freely

  14. Multilocus sequence analysis and rpoB sequencing of Mycobacterium abscessus (sensu lato) strains.

    PubMed

    Macheras, Edouard; Roux, Anne-Laure; Bastian, Sylvaine; Leão, Sylvia Cardoso; Palaci, Moises; Sivadon-Tardy, Valérie; Gutierrez, Cristina; Richter, Elvira; Rüsch-Gerdes, Sabine; Pfyffer, Gaby; Bodmer, Thomas; Cambau, Emmanuelle; Gaillard, Jean-Louis; Heym, Beate

    2011-02-01

    Mycobacterium abscessus, Mycobacterium bolletii, and Mycobacterium massiliense (Mycobacterium abscessus sensu lato) are closely related species that currently are identified by the sequencing of the rpoB gene. However, recent studies show that rpoB sequencing alone is insufficient to discriminate between these species, and some authors have questioned their current taxonomic classification. We studied here a large collection of M. abscessus (sensu lato) strains by partial rpoB sequencing (752 bp) and multilocus sequence analysis (MLSA). The final MLSA scheme developed was based on the partial sequences of eight housekeeping genes: argH, cya, glpK, gnd, murC, pgm, pta, and purH. The strains studied included the three type strains (M. abscessus CIP 104536(T), M. massiliense CIP 108297(T), and M. bolletii CIP 108541(T)) and 120 isolates recovered between 1997 and 2007 in France, Germany, Switzerland, and Brazil. The rpoB phylogenetic tree confirmed the existence of three main clusters, each comprising the type strain of one species. However, divergence values between the M. massiliense and M. bolletii clusters all were below 3% and between the M. abscessus and M. massiliense clusters were from 2.66 to 3.59%. The tree produced using the concatenated MLSA gene sequences (4,071 bp) also showed three main clusters, each comprising the type strain of one species. The M. abscessus cluster had a bootstrap value of 100% and was mostly compact. Bootstrap values for the M. massiliense and M. bolletii branches were much lower (71 and 61%, respectively), with the M. massiliense cluster having a fuzzy aspect. Mean (range) divergence values were 2.17% (1.13 to 2.58%) between the M. abscessus and M. massiliense clusters, 2.37% (1.5 to 2.85%) between the M. abscessus and M. bolletii clusters, and 2.28% (0.86 to 2.68%) between the M. massiliense and M. bolletii clusters. Adding the rpoB sequence to the MLSA-concatenated sequence (total sequence, 4,823 bp) had little effect on the
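
    The divergence values reported above are percentages of differing positions between aligned sequences. The abstract does not state which distance measure was used, so the sketch below assumes the simplest one, an uncorrected pairwise distance that skips gapped columns; the function name and toy sequences are hypothetical.

```python
def percent_divergence(seq_a, seq_b):
    """Uncorrected percent divergence between two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != "-" and b != "-"
    ]
    if not columns:
        raise ValueError("no comparable columns")
    mismatches = sum(a != b for a, b in columns)
    return 100 * mismatches / len(columns)


# Toy alignment: one mismatch in eight compared positions.
print(percent_divergence("ACGTACGT", "ACGTACGA"))
```

    On a 4,071-bp concatenated alignment, a divergence of about 2% under this measure would correspond to roughly 80 differing positions.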

  15. DNA sequence and analysis of human chromosome 18.

    PubMed

    Nusbaum, Chad; Zody, Michael C; Borowsky, Mark L; Kamal, Michael; Kodira, Chinnappa D; Taylor, Todd D; Whittaker, Charles A; Chang, Jean L; Cuomo, Christina A; Dewar, Ken; FitzGerald, Michael G; Yang, Xiaoping; Abouelleil, Amr; Allen, Nicole R; Anderson, Scott; Bloom, Toby; Bugalter, Boris; Butler, Jonathan; Cook, April; DeCaprio, David; Engels, Reinhard; Garber, Manuel; Gnirke, Andreas; Hafez, Nabil; Hall, Jennifer L; Norman, Catherine Hosage; Itoh, Takehiko; Jaffe, David B; Kuroki, Yoko; Lehoczky, Jessica; Lui, Annie; Macdonald, Pendexter; Mauceli, Evan; Mikkelsen, Tarjei S; Naylor, Jerome W; Nicol, Robert; Nguyen, Cindy; Noguchi, Hideki; O'Leary, Sinéad B; O'Neill, Keith; Piqani, Bruno; Smith, Cherylyn L; Talamas, Jessica A; Topham, Kerri; Totoki, Yasushi; Toyoda, Atsushi; Wain, Hester M; Young, Sarah K; Zeng, Qiandong; Zimmer, Andrew R; Fujiyama, Asao; Hattori, Masahira; Birren, Bruce W; Sakaki, Yoshiyuki; Lander, Eric S

    2005-09-22

    Chromosome 18 appears to have the lowest gene density of any human chromosome and is one of only three chromosomes for which trisomic individuals survive to term. There are also a number of genetic disorders stemming from chromosome 18 trisomy and aneuploidy. Here we report the finished sequence and gene annotation of human chromosome 18, which will allow a better understanding of the normal and disease biology of this chromosome. Despite the low density of protein-coding genes on chromosome 18, we find that the proportion of non-protein-coding sequences evolutionarily conserved among mammals is close to the genome-wide average. Extending this analysis to the entire human genome, we find that the density of conserved non-protein-coding sequences is largely uncorrelated with gene density. This has important implications for the nature and roles of non-protein-coding sequence elements. PMID:16177791

  16. Nonparametric survival analysis using Bayesian Additive Regression Trees (BART).

    PubMed

    Sparapani, Rodney A; Logan, Brent R; McCulloch, Robert E; Laud, Purushottam W

    2016-07-20

    Bayesian additive regression trees (BART) provide a framework for flexible nonparametric modeling of relationships of covariates to outcomes. Recently, BART models have been shown to provide excellent predictive performance, for both continuous and binary outcomes, and exceeding that of its competitors. Software is also readily available for such outcomes. In this article, we introduce modeling that extends the usefulness of BART in medical applications by addressing needs arising in survival analysis. Simulation studies of one-sample and two-sample scenarios, in comparison with long-standing traditional methods, establish face validity of the new approach. We then demonstrate the model's ability to accommodate data from complex regression models with a simulation study of a nonproportional hazards scenario with crossing survival functions and survival function estimation in a scenario where hazards are multiplicatively modified by a highly nonlinear function of the covariates. Using data from a recently published study of patients undergoing hematopoietic stem cell transplantation, we illustrate the use and some advantages of the proposed method in medical investigations. Copyright © 2016 John Wiley & Sons, Ltd. PMID:26854022

  17. Analysis of expressed sequence tags from the Ulva prolifera (Chlorophyta)

    NASA Astrophysics Data System (ADS)

    Niu, Jianfeng; Hu, Haiyan; Hu, Songnian; Wang, Guangce; Peng, Guang; Sun, Song

    2010-01-01

    In 2008, a green tide broke out before the sailing competition of the 29th Olympic Games in Qingdao. The causative species was determined to be Enteromorpha prolifera (Ulva prolifera O. F. Müller), a familiar green macroalga along the coastline of China. Rapid accumulation of a large biomass of floating U. prolifera prompted research on different aspects of this species. In this study, we constructed a nonnormalized cDNA library from the thalli of U. prolifera and acquired 10,072 high-quality expressed sequence tags (ESTs). These ESTs were assembled into 3,519 nonredundant gene groups, including 1,446 clusters and 2,073 singletons. After annotation with the nr database, a large number of genes were found to be related to chloroplast and ribosomal proteins. GO functional classification showed that 1,418 ESTs participated in photosynthesis and 1,359 ESTs were responsible for the generation of precursor metabolites and energy. In addition, rather comprehensive carbon fixation pathways were found in U. prolifera using KEGG. Some stress-related and signal transduction-related genes were also found in this study. Together, the evidence indicates that U. prolifera has the material and energy basis for intense photosynthesis and rapid proliferation. Phylogenetic analysis of cytochrome c oxidase subunit I revealed that this green-tide causative species is most closely affiliated to Pseudendoclonium akinetum (Ulvophyceae).

  18. Initial sequence and comparative analysis of the cat genome

    PubMed Central

    Pontius, Joan U.; Mullikin, James C.; Smith, Douglas R.; Lindblad-Toh, Kerstin; Gnerre, Sante; Clamp, Michele; Chang, Jean; Stephens, Robert; Neelam, Beena; Volfovsky, Natalia; Schäffer, Alejandro A.; Agarwala, Richa; Narfström, Kristina; Murphy, William J.; Giger, Urs; Roca, Alfred L.; Antunes, Agostinho; Menotti-Raymond, Marilyn; Yuhki, Naoya; Pecon-Slattery, Jill; Johnson, Warren E.; Bourque, Guillaume; Tesler, Glenn; O’Brien, Stephen J.

    2007-01-01

    The genome sequence (1.9-fold coverage) of an inbred Abyssinian domestic cat was assembled, mapped, and annotated with a comparative approach that involved cross-reference to annotated genome assemblies of six mammals (human, chimpanzee, mouse, rat, dog, and cow). The results resolved chromosomal positions for 663,480 contigs, 20,285 putative feline gene orthologs, and 133,499 conserved sequence blocks (CSBs). Additional annotated features include repetitive elements, endogenous retroviral sequences, nuclear mitochondrial (numt) sequences, micro-RNAs, and evolutionary breakpoints that suggest historic balancing of translocation and inversion incidences in distinct mammalian lineages. Large numbers of single nucleotide polymorphisms (SNPs), deletion insertion polymorphisms (DIPs), and short tandem repeats (STRs), suitable for linkage or association studies were characterized in the context of long stretches of chromosome homozygosity. In spite of the light coverage capturing ∼65% of euchromatin sequence from the cat genome, these comparative insights shed new light on the tempo and mode of gene/genome evolution in mammals, promise several research applications for the cat, and also illustrate that a comparative approach using more deeply covered mammals provides an informative, preliminary annotation of a light (1.9-fold) coverage mammal genome sequence. PMID:17975172

  19. Complete VAX/VMS DNA/protein sequence analysis system

    SciTech Connect

    Smith, D.W.

    1987-05-01

    A complete yet flexible system of programs and database libraries for analysis of DNA, RNA and protein sequences is implemented for VAX/VMS computers. Types of analysis include 1) construction and analysis of chimeric sequences (cloning in the VAX), 2) multiple analysis of one or more single sequences, 3) search and comparison studies using sequence libraries, and 4) direct input and analysis of experimental data. Published groups of programs, including the Staden, Los Alamos, Zuker, Pearson, and PHYLIP programs, are used. GenBank and EMBL DNA libraries and PIR and Doolittle NEWAT protein libraries are available, with associated programs. The system is tutorial, with online documentation for relevant VAX software, the programs, and the databases. The complete documentation is flexibly maintained on reserve via computer printout placed in 3-ring binders. Command files are used extensively; porting of the entire system to another VAX/VMS system requires modification of a single command. Users of the system are members of a VAX group, with automatic implementation of the system upon login. The present system occupies about 140,000 blocks, and is easily expanded, or contracted, as desired. The UCSD system is used extensively for both teaching and research purposes. Use of microcomputers emulating Tektronix 4014 graphics terminals permits saving of graphics output to disk for subsequent modification to generate high quality publishable figures.

  20. Precessing rotating flows with additional shear: Stability analysis

    NASA Astrophysics Data System (ADS)

    Salhi, A.; Cambon, C.

    2009-03-01

    We consider unbounded precessing rotating flows in which vertical or horizontal shear is induced by the interaction between the solid-body rotation (with angular velocity Ω0) and the additional “precessing” Coriolis force (with angular velocity -ɛΩ0), normal to it. A “weak” shear flow, with rate 2ɛ of the same order of the Poincaré “small” ratio ɛ, is needed for balancing the gyroscopic torque, so that the whole flow satisfies Euler’s equations in the precessing frame (the so-called admissibility conditions). The base flow case with vertical shear (its cross-gradient direction is aligned with the main angular velocity) corresponds to Mahalov’s [Phys. Fluids A 5, 891 (1993)] precessing infinite cylinder base flow (ignoring boundary conditions), while the base flow case with horizontal shear (its cross-gradient direction is normal to both main and precessing angular velocities) corresponds to the unbounded precessing rotating shear flow considered by Kerswell [Geophys. Astrophys. Fluid Dyn. 72, 107 (1993)]. We show that both these base flows satisfy the admissibility conditions and can support disturbances in terms of advected Fourier modes. Because the admissibility conditions cannot select one case with respect to the other, a more physical derivation is sought: Both flows are deduced from Poincaré’s [Bull. Astron. 27, 321 (1910)] basic state of a precessing spheroidal container, in the limit of small ɛ. A rapid distortion theory (RDT) type of stability analysis is then performed for the previously mentioned disturbances, for both base flows. The stability analysis of the Kerswell base flow, using Floquet’s theory, is recovered, and its counterpart for the Mahalov base flow is presented. Typical growth rates are found to be the same for both flows at very small ɛ, but significant differences are obtained regarding growth rates and widths of instability bands, if larger ɛ values, up to 0.2, are considered. Finally, both flow cases

  1. Network Analysis of Sequence-Function Relationships and Exploration of Sequence Space of TEM β-Lactamases.

    PubMed

    Zeil, Catharina; Widmann, Michael; Fademrecht, Silvia; Vogel, Constantin; Pleiss, Jürgen

    2016-05-01

    The Lactamase Engineering Database (www.LacED.uni-stuttgart.de) was developed to facilitate the classification and analysis of TEM β-lactamases. The current version contains 474 TEM variants. Two hundred fifty-nine variants form a large scale-free network of highly connected point mutants. The network was divided into three subnetworks which were enriched by single phenotypes: one network with predominantly 2be and two networks with 2br phenotypes. Fifteen positions were found to be highly variable, contributing to the majority of the observed variants. Since it is expected that a considerable fraction of the theoretical sequence space is functional, the currently sequenced 474 variants represent only the tip of the iceberg of functional TEM β-lactamase variants which form a huge natural reservoir of highly interconnected variants. Almost 50% of the variants are part of a quartet. Thus, two single mutations that result in functional enzymes can be combined into a functional protein. Most of these quartets consist of the same phenotype, or the mutations are additive with respect to the phenotype. By predicting quartets from triplets, 3,916 unknown variants were constructed. Eighty-seven variants complement multiple quartets and therefore have a high probability of being functional. The construction of a TEM β-lactamase network and subsequent analyses by clustering and quartet prediction are valuable tools to gain new insights into the viable sequence space of TEM β-lactamases and to predict their phenotype. The highly connected sequence space of TEM β-lactamases is ideally suited to network analysis and demonstrates the strengths of network analysis over tree reconstruction methods. PMID:26883706
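
    The quartet prediction described in this abstract can be illustrated with a minimal sketch. It assumes, for illustration only, that each variant is modelled as a set of point-mutation labels relative to a reference sequence and that a quartet is a set of four variants of the form base, base+x, base+y, base+x+y. The function name and the labels `mutA` and `mutB` are hypothetical and are not taken from the Lactamase Engineering Database.

```python
from itertools import combinations


def predict_quartet_completions(known_variants):
    """Return variants that would complete a quartet given three known members.

    Each variant is a frozenset of mutation labels (an assumed, simplified
    model). A quartet is {base, base+x, base+y, base+x+y}.
    """
    known = set(known_variants)
    predicted = set()
    for a, b in combinations(known, 2):
        diff = a ^ b  # mutations present in exactly one of the two variants
        if len(diff) != 2:
            continue  # only diagonal pairs of a quartet differ by two mutations
        x, y = diff
        # The two remaining corners are one mutation away from `a`.
        c = a ^ frozenset([x])
        d = a ^ frozenset([y])
        if c in known and d not in known:
            predicted.add(d)
        elif d in known and c not in known:
            predicted.add(c)
    return predicted


# Usage: reference, two single mutants, double mutant unknown.
triplet = [frozenset(), frozenset({"mutA"}), frozenset({"mutB"})]
print(predict_quartet_completions(triplet))
```

    In this toy model a triplet yields exactly one predicted variant, and a complete quartet yields none. Real variants can also differ by different substitutions at the same position, which this sketch does not handle.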

  2. Multilocus sequence analysis of phytopathogenic species of the genus Streptomyces

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The identification and classification of species within the genus Streptomyces is difficult because there are presently 576 validly described species and this number increases every year. The value of the application of multilocus sequence analysis scheme to the systematics of Streptomyces species h...

  3. Molecular characterization of Giardia psittaci by multilocus sequence analysis.

    PubMed

    Abe, Niichiro; Makino, Ikuko; Kojima, Atsushi

    2012-12-01

    Multilocus sequence analyses targeting small subunit ribosomal DNA (SSU rDNA), elongation factor 1 alpha (ef1α), glutamate dehydrogenase (gdh), and beta giardin (β-giardin) were performed on Giardia psittaci isolates from three Budgerigars (Melopsittacus undulatus) and four Barred parakeets (Bolborhynchus lineola) kept in individual households or imported from overseas. Nucleotide differences and phylogenetic analyses at four loci indicate the distinction of G. psittaci from the other known Giardia species: Giardia muris, Giardia microti, Giardia ardeae, and Giardia duodenalis assemblages. Furthermore, G. psittaci was related more closely to G. duodenalis than to the other known Giardia species, except for G. microti. Conflicting signals regarded as "double peaks" were found at the same nucleotide positions of the ef1α in all isolates. However, the sequences of the other three loci, including gdh and β-giardin, which are known to be highly variable, from all isolates were also mutually identical at every locus. They showed no double peaks. These results suggest that double peaks found in the ef1α sequences are caused not by mixed infection with genetically different G. psittaci isolates but by allelic sequence heterogeneity (ASH), which is observed in diplomonad lineages including G. duodenalis. No sequence difference was found in any G. psittaci isolates at the gdh and β-giardin, suggesting that G. psittaci is indeed not more diverse genetically than other Giardia species. This report is the first to provide evidence related to the genetic characteristics of G. psittaci obtained using multilocus sequence analysis. PMID:22921500

  4. Motion sequence analysis in the presence of figural cues

    PubMed Central

    Sinha, Pawan; Vaina, Lucia M.

    2015-01-01

    The perception of 3D structure in dynamic sequences is believed to be subserved primarily through the use of motion cues. However, real-world sequences contain many figural shape cues besides the dynamic ones. We hypothesize that if figural cues are perceptually significant during sequence analysis, then inconsistencies in these cues over time would lead to percepts of non-rigidity in sequences showing physically rigid objects in motion. We develop an experimental paradigm to test this hypothesis and present results with two patients with impairments in motion perception due to focal neurological damage, as well as two control subjects. Consistent with our hypothesis, the data suggest that figural cues strongly influence the perception of structure in motion sequences, even to the extent of inducing non-rigid percepts in sequences where motion information alone would yield rigid structures. Beyond helping to probe the issue of shape perception, our experimental paradigm might also serve as a possible perceptual assessment tool in a clinical setting. PMID:26028822

  5. Sequence Analysis of the Genome of Carnation (Dianthus caryophyllus L.)

    PubMed Central

    Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

    2014-01-01

    The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. ‘Francesco’ was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568 887 315 bp, consisting of 45 088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16 644 bp and 60 737 bp, respectively, and the longest scaffold was 1 287 144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43 266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. PMID:24344172
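
    The assembly statistics quoted in this abstract can be reproduced with a short sketch. It uses the standard definition of N50 (the length L such that sequences of length L or longer contain at least half of the assembled bases); the function name `n50` and the toy length list are illustrative and are not the carnation data. The coverage check uses only the two figures given in the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first length where the running sum reaches half the total.
        if running * 2 >= total:
            return length
    raise ValueError("lengths must not be empty")


# Toy example (hypothetical lengths, not taken from the assembly).
print(n50([2, 3, 4, 5, 6]))

# Coverage figure from the abstract: 568 887 315 bp of an estimated 622 Mb.
coverage_percent = 568_887_315 / 622_000_000 * 100
print(round(coverage_percent))
```

    The second calculation agrees with the 91% coverage stated in the abstract.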

  6. Analysis of singleton ORFans in fully sequenced microbial genomes.

    PubMed

    Siew, Naomi; Fischer, Daniel

    2003-11-01

    Singleton sequence ORFans are orphan ORFs (open reading frames) that have no detectable sequence similarity to any other sequence in the databases. ORFans are of particular interest not only as evolutionary puzzles but also because we can learn little about them using bioinformatics tools. Here, we present a first systematic analysis of singleton ORFans in the first 60 fully sequenced microbial genomes. We show that although ORFans have been underemphasized, the number of ORFans is steadily growing, currently accounting for 23,634 sequences. At the same time, the percentage of ORFans as a fraction of all sequences is slowly diminishing, and is currently about 14%. Short ORFans comprise about 61% of all ORFans. The abundance of short ORFans may be due to a yet unexplained artifact. The data also suggest that the number of longer ORFans may soon diminish as more genomes of closely related organisms become available. To better address the questions about the functions and origins of ORFans, we propose to focus further studies on the longer ORFans, with emphasis on three new types of ORFans: ORFan modules, paralogous ORFans, and orthologous ORFans. We conclude that the large number of ORFans reflects an intrinsic property of the genetic material not yet fully understood. Further computational and experimental studies aimed at understanding Nature's protein diversity should also include ORFans. PMID:14517975

  7. Improved Algorithm for Analysis of DNA Sequences Using Multiresolution Transformation

    PubMed Central

    Inbamalar, T. M.; Sivakumar, R.

    2015-01-01

    Bioinformatics and genomic signal processing use computational techniques to solve various biological problems. They aim to study the information associated with genetic materials such as deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and proteins. Fast and precise identification of the protein coding regions in a DNA sequence is one of the most important tasks in this analysis. Existing digital signal processing (DSP) methods provide less accurate and computationally complex solutions with greater background noise. Hence, improvements in accuracy and computational complexity, and a reduction in background noise, are essential in identifying the protein coding regions in DNA sequences. In this paper, a new DSP-based method is introduced to detect the protein coding regions in DNA sequences. Here, the DNA sequences are converted into numeric sequences using the electron ion interaction potential (EIIP) representation. Then the discrete wavelet transform is taken. The absolute value of the energy is found, followed by a proper threshold. The test is conducted using the databases available on the National Centre for Biotechnology Information (NCBI) site. A comparative analysis is done, and it confirms the efficiency of the proposed system. PMID:26000337
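
    The processing chain named in this abstract (EIIP encoding, discrete wavelet transform, energy, threshold) can be sketched roughly as follows. This is not the authors' algorithm: the abstract does not state the wavelet, the decomposition level, the energy definition or the threshold rule, so the sketch assumes a single-level Haar transform, squared detail coefficients as energy, and a threshold set as a fraction of the peak energy. The EIIP values are the commonly tabulated ones for the four nucleotides. All function names are hypothetical.

```python
import math

# Commonly tabulated EIIP values for the four nucleotides.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def eiip_encode(seq):
    """Map a DNA string to its numeric EIIP sequence."""
    return [EIIP[base] for base in seq.upper()]


def haar_dwt(signal):
    """Single-level Haar DWT; returns (approximation, detail) coefficients."""
    signal = list(signal)
    if len(signal) % 2:
        signal.append(signal[-1])  # pad odd-length input by repeating the last sample
    root2 = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    return approx, detail


def flag_high_energy(seq, fraction=0.5):
    """Flag detail coefficients whose energy is at least `fraction` of the peak.

    The threshold rule is an assumption made for illustration.
    Expects a non-empty sequence.
    """
    _, detail = haar_dwt(eiip_encode(seq))
    energy = [d * d for d in detail]
    peak = max(energy)
    return [peak > 0 and e >= fraction * peak for e in energy]


print(flag_high_energy("AAAT"))
```

    A practical implementation would more likely use a wavelet library and a multi-level decomposition; the hand-written Haar step here only keeps the sketch self-contained.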

  8. Halvade: scalable sequence analysis with MapReduce

    PubMed Central

    Decap, Dries; Reumers, Joke; Herzeel, Charlotte; Costanza, Pascal; Fostier, Jan

    2015-01-01

    Motivation: Post-sequencing DNA analysis typically consists of read mapping followed by variant calling. Especially for whole genome sequencing, this computational step is very time-consuming, even when using multithreading on a multi-core machine. Results: We present Halvade, a framework that enables sequencing pipelines to be executed in parallel on a multi-node and/or multi-core compute infrastructure in a highly efficient manner. As an example, a DNA sequencing analysis pipeline for variant calling has been implemented according to the GATK Best Practices recommendations, supporting both whole genome and whole exome sequencing. Using a 15-node computer cluster with 360 CPU cores in total, Halvade processes the NA12878 dataset (human, 100 bp paired-end reads, 50× coverage) in <3 h with very high parallel efficiency. Even on a single, multi-core machine, Halvade attains a significant speedup compared with running the individual tools with multithreading. Availability and implementation: Halvade is written in Java and uses the Hadoop MapReduce 2.0 API. It supports a wide range of distributions of Hadoop, including Cloudera and Amazon EMR. Its source is available at http://bioinformatics.intec.ugent.be/halvade under GPL license. Contact: jan.fostier@intec.ugent.be Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25819078

  9. Comparative sequence-structure analysis of Aves insulin

    PubMed Central

    Islam, Md Mirazul; Aktaruzzaman, M; Mohamed, Zahurin

    2015-01-01

    Normal blood glucose level depends on the availability of insulin and its ability to bind the insulin receptor (IR) that regulates the downstream signaling pathway. Insulin sequence and blood glucose level usually vary among animals due to species specificity. The study of genetic variation of insulin, blood glucose level and the development of diabetic symptoms in Aves is interesting because their optimal blood glucose level is higher than that of mammals. Therefore, it is of interest to study their evolutionary relationship with mammals using sequence data. Hence, we compiled 32 Aves insulin sequences from GenBank to compare their sequence-structure features with phylogeny for evolutionary inference. The analysis shows long conserved motifs (about 14 residues) for functional inference. These sequences show high leucine content (20%) with a high instability index (>40). Amino acid positions 11, 14, 16 and 20 are variable and may contribute to binding to IR. We identified functionally critical variable residues in the dataset for possible genetic implication. Structural models of these sequences were developed for surface analysis towards functional representation. These data find application in the understanding of insulin function across species. PMID:25848166

  10. Improved algorithm for analysis of DNA sequences using multiresolution transformation.

    PubMed

    Inbamalar, T M; Sivakumar, R

    2015-01-01

    Bioinformatics and genomic signal processing use computational techniques to solve various biological problems. They aim to study the information associated with genetic materials such as deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and proteins. Fast and precise identification of the protein coding regions in a DNA sequence is one of the most important tasks in this analysis. Existing digital signal processing (DSP) methods provide less accurate and computationally complex solutions with greater background noise. Hence, improvements in accuracy and computational complexity, and a reduction in background noise, are essential in identifying the protein coding regions in DNA sequences. In this paper, a new DSP-based method is introduced to detect the protein coding regions in DNA sequences. Here, the DNA sequences are converted into numeric sequences using the electron ion interaction potential (EIIP) representation. Then the discrete wavelet transform is taken. The absolute value of the energy is found, followed by a proper threshold. The test is conducted using the databases available on the National Centre for Biotechnology Information (NCBI) site. A comparative analysis is done, and it confirms the efficiency of the proposed system. PMID:26000337

  11. Automated carboxy-terminal sequence analysis of peptides.

    PubMed Central

    Bailey, J. M.; Shenoy, N. R.; Ronk, M.; Shively, J. E.

    1992-01-01

    current limitations, the methodology should be a valuable new tool for the C-terminal sequence analysis of peptides. PMID:1304884

  12. Expressed sequence tag analysis in tef (Eragrostis tef (Zucc) Trotter).

    PubMed

    Yu, Ju-Kyung; Sun, Qi; Rota, Mauricio La; Edwards, Hugh; Tefera, Hailu; Sorrells, Mark E

    2006-04-01

    Tef (Eragrostis tef (Zucc.) Trotter) is the most important cereal crop in Ethiopia; however, there is very little DNA sequence information available for this species. Expressed sequence tags (ESTs) were generated from 4 cDNA libraries: seedling leaf, seedling root, and inflorescence of E. tef and seedling leaf of Eragrostis pilosa, a wild relative of E. tef. Clustering of 3603 sequences produced 530 clusters and 1890 singletons, resulting in 2420 tef unigenes. Approximately 3/4 of tef unigenes matched protein or nucleotide sequences in public databases. Annotation of unigenes associated 68% of the putative tef genes with gene ontology categories. Identification of the translated unigenes for conserved protein domains revealed 389 protein family domains (Pfam), the most frequent of which was protein kinase. A total of 170 ESTs containing simple sequence repeats (EST-SSRs) were identified and 80 EST-SSR markers were developed. In addition, 19 single-nucleotide polymorphism (SNP) and (or) insertion-deletion (indel) and 34 intron fragment length polymorphism (IFLP) markers were developed. The EST database and molecular markers generated in this study will be valuable resources for further tef genetic research. PMID:16699556

  13. Construction of an integrated database to support genomic sequence analysis

    SciTech Connect

    Gilbert, W.; Overbeek, R.

    1994-11-01

    The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

  14. Generation and analysis of expressed sequence tags from the ciliate protozoan parasite Ichthyophthirius multifiliis

    PubMed Central

    Abernathy, Jason W; Xu, Peng; Li, Ping; Xu, De-Hai; Kucuktas, Huseyin; Klesius, Phillip; Arias, Covadonga; Liu, Zhanjiang

    2007-01-01

    Background: The ciliate protozoan Ichthyophthirius multifiliis (Ich) is an important parasite of freshwater fish that causes 'white spot disease' leading to significant losses. A genomic resource for large-scale studies of this parasite has been lacking. To study gene expression involved in Ich pathogenesis and virulence, our goal was to generate expressed sequence tags (ESTs) for the development of a powerful microarray platform for the analysis of global gene expression in this species. Here, we initiated a project to sequence and analyze over 10,000 ESTs. Results: We sequenced 10,368 EST clones using a normalized cDNA library made from pooled samples of the trophont, tomont, and theront life-cycle stages, and generated 9,769 sequences (94.2% success rate). Post-sequencing processing led to 8,432 high quality sequences. Clustering analysis of these ESTs allowed identification of 4,706 unique sequences containing 976 contigs and 3,730 singletons. These unique sequences represent over two million base pairs (~10% of Plasmodium falciparum genome, a phylogenetically related protozoan). BLASTX searches produced 2,518 significant (E-value < 10⁻⁵) hits and further Gene Ontology (GO) analysis annotated 1,008 of these genes. The ESTs were analyzed comparatively against the genomes of the related protozoa Tetrahymena thermophila and P. falciparum, allowing putative identification of additional genes. All the EST sequences were deposited by dbEST in GenBank (GenBank: EG957858–EG966289). Gene discovery and annotations are presented and discussed. Conclusion: This set of ESTs represents a significant proportion of the Ich transcriptome, and provides a material basis for the development of microarrays useful for gene expression studies concerning Ich development, pathogenesis, and virulence. PMID:17577414

  15. Pierre Robin Sequence and Treacher Collins Hypoplastic Mandible Comparison Using Three-Dimensional Morphometric Analysis

    PubMed Central

    Chung, Michael T.; Levi, Benjamin; Hyun, Jeong S.; Lo, David D.; Montoro, Daniel T.; Lisiecki, Jeffrey; Bradley, James P.; Buchman, Steven R.; Longaker, Michael T.; Wan, Derrick C.

    2012-01-01

    Pierre Robin sequence and Treacher Collins syndrome are both associated with mandibular hypoplasia. It has been hypothesized, however, that the mandible may be differentially affected. The purpose of this study was to therefore compare mandibular morphology in children with Pierre Robin sequence to children with Treacher Collins syndrome using three-dimensional analysis of computed tomography (CT) scans. A retrospective analysis was performed identifying children with Pierre Robin sequence and Treacher Collins syndrome receiving CT scans. Three-dimensional reconstruction was performed and ramus height, mandibular body length, and gonial angle were measured. These were then compared to control children with normal mandibles and to clinical norms corrected for age and sex based on previously published measurements. Mandibular body length was found to be significantly shorter for children with Pierre Robin sequence while ramus height was significantly shorter for children with Treacher Collins syndrome. This resulted in distinctly different ramus height/mandibular body length ratios. In addition, the gonial angle was more obtuse in both the Pierre Robin sequence and Treacher Collins syndrome groups compared with the controls. Three-dimensional mandibular morphometric analysis in patients with Pierre Robin sequence and Treacher Collins syndrome thus revealed distinctly different patterns of mandibular hypoplasia relative to normal controls. These findings underscore distinct considerations which must be made in surgical planning for reconstruction. PMID:23154353

  16. VIROME: a standard operating procedure for analysis of viral metagenome sequences.

    PubMed

    Wommack, K Eric; Bhavsar, Jaysheel; Polson, Shawn W; Chen, Jing; Dumas, Michael; Srinivasiah, Sharath; Furman, Megan; Jamindar, Sanchita; Nasko, Daniel J

    2012-07-30

    One consistent finding among studies using shotgun metagenomics to analyze whole viral communities is that most viral sequences show no significant homology to known sequences. Thus, bioinformatic analyses based on sequence collections such as GenBank nr, which are largely comprised of sequences from known organisms, tend to ignore a majority of sequences within most shotgun viral metagenome libraries. Here we describe a bioinformatic pipeline, the Viral Informatics Resource for Metagenome Exploration (VIROME), that emphasizes the classification of viral metagenome sequences (predicted open-reading frames) based on homology search results against both known and environmental sequences. Functional and taxonomic information is derived from five annotated sequence databases which are linked to the UniRef 100 database. Environmental classifications are obtained from hits against a custom database, MetaGenomes On-Line, which contains 49 million predicted environmental peptides. Each predicted viral metagenomic ORF run through the VIROME pipeline is placed into one of seven ORF classes, thus, every sequence receives a meaningful annotation. Additionally, the pipeline includes quality control measures to remove contaminating and poor quality sequence and assesses the potential amount of cellular DNA contamination in a viral metagenome library by screening for rRNA genes. Access to the VIROME pipeline and analysis results are provided through a web-application interface that is dynamically linked to a relational back-end database. The VIROME web-application interface is designed to allow users flexibility in retrieving sequences (reads, ORFs, predicted peptides) and search results for focused secondary analyses. PMID:23407591

  17. DNA sequence copy number analysis by Comparative Genomic Hybridization (CGH)

    SciTech Connect

    Pinkel, D.; Kallioniemi, A.; Kallioniemi, O.; Waldman, F.; Sudar, D.; Gray, I.; Rutovitz, D.; Piper, I.

    1993-01-01

    Comparative Genomic Hybridization (CGH) uses the kinetics of in situ hybridization to compare the copy numbers of different DNA sequences within the same genome and the copy numbers of the same sequences among different genomes. In a typical application genomic DNA from a tumor and from normal cells are differentially labeled and simultaneously hybridized to normal metaphase chromosomes, and detected with different fluorochromes. Properly registered images of each fluorochrome are obtained using a microscope equipped with multi-band filters and a CCD camera. Digital image analysis permits measurement of intensity ratio profiles along each of the target chromosomes. Studies of cells with known aberrations indicate that the intensity ratio at each position is proportional to the ratio of the copy numbers of the sequences that bind there in the tumor and normal genomes. Analytical challenges posed by the need to efficiently obtain copy number karyotypes are discussed.

  18. Genome sequencing and analysis of the model grass Brachypodium distachyon.

    PubMed

    2010-02-11

    Three subfamilies of grasses, the Ehrhartoideae, Panicoideae and Pooideae, provide the bulk of human nutrition and are poised to become major sources of renewable energy. Here we describe the genome sequence of the wild grass Brachypodium distachyon (Brachypodium), which is, to our knowledge, the first member of the Pooideae subfamily to be sequenced. Comparison of the Brachypodium, rice and sorghum genomes shows a precise history of genome evolution across a broad diversity of the grasses, and establishes a template for analysis of the large genomes of economically important pooid grasses such as wheat. The high-quality genome sequence, coupled with ease of cultivation and transformation, small size and rapid life cycle, will help Brachypodium reach its potential as an important model system for developing new energy and food crops. PMID:20148030

  19. Genome sequencing and analysis of the model grass Brachypodium distachyon

    SciTech Connect

    Yang, Xiaohan; Kalluri, Udaya C; Tuskan, Gerald A

    2010-01-01

    Three subfamilies of grasses, the Ehrhartoideae, Panicoideae and Pooideae, provide the bulk of human nutrition and are poised to become major sources of renewable energy. Here we describe the genome sequence of the wild grass Brachypodium distachyon (Brachypodium), which is, to our knowledge, the first member of the Pooideae subfamily to be sequenced. Comparison of the Brachypodium, rice and sorghum genomes shows a precise history of genome evolution across a broad diversity of the grasses, and establishes a template for analysis of the large genomes of economically important pooid grasses such as wheat. The high-quality genome sequence, coupled with ease of cultivation and transformation, small size and rapid life cycle, will help Brachypodium reach its potential as an important model system for developing new energy and food crops.

  20. DNA sequence and analysis of human chromosome 9.

    PubMed

    Humphray, S J; Oliver, K; Hunt, A R; Plumb, R W; Loveland, J E; Howe, K L; Andrews, T D; Searle, S; Hunt, S E; Scott, C E; Jones, M C; Ainscough, R; Almeida, J P; Ambrose, K D; Ashwell, R I S; Babbage, A K; Babbage, S; Bagguley, C L; Bailey, J; Banerjee, R; Barker, D J; Barlow, K F; Bates, K; Beasley, H; Beasley, O; Bird, C P; Bray-Allen, S; Brown, A J; Brown, J Y; Burford, D; Burrill, W; Burton, J; Carder, C; Carter, N P; Chapman, J C; Chen, Y; Clarke, G; Clark, S Y; Clee, C M; Clegg, S; Collier, R E; Corby, N; Crosier, M; Cummings, A T; Davies, J; Dhami, P; Dunn, M; Dutta, I; Dyer, L W; Earthrowl, M E; Faulkner, L; Fleming, C J; Frankish, A; Frankland, J A; French, L; Fricker, D G; Garner, P; Garnett, J; Ghori, J; Gilbert, J G R; Glison, C; Grafham, D V; Gribble, S; Griffiths, C; Griffiths-Jones, S; Grocock, R; Guy, J; Hall, R E; Hammond, S; Harley, J L; Harrison, E S I; Hart, E A; Heath, P D; Henderson, C D; Hopkins, B L; Howard, P J; Howden, P J; Huckle, E; Johnson, C; Johnson, D; Joy, A A; Kay, M; Keenan, S; Kershaw, J K; Kimberley, A M; King, A; Knights, A; Laird, G K; Langford, C; Lawlor, S; Leongamornlert, D A; Leversha, M; Lloyd, C; Lloyd, D M; Lovell, J; Martin, S; Mashreghi-Mohammadi, M; Matthews, L; McLaren, S; McLay, K E; McMurray, A; Milne, S; Nickerson, T; Nisbett, J; Nordsiek, G; Pearce, A V; Peck, A I; Porter, K M; Pandian, R; Pelan, S; Phillimore, B; Povey, S; Ramsey, Y; Rand, V; Scharfe, M; Sehra, H K; Shownkeen, R; Sims, S K; Skuce, C D; Smith, M; Steward, C A; Swarbreck, D; Sycamore, N; Tester, J; Thorpe, A; Tracey, A; Tromans, A; Thomas, D W; Wall, M; Wallis, J M; West, A P; Whitehead, S L; Willey, D L; Williams, S A; Wilming, L; Wray, P W; Young, L; Ashurst, J L; Coulson, A; Blöcker, H; Durbin, R; Sulston, J E; Hubbard, T; Jackson, M J; Bentley, D R; Beck, S; Rogers, J; Dunham, I

    2004-05-27

    Chromosome 9 is highly structurally polymorphic. It contains the largest autosomal block of heterochromatin, which is heteromorphic in 6-8% of humans, whereas pericentric inversions occur in more than 1% of the population. The finished euchromatic sequence of chromosome 9 comprises 109,044,351 base pairs and represents >99.6% of the region. Analysis of the sequence reveals many intra- and interchromosomal duplications, including segmental duplications adjacent to both the centromere and the large heterochromatic block. We have annotated 1,149 genes, including genes implicated in male-to-female sex reversal, cancer and neurodegenerative disease, and 426 pseudogenes. The chromosome contains the largest interferon gene cluster in the human genome. There is also a region of exceptionally high gene and G + C content including genes paralogous to those in the major histocompatibility complex. We have also detected recently duplicated genes that exhibit different rates of sequence divergence, presumably reflecting natural selection. PMID:15164053

  1. DNA sequence and analysis of human chromosome 9

    PubMed Central

    Humphray, S. J.; Oliver, K.; Hunt, A. R.; Plumb, R. W.; Loveland, J. E.; Howe, K. L.; Andrews, T. D.; Searle, S.; Hunt, S. E.; Scott, C. E.; Jones, M. C.; Ainscough, R.; Almeida, J. P.; Ambrose, K. D.; Ashwell, R. I. S.; Babbage, A. K.; Babbage, S.; Bagguley, C. L.; Bailey, J.; Banerjee, R.; Barker, D. J.; Barlow, K. F.; Bates, K.; Beasley, H.; Beasley, O.; Bird, C. P.; Bray-Allen, S.; Brown, A. J.; Brown, J. Y.; Burford, D.; Burrill, W.; Burton, J.; Carder, C.; Carter, N. P.; Chapman, J. C.; Chen, Y.; Clarke, G.; Clark, S. Y.; Clee, C. M.; Clegg, S.; Collier, R. E.; Corby, N.; Crosier, M.; Cummings, A. T.; Davies, J.; Dhami, P.; Dunn, M.; Dutta, I.; Dyer, L. W.; Earthrowl, M. E.; Faulkner, L.; Fleming, C. J.; Frankish, A.; Frankland, J. A.; French, L.; Fricker, D. G.; Garner, P.; Garnett, J.; Ghori, J.; Gilbert, J. G. R.; Glison, C.; Grafham, D. V.; Gribble, S.; Griffiths, C.; Griffiths-Jones, S.; Grocock, R.; Guy, J.; Hall, R. E.; Hammond, S.; Harley, J. L.; Harrison, E. S. I.; Hart, E. A.; Heath, P. D.; Henderson, C. D.; Hopkins, B. L.; Howard, P. J.; Howden, P. J.; Huckle, E.; Johnson, C.; Johnson, D.; Joy, A. A.; Kay, M.; Keenan, S.; Kershaw, J. K.; Kimberley, A. M.; King, A.; Knights, A.; Laird, G. K.; Langford, C.; Lawlor, S.; Leongamornlert, D. A.; Leversha, M.; Lloyd, C.; Lloyd, D. M.; Lovell, J.; Martin, S.; Mashreghi-Mohammadi, M.; Matthews, L.; McLaren, S.; McLay, K. E.; McMurray, A.; Milne, S.; Nickerson, T.; Nisbett, J.; Nordsiek, G.; Pearce, A. V.; Peck, A. I.; Porter, K. M.; Pandian, R.; Pelan, S.; Phillimore, B.; Povey, S.; Ramsey, Y.; Rand, V.; Scharfe, M.; Sehra, H. K.; Shownkeen, R.; Sims, S. K.; Skuce, C. D.; Smith, M.; Steward, C. A.; Swarbreck, D.; Sycamore, N.; Tester, J.; Thorpe, A.; Tracey, A.; Tromans, A.; Thomas, D. W.; Wall, M.; Wallis, J. M.; West, A. P.; Whitehead, S. L.; Willey, D. L.; Williams, S. A.; Wilming, L.; Wray, P. W.; Young, L.; Ashurst, J. L.; Coulson, A.; Blöcker, H.; Durbin, R.; Sulston, J. E.; Hubbard, T.; Jackson, M. J.; Bentley, D. R.; Beck, S.; Rogers, J.; Dunham, I.

    2009-01-01

    Chromosome 9 is highly structurally polymorphic. It contains the largest autosomal block of heterochromatin, which is heteromorphic in 6–8% of humans, whereas pericentric inversions occur in more than 1% of the population. The finished euchromatic sequence of chromosome 9 comprises 109,044,351 base pairs and represents >99.6% of the region. Analysis of the sequence reveals many intra- and interchromosomal duplications, including segmental duplications adjacent to both the centromere and the large heterochromatic block. We have annotated 1,149 genes, including genes implicated in male-to-female sex reversal, cancer and neurodegenerative disease, and 426 pseudogenes. The chromosome contains the largest interferon gene cluster in the human genome. There is also a region of exceptionally high gene and G + C content including genes paralogous to those in the major histocompatibility complex. We have also detected recently duplicated genes that exhibit different rates of sequence divergence, presumably reflecting natural selection. PMID:15164053

  2. A biostratigraphic sequence analysis in Cretaceous sediments from Eastern Venezuela

    SciTech Connect

    Paredes, I.; Carillo, M.; Fasola, A.; Luna, F.

    1993-02-01

    This paper presents the results of a high-resolution biostratigraphic study, integrated with petrophysical analyses, of the Late Cretaceous sequence in several wells from the Maturin Sub-Basin, Eastern Venezuela. The main objective of this study is to integrate the different faunal and floral assemblages with the sedimentological evolution of the basin using sequential analysis techniques. This technique was applied using mainly terrestrial and marine palynomorphs, which were relatively abundant and diverse compared with the scarce foraminifera and nannofossils. Based on the percentages of abundance and the diversity of the different groups of microfossils, it was possible to establish the maximum flooding surfaces and condensation levels, which allowed the definition of possible candidates for the sequence boundaries. In addition, the identified bioevents made possible the definition of the chronostratigraphic datums of the sequence under study. The results obtained will help optimize the exploration and development programs of the oil fields in Eastern Venezuela.

  3. Evolution Analysis of Simple Sequence Repeats in Plant Genome

    PubMed Central

    Qin, Zhen; Wang, Yanping; Wang, Qingmei; Li, Aixian; Hou, Fuyun; Zhang, Liming

    2015-01-01

    Simple sequence repeats (SSRs) are widespread units on genome sequences, and play many important roles in plants. In order to reveal the evolution of plant genomes, we investigated the evolutionary regularities of SSRs during the evolution of plant species and the plant kingdom by analysis of twelve sequenced plant genomes. First, in the twelve studied plant genomes, the main SSRs were those containing repeat units of 1–3 nucleotides. Second, in mononucleotide SSRs, the A/T percentage gradually increased along with the evolution of plants (except for P. patens). With increasing SSR repeat number, the percentage of A/T in C. reinhardtii showed no significant change, while the percentage of A/T in terrestrial plant species gradually declined. Third, in dinucleotide SSRs, the percentage of AT/TA increased along with the evolution of the plant kingdom, and the repeat number increased in terrestrial plant species. This trend was more obvious in dicotyledons than in monocotyledons. The percentage of CG/GC showed the opposite pattern to that of AT/TA. Fourth, in trinucleotide SSRs, the percentages of combinations including two or three A/T showed a rising trend along with the evolution of the plant kingdom; meanwhile, with increasing SSR repeat number, different species chose different combinations as dominant SSRs. SSRs in C. reinhardtii, P. patens, Z. mays and A. thaliana showed specific patterns related to evolutionary position or specific changes of genome sequences. The results showed that SSRs not only followed a general pattern in the evolution of the plant kingdom, but also were associated with the evolution of the specific genome sequence. The study of the evolutionary regularities of SSRs provided new insights for the analysis of plant genome evolution. PMID:26630570
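The kind of SSR census described in this record can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the minimum repeat count of 4, and the toy sequence are all assumptions chosen for illustration. It scans a sequence for tandem repeats of 1–3 nucleotide units and reports the share of mononucleotide SSRs whose unit is A or T.

```python
import re

def find_ssrs(seq, max_unit=3, min_repeats=4):
    """Return (start, unit, repeat_count) for tandem repeats of 1..max_unit nt.

    min_repeats is an illustrative threshold, not a value from the study.
    """
    seq = seq.upper()
    # Lazy unit match so the shortest repeating unit is preferred
    # (a run of A's is reported as unit "A", not "AA" or "AAA").
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1))
    return [
        (m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
        for m in pattern.finditer(seq)
    ]

def mono_at_percentage(ssrs):
    """Percentage of mononucleotide SSRs whose repeat unit is A or T."""
    mono = [unit for _, unit, _ in ssrs if len(unit) == 1]
    if not mono:
        return 0.0
    return 100.0 * sum(unit in "AT" for unit in mono) / len(mono)

if __name__ == "__main__":
    toy = "GGAAAAAACCATATATATGGCCCCCTT"  # hypothetical sequence
    hits = find_ssrs(toy)
    print(hits)
    print(mono_at_percentage(hits))
```

Real SSR surveys usually apply different minimum repeat counts per unit length and handle ambiguous bases; both are omitted here for brevity.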

  4. Nucleotide sequence and characterization of four additional genes of the hydrogenase structural operon from Rhizobium leguminosarum bv. viciae.

    PubMed Central

    Hidalgo, E; Palacios, J M; Murillo, J; Ruiz-Argüeso, T

    1992-01-01

    The nucleotide sequence of a 2.5-kbp region following the hydrogenase structural genes (hupSL) in the H2 uptake gene cluster from Rhizobium leguminosarum bv. viciae UPM791 was determined. Four closely linked genes encoding peptides of 27.9 (hupC), 22.1 (hupD), 19.0 (hupE), and 10.4 (hupF) kDa were identified immediately downstream of hupL. Proteins with comparable apparent molecular weights were detected by heterologous expression of these genes in Escherichia coli. The six genes, hupS to hupF, are arranged as an operon, and by mutant complementation analysis, it was shown that genes hupSLCD are cotranscribed. A transcription start site preceded by the -12 to -24 consensus sequence characteristic of NtrA-dependent promoters was identified upstream of hupS. On the basis of the lack of oxygen-dependent H2 uptake activity of a hupC::Tn5 mutant and on structural characteristics of the protein, we postulate that HupC is a b-type cytochrome involved in electron transfer from hydrogenase to oxygen. The product from hupE, which is needed for full hydrogenase activity, exhibited characteristics typical of a membrane protein. The features of HupC and HupE suggest that they form, together with the hydrogenase itself, a membrane-bound protein complex involved in hydrogen oxidation. PMID:1597428

  5. Castor Bean Organelle Genome Sequencing and Worldwide Genetic Diversity Analysis

    PubMed Central

    Chan, Agnes P.; Williams, Amber L.; Rice, Danny W.; Liu, Xinyue; Melake-Berhan, Admasu; Huot Creasy, Heather; Puiu, Daniela; Rosovitz, M. J.; Khouri, Hoda M.; Beckstrom-Sternberg, Stephen M.; Allan, Gerard J.; Keim, Paul; Ravel, Jacques; Rabinowicz, Pablo D.

    2011-01-01

    Castor bean is an important oil-producing plant in the Euphorbiaceae family. Its high-quality oil contains up to 90% of the unusual fatty acid ricinoleate, which has many industrial and medical applications. Castor bean seeds also contain ricin, a highly toxic Type 2 ribosome-inactivating protein, which has gained relevance in recent years due to biosafety concerns. In order to gain knowledge on global genetic diversity in castor bean and to ultimately help the development of breeding and forensic tools, we carried out an extensive chloroplast sequence diversity analysis. Taking advantage of the recently published genome sequence of castor bean, we assembled the chloroplast and mitochondrion genomes extracting selected reads from the available whole genome shotgun reads. Using the chloroplast reference genome we used the methylation filtration technique to readily obtain draft genome sequences of 7 geographically and genetically diverse castor bean accessions. These sequence data were used to identify single nucleotide polymorphism markers and phylogenetic analysis resulted in the identification of two major clades that were not apparent in previous population genetic studies using genetic markers derived from nuclear DNA. Two distinct sub-clades could be defined within each major clade and large-scale genotyping of castor bean populations worldwide confirmed previously observed low levels of genetic diversity and showed a broad geographic distribution of each sub-clade. PMID:21750729

  6. Infrared thermal facial image sequence registration analysis and verification

    NASA Astrophysics Data System (ADS)

    Chen, Chieh-Li; Jian, Bo-Lin

    2015-03-01

    To study the emotional responses of subjects to the International Affective Picture System (IAPS), infrared thermal facial image sequence is preprocessed for registration before further analysis such that the variance caused by minor and irregular subject movements is reduced. Without affecting the comfort level and inducing minimal harm, this study proposes an infrared thermal facial image sequence registration process that will reduce the deviations caused by the unconscious head shaking of the subjects. A fixed image for registration is produced through the localization of the centroid of the eye region as well as image translation and rotation processes. Thermal image sequencing will then be automatically registered using the two-stage genetic algorithm proposed. The deviation before and after image registration will be demonstrated by image quality indices. The results show that the infrared thermal image sequence registration process proposed in this study is effective in localizing facial images accurately, which will be beneficial to the correlation analysis of psychological information related to the facial area.

  7. Congruence analysis of point clouds from unstable stereo image sequences

    NASA Astrophysics Data System (ADS)

    Jepping, C.; Bethmann, F.; Luhmann, T.

    2014-06-01

    This paper deals with the correction of exterior orientation parameters of stereo image sequences over deformed free-form surfaces without control points. Such an imaging situation can occur, for example, during photogrammetric car crash test recordings where onboard high-speed stereo cameras are used to measure 3D surfaces. As a result of such measurements, 3D point clouds of deformed surfaces are generated for a complete stereo sequence. The first objective of this research is the development and investigation of methods for the detection of corresponding spatial and temporal tie points within the stereo image sequences (by stereo image matching and 3D point tracking) that are robust enough to handle occlusions and other disturbances reliably. The second objective of this research is the analysis of object deformations in order to detect stable areas (congruence analysis). For this purpose a RANSAC-based method for congruence analysis has been developed. This process is based on the sequential transformation of randomly selected point groups from one epoch to another by using a 3D similarity transformation. The paper gives a detailed description of the congruence analysis. The approach has been tested successfully on synthetic and real image data.

  8. Laser desorption mass spectrometry for DNA analysis and sequencing

    SciTech Connect

    Chen, C.H.; Taranenko, N.I.; Tang, K.; Allman, S.L.

    1995-03-01

    Laser desorption mass spectrometry has been considered as a potential new method for fast DNA sequencing. Our approach is to use matrix-assisted laser desorption to produce parent ions of DNA segments and a time-of-flight mass spectrometer to identify the sizes of DNA segments. Thus, the approach is similar to gel electrophoresis sequencing using Sanger's enzymatic method. However, gel, radioactive tagging, and dye labeling are not required. In addition, the sequencing process can possibly be finished within a few hundred microseconds instead of hours and days. In order to use mass spectrometry for fast DNA sequencing, the following three criteria need to be satisfied: (1) detection of large DNA segments, (2) sensitivity reaching the femtomole region, and (3) mass resolution good enough to separate DNA segments differing by a single nucleotide. Until now it has been very difficult to detect large DNA segments by mass spectrometry because of the fragile chemical properties of DNA and the low detection sensitivity for DNA ions. We discovered several new matrices to increase the production of DNA ions. By innovative design of a mass spectrometer, we can increase the ion energy up to 45 keV to enhance the detection sensitivity. Recently, we succeeded in detecting a DNA segment with 500 nucleotides. The sensitivity was 100 femtomoles. Thus, we have fulfilled two key criteria for using mass spectrometry for fast DNA sequencing. The major effort in the near future is to improve the resolution. Different approaches are being pursued. When high mass resolution can be achieved and automation of sample preparation is developed, a sequencing speed of 500 megabases per year could become feasible.
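To make the third criterion concrete, the following Python sketch uses the standard time-of-flight relation (kinetic energy qV = mv²/2, so flight time scales with the square root of mass) to compare a 500-mer with a 501-mer. Several inputs are assumptions, not values from the record: an average nucleotide mass of about 330 Da, a 1 m field-free drift length, singly charged ions, and the 45 keV figure being treated as the drift energy.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
DALTON = 1.66053906660e-27   # atomic mass unit, kg
NUCLEOTIDE_DA = 330.0        # assumed average nucleotide mass, Da

def flight_time(mass_da, energy_ev, length_m=1.0, charge=1):
    """Drift time (s) of an ion with kinetic energy charge * energy_ev."""
    kinetic_j = charge * energy_ev * E_CHARGE
    velocity = math.sqrt(2.0 * kinetic_j / (mass_da * DALTON))
    return length_m / velocity

def relative_time_gap(n_nucleotides):
    """Fractional flight-time difference between an n-mer and an (n+1)-mer."""
    return math.sqrt((n_nucleotides + 1) / n_nucleotides) - 1.0

if __name__ == "__main__":
    t500 = flight_time(500 * NUCLEOTIDE_DA, 45_000)
    t501 = flight_time(501 * NUCLEOTIDE_DA, 45_000)
    print(f"500-mer drift time: {t500 * 1e6:.1f} microseconds")
    print(f"gap to 501-mer:     {(t501 - t500) * 1e9:.1f} nanoseconds")
    print(f"relative gap:       {relative_time_gap(500):.2e}")
```

Because the time gap shrinks roughly as 1/(2n) of the total flight time, the resolving power needed to separate neighbouring fragments grows in proportion to fragment length, which is why resolution is the remaining bottleneck for long segments.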

  9. Automated Analysis of Dynamic Ca2+ Signals in Image Sequences

    PubMed Central

    Francis, Michael; Waldrup, Josh; Qian, Xun; Taylor, Mark S.

    2014-01-01

    Intracellular Ca2+ signals are commonly studied with fluorescent Ca2+ indicator dyes and microscopy techniques. However, quantitative analysis of Ca2+ imaging data is time consuming and subject to bias. Automated signal analysis algorithms based on region of interest (ROI) detection have been implemented for one-dimensional line scan measurements, but there is no current algorithm which integrates optimized identification and analysis of ROIs in two-dimensional image sequences. Here an algorithm for rapid acquisition and analysis of ROIs in image sequences is described. It utilizes ellipses fit to noise filtered signals in order to determine optimal ROI placement, and computes Ca2+ signal parameters of amplitude, duration and spatial spread. This algorithm was implemented as a freely available plugin for ImageJ (NIH) software. Together with analysis scripts written for the open source statistical processing software R, this approach provides a high-capacity pipeline for performing quick statistical analysis of experimental output. The authors suggest that use of this analysis protocol will lead to a more complete and unbiased characterization of physiologic Ca2+ signaling. PMID:24962784

  10. Identification and sequence analysis of grain softness protein in selected wheat, rye and triticale.

    PubMed

    Kharrazi, M A S; Bobojonov, V

    2012-01-01

    Grain softness protein (GSP) is an important protein for overcoming milling and grain defenses in the innate immunity systems of cereals. The objective of this study was to evaluate and understand GSP sequences in selected wheat, rye and triticale. Using sequences for this gene from a sequence database, we performed clustering analysis to compare the sequences obtained from 3 germplasms with other studied sequences for GSP. The maximum difference between the Hirmand GSP genotype in wheat and the database sequences was 23% in EF109396 and EF109399. Most amino acid variation between the GSP sequences involved the same amino acids. The Nikita rye GSP gene showed 64% identity with DQ269918 and AY667063. The isoelectric point in the GSP of wheat and Lasko triticale was significantly higher than that of rye GSP. In addition, parameters such as optical density, grand average of hydrophobicity, percentage of hydrophobicity and hydrophilic amino acids, and number of alpha helices and beta sheets in GSP were similar in wheat and triticale but not in wheat and rye. PMID:22869084

  11. Biostratigraphic calibration in sequence stratigraphic analysis: Pliocene-Pleistocene case study from Gulf of Mexico

    SciTech Connect

    Wornardt, W.W. Jr.; Armentrout, J.M.; Clement, J.L.

    1989-03-01

    Biostratigraphic analysis provides chronostratigraphic data for correlating depositional sequences and paleoecologic data helpful in identifying depositional facies. These data sets are essential for correct analysis of sequence stratigraphy in areas of complex depositional architecture and structural style. The most useful sequence stratigraphic element for correlation is the condensed section. The condensed section is a facies consisting of thin marine beds of hemipelagic or pelagic sediments deposited at very slow rates. Condensed sections are most extensive during the time of regional transgression of the shoreline. They are excellent correlation data because they contain abundant hemipelagic and pelagic fossil materials and form regionally continuous high-amplitude seismic reflectors. In well samples condensed sections are recognized by peaks in marine fossil abundance and diversity and by high carbonate content; on electric logs they are recognized as high-resistivity clays. Three types of condensed sections are recognized in the Galveston area-South Addition A-158 No. 1 well. Condensed sections deposited in paleowater depths of less than 600 ft occur within the third-order depositional sequences, indicating relatively thick lowstand and highstand sediments. Condensed sections deposited in paleowater depths of 600-3000 ft occur at or just below the third-order depositional sequence boundaries, indicating relatively thin or absent sediments of the highstand systems tract. Condensed sections deposited at paleowater depths deeper than 3000 ft extend through most of the third-order sequence, suggesting essentially no eustatic influence on sediment accumulation.

  12. Applications of new sequencing technologies for transcriptome analysis.

    PubMed

    Morozova, Olena; Hirst, Martin; Marra, Marco A

    2009-01-01

    Transcriptome analysis has been a key area of biological inquiry for decades. Over the years, research in the field has progressed from candidate gene-based detection of RNAs using Northern blotting to high-throughput expression profiling driven by the advent of microarrays. Next-generation sequencing technologies have revolutionized transcriptomics by providing opportunities for multidimensional examinations of cellular transcriptomes in which high-throughput expression data are obtained at a single-base resolution. PMID:19715439

  13. Addition of 5-fluorouracil to doxorubicin-paclitaxel sequence increases caspase-dependent apoptosis in breast cancer cell lines

    PubMed Central

    Zoli, Wainer; Ulivi, Paola; Tesei, Anna; Fabbri, Francesco; Rosetti, Marco; Maltoni, Roberta; Giunchi, Donata Casadei; Ricotti, Luca; Brigliadori, Giovanni; Vannini, Ivan; Amadori, Dino

    2005-01-01

    Introduction: The aim of the study was to evaluate the activity of a combination of doxorubicin (Dox), paclitaxel (Pacl) and 5-fluorouracil (5-FU), to define the most effective schedule, and to investigate the mechanisms of action in human breast cancer cells. Methods: The study was performed on MCF-7 and BRC-230 cell lines. The cytotoxic activity was evaluated by sulphorhodamine B assay and the type of drug interaction was assessed by the median effect principle. Cell cycle perturbation and apoptosis were evaluated by flow cytometry, and apoptosis-related marker (p53, bcl-2, bax, p21), caspase and thymidylate synthase (TS) expression were assessed by western blot. Results: 5-FU, used as a single agent, exerted a low cytotoxic activity in both cell lines. The Dox→Pacl sequence produced a synergistic cytocidal effect and enhanced the efficacy of subsequent exposure to 5-FU in both cell lines. Specifically, the Dox→Pacl sequence blocked cells in the G2-M phase, and the addition of 5-FU forced the cells to progress through the cell cycle or killed them. Furthermore, Dox→Pacl pretreatment produced a significant reduction in basal TS expression in both cell lines, probably favoring the increase in 5-FU activity. The sequence Dox→Pacl→48-h washout→5-FU produced a synergistic and highly schedule-dependent interaction (combination index < 1), resulting in an induction of apoptosis in both experimental models regardless of hormonal, p53, bcl-2 or bax status. Apoptosis in MCF-7 cells was induced through caspase-9 activation and anti-apoptosis-inducing factor hyperexpression. In the BRC-230 cell line, the apoptotic process was triggered only by a caspase-dependent mechanism. In particular, at the end of the three-drug treatment, caspase-8 activation triggered downstream executioner caspase-3 and, to a lesser degree, caspase-7. Conclusion: In our experimental models, characterized by different biomolecular profiles representing the different biology of human breast

  14. Targeted Analysis of Whole Genome Sequence Data to Diagnose Genetic Cardiomyopathy

    SciTech Connect

    Golbus, Jessica R.; Puckelwartz, Megan J.; Dellefave-Castillo, Lisa; Fahrenbach, John P.; Nelakuditi, Viswateja; Pesce, Lorenzo L.; Pytel, Peter; McNally, Elizabeth M.

    2014-09-01

    Background: Cardiomyopathy is highly heritable but genetically diverse. At present, genetic testing for cardiomyopathy uses targeted sequencing to simultaneously assess the coding regions of more than 50 genes. New genes are routinely added to panels to improve the diagnostic yield. With the anticipated $1000 genome, it is expected that genetic testing will shift towards comprehensive genome sequencing accompanied by targeted gene analysis. Therefore, we assessed the reliability of whole genome sequencing and targeted analysis to identify cardiomyopathy variants in 11 subjects with cardiomyopathy. Methods and Results: Whole genome sequencing with an average of 37× coverage was combined with targeted analysis focused on 204 genes linked to cardiomyopathy. Genetic variants were scored using multiple prediction algorithms combined with frequency data from public databases. This pipeline yielded 1-14 potentially pathogenic variants per individual. Variants were further analyzed using clinical criteria and/or segregation analysis. Three of three previously identified primary mutations were detected by this analysis. In six subjects for whom the primary mutation was previously unknown, we identified mutations that segregated with disease, had clinical correlates, and/or had additional pathological correlation to provide evidence for causality. For two subjects with previously known primary mutations, we identified additional variants that may act as modifiers of disease severity. In total, we identified the likely pathological mutation in 9 of 11 (82%) subjects. We conclude that these pilot data demonstrate that ~30-40× coverage whole genome sequencing combined with targeted analysis is feasible and sensitive to identify rare variants in cardiomyopathy-associated genes.

  15. Targeted Analysis of Whole Genome Sequence Data to Diagnose Genetic Cardiomyopathy

    DOE PAGESBeta

    Golbus, Jessica R.; Puckelwartz, Megan J.; Dellefave-Castillo, Lisa; Fahrenbach, John P.; Nelakuditi, Viswateja; Pesce, Lorenzo L.; Pytel, Peter; McNally, Elizabeth M.

    2014-09-01

    Background: Cardiomyopathy is highly heritable but genetically diverse. At present, genetic testing for cardiomyopathy uses targeted sequencing to simultaneously assess the coding regions of more than 50 genes. New genes are routinely added to panels to improve the diagnostic yield. With the anticipated $1000 genome, it is expected that genetic testing will shift towards comprehensive genome sequencing accompanied by targeted gene analysis. Therefore, we assessed the reliability of whole genome sequencing and targeted analysis to identify cardiomyopathy variants in 11 subjects with cardiomyopathy. Methods and Results: Whole genome sequencing with an average of 37× coverage was combined with targeted analysis focused on 204 genes linked to cardiomyopathy. Genetic variants were scored using multiple prediction algorithms combined with frequency data from public databases. This pipeline yielded 1-14 potentially pathogenic variants per individual. Variants were further analyzed using clinical criteria and/or segregation analysis. Three of three previously identified primary mutations were detected by this analysis. In six subjects for whom the primary mutation was previously unknown, we identified mutations that segregated with disease, had clinical correlates, and/or had additional pathological correlation to provide evidence for causality. For two subjects with previously known primary mutations, we identified additional variants that may act as modifiers of disease severity. In total, we identified the likely pathological mutation in 9 of 11 (82%) subjects. We conclude that these pilot data demonstrate that ~30-40× coverage whole genome sequencing combined with targeted analysis is feasible and sensitive to identify rare variants in cardiomyopathy-associated genes.

  16. Mitochondrial DNA Sequence Analysis - Validation and Use for Forensic Casework.

    PubMed

    Holland, M M; Parsons, T J

    1999-06-01

    With the discovery of the polymerase chain reaction (PCR) in the mid-1980s, the last in a series of critical molecular biology techniques (to include the isolation of DNA from human and non-human biological material, and primary sequence analysis of DNA) had been developed to rapidly analyze minute quantities of mitochondrial DNA (mtDNA). This was especially true for mtDNA isolated from challenged sources, such as ancient or aged skeletal material and hair shafts. One of the beneficiaries of this work has been the forensic community. Over the last decade, a significant amount of research has been conducted to develop PCR-based sequencing assays for the mtDNA control region (CR), which have subsequently been used to further characterize the CR. As a result, the reliability of these assays has been investigated, the limitations of the procedures have been determined, and critical aspects of the analysis process have been identified, so that careful control and monitoring will provide the basis for reliable testing. With the application of these assays to forensic identification casework, mtDNA sequence analysis has been properly validated, and is a reliable procedure for the examination of biological evidence encountered in forensic criminalistic cases. PMID:26255820

  17. Additional analysis of dendrochemical data of Fallon, Nevada.

    PubMed

    Sheppard, Paul R; Helsel, Dennis R; Speakman, Robert J; Ridenour, Gary; Witten, Mark L

    2012-04-01

    Previously reported dendrochemical data showed temporal variability in concentration of tungsten (W) and cobalt (Co) in tree rings of Fallon, Nevada, US. Criticism of this work questioned the use of the Mann-Whitney test for determining change in element concentrations. Here, we demonstrate that Mann-Whitney is appropriate for comparing background element concentrations to possibly elevated concentrations in environmental media. Given that Mann-Whitney tests for differences in shapes of distributions, inter-tree variability (e.g., "coefficient of median variation") was calculated for each measured element across trees within subsites and time periods. For W and Co, the metals of highest interest in Fallon, inter-tree variability was always higher within versus outside of Fallon. For calibration purposes, this entire analysis was repeated at a different town, Sweet Home, Oregon, which has a known tungsten-powder facility, and inter-tree variability of W in tree rings confirmed the establishment date of that facility. Mann-Whitney testing of simulated data also confirmed its appropriateness for analysis of data affected by point-source contamination. This research adds important new dimensions to dendrochemistry of point-source contamination by adding analysis of inter-tree variability to analysis of central tendency. Fallon remains distinctive by a temporal increase in W beginning by the mid 1990s and by elevated Co since at least the early 1990s, as well as by high inter-tree variability for W and Co relative to comparison towns. PMID:22227064
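For readers unfamiliar with the test discussed in this record, here is a minimal pure-Python sketch of the Mann-Whitney U statistic with midranks for ties, plus a large-sample normal approximation for the two-sided p-value (no tie or continuity correction). The concentration values in the usage example are invented for illustration and are not data from the study.

```python
import math

def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two samples, using midranks for tied values."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        # Extend j over the run of values tied with combined[i].
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    n1, n2 = len(x), len(y)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    return u_x, n1 * n2 - u_x

def normal_approx_p(u, n1, n2):
    """Two-sided p-value from the normal approximation (no tie correction)."""
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))

if __name__ == "__main__":
    background = [0.8, 1.1, 0.9, 1.0, 1.2, 0.7]   # hypothetical values
    recent = [1.9, 2.4, 1.3, 3.1, 2.2, 2.8]       # hypothetical values
    u_bg, u_recent = mann_whitney_u(background, recent)
    print(u_bg, u_recent, normal_approx_p(u_bg, len(background), len(recent)))
```

For samples as small as typical tree-ring subsites, an exact permutation p-value would be preferable to the normal approximation shown here.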

  18. Initial genome sequencing and analysis of multiple myeloma.

    PubMed

    Chapman, Michael A; Lawrence, Michael S; Keats, Jonathan J; Cibulskis, Kristian; Sougnez, Carrie; Schinzel, Anna C; Harview, Christina L; Brunet, Jean-Philippe; Ahmann, Gregory J; Adli, Mazhar; Anderson, Kenneth C; Ardlie, Kristin G; Auclair, Daniel; Baker, Angela; Bergsagel, P Leif; Bernstein, Bradley E; Drier, Yotam; Fonseca, Rafael; Gabriel, Stacey B; Hofmeister, Craig C; Jagannath, Sundar; Jakubowiak, Andrzej J; Krishnan, Amrita; Levy, Joan; Liefeld, Ted; Lonial, Sagar; Mahan, Scott; Mfuko, Bunmi; Monti, Stefano; Perkins, Louise M; Onofrio, Robb; Pugh, Trevor J; Rajkumar, S Vincent; Ramos, Alex H; Siegel, David S; Sivachenko, Andrey; Stewart, A Keith; Trudel, Suzanne; Vij, Ravi; Voet, Douglas; Winckler, Wendy; Zimmerman, Todd; Carpten, John; Trent, Jeff; Hahn, William C; Garraway, Levi A; Meyerson, Matthew; Lander, Eric S; Getz, Gad; Golub, Todd R

    2011-03-24

    Multiple myeloma is an incurable malignancy of plasma cells, and its pathogenesis is poorly understood. Here we report the massively parallel sequencing of 38 tumour genomes and their comparison to matched normal DNAs. Several new and unexpected oncogenic mechanisms were suggested by the pattern of somatic mutation across the data set. These include the mutation of genes involved in protein translation (seen in nearly half of the patients), genes involved in histone methylation, and genes involved in blood coagulation. In addition, a broader than anticipated role of NF-κB signalling was indicated by mutations in 11 members of the NF-κB pathway. Of potential immediate clinical relevance, activating mutations of the kinase BRAF were observed in 4% of patients, suggesting the evaluation of BRAF inhibitors in multiple myeloma clinical trials. These results indicate that cancer genome sequencing of large collections of samples will yield new insights into cancer not anticipated by existing knowledge. PMID:21430775

  19. Initial genome sequencing and analysis of multiple myeloma

    PubMed Central

    Chapman, Michael A.; Lawrence, Michael S.; Keats, Jonathan J.; Cibulskis, Kristian; Sougnez, Carrie; Schinzel, Anna C.; Harview, Christina L.; Brunet, Jean-Philippe; Ahmann, Gregory J.; Adli, Mazhar; Anderson, Kenneth C.; Ardlie, Kristin G.; Auclair, Daniel; Baker, Angela; Bergsagel, P. Leif; Bernstein, Bradley E.; Drier, Yotam; Fonseca, Rafael; Gabriel, Stacey B.; Hofmeister, Craig C.; Jagannath, Sundar; Jakubowiak, Andrzej J.; Krishnan, Amrita; Levy, Joan; Liefeld, Ted; Lonial, Sagar; Mahan, Scott; Mfuko, Bunmi; Monti, Stefano; Perkins, Louise M.; Onofrio, Robb; Pugh, Trevor J.; Vincent Rajkumar, S.; Ramos, Alex H.; Siegel, David S.; Sivachenko, Andrey; Trudel, Suzanne; Vij, Ravi; Voet, Douglas; Winckler, Wendy; Zimmerman, Todd; Carpten, John; Trent, Jeff; Hahn, William C.; Garraway, Levi A.; Meyerson, Matthew; Lander, Eric S.; Getz, Gad; Golub, Todd R.

    2013-01-01

    Multiple myeloma is an incurable malignancy of plasma cells, and its pathogenesis is poorly understood. Here we report the massively parallel sequencing of 38 tumor genomes and their comparison to matched normal DNAs. Several new and unexpected oncogenic mechanisms were suggested by the pattern of somatic mutation across the dataset. These include the mutation of genes involved in protein translation (seen in nearly half of the patients), genes involved in histone methylation, and genes involved in blood coagulation. In addition, a broader than anticipated role of NF-κB signaling was suggested by mutations in 11 members of the NF-κB pathway. Of potential immediate clinical relevance, activating mutations of the kinase BRAF were observed in 4% of patients, suggesting the evaluation of BRAF inhibitors in multiple myeloma clinical trials. These results indicate that cancer genome sequencing of large collections of samples will yield new insights into cancer not anticipated by existing knowledge. PMID:21430775

  20. Environmental impact analysis for the main accidental sequences of ignitor

    SciTech Connect

    Carpignano, A.; Francabandiera, S.; Vella, R.; Zucchetti, M.

    1996-12-31

    A safety analysis study has been applied to the Ignitor machine using Probabilistic Safety Assessment. The main initiating events have been identified, and accident sequences have been studied by means of traditional methods such as Failure Mode and Effect Analysis (FMEA), Fault Trees (FT) and Event Trees (ET). The consequences of the radioactive environmental releases have been assessed in terms of Effective Dose Equivalents (EDEs) to the Most Exposed Individuals (MEI) of the chosen site, by means of a population dose code. Results point out the low environmental impact of the machine. 13 refs., 1 fig., 3 tabs.

  1. The design and analysis of transposon insertion sequencing experiments.

    PubMed

    Chao, Michael C; Abel, Sören; Davis, Brigid M; Waldor, Matthew K

    2016-02-01

    Transposon insertion sequencing (TIS) is a powerful approach that can be extensively applied to the genome-wide definition of loci that are required for bacterial growth under diverse conditions. However, experimental design choices and stochastic biological processes can heavily influence the results of TIS experiments and affect downstream statistical analysis. In this Opinion article, we discuss TIS experimental parameters and how these factors relate to the benefits and limitations of the various statistical frameworks that can be applied to the computational analysis of TIS data. PMID:26775926

  2. Screening two mutations in the dysferlin gene by exon capture and sequence analysis: A case report

    PubMed Central

    WANG, XUEYAN; YANG, YUN; ZHOU, RONG

    2016-01-01

    A patient with progressive muscular atrophy was assessed for the disease-associated genes by next-generation sequencing technology and exon trap and sequence analysis. The results of the investigation identified 399 genes, covering all exons in addition to 10 bp on either side, which are specific to 659 types of neuromuscular disorders, including hypotypes. Exon capture and sequence analysis revealed that the patient possessed two splice site mutations in the dysferlin (DYSF) gene, c.144+1G>A and c.342+1G>T, and the presence of the mutations was confirmed by Sanger sequencing. The patient's mother and sister were also assessed and confirmed to have mutations within the DYSF gene, the mother with c.342+1G>T and the sister with c.144+1G>A. The two splice site mutations in the DYSF gene, c.144+1G>A and c.342+1G>T, have not previously been reported. Therefore, exon capture and sequence analysis is able to rapidly and efficiently screen for genetic alterations in neuromuscular disorders.

  3. An editing environment for DNA sequence analysis and annotation

    SciTech Connect

    Uberbacher, E.C.; Xu, Y.; Shah, M.B.; Olman, V.; Parang, M.; Mural, R.

    1998-12-31

    This paper presents a computer system for analyzing and annotating large-scale genomic sequences. The core of the system is a multiple-gene structure identification program, which predicts the most probable gene structures based on the given evidence, including pattern recognition, EST and protein homology information. A graphics-based user interface provides an environment which allows the user to interactively control the evidence to be used in the gene identification process. To overcome the computational bottleneck in the database similarity search used in the gene identification process, the authors have developed an effective way to partition a database into a set of sub-databases of related sequences, and reduced the search problem on a large database to a signature identification problem and a search problem on a much smaller sub-database. This reduces the number of sequences to be searched from N to O(√N) on average, and hence greatly reduces the search time, where N is the number of sequences in the original database. The system provides the user with the ability to facilitate and modify the analysis and modeling in real time.
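
    The two-stage search described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the sub-databases or their signatures are built, so everything below is an assumption made for illustration: sequences are compared by the Jaccard similarity of their 3-mer sets, about √N evenly spaced sequences act as seeds, each seed's k-mer set serves as the signature of its sub-database, and the function names are hypothetical. With g sub-databases of roughly N/g sequences each, a query costs about g + N/g comparisons, which is smallest (about 2√N) when g is close to √N.

```python
import math


def kmers(seq, k=3):
    """Return the set of overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_partition(database, k=3):
    """Split the database into about sqrt(N) sub-databases.

    Assumption: evenly spaced sequences are used as seeds, and the
    seed's k-mer set is the signature of its sub-database.
    """
    n_groups = max(1, math.isqrt(len(database)))
    step = len(database) // n_groups
    seeds = [database[i * step] for i in range(n_groups)]
    signatures = [kmers(s, k) for s in seeds]
    groups = [[] for _ in seeds]
    for seq in database:
        seq_kmers = kmers(seq, k)
        best = max(range(n_groups),
                   key=lambda g: jaccard(seq_kmers, signatures[g]))
        groups[best].append(seq)
    return signatures, groups


def two_stage_search(query, signatures, groups, k=3):
    """Stage 1: pick the closest signature. Stage 2: search that group only.

    Returns the best hit and the number of comparisons performed.
    """
    q = kmers(query, k)
    comparisons = len(signatures)
    g = max(range(len(signatures)), key=lambda i: jaccard(q, signatures[i]))
    comparisons += len(groups[g])
    best = max(groups[g], key=lambda s: jaccard(q, kmers(s, k)))
    return best, comparisons


if __name__ == "__main__":
    # Toy database: three families of three related sequences each.
    db = [
        "ACGTACGTAC", "ACGTACGTAA", "ACGTACGTTC",
        "GGGCCCGGGC", "GGGCCCGGGA", "GGGCCCGGCC",
        "TTATTATTAT", "TTATTATTAA", "TTATTATTTA",
    ]
    sigs, grps = build_partition(db)
    hit, n_cmp = two_stage_search("GGGCCCGGGA", sigs, grps)
    print(hit, n_cmp, "comparisons instead of", len(db))
```

    The saving depends on the sub-databases being reasonably balanced; if most sequences fall into one group, the second stage approaches a full scan again.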

  4. Application of Subspace Clustering in DNA Sequence Analysis.

    PubMed

    Wallace, Tim; Sekmen, Ali; Wang, Xiaofei

    2015-10-01

    Identification and clustering of orthologous genes plays an important role in developing evolutionary models such as validating convergent and divergent phylogeny and predicting functional proteins in newly sequenced species of unverified nucleotide protein mappings. Here, we introduce an application of subspace clustering as applied to orthologous gene sequences and discuss the initial results. The working hypothesis is based upon the concept that genetic changes between nucleotide sequences coding for proteins among selected species and groups may lie within a union of subspaces for clusters of the orthologous groups. Estimates for the subspace dimensions were computed for a small population sample. A series of experiments was performed to cluster randomly selected sequences. The experimental design allows for both false positives and false negatives, and estimates for the statistical significance are provided. The clustering results are consistent with the main hypothesis. A simple random mutation binary tree model is used to simulate speciation events that show the interdependence of the subspace rank versus time and mutation rates. The simple mutation model is found to be largely consistent with the observed subspace clustering singular value results. Our study indicates that the subspace clustering method may be applied in orthology analysis. PMID:26162018

  5. Analysis of Saccharides by the Addition of Amino Acids

    NASA Astrophysics Data System (ADS)

    Ozdemir, Abdil; Lin, Jung-Lee; Gillig, Kent J.; Gulfen, Mustafa; Chen, Chung-Hsuan

    2016-06-01

    In this work, we present the detection sensitivity improvement of electrospray ionization (ESI) mass spectrometry of neutral saccharides in a positive ion mode by the addition of various amino acids. Saccharides of a broad molecular weight range were chosen as the model compounds in the present study. Saccharides provide strong noncovalent interactions with amino acids, and the complex formation enhances the signal intensity and simplifies the mass spectra of saccharides. Polysaccharides provide a polymer-like ESI spectrum with a basic subunit difference between multiply charged chains. The protonated spectra of saccharides are not well identified because of different charge state distributions produced by the same molecules. Depending on the solvent used and other ions or molecules present in the solution, noncovalent interactions with saccharides may occur. These interactions are affected by the addition of amino acids. Amino acids with polar side groups show a strong tendency to interact with saccharides. In particular, serine shows a high tendency to interact with saccharides and significantly improves the detection sensitivity of saccharide compounds.

  6. Porosity Measurements and Analysis for Metal Additive Manufacturing Process Control

    PubMed Central

    Slotwinski, John A; Garboczi, Edward J; Hebenstreit, Keith M

    2014-01-01

    Additive manufacturing techniques can produce complex, high-value metal parts, with potential applications as critical metal components such as those found in aerospace engines and as customized biomedical implants. Material porosity in these parts is undesirable for aerospace parts - since porosity could lead to premature failure - and desirable for some biomedical implants - since surface-breaking pores allow for better integration with biological tissue. Changes in a part’s porosity during an additive manufacturing build may also be an indication of an undesired change in the build process. Here, we present efforts to develop an ultrasonic sensor for monitoring changes in the porosity in metal parts during fabrication on a metal powder bed fusion system. The development of well-characterized reference samples, measurements of the porosity of these samples with multiple techniques, and correlation of ultrasonic measurements with the degree of porosity are presented. A proposed sensor design, measurement strategy, and future experimental plans on a metal powder bed fusion system are also presented. PMID:26601041

  7. Porosity Measurements and Analysis for Metal Additive Manufacturing Process Control.

    PubMed

    Slotwinski, John A; Garboczi, Edward J; Hebenstreit, Keith M

    2014-01-01

    Additive manufacturing techniques can produce complex, high-value metal parts, with potential applications as critical metal components such as those found in aerospace engines and as customized biomedical implants. Material porosity in these parts is undesirable for aerospace parts - since porosity could lead to premature failure - and desirable for some biomedical implants - since surface-breaking pores allow for better integration with biological tissue. Changes in a part's porosity during an additive manufacturing build may also be an indication of an undesired change in the build process. Here, we present efforts to develop an ultrasonic sensor for monitoring changes in the porosity in metal parts during fabrication on a metal powder bed fusion system. The development of well-characterized reference samples, measurements of the porosity of these samples with multiple techniques, and correlation of ultrasonic measurements with the degree of porosity are presented. A proposed sensor design, measurement strategy, and future experimental plans on a metal powder bed fusion system are also presented. PMID:26601041

  8. Additional EIPC Study Analysis: Interim Report on High Priority Topics

    SciTech Connect

    Hadley, Stanton W

    2013-11-01

    Between 2010 and 2012 the Eastern Interconnection Planning Collaborative (EIPC) conducted a major long-term resource and transmission study of the Eastern Interconnection (EI). With guidance from a Stakeholder Steering Committee (SSC) that included representatives from the Eastern Interconnection States Planning Council (EISPC) among others, the project was conducted in two phases. Phase 1 involved a long-term capacity expansion analysis that involved creation of eight major futures plus 72 sensitivities. Three scenarios were selected for more extensive transmission-focused evaluation in Phase 2. Five power flow analyses, nine production cost model runs (including six sensitivities), and three capital cost estimations were developed during this second phase. The results from Phase 1 and 2 provided a wealth of data that could be examined further to address energy-related questions. A list of 13 topics was developed for further analysis; this paper discusses the first five.

  9. Analysis of the constitution of the beer yeast genome by PCR, sequencing and subtelomeric sequence hybridization.

    PubMed

    Casaregola, S; Nguyen, H V; Lapathitis, G; Kotyk, A; Gaillardin, C

    2001-07-01

    The lager brewing yeasts, Saccharomyces pastorianus (synonym Saccharomyces carlsbergensis), are allopolyploid, containing parts of two divergent genomes. Saccharomyces cerevisiae contributed to the formation of these hybrids, although the identity of the other species is still unclear. The presence of alleles specific to S. cerevisiae and S. pastorianus was tested for by PCR/RFLP in brewing yeasts of various origins and in members of the Saccharomyces sensu stricto complex. S. cerevisiae-type alleles of two genes, HIS4 and YCL008c, were identified in another brewing yeast, S. pastorianus CBS 1503 (Saccharomyces monacensis), thought to be the source of the other contributor to the lager hybrid. This is consistent with the hybridization of S. cerevisiae subtelomeric sequences X and Y' to the electrophoretic karyotype of this strain. S. pastorianus CBS 1503 (S. monacensis) is therefore probably not an ancestor of S. pastorianus, but a related hybrid. Saccharomyces bayanus, also thought to be one of the contributors to the lager yeast hybrid, is a heterogeneous taxon containing at least two subgroups, one close to the type strain, CBS 380T, the other close to CBS 395 (Saccharomyces uvarum). The partial sequences of several genes (HIS4, MET10, URA3) were shown to be identical or very similar (over 99%) in S. pastorianus CBS 1513 (S. carlsbergensis), S. bayanus CBS 380T and its close derivatives, showing that S. pastorianus and S. bayanus have a common ancestor. A distinction between two subgroups within S. bayanus was made on the basis of sequence analysis: the subgroup represented by S. bayanus CBS 395 (S. uvarum) has 6-8% sequence divergence within the genes HIS4, MET10 and MET2 from S. bayanus CBS 380T, indicating that the two S. bayanus subgroups diverged recently. The detection of specific alleles by PCR/RFLP and hybridization with S. cerevisiae subtelomeric sequences X and Y' to electrophoretic karyotypes of brewing yeasts and related species confirmed our

  10. Phylogeny of the Heelwalkers (Insecta: Mantophasmatodea) based on mtDNA sequences, with evidence for additional taxa in South Africa.

    PubMed

    Damgaard, Jakob; Klass, Klaus-Dieter; Picker, Mike D; Buder, Gerda

    2008-05-01

    We examined the phylogeny of Mantophasmatodea from southern Africa (South Africa, Namibia) using approx. 1300 bp of mitochondrial DNA sequence data from the genes encoding COI and 16S. The taxon sample comprised multiple specimens from eight described species (Namaquaphasma ookiepense, Austrophasma rawsonvillense, A. caledonense, A. gansbaaiense, Lobatophasma redelinghuysense, Hemilobophasma montaguense, Karoophasma botterkloofense, K. biedouwense) and four undescribed species of Austrophasmatidae; three specimens of Sclerophasma paresisense (Mantophasmatidae); and two specimens of Praedatophasma maraisi and one of Tyrannophasma gladiator (not yet convincingly assigned to any family). For outgroup comparison a broad selection from hemimetabolous insect orders was included. Equally weighted parsimony analyses of the combined COI+16S data sets with gaps in 16S scored as a fifth character state supported Austrophasmatidae and all species and genera of Mantophasmatodea as being monophyletic. Most species were highly supported with 98-100% bootstrap/7-39 Bremer support (BS), but K. biedouwense had moderate support (87/4) and A. caledonense low support (70/1). Mantophasmatodea, Austrophasmatidae, and a clade Tyrannophasma gladiator+Praedatophasma maraisi were all strongly supported (99-100/12-25), while relationships among the two latter clades and Mantophasmatidae remain ambiguous. Concerning the relationships among genera of Austrophasmatidae, support values are moderately high for some nodes, but not significant for others. We additionally calculated the partitioned BS values of COI and 16S for all nodes in the strict consensus of the combined tree. COI and 16S are highly congruent at the species level as well as at the base of Mantophasmatodea, but congruence is poor for most intergeneric relationships. In forthcoming studies, deeper relationships in the order should be additionally explored by nuclear genes, such as 18S and 28S, for a reduced sample of specimens

  11. Disclosure of hydraulic fracturing fluid chemical additives: analysis of regulations.

    PubMed

    Maule, Alexis L; Makey, Colleen M; Benson, Eugene B; Burrows, Isaac J; Scammell, Madeleine K

    2013-01-01

    Hydraulic fracturing is used to extract natural gas from shale formations. The process involves injecting into the ground fracturing fluids that contain thousands of gallons of chemical additives. Companies are not mandated by federal regulations to disclose the identities or quantities of chemicals used during hydraulic fracturing operations on private or public lands. States have begun to regulate hydraulic fracturing fluids by mandating chemical disclosure. These laws have shortcomings including nondisclosure of proprietary or "trade secret" mixtures, insufficient penalties for reporting inaccurate or incomplete information, and timelines that allow for after-the-fact reporting. These limitations leave lawmakers, regulators, public safety officers, and the public uninformed and ill-prepared to anticipate and respond to possible environmental and human health hazards associated with hydraulic fracturing fluids. We explore hydraulic fracturing exemptions from federal regulations, as well as current and future efforts to mandate chemical disclosure at the federal and state level. PMID:23552653

  12. Effect of Si additions on thermal stability and the phase transition sequence of sputtered amorphous alumina thin films

    SciTech Connect

    Bolvardi, H.; Baben, M. to; Nahif, F.; Music, D.; Schnabel, V.; Shaha, K. P.; Mráz, S.; Schneider, J. M.; Bednarcik, J.; Michalikova, J.

    2015-01-14

    Si-alloyed amorphous alumina coatings having a silicon concentration of 0 to 2.7 at. % were deposited by combinatorial reactive pulsed DC magnetron sputtering of Al and Al-Si (90-10 at. %) split segments in Ar/O₂ atmosphere. The effect of Si alloying on thermal stability of the as-deposited amorphous alumina thin films and the phase formation sequence was evaluated by using differential scanning calorimetry and X-ray diffraction. The thermal stability window of the amorphous phase containing 2.7 at. % of Si was increased by more than 100 °C compared to that of the unalloyed phase. A similar retarding effect of Si alloying was also observed for the α-Al₂O₃ formation temperature, which increased by more than 120 °C. While for the latter retardation, the evidence for the presence of SiO₂ at the grain boundaries was presented previously, this obviously cannot explain the stability enhancement reported here for the amorphous phase. Based on density functional theory molecular dynamics simulations and synchrotron X-ray diffraction experiments for amorphous Al₂O₃ with and without Si incorporation, we suggest that the experimentally identified enhanced thermal stability of amorphous alumina with addition of Si is due to the formation of shorter and stronger Si–O bonds as compared to Al–O bonds.

  13. Bayesian Analysis and Segmentation of Multichannel Image Sequences

    NASA Astrophysics Data System (ADS)

    Chang, Michael Ming Hsin

    This thesis is concerned with the segmentation and analysis of multichannel image sequence data. In particular, we use maximum a posteriori probability (MAP) criterion and Gibbs random fields (GRF) to formulate the problems. We start by reviewing the significance of MAP estimation with GRF priors and study the feasibility of various optimization methods for implementing the MAP estimator. We proceed to investigate three areas where image data and parameter estimates are present in multichannels, multiframes, and interrelated in complicated manners. These areas of study include color image segmentation, multislice MR image segmentation, and optical flow estimation and segmentation in multiframe temporal sequences. Besides developing novel algorithms in each of these areas, we demonstrate how to exploit the potential of MAP estimation and GRFs, and we propose practical and efficient implementations. Illustrative examples and relevant experimental results are included.

  14. Nonlinear analysis of correlations in Alu repeat sequences in DNA

    NASA Astrophysics Data System (ADS)

    Xiao, Yi; Huang, Yanzhao; Li, Mingfeng; Xu, Ruizhen; Xiao, Saifeng

    2003-12-01

    We report on a nonlinear analysis of deterministic structures in Alu repeats, one of the richest repetitive DNA sequences in the human genome. Alu repeats contain the recognition sites for the restriction endonuclease AluI, which is what gives them their name. Using the nonlinear prediction method developed in chaos theory, we find that all Alu repeats have novel deterministic structures and show strong nonlinear correlations that are absent from exon and intron sequences. Furthermore, the deterministic structures of Alus of younger subfamilies show panlike shapes. As young Alus can be seen as mutation free copies from the “master genes,” it may be suggested that the deterministic structures of the older subfamilies are results of an evolution from a “panlike” structure to a more diffuse correlation pattern due to mutation.

  15. Sequence homology and structural analysis of the clostridial neurotoxins.

    PubMed

    Lacy, D B; Stevens, R C

    1999-09-01

    The clostridial neurotoxins (CNTs), comprised of tetanus neurotoxin (TeNT) and the seven serotypes of botulinum neurotoxin (BoNT A-G), specifically bind to neuronal cells and disrupt neurotransmitter release by cleaving proteins involved in synaptic vesicle membrane fusion. In this study, multiple CNT sequences were analyzed within the context of the 1277 residue BoNT/A crystal structure to gain insight into the events of binding, pore formation, translocation, and catalysis that are required for toxicity. A comparison of the TeNT-binding domain structure to that of BoNT/A reveals striking differences in their surface properties. Further, the solvent accessibility of a key tryptophan in the C terminus of the BoNT/A-binding domain refines the location of the ganglioside-binding site. Data collected from a single frozen crystal of BoNT/A are included in this study, revealing slight differences in the binding domain orientation as well as density for a previously unobserved translocation domain loop. This loop and the conservation of charged residues with structural proximity to putative pore-forming sequences lend insight into the CNT mechanism of pore formation and translocation. The sequence analysis of the catalytic domain revealed an area near the active site likely to account for specificity differences between the CNTs. It also revealed a tertiary structure, highly conserved in primary sequence, which seems critical to catalysis but is 30 Å from the active-site zinc ion. This observation, along with an analysis of the 54 residue "belt" from the translocation domain, is discussed with respect to the mechanism of catalysis. PMID:10518945

  16. Wasabi: An Integrated Platform for Evolutionary Sequence Analysis and Data Visualization.

    PubMed

    Veidenberg, Andres; Medlar, Alan; Löytynoja, Ari

    2016-04-01

    Wasabi is an open source, web-based environment for evolutionary sequence analysis. Wasabi visualizes sequence data together with a phylogenetic tree within a modern, user-friendly interface: The interface hides extraneous options, supports context sensitive menus, drag-and-drop editing, and displays additional information, such as ancestral sequences, associated with specific tree nodes. The Wasabi environment supports reproducibility by automatically storing intermediate analysis steps and includes built-in functions to share data between users and publish analysis results. For computational analysis, Wasabi supports PRANK and PAGAN for phylogeny-aware alignment and alignment extension, and it can be easily extended with other tools. Along with drag-and-drop import of local files, Wasabi can access remote data through URL and import sequence data, GeneTrees and EPO alignments directly from Ensembl. To demonstrate a typical workflow using Wasabi, we reproduce key findings from recent comparative genomics studies, including a reanalysis of the EGLN1 gene from the tiger genome study: These case studies can be browsed within Wasabi at http://wasabiapp.org:8000?id=usecases. Wasabi runs inside a web browser and does not require any installation. One can start using it at http://wasabiapp.org. All source code is licensed under the AGPLv3. PMID:26635364

  17. MEGA: A biologist-centric software for evolutionary analysis of DNA and protein sequences

    PubMed Central

    Kumar, Sudhir; Nei, Masatoshi; Dudley, Joel; Tamura, Koichiro

    2008-01-01

    The Molecular Evolutionary Genetics Analysis (MEGA) software is a desktop application designed for comparative analysis of homologous gene sequences either from multigene families or from different species with a special emphasis on inferring evolutionary relationships and patterns of DNA and protein evolution. In addition to the tools for statistical analysis of data, MEGA provides many convenient facilities for the assembly of sequence data sets from files or web-based repositories, and it includes tools for visual presentation of the results obtained in the form of interactive phylogenetic trees and evolutionary distance matrices. Here we discuss the motivation, design principles, and priorities that have shaped the development of MEGA. We also discuss how MEGA might evolve in the future to assist researchers in their growing need to analyze large datasets using new computational methods. PMID:18417537

  18. HTSstation: a web application and open-access libraries for high-throughput sequencing data analysis.

    PubMed

    David, Fabrice P A; Delafontaine, Julien; Carat, Solenne; Ross, Frederick J; Lefebvre, Gregory; Jarosz, Yohan; Sinclair, Lucas; Noordermeer, Daan; Rougemont, Jacques; Leleu, Marion

    2014-01-01

    The HTSstation analysis portal is a suite of simple web forms coupled to modular analysis pipelines for various applications of High-Throughput Sequencing including ChIP-seq, RNA-seq, 4C-seq and re-sequencing. HTSstation offers biologists the possibility to rapidly investigate their HTS data using an intuitive web application with heuristically pre-defined parameters. A number of open-source software components have been implemented and can be used to build, configure and run HTS analysis pipelines reactively. Besides, our programming framework empowers developers with the possibility to design their own workflows and integrate additional third-party software. The HTSstation web application is accessible at http://htsstation.epfl.ch. PMID:24475057

  19. HTSstation: A Web Application and Open-Access Libraries for High-Throughput Sequencing Data Analysis

    PubMed Central

    David, Fabrice P. A.; Delafontaine, Julien; Carat, Solenne; Ross, Frederick J.; Lefebvre, Gregory; Jarosz, Yohan; Sinclair, Lucas; Noordermeer, Daan; Rougemont, Jacques; Leleu, Marion

    2014-01-01

    The HTSstation analysis portal is a suite of simple web forms coupled to modular analysis pipelines for various applications of High-Throughput Sequencing including ChIP-seq, RNA-seq, 4C-seq and re-sequencing. HTSstation offers biologists the possibility to rapidly investigate their HTS data using an intuitive web application with heuristically pre-defined parameters. A number of open-source software components have been implemented and can be used to build, configure and run HTS analysis pipelines reactively. Besides, our programming framework empowers developers with the possibility to design their own workflows and integrate additional third-party software. The HTSstation web application is accessible at http://htsstation.epfl.ch. PMID:24475057

  20. In Silico Genome Comparison and Distribution Analysis of Simple Sequences Repeats in Cassava

    PubMed Central

    Vásquez, Andrea; López, Camilo

    2014-01-01

    We conducted an SSR density analysis in different cassava genomic regions. The information obtained was useful to establish comparisons between cassava's SSR genomic distribution and those of poplar, flax, and Jatropha. In general, cassava has a low SSR density (~50 SSRs/Mbp) and a high proportion of pentanucleotides (24.2 SSRs/Mbp). It was found that coding sequences have 15.5 SSRs/Mbp, introns have 82.3 SSRs/Mbp, 5′ UTRs have 196.1 SSRs/Mbp, and 3′ UTRs have 50.5 SSRs/Mbp. Through motif analysis of cassava's genome SSRs, the most abundant motif was AT/AT, while in intron sequences and UTR regions it was AG/CT. In addition, in coding sequences the motif AAG/CTT was also found to occur most frequently; in fact, it is the third most used codon in cassava. Sequences containing SSRs were classified according to their functional annotation of Gene Ontology categories. The SSRs identified here may be a valuable addition for genetic mapping and future studies in phylogenetic analyses and genomic evolution. PMID:25374887
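
    As a rough illustration of how an SSR density in SSRs/Mbp can be obtained, the sketch below counts perfect repeats with regular expressions and divides by the sequence length in megabases. It is not the authors' pipeline: the abstract does not state the detection tool or thresholds, so the minimum repeat counts in MIN_REPEATS, the restriction to perfect repeats with motifs of 1 to 6 bp, and the function names are all assumptions chosen for the example.

```python
import re

# Hypothetical thresholds: minimum number of repeats per motif length (bp).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def is_primitive(motif):
    """True if motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(
        n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n)
    )


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for m, min_rep in MIN_REPEATS.items():
        # A motif of length m followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (m, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip e.g. 'ATAT' runs, which are already counted as 'AT' runs.
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // m))
    return sorted(hits)


def ssr_density(seq):
    """SSRs per megabase pair for one sequence or genomic region."""
    return len(find_ssrs(seq)) / (len(seq) / 1e6)


if __name__ == "__main__":
    # Toy 41 bp region containing one (AT)8 and one (AAG)5 repeat.
    region = "GGC" + "AT" * 8 + "CCGTA" + "AAG" * 5 + "TC"
    for start, motif, count in find_ssrs(region):
        print(start, motif, count)
    print(round(ssr_density(region)), "SSRs/Mbp")
```

    Applying ssr_density separately to coding sequences, introns, and UTRs would give per-region figures of the kind reported above, although the absolute values depend strongly on the chosen thresholds.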

  1. A Primary Sequence Analysis of the ARGONAUTE Protein Family in Plants

    PubMed Central

    Rodríguez-Leal, Daniel; Castillo-Cobián, Amanda; Rodríguez-Arévalo, Isaac; Vielle-Calzada, Jean-Philippe

    2016-01-01

    Small RNA (sRNA)-mediated gene silencing represents a conserved regulatory mechanism controlling a wide diversity of developmental processes through interactions of sRNAs with proteins of the ARGONAUTE (AGO) family. On the basis of a large phylogenetic analysis that includes 206 AGO genes belonging to 23 plant species, AGO genes group into four clades corresponding to the phylogenetic distribution proposed for the ten family members of Arabidopsis thaliana. A primary analysis of the corresponding protein sequences resulted in 50 sequences of amino acids (blocks) conserved across their linear length. Protein members of the AGO4/6/8/9 and AGO1/10 clades are more conserved than members of the AGO5 and AGO2/3/7 clades. In addition to blocks containing components of the PIWI, PAZ, and DUF1785 domains, members of the AGO2/3/7 and AGO4/6/8/9 clades possess other consensus block sequences that are exclusive of members within these clades, suggesting unforeseen functional specialization revealed by their primary sequence. We also show that AGO proteins of animal and plant kingdoms share linear sequences of blocks that include motifs involved in posttranslational modifications such as those regulating AGO2 in humans and the PIWI protein AUBERGINE in Drosophila. Our results open possibilities for exploring new structural and functional aspects related to the evolution of AGO proteins within the plant kingdom, and their convergence with analogous proteins in mammals and invertebrates.

  2. Cloning and sequence analysis of candidate human natural killer-enhancing factor genes

    SciTech Connect

    Shau, H.; Butterfield, L.H.; Chiu, R.; Kim, A.

    1994-12-31

    A cytosol factor from human red blood cells enhances natural killer (NK) activity. This factor, termed NK-enhancing factor (NKEF), is a protein of 44000 Mr consisting of two subunits of equal size linked by disulfide bonds. NKEF is expressed in the NK-sensitive erythroleukemic cell line K562. Using an antibody specific for NKEF as a probe for immunoblot screening, we isolated several clones from a λgt11 cDNA library of K562. Additional subcloning and sequencing revealed that the candidate NKEF cDNAs fell into one of two categories of closely related but non-identical genes, referred to as NKEF A and B. They are 88% identical in amino acid sequence and 71% identical in nucleotide sequence. Southern blot analysis suggests that there are two to three NKEF family members in the genome. Analysis of predicted amino acid sequences indicates that both NKEF A and B are cytosol proteins with several phosphorylation sites each, but that they have no glycosylation sites. They are significantly homologous to several other proteins from a wide variety of organisms ranging from prokaryotes to mammals, especially with regard to several well-conserved motifs within the amino acid sequences. The biological functions of these proteins in other species are mostly unknown, but some of them were reported to be induced by oxidative stress. Therefore, as well as for immunoregulation of NK activity, NKEF may be important for cells in coping with oxidative insults. 32 refs., 3 figs.

  3. Infectious hypodermal and hematopoietic necrosis virus from Brazil: Sequencing, comparative analysis and PCR detection.

    PubMed

    Silva, Douglas C D; Nunes, Allan R D; Teixeira, Dárlio I A; Lima, João Paulo M S; Lanza, Daniel C F

    2014-08-30

    A 3739 nucleotide fragment of Infectious hypodermal and hematopoietic necrosis virus (IHHNV) from Brazil was amplified and sequenced. This fragment contains the entire coding sequences of viral proteins, the full 3' untranslated region (3'UTR) and a partial sequence of the 5' untranslated region (5'UTR). The genome organization of IHHNV revealed the three typical major coding domains: a left ORF1 of 2001 bp that codes NS1, a left ORF2 (NS2) of 1091 bp that codes NS2 and a right ORF3 of 990 bp that codes VP. Nucleotide and amino acid sequences of the three viral proteins were compared with putative amino acid sequences of viruses reported from different regions. Comparisons among genomes from different geographic locations reveal 31 nucleotide regions that are 100% similar, distributed throughout the genome. An analysis of the secondary structure of UTR regions revealed regions with a high probability of forming hairpins that may be involved in mechanisms of viral replication. Additionally, a maximum likelihood analysis indicates that Brazilian IHHNV belongs to lineage III, in the infectious IHHNV group, and is clustered with IHHNV isolates from Hawaii, China, Taiwan, Vietnam and South Korea. A new nested PCR targeting conserved nucleotide regions is proposed to detect IHHNV. PMID:24867614

  4. Additional challenges for uncertainty analysis in river engineering

    NASA Astrophysics Data System (ADS)

    Berends, Koen; Warmink, Jord; Hulscher, Suzanne

    2016-04-01

    the proposed intervention. The implicit assumption underlying such analysis is that both models are commensurable. We hypothesize that they are commensurable only to a certain extent. In an idealised study we have demonstrated that prediction performance loss should be expected with increasingly large engineering works. When accounting for parametric uncertainty of floodplain roughness in model identification, we see uncertainty bounds for predicted effects of interventions increase with increasing intervention scale. Calibration of these types of models therefore seems to have a shelf-life, beyond which calibration no longer improves prediction. Therefore a qualification scheme for model use is required that can be linked to model validity. In this study, we characterize model use along three dimensions: extrapolation (using the model with different external drivers), extension (using the model for different output or indicators) and modification (using modified models). Such use of models is expected to have implications for the applicability of surrogate modelling for efficient uncertainty analysis as well, which is recommended for future research. Warmink, J. J.; Straatsma, M. W.; Huthoff, F.; Booij, M. J. & Hulscher, S. J. M. H. 2013. Uncertainty of design water levels due to combined bed form and vegetation roughness in the Dutch river Waal. Journal of Flood Risk Management 6, 302-318. DOI: 10.1111/jfr3.12014

  5. Lineage analysis by microsatellite loci deep sequencing in mice.

    PubMed

    Luo, Tao; He, Xionglei; Xing, Ke

    2016-05-01

    Lineage analysis is the identification of all the progeny of a single progenitor cell, and has become particularly useful for studying developmental processes and cancer biology. Here, we propose a novel and effective method for lineage analysis that combines sequence capture and next-generation sequencing technology. Genome-wide mononucleotide and dinucleotide microsatellite loci in eight samples from two mice were identified and used to construct phylogenetic trees based on somatic indel mutations at these loci, which were unique enough to distinguish and parse samples from different mice into different groups along the lineage tree. For example, biopsies from the liver and stomach, which originate from the endoderm, were located in the same clade, while samples in kidney, which originate from the mesoderm, were located in another clade. Yet, tissue with a common developmental origin may still contain cells of a mixed ancestry. This genome-wide approach thus provides a non-invasive lineage analysis method based on mutations that accumulate in the genomes of opaque multicellular organism somatic cells. Mol. Reprod. Dev. 83: 387-391, 2016. © 2016 Wiley Periodicals, Inc. PMID:26932355

  6. Trypanosoma cruzi: sequence analysis of the variable region of kinetoplast minicircles.

    PubMed

    Telleria, Jenny; Lafay, Bénédicte; Virreira, Myrna; Barnabé, Christian; Tibayrenc, Michel; Svoboda, Michal

    2006-12-01

    The comparison of 170 sequences of the kinetoplast DNA minicircle hypervariable region obtained from 19 stocks of Trypanosoma cruzi and 2 stocks of Trypanosoma cruzi marinkellei showed that only 56% exhibited significant homology with other sequences. These sequences could be grouped into homology classes showing no significant sequence similarity with any other homology group. The remaining 44% of sequences thus corresponded to unique sequences in our data set. In the DTU I ("Discrete Typing Units") 51% of the sequences were unique. In contrast, in the DTU IId, 87.5% of sequences were distributed into three classes. The results obtained for T. cruzi marinkellei showed that all sequences were unique, without any similarity between them and T. cruzi sequences. Analysis of palindromes in all sequence sets shows a high frequency of the EcoRI site. Analysis of repetitive sequences suggested a common ancestral origin of the kDNA. The editing mechanism that occurs in kinetoplastidae is discussed. PMID:16730709

  7. Integrated visual analysis of protein structures, sequences, and feature data

    PubMed Central

    2015-01-01

    Background To understand the molecular mechanisms that give rise to a protein's function, biologists often need to (i) find and access all related atomic-resolution 3D structures, and (ii) map sequence-based features (e.g., domains, single-nucleotide polymorphisms, post-translational modifications) onto these structures. Results To streamline these processes we recently developed Aquaria, a resource offering unprecedented access to protein structure information based on an all-against-all comparison of SwissProt and PDB sequences. In this work, we provide a requirements analysis for several frequently occurring tasks in molecular biology and describe how design choices in Aquaria meet these requirements. Finally, we show how the interface can be used to explore features of a protein and gain biologically meaningful insights in two case studies conducted by domain experts. Conclusions The user interface design of Aquaria enables biologists to gain unprecedented access to molecular structures and simplifies the generation of insight. The tasks involved in mapping sequence features onto structures can be conducted more easily and quickly using Aquaria. PMID:26329268

  8. BLASTGrabber: a bioinformatic tool for visualization, analysis and sequence selection of massive BLAST data

    PubMed Central

    2014-01-01

    Background Advances in sequencing efficiency have vastly increased the sizes of biological sequence databases, including many thousands of genome-sequenced species. The BLAST algorithm remains the main search engine for retrieving sequence information, and must consequently handle data on an unprecedented scale. This has been possible due to high-performance computers and parallel processing. However, the raw BLAST output from contemporary searches involving thousands of queries becomes ill-suited for direct human processing. Few programs attempt to directly visualize and interpret BLAST output; those that do often provide a mere basic structuring of BLAST data. Results Here we present a bioinformatics application named BLASTGrabber suitable for high-throughput sequencing analysis. BLASTGrabber, being implemented as a Java application, is OS-independent and includes a user friendly graphical user interface. Text or XML-formatted BLAST output files can be directly imported, displayed and categorized based on BLAST statistics. Query names and FASTA headers can be analysed by text-mining. In addition to visualizing sequence alignments, BLAST data can be ordered as an interactive taxonomy tree. All modes of analysis support selection, export and storage of data. A Java interface-based plugin structure facilitates the addition of customized third party functionality. Conclusion The BLASTGrabber application introduces new ways of visualizing and analysing massive BLAST output data by integrating taxonomy identification, text mining capabilities and generic multi-dimensional rendering of BLAST hits. The program aims at a non-expert audience in terms of computer skills; the combination of new functionalities makes the program flexible and useful for a broad range of operations. PMID:24885091

  9. Experience using web services for biological sequence analysis.

    PubMed

    Stockinger, Heinz; Attwood, Teresa; Chohan, Shahid Nadeem; Côté, Richard; Cudré-Mauroux, Philippe; Falquet, Laurent; Fernandes, Pedro; Finn, Robert D; Hupponen, Taavi; Korpelainen, Eija; Labarga, Alberto; Laugraud, Aurelie; Lima, Tania; Pafilis, Evangelos; Pagni, Marco; Pettifer, Steve; Phan, Isabelle; Rahman, Nazim

    2008-11-01

    Programmatic access to data and tools through the web using so-called web services has an important role to play in bioinformatics. In this article, we discuss the most popular approaches based on SOAP/WS-I and REST and describe our experiences, as a cross section of the community, with providing and using web services in the context of biological sequence analysis. We briefly review the main technological approaches as well as best practice hints that are useful for both users and developers. Finally, syntactic and semantic data integration issues with multiple web services are discussed. PMID:18621748

  10. Determining physical constraints in transcriptional initiation complexes using DNA sequence analysis

    SciTech Connect

    Shultzaberger, Ryan K.; Chiang, Derek Y.; Moses, Alan M.; Eisen, Michael B.

    2007-07-01

    Eukaryotic gene expression is often under the control of cooperatively acting transcription factors whose binding is limited by structural constraints. By determining these structural constraints, we can understand the "rules" that define functional cooperativity. Conversely, by understanding the rules of binding, we can infer structural characteristics. We have developed an information theory based method for approximating the physical limitations of cooperative interactions by comparing sequence analysis to microarray expression data. When applied to the coordinated binding of the sulfur amino acid regulatory protein Met4 by Cbf1 and Met31, we were able to create a combinatorial model that can correctly identify Met4 regulated genes.

  11. Generalization of Entropy Based Divergence Measures for Symbolic Sequence Analysis

    PubMed Central

    Ré, Miguel A.; Azad, Rajeev K.

    2014-01-01

    Entropy based measures have been frequently used in symbolic sequence analysis. A symmetrized and smoothed form of Kullback-Leibler divergence or relative entropy, the Jensen-Shannon divergence (JSD), is of particular interest because of its sharing properties with families of other divergence measures and its interpretability in different domains including statistical physics, information theory and mathematical statistics. The uniqueness and versatility of this measure arise because of a number of attributes including generalization to any number of probability distributions and association of weights to the distributions. Furthermore, its entropic formulation allows its generalization in different statistical frameworks, such as, non-extensive Tsallis statistics and higher order Markovian statistics. We revisit these generalizations and propose a new generalization of JSD in the integrated Tsallis and Markovian statistical framework. We show that this generalization can be interpreted in terms of mutual information. We also investigate the performance of different JSD generalizations in deconstructing chimeric DNA sequences assembled from bacterial genomes including that of E. coli, S. enterica typhi, Y. pestis and H. influenzae. Our results show that the JSD generalizations bring in more pronounced improvements when the sequences being compared are from phylogenetically proximal organisms, which are often difficult to distinguish because of their compositional similarity. While small but noticeable improvements were observed with the Tsallis statistical JSD generalization, relatively large improvements were observed with the Markovian generalization. In contrast, the proposed Tsallis-Markovian generalization yielded more pronounced improvements relative to the Tsallis and Markovian generalizations, specifically when the sequences being compared arose from phylogenetically proximal organisms. PMID:24728338
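
    As a rough illustration of the base measure discussed in this abstract, the weighted Jensen-Shannon divergence of several distributions can be written as the entropy of their mixture minus the weighted mean of their entropies. The sketch below covers only this standard Shannon form; it does not implement the Tsallis, Markovian or Tsallis-Markovian generalizations, and the function names (shannon_entropy, jensen_shannon_divergence, base_frequencies) are hypothetical, chosen here for illustration.

```python
import math


def shannon_entropy(p):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def jensen_shannon_divergence(distributions, weights=None):
    """Weighted JSD: H(sum_i w_i * p_i) - sum_i w_i * H(p_i).

    `distributions` is a list of equal-length probability vectors.
    `weights` defaults to uniform weights (an assumption of this sketch).
    """
    n = len(distributions)
    if weights is None:
        weights = [1.0 / n] * n
    size = len(distributions[0])
    # Mixture distribution: weighted average of the input distributions.
    mixture = [
        sum(w * p[k] for w, p in zip(weights, distributions))
        for k in range(size)
    ]
    mean_entropy = sum(w * shannon_entropy(p) for w, p in zip(weights, distributions))
    return shannon_entropy(mixture) - mean_entropy


def base_frequencies(seq, alphabet="ACGT"):
    """Relative frequency of each alphabet symbol in a sequence."""
    counts = [seq.count(b) for b in alphabet]
    total = sum(counts)
    return [c / total for c in counts]


# Compare the base composition of two toy segments (made-up sequences).
left = base_frequencies("AAAATTTTAAAT")
right = base_frequencies("GGGCCCGCGGCC")
divergence = jensen_shannon_divergence([left, right])
```

    In a segmentation setting, the weights would typically be proportional to the lengths of the two segments being compared rather than uniform.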

  12. Generalization of entropy based divergence measures for symbolic sequence analysis.

    PubMed

    Ré, Miguel A; Azad, Rajeev K

    2014-01-01

    Entropy based measures have been frequently used in symbolic sequence analysis. A symmetrized and smoothed form of Kullback-Leibler divergence or relative entropy, the Jensen-Shannon divergence (JSD), is of particular interest because of its sharing properties with families of other divergence measures and its interpretability in different domains including statistical physics, information theory and mathematical statistics. The uniqueness and versatility of this measure arise because of a number of attributes including generalization to any number of probability distributions and association of weights to the distributions. Furthermore, its entropic formulation allows its generalization in different statistical frameworks, such as, non-extensive Tsallis statistics and higher order Markovian statistics. We revisit these generalizations and propose a new generalization of JSD in the integrated Tsallis and Markovian statistical framework. We show that this generalization can be interpreted in terms of mutual information. We also investigate the performance of different JSD generalizations in deconstructing chimeric DNA sequences assembled from bacterial genomes including that of E. coli, S. enterica typhi, Y. pestis and H. influenzae. Our results show that the JSD generalizations bring in more pronounced improvements when the sequences being compared are from phylogenetically proximal organisms, which are often difficult to distinguish because of their compositional similarity. While small but noticeable improvements were observed with the Tsallis statistical JSD generalization, relatively large improvements were observed with the Markovian generalization. In contrast, the proposed Tsallis-Markovian generalization yielded more pronounced improvements relative to the Tsallis and Markovian generalizations, specifically when the sequences being compared arose from phylogenetically proximal organisms. PMID:24728338

  13. Sequence analysis of the Choristoneura occidentalis granulovirus genome.

    PubMed

    Escasa, Shannon R; Lauzon, Hilary A M; Mathur, Amanda C; Krell, Peter J; Arif, Basil M

    2006-07-01

    The genome of the Choristoneura occidentalis granulovirus (ChocGV) isolated from the western spruce budworm, Choristoneura occidentalis, was sequenced completely. It was 104,710 bp long, with a 67.3% A+T content and contained 116 potential open reading frames (ORFs) covering 88.4% of the genome. Of these, 29 ORFs were conserved in all fully sequenced baculovirus genomes, 30 were GV-specific, 53 were present in some nucleopolyhedroviruses (NPVs) and/or GVs, three were common to ChocGV and Choristoneura fumiferana GV (ChfuGV) and one was so far unique. To date, ChocGV is the only GV identified that contains a homologue of the apoptosis inhibitor protein P35/P49, present in some group I NPVs. It is also the first GV without a Xestia c-nigrum GV ORF 26 homologue. Five homologous regions (hrs)/repeat regions, lacking typical NPV hr palindromes were identified. ChocGV hrs were similar to each other but not to other GV hrs. A 1.8 kb repeat region with a high A+T content (81%) and multiple repeats of 21-210 bp was found between choc36 and 37. This area resembled the non-homologous region origin of DNA replication (non-hr ori) identified in Cryptophlebia leucotreta GV (CrleGV) and Cydia pomonella GV (CpGV). Based on the mean amino acid identities of homologous proteins, ChocGV was closest to fully sequenced genomes CpGV (52.3%) and CrleGV (52.1%). The closest amino acid identity was to individual ORFs from the partially sequenced ChfuGV genome (97.2% in 38 ORFs). Phylogenetic analysis placed ChocGV in a clade with CrleGV and CpGV. PMID:16760394

  14. Isolation and sequence analysis of napin seed specific promoter from Iranian Rapeseed (Brassica napus L.).

    PubMed

    Sohrabi, Maryam; Zebarjadi, Alireza; Najaphy, Abdollah; Kahrizi, Danial

    2015-06-01

    Rapeseed (Brassica napus L.) has become an important crop during the last 30 years. In addition to a high lipid level, the seeds also have a significant protein content, which constitutes 20-25% of the dry seed weight. The synthesis of storage proteins is primarily controlled at the transcriptional level, and seed-specific expression has been shown to be conferred by the promoter regions of many storage protein genes. Napin is one of the main storage proteins in the rapeseed embryo and is produced during seed development. Its promoter region, located 5' upstream of the napin gene, has already been isolated (GenBank number, EU416279.1). In the current research, the seed-specific promoter (napin) of Iranian B. napus L. was isolated from genomic DNA and cloned into the pBI121 plant binary vector for use in future research. For this purpose, the napin promoter was amplified by PCR using specific primers, cloned in the pSK(+) vector and sequenced. Sequencing analysis showed that the cloned promoter contained all of the conserved motifs, such as the TATA box (TATAAA), RY repeats (CATGCA), dist-B (TCAAACACC) and prox-B elements (GCCACTTGTC), G-box (CACGTG) and CAAT motifs, which constitute the seed-specific promoter activity; according to this analysis, seed-specific promoter activity of the cloned sequence was predicted. Based on sequence distances of nucleotide sequences, our sequence had the highest similarity (99.8%) with the B. napus sequence (accession number EU416279.1). Finally, the promoter obtained might be of interest not only as a useful tool for biotechnological application but also for fundamental research. PMID:25797503
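
    The motif check described in this abstract can be approximated by an exact-match scan for the listed consensus strings. The sketch below is only an illustration under stated assumptions: it searches a single strand, allows no mismatches, omits the CAAT motif because the abstract gives no sequence for it, and uses a made-up toy sequence rather than the cloned promoter. The names MOTIFS and find_motifs are hypothetical.

```python
# Consensus strings as listed in the abstract (CAAT motif omitted: no sequence given).
MOTIFS = {
    "TATA box": "TATAAA",
    "RY repeat": "CATGCA",
    "dist-B": "TCAAACACC",
    "prox-B": "GCCACTTGTC",
    "G-box": "CACGTG",
}


def find_motifs(sequence, motifs=MOTIFS):
    """Return {motif name: [0-based start positions]} for exact matches.

    Only the given strand is scanned; overlapping matches are reported.
    """
    seq = sequence.upper()
    hits = {}
    for name, motif in motifs.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)
        hits[name] = positions
    return hits


# Toy sequence for illustration only, not the napin promoter.
toy_promoter = "ggTATAAAccCACGTGtt"
hits = find_motifs(toy_promoter)
missing = [name for name, positions in hits.items() if not positions]
```

    A fuller analysis would also scan the reverse complement and tolerate degenerate positions, which this sketch deliberately leaves out.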

  15. Comparing Apples and Oranges?: Next Generation Sequencing and Its Impact on Microbiome Analysis

    PubMed Central

    Sleator, Roy D.; O’ Driscoll, Aisling; Stanton, Catherine; Cotter, Paul D.; Claesson, Marcus J.

    2016-01-01

    Rapid advancements in sequencing technologies along with falling costs present widespread opportunities for microbiome studies across a vast and diverse array of environments. These impressive technological developments have been accompanied by a considerable growth in the number of methodological variables, including sampling, storage, DNA extraction, primer pairs, sequencing technology, chemistry version, read length, insert size, and analysis pipelines, amongst others. This increase in variability threatens to compromise both the reproducibility and the comparability of studies conducted. Here we perform the first reported study comparing both amplicon and shotgun sequencing for the three leading next-generation sequencing technologies. These were applied to six human stool samples using Illumina HiSeq, MiSeq and Ion PGM shotgun sequencing, as well as amplicon sequencing across two variable 16S rRNA gene regions. Notably, we found that the factor responsible for the greatest variance in microbiota composition was the chosen methodology rather than the natural inter-individual variance, which is commonly one of the most significant drivers in microbiome studies. Amplicon sequencing suffered from this to a large extent, and this issue was particularly apparent when the 16S rRNA V1-V2 region amplicons were sequenced with MiSeq. Somewhat surprisingly, the choice of taxonomic binning software for shotgun sequences proved to be of crucial importance with even greater discriminatory power than sequencing technology and choice of amplicon. Optimal N50 assembly values for the HiSeq were obtained for 10 million reads per sample, whereas the applied MiSeq and PGM sequencing depths proved less sufficient for shotgun sequencing of stool samples. The latter technologies, on the other hand, provide a better basis for functional gene categorisation, possibly due to their longer read lengths. Hence, in addition to highlighting methodological biases, this study demonstrates the

  16. Comparing Apples and Oranges?: Next Generation Sequencing and Its Impact on Microbiome Analysis.

    PubMed

    Clooney, Adam G; Fouhy, Fiona; Sleator, Roy D; O' Driscoll, Aisling; Stanton, Catherine; Cotter, Paul D; Claesson, Marcus J

    2016-01-01

    Rapid advancements in sequencing technologies along with falling costs present widespread opportunities for microbiome studies across a vast and diverse array of environments. These impressive technological developments have been accompanied by a considerable growth in the number of methodological variables, including sampling, storage, DNA extraction, primer pairs, sequencing technology, chemistry version, read length, insert size, and analysis pipelines, amongst others. This increase in variability threatens to compromise both the reproducibility and the comparability of studies conducted. Here we perform the first reported study comparing both amplicon and shotgun sequencing for the three leading next-generation sequencing technologies. These were applied to six human stool samples using Illumina HiSeq, MiSeq and Ion PGM shotgun sequencing, as well as amplicon sequencing across two variable 16S rRNA gene regions. Notably, we found that the factor responsible for the greatest variance in microbiota composition was the chosen methodology rather than the natural inter-individual variance, which is commonly one of the most significant drivers in microbiome studies. Amplicon sequencing suffered from this to a large extent, and this issue was particularly apparent when the 16S rRNA V1-V2 region amplicons were sequenced with MiSeq. Somewhat surprisingly, the choice of taxonomic binning software for shotgun sequences proved to be of crucial importance with even greater discriminatory power than sequencing technology and choice of amplicon. Optimal N50 assembly values for the HiSeq were obtained for 10 million reads per sample, whereas the applied MiSeq and PGM sequencing depths proved less sufficient for shotgun sequencing of stool samples. The latter technologies, on the other hand, provide a better basis for functional gene categorisation, possibly due to their longer read lengths. Hence, in addition to highlighting methodological biases, this study demonstrates the

  17. Review of Current Methods, Applications, and Data Management for the Bioinformatics Analysis of Whole Exome Sequencing

    PubMed Central

    Bao, Riyue; Huang, Lei; Andrade, Jorge; Tan, Wei; Kibbe, Warren A; Jiang, Hongmei; Feng, Gang

    2014-01-01

    The advent of next-generation sequencing technologies has greatly promoted advances in the study of human diseases at the genomic, transcriptomic, and epigenetic levels. Exome sequencing, where the coding region of the genome is captured and sequenced at a deep level, has proven to be a cost-effective method to detect disease-causing variants and discover gene targets. In this review, we outline the general framework of whole exome sequence data analysis. We focus on established bioinformatics tools and applications that support five analytical steps: raw data quality assessment, pre-processing, alignment, post-processing, and variant analysis (detection, annotation, and prioritization). We evaluate the performance of open-source alignment programs and variant calling tools using simulated and benchmark datasets, and highlight the challenges posed by the lack of concordance among variant detection tools. Based on these results, we recommend adopting multiple tools and resources to reduce false positives and increase the sensitivity of variant calling. In addition, we briefly discuss the current status and solutions for big data management, analysis, and summarization in the field of bioinformatics. PMID:25288881

  18. Genome cluster database. A sequence family analysis platform for Arabidopsis and rice.

    PubMed

    Horan, Kevin; Lauricha, Josh; Bailey-Serres, Julia; Raikhel, Natasha; Girke, Thomas

    2005-05-01

    The genome-wide protein sequences from Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) spp. japonica were clustered into families using sequence similarity and domain-based clustering. The two fundamentally different methods resulted in separate cluster sets with complementary properties that compensate for each other's limitations in accurate family analysis. Functional names for the identified families were assigned with an efficient computational approach that uses the description of the most common molecular function gene ontology node within each cluster. Subsequently, multiple alignments and phylogenetic trees were calculated for the assembled families. All clustering results and their underlying sequences were organized in the Web-accessible Genome Cluster Database (http://bioinfo.ucr.edu/projects/GCD) with rich interactive and user-friendly sequence family mining tools to facilitate the analysis of any given family of interest for the plant science community. An automated clustering pipeline ensures current information for future updates in the annotations of the two genomes and clustering improvements. The analysis allowed the first systematic identification of family and singlet proteins present in both organisms as well as those restricted to one of them. In addition, the established Web resources for mining these data provide a road map for future studies of the composition and structure of protein families between the two species. PMID:15888677

  19. Genome Sequencing and Analysis of Catopsilia pomona nucleopolyhedrovirus: A Distinct Species in Group I Alphabaculovirus

    PubMed Central

    Wang, Jun; Zhu, Zheng; Zhang, Lei; Hou, Dianhai; Wang, Manli; Arif, Basil; Kou, Zheng; Wang, Hualin; Deng, Fei; Hu, Zhihong

    2016-01-01

    The genome sequence of Catopsilia pomona nucleopolyhedrovirus (CapoNPV) was determined by the Roche 454 sequencing system. The genome consisted of 128,058 bp and had an overall G+C content of 40%. There were 130 hypothetical open reading frames (ORFs) potentially encoding proteins of more than 50 amino acids and covering 92% of the genome. Among all the hypothetical ORFs, 37 baculovirus core genes, 23 lepidopteran baculovirus conserved genes and 10 genes conserved in Group I alphabaculoviruses were identified. In addition, the genome included regions of 8 typical baculoviral homologous repeat sequences (hrs). Phylogenetic analysis showed that CapoNPV was in a distinct branch of clade “a” in Group I alphabaculoviruses. Gene parity plot analysis and overall similarity of ORFs indicated that CapoNPV is more closely related to the Group I alphabaculoviruses than to other baculoviruses. Interestingly, CapoNPV lacks the genes encoding the fibroblast growth factor (fgf) and ac30, which are conserved in most lepidopteran and Group I baculoviruses, respectively. Sequence analysis of the F-like protein of CapoNPV showed that some amino acids were inserted into the fusion peptide region and the pre-transmembrane region of the protein. All these unique features imply that CapoNPV represents a member of a new baculovirus species. PMID:27166956

  20. Structural and transcriptional analysis of human papillomavirus type 16 sequences in cervical carcinoma cell lines.

    PubMed Central

    Baker, C C; Phelps, W C; Lindgren, V; Braun, M J; Gonda, M A; Howley, P M

    1987-01-01

    We cloned and analyzed the integrated human papillomavirus type 16 (HPV-16) genomes that are present in the human cervical carcinoma cell lines SiHa and CaSki. The single HPV-16 genome in the SiHa line was cloned as a 10-kilobase (kb) HindIII fragment. Integration of the HPV-16 genome occurred at bases 3132 and 3384 with disruption of the E2 and E4 open reading frames (ORFs). An additional 52-base-pair deletion of HPV-16 sequences fused the E2 and E4 ORFs. The 5' portion of the disrupted E2 ORF terminated immediately in the contiguous human right-flanking sequences. Heteroduplex analysis of this cloned integrated viral genome with the prototype HPV-16 DNA revealed no other deletions, insertions, or rearrangements. DNA sequence analysis of the E1 ORF, however, revealed the presence of an additional guanine at nucleotide 1138, resulting in the fusion of the E1a and E1b ORFs into a single E1 ORF. Sequence analysis of the human flanking sequences revealed one-half of an Alu sequence at the left junction and a sequence highly homologous to the human O repeat in the right-flanking region. Analysis of the three most abundant BamHI clones from the CaSki line showed that these consisted of full-length, 7.9-kb HPV-16 DNA; a 6.5-kb genome resulting from a 1.4-kb deletion of the long control region; and a 10.5-kb clone generated by a 2.6-kb tandem repeat of the 3' early region. These HPV-16 genomes were arranged in the host chromosomes as head-to-tail, tandemly repeated arrays. Transcription analysis revealed expression of the HPV-16 genome in each of these two cervical carcinoma cell lines, albeit at significantly different levels. Preliminary mapping of the viral RNA with subgenomic strand-specific probes indicated that viral transcription appeared to be derived primarily from the E6 and E7 ORFs. PMID:3029430

  1. Streaming Support for Data Intensive Cloud-Based Sequence Analysis

    PubMed Central

    Issa, Shadi A.; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J.; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed

    2013-01-01

    Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation. PMID:23710461
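
    The idea of processing reads while the data is still arriving can be sketched with a generator that emits each record as soon as it is complete. This is not the elastream package or its API; it is a minimal illustration that assumes plain four-line FASTQ records arriving as arbitrary text chunks, and the names fastq_records and gc_fraction are hypothetical.

```python
def fastq_records(chunks):
    """Yield (header, sequence, quality) as soon as each record is complete.

    `chunks` is any iterable of text fragments (for example, pieces read
    from a network stream). Assumes simple four-line FASTQ records with no
    wrapped sequence lines.
    """
    buffer = ""
    lines = []
    for chunk in chunks:
        buffer += chunk
        # Everything before the last newline is complete; keep the tail.
        *complete, buffer = buffer.split("\n")
        for line in complete:
            lines.append(line)
            if len(lines) == 4:
                yield lines[0], lines[1], lines[3]
                lines = []
    # Handle a final line that was not terminated by a newline.
    if buffer:
        lines.append(buffer)
        if len(lines) == 4:
            yield lines[0], lines[1], lines[3]


def gc_fraction(seq):
    """Per-read statistic that needs no information from other reads."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


# Chunk boundaries deliberately fall in the middle of records.
incoming = ["@r1\nAC", "GT\n+\nIIII\n@r2\nGG", "CC\n+\nIIII"]
results = [(header, gc_fraction(seq)) for header, seq, _ in fastq_records(incoming)]
```

    Because each read is handled independently, work on early records can begin before later chunks have been transferred, which is the property the abstract relies on.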

  2. Data Analysis for Sequencing by Hybridization (SBH) Experiments

    SciTech Connect

    Salbego, David

    1995-11-28

    SCORES is user friendly software designed to analyze data from SBH (Sequencing By Hybridization) experiments. In these ANL experiments DNA samples are spotted on a nylon membrane and hybridized with radioactively labeled oligonucleotide probes. An image analysis program (DOTS) calculates a raw value for each DNA dot from images generated by the Molecular Dynamics Phosphorimager. SCORES reads in the DOTS output for each hybridization done for a particular filter. The data for each probe is normalized against a mass probe and scaled properly. These values from 100 or more probes are then used to compute the distance (i.e., degree of similarity) between any two clones on the filter. These calculated distances define clusters of similar clones (cDNA) or contigs (genomic DNA). Histograms of the data are generated at each stage of analysis to establish thresholds for further steps. SCORES generates various statistical tables to evaluate the quality of spotting, hybridization of filters, and of individual dots.

  3. Informatics for RNA Sequencing: A Web Resource for Analysis on the Cloud.

    PubMed

    Griffith, Malachi; Walker, Jason R; Spies, Nicholas C; Ainscough, Benjamin J; Griffith, Obi L

    2015-08-01

    Massively parallel RNA sequencing (RNA-seq) has rapidly become the assay of choice for interrogating RNA transcript abundance and diversity. This article provides a detailed introduction to fundamental RNA-seq molecular biology and informatics concepts. We make available open-access RNA-seq tutorials that cover cloud computing, tool installation, relevant file formats, reference genomes, transcriptome annotations, quality-control strategies, expression, differential expression, and alternative splicing analysis methods. These tutorials and additional training resources are accompanied by complete analysis pipelines and test datasets made available without encumbrance at www.rnaseq.wiki. PMID:26248053

  4. Informatics for RNA Sequencing: A Web Resource for Analysis on the Cloud

    PubMed Central

    Griffith, Malachi; Walker, Jason R.; Spies, Nicholas C.; Ainscough, Benjamin J.; Griffith, Obi L.

    2015-01-01

    Massively parallel RNA sequencing (RNA-seq) has rapidly become the assay of choice for interrogating RNA transcript abundance and diversity. This article provides a detailed introduction to fundamental RNA-seq molecular biology and informatics concepts. We make available open-access RNA-seq tutorials that cover cloud computing, tool installation, relevant file formats, reference genomes, transcriptome annotations, quality-control strategies, expression, differential expression, and alternative splicing analysis methods. These tutorials and additional training resources are accompanied by complete analysis pipelines and test datasets made available without encumbrance at www.rnaseq.wiki. PMID:26248053

  5. Alignment-free genetic sequence comparisons: a review of recent approaches by word analysis.

    PubMed

    Bonham-Carter, Oliver; Steele, Joe; Bastola, Dhundy

    2014-11-01

    Modern sequencing and genome assembly technologies have provided a wealth of data, which will soon require an analysis by comparison for discovery. Sequence alignment, a fundamental task in bioinformatics research, may be used but with some caveats. Seminal techniques and methods from dynamic programming are proving ineffective for this work owing to their inherent computational expense when processing large amounts of sequence data. These methods are prone to giving misleading information because of genetic recombination, genetic shuffling and other inherent biological events. New approaches from information theory, frequency analysis and data compression are available and provide powerful alternatives to dynamic programming. These new methods are often preferred, as their algorithms are simpler and are not affected by synteny-related problems. In this review, we provide a detailed discussion of computational tools, which stem from alignment-free methods based on statistical analysis from word frequencies. We provide several clear examples to demonstrate applications and the interpretations over several different areas of alignment-free analysis such as base-base correlations, feature frequency profiles, compositional vectors, an improved string composition and the D2 statistic metric. Additionally, we provide detailed discussion and an example of analysis by Lempel-Ziv techniques from data compression. PMID:23904502
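
    One of the word-based measures named in this abstract, the D2 statistic, is commonly defined as the sum over all words of length k of the product of that word's counts in the two sequences. The sketch below shows only this basic, unnormalized form on made-up sequences; the function names word_counts and d2_statistic are hypothetical, and the normalized D2 variants are not covered.

```python
from collections import Counter


def word_counts(seq, k):
    """Count all overlapping words (k-mers) of length k in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_statistic(seq_a, seq_b, k):
    """Basic D2: sum over words w of count_a(w) * count_b(w)."""
    counts_a = word_counts(seq_a, k)
    counts_b = word_counts(seq_b, k)
    # Counter returns 0 for words absent from seq_b, so they add nothing.
    return sum(n * counts_b[word] for word, n in counts_a.items())


# Toy comparison with words of length 3 (sequences are illustrative only).
score_similar = d2_statistic("ACGTACGTAC", "CGTACGTACG", 3)
score_different = d2_statistic("ACGTACGTAC", "TTTTGGGGTT", 3)
```

    No alignment is computed at any point: the comparison depends only on word counts, which is why such measures are unaffected by rearrangements of sequence segments.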

  6. Deep Sequencing Analysis of the Ixodes ricinus Haemocytome

    PubMed Central

    Franta, Zdeněk; Pedra, Joao H. F.; Ribeiro, José M. C.

    2015-01-01

    Background Ixodes ricinus is the main tick vector of the microbes that cause Lyme disease and tick-borne encephalitis in Europe. Pathogens transmitted by ticks have to overcome innate immunity barriers present in tick tissues, including midgut, salivary glands epithelia and the hemocoel. Molecularly, invertebrate immunity is initiated when pathogen recognition molecules trigger serum or cellular signalling cascades leading to the production of antimicrobials, pathogen opsonization and phagocytosis. We presently aimed at identifying hemocyte transcripts from semi-engorged female I. ricinus ticks by mass sequencing a hemocyte cDNA library and annotating immune-related transcripts based on their hemocyte abundance as well as their ubiquitous distribution. Methodology/principal findings De novo assembly of 926,596 pyrosequence reads plus 49,328,982 Illumina reads (148 nt length) from a hemocyte library, together with over 189 million Illumina reads from salivary gland and midgut libraries, generated 15,716 extracted coding sequences (CDS); these are displayed in an annotated hyperlinked spreadsheet format. Read mapping allowed the identification and annotation of tissue-enriched transcripts. A total of 327 transcripts were found significantly overexpressed in the hemocyte libraries, including those coding for scavenger receptors, antimicrobial peptides, pathogen recognition proteins, proteases and protease inhibitors. Vitellogenin and lipid metabolism transcription enrichment suggests fat body components. We additionally annotated ubiquitously distributed transcripts associated with immune function, including immune-associated signal transduction proteins and transcription factors, including the STAT transcription factor. Conclusions/significance This is the first systems biology approach to describe the genes expressed in the haemocytes of this neglected disease vector. A total of 2,860 coding sequences were deposited to GenBank, increasing to 27,547 the number so

  7. Reverse transcriptase domain sequences from tree peony (Paeonia suffruticosa) long terminal repeat retrotransposons: sequence characterization and phylogenetic analysis

    PubMed Central

    Guo, Da-Long; Hou, Xiao-Gai; Jia, Tian

    2014-01-01

    Tree peony is an important horticultural plant worldwide of great ornamental and medicinal value. Long terminal repeat retrotransposons (LTR-retrotransposons) are the major components of most plant genomes and can substantially impact the genome in many ways. It is therefore crucial to understand their sequence characteristics, genetic distribution and transcriptional activity; however, no information about them is available in tree peony. In this study, Ty1-copia-like reverse transcriptase sequences were amplified from tree peony genomic DNA by polymerase chain reaction (PCR) with degenerate oligonucleotide primers corresponding to highly conserved domains of the Ty1-copia-like retrotransposons. PCR fragments of roughly 270 bp were isolated and cloned, and 33 sequences were obtained. According to alignment and phylogenetic analysis, all sequences were divided into six families. The observed differences in the degree of nucleotide sequence similarity indicate a high level of sequence heterogeneity among these clones. Most of these sequences have a frame shift, a stop codon, or both. Dot-blot analysis revealed distribution of these sequences in all the studied tree peony species. However, different hybridization signals were detected among them, which is in agreement with previous systematics studies. Reverse transcriptase PCR (RT-PCR) indicated that Ty1-copia retrotransposons in tree peony were transcriptionally inactive. The results provide basic genetic and evolutionary information on the tree peony genome, and will provide valuable information for the further utilization of retrotransposons in tree peony. PMID:26019529

  8. Universal primers for the amplification and sequence analysis of actin-1 from diverse mosquito species.

    PubMed

    Staley, Molly; Dorman, Karin S; Bartholomay, Lyric C; Fernández-Salas, Ildefonso; Farfan-Ale, Jose A; Loroño-Pino, Maria A; Garcia-Rejon, Julian E; Ibarra-Juarez, Luis; Blitvich, Bradley J

    2010-06-01

    We report the development of universal primers for the reverse-transcription polymerase chain reaction (RT-PCR) amplification and nucleotide sequence analysis of actin cDNAs from taxonomically diverse mosquito species. Primers specific to conserved regions of the invertebrate actin-1 gene were designed after actin cDNA sequences of Anopheles gambiae, Bombyx mori, Drosophila melanogaster, and Caenorhabditis elegans. The efficacy of these primers was determined by RT-PCR with the use of total RNA from mosquitoes belonging to 30 species and 8 genera (Aedes, Anopheles, Culex, Deinocerites, Mansonia, Psorophora, Toxorhynchites, and Wyeomyia). The RT-PCR products were sequenced, and sequence data were used to design additional primers. One primer pair, denoted as Act-2F (5'-ATGGTCGGYATGGGNCAGAAGGACTC-3') and Act-8R (5'-GATTCCATACCCAGGAAGGADGG-3'), successfully amplified an RT-PCR product of the expected size (683-nt) in all mosquito spp. tested. We propose that this primer pair can be used as an internal control to test the quality of RNA from mosquitoes collected in vector surveillance studies. These primers can also be used in molecular experiments in which the detection, amplification or silencing of a ubiquitously expressed mosquito housekeeping gene is necessary. Sequence and phylogenetic data are also presented in this report. PMID:20649132
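
    The two primers quoted above contain IUPAC ambiguity codes (Y, N and D), so each one stands for a small pool of concrete oligonucleotides. The sketch below, which uses the standard IUPAC nucleotide code table, counts and enumerates that pool; the helper names are hypothetical and are not part of the reported work.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

ACT_2F = "ATGGTCGGYATGGGNCAGAAGGACTC"  # primer sequence as given in the abstract
ACT_8R = "GATTCCATACCCAGGAAGGADGG"     # primer sequence as given in the abstract


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    n = 1
    for base in primer.upper():
        n *= len(IUPAC[base])
    return n


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(p) for p in product(*choices)]


if __name__ == "__main__":
    for name, primer in (("Act-2F", ACT_2F), ("Act-8R", ACT_8R)):
        print(name, degeneracy(primer))
```

    A low degeneracy keeps the effective concentration of each individual oligonucleotide high, which is one reason degenerate positions are usually restricted to a few sites.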

  9. Systematic sequencing of the Escherichia coli genome: analysis of the 0-2.4 min region.

    PubMed Central

    Yura, T; Mori, H; Nagai, H; Nagata, T; Ishihama, A; Fujita, N; Isono, K; Mizobuchi, K; Nakata, A

    1992-01-01

    A contiguous 111,402-nucleotide sequence corresponding to the 0 to 2.4 min region of the E. coli chromosome was determined as a first step to complete structural analysis of the genome. The resulting sequence was used to predict open reading frames and to search for sequence similarity against the PIR protein database. A number of novel genes were found whose predicted protein sequences showed significant homology with known proteins from various organisms, including several clusters of genes similar to those involved in fatty acid metabolism in bacteria (e.g., betT, baiF) and higher organisms, iron transport (sfuA, B, C) in Serratia marcescens, and symbiotic nitrogen fixation or electron transport (fixA, B, C, X) in Azorhizobium caulinodans. In addition, several genes and IS elements that had been mapped but not sequenced (e.g., leuA, B, C, D) were identified. We estimate that about 90 genes are represented in this region of the chromosome with little spacer. Images PMID:1630901

  10. Analysis of Long-Term Precipitation Sequencing Pattern Changes in North America

    NASA Astrophysics Data System (ADS)

    Roque, S.; Kumar, P.

    2015-12-01

    This study evaluates changes in long-term precipitation patterns in North America, focusing specifically on precipitation sequencing. Previous precipitation studies have explored changes in extreme precipitation events, intensity, and distribution, but sequencing changes and their effects are still largely not understood. Precipitation sequencing, or the overall temporal pattern of precipitation events, is a vital yet often overlooked part of developing long-term climate predictions; the assumption of long-term stationarity in climate variability, which suggests that past observed temporal patterns are likely to continue and can therefore be projected, weakens the robustness of climate models. Additionally, changes in sequencing could be a driver for fluctuations in the highly interconnected hydrologic cycle, meaning that tipping points and critical changes in the cycle could be better anticipated given a more complete picture of long-term temporal patterns. Analysis was based on precipitation data collected by the National Climatic Data Center for approximately 9000 stations in North America. Temporal patterns recorded at each station - the sequence of consecutive days with or without rain and the lengths of those increments - were reviewed and compared on a decadal and seasonal scale. Comparisons to date indicate that long-term precipitation patterns are non-stationary and therefore cannot be relied upon for long-term climate projections. It remains to be seen how exactly regional temporal patterns have fluctuated over time in North America, and results could provide interesting insight into observed hydrologic changes or serve to reinforce existing theories regarding regional hydrologic studies.

  11. A Markovian analysis of bacterial genome sequence constraints

    PubMed Central

    Skewes, Aaron D.

    2013-01-01

    The arrangement of nucleotides within a bacterial chromosome is influenced by numerous factors. The degeneracy of the third codon within each reading frame allows some flexibility of nucleotide selection; however, the third nucleotide in the triplet of each codon is at least partly determined by the preceding two. This is most evident in organisms with a strong G + C bias, as the degenerate codon must contribute disproportionately to maintaining that bias. Therefore, a correlation exists between the first two nucleotides and the third in all open reading frames. If the arrangement of nucleotides in a bacterial chromosome is represented as a Markov process, we would expect that the correlation would be completely captured by a second-order Markov model and an increase in the order of the model (e.g., third-, fourth-…order) would not capture any additional uncertainty in the process. In this manuscript, we present the results of a comprehensive study of the Markov property that exists in the DNA sequences of 906 bacterial chromosomes. All of the 906 bacterial chromosomes studied exhibit a statistically significant Markov property that extends beyond second-order, and therefore cannot be fully explained by codon usage. An unrooted tree containing all 906 bacterial chromosomes based on their transition probability matrices of third-order shares ∼25% similarity to a tree based on sequence homologies of 16S rRNA sequences. This congruence to the 16S rRNA tree is greater than for trees based on lower-order models (e.g., second-order), and higher-order models result in diminishing improvements in congruence. A nucleotide correlation most likely exists within every bacterial chromosome that extends past three nucleotides. This correlation places significant limits on the number of nucleotide sequences that can represent probable bacterial chromosomes. Transition matrix usage is largely conserved by taxa, indicating that this property is likely inherited, however some
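
    To make the notion of a k-th order Markov model concrete, the following minimal sketch estimates transition probabilities from a nucleotide string by counting which base follows each context of length k. It is only an illustration under simple assumptions (maximum-likelihood counts, no pseudocounts, a single strand); the function name and the toy sequence are hypothetical and do not reproduce the study's procedure.

```python
from collections import Counter, defaultdict


def transition_probabilities(seq, order):
    """Estimate P(next base | preceding `order` bases) from raw counts."""
    counts = defaultdict(Counter)
    for i in range(len(seq) - order):
        context = seq[i:i + order]
        counts[context][seq[i + order]] += 1
    # Normalise each context's counts into probabilities.
    return {
        context: {base: n / sum(followers.values())
                  for base, n in followers.items()}
        for context, followers in counts.items()
    }


if __name__ == "__main__":
    # Toy sequence, chosen only for illustration.
    model = transition_probabilities("ACGACGACT", order=2)
    for context, probs in sorted(model.items()):
        print(context, probs)
```

    With order=2 the context is a dinucleotide, matching the codon-based expectation described in the abstract; raising the order to 3 or more is what allows correlations beyond codon usage to be captured.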

  12. Radar image sequence analysis of inhomogeneous water surfaces

    NASA Astrophysics Data System (ADS)

    Seemann, Joerg; Senet, Christian M.; Dankert, Heiko; Hatten, Helge; Ziemer, Friedwart

    1999-10-01

    The radar backscatter from the ocean surface, called sea clutter, is modulated by the surface wave field. A method was developed to estimate the near-surface current, the water depth and calibrated surface wave spectra from nautical radar image sequences. The algorithm is based on the three-dimensional Fast Fourier Transformation (FFT) of the spatio-temporal sea clutter pattern in the wavenumber-frequency domain. The dispersion relation is used to define a filter to separate the spectral signal of the imaged waves from the background noise component caused by speckle noise. The signal-to-noise ratio (SNR) contains information about the significant wave height. The method has been proved to be reliable for the analysis of homogeneous water surfaces in offshore installations. Radar images are inhomogeneous because of the dependency of the image transfer function (ITF) on the azimuth angle between the wave propagation and the antenna viewing direction. The inhomogeneity of radar imaging is analyzed using image sequences of a homogeneous deep-water surface sampled by a ship-borne radar. Changing water depths in shallow-water regions induce horizontal gradients of the tidal current. Wave refraction occurs due to the spatial variability of the current and water depth. These areas cannot be investigated with the standard method. A new method, based on local wavenumber estimation with the multiple-signal classification (MUSIC) algorithm, is outlined. The MUSIC algorithm provides superior wavenumber resolution on local spatial scales. First results, retrieved from a radar image sequence taken from an installation at a coastal site, are presented.
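
    The abstract does not state the dispersion relation it uses. As a hedged illustration, the sketch below evaluates the standard linear-wave form, omega = sqrt(g k tanh(k d)) + k U, where k is the wavenumber, d the water depth and U the current component along the wave direction. Assuming this form, it shows why current and depth can be estimated from the position of the wave signal in the wavenumber-frequency domain. The function and parameter names are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def dispersion_frequency(k, depth, current_along_k=0.0):
    """Angular frequency (rad/s) of a linear surface gravity wave.

    k               wavenumber magnitude in rad/m
    depth           water depth in m
    current_along_k current component along the wave direction in m/s
                    (adds a Doppler shift k * U)
    """
    intrinsic = math.sqrt(G * k * math.tanh(k * depth))
    return intrinsic + k * current_along_k


if __name__ == "__main__":
    k = 0.1  # rad/m, illustrative value
    for depth in (5.0, 20.0, 1000.0):
        print(depth, dispersion_frequency(k, depth))
```

    In deep water tanh(k d) approaches 1 and the depth dependence disappears, whereas in shallow water the frequency depends strongly on depth. This is consistent with the abstract's remark that areas with changing depth and current need a local method.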

  13. Sequence Analysis of the Genome of the Neodiprion sertifer Nucleopolyhedrovirus†

    PubMed Central

    Garcia-Maruniak, Alejandra; Maruniak, James E.; Zanotto, Paolo M. A.; Doumbouya, Aissa E.; Liu, Jaw-Ching; Merritt, Thomas M.; Lanoie, Jennifer S.

    2004-01-01

    The genome of the Neodiprion sertifer nucleopolyhedrovirus (NeseNPV), which infects the European pine sawfly, N. sertifer (Hymenoptera: Diprionidae), was sequenced and analyzed. The genome was 86,462 bp in size. The C+G content of 34% was lower than that of the majority of baculoviruses. A total of 90 methionine-initiated open reading frames (ORFs) with more than 50 amino acids and minimal overlapping were found. From those, 43 ORFs were homologous to other baculovirus ORFs, and 29 of these were from the 30 conserved core genes among all baculoviruses. A NeseNPV homolog to the ld130 gene, which is present in all other baculovirus genomes sequenced to date, could not be identified. Six NeseNPV ORFs were similar to non-baculovirus-related genes, one of which was a trypsin-like gene. Only one iap gene, containing a single BIR motif and a RING finger, was found in NeseNPV. Two NeseNPV ORFs (nese18 and nese19) were duplicates transcribed in opposite orientations from each other. NeseNPV did not have an AcMNPV ORF 2 homolog characterized as the baculovirus repeat ORF (bro). Six homologous regions (hrs) were located within the NeseNPV genome, each containing small palindromes embedded within direct repeats. A phylogenetic analysis was done to root the tree based upon the sequences of DNA polymerase genes of NeseNPV, 23 other baculoviruses, and other phyla. Baculovirus phylogeny was then constructed with 29 conserved genes from 24 baculovirus genomes. Culex nigripalpus nucleopolyhedrovirus (CuniNPV) was the most distantly related baculovirus, branching to the hymenopteran NeseNPV and the lepidopteran nucleopolyhedroviruses and granuloviruses. PMID:15194780

  14. Bacterial Genomic Data Analysis in the Next-Generation Sequencing Era.

    PubMed

    Orsini, Massimiliano; Cuccuru, Gianmauro; Uva, Paolo; Fotia, Giorgio

    2016-01-01

    Bacterial genome sequencing is now an affordable choice for many laboratories for applications in research, diagnostic, and clinical microbiology. Nowadays, an overabundance of tools is available for genomic data analysis. However, tools differ in algorithms, languages, hardware requirements, and user interface, and combining them, as is necessary for sequence data interpretation, often requires (bio)informatics skills that can be difficult to find in many laboratories. In addition, multiple data sources, exceedingly large dataset sizes, and increasing computational complexity further challenge the accessibility, reproducibility, and transparency of the entire process. In this chapter we will cover the main bioinformatics steps required for a complete bacterial genome analysis using next-generation sequencing data, from the raw sequence data to assembled and annotated genomes. All the tools described are available in the Orione framework ( http://orione.crs4.it ), which uniquely combines in a transparent way the most used open source bioinformatics tools for microbiology, allowing microbiologists without any specific hardware or informatics skills to conduct data-intensive computational analyses from quality control to microbial gene annotation. PMID:27115645

  15. Molecular cloning, sequence characteristics, and tissue expression analysis of ECE1 gene in Tibetan pig.

    PubMed

    Wang, Yan-Dong; Zhang, Jian; Li, Chuan-Hao; Xu, Hai-Peng; Chen, Wei; Zeng, Yong-Qing; Wang, Hui

    2015-10-25

    Low air pressure and low oxygen partial pressure at high altitude seriously affect the survival and development of human beings and animals. ECE1 is a recently discovered gene that is involved in anti-hypoxia, but the full-length cDNA sequence has not been obtained. For a better understanding of the structure and function of the ECE1 gene and to study its effect in Tibetan pig, the cDNA of the ECE1 gene from the muscle of Tibetan pig was cloned, sequenced and characterized. The ECE1 full-length cDNA sequence consists of a 2262 bp coding sequence (CDS) that encodes 753 amino acids with a molecular mass of 85.449 kDa, a 2 bp 5'UTR and a 1507 bp 3'UTR. In addition, the phylogenetic tree analysis revealed that the Tibetan pig ECE1 is genetically and evolutionarily closer to the ECE1 of land mammals. Furthermore, analysis by qPCR showed that the ECE1 transcript is constitutively expressed in the 10 tissues tested: the liver, subcutaneous fat, kidney, muscle, stomach, heart, brain, spleen, pancreas, and lung. These results serve as a foundation for further insight into the Tibetan pig ECE1 gene. PMID:26115769

  16. Implementation of Cloud based Next Generation Sequencing data analysis in a clinical laboratory

    PubMed Central

    2014-01-01

    Background The introduction of next generation sequencing (NGS) has revolutionized molecular diagnostics, though several challenges remain that limit the widespread adoption of NGS testing into clinical practice. One such difficulty is the development of a robust bioinformatics pipeline that can handle the volume of data generated by high-throughput sequencing in a cost-effective manner. Analysis of sequencing data typically requires a substantial level of computing power that is often cost-prohibitive to most clinical diagnostics laboratories. Findings To address this challenge, our institution has developed a Galaxy-based data analysis pipeline which relies on a web-based, cloud-computing infrastructure to process NGS data and identify genetic variants. It provides the additional flexibility needed to control storage costs, resulting in a pipeline that is cost-effective on a per-sample basis. It does not require the usage of EBS disk to run a sample. Conclusions We demonstrate the validation and feasibility of implementing this bioinformatics pipeline in a molecular diagnostics laboratory. Four samples were analyzed in duplicate pairs and showed 100% concordance in mutations identified. This pipeline is currently being used in the clinic, and all identified pathogenic variants are confirmed using Sanger sequencing, further validating the software. PMID:24885806

  17. Sequence characterization, in silico mapping and cytosine methylation analysis of markers linked to apospory in Paspalum notatum.

    PubMed

    Podio, Maricel; Rodríguez, María P; Felitti, Silvina; Stein, Juliana; Martínez, Eric J; Siena, Lorena A; Quarin, Camilo L; Pessino, Silvina C; Ortiz, Juan Pablo A

    2012-12-01

    In previous studies we reported the identification of several AFLP, RAPD and RFLP molecular markers linked to apospory in Paspalum notatum. The objective of this work was to sequence these markers, obtain their flanking regions by chromosome walking and perform an in silico mapping analysis in rice and maize. The methylation status of two apospory-related sequences was also assessed using methylation-sensitive RFLP experiments. Fourteen molecular markers were analyzed and several protein-coding sequences were identified. Copy number estimates and RFLP linkage analysis showed that the sequence PnMAI3 displayed 2-4 copies per genome and linkage to apospory. Extension of this marker by chromosome walking revealed an additional protein-coding sequence mapping in silico in the apospory-syntenic regions of rice and maize. Approximately 5 kb corresponding to different markers were characterized through the global sequencing procedure. A more refined analysis based on sequence information indicated synteny with segments of chromosomes 2 and 12 of rice and chromosomes 3 and 5 of maize. Two loci associated with the apomixis locus were tested in methylation-sensitive RFLP experiments using genomic DNA extracted from leaves. Although both target sequences were methylated, no methylation polymorphisms associated with the mode of reproduction were detected. PMID:23271945

  18. Next Generation Sequencing-Based Analysis of Repetitive DNA in the Model Dioecious Plant Silene latifolia

    PubMed Central

    Macas, Jiří; Kejnovský, Eduard; Neumann, Pavel; Novák, Petr; Koblížková, Andrea; Vyskot, Boris

    2011-01-01

    Background Silene latifolia is a dioecious plant with well distinguished X and Y chromosomes that is used as a model to study sex determination and sex chromosome evolution in plants. However, efficient utilization of this species has been hampered by the lack of large-scale sequencing resources and detailed analysis of its genome composition, especially with respect to repetitive DNA, which makes up the majority of the genome. Methodology/Principal Findings We performed low-pass 454 sequencing followed by similarity-based clustering of 454 reads in order to identify and characterize sequences of all major groups of S. latifolia repeats. Illumina sequencing data from male and female genomes were also generated and employed to quantify the genomic proportions of individual repeat families. The majority of identified repeats belonged to LTR-retrotransposons, constituting about 50% of genomic DNA, with Ty3/gypsy elements being more frequent than Ty1/copia. While there were differences between the male and female genome in the abundance of several repeat families, their overall repeat composition was highly similar. Specific localization patterns on sex chromosomes were found for several satellite repeats using in situ hybridization with probes based on k-mer frequency analysis of Illumina sequencing data. Conclusions/Significance This study provides comprehensive information about the sequence composition and abundance of repeats representing over 60% of the S. latifolia genome. The results revealed generally low divergence in repeat composition between the sex chromosomes, which is consistent with their relatively recent origin. In addition, the study generated various data resources that are available for future exploration of the S. latifolia genome. PMID:22096552

  19. Genomic Sequencing and Analysis of Sucra jujuba Nucleopolyhedrovirus

    PubMed Central

    Liu, Xiaoping; Yin, Feifei; Zhu, Zheng; Hou, Dianhai; Wang, Jun; Zhang, Lei; Wang, Manli; Wang, Hualin; Hu, Zhihong; Deng, Fei

    2014-01-01

    The complete nucleotide sequence of Sucra jujuba nucleopolyhedrovirus (SujuNPV) was determined by 454 pyrosequencing. The SujuNPV genome was 135,952 bp in length with an A+T content of 61.34%. It contained 131 putative open reading frames (ORFs) covering 87.9% of the genome. Among these ORFs, 37 were conserved in all baculovirus genomes that have been completely sequenced, 24 were conserved in lepidopteran baculoviruses, 65 were found in other baculoviruses, and 5 were unique to the SujuNPV genome. Seven homologous regions (hrs) were identified in the SujuNPV genome. SujuNPV contained several genes that were duplicated or copied multiple times: two copies of helicase, DNA binding protein gene (dbp), p26 and cg30, three copies of the inhibitor of the apoptosis gene (iap), and four copies of the baculovirus repeated ORF (bro). Phylogenetic analysis suggested that SujuNPV belongs to a subclade of group II alphabaculovirus, which differs from other baculoviruses in that all nine members of this subclade contain a second copy of dbp. PMID:25329074

  20. Harmonic Analysis of Sedimentary Cyclic Sequences in Kansas, Midcontinent, USA

    USGS Publications Warehouse

    Merriam, D.F.; Robinson, J.E.

    1997-01-01

    Several stratigraphic sequences in the Upper Carboniferous (Pennsylvanian) in Kansas (Midcontinent, USA) were analyzed quantitatively for periodic repetitions. The sequences were coded by lithologic type into strings of datasets. The strings then were analyzed by an adaptation of a one-dimensional Fourier transform analysis and examined for evidence of periodicity. The method was tested using different states in coding to determine the robustness of the method and data. The most persistent response is in multiples of 8-10 ft (2.5-3.0 m) and probably is dependent on the depositional thickness of the original lithologic units. Other cyclicities occurred in multiples of the basic frequency of 8-10 with persistent ones at 22 and 30 feet (6.5-9.0 m) and large ones at 80 and 160 feet (25-50 m). These levels of thickness relate well to the basic cyclothem and megacyclothem as measured on outcrop. We propose that this approach is a suitable one for analyzing cyclic events in the stratigraphic record.
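
    The general idea of coding a stratigraphic column numerically and looking for periodicity with a one-dimensional Fourier transform can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' adaptation: the numeric coding of lithologies, the regular sampling interval and the function names are all assumptions.

```python
import cmath


def power_spectrum(values):
    """Return (k, power) pairs for k = 1 .. n/2 using a plain DFT."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]  # remove the zero-frequency term
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(centered))
        spectrum.append((k, abs(coeff) ** 2))
    return spectrum


def dominant_period(values, spacing=1.0):
    """Period, in units of `spacing`, of the strongest spectral peak."""
    n = len(values)
    k, _ = max(power_spectrum(values), key=lambda pair: pair[1])
    return n * spacing / k


if __name__ == "__main__":
    # Synthetic column: hypothetical coding 1 = limestone, 0 = shale,
    # one sample per foot, repeating every 8 ft.
    column = [1 if (i % 8) < 4 else 0 for i in range(64)]
    print(dominant_period(column, spacing=1.0))
```

    Changing the numeric codes assigned to each lithology and checking whether the dominant period survives is one simple way to probe robustness, in the spirit of the coding tests the abstract describes.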

  1. Multilocus Sequence Analysis for Leishmania braziliensis Outbreak Investigation

    PubMed Central

    Marlow, Mariel A.; Boité, Mariana C.; Ferreira, Gabriel Eduardo M.; Steindel, Mario; Cupolillo, Elisa

    2014-01-01

    With the emergence of leishmaniasis in new regions around the world, molecular epidemiological methods with adequate discriminatory power, reproducibility, high throughput and inter-laboratory comparability are needed for outbreak investigation of this complex parasitic disease. As multilocus sequence analysis (MLSA) has been projected as the future gold standard technique for Leishmania species characterization, we propose a MLSA panel of six housekeeping gene loci (6pgd, mpi, icd, hsp70, mdhmt, mdhnc) for investigating intraspecific genetic variation of L. (Viannia) braziliensis strains and compare the resulting genetic clusters with several epidemiological factors relevant to outbreak investigation. The recent outbreak of cutaneous leishmaniasis caused by L. (V.) braziliensis in the southern Brazilian state of Santa Catarina is used to demonstrate the applicability of this technique. Sequenced fragments from six genetic markers from 86 L. (V.) braziliensis strains from twelve Brazilian states, including 33 strains from Santa Catarina, were used to determine clonal complexes, genetic structure, and phylogenic networks. Associations between genetic clusters and networks with epidemiological characteristics of patients were investigated. MLSA revealed epidemiological patterns among L. (V.) braziliensis strains, even identifying strains from imported cases among the Santa Catarina strains that presented extensive homogeneity. Evidence presented here has demonstrated MLSA possesses adequate discriminatory power for outbreak investigation, as well as other potential uses in the molecular epidemiology of leishmaniasis. PMID:24551258

  2. Whale song analyses using bioinformatics sequence analysis approaches

    NASA Astrophysics Data System (ADS)

    Chen, Yian A.; Almeida, Jonas S.; Chou, Lien-Siang

    2005-04-01

    Animal songs are frequently analyzed using discrete hierarchical units, such as units, themes and songs. Because animal songs and bio-sequences may be understood as analogous, bioinformatics analysis tools (DNA/protein sequence alignment and alignment-free methods) are proposed to quantify the theme similarities of the songs of false killer whales recorded off northeast Taiwan. The eighteen themes with discrete units that were identified in an earlier study [Y. A. Chen, master's thesis, University of Charleston, 2001] were compared quantitatively using several distance metrics. These metrics included the scores calculated using the Smith-Waterman algorithm with the repeated procedure, the standardized Euclidean distance, and the angle metrics based on word frequencies. The theme classifications based on different metrics were summarized and compared in dendrograms using cluster analyses. The results agree qualitatively with earlier classifications derived by human observation. These methods further quantify the similarities among themes. These methods could be applied to the analyses of other animal songs on a larger scale. For instance, these techniques could be used to investigate song evolution and cultural transmission by quantifying the dissimilarities of humpback whale songs across different seasons, years, populations, and geographic regions. [Work supported by SC Sea Grant, and Ilan County Government, Taiwan.]
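
    A minimal sketch of how a Smith-Waterman local alignment score can be computed for two themes written as sequences of discrete units is given below. The scoring values (match, mismatch, linear gap penalty) and the unit labels are illustrative assumptions, and the "repeated procedure" mentioned in the abstract is not implemented; only the single best local score is returned.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between sequences a and b.

    a and b can be strings or lists of unit labels. Scores never drop
    below zero, which is what makes the alignment local.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


if __name__ == "__main__":
    # Hypothetical themes, each a list of song-unit labels.
    theme_1 = ["whistle", "click", "burst", "whistle"]
    theme_2 = ["click", "burst", "whistle", "moan"]
    print(smith_waterman_score(theme_1, theme_2))
```

    Because the comparison only tests units for equality, the same routine works for nucleotides, amino acids or song units; a distance for clustering could then be derived from the scores, for example by normalising against each theme's self-alignment score.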

  3. Complete nucleotide sequence and transcriptional analysis of snakehead fish retrovirus.

    PubMed Central

    Hart, D; Frerichs, G N; Rambaut, A; Onions, D E

    1996-01-01

    The complete genome of the snakehead fish retrovirus has been cloned and sequenced, and its transcriptional profile in cell culture has been determined. The 11.2-kb provirus displays a complex expression pattern capable of encoding accessory proteins and is unique in the predicted location of the env initiation codon and signal peptide upstream of gag and the common splice donor site. The virus is distinguishable from all known retrovirus groups by the presence of an arginine tRNA primer binding site. The coding regions are highly divergent and show a number of unusual characteristics, including a large Gag coiled-coil region, a Pol domain of unknown function, and a long, lentiviral-like, Env cytoplasmic domain. Phylogenetic analysis of the Pol sequence emphasizes the divergent nature of the virus from the avian and mammalian retroviruses. The snakehead virus is also distinct from a previously characterized complex fish retrovirus, suggesting that discrete groups of these viruses have yet to be identified in the lower vertebrates. PMID:8648695

  4. Comparative Topological Analysis of Neuronal Arbors via Sequence Representation and Alignment

    NASA Astrophysics Data System (ADS)

    Gillette, Todd Aaron

    neocortical pyramidal cell axons and rodent neocortical dendritic targeting interneurons to be substantially more asymmetric than perisomatic-targeting interneurons. With optimization techniques adapted from the field of genomic alignment, these methods compose a framework with the potential to be made orders of magnitude more efficient. Moreover, the framework is capable of handling expanded sequence representations that include additional branch features, enabling analysis of correspondence and joint conservation of various morphological characteristics.

  5. Gnome--an Internet-based sequence analysis tool.

    PubMed

    Nakai, K; Tokimori, T; Ogiwara, A; Uchiyama, I; Niiyama, T

    1994-09-01

    Gnome (GenomeNet Open Mail-service Environment) is a sequence analysis tool that enables an end-user to make use of several Internet- (mainly e-mail) based services with an easy-to-use graphical user interface. Users can conduct homology and motif searches, and database-entry retrieval against the latest databases by emitting search requests to and receiving their results from a search-server by e-mail. The search results are viewed and managed efficiently with this system. The Macintosh and X (Motif) versions of the Gnome client and the UNIX version of the Gnome server are available to academic users free of charge. PMID:7828072

  6. Human factors review for Severe Accident Sequence Analysis (SASA)

    SciTech Connect

    Krois, P.A.; Haas, P.M.; Manning, J.J.; Bovell, C.R.

    1984-01-01

    The paper will discuss work being conducted during this human factors review including: (1) support of the Severe Accident Sequence Analysis (SASA) Program based on an assessment of operator actions, and (2) development of a descriptive model of operator severe accident management. Research by SASA analysts on the Browns Ferry Unit One (BF1) anticipated transient without scram (ATWS) was supported through a concurrent assessment of operator performance to demonstrate contributions to SASA analyses from human factors data and methods. A descriptive model was developed called the Function Oriented Accident Management (FOAM) model, which serves as a structure for bridging human factors, operations, and engineering expertise and which is useful for identifying needs/deficiencies in the area of accident management. The assessment of human factors issues related to ATWS required extensive coordination with SASA analysts. The analysis was consolidated primarily to six operator actions identified in the Emergency Procedure Guidelines (EPGs) as being the most critical to the accident sequence. These actions were assessed through simulator exercises, qualitative reviews, and quantitative human reliability analyses. The FOAM descriptive model assumes as a starting point that multiple operator/system failures exceed the scope of procedures and necessitate a knowledge-based emergency response by the operators. The FOAM model provides a functionally-oriented structure for assembling human factors, operations, and engineering data and expertise into operator guidance for unconventional emergency responses to mitigate severe accident progression and avoid/minimize core degradation. Operators must also respond to potential radiological release beyond plant protective barriers. Research needs in accident management and potential uses of the FOAM model are described. 11 references, 1 figure.

  7. Multiple Comparison Analysis of Two New Genomic Sequences of ILTV Strains from China with Other Strains from Different Geographic Regions

    PubMed Central

    Zhao, Yan; Kong, Congcong; Wang, Yunfeng

    2015-01-01

    To date, twenty complete genome sequences of ILTV strains have been published in GenBank, including one strain from China, and nineteen strains from Australia and the United States. To investigate the genomic information on ILTVs from different geographic regions, two additional individual complete genome sequences of WG and K317 strains from China were determined. The genomes of WG and K317 strains were 153,505 and 153,639 bp in length, respectively. Alignments performed on the amino acid sequences of the twelve glycoproteins showed that 13 out of 116 mutational sites were present only among the Chinese strain WG and the Australian strains SA2 and A20. The phylogenetic tree analysis suggested that the WG strain is closely related to the Australian strain SA2. Recombination events were detected and confirmed in different subregions of the WG strain with the sequences of SA2 and K317 strains as parental. In this study, two new complete genome sequences of Chinese ILTV strains were used in comparative analysis with other complete genome sequences of ILTV strains from China, the United States, and Australia. The analysis of genome comparison, phylogenetic trees, and recombination events showed a close relationship between the Chinese strain WG and the Australian strain SA2. The information from the two new complete genome sequences from China will help to facilitate the analysis of phylogenetic relationships and the molecular differences among ILTV strains from different geographic regions. PMID:26186451

  8. Multiple Comparison Analysis of Two New Genomic Sequences of ILTV Strains from China with Other Strains from Different Geographic Regions.

    PubMed

    Zhao, Yan; Kong, Congcong; Wang, Yunfeng

    2015-01-01

    To date, twenty complete genome sequences of ILTV strains have been published in GenBank, including one strain from China, and nineteen strains from Australia and the United States. To investigate the genomic information on ILTVs from different geographic regions, two additional individual complete genome sequences of WG and K317 strains from China were determined. The genomes of WG and K317 strains were 153,505 and 153,639 bp in length, respectively. Alignments performed on the amino acid sequences of the twelve glycoproteins showed that 13 out of 116 mutational sites were present only among the Chinese strain WG and the Australian strains SA2 and A20. The phylogenetic tree analysis suggested that the WG strain is closely related to the Australian strain SA2. Recombination events were detected and confirmed in different subregions of the WG strain with the sequences of SA2 and K317 strains as parental. In this study, two new complete genome sequences of Chinese ILTV strains were used in comparative analysis with other complete genome sequences of ILTV strains from China, the United States, and Australia. The analysis of genome comparison, phylogenetic trees, and recombination events showed a close relationship between the Chinese strain WG and the Australian strain SA2. The information from the two new complete genome sequences from China will help to facilitate the analysis of phylogenetic relationships and the molecular differences among ILTV strains from different geographic regions. PMID:26186451

  9. Transcriptome analysis of Emiliania huxleyi cells grown under different conditions using high-throughput sequencing data

    NASA Astrophysics Data System (ADS)

    Andreson, R.; Anlauf, H.; Mackinder, L.; Iglesias-Rodriguez, D.; LaRoche, J.; Lenhard, B.

    2012-04-01

    Coccolithophores are ideal for studying genes responsible for biomineralization processes because of their relatively small genome sizes, their ability to grow in culture, and their suitability as a natural model system for measuring expression of calcification-related genes in two life stages. As Emiliania huxleyi has several annotated calcification-related proteins, we have concentrated on analyzing its genes and promoter areas. Many recent studies have focused primarily on transcriptome analysis of E. huxleyi using nutrient-limited conditions to get more information about up-regulated genes involved in biomineralization and calcification processes. Although there are more than 100,000 EST sequences for E. huxleyi available from these projects in public databases, those data are often insufficient to identify the exact position of the transcription start site (TSS) to perform precise analysis (nucleotide content, motif search) of core promoters and regulatory mechanisms in immediate flanking areas. ESTs are not ideal for these kinds of analyses because the standard technologies of producing 5' EST libraries do not guarantee that the exact 5' end of the transcript will be captured. To determine the extent and accurate positions of 5' ends of transcripts and therefore the positions of core promoters, the Cap analysis of gene expression (CAGE) sequencing method was used for sequencing RNA of E. huxleyi in both stages, calcifying and non-calcifying. As additional information, gene expression levels of RNA for 21 samples were retrieved with whole transcriptome shotgun sequencing (RNA-Seq). The collections of reads these methods produced were used to map and annotate genes on several samples and measure the RNA expression levels in different conditions. Although not much data are available for closely related organisms, it is possible to compare these results with other species to find conserved regulatory mechanisms between genes related to calcification. Visualization tools allowing browsing of annotated genes

  10. Mutations analysis of C1 inhibitor coding sequence gene among Portuguese patients with hereditary angioedema.

    PubMed

    Martinho, A; Mendes, J; Simões, O; Nunes, R; Gomes, J; Dias Castro, E; Leiria-Pinto, P; Ferreira, M B; Pereira, C; Castel-Branco, M G; Pais, L

    2013-04-01

    Mutations that modify the amino acid sequence of C1-INH (except Val458Met) are associated with HAE. More than 200 different mutations scattered across the entire C1-INH gene have been reported. The main objective of this study was to report the mutational findings in a HAE cohort of 138 Portuguese patients followed in specialized consultation all over the country. DNA was extracted from peripheral blood with QiaSymphony BioRobot (QIAGEN Portugal). The sequence reactions were performed by using a DNA sequencing kit (Big Dye terminator cycle sequencing v1.1/v3.1 from Applied Biosystems) and sequencing products were immediately submitted to direct sequencing on an Applied Biosystem 3130 DNA Analyser. DNA sequences were analyzed at four different stages. Raw data and sequence alignments of all 8 exons and intron-exon boundaries were performed for each patient individually with SeqScape software and using SERPING1 gene NG_009625 of 24,300 bp (12-March-2011) as reference sequence. Sequence comparisons among patients and controls were performed with software CodonCode Aligner v.3.7 from CodonCode Corp and with Geneious 4.5 from Biomatters Lda. A total of 94 point mutations were observed among patients, and 67% of them were located on exon 8. In addition, we noticed one previously undescribed stop codon at position c.1459 C>T in three different patients. Translation termination was also found on exon 3 and 7, as a result of mutations at positions c.481A>T, c.1174C>T. In this population, the prevalence of the missense mutation p.Arg444Cys was 39 out of 42. Mutational analysis revealed 22 different pathogenic mutations, of which 64% were not described on HAE database. Although identification of disease causing mutations is not necessary to establish HAE diagnosis, studies on gene expression and characterization of rearrangements in SERPING1 gene are suggested in order to get new insights on function and genetic tests of C1 inhibitor. PMID:23123409

  11. [Sequencing and analysis of the complete genome sequence of WU polyomavirus in Fuzhou, China].

    PubMed

    Xiu, Wen-qiong; Shen, Xiao-na; Liu, Guang-hua; Xie, Jian-feng; Kang, Yu-lan; Wang, Mei-ai; Zhang, Wen-qing; Weng, Qi-zhu; Yan, Yan-sheng

    2011-03-01

    WU polyomavirus (WUPyV), a new member of the genus Polyomavirus in the family Polyomaviridae, was recently found in patients with respiratory tract infections. In our study, the complete genomes of two WUPyV isolates (FZ18, FZTF) were sequenced and deposited in GenBank (accession nos. FJ890981, FJ890982). The two sequences of the WUPyV isolates in this study varied little from each other. Compared with other complete genome sequences of WUPyV in GenBank (strains B0, S1-S4, CLFF; accession nos. EF444549, EF444550, EF444551, EF444552, EF444553, EU296475, respectively), the sequence length in nucleotides is 5228 bp, 1 bp shorter than the known sequences. The deleted base pair was at nucleotide position 4536 in the non-coding region of large T antigen (LTAg). The WUPyV genome encodes five proteins: three capsid proteins (VP2, VP1, VP3), LTAg, and small T antigen (STAg). To investigate whether these nucleotide sequences had any unique features, we compared the genome sequences of the 2 WUPyV isolates in Fuzhou, China to those documented in the GenBank database by using PHYLIP software version 3.65 and the neighbor-joining method. The 2 WUPyV strains in our study clustered together. Strain FZTF was closer than strain FZ18 to the Australian reference strain B0. PMID:21528542

  12. Applying machine learning techniques to DNA sequence analysis. Progress report, Year 2, February 14, 1992--December 11, 1992

    SciTech Connect

    Shavlik, J.W.; Noordewier, M.O.

    1992-12-31

    We are primarily developing a machine learning (ML) system that modifies existing knowledge about specific types of biological sequences. It does this by considering sample members and nonmembers of the sequence motif being learned. Using this information, our learning algorithm produces a more accurate representation of the knowledge needed to categorize future sequences. Specifically, our KBANN algorithm maps inference rules about a given recognition task into a neural network. Neural network training techniques then use the training examples to refine these inference rules. We call these rules a domain theory, following the convention in the machine learning community. We have been applying this approach to several problems in DNA sequence analysis. In addition, we have been extending the capabilities of our learning system along several dimensions. We have also been investigating parallel algorithms that perform sequence alignments in the presence of frameshift errors.
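
    The rule-to-network mapping described above can be illustrated with a minimal sketch. The link weight `OMEGA`, the bias formula, and the toy rule with its feature names are assumptions made for illustration; the report itself does not give these details.

```python
import math

OMEGA = 4.0  # assumed link weight for rule antecedents; not from the report


def rule_to_unit(positive, negated):
    """Translate one conjunctive rule into (weights, bias) for a sigmoid unit.

    Each positive antecedent gets weight +OMEGA and each negated antecedent
    gets -OMEGA. The bias is set so the unit is active only when every
    positive antecedent is true and no negated antecedent is true.
    """
    weights = {name: OMEGA for name in positive}
    weights.update({name: -OMEGA for name in negated})
    bias = -(len(positive) - 0.5) * OMEGA
    return weights, bias


def activate(weights, bias, features):
    """Sigmoid activation of the unit for a dict of 0/1 feature values."""
    net = bias + sum(w * features.get(name, 0) for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-net))


# Hypothetical rule: site :- has_motif_a, has_motif_b, not has_gap.
weights, bias = rule_to_unit(["has_motif_a", "has_motif_b"], ["has_gap"])
```

    Starting from weights like these, ordinary gradient-based training could then adjust the unit, which is the sense in which the training examples refine the initial rules.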

  13. Sequencing and analysis of a South Asian-Indian personal genome

    PubMed Central

    2012-01-01

    Background With over 1.3 billion people, India is estimated to contain three times more genetic diversity than does Europe. Next-generation sequencing technologies have facilitated the understanding of diversity by enabling whole genome sequencing at greater speed and lower cost. While genomes from people of European and Asian descent have been sequenced, only recently has a single male genome from the Indian subcontinent been published at sufficient depth and coverage. In this study we have sequenced and analyzed the genome of a South Asian Indian female (SAIF) from the Indian state of Kerala. Results We identified over 3.4 million SNPs in this genome including over 89,873 private variations. Comparison of the SAIF genome with several published personal genomes revealed that this individual shared ~50% of the SNPs with each of these genomes. Analysis of the SAIF mitochondrial genome showed that it was closely related to the U1 haplogroup which has been previously observed in Kerala. We assessed the SAIF genome for SNPs with health and disease consequences and found that the individual was at a higher risk for multiple sclerosis and a few other diseases. In analyzing SNPs that modulate drug response, we found a variation that predicts a favorable response to metformin, a drug used to treat diabetes. SNPs predictive of adverse reaction to warfarin indicated that the SAIF individual is not at risk for bleeding if treated with typical doses of warfarin. In addition, we report the presence of several additional SNPs of medical relevance. Conclusions This is the first study to report the complete whole genome sequence of a female from the state of Kerala in India. The availability of this complete genome and variants will further aid studies aimed at understanding genetic diversity, identifying clinically relevant changes and assessing disease burden in the Indian population. PMID:22938532

  14. Analysis of separate isolates of Bordetella pertussis repeated DNA sequences.

    PubMed

    McPheat, W L; Hanson, J H; Livey, I; Robertson, J S

    1989-06-01

    Two independent isolates of a Bordetella pertussis repeated DNA unit were sequenced and shown to be an insertion sequence element with five nucleotide differences between the two copies. The sequences were 1053 bp in length with near-perfect terminal inverted repeats of 28 bp, had three open reading frames, and were each flanked by short direct repeats. The two insertion sequences showed considerable homology to two other B. pertussis repeated DNA sequences reported recently: IS481 and a 530 bp repeated DNA unit. The B. pertussis insertion sequence would appear to comprise a group of closely related sequences differing mainly in flanking direct repeats and the terminal inverted repeats. The two isolates reported here, which were from the adenylate cyclase and agglutinogen 2 regions of the genome, were numbered IS481v1 and IS481v2 respectively. PMID:2559151

  15. Data Analysis for Sequencing by Hybridization (SBH) Experiments

    1995-11-28

    SCORES is user-friendly software designed to analyze data from SBH (Sequencing By Hybridization) experiments. In these ANL experiments DNA samples are spotted on a nylon membrane and hybridized with radioactively labeled oligonucleotide probes. An image analysis program (DOTS) calculates a raw value for each DNA dot from images generated by the Molecular Dynamics Phosphorimager. SCORES reads in the DOTS output for each hybridization done for a particular filter. The data for each probe is normalized against a mass probe and scaled properly. These values from 100 or more probes are then used to compute the distance (i.e., degree of similarity) between any two clones on the filter. These calculated distances define clusters of similar clones (cDNA) or contigs (genomic DNA). Histograms of the data at each stage of analysis help to establish thresholds for further steps. SCORES generates various statistical tables to evaluate the quality of spotting, hybridization of filters, and of individual dots.
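
    The normalize-then-compare steps described above can be sketched roughly as follows. This is not the SCORES implementation: dividing by the mass-probe signal, scaling each probe to a mean of 1.0, and using Euclidean distance are all assumptions, and the function names and toy values are hypothetical.

```python
import math


def normalize_probe(raw, mass):
    """Divide each dot's raw value by its mass-probe value, then scale
    the probe so its mean over all dots is 1.0 (an assumed scaling)."""
    ratios = [r / m for r, m in zip(raw, mass)]
    mean = sum(ratios) / len(ratios)
    return [x / mean for x in ratios]


def clone_distance(profile_a, profile_b):
    """Euclidean distance between two clones' normalized probe profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))


# Toy data: 3 clones (dots) on one filter, 2 probes plus a mass probe.
mass = [100.0, 200.0, 100.0]
probes = [[50.0, 100.0, 10.0], [20.0, 40.0, 80.0]]
normalized = [normalize_probe(p, mass) for p in probes]
# One profile per clone: its normalized value under every probe.
profiles = list(zip(*normalized))
```

    In this toy data the first two clones have identical profiles once the amount of spotted DNA is divided out, so their distance is zero, while the third clone lies far from both.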

  16. DNA sequencing with capillary electrophoresis and single cell analysis with mass spectrometry

    SciTech Connect

    Fung, N.

    1998-03-27

    Since the first demonstration of the laser in the 1960s, lasers have found numerous applications in analytical chemistry. In this work, two different applications are described, namely, DNA sequencing with capillary gel electrophoresis and single cell analysis with mass spectrometry. Two projects are described in which high-speed DNA separations with capillary gel electrophoresis were demonstrated. In the third project, flow cytometry and mass spectrometry were coupled via a laser vaporization/ionization interface and individual mammalian cells were analyzed. First, DNA Sanger fragments were separated by capillary gel electrophoresis. A separation speed of 20 basepairs per minute was demonstrated with a mixed poly(ethylene oxide) (PEO) sieving solution. In addition, a new capillary wall treatment protocol was developed in which bare (or uncoated) capillaries can be used in DNA sequencing. Second, a temperature programming scheme was used to separate DNA Sanger fragments. Third, flow cytometry and mass spectrometry were coupled with a laser vaporization/ionization interface.

  17. Quantitative analysis of the relationship between nucleotide sequence and functional activity.

    PubMed Central

    Stormo, G D; Schneider, T D; Gold, L

    1986-01-01

    Matrices can be used to evaluate sequences for functional activity. Multiple regression can solve for the matrix that gives the best fit between sequence evaluations and quantitative activities. This analysis shows that the best model for context effects on suppression by su2 involves primarily the two nucleotides 3' to the amber codon, and that their contributions are independent and additive. Context effects on 2AP mutagenesis also involve the two nucleotides 3' to the 2AP insertion, but their effects are not independent. In a construct for producing beta-galactosidase, the effects on translational yields of the tri-nucleotide 5' to the initiation codon are dependent on the entire triplet. Models based on these quantitative results are presented for each of the examples. PMID:3092188
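
    As a minimal sketch of what evaluating a sequence with a matrix means under the independent, additive model described above: the score is the sum of one matrix entry per position. The matrix values below are invented for illustration, and the regression step that would fit them to measured activities is not shown.

```python
def score(sequence, matrix):
    """Sum the matrix entry for the base observed at each position."""
    return sum(matrix[i][base] for i, base in enumerate(sequence))


# Hypothetical 2-position matrix for the two nucleotides 3' to a codon.
# The values are made up for illustration only.
matrix = [
    {"A": 1.0, "C": -0.5, "G": 0.5, "T": -1.0},
    {"A": 0.25, "C": 0.0, "G": -0.25, "T": 0.5},
]
```

    Under such a model, changing the base at one position shifts the score by the same amount whatever the other position holds; the non-independent cases in the abstract are the ones where that property fails.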

  18. Analysis of DNA structure and sequence requirements for Pseudomonas aeruginosa MutL endonuclease activity.

    PubMed

    Correa, Elisa M E; De Tullio, Luisina; Vélez, Pablo S; Martina, Mariana A; Argaraña, Carlos E; Barra, José L

    2013-12-01

    The hallmark of the mismatch repair system in bacterial and eukaryotic organisms devoid of MutH is the presence of a MutL homologue with endonuclease activity. The aim of this study was to analyse whether different DNA structures affect Pseudomonas aeruginosa MutL (PaMutL) endonuclease activity and to determine if a specific nucleotide sequence is required for this activity. Our results showed that PaMutL was able to nick covalently closed circular plasmids but not linear DNA at high ionic strengths, while the activity on linear DNA was only found below 60 mM salt. In addition, single strand DNA, ss/ds DNA boundaries and negative supercoiling were not required for PaMutL nicking activity. Finally, the analysis of the incision sites revealed that PaMutL, as well as Bacillus thuringiensis MutL homologue, did not show DNA sequence specificity. PMID:23969026

  19. Analysis of new microsatellite markers developed from reported sequences of Japanese flounder Paralichthys olivaceus

    NASA Astrophysics Data System (ADS)

    Yu, Haiyang; Jiang, Liming; Chen, Wei; Wang, Xubo; Wang, Zhigang; Zhang, Quanqi

    2010-12-01

    The expressed sequence tags (ESTs) of Japanese flounder, Paralichthys olivaceus, were selected from GenBank to identify simple sequence repeats (SSRs) or microsatellites. A bioinformatic analysis of 11111 ESTs identified 751 SSR-containing ESTs, including 440 dinucleotide, 254 trinucleotide, 53 tetranucleotide, 95 pentanucleotide and 40 hexanucleotide microsatellites respectively. The CA/TG and GA/TC repeats were the most abundant microsatellites. AT-rich types were predominant among trinucleotide and tetranucleotide microsatellites. PCR primers were designed to amplify 10 identified microsatellites loci. The PCR results from eight pairs of primers showed polymorphisms in wild populations. In 30 wild individuals, the mean observed and expected heterozygosities of these 8 polymorphic SSRs were 0.71 and 0.83 respectively and the average PIC value was 0.8. These microsatellite markers should prove to be a useful addition to the microsatellite markers that are now available for this species.
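
    A minimal sketch of how SSRs might be located in EST sequences with a regular expression. The thresholds (motif length 2-6 bp, at least five tandem copies) and the function name are assumptions; the abstract does not state the search criteria that were actually used.

```python
import re

# Assumed thresholds: motif length 2-6 bp, at least 5 tandem copies.
SSR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{4,}")


def find_ssrs(sequence):
    """Return (start, motif, copies) for each tandem repeat found."""
    hits = []
    for match in SSR_PATTERN.finditer(sequence.upper()):
        motif = match.group(1)
        # Skip homopolymer-like motifs such as "AA" or "TTT".
        if len(set(motif)) == 1:
            continue
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits
```

    For example, `find_ssrs("GGCACACACACACATT")` reports a single CA repeat of six copies starting at offset 2. The lazy quantifier makes the pattern prefer the shortest motif, so a CA run is reported as a dinucleotide repeat rather than as CACA.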

  20. Analysis of diversity of chromophytic phytoplankton in a mangrove ecosystem using rbcL gene sequencing.

    PubMed

    Samanta, Brajogopal; Bhadury, Punyasloke

    2014-04-01

    Phytoplankton forms the basis of primary production in mangrove environments. The phylogeny and diversity based on the amplification and sequencing of rbcL, the large subunit encoding the key enzyme ribulose-1,5-bisphosphate carboxylase/oxygenase, was investigated for improved understanding of the community structure and temporal trends of chromophytic eukaryotic phytoplankton assemblages in Sundarbans, the world's largest continuous mangrove. Diatoms (Bacillariophyceae) were by far the most frequently detected group in clone libraries (485 out of 525 clones), consistent with their importance as a major bloom-forming group. Other major chromophytic algal groups including Cryptophyceae, Haptophyceae, Pelagophyceae, Eustigmatophyceae, and Raphidophyceae, which are important components of the assemblages, were detected for the first time from Sundarbans based on the rbcL approach. Many of the sequences from Sundarbans rbcL clone libraries showed identity with key bloom-forming diatom genera, namely Thalassiosira, Skeletonema and Nitzschia. Similarly, several rbcL sequences which were diatom-like were also detected, highlighting the need to explore diatom communities from the study area. Some of the rbcL sequences detected from Sundarbans were ubiquitous in distribution, showing 100% identities with uncultured rbcL sequences targeted previously from the Gulf of Mexico and California upwelling system that are geographically separated from the study area. Novel rbcL lineages were also detected, highlighting the need to culture and sequence phytoplankton from the ecoregion. Principal component analysis revealed that nitrate is an important variable that is associated with observed variation in phytoplankton assemblages (operational taxonomic units). This study applied molecular tools to highlight the ecological significance of diatoms, in addition to other chromophytic algal groups in Sundarbans. PMID:26988190

  1. Whole genome sequence analysis of unidentified genetically modified papaya for development of a specific detection method.

    PubMed

    Nakamura, Kosuke; Kondo, Kazunari; Akiyama, Hiroshi; Ishigaki, Takumi; Noguchi, Akio; Katsumata, Hiroshi; Takasaki, Kazuto; Futo, Satoshi; Sakata, Kozue; Fukuda, Nozomi; Mano, Junichi; Kitta, Kazumi; Tanaka, Hidenori; Akashi, Ryo; Nishimaki-Mogami, Tomoko

    2016-08-15

    Identification of transgenic sequences in an unknown genetically modified (GM) papaya (Carica papaya L.) by whole genome sequence analysis was demonstrated. Whole genome sequence data were generated for a GM-positive fresh papaya fruit commodity detected in monitoring using real-time polymerase chain reaction (PCR). The sequences obtained were mapped against an open database for papaya genome sequence. Transgenic construct- and event-specific sequences were identified as a GM papaya developed to resist infection from a Papaya ringspot virus. Based on the transgenic sequences, a specific real-time PCR detection method for GM papaya applicable to various food commodities was developed. Whole genome sequence analysis enabled identifying unknown transgenic construct- and event-specific sequences in GM papaya and development of a reliable method for detecting them in papaya food commodities. PMID:27006240

  2. Analysis of simian immunodeficiency virus sequence variation in tissues of rhesus macaques with simian AIDS.

    PubMed Central

    Kodama, T; Mori, K; Kawahara, T; Ringler, D J; Desrosiers, R C

    1993-01-01

    One rhesus macaque displayed severe encephalomyelitis and another displayed severe enterocolitis following infection with molecularly cloned simian immunodeficiency virus (SIV) strain SIVmac239. Little or no free anti-SIV antibody developed in these two macaques, and they died relatively quickly (4 to 6 months) after infection. Manifestation of the tissue-specific disease in these macaques was associated with the emergence of variants with high replicative capacity for macrophages and primary infection of tissue macrophages. The nature of sequence variation in the central region (vif, vpr, and vpx), the env gene, and the nef long terminal repeat (LTR) region in brain, colon, and other tissues was examined to see whether specific genetic changes were associated with SIV replication in brain or gut. Sequence analysis revealed strong conservation of the intergenic central region, nef, and the LTR. However, analysis of env sequences in these two macaques and one other revealed significant, interesting patterns of sequence variation. (i) Changes in env that were found previously to contribute to the replicative ability of SIVmac for macrophages in culture were present in the tissues of these animals. (ii) The greatest variability was located in the regions between V1 and V2 and from "V3" through C3 in gp120, which are different in location from the variable regions observed previously in animals with strong antibody responses and long-term persistent infection. (iii) The predominant sequence change of D-->N at position 385 in C3 is most surprising, since this change in both SIV and human immunodeficiency virus type 1 has been associated with dramatically diminished affinity for CD4 and replication in vitro. (iv) The nature of sequence changes at some positions (146, 178, 345, 385, and "V3") suggests that viral replication in brain and gut may be facilitated by specific sequence changes in env in addition to those that impart a general ability to replicate well in

  3. Analysis of the microbiome: Advantages of whole genome shotgun versus 16S amplicon sequencing.

    PubMed

    Ranjan, Ravi; Rani, Asha; Metwally, Ahmed; McGee, Halvor S; Perkins, David L

    2016-01-22

    The human microbiome has emerged as a major player in regulating human health and disease. Translational studies of the microbiome have the potential to indicate clinical applications such as fecal transplants and probiotics. However, one major issue is accurate identification of microbes constituting the microbiota. Studies of the microbiome have frequently utilized sequencing of the conserved 16S ribosomal RNA (rRNA) gene. We present a comparative study of an alternative approach using whole genome shotgun sequencing (WGS). In the present study, we analyzed the human fecal microbiome compiling a total of 194.1 × 10^6 reads from a single sample using multiple sequencing methods and platforms. Specifically, after establishing the reproducibility of our methods with extensive multiplexing, we compared: 1) The 16S rRNA amplicon versus the WGS method, 2) the Illumina HiSeq versus MiSeq platforms, 3) the analysis of reads versus de novo assembled contigs, and 4) the effect of shorter versus longer reads. Our study demonstrates that whole genome shotgun sequencing has multiple advantages compared with the 16S amplicon method including enhanced detection of bacterial species, increased detection of diversity and increased prediction of genes. In addition, increased length, either due to longer reads or the assembly of contigs, improved the accuracy of species detection. PMID:26718401

  4. Genome Sequencing and Analysis of the Biomass-Degrading Fungus Trichoderma reesei (syn. Hypocrea jecorina)

    SciTech Connect

    Martinez, Antonio D.; Berka, Randy; Henrissat, Bernard; Saloheimo, Markku; Arvas, Mikko; Baker, Scott E.; Chapman, Jarod; Chertkov, Olga; Coutinho, Pedro M.; Cullen, Dan; Danchin, Etienne G.; Grigoriev, Igor V.; Harris, Paul; Jackson, Melissa; Kubicek, Christian P.; Han, Cliff F.; Ho, Isaac; Larrando, Luis F.; Lopez de Leon, Alfredo; Magnuson, Jon K.; Merino, Sandy; Misra, Monica; Nelson, Beth; Putnam, Nicholas; Robbertse, Barbara; Salamov, Asaf; Schmoll, Monika; Terry, Astrid; Thayer, Nina; Westerholm-Parvinen, Ann; Schoch, Conrad L.; Yao, Jian; Barbote, Ravi; Nelson, Mary Anne; Detter, Chris J.; Bruce, David; Kuske, Cheryl; Xie, Gary; Richardson, P. M.; Rokhsar, Daniel S.; Lucas, Susan; Rubin, Eddie M.; Dunn-Coleman, Nigel; Ward, Michael; Brettin, T.

    2008-05-01

    A major thrust of the white biotechnology movement involves the development of enzyme systems which depolymerize biomass to simple sugars which are subsequently converted to sustainable biofuels (e.g., ethanol) and chemical intermediates. The fungus Trichoderma reesei (syn. Hypocrea jecorina) represents a paradigm for the industrial production of highly efficient cellulases and hemicellulases needed for hydrolysis of biomass polysaccharides. Herein we describe intriguing attributes of the T. reesei genome in relation to the future of fuel biotechnology. The T. reesei genome sequence was derived using a whole genome shotgun approach combined with finishing work to generate an assembly comprising 89 scaffolds totaling 34 Mbp with few gaps. In total, 9,130 gene models were predicted using a combination of ab initio and sequence similarity-based methods and EST data. Considering the industrial utility and effectiveness of its enzymes, the T. reesei genome surprisingly encodes the fewest cellulases and hemicellulases of any fungus having the ability to hydrolyze plant cell wall polysaccharides and whose genome has been sequenced. Many genes encoding carbohydrate active enzymes are distributed non-randomly in groups or clusters that interestingly lie between regions of synteny with other Sordariomycetes. Additionally, the T. reesei genome contains a multitude of genes encoding biosynthetic pathways for secondary metabolites (possible antibacterial and antifungal compounds) which may promote successful competition and survival in the crowded and competitive soil habitat occupied by T. reesei. Our analysis coupled with the availability of genome sequence data provides a roadmap for construction of enhanced T. reesei strains for industrial applications.

  5. Cloning and sequence analysis of cDNA for human cathepsin D.

    PubMed Central

    Faust, P L; Kornfeld, S; Chirgwin, J M

    1985-01-01

    An 1110-base-pair cDNA clone for human cathepsin D was obtained by screening a lambda gt10 human hepatoma G2 cDNA library with a human renin exon 3 genomic fragment. Poly(A)+ RNA blot analysis with this cathepsin D clone demonstrated a message length of about 2.2 kilobases. The partial clone was used to screen a size-selected human kidney cDNA library, from which two cathepsin D recombinant plasmids with inserts of about 2200 and 2150 base pairs were obtained. The nucleotide sequences of these clones and of the lambda gt10 clone were determined. The amino acid sequence predicted from the cDNA sequence shows that human cathepsin D consists of 412 amino acids with 20 and 44 amino acids in a pre- and a prosegment, respectively. The mature protein region shows 87% amino acid identity with porcine cathepsin D but differs in having nine additional amino acids. Two of these are at the COOH terminus; the other seven are positioned between the previously determined junction for the light and heavy chains of porcine cathepsin D. A high degree of sequence homology was observed between human cathepsin D and other aspartyl proteases, suggesting a conservation of three-dimensional structure in this family of proteins. PMID:3927292

  6. Sequence-Level Analysis of the Major European Huntington Disease Haplotype

    PubMed Central

    Lee, Jong-Min; Kim, Kyung-Hee; Shin, Aram; Chao, Michael J.; Abu Elneel, Kawther; Gillis, Tammy; Mysore, Jayalakshmi Srinidhi; Kaye, Julia A.; Zahed, Hengameh; Kratter, Ian H.; Daub, Aaron C.; Finkbeiner, Steven; Li, Hong; Roach, Jared C.; Goodman, Nathan; Hood, Leroy; Myers, Richard H.; MacDonald, Marcy E.; Gusella, James F.

    2015-01-01

    Huntington disease (HD) reflects the dominant consequences of a CAG-repeat expansion in HTT. Analysis of common SNP-based haplotypes has revealed that most European HD subjects have distinguishable HTT haplotypes on their normal and disease chromosomes and that ∼50% of the latter share the same major HD haplotype. We reasoned that sequence-level investigation of this founder haplotype could provide significant insights into the history of HD and valuable information for gene-targeting approaches. Consequently, we performed whole-genome sequencing of HD and control subjects from four independent families in whom the major European HD haplotype segregates with the disease. Analysis of the full-sequence-based HTT haplotype indicated that these four families share a common ancestor sufficiently distant to have permitted the accumulation of family-specific variants. Confirmation of new CAG-expansion mutations on this haplotype suggests that unlike most founders of human disease, the common ancestor of HD-affected families with the major haplotype most likely did not have HD. Further, availability of the full sequence data validated the use of SNP imputation to predict the optimal variants for capturing heterozygosity in personalized allele-specific gene-silencing approaches. As few as ten SNPs are capable of revealing heterozygosity in more than 97% of European HD subjects. Extension of allele-specific silencing strategies to the few remaining homozygous individuals is likely to be achievable through additional known SNPs and discovery of private variants by complete sequencing of HTT. These data suggest that the current development of gene-based targeting for HD could be extended to personalized allele-specific approaches in essentially all HD individuals of European ancestry. PMID:26320893

  7. Sequence-Level Analysis of the Major European Huntington Disease Haplotype.

    PubMed

    Lee, Jong-Min; Kim, Kyung-Hee; Shin, Aram; Chao, Michael J; Abu Elneel, Kawther; Gillis, Tammy; Mysore, Jayalakshmi Srinidhi; Kaye, Julia A; Zahed, Hengameh; Kratter, Ian H; Daub, Aaron C; Finkbeiner, Steven; Li, Hong; Roach, Jared C; Goodman, Nathan; Hood, Leroy; Myers, Richard H; MacDonald, Marcy E; Gusella, James F

    2015-09-01

    Huntington disease (HD) reflects the dominant consequences of a CAG-repeat expansion in HTT. Analysis of common SNP-based haplotypes has revealed that most European HD subjects have distinguishable HTT haplotypes on their normal and disease chromosomes and that ∼50% of the latter share the same major HD haplotype. We reasoned that sequence-level investigation of this founder haplotype could provide significant insights into the history of HD and valuable information for gene-targeting approaches. Consequently, we performed whole-genome sequencing of HD and control subjects from four independent families in whom the major European HD haplotype segregates with the disease. Analysis of the full-sequence-based HTT haplotype indicated that these four families share a common ancestor sufficiently distant to have permitted the accumulation of family-specific variants. Confirmation of new CAG-expansion mutations on this haplotype suggests that unlike most founders of human disease, the common ancestor of HD-affected families with the major haplotype most likely did not have HD. Further, availability of the full sequence data validated the use of SNP imputation to predict the optimal variants for capturing heterozygosity in personalized allele-specific gene-silencing approaches. As few as ten SNPs are capable of revealing heterozygosity in more than 97% of European HD subjects. Extension of allele-specific silencing strategies to the few remaining homozygous individuals is likely to be achievable through additional known SNPs and discovery of private variants by complete sequencing of HTT. These data suggest that the current development of gene-based targeting for HD could be extended to personalized allele-specific approaches in essentially all HD individuals of European ancestry. PMID:26320893
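    The statement that a small SNP panel can reveal heterozygosity in most subjects can be illustrated with a greedy selection sketch. This is not the authors' imputation-based procedure. It is a generic set-cover heuristic, and the subject IDs, SNP names and the `het_by_snp` mapping below are hypothetical toy data.

```python
from typing import Dict, List, Set


def greedy_snp_panel(het_by_snp: Dict[str, Set[str]],
                     subjects: Set[str],
                     max_snps: int) -> List[str]:
    """Greedy set cover: repeatedly pick the SNP that is heterozygous in the
    largest number of subjects not yet covered by the panel."""
    uncovered = set(subjects)
    panel: List[str] = []
    while uncovered and len(panel) < max_snps:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(het_by_snp),
                   key=lambda snp: len(het_by_snp[snp] & uncovered))
        gained = het_by_snp[best] & uncovered
        if not gained:
            break  # no remaining SNP adds a new heterozygous subject
        panel.append(best)
        uncovered -= gained
    return panel


def coverage(panel: List[str],
             het_by_snp: Dict[str, Set[str]],
             subjects: Set[str]) -> float:
    """Fraction of subjects heterozygous for at least one SNP in the panel."""
    if not subjects:
        return 0.0
    covered: Set[str] = set()
    for snp in panel:
        covered |= het_by_snp[snp] & subjects
    return len(covered) / len(subjects)


if __name__ == "__main__":
    # Hypothetical toy data: which subjects are heterozygous at which SNP.
    subjects = {"s1", "s2", "s3", "s4", "s5"}
    het_by_snp = {
        "snpA": {"s1", "s2", "s3"},
        "snpB": {"s3", "s4"},
        "snpC": {"s4"},
        "snpD": {"s1", "s2"},
    }
    panel = greedy_snp_panel(het_by_snp, subjects, max_snps=10)
    print(panel, coverage(panel, het_by_snp, subjects))
```

    In this toy example subject s5 is homozygous at every listed SNP, which mirrors the abstract's point that a few individuals would need additional known SNPs or private variants.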

  8. Mapping and analysis of Caenorhabditis elegans transcription factor sequence specificities

    PubMed Central

    Narasimhan, Kamesh; Lambert, Samuel A; Yang, Ally WH; Riddell, Jeremy; Mnaimneh, Sanie; Zheng, Hong; Albu, Mihai; Najafabadi, Hamed S; Reece-Hoyes, John S; Fuxman Bass, Juan I; Walhout, Albertha JM; Weirauch, Matthew T; Hughes, Timothy R

    2015-01-01

    Caenorhabditis elegans is a powerful model for studying gene regulation, as it has a compact genome and a wealth of genomic tools. However, identification of regulatory elements has been limited, as DNA-binding motifs are known for only 71 of the estimated 763 sequence-specific transcription factors (TFs). To address this problem, we performed protein binding microarray experiments on representatives of canonical TF families in C. elegans, obtaining motifs for 129 TFs. Additionally, we predict motifs for many TFs that have DNA-binding domains similar to those already characterized, increasing coverage of binding specificities to 292 C. elegans TFs (∼40%). These data highlight the diversification of binding motifs for the nuclear hormone receptor and C2H2 zinc finger families and reveal unexpected diversity of motifs for T-box and DM families. Motif enrichment in promoters of functionally related genes is consistent with known biology and also identifies putative regulatory roles for unstudied TFs. DOI: http://dx.doi.org/10.7554/eLife.06967.001 PMID:25905672

  9. Targeted Sequencing and Meta-Analysis of Preterm Birth

    PubMed Central

    Schuster, Jessica; McGonnigal, Bethany; Dewan, Andrew; Padbury, James

    2016-01-01

    Understanding the genetic contribution(s) to the risk of preterm birth may lead to the development of interventions for treatment, prediction and prevention. Twin studies suggest heritability of preterm birth is 36–40%. Large epidemiological analyses support a primary maternal origin for recurrence of preterm birth, with little effect of paternal or fetal genetic factors. We exploited an “extreme phenotype” of preterm birth to leverage the likelihood of genetic discovery. We compared variants identified by targeted sequencing of women with 2–3 generations of preterm birth with term controls without history of preterm birth. We used a meta-genomic, bi-clustering algorithm to identify gene sets coordinately associated with preterm birth. We identified 33 genes including 217 variants from 5 modules that were significantly different between cases and controls. The most frequently identified and connected genes in the exome library were IGF1, ATM and IQGAP2. Likewise, SOS1, RAF1 and AKT3 were most frequent in the haplotype library. Additionally, SERPINB8, AZU1 and WASF3 showed significant differences in abundance of variants in the univariate comparison of cases and controls. The biological processes impacted by these gene sets included: cell motility, migration and locomotion; response to glucocorticoid stimulus; signal transduction; metabolic regulation and control of apoptosis. PMID:27163930

  10. HPV-QUEST: A highly customized system for automated HPV sequence analysis capable of processing Next Generation sequencing data set.

    PubMed

    Yin, Li; Yao, Jiqiang; Gardner, Brent P; Chang, Kaifen; Yu, Fahong; Goodenow, Maureen M

    2012-01-01

    Next Generation sequencing (NGS) applied to human papilloma viruses (HPV) can provide sensitive methods to investigate the molecular epidemiology of multiple-type HPV infection. Currently, a genotyping system with a comprehensive collection of updated HPV reference sequences and a capacity to handle NGS data sets is lacking. HPV-QUEST was developed as an automated and rapid HPV genotyping system. The web-based HPV-QUEST subtyping algorithm was developed using HTML, PHP and the Perl scripting language, with MySQL as the database backend. HPV-QUEST includes a database of annotated HPV reference sequences with updated nomenclature covering 5 genera, 14 species and 150 mucosal and cutaneous types to genotype BLASTed query sequences. HPV-QUEST processes up to 10 megabases of sequences within 1 to 2 minutes. Results are reported in HTML, text and Excel formats and display e-value, BLAST score, and local and coverage identities; provide genus, species, type, infection site and risk for the best-matched reference HPV sequence; and produce results ready for additional analyses. PMID:22570520

  11. Regio- and Stereoselective 1,2-Dihydropyridine Alkylation/Addition Sequence for the Synthesis of Piperidines with Quaternary Centers**

    PubMed Central

    Duttwyler, Simon; Chen, Shuming; Lu, Colin; Mercado, Brandon Q.; Bergman, Robert G.; Ellman, Jonathan A.

    2014-01-01

    The first example of C-alkylation of 1,2-dihydropyridines with alkyl triflates and Michael acceptors was developed to introduce quaternary carbon centers with high regio- and diastereoselectivity. Hydride or carbon nucleophile addition to the resultant iminium ion also proceeded with high diastereoselectivity. Carbon nucleophile addition results in an unprecedented level of substitution to provide piperidine rings with adjacent tetrasubstituted carbons. PMID:24604837

  12. The DNA sequence and analysis of human chromosome 14.

    PubMed

    Heilig, Roland; Eckenberg, Ralph; Petit, Jean-Louis; Fonknechten, Núria; Da Silva, Corinne; Cattolico, Laurence; Levy, Michaël; Barbe, Valérie; de Berardinis, Véronique; Ureta-Vidal, Abel; Pelletier, Eric; Vico, Virginie; Anthouard, Véronique; Rowen, Lee; Madan, Anup; Qin, Shizhen; Sun, Hui; Du, Hui; Pepin, Kymberlie; Artiguenave, François; Robert, Catherine; Cruaud, Corinne; Brüls, Thomas; Jaillon, Olivier; Friedlander, Lucie; Samson, Gaelle; Brottier, Philippe; Cure, Susan; Ségurens, Béatrice; Anière, Franck; Samain, Sylvie; Crespeau, Hervé; Abbasi, Nissa; Aiach, Nathalie; Boscus, Didier; Dickhoff, Rachel; Dors, Monica; Dubois, Ivan; Friedman, Cynthia; Gouyvenoux, Michel; James, Rose; Madan, Anuradha; Mairey-Estrada, Barbara; Mangenot, Sophie; Martins, Nathalie; Ménard, Manuela; Oztas, Sophie; Ratcliffe, Amber; Shaffer, Tristan; Trask, Barbara; Vacherie, Benoit; Bellemere, Chadia; Belser, Caroline; Besnard-Gonnet, Marielle; Bartol-Mavel, Delphine; Boutard, Magali; Briez-Silla, Stéphanie; Combette, Stephane; Dufossé-Laurent, Virginie; Ferron, Carolyne; Lechaplais, Christophe; Louesse, Claudine; Muselet, Delphine; Magdelenat, Ghislaine; Pateau, Emilie; Petit, Emmanuelle; Sirvain-Trukniewicz, Peggy; Trybou, Arnaud; Vega-Czarny, Nathalie; Bataille, Elodie; Bluet, Elodie; Bordelais, Isabelle; Dubois, Maria; Dumont, Corinne; Guérin, Thomas; Haffray, Sébastien; Hammadi, Rachid; Muanga, Jacqueline; Pellouin, Virginie; Robert, Dominique; Wunderle, Edith; Gauguet, Gilbert; Roy, Alice; Sainte-Marthe, Laurent; Verdier, Jean; Verdier-Discala, Claude; Hillier, LaDeana; Fulton, Lucinda; McPherson, John; Matsuda, Fumihiko; Wilson, Richard; Scarpelli, Claude; Gyapay, Gábor; Wincker, Patrick; Saurin, William; Quétier, Francis; Waterston, Robert; Hood, Leroy; Weissenbach, Jean

    2003-02-01

    Chromosome 14 is one of five acrocentric chromosomes in the human genome. These chromosomes are characterized by a heterochromatic short arm that contains essentially ribosomal RNA genes, and a euchromatic long arm in which most, if not all, of the protein-coding genes are located. The finished sequence of human chromosome 14 comprises 87,410,661 base pairs, representing 100% of its euchromatic portion, in a single continuous segment covering the entire long arm with no gaps. Two loci of crucial importance for the immune system, as well as more than 60 disease genes, have been localized so far on chromosome 14. We identified 1,050 genes and gene fragments, and 393 pseudogenes. On the basis of comparisons with other vertebrate genomes, we estimate that more than 96% of the chromosome 14 genes have been annotated. From an analysis of the CpG island occurrences, we estimate that 70% of these annotated genes are complete at their 5' end. PMID:12508121

  13. Predictive sequence analysis of the Candidatus Liberibacter asiaticus proteome.

    PubMed

    Cong, Qian; Kinch, Lisa N; Kim, Bong-Hyun; Grishin, Nick V

    2012-01-01

    Candidatus Liberibacter asiaticus (Ca. L. asiaticus) is a parasitic gram-negative bacterium that is closely associated with Huanglongbing (HLB), a worldwide citrus disease. Given the difficulty in culturing the bacterium and thus in its experimental characterization, computational analyses of the whole Ca. L. asiaticus proteome can provide much needed insights into the mechanisms of the disease and guide the development of treatment strategies. In this study, we applied state-of-the-art sequence analysis tools to every Ca. L. asiaticus protein. Our results are available as a public website at http://prodata.swmed.edu/liberibacter_asiaticus/. In particular, we manually curated the results to predict the subcellular localization, spatial structure and function of all Ca. L. asiaticus proteins (http://prodata.swmed.edu/liberibacter_asiaticus/curated/). This extensive information should facilitate the study of Ca. L. asiaticus proteome function and its relationship to disease. Pilot studies based on the information from our website have revealed several potential virulence factors, discussed herein. PMID:22815919

  14. Predictive Sequence Analysis of the Candidatus Liberibacter asiaticus Proteome

    PubMed Central

    Cong, Qian; Kinch, Lisa N.; Kim, Bong-Hyun; Grishin, Nick V.

    2012-01-01

    Candidatus Liberibacter asiaticus (Ca. L. asiaticus) is a parasitic Gram-negative bacterium that is closely associated with Huanglongbing (HLB), a worldwide citrus disease. Given the difficulty in culturing the bacterium and thus in its experimental characterization, computational analyses of the whole Ca. L. asiaticus proteome can provide much needed insights into the mechanisms of the disease and guide the development of treatment strategies. In this study, we applied state-of-the-art sequence analysis tools to every Ca. L. asiaticus protein. Our results are available as a public website at http://prodata.swmed.edu/liberibacter_asiaticus/. In particular, we manually curated the results to predict the subcellular localization, spatial structure and function of all Ca. L. asiaticus proteins (http://prodata.swmed.edu/liberibacter_asiaticus/curated/). This extensive information should facilitate the study of Ca. L. asiaticus proteome function and its relationship to disease. Pilot studies based on the information from our website have revealed several potential virulence factors, discussed herein. PMID:22815919

  15. Advanced accident sequence precursor analysis level 1 models

    SciTech Connect

    Sattison, M.B.; Thatcher, T.A.; Knudsen, J.K.; Schroeder, J.A.; Siu, N.O.

    1996-03-01

    INEL has been involved in the development of plant-specific Accident Sequence Precursor (ASP) models for the past two years. These models were developed for use with the SAPHIRE suite of PRA computer codes. They contained event tree/linked fault tree Level 1 risk models for the following initiating events: general transient, loss-of-offsite-power, steam generator tube rupture, small loss-of-coolant-accident, and anticipated transient without scram. Early in 1995 the ASP models were revised based on review comments from the NRC and an independent peer review. These models were released as Revision 1. The Office of Nuclear Regulatory Research has sponsored several projects at the INEL this fiscal year to further enhance the capabilities of the ASP models. The Revision 2 models incorporate more detailed plant information concerning plant response to station blackout conditions, information on battery life, and other unique features gleaned from an Office of Nuclear Reactor Regulation quick review of the Individual Plant Examination submittals. These models are currently being delivered to the NRC as they are completed. A related project is a feasibility study and model development of low power/shutdown (LP/SD) and external event extensions to the ASP models. This project will establish criteria for selection of LP/SD and external initiator operational events for analysis within the ASP program. Prototype models for each pertinent initiating event (loss of shutdown cooling, loss of inventory control, fire, flood, seismic, etc.) will be developed. A third project concerns development of enhancements to SAPHIRE. In relation to the ASP program, a new SAPHIRE module, GEM, was developed as a specific user interface for performing ASP evaluations. This module greatly simplifies the analysis process for determining the conditional core damage probability for a given combination of initiating events and equipment failures or degradations.

  16. Characterization and phylogenetic relationships among microsporidia infecting silkworm, Bombyx mori, using inter simple sequence repeat (ISSR) and small subunit rRNA (SSU-rRNA) sequence analysis.

    PubMed

    Rao, S Nageswara; Nath, B Surendra; Saratchandra, B

    2005-06-01

    This study is the first report on the genetic characterization and relationships among different microsporidia infecting the silkworm, Bombyx mori, using inter simple sequence repeat PCR (ISSR-PCR) analysis. Six different microsporidians were distinguished through molecular DNA typing using ISSR-PCR. Thus, ISSR-PCR analysis can be a powerful tool to detect polymorphisms and identify microsporidians, which are difficult to study with microscopy because of their extremely small size. Of the 100 ISSR primers tested, only 28 gave reproducible results with high polymorphism (93%). A total of 24 ISSR primers produced 55 unique genetic markers, which could be used to differentiate the microsporidians from each other. Among the 28 SSRs tested, the most abundant were (CA)n, (GA)n, and (GT)n repeats. The degree of band sharing was used to evaluate genetic similarity between different microsporidian isolates and to construct a phylogenetic tree using Jaccard's similarity coefficient. The results indicate that the DNA profiles based on ISSR markers can be used as diagnostic tools to identify different microsporidia with considerable accuracy. In addition, the small subunit ribosomal RNA (SSU-rRNA) gene was amplified, cloned, and sequenced from each of the 6 microsporidian isolates. These sequences were compared with 20 other microsporidian SSU-rRNA sequences to develop a phylogenetic tree for the microsporidia isolated from the silkworms. This method was found to be useful in establishing the phylogenetic relationships among the different microsporidians isolated from silkworms. Of the 6 microsporidian isolates, NIK-1s revealed an SSU-rRNA gene sequence similar to Nosema bombycis, indicating that NIK-1s is similar to N. bombycis; the remaining 5 isolates, which differed from each other and from N. bombycis, were considered to be different variants belonging to the species N. bombycis. PMID:16121233
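    Band sharing scored with Jaccard's similarity coefficient, as mentioned in the abstract above, can be sketched as follows. The isolate names and band identifiers are hypothetical, and treating two empty profiles as having similarity 0 is an assumption of this sketch, not something the abstract specifies.

```python
from itertools import combinations
from typing import Dict, Iterable, Tuple


def jaccard(bands_a: Iterable[str], bands_b: Iterable[str]) -> float:
    """Jaccard similarity: shared bands divided by bands present in either profile."""
    a, b = set(bands_a), set(bands_b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty profiles are scored as dissimilar
    return len(a & b) / len(union)


def pairwise_similarity(profiles: Dict[str, Iterable[str]]) -> Dict[Tuple[str, str], float]:
    """Jaccard similarity for every unordered pair of isolates."""
    return {
        (x, y): jaccard(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }


if __name__ == "__main__":
    # Hypothetical band presence data: each isolate maps to the set of
    # ISSR bands (labelled primer_size) that were scored as present.
    profiles = {
        "isolate_1": {"p1_300", "p1_450", "p2_600"},
        "isolate_2": {"p1_450", "p2_600", "p2_800"},
        "isolate_3": {"p3_250"},
    }
    for pair, score in pairwise_similarity(profiles).items():
        print(pair, round(score, 2))
```

    A distance matrix for tree building is commonly taken as one minus these similarities; the clustering method itself is not stated in the abstract.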

  17. GENOMIC SEQUENCE ANALYSIS OF LEPTOSPIRA BORGPETERSENII SEROVAR HARDJO

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A genomic library from Leptospira borgpetersenii serovar hardjo strain JB197 was prepared by mechanically shearing the DNA and inserting it into a positive selection vector. DNA was prepared from approximately 22,000 random clones and used as templates for automated sequencing. Sequence data was c...

  18. PSSARD: protein sequence-structure analysis relational database.

    PubMed

    Guruprasad, Kunchur; Srikanth, K; Babu, A V N

    2005-09-15

    We have implemented a relational database comprising a representative dataset of amino acid sequences and their associated secondary structure. The representative amino acid sequences were selected according to the PDB_SELECT program by choosing proteins corresponding to protein crystal structure data deposited in the protein data bank that share less than 25% overall pair-wise sequence identity. The secondary structure was extracted from the protein data bank website. The information content in the database includes the protein description, PDB code, crystal structure resolution, total number of amino acid residues in the protein chain, amino acid sequence, secondary structure conformation and its summary. The database is freely accessible from the website mentioned below and is useful to query on any of the above fields. The database is particularly useful to quickly retrieve amino acid sequences that are compatible to any super-secondary structure conformation from several proteins simultaneously. PMID:16054209

  19. Integration of Seismic Sequence Analysis and High Resolution Sequence Stratigraphy for Delineating the Sedimentation Characteristics and Modeling of Baltim Area, Off-Shore Nile Delta, Egypt

    NASA Astrophysics Data System (ADS)

    Nasr El-Deen Badawy, A. M. E. S.; Abu El-Ata, A. S. A.; El-Gendy, N. H.

    2014-12-01

    The current study discusses the Messinian prospectivity of the Baltim area, located in the offshore Nile Delta about 25 km from the Mediterranean Sea shoreline. An integrated exploration approach was applied, using a variety of 2D/3D seismic data, subsurface borehole geologic and log data from selected wells distributed across the study area, as well as geophysical and biostratigraphic data. The well data comprise well markers and electric logs; the geological data comprise litho-stratigraphic information and ditch-sample analysis of the studied interval. The geophysical data include check shots, VSP, velocity cubes and 3D seismic lines. Biostratigraphic data include biozones, benthonic to planktonic ratios, nannofossils and foraminiferal data. Seismic interpretation and seismic stratigraphic analysis, in the form of seismic sequence analysis, seismic facies analysis, seismic unit analysis and geologic confirmation, were carried out with the Petrel and Kingdom software packages. The seismic lines were interpreted to define the different parasequences and to pick the smaller sequences for mapping; picking each sequence from the seismic correlation facilitated mapping every sequence laterally. In addition, the structures and isopach of every sequence were interpreted, and seismic attributes were extracted for every sequence in order to identify the sands present in each sequence and to study the extent of these sands, which act as reservoirs. The integration of all results was taken as the basis for producing the various models for the study area. The first was the depositional environmental model, which showed that the area varies from intertidal-littoral southward at the Nidoco wells to inner-middle neritic at the Baltim East wells, then to outer neritic, changing to bathyal and then abyssal at the extreme north. The geologic model for the area was constructed

  20. Differentiation of sheep pox and goat poxviruses by sequence analysis and PCR-RFLP of P32 gene.

    PubMed

    Hosamani, Madhusudan; Mondal, Bimalendu; Tembhurne, Prabhakar A; Bandyopadhyay, Santanu Kumar; Singh, Raj Kumar; Rasool, Thaha Jamal

    2004-08-01

    Sheep pox and goat pox are highly contagious viral diseases of small ruminants. These diseases were earlier thought to be caused by a single species of virus, as they are serologically indistinguishable. P32, one of the major immunogenic genes of Capripoxvirus, was isolated and sequenced from two Indian isolates of goat poxvirus (GPV) and a vaccine strain of sheep poxvirus (SPV). The sequences were compared with other P32 sequences of capripoxviruses available in the database. Sequence analysis revealed that sheep pox and goat poxviruses share 97.5 and 94.7% homology at nucleotide and amino acid level, respectively. A major difference between them is the presence of an additional aspartic acid at the 55th position of P32 of sheep poxvirus that is absent in both goat poxvirus and lumpy skin disease virus. Further, six unique neutral nucleotide substitutions were observed at positions 77, 275, 403, 552, 867 and 964 in the sequence of goat poxvirus, which can be taken as GPV signature residues. Similar unique nucleotide signatures could be identified in SPV and LSDV sequences also. Phylogenetic analysis showed that members of the Capripoxvirus could be delineated into three distinct clusters of GPV, SPV and LSDV based on the P32 genomic sequence. Using this information, a PCR-RFLP method has been developed for unequivocal genomic differentiation of SPV and GPV. PMID:15215685

  1. BI-29 VARIANT ANALYSIS OF PRIMARY AND RECURRENT GLIOBLASTOMA USING ION AMPLISEQ™ COMPREHENSIVE CANCER PANEL AND WHOLE EXOME SEQUENCING

    PubMed Central

    Virk, Selene; Gibson, Richard; Barnholtz-Sloan, Jill; Quinones-Mateu, Miguel

    2014-01-01

    BACKGROUND: Glioblastoma is the most deadly and frequently occurring adult primary brain tumor. The characterization of genetic variants and molecular signatures in glioblastoma is heavily reliant upon genomic sequencing. The availability of rapid and economical sequencing platforms is necessary for the widespread adoption of high-throughput sequencing in the clinical environment. METHODS: Utilizing patient matched triplet samples consisting of normal blood and snap-frozen primary and recurrent glioblastoma tumor samples from the Ohio Brain Tumor Study, we compared whole exome sequencing data from TCGA to sequencing data obtained from Ion AmpliSeq™ Comprehensive Cancer Panel (CCP). RESULTS: As we anticipated, the number of variants identified from the exome sequencing data (n = 619) was greater than those identified from the Ion AmpliSeq™ CCP data (n = 22). Surprisingly, there were only six variants common across both data sets. In addition, none of the variants from the Ion AmpliSeq™ CCP data were shared across patient samples. CONCLUSIONS: Our pilot results suggest disparities in both the number and category of mutations identified from analysis of data generated from the Ion AmpliSeq™ CCP and whole exome sequencing. Future studies are needed to elucidate the nature of these differences and to determine the clinical relevance of variants that may be associated with glioblastoma recurrence and response to treatment. High-throughput sequencing based cancer panels may be improved by the development of brain tumor specific panels.

  2. Massively parallel sequencing of short tandem repeats-Population data and mixture analysis results for the PowerSeq™ system.

    PubMed

    van der Gaag, Kristiaan J; de Leeuw, Rick H; Hoogenboom, Jerry; Patel, Jaynish; Storts, Douglas R; Laros, Jeroen F J; de Knijff, Peter

    2016-09-01

    Current forensic DNA analysis predominantly involves identification of human donors by analysis of short tandem repeats (STRs) using Capillary Electrophoresis (CE). Recent developments in Massively Parallel Sequencing (MPS) technologies offer new possibilities in analysis of STRs since they might overcome some of the limitations of CE analysis. In this study 17 STRs and Amelogenin were sequenced in high coverage using a prototype version of the Promega PowerSeq™ system for 297 population samples from the Netherlands, Nepal, Bhutan and Central African Pygmies. In addition, 45 two-person mixtures with different minor contributions down to 1% were analysed to investigate the performance of this system for mixed samples. Regarding fragment length, complete concordance between the MPS and CE-based data was found, demonstrating the reliability of the MPS PowerSeq™ system. As expected, MPS presented a broader allele range and higher power of discrimination and exclusion rate. The high coverage sequencing data were used to determine stutter characteristics for all loci and stutter ratios were compared to CE data. The separation of alleles with the same length but exhibiting different stutter ratios lowers the overall variation in stutter ratio and helps in differentiation of stutters from genuine alleles in mixed samples. All alleles of the minor contributors were detected in the sequence reads even for the 1% contributions, but analysis of mixtures below 5% without prior information on the mixture ratio is complicated by PCR and sequencing artefacts. PMID:27347657

  3. Loss of DHR sequences at Browns Ferry Unit One - accident-sequence analysis

    SciTech Connect

    Cook, D.H.; Grene, S.R.; Harrington, R.M.; Hodge, S.A.

    1983-05-01

    This study describes the predicted response of Unit One at the Browns Ferry Nuclear Plant to a postulated loss of decay heat removal (DHR) capability following scram from full power with the power conversion system unavailable. In accident sequences without DHR capability, the residual heat removal (RHR) system functions of pressure suppression pool cooling and reactor vessel shutdown cooling are unavailable. Consequently, all decay heat energy is stored in the pressure suppression pool with a concomitant increase in pool temperature and primary containment pressure. With the assumption that DHR capability is not regained during the lengthy course of this accident sequence, the containment ultimately fails by overpressurization. Although unlikely, this catastrophic failure might lead to loss of the ability to inject cooling water into the reactor vessel, causing subsequent core uncovery and meltdown. The timing of these events and the effective mitigating actions that might be taken by the operator are discussed in this report.

  4. Sequence analysis of the complete mitochondrial genome of Youxian sheldrake.

    PubMed

    He, Shao-Ping; Liu, Li-Li; Yu, Qi-Fang; Li, Si; He, Jian-Hua

    2016-01-01

    The Youxian sheldrake is an excellent native breed of Hunan province, China. The complete mitochondrial (mt) genome sequence plays an important role in the accurate determination of phylogenetic relationships among metazoans. This is the first study to determine the complete mitochondrial genome sequence of the Youxian sheldrake, using PCR-based amplification and Sanger sequencing. The characteristics of the entire mitochondrial genome were analyzed in detail: the total length of the mitogenome is 16,605 bp, with a base composition of 29.21% A, 22.18% T, 32.84% C and 15.77% G. It contains 2 ribosomal RNA genes, 13 protein-coding genes, 22 transfer RNA genes and a major non-coding control region (D-loop region). The complete mitochondrial genome sequence of the Youxian sheldrake provides important data for further study of the phylogenetics of poultry, as well as data for genetics and breeding. PMID:25090395
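    Base composition figures like those reported above are simple per-base percentages of the sequence length. A minimal sketch, assuming the sequence is available as a plain string; the short demo sequence is made up, not taken from the mitogenome, and ambiguous bases such as N are ignored here by assumption.

```python
from collections import Counter
from typing import Dict


def base_composition(seq: str) -> Dict[str, float]:
    """Percentage of A, C, G and T among the unambiguous bases of seq."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


if __name__ == "__main__":
    # Made-up demo sequence; a real mitogenome would be read from a FASTA file.
    demo = "AACGTTAC"
    for base, pct in base_composition(demo).items():
        print(f"{base}: {pct:.2f}%")
```

    As a quick consistency check on the abstract, the four reported percentages (29.21, 22.18, 32.84 and 15.77) add up to 100.00.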

  5. Cloning and sequence analysis of banana streak virus DNA.

    PubMed

    Harper, G; Hull, R

    1998-01-01

    Banana streak virus (BSV), a member of the Badnavirus group of plant viruses, causes severe problems in banana cultivation, reducing fruit yield and restricting plant breeding and the movement of germplasm. Current detection methods are relatively insensitive. In order to develop a PCR-based diagnostic method that is both reliable and sensitive, the genome of a Nigerian isolate of BSV has been sequenced and shown to comprise 7389 bp and to be organized in a manner characteristic of badnaviruses. Comparison of this sequence with those of other badnaviruses showed that BSV is a distinct virus. PCR with primers based on sequence data indicated that BSV sequences are present in the banana genome. PMID:9926402

  6. Genomics Analysis of Replicative Helicase DnaB Sequences in Proteobacteria

    PubMed Central

    Poggi, Silvana; Chandra, Sathees B.

    2014-01-01

    Replicative helicase DnaB interacts with DnaA, DnaC, DnaG, and DNA polymerase III to commence replication, increase the movement rate of the replication fork, and assemble part of the primosome. The formation of the replication fork is limited by the ability to load DnaB onto the DNA, which makes DnaB vital to replication. In the absence of DnaB, the replication fork is not maintained, and in a state of inactivity the replication fork degrades and collapses. To further understand the importance of this enzyme from an evolutionary perspective, a genomic analysis of DnaB protein sequences chosen from five Proteobacteria subclasses was performed. Our analysis indicates that the DnaB replicative helicases of Alphaproteobacteria and Epsilonproteobacteria diverged at an earlier stage from those of Betaproteobacteria, Deltaproteobacteria and Gammaproteobacteria, as well as from one another. Our results were further supported when we reanalyzed and reconstructed the phylogenetic tree after including sequences from the Actinobacteria and Firmicutes phyla. In addition, Betaproteobacteria, Deltaproteobacteria, and Gammaproteobacteria appear to share a closer common ancestor with one another than with the other two subclasses. The dot-plot analysis indicated that the region between amino acid residues 320 and 400 was strongly conserved among all five subclasses. PMID:25395727

  7. EST sequencing of Onychophora and phylogenomic analysis of Metazoa.

    PubMed

    Roeding, Falko; Hagner-Holler, Silke; Ruhberg, Hilke; Ebersberger, Ingo; von Haeseler, Arndt; Kube, Michael; Reinhardt, Richard; Burmester, Thorsten

    2007-12-01

    Onychophora (velvet worms) represent a small animal taxon considered to be related to Euarthropoda. We have obtained 1873 5' cDNA sequences (expressed sequence tags, ESTs) from the velvet worm Epiperipatus sp., which were assembled into 833 contigs. BLAST similarity searches revealed that 51.9% of the contigs had matches in the protein databases with expectation values lower than 10⁻⁴. Most ESTs had the best hit with proteins from either Chordata or Arthropoda (approximately 40% each). The ESTs included sequences of 27 ribosomal proteins. The orthologous sequences from 28 other species of a broad range of phyla were obtained from the databases, including other EST projects. A concatenated amino acid alignment comprising 5021 positions was constructed, which covered 4259 positions after problematic regions were removed. Bayesian and maximum likelihood methods place Epiperipatus within the monophyletic Ecdysozoa (Onychophora, Arthropoda, Tardigrada and Nematoda), but its exact relation to the Euarthropoda remained unresolved. The "Articulata" concept was not supported. Tardigrada and Nematoda formed a well-supported monophylum, suggesting that Tardigrada are actually Cycloneuralia. In agreement with previous studies, we have demonstrated that random sequencing of cDNAs results in sequence information suitable for phylogenomic approaches to resolve metazoan relationships. PMID:17933557

  8. Analysis methods for the determination of anthropogenic additions of P to agricultural soils

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The addition and measurement of phosphorus in soil are of concern on lands where biosolids have been applied. Colorimetric analysis for plant-available P may be inadequate for the accurate assessment of soil P. Phosphate additions in a regulatory environment need to be accurately assessed as the reported...

  9. Analysis of expressed sequence tags (ESTs) from a normalized cDNA library and isolation of EST simple sequence repeats from the invasive cotton mealybug Phenacoccus solenopsis.

    PubMed

    Li, Hui; Lang, Kun-Ling; Fu, Hai-Bin; Shen, Chang-Peng; Wan, Fang-Hao; Chu, Dong

    2015-12-01

    The cotton mealybug, Phenacoccus solenopsis Tinsley, is a serious and invasive pest. At present, genetic resources for studying P. solenopsis are limited, and this negatively affects genetic research on the organism and, consequently, translational work to improve management of this pest. In the present study, expressed sequence tags (ESTs) were analyzed from a normalized complementary DNA library of P. solenopsis. In addition, EST-derived microsatellite loci (also known as simple sequence repeats or SSRs) were isolated and characterized. A total of 1107 high-quality ESTs were acquired from the library. Clustering and assembly analysis resulted in 785 unigenes, which were classified functionally into 23 categories according to the Gene Ontology database. Seven EST-based SSR markers were developed in this study and are expected to be useful in characterizing how this invasive species was introduced, as well as providing insights into its genetic microevolution. PMID:25380551

  10. Survey and analysis of simple sequence repeats (SSRs) in three genomes of Candida species.

    PubMed

    Jia, Dongmei

    2016-06-15

    Simple sequence repeats (SSRs), or microsatellites, which are composed of tandemly repeated short units of 1-6 bp, have attracted continuous attention. Here, the distribution, composition and polymorphism of microsatellites and compound microsatellites were analyzed in three available genomes of Candida species (Candida dubliniensis, Candida glabrata and Candida orthopsilosis). The results show that there were 118,047, 66,259 and 61,119 microsatellites in the genomes of C. dubliniensis, C. glabrata and C. orthopsilosis, respectively. The SSRs covered more than one third of the genome length in the three species. Microsatellites consisting only of the bases A and/or T, such as (A)n, (T)n, (AT)n, (TA)n, (AAT)n, (TAA)n, (TTA)n, (ATA)n, (ATT)n and (TAT)n, were predominant in the three genomes. Microsatellite lengths were concentrated at 6 bp and 9 bp, both in the three genomes and in their coding sequences. Moreover, the relative abundance (19.89/kbp) and relative density (167.87 bp/kbp) of SSRs in the mitochondrial sequence of C. glabrata were significantly greater than those in any of the genomes or chromosomes of the three species. In addition, the distance between any two adjacent microsatellites was an important factor influencing the formation of compound microsatellites. The analysis may be helpful for further study of the roles of microsatellites in the origin, organization and evolution of the genomes of Candida species. PMID:26883055
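
    The relative abundance (SSRs per kbp) and relative density (SSR bp per kbp) reported in this abstract can be illustrated with a minimal Python sketch. It is not the author's pipeline: the minimum copy numbers in MIN_COPIES, the function names, and the toy sequence are all assumptions made for illustration, and only perfect repeats with unit lengths of 1-6 bp are detected.

```python
import re

# Hypothetical minimum copy numbers per repeat-unit length (1-6 bp).
# These thresholds are assumptions for illustration, not values from the abstract.
MIN_COPIES = {1: 6, 2: 3, 3: 3, 4: 3, 5: 3, 6: 3}


def build_ssr_pattern(min_copies=MIN_COPIES):
    """Build one alternation that tries shorter repeat units first."""
    parts = []
    for group, unit_len in enumerate(sorted(min_copies), start=1):
        extra = min_copies[unit_len] - 1  # copies required after the first unit
        parts.append(r"([ACGT]{%d})\%d{%d,}" % (unit_len, group, extra))
    return re.compile("|".join(parts))


def find_ssrs(seq, pattern=None):
    """Return (start, length) for each non-overlapping perfect SSR in seq."""
    pattern = pattern or build_ssr_pattern()
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(seq.upper())]


def ssr_stats(seq):
    """Relative abundance (SSRs per kbp) and relative density (SSR bp per kbp)."""
    ssrs = find_ssrs(seq)
    abundance = len(ssrs) * 1000 / len(seq)
    density = sum(length for _, length in ssrs) * 1000 / len(seq)
    return abundance, density


toy = "CG" + "AT" * 4 + "CG" + "A" * 7 + "C"  # made-up 20 bp sequence
print(find_ssrs(toy))
print(ssr_stats(toy))
```

    On the toy sequence the sketch finds one (AT)n and one (A)n repeat. Real surveys usually also handle imperfect and compound repeats, which this sketch deliberately leaves out.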

  11. Targeted Next‐Generation Sequencing Analysis of 1,000 Individuals with Intellectual Disability

    PubMed Central

    Grozeva, Detelina; Carss, Keren; Spasic‐Boskovic, Olivera; Tejada, Maria‐Isabel; Gecz, Jozef; Shaw, Marie; Corbett, Mark; Haan, Eric; Thompson, Elizabeth; Friend, Kathryn; Hussain, Zaamin; Hackett, Anna; Field, Michael; Renieri, Alessandra; Stevenson, Roger; Schwartz, Charles; Floyd, James A.B.; Bentham, Jamie; Cosgrove, Catherine; Keavney, Bernard; Bhattacharya, Shoumo; Hurles, Matthew

    2015-01-01

    To identify genetic causes of intellectual disability (ID), we screened a cohort of 986 individuals with moderate to severe ID for variants in 565 known or candidate ID‐associated genes using targeted next‐generation sequencing. Likely pathogenic rare variants were found in ∼11% of the cases (113 variants in 107/986 individuals: ∼8% of the individuals had a likely pathogenic loss‐of‐function [LoF] variant, whereas ∼3% had a known pathogenic missense variant). Variants in SETD5, ATRX, CUL4B, MECP2, and ARID1B were the most common causes of ID. This study assessed the value of sequencing a cohort of probands to provide a molecular diagnosis of ID, without the availability of DNA from both parents for de novo sequence analysis. This modeling is clinically relevant as 28% of all UK families with dependent children are single parent households. In conclusion, to diagnose patients with ID in the absence of parental DNA, we recommend investigation of all LoF variants in known genes that cause ID and assessment of a limited list of proven pathogenic missense variants in these genes. This will provide 11% additional diagnostic yield beyond the 10%–15% yield from array CGH alone. PMID:26350204

  12. Targeted Next-Generation Sequencing Analysis of 1,000 Individuals with Intellectual Disability.

    PubMed

    Grozeva, Detelina; Carss, Keren; Spasic-Boskovic, Olivera; Tejada, Maria-Isabel; Gecz, Jozef; Shaw, Marie; Corbett, Mark; Haan, Eric; Thompson, Elizabeth; Friend, Kathryn; Hussain, Zaamin; Hackett, Anna; Field, Michael; Renieri, Alessandra; Stevenson, Roger; Schwartz, Charles; Floyd, James A B; Bentham, Jamie; Cosgrove, Catherine; Keavney, Bernard; Bhattacharya, Shoumo; Hurles, Matthew; Raymond, F Lucy

    2015-12-01

    To identify genetic causes of intellectual disability (ID), we screened a cohort of 986 individuals with moderate to severe ID for variants in 565 known or candidate ID-associated genes using targeted next-generation sequencing. Likely pathogenic rare variants were found in ∼11% of the cases (113 variants in 107/986 individuals: ∼8% of the individuals had a likely pathogenic loss-of-function [LoF] variant, whereas ∼3% had a known pathogenic missense variant). Variants in SETD5, ATRX, CUL4B, MECP2, and ARID1B were the most common causes of ID. This study assessed the value of sequencing a cohort of probands to provide a molecular diagnosis of ID, without the availability of DNA from both parents for de novo sequence analysis. This modeling is clinically relevant as 28% of all UK families with dependent children are single parent households. In conclusion, to diagnose patients with ID in the absence of parental DNA, we recommend investigation of all LoF variants in known genes that cause ID and assessment of a limited list of proven pathogenic missense variants in these genes. This will provide 11% additional diagnostic yield beyond the 10%-15% yield from array CGH alone. PMID:26350204

  13. Analysis of transposable elements in the genome of Asparagus officinalis from high coverage sequence data.

    PubMed

    Li, Shu-Fen; Gao, Wu-Jun; Zhao, Xin-Peng; Dong, Tian-Yu; Deng, Chuan-Liang; Lu, Long-Dou

    2014-01-01

    Asparagus officinalis is an economically and nutritionally important vegetable crop that is widely cultivated and is used as a model dioecious species to study plant sex determination and sex chromosome evolution. To improve our understanding of its genome composition, especially with respect to transposable elements (TEs), which make up the majority of the genome, we performed Illumina HiSeq2000 sequencing of both male and female asparagus genomes followed by bioinformatics analysis. We generated 17 Gb of sequence (12× coverage) and assembled it into 163,406 scaffolds with a total cumulative length of 400 Mbp, which represents about 30% of the asparagus genome. Overall, TEs masked about 53% of the A. officinalis assembly. The majority of the identified TEs belonged to LTR retrotransposons, which constitute about 28% of genomic DNA, with Ty1/copia elements being more diverse and accumulating to higher copy numbers than Ty3/gypsy elements. Compared with LTR retrotransposons, non-LTR retrotransposons and DNA transposons were relatively rare. In addition, comparison of the abundance of the TE groups between male and female genomes showed that the overall TE composition was highly similar, with only slight differences in the abundance of several TE groups, which is consistent with the relatively recent origin of asparagus sex chromosomes. This study greatly improves our knowledge of the repetitive sequence composition of asparagus, which facilitates the identification of TEs responsible for the early evolution of plant sex chromosomes and is helpful for further studies on this dioecious plant. PMID:24810432

  14. Tissue-specific transcriptome sequencing analysis expands the non-human primate reference transcriptome resource (NHPRTR)

    PubMed Central

    Peng, Xinxia; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Nishida, Andrew; Pipes, Lenore; Bozinoski, Marjan; Thomas, Matthew J.; Kelly, Sara; Weiss, Jeffrey M.; Raveendran, Muthuswamy; Muzny, Donna; Gibbs, Richard A.; Rogers, Jeffrey; Schroth, Gary P.; Katze, Michael G.; Mason, Christopher E.

    2015-01-01

    The non-human primate reference transcriptome resource (NHPRTR, available online at http://nhprtr.org/) aims to generate comprehensive RNA-seq data from a wide variety of non-human primates (NHPs), from lemurs to hominids. In the 2012 Phase I of the NHPRTR project, 19 billion fragments or 3.8 terabases of transcriptome sequences were collected from pools of ∼20 tissues in 15 species and subspecies. Here we describe a major expansion of NHPRTR by adding 10.1 billion fragments of tissue-specific RNA-seq data. For this effort, we selected 11 of the original 15 NHP species and subspecies and constructed total RNA libraries for the same ∼15 tissues in each. The sequence quality is such that 88% of the reads align to human reference sequences, allowing us to compute the full list of expression abundance across all tissues for each species, using the reads mapped to human genes. This update also includes improved transcript annotations derived from RNA-seq data for rhesus and cynomolgus macaques, two of the most commonly used NHP models, and additional RNA-seq data compiled from related projects. Together, these comprehensive reference transcriptomes from multiple primates serve as a valuable community resource for genome annotation, gene dynamics and comparative functional analysis. PMID:25392405

  15. Sequence analysis of the aminoacylase-1 family. A new proposed signature for metalloexopeptidases.

    PubMed

    Biagini, A; Puigserver, A

    2001-03-01

    The amino acid sequence analysis of the human and porcine aminoacylases-1, the carboxypeptidase S precursor from Saccharomyces cerevisiae, the succinyl-diaminopimelate desuccinylase from Escherichia coli, Haemophilus influenzae and Corynebacterium glutamicum, the acetylornithine deacetylase from Escherichia coli and Dictyostelium discoideum and the carboxypeptidase G(2) precursor from Pseudomonas strain, using the Basic Local Alignment Search Tool (BLAST) and the Position-Specific Iterated BLAST (PSI-BLAST), allowed us to suggest that all these enzymes, which share common functional and biochemical features, belong to the same structural family. The three amino acid blocks which were found to be highly conserved, using the CLUSTAL W program, could be assigned to the catalytic active site, based on the general three-dimensional structure of the carboxypeptidase G(2) from the Pseudomonas strain precursor. Six additional proteins with the same signature have been retrieved after performing two successive PSI-BLAST iterations using the sequence of the conserved motif, namely Lactobacillus delbrueckii aminoacyl-histidine dipeptidase, Streptomyces griseus aminopeptidase, Saccharomyces cerevisiae aminopeptidase Y precursor, two Bacillus stearothermophilus N-carbamyl-L-amino acid amidohydrolases and Pseudomonas sp. hydantoin utilization protein C. The three conserved amino acid motifs corresponded to the following blocks: (i) [S, G, A]-H-x-D-x-V; (ii) G-x-x-D; and (iii) x-E-E. This new sequence signature is clearly different from that commonly reported in the literature for proteins belonging to the ArgE/DapE/CPG2/YscS family. PMID:11250542

  16. Extra-binomial variation approach for analysis of pooled DNA sequencing data

    PubMed Central

    Wallace, Chris

    2012-01-01

    Motivation: The invention of next-generation sequencing technology has made it possible to study the rare variants that are more likely to pinpoint causal disease genes. To make such experiments financially viable, DNA samples from several subjects are often pooled before sequencing. This induces large between-pool variation which, together with other sources of experimental error, creates over-dispersed data. Statistical analysis of pooled sequencing data needs to appropriately model this additional variance to avoid inflating the false-positive rate. Results: We propose a new statistical method based on an extra-binomial model to address the over-dispersion and apply it to pooled case-control data. We demonstrate that our model provides a better fit to the data than either a standard binomial model or a traditional extra-binomial model proposed by Williams and can analyse both rare and common variants with lower or more variable pool depths compared to the other methods. Availability: Package ‘extraBinomial’ is on http://cran.r-project.org/ Contact: chris.wallace@cimr.cam.ac.uk Supplementary information: Supplementary data are available at Bioinformatics Online. PMID:22976083
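
    The over-dispersion described in this abstract can be illustrated with a generic diagnostic: the Pearson chi-square statistic divided by its degrees of freedom. The Python sketch below is not the proposed extra-binomial model and does not use the API of the 'extraBinomial' package; the function name and the read counts are hypothetical, and it assumes all pools share one underlying variant frequency.

```python
def pearson_dispersion(alt_reads, depths):
    """Pearson chi-square statistic divided by its degrees of freedom.

    Values near 1 are consistent with plain binomial sampling;
    values well above 1 suggest extra-binomial (over-dispersed) variation.
    """
    if len(alt_reads) != len(depths) or len(alt_reads) < 2:
        raise ValueError("need at least two pools with matching depths")
    p = sum(alt_reads) / sum(depths)  # pooled variant frequency
    if not 0.0 < p < 1.0:
        raise ValueError("pooled frequency must be strictly between 0 and 1")
    chi2 = sum((x - n * p) ** 2 / (n * p * (1.0 - p))
               for x, n in zip(alt_reads, depths))
    return chi2 / (len(alt_reads) - 1)


# Hypothetical variant-supporting read counts in three pools of depth 100.
even = pearson_dispersion([10, 10, 10], [100, 100, 100])
spread = pearson_dispersion([0, 10, 20], [100, 100, 100])
print(even, spread)
```

    In this made-up example the evenly distributed pools give a ratio near zero, while the widely spread pools give a ratio far above 1, which is the kind of between-pool variation a plain binomial test would misread as signal.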

  17. Human retroviruses and AIDS 1996. A compilation and analysis of nucleic acid and amino acid sequences

    SciTech Connect

    Myers, G.; Foley, B.; Korber, B.; Mellors, J.W.; Jeang, K.T.; Wain-Hobson, S.

    1997-04-01

    This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (1) Nucleic Acid Alignments and Sequences; (2) Amino Acid Alignments; (3) Analysis; (4) Related Sequences; and (5) Database Communications. Information within all the parts is updated throughout the year on the Web site, http://hiv-web.lanl.gov. While this publication could take the form of a review or sequence monograph, it is not so conceived. Instead, the literature from which the database is derived has simply been summarized and some elementary computational analyses have been performed upon the data. Interpretation and commentary have been avoided insofar as possible so that the reader can form his or her own judgments concerning the complex information. In addition to the general descriptions of the parts of the compendium, the user should read the individual introductions for each part.

  18. Transcriptome analysis of expressed sequence tags from the venom glands of the fish Thalassophryne nattereri.

    PubMed

    Magalhães, G S; Junqueira-de-Azevedo, I L M; Lopes-Ferreira, M; Lorenzini, D M; Ho, P L; Moura-da-Silva, A M

    2006-06-01

    Thalassophryne nattereri (niquim) is a venomous fish found on the northern and northeastern coasts of Brazil. Every year, hundreds of humans are affected by the poison, which causes excruciating local pain, edema, and necrosis, and can lead to permanent disabilities. In experimental models, T. nattereri venom induces edema and nociception, which are correlated to human symptoms and dependent on venom kininogenase activity; myotoxicity; impairment of blood flow; platelet lysis and cytotoxicity on endothelial cells. These effects were observed with minute amounts of venom. To characterize the primary structure of T. nattereri venom toxins, a list of transcripts within the venom gland was made using the expressed sequence tag (EST) strategy. Here we report the analysis of 775 ESTs that were obtained from a directional cDNA library of T. nattereri venom gland. Of these ESTs, 527 (68%) were related to sequences previously described. These were categorized into 10 groups according to their biological functions. Sequences involved in gene and protein expression accounted for 14.3% of the ESTs, reflecting the important role of protein synthesis in this gland. Other groups included proteins engaged in the assembly of disulfide bonds (0.5%), chaperones involved in the folding of nascent proteins (1.4%), and sequences related to clusterin (1.5%), as well as transcripts related to calcium binding proteins (1.0%). We detected a large cluster (1.3%) related to cocaine- and amphetamine-regulated transcript (CART), a peptide involved in the regulation of food intake. Surprisingly, several retrotransposon-like sequences (1.0%) were found in the library. It may be that their presence accounts for some of the variation in venom toxins. The toxin category (18.8%) included natterins (18%), which are a new group of kininogenases recently described by our group, and a group of C-type lectins (0.8%). In addition, a considerable number of sequences (32%) was not related to sequences in the

  19. Additional Routes to Staphylococcus aureus Daptomycin Resistance as Revealed by Comparative Genome Sequencing, Transcriptional Profiling, and Phenotypic Studies

    PubMed Central

    Song, Yang; Rubio, Aileen; Jayaswal, Radheshyam K.; Silverman, Jared A.; Wilkinson, Brian J.

    2013-01-01

    Daptomycin is an extensively used anti-staphylococcal agent due to the rise in methicillin-resistant Staphylococcus aureus, but the mechanism(s) of resistance is poorly understood. Comparative genome sequencing, transcriptomics, ultrastructure, and cell envelope studies were carried out on two relatively higher level (4 and 8 µg/ml) laboratory-derived daptomycin-resistant strains (strains CB1541 and CB1540, respectively) compared to their parent strain (CB1118; MW2). Several mutations were found in the strains. Both strains had the same mutations in the two-component system genes walK and agrA. In strain CB1540 mutations were also detected in the ribose phosphate pyrophosphokinase (prs) and polyribonucleotide nucleotidyltransferase genes (pnpA), a hypothetical protein gene, and in an intergenic region. In strain CB1541 there were mutations in clpP, an ATP-dependent protease, and two different hypothetical protein genes. The strain CB1540 transcriptome was characterized by upregulation of cap (capsule) operon genes, genes involved in the accumulation of the compatible solute glycine betaine, ure genes of the urease operon, and mscL encoding a mechanosensitive channel. Downregulated genes included smpB, femAB and femH involved in the formation of the pentaglycine interpeptide bridge, genes involved in protein synthesis and fermentation, and spa encoding protein A. Genes altered in their expression common to both transcriptomes included some involved in glycine betaine accumulation, mscL, ure genes, femH, spa and smpB. However, the CB1541 transcriptome was further characterized by upregulation of various heat shock chaperone and protease genes, consistent with a mutation in clpP, and lytM and sceD. Both strains showed slow growth, and strongly decreased autolytic activity that appeared to be mainly due to decreased autolysin production. In contrast to previous common findings, we did not find any mutations in phospholipid biosynthesis genes, and it appears there

  20. Experimental design, preprocessing, normalization and differential expression analysis of small RNA sequencing experiments

    PubMed Central

    2011-01-01

    Prior to the advent of new, deep sequencing methods, small RNA (sRNA) discovery was dependent on Sanger sequencing, which was time-consuming and limited knowledge to only the most abundant sRNA. The innovation of large-scale, next-generation sequencing has exponentially increased knowledge of the biology, diversity and abundance of sRNA populations. In this review, we discuss issues involved in the design of sRNA sequencing experiments, including choosing a sequencing platform, inherent biases that affect sRNA measurements and replication. We outline the steps involved in preprocessing sRNA sequencing data and review both the principles behind and the current options for normalization. Finally, we discuss differential expression analysis in the absence and presence of biological replicates. While our focus is on sRNA sequencing experiments, many of the principles discussed are applicable to the sequencing of other RNA populations. PMID:21356093

  1. Analysis of T-DNA/Host-Plant DNA Junction Sequences in Single-Copy Transgenic Barley Lines

    PubMed Central

    Bartlett, Joanne G.; Smedley, Mark A.; Harwood, Wendy A.

    2014-01-01

    Sequencing across the junction between an integrated transfer DNA (T-DNA) and a host plant genome provides two important pieces of information. The junctions themselves provide information regarding the proportion of T-DNA which has integrated into the host plant genome, whilst the transgene flanking sequences can be used to study the local genetic environment of the integrated transgene. In addition, this information is important in the safety assessment of GM crops and essential for GM traceability. In this study, a detailed analysis was carried out on the right-border T-DNA junction sequences of single-copy independent transgenic barley lines. T-DNA truncations at the right border were found to be relatively common and affected 33.3% of the lines. In addition, 14.3% of lines had rearranged construct sequence after the right-border break-point. An in-depth analysis of the host-plant flanking sequences revealed that a significant proportion of the T-DNAs integrated into or close to known repetitive elements. However, this integration into repetitive DNA did not have a negative effect on transgene expression. PMID:24833334

  2. Computer analysis of phytochrome sequences and reevaluation of the phytochrome secondary structure by Fourier transform infrared spectroscopy.

    PubMed

    Sühnel, J; Hermann, G; Dornberger, U; Fritzsche, H

    1997-07-18

    A repertoire of various methods of computer sequence analysis was applied to phytochromes in order to gain new insights into their structure and function. A statistical analysis of 23 complete phytochrome sequences revealed regions of non-random amino acid composition, which are supposed to be of particular structural or functional importance. All phytochromes other than phyD and phyE from Arabidopsis have at least one such region at the N-terminus between residues 2 and 35. A sequence similarity search of current databases indicated striking homologies between all phytochromes and a hypothetical 84.2-kDa protein from the cyanobacterium Synechocystis. Furthermore, scanning the phytochrome sequences for the occurrence of patterns defined in the PROSITE database detected the signature of the WD repeats of the beta-transducin family within the functionally important 623-779 region (sequence numbering of phyA from Avena) in a number of phytochromes. A multiple sequence alignment performed with 23 complete phytochrome sequences is made available via the IMB Jena World-Wide Web server (http://www.imb-jena.de/PHYTO.html). It can be used as a working tool for future theoretical and experimental studies. Based on the multiple alignment striking sequence differences between phytochromes A and B were detected directly at the N-terminal end, where all phytochromes B have an additional stretch of 15-42 amino acids. There is also a variety of positions with totally conserved but different amino acids in phytochromes A and B. Most of these changes are found in the sequence segment 150-200. It is, therefore, suggested that this region might be of importance in determining the photosensory specificity of the two phytochromes. The secondary structure prediction based on the multiple alignment resulted in a small but significant beta-sheet content. This finding is confirmed by a reevaluation of the secondary structure using FTIR spectroscopy. PMID:9252112

  3. Population Structure of Clinical Vibrio parahaemolyticus from 17 Coastal Countries, Determined through Multilocus Sequence Analysis

    PubMed Central

    Lu, Jun; Wang, Guangzhou; Zhou, Lin; Min, Lingfeng; Han, Chongxu

    2014-01-01

    Vibrio parahaemolyticus is a leading cause of food-borne gastroenteritis worldwide. Although this bacterium has been the subject of much research, the population structure of clinical strains from worldwide collections remains largely undescribed, and the recorded outbreaks of V. parahaemolyticus gastroenteritis highlight the need for the subtyping of this species. We present a broad phylogenetic analysis of 490 clinical V. parahaemolyticus isolates from 17 coastal countries through multilocus sequence analysis (MLST). The 490 tested isolates fell into 161 sequence types (STs). The eBURST algorithm revealed that the 161 clinically relevant STs belonged to 8 clonal complexes, 11 doublets, and 94 singletons, showing a high level of genetic diversity. CC3 was found to be a global epidemic clone of V. parahaemolyticus, and ST-3 was the only ST with an international distribution. recA was observed to be evolving more rapidly, exhibiting the highest degree of nucleotide diversity (0.028) and the largest number of polymorphic nucleotide sites (177). We also found that the high variability of recA was an important cause of differences between the results of the eBURST and ME tree analyses, suggesting that recA has a much greater influence on the apparent evolutionary classification of V. parahaemolyticus based on the current MLST scheme. In conclusion, it is evident that a high degree of genetic diversity within the V. parahaemolyticus population and multiple sequence types are contributing to the burden of disease around the world. MLST, with a fully extractable database, is a powerful system for analysis of the clonal relationships of strains at a global scale. With the addition of more strains, the pubMLST database will provide more detailed and accurate information, which will be conducive to our future research on the population structure of V. parahaemolyticus. PMID:25225911
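
    The grouping of sequence types into clonal complexes, doublets, and singletons can be illustrated with a minimal Python sketch. It assumes the commonly used rule that two STs are linked when they share at least six of seven MLST alleles, and that groups of three or more STs, two STs, and one ST correspond to clonal complexes, doublets, and singletons. The allelic profiles and ST names are made up, and the sketch does not reproduce the eBURST software itself (for example, it does not predict founders).

```python
from itertools import combinations


def group_sts(profiles, min_shared=6):
    """Link STs that share >= min_shared alleles and return the linked groups."""
    parent = {st: st for st in profiles}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        shared = sum(x == y for x, y in zip(profiles[a], profiles[b]))
        if shared >= min_shared:
            parent[find(a)] = find(b)

    groups = {}
    for st in profiles:
        groups.setdefault(find(st), []).append(st)
    return sorted(sorted(members) for members in groups.values())


def classify(group):
    """Label a group by its size (an assumed reading of the abstract's terms)."""
    if len(group) >= 3:
        return "clonal complex"
    return "doublet" if len(group) == 2 else "singleton"


# Hypothetical seven-locus allelic profiles; not taken from the pubMLST database.
profiles = {
    "ST-a": (1, 1, 1, 1, 1, 1, 1),
    "ST-b": (1, 1, 1, 1, 1, 1, 2),  # single-locus variant of ST-a
    "ST-c": (1, 1, 1, 1, 1, 3, 2),  # single-locus variant of ST-b only
    "ST-d": (5, 5, 5, 5, 5, 5, 5),
}
groups = group_sts(profiles)
for g in groups:
    print(classify(g), g)
```

    Here ST-a and ST-c share only five alleles, yet they end up in the same group because each is linked through ST-b; this chaining is why a single highly variable locus can change the apparent grouping.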

  4. Sequence analysis of mitochondrial DNA hypervariable regions using infrared fluorescence detection.

    PubMed

    Steffens, D L; Roy, R

    1998-06-01

    The non-coding region of the mitochondrial genome provides an attractive target for human forensic identification studies. Two hypervariable (HV) regions, each approximately 250-350 bp in length, contain the majority of mitochondrial DNA (mtDNA) sequence variability among different individuals. Various approaches to determine mtDNA sequence were evaluated utilizing highly sensitive infrared (IR) fluorescence detection. HV regions were amplified either together or separately and cycle-sequenced using a Thermo Sequenase protocol. An M13 universal primer sequence tail covalently attached to the 5' terminus of an amplification primer facilitated electrophoretic analysis and direct sequencing of the amplification products using IR detection. PMID:9631201

  5. Transcriptome Analysis of the Mud Crab (Scylla paramamosain) by 454 Deep Sequencing: Assembly, Annotation, and Marker Discovery

    PubMed Central

    Ma, Hongyu; Ma, Chunyan; Li, Shujuan; Jiang, Wei; Li, Xincang; Liu, Yuexing; Ma, Lingbo

    2014-01-01

    In this study, we reported the characterization of the first transcriptome of the mud crab (Scylla paramamosain). Pooled cDNAs of four tissue types from twelve wild individuals were sequenced using the Roche 454 FLX platform. The analyses performed included de novo assembly of transcriptome sequences, functional annotation, and molecular marker discovery. A total of 1,314,101 high quality reads with an average length of 411 bp were generated by 454 sequencing on a mixed cDNA library. De novo assembly of these 1,314,101 reads produced 76,778 contigs (consisting of 818,154 reads) with 5.4-fold average sequencing coverage. The remaining 495,947 reads were singletons. A total of 78,268 unigenes were identified based on sequence similarity with known proteins (E≤0.00001) in UniProt and non-redundant protein databases. Meanwhile, 44,433 sequences were identified (E≤0.00001) using a BLASTN search against the NCBI nucleotide database. Gene Ontology (GO) analysis indicated that biosynthetic process, cell part, and ion binding were the most abundant terms in biological process, cellular component, and molecular function categories, respectively. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis revealed that 4,878 unigenes were distributed in 281 different pathways. In addition, 19,011 microsatellites and 37,063 potential single nucleotide polymorphisms were detected from the transcriptome of S. paramamosain. Finally, thirty polymorphic microsatellite markers were developed and used to assess genetic diversity of a wild population of S. paramamosain. So far, existing sequence resources for S. paramamosain are extremely limited. The present study provides a characterization of transcriptome from multiple tissues and individuals, as well as an assessment of genetic diversity of a wild population. These sequence resources will facilitate the investigation of population genetic diversity, the development of genetic maps, and the conduct of molecular marker

  6. Analysis of CNT additives in porous layered thin film lubrication with electric double layer

    NASA Astrophysics Data System (ADS)

    Rao, T. V. V. L. N.; Rani, A. M. A.; Sufian, S.; Mohamed, N. M.

    2015-07-01

    This paper presents an analysis of thin film lubrication of porous layered carbon nanotubes (CNTs) additive slider bearing with electric double layer. The CNTs additive lubricant flow in the thin fluid film and porous layers are governed by Stokes and Brinkman equations respectively, including electro-kinetic force. The apparent viscosity and nondimensional pressure expression are derived. The nondimensional load capacity increases under the influence of electro-viscosity, CNT additives volume fraction, permeability and thickness of porous layer. A CNTs additive lubricated porous thin film slider bearing with electric double layer provides higher load capacity.

  7. Genome Sequence and Analysis of the Soil Cellulolytic Actinomycete Thermobifida fusca YX

    SciTech Connect

    Lykidis, A; Mavromatis, K; Ivanova, N; Anderson, Iain; Land, Miriam L; DiBartolo, Genevieve; Martinez, Michele; Lapidus, Alla L.; Lucas, Susan; Copeland, A; Richardson, P M; Wilson, David B; Kyrpides, Nikos C

    2007-01-01

    Thermobifida fusca is a moderately thermophilic soil bacterium that belongs to Actinobacteria. It is a major degrader of plant cell walls and has been used as a model organism for the study of secreted, thermostable cellulases. The complete genome sequence showed that T. fusca has a single circular chromosome of 3,642,249 bp predicted to encode 3,117 proteins and 65 RNA species with a coding density of 85%. Genome analysis revealed the existence of 29 putative glycoside hydrolases in addition to the previously identified cellulases and xylanases. The glycosyl hydrolases include enzymes predicted to exhibit mainly dextran/starch- and xylan-degrading functions. T. fusca possesses two protein secretion systems: the sec general secretion system and the twin-arginine translocation system. Several of the secreted cellulases have sequence signatures indicating their secretion may be mediated by the twin-arginine translocation system. T. fusca has extensive transport systems for import of carbohydrates coupled to transcriptional regulators controlling the expression of the transporters and glycosylhydrolases. In addition to providing an overview of the physiology of a soil actinomycete, this study presents insights on the transcriptional regulation and secretion of cellulases which may facilitate the industrial exploitation of these systems.

  8. Genome Sequence and Analysis of the Soil Cellulolytic Actinomycete Thermobifida fusca

    SciTech Connect

    Lykidis, Athanasios; Mavromatis, Konstantinos; Ivanova, Natalia; Anderson, Iain; Land, Miriam; DiBartolo, Genevieve; Martinez, Michele; Lapidus, Alla; Lucas, Susan; Copeland, Alex; Richardson, Paul; Wilson, David B.; Kyrpides, Nikos

    2007-02-01

    Thermobifida fusca is a moderately thermophilic soil bacterium that belongs to Actinobacteria. It is a major degrader of plant cell walls and has been used as a model organism for the study of secreted, thermostable cellulases. The complete genome sequence showed that T. fusca has a single circular chromosome of 3642249 bp predicted to encode 3117 proteins and 65 RNA species with a coding density of 85 percent. Genome analysis revealed the existence of 29 putative glycoside hydrolases in addition to the previously identified cellulases and xylanases. The glycosyl hydrolases include enzymes predicted to exhibit mainly dextran/starch and xylan degrading functions. T. fusca possesses two protein secretion systems: the sec general secretion system and the twin-arginine translocation system. Several of the secreted cellulases have sequence signatures indicating their secretion may be mediated by the twin-arginine translocation system. T. fusca has extensive transport systems for import of carbohydrates coupled to transcriptional regulators controlling the expression of the transporters and glycosylhydrolases. In addition to providing an overview of the physiology of a soil actinomycete, this study presents insights on the transcriptional regulation and secretion of cellulases which may facilitate the industrial exploitation of these systems.

  9. SxtA gene sequence analysis of dinoflagellate Alexandrium minutum

    NASA Astrophysics Data System (ADS)

    Norshaha, Safida Anira; Latib, Norhidayu Abdul; Usup, Gires; Yusof, Nurul Yuziana Mohd

    2015-09-01

    The dinoflagellate Alexandrium minutum is known for producing potent neurotoxins such as saxitoxin, which affect the health of human seafood consumers via paralytic shellfish poisoning (PSP). This phenomenon is related to harmful algal blooms (HABs), which are believed to be influenced by environmental and nutritional factors. A previous study revealed that the sxtA gene is a starting gene involved in the saxitoxin production pathway. The aim of this study was to analyse the sequence of the sxtA gene in A. minutum. The dinoflagellate culture was grown at 26°C with a 16:8-hour light:dark photocycle. After the samples were harvested, RNA was extracted, and complementary DNA (cDNA) was synthesised and amplified by polymerase chain reaction (PCR). The PCR products were then purified and cloned before being sequenced. The sxtA sequence obtained was then analysed to identify the presence of the sxtA gene in Alexandrium minutum.

  10. Bioinformatic Analysis of Toll-Like Receptor Sequences and Structures.

    PubMed

    Monie, Tom P; Gay, Nicholas J; Gangloff, Monique

    2016-01-01

    Continual advancements in computing power and sophistication, coupled with rapid increases in protein sequence and structural information, have made bioinformatic tools an invaluable resource for the molecular and structural biologist. With the degree of sequence information continuing to expand at an almost exponential rate, it is essential that scientists today have a basic understanding of how to utilise, manipulate and analyse this information for the benefit of their own experiments. In the context of Toll-Interleukin I Receptor domain containing proteins, we describe here a series of the more common and user-friendly bioinformatic tools available as Internet-based resources. These will enable the identification and alignment of protein sequences; the identification of functional motifs; the characterisation of protein secondary structure; the identification of protein structural folds and distantly homologous proteins; and the validation of the structural geometry of modelled protein structures. PMID:26803620

  11. Efficient analysis of mouse genome sequences reveal many nonsense variants.

    PubMed

    Steeland, Sophie; Timmermans, Steven; Van Ryckeghem, Sara; Hulpiau, Paco; Saeys, Yvan; Van Montagu, Marc; Vandenbroucke, Roosmarijn E; Libert, Claude

    2016-05-17

    Genetic polymorphisms in coding genes play an important role when using mouse inbred strains as research models. They have been shown to influence research results, explain phenotypical differences between inbred strains, and increase the amount of interesting gene variants present in the many available inbred lines. SPRET/Ei is an inbred strain derived from Mus spretus that has ∼1% sequence difference with the C57BL/6J reference genome. We obtained a listing of all SNPs and insertions/deletions (indels) present in SPRET/Ei from the Mouse Genomes Project (Wellcome Trust Sanger Institute) and processed these data to obtain an overview of all transcripts having nonsynonymous coding sequence variants. We identified 8,883 unique variants affecting 10,096 different transcripts from 6,328 protein-coding genes, which is about 28% of all coding genes. Because only a subset of these variants results in drastic changes in proteins, we focused on variations that are nonsense mutations that ultimately resulted in a gain of a stop codon. These genes were identified by in silico changing the C57BL/6J coding sequences to the SPRET/Ei sequences, converting them to amino acid (AA) sequences, and comparing the AA sequences. All variants and transcripts affected were also stored in a database, which can be browsed using a SPRET/Ei M. spretus variants web tool (www.spretus.org), including a manual. We validated the tool by demonstrating the loss of function of three proteins predicted to be severely truncated, namely Fas, IRAK2, and IFNγR1. PMID:27147605
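    The in silico step described in this abstract (substituting strain-specific bases into the reference coding sequence, translating both versions, and comparing the amino acid sequences) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the 0-based SNP representation, and the toy sequence are assumptions made for illustration, and indels are ignored.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate a coding sequence up to (not including) the first stop codon."""
    protein = []
    for pos in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[pos:pos + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def apply_snps(cds, snps):
    """Return a copy of cds with 0-based (position, new_base) substitutions applied."""
    seq = list(cds)
    for pos, base in snps:
        seq[pos] = base
    return "".join(seq)


def gains_stop(reference_cds, snps):
    """True if the variant protein is shorter than the reference protein."""
    ref_protein = translate(reference_cds)
    var_protein = translate(apply_snps(reference_cds, snps))
    return len(var_protein) < len(ref_protein)


# Toy example (invented sequence, not real mouse data): M W K stop.
reference = "ATGTGGAAATAA"
nonsense = gains_stop(reference, [(5, "A")])   # TGG -> TGA introduces a stop
silent = gains_stop(reference, [(8, "G")])     # AAA -> AAG is synonymous
```

    Comparing protein lengths is only a hypothetical shortcut for "gain of a stop codon"; a fuller treatment would also handle insertions, deletions, and transcripts with multiple variants.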

  12. Multilocus sequence analysis of Brazilian Rhizobium microsymbionts of common bean (Phaseolus vulgaris L.) reveals unexpected taxonomic diversity.

    PubMed

    Ribeiro, Renan Augusto; Barcellos, Fernando Gomes; Thompson, Fabiano L; Hungria, Mariangela

    2009-05-01

    The diazotrophic bacteria collectively known as "rhizobia" are important for establishing symbiotic N(2)-fixing associations with many legumes. These microbes have been used for over a century as an environmentally beneficial and cost-effective means of ensuring acceptable yields of agricultural legumes. The most widely used phylogenetic marker for identification and classification of rhizobia has been the 16S rRNA gene; however, this marker fails to discriminate some closely related species. In this study, we established the first multilocus sequence analysis (MLSA) scheme for the identification and classification of rhizobial microsymbionts of common bean (Phaseolus vulgaris L.). We analyzed 12 Brazilian strains representative of a collection of over 850 isolates in addition to type and reference rhizobial strains, by sequencing recA, dnaK, gltA, glnII and rpoA genes. Gene sequence similarities among the five type/reference Rhizobium strains which are symbionts of common bean ranged from 95 to 100% for 16S rRNA, and from 83 to 99% for the other five genes. Rhizobial species described as symbionts of common bean also formed separate groups upon analysis of single and concatenated gene sequences, and clusters formed in each tree were in good mutual agreement. The five additional loci may thus be considered useful markers of the genus Rhizobium; in addition, MLSA also revealed broad genetic diversity among strains classified as Rhizobium tropici, providing evidence of new species. PMID:19403105

  13. 16S rRNA Gene Sequencing, Multilocus Sequence Analysis, and Mass Spectrometry Identification of the Proposed New Species “Clostridium neonatale”

    PubMed Central

    Bouvet, Philippe; Ferraris, Laurent; Dauphin, Brunhilde; Popoff, Michel-Robert; Butel, Marie Jose

    2014-01-01

    In 2002, an outbreak of necrotizing enterocolitis in a Canadian neonatal intensive care unit was associated with a proposed novel species of Clostridium, “Clostridium neonatale.” To date, there are no data about the isolation, identification, or clinical significance of this species. Additionally, C. neonatale has not been formally classified as a new species, rendering its identification challenging. Indeed, the C. neonatale 16S rRNA gene sequence shows high similarity to another Clostridium species involved in neonatal necrotizing enterocolitis, Clostridium butyricum. By performing a polyphasic study combining phylogenetic analysis (16S rRNA gene sequencing and multilocus sequence analysis) and phenotypic characterization with mass spectrometry, we demonstrated that C. neonatale is a new species within the Clostridium genus sensu stricto, for which we propose the name Clostridium neonatale sp. nov. Now that the status of C. neonatale has been clarified, matrix-assisted laser desorption ionization–time of flight mass spectrometry (MALDI-TOF MS) can be used for better differential identification of C. neonatale and C. butyricum clinical isolates. This is necessary to precisely define the role and clinical significance of C. neonatale, a species that may have been misidentified and underrepresented during previous neonatal necrotizing enterocolitis studies. PMID:25232167

  14. Mercury: Next-gen Data Analysis and Annotation Pipeline (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Sexton, David [Baylor]

    2013-01-25

    David Sexton (Baylor) gives a talk titled "Mercury: Next-gen Data Analysis and Annotation Pipeline" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  15. Mercury: Next-gen Data Analysis and Annotation Pipeline (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    SciTech Connect

    Sexton, David

    2012-06-01

    David Sexton (Baylor) gives a talk titled "Mercury: Next-gen Data Analysis and Annotation Pipeline" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  16. Using Whole Genome Analysis to Examine Recombination across Diverse Sequence Types of Staphylococcus aureus

    PubMed Central

    Driebe, Elizabeth M.; Sahl, Jason W.; Roe, Chandler; Bowers, Jolene R.; Schupp, James M.; Gillece, John D.; Kelley, Erin; Price, Lance B.; Pearson, Talima R.; Hepp, Crystal M.; Brzoska, Pius M.; Cummings, Craig A.; Furtado, Manohar R.; Andersen, Paal S.; Stegger, Marc; Engelthaler, David M.; Keim, Paul S.

    2015-01-01

    Staphylococcus aureus is an important clinical pathogen worldwide and understanding this organism's phylogeny and, in particular, the role of recombination, is important both to understand the overall spread of virulent lineages and to characterize outbreaks. To further elucidate the phylogeny of S. aureus, 35 diverse strains were sequenced using whole genome sequencing. In addition, 29 publicly available whole genome sequences were included to create a single nucleotide polymorphism (SNP)-based phylogenetic tree encompassing 11 distinct lineages. All strains of a particular sequence type fell into the same clade with clear groupings of the major clonal complexes of CC8, CC5, CC30, CC45 and CC1. Using a novel analysis method, we plotted the homoplasy density and SNP density across the whole genome and found evidence of recombination throughout the entire chromosome, but when we examined individual clonal lineages we found very little recombination. However, when we analyzed three branches of multiple lineages, we saw intermediate and differing levels of recombination between them. These data demonstrate that in S. aureus, recombination occurs across major lineages that subsequently expand in a clonal manner. Estimated mutation rates for the CC8 and CC5 lineages were different from each other. While the CC8 lineage rate was similar to previous studies, the CC5 lineage was 100-fold greater. Fifty known virulence genes were screened in all genomes in silico to determine their distribution across major clades. Thirty-three genes were present variably across clades, most of which were not constrained by ancestry, indicating horizontal gene transfer or gene loss. PMID:26161978

  17. High-Throughput Analysis of T-DNA Location and Structure Using Sequence Capture.

    PubMed

    Inagaki, Soichi; Henry, Isabelle M; Lieberman, Meric C; Comai, Luca

    2015-01-01

    Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious and high-throughput methods using next-generation sequencing are being developed to address these problems. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA-genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously, using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the needs for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events have to be fully characterized prior to use as commercial products. Our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines. PMID:26445462
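    The abstract does not describe how the authors' tool works internally. As a rough illustration of the idea of finding read pairs that span a genome/T-DNA junction, the Python sketch below keeps pairs in which exactly one mate aligns to a T-DNA reference and the other aligns elsewhere. The `Mate` record, the reference names, and the input layout are all hypothetical.

```python
from collections import namedtuple

# Hypothetical alignment summary for one mate: reference name and position.
Mate = namedtuple("Mate", "ref pos")


def junction_pairs(pairs, tdna_refs):
    """Yield (pair_id, genome_mate, tdna_mate) for pairs spanning a junction.

    A pair qualifies when both mates are mapped and exactly one of them
    maps to a reference listed in tdna_refs.
    """
    for pair_id, m1, m2 in pairs:
        if m1.ref is None or m2.ref is None:
            continue  # skip pairs with an unmapped mate
        hits = (m1.ref in tdna_refs, m2.ref in tdna_refs)
        if hits == (True, False):
            yield pair_id, m2, m1
        elif hits == (False, True):
            yield pair_id, m1, m2


# Toy input (invented coordinates).
pairs = [
    ("p1", Mate("Chr1", 1200), Mate("Chr1", 1450)),      # genome only
    ("p2", Mate("Chr3", 88000), Mate("TDNA_A", 30)),     # spans a junction
    ("p3", Mate("TDNA_A", 100), Mate("TDNA_A", 350)),    # T-DNA only
    ("p4", Mate("TDNA_B", 4900), Mate("Chr5", 2100)),    # spans a junction
    ("p5", Mate(None, 0), Mate("TDNA_A", 10)),           # one mate unmapped
]
found = list(junction_pairs(pairs, {"TDNA_A", "TDNA_B"}))
```

    Grouping the genome-side positions of the retained pairs would then give candidate insertion sites; split reads that cross the junction within a single read are not handled by this sketch.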

  18. High-throughput analysis of T-DNA location and structure using sequence capture

    SciTech Connect

    Inagaki, Soichi; Henry, Isabelle M.; Lieberman, Meric C.; Comai, Luca

    2015-10-07

    Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious and high-throughput methods using next-generation sequencing are being developed to address these problems. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA—genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously, using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the needs for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events have to be fully characterized prior to use as commercial products. As a result, our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines.

  19. High-throughput analysis of T-DNA location and structure using sequence capture

    DOE PAGESBeta

    Inagaki, Soichi; Henry, Isabelle M.; Lieberman, Meric C.; Comai, Luca

    2015-10-07

    Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious and high-throughput methods using next-generation sequencing are being developed to address these problems. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA-genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously, using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the needs for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events have to be fully characterized prior to use as commercial products. As a result, our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines.

  20. Sequence analysis of Meq oncogene among Indian isolates of Marek's disease herpesvirus.

    PubMed

    Gupta, Mridula; Deka, Dipak; Ramneek

    2016-09-01

    Marek's disease (MD), caused by Marek's disease virus (MDV), is a highly contagious neoplastic disease of chickens that can be prevented by vaccination. However, in recent years many cases of vaccine failure have been reported worldwide, as chickens develop symptoms of MD in spite of proper vaccination. Distinct polymorphisms and point mutations in the Meq gene of MDV have been reported to be associated with virulence and oncogenicity. The present study was carried out to isolate and characterize field isolates of MDV on the basis of the Meq gene. Twenty-five samples from suspected cases of MD were collected and processed for virus isolation in duck embryo fibroblast (DEF) primary culture, where 28% (7 of 25) of the samples showed characteristic cytopathic effects of MDV in the form of plaques and syncytia. The presence of MDV in these samples was additionally confirmed by PCR. To analyze diversity in all seven isolates of MDV, a polymorphism study was carried out by cloning and sequencing the full length of the Meq gene (1020 bp). Sequence homology of the 7 isolates with 23 reference strains showed 98.10-99.40% similarity in nucleotide sequences and 95.90-98.50% similarity in amino acid sequences. Six isolates revealed 5 repeat sequences of 4 prolines (PPPP), whereas one isolate revealed only 4 repeats. In phylogenetic analysis, these isolates formed a separate cluster showing close relatedness to the Chinese isolates. The study indicates a high mutation rate in field isolates of MDV that may be a probable cause of vaccination failure. PMID:27617224
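    Two of the quantities reported in this abstract, percent similarity between aligned sequences and the number of four-proline (PPPP) repeats, are simple to compute. The Python sketch below shows one possible way, assuming the sequences are already aligned to equal length; the function names and toy sequences are invented for illustration and are not taken from the study.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of aligned positions at which two equal-length sequences match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y)
    return 100.0 * matches / len(aligned_a)


def count_pppp(protein):
    """Count non-overlapping runs of four consecutive prolines."""
    return protein.count("PPPP")


# Toy data (invented, not Meq sequences).
identity = percent_identity("ACGT", "ACGA")   # 3 of 4 positions match
repeats = count_pppp("APPPPQPPPPPPPPK")       # one run of 4 and one run of 8 prolines
```

    `str.count` counts non-overlapping occurrences, so a run of eight prolines contributes two repeats; whether the study counted repeats this way is not stated in the abstract.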

  1. High-Throughput Analysis of T-DNA Location and Structure Using Sequence Capture

    PubMed Central

    Inagaki, Soichi; Henry, Isabelle M.; Lieberman, Meric C.; Comai, Luca

    2015-01-01

    Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious and high-throughput methods using next-generation sequencing are being developed to address these problems. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA—genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously, using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the needs for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events have to be fully characterized prior to use as commercial products. Our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines. PMID:26445462

  2. Using Whole Genome Analysis to Examine Recombination across Diverse Sequence Types of Staphylococcus aureus.

    PubMed

    Driebe, Elizabeth M; Sahl, Jason W; Roe, Chandler; Bowers, Jolene R; Schupp, James M; Gillece, John D; Kelley, Erin; Price, Lance B; Pearson, Talima R; Hepp, Crystal M; Brzoska, Pius M; Cummings, Craig A; Furtado, Manohar R; Andersen, Paal S; Stegger, Marc; Engelthaler, David M; Keim, Paul S

    2015-01-01

    Staphylococcus aureus is an important clinical pathogen worldwide and understanding this organism's phylogeny and, in particular, the role of recombination, is important both to understand the overall spread of virulent lineages and to characterize outbreaks. To further elucidate the phylogeny of S. aureus, 35 diverse strains were sequenced using whole genome sequencing. In addition, 29 publicly available whole genome sequences were included to create a single nucleotide polymorphism (SNP)-based phylogenetic tree encompassing 11 distinct lineages. All strains of a particular sequence type fell into the same clade with clear groupings of the major clonal complexes of CC8, CC5, CC30, CC45 and CC1. Using a novel analysis method, we plotted the homoplasy density and SNP density across the whole genome and found evidence of recombination throughout the entire chromosome, but when we examined individual clonal lineages we found very little recombination. However, when we analyzed three branches of multiple lineages, we saw intermediate and differing levels of recombination between them. These data demonstrate that in S. aureus, recombination occurs across major lineages that subsequently expand in a clonal manner. Estimated mutation rates for the CC8 and CC5 lineages were different from each other. While the CC8 lineage rate was similar to previous studies, the CC5 lineage was 100-fold greater. Fifty known virulence genes were screened in all genomes in silico to determine their distribution across major clades. Thirty-three genes were present variably across clades, most of which were not constrained by ancestry, indicating horizontal gene transfer or gene loss. PMID:26161978

  3. Cerebellar contributions to visuomotor adaptation and motor sequence learning: an ALE meta-analysis

    PubMed Central

    Bernard, Jessica A.; Seidler, Rachael D.

    2013-01-01

    Cerebellar contributions to motor learning are well-documented. For example, under some conditions, patients with cerebellar damage are impaired at visuomotor adaptation and at acquiring new action sequences. Moreover, cerebellar activation has been observed in functional MRI (fMRI) investigations of various motor learning tasks. The early phases of motor learning are cognitively demanding, relying on processes such as working memory, which have been linked to the cerebellum as well. Here, we investigated cerebellar contributions to motor learning using activation likelihood estimation (ALE) meta-analysis. This allowed us to determine, across studies and tasks, whether or not the location of cerebellar activation is constant across differing motor learning tasks, and whether or not cerebellar activation in early learning overlaps with that observed for working memory. We found that different regions of the anterior cerebellum are engaged for implicit and explicit sequence learning and visuomotor adaptation, providing additional evidence for the modularity of cerebellar function. Furthermore, we found that lobule VI of the cerebellum, which has been implicated in working memory, is activated during the early stages of explicit motor sequence learning. This provides evidence for a potential role for the cerebellum in the cognitive processing associated with motor learning. However, though lobule VI was activated across both early explicit sequence learning and working memory studies, there was no spatial overlap between these two regions. Together, our results support the idea of modularity in the formation of internal representations of new motor tasks in the cerebellum, and highlight the cognitive processing relied upon during the early phases of motor skill learning. PMID:23403800

  4. Whole genome sequence and analysis of the Marwari horse breed and its genetic origin

    PubMed Central

    2014-01-01

    Background: The horse (Equus ferus caballus) is one of the earliest domesticated species and has played an important role in the development of human societies over the past 5,000 years. In this study, we characterized the genome of the Marwari horse, a rare breed with unique phenotypic characteristics, including inwardly turned ear tips. It is thought to have originated from the crossbreeding of local Indian ponies with Arabian horses beginning in the 12th century. Results: We generated 101 Gb (~30 × coverage) of whole genome sequences from a Marwari horse using the Illumina HiSeq2000 sequencer. The sequences were mapped to the horse reference genome at a mapping rate of ~98% and with ~95% of the genome having at least 10 × coverage. A total of 5.9 million single nucleotide variations, 0.6 million small insertions or deletions, and 2,569 copy number variation blocks were identified. We confirmed a strong Arabian and Mongolian component in the Marwari genome. Novel variants from the Marwari sequences were annotated, and were found to be enriched in olfactory functions. Additionally, we suggest a potential functional genetic variant in the TSHZ1 gene (p.Ala344>Val) associated with the inward-turning ear tip shape of the Marwari horses. Conclusions: Here, we present an analysis of the Marwari horse genome. This is the first genomic data for an Asian breed, and is an invaluable resource for future studies of genetic variation associated with phenotypes and diseases in horses. PMID:25521865

  5. Differentiation of Xylella fastidiosa Strains via Multilocus Sequence Analysis of Environmentally Mediated Genes (MLSA-E)

    PubMed Central

    Parker, Jennifer K.; Havird, Justin C.

    2012-01-01

    Isolates of the plant pathogen Xylella fastidiosa are genetically very similar, but studies on their biological traits have indicated differences in virulence and infection symptomatology. Taxonomic analyses have identified several subspecies, and phylogenetic analyses of housekeeping genes have shown broad host-based genetic differences; however, results are still inconclusive for genetic differentiation of isolates within subspecies. This study employs multilocus sequence analysis of environmentally mediated genes (MLSA-E; genes influenced by environmental factors) to investigate X. fastidiosa relationships and differentiate isolates with low genetic variability. Potential environmentally mediated genes, including host colonization and survival genes related to infection establishment, were identified a priori. The ratio of the rate of nonsynonymous substitutions to the rate of synonymous substitutions (dN/dS) was calculated to select genes that may be under increased positive selection compared to previously studied housekeeping genes. Nine genes were sequenced from 54 X. fastidiosa isolates infecting different host plants across the United States. Results of maximum likelihood (ML) and Bayesian phylogenetic (BP) analyses are in agreement with known X. fastidiosa subspecies clades but show novel within-subspecies differentiation, including geographic differentiation, and provide additional information regarding host-based isolate variation and specificity. dN/dS ratios of environmentally mediated genes, though <1 due to high sequence similarity, are significantly greater than housekeeping gene dN/dS ratios and correlate with increased sequence variability. MLSA-E can more precisely resolve relationships between closely related bacterial strains with low genetic variability, such as X. fastidiosa isolates. Discovering the genetic relationships between X. fastidiosa isolates will provide new insights into the epidemiology of populations of X. fastidiosa, allowing

  6. FASTAptamer: A Bioinformatic Toolkit for High-throughput Sequence Analysis of Combinatorial Selections

    PubMed Central

    Alam, Khalid K; Chang, Jonathan L; Burke, Donald H

    2015-01-01

    High-throughput sequence (HTS) analysis of combinatorial selection populations accelerates lead discovery and optimization and offers dynamic insight into selection processes. An underlying principle is that selection enriches high-fitness sequences as a fraction of the population, whereas low-fitness sequences are depleted. HTS analysis readily provides the requisite numerical information by tracking the evolutionary trajectory of individual sequences in response to selection pressures. Unlike genomic data, for which a number of software solutions exist, user-friendly tools are not readily available for the combinatorial selections field, leading many users to create custom software. FASTAptamer was designed to address the sequence-level analysis needs of the field. The open source FASTAptamer toolkit counts, normalizes and ranks read counts in a FASTQ file, compares populations for sequence distribution, generates clusters of sequence families, calculates fold-enrichment of sequences throughout the course of a selection and searches for degenerate sequence motifs. While originally designed for aptamer selections, FASTAptamer can be applied to any selection strategy that can utilize next-generation DNA sequencing, such as ribozyme or deoxyribozyme selections, in vivo mutagenesis and various surface display technologies (peptide, antibody fragment, mRNA, etc.). FASTAptamer software, sample data and a user's guide are available for download at http://burkelab.missouri.edu/fastaptamer.html. PMID:25734917
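    The counting, normalizing, ranking, and fold-enrichment steps named in this abstract can be sketched in a few lines of Python. This is not FASTAptamer's own code or interface: the function names are hypothetical, normalization to reads per million (RPM) is an assumption made here, and the FASTQ records are invented toy data.

```python
from collections import Counter


def count_reads(fastq_lines):
    """Count identical sequences in FASTQ text (four lines per record)."""
    lines = [ln.strip() for ln in fastq_lines if ln.strip()]
    return Counter(lines[i] for i in range(1, len(lines), 4))


def reads_per_million(counts):
    """Normalize raw counts to reads per million (RPM)."""
    total = sum(counts.values())
    return {seq: n * 1_000_000 / total for seq, n in counts.items()}


def rank(counts):
    """Sort sequences by descending count, breaking ties alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def fold_enrichment(rpm_before, rpm_after):
    """RPM ratio (after / before) for sequences present in both populations."""
    return {s: rpm_after[s] / rpm_before[s] for s in rpm_after if s in rpm_before}


def toy_fastq(seqs):
    """Build minimal FASTQ lines for a list of sequences (toy helper)."""
    out = []
    for i, s in enumerate(seqs):
        out += ["@read%d" % i, s, "+", "I" * len(s)]
    return out


round1 = count_reads(toy_fastq(["ACGT", "ACGT", "TTTT", "GGGG"]))
round2 = count_reads(toy_fastq(["ACGT", "ACGT", "ACGT", "TTTT"]))
enrichment = fold_enrichment(reads_per_million(round1), reads_per_million(round2))
```

    In this toy selection, ACGT rises from half of the first population to three quarters of the second, so its enrichment is 1.5, while GGGG disappears and is simply absent from the result. Sequences missing from either round would need a pseudocount or separate reporting in a real analysis.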

  7. Phylogenetic Analysis of the Bifidobacterium Genus Using Glycolysis Enzyme Sequences

    PubMed Central

    Brandt, Katelyn; Barrangou, Rodolphe

    2016-01-01

    Bifidobacteria are important members of the human gastrointestinal tract that promote the establishment of a healthy microbial consortium in the gut of infants. Recent studies have established that the Bifidobacterium genus is a polymorphic phylogenetic clade, which encompasses a diversity of species and subspecies that encode a broad range of proteins implicated in complex and non-digestible carbohydrate uptake and catabolism, ranging from human breast milk oligosaccharides, to plant fibers. Recent genomic studies have created a need to properly place Bifidobacterium species in a phylogenetic tree. Current approaches, based on core-genome analyses come at the cost of intensive sequencing and demanding analytical processes. Here, we propose a typing method based on sequences of glycolysis genes and the proteins they encode, to provide insights into diversity, typing, and phylogeny in this complex and broad genus. We show that glycolysis genes occur broadly in these genomes, to encode the machinery necessary for the biochemical spine of the cell, and provide a robust phylogenetic marker. Furthermore, glycolytic sequences-based trees are congruent with both the classical 16S rRNA phylogeny, and core genome-based strain clustering. Furthermore, these glycolysis markers can also be used to provide insights into the adaptive evolution of this genus, especially with regards to trends toward a high GC content. This streamlined method may open new avenues for phylogenetic studies on a broad scale, given the widespread occurrence of the glycolysis pathway in bacteria, and the diversity of the sequences they encode. PMID:27242688

  8. Functional analysis of bipartite begomovirus coat protein promoter sequences

    SciTech Connect

    Lacatus, Gabriela; Sunter, Garry

    2008-06-20

    We demonstrate that the AL2 gene of Cabbage leaf curl virus (CaLCuV) activates the CP promoter in mesophyll and acts to derepress the promoter in vascular tissue, similar to that observed for Tomato golden mosaic virus (TGMV). Binding studies indicate that sequences mediating repression and activation of the TGMV and CaLCuV CP promoter specifically bind different nuclear factors common to Nicotiana benthamiana, spinach and tomato. However, chromatin immunoprecipitation demonstrates that TGMV AL2 can interact with both sequences independently. Binding of nuclear protein(s) from different crop species to viral sequences conserved in both bipartite and monopartite begomoviruses, including TGMV, CaLCuV, Pepper golden mosaic virus and Tomato yellow leaf curl virus suggests that bipartite begomoviruses bind common host factors to regulate the CP promoter. This is consistent with a model in which AL2 interacts with different components of the cellular transcription machinery that bind viral sequences important for repression and activation of begomovirus CP promoters.

  9. DNA sequence and analysis of human chromosome 8.

    PubMed

    Nusbaum, Chad; Mikkelsen, Tarjei S; Zody, Michael C; Asakawa, Shuichi; Taudien, Stefan; Garber, Manuel; Kodira, Chinnappa D; Schueler, Mary G; Shimizu, Atsushi; Whittaker, Charles A; Chang, Jean L; Cuomo, Christina A; Dewar, Ken; FitzGerald, Michael G; Yang, Xiaoping; Allen, Nicole R; Anderson, Scott; Asakawa, Teruyo; Blechschmidt, Karin; Bloom, Toby; Borowsky, Mark L; Butler, Jonathan; Cook, April; Corum, Benjamin; DeArellano, Kurt; DeCaprio, David; Dooley, Kathleen T; Dorris, Lester; Engels, Reinhard; Glöckner, Gernot; Hafez, Nabil; Hagopian, Daniel S; Hall, Jennifer L; Ishikawa, Sabine K; Jaffe, David B; Kamat, Asha; Kudoh, Jun; Lehmann, Rüdiger; Lokitsang, Tashi; Macdonald, Pendexter; Major, John E; Matthews, Charles D; Mauceli, Evan; Menzel, Uwe; Mihalev, Atanas H; Minoshima, Shinsei; Murayama, Yuji; Naylor, Jerome W; Nicol, Robert; Nguyen, Cindy; O'Leary, Sinéad B; O'Neill, Keith; Parker, Stephen C J; Polley, Andreas; Raymond, Christina K; Reichwald, Kathrin; Rodriguez, Joseph; Sasaki, Takashi; Schilhabel, Markus; Siddiqui, Roman; Smith, Cherylyn L; Sneddon, Tam P; Talamas, Jessica A; Tenzin, Pema; Topham, Kerri; Venkataraman, Vijay; Wen, Gaiping; Yamazaki, Satoru; Young, Sarah K; Zeng, Qiandong; Zimmer, Andrew R; Rosenthal, Andre; Birren, Bruce W; Platzer, Matthias; Shimizu, Nobuyoshi; Lander, Eric S

    2006-01-19

    The International Human Genome Sequencing Consortium (IHGSC) recently completed a sequence of the human genome. As part of this project, we have focused on chromosome 8. Although some chromosomes exhibit extreme characteristics in terms of length, gene content, repeat content and fraction segmentally duplicated, chromosome 8 is distinctly typical in character, being very close to the genome median in each of these aspects. This work describes a finished sequence and gene catalogue for the chromosome, which represents just over 5% of the euchromatic human genome. A unique feature of the chromosome is a vast region of approximately 15 megabases on distal 8p that appears to have a strikingly high mutation rate, which has accelerated in the hominids relative to other sequenced mammals. This fast-evolving region contains a number of genes related to innate immunity and the nervous system, including loci that appear to be under positive selection--these include the major defensin (DEF) gene cluster and MCPH1, a gene that may have contributed to the evolution of expanded brain size in the great apes. The data from chromosome 8 should allow a better understanding of both normal and disease biology and genome evolution. PMID:16421571

  10. Phylogenetic Analysis of the Bifidobacterium Genus Using Glycolysis Enzyme Sequences.

    PubMed

    Brandt, Katelyn; Barrangou, Rodolphe

    2016-01-01

    Bifidobacteria are important members of the human gastrointestinal tract that promote the establishment of a healthy microbial consortium in the gut of infants. Recent studies have established that the Bifidobacterium genus is a polymorphic phylogenetic clade, which encompasses a diversity of species and subspecies that encode a broad range of proteins implicated in complex and non-digestible carbohydrate uptake and catabolism, ranging from human breast milk oligosaccharides to plant fibers. Recent genomic studies have created a need to properly place Bifidobacterium species in a phylogenetic tree. Current approaches, based on core-genome analyses, come at the cost of intensive sequencing and demanding analytical processes. Here, we propose a typing method based on sequences of glycolysis genes and the proteins they encode, to provide insights into diversity, typing, and phylogeny in this complex and broad genus. We show that glycolysis genes occur broadly in these genomes, to encode the machinery necessary for the biochemical spine of the cell, and provide a robust phylogenetic marker. Furthermore, glycolytic sequence-based trees are congruent with both the classical 16S rRNA phylogeny and core genome-based strain clustering. In addition, these glycolysis markers can be used to provide insights into the adaptive evolution of this genus, especially with regard to trends toward a high GC content. This streamlined method may open new avenues for phylogenetic studies on a broad scale, given the widespread occurrence of the glycolysis pathway in bacteria, and the diversity of the sequences they encode. PMID:27242688

  11. Analysis of the complete DNA sequence of murine cytomegalovirus.

    PubMed Central

    Rawlinson, W D; Farrell, H E; Barrell, B G

    1996-01-01

    The complete DNA sequence of the Smith strain of murine cytomegalovirus (MCMV) was determined from virion DNA by using a whole-genome shotgun approach. The genome has an overall G+C content of 58.7%, consists of 230,278 bp, and is arranged as a single unique sequence with short (31-bp) terminal direct repeats and several short internal repeats. Significant similarity to the genome of the sequenced human cytomegalovirus (HCMV) strain AD169 is evident, particularly for 78 open reading frames encoded by the central part of the genome. There is a very similar distribution of G+C content across the two genomes. Sequences toward the ends of the MCMV genome encode tandem arrays of homologous glycoproteins (gps) arranged as two gene families. The left end encodes 15 gps that represent one family, and the right end encodes a different family of 11 gps. A homolog (m144) of cellular major histocompatibility complex (MHC) class I genes is located at the end of the genome opposite the HCMV MHC class I homolog (UL18). G protein-coupled receptor (GCR) homologs (M33 and M78) occur in positions congruent with two (UL33 and UL78) of the four putative HCMV GCR homologs. Counterparts of all of the known enzyme homologs in HCMV are present in the MCMV genome, including the phosphotransferase gene (M97), whose product phosphorylates ganciclovir in HCMV-infected cells, and the assembly protein (M80). PMID:8971012

  12. Learning Progressions and Teaching Sequences: A Review and Analysis

    ERIC Educational Resources Information Center

    Duschl, Richard; Maeng, Seungho; Sezen, Asli

    2011-01-01

    Our paper is an analytical review of the design, development and reporting of learning progressions and teaching sequences. Research questions are: (1) what criteria are being used to propose a "hypothetical learning progression/trajectory" and (2) what measurements/evidence are being used to empirically define and refine a "hypothetical learning…

  13. Genome sequence and analysis of the tuber crop potato.

    PubMed

    Xu, Xun; Pan, Shengkai; Cheng, Shifeng; Zhang, Bo; Mu, Desheng; Ni, Peixiang; Zhang, Gengyun; Yang, Shuang; Li, Ruiqiang; Wang, Jun; Orjeda, Gisella; Guzman, Frank; Torres, Michael; Lozano, Roberto; Ponce, Olga; Martinez, Diana; De la Cruz, Germán; Chakrabarti, S K; Patil, Virupaksh U; Skryabin, Konstantin G; Kuznetsov, Boris B; Ravin, Nikolai V; Kolganova, Tatjana V; Beletsky, Alexey V; Mardanov, Andrei V; Di Genova, Alex; Bolser, Daniel M; Martin, David M A; Li, Guangcun; Yang, Yu; Kuang, Hanhui; Hu, Qun; Xiong, Xingyao; Bishop, Gerard J; Sagredo, Boris; Mejía, Nilo; Zagorski, Wlodzimierz; Gromadka, Robert; Gawor, Jan; Szczesny, Pawel; Huang, Sanwen; Zhang, Zhonghua; Liang, Chunbo; He, Jun; Li, Ying; He, Ying; Xu, Jianfei; Zhang, Youjun; Xie, Binyan; Du, Yongchen; Qu, Dongyu; Bonierbale, Merideth; Ghislain, Marc; Herrera, Maria del Rosario; Giuliano, Giovanni; Pietrella, Marco; Perrotta, Gaetano; Facella, Paolo; O'Brien, Kimberly; Feingold, Sergio E; Barreiro, Leandro E; Massa, Gabriela A; Diambra, Luis; Whitty, Brett R; Vaillancourt, Brieanne; Lin, Haining; Massa, Alicia N; Geoffroy, Michael; Lundback, Steven; DellaPenna, Dean; Buell, C Robin; Sharma, Sanjeev Kumar; Marshall, David F; Waugh, Robbie; Bryan, Glenn J; Destefanis, Marialaura; Nagy, Istvan; Milbourne, Dan; Thomson, Susan J; Fiers, Mark; Jacobs, Jeanne M E; Nielsen, Kåre L; Sønderkær, Mads; Iovene, Marina; Torres, Giovana A; Jiang, Jiming; Veilleux, Richard E; Bachem, Christian W B; de Boer, Jan; Borm, Theo; Kloosterman, Bjorn; van Eck, Herman; Datema, Erwin; Hekkert, Bas te Lintel; Goverse, Aska; van Ham, Roeland C H J; Visser, Richard G F

    2011-07-14

    Potato (Solanum tuberosum L.) is the world's most important non-grain food crop and is central to global food security. It is clonally propagated, highly heterozygous, autotetraploid, and suffers acute inbreeding depression. Here we use a homozygous doubled-monoploid potato clone to sequence and assemble 86% of the 844-megabase genome. We predict 39,031 protein-coding genes and present evidence for at least two genome duplication events indicative of a palaeopolyploid origin. As the first genome sequence of an asterid, the potato genome reveals 2,642 genes specific to this large angiosperm clade. We also sequenced a heterozygous diploid clone and show that gene presence/absence variants and other potentially deleterious mutations occur frequently and are a likely cause of inbreeding depression. Gene family expansion, tissue-specific expression and recruitment of genes to new pathways contributed to the evolution of tuber development. The potato genome sequence provides a platform for genetic improvement of this vital crop. PMID:21743474

  14. Transcriptome analysis of blueberry using 454 EST sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Blueberry (Vaccinium corymbosum) is a major berry crop in the United States, and one that has great nutritional and economical value. Next generation sequencing methodologies, such as 454, have been demonstrated to be successful and efficient in producing a snap-shot of transcriptional activities du...

  15. Digital fragment analysis of short tandem repeats by high-throughput amplicon sequencing.

    PubMed

    Darby, Brian J; Erickson, Shay F; Hervey, Samuel D; Ellis-Felege, Susan N

    2016-07-01

    High-throughput sequencing has been proposed as a method to genotype microsatellites and overcome the four main technical drawbacks of capillary electrophoresis: amplification artifacts, imprecise sizing, length homoplasy, and limited multiplex capability. The objective of this project was to test a high-throughput amplicon sequencing approach to fragment analysis of short tandem repeats and characterize its advantages and disadvantages against traditional capillary electrophoresis. We amplified and sequenced 12 muskrat microsatellite loci from 180 muskrat specimens and analyzed the sequencing data for precision of allele calling, propensity for amplification or sequencing artifacts, and evidence of length homoplasy. Of the 294 total alleles we detected by sequencing, only 164 would have been detected by capillary electrophoresis, as the remaining 130 alleles (44%) would have been hidden by length homoplasy. The ability to detect a greater number of unique alleles resulted in the ability to resolve greater population genetic structure. The primary advantages of fragment analysis by sequencing are the ability to precisely size fragments, resolve length homoplasy, multiplex many individuals and many loci into a single high-throughput run, and compare data across projects and across laboratories (present and future) with minimal technical calibration. A significant disadvantage of fragment analysis by sequencing is that the method is only practical and cost-effective when performed on batches of several hundred samples with multiple loci. Future work is needed to optimize throughput while minimizing costs and to update existing microsatellite allele calling and analysis programs to accommodate sequence-aware microsatellite data. PMID:27386092
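
    The length-homoplasy argument in this abstract comes down to counting alleles two ways: by distinct sequence (what sequencing resolves) and by distinct fragment length (what capillary electrophoresis resolves). The sketch below shows that bookkeeping for a single locus. The function name and the toy alleles are hypothetical; only the figures 294, 164, 130 and 44% come from the abstract.

```python
def homoplasy_summary(allele_sequences):
    """Count alleles at one locus by sequence and by length.

    Returns (distinct_sequences, distinct_lengths, hidden_by_homoplasy).
    Hypothetical helper for illustration only.
    """
    distinct_seqs = set(allele_sequences)
    # Capillary electrophoresis separates fragments by size only,
    # so alleles of equal length collapse into a single call.
    distinct_lengths = {len(s) for s in distinct_seqs}
    hidden = len(distinct_seqs) - len(distinct_lengths)
    return len(distinct_seqs), len(distinct_lengths), hidden

# Made-up alleles: two pairs share a length but differ in sequence
toy_alleles = ["ACACACAC", "ACACATAC", "ACACACACAC",
               "ACACACAC", "ACATACACAC", "ACAC"]
by_sequence, by_length, hidden = homoplasy_summary(toy_alleles)
print(by_sequence, by_length, hidden)

# The same arithmetic applied to the totals reported in the abstract
total_alleles, visible_by_ce = 294, 164
hidden_reported = total_alleles - visible_by_ce
percent_hidden = round(100 * hidden_reported / total_alleles)
print(hidden_reported, percent_hidden)
```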

  16. Novel technologies applied to the nucleotide sequencing and comparative sequence analysis of the genomes of infectious agents in veterinary medicine.

    PubMed

    Granberg, F; Bálint, Á; Belák, S

    2016-04-01

    Next-generation sequencing (NGS), also referred to as deep, high-throughput or massively parallel sequencing, is a powerful new tool that can be used for the complex diagnosis and intensive monitoring of infectious disease in veterinary medicine. NGS technologies are also being increasingly used to study the aetiology, genomics, evolution and epidemiology of infectious disease, as well as host-pathogen interactions and other aspects of infection biology. This review briefly summarises recent progress and achievements in this field by first introducing a range of novel techniques and then presenting examples of NGS applications in veterinary infection biology. Various work steps and processes for sampling and sample preparation, sequence analysis and comparative genomics, and improving the accuracy of genomic prediction are discussed, as are bioinformatics requirements. Examples of sequencing-based applications and comparative genomics in veterinary medicine are then provided. This review is based on novel references selected from the literature and on experiences of the World Organisation for Animal Health (OIE) Collaborating Centre for the Biotechnology-based Diagnosis of Infectious Diseases in Veterinary Medicine, Uppsala, Sweden. PMID:27217166

  17. Long terminal repeat of murine retroviral DNAs: sequence analysis, host-proviral junctions, and preintegration site.

    PubMed Central

    Van Beveren, C; Rands, E; Chattopadhyay, S K; Lowy, D R; Verma, I M

    1982-01-01

    The nucleotide sequence of the long terminal repeat (LTR) of three murine retroviral DNAs has been determined. The data indicate that the U5 region (sequences originating from the 5' end of the genome) of various LTRs is more conserved than the U3 region (sequences from the 3' end of the genome). The location and sequence of the control elements such as the 5' cap, "TATA-like" sequences, "CCAAT-box," and presumptive polyadenylic acid addition signal AATAAA in the various LTRs are nearly identical. Some murine retroviral DNAs contain a duplication of sequences within the LTR ranging in size from 58 to 100 base pairs. A variant of molecularly cloned Moloney murine sarcoma virus DNA in which one of the two LTRs integrated into the viral DNA was also analyzed. A 4-base-pair duplication was generated at the site of integration of LTR in the viral DNA. The host-viral junction of two molecularly cloned AKR-murine leukemia virus DNAs (clones 623 and 614) was determined. In the case of AKR-623 DNA, a 3- or 4-base-pair direct repeat of cellular sequences flanking the viral DNA was observed. However, AKR-614 DNA contained a 5-base-pair repeat of cellular sequences. The nucleotide sequence of the preintegration site of AKR-623 DNA revealed that the cellular sequences duplicated during integration are present only once. Finally, a striking homology between the sequences flanking the preintegration site and viral LTRs was observed. PMID:6281466

  18. Comparative analysis of antigen-targeting sequences used in DNA vaccines.

    PubMed

    Carvalho, Joana A; Azzoni, Adriano R; Prazeres, Duarte M F; Monteiro, Gabriel A

    2010-03-01

    Plasmid vectors can be optimized by including specific signals that promote antigen targeting to the major antigen presentation and processing pathways, increasing the immunogenicity and potency of DNA vaccines. A pVAX1-based backbone was used to encode the Green Fluorescent Protein (GFP) reporter gene fused either to ISG (Invariant Surface Glycoprotein) or to TSA (trans-sialidase) Trypanosoma brucei genes. The plasmids were further engineered to carry antigen-targeting sequences, which promote protein transport to the extracellular space (secretion signal), lysosomes (LAMP-1) and the endoplasmic reticulum (adenovirus e1a). Transfection efficiency was not affected by differences in size between the constructs, as no differences in the plasmid copy number per cell were found. This finding also suggests that the addition of both the ISG gene and the targeting sequences did not add sensitive regions prone to nuclease attack to the plasmid. Cells transfected with pVAX1GFP had a significantly higher number of transcripts. This could be a result of lower mRNA stability and/or a lower transcription rate associated with the bigger transcripts. On the other hand, no differences were found between transcript levels of the ISG-GFP plasmids. Therefore, the addition of these targeting sequences does not affect the maturation/stability of the transcripts. Microscopy analysis showed differences in protein localization and fluorescence levels of cells transfected with pVAX1GFP and ISG constructs. Moreover, cells transfected with the LAMP-1 and secretory sequences presented a distinct distribution pattern when compared with ISG protein. Protein expression was quantified by flow cytometry. Higher cell fluorescence was observed in cells expressing the cytoplasmic fusion protein (ISG-GFP or TSA-GFP) compared with cells where the protein was transported to the lysosomal pathway. Protein transport to the endoplasmic reticulum does not lead to a decrease in the mean fluorescence values. The…

  19. Genome Sequence and Analysis of the Oral Bacterium Fusobacterium nucleatum Strain ATCC 25586

    PubMed Central

    Kapatral, Vinayak; Anderson, Iain; Ivanova, Natalia; Reznik, Gary; Los, Tamara; Lykidis, Athanasios; Bhattacharyya, Anamitra; Bartman, Allen; Gardner, Warren; Grechkin, Galina; Zhu, Lihua; Vasieva, Olga; Chu, Lien; Kogan, Yakov; Chaga, Oleg; Goltsman, Eugene; Bernal, Axel; Larsen, Niels; D'Souza, Mark; Walunas, Theresa; Pusch, Gordon; Haselkorn, Robert; Fonstein, Michael; Kyrpides, Nikos; Overbeek, Ross

    2002-01-01

    We present a complete DNA sequence and metabolic analysis of the dominant oral bacterium Fusobacterium nucleatum. Although not considered a major dental pathogen on its own, this anaerobe facilitates the aggregation and establishment of several other species including the dental pathogens Porphyromonas gingivalis and Bacteroides forsythus. The F. nucleatum strain ATCC 25586 genome was assembled from shotgun sequences and analyzed using the ERGO bioinformatics suite (http://www.integratedgenomics.com). The genome contains 2.17 Mb encoding 2,067 open reading frames, organized on a single circular chromosome with 27% GC content. Despite its taxonomic position among the gram-negative bacteria, several features of its core metabolism are similar to that of gram-positive Clostridium spp., Enterococcus spp., and Lactococcus spp. The genome analysis has revealed several key aspects of the pathways of organic acid, amino acid, carbohydrate, and lipid metabolism. Nine very-high-molecular-weight outer membrane proteins are predicted from the sequence, none of which has been reported in the literature. More than 137 transporters for the uptake of a variety of substrates such as peptides, sugars, metal ions, and cofactors have been identified. Biosynthetic pathways exist for only three amino acids: glutamate, aspartate, and asparagine. The remaining amino acids are imported as such or as di- or oligopeptides that are subsequently degraded in the cytoplasm. A principal source of energy appears to be the fermentation of glutamate to butyrate. Additionally, desulfuration of cysteine and methionine yields ammonia, H2S, methyl mercaptan, and butyrate, which are capable of arresting fibroblast growth, thus preventing wound healing and aiding penetration of the gingival epithelium. The metabolic capabilities of F. nucleatum revealed by its genome are therefore consistent with its specialized niche in the mouth. PMID:11889109

  20. Genome sequence and analysis of the oral bacterium Fusobacterium nucleatum strain ATCC 25586.

    PubMed

    Kapatral, Vinayak; Anderson, Iain; Ivanova, Natalia; Reznik, Gary; Los, Tamara; Lykidis, Athanasios; Bhattacharyya, Anamitra; Bartman, Allen; Gardner, Warren; Grechkin, Galina; Zhu, Lihua; Vasieva, Olga; Chu, Lien; Kogan, Yakov; Chaga, Oleg; Goltsman, Eugene; Bernal, Axel; Larsen, Niels; D'Souza, Mark; Walunas, Theresa; Pusch, Gordon; Haselkorn, Robert; Fonstein, Michael; Kyrpides, Nikos; Overbeek, Ross

    2002-04-01

    We present a complete DNA sequence and metabolic analysis of the dominant oral bacterium Fusobacterium nucleatum. Although not considered a major dental pathogen on its own, this anaerobe facilitates the aggregation and establishment of several other species including the dental pathogens Porphyromonas gingivalis and Bacteroides forsythus. The F. nucleatum strain ATCC 25586 genome was assembled from shotgun sequences and analyzed using the ERGO bioinformatics suite (http://www.integratedgenomics.com). The genome contains 2.17 Mb encoding 2,067 open reading frames, organized on a single circular chromosome with 27% GC content. Despite its taxonomic position among the gram-negative bacteria, several features of its core metabolism are similar to that of gram-positive Clostridium spp., Enterococcus spp., and Lactococcus spp. The genome analysis has revealed several key aspects of the pathways of organic acid, amino acid, carbohydrate, and lipid metabolism. Nine very-high-molecular-weight outer membrane proteins are predicted from the sequence, none of which has been reported in the literature. More than 137 transporters for the uptake of a variety of substrates such as peptides, sugars, metal ions, and cofactors have been identified. Biosynthetic pathways exist for only three amino acids: glutamate, aspartate, and asparagine. The remaining amino acids are imported as such or as di- or oligopeptides that are subsequently degraded in the cytoplasm. A principal source of energy appears to be the fermentation of glutamate to butyrate. Additionally, desulfuration of cysteine and methionine yields ammonia, H2S, methyl mercaptan, and butyrate, which are capable of arresting fibroblast growth, thus preventing wound healing and aiding penetration of the gingival epithelium. The metabolic capabilities of F. nucleatum revealed by its genome are therefore consistent with its specialized niche in the mouth. PMID:11889109

  1. Hypergeometric analysis of tiling-array and sequence data: detection and interpretation of peaks.

    PubMed

    Taskesen, Erdogan; Hoogeboezem, Remco; Delwel, Ruud; Reinders, Marcel JT

    2013-01-01

    Probing protein-deoxyribonucleic acid (DNA) interactions is gaining popularity as it sheds light on molecular mechanisms that regulate the expression of genes. Currently, tiling-arrays and next-generation sequencing technology can be used to measure these interactions. Both methods generate a signal over the genome in which contiguous regions of peaks on the genome represent the presence of an interacting molecule. Many methods exist to identify functional regions of interest (ROIs) on the genome. However, the detection of ROIs is often not an end-point in research questions, and relating the ROIs to information present in databases, such as gene-ontology, pathway information, or enrichment of certain genomic content, therefore requires moving data between tools. We introduce hypergeometric analysis of tiling-array and sequence data (HATSEQ), a powerful tool that accurately identifies functional ROIs on the genome where a genomic signal significantly deviates from the general genome-wide behavior. HATSEQ also includes a number of built-in post-analyses with which biological meaning can be attached to the detected ROIs in terms of gene pathways and de-novo motif analysis, and provides different visualizations and statistical summaries for the detected ROIs. In addition, HATSEQ has an intuitive graphical user interface that lowers the barrier for researchers to analyze their data without the need for scripting languages. We compared the results of HATSEQ against two other popular chromatin immunoprecipitation sequencing (ChIP-Seq) methods and observed overlap in the detected ROIs, but HATSEQ is more specific in delineating the peak boundaries. We also discuss the versatility of HATSEQ by using a Signal Transducer and Activator of Transcription 1 (STAT1) ChIP-Seq data-set, and show that the detected ROIs are highly specific for the expected STAT1 binding motif. HATSEQ is freely available at: http://hema13.erasmusmc.nl/index.php/HATSEQ. PMID:24187504
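
    The abstract does not spell out how HATSEQ applies the hypergeometric distribution, so the following is only a generic sketch of a hypergeometric over-representation test of the kind commonly used for pathway enrichment. The function name, the parameter names and the toy numbers are assumptions and are not taken from the tool.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for a hypergeometric variable.

    N: population size (e.g. all annotated genes)
    K: number of 'successes' in the population (e.g. genes in a pathway)
    n: number of draws without replacement (e.g. genes near detected ROIs)
    k: observed successes among the draws
    """
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)

# Toy example: 20 genes in total, 5 belong to a pathway,
# 4 genes lie near ROIs, and 3 of those 4 are pathway members.
p_value = hypergeom_sf(3, N=20, K=5, n=4)
print(round(p_value, 4))
```

    A small tail probability indicates that the overlap between the ROI-associated genes and the pathway is larger than expected from random draws; in practice such p-values would also need multiple-testing correction across pathways.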

  2. Hypergeometric analysis of tiling-array and sequence data: detection and interpretation of peaks

    PubMed Central

    Taskesen, Erdogan; Hoogeboezem, Remco; Delwel, Ruud; Reinders, Marcel JT

    2013-01-01

    Probing protein-deoxyribonucleic acid (DNA) interactions is gaining popularity as it sheds light on molecular mechanisms that regulate the expression of genes. Currently, tiling-arrays and next-generation sequencing technology can be used to measure these interactions. Both methods generate a signal over the genome in which contiguous regions of peaks on the genome represent the presence of an interacting molecule. Many methods exist to identify functional regions of interest (ROIs) on the genome. However, the detection of ROIs is often not an end-point in research questions, and relating the ROIs to information present in databases, such as gene-ontology, pathway information, or enrichment of certain genomic content, therefore requires moving data between tools. We introduce hypergeometric analysis of tiling-array and sequence data (HATSEQ), a powerful tool that accurately identifies functional ROIs on the genome where a genomic signal significantly deviates from the general genome-wide behavior. HATSEQ also includes a number of built-in post-analyses with which biological meaning can be attached to the detected ROIs in terms of gene pathways and de-novo motif analysis, and provides different visualizations and statistical summaries for the detected ROIs. In addition, HATSEQ has an intuitive graphical user interface that lowers the barrier for researchers to analyze their data without the need for scripting languages. We compared the results of HATSEQ against two other popular chromatin immunoprecipitation sequencing (ChIP-Seq) methods and observed overlap in the detected ROIs, but HATSEQ is more specific in delineating the peak boundaries. We also discuss the versatility of HATSEQ by using a Signal Transducer and Activator of Transcription 1 (STAT1) ChIP-Seq data-set, and show that the detected ROIs are highly specific for the expected STAT1 binding motif. HATSEQ is freely available at: http://hema13.erasmusmc.nl/index.php/HATSEQ. PMID:24187504

  3. The DNA Sequence And Comparative Analysis Of Human Chromosome 5

    SciTech Connect

    Schmutz, Jeremy; Martin, Joel; Terry, Astrid; Couronne, Olivier; Grimwood, Jane; Lowry, Steve; Gordon, Laurie A.; Scott, Duncan; Xie, Gary; Huang, Wayne; Hellsten, Uffe; Tran-Gyamfi, Mary; She, Xinwei; Prabhakar, Shyam; Aerts, Andrea; Altherr, Michael; Bajorek, Eva; Black, Stacey; Branscomb, Elbert; Caoile, Chenier; Challacombe, Jean F.; Chan, Yee Man; Denys, Mirian; Detter, John C.; Escobar, Julio; Flowers, Dave; Fotopulos, Dea; Glavina, Tijana; Gomez, Maria; Gonzales, Eidelyn; Goodstein, David; Grigoriev, Igor; Groza, Matthew; Hammon, Nancy; Hawkins, Trevor; Haydu, Lauren; Israni, Sanjay; Jett, Jamie; Kadner, Kristen; Kimball, Heather; Kobayashi, Arthur; Lopez, Frederick; Lou, Yunian; Martinez, Diego; Medina, Catherine; Morgan, Jenna; Nandkeshwar, Richard; Noonan, James P.; Pitluck, Sam; Pollard, Martin; Predki, Paul; Priest, James; Ramirez, Lucia; Retterer, James; Rodriguez, Alex; Rogers, Stephanie; Salamov, Asaf; Salazar, Angelica; Thayer, Nina; Tice, Hope; Tsai, Ming; Ustaszewska, Anna; Vo, Nu; Wheeler, Jeremy; Wu, Kevin; Yang, Joan; Dickson, Mark; Cheng, Jan-Fang; Eichler, Evan E.; Olsen, Anne; Pennacchio, Len A.; Rokhsar, Daniel S.; Richardson, Paul; Lucas, Susan M.; Myers, Richard M.; Rubin, Edward M.

    2004-08-01

    Chromosome 5 is one of the largest human chromosomes and contains numerous intrachromosomal duplications, yet it has one of the lowest gene densities. This is partially explained by numerous gene-poor regions that display a remarkable degree of noncoding conservation with non-mammalian vertebrates, suggesting that they are functionally constrained. In total, we compiled 177.7 million base pairs of highly accurate finished sequence containing 923 manually curated protein-coding genes including the protocadherin and interleukin gene families. We also completely sequenced versions of the large chromosome-5-specific internal duplications. These duplications are very recent evolutionary events and probably have a mechanistic role in human physiological variation, as deletions in these regions are the cause of debilitating disorders including spinal muscular atrophy.

  4. The sequence and analysis of duplication rich human chromosome 16

    SciTech Connect

    Martin, J; Han, C; Gordon, L A; Terry, A; Prabhakar, S; She, X; Xie, G; Hellsten, U; Chan, Y M; Altherr, M; Couronne, O; Aerts, A; Bajorek, E; Black, S; Blumer, H; Branscomb, E; Brown, N; Bruno, W J; Buckingham, J; Callen, D F; Campbell, C S; Campbell, M L; Campbell, E W; Caoile, C; Challacombe, J F; Chasteen, L A; Chertkov, O; Chi, H C; Christensen, M; Clark, L M; Cohn, J D; Denys, M; Detter, J C; Dickson, M; Dimitrijevic-Bussod, M; Escobar, J; Fawcett, J J; Flowers, D; Fotopulos, D; Glavina, T; Gomez, M; Gonzales, E; Goodstein, D; Goodwin, L A; Grady, D L; Grigoriev, I; Groza, M; Hammon, N; Hawkins, T; Haydu, L; Hildebrand, C E; Huang, W; Israni, S; Jett, J; Jewett, P B; Kadner, K; Kimball, H; Kobayashi, A; Krawczyk, M; Leyba, T; Longmire, J L; Lopez, F; Lou, Y; Lowry, S; Ludeman, T; Manohar, C F; Mark, G A; McMurray, K L; Meincke, L J; Morgan, J; Moyzis, R K; Mundt, M O; Munk, A C; Nandkeshwar, R D; Pitluck, S; Pollard, M; Predki, P; Parson-Quintana, B; Ramirez, L; Rash, S; Retterer, J; Ricke, D O; Robinson, D; Rodriguez, A; Salamov, A; Saunders, E H; Scott, D; Shough, T; Stallings, R L; Stalvey, M; Sutherland, R D; Tapia, R; Tesmer, J G; Thayer, N; Thompson, L S; Tice, H; Torney, D C; Tran-Gyamfi, M; Tsai, M; Ulanovsky, L E; Ustaszewska, A; Vo, N; White, P S; Williams, A L; Wills, P L; Wu, J; Wu, K; Yang, J; DeJong, P; Bruce, D; Doggett, N A; Deaven, L; Schmutz, J; Grimwood, J; Richardson, P; Rokhsar, D S; Eichler, E E; Gilna, P; Lucas, S M; Myers, R M; Rubin, E M; Pennacchio, L A

    2005-04-06

    Human chromosome 16 features one of the highest levels of segmentally duplicated sequence among the human autosomes. We report here the 78,884,754 base pairs of finished chromosome 16 sequence, representing over 99.9% of its euchromatin. Manual annotation revealed 880 protein-coding genes confirmed by 1,637 aligned transcripts, 19 tRNA genes, 341 pseudogenes, and 3 RNA pseudogenes. These genes include metallothionein, cadherin, and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukemia. Several large-scale structural polymorphisms spanning hundreds of kilobase pairs were identified and result in gene content differences among humans. While the segmental duplications of chromosome 16 are enriched in the relatively gene poor pericentromere of the p-arm, some are involved in recent gene duplication and conversion events likely to have had an impact on the evolution of primates and human disease susceptibility.

  5. Cloning and sequence analysis of myostatin promoter in sheep.

    PubMed

    Du, Rong; Chen, Yong-Fu; An, Xiao-Rong; Yang, Xing-Yuan; Ma, Yi; Zhang, Lei; Yuan, Xiao-Li; Chen, Li-Mei; Qin, Jian

    2005-12-01

    To better understand the structure and function of the myostatin gene promoter region in sheep, we cloned and sequenced a 1.517 kb fragment containing the 5'-regulatory region of the sheep myostatin gene (GenBank accession number AY918121). The promoter sequence consists of three TATA boxes, one CAAT box, and eight putative E-boxes. Some putative muscle growth response elements for Octamer-binding factor 1 (Octamer), Activator protein 1 (AP1), Growth factor independence 1 zinc finger protein (Gfi-1B), Myocyte enhancer factor 2 (MEF2), Muscle-specific Mt binding site (MTBF), Glucocorticoid response elements (GRE) and Progesterone receptor binding site (PRE) were detected. Some of the motifs are conserved compared with those in the goat, bovine and porcine myostatin promoters. However, some differences were also found. PMID:16287620

  6. Analysis of the 2012 Oct 27 Haida Gwaii Aftershock Sequence

    NASA Astrophysics Data System (ADS)

    Mulder, T.; Brillon, C.; Bentkowski, W.; White, M.; Rosenberger, A.; Rogers, G. C.; Vernon, F.; Kao, H.

    2013-12-01

    The magnitude 7.7 thrust earthquake that occurred on 2012 Oct 28 offshore of Haida Gwaii (formerly the Queen Charlotte Islands), in British Columbia, Canada, produced a rich and on-going aftershock sequence. Ten months of aftershock events are determined from analyst reviewed solutions and automatic detectors and locators. For automated solutions, rotating the waveforms and running P and S wave filters (Rosenberger, 2010) over them produced phase arrivals for an improved catalogue of aftershocks compared to using a traditional signal to noise ratio detector on standard vertical and horizontal component seismograms. The automated aftershock locations from the rotated waveforms are compared to the automated locations from the standard vertical and horizontal waveforms and to analyst locations (which are generally M>2.5). The best of the automated solutions are comparable in quality to analyst solutions and much more numerous making this a viable method of processing extensive aftershock sequences. They outline a region approximately 50 km wide and 100 km long, with the aftershocks in two parallel bands. Most of the aftershocks are not on the rupture surface but are in the overlying or underlying plates. It is thought that this earthquake represents the Pacific plate thrusting underneath the North America plate with the rupture surface lying beneath the sedimentary Queen Charlotte terrace and terminating to the east in the vicinity of the Queen Charlotte fault. Due to the one-sided station distribution on land, depth trades off with distance offshore, resulting in poor depth determinations. However, using ocean bottom seismometers deployed early in the aftershock sequence, depth resolution was significantly improved. 
    First motion focal mechanisms for a portion of the aftershock sequence are compared

  7. Analysis of Whole Transcriptome Sequencing Data: Workflow and Software

    PubMed Central

    Yang, In Seok

    2015-01-01

    RNA is a polymeric molecule implicated in various biological processes, such as the coding, decoding, regulation, and expression of genes. Numerous studies have examined RNA features using whole transcriptome sequencing (RNA-seq) approaches. RNA-seq is a powerful technique for characterizing and quantifying the transcriptome and accelerates the development of bioinformatics software. In this review, we introduce routine RNA-seq workflow together with related software, focusing particularly on transcriptome reconstruction and expression quantification. PMID:26865842

  8. Analysis of Whole Transcriptome Sequencing Data: Workflow and Software.

    PubMed

    Yang, In Seok; Kim, Sangwoo

    2015-12-01

    RNA is a polymeric molecule implicated in various biological processes, such as the coding, decoding, regulation, and expression of genes. Numerous studies have examined RNA features using whole transcriptome sequencing (RNA-seq) approaches. RNA-seq is a powerful technique for characterizing and quantifying the transcriptome and accelerates the development of bioinformatics software. In this review, we introduce routine RNA-seq workflow together with related software, focusing particularly on transcriptome reconstruction and expression quantification. PMID:26865842

  9. Structure prediction and analysis of neuraminidase sequence variants.

    PubMed

    Thayer, Kelly M

    2016-07-01

    Analyzing protein structure has become an integral aspect of understanding systems of biochemical import. The laboratory experiment endeavors to introduce protein folding to ascertain structures of proteins for which the structure is unavailable, as well as to critically evaluate the quality of the prediction obtained. The model system used is the highly mutable influenza virus protein neuraminidase, which is the key target in the development of therapeutics. In light of recent pandemics, understanding how mutations confer drug resistance, which translates at the molecular level to understanding how different sequence variants differ, constitutes an area of great interest because of the ramifications in public health. This lab targets upper level undergraduate biochemistry students, and aims to introduce tools to be used to explore protein folding and protein visualization in the context of the neuraminidase case study. Students proceed to critically evaluate the folded models by comparison with crystallographic structures. When validity is established, they fold a neuraminidase sequence for which a structure is not available. Through structural alignment and visual inspection of the 150 loop, students gain molecular insight into two possible conformations of the protein, which are actively being studied. Folding the third chosen sequence mimics a true research environment in allowing students to generate a structure from a sequence for which a structure was not previously available, and to assess whether their particular variant has an open or closed loop. From this vantage, they are then challenged to speculate about the connection between loop conformation and drug susceptibility. © 2016 by The International Union of Biochemistry and Molecular Biology, 44(4):361-376, 2016. PMID:26900942

  10. ALVIS: interactive non-aggregative visualization and explorative analysis of multiple sequence alignments.

    PubMed

    Schwarz, Roland F; Tamuri, Asif U; Kultys, Marek; King, James; Godwin, James; Florescu, Ana M; Schultz, Jörg; Goldman, Nick

    2016-05-01

    Sequence Logos and its variants are the most commonly used method for visualization of multiple sequence alignments (MSAs) and sequence motifs. They provide consensus-based summaries of the sequences in the alignment. Consequently, individual sequences cannot be identified in the visualization and covariant sites are not easily discernible. We recently proposed Sequence Bundles, a motif visualization technique that maintains a one-to-one relationship between sequences and their graphical representation and visualizes covariant sites. We here present Alvis, an open-source platform for the joint explorative analysis of MSAs and phylogenetic trees, employing Sequence Bundles as its main visualization method. Alvis combines the power of the visualization method with an interactive toolkit allowing detection of covariant sites, annotation of trees with synapomorphies and homoplasies, and motif detection. It also offers numerical analysis functionality, such as dimension reduction and classification. Alvis is user-friendly, highly customizable and can export results in publication-quality figures. It is available as a full-featured standalone version (http://www.bitbucket.org/rfs/alvis) and its Sequence Bundles visualization module is further available as a web application (http://science-practice.com/projects/sequence-bundles). PMID:26819408

  11. Analysis of xylem formation in pine by cDNA sequencing

    NASA Technical Reports Server (NTRS)

    Allona, I.; Quinn, M.; Shoop, E.; Swope, K.; St Cyr, S.; Carlis, J.; Riedl, J.; Retzel, E.; Campbell, M. M.; Sederoff, R.; Whetten, R. W.; Davies, E. (Principal Investigator)

    1998-01-01

    Secondary xylem (wood) formation is likely to involve some genes expressed rarely or not at all in herbaceous plants. Moreover, environmental and developmental stimuli influence secondary xylem differentiation, producing morphological and chemical changes in wood. To increase our understanding of xylem formation, and to provide material for comparative analysis of gymnosperm and angiosperm sequences, ESTs were obtained from immature xylem of loblolly pine (Pinus taeda L.). A total of 1,097 single-pass sequences were obtained from 5' ends of cDNAs made from gravistimulated tissue from bent trees. Cluster analysis detected 107 groups of similar sequences, ranging in size from 2 to 20 sequences. A total of 361 sequences fell into these groups, whereas 736 sequences were unique. About 55% of the pine EST sequences show similarity to previously described sequences in public databases. About 10% of the recognized genes encode factors involved in cell wall formation. Sequences similar to cell wall proteins, most known lignin biosynthetic enzymes, and several enzymes of carbohydrate metabolism were found. A number of putative regulatory proteins also are represented. Expression patterns of several of these genes were studied in various tissues and organs of pine. Sequencing novel genes expressed during xylem formation will provide a powerful means of identifying mechanisms controlling this important differentiation pathway.

  12. ALVIS: interactive non-aggregative visualization and explorative analysis of multiple sequence alignments

    PubMed Central

    Schwarz, Roland F.; Tamuri, Asif U.; Kultys, Marek; King, James; Godwin, James; Florescu, Ana M.; Schultz, Jörg; Goldman, Nick

    2016-01-01

    Sequence Logos and its variants are the most commonly used method for visualization of multiple sequence alignments (MSAs) and sequence motifs. They provide consensus-based summaries of the sequences in the alignment. Consequently, individual sequences cannot be identified in the visualization and covariant sites are not easily discernible. We recently proposed Sequence Bundles, a motif visualization technique that maintains a one-to-one relationship between sequences and their graphical representation and visualizes covariant sites. We here present Alvis, an open-source platform for the joint explorative analysis of MSAs and phylogenetic trees, employing Sequence Bundles as its main visualization method. Alvis combines the power of the visualization method with an interactive toolkit allowing detection of covariant sites, annotation of trees with synapomorphies and homoplasies, and motif detection. It also offers numerical analysis functionality, such as dimension reduction and classification. Alvis is user-friendly, highly customizable and can export results in publication-quality figures. It is available as a full-featured standalone version (http://www.bitbucket.org/rfs/alvis) and its Sequence Bundles visualization module is further available as a web application (http://science-practice.com/projects/sequence-bundles). PMID:26819408

  13. Comprehensive analysis of expressed sequence tags from cultivated and wild radish (Raphanus spp.)

    PubMed Central

    2013-01-01

    Background Radish (Raphanus sativus L., 2n = 2× = 18) is an economically important vegetable crop worldwide. A large collection of radish expressed sequence tags (ESTs) has been generated but remains largely uncharacterized. Results In this study, approximately 315,000 ESTs derived from 22 Raphanus cDNA libraries from 18 different genotypes were analyzed, for the purpose of gene and marker discovery and to evaluate large-scale genome duplication and phylogenetic relationships among Raphanus spp. The ESTs were assembled into 85,083 unigenes, of which 90%, 65%, 89% and 89% had homologous sequences in the GenBank nr, SwissProt, TrEMBL and Arabidopsis protein databases, respectively. A total of 66,194 (78%) could be assigned at least one gene ontology (GO) term. Comparative analysis identified 5,595 gene families unique to radish that were significantly enriched with genes related to small molecule metabolism, as well as 12,899 specific to the Brassicaceae that were enriched with genes related to seed oil body biogenesis and responses to phytohormones. The analysis further indicated that the divergence of radish and Brassica rapa occurred approximately 8.9-14.9 million years ago (MYA), following a whole-genome duplication event (12.8-21.4 MYA) in their common ancestor. An additional whole-genome duplication event in radish occurred at 5.1-8.4 MYA, after its divergence from B. rapa. A total of 13,570 simple sequence repeats (SSRs) and 28,758 high-quality single nucleotide polymorphisms (SNPs) were also identified. Using a subset of SNPs, the phylogenetic relationships of eight different accessions of Raphanus were inferred. Conclusion Comprehensive analysis of radish ESTs provided new insights into radish genome evolution and the phylogenetic relationships of different radish accessions. Moreover, the radish EST sequences and the associated SSR and SNP markers described in this study represent a valuable resource for radish functional genomics studies and

  14. Analysis of a marine picoplankton community by 16S rRNA gene cloning and sequencing.

    PubMed Central

    Schmidt, T M; DeLong, E F; Pace, N R

    1991-01-01

    The phylogenetic diversity of an oligotrophic marine picoplankton community was examined by analyzing the sequences of cloned ribosomal genes. This strategy does not rely on cultivation of the resident microorganisms. Bulk genomic DNA was isolated from picoplankton collected in the north central Pacific Ocean by tangential flow filtration. The mixed-population DNA was fragmented, size fractionated, and cloned into bacteriophage lambda. Thirty-eight clones containing 16S rRNA genes were identified in a screen of 3.2 × 10^4 recombinant phage, and portions of the rRNA gene were amplified by polymerase chain reaction and sequenced. The resulting sequences were used to establish the identities of the picoplankton by comparison with an established data base of rRNA sequences. Fifteen unique eubacterial sequences were obtained, including four from cyanobacteria and eleven from proteobacteria. A single eucaryote related to dinoflagellates was identified; no archaebacterial sequences were detected. The cyanobacterial sequences are all closely related to sequences from cultivated marine Synechococcus strains and with cyanobacterial sequences obtained from the Atlantic Ocean (Sargasso Sea). Several sequences were related to common marine isolates of the gamma subdivision of proteobacteria. In addition to sequences closely related to those of described bacteria, sequences were obtained from two phylogenetic groups of organisms that are not closely related to any known rRNA sequences from cultivated organisms. Both of these novel phylogenetic clusters are proteobacteria, one group within the alpha subdivision and the other distinct from known proteobacterial subdivisions. The rRNA sequences of the alpha-related group are nearly identical to those of some Sargasso Sea picoplankton, suggesting a global distribution of these organisms. PMID:2066334

  15. Validation analysis of probabilistic models of dietary exposure to food additives.

    PubMed

    Gilsenan, M B; Thompson, R L; Lambe, J; Gibney, M J

    2003-10-01

    The validity of a range of simple conceptual models designed specifically for the estimation of food additive intakes using probabilistic analysis was assessed. Modelled intake estimates that fell below traditional conservative point estimates of intake and above 'true' additive intakes (calculated from a reference database at brand level) were considered to be in a valid region. Models were developed for 10 food additives by combining food intake data, the probability of an additive being present in a food group and additive concentration data. Food intake and additive concentration data were entered as raw data or as a lognormal distribution, and the probability of an additive being present was entered based on the per cent brands or the per cent eating occasions within a food group that contained an additive. Since the three model components assumed two possible modes of input, the validity of eight (2^3) model combinations was assessed. All model inputs were derived from the reference database. An iterative approach was employed in which the validity of individual model components was assessed first, followed by validation of full conceptual models. While the distribution of intake estimates from models fell below conservative intakes, which assume that the additive is present at maximum permitted levels (MPLs) in all foods in which it is permitted, intake estimates were not consistently above 'true' intakes. These analyses indicate the need for more complex models for the estimation of food additive intakes using probabilistic analysis. Such models should incorporate information on market share and/or brand loyalty. PMID:14555358
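
    The count of eight model combinations follows from three components with two input modes each. A minimal sketch of that enumeration is given below; the dictionary keys and the function name `model_combinations` are illustrative assumptions, not identifiers from the study, and the sketch only lists the combinations without estimating any intake.

```python
from itertools import product

# Two input modes per model component, as listed in the abstract.
# The key names are hypothetical labels chosen for this sketch.
COMPONENT_MODES = {
    "food_intake": ("raw data", "lognormal distribution"),
    "additive_concentration": ("raw data", "lognormal distribution"),
    "presence_probability": ("per cent brands", "per cent eating occasions"),
}


def model_combinations():
    """Return every way of picking one input mode per component."""
    names = list(COMPONENT_MODES)
    return [
        dict(zip(names, modes))
        for modes in product(*COMPONENT_MODES.values())
    ]


combos = model_combinations()
for number, combo in enumerate(combos, start=1):
    print(number, combo)
```

    Each printed dictionary corresponds to one candidate model whose intake estimates would then be checked against the valid region described above.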

  16. The use of additive and subtractive approaches to examine the nuclear localization sequence of the polyomavirus major capsid protein VP1

    NASA Technical Reports Server (NTRS)

    Chang, D.; Haynes, J. I. 2nd; Brady, J. N.; Consigli, R. A.; Spooner, B. S. (Principal Investigator)

    1992-01-01

    A nuclear localization signal (NLS) has been identified in the N-terminal (Ala1-Pro-Lys-Arg-Lys-Ser-Gly-Val-Ser-Lys-Cys11) amino acid sequence of the polyomavirus major capsid protein VP1. The importance of this amino acid sequence for nuclear transport of VP1 protein was demonstrated by a genetic "subtractive" study using the constructs pSG5VP1 (full-length VP1) and pSG5 delta 5'VP1 (truncated VP1, lacking amino acids Ala1-Cys11). These constructs were used to transfect COS-7 cells, and expression and intracellular localization of the VP1 protein was visualized by indirect immunofluorescence. These studies revealed that the full-length VP1 was expressed and localized in the nucleus, while the truncated VP1 protein was localized in the cytoplasm and not transported to the nucleus. These findings were substantiated by an "additive" approach using FITC-labeled conjugates of synthetic peptides homologous to the NLS of VP1 cross-linked to bovine serum albumin or immunoglobulin G. Both conjugates localized in the nucleus after microinjection into the cytoplasm of 3T6 cells. The importance of individual amino acids found in the basic sequence (Lys3-Arg-Lys5) of the NLS was also investigated. This was accomplished by synthesizing three additional peptides in which lysine-3 was substituted with threonine, arginine-4 was substituted with threonine, or lysine-5 was substituted with threonine. It was found that lysine-3 was crucial for nuclear transport, since substitution of this amino acid with threonine prevented nuclear localization of the microinjected, FITC-labeled conjugate.

  17. The Complete Genome Sequence and Analysis of the Epsilonproteobacterium Arcobacter butzleri

    PubMed Central

    Miller, William G.; Parker, Craig T.; Rubenfield, Marc; Mendz, George L.; Wösten, Marc M. S. M.; Ussery, David W.; Stolz, John F.; Binnewies, Tim T.; Hallin, Peter F.; Wang, Guilin; Malek, Joel A.; Rogosin, Andrea; Stanker, Larry H.; Mandrell, Robert E.

    2007-01-01

    Background Arcobacter butzleri is a member of the epsilon subdivision of the Proteobacteria and a close taxonomic relative of established pathogens, such as Campylobacter jejuni and Helicobacter pylori. Here we present the complete genome sequence of the human clinical isolate, A. butzleri strain RM4018. Methodology/Principal Findings Arcobacter butzleri is a member of the Campylobacteraceae, but the majority of its proteome is most similar to those of Sulfuromonas denitrificans and Wolinella succinogenes, both members of the Helicobacteraceae, and those of the deep-sea vent Epsilonproteobacteria Sulfurovum and Nitratiruptor. In addition, many of the genes and pathways described here, e.g. those involved in signal transduction and sulfur metabolism, have been identified previously within the epsilon subdivision only in S. denitrificans, W. succinogenes, Sulfurovum, and/or Nitratiruptor, or are unique to the subdivision. In addition, the analyses indicated also that a substantial proportion of the A. butzleri genome is devoted to growth and survival under diverse environmental conditions, with a large number of respiration-associated proteins, signal transduction and chemotaxis proteins and proteins involved in DNA repair and adaptation. To investigate the genomic diversity of A. butzleri strains, we constructed an A. butzleri DNA microarray comprising 2238 genes from strain RM4018. Comparative genomic indexing analysis of 12 additional A. butzleri strains identified both the core genes of A. butzleri and intraspecies hypervariable regions, where <70% of the genes were present in at least two strains. Conclusion/Significance The presence of pathways and loci associated often with non-host-associated organisms, as well as genes associated with virulence, suggests that A. butzleri is a free-living, water-borne organism that might be classified rightfully as an emerging pathogen. The genome sequence and analyses presented in this study are an important first step in

  18. A Proposed Taxonomy of Anaerobic Fungi (Class Neocallimastigomycetes) Suitable for Large-Scale Sequence-Based Community Structure Analysis

    PubMed Central

    Kittelmann, Sandra; Naylor, Graham E.; Koolaard, John P.; Janssen, Peter H.

    2012-01-01

    Anaerobic fungi are key players in the breakdown of fibrous plant material in the rumen, but not much is known about the composition and stability of fungal communities in ruminants. We analyzed anaerobic fungi in 53 rumen samples from farmed sheep (4 different flocks), cattle, and deer feeding on a variety of diets. Denaturing gradient gel electrophoresis fingerprinting of the internal transcribed spacer 1 (ITS1) region of the rrn operon revealed a high diversity of anaerobic fungal phylotypes across all samples. Clone libraries of the ITS1 region were constructed from DNA from 11 rumen samples that had distinctly different fungal communities. A total of 417 new sequences were generated to expand the number and diversity of ITS1 sequences available. Major phylogenetic groups of anaerobic fungi in New Zealand ruminants belonged to the genera Piromyces, Neocallimastix, Caecomyces and Orpinomyces. In addition, sequences forming four novel clades were obtained, which may represent so far undetected genera or species of anaerobic fungi. We propose a revised phylogeny and pragmatic taxonomy for anaerobic fungi, which was tested and proved suitable for analysis of datasets stemming from high-throughput next-generation sequencing methods. Comparing our revised taxonomy to the taxonomic assignment of sequences deposited in the GenBank database, we believe that >29% of ITS1 sequences derived from anaerobic fungal isolates or clones are misnamed at the genus level. PMID:22615827

  19. Analysis of Binary Series to Evaluate Astronomical Forcing of a Middle Permian Chert Sequence in South China

    NASA Astrophysics Data System (ADS)

    Hinnov, L. A.; Yao, X.; Zhou, Y.

    2014-12-01

    We describe a Middle Permian radiolarian chert sequence in South China (Chaohu area), with the sequence of chert and mudstone layers formulated into binary series. Two interpolation approaches were tested: linear interpolation resulting in a "triangle" series, and staircase interpolation resulting in a "boxcar" series. Spectral analysis of the triangle series reveals decimeter chert-mudstone cycles which represent theoretical Middle Permian 32-kyr obliquity cycling. Tuning these cycles to a 32-kyr periodicity reveals that other cm-scale cycles are in the precession index band and have a strong ~400-kyr amplitude modulation. Additional tuning tests further support a hypothesis of astronomical forcing of the chert sequence. Analysis of the boxcar series reveals additional "eccentricity" terms transmitted by the boxcar representation of the modulating precession-scale cycles. An astronomical time scale reconstructed from these results assumes a Roadian/Wordian boundary age of 268.8 Ma for the onset of the first chert layer at the base of the sequence and ends at 264.1 Ma, for a total duration of 4.7 Myr. We propose that monsoon-controlled upwelling contributed to the development of the chert-mudstone cycles. A seasonal monsoon controlled by astronomical forcing influenced the intensity of upwelling, modulating radiolarian productivity and silica deposition.
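
    The difference between the "boxcar" and "triangle" series can be sketched as follows. This is only an illustration under stated assumptions: chert is coded 1 and mudstone 0, the series is sampled on a uniform grid, and the linear interpolation is anchored at layer midpoints. The abstract does not state the coding, the sampling step, or the anchor points, and the function name `binary_series` is hypothetical.

```python
import numpy as np


def binary_series(thicknesses_cm, codes, step_cm=1.0):
    """Build boxcar and triangle series from a stack of layers.

    thicknesses_cm: layer thicknesses from base to top.
    codes: 1 for chert, 0 for mudstone (assumed coding).
    """
    thicknesses = np.asarray(thicknesses_cm, dtype=float)
    codes = np.asarray(codes, dtype=float)

    tops = np.cumsum(thicknesses)      # height of each layer top
    bases = tops - thicknesses         # height of each layer base
    mids = (bases + tops) / 2.0        # assumed anchor points

    height = np.arange(0.0, tops[-1], step_cm)

    # Staircase interpolation: each sample takes the code of the
    # layer that contains it, giving a boxcar series.
    layer_index = np.searchsorted(tops, height, side="right")
    boxcar = codes[layer_index]

    # Linear interpolation between layer midpoints, giving a
    # triangle series.
    triangle = np.interp(height, mids, codes)
    return height, boxcar, triangle


height, boxcar, triangle = binary_series([2, 2, 2, 2], [1, 0, 1, 0])
print(boxcar)
print(triangle)
```

    The boxcar series keeps sharp layer boundaries, which is consistent with the abstract's remark that it transmits additional terms into the spectrum, while the triangle series smooths the transitions between layers.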

  20. Comparison of the rotavirus nonstructural protein NSP1 (NS53) from different species by sequence analysis and northern blot hybridization.

    PubMed

    Dunn, S J; Cross, T L; Greenberg, H B

    1994-08-15

    The nucleotide sequence of gene 5 encoding the rotavirus nonstructural protein NSP1 (NS53) of 6 strains (EW, EHP, RRV, I321, OSU, and Gottfried) was determined and compared to 6 previously reported strains (SA11, UK, RF, Hu803, DS-1, and Wa). The 12 rotavirus strains were derived from a total of five separate species (murine, bovine, simian, porcine, and human). Gene sizes ranged from 1564 to 1611 nucleotides in length and the deduced protein sequences were found to be 486 to 495 amino acids in length. Comparisons of NSP1 amino acid sequences showed identities ranging from 36 to 92%. This diversity was most evident between strains from different species. Phylogenetic analysis revealed a clustering of NSP1 sequences according to species origin with the exception that the human and porcine strains were included in a single grouping. Northern blot hybridizations using additional rotavirus strains from the five species confirmed the grouping found by sequence analysis. The species specificity of NSP1 is consistent with the hypothesis that NSP1 plays a role in host range restriction. PMID:8030275
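
    Pairwise amino acid identity of the kind reported above can be computed from two aligned sequences. The sketch below assumes the sequences are already aligned to equal length with "-" as the gap character and that gapped columns are excluded from the denominator. Conventions for the denominator vary and the abstract does not say which was used, so `percent_identity` is a hypothetical helper rather than the authors' method, and the example strings are made up.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    matches = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap or residue_b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if residue_a == residue_b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments, not real NSP1 sequences.
identity = percent_identity("MKT-LV", "MRTALV")
print(identity)
```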

  1. Identification of antigen-specific B cell receptor sequences using public repertoire analysis

    PubMed Central

    Galson, Jacob D.; Rance, Richard; Parkhill, Julian; Lunter, Gerton; Pollard, Andrew J.; Kelly, Dominic F.

    2014-01-01

    High-throughput sequencing allows detailed study of the B cell receptor (BCR) repertoire post-immunization but it remains unclear to what extent the de novo identification of antigen-specific sequences from the total BCR repertoire is possible. A Hib-MenC-TT conjugate vaccine containing H. influenzae type b (Hib) and group C meningococcal (MenC) polysaccharides as well as tetanus toxoid (TT) was used to investigate the BCR repertoire of adult humans following immunization and test the hypothesis that public or convergent repertoire analysis could identify antigen specific sequences. A number of antigen-specific BCR sequences have previously been reported for Hib and TT which made a vaccine containing these 2 antigens an ideal immunological stimulus. Analysis of identical complementarity determining region (CDR)3 amino acid (AA) sequences that were shared by individuals in the post-vaccine repertoire identified a number of known Hib-specific sequences but only one previously described TT sequence. The extension of this analysis to non-identical but highly similar CDR3 AA sequences revealed a number of other TT-related sequences. The anti-Hib avidity index post-vaccination was strongly correlated with the relative frequency of Hib-specific sequences, indicating that the post-vaccination public BCR repertoire may be related to more conventional measures of immunogenicity correlating with disease protection. Analysis of public BCR repertoire provided evidence of convergent BCR evolution in individuals exposed to the same antigens. If this finding is confirmed, the public repertoire could be used for rapid and direct identification of protective antigen-specific BCR sequences from peripheral blood. PMID:25392534
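
    The two steps described above, finding identical CDR3 amino acid sequences shared between individuals and then extending to highly similar sequences, can be sketched as below. The function names, the threshold of two individuals, and the use of Hamming distance with one allowed mismatch are assumptions made for illustration. The abstract does not specify the similarity measure, and the toy sequences are invented.

```python
from collections import defaultdict


def public_cdr3(repertoires, min_individuals=2):
    """Map each CDR3 sequence seen in enough individuals to its carriers.

    repertoires: dict of individual id -> iterable of CDR3 AA strings.
    """
    carriers = defaultdict(set)
    for individual, sequences in repertoires.items():
        for sequence in set(sequences):
            carriers[sequence].add(individual)
    return {
        sequence: sorted(ids)
        for sequence, ids in carriers.items()
        if len(ids) >= min_individuals
    }


def similar_cdr3(query, candidates, max_mismatches=1):
    """Candidates of equal length within a Hamming distance of the query."""
    return [
        candidate
        for candidate in candidates
        if candidate != query
        and len(candidate) == len(query)
        and sum(a != b for a, b in zip(query, candidate)) <= max_mismatches
    ]


# Invented toy repertoires for three individuals.
repertoires = {
    "A": ["CARDYW", "CARGGW"],
    "B": ["CARDYW", "CARDFW"],
    "C": ["CARSSW"],
}
shared = public_cdr3(repertoires)
neighbours = similar_cdr3("CARDYW", ["CARDFW", "CARGGW", "CARDYW", "CAR"])
print(shared)
print(neighbours)
```

    In practice the post-vaccination repertoires would be compared against pre-vaccination samples as well, which this sketch does not attempt.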

  2. Sequence and structural analysis of BTB domain proteins

    PubMed Central

    Stogios, Peter J; Downs, Gregory S; Jauhal, Jimmy JS; Nandra, Sukhjeen K; Privé, Gilbert G

    2005-01-01

    Background The BTB domain (also known as the POZ domain) is a versatile protein-protein interaction motif that participates in a wide range of cellular functions, including transcriptional regulation, cytoskeleton dynamics, ion channel assembly and gating, and targeting proteins for ubiquitination. Several BTB domain structures have been experimentally determined, revealing a highly conserved core structure. Results We surveyed the protein architecture, genomic distribution and sequence conservation of BTB domain proteins in 17 fully sequenced eukaryotes. The BTB domain is typically found as a single copy in proteins that contain only one or two other types of domain, and this defines the BTB-zinc finger (BTB-ZF), BTB-BACK-kelch (BBK), voltage-gated potassium channel T1 (T1-Kv), MATH-BTB, BTB-NPH3 and BTB-BACK-PHR (BBP) families of proteins, among others. In contrast, the Skp1 and ElonginC proteins consist almost exclusively of the core BTB fold. There are numerous lineage-specific expansions of BTB proteins, as seen by the relatively large number of BTB-ZF and BBK proteins in vertebrates, MATH-BTB proteins in Caenorhabditis elegans, and BTB-NPH3 proteins in Arabidopsis thaliana. Using the structural homology between Skp1 and the PLZF BTB homodimer, we present a model of a BTB-Cul3 SCF-like E3 ubiquitin ligase complex that shows that the BTB dimer or the T1 tetramer is compatible in this complex. Conclusion Despite widely divergent sequences, the BTB fold is structurally well conserved. The fold has adapted to several different modes of self-association and interactions with non-BTB proteins. PMID:16207353

  3. Clinical integration of next generation sequencing: a policy analysis.

    PubMed

    Kaufman, David; Curnutte, Margaret; McGuire, Amy L

    2014-01-01

    Clinical next generation sequencing (NGS) technologies are challenging existing regulatory paradigms. We advocate a coordinated policy approach, which first requires a comprehensive understanding of the existing regulatory and legal structures. This paper introduces four key policy domains (quality assurance, insurance coverage, intellectual property management, and data sharing) that must be addressed to ensure high quality clinical NGS. In bringing these policy issues into conversation through this special issue for the Journal of Law, Medicine & Ethics, we hope to lay the foundation for further discussion by a range of stakeholder groups with diverse and strong interests in the governance of NGS. PMID:25298287

  4. Multivariate qualitative analysis of banned additives in food safety using surface enhanced Raman scattering spectroscopy

    NASA Astrophysics Data System (ADS)

    He, Shixuan; Xie, Wanyi; Zhang, Wei; Zhang, Liqun; Wang, Yunxia; Liu, Xiaoling; Liu, Yulong; Du, Chunlei

    2015-02-01

    A novel strategy that combines the iteratively cubic spline fitting baseline correction method with discriminant partial least squares qualitative analysis is employed to analyze the surface enhanced Raman scattering (SERS) spectroscopy of banned food additives, such as Sudan I dye and Rhodamine B in food and Malachite green residues in aquaculture fish. Multivariate qualitative analysis methods, combining spectral preprocessing by iteratively cubic spline fitting (ICSF) baseline correction with principal component analysis (PCA) and with discriminant partial least squares (DPLS) classification, respectively, are applied to investigate the effectiveness of SERS spectroscopy for predicting the class assignments of unknown banned food additives. PCA cannot be used to predict the class assignments of unknown samples. However, the DPLS classification can discriminate the class assignment of unknown banned additives using the information of differences in relative intensities. The results demonstrate that SERS spectroscopy combined with the ICSF baseline correction method and the exploratory analysis methodology of DPLS classification can potentially be used for distinguishing banned food additives in the field of food safety.

  5. Multivariate qualitative analysis of banned additives in food safety using surface enhanced Raman scattering spectroscopy.

    PubMed

    He, Shixuan; Xie, Wanyi; Zhang, Wei; Zhang, Liqun; Wang, Yunxia; Liu, Xiaoling; Liu, Yulong; Du, Chunlei

    2015-02-25

    A novel strategy which combines iteratively cubic spline fitting baseline correction method with discriminant partial least squares qualitative analysis is employed to analyze the surface enhanced Raman scattering (SERS) spectroscopy of banned food additives, such as Sudan I dye and Rhodamine B in food, Malachite green residues in aquaculture fish. Multivariate qualitative analysis methods, using the combination of spectra preprocessing iteratively cubic spline fitting (ICSF) baseline correction with principal component analysis (PCA) and discriminant partial least squares (DPLS) classification respectively, are applied to investigate the effectiveness of SERS spectroscopy for predicting the class assignments of unknown banned food additives. PCA cannot be used to predict the class assignments of unknown samples. However, the DPLS classification can discriminate the class assignment of unknown banned additives using the information of differences in relative intensities. The results demonstrate that SERS spectroscopy combined with ICSF baseline correction method and exploratory analysis methodology DPLS classification can be potentially used for distinguishing the banned food additives in field of food safety. PMID:25300041

  6. 7 CFR 91.38 - Additional fees for appeal of analysis.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 7 Agriculture 3 2011-01-01 2011-01-01 false Additional fees for appeal of analysis. 91.38 Section 91.38 Agriculture Regulations of the Department of Agriculture (Continued) AGRICULTURAL MARKETING... for laboratory service that appears in this paragraph. The new fiscal year for Science and...

  7. 7 CFR 91.38 - Additional fees for appeal of analysis.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 3 2010-01-01 2010-01-01 false Additional fees for appeal of analysis. 91.38 Section 91.38 Agriculture Regulations of the Department of Agriculture (Continued) AGRICULTURAL MARKETING... for laboratory service that appears in this paragraph. The new fiscal year for Science and...

  8. A rapid whole genome sequencing and analysis system supporting genomic epidemiology (7th Annual SFAF Meeting, 2012)

    ScienceCinema

    FitzGerald, Michael [Broad Institute]

    2013-02-12

    Michael FitzGerald on "A rapid whole genome sequencing and analysis system supporting genomic epidemiology" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  9. A rapid whole genome sequencing and analysis system supporting genomic epidemiology (7th Annual SFAF Meeting, 2012)

    SciTech Connect

    FitzGerald, Michael

    2012-06-01

    Michael FitzGerald on "A rapid whole genome sequencing and analysis system supporting genomic epidemiology" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  10. Stimulation of terrestrial ecosystem carbon storage by nitrogen addition: a meta-analysis.

    PubMed

    Yue, Kai; Peng, Yan; Peng, Changhui; Yang, Wanqin; Peng, Xin; Wu, Fuzhong

    2016-01-01

    Elevated nitrogen (N) deposition alters the terrestrial carbon (C) cycle, which is likely to feed back to further climate change. However, how the overall terrestrial ecosystem C pools and fluxes respond to N addition remains unclear. By synthesizing data from multiple terrestrial ecosystems, we quantified the response of C pools and fluxes to experimental N addition using a comprehensive meta-analysis method. Our results showed that N addition significantly stimulated soil total C storage by 5.82% ([2.47%, 9.27%], 95% CI, the same below) and increased the C contents of the above- and below-ground parts of plants by 25.65% [11.07%, 42.12%] and 15.93% [6.80%, 25.85%], respectively. Furthermore, N addition significantly increased aboveground net primary production by 52.38% [40.58%, 65.19%] and litterfall by 14.67% [9.24%, 20.38%] at a global scale. However, the C influx from the plant litter to the soil through litter decomposition and the efflux from the soil due to microbial respiration and soil respiration showed insignificant responses to N addition. Overall, our meta-analysis suggested that N addition will increase soil C storage and plant C in both above- and below-ground parts, indicating that terrestrial ecosystems might act to strengthen as a C sink under increasing N deposition. PMID:26813078

  11. Stimulation of terrestrial ecosystem carbon storage by nitrogen addition: a meta-analysis

    PubMed Central

    Yue, Kai; Peng, Yan; Peng, Changhui; Yang, Wanqin; Peng, Xin; Wu, Fuzhong

    2016-01-01

    Elevated nitrogen (N) deposition alters the terrestrial carbon (C) cycle, which is likely to feed back to further climate change. However, how the overall terrestrial ecosystem C pools and fluxes respond to N addition remains unclear. By synthesizing data from multiple terrestrial ecosystems, we quantified the response of C pools and fluxes to experimental N addition using a comprehensive meta-analysis method. Our results showed that N addition significantly stimulated soil total C storage by 5.82% ([2.47%, 9.27%], 95% CI, the same below) and increased the C contents of the above- and below-ground parts of plants by 25.65% [11.07%, 42.12%] and 15.93% [6.80%, 25.85%], respectively. Furthermore, N addition significantly increased aboveground net primary production by 52.38% [40.58%, 65.19%] and litterfall by 14.67% [9.24%, 20.38%] at a global scale. However, the C influx from the plant litter to the soil through litter decomposition and the efflux from the soil due to microbial respiration and soil respiration showed insignificant responses to N addition. Overall, our meta-analysis suggested that N addition will increase soil C storage and plant C in both above- and below-ground parts, indicating that terrestrial ecosystems might act to strengthen as a C sink under increasing N deposition. PMID:26813078

  12. Stimulation of terrestrial ecosystem carbon storage by nitrogen addition: a meta-analysis

    NASA Astrophysics Data System (ADS)

    Yue, Kai; Peng, Yan; Peng, Changhui; Yang, Wanqin; Peng, Xin; Wu, Fuzhong

    2016-01-01

    Elevated nitrogen (N) deposition alters the terrestrial carbon (C) cycle, which is likely to feed back to further climate change. However, how the overall terrestrial ecosystem C pools and fluxes respond to N addition remains unclear. By synthesizing data from multiple terrestrial ecosystems, we quantified the response of C pools and fluxes to experimental N addition using a comprehensive meta-analysis method. Our results showed that N addition significantly stimulated soil total C storage by 5.82% ([2.47%, 9.27%], 95% CI, the same below) and increased the C contents of the above- and below-ground parts of plants by 25.65% [11.07%, 42.12%] and 15.93% [6.80%, 25.85%], respectively. Furthermore, N addition significantly increased aboveground net primary production by 52.38% [40.58%, 65.19%] and litterfall by 14.67% [9.24%, 20.38%] at a global scale. However, the C influx from the plant litter to the soil through litter decomposition and the efflux from the soil due to microbial respiration and soil respiration showed insignificant responses to N addition. Overall, our meta-analysis suggested that N addition will increase soil C storage and plant C in both above- and below-ground parts, indicating that terrestrial ecosystems might act to strengthen as a C sink under increasing N deposition.

  13. IMSA: integrated metagenomic sequence analysis for identification of exogenous reads in a host genomic background.

    PubMed

    Dimon, Michelle T; Wood, Henry M; Rabbitts, Pamela H; Arron, Sarah T

    2013-01-01

    Metagenomics, the study of microbial genomes within diverse environments, is a rapidly developing field. The identification of microbial sequences within a host organism enables the study of human intestinal, respiratory, and skin microbiota, and has allowed the identification of novel viruses in diseases such as Merkel cell carcinoma. There are few publicly available tools for metagenomic high throughput sequence analysis. We present Integrated Metagenomic Sequence Analysis (IMSA), a flexible, fast, and robust computational analysis pipeline that is available for public use. IMSA takes input sequence from high throughput datasets and uses a user-defined host database to filter out host sequence. IMSA then aligns the filtered reads to a user-defined universal database to characterize exogenous reads within the host background. IMSA assigns a score to each node of the taxonomy based on read frequency, and can output this as a taxonomy report suitable for cluster analysis or as a taxonomy map (TaxMap). IMSA also outputs the specific sequence reads assigned to a taxon of interest for downstream analysis. We demonstrate the use of IMSA to detect pathogens and normal flora within sequence data from a primary human cervical cancer carrying HPV16, a primary human cutaneous squamous cell carcinoma carrying HPV 16, the CaSki cell line carrying HPV16, and the HeLa cell line carrying HPV18. PMID:23717627

  14. Galaxy tools and workflows for sequence analysis with applications in molecular plant pathology

    PubMed Central

    Grüning, Björn A.; Paszkiewicz, Konrad; Pritchard, Leighton

    2013-01-01

    The Galaxy Project offers the popular web browser-based platform Galaxy for running bioinformatics tools and constructing simple workflows. Here, we present a broad collection of additional Galaxy tools for large scale analysis of gene and protein sequences. The motivating research theme is the identification of specific genes of interest in a range of non-model organisms, and our central example is the identification and prediction of “effector” proteins produced by plant pathogens in order to manipulate their host plant. This functional annotation of a pathogen’s predicted capacity for virulence is a key step in translating sequence data into potential applications in plant pathology. This collection includes novel tools, and widely-used third-party tools such as NCBI BLAST+ wrapped for use within Galaxy. Individual bioinformatics software tools are typically available separately as standalone packages, or in online browser-based form. The Galaxy framework enables the user to combine these and other tools to automate organism scale analyses as workflows, without demanding familiarity with command line tools and scripting. Workflows created using Galaxy can be saved and are reusable, so may be distributed within and between research groups, facilitating the construction of a set of standardised, reusable bioinformatic protocols. The Galaxy tools and workflows described in this manuscript are open source and freely available from the Galaxy Tool Shed (http://usegalaxy.org/toolshed or http://toolshed.g2.bx.psu.edu). PMID:24109552

  15. Design and assembly sequence analysis of option 3 for CETF reference space station

    NASA Technical Reports Server (NTRS)

    Garrett, L. Bernard; Andersen, Gregory C.; Hall, John B., Jr.; Allen, Cheryl L.; Scott, A. D., Jr.; So, Kenneth T.

    1987-01-01

    A design and assembly sequence analysis was conducted on one option of the Dual Keel Space Station examined by a NASA Critical Evaluation Task Force to establish viability of several variations of that option. A goal of the study was to produce and analyze technical data to support Task Force decisions to either examine particular Option 3 variations in more depth or eliminate them from further consideration. An analysis of the phasing assembly showed that use of an Expendable Launch Vehicle in conjunction with the Space Transportation System (STS) can accelerate the buildup of the Station and ease the STS launch rate constraints. The study also showed that use of an Orbital Maneuvering Vehicle on the first flight can significantly benefit Station assembly and, by performing Station subsystem functions, can alleviate the need for operational control and reboost systems during the early flights. In addition to launch and assembly sequencing, the study assessed stability and control, and analyzed node-packaging options and the effects of keel removal on the structural dynamics of the Station. Results of these analyses are presented and discussed.

  16. Transcript analysis of a goat mesenteric lymph node by deep next-generation sequencing.

    PubMed

    E, G X; Zhao, Y J; Na, R S; Huang, Y F

    2016-01-01

    Deep RNA sequencing (RNA-seq) provides a practical and inexpensive alternative for exploring genomic data in non-model organisms. The functional annotation of non-model mammalian genomes, such as that of goats, is still poor compared to that of humans and mice. In the current study, we performed a whole transcriptome analysis of an intestinal mucous membrane lymph node to comprehensively characterize the transcript catalogue of this tissue in a goat. Using an Illumina HiSeq 4000 sequencing platform, 9.692 GB of raw reads were acquired. A total of 57,526 lymph transcripts were obtained, and the majority of these were mapped to known transcriptional units (42.67%). A comparison of the mRNA expression of the mesenteric lymph nodes during the juvenile and post-adolescent stages revealed 8949 transcripts that were differentially expressed, including 6174 known genes. In addition, we functionally classified these transcripts using Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) terms. A total of 6174 known genes were assigned to 64 GO terms, and 3782 genes were assigned to 303 KEGG pathways, including some related to immunity. Our results reveal the complex transcriptome profile of the lymph node and suggest that the immune system is immature in the mesenteric lymph nodes of juvenile goats. PMID:27173308

  17. Genome Sequence and Analysis of the Soil Cellulolytic Actinomycete Thermobifida fusca

    SciTech Connect

    Lykidis, Athanasios; Ivanova, Natalia; Anderson, Iain; Mavromatis, Konstantinos; Copeland, Alex; Richardson, Paul; Lucas, Susan; DiBartolo, Genevieve; Martinez, Michele; Lapidus, Alla; Wilson, David B.; Kyrpides, Nikos

    2006-01-01

    Thermobifida fusca is a moderately thermophilic soil bacterium that belongs to Actinobacteria. It is a major degrader of plant cell walls and has been used as a model organism for the study of secreted, thermostable cellulases. The complete genome sequence showed that T. fusca has a single circular chromosome of 3642249 bp predicted to encode 3117 proteins and 65 RNA species with a coding density of 85 percent. Genome analysis revealed the existence of 29 putative glycoside hydrolases in addition to the previously identified cellulases and xylanases. The glycosyl hydrolases include enzymes predicted to exhibit mainly dextran/starch and xylan degrading functions. T. fusca possesses two protein secretion systems: the sec general secretion system and the twin-arginine translocation system. Several of the secreted cellulases have sequence signatures indicating their secretion may be mediated by the twin-arginine translocation system. T. fusca has extensive transport systems for import of carbohydrates coupled to transcriptional regulators controlling the expression of the transporters and glycosyl hydrolases. In addition to providing an overview of the physiology of a soil actinomycete, this study presents insights on the transcriptional regulation and secretion of cellulases which may facilitate the industrial exploitation of these systems.

  18. Expressed sequence tag analysis of functional genes associated with adventitious rooting in Liriodendron hybrids.

    PubMed

    Zhong, Y D; Sun, X Y; Liu, E Y; Li, Y Q; Gao, Z; Yu, F X

    2016-01-01

    Liriodendron hybrids (Liriodendron chinense x L. tulipifera) are important landscaping and afforestation hardwood trees. To date, little genomic research on adventitious rooting has been reported in these hybrids, as well as in the genus Liriodendron. In the present study, we used adventitious roots to construct the first cDNA library for Liriodendron hybrids. A total of 5176 expressed sequence tags (ESTs) were generated and clustered into 2921 unigenes. Among these unigenes, 2547 had significant homology to the non-redundant protein database representing a wide variety of putative functions. Homologs of these genes regulated many aspects of adventitious rooting, including those for auxin signal transduction and root hair development. Results of quantitative real-time polymerase chain reaction showed that AUX1, IRE, and FB1 were highly expressed in adventitious roots and the expression of AUX1, ARF1, NAC1, RHD1, and IRE increased during the development of adventitious roots. Additionally, 181 simple sequence repeats were identified from 166 ESTs and more than 91.16% of these were dinucleotide and trinucleotide repeats. To the best of our knowledge, the present study reports the identification of the genes associated with adventitious rooting in the genus Liriodendron for the first time and provides a valuable resource for future genomic studies. Expression analysis of selected genes could allow us to identify regulatory genes that may be essential for adventitious rooting. PMID:27420958
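    The simple sequence repeat (SSR) screen mentioned in the abstract above can be illustrated with a short Python sketch. The abstract does not say which tool or thresholds were used, so the function name `find_ssrs`, the minimum repeat counts (6 for dinucleotides, 5 for trinucleotides) and the toy sequence are all assumptions made for illustration only.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of tandem copies.
# These values are assumptions, not the criteria used in the study.
MIN_REPEATS = {2: 6, 3: 5}

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for perfect di-/trinucleotide repeats."""
    hits = []
    for unit_len, min_n in min_repeats.items():
        # Capture one motif, then require it to repeat at least min_n - 1 more times.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (unit_len, min_n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(2)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAA
            hits.append((m.start(), motif, len(m.group(1)) // unit_len))
    return sorted(hits)

# Toy sequence (not real EST data): an (AG)7 repeat and a (CAT)5 repeat.
toy_est = "GGC" + "AG" * 7 + "TTC" + "CAT" * 5 + "GA"
ssrs = find_ssrs(toy_est)
print(ssrs)  # [(3, 'AG', 7), (20, 'CAT', 5)]
```

    Dedicated SSR tools also handle imperfect and compound repeats; this sketch only detects perfect tandem copies.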

  19. Transcriptome Analysis of Leaf Tissue of Raphanus sativus by RNA Sequencing

    PubMed Central

    Yin, Yongtai; Wu, Gang; Xia, Heng; Wang, Xiaodong; Fu, Chunhua; Li, Maoteng; Wu, Jiangsheng

    2013-01-01

    Raphanus sativus is not only a popular edible vegetable but also an important source of medicinal compounds. However, the paucity of knowledge about the transcriptome of R. sativus greatly impedes better understanding of the functional genomics and medicinal potential of R. sativus. In this study, the transcriptome sequencing of leaf tissues in R. sativus was performed for the first time. Approximately 22 million clean reads were generated and used for transcriptome assembly. The generated unigenes were subsequently annotated against gene ontology (GO) database. KEGG analysis further revealed two important pathways in the bolting stage of R. sativus including spliceosome assembly and alkaloid synthesis. In addition, a total of 6,295 simple sequence repeats (SSRs) with various motifs were identified in the unigene library of R. sativus. Finally, four unigenes of R. sativus were selected for alignment with their homologs from other plants, and phylogenetic trees for each of the genes were constructed. Taken together, this study will provide a platform to facilitate gene discovery and advance functional genomic research of R. sativus. PMID:24265813

  20. Mutational analysis of the consensus sequence of a replication origin from yeast chromosome III.

    PubMed Central

    Van Houten, J V; Newlon, C S

    1990-01-01

    Yeast autonomously replicating sequence (ARS) elements contain an 11-base-pair core consensus sequence (5'-[A/T]TTTAT[A/G]TTT[A/T]-3') that is required for function. The contribution of each position within this sequence to ARS activity was tested by creating all possible single-base mutations within the core consensus sequence of ARS307 (formerly called the C2G1 ARS) and testing their effects on high-frequency transformation and on plasmid stability. Of the 33 mutations, 22 abolished ARS function as measured by high-frequency transformation, 7 caused more than twofold reductions in plasmid stability, and 4 had no effect on plasmid stability. Mutations that reduced or abolished ARS activity occurred at each position in the consensus sequence, demonstrating that each position of this sequence contributes to ARS function. Of the four mutations that had no effect on ARS activity, three created alternative perfect matches to the core consensus sequence, demonstrating that the alternate bases allowed by the consensus sequence are, indeed, interchangeable. In addition, a change from T to C at position 6 did not perturb wild-type efficiency. To test whether the essential region extends beyond the 11-base-pair consensus sequence, the effects on plasmid stability of point mutations one base 3' to the T-rich strand of the core consensus sequence (position 12) and deletion mutations that altered bases 5' to the T-rich strand of the core consensus sequence were examined. An A at position 12 or the removal of three T residues 5' to the core consensus sequence severely diminished ARS efficiency, showing that the region required for full ARS efficiency extends beyond the core consensus sequence in both directions. PMID:2196439
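    The counts in the abstract above follow from simple enumeration: 11 positions with 3 alternative bases each give 33 single-base mutations, and the three degenerate consensus positions each admit one substitution that is still a perfect match. The Python sketch below reproduces that arithmetic. The wild-type string is a hypothetical core chosen only because it matches the quoted consensus; it is not claimed to be the actual ARS307 sequence.

```python
import re

# Core consensus quoted in the abstract: 5'-[A/T]TTTAT[A/G]TTT[A/T]-3'
CONSENSUS = re.compile(r"[AT]TTTAT[AG]TTT[AT]")
BASES = "ACGT"

def single_base_mutants(core):
    """Yield (position, new_base, mutant) for every substitution; positions are 1-based."""
    for i, old in enumerate(core):
        for new in BASES:
            if new != old:
                yield i + 1, new, core[:i] + new + core[i + 1:]

# Hypothetical wild-type core (an assumption; it merely satisfies the consensus).
WILD_TYPE = "ATTTATATTTA"

mutants = list(single_base_mutants(WILD_TYPE))
still_consensus = [(pos, base) for pos, base, seq in mutants
                   if CONSENSUS.fullmatch(seq)]

print(len(mutants))      # 33
print(still_consensus)   # [(1, 'T'), (7, 'G'), (11, 'T')]
```

    Whichever allowed base the real wild type carries at positions 1, 7 and 11, exactly three substitutions yield an alternative perfect match, which agrees with the three such mutations reported.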

  1. Global Multilocus Sequence Type Analysis of Chlamydia trachomatis Strains from 16 Countries

    PubMed Central

    Isaksson, Jenny; Ryberg, Martin; Tångrot, Jeanette; Saleh, Isam; Versteeg, Bart; Gravningen, Kirsten; Bruisten, Sylvia

    2015-01-01

    The Uppsala University Chlamydia trachomatis multilocus sequence type (MLST) database (http://mlstdb.bmc.uu.se) is based on five target regions (non-housekeeping genes) and the ompA gene. Each target has various numbers of alleles (hctB, 89; CT058, 51; CT144, 30; CT172, 38; and pbpB, 35) derived from 13 studies. Our aims were to perform an overall analysis of all C. trachomatis MLST sequence types (STs) in the database, examine STs with global spread, and evaluate the phylogenetic capability by using the five targets. A total of 415 STs were recognized from 2,089 specimens. The addition of 49 ompA gene variants created 459 profiles. ST variation and their geographical distribution were characterized using eBURST and minimum spanning tree analyses. There were 609 samples from men having sex with men (MSM), with 4 predominating STs detected in this group, comprising 63% of MSM cases. Four other STs predominated among 1,383 heterosexual cases, comprising 31% of this group. The diversity index in ocular trachoma cases was significantly lower than in sexually transmitted chlamydia infections. Predominating STs were identified in 12 available C. trachomatis whole genomes which were compared to 22 C. trachomatis full genomes without predominating STs. No specific gene in the 12 genomes with predominating STs could be linked to successful spread of certain STs. Phylogenetic analysis showed that MLST targets provide a tree similar to trees based on whole-genome analysis. The presented MLST scheme identified C. trachomatis strains with global spread. It provides a tool for epidemiological investigations and is useful for phylogenetic analyses. PMID:25926497

  2. Accident sequence precursor analysis level 2/3 model development

    SciTech Connect

    Lui, C.H.; Galyean, W.J.; Brownson, D.A.

    1997-02-01

    The US Nuclear Regulatory Commission's Accident Sequence Precursor (ASP) program currently uses simple Level 1 models to assess the conditional core damage probability for operational events occurring in commercial nuclear power plants (NPP). Since not all accident sequences leading to core damage will result in the same radiological consequences, it is necessary to develop simple Level 2/3 models that can be used to analyze the response of the NPP containment structure in the context of a core damage accident, estimate the magnitude of the resulting radioactive releases to the environment, and calculate the consequences associated with these releases. The simple Level 2/3 model development work was initiated in 1995, and several prototype models have been completed. Once developed, these simple Level 2/3 models are linked to the simple Level 1 models to provide risk perspectives for operational events. This paper describes the methods implemented for the development of these simple Level 2/3 ASP models, and the linkage process to the existing Level 1 models.

  3. Analysis of sequencing and scheduling methods for arrival traffic

    NASA Technical Reports Server (NTRS)

    Neuman, Frank; Erzberger, Heinz

    1990-01-01

    The air traffic control subsystem that performs scheduling is discussed. The function of the scheduling algorithms is to plan automatically the most efficient landing order and to assign optimally spaced landing times to all arrivals. Several important scheduling algorithms are described and the statistical performance of the scheduling algorithms is examined. Scheduling brings order to an arrival sequence for aircraft. First-come-first-served scheduling (FCFS) establishes a fair order, based on estimated times of arrival, and determines proper separations. Because of the randomness of the traffic, gaps will remain in the scheduled sequence of aircraft. These gaps are filled, or partially filled, by time-advancing the leading aircraft after a gap while still preserving the FCFS order. Tightly scheduled groups of aircraft remain with a mix of heavy and large aircraft. Separation requirements differ for different types of aircraft trailing each other. Advantage is taken of this fact through mild reordering of the traffic, thus shortening the groups and reducing average delays. Actual delays for different samples with the same statistical parameters vary widely, especially for heavy traffic.
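    The first-come-first-served (FCFS) step described in the abstract above can be sketched in a few lines of Python: aircraft are ordered by estimated time of arrival, and each landing time is pushed back only as far as the required separation behind the preceding aircraft demands. The separation values, the `Arrival` class and the function name `fcfs_schedule` are hypothetical and are not taken from the report; the time-advance and reordering refinements it mentions are not implemented here.

```python
from dataclasses import dataclass

# Hypothetical minimum separations in seconds, keyed by (leader, trailer) class.
# Illustrative values only; the report's actual separation rules are not given here.
SEPARATION = {
    ("heavy", "heavy"): 90, ("heavy", "large"): 120,
    ("large", "heavy"): 60, ("large", "large"): 60,
}

@dataclass
class Arrival:
    callsign: str
    weight_class: str  # "heavy" or "large"
    eta: int           # estimated time of arrival, in seconds

def fcfs_schedule(arrivals):
    """Return (callsign, scheduled_time, delay) tuples in FCFS order by ETA."""
    schedule = []
    prev, prev_time = None, None
    for a in sorted(arrivals, key=lambda x: x.eta):
        t = a.eta
        if prev is not None:
            # Land no earlier than the ETA and no closer than the required separation.
            t = max(t, prev_time + SEPARATION[(prev.weight_class, a.weight_class)])
        schedule.append((a.callsign, t, t - a.eta))
        prev, prev_time = a, t
    return schedule

arrivals = [
    Arrival("A2", "large", 30),
    Arrival("A1", "heavy", 0),
    Arrival("A3", "large", 400),
]
schedule = fcfs_schedule(arrivals)
print(schedule)  # [('A1', 0, 0), ('A2', 120, 90), ('A3', 400, 0)]
```

    In this toy run A3 lands at its own ETA, leaving a gap after A2; that is the kind of gap the abstract says is filled by time-advancing the leading aircraft.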

  4. Identification and sequence analysis of potyviruses infecting crops in Vietnam.

    PubMed

    Ha, C; Revill, P; Harding, R M; Vu, M; Dale, J L

    2008-01-01

    Fifty-two virus isolates from 13 distinct potyvirus species infecting crops in Vietnam were identified and the 3' region of each genome was sequenced. The viruses were: bean common mosaic virus (BCMV), potato virus Y (PVY), sugarcane mosaic virus (SCMV), sorghum mosaic virus (SrMV), chilli veinal mottle virus (ChiVMV), zucchini yellow mosaic virus (ZYMV), leek yellow stripe virus (LYMV), shallot yellow stripe virus (SYSV), onion yellow dwarf virus (OYDV), turnip mosaic virus (TuMV), dasheen mosaic virus (DsMV), sweet potato feathery mottle virus (SPFMV) and a novel potyvirus infecting chilli, tentatively named chilli ringspot virus (ChiRSV). With the exception of BCMV and PVY, this is the first report of these viruses in Vietnam. Further, rabbit bell (Crotalaria anagyroides) and typhonia (Typhonium trilobatum) were identified as new natural hosts of the peanut stunt virus (PStV) strain of BCMV and of DsMV, respectively. Sequence and phylogenetic analyses of the entire CP-coding region revealed considerable variability in BCMV, SCMV, PVY, ZYMV and DsMV. PMID:17906829

  5. Analysis of the dermatophyte Trichophyton rubrum expressed sequence tags

    PubMed Central

    Wang, Lingling; Ma, Li; Leng, Wenchuan; Liu, Tao; Yu, Lu; Yang, Jian; Yang, Li; Zhang, Wenliang; Zhang, Qian; Dong, Jie; Xue, Ying; Zhu, Yafang; Xu, Xingye; Wan, Zhe; Ding, Guohui; Yu, Fudong; Tu, Kang; Li, Yixue; Li, Ruoyu; Shen, Yan; Jin, Qi

    2006-01-01

    Background Dermatophytes are the primary causative agent of dermatophytoses, a disease that affects billions of individuals worldwide. Trichophyton rubrum is the most common of the superficial fungi. Although T. rubrum is a recognized pathogen for humans, little is known about how its transcriptional pattern is related to development of the fungus and establishment of disease. It is therefore necessary to identify genes whose expression is relevant to growth, metabolism and virulence of T. rubrum. Results We generated 10 cDNA libraries covering nearly the entire growth phase and used them to isolate 11,085 unique expressed sequence tags (ESTs), including 3,816 contigs and 7,269 singletons. Comparisons with the GenBank non-redundant (NR) protein database revealed putative functions or matched homologs from other organisms for 7,764 (70%) of the ESTs. The remaining 3,321 (30%) of ESTs were only weakly similar or not similar to known sequences, suggesting that these ESTs represent novel genes. Conclusion The present data provide a comprehensive view of fungal physiological processes including metabolism, sexual and asexual growth cycles, signal transduction and pathogenic mechanisms. PMID:17032460

  6. In Vivo Enhancer Analysis Chromosome 16 Conserved Noncoding Sequences

    SciTech Connect

    Pennacchio, Len A.; Ahituv, Nadav; Moses, Alan M.; Nobrega, Marcelo; Prabhakar, Shyam; Shoukry, Malak; Minovitsky, Simon; Visel, Axel; Dubchak, Inna; Holt, Amy; Lewis, Keith D.; Plajzer-Frick, Ingrid; Akiyama, Jennifer; De Val, Sarah; Afzal, Veena; Black, Brian L.; Couronne, Olivier; Eisen, Michael B.; Rubin, Edward M.

    2006-02-01

    The identification of enhancers with predicted specificities in vertebrate genomes remains a significant challenge that is hampered by a lack of experimentally validated training sets. In this study, we leveraged extreme evolutionary sequence conservation as a filter to identify putative gene regulatory elements and characterized the in vivo enhancer activity of human-fish conserved and ultraconserved noncoding elements on human chromosome 16 as well as such elements from elsewhere in the genome. We initially tested 165 of these extremely conserved sequences in a transgenic mouse enhancer assay and observed that 48 percent (79/165) functioned reproducibly as tissue-specific enhancers of gene expression at embryonic day 11.5. While driving expression in a broad range of anatomical structures in the embryo, the majority of the 79 enhancers drove expression in various regions of the developing nervous system. Studying a set of DNA elements that specifically drove forebrain expression, we identified DNA signatures specifically enriched in these elements and used these parameters to rank all ~3,400 human-fugu conserved noncoding elements in the human genome. The testing of the top predictions in transgenic mice resulted in a three-fold enrichment for sequences with forebrain enhancer activity. These data dramatically expand the catalogue of in vivo-characterized human gene enhancers and illustrate the future utility of such training sets for a variety of biological applications including decoding the regulatory vocabulary of the human genome.

  7. Signature Peptide-Enabled Metagenomics (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    McMahon, Ben [LANL]

    2013-01-25

    Ben McMahon of Los Alamos National Laboratory (LANL) presents "Signature Peptide-Enabled Metagenomics" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  8. Signature Peptide-Enabled Metagenomics (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    SciTech Connect

    McMahon, Ben

    2012-06-01

    Ben McMahon of Los Alamos National Laboratory (LANL) presents "Signature Peptide-Enabled Metagenomics" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  9. Sequence and analysis of chromosome 3 of the plant Arabidopsis thaliana.

    PubMed

    Salanoubat, M; Lemcke, K; Rieger, M; Ansorge, W; Unseld, M; Fartmann, B; Valle, G; Blöcker, H; Perez-Alonso, M; Obermaier, B; Delseny, M; Boutry, M; Grivell, L A; Mache, R; Puigdomènech, P; De Simone, V; Choisne, N; Artiguenave, F; Robert, C; Brottier, P; Wincker, P; Cattolico, L; Weissenbach, J; Saurin, W; Quétier, F; Schäfer, M; Müller-Auer, S; Gabel, C; Fuchs, M; Benes, V; Wurmbach, E; Drzonek, H; Erfle, H; Jordan, N; Bangert, S; Wiedelmann, R; Kranz, H; Voss, H; Holland, R; Brandt, P; Nyakatura, G; Vezzi, A; D'Angelo, M; Pallavicini, A; Toppo, S; Simionati, B; Conrad, A; Hornischer, K; Kauer, G; Löhnert, T H; Nordsiek, G; Reichelt, J; Scharfe, M; Schön, O; Bargues, M; Terol, J; Climent, J; Navarro, P; Collado, C; Perez-Perez, A; Ottenwälder, B; Duchemin, D; Cooke, R; Laudie, M; Berger-Llauro, C; Purnelle, B; Masuy, D; de Haan, M; Maarse, A C; Alcaraz, J P; Cottet, A; Casacuberta, E; Monfort, A; Argiriou, A; Flores, M; Liguori, R; Vitale, D; Mannhaupt, G; Haase, D; Schoof, H; Rudd, S; Zaccaria, P; Mewes, H W; Mayer, K F; Kaul, S; Town, C D; Koo, H L; Tallon, L J; Jenkins, J; Rooney, T; Rizzo, M; Walts, A; Utterback, T; Fujii, C Y; Shea, T P; Creasy, T H; Haas, B; Maiti, R; Wu, D; Peterson, J; Van Aken, S; Pai, G; Militscher, J; Sellers, P; Gill, J E; Feldblyum, T V; Preuss, D; Lin, X; Nierman, W C; Salzberg, S L; White, O; Venter, J C; Fraser, C M; Kaneko, T; Nakamura, Y; Sato, S; Kato, T; Asamizu, E; Sasamoto, S; Kimura, T; Idesawa, K; Kawashima, K; Kishida, Y; Kiyokawa, C; Kohara, M; Matsumoto, M; Matsuno, A; Muraki, A; Nakayama, S; Nakazaki, N; Shinpo, S; Takeuchi, C; Wada, T; Watanabe, A; Yamada, M; Yasuda, M; Tabata, S

    2000-12-14

    Arabidopsis thaliana is an important model system for plant biologists. In 1996 an international collaboration (the Arabidopsis Genome Initiative) was formed to sequence the whole genome of Arabidopsis and in 1999 the sequence of the first two chromosomes was reported. The sequence of the last three chromosomes and an analysis of the whole genome are reported in this issue. Here we present the sequence of chromosome 3, organized into four sequence segments (contigs). The two largest (13.5 and 9.2 Mb) correspond to the top (long) and the bottom (short) arms of chromosome 3, and the two small contigs are located in the genetically defined centromere. This chromosome encodes 5,220 of the roughly 25,500 predicted protein-coding genes in the genome. About 20% of the predicted proteins have significant homology to proteins in eukaryotic genomes for which the complete sequence is available, pointing to important conserved cellular functions among eukaryotes. PMID:11130713

  10. On an Additive Semigraphoid Model for Statistical Networks With Application to Pathway Analysis

    PubMed Central

    Li, Bing; Chun, Hyonho; Zhao, Hongyu

    2014-01-01

    We introduce a nonparametric method for estimating non-Gaussian graphical models based on a new statistical relation called additive conditional independence, which is a three-way relation among random vectors that resembles the logical structure of conditional independence. Additive conditional independence allows us to use a one-dimensional kernel regardless of the dimension of the graph, which not only avoids the curse of dimensionality but also simplifies computation. It also gives rise to a parallel structure to the Gaussian graphical model that replaces the precision matrix by an additive precision operator. The estimators derived from additive conditional independence cover the recently introduced nonparanormal graphical model as a special case, but outperform it when the Gaussian copula assumption is violated. We compare the new method with existing ones by simulations and in genetic pathway analysis. PMID:26401064

  11. Analysis of occupational accidents: prevention through the use of additional technical safety measures for machinery

    PubMed Central

    Dźwiarek, Marek; Latała, Agata

    2016-01-01

    This article presents an analysis of results of 1035 serious and 341 minor accidents recorded by Poland's National Labour Inspectorate (PIP) in 2005–2011, in view of their prevention by means of additional safety measures applied by machinery users. Since the analysis aimed at formulating principles for the application of technical safety measures, the analysed accidents should bear additional attributes: the type of machine operation, technical safety measures and the type of events causing injuries. The analysis proved that the executed tasks and injury-causing events were closely connected and there was a relation between casualty events and technical safety measures. In the case of tasks consisting of manual feeding and collecting materials, the injuries usually occur because of the rotating motion of tools or crushing due to a closing motion. Numerous accidents also happened in the course of supporting actions, like removing pollutants, correcting material position, cleaning, etc. PMID:26652689

  12. Reducing the matrix effects in chemical analysis: fusion of isotope dilution and standard addition methods

    NASA Astrophysics Data System (ADS)

    Pagliano, Enea; Meija, Juris

    2016-04-01

    The combination of isotope dilution and mass spectrometry has become a ubiquitous tool of chemical analysis. Often perceived as one of the most accurate methods of chemical analysis, it is not without shortcomings. Current isotope dilution equations are not capable of fully addressing one of the key problems encountered in chemical analysis: the possible effect of sample matrix on measured isotope ratios. The method of standard addition does compensate for the effect of sample matrix by making sure that all measured solutions have identical composition. While it is impossible to attain such a condition in traditional isotope dilution, we present equations which allow for matrix-matching between all measured solutions by fusion of isotope dilution and standard addition methods.
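
    As a rough illustration of the standard-addition idea mentioned in the abstract above, the following Python sketch implements only the classical standard-addition calculation, not the fused isotope-dilution equations of the paper, which the abstract does not give. It assumes a linear detector response and a negligible volume change on spiking; the function name and the numbers are hypothetical and chosen purely for illustration.

```python
def standard_addition_estimate(added, signal):
    """Estimate the analyte concentration by classical standard addition.

    Fits signal = slope * added + intercept by ordinary least squares and
    returns intercept / slope, the magnitude of the x-intercept.
    Assumes a linear response and no dilution from the added standard.
    """
    n = len(added)
    if n != len(signal) or n < 2:
        raise ValueError("need at least two paired measurements")
    mean_x = sum(added) / n
    mean_y = sum(signal) / n
    sxx = sum((x - mean_x) ** 2 for x in added)
    if sxx == 0:
        raise ValueError("added amounts must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(added, signal))
    slope = sxy / sxx
    if slope == 0:
        raise ValueError("signal does not respond to the added standard")
    intercept = mean_y - slope * mean_x
    return intercept / slope


# Made-up data: sensitivity of 2.0 signal units per concentration unit
# and a true sample concentration of 3.0 (arbitrary units).
added = [0.0, 1.0, 2.0, 4.0]
signal = [2.0 * (3.0 + a) for a in added]
estimate = standard_addition_estimate(added, signal)
```

    Because every measured solution contains the same sample matrix, the fitted slope already includes any matrix effect on sensitivity, which is why the extrapolated intercept can be read as a concentration.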

  13. Analysis of occupational accidents: prevention through the use of additional technical safety measures for machinery.

    PubMed

    Dźwiarek, Marek; Latała, Agata

    2016-01-01

    This article presents an analysis of results of 1035 serious and 341 minor accidents recorded by Poland's National Labour Inspectorate (PIP) in 2005-2011, in view of their prevention by means of additional safety measures applied by machinery users. Since the analysis aimed at formulating principles for the application of technical safety measures, the analysed accidents should bear additional attributes: the type of machine operation, technical safety measures and the type of events causing injuries. The analysis proved that the executed tasks and injury-causing events were closely connected and there was a relation between casualty events and technical safety measures. In the case of tasks consisting of manual feeding and collecting materials, the injuries usually occur because of the rotating motion of tools or crushing due to a closing motion. Numerous accidents also happened in the course of supporting actions, like removing pollutants, correcting material position, cleaning, etc. PMID:26652689

  14. A functional analysis of the spacer of V(D)J recombination signal sequences.

    PubMed

    Lee, Alfred Ian; Fugmann, Sebastian D; Cowell, Lindsay G; Ptaszek, Leon M; Kelsoe, Garnett; Schatz, David G

    2003-10-01

    During lymphocyte development, V(D)J recombination assembles antigen receptor genes from component V, D, and J gene segments. These gene segments are flanked by a recombination signal sequence (RSS), which serves as the binding site for the recombination machinery. The murine Jbeta2.6 gene segment is a recombinationally inactive pseudogene, but examination of its RSS reveals no obvious reason for its failure to recombine. Mutagenesis of the Jbeta2.6 RSS demonstrates that the sequences of the heptamer, nonamer, and spacer are all important. Strikingly, changes solely in the spacer sequence can result in dramatic differences in the level of recombination. The subsequent analysis of a library of more than 4,000 spacer variants revealed that spacer residues of particular functional importance are correlated with their degree of conservation. Biochemical assays indicate distinct cooperation between the spacer and heptamer/nonamer along each step of the reaction pathway. The results suggest that the spacer serves not only to ensure the appropriate distance between the heptamer and nonamer but also regulates RSS activity by providing additional RAG:RSS interaction surfaces. We conclude that while RSSs are defined by a "digital" requirement for absolutely conserved nucleotides, the quality of RSS function is determined in an "analog" manner by numerous complex interactions between the RAG proteins and the less-well conserved nucleotides in the heptamer, the nonamer, and, importantly, the spacer. Those modulatory effects are accurately predicted by a new computational algorithm for "RSS information content." The interplay between such binary and multiplicative modes of interactions provides a general model for analyzing protein-DNA interactions in various biological systems. PMID:14551903

  15. Expressed sequence tag analysis of the emu (Dromaius novaehollandiae) pituitary by 454 GS Junior pyrosequencing.

    PubMed

    Kim, Ji Eun; Leung, Frederick C; Jiang, Jingwei; Kwok, Amy H Y; Bennett, Darin C; Cheng, Kimberly M

    2013-01-01

    Emus (Dromaius novaehollandiae) are farmed for their oil for pharmaceutical and cosmetic uses. This emu pituitary expressed sequence tag study was undertaken to identify novel transcripts in the emu pituitary to propel their identification and functional studies. By mapping reads derived from the Roche 454 GS Junior pyrosequencer to 8 reference species (human, mouse, chicken, zebra finch, fruit fly, turkey, round worm, and Carolina anole lizard) from the UniGene database, a total of 81,788 reads (53,312 mapped reads) were obtained and assembled with Reference Sequence (RefSeq). We annotated 6,676 potential emu genes by referencing 7 species (excluding lizard) and identified 1,232 potential genes common among 3 species (human, mouse, and chicken) with complete available reference genomes. Gene Ontology analysis revealed 376 Gene Ontology terms showing, with the highest counts, their involvements in biological processes, metabolism, and cellular components. These potential genes were found to be associated with 20 pathways including mitogen-activated protein kinase, insulin, neurotrophin signaling pathways, and carbohydrate digestion and absorption pathway. We also revealed a panel of tissue-specific genes including regulator of G-protein signaling protein (RGS), glucagon-like peptide receptor (GLPR), and growth hormone-inducible transmembrane protein (GHITM). Additionally, fatty acid binding protein (FABP), fatty acid desaturase (FAS), and stearoyl-coenzyme A desaturase (SCD), key enzyme genes in fat metabolism, were also found to be expressed in the emu pituitary. This expressed sequence tag study represents the first step in functional characterization of emu pituitary gene expression and SNP identification for the improvement of fat production in the emu. PMID:23243234

  16. DeepSNVMiner: a sequence analysis tool to detect emergent, rare mutations in subsets of cell populations

    PubMed Central

    Andrews, T. Daniel; Jeelall, Yogesh; Talaulikar, Dipti; Goodnow, Christopher C.

    2016-01-01

    Background. Massively parallel sequencing technology is being used to sequence highly diverse populations of DNA such as that derived from heterogeneous cell mixtures containing both wild-type and disease-related states. At the core of such molecule tagging techniques is the tagging and identification of sequence reads derived from individual input DNA molecules, which must first be computationally disambiguated to generate read groups sharing common sequence tags, with each read group representing a single input DNA molecule. This disambiguation typically generates huge numbers of read groups, each of which requires additional variant detection analysis steps to be run specifically for that read group, thus representing a significant computational challenge. While sequencing technologies for producing these data are approaching maturity, the lack of available computational tools for analysing such heterogeneous sequence data represents an obstacle to the widespread adoption of this technology. Results. Using synthetic data we successfully detect unique variants at dilution levels of 1 in 1,000,000 molecules, and find DeepSNVMiner obtains significantly lower false positive and false negative rates compared to popular variant callers GATK, SAMTools, FreeBayes and LoFreq, particularly as the variant concentration levels decrease. In a dilution series with genomic DNA from two cell lines, we find DeepSNVMiner identifies a known somatic variant when present at concentrations of only 1 in 1,000 molecules in the input material, the lowest concentration amongst all variant callers tested. Conclusions. Here we present DeepSNVMiner, a tool to disambiguate tagged sequence groups and robustly identify sequence variants specific to subsets of starting DNA molecules that may indicate the presence of a disease. DeepSNVMiner is an automated workflow of custom sequence analysis utilities and open source tools able to differentiate somatic DNA variants from artefactual sequence
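
    The disambiguation step described in this abstract (collecting reads that share a sequence tag into one read group per input molecule) can be sketched in Python as follows. This is not DeepSNVMiner's implementation: the assumption that the tag is a fixed-length prefix of each read, the function names, and the toy reads are all hypothetical, and the majority-vote consensus is only one simple way to suppress errors within a group.

```python
from collections import Counter, defaultdict


def group_reads_by_tag(reads, tag_length):
    """Group reads by their leading tag (assumed: first tag_length bases).

    Returns a dict mapping each tag to the list of tag-stripped inserts.
    """
    groups = defaultdict(list)
    for read in reads:
        if len(read) <= tag_length:
            continue  # too short to carry a tag plus any insert sequence
        tag, insert = read[:tag_length], read[tag_length:]
        groups[tag].append(insert)
    return dict(groups)


def group_consensus(inserts):
    """Majority base at each position, truncated to the shortest insert."""
    length = min(len(s) for s in inserts)
    return "".join(
        Counter(s[i] for s in inserts).most_common(1)[0][0]
        for i in range(length)
    )


# Toy reads: a 2-base tag followed by a 6-base insert.
reads = ["AAGTACGT", "AAGTACGT", "AAGTACCT", "CCGTTCGT"]
groups = group_reads_by_tag(reads, tag_length=2)
consensus = {tag: group_consensus(inserts) for tag, inserts in groups.items()}
```

    In this toy input the single mismatching read in group "AA" is outvoted, while the variant carried by the separately tagged molecule "CC" is preserved, which is the intuition behind calling variants per read group.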

  17. DeepSNVMiner: a sequence analysis tool to detect emergent, rare mutations in subsets of cell populations.

    PubMed

    Andrews, T Daniel; Jeelall, Yogesh; Talaulikar, Dipti; Goodnow, Christopher C; Field, Matthew A

    2016-01-01

    Background. Massively parallel sequencing technology is being used to sequence highly diverse populations of DNA such as that derived from heterogeneous cell mixtures containing both wild-type and disease-related states. At the core of such molecule tagging techniques is the tagging and identification of sequence reads derived from individual input DNA molecules, which must first be computationally disambiguated to generate read groups sharing common sequence tags, with each read group representing a single input DNA molecule. This disambiguation typically generates huge numbers of read groups, each of which requires additional variant detection analysis steps to be run specifically for that read group, thus representing a significant computational challenge. While sequencing technologies for producing these data are approaching maturity, the lack of available computational tools for analysing such heterogeneous sequence data represents an obstacle to the widespread adoption of this technology. Results. Using synthetic data we successfully detect unique variants at dilution levels of 1 in 1,000,000 molecules, and find DeepSNVMiner obtains significantly lower false positive and false negative rates compared to popular variant callers GATK, SAMTools, FreeBayes and LoFreq, particularly as the variant concentration levels decrease. In a dilution series with genomic DNA from two cell lines, we find DeepSNVMiner identifies a known somatic variant when present at concentrations of only 1 in 1,000 molecules in the input material, the lowest concentration amongst all variant callers tested. Conclusions. Here we present DeepSNVMiner, a tool to disambiguate tagged sequence groups and robustly identify sequence variants specific to subsets of starting DNA molecules that may indicate the presence of a disease. DeepSNVMiner is an automated workflow of custom sequence analysis utilities and open source tools able to differentiate somatic DNA variants from artefactual sequence

  18. Deep-sequencing transcriptome analysis of chilling tolerance mechanisms of a subnival alpine plant, Chorispora bungeana

    PubMed Central

    2012-01-01

    Background. The plant tolerance mechanisms to low temperature have been studied extensively in the model plant Arabidopsis at the transcriptional level. However, few studies have been carried out in plants with strong inherited cold tolerance. Chorispora bungeana is a subnival alpine plant possessing strong cold tolerance mechanisms. To get a deeper insight into its cold tolerance mechanisms, the transcriptome profiles of chilling-treated C. bungeana seedlings were analyzed by Illumina deep-sequencing and compared with Arabidopsis. Results. Two cDNA libraries constructed from mRNAs of control and chilling-treated seedlings were sequenced by Illumina technology. A total of 54,870 unigenes were obtained by de novo assembly, and 3,484 chilling up-regulated and 4,571 down-regulated unigenes were identified. The expression of 18 of the top 20 up-regulated unigenes was confirmed by qPCR analysis. Functional network analysis of the up-regulated genes revealed some common biological processes, including cold responses, and molecular functions in C. bungeana and Arabidopsis responding to chilling. Karrikins were identified as new plant growth regulators involved in chilling responses of C. bungeana and Arabidopsis. However, genes involved in cold acclimation were enriched in chilling up-regulated genes in Arabidopsis but not in C. bungeana. In addition, although transcription activations were stimulated in both C. bungeana and Arabidopsis, no CBF putative ortholog was up-regulated in C. bungeana while CBF2 and CBF3 were chilling up-regulated in Arabidopsis. On the other hand, up-regulated genes related to protein phosphorylation and auto-ubiquitination processes were over-represented in C. bungeana but not in Arabidopsis. Conclusions. We conducted the first deep-sequencing transcriptome profiling and chilling stress regulatory network analysis of C. bungeana, a subnival alpine plant with inherited cold tolerance. Comparative transcriptome analysis suggests that cold acclimation is not

  19. Combined DECS Analysis and Next-Generation Sequencing Enable Efficient Detection of Novel Plant RNA Viruses.

    PubMed

    Yanagisawa, Hironobu; Tomita, Reiko; Katsu, Koji; Uehara, Takuya; Atsumi, Go; Tateda, Chika; Kobayashi, Kappei; Sekine, Ken-Taro

    2016-03-01

    The presence of high molecular weight double-stranded RNA (dsRNA) within plant cells is an indicator of infection with RNA viruses as these possess genomic or replicative dsRNA. DECS (dsRNA isolation, exhaustive amplification, cloning, and sequencing) analysis has been shown to be capable of detecting unknown viruses. We postulated that a combination of DECS analysis and next-generation sequencing (NGS) would improve detection efficiency and usability of the technique. Here, we describe a model case in which we efficiently detected the presumed genome sequence of Blueberry shoestring virus (BSSV), a member of the genus Sobemovirus, which has not so far been reported. dsRNAs were isolated from BSSV-infected blueberry plants using the dsRNA-binding protein, reverse-transcribed, amplified, and sequenced using NGS. A contig of 4,020 nucleotides (nt) that shared similarities with sequences from other Sobemovirus species was obtained as a candidate of the BSSV genomic sequence. Reverse transcription (RT)-PCR primer sets based on sequences from this contig enabled the detection of BSSV in all BSSV-infected plants tested but not in healthy controls. A recombinant protein encoded by the putative coat protein gene was bound by the BSSV-antibody, indicating that the candidate sequence was that of BSSV itself. Our results suggest that a combination of DECS analysis and NGS, designated here as "DECS-C," is a powerful method for detecting novel plant viruses. PMID:27072419

  20. Combined DECS Analysis and Next-Generation Sequencing Enable Efficient Detection of Novel Plant RNA Viruses

    PubMed Central

    Yanagisawa, Hironobu; Tomita, Reiko; Katsu, Koji; Uehara, Takuya; Atsumi, Go; Tateda, Chika; Kobayashi, Kappei; Sekine, Ken-Taro

    2016-01-01

    The presence of high molecular weight double-stranded RNA (dsRNA) within plant cells is an indicator of infection with RNA viruses as these possess genomic or replicative dsRNA. DECS (dsRNA isolation, exhaustive amplification, cloning, and sequencing) analysis has been shown to be capable of detecting unknown viruses. We postulated that a combination of DECS analysis and next-generation sequencing (NGS) would improve detection efficiency and usability of the technique. Here, we describe a model case in which we efficiently detected the presumed genome sequence of Blueberry shoestring virus (BSSV), a member of the genus Sobemovirus, which has not so far been reported. dsRNAs were isolated from BSSV-infected blueberry plants using the dsRNA-binding protein, reverse-transcribed, amplified, and sequenced using NGS. A contig of 4,020 nucleotides (nt) that shared similarities with sequences from other Sobemovirus species was obtained as a candidate of the BSSV genomic sequence. Reverse transcription (RT)-PCR primer sets based on sequences from this contig enabled the detection of BSSV in all BSSV-infected plants tested but not in healthy controls. A recombinant protein encoded by the putative coat protein gene was bound by the BSSV-antibody, indicating that the candidate sequence was that of BSSV itself. Our results suggest that a combination of DECS analysis and NGS, designated here as “DECS-C,” is a powerful method for detecting novel plant viruses. PMID:27072419

  1. Exploring genome wide bisulfite sequencing for DNA methylation analysis in livestock: a technical assessment.

    PubMed

    Doherty, Rachael; Couldrey, Christine

    2014-01-01

    Recent advances made in "omics" technologies are contributing to a revolution in livestock selection and breeding practices. Epigenetic mechanisms, including DNA methylation, are important determinants for the control of gene expression in mammals. DNA methylation research will help our understanding of how environmental factors contribute to phenotypic variation of complex production and health traits. High-throughput sequencing is a vital tool for the comprehensive analysis of DNA methylation, and bisulfite-based strategies coupled with DNA sequencing allow for quantitative, site-specific methylation analysis at the genome level or genome wide. Reduced representation bisulfite sequencing (RRBS) and more recently whole genome bisulfite sequencing (WGBS) have proven to be effective techniques for studying DNA methylation in both humans and mice. Here we report the development of RRBS and WGBS for use in sheep, the first application of this technology in livestock species. Important technical issues associated with these methodologies including fragment size selection and sequence depth are examined and discussed. PMID:24860595
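
    The "quantitative, site-specific" readout mentioned in this abstract rests on a simple ratio: after bisulfite treatment, unmethylated cytosines are read as T while methylated cytosines are still read as C, so the methylation level at a site is the fraction of C calls among the C and T calls covering it. The Python sketch below shows that calculation only; the function name and the min_depth cutoff are hypothetical illustrations of the sequence-depth concern raised in the abstract, not values taken from the paper.

```python
def methylation_level(c_count, t_count, min_depth=10):
    """Fraction of reads supporting methylation at one cytosine site.

    c_count: reads showing C (cytosine protected from conversion, methylated)
    t_count: reads showing T (cytosine converted, unmethylated)
    Returns None when coverage is below min_depth (hypothetical cutoff).
    """
    if c_count < 0 or t_count < 0:
        raise ValueError("read counts must be non-negative")
    depth = c_count + t_count
    if depth < min_depth:
        return None  # too few reads for a reliable estimate
    return c_count / depth


level = methylation_level(15, 5)        # 20 reads cover the site
low_coverage = methylation_level(2, 1)  # only 3 reads cover the site
```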

  2. UNIT 11.10 N-Terminal Sequence Analysis of Proteins and Peptides

    PubMed Central

    Speicher, Kaye D.; Gorman, Nicole; Speicher, David W.

    2009-01-01

    Automated N-terminal sequence analysis involves a series of chemical reactions that derivatize and remove one amino acid at a time from the N-terminal of purified peptides or intact proteins. At least several pmoles of a purified protein or 10 to 20 pmoles of a purified peptide with an unmodified N-terminal is required in order to obtain useful sequence information. In recent years the demand for N-terminal sequencing has decreased substantially as some applications for protein identification and characterization can now be more effectively performed using mass spectrometry. However, N-terminal sequencing remains the method of choice for verifying the N-terminal boundary of recombinant proteins, determining the N-terminal of protease-resistant domains, identifying proteins isolated from species where most of the genome has not yet been sequenced, and mapping modified or crosslinked sites in proteins that prove to be refractory to analysis by mass spectrometry. PMID:18429102

  3. Sequence analysis demonstrates that Onion yellow dwarf virus isolates from China contain a P3 region much larger than other potyviruses.

    PubMed

    Chen, J; Adams, M J; Zheng, H-Y; Chen, J-P

    2003-06-01

    The complete sequence of an isolate of Onion yellow dwarf virus (OYDV) from Yuhang, Zhejiang province, China, was determined. It was 10538 nts in length and was predicted to encode a polyprotein 3403 amino acids (aa) long with a calculated M(r) of 385.1 kDa. The predicted P3 protein (530 aa) was larger than that of any of the potyviruses sequenced to date (344-378 aa). The additional sequence occurs at the N-terminus of the protein, does not represent a duplication from elsewhere in the OYDV genome and could not be matched to any other sequences in the databases. Similar sequences were found in 4 other Chinese OYDV isolates. Phylogenetic analysis of the amino acid sequences of the polyprotein showed that OYDV is distantly related to Pea seed-borne mosaic virus and the potyviruses of grasses and cereals. PMID:12756621

  4. A Phylogenetic Analysis of the Genus Fragaria (Strawberry) Using Intron-Containing Sequence from the ADH-1 Gene

    PubMed Central

    DiMeglio, Laura M.; Yu, Hongrun; Davis, Thomas M.

    2014-01-01

    The genus Fragaria encompasses species at ploidy levels ranging from diploid to decaploid. The cultivated strawberry, Fragaria×ananassa, and its two immediate progenitors, F. chiloensis and F. virginiana, are octoploids. To elucidate the ancestries of these octoploid species, we performed a phylogenetic analysis using intron-containing sequences of the nuclear ADH-1 gene from 39 germplasm accessions representing nineteen Fragaria species and one outgroup species, Dasiphora fruticosa. All trees from Maximum Parsimony and Maximum Likelihood analyses showed two major clades, Clade A and Clade B. Each of the sampled octoploids contributed alleles to both major clades. All octoploid-derived alleles in Clade A clustered with alleles of diploid F. vesca, with the exception of one octoploid allele that clustered with the alleles of diploid F. mandshurica. All octoploid-derived alleles in clade B clustered with the alleles of only one diploid species, F. iinumae. When gaps encoded as binary characters were included in the Maximum Parsimony analysis, tree resolution was improved with the addition of six nodes, and the bootstrap support was generally higher, rising above the 50% threshold for an additional nine branches. These results, coupled with the congruence of the sequence data and the coded gap data, validate and encourage the employment of sequence sets containing gaps for phylogenetic analysis. Our phylogenetic conclusions, based upon sequence data from the ADH-1 gene located on F. vesca linkage group II, complement and generally agree with those obtained from analyses of protein-encoding genes GBSSI-2 and DHAR located on F. vesca linkage groups V and VII, respectively, but differ from a previous study that utilized rDNA sequences and did not detect the ancestral role of F. iinumae. PMID:25078607

  5. The Sequence and Analysis of Duplication Rich Human Chromosome 16

    DOE R&D Accomplishments Database

    Martin, Joel; Han, Cliff; Gordon, Laurie A.; Terry, Astrid; Prabhakar, Shyam; She, Xinwei; Xie, Gary; Hellsten, Uffe; Man Chan, Yee; Altherr, Michael; Couronne, Olivier; Aerts, Andrea; Bajorek, Eva; Black, Stacey; Blumer, Heather; Branscomb, Elbert; Brown, Nancy C.; Bruno, William J.; Buckingham, Judith M.; Callen, David F.; Campbell, Connie S.; Campbell, Mary L.; Campbell, Evelyn W.; Caoile, Chenier; Challacombe, Jean F.; Chasteen, Leslie A.; Chertkov, Olga; Chi, Han C.; Christensen, Mari; Clark, Lynn M.; Cohn, Judith D.; Denys, Mirian; Detter, John C.; Dickson, Mark; Dimitrijevic-Bussod, Mira; Escobar, Julio; Fawcett, Joseph J.; Flowers, Dave; Fotopulos, Dea; Glavina, Tijana; Gomez, Maria; Gonzales, Eidelyn; Goodstein, David; Goodwin, Lynne A.; Grady, Deborah L.; Grigoriev, Igor; Groza, Matthew; Hammon, Nancy; Hawkins, Trevor; Haydu, Lauren; Hildebrand, Carl E.; Huang, Wayne; Israni, Sanjay; Jett, Jamie; Jewett, Phillip E.; Kadner, Kristen; Kimball, Heather; Kobayashi, Arthur; Krawczyk, Marie-Claude; Leyba, Tina; Longmire, Jonathan L.; Lopez, Frederick; Lou, Yunian; Lowry, Steve; Ludeman, Thom; Mark, Graham A.; Mcmurray, Kimberly L.; Meincke, Linda J.; Morgan, Jenna; Moyzis, Robert K.; Mundt, Mark O.; Munk, A. Christine; Nandkeshwar, Richard D.; Pitluck, Sam; Pollard, Martin; Predki, Paul; Parson-Quintana, Beverly; Ramirez, Lucia; Rash, Sam; Retterer, James; Ricke, Darryl O.; Robinson, Donna L.; Rodriguez, Alex; Salamov, Asaf; Saunders, Elizabeth H.; Scott, Duncan; Shough, Timothy; Stallings, Raymond L.; Stalvey, Malinda; Sutherland, Robert D.; Tapia, Roxanne; Tesmer, Judith G.; Thayer, Nina; Thompson, Linda S.; Tice, Hope; Torney, David C.; Tran-Gyamfi, Mary; Tsai, Ming; Ulanovsky, Levy E.; Ustaszewska, Anna; Vo, Nu; White, P. Scott; Williams, Albert L.; Wills, Patricia L.; Wu, Jung-Rung; Wu, Kevin; Yang, Joan; DeJong, Pieter; Bruce, David; Doggett, Norman; Deaven, Larry; Schmutz, Jeremy; Grimwood, Jane; Richardson, Paul; et al.

    2004-01-01

    We report here the 78,884,754 base pairs of finished human chromosome 16 sequence, representing over 99.9 percent of its euchromatin. Manual annotation revealed 880 protein coding genes confirmed by 1,637 aligned transcripts, 19 tRNA genes, 341 pseudogenes and 3 RNA pseudogenes. These genes include metallothionein, cadherin and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukemia. Several large-scale structural polymorphisms spanning hundreds of kilobasepairs were identified and result in gene content differences across humans. One of the unique features of chromosome 16 is its high level of segmental duplication, ranked among the highest of the human autosomes. While the segmental duplications are enriched in the relatively gene poor pericentromere of the p-arm, some are involved in recent gene duplication and conversion events which are likely to have had an impact on the evolution of primates and human disease susceptibility.

  6. The sequence and analysis of duplication rich human chromosome 16

    SciTech Connect

    Martin, Joel; Han, Cliff; Gordon, Laurie A.; Terry, Astrid; Prabhakar, Shyam; She, Xinwei; Xie, Gary; Hellsten, Uffe; Man Chan, Yee; Altherr, Michael; Couronne, Olivier; Aerts, Andrea; Bajorek, Eva; Black, Stacey; Blumer, Heather; Branscomb, Elbert; Brown, Nancy C.; Bruno, William J.; Buckingham, Judith M.; Callen, David F.; Campbell, Connie S.; Campbell, Mary L.; Campbell, Evelyn W.; Caoile, Chenier; Challacombe, Jean F.; Chasteen, Leslie A.; Chertkov, Olga; Chi, Han C.; Christensen, Mari; Clark, Lynn M.; Cohn, Judith D.; Denys, Mirian; Detter, John C.; Dickson, Mark; Dimitrijevic-Bussod, Mira; Escobar, Julio; Fawcett, Joseph J.; Flowers, Dave; Fotopulos, Dea; Glavina, Tijana; Gomez, Maria; Gonzales, Eidelyn; Goodstein, David; Goodwin, Lynne A.; Grady, Deborah L.; Grigoriev, Igor; Groza, Matthew; Hammon, Nancy; Hawkins, Trevor; Haydu, Lauren; Hildebrand, Carl E.; Huang, Wayne; Israni, Sanjay; Jett, Jamie; Jewett, Phillip E.; Kadner, Kristen; Kimball, Heather; Kobayashi, Arthur; Krawczyk, Marie-Claude; Leyba, Tina; Longmire, Jonathan L.; Lopez, Frederick; Lou, Yunian; Lowry, Steve; Ludeman, Thom; Mark, Graham A.; Mcmurray, Kimberly L.; Meincke, Linda J.; Morgan, Jenna; Moyzis, Robert K.; Mundt, Mark O.; Munk, A. Christine; Nandkeshwar, Richard D.; Pitluck, Sam; Pollard, Martin; Predki, Paul; Parson-Quintana, Beverly; Ramirez, Lucia; Rash, Sam; Retterer, James; Ricke, Darryl O.; Robinson, Donna L.; Rodriguez, Alex; Salamov, Asaf; Saunders, Elizabeth H.; Scott, Duncan; Shough, Timothy; Stallings, Raymond L.; Stalvey, Malinda; Sutherland, Robert D.; Tapia, Roxanne; Tesmer, Judith G.; Thayer, Nina; Thompson, Linda S.; Tice, Hope; Torney, David C.; Tran-Gyamfi, Mary; Tsai, Ming; Ulanovsky, Levy E.; Ustaszewska, Anna; Vo, Nu; White, P. Scott; Williams, Albert L.; Wills, Patricia L.; Wu, Jung-Rung; Wu, Kevin; Yang, Joan; DeJong, Pieter; Bruce, David; Doggett, Norman; Deaven, Larry; Schmutz, Jeremy; Grimwood, Jane; Richardson, Paul; et al.

    2004-08-01

    We report here the 78,884,754 base pairs of finished human chromosome 16 sequence, representing over 99.9 percent of its euchromatin. Manual annotation revealed 880 protein coding genes confirmed by 1,637 aligned transcripts, 19 tRNA genes, 341 pseudogenes and 3 RNA pseudogenes. These genes include metallothionein, cadherin and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukemia. Several large-scale structural polymorphisms spanning hundreds of kilobasepairs were identified and result in gene content differences across humans. One of the unique features of chromosome 16 is its high level of segmental duplication, ranked among the highest of the human autosomes. While the segmental duplications are enriched in the relatively gene poor pericentromere of the p-arm, some are involved in recent gene duplication and conversion events which are likely to have had an impact on the evolution of primates and human disease susceptibility.

  7. Probabilistic topic modeling for the analysis and classification of genomic sequences

    PubMed Central

    2015-01-01

    Background Studies on genomic sequences for classification and taxonomic identification have a leading role in the biomedical field and in the analysis of biodiversity. These studies are focusing on the so-called barcode genes, representing a well defined region of the whole genome. Recently, alignment-free techniques are gaining more importance because they are able to overcome the drawbacks of sequence alignment techniques. In this paper a new alignment-free method for DNA sequences clustering and classification is proposed. The method is based on k-mers representation and text mining techniques. Methods The presented method is based on Probabilistic Topic Modeling, a statistical technique originally proposed for text documents. Probabilistic topic models are able to find in a document corpus the topics (recurrent themes) characterizing classes of documents. This technique, applied on DNA sequences representing the documents, exploits the frequency of fixed-length k-mers and builds a generative model for a training group of sequences. This generative model, obtained through the Latent Dirichlet Allocation (LDA) algorithm, is then used to classify a large set of genomic sequences. Results and conclusions We performed classification of over 7000 16S DNA barcode sequences taken from Ribosomal Database Project (RDP) repository, training probabilistic topic models. The proposed method is compared to the RDP tool and Support Vector Machine (SVM) classification algorithm in a extensive set of trials using both complete sequences and short sequence snippets (from 400 bp to 25 bp). Our method reaches very similar results to RDP classifier and SVM for complete sequences. The most interesting results are obtained when short sequence snippets are considered. In these conditions the proposed method outperforms RDP and SVM with ultra short sequences and it exhibits a smooth decrease of performance, at every taxonomic level, when the sequence length is decreased. 
PMID:25916734
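
The "sequences as documents" representation described in the abstract above can be illustrated with a minimal sketch. It only builds the k-mer count matrix that a topic model such as LDA would take as input; it does not implement LDA or the authors' pipeline. The function names `kmer_counts` and `count_matrix` and the choice to skip windows containing non-ACGT symbols are assumptions made for illustration.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping k-mers, skipping any window with a non-ACGT symbol."""
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(w for w in windows if set(w) <= set("ACGT"))


def count_matrix(seqs, k):
    """Return (vocabulary, rows): one row of k-mer counts per sequence.

    Each sequence plays the role of a document and each k-mer the role of
    a word, so the rows form a document-term matrix.
    """
    per_seq = [kmer_counts(s, k) for s in seqs]
    vocab = sorted(set().union(*per_seq))
    rows = [[c[w] for w in vocab] for c in per_seq]
    return vocab, rows
```

Calling `count_matrix(sequences, k)` on a list of barcode sequences gives a matrix that could then be passed to any LDA implementation; the value of k used in the paper is not stated in the abstract.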

  8. Knowledge-based factor analysis of multidimensional nuclear medicine image sequences

    NASA Astrophysics Data System (ADS)

    Yap, Jeffrey T.; Chen, Chin-Tu; Cooper, Malcolm; Treffert, Jon D.

    1994-05-01

    We have developed a knowledge-based approach to analyzing dynamic nuclear medicine data sets using factor analysis. Prior knowledge is used as constraints to produce factor images and their associated time functions which are physically and physiologically realistic. These methods have been applied to both planar and tomographic image sequences acquired using various single-photon emitting and positron emitting radiotracers. Computer-simulated data, non-human primate studies, and human clinical studies have been used to develop and evaluate the methodology. The organ systems studied include the kidneys, heart, brain, liver, and bone. The factors generated represent various isolated aspects of physiologic function, such as tissue perfusion and clearance. In some clinical studies, the factors have indicated the potential to isolate diseased tissue from normally functioning tissue. In addition, the factor analysis of data acquired using newly developed radioligands has shown the ability to differentiate the specific binding of the radioligand to the targeted receptors from the non-specific binding. This suggests the potential use of factor analysis in the development and evaluation of radiolabeled compounds as well as in the investigation of specific receptor systems and their role in diagnosing disease.

  9. Whole Transcriptome Analysis Using Next-Generation Sequencing of Sterile-Cultured Eisenia andrei for Immune System Research

    PubMed Central

    Mikami, Yoshikazu; Fukushima, Atsushi; Kuwada-Kusunose, Takao; Sakurai, Tetsuya; Kitano, Taiichi; Komiyama, Yusuke; Iwase, Takashi; Komiyama, Kazuo

    2015-01-01

    Recently, earthworms have become a useful model for research into the immune system, and it is expected that results obtained using this model will shed light on the sophisticated vertebrate immune system and the evolution of the immune response, and additionally help identify new biomolecules with therapeutic applications. However, for earthworms to be used as a genetic model of the invertebrate immune system, basic molecular and genetic resources, such as an expressed sequence tag (EST) database, must be developed for this organism. Next-generation sequencing technologies have generated EST libraries by RNA-seq in many model species. In this study, we used Illumina RNA-sequence technology to perform a comprehensive transcriptome analysis using an RNA sample pooled from sterile-cultured Eisenia andrei. All clean reads were assembled de novo into 41,423 unigenes using the Trinity program. Using this transcriptome data, we performed BLAST analysis against the GenBank non-redundant (NR) database and obtained a total of 12,285 significant BLAST hits. Furthermore, gene ontology (GO) analysis assigned 78 unigenes to 24 immune class GO terms. In addition, we detected a unigene with high similarity to beta-1,3-glucuronyltransferase 1 (GlcAT-P), which mediates a glucuronyl transfer reaction during the biosynthesis of the carbohydrate epitope HNK-1 (human natural killer-1, also known as CD57), a marker of NK cells. The identified transcripts will be used to facilitate future research into the immune system using E. andrei. PMID:25706644

  10. Whole transcriptome analysis using next-generation sequencing of sterile-cultured Eisenia andrei for immune system research.

    PubMed

    Mikami, Yoshikazu; Fukushima, Atsushi; Kuwada-Kusunose, Takao; Sakurai, Tetsuya; Kitano, Taiichi; Komiyama, Yusuke; Iwase, Takashi; Komiyama, Kazuo

    2015-01-01

    Recently, earthworms have become a useful model for research into the immune system, and it is expected that results obtained using this model will shed light on the sophisticated vertebrate immune system and the evolution of the immune response, and additionally help identify new biomolecules with therapeutic applications. However, for earthworms to be used as a genetic model of the invertebrate immune system, basic molecular and genetic resources, such as an expressed sequence tag (EST) database, must be developed for this organism. Next-generation sequencing technologies have generated EST libraries by RNA-seq in many model species. In this study, we used Illumina RNA-sequence technology to perform a comprehensive transcriptome analysis using an RNA sample pooled from sterile-cultured Eisenia andrei. All clean reads were assembled de novo into 41,423 unigenes using the Trinity program. Using this transcriptome data, we performed BLAST analysis against the GenBank non-redundant (NR) database and obtained a total of 12,285 significant BLAST hits. Furthermore, gene ontology (GO) analysis assigned 78 unigenes to 24 immune class GO terms. In addition, we detected a unigene with high similarity to beta-1,3-glucuronyltransferase 1 (GlcAT-P), which mediates a glucuronyl transfer reaction during the biosynthesis of the carbohydrate epitope HNK-1 (human natural killer-1, also known as CD57), a marker of NK cells. The identified transcripts will be used to facilitate future research into the immune system using E. andrei. PMID:25706644

  11. Molecular cloning and sequencing analysis of the interferon β from Coturnix.

    PubMed

    Zheng, Bei; Chang, Wei-Shan

    2014-01-01

    One pair of primers was designed according to the Gallus and Meleagris gallopavo interferon β (IFN-β) sequences published in GenBank. The primers and RNA extracted from the spleen of Coturnix were used to amplify Coturnix IFN-β cDNA by real-time polymerase chain reaction (RT-PCR). The product was cloned into the pEasy-T1 vector. The recombinant plasmid was evaluated by PCR and restriction enzyme digestion. The cloned fragments were sequenced, and the sequencing results were compared using NCBI. We successfully obtained a partial Coturnix IFN-β sequence. The sequence was subtyped and subjected to homology analysis against IFN-β sequences registered in GenBank. The results suggested that the homology of the IFN-β gene between Coturnix and chicken was 88.7% and that between Coturnix and Anas platyrhynchos was 72.5%. The analysis of the genetic tree showed that Coturnix and chicken IFN-β were closely related. In this study we thus successfully obtained a partial sequence of quail IFN-β. PMID:26155095

  12. The BsaHI restriction-modification system: Cloning, sequencing and analysis of conserved motifs

    PubMed Central

    Neely, Robert K; Roberts, Richard J

    2008-01-01

    Background Restriction and modification enzymes typically recognise short DNA sequences of between two and eight bases in length. Understanding the mechanism of this recognition represents a significant challenge that we begin to address for the BsaHI restriction-modification system, which recognises the six base sequence GRCGYC. Results The DNA sequences of the genes for the BsaHI methyltransferase, bsaHIM, and restriction endonuclease, bsaHIR, have been determined (GenBank accession #EU386360), cloned and expressed in E. coli. Both the restriction endonuclease and methyltransferase enzymes share significant similarity with a group of 6 other enzymes comprising the restriction-modification systems HgiDI and HgiGI and the putative HindVP, NlaCORFDP, NpuORFC228P and SplZORFNP restriction-modification systems. A sequence alignment of these homologues shows that their amino acid sequences are largely conserved and highlights several motifs of interest. We target one such conserved motif, reading SPERRFD, at the C-terminal end of the bsaHIR gene. A mutational analysis of these amino acids indicates that the motif is crucial for enzymatic activity. Sequence alignment of the methyltransferase gene reveals a short motif within the target recognition domain that is conserved among enzymes recognising the same sequences. Thus, this motif may be used as a diagnostic tool to define the recognition sequences of the cytosine C5 methyltransferases. Conclusion We have cloned and sequenced the BsaHI restriction and modification enzymes. We have identified a region of the R. BsaHI enzyme that is crucial for its activity. Analysis of the amino acid sequence of the BsaHI methyltransferase enzyme led us to propose two new motifs that can be used in the diagnosis of the recognition sequence of the cytosine C5-methyltransferases. PMID:18479503
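
The six-base recognition sequence GRCGYC quoted in the abstract above uses IUPAC degenerate codes (R = A or G, Y = C or T). As a minimal sketch of how such a site can be located in a DNA string, the code below translates the degenerate pattern into a regular expression. The names `IUPAC`, `site_regex` and `find_sites` are hypothetical and not taken from the paper. Because GRCGYC equals its own reverse complement, scanning one strand is enough for this particular site.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def site_regex(site):
    """Compile a degenerate site into a regex; the lookahead allows overlaps."""
    body = "".join(IUPAC[base] for base in site.upper())
    return re.compile("(?=(" + body + "))")


def find_sites(seq, site="GRCGYC"):
    """Return 0-based start positions of every occurrence of the site."""
    return [m.start() for m in site_regex(site).finditer(seq.upper())]
```

For example, `find_sites("TTGACGTCAAGGCGCCTT")` reports the two sites GACGTC and GGCGCC at their start positions.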

  13. Direct mutation analysis by high-throughput sequencing: from germline to low-abundant, somatic variants

    PubMed Central

    Gundry, Michael; Vijg, Jan

    2011-01-01

    DNA mutations are the source of genetic variation within populations. The majority of mutations with observable effects are deleterious. In humans mutations in the germ line can cause genetic disease. In somatic cells multiple rounds of mutations and selection lead to cancer. The study of genetic variation has progressed rapidly since the completion of the draft sequence of the human genome. Recent advances in sequencing technology, most importantly the introduction of massively parallel sequencing (MPS), have resulted in more than a hundred-fold reduction in the time and cost required for sequencing nucleic acids. These improvements have greatly expanded the use of sequencing as a practical tool for mutation analysis. While in the past the high cost of sequencing limited mutation analysis to selectable markers or small forward mutation targets assumed to be representative for the genome overall, current platforms allow whole genome sequencing for less than $5,000. This has already given rise to direct estimates of germline mutation rates in multiple organisms including humans by comparing whole genome sequences between parents and offspring. Here we present a brief history of the field of mutation research, with a focus on classical tools for the measurement of mutation rates. We then review MPS, how it is currently applied and the new insight into human and animal mutation frequencies and spectra that has been obtained from whole genome sequencing. While great progress has been made, we note that the single most important limitation of current MPS approaches for mutation analysis is the inability to address low-abundance mutations that turn somatic tissues into mosaics of cells. Such mutations are at the basis of intra-tumor heterogeneity, with important implications for clinical diagnosis, and could also contribute to somatic diseases other than cancer, including aging. Some possible approaches to gain access to low-abundance mutations are discussed, with a

  14. Median network analysis of defectively sequenced entire mitochondrial genomes from early and contemporary disease studies.

    PubMed

    Bandelt, Hans-Jürgen; Yao, Yong-Gang; Bravi, Claudio M; Salas, Antonio; Kivisild, Toomas

    2009-03-01

    Sequence analysis of the mitochondrial genome has become a routine method in the study of mitochondrial diseases. Quite often, the sequencing efforts in the search of pathogenic or disease-associated mutations are affected by technical and interpretive problems, caused by sample mix-up, contamination, biochemical problems, incomplete sequencing, misdocumentation and insufficient reference to previously published data. To assess data quality in case studies of mitochondrial diseases, it is recommended to compare any mtDNA sequence under consideration to their phylogenetically closest lineages available in the Web. The median network method has proven useful for visualizing potential problems with the data. We contrast some early reports of complete mtDNA sequences to more recent total mtDNA sequencing efforts in studies of various mitochondrial diseases. We conclude that the quality of complete mtDNA sequences generated in the medical field in the past few years is somewhat unsatisfactory and may even fall behind that of pioneer manual sequencing in the early nineties. Our study provides a paradigm for an a posteriori evaluation of sequence quality and for detection of potential problems with inferring a pathogenic status of a particular mutation. PMID:19322152

  15. Growth characteristics and complete genomic sequence analysis of a novel pseudorabies virus in China.

    PubMed

    Yu, Teng; Chen, Fangzhou; Ku, Xugang; Fan, Jie; Zhu, Yinxing; Ma, Hailong; Li, Subei; Wu, Bin; He, Qigai

    2016-08-01

    Swine pseudorabies (PR) re-emerged in Bartha-vaccinated pig herds and has caused the death of millions of piglets in China since late 2011. We isolated a novel pseudorabies virus (PRV), named the HNX strain, from the brains of aborted fetuses to diagnose the disease. To reveal the genomic organization and characterize the HNX strain, the complete genomes of HNX and the Fa strain, an isolate from the 1960s, were sequenced and analyzed. The genome sizes of the HNX and Fa strains were 142,294 and 141,930 nt, respectively, with corresponding G + C contents of 73.56 and 73.70 %. The two strains consistently possessed 70 open reading frames. In addition, comparative genomic analysis between the HNX and Bartha strains was performed to understand the possible reason for the immune failure. The major virulence-associated genes of the HNX strain had slight changes, whereas the glycoprotein B and glycoprotein C genes of the HNX strain had 73 mutations; the homology at the whole-genome level between the HNX and Bartha strains was 90.6 %. Genome-wide comparison between the HNX and Fa strains indicated that the strains shared about 96.4 % homology and clustered in a separate Chinese isolate group; the two strains are also distant from the isolates from other countries. Similarity plot and bootscanning analysis of the complete genome sequences of nine PRV strains, including HNX and Fa, four new Chinese strains, and three traditional reference strains, revealed that no recombination events had occurred in the HNX strain. The PRV HNX strain with genomic variations might have contributed to the PR outbreak in China since late 2011. PMID:27012685

  16. Quantitative trait analysis in sequencing studies under trait-dependent sampling.

    PubMed

    Lin, Dan-Yu; Zeng, Donglin; Tang, Zheng-Zheng

    2013-07-23

    It is not economically feasible to sequence all study subjects in a large cohort. A cost-effective strategy is to sequence only the subjects with the extreme values of a quantitative trait. In the National Heart, Lung, and Blood Institute Exome Sequencing Project, subjects with the highest or lowest values of body mass index, LDL, or blood pressure were selected for whole-exome sequencing. Failure to account for such trait-dependent sampling can cause severe inflation of type I error and substantial loss of power in quantitative trait analysis, especially when combining results from multiple studies with different selection criteria. We present valid and efficient statistical methods for association analysis of sequencing data under trait-dependent sampling. We pay special attention to gene-based analysis of rare variants. Our methods can be used to perform quantitative trait analysis not only for the trait that is used to select subjects for sequencing but for any other traits that are measured. For a particular trait of interest, our approach properly combines the association results from all studies with measurements of that trait. This meta-analysis is substantially more powerful than the analysis of any single study. By contrast, meta-analysis of standard linear regression results (ignoring trait-dependent sampling) can be less powerful than the analysis of a single study. The advantages of the proposed methods are demonstrated through simulation studies and the National Heart, Lung, and Blood Institute Exome Sequencing Project data. The methods are applicable to other types of genetic association studies and nongenetic studies. PMID:23847208

  17. MEGAN Community Edition - Interactive Exploration and Analysis of Large-Scale Microbiome Sequencing Data

    PubMed Central

    Huson, Daniel H.; Beier, Sina; Flade, Isabell; Ruscheweyh, Hans-Joachim; Tappu, Rewati

    2016-01-01

    There is increasing interest in employing shotgun sequencing, rather than amplicon sequencing, to analyze microbiome samples. Typical projects may involve hundreds of samples and billions of sequencing reads. The comparison of such samples against a protein reference database generates billions of alignments and the analysis of such data is computationally challenging. To address this, we have substantially rewritten and extended our widely-used microbiome analysis tool MEGAN so as to facilitate the interactive analysis of the taxonomic and functional content of very large microbiome datasets. Other new features include a functional classifier called InterPro2GO, gene-centric read assembly, principal coordinate analysis of taxonomy and function, and support for metadata. The new program is called MEGAN Community Edition (CE) and is open source. By integrating MEGAN CE with our high-throughput DNA-to-protein alignment tool DIAMOND and by providing a new program MeganServer that allows access to metagenome analysis files hosted on a server, we provide a straightforward, yet powerful and complete pipeline for the analysis of metagenome shotgun sequences. We illustrate how to perform a full-scale computational analysis of a metagenomic sequencing project, involving 12 samples and 800 million reads, in less than three days on a single server. All source code is available here: https://github.com/danielhuson/megan-ce PMID:27327495

  18. MEGAN Community Edition - Interactive Exploration and Analysis of Large-Scale Microbiome Sequencing Data.

    PubMed

    Huson, Daniel H; Beier, Sina; Flade, Isabell; Górska, Anna; El-Hadidi, Mohamed; Mitra, Suparna; Ruscheweyh, Hans-Joachim; Tappu, Rewati

    2016-06-01

    There is increasing interest in employing shotgun sequencing, rather than amplicon sequencing, to analyze microbiome samples. Typical projects may involve hundreds of samples and billions of sequencing reads. The comparison of such samples against a protein reference database generates billions of alignments and the analysis of such data is computationally challenging. To address this, we have substantially rewritten and extended our widely-used microbiome analysis tool MEGAN so as to facilitate the interactive analysis of the taxonomic and functional content of very large microbiome datasets. Other new features include a functional classifier called InterPro2GO, gene-centric read assembly, principal coordinate analysis of taxonomy and function, and support for metadata. The new program is called MEGAN Community Edition (CE) and is open source. By integrating MEGAN CE with our high-throughput DNA-to-protein alignment tool DIAMOND and by providing a new program MeganServer that allows access to metagenome analysis files hosted on a server, we provide a straightforward, yet powerful and complete pipeline for the analysis of metagenome shotgun sequences. We illustrate how to perform a full-scale computational analysis of a metagenomic sequencing project, involving 12 samples and 800 million reads, in less than three days on a single server. All source code is available here: https://github.com/danielhuson/megan-ce. PMID:27327495

  19. Inhibition of protein kinase C catalytic activity by additional regions within the human protein kinase Calpha-regulatory domain lying outside of the pseudosubstrate sequence.

    PubMed

    Kirwan, Angie F; Bibby, Ashley C; Mvilongo, Thierry; Riedel, Heimo; Burke, Thomas; Millis, Sherri Z; Parissenti, Amadeo M

    2003-07-15

    The N-terminal pseudosubstrate site within the protein kinase Calpha (PKCalpha)-regulatory domain has long been regarded as the major determinant for autoinhibition of catalytic domain activity. Previously, we observed that the PKC-inhibitory capacity of the human PKCalpha-regulatory domain was only reduced partially on removal of the pseudosubstrate sequence [Parissenti, Kirwan, Kim, Colantonio and Schimmer (1998) J. Biol. Chem. 273, 8940-8945]. This finding suggested that one or more additional region(s) contributes to the inhibition of catalytic domain activity. To assess this hypothesis, we first examined the PKC-inhibitory capacity of a smaller fragment of the PKCalpha-regulatory domain consisting of the C1a, C1b and V2 regions [GST-Ralpha(39-177): this protein contained the full regulatory domain of human PKCalpha fused to glutathione S-transferase (GST), but lacked amino acids 1-38 (including the pseudosubstrate sequence) and amino acids 178-270 (including the C2 region)]. GST-Ralpha(39-177) significantly inhibited PKC in a phorbol-independent manner and could not bind the peptide substrate used in our assays. These results suggested that a region within C1/V2 directly inhibits catalytic domain activity. Providing further in vivo support for this hypothesis, we found that expression of N-terminally truncated pseudosubstrate-less bovine PKCalpha holoenzymes in yeast was capable of inhibiting cell growth in a phorbol-dependent manner. This suggested that additional autoinhibitory force(s) remained within the truncated holoenzymes that could be relieved by phorbol ester. Using tandem PCR-mediated mutagenesis, we observed that mutation of amino acids 33-86 within GST-Ralpha(39-177) dramatically reduced its PKC-inhibitory capacity when protamine was used as substrate. Mutagenesis of a broad range of sequences within C2 (amino acids 159-242) also significantly reduced PKC-inhibitory capacity. Taken together, these observations support strongly the existence of

  20. Advanced accident sequence precursor analysis level 2 models

    SciTech Connect

    Galyean, W.J.; Brownson, D.A.; Rempe, J.L.

    1996-03-01

    The U.S. Nuclear Regulatory Commission Accident Sequence Precursor program pursues the ultimate objective of performing risk significant evaluations on operational events (precursors) occurring in commercial nuclear power plants. To achieve this objective, the Office of Nuclear Regulatory Research is supporting the development of simple probabilistic risk assessment models for all commercial nuclear power plants (NPP) in the U.S. Presently, only simple Level 1 plant models have been developed which estimate core damage frequencies. In order to provide a true risk perspective, the consequences associated with postulated core damage accidents also need to be considered. With the objective of performing risk evaluations in an integrated and consistent manner, a linked event tree approach which propagates the front end results to back end was developed. This approach utilizes simple plant models that analyze the response of the NPP containment structure in the context of a core damage accident, estimate the magnitude and timing of a radioactive release to the environment, and calculate the consequences for a given release. Detailed models and results from previous studies, such as the NUREG-1150 study, are used to quantify these simple models. These simple models are then linked to the existing Level 1 models, and are evaluated using the SAPHIRE code. To demonstrate the approach, prototypic models have been developed for a boiling water reactor, Peach Bottom, and a pressurized water reactor, Zion.

  1. Sequence analysis and homology modeling of peroxidase from Medicago sativa

    PubMed Central

    Hooda, Vinita; Gundala, Prasada babu; Chinthala, Paramageetham

    2012-01-01

    Plant peroxidases are one of the most extensively studied groups of enzymes, which find applications in environmental, health, pharmaceutical, chemical and biotechnological processes. The class III secretory peroxidase from alfalfa (Medicago sativa) has been characterized using a bioinformatics approach. Physicochemical properties and topology of alfalfa peroxidase were compared with those of soybean and horseradish peroxidase, the two most popular commercially available peroxidase preparations. A lower instability index as predicted by ProtParam and the presence of extra disulphide linkages as predicted by Cys_REC suggested alfalfa peroxidase to be more stable than either of the commercial preparations. Multiple Sequence Alignment (MSA) with other functionally similar proteins revealed the presence of highly conserved catalytic residues. A three-dimensional model of alfalfa peroxidase was constructed based on the crystal structure of soybean peroxidase (PDB Id: 1FHF A) by a homology modelling approach. The model was checked for stereochemical quality by the PROCHECK, VERIFY 3D, WHAT IF, ERRAT, 3D MATCH and ProSA servers. The best model was selected, energy minimized and used to analyze the structure-function relationship with the substrate hydrogen peroxide by Autodock 4.0. The enzyme-substrate complex was viewed with Swiss PDB viewer, and one residue, ASP43, was found to stabilize the interaction by hydrogen bonds. The results of the study may be a guiding point for further investigations on alfalfa peroxidase. PMID:23275690

  2. Improved data analysis for the MinION nanopore sequencer

    PubMed Central

    Jain, Miten; Fiddes, Ian; Miga, Karen H.; Olsen, Hugh E.; Paten, Benedict; Akeson, Mark

    2016-01-01

    The Oxford Nanopore MinION sequences individual DNA molecules using an array of pores that read nucleotide identities based on ionic current steps. We evaluated and optimized MinION performance using M13 genomic dsDNA. Using expectation-maximization (EM) we obtained robust maximum likelihood (ML) estimates for read insertion, deletion and substitution error rates (4.9%, 7.8%, and 5.1% respectively). We found that 99% of high-quality ‘2D’ MinION reads mapped to reference at a mean identity of 85%. We present a MinION-tailored tool for single nucleotide variant (SNV) detection that uses ML parameter estimates and marginalization over many possible read alignments to achieve precision and recall of up to 99%. By pairing our high-confidence alignment strategy with long MinION reads, we resolved the copy number for a cancer/testis gene family (CT47) within an unresolved region of human chromosome Xq24. PMID:25686389

  3. Secure distributed genome analysis for GWAS and sequence comparison computation

    PubMed Central

    2015-01-01

    Background The rapid increase in the availability and volume of genomic data makes significant advances in biomedical research possible, but sharing of genomic data poses challenges due to the highly sensitive nature of such data. To address the challenges, a competition for secure distributed processing of genomic data was organized by the iDASH research center. Methods In this work we propose techniques for securing computation with real-life genomic data for minor allele frequency and chi-squared statistics computation, as well as distance computation between two genomic sequences, as specified by the iDASH competition tasks. We put forward novel optimizations, including a generalization of a version of mergesort, which might be of independent interest. Results We provide implementation results of our techniques based on secret sharing that demonstrate practicality of the suggested protocols and also report on performance improvements due to our optimization techniques. Conclusions This work describes our techniques, findings, and experimental results developed and obtained as part of iDASH 2015 research competition to secure real-life genomic computations and shows feasibility of securely computing with genomic data in practice. PMID:26733307
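
The two GWAS statistics named in the abstract above, minor allele frequency and the chi-squared statistic, can be sketched in plain (non-secure) form as follows. This is only the cleartext arithmetic that a secure protocol would have to reproduce; it is not the secret-sharing construction from the paper. The function names, the genotype encoding as two-letter strings, and the 2x2 case/control allele-count layout are assumptions made for illustration.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Minor allele frequency at a biallelic SNP.

    genotypes: one two-letter string per individual, e.g. "AG".
    Returns 0.0 when only one allele is observed.
    """
    counts = Counter("".join(genotypes))
    if len(counts) < 2:
        return 0.0
    return min(counts.values()) / sum(counts.values())


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 allele-count table

                   allele 1   allele 2
        cases         a          b
        controls      c          d
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: an empty row or column
    return n * (a * d - b * c) ** 2 / denom
```

In a secure setting the counts a, b, c and d would typically be aggregated across sites without revealing individual genotypes, and only the final statistic would be opened.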

  4. Are physicians prepared for whole genome sequencing? a qualitative analysis.

    PubMed

    Christensen, K D; Vassy, J L; Jamal, L; Lehmann, L S; Slashinski, M J; Perry, D L; Robinson, J O; Blumenthal-Barby, J; Feuerman, L Z; Murray, M F; Green, R C; McGuire, A L

    2016-02-01

    Although the integration of whole genome sequencing (WGS) into standard medical practice is rapidly becoming feasible, physicians may be unprepared to use it. Primary care physicians (PCPs) and cardiologists enrolled in a randomized clinical trial of WGS received genomics education before completing semi-structured interviews. Themes about preparedness were identified in transcripts through team-based consensus-coding. Data from 11 PCPs and 9 cardiologists suggested that physicians enrolled in the trial primarily to prepare themselves for widespread use of WGS in the future. PCPs were concerned about their general genomic knowledge, while cardiologists were concerned about how to interpret specific types of results and secondary findings. Both cohorts anticipated preparing extensively before disclosing results to patients by using educational resources with which they were already familiar, and both cohorts anticipated making referrals to genetics specialists as needed. A lack of laboratory guidance, time pressures, and a lack of standards contributed to feeling unprepared. Physicians had specialty-specific concerns about their preparedness to use WGS. Findings identify specific policy changes that could help physicians feel more prepared, and highlight how providers of all types will need to become familiar with interpreting WGS results. PMID:26080898

  5. Comparative genomic analysis of a neurotoxigenic Clostridium species using partial genome sequence: Phylogenetic analysis of a few conserved proteins involved in cellular processes and metabolism.

    PubMed

    Alam, Syed Imteyaz; Dixit, Aparna; Tomar, Arvind; Singh, Lokendra

    2010-04-01

    Clostridial organisms produce neurotoxins, which are generally regarded as the most potent toxic substances of biological origin and potential biological warfare agents. Clostridium tetani produces tetanus neurotoxin and is responsible for the fatal tetanus disease. In spite of the extensive immunization regimen, the disease is an important cause of death, especially among neonates. Strains of C. tetani have not been genetically characterized except for the complete genome sequencing of strain E88. The present study reports the genetic makeup and phylogenetic affiliations of an environmental strain of this bacterium with respect to C. tetani E88 and other clostridia. A shotgun library was constructed from the genomic DNA of C. tetani drde, isolated from a decaying fish sample. Unique clones were sequenced and the sequences were compared with those of its closest relative, C. tetani E88. A total of 275 clones were obtained and 32,457 bases of non-redundant sequence were generated. A total of 150 base changes were observed over the entire length of sequence obtained, including additions, deletions and base substitutions. Of the total 120 ORFs detected, 48 exhibited closest similarity to E88 proteins, of which three are hypothetical proteins. Eight of the ORFs exhibited similarity with hypothetical proteins from other organisms and 10 aligned with other proteins from unrelated organisms. There is an overall conservation of protein sequences among the two strains of C. tetani and. Selected ORFs involved in cellular processes and metabolism were subjected to phylogenetic analysis. PMID:19527791

  6. Systematic Internal Transcribed Spacer Sequence Analysis for Identification of Clinical Mold Isolates in Diagnostic Mycology: a 5-Year Study

    PubMed Central

    Ciardo, Diana E.; Lucke, Katja; Imhof, Alex; Bloemberg, Guido V.; Böttger, Erik C.

    2010-01-01

    The implementation of internal transcribed spacer (ITS) sequencing for routine identification of molds in the diagnostic mycology laboratory was analyzed in a 5-year study. All mold isolates (n = 6,900) recovered in our laboratory from 2005 to 2009 were included in this study. According to a defined work flow, which in addition to troublesome phenotypic identification takes clinical relevance into account, 233 isolates were subjected to ITS sequence analysis. Sequencing resulted in successful identification for 78.6% of the analyzed isolates (57.1% at species level, 21.5% at genus level). In comparison, extended in-depth phenotypic characterization of the isolates subjected to sequencing achieved taxonomic assignment for 47.6% of these, with a mere 13.3% at species level. Optimization of DNA extraction further improved the efficacy of molecular identification. This study is the first of its kind to testify to the systematic implementation of sequence-based identification procedures in the routine workup of mold isolates in the diagnostic mycology laboratory. PMID:20573873

  7. Sequence analysis of the complete genome of Trichoplusia ni single nucleopolyhedrovirus and the identification of a baculoviral photolyase gene

    SciTech Connect

    Willis, Leslie G.; Siepp, Robyn; Stewart, Taryn M.; Erlandson, Martin A.; Theilmann, David A. (E-mail: TheilmannD@agr.gc.ca)

    2005-08-01

    The genome of the Trichoplusia ni single nucleopolyhedrovirus (TnSNPV), a group II NPV which infects the cabbage looper (T. ni), has been completely sequenced and analyzed. The TnSNPV DNA genome consists of 134,394 bp and has an overall G + C content of 39%. Gene analysis predicted 144 open reading frames (ORFs) of 150 nucleotides or greater that showed minimal overlap. Comparisons with previously sequenced baculoviruses indicate that 119 TnSNPV ORFs were homologues of previously reported viral gene sequences. Ninety-four TnSNPV ORFs returned an Autographa californica multiple NPV (AcMNPV) homologue while 25 ORFs returned poor or no sequence matches with the current databases. A putative photolyase gene was also identified that had highest amino acid identity to the photolyase genes of Chrysodeixis chalcites NPV (ChchNPV) (47%) and Danio rerio (zebrafish) (40%). In addition, unlike all other baculoviruses, no obvious homologous repeat (hr) sequences were identified. Comparison of the TnSNPV and AcMNPV genomes provides a unique opportunity to examine two baculoviruses that are highly virulent for a common insect host (T. ni) yet belong to diverse baculovirus taxonomic groups and possess distinct biological features. In vitro fusion assays demonstrated that the TnSNPV F protein induces membrane fusion and syncytia formation; the resulting syncytia were compared to those formed by AcMNPV GP64.
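
    The two summary statistics quoted in this record (overall G + C content and ORFs of 150 nucleotides or greater) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names gc_content and find_orfs are hypothetical, only the forward strand is scanned, and an ORF is assumed to run from an ATG to the first in-frame stop codon, with the stop codon counted in its length.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return gc / total if total else 0.0


def find_orfs(seq: str, min_len: int = 150):
    """Yield (start, end) for forward-strand ORFs of at least min_len nt.

    Assumption for illustration: an ORF starts at the first ATG in a frame
    and ends at the next in-frame stop codon; `end` is exclusive and the
    stop codon is included in the length.
    """
    seq = seq.upper()
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    yield (start, i + 3)
                start = None


# Hypothetical toy sequence: ATG, 50 alanine codons, one stop codon (156 nt).
toy = "ATG" + "GCA" * 50 + "TAA"
orfs = list(find_orfs(toy))
```

    A genome-scale analysis would also scan the reverse complement and resolve overlapping candidates, which this sketch leaves out.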

  8. Neutron-activation analysis by standard addition and solvent extraction: Determination of traces of antimony.

    PubMed

    Alian, A; Shabana, R; Sanad, W; Allam, B; Khalifa, K

    1968-02-01

    The application of neutron activation analysis by standard addition and solvent extraction to the determination of traces of antimony in aluminium and rocks is reported. Three simple extraction procedures, using isopropyl ether, hexone, and tributyl phosphate, are described for the selective separation of radioantimony from interfering radionuclides. Antimony concentration is measured by counting the activities of the ¹²²Sb and ¹²⁴Sb photopeaks at 0.564 and 0.603 MeV. PMID:18960289
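
    The record does not spell out the calculation, but the generic standard-addition method works by extrapolation: known amounts of analyte are added to equal aliquots of the sample, the signal (here it would be a photopeak count rate) is fitted as a straight line against the amount added, and the amount originally present is the intercept divided by the slope. The sketch below assumes a linear response and uses hypothetical numbers; the function name standard_addition is an illustration, not something taken from the paper.

```python
def standard_addition(added, signal):
    """Estimate the analyte amount originally in the sample.

    Fits signal = slope * added + intercept by ordinary least squares and
    returns intercept / slope, i.e. the magnitude of the x-intercept.
    Assumes the detector response is linear over the range used.
    """
    n = len(added)
    if n < 2 or n != len(signal):
        raise ValueError("need at least two paired measurements")
    mean_x = sum(added) / n
    mean_y = sum(signal) / n
    sxx = sum((x - mean_x) ** 2 for x in added)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(added, signal))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept / slope


# Hypothetical data: micrograms of antimony added vs. net counts per second.
added_ug = [0.0, 1.0, 2.0, 3.0]
counts = [50.0, 75.0, 100.0, 125.0]
original_ug = standard_addition(added_ug, counts)
```

    With these made-up values the fitted line has slope 25 and intercept 50, so the estimate is 2 micrograms in the unspiked aliquot.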

  9. Antimicrobial susceptibility among clinical Nocardia species identified by multilocus sequence analysis.

    PubMed

    McTaggart, Lisa R; Doucet, Jennifer; Witkowska, Maria; Richardson, Susan E

    2015-01-01

    Antimicrobial susceptibility patterns of 112 clinical isolates, 28 type strains, and 9 reference strains of Nocardia were determined using the Sensititre Rapmyco microdilution panel (Thermo Fisher, Inc.). Isolates were identified by highly discriminatory multilocus sequence analysis and were chosen to represent the diversity of species recovered from clinical specimens in Ontario, Canada. Susceptibility to the most commonly used drug, trimethoprim-sulfamethoxazole, was observed in 97% of isolates. Linezolid and amikacin were also highly effective; 100% and 99% of all isolates demonstrated a susceptible phenotype. For the remaining antimicrobials, resistance was species specific with isolates of Nocardia otitidiscaviarum, N. brasiliensis, N. abscessus complex, N. nova complex, N. transvalensis complex, N. farcinica, and N. cyriacigeorgica displaying the traditional characteristic drug pattern types. In addition, the antimicrobial susceptibility profiles of a variety of rarely encountered species isolated from clinical specimens are reported for the first time and were categorized into four additional drug pattern types. Finally, MICs for the control strains N. nova ATCC BAA-2227, N. asteroides ATCC 19247(T), and N. farcinica ATCC 23826 were robustly determined to demonstrate method reproducibility and suitability of the commercial Sensititre Rapmyco panel for antimicrobial susceptibility testing of Nocardia spp. isolated from clinical specimens. The reported values will facilitate quality control and standardization among laboratories. PMID:25348540

  10. Antimicrobial Susceptibility among Clinical Nocardia Species Identified by Multilocus Sequence Analysis

    PubMed Central

    Doucet, Jennifer; Witkowska, Maria; Richardson, Susan E.

    2014-01-01

    Antimicrobial susceptibility patterns of 112 clinical isolates, 28 type strains, and 9 reference strains of Nocardia were determined using the Sensititre Rapmyco microdilution panel (Thermo Fisher, Inc.). Isolates were identified by highly discriminatory multilocus sequence analysis and were chosen to represent the diversity of species recovered from clinical specimens in Ontario, Canada. Susceptibility to the most commonly used drug, trimethoprim-sulfamethoxazole, was observed in 97% of isolates. Linezolid and amikacin were also highly effective; 100% and 99% of all isolates demonstrated a susceptible phenotype. For the remaining antimicrobials, resistance was species specific with isolates of Nocardia otitidiscaviarum, N. brasiliensis, N. abscessus complex, N. nova complex, N. transvalensis complex, N. farcinica, and N. cyriacigeorgica displaying the traditional characteristic drug pattern types. In addition, the antimicrobial susceptibility profiles of a variety of rarely encountered species isolated from clinical specimens are reported for the first time and were categorized into four additional drug pattern types. Finally, MICs for the control strains N. nova ATCC BAA-2227, N. asteroides ATCC 19247T, and N. farcinica ATCC 23826 were robustly determined to demonstrate method reproducibility and suitability of the commercial Sensititre Rapmyco panel for antimicrobial susceptibility testing of Nocardia spp. isolated from clinical specimens. The reported values will facilitate quality control and standardization among laboratories. PMID:25348540

  11. Gene identification and DNA sequence analysis in the GC-poor 20 megabase region of human chromosome 21.

    PubMed

    Yu, J; Tong, S; Shen, Y; Kao, F T

    1997-06-24

    In contrast to the distal half of the long arm of chromosome 21, the proximal half of approximately 20 megabases of DNA, including 21q11-21 bands, is low in GC content, CpG islands, and identified genes. Despite intensive searches, very few genes and cDNAs have been found in this region. Since the 21q11-21 region is associated with certain Down syndrome pathologies like mental retardation, the identification of relevant genes in this region is important. We used a different approach by constructing microdissection libraries specifically for this region and isolating unique sequence microclones for detailed molecular analysis. We found that this region is enriched with middle and low-copy repetitive sequences, and is also heavily methylated. By sequencing and homology analysis, we identified a significant number of genes/cDNAs, most of which appear to belong to gene families. In addition, we used unique sequence microclones in direct screening of cDNA libraries and isolated 12 cDNAs for this region. Thus, although the 21q11-21 region is gene poor, it is not completely devoid of genes/cDNAs. The presence of high proportions of middle and low-copy repetitive sequences in this region may have evolutionary significance in the genome organization and function of this region. Since 21q11-21 is heavily methylated, the expression of genes in this region may be regulated by a delicate balance of methylation and demethylation, and the presence of an additional copy of chromosome 21 may seriously disturb this balance and cause specific Down syndrome anomalies including mental retardation. PMID:9192657

  12. Similarity/Dissimilarity Analysis of Protein Sequences Based on a New Spectrum-Like Graphical Representation

    PubMed Central

    Yao, Yuhua; Yan, Shoujiang; Xu, Huimin; Han, Jianning; Nan, Xuying; He, Ping-an; Dai, Qi

    2014-01-01

    Sequence comparison is one of the foundations of bioinformatics and can be used to study evolutionary relations among sequences. In this study, a 2D spectrum-like graphical representation of protein sequences is presented based on the hydrophobicity scale of amino acids. The frequencies of amplitudes of 4-subsequences are adopted to characterize a spectrum-like graph, and a 17D vector is used as the descriptor of a protein sequence. The χ² value of a compatibility test is computed. The new similarity analysis approach is illustrated on all the protein sequences encoded by the mitochondrial genomes of 20 different species. Finally, comparison with the ClustalW method shows the utility of our method. PMID:25002811

  13. Complete genome sequencing and comparative genomic analysis of functionally diverse Lysinibacillus sphaericus III(3)7.

    PubMed

    Rey, Andrés; Silva-Quintero, Laura; Dussán, Jenny

    2016-09-01

    Lysinibacillus sphaericus III(3)7 is a native Colombian strain, the first one isolated from soil samples. This strain has shown high levels of pathogenic activity against Culex quinquefasciatus larvae in laboratory assays compared to other members of the same species. Using Pacific Biosciences sequencing technology we sequenced, annotated (de novo) and described the genome of strain III(3)7, achieving a complete genome sequence status. We then performed a comparative analysis between the newly sequenced genome and the ones previously reported for Colombian isolates L. sphaericus OT4b.31, CBAM5 and OT4b.25, with the inclusion of L. sphaericus C3-41, which has been used as a reference genome for most previous genome sequencing projects. We concluded that L. sphaericus III(3)7 is highly similar to strain OT4b.25 and shares high levels of synteny with isolates CBAM5 and C3-41. PMID:27419068

  14. Combined sequence-based and genetic mapping analysis of complex traits in outbred rats

    PubMed Central

    Baud, Amelie; Hermsen, Roel; Guryev, Victor; Stridh, Pernilla; Graham, Delyth; McBride, Martin W.; Foroud, Tatiana; Calderari, Sophie; Diez, Margarita; Ockinger, Johan; Beyeen, Amennai D.; Gillett, Alan; Abdelmagid, Nada; Guerreiro-Cacais, Andre Ortlieb; Jagodic, Maja; Tuncel, Jonatan; Norin, Ulrika; Beattie, Elisabeth; Huynh, Ngan; Miller, William H.; Koller, Daniel L.; Alam, Imranul; Falak, Samreen; Osborne-Pellegrin, Mary; Martinez-Membrives, Esther; Canete, Toni; Blazquez, Gloria; Vicens-Costa, Elia; Mont-Cardona, Carme; Diaz-Moran, Sira; Tobena, Adolf; Hummel, Oliver; Zelenika, Diana; Saar, Kathrin; Patone, Giannino; Bauerfeind, Anja; Bihoreau, Marie-Therese; Heinig, Matthias; Lee, Young-Ae; Rintisch, Carola; Schulz, Herbert; Wheeler, David A.; Worley, Kim C.; Muzny, Donna M.; Gibbs, Richard A.; Lathrop, Mark; Lansu, Nico; Toonen, Pim; Ruzius, Frans Paul; de Bruijn, Ewart; Hauser, Heidi; Adams, David J.; Keane, Thomas; Atanur, Santosh S.; Aitman, Tim J.; Flicek, Paul; Malinauskas, Tomas; Jones, E. Yvonne; Ekman, Diana; Lopez-Aumatell, Regina; Dominiczak, Anna F; Johannesson, Martina; Holmdahl, Rikard; Olsson, Tomas; Gauguier, Dominique; Hubner, Norbert; Fernandez-Teruel, Alberto; Cuppen, Edwin; Mott, Richard; Flint, Jonathan

    2013-01-01

    Genetic mapping on fully sequenced individuals is transforming our understanding of the relationship between molecular variation and variation in complex traits. Here we report a combined sequence and genetic mapping analysis in outbred rats that maps 355 quantitative trait loci for 122 phenotypes. We identify 35 causal genes involved in 31 phenotypes, implicating novel genes in models of anxiety, heart disease and multiple sclerosis. The relation between sequence and genetic variation is unexpectedly complex: at approximately 40% of quantitative trait loci a single sequence variant cannot account for the phenotypic effect. Using comparable sequence and mapping data from mice, we show the extent and spatial pattern of variation in inbred rats differ significantly from those of inbred mice, and that the genetic variants in orthologous genes rarely contribute to the same phenotype in both species. PMID:23708188

  15. Combined sequence-based and genetic mapping analysis of complex traits in outbred rats.

    PubMed

    Baud, Amelie; Hermsen, Roel; Guryev, Victor; Stridh, Pernilla; Graham, Delyth; McBride, Martin W; Foroud, Tatiana; Calderari, Sophie; Diez, Margarita; Ockinger, Johan; Beyeen, Amennai D; Gillett, Alan; Abdelmagid, Nada; Guerreiro-Cacais, Andre Ortlieb; Jagodic, Maja; Tuncel, Jonatan; Norin, Ulrika; Beattie, Elisabeth; Huynh, Ngan; Miller, William H; Koller, Daniel L; Alam, Imranul; Falak, Samreen; Osborne-Pellegrin, Mary; Martinez-Membrives, Esther; Canete, Toni; Blazquez, Gloria; Vicens-Costa, Elia; Mont-Cardona, Carme; Diaz-Moran, Sira; Tobena, Adolf; Hummel, Oliver; Zelenika, Diana; Saar, Kathrin; Patone, Giannino; Bauerfeind, Anja; Bihoreau, Marie-Therese; Heinig, Matthias; Lee, Young-Ae; Rintisch, Carola; Schulz, Herbert; Wheeler, David A; Worley, Kim C; Muzny, Donna M; Gibbs, Richard A; Lathrop, Mark; Lansu, Nico; Toonen, Pim; Ruzius, Frans Paul; de Bruijn, Ewart; Hauser, Heidi; Adams, David J; Keane, Thomas; Atanur, Santosh S; Aitman, Tim J; Flicek, Paul; Malinauskas, Tomas; Jones, E Yvonne; Ekman, Diana; Lopez-Aumatell, Regina; Dominiczak, Anna F; Johannesson, Martina; Holmdahl, Rikard; Olsson, Tomas; Gauguier, Dominique; Hubner, Norbert; Fernandez-Teruel, Alberto; Cuppen, Edwin; Mott, Richard; Flint, Jonathan

    2013-07-01

    Genetic mapping on fully sequenced individuals is transforming understanding of the relationship between molecular variation and variation in complex traits. Here we report a combined sequence and genetic mapping analysis in outbred rats that maps 355 quantitative trait loci for 122 phenotypes. We identify 35 causal genes involved in 31 phenotypes, implicating new genes in models of anxiety, heart disease and multiple sclerosis. The relationship between sequence and genetic variation is unexpectedly complex: at approximately 40% of quantitative trait loci, a single sequence variant cannot account for the phenotypic effect. Using comparable sequence and mapping data from mice, we show that the extent and spatial pattern of variation in inbred rats differ substantially from those of inbred mice and that the genetic variants in orthologous genes rarely contribute to the same phenotype in both species. PMID:23708188

  16. Applying machine learning techniques to DNA sequence analysis. Progress report, February 14, 1991--February 13, 1992

    SciTech Connect

    Shavlik, J.W.

    1992-04-01

    We are developing a machine learning system that modifies existing knowledge about specific types of biological sequences. It does this by considering sample members and nonmembers of the sequence motif being learned. Using this information (which we call a "domain theory"), our learning algorithm produces a more accurate representation of the knowledge needed to categorize future sequences. Specifically, the KBANN algorithm maps inference rules, such as consensus sequences, into a neural (connectionist) network. Neural network training techniques then use the training examples to refine these inference rules. We have been applying this approach to several problems in DNA sequence analysis and have also been extending the capabilities of our learning system along several dimensions.

  17. Assessing Mitochondrial DNA Variation and Copy Number in Lymphocytes of ~2,000 Sardinians Using Tailored Sequencing Analysis Tools.

    PubMed

    Ding, Jun; Sidore, Carlo; Butler, Thomas J; Wing, Mary Kate; Qian, Yong; Meirelles, Osorio; Busonero, Fabio; Tsoi, Lam C; Maschio, Andrea; Angius, Andrea; Kang, Hyun Min; Nagaraja, Ramaiah; Cucca, Francesco; Abecasis, Gonçalo R; Schlessinger, David

    2015-07-01

    DNA sequencing identifies common and rare genetic variants for association studies, but studies typically focus on variants in nuclear DNA and ignore the mitochondrial genome. In fact, analyzing variants in mitochondrial DNA (mtDNA) sequences presents special problems, which we resolve here with a general solution for the analysis of mtDNA in next-generation sequencing studies. The new program package comprises 1) an algorithm designed to identify mtDNA variants (i.e., homoplasmies and heteroplasmies), incorporating sequencing error rates at each base in a likelihood calculation and allowing allele fractions at a variant site to differ across individuals; and 2) an estimation of mtDNA copy number in a cell directly from whole-genome sequencing data. We also apply the methods to DNA sequence from lymphocytes of ~2,000 SardiNIA Project participants. As expected, mothers and offspring share all homoplasmies but a lesser proportion of heteroplasmies. Both homoplasmies and heteroplasmies show 5-fold higher transition/transversion ratios than variants in nuclear DNA. Also, heteroplasmy increases with age, though on average only ~1 heteroplasmy reaches the 4% level between ages 20 and 90. In addition, we find that mtDNA copy number averages ~110 copies/lymphocyte and is ~54% heritable, implying substantial genetic regulation of the level of mtDNA. Copy numbers also decrease modestly but significantly with age, and females on average have significantly more copies than males. The mtDNA copy numbers are significantly associated with waist circumference (p-value = 0.0031) and waist-hip ratio (p-value = 2.4×10⁻⁵), but not with body mass index, indicating an association with central fat distribution. To our knowledge, this is the largest population analysis to date of mtDNA dynamics, revealing the age-imposed increase in heteroplasmy, the relatively high heritability of copy number, and the association of copy number with metabolic traits. PMID:26172475
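
    The transition/transversion ratio mentioned in this record is a simple count ratio: transitions are purine-to-purine (A<->G) or pyrimidine-to-pyrimidine (C<->T) substitutions, and every other single-base substitution is a transversion. The Python sketch below shows only this textbook definition; it is not the authors' program package, the function names are hypothetical, and the variant list is made up for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def is_transition(ref: str, alt: str) -> bool:
    """True for A<->G or C<->T substitutions, False otherwise."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ts_tv_ratio(variants) -> float:
    """Ratio of transitions to transversions over (ref, alt) base pairs.

    Entries that are not single-base substitutions between A, C, G and T
    (for example indels or ambiguous bases) are skipped.
    """
    ts = tv = 0
    for ref, alt in variants:
        ref, alt = ref.upper(), alt.upper()
        if ref == alt or ref not in BASES or alt not in BASES:
            continue
        if is_transition(ref, alt):
            ts += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return ts / tv


# Hypothetical variant calls: three transitions and one transversion.
example_variants = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")]
ratio = ts_tv_ratio(example_variants)
```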

  18. Multilocus Sequence Analysis for the Assessment of Phylogenetic Diversity and Biogeography in Hyphomonas Bacteria from Diverse Marine Environments

    PubMed Central

    Li, Guizhen; Liu, Yang; Sun, Fengqin; Shao, Zongze

    2014-01-01

    Hyphomonas, a genus of budding, prosthecate bacteria, are primarily found in the marine environment. Seven type strains, and 35 strains from our collections of Hyphomonas, isolated from the Pacific Ocean, Atlantic Ocean, Arctic Ocean, South China Sea and the Baltic Sea, were investigated in this study using multilocus sequence analysis (MLSA). The phylogenetic structure of these bacteria was evaluated using the 16S rRNA gene, and five housekeeping genes (leuA, clpA, pyrH, gatA and rpoD) as well as their concatenated sequences. Our results showed that each housekeeping gene and the concatenated gene sequence all yield a higher taxonomic resolution than the 16S rRNA gene. The 42 strains assorted into 12 groups. Each group represents an independent species, which was confirmed by virtual DNA-DNA hybridization (DDH) estimated from draft genome sequences. Hyphomonas MLSA interspecies and intraspecies boundaries ranged from 93.3% to 96.3%, similarity calculated using a combined DDH and MLSA approach. Furthermore, six novel species (groups I, II, III, IV, V and XII) of the genus Hyphomonas exist, based on sequence similarities of the MLSA and DDH values. Additionally, we propose that the leuA gene (93.0% sequence similarity across our dataset) alone could be used as a fast and practical means for identifying species within Hyphomonas. Finally, Hyphomonas' geographic distribution shows that strains from the same area tend to cluster together as discrete species. This study provides a framework for the discrimination and phylogenetic analysis of the genus Hyphomonas for the first time, and will contribute to a more thorough understanding of the biological and ecological roles of this genus. PMID:25019154

  19. A novel long-range PCR sequencing method for genetic analysis of the entire PKD1 gene.

    PubMed

    Tan, Ying-Cai; Michaeel, Alber; Blumenfeld, Jon; Donahue, Stephanie; Parker, Tom; Levine, Daniel; Rennert, Hanna

    2012-07-01

    Genetic testing of PKD1 and PKD2 is useful for the diagnosis and prognosis of autosomal dominant polycystic kidney disease; however, analysis is complicated by the large transcript size, the complexity of the gene region, and the high level of gene variations. We developed a novel mutation screening assay for PKD1 by directly sequencing long-range (LR) PCR products. By using this method, the entire PKD1 coding region was amplified by nine reactions, generating product sizes from 2 to 6 kb, circumventing the need for specific PCR amplification of individual exons. This method was compared with direct sequencing used by a reference laboratory and the SURVEYOR-WAVE Nucleic Acid High Sensitivity Fragment Analysis System (Transgenomic) screening method for five patients with autosomal dominant polycystic kidney disease. A total of 53 heterozygous genetic changes were identified by LR PCR sequencing, including 41 (of 42) variations detected by SURVEYOR nuclease and all 32 variations reported by the reference laboratory, detecting an additional 12 intronic changes not identified by the other two methods. Compared with the reference laboratory, LR PCR sequencing had a sensitivity of 100%, a specificity of 98.5%, and an accuracy of 98.8%; compared with the SURVEYOR-WAVE method, it had a sensitivity of 97.1%, a specificity of 100%, and an accuracy of 99.4%. In conclusion, LR PCR sequencing was superior to the direct sequencing and screening methods for detecting genetic variations, achieving high sensitivity and improved intronic coverage with a faster turnaround time and lower costs, and providing a reliable tool for complex genetic analyses. PMID:22608885
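
    The sensitivity, specificity and accuracy figures in this record follow the standard definitions based on true and false positives and negatives relative to a comparison method. The abstract does not report the underlying counts, so the Python sketch below uses hypothetical numbers and makes no attempt to reproduce the published percentages; diagnostic_metrics is an illustrative name.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int):
    """Return (sensitivity, specificity, accuracy) from confusion counts.

    sensitivity = TP / (TP + FN)   fraction of true variants detected
    specificity = TN / (TN + FP)   fraction of non-variants called negative
    accuracy    = (TP + TN) / all  fraction of all calls that agree
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
sens, spec, acc = diagnostic_metrics(tp=40, fp=5, tn=45, fn=10)
```

    With these made-up counts the sensitivity is 40/50, the specificity is 45/50 and the accuracy is 85/100.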

  20. Sequence and analysis of the human ABL gene, the BCR gene, and regions involved in the Philadelphia chromosomal translocation

    SciTech Connect

    Burian, D.; Clifton, S.W.; Crabtree, J.

    1995-05-01

    The complete human BCR gene (152,141 nt) on chromosome 22 and greater than 80% of the human ABL gene (179,512 nt) on chromosome 9 have been sequenced from mapped cosmid and plasmid clones via a shotgun strategy. Because these two chromosomes are translocated with breakpoints within the BCR and ABL genes in Philadelphia chromosome-positive leukemias, knowledge of these sequences also might provide insight into the validity of various theories of chromosomal rearrangements. Comparison of these genes with their cDNA sequences reveals the positions of 23 BCR exons and putative alternative BCR first and second exons, as well as the common ABL exons 2-11, respectively. Additionally, these regions include the alternative ABL first exons 1b and 1a, a new gene 5′ to the first ABL exon, and an open reading frame with homology to an EST within the BCR fourth intron. Further analysis reveals an Alu homology of 38.83 and 39.35% for the BCR and ABL genes, respectively, with other repeat elements present to a lesser extent. Four new Philadelphia chromosome translocation breakpoints from chronic myelogenous leukemia patients also were sequenced, and the positions of these and several other previously sequenced breakpoints now have been mapped precisely, although no consistent breakpoint features immediately were apparent. Comparative analysis of genomic sequences encompassing the murine homologues to the human ABL exons 1b and 1a, as well as regions encompassing the ABL exons 2 and 3, reveals that although there is a high degree of homology in their corresponding exons and promoter regions, these two vertebrate species show a striking lack of homology outside these regions. 122 refs., 5 figs., 4 tabs.

  1. Deep Sequencing of Porphyromonas gingivalis and Comparative Transcriptome Analysis of a LuxS Mutant

    PubMed Central

    Hirano, Takanori; Beck, David A. C.; Demuth, Donald R.; Hackett, Murray; Lamont, Richard J.

    2012-01-01

    Porphyromonas gingivalis is a major etiological agent in chronic and aggressive forms of periodontal disease. The organism is an asaccharolytic anaerobe and is a constituent of mixed species biofilms in a variety of microenvironments in the oral cavity. P. gingivalis expresses a range of virulence factors over which it exerts tight control. High-throughput sequencing technologies provide the opportunity to relate functional genomics to basic biology. In this study we report qualitative and quantitative RNA-Seq analysis of the transcriptome of P. gingivalis. We have also applied RNA-Seq to the transcriptome of a ΔluxS mutant of P. gingivalis deficient in AI-2-mediated bacterial communication. The transcriptome analysis confirmed the expression of all predicted ORFs for strain ATCC 33277, including 854 hypothetical proteins, and allowed the identification of hitherto unknown transcriptional units. Twelve non-coding RNAs were identified, including 11 small RNAs and one cobalamin riboswitch. Fifty-seven genes were differentially regulated in the LuxS mutant. Addition of exogenous synthetic 4,5-dihydroxy-2,3-pentanedione (DPD, AI-2 precursor) to the ΔluxS mutant culture complemented expression of a subset of genes, indicating that LuxS is involved in both AI-2 signaling and non-signaling dependent systems in P. gingivalis. This work provides an important dataset for future study of P. gingivalis pathophysiology and further defines the LuxS regulon in this oral pathogen. PMID:22919670

  2. The genome sequence of Leishmania (Leishmania) amazonensis: functional annotation and extended analysis of gene models.

    PubMed

    Real, Fernando; Vidal, Ramon Oliveira; Carazzolle, Marcelo Falsarella; Mondego, Jorge Maurício Costa; Costa, Gustavo Gilson Lacerda; Herai, Roberto Hirochi; Würtele, Martin; de Carvalho, Lucas Miguel; Carmona e Ferreira, Renata; Mortara, Renato Arruda; Barbiéri, Clara Lucia; Mieczkowski, Piotr; da Silveira, José Franco; Briones, Marcelo Ribeiro da Silva; Pereira, Gonçalo Amarante Guimarães; Bahia, Diana

    2013-12-01

    We present the sequencing and annotation of the Leishmania (Leishmania) amazonensis genome, an etiological agent of human cutaneous leishmaniasis in the Amazon region of Brazil. L. (L.) amazonensis shares features with Leishmania (L.) mexicana but also exhibits unique characteristics regarding geographical distribution and clinical manifestations of cutaneous lesions (e.g. borderline disseminated cutaneous leishmaniasis). Predicted genes were scored for orthologous gene families and conserved domains in comparison with other human pathogenic Leishmania spp. Carboxypeptidase, aminotransferase, and 3'-nucleotidase genes and ATPase, thioredoxin, and chaperone-related domains were represented more abundantly in L. (L.) amazonensis and L. (L.) mexicana species. Phylogenetic analysis revealed that these two species share groups of amastin surface proteins unique to the genus that could be related to specific features of disease outcomes and host cell interactions. Additionally, we describe a hypothetical hybrid interactome of potentially secreted L. (L.) amazonensis proteins and host proteins under the assumption that parasite factors mimic their mammalian counterparts. The model predicts an interaction between an L. (L.) amazonensis heat-shock protein and mammalian Toll-like receptor 9, which is implicated in important immune responses such as cytokine and nitric oxide production. The analysis presented here represents valuable information for future studies of leishmaniasis pathogenicity and treatment. PMID:23857904

  3. Basics of Genome Sequence Analysis in Bioinformatics -- its Fundamental Ideas and Problems

    NASA Astrophysics Data System (ADS)

    Suzuki, Tomonori; Miyazaki, Satoru

    2009-02-01

    Genome sequences are among the most fundamental data in the various omics analyses, and basic bioinformatics tools have been developed to handle them. The first step of genome sequence analysis is to predict or assign "genes" on genome sequences. In the case of eukaryotes, we can identify genes by using full-length cDNA sequences with local alignment tools such as search, blast and fasta. However, it is difficult to capture mRNAs (transcripts) in prokaryotes. Therefore, computational gene prediction is the first choice for starting genome sequence analysis. In this review, we first take up methods for computational gene prediction. Once genes are predicted, the next step is to assign functions to the proteins or RNAs encoded by each gene. How we define the distance between gene sequences is then very important for further analysis, so we describe the basic mathematical concepts for gene comparison. We also introduce our novel concept for biological sequence comparison from the viewpoint of information theory. In the post-genome era, many researchers are interested not only in gene functions but also in gene regulation, the information for which is also carried on genome sequences. Cis-regulatory elements, however, are too short for mathematical rules to be found. Therefore, computationally predicted cis-elements tend to include many false positives. To reduce the ratio of false positives, we need a reliable database of sets of cis-regulatory elements, called cis-regulatory modules, for each gene. We are therefore trying to develop the Cis-Regulatory Elements Module Reference Database. In the third section, we introduce the procedure used to construct the Cis-Regulatory Elements Module Reference Database and its user interfaces.

  4. Reservoir sequence analysis: A new technology for the 90's and its application to oil and gas fields

    SciTech Connect

    Wornardt, W.W.

    1996-08-01

    Reservoir Sequence Analysis, when applied to existing fields, can increase production, prolong the life of the field and extend the field at minimal cost. In this technology we identify reservoir sands in a standard-of-reference well, to establish a seismic sequence stratigraphic well-tie for the entire field. Age date the Maximum Flooding Surfaces and Sequence Boundaries above and below reservoir sands on a well-log and seismic profile and/or workstation using High Resolution Biostratigraphic Analysis, species abundance and diversity histograms and their patterns, and paleoenvironmental paleobathymetric changes. Identify the systems tracts and their corresponding reservoir sands in between age dated Maximum Flooding Surfaces. Interpret the reservoir sands as to type, i.e. IVF, point bar, coastal belt, forced regression, falling stage, bottom-set (shingled) turbidites, slope fan channel, channel overbank, and basin floor fans. Identify and correlate the same individual sands in different wells, and note new sands in a well and sands that shale-out in a well. Correlate the Maximum Flooding Surfaces above and below the reservoir section in additional wells to see which part of the reservoir section and sands have been penetrated. Identify systems tracts in additional wells and construct isopach and sand percent maps of the individual systems tract interval in each well. Correlate sand packages, with a high degree of confidence, from upthrown to downthrown fault blocks, around salt domes, and updip with downdip.

  5. Analysis of loss of decay-heat-removal sequences at Browns Ferry Unit One

    SciTech Connect

    Harrington, R.M.

    1983-01-01

    This paper summarizes the Oak Ridge National Laboratory (ORNL) report Loss of DHR Sequences at Browns Ferry Unit One - Accident Sequence Analysis (NUREG/CR-2973). The Loss of DHR investigation is the third in a series of accident studies concerning the BWR 4 - MK I containment plant design. These studies, sponsored by the Nuclear Regulatory Commission Severe Accident Sequence Analysis (SASA) program, have been conducted at ORNL with the full cooperation of the Tennessee Valley Authority (TVA). The purpose of the SASA studies is to predetermine the probable course of postulated severe accidents so as to establish the timing and the sequence of events. The SASA studies also produce recommendations concerning the implementation of better system design and better emergency operating instructions and operator training. The ORNL studies also include a detailed, best-estimate calculation of the release and transport of radioactive fission products following postulated severe accidents.

  6. Partial N-terminal sequence analysis of human class II molecules expressing the DQw3 determinant.

    PubMed

    Obata, F; Endo, T; Yoshii, M; Otani, F; Igarashi, M; Takenouchi, T; Ikeda, H; Ogasawara, K; Kasahara, M; Wakisaka, A

    1985-09-01

    HLA-DQ molecules were isolated from DRw9-homozygous and DR4-homozygous cell lines by using a monoclonal antibody HU-18, which recognizes class II molecules carrying the conventional DQw3 determinant. The partial N-terminal sequence analysis of the DQw3 molecules revealed that they have sequences homologous to those of murine I-A molecules. Within the limits of our sequence analysis, the DQw3 molecules from the two cell lines are identical to each other in both the alpha and beta chains. The DQ alpha as well as DQ beta chains were found to have amino acid substitutions when compared to other I-A-like molecules whose sequences have been reported. These differences may contribute to the DQw supertypic specificity. The polymorphic nature of DQ molecules is in marked contrast to that of DR molecules where DR alpha chains are highly conserved while DR beta chains have easily detectable amino acid substitutions. PMID:2411700

  7. Multifractal detrended cross-correlation analysis of genome sequences using chaos-game representation

    NASA Astrophysics Data System (ADS)

    Pal, Mayukha; Kiran, V. Satya; Rao, P. Madhusudana; Manimaran, P.

    2016-08-01

    We characterized the multifractal nature and power law cross-correlation between any pair of genome sequences through an integrative approach combining 2D multifractal detrended cross-correlation analysis and chaos game representation. In this paper, we have analyzed genomes of some prokaryotes and calculated fractal spectra h(q) and f(α). From our analysis, we observed the existence of multifractal nature and power law cross-correlation behavior between any pair of genome sequences. Cluster analysis was performed on the calculated scaling exponents to identify the class affiliation, and the result is represented as a dendrogram. We suggest this approach may find applications in next generation sequence analysis, big data analytics etc.
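    The chaos game representation mentioned in this abstract can be illustrated with a minimal sketch. The corner assignment (A, C, G, T on the unit square), the starting point (0.5, 0.5), and the function name cgr_points are assumptions made for illustration; the record does not state which convention the authors used, and the multifractal analysis itself is not reproduced here.

```python
def cgr_points(seq):
    """Map a DNA sequence to 2D chaos-game points (illustrative convention)."""
    # Assumed corner layout; other layouts are equally valid conventions.
    corners = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq.upper():
        if base not in corners:
            continue  # skip ambiguous symbols such as N
        cx, cy = corners[base]
        # Move halfway from the current point towards the base's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


if __name__ == "__main__":
    for point in cgr_points("ACGT"):
        print(point)
```

    Each point encodes the suffix of the sequence read so far, so a 2D histogram of the points gives the kind of image on which a 2D fluctuation analysis could then be run.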

  8. Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Patel, Kamlesh D [Ken]; SNL,

    2013-01-25

    Kamlesh (Ken) Patel from Sandia National Laboratories (Livermore, California) presents "Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June 2012 in Santa Fe, NM.

  9. Single Molecule Sequencing with a HeliScope Genetic Analysis System

    PubMed Central

    Thompson, John F.; Steinmann, Kathleen E.

    2010-01-01

    Helicos™ Single Molecule Sequencing (SMS) provides a unique view of genome biology through direct sequencing of cellular nucleic acids in an unbiased manner, providing both accurate quantitation and sequence information. Sample preparation does not require ligation or PCR amplification, avoiding the GC-content and size biases observed in other technologies. DNA is simply sheared, tailed with poly A, and hybridized to a flow cell surface containing oligo-dT for sequencing-by-synthesis of billions of molecules in parallel. This process also requires far less material than other technologies. Gene expression measurements can be done using 1st-strand cDNA-based methods (RNA-Seq) or using a novel approach that allows direct hybridization and sequencing of cellular RNA for the most direct quantitation possible. A diverse array of applications has been successfully performed, including genome sequencing for accurate variant detection, ChIP-Seq using picogram quantities of DNA, copy number variation studies from both fresh tumor tissue and FFPE tissue samples, sequencing of ancient and degraded DNAs, small RNA studies leading to the identification of new classes of RNAs and the direct capture and sequencing of RNA from cell quantities as few as 250 cells. Because most next generation sequencing technologies require amplification and a specific size range of target molecules, DNAs not meeting those criteria cannot be sequenced in a reliable manner. Single-molecule sequencing does not suffer from those limitations as no amplification is necessary and degraded or modified molecules can be used directly as templates. Principles and methods for using the Helicos® Genetic Analysis System will be discussed. PMID:20890904

  10. Analysis of Developmental Sequences within the Structural Approach: Conceptual, Empirical, and Methodological Considerations.

    ERIC Educational Resources Information Center

    Schroder, Eberhard; Edelstein, Wolfgang

    In this paper conceptual and methodological issues in the analysis of developmental sequences are discussed. Conceptually, the reconstruction of the logic of acquisition calls for the use of task or structure analysis. Methodologically, it calls for an individual-oriented approach, the use of statement calculus for formulation of the postulated…

  11. The nuclear genome of Brachypodium distachyon: analysis of BAC end sequences.

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Due in part to its small genome (~350 Mb), Brachypodium distachyon is emerging as a model system for temperate grasses, including important crops like wheat and barley. We present the analysis of 10.9% of the Brachypodium genome based on 64,696 BAC end sequences (BES). Analysis of repeat DNA content...

  12. A comparison of ARMS and DNA sequencing for mutation analysis in clinical biopsy samples

    PubMed Central

    2010-01-01

    Background: We have compared mutation analysis by DNA sequencing and Amplification Refractory Mutation System™ (ARMS™) for their ability to detect mutations in clinical biopsy specimens. Methods: We have evaluated five real-time ARMS assays: BRAF 1799T>A, [this includes V600E and V600K] and NRAS 182A>G [Q61R] and 181C>A [Q61K] in melanoma, EGFR 2573T>G [L858R], 2235-2249del15 [E746-A750del] in non-small-cell lung cancer, and compared the results to DNA sequencing of the mutation 'hot-spots' in these genes in formalin-fixed paraffin-embedded tumour (FF-PET) DNA. Results: The ARMS assays maximised the number of samples that could be analysed when both the quality and quantity of DNA were low, and improved both the sensitivity and speed of analysis compared with sequencing. ARMS was more robust with fewer reaction failures compared with sequencing and was more sensitive as it was able to detect functional mutations that were not detected by DNA sequencing. DNA sequencing was able to detect a small number of lower frequency recurrent mutations across the exons screened that were not interrogated using the specific ARMS assays in these studies. Conclusions: ARMS was more sensitive and robust at detecting defined somatic mutations than DNA sequencing on clinical samples where the predominant sample type was FF-PET. PMID:20925915

  13. Representative proteomes: a stable, scalable and unbiased proteome set for sequence analysis and functional annotation.

    PubMed

    Chen, Chuming; Natale, Darren A; Finn, Robert D; Huang, Hongzhan; Zhang, Jian; Wu, Cathy H; Mazumder, Raja

    2011-01-01

    The accelerating growth in the number of protein sequences taxes both the computational and manual resources needed to analyze them. One approach to dealing with this problem is to minimize the number of proteins subjected to such analysis in a way that minimizes loss of information. To this end we have developed a set of Representative Proteomes (RPs), each selected from a Representative Proteome Group (RPG) containing similar proteomes calculated based on co-membership in UniRef50 clusters. A Representative Proteome is the proteome that can best represent all the proteomes in its group in terms of the majority of the sequence space and information. RPs at 75%, 55%, 35% and 15% co-membership threshold (CMT) are provided to allow users to decrease or increase the granularity of the sequence space based on their requirements. We find that a CMT of 55% (RP55) most closely follows standard taxonomic classifications. Further analysis of this set reveals that sequence space is reduced by more than 80% relative to UniProtKB, while retaining both sequence diversity (over 95% of InterPro domains) and annotation information (93% of experimentally characterized proteins). All sets can be browsed and are available for sequence similarity searches and download at http://www.proteininformationresource.org/rps, while the set of 637 RPs determined using a 55% CMT are also available for text searches. Potential applications include sequence similarity searches, protein classification and targeted protein annotation and characterization. PMID:21556138

  14. Fluorescence energy transfer dye-labeled primers for DNA sequencing and analysis.

    PubMed Central

    Ju, J; Ruan, C; Fuller, C W; Glazer, A N; Mathies, R A

    1995-01-01

    Fluorescent dye-labeled DNA primers have been developed that exploit fluorescence energy transfer (ET) to optimize the absorption and emission properties of the label. These primers carry a fluorescein derivative at the 5' end as a common donor and other fluorescein and rhodamine derivatives attached to a modified thymidine residue within the primer sequence as acceptors. Adjustment of the donor-acceptor spacing through the placement of the modified thymidine in the primer sequence allowed generation of four primers, all having strong absorption at a common excitation wavelength (488 nm) and fluorescence emission maxima of 525, 555, 580, and 605 nm. The ET efficiency of these primers ranges from 65% to 97%, and they exhibit similar electrophoretic mobilities by gel electrophoresis. With argon-ion laser excitation, the fluorescence of the ET primers and of the DNA sequencing fragments generated with ET primers is 2- to 6-fold greater than that of the corresponding primers or fragments labeled with single dyes. The higher fluorescence intensity of the ET primers allows DNA sequencing with one-fourth of the DNA template typically required when using T7 DNA polymerase. With single-stranded M13mp18 DNA as the template, a typical sequencing reaction with ET primers on a commercial sequencer provided DNA sequences with 99.8% accuracy in the first 500 bases. ET primers should be generally useful in the development of other multiplex DNA sequencing and analysis methods. PMID:7753809

  15. Analysis of simian hemorrhagic fever virus (SHFV) subgenomic RNAs, junction sequences, and 5' leader.

    PubMed

    Zeng, L; Godeny, E K; Methven, S L; Brinton, M A

    1995-03-10

    Full-length simian hemorrhagic fever virus (SHFV) genome RNA (about 15 kb in length) and six subgenomic RNAs, ranging in size from 0.65 to 4.7 kb, were detected by Northern blot hybridization in MA104 cytoplasmic extracts with a 3' genomic antisense probe. The 5' regions of the two smallest subgenomic RNAs (RNAs 6 and 7) were cloned and sequenced. Sequence analysis indicated that these two RNAs contained a common 5' leader sequence joined to the subgenomic RNA bodies via a highly conserved junction sequence; the junction sequence of RNA 7 was 5'-TTAACC-3', while that of RNA 6 was 5'-TCAACC-3'. The complete 5' leader sequence (208 nt) was obtained from genomic RNA. The genomic 5' junction sequence is identical to that of RNA 7. Northern blot hybridization with an antisense 5' leader probe confirmed the presence of the complete leader sequence in all six species of subgenomic RNA. In its virion morphology, genome size, gene order, and replication strategy, SHFV is most similar to viruses such as equine arteritis virus, lactate dehydrogenase-elevating virus, and Lelystad virus/porcine respiratory and reproductive syndrome virus. PMID:7886957

  16. Streptococcus suis Serotypes Characterized by Analysis of Chaperonin 60 Gene Sequences

    PubMed Central

    Brousseau, Ronald; Hill, Janet E.; Préfontaine, Gabrielle; Goh, Swee-Han; Harel, Josée; Hemmingsen, Sean M.

    2001-01-01

    Streptococcus suis is an important pathogen of swine which occasionally infects humans as well. There are 35 serotypes known for this organism, and it would be desirable to develop rapid methods to identify and differentiate the strains of this species. To that effect, partial chaperonin 60 gene sequences were determined for the 35 serotype reference strains of S. suis. Analysis of a pairwise distance matrix showed that the distances ranged from 0 to 0.275 when values were calculated by the maximum-likelihood method. For five of the strains the distances from serotype 1 were greater than 0.1, and for two of these strains the distances were more than 0.25, suggesting that they belong to a different species. Most of the nucleotide differences were silent; alignment of protein sequences showed that there were only 11 distinct sequences for the 35 strains under study. The chaperonin 60 gene phylogenetic tree was similar to the previously published tree based on 16S rRNA sequences, and it was also observed that strains with identical chaperonin 60 gene sequences tended to have identical 16S rRNA sequences. The chaperonin 60 gene sequences provided a higher level of discrimination between serotypes than the 16S rRNA sequences provided and could form the basis for a diagnostic protocol. PMID:11571190

  17. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis

    PubMed Central

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-01-01

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium (MGSAC). The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacchus-3.2.1, but the draft genome still contains 187,214 undetermined gap regions, as well as supercontigs and relatively short contigs that are unmapped to chromosomes. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs, and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled. PMID:26586576
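    The N50 statistic cited in this abstract can be computed as sketched below. This follows the common definition (the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length); the function name n50 and the toy lengths are illustrative and are not taken from the marmoset assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    toy_contigs = [2, 3, 4, 5, 6]  # hypothetical contig lengths
    print(n50(toy_contigs))
```

    Merging contigs raises N50 without adding sequence, because fewer, longer contigs are then needed to reach half of the total length.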

  18. Circular Helix-Like Curve: An Effective Tool of Biological Sequence Analysis and Comparison.

    PubMed

    Li, Yushuang; Xiao, Wenli

    2016-01-01

    This paper constructed a novel injection from a DNA sequence to a 3D graph, named circular helix-like curve (CHC). The presented graphical representation is available for visualizing characterizations of a single DNA sequence and identifying similarities and differences among several DNAs. A 12-dimensional vector extracted from CHC, as a numerical characterization of CHC, was applied to analyze phylogenetic relationships of 11 species, 74 ribosomal RNAs, 48 Hepatitis E viruses, and 18 eutherian mammals, respectively. Successful experiments illustrated that CHC is an effective tool of biological sequence analysis and comparison. PMID:27403205

  19. Circular Helix-Like Curve: An Effective Tool of Biological Sequence Analysis and Comparison

    PubMed Central

    Li, Yushuang

    2016-01-01

    This paper constructed a novel injection from a DNA sequence to a 3D graph, named circular helix-like curve (CHC). The presented graphical representation is available for visualizing characterizations of a single DNA sequence and identifying similarities and differences among several DNAs. A 12-dimensional vector extracted from CHC, as a numerical characterization of CHC, was applied to analyze phylogenetic relationships of 11 species, 74 ribosomal RNAs, 48 Hepatitis E viruses, and 18 eutherian mammals, respectively. Successful experiments illustrated that CHC is an effective tool of biological sequence analysis and comparison. PMID:27403205

  20. ANALYSIS OF DISTRIBUTION FEEDER LOSSES DUE TO ADDITION OF DISTRIBUTED PHOTOVOLTAIC GENERATORS

    SciTech Connect

    Tuffner, Francis K.; Singh, Ruchi

    2011-08-09

    Distributed generators (DG) are small scale power supplying sources owned by customers or utilities and scattered throughout the power system distribution network. Distributed generation can be both renewable and non-renewable. Addition of distributed generation is primarily to increase feeder capacity and to provide peak load reduction. However, this addition comes with several impacts on the distribution feeder. Several studies have shown that addition of DG leads to reduction of feeder loss. However, most of these studies have considered lumped load and distributed load models to analyze the effects on system losses, where the dynamic variation of load due to seasonal changes is ignored. It is very important for utilities to minimize the losses under all scenarios to decrease revenue losses, promote efficient asset utilization, and therefore, increase feeder capacity. This paper will investigate an IEEE 13-node feeder populated with photovoltaic generators on detailed residential houses with water heater, Heating Ventilation and Air conditioning (HVAC) units, lights, and other plug and convenience loads. An analysis of losses for different power system components, such as transformers, underground and overhead lines, and triplex lines, will be performed. The analysis will utilize different seasons and different solar penetration levels (15%, 30%).

  1. Analysis of redox additive-based overcharge protection for rechargeable lithium batteries

    NASA Technical Reports Server (NTRS)

    Narayanan, S. R.; Surampudi, S.; Attia, A. I.; Bankston, C. P.

    1991-01-01

    The overcharge condition in secondary lithium batteries employing redox additives for overcharge protection has been theoretically analyzed in terms of a finite linear diffusion model. The analysis leads to expressions relating the steady-state overcharge current density and cell voltage to the concentration, diffusion coefficient, standard reduction potential of the redox couple, and interelectrode distance. The model permits the estimation of the maximum permissible overcharge rate for any chosen set of system conditions. Digital simulation of the overcharge experiment leads to a numerical representation of the potential transients and an estimate of the influence of diffusion coefficient and interelectrode distance on the transient attainment of the steady state during overcharge. The model has been experimentally verified using 1,1'-dimethylferrocene as a redox additive. The analysis of the experimental results in terms of the theory allows the calculation of the diffusion coefficient and the formal potential of the redox couple. The model and the theoretical results may be exploited in the design and optimization of overcharge protection by the redox additive approach.

  2. FourCSeq: analysis of 4C sequencing data

    PubMed Central

    Klein, Felix A.; Pakozdi, Tibor; Anders, Simon; Ghavi-Helm, Yad; Furlong, Eileen E. M.; Huber, Wolfgang

    2015-01-01

    Motivation: Circularized Chromosome Conformation Capture (4C) is a powerful technique for studying the spatial interactions of a specific genomic region called the ‘viewpoint’ with the rest of the genome, both in a single condition or comparing different experimental conditions or cell types. Observed ligation frequencies typically show a strong, regular dependence on genomic distance from the viewpoint, on top of which specific interaction peaks are superimposed. Here, we address the computational task to find these specific peaks and to detect changes between different biological conditions. Results: We model the overall trend of decreasing interaction frequency with genomic distance by fitting a smooth monotonically decreasing function to suitably transformed count data. Based on the fit, z-scores are calculated from the residuals, and high z-scores are interpreted as peaks providing evidence for specific interactions. To compare different conditions, we normalize fragment counts between samples, and call for differential contact frequencies using the statistical method DESeq2 adapted from RNA-Seq analysis. Availability and implementation: A full end-to-end analysis pipeline is implemented in the R package FourCSeq available at www.bioconductor.org. Contact: felix.klein@embl.de or whuber@embl.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26034064
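    The fit-then-z-score idea in this abstract can be sketched as follows. This is not the FourCSeq implementation: a pool-adjacent-violators step function stands in for the smooth monotone fit, and the residuals are scaled by the median absolute deviation, which is an assumption made here. The function names and the toy counts are hypothetical.

```python
from statistics import median


def monotone_decreasing_fit(values):
    """Least-squares non-increasing fit (pool adjacent violators)."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge while an earlier block mean is lower than the following one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit


def residual_z_scores(values, fit):
    """Scale residuals by a robust spread estimate (1.4826 * MAD)."""
    residuals = [v - f for v, f in zip(values, fit)]
    centre = median(residuals)
    mad = median(abs(r - centre) for r in residuals)
    if mad == 0:
        raise ValueError("residual spread is zero; z-scores are undefined")
    scale = 1.4826 * mad
    return [(r - centre) / scale for r in residuals]


if __name__ == "__main__":
    # Hypothetical transformed counts, ordered by distance from the viewpoint.
    counts = [10.0, 8.0, 9.0, 5.0]
    trend = monotone_decreasing_fit(counts)
    print(trend)
    print(residual_z_scores(counts, trend))
```

    Fragments whose z-score exceeds a chosen cutoff would then be candidate interaction peaks; the cutoff itself is a separate analysis decision.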

  3. Detection of somatic BRCA1/2 mutations in ovarian cancer - next-generation sequencing analysis of 100 cases.

    PubMed

    Koczkowska, Magdalena; Zuk, Monika; Gorczynski, Adam; Ratajska, Magdalena; Lewandowska, Marzena; Biernat, Wojciech; Limon, Janusz; Wasag, Bartosz

    2016-07-01

    The overall prevalence of germline BRCA1/2 mutations is estimated between 11% and 15% of all ovarian cancers. Individuals with germline BRCA1/2 alterations treated with the PARP1 inhibitors (iPARP1) tend to respond better than patients with wild-type BRCA1/2. Somatic BRCA1/2 alterations also induce sensitivity to iPARP1. Therefore, the detection of both germline and somatic BRCA1/2 mutations is required for effective iPARP1 treatment. The aim of this study was to identify the frequency and spectrum of germline and somatic BRCA1/2 alterations in a group of Polish patients with ovarian serous carcinoma. In total, 100 formalin-fixed paraffin-embedded (FFPE) ovarian serous carcinoma tissues were included in the study. Mutational analysis of BRCA1/2 genes was performed by using next-generation sequencing. The presence of pathogenic variants was confirmed by Sanger sequencing. In addition, to confirm the germline or somatic status of the mutation, the nonneoplastic tissue was analyzed by bidirectional Sanger sequencing. In total, 27 (28% of patient samples) mutations (20 in BRCA1 and 7 in BRCA2) were identified. For 22 of 27 patients, nonneoplastic cells were available and sequencing revealed the somatic character of two BRCA1 (2/16; 12.5%) and two BRCA2 (2/6; 33%) mutations. Notably, we identified six novel frameshift or nonsense BRCA1/2 mutations. The heterogeneity of the detected mutations confirms the necessity of simultaneous analysis of BRCA1/2 genes in all patients diagnosed with serous ovarian carcinoma. Moreover, the use of tumor tissue for mutational analysis allowed the detection of both somatic and germline BRCA1/2 mutations. PMID:27167707

  4. Analysis of the full-length genome sequence of papaya lethal yellowing virus (PLYV), determined by deep sequencing, confirms its classification in the genus Sobemovirus.

    PubMed

    Pereira, Alvaro J; Alfenas-Zerbini, Poliane; Cascardo, Renan S; Andrade, Eduardo C; Murilo Zerbini, F

    2012-10-01

    Papaya lethal yellowing virus (PLYV) causes an economically important disease in papayas in northeastern Brazil. Based on biological and molecular properties, PLYV has been tentatively assigned to the genus Sobemovirus. We report the sequence of the full-length genome of a PLYV isolate from Brazil, determined by deep sequencing. The PLYV genome is 4,145 nt long and contains four ORFs, with an arrangement identical to that of sobemoviruses. The polyprotein and CP display significant sequence identity with the corresponding proteins of other sobemoviruses. Pairwise comparisons and phylogenetic analysis based on complete nucleotide sequences confirm the classification of PLYV in the genus Sobemovirus. PMID:22743825

  5. A Bayesian Semi-parametric Approach for the Differential Analysis of Sequence Counts Data.

    PubMed

    Guindani, Michele; Sepúlveda, Nuno; Paulino, Carlos Daniel; Müller, Peter

    2014-04-01

    Data obtained using modern sequencing technologies are often summarized by recording the frequencies of observed sequences. Examples include the analysis of T cell counts in immunological research and studies of gene expression based on counts of RNA fragments. In both cases the items being counted are sequences, of proteins and base pairs, respectively. The resulting sequence-abundance distribution is usually characterized by overdispersion. We propose a Bayesian semi-parametric approach to implement inference for such data. Besides modeling the overdispersion, the approach takes also into account two related sources of bias that are usually associated with sequence counts data: some sequence types may not be recorded during the experiment and the total count may differ from one experiment to another. We illustrate our methodology with two data sets, one regarding the analysis of CD4+ T cell counts in healthy and diabetic mice and another data set concerning the comparison of mRNA fragments recorded in a Serial Analysis of Gene Expression (SAGE) experiment with gastrointestinal tissue of healthy and cancer patients. PMID:24833809

  6. Impregnating unconsolidated pyroclastic sequences: A tool for detailed facies analysis

    NASA Astrophysics Data System (ADS)

    Klapper, Daniel; Kueppers, Ulrich; Castro, Jon M.; Pacheco, Jose M. R.; Dingwell, Donald B.

    2010-05-01

    The interpretation of volcanic eruptions is usually derived from direct observation and the thorough analysis of the deposits. Processes in vent-proximal areas are usually not directly accessible or likely to be obscured. Hence, our understanding of proximal deposits is often limited as they were produced by the simultaneous events stemming from primary eruptive, transportative, and meteorological conditions. Here we present a method that permits for a direct and detailed quasi in-situ investigation of loose pyroclastic units that are usually analysed in the laboratory for their 1) grain-size distribution, 2) componentry, and 3) grain morphology. As the clast assembly is altered during sampling, the genesis of a stratigraphic unit and the relative importance of the above mentioned deposit characteristics is hard to achieve. In an attempt to overcome the possible loss of information during conventional sampling techniques, we impregnated the cleaned surfaces of proximal, unconsolidated units of the 1957-58 Capelinhos eruption on Faial, Azores. During this basaltic, emergent eruption, fluxes in magma rise rate led to a repeated build-up and collapse of tuff cones and consequently to a shift between phreatomagmatic and magmatic eruptive style. The deposits are a succession of generally parallel bedded, cm- to dm-thick layers with a predominantly ashy matrix. The lapilli content is varying gradually; the content of bombs is enriched in discrete layers without clear bomb sags. The sample areas have been cleaned and impregnated with two-component glue (EPOTEK 301). For approx. 10 * 10 cm, a volume of mixed glue of 20 ml was required. Using a syringe, this low-viscosity, transparent glue could be easily applied on the target area. We found that the glue permeated the deposit as deep as 5 mm. After > 24 h, the glue was sufficiently dry to enable the sample to be laid open. This impregnation method renders it possible to cut and polish the sample and investigate grain

  7. Impregnating unconsolidated pyroclastic sequences: A tool for detailed facies analysis

    NASA Astrophysics Data System (ADS)

    Klapper, D.; Kueppers, U.; Castro, J. M.

    2009-12-01

    The interpretation of volcanic eruptions is usually derived from direct observation and the thorough analysis of the deposits. Processes in vent-proximal areas are usually not directly accessible or likely to be obscured. Hence, our understanding of proximal deposits is often limited as they were produced by the simultaneous events stemming from primary eruptive, transportative, and meteorological conditions. Here we present a method that permits for a direct and detailed quasi in-situ investigation of loose pyroclastic units that are usually analysed in the laboratory for their 1) grain-size distribution, 2) componentry, and 3) grain morphology. As the clast assembly is altered during sampling, the genesis of a stratigraphic unit and the relative importance of the above mentioned deposit characteristics is hard to achieve. In an attempt to overcome the possible loss of information during conventional sampling techniques, we impregnated the cleaned surfaces of proximal, unconsolidated units of the 1957-58 Capelinhos eruption on Faial, Azores. During this basaltic, emergent eruption, fluxes in magma rise rate led to a repeated build-up and collapse of tuff cones and consequently to a shift between phreatomagmatic and magmatic eruptive style. The deposits are a succession of generally parallel bedded, cm- to dm-thick layers with a predominantly ashy matrix. The lapilli content is varying gradually; the content of bombs is enriched in discrete layers without clear bomb sags. The sample areas have been cleaned and impregnated with a two-component glue (EPOTEK 301). For approx. 10 * 10 cm, a volume of mixed glue of 20 ml was required. This low-viscosity, transparent glue allowed for an easy application on the target area by means of a syringe and permeated the deposit as deep as 5 mm. After > 24 h, the glue was sufficiently dry to enable the sample to be laid open. This impregnation method renders it possible to cut and polish the sample and investigate grain

  8. In silico comparative analysis of DNA and amino acid sequences for prion protein gene.

    PubMed

    Kim, Y; Lee, J; Lee, C

    2008-01-01

    Genetic variability might contribute to species specificity of prion diseases in various organisms. In this study, structures of the prion protein gene (PRNP) and its amino acids were compared among species for which sequence data were available. Comparisons of PRNP DNA sequences among 12 species including human, chimpanzee, monkey, bovine, ovine, dog, mouse, rat, wallaby, opossum, chicken and zebrafish allowed us to identify candidate regulatory regions in intron 1 and the 3'-untranslated region (UTR) in addition to the coding region. Highly conserved putative binding sites for transcription factors, such as heat shock factor 2 (HSF2) and myocyte enhancer factor 2 (MEF2), were discovered in intron 1. In the 3'-UTR, the functional sequence (ATTAAA) for nucleus-specific polyadenylation was found in all the analysed species. The functional sequence (TTTTTAT) for maturation-specific polyadenylation was observed as an exact match only in ovine, with one or two nucleotide mismatches in the other species. A comparison of the amino acid sequences in 53 species revealed a large sequence identity. In particular, the octapeptide repeat region was observed in all species except frog and zebrafish. Functional changes and susceptibility to prion diseases with various isoforms of prion protein could be caused by numeric variability and conformational changes discovered in the repeat sequences. PMID:18397498
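    The search for polyadenylation signals with exact or near-exact matches, as described in this abstract, can be sketched with a simple sliding-window scan. The function name find_motif and the toy sequences are hypothetical; the record does not say which tool the authors used.

```python
def find_motif(seq, motif, max_mismatches=0):
    """Return (position, window, mismatches) for every window within tolerance."""
    seq = seq.upper()
    motif = motif.upper()
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Hamming distance between the window and the motif.
        mismatches = sum(1 for a, b in zip(window, motif) if a != b)
        if mismatches <= max_mismatches:
            hits.append((i, window, mismatches))
    return hits


if __name__ == "__main__":
    # Toy sequences, not real PRNP 3'-UTR data.
    print(find_motif("GGATTAAAGG", "ATTAAA"))
    print(find_motif("CCTTTTCATCC", "TTTTTAT", max_mismatches=1))
```

    Positions are 0-based. Raising max_mismatches to 2 would correspond to the "one or two nucleotide mismatches" case reported for the non-ovine species.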

  9. Analysis of error-prone survival data under additive hazards models: measurement error effects and adjustments.

    PubMed

    Yan, Ying; Yi, Grace Y

    2016-07-01

    Covariate measurement error occurs commonly in survival analysis. Under the proportional hazards model, measurement error effects have been well studied, and various inference methods have been developed to correct for error effects under such a model. In contrast, error-contaminated survival data under the additive hazards model have received relatively little attention. In this paper, we investigate this problem by exploring measurement error effects on parameter estimation and the change of the hazard function. New insights into measurement error effects are revealed, in contrast to the well-documented results for the Cox proportional hazards model. We propose a class of bias correction estimators that embraces certain existing estimators as special cases. In addition, we exploit the regression calibration method to reduce measurement error effects. Theoretical results for the developed methods are established, and numerical assessments are conducted to illustrate the finite sample performance of our methods. PMID:26328545
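
    For orientation, the two hazard models contrasted in this abstract are commonly written as below. The notation is an assumption for illustration (baseline hazard \lambda_0, covariate vector Z, coefficient vector \beta), and the classical additive error model in the last line is one common way to formalize covariate measurement error; the abstract does not state which error model the authors use.

```latex
% Cox proportional hazards model: covariates act multiplicatively
\lambda(t \mid Z) = \lambda_0(t)\,\exp\!\left(\beta^{\top} Z\right)

% Additive hazards model: covariates shift the baseline hazard additively
\lambda(t \mid Z) = \lambda_0(t) + \beta^{\top} Z

% Assumed classical measurement error: the surrogate W is observed instead of Z
W = Z + \varepsilon, \qquad \mathrm{E}[\varepsilon] = 0, \quad \varepsilon \text{ independent of } Z
```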

  10. In-line image analysis on the effects of additives in batch cooling crystallization

    NASA Astrophysics Data System (ADS)

    Qu, Haiyan; Louhi-Kultanen, Marjatta; Kallas, Juha

    2006-03-01

    The effects of two potassium salt additives, ethylenediaminetetraacetic acid dipotassium salt (EDTA) and potassium pyrophosphate (KPY), on the batch cooling crystallization of potassium dihydrogen phosphate (KDP) were investigated. The crystal growth rates of certain crystal faces were determined from in-line images taken with an MTS particle image analysis (PIA) video microscope. An in-line image processing method was developed to characterize the size and shape of the crystals. The nucleation kinetics were studied by measuring the metastable zone width and induction time. A significant promotion effect on both nucleation and growth of KDP was observed when EDTA was used as an additive. KPY, however, exhibited a strong inhibiting effect. The mechanism underlying the EDTA promotion effect on crystal growth was further studied with the two-dimensional nucleation model. It is shown that the presence of EDTA increased the density of adsorbed molecules of the crystallizing solute on the surface of the crystal.

  11. De novo transcriptome sequencing a