Science.gov

Sample records for addition sequence analysis

  1. Analysis of sequences from field samples reveals the presence of the recently described pepper vein yellows virus (genus Polerovirus) in six additional countries.

    PubMed

    Knierim, Dennis; Tsai, Wen-Shi; Kenyon, Lawrence

    2013-06-01

    Polerovirus infection was detected by reverse transcription polymerase chain reaction (RT-PCR) in 29 pepper plants (Capsicum spp.) and one black nightshade plant (Solanum nigrum) sample collected from fields in India, Indonesia, Mali, Philippines, Thailand and Taiwan. At least two representative samples for each country were selected to generate a general polerovirus RT-PCR product of 1.4 kb length for sequencing. Sequence analysis of the partial genome sequences revealed the presence of pepper vein yellows virus (PeVYV) in all 13 samples. A 1990 Australian herbarium sample of pepper described by serological means as infected with capsicum yellows virus (CYV) was identified by sequence analysis of a partial CP sequence as probably infected with a potato leaf roll virus (PLRV) isolate.

  2. Regulatory sequence analysis tools.

    PubMed

    van Helden, Jacques

    2003-07-01

    The web resource Regulatory Sequence Analysis Tools (RSAT) (http://rsat.ulb.ac.be/rsat) offers a collection of software tools dedicated to the prediction of regulatory sites in non-coding DNA sequences. These tools include sequence retrieval, pattern discovery, pattern matching, genome-scale pattern matching, feature-map drawing, random sequence generation and other utilities. Alternative formats are supported for the representation of regulatory motifs (strings or position-specific scoring matrices) and several algorithms are proposed for pattern discovery. RSAT currently holds >100 fully sequenced genomes and these data are regularly updated from GenBank.

  3. RSAT: regulatory sequence analysis tools.

    PubMed

    Thomas-Chollier, Morgane; Sand, Olivier; Turatsinze, Jean-Valéry; Janky, Rekin's; Defrance, Matthieu; Vervisch, Eric; Brohée, Sylvain; van Helden, Jacques

    2008-07-01

    The regulatory sequence analysis tools (RSAT, http://rsat.ulb.ac.be/rsat/) is a software suite that integrates a wide collection of modular tools for the detection of cis-regulatory elements in genome sequences. The suite includes programs for sequence retrieval, pattern discovery, phylogenetic footprint detection, pattern matching, genome scanning and feature map drawing. Random controls can be performed with random gene selections or by generating random sequences according to a variety of background models (Bernoulli, Markov). Beyond the original word-based pattern-discovery tools (oligo-analysis and dyad-analysis), we recently added a battery of tools for matrix-based detection of cis-acting elements, with some original features (adaptive background models, Markov-chain estimation of P-values) that do not exist in other matrix-based scanning tools. The web server offers an intuitive interface, where each program can be accessed either separately or connected to the other tools. In addition, the tools are now available as web services, enabling their integration in programmatic workflows. Genomes are regularly updated from various genome repositories (NCBI and EnsEMBL) and 682 organisms are currently supported. Since 1998, the tools have been used by several hundred researchers from all over the world. Several predictions made with RSAT were validated experimentally and published.

  4. ISHAN: sequence homology analysis package.

    PubMed

    Shil, Pratip; Dudani, Niraj; Vidyasagar, Pandit B

    2006-01-01

    Sequence-based homology studies play an important role in evolutionary tracing and classification of proteins. Various methods are available to analyze biological sequence information. However, with the advent of the proteomics era, there is a growing demand for analysis of huge amounts of biological sequence information, and it has become necessary to have programs that provide speedy analysis. ISHAN has been developed as a homology analysis package, built on various sequence analysis tools, viz. FASTA, ALIGN, CLUSTALW, PHYLIP and CODONW (for DNA sequences). This Java application offers the user a choice of analysis tools. For testing, ISHAN was applied to perform phylogenetic analysis for sets of Caspase 3 DNA sequences and NF-kappaB p105 amino acid sequences. By integrating several tools, it has made analysis much faster and reduced manual intervention. PMID:17274766

  5. Twin Mitochondrial Sequence Analysis.

    PubMed

    Bouhlal, Yosr; Martinez, Selena; Gong, Henry; Dumas, Kevin; Shieh, Joseph T C

    2013-09-01

    When applying genome-wide sequencing technologies to disease investigation, it is increasingly important to resolve sequence variation in regions of the genome that may have homologous sequences. The human mitochondrial genome challenges interpretation given the potential for heteroplasmy, somatic variation, and homologous nuclear mitochondrial sequences (numts). Identical twins share the same mitochondrial DNA (mtDNA) from early life, but whether the mitochondrial sequence remains similar is unclear. We compared an adult monozygotic twin pair using high-throughput sequencing and evaluated variants with primer extension and mitochondrial pre-enrichment. Thirty-seven variants were shared between the twin individuals, and the variants were verified on the original genomic DNA. These studies support highly identical genetic sequence in this case. Certain low-level variant calls were of high quality and homology to the mitochondrial DNA, and they were further evaluated. When we assessed calls in pre-enriched mitochondrial DNA templates, we found that these may represent numts, which can be differentiated from mtDNA variation. We conclude that twin identity extends to mitochondrial DNA, and it is critical to differentiate between numts and mtDNA in genome sequencing, particularly since significant heteroplasmy could influence genome interpretation. Further studies on mtDNA and numts will aid in understanding how variation occurs and persists. PMID:24040623

  6. Image analysis for DNA sequencing

    NASA Astrophysics Data System (ADS)

    Palaniappan, Kannappan; Huang, Thomas S.

    1991-07-01

    There is a great deal of interest in automating the process of DNA (deoxyribonucleic acid) sequencing to support the analysis of genomic DNA such as the Human and Mouse Genome projects. In one class of gel-based sequencing protocols, autoradiograph images are generated in the final step and usually require manual interpretation to reconstruct the DNA sequence represented by the image. The need to handle a large volume of sequence information necessitates automation of the manual autoradiograph reading step through image analysis in order to reduce the length of time required to obtain sequence data and reduce transcription errors. Various adaptive image enhancement, segmentation and alignment methods were applied to autoradiograph images. The methods are adaptive to the local characteristics of the image such as noise, background signal, or presence of edges. Once the two-dimensional data is converted to a set of aligned one-dimensional profiles, waveform analysis is used to determine the location of each band, which represents one nucleotide in the sequence. Different classification strategies, including a rule-based approach, are investigated to map the profile signals, augmented with the original two-dimensional image data as necessary, to textual DNA sequence information.
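
    The band-location step on a one-dimensional profile can be illustrated with a very simple peak finder. This is only a sketch under stated assumptions: the abstract does not say which waveform analysis was used, so strict local maxima above a threshold stand in for it here, and the function name `find_bands` and the `min_height` parameter are hypothetical.

```python
from typing import List, Sequence


def find_bands(profile: Sequence[float], min_height: float) -> List[int]:
    """Return indices of strict local maxima that reach min_height.

    Each returned index is a candidate band position in one lane profile.
    This is a stand-in for the paper's (unspecified) waveform analysis.
    """
    peaks = []
    for i in range(1, len(profile) - 1):
        is_local_max = profile[i] > profile[i - 1] and profile[i] > profile[i + 1]
        if is_local_max and profile[i] >= min_height:
            peaks.append(i)
    return peaks


# Toy intensity profile along one lane (values invented for illustration).
lane = [0, 1, 5, 1, 0, 2, 7, 2, 0, 1]
bands = find_bands(lane, min_height=3)
```

    A real reader would also need the enhancement, segmentation and lane alignment steps that the abstract describes before any profile like this exists; flat-topped peaks are deliberately ignored by the strict comparison.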

  7. [Multilocus sequence typing (MLST) analysis].

    PubMed

    Matsumura, Yasufumi

    2013-12-01

    Multilocus sequence typing (MLST) analysis has been emerging as a powerful tool for genotyping specific bacterial species. MLST utilizes internal fragments of multiple housekeeping genes, and the combination of alleles defines the sequence type for each isolate. MLST databases contain reference data and are freely accessible via internet websites. The standard method for investigating short-term hospital outbreaks is still pulsed-field gel electrophoresis, and MLST analysis is not a substitute. However, analysis of sequence types and clonal complexes (closely related sequence types) enables identification and understanding of a specific clone that is widely spreading among drug-resistant organisms, or a key clone that is important for evolution of the organism. In the case of Escherichia coli, the CTX-M-15 or CTX-M-14 extended-spectrum beta-lactamase producing ST131 clone has emerged and spread globally in the last 10 years. MLST analysis is an unambiguous procedure and is becoming a common typing method to characterize isolates. PMID:24605545
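
    The idea that a combination of alleles defines a sequence type can be sketched as a table lookup. Everything concrete below is an assumption made for illustration: the locus names, allele numbers and sequence-type numbers are invented and do not come from any real MLST database, and the helper names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical loci and reference table, invented for illustration only.
LOCI = ("locus1", "locus2", "locus3")
PROFILE_TO_ST: Dict[Tuple[int, ...], int] = {
    (1, 1, 1): 1,
    (1, 2, 1): 2,
    (3, 2, 1): 3,
}


def assign_sequence_type(alleles: Dict[str, int]) -> Optional[int]:
    """Return the sequence type for an isolate, or None if the profile is unknown."""
    profile = tuple(alleles[locus] for locus in LOCI)
    return PROFILE_TO_ST.get(profile)


def differing_loci(profile_a: Tuple[int, ...], profile_b: Tuple[int, ...]) -> int:
    """Count the loci at which two allelic profiles differ."""
    return sum(a != b for a, b in zip(profile_a, profile_b))


isolate = {"locus1": 1, "locus2": 2, "locus3": 1}
sequence_type = assign_sequence_type(isolate)
```

    Counting differing loci is one simple way to see how closely two sequence types are related, which is the notion behind the clonal complexes mentioned in the abstract; the actual grouping rules used by MLST databases are not described there.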

  8. Optimizing cancer genome sequencing and analysis

    PubMed Central

    Griffith, Malachi; Miller, Christopher A.; Griffith, Obi L.; Krysiak, Kilannin; Skidmore, Zachary L.; Ramu, Avinash; Walker, Jason R.; Dang, Ha X.; Trani, Lee; Larson, David E.; Demeter, Ryan T.; Wendl, Michael C.; McMichael, Joshua F.; Austin, Rachel E.; Magrini, Vincent; McGrath, Sean D.; Ly, Amy; Kulkarni, Shashikant; Cordes, Matthew G.; Fronick, Catrina C.; Fulton, Robert S.; Maher, Christopher A.; Ding, Li; Klco, Jeffery M.; Mardis, Elaine R.; Ley, Timothy J.; Wilson, Richard K.

    2015-01-01

    Tumors are typically sequenced to depths of 75–100× (exome) or 30–50× (whole genome). We demonstrate that current sequencing paradigms are inadequate for tumors that are impure, aneuploid or clonally heterogeneous. To reassess optimal sequencing strategies, we performed ultra-deep (up to ~312×) whole genome sequencing (WGS) and exome capture (up to ~433×) of a primary acute myeloid leukemia, its subsequent relapse, and a matched normal skin sample. We tested multiple alignment and variant calling algorithms and validated ~200,000 putative SNVs by sequencing them to depths of ~1,000×. Additional targeted sequencing provided over 10,000× coverage and ddPCR assays provided up to ~250,000× sampling of selected sites. We evaluated the effects of different library generation approaches, depth of sequencing, and analysis strategies on the ability to effectively characterize a complex tumor. This dataset, representing the most comprehensively sequenced tumor described to date, will serve as an invaluable community resource (dbGaP accession id phs000159). PMID:26645048

  9. Genome Sequences of Five Additional Brevibacillus laterosporus Bacteriophages

    PubMed Central

    Merrill, Bryan D.; Berg, Jordan A.; Graves, Kiel A.; Ward, Andy T.; Hilton, Jared A.; Wake, Braden N.; Grose, Julianne H.; Breakwell, Donald P.

    2015-01-01

    Brevibacillus laterosporus has been isolated from many different environments, including beehives, and produces compounds that are toxic to many organisms. Five B. laterosporus phages have been isolated previously. Here, we announce five additional phages that infect this bacterium, including the first B. laterosporus siphoviruses to be discovered. PMID:26494658

  10. Conserved DNA sequences adjacent to chromosome fragmentation and telomere addition sites in Euplotes crassus.

    PubMed

    Klobutcher, L A; Gygax, S E; Podoloff, J D; Vermeesch, J R; Price, C M; Tebeau, C M; Jahn, C L

    1998-09-15

    During the formation of a new macronucleus in the ciliate Euplotes crassus, micronuclear chromosomes are reproducibly broken at approximately 10 000 sites. This chromosome fragmentation process is tightly coupled with de novo telomere synthesis by the telomerase ribonucleoprotein complex, generating short linear macronuclear DNA molecules. In this study, the sequences of 58 macronuclear DNA termini and eight regions of the micronuclear genome containing chromosome fragmentation/telomere addition sites were determined. Through a statistically based analysis of these data, along with previously published sequences, we have defined a 10 bp conserved sequence element (E-Cbs, 5'-HATTGAAaHH-3', H = A, C or T) near chromosome fragmentation sites. The E-Cbs typically resides within the DNA destined to form a macronuclear DNA molecule, but can also reside within flanking micronuclear DNA that is eliminated during macronuclear development. The location of the E-Cbs in macronuclear-destined versus flanking micronuclear DNA leads us to propose a model of chromosome fragmentation that involves a 6 bp staggered cut in the chromosome. The identification of adjacent macronuclear-destined sequences that overlap by 6 bp provides support for the model. Finally, our data provide evidence that telomerase is able to differentiate between newly generated ends that contain partial telomeric repeats and those that do not in vivo.
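
    A consensus such as 5'-HATTGAAaHH-3' can be searched for by expanding the degenerate symbol into a character class. The sketch below is illustrative only: it treats the lowercase "a" as a plain A (the abstract does not define the lowercase notation, so this is an assumption), and the function names are hypothetical rather than taken from the study.

```python
import re
from typing import List

# H = A, C or T, as stated in the abstract; other bases match themselves.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "H": "[ACT]"}


def consensus_to_regex(consensus: str) -> str:
    """Expand a consensus string into a regular expression.

    Assumption: lowercase letters are handled like uppercase ones.
    """
    return "".join(IUPAC[base] for base in consensus.upper())


ECBS_REGEX = consensus_to_regex("HATTGAAaHH")


def find_ecbs(sequence: str) -> List[int]:
    """Return 0-based start positions of all matches, including overlapping ones."""
    # A lookahead lets matches that overlap each other all be reported.
    lookahead = re.compile("(?=" + ECBS_REGEX + ")")
    return [m.start() for m in lookahead.finditer(sequence.upper())]


hits = find_ecbs("GGCATTGAAACTGG")
```

    This scans one strand only; a search on real data would presumably also check the reverse complement, and an exact-match pattern says nothing about the statistical analysis the authors used to define the element.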

  11. Nonlinear analysis of biological sequences

    SciTech Connect

    Torney, D.C.; Bruno, W.; Detours, V.

    1998-11-01

    This is the final report of a three-year Laboratory Directed Research and Development (LDRD) project at the Los Alamos National Laboratory (LANL). The main objectives of this project involved deriving new capabilities for analyzing biological sequences. The authors focused on tabulating the statistical properties exhibited by human coding DNA sequences and on techniques for inferring the phylogenetic relationships among protein sequences related by descent.

  12. Comparative Analysis of Genome Sequences with VISTA

    DOE Data Explorer

    Dubchak, Inna

    VISTA is a comprehensive suite of programs and databases developed by and hosted at the Genomics Division of Lawrence Berkeley National Laboratory. They provide information and tools designed to facilitate comparative analysis of genomic sequences. Users have two ways to interact with the suite of applications at the VISTA portal. They can submit their own sequences and alignments for analysis (VISTA servers) or examine pre-computed whole-genome alignments of different species. A key menu option is the Enhancer Browser and Database at http://enhancer.lbl.gov/. The VISTA Enhancer Browser is a central resource for experimentally validated human noncoding fragments with gene enhancer activity as assessed in transgenic mice. Most of these noncoding elements were selected for testing based on their extreme conservation with other vertebrates. The results of this enhancer screen are provided through this publicly available website. The browser also features relevant results by external contributors and a large collection of additional genome-wide conserved noncoding elements which are candidate enhancer sequences. The LBL developers invite external groups to submit computational predictions of developmental enhancers. As of 10/19/2009 the database contains information on 1109 in vivo tested elements - 508 elements with enhancer activity.

  13. Analysis of human collagen sequences.

    PubMed

    Nassa, Manisha; Anand, Pracheta; Jain, Aditi; Chhabra, Aastha; Jaiswal, Astha; Malhotra, Umang; Rani, Vibha

    2012-01-01

    The extracellular matrix is fast emerging as an important component mediating cell-cell interactions, along with its established role as a scaffold for cell support. Collagen, being the principal component of the extracellular matrix, has been implicated in a number of pathological conditions. However, collagens are complex protein structures belonging to a large family consisting of 28 members in humans; hence, there exists a lack of in-depth information about their structural features. Annotating and appreciating the functions of these proteins is possible with the help of the numerous biocomputational tools that are currently available. This study reports a comparative analysis and characterization of the alpha-1 chain of human collagen sequences. Physico-chemical, secondary structural, functional and phylogenetic classification was carried out, based on which collagens 12, 14 and 20, which belong to the FACIT collagen family, have been identified as potential players in diseased conditions, owing to certain atypical properties such as very high aliphatic index, low percentage of glycine and proline residues and their proximity in evolutionary history. These collagen molecules might be important candidates to be investigated further for their role in skeletal disorders. PMID:22359431

  14. Image sequence analysis workstation for multipoint motion analysis

    NASA Astrophysics Data System (ADS)

    Mostafavi, Hassan

    1990-08-01

    This paper describes an application-specific engineering workstation designed and developed to analyze motion of objects from video sequences. The system combines the software and hardware environment of a modern graphic-oriented workstation with digital image acquisition, processing and display techniques. In addition to automation and increase in throughput of data reduction tasks, the objective of the system is to provide less invasive methods of measurement by offering the ability to track objects that are more complex than reflective markers. Grey-level image processing and spatial/temporal adaptation of the processing parameters are used for location and tracking of more complex features of objects under uncontrolled lighting and background conditions. The applications of such an automated and noninvasive measurement tool include analysis of the trajectory and attitude of rigid bodies such as human limbs, robots, aircraft in flight, etc. The system's key features are: 1) acquisition and storage of image sequences by digitizing and storing real-time video; 2) computer-controlled movie loop playback, freeze frame display, and digital image enhancement; 3) multiple leading edge tracking in addition to object centroids at up to 60 fields per second from either live input video or a stored image sequence; 4) model-based estimation and tracking of the six degrees of freedom of a rigid body; 5) field-of-view and spatial calibration; 6) image sequence and measurement database management; and 7) offline analysis software for trajectory plotting and statistical analysis.

  15. Complete Plastome Sequences from Glycine syndetika and Six Additional Perennial Wild Relatives of Soybean

    PubMed Central

    Sherman-Broyles, Sue; Bombarely, Aureliano; Grimwood, Jane; Schmutz, Jeremy; Doyle, Jeff

    2014-01-01

    Organelle sequences have a long history of utility in phylogenetic analyses. Chloroplast sequences, when combined with nuclear data, can help resolve relationships among flowering plant genera, and within genera incongruence can point to reticulate evolution. Plastome sequences are becoming plentiful because they are increasingly easy to obtain. Complete plastome sequences allow us to detect rare rearrangements and test the tempo of sequence evolution. Chloroplast sequences are generally considered a nuisance to be kept to a minimum in bacterial artificial chromosome libraries. Here, we sequenced two bacterial artificial chromosomes per species to generate complete plastome sequences from seven species. The plastome sequences from Glycine syndetika and six other perennial Glycine species are similar in arrangement and gene content to the previously published soybean plastome. Repetitive sequences were detected in high frequencies as in soybean, but further analysis showed that repeat sequence numbers are inflated. Previous chloroplast-based phylogenetic trees for perennial Glycine were incongruent with nuclear gene–based phylogenetic trees. We tested whether the hypothesis of introgression was supported by the complete plastomes. Alignment of complete plastome sequences and Bayesian analysis allowed us to date putative hybridization events supporting the hypothesis of introgression and chloroplast “capture.” PMID:25155272

  16. Phylogenetic Analysis of Poliovirus Sequences.

    PubMed

    Jorba, Jaume

    2016-01-01

    Comparative genomic sequencing is a major surveillance tool in the Polio Laboratory Network. Due to the rapid evolution of polioviruses (~1% per year), pathways of virus transmission can be reconstructed from the pathways of genomic evolution. Here, we describe three main phylogenetic methods: estimation of genetic distances, reconstruction of a maximum-likelihood (ML) tree, and estimation of substitution rates using Bayesian Markov chain Monte Carlo (MCMC). The data set used consists of complete capsid sequences from a survey of poliovirus sequences available in GenBank. PMID:26983737
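
    The first of these methods, estimation of genetic distances, can be illustrated with the simplest possible case. The abstract does not say which distance model the chapter uses, so the sketch below assumes the uncorrected p-distance and the Jukes-Cantor correction, $d = -\tfrac{3}{4}\ln\left(1 - \tfrac{4}{3}p\right)$, purely as textbook examples; the function names are hypothetical.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed proportion p."""
    if p >= 0.75:
        raise ValueError("distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Two short aligned toy sequences that differ at one of ten sites.
p = p_distance("ACGTACGTAC", "ACGTACGTAT")
d = jukes_cantor(p)
```

    The correction matters more as sequences diverge, because repeated substitutions at the same site make the raw proportion underestimate the true number of changes.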

  17. Genome Sequencing and Analysis Conference IV

    SciTech Connect

    Not Available

    1993-12-31

    J. Craig Venter and C. Thomas Caskey co-chaired Genome Sequencing and Analysis Conference IV, held at Hilton Head, South Carolina, from September 26-30, 1992. Venter opened the conference by noting that approximately 400 researchers from 16 nations were present, four times as many participants as at Genome Sequencing Conference I in 1989. Venter also introduced the Data Fair, a new component of the conference allowing exchange and on-site computer analysis of unpublished sequence data.

  18. Phylogenetic analysis of adenovirus sequences.

    PubMed

    Harrach, Balázs; Benko, Mária

    2007-01-01

    Members of the family Adenoviridae have been isolated from a large variety of hosts, including representatives from every major vertebrate class from fish to mammals. The high prevalence, together with the fairly conserved organization of the central part of their genomes, make the adenoviruses one of (if not the) best models for studying viral evolution on a larger time scale. Phylogenetic calculation can infer the evolutionary distance among adenovirus strains on serotype, species, and genus levels, thus helping the establishment of a correct taxonomy on the one hand, and speeding up the process of typing new isolates on the other. Initially, four major lineages corresponding to four genera were recognized. Later, the demarcation criteria of lower taxon levels, such as species or types, could also be defined with phylogenetic calculations. A limited number of possible host switches have been hypothesized and convincingly supported. Application of the web-based BLAST and MultAlin programs and the freely available PHYLIP package, along with the TreeView program, enables everyone to make correct calculations. In addition to step-by-step instruction on how to perform phylogenetic analysis, critical points where typical mistakes or misinterpretation of the results might occur will be identified and hints for their avoidance will be provided. PMID:17656792

  20. Analysis of DNA Sequence Variants Detected by High Throughput Sequencing

    PubMed Central

    Adams, David R; Sincan, Murat; Fajardo, Karin Fuentes; Mullikin, James C; Pierson, Tyler M; Toro, Camilo; Boerkoel, Cornelius F; Tifft, Cynthia J; Gahl, William A; Markello, Tom C

    2014-01-01

    The Undiagnosed Diseases Program at the National Institutes of Health uses High Throughput Sequencing (HTS) to diagnose rare and novel diseases. HTS techniques generate large numbers of DNA sequence variants, which must be analyzed and filtered to find candidates for disease causation. Despite the publication of an increasing number of successful exome-based projects, there has been little formal discussion of the analytic steps applied to HTS variant lists. We present the results of our experience with over 30 families for whom HTS sequencing was used in an attempt to find clinical diagnoses. For each family, exome sequence was augmented with high-density SNP-array data. We present a discussion of the theory and practical application of each analytic step and provide example data to illustrate our approach. The paper is designed to provide an analytic roadmap for variant analysis, thereby enabling a wide range of researchers and clinical genetics practitioners to perform direct analysis of HTS data for their patients and projects. PMID:22290882

  1. Bayesian Correlation Analysis for Sequence Count Data

    PubMed Central

    Lau, Nelson; Perkins, Theodore J.

    2016-01-01

    Evaluating the similarity of different measured variables is a fundamental task of statistics, and a key part of many bioinformatics algorithms. Here we propose a Bayesian scheme for estimating the correlation between different entities’ measurements based on high-throughput sequencing data. These entities could be different genes or miRNAs whose expression is measured by RNA-seq, different transcription factors or histone marks whose expression is measured by ChIP-seq, or even combinations of different types of entities. Our Bayesian formulation accounts for both measured signal levels and uncertainty in those levels, due to varying sequencing depth in different experiments and to varying absolute levels of individual entities, both of which affect the precision of the measurements. In comparison with a traditional Pearson correlation analysis, we show that our Bayesian correlation analysis retains high correlations when measurement confidence is high, but suppresses correlations when measurement confidence is low—especially for entities with low signal levels. In addition, we consider the influence of priors on the Bayesian correlation estimate. Perhaps surprisingly, we show that naive, uniform priors on entities’ signal levels can lead to highly biased correlation estimates, particularly when different experiments have widely varying sequencing depths. However, we propose two alternative priors that provably mitigate this problem. We also prove that, like traditional Pearson correlation, our Bayesian correlation calculation constitutes a kernel in the machine learning sense, and thus can be used as a similarity measure in any kernel-based machine learning algorithm. We demonstrate our approach on two RNA-seq datasets and one miRNA-seq dataset. PMID:27701449

  2. Analysis and Annotation of Nucleic Acid Sequence

    SciTech Connect

    States, David J.

    2004-07-28

    The aims of this project were to develop improved methods for computational genome annotation and to apply these methods to improve the annotation of genomic sequence data with a specific focus on human genome sequencing. The project resulted in a substantial body of published work. Notable contributions of this project were the identification of basecalling and lane tracking as error processes in genome sequencing and contributions to improved methods for these steps in genome sequencing. This technology improved the accuracy and throughput of genome sequence analysis. Probabilistic methods for physical map construction were developed. Improved methods for sequence alignment, alternative splicing analysis, promoter identification and NF kappa B response gene prediction were also developed.

  3. Fractal analysis of DNA sequence data

    SciTech Connect

    Berthelsen, C.L.

    1993-01-01

    DNA sequence databases are growing at an almost exponential rate. New analysis methods are needed to extract knowledge about the organization of nucleotides from this vast amount of data. Fractal analysis is a new scientific paradigm that has been used successfully in many domains including the biological and physical sciences. Biological growth is a nonlinear dynamic process and some have suggested that to consider fractal geometry as a biological design principle may be most productive. This research is an exploratory study of the application of fractal analysis to DNA sequence data. A simple random fractal, the random walk, is used to represent DNA sequences. The fractal dimension of these walks is then estimated using the "sandbox method." Analysis of 164 human DNA sequences compared to three types of control sequences (random, base-content matched, and dimer-content matched) reveals that long-range correlations are present in DNA that are not explained by base or dimer frequencies. The study also revealed that the fractal dimension of coding sequences was significantly lower than sequences that were primarily noncoding, indicating the presence of longer-range correlations in functional sequences. The multifractal spectrum is used to analyze fractals that are heterogeneous and have a different fractal dimension for subsets with different scalings. The multifractal spectrum of the random walks of twelve mitochondrial genome sequences was estimated. Eight vertebrate mtDNA sequences had uniformly lower spectra values than did four invertebrate mtDNA sequences. Thus, vertebrate mitochondria show significantly longer-range correlations than do invertebrate mitochondria. The higher multifractal spectra values for invertebrate mitochondria suggest a more random organization of the sequences. This research also includes considerable theoretical work on the effects of finite size, embedding dimension, and scaling ranges.
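
    Representing a DNA sequence as a random walk can be sketched in a few lines. The abstract does not state how bases are mapped to steps, so the purine/pyrimidine rule below (A or G steps up, C or T steps down) is an assumption borrowed from common practice, and `dna_walk` is a hypothetical name. Estimating the fractal dimension with the sandbox method is not shown.

```python
from itertools import accumulate
from typing import List

# Assumed step rule: purines (A, G) go up, pyrimidines (C, T) go down.
STEP = {"A": 1, "G": 1, "C": -1, "T": -1}


def dna_walk(sequence: str) -> List[int]:
    """Return the cumulative walk position after each base.

    Characters other than A, C, G and T are skipped.
    """
    steps = [STEP[base] for base in sequence.upper() if base in STEP]
    return list(accumulate(steps))


walk = dna_walk("AGCTTA")
```

    Long-range correlations of the kind reported in the abstract would show up as excursions of the walk that persist over many bases, in contrast to the behaviour of a walk built from a shuffled control sequence.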

  4. Fractal Analysis of DNA Sequence Data

    NASA Astrophysics Data System (ADS)

    Berthelsen, Cheryl Lynn

    DNA sequence databases are growing at an almost exponential rate. New analysis methods are needed to extract knowledge about the organization of nucleotides from this vast amount of data. Fractal analysis is a new scientific paradigm that has been used successfully in many domains including the biological and physical sciences. Biological growth is a nonlinear dynamic process and some have suggested that to consider fractal geometry as a biological design principle may be most productive. This research is an exploratory study of the application of fractal analysis to DNA sequence data. A simple random fractal, the random walk, is used to represent DNA sequences. The fractal dimension of these walks is then estimated using the "sandbox method." Analysis of 164 human DNA sequences compared to three types of control sequences (random, base-content matched, and dimer-content matched) reveals that long-range correlations are present in DNA that are not explained by base or dimer frequencies. The study also revealed that the fractal dimension of coding sequences was significantly lower than sequences that were primarily noncoding, indicating the presence of longer-range correlations in functional sequences. The multifractal spectrum is used to analyze fractals that are heterogeneous and have a different fractal dimension for subsets with different scalings. The multifractal spectrum of the random walks of twelve mitochondrial genome sequences was estimated. Eight vertebrate mtDNA sequences had uniformly lower spectra values than did four invertebrate mtDNA sequences. Thus, vertebrate mitochondria show significantly longer-range correlations than do invertebrate mitochondria. The higher multifractal spectra values for invertebrate mitochondria suggest a more random organization of the sequences. This research also includes considerable theoretical work on the effects of finite size, embedding dimension, and scaling ranges.

  5. Additional EIPC Study Analysis. Final Report

    SciTech Connect

    Hadley, Stanton W; Gotham, Douglas J.; Luciani, Ralph L.

    2014-12-01

    Between 2010 and 2012 the Eastern Interconnection Planning Collaborative (EIPC) conducted a major long-term resource and transmission study of the Eastern Interconnection (EI). With guidance from a Stakeholder Steering Committee (SSC) that included representatives from the Eastern Interconnection States Planning Council (EISPC) among others, the project was conducted in two phases. Phase 1 involved a long-term capacity expansion analysis that involved creation of eight major futures plus 72 sensitivities. Three scenarios were selected for more extensive transmission-focused evaluation in Phase 2. Five power flow analyses, nine production cost model runs (including six sensitivities), and three capital cost estimations were developed during this second phase. The results from Phase 1 and 2 provided a wealth of data that could be examined further to address energy-related questions. A list of 14 topics was developed for further analysis. This paper brings together the earlier interim reports of the first 13 topics plus one additional topic into a single final report.

  6. Autonomous replication and addition of telomerelike sequences to DNA microinjected into Paramecium tetraurelia macronuclei.

    PubMed Central

    Gilley, D; Preer, J R; Aufderheide, K J; Polisky, B

    1988-01-01

    Paramecium tetraurelia can be transformed by microinjection of cloned serotype A gene sequences into the macronucleus. Transformants are detected by their ability to express serotype A surface antigen from the injected templates. After injection, the DNA is converted from a supercoiled form to a linear form by cleavage at nonrandom sites. The linear form appears to replicate autonomously as a unit-length molecule and is present in transformants at high copy number. The injected DNA is further processed by the addition of Paramecium-type telomeric sequences to the termini of the linear DNA. To examine the fate of injected linear DNA molecules, plasmid pSA14SB DNA containing the A gene was cleaved into two linear pieces, a 14-kilobase (kb) piece containing the A gene and flanking sequences and a 2.2-kb piece consisting of the procaryotic vector. In transformants expressing the A gene, we observed that two linear DNA species were present which correspond to the two species injected. Both species had Paramecium telomerelike sequences added to their termini. For the 2.2-kb DNA, we show that the site of addition of the telomerelike sequences is directly at one terminus and within one nucleotide of the other terminus. These results indicate that injected procaryotic DNA is capable of autonomous replication in Paramecium macronuclei and that telomeric addition in the macronucleus does not require specific recognition sequences. PMID:3211128

  7. Whole-Genome Sequencing in Outbreak Analysis

    PubMed Central

    Turner, Stephen D.; Riley, Margaret F.; Petri, William A.; Hewlett, Erik L.

    2015-01-01

    SUMMARY In addition to the ever-present concern of medical professionals about epidemics of infectious diseases, the relative ease of access and low cost of obtaining, producing, and disseminating pathogenic organisms or biological toxins mean that bioterrorism activity should also be considered when facing a disease outbreak. Utilization of whole-genome sequencing (WGS) in outbreak analysis facilitates the rapid and accurate identification of virulence factors of the pathogen and can be used to identify the path of disease transmission within a population and provide information on the probable source. Molecular tools such as WGS are being refined and advanced at a rapid pace to provide robust and higher-resolution methods for identifying, comparing, and classifying pathogenic organisms. If these methods of pathogen characterization are properly applied, they will enable an improved public health response whether a disease outbreak was initiated by natural events or by accidental or deliberate human activity. The current application of next-generation sequencing (NGS) technology to microbial WGS and microbial forensics is reviewed. PMID:25876885

  8. Whole-genome sequencing in outbreak analysis.

    PubMed

    Gilchrist, Carol A; Turner, Stephen D; Riley, Margaret F; Petri, William A; Hewlett, Erik L

    2015-07-01

    In addition to the ever-present concern of medical professionals about epidemics of infectious diseases, the relative ease of access and low cost of obtaining, producing, and disseminating pathogenic organisms or biological toxins mean that bioterrorism activity should also be considered when facing a disease outbreak. Utilization of whole-genome sequencing (WGS) in outbreak analysis facilitates the rapid and accurate identification of virulence factors of the pathogen and can be used to identify the path of disease transmission within a population and provide information on the probable source. Molecular tools such as WGS are being refined and advanced at a rapid pace to provide robust and higher-resolution methods for identifying, comparing, and classifying pathogenic organisms. If these methods of pathogen characterization are properly applied, they will enable an improved public health response whether a disease outbreak was initiated by natural events or by accidental or deliberate human activity. The current application of next-generation sequencing (NGS) technology to microbial WGS and microbial forensics is reviewed. PMID:25876885

  9. Sequence and Phylogenetic Analysis of FAD Synthetase

    NASA Astrophysics Data System (ADS)

    Schubert, Luisa; Frago, Susana; Martínez-Júlvez, Marta; Medina, Milagros

    2006-08-01

    An evolutionary analysis of the sequences available to date for FAD synthetases has been carried out. Several identical conserved residues have been observed along the sequences of all the FAD synthetases analyzed, which might correlate with a role for these residues in the catalytic activity of the enzyme. Phylogenetic analysis shows that FAD synthetase sequences can be organized into two main clusters. One of them mainly contains temperature-, pressure- or pH-resistant organisms, whereas the other contains organisms with pathogenic character.

  10. Whole exome sequence analysis of Peters anomaly

    PubMed Central

    Weh, Eric; Reis, Linda M.; Happ, Hannah C.; Levin, Alex V.; Wheeler, Patricia G.; David, Karen L.; Carney, Erin; Angle, Brad; Hauser, Natalie

    2015-01-01

    Peters anomaly is a rare form of anterior segment ocular dysgenesis, which can also be associated with additional systemic defects. At this time, the majority of cases of Peters anomaly lack a genetic diagnosis. We performed whole exome sequencing of 27 patients with syndromic or isolated Peters anomaly to search for pathogenic mutations in currently known ocular genes. Among the eight previously recognized Peters anomaly genes, we identified a de novo missense mutation in PAX6, c.155G>A, p.(Cys52Tyr), in one patient. Analysis of 691 additional genes currently associated with a different ocular phenotype identified a heterozygous splicing mutation c.1025+2T>A in TFAP2A, a de novo heterozygous nonsense mutation c.715C>T, p.(Gln239*) in HCCS, a hemizygous mutation c.385G>A, p.(Glu129Lys) in NDP, a hemizygous mutation c.3446C>T, p.(Pro1149Leu) in FLNA, and compound heterozygous mutations c.1422T>A, p.(Tyr474*) and c.2544G>A, p.(Met848Ile) in SLC4A11; all mutations, except for the FLNA and SLC4A11 c.2544G>A alleles, are novel. This is the first study to use whole exome sequencing to discern the genetic etiology of a large cohort of patients with syndromic or isolated Peters anomaly. We report five new genes associated with this condition and suggest screening of TFAP2A and FLNA in patients with Peters anomaly and relevant syndromic features and HCCS, NDP and SLC4A11 in patients with isolated Peters anomaly. PMID:25182519

  11. Laser Desorption Mass Spectrometry for DNA Sequencing and Analysis

    NASA Astrophysics Data System (ADS)

    Chen, C. H. Winston; Taranenko, N. I.; Golovlev, V. V.; Isola, N. R.; Allman, S. L.

    1998-03-01

    Rapid DNA sequencing and/or analysis is critically important for biomedical research. In the past, gel electrophoresis has been the primary tool to achieve DNA analysis and sequencing. However, gel electrophoresis is a time-consuming and labor-intensive process. Recently, we have developed and used laser desorption mass spectrometry (LDMS) to achieve sequencing of ss-DNA longer than 100 nucleotides. With LDMS, we succeeded in sequencing DNA in seconds instead of the hours or days required by gel electrophoresis. In addition to sequencing, we also applied LDMS for the detection of DNA probes for hybridization. LDMS was also used to detect short tandem repeats for forensic applications. Clinical applications for disease diagnosis such as cystic fibrosis caused by base deletion and point mutation have also been demonstrated. Experimental details will be presented in the meeting.

  12. A sea urchin genome project: sequence scan, virtual map, and additional resources.

    PubMed

    Cameron, R A; Mahairas, G; Rast, J P; Martinez, P; Biondi, T R; Swartzell, S; Wallace, J C; Poustka, A J; Livingston, B T; Wray, G A; Ettensohn, C A; Lehrach, H; Britten, R J; Davidson, E H; Hood, L

    2000-08-15

    Results of a first-stage Sea Urchin Genome Project are summarized here. The species chosen was Strongylocentrotus purpuratus, a research model of major importance in developmental and molecular biology. A virtual map of the genome was constructed by sequencing the ends of 76,020 bacterial artificial chromosome (BAC) recombinants (average length, 125 kb). The BAC-end sequence tag connectors (STCs) occur an average of 10 kb apart, and, together with restriction digest patterns recorded for the same BAC clones, they provide immediate access to contigs of several hundred kilobases surrounding any gene of interest. The STCs survey >5% of the genome and provide the estimate that this genome contains approximately 27,350 protein-coding genes. The frequency distribution and canonical sequences of all middle and highly repetitive sequence families in the genome were obtained from the STCs as well. The 500-kb Hox gene complex of this species is being sequenced in its entirety. In addition, arrayed cDNA libraries of >10^5 clones each were constructed from every major stage of embryogenesis, several individual cell types, and adult tissues and are available to the community. The accumulated STC data and an expanding expressed sequence tag database (at present including >12,000 sequences) have been reported to GenBank and are accessible on public web sites.

  13. Quality Control and Analysis of NGS RNA Sequencing Data.

    PubMed

    Quinn, Emma M; McManus, Ross

    2015-01-01

    Transcriptome sequencing, where RNA is isolated, converted to a library of cDNA fragments, and sequenced using next-generation sequencing technology, has become the method of choice for the genome-wide characterization of mRNA levels. It offers a more accurate quantification of transcript levels than array-based methods, and has the added benefit of allowing the discovery of novel genes/transcripts, alternative splice junctions, and novel RNAs. In addition, RNA sequencing may be used to investigate differential gene expression, allelic imbalance, eQTL mapping, RNA editing, RNA-protein interactions, and alternative splicing. A number of statistical methods and tools are available for differential expression analysis using RNA sequencing data and these are continually being developed and improved to handle more complex experimental designs. This chapter describes an example workflow for the quality control and analysis of raw RNA sequencing reads for the purposes of differential gene expression analysis, followed by pathway/enrichment analysis of significantly different genes. The methods and tools described are just one example of how this analysis can be conducted, but they can be applied to most standard RNA sequencing studies of differential gene expression. The methods covered are based on Illumina HiSeq single-end 50 bp reads. However, all programs used are capable of working with paired-end data, subsequent to minor adaptations.

  14. Auditory sequence analysis and phonological skill.

    PubMed

    Grube, Manon; Kumar, Sukhbinder; Cooper, Freya E; Turton, Stuart; Griffiths, Timothy D

    2012-11-01

    This work tests the relationship between auditory and phonological skill in a non-selected cohort of 238 school students (age 11) with the specific hypothesis that sound-sequence analysis would be more relevant to phonological skill than the analysis of basic, single sounds. Auditory processing was assessed across the domains of pitch, time and timbre; a combination of six standard tests of literacy and language ability was used to assess phonological skill. A significant correlation between general auditory and phonological skill was demonstrated, plus a significant, specific correlation between measures of phonological skill and the auditory analysis of short sequences in pitch and time. The data support a limited but significant link between auditory and phonological ability with a specific role for sound-sequence analysis, and provide a possible new focus for auditory training strategies to aid language development in early adolescence. PMID:22951739

  15. Sequencing and Analysis of Neanderthal Genomic DNA

    PubMed Central

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith, Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Pääbo, Svante; Pritchard, Jonathan K.; Rubin, Edward M.

    2008-01-01

    Our knowledge of Neanderthals is based on a limited number of remains and artifacts from which we must make inferences about their biology, behavior, and relationship to ourselves. Here, we describe the characterization of these extinct hominids from a new perspective, based on the development of a Neanderthal metagenomic library and its high-throughput sequencing and analysis. Several lines of evidence indicate that the 65,250 base pairs of hominid sequence so far identified in the library are of Neanderthal origin, the strongest being the ascertainment of sequence identities between Neanderthal and chimpanzee at sites where the human genomic sequence is different. These results enabled us to calculate the human-Neanderthal divergence time based on multiple randomly distributed autosomal loci. Our analyses suggest that on average the Neanderthal genomic sequence we obtained and the reference human genome sequence share a most recent common ancestor ~706,000 years ago, and that the human and Neanderthal ancestral populations split ~370,000 years ago, before the emergence of anatomically modern humans. Our finding that the Neanderthal and human genomes are at least 99.5% identical led us to develop and successfully implement a targeted method for recovering specific ancient DNA sequences from metagenomic libraries. This initial analysis of the Neanderthal genome advances our understanding of the evolutionary relationship of Homo sapiens and Homo neanderthalensis and signifies the dawn of Neanderthal genomics. PMID:17110569

  17. RSAT 2015: Regulatory Sequence Analysis Tools

    PubMed Central

    Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A.; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M.; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2015-01-01

    RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. PMID:25904632

  18. RSAT 2015: Regulatory Sequence Analysis Tools.

    PubMed

    Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2015-07-01

    RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. PMID:25904632

  20. Sequence analysis by iterated maps, a review.

    PubMed

    Almeida, Jonas S

    2014-05-01

    Among alignment-free methods, Iterated Maps (IMs) are on a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, 'Chaos Game Representation'. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in the same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized for non-genomic alphabets as well as an interest in its use for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there is a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage appears to be now set for a recasting of IMs with a central role in processing nextgen sequencing results. PMID:24162172
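
As an illustration of the iterated map this review starts from, here is a minimal sketch of a Chaos Game Representation for DNA. The corner assignment (A, C, G, T on the corners of the unit square) and the starting point at the centre are one common convention and are assumptions here, as are the names `CORNERS` and `cgr_points`; the abstract itself does not give these details.

```python
# Assumed corner layout for the unit square (one common convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq, start=(0.5, 0.5)):
    """Return the Chaos Game Representation coordinates of a DNA string.

    Each base moves the current point halfway toward that base's corner.
    Symbols without a corner (for example N) are skipped.
    """
    x, y = start
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


print(cgr_points("AG"))
```

Because every step halves the distance to a corner, the quadrant that contains a point identifies the most recent base, and successively smaller sub-squares identify longer suffixes. This is one way to see why the representation needs no fixed word length.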

  1. Acid Rain Analysis by Standard Addition Titration.

    ERIC Educational Resources Information Center

    Ophardt, Charles E.

    1985-01-01

    The standard addition titration is a precise and rapid method for the determination of the acidity in rain or snow samples. The method requires use of a standard buret, a pH meter, and Gran's plot to determine the equivalence point. Experimental procedures used and typical results obtained are presented. (JN)
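
The role of Gran's plot in locating the equivalence point can be sketched as follows. This is not the procedure from the article; it assumes the simplest case of a strong acid titrated with a strong base, ignores activity corrections, and uses synthetic readings. The function name `gran_equivalence_volume` and all numeric values are hypothetical. Before equivalence the Gran function (V0 + V) * 10^(-pH) falls linearly with titrant volume V, and its extrapolated x-intercept is the equivalence volume.

```python
import math


def gran_equivalence_volume(v0_ml, titrant_ml, ph):
    """Estimate the equivalence volume (mL) from a linear Gran plot.

    v0_ml      -- initial sample volume
    titrant_ml -- titrant volumes added, all before the equivalence point
    ph         -- pH reading after each addition
    """
    # Gran function: proportional to the moles of H+ not yet neutralised.
    gran = [(v0_ml + v) * 10.0 ** (-p) for v, p in zip(titrant_ml, ph)]
    n = len(gran)
    mean_v = sum(titrant_ml) / n
    mean_g = sum(gran) / n
    # Ordinary least-squares line through (V, Gran) pairs.
    sxx = sum((v - mean_v) ** 2 for v in titrant_ml)
    sxy = sum((v - mean_v) * (g - mean_g) for v, g in zip(titrant_ml, gran))
    slope = sxy / sxx
    intercept = mean_g - slope * mean_v
    # The x-intercept of the fitted line is the equivalence volume.
    return -intercept / slope


# Synthetic example (hypothetical values): 50 mL of 1e-4 M strong acid
# titrated with 1e-3 M strong base, so the true equivalence volume is 5 mL.
v0 = 50.0
acid_molar = 1.0e-4
base_molar = 1.0e-3
volumes = [0.0, 1.0, 2.0, 3.0, 4.0]
ph_readings = [
    -math.log10((acid_molar * v0 - base_molar * v) / (v0 + v)) for v in volumes
]
ve = gran_equivalence_volume(v0, volumes, ph_readings)
print(round(ve, 3))
```

The sample acidity then follows as the titrant concentration times the equivalence volume divided by the sample volume. Fitting a line to points taken before the end point is what lets the method avoid titrating through the poorly buffered region near equivalence.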

  2. Cladistic analysis of anuran POMC sequences.

    PubMed

    Alrubaian, Jasem; Danielson, Phillip; Walker, David; Dores, Robert M

    2002-03-01

    Procedures for performing cladistic analyses can provide powerful tools for understanding the evolution of neuropeptide and polypeptide hormone coding genes. These analyses can be done on either amino acid data sets or nucleotide data sets and can utilize several different algorithms that are dependent on distinct sets of operating assumptions and constraints. In some cases, the results of these analyses can be used to gauge phylogenetic relationships between taxa. Selecting the proper cladistic analysis strategy is dependent on the taxonomic level of analysis and the rate of evolution within the orthologous genes being evaluated. For example, previous studies have shown that the amino acid sequence of proopiomelanocortin (POMC), the common precursor for the melanocortins and beta-endorphin, can be used to resolve phylogenetic relationships at the class and order level. This study tested the hypothesis that POMC sequences could be used to resolve phylogenetic relationships at the family taxonomic level. Cladistic analyses were performed on amphibian POMC sequences characterized from the marine toad, Bufo marinus (family Bufonidae; this study), the spadefoot toad, Spea multiplicatus (family Pelobatidae), the African clawed frog, Xenopus laevis (family Pipidae) and the laughing frog, Rana ridibunda (family Ranidae). In these analyses the sequence of Australian lungfish POMC was used as the outgroup. The analyses were done at the amino acid level using the maximum parsimony algorithm and at the nucleotide level using the maximum likelihood algorithm. For the anuran POMC genes, analysis at the nucleotide level using the maximum likelihood algorithm generated a cladogram with higher bootstrap values than the maximum parsimony analysis of the POMC amino acid data set. For anuran POMC sequences, analysis of nucleotide sequences using the maximum likelihood algorithm would appear to be the preferred strategy for resolving phylogenetic relationships at the family taxonomic

  3. Electrophoretic analysis of Allium alien addition lines.

    PubMed

    Peffley, E B; Corgan, J N; Horak, K E; Tanksley, S D

    1985-12-01

    Meiotic pairing in an interspecific triploid of Allium cepa and A. fistulosum, 'Delta Giant', exhibits preferential pairing between the two A. cepa genomes, leaving the A. fistulosum genome as univalents. Multivalent pairing involving A. fistulosum chromosomes occurs at a low level, allowing for recombination between the genomes. Ten trisomies were recovered from the backcross of 'Delta Giant' x A. cepa cv., 'Temprana', representing a minimum of four of the eight possible alien addition lines. The alien addition lines possessed different A. fistulosum enzyme markers. Those markers, Adh-1, Idh-1 and Pgm-1 reside on different A. fistulosum chromosomes, whereas Pgi-1 and Idh-1 may be linked. Diploid, trisomic and hyperploid progeny were recovered that exhibited putative pink root resistance. The use of interspecific plants as a means to introgress A. fistulosum genes into A. cepa appears to be successful at both the trisomic and the diploid levels. If introgression can be accomplished using an interspecific triploid such as 'Delta Giant' to generate fertile alien addition lines and subsequent fertile diploids, or if introgression can be accomplished directly at the diploid level, this will have accomplished gene flow that has not been possible at the interspecific diploid level.

  4. Engineering of Schroedinger cat states by a sequence of displacements and photon additions or subtractions

    SciTech Connect

    Podoshvedov, S. A.

    2011-04-15

    A method to generate Schroedinger cat states in free propagating optical fields based on the use of displaced states (or displacement operators) is developed. Some optical schemes with photon-added coherent states are studied. The schemes are modifications of the general method based on a sequence of displacements and photon additions or subtractions adjusted to generate Schroedinger cat states of a larger size. The effects of detection inefficiency are taken into account.

  5. Additives

    NASA Technical Reports Server (NTRS)

    Smalheer, C. V.

    1973-01-01

    The chemistry of lubricant additives is discussed to show what the additives are chemically and what functions they perform in the lubrication of various kinds of equipment. Current theories regarding the mode of action of lubricant additives are presented. The additive groups discussed include the following: (1) detergents and dispersants, (2) corrosion inhibitors, (3) antioxidants, (4) viscosity index improvers, (5) pour point depressants, and (6) antifouling agents.

  6. Sequence analysis of the AAA protein family.

    PubMed Central

    Beyer, A.

    1997-01-01

    The AAA protein family, a recently recognized group of Walker-type ATPases, has been subjected to an extensive sequence analysis. Multiple sequence alignments revealed the existence of a region of sequence similarity, the so-called AAA cassette. The borders of this cassette were localized and within it, three boxes of a high degree of conservation were identified. Two of these boxes could be assigned to substantial parts of the ATP binding site (namely, to Walker motifs A and B); the third may be a portion of the catalytic center. Phylogenetic trees were calculated to obtain insights into the evolutionary history of the family. Subfamilies with varying degrees of intra-relatedness could be discriminated; these relationships are also supported by analysis of sequences outside the canonical AAA boxes: within the cassette are regions that are strongly conserved within each subfamily, whereas little or even no similarity between different subfamilies can be observed. These regions are well suited to define fingerprints for subfamilies. A secondary structure prediction utilizing all available sequence information was performed and the result was fitted to the general 3D structure of a Walker A/GTPase. The agreement was unexpectedly high and strongly supports the conclusion that the AAA family belongs to the Walker superfamily of A/GTPases. PMID:9336829
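
The Walker A box referred to in this abstract is commonly written as the consensus G-x(4)-G-K-[ST]. A minimal sketch of scanning a protein sequence for that pattern is shown below. The consensus is the general P-loop form rather than the family-specific fingerprint derived in the paper, and the function name `walker_a_sites` and the example peptide are hypothetical.

```python
import re

# General Walker A (P-loop) consensus: G, any four residues, G, K, then S or T.
# A lookahead is used so that overlapping candidate sites are all reported.
WALKER_A = re.compile(r"(?=(G.{4}GK[ST]))")


def walker_a_sites(protein):
    """Return (zero-based start index, matched motif) for each Walker A hit."""
    return [(m.start(), m.group(1)) for m in WALKER_A.finditer(protein.upper())]


# Hypothetical peptide used only to exercise the pattern.
print(walker_a_sites("MAGPPGTGKTLLA"))
```

A regular expression of this kind only flags candidate sites. Assigning a protein to a subfamily, as the abstract describes, relies on alignment of the whole cassette rather than on a single short motif.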

  7. Information theory applications for biological sequence analysis.

    PubMed

    Vinga, Susana

    2014-05-01

    Information theory (IT) addresses the analysis of communication systems and has been widely applied in molecular biology. In particular, alignment-free sequence analysis and comparison greatly benefited from concepts derived from IT, such as entropy and mutual information. This review covers several aspects of IT applications, ranging from genome global analysis and comparison, including block-entropy estimation and resolution-free metrics based on iterative maps, to local analysis, comprising the classification of motifs, prediction of transcription factor binding sites and sequence characterization based on linguistic complexity and entropic profiles. IT has also been applied to high-level correlations that combine DNA, RNA or protein features with sequence-independent properties, such as gene mapping and phenotype analysis, and has also provided models based on communication systems theory to describe information transmission channels at the cell level and also during evolutionary processes. While not exhaustive, this review attempts to categorize existing methods and to indicate their relation with broader transversal topics such as genomic signatures, data compression and complexity, time series analysis and phylogenetic classification, providing a resource for future developments in this promising area.
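
The entropy concept that this review builds on can be shown with a short sketch: a naive plug-in estimate of the Shannon entropy of overlapping k-mers (block entropy) in a sequence. The function name `kmer_entropy` is hypothetical, and the estimator makes no correction for finite sample size, which the methods surveyed in the review may handle differently.

```python
import math
from collections import Counter


def kmer_entropy(seq, k=1):
    """Shannon entropy, in bits, of the overlapping k-mers of a sequence.

    Uses observed frequencies directly (plug-in estimate). Returns 0.0 when
    the sequence is shorter than k.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    total = len(kmers)
    if total == 0:
        return 0.0
    counts = Counter(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


print(kmer_entropy("ACGT"))       # four equally frequent bases
print(kmer_entropy("AAAAAAAA"))   # a single repeated base
print(kmer_entropy("ACACACAC", k=2))
```

For a four-letter alphabet the single-base entropy is at most 2 bits, reached when all bases are equally frequent, while low-complexity stretches score near zero. Comparing values across k gives a rough view of how much context reduces uncertainty.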

  8. Sequence analysis by iterated maps, a review

    PubMed Central

    2014-01-01

    Among alignment-free methods, Iterated Maps (IMs) are on a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, ‘Chaos Game Representation’. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in the same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized for non-genomic alphabets as well as an interest in its use for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there is a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage appears to be now set for a recasting of IMs with a central role in processing nextgen sequencing results. PMID:24162172

  9. In vivo generation of linear plasmids with addition of telomeric sequences by Histoplasma capsulatum.

    PubMed

    Woods, J P; Goldman, W E

    1992-12-01

    Histoplasma capsulatum is a dimorphic pathogenic fungus that is a major cause of respiratory and systemic mycosis. We previously developed a transformation system for Histoplasma and demonstrated chromosomal integration of transforming plasmid sequences. In this study, we describe another Histoplasma mechanism for maintaining transforming DNA, i.e., the generation of modified, multicopy linear plasmids carrying DNA from the transforming Escherichia coli plasmid. Under selective conditions, these linear plasmids were stable and capable of retransforming Histoplasma without further modification. In vivo modification of the transforming DNA included duplication of plasmid sequence and telomeric addition at the termini of linear DNA. Apparently Histoplasma telomerase, like that of other organisms such as humans and Tetrahymena, is able to act on non-telomeric substrates. The terminus of a Histoplasma linear plasmid was cloned and shown to contain multiple repeats of GGGTTA, the telomeric repeat unit also found in vertebrates, trypanosomes, and slime moulds. PMID:1474902

  10. NexGen Production – Sequencing and Analysis

    SciTech Connect

    Muzny, Donna

    2010-06-02

    Donna Muzny of the Baylor College of Medicine Human Genome Sequencing Center discusses next generation sequencing platforms and evaluating pipeline performance on June 2, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.

  11. Protein sequence analysis using Hewlett-Packard biphasic sequencing cartridges in an Applied Biosystems 473A protein sequencer.

    PubMed

    Tang, S; Mozdzanowski, J; Anumula, K R

    1999-01-01

    Protein sequence analysis using an adsorptive biphasic sequencing cartridge, a set of two coupled columns introduced by Hewlett-Packard for protein sequencing by Edman degradation, in an Applied Biosystems 473A protein sequencer has been demonstrated. Samples containing salts, detergents, excipients, etc. (e.g., formulated protein drugs) can be easily analyzed using the ABI sequencer. Simple modifications to the ABI sequencer to accommodate the cartridge extend its utility in the analysis of difficult samples. The ABI sequencer solvents and reagents were compatible with the HP cartridge for sequencing. Sequence information up to ten residues can be easily generated by this nonoptimized procedure, and it is sufficient for identifying proteins by database search and for preparing a DNA probe for cloning novel proteins.

  12. Comparative sequence analysis for Brassica oleracea with similar sequences in B. rapa and Arabidopsis thaliana.

    PubMed

    Qiu, Dan; Gao, Muqiang; Li, Genyi; Quiros, Carlos

    2009-04-01

    We sequenced five BAC clones of Brassica oleracea doubled haploid 'Early Big' broccoli containing major genes in the aliphatic glucosinolate pathway, and comparatively analyzed them with similar sequences in A. thaliana and B. rapa. Additionally, we included in the analysis published sequences from three other B. oleracea BAC clones and a contig of this species corresponding to segments in A. thaliana chromosomes IV and V. A total of 2,946 kb of B. oleracea, 1,069 kb of B. rapa sequence and 2,607 kb of A. thaliana sequence were compared and analyzed. We found conserved collinearity for gene order and content restricted to specific chromosomal segments, but breaks in collinearity were frequent, resulting in gene absence likely due not to gene loss but to rearrangements. B. oleracea has the lowest gene density of the three species, followed by B. rapa. The genome expansion of the Brassica species, B. oleracea in particular, is due to larger introns and gene spacers resulting from frequent insertion of DNA transposons and retrotransposons. These findings are discussed in relation to the possible origin and evolution of the Brassica genomes.

  13. Analysis of Pteridium ribosomal RNA sequences by rapid direct sequencing.

    PubMed

    Tan, M K

    1991-08-01

    A total of 864 bases from 5 regions interspersed in the 18S and 26S rRNA molecules from various clones of Pteridium covering the general geographical distribution of the genus was analysed using a rapid rRNA sequencing technique. No base difference has been detected amongst the three major lineages, two of which apparently separated before the breakup of the ancient supercontinent, Pangaea. These regions of the rRNA sequences have thus been conserved for at least 160 million years and are here compared with other eukaryotic, especially plant rRNAs.

  14. Integrating Sequence Evolution into Probabilistic Orthology Analysis.

    PubMed

    Ullah, Ikram; Sjöstrand, Joel; Andersson, Peter; Sennblad, Bengt; Lagergren, Jens

    2015-11-01

    Orthology analysis, that is, finding out whether a pair of homologous genes are orthologs - stemming from a speciation - or paralogs - stemming from a gene duplication - is of central importance in computational biology, genome annotation, and phylogenetic inference. In particular, an orthologous relationship makes functional equivalence of the two genes highly likely. A major approach to orthology analysis is to reconcile a gene tree to the corresponding species tree (most commonly performed using the most parsimonious reconciliation, MPR). However, most such phylogenetic orthology methods infer the gene tree without considering the constraints implied by the species tree and, perhaps even more importantly, only allow the gene sequences to influence the orthology analysis through the a priori reconstructed gene tree. We propose a sound, comprehensive Bayesian Markov chain Monte Carlo-based method, DLRSOrthology, to compute orthology probabilities. It efficiently sums over the possible gene trees and jointly takes into account the current gene tree, all possible reconciliations to the species tree, and the, typically strong, signal conveyed by the sequences. We compare our method with PrIME-GEM, a probabilistic orthology approach built on a probabilistic duplication-loss model, and MrBayesMPR, a probabilistic orthology approach that is based on conventional Bayesian inference coupled with MPR. We find that DLRSOrthology outperforms these competing approaches on synthetic data as well as on biological data sets and is robust to incomplete taxon sampling artifacts. PMID:26130236

  15. Validation of Genotyping-By-Sequencing Analysis in Populations of Tetraploid Alfalfa by 454 Sequencing

    PubMed Central

    Rocher, Solen; Jean, Martine; Castonguay, Yves; Belzile, François

    2015-01-01

    Genotyping-by-sequencing (GBS) is a relatively low-cost high throughput genotyping technology based on next generation sequencing and is applicable to orphan species with no reference genome. A combination of genome complexity reduction and multiplexing with DNA barcoding provides a simple and affordable way to resolve allelic variation between plant samples or populations. GBS was performed on ApeKI libraries using DNA from 48 genotypes each of two heterogeneous populations of tetraploid alfalfa (Medicago sativa spp. sativa): the synthetic cultivar Apica (ATF0) and a derived population (ATF5) obtained after five cycles of recurrent selection for superior tolerance to freezing (TF). Nearly 400 million reads were obtained from two lanes of an Illumina HiSeq 2000 sequencer and analyzed with the Universal Network-Enabled Analysis Kit (UNEAK) pipeline designed for species with no reference genome. Following the application of whole dataset-level filters, 11,694 single nucleotide polymorphism (SNP) loci were obtained. About 60% had a significant match on the Medicago truncatula syntenic genome. The accuracy of allelic ratios and genotype calls based on GBS data was directly assessed using 454 sequencing on a subset of SNP loci scored in eight plant samples. Sequencing depth in this study was not sufficient for accurate tetraploid allelic dosage, but reliable genotype calls based on diploid allelic dosage were obtained when using additional quality filtering. Principal Component Analysis of SNP loci in plant samples revealed that a small proportion (<5%) of the genetic variability assessed by GBS is able to differentiate ATF0 and ATF5. Our results confirm that analysis of GBS data using UNEAK is a reliable approach for genome-wide discovery of SNP loci in outcrossed polyploids. PMID:26115486

  17. Multilocus sequence analysis (MLSA) in prokaryotic taxonomy.

    PubMed

    Glaeser, Stefanie P; Kämpfer, Peter

    2015-06-01

    To obtain a higher resolution of the phylogenetic relationships of species within a genus or genera within a family, multilocus sequence analysis (MLSA) is currently a widely used method. In MLSA studies, partial sequences of genes coding for proteins with conserved functions ('housekeeping genes') are used to generate phylogenetic trees and subsequently deduce phylogenies. However, MLSA is not only suggested as a phylogenetic tool to support and clarify the resolution of bacterial species with a higher resolution, as in 16S rRNA gene-based studies, but has also been discussed as a replacement for DNA-DNA hybridization (DDH) in species delineation. Nevertheless, despite the fact that MLSA has become an accepted and widely used method in prokaryotic taxonomy, no common generally accepted recommendations have been devised to date for either the whole area of microbial taxonomy or for taxa-specific applications of individual MLSA schemes. The different ways MLSA is performed can vary greatly for the selection of genes, their number, and the calculation method used when comparing the sequences obtained. Here, we provide an overview of the historical development of MLSA and critically review its current application in prokaryotic taxonomy by highlighting the advantages and disadvantages of the method's numerous variations. This provides a perspective for its future use in forthcoming genome-based genotypic taxonomic analyses.

  18. Exploration of phylogenetic data using a global sequence analysis method

    PubMed Central

    Chapus, Charles; Dufraigne, Christine; Edwards, Scott; Giron, Alain; Fertil, Bernard; Deschavanne, Patrick

    2005-01-01

    Background: Molecular phylogenetic methods are based on alignments of nucleic or peptidic sequences. The tremendous increase in molecular data permits phylogenetic analyses of very long sequences and of many species, but also requires methods to help manage large datasets. Results: Here we explore the phylogenetic signal present in molecular data by genomic signatures, defined as the set of frequencies of short oligonucleotides present in DNA sequences. Although violating many of the standard assumptions of traditional phylogenetic analyses – in particular explicit statements of homology inherent in character matrices – the use of the signature does permit the analysis of very long sequences, even those that are unalignable, and is therefore most useful in cases where alignment is questionable. We compare the results obtained by traditional phylogenetic methods to those inferred by the signature method for two genes: RAG1, which is easily alignable, and 18S RNA, where alignments are often ambiguous for some regions. We also apply this method to a multigene data set of 33 genes for 9 bacteria and one archaeal species as well as to the whole genome of a set of 16 γ-proteobacteria. In addition to delivering phylogenetic results comparable to traditional methods, the comparison of signatures for the sequences involved in the bacterial example identified putative candidates for horizontal gene transfers. Conclusion: The signature method is therefore a fast tool for exploring phylogenetic data, providing not only a pretreatment for discovering new sequence relationships, but also for identifying cases of sequence evolution that could confound traditional phylogenetic analysis. PMID:16280081
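
    The genomic signature described above (the set of frequencies of short oligonucleotides) can be sketched in a few lines. The word length k = 2, the Euclidean distance, and the function names `signature` and `signature_distance` are assumptions chosen for illustration; the abstract does not state which word length or distance the authors used.

```python
from collections import Counter
from itertools import product
from math import sqrt


def signature(seq, k=2):
    """Frequencies of all 4**k DNA words of length k, in a fixed order."""
    seq = seq.upper()
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only words made of A, C, G, T contribute to the total.
    total = sum(counts[w] for w in words)
    return [counts[w] / total if total else 0.0 for w in words]


def signature_distance(a, b, k=2):
    """Euclidean distance between two signatures (an assumed metric)."""
    sa, sb = signature(a, k), signature(b, k)
    return sqrt(sum((x - y) ** 2 for x, y in zip(sa, sb)))


if __name__ == "__main__":
    print(signature_distance("ACGTACGTACGT", "AAAATTTTAAAA"))
```

    Because no alignment is needed, a matrix of such pairwise distances can be computed even for sequences of very different lengths and then passed to any distance-based tree-building method.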

  19. FAST: FAST Analysis of Sequences Toolbox

    PubMed Central

    Lawrence, Travis J.; Kauffman, Kyle T.; Amrine, Katherine C. H.; Carper, Dana L.; Lee, Raymond S.; Becich, Peter J.; Canales, Claudia J.; Ardell, David H.

    2015-01-01

    FAST (FAST Analysis of Sequences Toolbox) provides simple, powerful open source command-line tools to filter, transform, annotate and analyze biological sequence data. Modeled after the GNU (GNU's Not Unix) Textutils such as grep, cut, and tr, FAST tools such as fasgrep, fascut, and fastr make it easy to rapidly prototype expressive bioinformatic workflows in a compact and generic command vocabulary. Compact combinatorial encoding of data workflows with FAST commands can simplify the documentation and reproducibility of bioinformatic protocols, supporting better transparency in biological data science. Interface self-consistency and conformity with conventions of GNU, Matlab, Perl, BioPerl, R, and GenBank help make FAST easy and rewarding to learn. FAST automates numerical, taxonomic, and text-based sorting, selection and transformation of sequence records and alignment sites based on content, index ranges, descriptive tags, annotated features, and in-line calculated analytics, including composition and codon usage. Automated content- and feature-based extraction of sites and support for molecular population genetic statistics make FAST useful for molecular evolutionary analysis. FAST is portable, easy to install and secure thanks to the relative maturity of its Perl and BioPerl foundations, with stable releases posted to CPAN. Development as well as a publicly accessible Cookbook and Wiki are available on the FAST GitHub repository at https://github.com/tlawrence3/FAST. The default data exchange format in FAST is Multi-FastA (specifically, a restriction of BioPerl FastA format). Sanger and Illumina 1.8+ FastQ formatted files are also supported. FAST makes it easier for non-programmer biologists to interactively investigate and control biological data at the speed of thought. PMID:26042145
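
    The record-wise filtering that the abstract attributes to tools such as fasgrep can be pictured with the sketch below, which parses Multi-FastA text and keeps records whose description line matches a regular expression. This is an independent Python illustration, not FAST's implementation or command-line syntax; the names `read_fasta` and `grep_records` are hypothetical.

```python
import re


def read_fasta(text):
    """Parse Multi-FastA text into a list of (description, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def grep_records(records, pattern):
    """Keep the records whose description matches the regular expression."""
    rx = re.compile(pattern)
    return [(h, s) for h, s in records if rx.search(h)]


if __name__ == "__main__":
    sample = ">seq1 Homo sapiens\nACGT\nGGCC\n>seq2 Mus musculus\nTTTT\n"
    for description, sequence in grep_records(read_fasta(sample), r"Mus"):
        print(description, sequence)
```

    Treating each sequence record as the unit of filtering, rather than each line, is what lets such tools be chained in the style of the text utilities the abstract mentions.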

  1. Genome sequencing and analysis conference grant

    SciTech Connect

    Venter, J.C.

    1995-10-01

    The 14 plenary session presentations focused on nematode, yeast, fruit fly, plants, mycobacteria, and man. In addition, there were presentations on a variety of technical innovations including database developments and refinements, bioelectronic genesensors, computer-assisted multiplex techniques, and hybridization analysis with DNA chip technology. This document includes a list of exhibitors and abstracts of sessions.

  2. Integrative visual analysis of protein sequence mutations

    PubMed Central

    2014-01-01

    Background: An important aspect of studying the relationship between protein sequence, structure and function is the molecular characterization of the effect of protein mutations. To understand the functional impact of amino acid changes, the multiple biological properties of protein residues have to be considered together. Results: Here, we present a novel visual approach for analyzing residue mutations. It combines different biological visualizations and integrates them with molecular data derived from external resources. To show various aspects of the biological information on different scales, our approach includes one-dimensional sequence views, three-dimensional protein structure views and two-dimensional views of residue interaction networks as well as aggregated views. The views are linked tightly and synchronized to reduce the cognitive load of the user when switching between them. In particular, the protein mutations are mapped onto the views together with further functional and structural information. We also assess the impact of individual amino acid changes by the detailed analysis and visualization of the involved residue interactions. We demonstrate the effectiveness of our approach and the developed software on the data provided for the BioVis 2013 data contest. Conclusions: Our visual approach and software greatly facilitate the integrative and interactive analysis of protein mutations based on complementary visualizations. The different data views offered to the user are enriched with information about molecular properties of amino acid residues and further biological knowledge. PMID:25237389

  4. Whole genome sequence analysis of Mycobacterium suricattae.

    PubMed

    Dippenaar, Anzaan; Parsons, Sven David Charles; Sampson, Samantha Leigh; van der Merwe, Ruben Gerhard; Drewe, Julian Ashley; Abdallah, Abdallah Musa; Siame, Kabengele Keith; Gey van Pittius, Nicolaas Claudius; van Helden, Paul David; Pain, Arnab; Warren, Robin Mark

    2015-12-01

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi. PMID:26542221

  5. Additive interaction in survival analysis: use of the additive hazards model.

    PubMed

    Rod, Naja Hulvej; Lange, Theis; Andersen, Ingelise; Marott, Jacob Louis; Diderichsen, Finn

    2012-09-01

    It is a widely held belief in public health and clinical decision-making that interventions or preventive strategies should be aimed at patients or population subgroups where most cases could potentially be prevented. To identify such subgroups, deviation from additivity of absolute effects is the relevant measure of interest. Multiplicative survival models, such as the Cox proportional hazards model, are often used to estimate the association between exposure and risk of disease in prospective studies. In Cox models, deviations from additivity have usually been assessed by surrogate measures of additive interaction derived from multiplicative models, an approach that is both counter-intuitive and sometimes invalid. This paper presents a straightforward and intuitive way of assessing deviation from additivity of effects in survival analysis by use of the additive hazards model. The model directly estimates the absolute size of the deviation from additivity and provides confidence intervals. In addition, the model can accommodate both continuous and categorical exposures and models both exposures and potential confounders on the same underlying scale. To illustrate the approach, we present an empirical example of interaction between education and smoking on risk of lung cancer. We argue that deviations from additivity of effects are important for public health interventions and clinical decision-making, and such estimations should be encouraged in prospective studies on health. A detailed implementation guide of the additive hazards model is provided in the appendix.
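
    The distinction between the additive and multiplicative scales can be shown with simple arithmetic on four hazard rates, one for each combination of two binary exposures. The rates below are hypothetical numbers chosen for illustration, not results from the paper, and the function names are assumptions; fitting an additive hazards model to survival data is a separate step that this sketch does not attempt.

```python
def additive_interaction(h00, h10, h01, h11):
    """Deviation from additivity of absolute effects: h11 - h10 - h01 + h00."""
    return h11 - h10 - h01 + h00


def multiplicative_interaction(h00, h10, h01, h11):
    """Ratio of the joint hazard ratio to the product of the single ones."""
    return (h11 / h00) / ((h10 / h00) * (h01 / h00))


if __name__ == "__main__":
    # Hypothetical rates per 1,000 person-years (illustrative only):
    # unexposed, exposure A only, exposure B only, both exposures.
    h00, h10, h01, h11 = 1.0, 2.0, 3.0, 6.0
    print("additive deviation:", additive_interaction(h00, h10, h01, h11))
    print("multiplicative ratio:", multiplicative_interaction(h00, h10, h01, h11))
```

    With these numbers the hazard ratios multiply exactly (2 × 3 = 6), so a multiplicative model shows no interaction, yet the joint exposure adds 2 extra cases per 1,000 person-years beyond the sum of the separate effects. In an additive hazards model with an interaction term, that excess is what the interaction coefficient estimates.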

  6. Multilocus sequence analysis of the family Halomonadaceae.

    PubMed

    de la Haba, Rafael R; Márquez, M Carmen; Papke, R Thane; Ventosa, Antonio

    2012-03-01

    Multilocus sequence analysis (MLSA) protocols have been developed for species circumscription for many taxa. However, at present, no studies based on MLSA have been performed within any moderately halophilic bacterial group. To test the usefulness of MLSA with these kinds of micro-organisms, the family Halomonadaceae, which includes mainly halophilic bacteria, was chosen as a model. This family comprises ten genera with validly published names and 85 species of environmental, biotechnological and clinical interest. In some cases, the phylogenetic relationships between members of this family, based on 16S rRNA gene sequence comparisons, are not clear and a deep phylogenetic analysis using several housekeeping genes seemed appropriate. Here, MLSA was applied using the 16S rRNA, 23S rRNA, atpA, gyrB, rpoD and secA genes for species of the family Halomonadaceae. Phylogenetic trees based on the individual and concatenated gene sequences revealed that the family Halomonadaceae formed a monophyletic group of micro-organisms within the order Oceanospirillales. With the exception of the genera Halomonas and Modicisalibacter, all other genera within this family were phylogenetically coherent. Five of the six studied genes (16S rRNA, 23S rRNA, gyrB, rpoD and secA) showed a consistent evolutionary history. However, the results obtained with the atpA gene were different; thus, this gene may not be considered useful as an individual gene phylogenetic marker within this family. The phylogenetic methods produced variable results, with those generated from the maximum-likelihood and neighbour-joining algorithms being more similar than those obtained by maximum-parsimony methods. Horizontal gene transfer (HGT) plays an important evolutionary role in the family Halomonadaceae; however, the impact of recombination events in the phylogenetic analysis was minimized by concatenating the six loci, which agreed with the current taxonomic scheme for this family. Finally, the findings of

  7. Whole-genome sequence-based analysis of thyroid function.

    PubMed

    Taylor, Peter N; Porcu, Eleonora; Chew, Shelby; Campbell, Purdey J; Traglia, Michela; Brown, Suzanne J; Mullin, Benjamin H; Shihab, Hashem A; Min, Josine; Walter, Klaudia; Memari, Yasin; Huang, Jie; Barnes, Michael R; Beilby, John P; Charoen, Pimphen; Danecek, Petr; Dudbridge, Frank; Forgetta, Vincenzo; Greenwood, Celia; Grundberg, Elin; Johnson, Andrew D; Hui, Jennie; Lim, Ee M; McCarthy, Shane; Muddyman, Dawn; Panicker, Vijay; Perry, John R B; Bell, Jordana T; Yuan, Wei; Relton, Caroline; Gaunt, Tom; Schlessinger, David; Abecasis, Goncalo; Cucca, Francesco; Surdulescu, Gabriela L; Woltersdorf, Wolfram; Zeggini, Eleftheria; Zheng, Hou-Feng; Toniolo, Daniela; Dayan, Colin M; Naitza, Silvia; Walsh, John P; Spector, Tim; Davey Smith, George; Durbin, Richard; Richards, J Brent; Sanna, Serena; Soranzo, Nicole; Timpson, Nicholas J; Wilson, Scott G

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N=2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF≥1%) associated with TSH and FT4 (N=16,335). For TSH, we identify a novel variant in SYN2 (MAF=23.5%, P=6.15 × 10⁻⁹) and a new independent variant in PDE8B (MAF=10.4%, P=5.94 × 10⁻¹⁴). For FT4, we report a low-frequency variant near B4GALT6/SLC25A52 (MAF=3.2%, P=1.27 × 10⁻⁹) tagging a rare TTR variant (MAF=0.4%, P=2.14 × 10⁻¹¹). All common variants explain ≥20% of the variance in TSH and FT4. Analysis of rare variants (MAF<1%) using sequence kernel association testing reveals a novel association with FT4 in NRG1. Our results demonstrate that increased coverage in whole-genome sequence association studies identifies novel variants associated with thyroid function. PMID:25743335

  8. Whole-genome sequence-based analysis of thyroid function.

    PubMed

    Taylor, Peter N; Porcu, Eleonora; Chew, Shelby; Campbell, Purdey J; Traglia, Michela; Brown, Suzanne J; Mullin, Benjamin H; Shihab, Hashem A; Min, Josine; Walter, Klaudia; Memari, Yasin; Huang, Jie; Barnes, Michael R; Beilby, John P; Charoen, Pimphen; Danecek, Petr; Dudbridge, Frank; Forgetta, Vincenzo; Greenwood, Celia; Grundberg, Elin; Johnson, Andrew D; Hui, Jennie; Lim, Ee M; McCarthy, Shane; Muddyman, Dawn; Panicker, Vijay; Perry, John R B; Bell, Jordana T; Yuan, Wei; Relton, Caroline; Gaunt, Tom; Schlessinger, David; Abecasis, Goncalo; Cucca, Francesco; Surdulescu, Gabriela L; Woltersdorf, Wolfram; Zeggini, Eleftheria; Zheng, Hou-Feng; Toniolo, Daniela; Dayan, Colin M; Naitza, Silvia; Walsh, John P; Spector, Tim; Davey Smith, George; Durbin, Richard; Richards, J Brent; Sanna, Serena; Soranzo, Nicole; Timpson, Nicholas J; Wilson, Scott G

    2015-03-06

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N=2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF≥1%) associated with TSH and FT4 (N=16,335). For TSH, we identify a novel variant in SYN2 (MAF=23.5%, P=6.15 × 10⁻⁹) and a new independent variant in PDE8B (MAF=10.4%, P=5.94 × 10⁻¹⁴). For FT4, we report a low-frequency variant near B4GALT6/SLC25A52 (MAF=3.2%, P=1.27 × 10⁻⁹) tagging a rare TTR variant (MAF=0.4%, P=2.14 × 10⁻¹¹). All common variants explain ≥20% of the variance in TSH and FT4. Analysis of rare variants (MAF<1%) using sequence kernel association testing reveals a novel association with FT4 in NRG1. Our results demonstrate that increased coverage in whole-genome sequence association studies identifies novel variants associated with thyroid function.

  9. Analysis of E. coli promoter sequences.

    PubMed Central

    Harley, C B; Reynolds, R P

    1987-01-01

    We have compiled and analyzed 263 promoters with known transcriptional start points for E. coli genes. Promoter elements (-35 hexamer, -10 hexamer, and spacing between these regions) were aligned by a program which selects the arrangement consistent with the start point and statistically most homologous to a reference list of promoters. The initial reference list was that of Hawley and McClure (Nucl. Acids Res. 11, 2237-2255, 1983). Alignment of the complete list was used for reference until successive analyses did not alter the structure of the list. In the final compilation, all bases in the -35 (TTGACA) and -10 (TATAAT) hexamers were highly conserved, 92% of promoters had inter-region spacing of 17 +/- 1 bp, and 75% of the uniquely defined start points initiated 7 +/- 1 bases downstream of the -10 region. The consensus sequence of promoters with inter-region spacing of 16, 17 or 18 bp did not differ. This compilation and analysis should be useful for studies of promoter structure and function and for programs which identify potential promoter sequences. PMID:3550697
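
    A program that looks for potential promoters could use the consensus hexamers and spacing reported above. The sketch below counts matches to TTGACA and TATAAT for every candidate placement with a spacing of 16, 17 or 18 bp and reports placements that reach a threshold. The scoring rule, the default threshold, and the name `scan_promoters` are assumptions for illustration; this is not the alignment program described in the abstract, which selected the arrangement statistically most homologous to a reference list.

```python
MINUS_35 = "TTGACA"  # consensus -35 hexamer from the compilation
MINUS_10 = "TATAAT"  # consensus -10 hexamer from the compilation


def matches(window, consensus):
    """Number of positions at which window agrees with the consensus."""
    return sum(a == b for a, b in zip(window, consensus))


def scan_promoters(seq, min_score=9, spacings=(16, 17, 18)):
    """Return (start of -35 hexamer, spacing, score) for candidate sites.

    The score is the number of matches to both hexamers (maximum 12).
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 6 + 1):
        score_35 = matches(seq[i:i + 6], MINUS_35)
        for gap in spacings:
            j = i + 6 + gap  # start of the candidate -10 hexamer
            if j + 6 > len(seq):
                continue
            score = score_35 + matches(seq[j:j + 6], MINUS_10)
            if score >= min_score:
                hits.append((i, gap, score))
    return hits


if __name__ == "__main__":
    # Synthetic test sequence with a perfect site and a 17 bp spacer.
    demo = "GG" + MINUS_35 + "C" * 17 + MINUS_10 + "GG"
    print(scan_promoters(demo, min_score=12))
```

    A simple match count treats all twelve positions as equally informative; a position-specific weight matrix built from the compiled promoters would be the more usual refinement.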

  10. Computer analysis of HIV epitope sequences

    SciTech Connect

    Gupta, G.; Myers, G.

    1990-01-01

    Phylogenetic tree analysis provides us with important general information regarding the extent and rate of HIV variation. Currently we are attempting to extend computer analysis and modeling to the V3 loop of the type 2 virus and its simian homologues, especially in light of the prominent role the latter will play in animal model studies. Moreover, it might be possible to attack the slightly similar V4 loop by this approach. However, the strategy relies very heavily upon "natural" information and constraints; thus there exist severe limitations upon the general applicability, in addition to uncertainties with regard to long-range residue interactions. 5 refs., 3 figs.

  11. Time fluctuation analysis of forest fire sequences

    NASA Astrophysics Data System (ADS)

    Vega Orozco, Carmen D.; Kanevski, Mikhaïl; Tonini, Marj; Golay, Jean; Pereira, Mário J. G.

    2013-04-01

    Forest fires are complex events involving both space and time fluctuations. Understanding of their dynamics and pattern distribution is of great importance in order to improve the resource allocation and support fire management actions at local and global levels. This study aims at characterizing the temporal fluctuations of forest fire sequences observed in Portugal, which is the country that holds the largest wildfire land dataset in Europe. This research applies several exploratory data analysis measures to 302,000 forest fires that occurred from 1980 to 2007. The applied clustering measures are: Morisita clustering index, fractal and multifractal dimensions (box-counting), Ripley's K-function, Allan Factor, and variography. These algorithms enable a global time structural analysis describing the degree of clustering of a point pattern and defining whether the observed events occur randomly, in clusters or in a regular pattern. The considered methods are of general importance and can be used for other spatio-temporal events (e.g. crime, epidemiology, biodiversity, geomarketing, etc.). An important contribution of this research deals with the analysis and estimation of local measures of clustering that help in understanding their temporal structure. Each measure is described and executed for the raw data (forest fires geo-database) and results are compared to reference patterns generated under the null hypothesis of randomness (Poisson processes) embedded in the same time period of the raw data. This comparison enables estimating the degree of the deviation of the real data from a Poisson process. Generalizations to functional measures of these clustering methods, taking into account the phenomena, were also applied and adapted to detect time dependences in a measured variable (i.e. burned area). The time clustering of the raw data is compared several times with the Poisson processes at different thresholds of the measured function. Then, the clustering measure value
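
    As an illustration of comparing an event sequence with a Poisson reference, the following sketch computes one of the listed measures, the Allan Factor, under its usual definition: the mean squared difference of event counts in adjacent windows divided by twice the mean count. Values near 1 are expected for a homogeneous Poisson process, values above 1 suggest clustering, and values below 1 suggest regularity. The function name, the window length, and the synthetic event times are assumptions for illustration; they are not the authors' code or data.

```python
import random


def allan_factor(event_times, window, t_start, t_end):
    """Allan Factor of event times for one window length.

    AF = mean((N[k+1] - N[k])**2) / (2 * mean(N)), where N[k] is the number
    of events falling in the k-th window of length `window`.
    """
    n_windows = int((t_end - t_start) // window)
    if n_windows < 2:
        raise ValueError("need at least two windows")
    counts = [0] * n_windows
    for t in event_times:
        k = int((t - t_start) // window)
        if 0 <= k < n_windows:  # ignore events outside the covered span
            counts[k] += 1
    mean_count = sum(counts) / n_windows
    if mean_count == 0:
        raise ValueError("no events inside the observation period")
    sq_diffs = [(counts[k + 1] - counts[k]) ** 2 for k in range(n_windows - 1)]
    return (sum(sq_diffs) / len(sq_diffs)) / (2.0 * mean_count)


# Synthetic reference pattern (not real fire data): uniformly random event
# times, which approximate a homogeneous Poisson process.
random.seed(0)
poisson_like = [random.uniform(0.0, 1000.0) for _ in range(5000)]
af_random = allan_factor(poisson_like, 1.0, 0.0, 1000.0)

# Two made-up extremes: perfectly regular events and one tight cluster.
af_regular = allan_factor([k + 0.5 for k in range(10)], 1.0, 0.0, 10.0)
af_clustered = allan_factor([0.1] * 10, 1.0, 0.0, 10.0)
```

    Repeating the calculation over a range of window lengths, and over many simulated Poisson patterns, gives the kind of reference envelope against which observed data can be compared.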

  12. An Imaging And Graphics Workstation For Image Sequence Analysis

    NASA Astrophysics Data System (ADS)

    Mostafavi, Hassan

    1990-01-01

    This paper describes an application-specific engineering workstation designed and developed to analyze imagery sequences from a variety of sources. The system combines the software and hardware environment of the modern graphic-oriented workstations with the digital image acquisition, processing and display techniques. The objective is to achieve automation and high throughput for many data reduction tasks involving metric studies of image sequences. The applications of such an automated data reduction tool include analysis of the trajectory and attitude of aircraft, missile, stores and other flying objects in various flight regimes including launch and separation as well as regular flight maneuvers. The workstation can also be used in an on-line or off-line mode to study three-dimensional motion of aircraft models in simulated flight conditions such as wind tunnels. The system's key features are: 1) Acquisition and storage of image sequences by digitizing real-time video or frames from a film strip; 2) computer-controlled movie loop playback, slow motion and freeze frame display combined with digital image sharpening, noise reduction, contrast enhancement and interactive image magnification; 3) multiple leading edge tracking in addition to object centroids at up to 60 fields per second from both live input video or a stored image sequence; 4) automatic and manual field-of-view and spatial calibration; 5) image sequence data base generation and management, including the measurement data products; 6) off-line analysis software for trajectory plotting and statistical analysis; 7) model-based estimation and tracking of object attitude angles; and 8) interface to a variety of video players and film transport sub-systems.

  13. Magnetic resonance imaging in partial epilepsy: additional abnormalities shown with the fluid attenuated inversion recovery (FLAIR) pulse sequence.

    PubMed Central

    Bergin, P S; Fish, D R; Shorvon, S D; Oatridge, A; deSouza, N M; Bydder, G M

    1995-01-01

    Thirty six patients with a history of partial epilepsy had MRI of the brain performed with conventional T1 and T2 weighted pulse sequences as well as the fluid attenuated inversion recovery (FLAIR) sequence. Abnormalities were found in 20 cases (56%), in whom there were 25 lesions or groups of lesions. Twenty four of these lesions were more conspicuous with the FLAIR sequence than with any of the conventional sequences. In 11 of these 20 cases, lesions thought to be of aetiological importance were only seen with the FLAIR sequence. In eight this was a solitary lesion. In the other three, an additional and apparently significant lesion (or lesions) was only seen with the FLAIR sequence when another lesion had been identified with both conventional and FLAIR sequences. The 11 additional lesions or groups of lesions were seen in the hippocampus, amygdala, cortex, or subcortical and periventricular regions. No lesion was found with any pulse sequence in 16 (44%) of the original group of 36 patients. In the eight cases where a lesion was seen only with the FLAIR sequence, localisation was concordant with the electroclinical features. Two of the eight patients with solitary lesions seen only on the FLAIR sequence underwent surgery, after which there was pathological confirmation of the abnormality identified with imaging. In one patient with a congenital cavernoma, the primary lesion was best seen with a contrast enhanced T1 weighted spin echo sequence. In this selected series, the FLAIR sequence increased the yield of MRI examinations of the brain by 30%. PMID:7738550

  14. Sequencing, Assembly and Analysis of Human Microbial Communities

    SciTech Connect

    Petrosino, Joe

    2010-06-04

    Joe Petrosino of Baylor College of Medicine discusses using next generation sequencing technologies to study human microbial communities associated with health and disease on June 4, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.

  15. Direct Chloroplast Sequencing: Comparison of Sequencing Platforms and Analysis Tools for Whole Chloroplast Barcoding

    PubMed Central

    Brozynska, Marta; Furtado, Agnelo; Henry, Robert James

    2014-01-01

    Direct sequencing of total plant DNA using next generation sequencing technologies generates a whole chloroplast genome sequence that has the potential to provide a barcode for use in plant and food identification. Advances in DNA sequencing platforms may make this an attractive approach for routine plant identification. The HiSeq (Illumina) and Ion Torrent (Life Technology) sequencing platforms were used to sequence total DNA from rice to identify polymorphisms in the whole chloroplast genome sequence of a wild rice plant relative to cultivated rice (cv. Nipponbare). Consensus chloroplast sequences were produced by mapping sequence reads to the reference rice chloroplast genome or by de novo assembly and mapping of the resulting contigs to the reference sequence. A total of 122 polymorphisms (SNPs and indels) between the wild and cultivated rice chloroplasts were predicted by these different sequencing and analysis methods. Of these, a total of 102 polymorphisms including 90 SNPs were predicted by both platforms. Indels were more variable with different sequencing methods, with almost all discrepancies found in homopolymers. The Ion Torrent platform gave no apparent false SNP but was less reliable for indels. The methods should be suitable for routine barcoding using appropriate combinations of sequencing platform and data analysis. PMID:25329378

  16. Direct chloroplast sequencing: comparison of sequencing platforms and analysis tools for whole chloroplast barcoding.

    PubMed

    Brozynska, Marta; Furtado, Agnelo; Henry, Robert James

    2014-01-01

    Direct sequencing of total plant DNA using next generation sequencing technologies generates a whole chloroplast genome sequence that has the potential to provide a barcode for use in plant and food identification. Advances in DNA sequencing platforms may make this an attractive approach for routine plant identification. The HiSeq (Illumina) and Ion Torrent (Life Technology) sequencing platforms were used to sequence total DNA from rice to identify polymorphisms in the whole chloroplast genome sequence of a wild rice plant relative to cultivated rice (cv. Nipponbare). Consensus chloroplast sequences were produced by mapping sequence reads to the reference rice chloroplast genome or by de novo assembly and mapping of the resulting contigs to the reference sequence. A total of 122 polymorphisms (SNPs and indels) between the wild and cultivated rice chloroplasts were predicted by these different sequencing and analysis methods. Of these, a total of 102 polymorphisms including 90 SNPs were predicted by both platforms. Indels were more variable with different sequencing methods, with almost all discrepancies found in homopolymers. The Ion Torrent platform gave no apparent false SNP but was less reliable for indels. The methods should be suitable for routine barcoding using appropriate combinations of sequencing platform and data analysis.

  17. Schlieren sequence analysis using computer vision

    NASA Astrophysics Data System (ADS)

    Smith, Nathanial Timothy

    Computer vision-based methods are proposed for extraction and measurement of flow structures of interest in schlieren video. As schlieren data has increased with faster frame rates, we are faced with thousands of images to analyze. This presents an opportunity to study global flow structures over time that may not be evident from surface measurements. A degree of automation is desirable to extract flow structures and features to give information on their behavior through the sequence. Using an interdisciplinary approach, the analysis of large schlieren data is recast as a computer vision problem. The double-cone schlieren sequence is used as a testbed for the methodology; it is unique in that it contains 5,000 images, complex phenomena, and is feature rich. Oblique structures such as shock waves and shear layers are common in schlieren images. A vision-based methodology is used to provide an estimate of oblique structure angles through the unsteady sequence. The methodology has been applied to a complex flowfield with multiple shocks. A converged detection success rate between 94% and 97% for these structures is obtained. The modified curvature scale space is used to define features at salient points on shock contours. A challenge in developing methods for feature extraction in schlieren images is the reconciliation of existing techniques with features of interest to an aerodynamicist. Domain-specific knowledge of physics must therefore be incorporated into the definition and detection phases. Known location and physically possible structure representations form a knowledge base that provides a unique feature definition and extraction. Model tip location and the motion of a shock intersection across several thousand frames are identified, localized, and tracked. Images are parsed into physically meaningful labels using segmentation. Using this representation, it is shown that in the double-cone flowfield, the dominant unsteady motion is associated with large scale

  18. Now and Next-Generation Sequencing Techniques: Future of Sequence Analysis Using Cloud Computing

    PubMed Central

    Thakur, Radhe Shyam; Bandopadhyay, Rajib; Chaudhary, Bratati; Chatterjee, Sourav

    2012-01-01

    Advances in the field of sequencing techniques have resulted in the greatly accelerated production of huge sequence datasets. This presents immediate challenges in database maintenance at datacenters. It provides additional computational challenges in data mining and sequence analysis. Together these represent a significant overburden on traditional stand-alone computer resources, and to reach effective conclusions quickly and efficiently, the virtualization of the resources and computation on a pay-as-you-go concept (together termed “cloud computing”) has recently appeared. The collective resources of the datacenter, including both hardware and software, can be available publicly, being then termed a public cloud, the resources being provided in a virtual mode to the clients who pay according to the resources they employ. Examples of public companies providing these resources include Amazon, Google, and Joyent. The computational workload is shifted to the provider, which also implements required hardware and software upgrades over time. A virtual environment is created in the cloud corresponding to the computational and data storage needs of the user via the internet. The task is then performed, the results transmitted to the user, and the environment finally deleted after all tasks are completed. In this discussion, we focus on the basics of cloud computing, and go on to analyze the prerequisites and overall working of clouds. Finally, the applications of cloud computing in biological systems, particularly in comparative genomics, genome informatics, and SNP detection are discussed with reference to traditional workflows. PMID:23248640

  19. Now and next-generation sequencing techniques: future of sequence analysis using cloud computing.

    PubMed

    Thakur, Radhe Shyam; Bandopadhyay, Rajib; Chaudhary, Bratati; Chatterjee, Sourav

    2012-01-01

    Advances in the field of sequencing techniques have resulted in the greatly accelerated production of huge sequence datasets. This presents immediate challenges in database maintenance at datacenters. It provides additional computational challenges in data mining and sequence analysis. Together these represent a significant overburden on traditional stand-alone computer resources, and to reach effective conclusions quickly and efficiently, the virtualization of the resources and computation on a pay-as-you-go concept (together termed "cloud computing") has recently appeared. The collective resources of the datacenter, including both hardware and software, can be available publicly, being then termed a public cloud, the resources being provided in a virtual mode to the clients who pay according to the resources they employ. Examples of public companies providing these resources include Amazon, Google, and Joyent. The computational workload is shifted to the provider, which also implements required hardware and software upgrades over time. A virtual environment is created in the cloud corresponding to the computational and data storage needs of the user via the internet. The task is then performed, the results transmitted to the user, and the environment finally deleted after all tasks are completed. In this discussion, we focus on the basics of cloud computing, and go on to analyze the prerequisites and overall working of clouds. Finally, the applications of cloud computing in biological systems, particularly in comparative genomics, genome informatics, and SNP detection are discussed with reference to traditional workflows.

  20. Phylogenetic analysis of Burkholderia species by multilocus sequence analysis.

    PubMed

    Estrada-de los Santos, Paulina; Vinuesa, Pablo; Martínez-Aguilar, Lourdes; Hirsch, Ann M; Caballero-Mellado, Jesús

    2013-07-01

    Burkholderia comprises more than 60 species of environmental, clinical, and agro-biotechnological relevance. Previous phylogenetic analyses of 16S rRNA, recA, gyrB, rpoB, and acdS gene sequences as well as genome sequence comparisons of different Burkholderia species have revealed two major species clusters. In this study, we undertook a multilocus sequence analysis of 77 type and reference strains of Burkholderia using atpD, gltB, lepA, and recA genes in combination with the 16S rRNA gene sequence and employed maximum likelihood and neighbor-joining criteria to test this further. The phylogenetic analysis revealed, with high supporting values, distinct lineages within the genus Burkholderia. The two large groups were named A and B, whereas the B. rhizoxinica/B. endofungorum, and B. andropogonis groups consisted of two and one species, respectively. The group A encompasses several plant-associated and saprophytic bacterial species. The group B comprises the B. cepacia complex (opportunistic human pathogens), the B. pseudomallei subgroup, which includes both human and animal pathogens, and an assemblage of plant pathogenic species. The distinct lineages present in Burkholderia suggest that each group might represent a different genus. However, it will be necessary to analyze the full set of Burkholderia species and explore whether enough phenotypic features exist among the different clusters to propose that these groups should be considered separate genera.

  1. DNA sequence analysis by MALDI mass spectrometry.

    PubMed Central

    Kirpekar, F; Nordhoff, E; Larsen, L K; Kristiansen, K; Roepstorff, P; Hillenkamp, F

    1998-01-01

    Conventional DNA sequencing is based on gel electrophoretic separation of the sequencing products. Gel casting and electrophoresis are the time limiting steps, and the gel separation is occasionally imperfect due to aberrant mobility of certain fragments, leading to erroneous sequence determination. Furthermore, illegitimately terminated products frequently cannot be distinguished from correctly terminated ones, a phenomenon that also obscures data interpretation. In the present work the use of MALDI mass spectrometry for sequencing of DNA amplified from clinical samples is implemented. The unambiguous and fast identification of deletions and substitutions in DNA amplified from heterozygous carriers realistically suggests MALDI mass spectrometry as a future alternative to conventional sequencing procedures for high throughput screening for mutations. Unique features of the method are demonstrated by sequencing a DNA fragment that could not be sequenced conventionally because of gel electrophoretic band compression and the presence of multiple non-specific termination products. Taking advantage of the accurate mass information provided by MALDI mass spectrometry, the sequence was deduced, and the nature of the non-specific termination could be determined. The method described here increases the fidelity in DNA sequencing, is fast, compatible with standard DNA sequencing procedures, and amenable to automation. PMID:9592136

  2. Sequencing and comparative genomic analysis of 1227 Felis catus cDNA sequences enriched for developmental, clinical and nutritional phenotypes

    PubMed Central

    2012-01-01

    Background: The feline genome is valuable to the veterinary and model organism genomics communities because the cat is an obligate carnivore and a model for endangered felids. The initial public release of the Felis catus genome assembly provided a framework for investigating the genomic basis of feline biology. However, the entire set of protein coding genes has not been elucidated. Results: We identified and characterized 1227 protein coding feline sequences, of which 913 map to public sequences and 314 are novel. These sequences have been deposited into NCBI's GenBank database and complement public genomic resources by providing additional protein coding sequences that fill in some of the gaps in the feline genome assembly. Through functional and comparative genomic analyses, we gained an understanding of the role of these sequences in feline development, nutrition and health. Specifically, we identified 104 orthologs of human genes associated with Mendelian disorders. We detected negative selection within sequences with gene ontology annotations associated with intracellular trafficking, cytoskeleton and muscle functions. We detected relatively less negative selection on protein sequences encoding extracellular networks, apoptotic pathways and mitochondrial gene ontology annotations. Additionally, we characterized feline cDNA sequences that have mouse orthologs associated with clinical, nutritional and developmental phenotypes. Together, this analysis provides an overview of the value of our cDNA sequences and enhances our understanding of how the feline genome is similar to, and different from, other mammalian genomes. Conclusions: The cDNA sequences reported here expand existing feline genomic resources by providing high-quality sequences annotated with comparative genomic information providing functional, clinical, nutritional and orthologous gene information. PMID:22257742

  3. Sequencing, Analysis, and Annotation of Expressed Sequence Tags for Camelus dromedarius

    PubMed Central

    Al-Swailem, Abdulaziz M.; Shehata, Maher M.; Abu-Duhier, Faisel M.; Al-Yamani, Essam J.; Al-Busadah, Khalid A.; Al-Arawi, Mohammed S.; Al-Khider, Ali Y.; Al-Muhaimeed, Abdullah N.; Al-Qahtani, Fahad H.; Manee, Manee M.; Al-Shomrani, Badr M.; Al-Qhtani, Saad M.; Al-Harthi, Amer S.; Akdemir, Kadir C.; Otu, Hasan H.

    2010-01-01

    Despite its economic, cultural, and biological importance, there has not been a large scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, cluster and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and ∼40% hits extending to the start codons of full length cDNAs suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. The accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools with the possibility to add sequences from public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies enabling a starting point for whole genome sequencing of the organism. PMID:20502665
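
    The ORF criterion mentioned above (contigs with an ORF>300 bp) can be sketched as a six-frame search for the longest ATG-to-stop open reading frame. This is a minimal illustration under stated assumptions: the abstract does not say which ORF finder or ORF definition was used, so the function names, the ATG-start requirement, and the inclusion of the stop codon in the length are choices made here, and the example sequences are made up.

```python
# Hypothetical helper names; not the pipeline used in the study.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_in_strand(seq):
    """Longest ATG...stop ORF (bp, stop included) over three reading frames."""
    best = 0
    for frame in range(3):
        start = None  # position of the first ATG of the ORF currently open
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def longest_orf(seq):
    """Longest ORF over all six frames (both strands)."""
    seq = seq.upper()
    return max(longest_orf_in_strand(seq),
               longest_orf_in_strand(reverse_complement(seq)))


def fraction_with_long_orf(contigs, min_len=300):
    """Fraction of contigs whose longest ORF exceeds min_len bp."""
    if not contigs:
        return 0.0
    return sum(longest_orf(c) > min_len for c in contigs) / len(contigs)


# Made-up contigs: one 306 bp ORF and one 9 bp ORF.
long_contig = "ATG" + "GCT" * 100 + "TAA"
short_contig = "ATGAAATAA"
share = fraction_with_long_orf([long_contig, short_contig])
```

    ORFs that run off the end of a contig without a stop codon are ignored in this sketch; EST-derived contigs are often partial, so a practical analysis would probably need to handle that case.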

  4. Project Report: Automatic Sequence Processor Software Analysis

    NASA Technical Reports Server (NTRS)

    Benjamin, Brandon

    2011-01-01

    The Mission Planning and Sequencing (MPS) element of Multi-Mission Ground System and Services (MGSS) provides space missions with multi-purpose software to plan spacecraft activities, sequence spacecraft commands, and then integrate these products and execute them on spacecraft. Jet Propulsion Laboratory (JPL) is currently flying many missions. The processes for building, integrating, and testing the multi-mission uplink software need to be improved to meet the needs of the missions and the operations teams that command the spacecraft. The Multi-Mission Sequencing Team is responsible for collecting and processing the observations, experiments and engineering activities that are to be performed on a selected spacecraft. The collection of these activities is called a sequence and ultimately a sequence becomes a sequence of spacecraft commands. The operations teams check the sequence to make sure that no constraints are violated. The workflow process involves sending a program start command, which activates the Automatic Sequence Processor (ASP). The ASP is currently a file-based system that is comprised of scripts written in Perl, C shell and awk. Once this start process is complete, the system checks for errors and aborts if there are any; otherwise the system converts the commands to binary, and then sends the resultant information to be radiated to the spacecraft.

  5. Nucleotide deletion and P addition in V(D)J recombination: a determinant role of the coding-end sequence.

    PubMed Central

    Nadel, B; Feeney, A J

    1997-01-01

    During V(D)J recombination, the coding ends to be joined are extensively modified. Those modifications, termed coding-end processing, consist of removal and addition of various numbers of nucleotides. We previously showed in vivo that coding-end processing is specific for each coding end, suggesting that specific motifs in a coding-end sequence influence nucleotide deletion and P-region formation. In this study, we created a panel of recombination substrates containing actual immunoglobulin and T-cell receptor coding-end sequences and dissected the role of each motif by comparing its processing pattern with those of variants containing minimal nucleotide changes from the original sequence. Our results demonstrate the determinant role of specific sequence motifs on coding-end processing and also the importance of the context in which they are found. We show that minimal nucleotide changes in key positions of a coding-end sequence can result in dramatic changes in the processing pattern. We propose that each coding-end sequence dictates a unique hairpin structure, the result of a particular energy conformation between nucleotides organizing the loop and the stem, and that the interplay between this structure and specific sequence motifs influences the frequency and location of nicks which open the coding-end hairpin. These findings indicate that the sequences of the coding ends determine their own processing and have a profound impact on the development of the primary B- and T-cell repertoires. PMID:9199310

  6. Computed Tomography Inspection and Analysis for Additive Manufacturing Components

    NASA Technical Reports Server (NTRS)

    Beshears, Ronald D.

    2016-01-01

    Computed tomography (CT) inspection was performed on test articles additively manufactured from metallic materials. Metallic AM and machined wrought alloy test articles with programmed flaws were inspected using a 2 MeV linear accelerator based CT system. Performance of CT inspection on identically configured wrought and AM components and programmed flaws was assessed using standard image analysis techniques to determine the impact of additive manufacturing on inspectability of objects with complex geometries.

  7. Automated shielding analysis sequences for spent fuel casks

    SciTech Connect

    Tang, J.S.; Parks, C.V.; Hermann, O.W.

    1987-01-01

    Two important Shielding Analysis Sequences (SAS) have recently been developed within the SCALE computational system. These sequences significantly enhance the existing SCALE system capabilities for evaluating radiation doses exterior to spent fuel casks. These new control module sequences (SAS1 and SAS4) and their capabilities are discussed and demonstrated, together with the existing SAS2 sequence that is used to generate radiation sources for spent fuel. Particular attention is given to the new SAS4 sequence which provides an automated scheme for generating and using biasing parameters in a subsequent Monte Carlo analysis of a cask.

  8. Additivity in the Analysis and Design of HIV Protease Inhibitors

    PubMed Central

    Jorissen, Robert N.; Kiran Kumar Reddy, G. S.; Ali, Akbar; Altman, Michael D.; Chellappan, Sripriya; Anjum, Saima G.; Tidor, Bruce; Schiffer, Celia A.; Rana, Tariq M.; Gilson, Michael K.

    2009-01-01

    We explore the applicability of an additive treatment of substituent effects to the analysis and design of HIV protease inhibitors. Affinity data for a set of inhibitors with a common chemical framework were analyzed to provide estimates of the free energy contribution of each chemical substituent. These estimates were then used to design new inhibitors, whose high affinities were confirmed by synthesis and experimental testing. Derivations of additive models by least-squares and ridge-regression methods were found to yield statistically similar results. The additivity approach was also compared with standard molecular descriptor-based QSAR; the latter was not found to provide superior predictions. Crystallographic studies of HIV protease-inhibitor complexes help explain the perhaps surprisingly high degree of substituent additivity in this system, and allow some of the additivity coefficients to be rationalized on a structural basis. PMID:19193159
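
    To make the additive treatment concrete, the sketch below fits per-substituent free energy contributions to a toy data set by least squares and by ridge regression. Everything specific here is an assumption for illustration: the indicator encoding, the two substituent positions, and the free energy values are made up and are not the paper's data or code. For simplicity the ridge penalty is also applied to the intercept, which a careful implementation would usually avoid.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; try lam > 0")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def fit_additive(X, y, lam=0.0):
    """Solve (X^T X + lam I) w = X^T y; lam = 0 is ordinary least squares."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Toy design: columns are [intercept, R1 is substituent "b", R2 is "d"];
# the reference compound carries substituents "a" and "c".
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1]]
# Made-up binding free energies (kcal/mol), constructed to be exactly additive.
y = [-10.0, -11.5, -9.5, -11.0]

w_ols = fit_additive(X, y)             # recovers the additive contributions
w_ridge = fit_additive(X, y, lam=0.1)  # same model, coefficients shrunk
```

    With these toy values the least-squares fit returns the reference affinity and the two substituent increments, and a new compound's predicted free energy is the sum of the coefficients for the substituents it carries.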

  9. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    2001-06-05

    A computer system (1) for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments may be improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area (814) and sample sequences in another area (816) on a display device (3).

  10. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    1998-08-18

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  11. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    1999-10-26

    A computer system (1) for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments may be improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area (814) and sample sequences in another area (816) on a display device (3).

  12. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.; Wang, Chunwei; Jevons, Luis C.; Bernhart, Derek H.; Lipshutz, Robert J.

    2004-05-11

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  13. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    2003-08-19

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments may be improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  14. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, M.S.

    1998-08-18

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device. 27 figs.

  15. Targeted DNA methylation analysis by next-generation sequencing.

    PubMed

    Masser, Dustin R; Stanford, David R; Freeman, Willard M

    2015-02-24

    The role of epigenetic processes in the control of gene expression has been known for a number of years. DNA methylation at cytosine residues is of particular interest for epigenetic studies as it has been demonstrated to be both a long lasting and a dynamic regulator of gene expression. Efforts to examine epigenetic changes in health and disease have been hindered by the lack of high-throughput, quantitatively accurate methods. With the advent and popularization of next-generation sequencing (NGS) technologies, these tools are now being applied to epigenomics in addition to existing genomic and transcriptomic methodologies. For epigenetic investigations of cytosine methylation where regions of interest, such as specific gene promoters or CpG islands, have been identified and there is a need to examine significant numbers of samples with high quantitative accuracy, we have developed a method called Bisulfite Amplicon Sequencing (BSAS). This method combines bisulfite conversion with targeted amplification of regions of interest, transposome-mediated library construction and benchtop NGS. BSAS offers a rapid and efficient method for analysis of up to 10 kb of targeted regions in up to 96 samples at a time that can be performed by most research groups with basic molecular biology skills. The results provide absolute quantitation of cytosine methylation with base specificity. BSAS can be applied to any genomic region from any DNA source. This method is useful for hypothesis testing studies of target regions of interest as well as confirmation of regions identified in genome-wide methylation analyses such as whole genome bisulfite sequencing, reduced representation bisulfite sequencing, and methylated DNA immunoprecipitation sequencing.
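
    Bisulfite conversion leaves methylated cytosines as C, while unmethylated cytosines are read as T after amplification, so per-site quantitation reduces to a ratio of read counts. The sketch below illustrates that arithmetic only; the function name and the read counts are illustrative assumptions, not part of the BSAS protocol.

```python
def methylation_percent(c_reads, t_reads):
    """Percent methylation at one cytosine position from bisulfite reads.

    c_reads: reads showing C (cytosine protected from conversion, i.e. methylated)
    t_reads: reads showing T (cytosine converted, i.e. unmethylated)
    Returns None when the site has no coverage.
    """
    total = c_reads + t_reads
    if total == 0:
        return None
    return 100.0 * c_reads / total


# Illustrative counts for a single CpG site.
site_level = methylation_percent(75, 25)
```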

  16. Comparative Analysis of Genome Sequences Covering the Seven Cronobacter Species

    PubMed Central

    Cummings, Craig A.; Shih, Rita; Degoricija, Lovorka; Rico, Alain; Brzoska, Pius; Hamby, Stephen E.; Masood, Naqash; Hariri, Sumyya; Sonbol, Hana; Chuzhanova, Nadia; McClelland, Michael; Furtado, Manohar R.; Forsythe, Stephen J.

    2012-01-01

    Background Species of Cronobacter are widespread in the environment and are occasional food-borne pathogens associated with serious neonatal diseases, including bacteraemia, meningitis, and necrotising enterocolitis. The genus is composed of seven species: C. sakazakii, C. malonaticus, C. turicensis, C. dublinensis, C. muytjensii, C. universalis, and C. condimenti. Clinical cases are associated with three species, C. malonaticus, C. turicensis and, in particular, with C. sakazakii multilocus sequence type 4. Thus, it is plausible that virulence determinants have evolved in certain lineages. Methodology/Principal Findings We generated high quality sequence drafts for eleven Cronobacter genomes representing the seven Cronobacter species, including an ST4 strain of C. sakazakii. Comparative analysis of these genomes together with the two publicly available genomes revealed Cronobacter has over 6,000 genes in one or more strains and over 2,000 genes shared by all Cronobacter. Considerable variation in the presence of traits such as type six secretion systems, metal resistance (tellurite, copper and silver), and adhesins were found. C. sakazakii is unique in the Cronobacter genus in encoding genes enabling the utilization of exogenous sialic acid which may have clinical significance. The C. sakazakii ST4 strain 701 contained additional genes as compared to other C. sakazakii but none of them were known specific virulence-related genes. Conclusions/Significance Genome comparison revealed that pair-wise DNA sequence identity varies between 89 and 97% in the seven Cronobacter species, and also suggested various degrees of divergence. Sets of universal core genes and accessory genes unique to each strain were identified. These gene sequences can be used for designing genus/species specific detection assays. 
Genes encoding adhesins, T6SS, and metal resistance genes as well as prophages are found in only subsets of genomes and have contributed considerably to the variation of
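
    The gene counts above (genes present in one or more strains, genes shared by all strains, genes unique to a strain) correspond to simple set operations. This is a minimal sketch under the assumption that each genome has already been reduced to a set of gene identifiers; the strain and gene names are invented for illustration.

```python
def core_pan_unique(genomes):
    """Split gene content into pan-genome, core genome and strain-unique genes.

    genomes: dict mapping strain name -> iterable of gene identifiers.
    Returns (pan, core, unique), where unique maps each strain to the genes
    found in that strain and in no other.
    """
    gene_sets = {strain: set(genes) for strain, genes in genomes.items()}
    all_sets = list(gene_sets.values())
    pan = set().union(*all_sets)
    core = set.intersection(*all_sets) if all_sets else set()
    unique = {}
    for strain, genes in gene_sets.items():
        others = set().union(*(g for s, g in gene_sets.items() if s != strain))
        unique[strain] = genes - others
    return pan, core, unique


# Toy input; identifiers are hypothetical.
toy = {
    "strain_1": ["geneA", "geneB", "geneC"],
    "strain_2": ["geneA", "geneB", "geneD"],
    "strain_3": ["geneA", "geneE"],
}
pan, core, unique = core_pan_unique(toy)
```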

  17. Optimal Multicomponent Analysis Using the Generalized Standard Addition Method.

    ERIC Educational Resources Information Center

    Raymond, Margaret; And Others

    1983-01-01

    Describes an experiment on the simultaneous determination of chromium and magnesium by spectrophotometry modified to include the Generalized Standard Addition Method computer program, a multivariate calibration method that provides optimal multicomponent analysis in the presence of interference and matrix effects. Provides instructions for…

  18. Pathway analysis with next-generation sequencing data.

    PubMed

    Zhao, Jinying; Zhu, Yun; Boerwinkle, Eric; Xiong, Momiao

    2015-04-01

    Although pathway analysis methods have been developed and successfully applied to association studies of common variants, the statistical methods for pathway-based association analysis of rare variants have not been well developed. Many investigators observed highly inflated false-positive rates and low power in pathway-based tests of association of rare variants. The inflated false-positive rates and low true-positive rates of the current methods are mainly due to their lack of ability to account for gametic phase disequilibrium. To overcome these serious limitations, we develop a novel statistic that is based on the smoothed functional principal component analysis (SFPCA) for pathway association tests with next-generation sequencing data. The developed statistic has the ability to capture position-level variant information and account for gametic phase disequilibrium. By intensive simulations, we demonstrate that the SFPCA-based statistic for testing pathway association with either rare or common or both rare and common variants has the correct type 1 error rates. Also the power of the SFPCA-based statistic and 22 additional existing statistics are evaluated. We found that the SFPCA-based statistic has a much higher power than other existing statistics in all the scenarios considered. To further evaluate its performance, the SFPCA-based statistic is applied to pathway analysis of exome sequencing data in the early-onset myocardial infarction (EOMI) project. We identify three pathways significantly associated with EOMI after the Bonferroni correction. In addition, our preliminary results show that the SFPCA-based statistic has much smaller P-values to identify pathway association than other existing methods. PMID:24986826

  19. Scalable Kernel Methods and Algorithms for General Sequence Analysis

    ERIC Educational Resources Information Center

    Kuksa, Pavel

    2011-01-01

    Analysis of large-scale sequential data has become an important task in machine learning and pattern recognition, inspired in part by numerous scientific and technological applications such as the document and text classification or the analysis of biological sequences. However, current computational methods for sequence comparison still lack…

  20. [Tabular excel editor for analysis of aligned nucleotide sequences].

    PubMed

    Demkin, V V

    2010-01-01

    The Excel platform was used to convert the results of multiple nucleotide sequence alignments obtained using the BLAST network service into a form suitable for visual analysis and editing. Two macro operators for MS Excel 2007 were constructed. The array of aligned sequences, transformed into an Excel table and processed using the macro operators, is more suitable for analysis than the initial HTML data.

  1. Relationships among genera of the Saccharomycotina from multigene sequence analysis

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Most known species of the subphylum Saccharomycotina (budding ascomycetous yeasts) have now been placed in phylogenetically defined clades following multigene sequence analysis. Terminal clades, which are usually well supported from bootstrap analysis, are viewed as phylogenetically circumscribed ge...

  2. Establishing a framework for comparative analysis of genome sequences

    SciTech Connect

    Bansal, A.K.

    1995-06-01

    This paper describes a framework and a high-level language toolkit for comparative analysis of genome sequence alignment. The framework integrates the information derived from multiple sequence alignment and phylogenetic tree (hypothetical tree of evolution) to derive new properties about sequences. Multiple sequence alignments are treated as an abstract data type. Abstract operations have been described to manipulate a multiple sequence alignment and to derive mutation related information from a phylogenetic tree by superimposing parsimonious analysis. The framework has been applied to protein alignments to derive constrained columns (in a multiple sequence alignment) that exhibit evolutionary pressure to preserve a common property in a column despite mutation. A Prolog toolkit based on the framework has been implemented and demonstrated on alignments containing 3000 sequences and 3904 columns.
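
    The idea of a constrained column can be sketched as follows: a column qualifies if its residues differ between sequences yet all share one property. The sketch is in Python rather than the Prolog of the original toolkit, and the hydrophobic residue set and toy alignment are illustrative assumptions; the paper's actual property definitions and its use of the phylogenetic tree are not reproduced here.

```python
# Illustrative property class; the residue grouping is an assumption.
HYDROPHOBIC = set("AVLIMFWC")


def constrained_columns(alignment, property_set):
    """Return indices of columns that vary but stay within one property class.

    alignment:    list of equal-length aligned sequences (strings).
    property_set: set of residues sharing the property of interest.
    """
    if not alignment:
        return []
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    hits = []
    for col in range(n_cols):
        residues = {row[col] for row in alignment}
        # More than one residue (mutation occurred) yet all share the property.
        if len(residues) > 1 and residues <= property_set:
            hits.append(col)
    return hits


toy_alignment = ["MAVK", "MLIK", "MVLR"]
columns = constrained_columns(toy_alignment, HYDROPHOBIC)
```

    Here column 0 is skipped because it is fully conserved, and column 3 is skipped because K and R fall outside the chosen property set.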

  3. Analysis of sequence diversity through internal transcribed spacers and simple sequence repeats to identify Dendrobium species.

    PubMed

    Liu, Y T; Chen, R K; Lin, S J; Chen, Y C; Chin, S W; Chen, F C; Lee, C Y

    2014-04-08

    The Orchidaceae is one of the largest and most diverse families of flowering plants. The Dendrobium genus has high economic potential as ornamental plants and for medicinal purposes. In addition, the species of this genus are able to produce large crops. However, many Dendrobium varieties are very similar in outward appearance, making it difficult to distinguish one species from another. This study demonstrated that the 12 Dendrobium species used in this study may be divided into 2 groups by internal transcribed spacer (ITS) sequence analysis. Red and yellow flowers may also be used to separate these species into 2 main groups. In particular, the deciduous characteristic is associated with the ITS genetic diversity of the A group. Of 53 designed simple sequence repeat (SSR) primer pairs, 7 pairs were polymorphic for polymerase chain reaction products that were amplified from a specific band. The results of this study demonstrate that these 7 SSR primer pairs may potentially be used to identify Dendrobium species and their progeny in future studies.

  4. Sequencing and computational analysis of complete genome sequences of Citrus yellow mosaic badna virus from acid lime and pummelo.

    PubMed

    Borah, Basanta K; Johnson, A M Anthony; Sai Gopal, D V R; Dasgupta, Indranil

    2009-08-01

    Citrus yellow mosaic badna virus (CMBV), a member of the Family Caulimoviridae, Genus Badnavirus, is the causative agent of Citrus mosaic disease in India. Although the virus has been detected in several citrus species, only two full-length genomes, one each from Sweet orange and Rangpur lime, are available in publicly accessible databases. In order to obtain a better understanding of the genetic variability of the virus in other citrus mosaic-affected citrus species, we performed the cloning and sequence analysis of complete genomes of CMBV from two additional citrus species, Acid lime and Pummelo. We show that CMBV genomes from the two hosts share high homology with previously reported CMBV sequences and hence conclude that the new isolates represent variants of the virus present in these species. Based on in silico sequence analysis, we predict the possible function of the protein encoded by one of the five ORFs.

  5. Additional data for a new Theileria sp. from China based on the sequences of ribosomal RNA internal transcribed spacers.

    PubMed

    Liu, Junlong; Guan, Guiquan; Liu, Zhijie; Liu, Aihong; Ma, Miling; Bai, Qi; Yin, Hong; Luo, Jianxun

    2013-02-01

    Theileria sinensis was recently isolated and named as an independent Theileria species that infects cattle in China. To date, this parasite has been described based on its morphology, transmission and molecular studies, indicating that it should be classified as a distinct species. To test the validity of this taxon, the two internal transcribed spacers (ITS1 and ITS2) and the 5.8S rRNA gene were cloned and sequenced from three T. sinensis isolates. The complete ITS sequences were compared with those of other Theileria sp. available in GenBank. Phylogenetic analyses based on sequence data for the complete ITS sequences indicate that T. sinensis lies in a distinct clade that is separate from that of T. buffeli/orientalis and T. annulata. Sequence comparisons indicate that different T. sinensis isolates possess unique sizes of ITS1 and ITS2 as well as species-specific nucleotide sequences. This analysis provides new molecular data to support the classification of T. sinensis as a distinct species from other known Theileria spp. based on ITS sequences.

  6. Wavelet Analysis on Symbolic Sequences and Two-Fold de Bruijn Sequences

    NASA Astrophysics Data System (ADS)

    Osipov, V. Al.

    2016-07-01

    The concept of symbolic sequences plays an important role in the study of complex systems. In this work we are interested in the ultrametric structure of the set of cyclic sequences naturally arising in the theory of dynamical systems. Aiming to construct analytic and numerical methods for the investigation of clusters, we introduce an operator language on the space of symbolic sequences and propose an approach based on wavelet analysis for the study of the cluster hierarchy. The analytic power of the approach is demonstrated by the derivation of a formula for counting two-fold de Bruijn sequences, an extension of the notion of de Bruijn sequences. Possible advantages of the developed description are also discussed in the context of the applied problem of constructing efficient DNA sequence assembly algorithms.
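
    For orientation, an ordinary de Bruijn sequence over an alphabet of k symbols is a cyclic sequence in which every word of length n occurs exactly once. The sketch below builds one with the standard Lyndon-word construction and checks that property; it does not implement the two-fold extension or the wavelet approach of the paper, and the function names are illustrative.

```python
def de_bruijn(alphabet, n):
    """Build a de Bruijn sequence B(k, n) for the given alphabet (k = len(alphabet))."""
    k = len(alphabet)
    a = [0] * (n + 1)
    out = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                out.extend(a[1:p + 1])  # append a Lyndon word whose length divides n
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in out)


def cyclic_words(seq, n):
    """All length-n windows of seq, read cyclically."""
    doubled = seq + seq[:n - 1]
    return [doubled[i:i + n] for i in range(len(seq))]


dna3 = de_bruijn("ACGT", 3)
words = cyclic_words(dna3, 3)
```

    With a four-letter DNA alphabet and n = 3, the sequence has length 4^3 = 64 and its cyclic windows cover each of the 64 possible 3-mers once.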

  7. A global analysis of soil acidification caused by nitrogen addition

    NASA Astrophysics Data System (ADS)

    Tian, Dashuan; Niu, Shuli

    2015-02-01

    Nitrogen (N) deposition-induced soil acidification has become a global problem. However, the response patterns of soil acidification to N addition and the underlying mechanisms remain far from clear. Here, we conducted a meta-analysis of 106 studies to reveal global patterns of soil acidification in response to N addition. We found that N addition significantly reduced soil pH by 0.26 on average globally. However, the responses of soil pH varied with ecosystem types, N addition rate, N fertilization forms, and experimental durations. Soil pH decreased most in grassland, whereas no decrease in soil pH in response to N addition was observed in boreal forest. Soil pH decreased linearly with N addition rates. Addition of urea and NH4NO3 contributed more to soil acidification than NH4-form fertilizer. When experimental duration was longer than 20 years, N addition effects on soil acidification diminished. Environmental factors such as initial soil pH, soil carbon and nitrogen content, precipitation, and temperature all influenced the responses of soil pH. Base cations of Ca2+, Mg2+ and K+ were critically important in buffering against N-induced soil acidification at the early stage. However, N addition has shifted global soils into the Al3+ buffering phase. Overall, this study indicates that acidification in global soils is very sensitive to N deposition, which is greatly modified by biotic and abiotic factors. Global soils are now at a buffering transition from base cations (Ca2+, Mg2+ and K+) to non-base cations (Mn2+ and Al3+). This calls attention to the limitation of base cations and the toxic impact of non-base cations for terrestrial ecosystems with N deposition.

  8. Sustainable nutrients recovery and recycling by optimizing the chemical addition sequence for struvite precipitation from raw swine slurries.

    PubMed

    Taddeo, Raffaele; Kolppo, Kari; Lepistö, Raghida

    2016-09-15

    Livestock farming contributes heavily to nitrogen (N) and phosphorus (P) flows into the environment, a major cause of eutrophication of coastal and freshwater systems. Furthermore, the growing demand for N-P fertilizers is increasing the emission of anthropogenic reactive N into the atmosphere and the depletion of the current P reserves. Therefore, it is essential to minimize the anthropogenic impact on the environment and recycle the wasted N-P for agricultural reuse. This study focused on enhancing struvite (MgNH4PO4*6H2O) precipitation from raw swine slurries in batch and laboratory-scale reactors. Different chemical addition sequences were evaluated, and the best removal efficiency (E%) was obtained when the chemicals were mixed before the precipitation process. Struvite was detected at a pH as low as 6 (E%N-P∼50%), and high E%N-P was found at pH 7-9.5 (80-95%). Furthermore, air stripping was used in place of NaOH to adjust pH, returning the same efficiency as if only alkali had been used. XRD and FE-SEM analysis of the precipitate showed that the recovered struvite was of high purity with orthorhombic crystalline structure and only trace amounts of impurities from matrix organics, co-precipitation products (CaO and amorphous calcium-phosphates), and residuals of added chemicals (MgO). PMID:27208994

  9. Sustainable nutrients recovery and recycling by optimizing the chemical addition sequence for struvite precipitation from raw swine slurries.

    PubMed

    Taddeo, Raffaele; Kolppo, Kari; Lepistö, Raghida

    2016-09-15

    Livestock farming contributes heavily to nitrogen (N) and phosphorus (P) flows into the environment, a major cause of eutrophication of coastal and freshwater systems. Furthermore, the growing demand for N-P fertilizers is increasing the emission of anthropogenic reactive N into the atmosphere and the depletion of the current P reserves. Therefore, it is essential to minimize the anthropogenic impact on the environment and recycle the wasted N-P for agricultural reuse. This study focused on enhancing struvite (MgNH4PO4*6H2O) precipitation from raw swine slurries in batch and laboratory-scale reactors. Different chemical addition sequences were evaluated, and the best removal efficiency (E%) was obtained when the chemicals were mixed before the precipitation process. Struvite was detected at a pH as low as 6 (E%N-P∼50%), and high E%N-P was found at pH 7-9.5 (80-95%). Furthermore, air stripping was used in place of NaOH to adjust pH, returning the same efficiency as if only alkali had been used. XRD and FE-SEM analysis of the precipitate showed that the recovered struvite was of high purity with orthorhombic crystalline structure and only trace amounts of impurities from matrix organics, co-precipitation products (CaO and amorphous calcium-phosphates), and residuals of added chemicals (MgO).

  10. [Kinetic analysis of additive effect on desulfurization activity].

    PubMed

    Han, Kui-hua; Zhao, Jian-li; Lu, Chun-mei; Wang, Yong-zheng; Zhao, Gai-ju; Cheng, Shi-qing

    2006-02-01

    The additive effects of Al2O3, Fe2O3 and MnCO3 on CaO sulfation kinetics were investigated by thermogravimetric analysis and a modified grain model. The activation energy (Ea) and the pre-exponential factor (k0) of the surface reaction, and the activation energy (Ep) and the pre-exponential factor (D0) of the product layer diffusion reaction, were calculated according to the model. Addition of MnCO3 can enhance the initial reaction rate, product layer diffusion and the final CaO conversion of sorbents, by a mechanism similar to that of Fe2O3. The method based on isokinetic temperature Ts and activation energy cannot estimate the contribution of an additive to the sulfation reactivity; the rate constant of the surface reaction (k) and the effective diffusivity of reactant in the product layer (Ds) under certain experimental conditions can reflect the effect of additives on the activation. Nonstoichiometric metal oxides may catalyze the surface reaction and promote the diffusivity of reactant in the product layer through crystal defects and the distinct diffusion of cations and anions. According to the mechanism and effect of additives on the sulfation, the effective temperature and the stoichiometric relation of the reaction, it is possible to improve the utilization of sorbent by compounding more additives into the calcium-based sorbent.

  11. Modern Computational Techniques for the HMMER Sequence Analysis

    PubMed Central

    2013-01-01

    This paper focuses on the latest research and critical reviews on modern computing architectures, software and hardware accelerated algorithms for bioinformatics data analysis with an emphasis on one of the most important sequence analysis applications: hidden Markov models (HMM). We show the detailed performance comparison of sequence analysis tools on various computing platforms recently developed in the bioinformatics community. The characteristics of the sequence analysis, such as its data- and compute-intensive nature, make it very attractive to optimize and parallelize by using both traditional software approaches and innovative hardware acceleration technologies. PMID:25937944

  12. DNA sequence-based analysis of the Pseudomonas species.

    PubMed

    Mulet, Magdalena; Lalucat, Jorge; García-Valdés, Elena

    2010-06-01

    Partial sequences of four core 'housekeeping' genes (16S rRNA, gyrB, rpoB and rpoD) of the type strains of 107 Pseudomonas species were analysed in order to obtain a comprehensive view regarding the phylogenetic relationships within the Pseudomonas genus. Gene trees allowed the discrimination of two lineages or intrageneric groups (IG), called IG P. aeruginosa and IG P. fluorescens. The first, IG P. aeruginosa, was divided into three main groups, represented by the species P. aeruginosa, P. stutzeri and P. oleovorans. The second IG was divided into six groups, represented by the species P. fluorescens, P. syringae, P. lutea, P. putida, P. anguilliseptica and P. straminea. The P. fluorescens group was the most complex and included nine subgroups, represented by the species P. fluorescens, P. gessardi, P. fragi, P. mandelii, P. jesseni, P. koreensis, P. corrugata, P. chlororaphis and P. asplenii. Pseudomonas rhizospherae was affiliated with the P. fluorescens IG in the phylogenetic analysis but was independent of any group. Some species were located on phylogenetic branches that were distant from defined clusters, such as those represented by the P. oryzihabitans group and the type strains P. pachastrellae, P. pertucinogena and P. luteola. Additionally, 17 strains of P. aeruginosa, 'P. entomophila', P. fluorescens, P. putida, P. syringae and P. stutzeri, for which genome sequences have been determined, have been included to compare the results obtained in the analysis of four housekeeping genes with those obtained from whole genome analyses.

  13. DNA sequence-based analysis of the Pseudomonas species.

    PubMed

    Mulet, Magdalena; Lalucat, Jorge; García-Valdés, Elena

    2010-06-01

    Partial sequences of four core 'housekeeping' genes (16S rRNA, gyrB, rpoB and rpoD) of the type strains of 107 Pseudomonas species were analysed in order to obtain a comprehensive view regarding the phylogenetic relationships within the Pseudomonas genus. Gene trees allowed the discrimination of two lineages or intrageneric groups (IG), called IG P. aeruginosa and IG P. fluorescens. The first, IG P. aeruginosa, was divided into three main groups, represented by the species P. aeruginosa, P. stutzeri and P. oleovorans. The second IG was divided into six groups, represented by the species P. fluorescens, P. syringae, P. lutea, P. putida, P. anguilliseptica and P. straminea. The P. fluorescens group was the most complex and included nine subgroups, represented by the species P. fluorescens, P. gessardi, P. fragi, P. mandelii, P. jesseni, P. koreensis, P. corrugata, P. chlororaphis and P. asplenii. Pseudomonas rhizospherae was affiliated with the P. fluorescens IG in the phylogenetic analysis but was independent of any group. Some species were located on phylogenetic branches that were distant from defined clusters, such as those represented by the P. oryzihabitans group and the type strains P. pachastrellae, P. pertucinogena and P. luteola. Additionally, 17 strains of P. aeruginosa, 'P. entomophila', P. fluorescens, P. putida, P. syringae and P. stutzeri, for which genome sequences have been determined, have been included to compare the results obtained in the analysis of four housekeeping genes with those obtained from whole genome analyses. PMID:20192968

  14. Stratigraphic sequence analysis of the Antler foreland

    SciTech Connect

    Silberling, N.J.; Nichols, K.M.; Macke, D.L.

    1993-04-01

    Mid-Upper Devonian to Upper Mississippian strata in western Utah were deposited in the distal Antler foreland. They record lateral and vertical changes in depositional environments that define five successive stratigraphic sequences, each representing a third-order transgressive-regressive cycle. In ascending order, these sequences are informally named the Langenheim (LA) of late Frasnian to mid-Famennian age, the Gutschick (GU) of late Famennian to early Kinderhookian age, the Morris (MO) of late Kinderhookian age; the Sadlick (SA) of Osagean to early Meramecian age, and the Maughan (MA) of mid-Meramecian to Chesterian age. MO is widespread and recognized within carbonate rocks of the Fitchville Formation and Joana Limestone. SA formed in concert with and to the east and south of the Wendover foreland high; the Delle phosphatic event marks maximum marine flooding during SA deposition. The transgressive systems tract of MA includes rhythmic-bedded limestone in the upper part of the Deseret Limestone in west-central Utah and, farther west, the hypoxic limestone and black shale of the Skunk Spring Limestone Bed and part of the overlying Chainman Shale. Traced westward into Nevada, MA first oversteps SA and then MO. Lithostratigraphic correlation of these sequences still farther west into the Eureka thrust belt (ETB) could mean that the youngest strata truncated by the Roberts Mountains thrust belong to the MA and that this thrust is simply part of the post-Mississippian ETB. However, some strata in central Nevada that lithically resemble those of the MA are paleontologically dated as Early Mississippian, the age of sequences overstepped by MA not far to the east. Thus, at least some imbricates of the ETB may contain a sequence stratigraphy which reflects local tectonic control.

  15. Bioinformatics Pipeline for Transcriptome Sequencing Analysis.

    PubMed

    Djebali, Sarah; Wucher, Valentin; Foissac, Sylvain; Hitte, Christophe; Corre, Evan; Derrien, Thomas

    2017-01-01

    The development of High Throughput Sequencing (HTS) for RNA profiling (RNA-seq) has shed light on the diversity of transcriptomes. While RNA-seq is becoming a de facto standard for monitoring the population of expressed transcripts in a given condition at a specific time, processing the huge amount of data it generates requires dedicated bioinformatics programs. Here, we describe a standard bioinformatics protocol using state-of-the-art tools, the STAR mapper to align reads onto a reference genome, Cufflinks to reconstruct the transcriptome, and RSEM to quantify expression levels of genes and transcripts. We present the workflow using human transcriptome sequencing data from two biological replicates of the K562 cell line produced as part of the ENCODE3 project. PMID:27662878

  16. Detailed Analysis of a Multiplet Earthquake Sequence

    NASA Astrophysics Data System (ADS)

    Iglesias, A.; Singh, S. K.; Garduño, V. H.

    2014-12-01

    The Mexican National Seismological Service reported a sequence of four small earthquakes (2.5 < M < 3.0) occurring in Morelia, a city of 1,000,000 and the capital of Michoacán State. A careful review of the records from a three-component broadband station located ~10 km from the earthquakes revealed a sequence of 7 earthquakes within a period of about 36 hours. The waveforms are remarkably similar to one another, so the events may be considered a "multiplet". In this work, we use the records from the broadband station and a methodology based on coda wave interferometry to obtain the relative distance between each pair of events. The 21 inter-event distances obtained are treated as an over-determined system for the relative positions of the events. A non-linear damped scheme is used to solve the over-determined system and to obtain the spatial distribution of the 7 earthquakes. Results show that (1) distances between events are < 200 m, and (2) the sequence has an approximately linear distribution.
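
    Seven events give 7 × 6 / 2 = 21 distinct pairs, which is where the 21 inter-event distances come from. The sketch below shows how relative positions could be refined from such distances by minimizing the squared distance misfit. The abstract does not specify the scheme used, so plain damped gradient descent stands in for it here; the synthetic event positions, starting guess, step size and function names are all assumptions for illustration.

```python
import math
from itertools import combinations


def pair_misfit(positions, distances):
    """Sum of squared differences between modelled and measured separations."""
    return sum(
        (math.dist(positions[i], positions[j]) - d) ** 2
        for (i, j), d in distances.items()
    )


def relocate(positions, distances, step=0.02, n_iter=500):
    """Refine positions by damped gradient descent on pair_misfit.

    step acts as the damping: small steps keep each update stable.
    """
    pos = [list(p) for p in positions]
    for _ in range(n_iter):
        grad = [[0.0, 0.0, 0.0] for _ in pos]
        for (i, j), d in distances.items():
            r = math.dist(pos[i], pos[j])
            if r == 0.0:
                continue  # direction undefined for coincident points
            scale = 2.0 * (r - d) / r
            for k in range(3):
                g = scale * (pos[i][k] - pos[j][k])
                grad[i][k] += g
                grad[j][k] -= g
        for p, g in zip(pos, grad):
            for k in range(3):
                p[k] -= step * g[k]
    return pos


# Synthetic test case: 7 events on a line, 30 m apart (invented numbers).
true_pos = [(30.0 * i, 0.0, 0.0) for i in range(7)]
pairs = list(combinations(range(7), 2))
measured = {(i, j): math.dist(true_pos[i], true_pos[j]) for i, j in pairs}
start = [(30.0 * i + 5.0 * ((i % 3) - 1), 4.0 * (i % 2), 0.0) for i in range(7)]
refined = relocate(start, measured)
```

    Distances alone fix the geometry only up to a rigid translation, rotation and reflection, so absolute locations would need additional constraints.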

  17. ANALYSIS OF MPC ACCESS REQUIREMENTS FOR ADDITION OF FILLER MATERIALS

    SciTech Connect

    W. Wallin

    1996-09-03

    This analysis is prepared by the Mined Geologic Disposal System (MGDS) Waste Package Development Department (WPDD) in response to a request received via a QAP-3-12 Design Input Data Request (Ref. 5.1) from WAST Design (formerly MRSMPC Design). The request is to provide: Specific MPC access requirements for the addition of filler materials at the MGDS (i.e., location and size of access required). The objective of this analysis is to provide a response to the foregoing request. The purpose of this analysis is to provide a documented record of the basis for the response. The response is stated in Section 8 herein. The response is based upon requirements from an MGDS perspective.

  18. Accident sequence analysis for sites producing and storing explosives.

    PubMed

    Papazoglou, Ioannis A; Aneziris, Olga; Konstandinidou, Myrto; Giakoumatos, Ieronymos

    2009-11-01

    This paper presents a QRA-based approach for assessing and evaluating the safety of installations handling explosive substances. Comprehensive generic lists of immediate causes and initiating events of detonation and deflagration of explosive substances as well as safety measures preventing these explosions are developed. Initiating events and corresponding measures are grouped under the more general categories of explosion due to shock wave, explosion due to mechanical energy, thermal energy, electrical energy, chemical energy, and electromagnetic radiation. Generic accident sequences are developed using Event Trees. This analysis is adapted to plant-specific conditions, and potential additional protective measures are rank-ordered in terms of the induced reduction in the frequency of explosion, with uncertainty also taken into account. This approach has been applied to 14 plants in Greece with very satisfactory results. PMID:19819362
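
    In an event tree, the frequency of one accident sequence is the initiating-event frequency multiplied by the conditional probabilities along its branches. The sketch below applies that rule and ranks candidate measures by the reduction in explosion frequency each would induce. All barrier names, probabilities and measures are invented for illustration, and uncertainty, which the paper does include, is not modelled here.

```python
def sequence_frequency(initiating_frequency, branch_probabilities):
    """Frequency of one event-tree sequence: initiator times branch probabilities."""
    freq = initiating_frequency
    for p in branch_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("branch probabilities must lie in [0, 1]")
        freq *= p
    return freq


def rank_measures(initiating_frequency, barrier_failure, measures):
    """Rank measures by the drop in sequence frequency each one induces.

    barrier_failure: dict barrier name -> failure probability (baseline).
    measures:        dict measure name -> (barrier it improves, new failure probability).
    """
    base = sequence_frequency(initiating_frequency, barrier_failure.values())
    reductions = {}
    for name, (barrier, new_p) in measures.items():
        improved = dict(barrier_failure)
        improved[barrier] = new_p
        reductions[name] = base - sequence_frequency(
            initiating_frequency, improved.values()
        )
    return sorted(reductions.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical numbers: 0.1 initiating events per year, two barriers.
barriers = {"detection": 0.1, "suppression": 0.5}
candidates = {
    "better_detector": ("detection", 0.05),
    "sprinkler": ("suppression", 0.1),
}
ranking = rank_measures(0.1, barriers, candidates)
```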

  20. Synthesis of a Fluorescent Acridone Using a Grignard Addition, Oxidation, and Nucleophilic Aromatic Substitution Reaction Sequence

    ERIC Educational Resources Information Center

    Goodrich, Samuel; Patel, Miloni; Woydziak, Zachary R.

    2015-01-01

    A three-pot synthesis oriented for an undergraduate organic chemistry laboratory was developed to construct a fluorescent acridone molecule. This laboratory experiment utilizes Grignard addition to an aldehyde, alcohol oxidation, and iterative nucleophilic aromatic substitution steps to produce the final product. Each of the intermediates and the…

  1. HIVE-Hexagon: High-Performance, Parallelized Sequence Alignment for Next-Generation Sequencing Data Analysis

    PubMed Central

    Santana-Quintero, Luis; Dingerdissen, Hayley; Thierry-Mieg, Jean; Mazumder, Raja; Simonyan, Vahan

    2014-01-01

    Due to the size of Next-Generation Sequencing data, the computational challenge of sequence alignment has been vast. Inexact alignments can take up to 90% of total CPU time in bioinformatics pipelines. High-performance Integrated Virtual Environment (HIVE), a cloud-based environment optimized for storage and analysis of extra-large data, presents an algorithmic solution: the HIVE-hexagon DNA sequence aligner. HIVE-hexagon implements novel approaches to exploit both characteristics of sequence space and CPU, RAM and Input/Output (I/O) architecture to quickly compute accurate alignments. Key components of HIVE-hexagon include non-redundification and sorting of sequences; floating diagonals of linearized dynamic programming matrices; and consideration of cross-similarity to minimize computations. Availability https://hive.biochemistry.gwu.edu/hive/ PMID:24918764

  2. Error analysis of deep sequencing of phage libraries: peptides censored in sequencing.

    PubMed

    Matochko, Wadim L; Derda, Ratmir

    2013-01-01

    Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low-abundance clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by the Illumina method. The low theoretical complexity of this phage library, as compared to the complexity of long genetic reads and genomes, allowed us to describe this library using a convenient linear vector and operator framework. We describe a phage library as an N × 1 frequency vector n = ||n_i||, where n_i is the copy number of the ith sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation of the library is an operator acting on n. Selection, amplification, or sequencing could be described as a product of an N × N matrix and a stochastic sampling operator (Sa). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq). Sequencing without any bias and errors is Seq = Sa I_N, where I_N is an N × N unity matrix. Any bias in sequencing changes I_N to a nonunity matrix. We identified a diagonal censorship matrix (CEN), which describes elimination, or statistically significant downsampling, of specific reads during the sequencing process. PMID:24416071
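
    The operator picture in this abstract can be illustrated with a toy library. The sketch below is only an assumption-laden illustration, not the authors' implementation: the copy numbers, the censorship values, the helper names `sampling_operator` and `apply_diagonal`, and the choice to apply the censorship matrix before sampling (Seq = Sa CEN) are all hypothetical. Diagonal matrices are stored as plain lists of their diagonal entries.

```python
import random

def sampling_operator(n, reads, rng):
    """Diagonal of a hypothetical sampling operator Sa for library vector n.

    `reads` clones are drawn in proportion to copy number; each diagonal entry
    is the factor that scales n_i to the number of times sequence i was drawn.
    """
    drawn = rng.choices(range(len(n)), weights=n, k=reads)
    counts = [0] * len(n)
    for i in drawn:
        counts[i] += 1
    return [c / n_i if n_i else 0.0 for c, n_i in zip(counts, n)]

def apply_diagonal(diag, vec):
    """Multiply a vector by a diagonal matrix given as its diagonal entries."""
    return [d * v for d, v in zip(diag, vec)]

n = [500, 300, 150, 50]        # toy copy numbers n_i (hypothetical)
cen = [1.0, 1.0, 0.0, 1.0]     # hypothetical CEN: the third sequence is never read

rng = random.Random(0)         # fixed seed so the sketch is repeatable
censored = apply_diagonal(cen, n)
sa = sampling_operator(censored, reads=200, rng=rng)
observed = apply_diagonal(sa, censored)   # assumed Seq = Sa CEN acting on n

print(observed)
```

    With an identity matrix in place of `cen`, every sequence has a chance of being read; a zero on the diagonal removes that sequence from the observed counts regardless of its abundance.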

  4. Initial sequencing and analysis of the human genome.

    PubMed

    Lander, E S; Linton, L M; Birren, B; Nusbaum, C; Zody, M C; Baldwin, J; Devon, K; Dewar, K; Doyle, M; FitzHugh, W; Funke, R; Gage, D; Harris, K; Heaford, A; Howland, J; Kann, L; Lehoczky, J; LeVine, R; McEwan, P; McKernan, K; Meldrim, J; Mesirov, J P; Miranda, C; Morris, W; Naylor, J; Raymond, C; Rosetti, M; Santos, R; Sheridan, A; Sougnez, C; Stange-Thomann, Y; Stojanovic, N; Subramanian, A; Wyman, D; Rogers, J; Sulston, J; Ainscough, R; Beck, S; Bentley, D; Burton, J; Clee, C; Carter, N; Coulson, A; Deadman, R; Deloukas, P; Dunham, A; Dunham, I; Durbin, R; French, L; Grafham, D; Gregory, S; Hubbard, T; Humphray, S; Hunt, A; Jones, M; Lloyd, C; McMurray, A; Matthews, L; Mercer, S; Milne, S; Mullikin, J C; Mungall, A; Plumb, R; Ross, M; Shownkeen, R; Sims, S; Waterston, R H; Wilson, R K; Hillier, L W; McPherson, J D; Marra, M A; Mardis, E R; Fulton, L A; Chinwalla, A T; Pepin, K H; Gish, W R; Chissoe, S L; Wendl, M C; Delehaunty, K D; Miner, T L; Delehaunty, A; Kramer, J B; Cook, L L; Fulton, R S; Johnson, D L; Minx, P J; Clifton, S W; Hawkins, T; Branscomb, E; Predki, P; Richardson, P; Wenning, S; Slezak, T; Doggett, N; Cheng, J F; Olsen, A; Lucas, S; Elkin, C; Uberbacher, E; Frazier, M; Gibbs, R A; Muzny, D M; Scherer, S E; Bouck, J B; Sodergren, E J; Worley, K C; Rives, C M; Gorrell, J H; Metzker, M L; Naylor, S L; Kucherlapati, R S; Nelson, D L; Weinstock, G M; Sakaki, Y; Fujiyama, A; Hattori, M; Yada, T; Toyoda, A; Itoh, T; Kawagoe, C; Watanabe, H; Totoki, Y; Taylor, T; Weissenbach, J; Heilig, R; Saurin, W; Artiguenave, F; Brottier, P; Bruls, T; Pelletier, E; Robert, C; Wincker, P; Smith, D R; Doucette-Stamm, L; Rubenfield, M; Weinstock, K; Lee, H M; Dubois, J; Rosenthal, A; Platzer, M; Nyakatura, G; Taudien, S; Rump, A; Yang, H; Yu, J; Wang, J; Huang, G; Gu, J; Hood, L; Rowen, L; Madan, A; Qin, S; Davis, R W; Federspiel, N A; Abola, A P; Proctor, M J; Myers, R M; Schmutz, J; Dickson, M; Grimwood, J; Cox, D R; Olson, M V; Kaul, R; Raymond, C; Shimizu, N; 
Kawasaki, K; Minoshima, S; Evans, G A; Athanasiou, M; Schultz, R; Roe, B A; Chen, F; Pan, H; Ramser, J; Lehrach, H; Reinhardt, R; McCombie, W R; de la Bastide, M; Dedhia, N; Blöcker, H; Hornischer, K; Nordsiek, G; Agarwala, R; Aravind, L; Bailey, J A; Bateman, A; Batzoglou, S; Birney, E; Bork, P; Brown, D G; Burge, C B; Cerutti, L; Chen, H C; Church, D; Clamp, M; Copley, R R; Doerks, T; Eddy, S R; Eichler, E E; Furey, T S; Galagan, J; Gilbert, J G; Harmon, C; Hayashizaki, Y; Haussler, D; Hermjakob, H; Hokamp, K; Jang, W; Johnson, L S; Jones, T A; Kasif, S; Kaspryzk, A; Kennedy, S; Kent, W J; Kitts, P; Koonin, E V; Korf, I; Kulp, D; Lancet, D; Lowe, T M; McLysaght, A; Mikkelsen, T; Moran, J V; Mulder, N; Pollara, V J; Ponting, C P; Schuler, G; Schultz, J; Slater, G; Smit, A F; Stupka, E; Szustakowki, J; Thierry-Mieg, D; Thierry-Mieg, J; Wagner, L; Wallis, J; Wheeler, R; Williams, A; Wolf, Y I; Wolfe, K H; Yang, S P; Yeh, R F; Collins, F; Guyer, M S; Peterson, J; Felsenfeld, A; Wetterstrand, K A; Patrinos, A; Morgan, M J; de Jong, P; Catanese, J J; Osoegawa, K; Shizuya, H; Choi, S; Chen, Y J; Szustakowki, J

    2001-02-15

    The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence. PMID:11237011

  5. MESSA: MEta-Server for protein Sequence Analysis

    PubMed Central

    2012-01-01

    Background Computational sequence analysis, that is, prediction of local sequence properties, homologs, spatial structure and function from the sequence of a protein, offers an efficient way to obtain needed information about proteins under study. Since reliable prediction is usually based on the consensus of many computer programs, meta-servers have been developed to fit such needs. Most meta-servers focus on one aspect of sequence analysis, while others incorporate more information, such as PredictProtein for local sequence feature predictions, SMART for domain architecture and sequence motif annotation, and GeneSilico for secondary and spatial structure prediction. However, as predictions of local sequence properties, three-dimensional structure and function are usually intertwined, it is beneficial to address them together. Results We developed a MEta-Server for protein Sequence Analysis (MESSA) to facilitate comprehensive protein sequence analysis and gather structural and functional predictions for a protein of interest. For an input sequence, the server exploits a number of select tools to predict local sequence properties, such as secondary structure, structurally disordered regions, coiled coils, signal peptides and transmembrane helices; detect homologous proteins and assign the query to a protein family; identify three-dimensional structure templates and generate structure models; and provide predictive statements about the protein's function, including functional annotations, Gene Ontology terms, enzyme classification and possible functionally associated proteins. We tested MESSA on the proteome of Candidatus Liberibacter asiaticus. Manual curation shows that three-dimensional structure models generated by MESSA covered around 75% of all the residues in this proteome and the function of 80% of all proteins could be predicted. Availability MESSA is free for non-commercial use at http://prodata.swmed.edu/MESSA/ PMID:23031578

  6. Comparative DNA Sequence Analysis of Wheat and Rice Genomes

    PubMed Central

    Sorrells, Mark E.; La Rota, Mauricio; Bermudez-Kandianis, Catherine E.; Greene, Robert A.; Kantety, Ramesh; Munkvold, Jesse D.; Miftahudin; Mahmoud, Ahmed; Ma, Xuefeng; Gustafson, Perry J.; Qi, Lili L.; Echalier, Benjamin; Gill, Bikram S.; Matthews, David E.; Lazo, Gerard R.; Chao, Shiaoman; Anderson, Olin D.; Edwards, Hugh; Linkiewicz, Anna M.; Dubcovsky, Jorge; Akhunov, Eduard D.; Dvorak, Jan; Zhang, Deshui; Nguyen, Henry T.; Peng, Junhua; Lapitan, Nora L.V.; Gonzalez-Hernandez, Jose L.; Anderson, James A.; Hossain, Khwaja; Kalavacharla, Venu; Kianian, Shahryar F.; Choi, Dong-Woog; Close, Timothy J.; Dilbirligi, Muharrem; Gill, Kulvinder S.; Steber, Camille; Walker-Simmons, Mary K.; McGuire, Patrick E.; Qualset, Calvin O.

    2003-01-01

    The use of DNA sequence-based comparative genomics for evolutionary studies and for transferring information from model species to crop species has revolutionized molecular genetics and crop improvement strategies. This study compared 4485 expressed sequence tags (ESTs) that were physically mapped in wheat chromosome bins, to the public rice genome sequence data from 2251 ordered BAC/PAC clones using BLAST. A rice genome view of homologous wheat genome locations based on comparative sequence analysis revealed numerous chromosomal rearrangements that will significantly complicate the use of rice as a model for cross-species transfer of information in nonconserved regions. PMID:12902377

  7. Designing novel kinases using evolutionary sequence analysis

    NASA Astrophysics Data System (ADS)

    Mody, Areez; Weiner, Joan; Iyer, Lakshman; Ramanathan, Sharad

    2006-03-01

    Cellular pathways with new functions are thought to arise from the duplication and divergence of proteins in existing pathways. The MAP kinase pathways in eukaryotes provide one example of this. These pathways consist of the MAP kinase proteins, which are responsible for evoking the correct response to external stimuli. In the yeast Saccharomyces cerevisiae these pathways detect pheromones, osmolar stresses and nutrient levels, leading the cell into dramatic changes of morphology. Despite being homologous to each other, the MAP kinase proteins show specificity of function. We investigate the nature of the amino acid sequences conferring this specificity. To this end, we i) search the sequences of similar proteins in other eukaryotic species, ii) study simple theoretical models exploring the constraints felt by these protein segments and iii) experimentally construct a large suite of hybrid proteins made of segments taken from the homologous proteins. These are then expressed in yeast cells to see what function they are able to perform. In particular, we also ask whether it is possible to design a new kinase protein possessing new function and specificity.

  8. Sequencing and annotated analysis of an Estonian human genome.

    PubMed

    Lilleoja, Rutt; Sarapik, Aili; Reimann, Ene; Reemann, Paula; Jaakma, Ülle; Vasar, Eero; Kõks, Sulev

    2012-02-01

    In the present study we describe the sequencing and annotated analysis of the genome of an Estonian individual. Using SOLiD technology we generated 2,449,441,916 50-bp reads. Bioscope version 1.3 was used for mapping and pairing of reads to the NCBI human genome reference (build 36, hg18). Bioscope also enables the annotation of the results of variant (tertiary) analysis. The average mapping of reads was 75.5% with total coverage of 107.72 Gb, resulting in mean fold coverage of 34.6. We found 3,482,975 SNPs, of which 352,492 were novel. 21,222 SNPs were in coding regions: 10,649 were synonymous SNPs, 10,360 were nonsynonymous missense SNPs, 155 were nonsynonymous nonsense SNPs and 58 were nonsynonymous frameshifts. We identified 219 CNVs with total base pair coverage of 37,326,300 bp and 87,451 large insertion/deletion polymorphisms covering 10,152,256 bp of the genome. In addition, we found 285,864 small insertion/deletion polymorphisms, of which 133,969 were novel. Finally, we identified 53 inversions, 19 overlapped genes and 2 overlapped exons. Interestingly, we found a region in chromosome 6 to be enriched with coding SNPs and CNVs. This study confirms previous findings that our genomes are more complex and variable than previously thought. Therefore, sequencing of personal genomes followed by annotation would improve the analysis of heritability of phenotypes and our understanding of the functions of the genome.
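
    The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers from the abstract; the one assumption, flagged in the code, is that mean fold coverage is defined as total coverage divided by the length of the reference.

```python
# Coding SNP classes as quoted in the abstract.
coding_snps = {
    "synonymous": 10_649,
    "missense": 10_360,
    "nonsense": 155,
    "frameshift": 58,
}
coding_total = sum(coding_snps.values())

# Assumption: mean fold coverage = total coverage / reference length,
# so the reference length implied by the abstract is the ratio below.
total_coverage_gb = 107.72
mean_fold_coverage = 34.6
implied_reference_gb = total_coverage_gb / mean_fold_coverage

# Share of variants reported as novel.
novel_snp_fraction = 352_492 / 3_482_975
novel_indel_fraction = 133_969 / 285_864

print(coding_total, round(implied_reference_gb, 2))
print(round(novel_snp_fraction, 3), round(novel_indel_fraction, 3))
```

    The four coding classes add up to the 21,222 coding SNPs reported, and the implied reference length of roughly 3.1 Gb is in line with the size of the human genome.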

  9. Effect of solvent addition sequence on lycopene extraction efficiency from membrane neutralized caustic peeled tomato waste.

    PubMed

    Phinney, David M; Frelka, John C; Cooperstone, Jessica L; Schwartz, Steven J; Heldman, Dennis R

    2017-01-15

    Lycopene is a high value nutraceutical and its isolation from waste streams is often desirable to maximize profits. This research investigated solvent addition order and composition on lycopene extraction efficiency from a commercial tomato waste stream (pH 12.5, solids ∼5%) that was neutralized using membrane filtration. Constant volume dilution (CVD) was used to desalinate the caustic salt to neutralize the waste. Acetone, ethanol and hexane were used as direct or blended additions. Extraction efficiency was defined as the amount of lycopene extracted divided by the total lycopene in the sample. The CVD operation reduced the active alkali of the waste from 0.66 to <0.01M and the moisture content of the pulp increased from 93% to 97% (wet basis), showing the removal of caustic salts from the waste. Extraction efficiency varied from 32.5% to 94.5%. This study demonstrates a lab scale feasibility to extract lycopene efficiently from tomato processing byproducts. PMID:27542486

  10. Tandem sequence of phenol oxidation and intramolecular addition as a method in building heterocycles.

    PubMed

    Ratnikov, Maxim O; Farkas, Linda E; Doyle, Michael P

    2012-11-16

    A tandem phenol oxidation-Michael addition furnishing oxo- and aza-heterocycles has been developed. Dirhodium caprolactamate [Rh2(cap)4] catalyzed oxidation by T-HYDRO of phenols with alcohols, ketones, amides, carboxylic acids, and N-Boc protected amines tethered to their 4-position afforded 4-(tert-butylperoxy)cyclohexa-2,5-dienones that undergo Brønsted acid catalyzed intramolecular Michael addition in one pot to produce oxo- and aza-heterocycles in moderate to good yields. The scope of the developed methodology includes dipeptides Boc-Tyr-Gly-OEt and Boc-Tyr-Phe-Me and provides a pathway for understanding the possible transformations arising from oxidative stress of tyrosine residues. A novel method of selective cleavage of the O-O bond in a hindered internal peroxide using TiCl4 has been discovered in efforts directed to the construction of cleroindicin F, whose synthesis was completed in 50% yield over just 3 steps from tyrosol using the developed methodology.

  11. [Automatic analysis pipeline of next-generation sequencing data].

    PubMed

    Wenke, Li; Fengyu, Li; Siyao, Zhang; Bin, Cai; Na, Zheng; Yu, Nie; Dao, Zhou; Qian, Zhao

    2014-06-01

    The development of next-generation sequencing has generated high demand for data processing and analysis. Although many software tools are available for analyzing next-generation sequencing data, most of them are designed for one specific function (e.g., alignment, variant calling or annotation). Therefore, it is necessary to combine them for data analysis and to generate interpretable results for biologists. This study designed a pipeline to process Illumina sequencing data based on the Perl programming language and the SGE system. The pipeline takes original sequence data (fastq format) as input, calls the standard data processing software (e.g., BWA, Samtools, GATK, and Annovar), and finally outputs a list of annotated variants that researchers can further analyze. The pipeline simplifies manual operation and improves efficiency through automation and parallel computation. Users can easily run the pipeline by editing the configuration file or using the graphical interface. Our work will facilitate research projects that use sequencing technology.

  12. SeqCalc: A portable bioinformatics software for sequence analysis

    PubMed Central

    Vignesh, Dhandapani; Parameswari, Paul; Jin, Kim Hae; Pyo, Lim Yong

    2010-01-01

    Rapid genome sequencing has enriched biological databases with enormous amounts of sequence data, yet it remains a daunting task to unravel this information. Experimental and computational researchers each take their own approach to analyzing sequence information. Here we introduce a standalone portable tool named “SeqCalc” that would assist research personnel in computational sequence analysis and automated experimental calculations. Although several tools are available online for sequence analysis, they serve only one or two purposes. SeqCalc is an offline software package, developed using Perl and Tcl/Tk scripts, that serves ten different applications. This tool would be useful to both experimental and computational researchers in their routine research. SeqCalc is executable in all Windows operating systems. Availability SeqCalc can be freely downloaded at http://code.google.com/p/seqcalc. PMID:21364786

  13. Initial sequencing and comparative analysis of the mouse genome

    SciTech Connect

    Waterston, Robert H.; Lindblad-Toh, Kerstin; Birney, Ewan; Rogers, Jane; Abril, Josep F.; Agarwal, Pankaj; Agarwala, Richa; Ainscough, Rachel; Alexandersson, Marina; An, Peter; Antonarakis, Stylianos E.; Attwood, John; Baertsch, Robert; Bailey, Jonathon; Barlow, Karen; Beck, Stephan; Berry, Eric; Birren, Bruce; Bloom, Toby; Bork, Peer; Botcherby, Marc; Bray, Nicolas; Brent, Michael R.; Brown, Daniel G.; Brown, Stephen D.; Bult, Carol; Burton, John; Butler, Jonathan; Campbell, Robert D.; Carninci, Piero; Cawley, Simon; Chiaromonte, Francesca; Chinwalla, Asif T.; Church, Deanna M.; Clamp, Michele; Clee, Christopher; Collins, Francis S.; Cook, Lisa L.; Copley, Richard R.; Coulson, Alan; Couronne, Olivier; Cuff, James; Curwen, Val; Cutts, Tim; Daly, Mark; David, Robert; Davies, Joy; Delehaunty, Kimberly D.; Deri, Justin; Dermitzakis, Emmanouil T.; Dewey, Colin; Dickens, Nicholas J.; Diekhans, Mark; Dodge, Sheila; Dubchak, Inna; Dunn, Diane M.; Eddy, Sean R.; Elnitski, Laura; Emes, Richard D.; Eswara, Pallavi; Eyras, Eduardo; Felsenfeld, Adam; Fewell, Ginger A.; Flicek, Paul; Foley, Karen; Frankel, Wayne N.; Fulton, Lucinda A.; Fulton, Robert S.; Furey, Terrence S.; Gage, Diane; Gibbs, Richard A.; Glusman, Gustavo; Gnerre, Sante; Goldman, Nick; Goodstadt, Leo; Grafham, Darren; Graves, Tina A.; Green, Eric D.; Gregory, Simon; Guigo, Roderic; Guyer, Mark; Hardison, Ross C.; Haussler, David; Hayashizaki, Yoshihide; Hillier, LaDeana W.; Hinrichs, Angela; Hlavina, Wratko; Holzer, Timothy; Hsu, Fan; Hua, Axin; Hubbard, Tim; Hunt, Adrienne; Jackson, Ian; Jaffe, David B.; Johnson, L. Steven; Jones, Matthew; Jones, Thomas A.; Joy, Ann; Kamal, Michael; Karlsson, Elinor K.; Karolchik, Donna; Kasprzyk, Arkadiusz; Kawai, Jun; Keibler, Evan; Kells, Cristyn; Kent, W. James; Kirby, Andrew; Kolbe, Diana L.; Korf, Ian; Kucherlapati, Raju S.; Kulbokas III, Edward J.; Kulp, David; Landers, Tom; Leger, J.P.; Leonard, Steven; Letunic, Ivica; Levine, Rosie; et al.

    2002-12-15

    The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

  14. GENSTYLE: exploration and analysis of DNA sequences with genomic signature.

    PubMed

    Fertil, Bernard; Massin, Matthieu; Lespinats, Sylvain; Devic, Caroline; Dumee, Philippe; Giron, Alain

    2005-07-01

    GENSTYLE (http://Genstyle.imed.jussieu.fr) is a workspace designed for the characterization and classification of nucleotide sequences. Based on the genomic signature paradigm, GENSTYLE focuses on oligonucleotide frequencies in DNA sequences. Users can select sequences of interest in the GENSTYLE companion database, where the whole set of GenBank sequences is grouped per species, or upload their own sequences to work with. Tools for the exploration and analysis of signatures allow (i) identification of the origin of DNA segments (detection of rare species or species for which technical problems prevent fast characterization, such as micro-organisms with slow growth), (ii) analysis of the homogeneity of a genome and isolation of areas with novel functionality (horizontal transfers for example), and (iii) molecular phylogeny and taxonomy.
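
    The idea of a signature built from oligonucleotide frequencies can be sketched generically. The function below is not GENSTYLE's implementation; the name `oligo_frequencies`, the use of overlapping words, and the toy sequence are assumptions made for illustration.

```python
from collections import Counter

def oligo_frequencies(sequence, k):
    """Relative frequency of every overlapping k-letter word in a DNA sequence."""
    seq = sequence.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

# Toy example (hypothetical sequence): dinucleotide frequencies.
signature = oligo_frequencies("ACGTACGT", 2)
for word in sorted(signature):
    print(word, round(signature[word], 3))
```

    Two sequences can then be compared by measuring how far apart their frequency tables are, which is the kind of comparison a signature-based classification relies on.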

  15. Lessons from next-generation sequencing analysis in hematological malignancies

    PubMed Central

    Braggio, E; Egan, J B; Fonseca, R; Stewart, A K

    2013-01-01

    Next-generation sequencing has led to a revolution in the study of hematological malignancies with a substantial number of publications and discoveries in the last few years. Significant discoveries associated with disease diagnosis, risk stratification, clonal evolution and therapeutic intervention have been generated by this powerful technology. As part of the post-genomic era, sequencing analysis will likely become part of routine clinical testing and the challenge will ultimately be successfully transitioning from gene discovery to preventive and therapeutic intervention as part of individualized medicine strategies. In this report, we review recent advances in the understanding of hematological malignancies derived through genome-wide sequence analysis. PMID:23872706

  16. Accident Sequence Evaluation Program: Human reliability analysis procedure

    SciTech Connect

    Swain, A.D.

    1987-02-01

    This document presents a shortened version of the procedure, models, and data for human reliability analysis (HRA) which are presented in the Handbook of Human Reliability Analysis with Emphasis on Nuclear Power Plant Applications (NUREG/CR-1278, August 1983). This shortened version was prepared and tried out as part of the Accident Sequence Evaluation Program (ASEP) funded by the US Nuclear Regulatory Commission and managed by Sandia National Laboratories. The intent of this new HRA procedure, called the "ASEP HRA Procedure," is to enable systems analysts, with minimal support from experts in human reliability analysis, to make estimates of human error probabilities and other human performance characteristics which are sufficiently accurate for many probabilistic risk assessments. The ASEP HRA Procedure consists of a Pre-Accident Screening HRA, a Pre-Accident Nominal HRA, a Post-Accident Screening HRA, and a Post-Accident Nominal HRA. The procedure in this document includes changes made after tryout and evaluation of the procedure in four nuclear power plants by four different systems analysts and related personnel, including human reliability specialists. The changes consist of some additional explanatory material (including examples), and more detailed definitions of some of the terms. 42 refs.

  17. Sequencing and Analysis of Neanderthal Genomic DNA

    SciTech Connect

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith, Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Paabo, Svante; Pritchard, Jonathan K.; Rubin, Edward M.

    2006-06-13

    Recovery and analysis of multiple Neanderthal autosomal sequences using a metagenomic approach reveals that modern humans and Neanderthals split ~400,000 years ago, without significant evidence of subsequent admixture.

  18. Genomic-scale comparison of sequence- and structure-based methods of function prediction: Does structure provide additional insight?

    PubMed Central

    Fetrow, Jacquelyn S.; Siew, Naomi; Di Gennaro, Jeannine A.; Martinez-Yamout, Maria; Dyson, H. Jane; Skolnick, Jeffrey

    2001-01-01

    A function annotation method using the sequence-to-structure-to-function paradigm is applied to the identification of all disulfide oxidoreductases in the Saccharomyces cerevisiae genome. The method identifies 27 sequences as potential disulfide oxidoreductases. All previously known thioredoxins, glutaredoxins, and disulfide isomerases are correctly identified. Three of the 27 predictions are probable false-positives. Three novel predictions, which subsequently have been experimentally validated, are presented. Two additional novel predictions suggest a disulfide oxidoreductase regulatory mechanism for two subunits (OST3 and OST6) of the yeast oligosaccharyltransferase complex. Based on homology, this prediction can be extended to a potential tumor suppressor gene, N33, in humans, whose biochemical function was not previously known. Attempts to obtain a folded, active N33 construct to test the prediction were unsuccessful. The results show that structure prediction coupled with biochemically relevant structural motifs is a powerful method for the function annotation of genome sequences and can provide more detailed, robust predictions than function prediction methods that rely on sequence comparison alone. PMID:11316881

  19. Sequence and comparative genomic analysis of actin-related proteins.

    PubMed

    Muller, Jean; Oma, Yukako; Vallar, Laurent; Friederich, Evelyne; Poch, Olivier; Winsor, Barbara

    2005-12-01

    Actin-related proteins (ARPs) are key players in cytoskeleton activities and nuclear functions. Two complexes, ARP2/3 and ARP1/11, also known as dynactin, are implicated in actin dynamics and in microtubule-based trafficking, respectively. ARP4 to ARP9 are components of many chromatin-modulating complexes. Conventional actins and ARPs codefine a large family of homologous proteins, the actin superfamily, with a tertiary structure known as the actin fold. Because ARPs and actin share high sequence conservation, clear family definition requires distinct features to easily and systematically identify each subfamily. In this study we performed an in-depth sequence and comparative genomic analysis of ARP subfamilies. A high-quality multiple alignment of approximately 700 complete protein sequences homologous to actin, including 148 ARP sequences, allowed us to extend the ARP classification to new organisms. Sequence alignments revealed conserved residues, motifs, and inserted sequence signatures to define each ARP subfamily. These discriminative characteristics allowed us to develop ARPAnno (http://bips.u-strasbg.fr/ARPAnno), a new web server dedicated to the annotation of ARP sequences. Analyses of sequence conservation among actins and ARPs highlight part of the actin fold and suggest interactions between ARPs and actin-binding proteins. Finally, analysis of ARP distribution across eukaryotic phyla emphasizes the central importance of nuclear ARPs, particularly the multifunctional ARP4.

  20. Molecular Evolution of Multi-subunit RNA Polymerases: Sequence Analysis

    PubMed Central

    Lane, William J.; Darst, Seth A.

    2009-01-01

    Transcription in all cellular organisms is performed by multi-subunit, DNA-dependent RNA polymerases that synthesize RNA from DNA templates. Previous sequence and structural studies have elucidated the importance of shared regions common to all multi-subunit RNA polymerases. In addition, RNA polymerases contain multiple lineage-specific domain insertions involved in protein-protein and protein-nucleic acid interactions. We have created comprehensive multiple sequence alignments using all available sequence data for the multi-subunit RNA polymerase large subunits, including the bacterial β and β′ subunits and their homologues from archaebacterial RNA polymerases, the eukaryotic RNA polymerases I, II, and III, the nuclear-cytoplasmic large double-stranded DNA virus RNA polymerases, and plant plastid RNA polymerases. In order to overcome technical difficulties inherent to the large subunit sequences, including large sequence length, small and large lineage-specific insertions, split subunits, and fused proteins, we created an automated and customizable sequence retrieval and processing system. In addition, we used our alignments to create a more expansive set of shared sequence regions and bacterial lineage-specific domain insertions. We also analyzed the intergenic gap between the bacterial β and β′ genes. PMID:19895820

  1. DSAP: deep-sequencing small RNA analysis pipeline.

    PubMed

    Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus

    2010-07-01

    DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log2-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw.
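
    The first two DSAP steps (cleanup and clustering) can be illustrated with a small Python sketch. This is not DSAP's code: the adaptor sequence, the toy tag, and the function names are assumptions made for illustration, and adaptor trimming is reduced to an exact substring match.

```python
from collections import Counter

# Hypothetical 3' adaptor and toy tag; real values depend on the library kit.
ADAPTOR = "TCGTATGCCG"
TAG = "TGAGGTAGTAGGTTGTATAGTT"

def trim_adaptor(tag, adaptor=ADAPTOR):
    """Cut the tag at the first exact occurrence of the adaptor, if any."""
    pos = tag.find(adaptor)
    return tag if pos == -1 else tag[:pos]

def is_homopolymer(tag):
    """True for empty tags and for tags made of one repeated symbol (A/T/C/G/N)."""
    return len(set(tag)) <= 1

def clean_and_cluster(lines):
    """Parse 'sequence<TAB>count' lines, clean each tag, and merge identical tags."""
    clusters = Counter()
    for line in lines:
        seq, count = line.rstrip("\n").split("\t")
        cleaned = trim_adaptor(seq.upper())
        if not is_homopolymer(cleaned):
            clusters[cleaned] += int(count)
    return clusters

example = [
    f"{TAG}{ADAPTOR}\t120",          # tag followed by adaptor
    f"{TAG}\t30",                    # same tag, already clean
    "AAAAAAAAAAAAAAAAAAAA\t500",     # poly-A, discarded
    f"NNNNNNNNNN{ADAPTOR}\t7",       # poly-N after trimming, discarded
]
clusters = clean_and_cluster(example)
print(dict(clusters))
```

    In this toy input the two copies of the same tag merge into one cluster whose count is the sum of their copy numbers, while the poly-A and poly-N tags are dropped.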

  2. Spectroscopic analysis and DFT calculations of a food additive Carmoisine

    NASA Astrophysics Data System (ADS)

    Snehalatha, M.; Ravikumar, C.; Hubert Joe, I.; Sekar, N.; Jayakumar, V. S.

    2009-04-01

    FT-IR and Raman techniques were employed for the vibrational characterization of the food additive Carmoisine (E122). The equilibrium geometry, various bonding features, and harmonic vibrational wavenumbers have been investigated with the help of density functional theory (DFT) calculations. A good correlation was found between the computed and experimental wavenumbers. Azo stretching wavenumbers have been lowered due to conjugation and π-electron delocalization. Predicted electronic absorption spectra from TD-DFT calculation have been analysed comparing with the UV-vis spectrum. The first hyperpolarizability of the molecule is calculated. Intramolecular charge transfer (ICT) responsible for the optical nonlinearity of the dye molecule has been discussed theoretically and experimentally. Stability of the molecule arising from hyperconjugative interactions, charge delocalization and C-H⋯O, improper, blue shifted hydrogen bonds have been analysed using natural bond orbital (NBO) analysis.

  3. [Analysis of constituents in urushi wax, a natural food additive].

    PubMed

    Jin, Zhe-Long; Tada, Atsuko; Sugimoto, Naoki; Sato, Kyoko; Masuda, Aino; Yamagata, Kazuo; Yamazaki, Takeshi; Tanamoto, Kenichi

    2006-08-01

    Urushi wax is a natural gum base used as a food additive. In order to evaluate the quality of urushi wax as a food additive and to obtain information useful for setting official standards, we investigated the constituents and their concentrations in urushi wax, using the same sample as scheduled for toxicity testing. After methanolysis of urushi wax, the composition of fatty acids was analyzed by GC/MS. The results indicated that the main fatty acids were palmitic acid, oleic acid and stearic acid. LC/MS analysis of urushi wax provided molecular-related ions of the main constituents. The main constituents were identified as triglycerides, namely glyceryl tripalmitate (30.7%), glyceryl dipalmitate monooleate (21.2%), glyceryl dioleate monopalmitate (2.1%), glyceryl monooleate monopalmitate monostearate (2.6%), glyceryl dipalmitate monostearate (5.6%), glyceryl distearate monopalmitate (1.4%). Glyceryl dipalmitate monooleate isomers differing in the binding sites of each constituent fatty acid could be separately determined by LC/MS/MS. PMID:16984037

  4. Decreasing Cloudiness Over China: An Updated Analysis Examining Additional Variables

    SciTech Connect

    Kaiser, D.P.

    2000-01-14

    As preparation of the IPCC's Third Assessment Report takes place, one of the many observed climate variables of key interest is cloud amount. For several nations of the world, there exist records of surface-observed cloud amount dating back to the middle of the 20th Century or earlier, offering valuable information on variations and trends. Studies using such databases include Sun and Groisman (1999) and Kaiser and Razuvaev (1995) for the former Soviet Union, Angell et al. (1984) for the United States, Henderson-Sellers (1986) for Europe, Jones and Henderson-Sellers (1992) for Australia, and Kaiser (1998) for China. The findings of Kaiser (1998) differ from the other studies in that much of China appears to have experienced decreased cloudiness over recent decades (1954-1994), whereas the other land regions for the most part show evidence of increasing cloud cover. This paper expands on Kaiser (1998) by analyzing trends in additional meteorological variables for China [station pressure (p), water vapor pressure (e), and relative humidity (rh)] and extending the total cloud amount (N) analysis an additional two years (through 1996).

  5. Applying machine learning techniques to DNA sequence analysis

    SciTech Connect

    Shavlik, J.W.

    1992-01-01

    We are developing a machine learning system that modifies existing knowledge about specific types of biological sequences. It does this by considering sample members and nonmembers of the sequence motif being learned. Using this information (which we call a "domain theory"), our learning algorithm produces a more accurate representation of the knowledge needed to categorize future sequences. Specifically, the KBANN algorithm maps inference rules, such as consensus sequences, into a neural (connectionist) network. Neural network training techniques then use the training examples to refine these inference rules. We have been applying this approach to several problems in DNA sequence analysis and have also been extending the capabilities of our learning system along several dimensions.

  6. Data on meq gene sequence analysis of Ludhiana MDV isolates.

    PubMed

    Gupta, Mridula; Deka, Dipak; Ramneek

    2016-12-01

    The data described are related to the article entitled "Sequence Analysis of Meq oncogene among Indian isolates of Marek's Disease Herpesvirus" M. Gupta, D. Deka, Ramneek, 2016. Seven meq genes of Ludhiana Marek's disease virus (MDV) field isolates were PCR amplified by using proofreading Platinum Pfx DNA polymerase enzyme, sequenced and then analyzed for distinct polymorphisms and point mutations. The sequences were named LDH 1758, LDH 2003, LDH 2483, LDH 2614, LDH 2700, LDH 2929 and LDH 3262. Their deduced Meq amino acid sequences were then compared with the deduced amino acid sequences of previously sequenced meq genes available worldwide in GenBank to study their identity/similarity with each other. PMID:27656677

  7. Identification of Medically Important Yeast Species by Sequence Analysis of the Internal Transcribed Spacer Regions

    PubMed Central

    Leaw, Shiang Ning; Chang, Hsien Chang; Sun, Hsiao Fang; Barton, Richard; Bouchara, Jean-Philippe; Chang, Tsung Chain

    2006-01-01

    Infections caused by yeasts have increased in previous decades due primarily to the increasing population of immunocompromised patients. In addition, infections caused by less common species such as Pichia, Rhodotorula, Trichosporon, and Saccharomyces spp. have been widely reported. This study extensively evaluated the feasibility of sequence analysis of the rRNA gene internal transcribed spacer (ITS) regions for the identification of yeasts of clinical relevance. Both the ITS1 and ITS2 regions of 373 strains (86 species), including 299 reference strains and 74 clinical isolates, were amplified by PCR and sequenced. The sequences were compared to reference data available at the GenBank database by using BLAST (basic local alignment search tool) to determine if species identification was possible by ITS sequencing. Since the GenBank database currently lacks ITS sequence entries for some yeasts, the ITS sequences of type (or reference) strains of 15 species were submitted to GenBank to facilitate identification of these species. Strains producing discrepant identifications between the conventional methods and ITS sequence analysis were further analyzed by sequencing of the D1-D2 domain of the large-subunit rRNA gene for species clarification. The rates of correct identification by ITS1 and ITS2 sequence analysis were 96.8% (361/373) and 99.7% (372/373), respectively. Of the 373 strains tested, only 1 strain (Rhodotorula glutinis BCRC 20576) could not be identified by ITS2 sequence analysis. In conclusion, identification of medically important yeasts by ITS sequencing, especially using the ITS2 region, is reliable and can be used as an accurate alternative to conventional identification methods. PMID:16517841

  8. Transcriptome Sequencing and Positive Selected Genes Analysis of Bombyx mandarina

    PubMed Central

    Wu, Yuqian; Long, Renwen; Liu, Chun; Xia, Qingyou

    2015-01-01

    The wild silkworm Bombyx mandarina is widely believed to be an ancestor of the domesticated silkworm, Bombyx mori. Silkworms are often used as a model for studying the mechanism of species domestication. Here, we performed transcriptome sequencing of the wild silkworm using an Illumina HiSeq2000 platform. We produced 100,004,078 high-quality reads and assembled them into 50,773 contigs with an N50 length of 1764 bp and a mean length of 941.62 bp. A total of 33,759 unigenes were identified, with 12,805 annotated in the Nr database, 8273 in the Pfam database, and 9093 in the Swiss-Prot database. Expression profile analysis found significant differential expression of 1308 unigenes between the middle silk gland (MSG) and posterior silk gland (PSG). Three sericin genes (sericin 1, sericin 2, and sericin 3) were expressed specifically in the MSG and three fibroin genes (fibroin-H, fibroin-L, and fibroin/P25) were expressed specifically in the PSG. In addition, 32,297 single-nucleotide polymorphisms (SNPs) and 361 insertion-deletions (INDELs) were detected. Comparison with the domesticated silkworm p50/Dazao identified 5,295 orthologous genes, among which 400 might have experienced or be experiencing positive selection according to Ka/Ks analysis. These data and analyses provide insights into silkworm domestication and an invaluable resource for wild silkworm genomics research. PMID:25806526
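
    The abstract reports both an N50 and a mean contig length. As a reminder of how the two statistics differ, here is a minimal Python sketch using the common definition of N50 (the length L such that contigs of length L or longer hold at least half of the assembled bases). The contig lengths are toy values, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of all bases.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")

toy_contigs = [100, 200, 300, 400, 500]   # made-up lengths in bp
toy_n50 = n50(toy_contigs)
toy_mean = sum(toy_contigs) / len(toy_contigs)
print(toy_n50, toy_mean)
```

    For the toy lengths the N50 (400 bp) exceeds the mean (300 bp), the same direction as the gap between the 1764 bp N50 and the 941.62 bp mean reported above, because N50 weights long contigs more heavily.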

  9. Sensitivity analysis of geometric errors in additive manufacturing medical models.

    PubMed

    Pinto, Jose Miguel; Arrieta, Cristobal; Andia, Marcelo E; Uribe, Sergio; Ramos-Grez, Jorge; Vargas, Alex; Irarrazaval, Pablo; Tejos, Cristian

    2015-03-01

    Additive manufacturing (AM) models are used in medical applications for surgical planning, prosthesis design and teaching. For these applications, the accuracy of the AM models is essential. Unfortunately, this accuracy is compromised due to errors introduced by each of the building steps: image acquisition, segmentation, triangulation, printing and infiltration. However, the contribution of each step to the final error remains unclear. We performed a sensitivity analysis comparing errors obtained from a reference with those obtained modifying parameters of each building step. Our analysis considered global indexes to evaluate the overall error, and local indexes to show how this error is distributed along the surface of the AM models. Our results show that the standard building process tends to overestimate the AM models, i.e. models are larger than the original structures. They also show that the triangulation resolution and the segmentation threshold are critical factors, and that the errors are concentrated at regions with high curvatures. Errors could be reduced choosing better triangulation and printing resolutions, but there is an important need for modifying some of the standard building processes, particularly the segmentation algorithms.

  10. Comprehensive Primer Design for Analysis of Population Genetics in Non-Sequenced Organisms

    PubMed Central

    Tezuka, Ayumi; Matsushima, Noe; Nemoto, Yoriko; Akashi, Hiroshi D.; Kawata, Masakado; Makino, Takashi

    2012-01-01

    Nuclear sequence markers are useful tools for the study of the history of populations and adaptation. However, it is not easy to obtain multiple nuclear primers for organisms with poor or no genomic sequence information. Here we used the genomes of organisms that have been fully sequenced to design comprehensive sets of primers to amplify polymorphic genomic fragments of multiple nuclear genes in non-sequenced organisms. First, we identified a large number of candidate polymorphic regions that were flanked on each side by conserved regions in the reference genomes. We then designed primers based on these conserved sequences and examined whether the primers could be used to amplify sequences in target species, montane brown frog (Rana ornativentris), anole lizard (Anolis sagrei), guppy (Poecilia reticulata), and fruit fly (Drosophila melanogaster), for population genetic analysis. We successfully obtained polymorphic markers for all target species studied. In addition, we found that sequence identities of the regions between the primer sites in the reference genomes affected the experimental success of DNA amplification and identification of polymorphic loci in the target genomes, and that exonic primers had a higher success rate than intronic primers in amplifying readable sequences. We conclude that this comparative genomic approach is a time- and cost-effective way to obtain polymorphic markers for non-sequenced organisms, and that it will contribute to the further development of evolutionary ecology and population genetics for non-sequenced organisms, aiding in the understanding of the genetic basis of adaptation. PMID:22393396

  11. Deep Sequencing Analysis of Nucleolar Small RNAs: Bioinformatics.

    PubMed

    Bai, Baoyan; Laiho, Marikki

    2016-01-01

    Small RNAs (size 20-30 nt) of various types have been actively investigated in recent years, and their subcellular compartmentalization and relative concentrations are likely to be of importance to their cellular and physiological functions. Comprehensive data on this subset of the transcriptome can only be obtained by application of high-throughput sequencing, which yields data that are inherently complex and multidimensional, as sequence composition, length, and abundance are all informative about small RNA function. Subsequent data analysis, hypothesis testing, and presentation/visualization of the results are correspondingly challenging. We have constructed small RNA libraries derived from different cellular compartments, including the nucleolus, and asked whether small RNAs exist in the nucleolus and whether they are distinct from cytoplasmic and nuclear small RNAs, the miRNAs. Here, we present a workflow for analysis of small RNA sequencing data generated by the Ion Torrent PGM sequencer from samples derived from different cellular compartments. PMID:27576724

  12. Comprehensive analysis of sequences of a protein switch.

    PubMed

    Chen, Szu-Hua; Meller, Jaroslaw; Elber, Ron

    2016-01-01

    Switches form a special class of proteins that dramatically change their three-dimensional structures upon a small perturbation. One possible perturbation that we explore is that of a single point mutation. Building on the pioneering experimental work of Alexander et al. (Alexander et al. PNAS, 2007; 104,11963-11968) that determines switch sequences between α and α+β folds we conduct a comprehensive sequence sampling by a Markov Chain with multiple fitness criteria to identify new switches given the experimental folds. We screen for switch sequences using a combination of contact potential, secondary structure prediction, and finally molecular dynamics simulations. Statistical properties of switch sequences are discussed and illustrated to be most sensitive to mutation at the N- and C- termini of the switch protein. Based on this analysis, a particularly stable putative switch pair is identified and proposed for further experimental analysis. PMID:26073558

  13. Food Fish Identification from DNA Extraction through Sequence Analysis

    ERIC Educational Resources Information Center

    Hallen-Adams, Heather E.

    2015-01-01

    This experiment exposed 3rd- and 4th-year undergraduates and graduate students taking a course in advanced food analysis to DNA extraction, polymerase chain reaction (PCR), and DNA sequence analysis. Students provided their own fish sample, purchased from local grocery stores, and the class as a whole extracted DNA, which was then subjected to PCR,…

  14. Basic Sequence Analysis Techniques for Use with Audit Trail Data

    ERIC Educational Resources Information Center

    Judd, Terry; Kennedy, Gregor

    2008-01-01

    Audit trail analysis can provide valuable insights to researchers and evaluators interested in comparing and contrasting designers' expectations of use and students' actual patterns of use of educational technology environments (ETEs). Sequence analysis techniques are particularly effective but have been neglected to some extent because of real…

  15. A new natural hGH variant--17.5 kd--produced by alternative splicing. An additional consensus sequence which might play a role in branchpoint selection.

    PubMed Central

    Lecomte, C M; Renard, A; Martial, J A

    1987-01-01

    From a human pituitary cDNA library, we have cloned 3 distinct human growth hormone (hGH) cDNAs, coding respectively for the 22 K hGH, the 20 K variant, and a yet unknown 17.5 K variant. S1 mapping analysis using human pituitary RNA confirms the existence of at least four distinct hGH mRNAs originating from alternative acceptor sites at the second intron of the primary transcript. We have analysed the hGH gene sequence to explain the high frequency of alternative splicings which occur only at this location. In this study we propose CTTGNNPyPyPy as an additional consensus sequence guiding the selection of the branched nucleotide. Images PMID:3627992
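
    A degenerate consensus such as CTTGNNPyPyPy can be searched for with a regular expression. The Python sketch below assumes the usual reading of the symbols (N = any nucleotide, Py = pyrimidine, C or T); the test sequence is made up and is not taken from the hGH gene.

```python
import re

# CTTGNNPyPyPy with N = any base and Py = pyrimidine (C or T).
# The pattern sits inside a lookahead so overlapping occurrences are reported too.
MOTIF = re.compile(r"(?=(CTTG[ACGT]{2}[CT]{3}))")

def find_motif(seq):
    """Return (0-based start, matched text) for every occurrence in a DNA string."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(seq.upper())]

toy = "aagcttgaactcttggt"   # hypothetical sequence for illustration only
hits = find_motif(toy)
print(hits)
```

    In the toy sequence the first CTTG is followed by two arbitrary bases and three pyrimidines and is reported; the second CTTG lies too close to the end of the string to complete the motif.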

  16. Streamlined analysis of duplex sequencing data with Du Novo.

    PubMed

    Stoler, Nicholas; Arbeithuber, Barbara; Guiblet, Wilfried; Makova, Kateryna D; Nekrutenko, Anton

    2016-01-01

    Duplex sequencing was originally developed to detect rare nucleotide polymorphisms normally obscured by the noise of high-throughput sequencing. Here we describe a new, streamlined, reference-free approach for the analysis of duplex sequencing data. We show the approach performs well on simulated data and precisely reproduces previously published results and apply it to a newly produced dataset, enabling us to type low-frequency variants in human mitochondrial DNA. Finally, we provide all necessary tools as stand-alone components as well as integrate them into the Galaxy platform. All analyses performed in this manuscript can be repeated exactly as described at http://usegalaxy.org/duplex. PMID:27566673

  17. Statistical design and analysis of RNA sequencing data.

    PubMed

    Auer, Paul L; Doerge, R W

    2010-06-01

    Next-generation sequencing technologies are quickly becoming the preferred approach for characterizing and quantifying entire genomes. Even though data produced from these technologies are proving to be the most informative of any thus far, very little attention has been paid to fundamental design aspects of data collection and analysis, namely sampling, randomization, replication, and blocking. We discuss these concepts in an RNA sequencing framework. Using simulations we demonstrate the benefits of collecting replicated RNA sequencing data according to well known statistical designs that partition the sources of biological and technical variation. Examples of these designs and their corresponding models are presented with the goal of testing differential expression.

  18. A comparative analysis of multiple sequence alignments for biological data.

    PubMed

    Manzoor, Umar; Shahid, Sarosh; Zafar, Bassam

    2015-01-01

    Multiple sequence alignment plays a key role in the computational analysis of biological data. Different programs are developed to analyze the sequence similarity. This paper highlights the algorithmic techniques of the most popular multiple sequence alignment programs. These programs are then evaluated on the basis of execution time and scalability. The overall performance of these programs is assessed to highlight their strengths and weaknesses with reference to their algorithmic techniques. In terms of overall alignment quality, T-Coffee and Mafft attain the highest average scores, whereas K-align has the minimum computation time. PMID:26405947

  19. Streamlined analysis of duplex sequencing data with Du Novo.

    PubMed

    Stoler, Nicholas; Arbeithuber, Barbara; Guiblet, Wilfried; Makova, Kateryna D; Nekrutenko, Anton

    2016-08-26

    Duplex sequencing was originally developed to detect rare nucleotide polymorphisms normally obscured by the noise of high-throughput sequencing. Here we describe a new, streamlined, reference-free approach for the analysis of duplex sequencing data. We show the approach performs well on simulated data and precisely reproduces previously published results and apply it to a newly produced dataset, enabling us to type low-frequency variants in human mitochondrial DNA. Finally, we provide all necessary tools as stand-alone components as well as integrate them into the Galaxy platform. All analyses performed in this manuscript can be repeated exactly as described at http://usegalaxy.org/duplex.

  20. Detection and removal of PCR duplicates in population genomic ddRAD studies by addition of a degenerate base region (DBR) in sequencing adapters.

    PubMed

    Schweyen, Hannah; Rozenberg, Andrey; Leese, Florian

    2014-10-01

    Restriction-site associated DNA sequencing (RAD) has emerged as a powerful marker system for studying genome-wide DNA polymorphisms using next-generation sequencing. A recent technical facilitation of RAD is double-digest RAD (ddRAD), which utilizes two restriction enzymes for library preparation. The more flexible and balanced ddRAD allows analysis of genomic loci in hundreds of individuals. However, in contrast to paired-end sequencing of traditional RAD libraries, PCR duplicates cannot be detected with ddRAD. This is a concern because duplicates can contribute substantially to read coverage data and erroneously inflate the proportion of homozygous loci (allele dropout). Allele dropout can bias population genetic parameter inference and complicate the detection of outlier loci under selection. Here we outline a simple and straightforward approach to detecting PCR duplicates from ddRAD libraries. Our approach introduces a degenerate base region (DBR, 12,288 unique combinations) in the sequencing adapter. We demonstrate the high efficiency and low rate of false positives in simulations. In addition, a pilot study was performed to test this approach on six aquatic invertebrates, sequenced on a HiSeq 2500 sequencer. The reads of the ddRAD libraries consisted of 33.48% PCR duplicates distributed on 19.40% of the loci. A disproportionate number of PCR duplicates were detected in only 4.66% of the loci. While this should not be a concern for general parameter inference, outlier loci detection in particular would be improved by the DBR technique. Given the easy and straightforward application of the technique in other RAD protocols as well, we suggest that DBR regions should generally be included in PCR-based RAD studies.
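
    The idea behind the DBR can be sketched in a few lines of Python: reads from the same locus that carry the same DBR are taken to be PCR copies of one template molecule. The sketch assumes each read has already been assigned to a locus and had its DBR extracted; the read names, the 8-base DBR strings, and the choice of (locus, DBR) as the duplicate key are illustrative assumptions, not the authors' pipeline.

```python
def split_pcr_duplicates(reads):
    """Separate reads into first occurrences and presumed PCR duplicates.

    `reads` is an iterable of (read_id, locus, dbr) tuples. The first read
    seen for each (locus, dbr) pair is kept; later ones count as duplicates.
    """
    seen = set()
    kept, duplicates = [], []
    for read_id, locus, dbr in reads:
        key = (locus, dbr)
        if key in seen:
            duplicates.append(read_id)
        else:
            seen.add(key)
            kept.append(read_id)
    return kept, duplicates

toy_reads = [
    ("r1", "locusA", "ACGTTGCA"),
    ("r2", "locusA", "ACGTTGCA"),  # same locus and DBR as r1: duplicate
    ("r3", "locusA", "TTGACCGA"),  # same locus, different DBR: kept
    ("r4", "locusB", "ACGTTGCA"),  # same DBR, different locus: kept
]
kept, duplicates = split_pcr_duplicates(toy_reads)
duplicate_rate = len(duplicates) / len(toy_reads)
print(kept, duplicates, duplicate_rate)
```

    A real analysis would also have to tolerate sequencing errors inside the DBR and the chance that two independent molecules receive the same DBR, both of which this sketch ignores.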

  1. Bioinformatics analysis of circulating cell-free DNA sequencing data.

    PubMed

    Chan, Landon L; Jiang, Peiyong

    2015-10-01

    The discovery of cell-free DNA molecules in plasma has opened up numerous opportunities in noninvasive diagnosis. Cell-free DNA molecules have become increasingly recognized as promising biomarkers for detection and management of many diseases. The advent of next generation sequencing has provided unprecedented opportunities to scrutinize the characteristics of cell-free DNA molecules in plasma in a genome-wide fashion and at single-base resolution. Consequently, clinical applications of circulating cell-free DNA analysis have not only revolutionized noninvasive prenatal diagnosis but also facilitated cancer detection and monitoring toward an era of blood-based personalized medicine. With the remarkably increasing throughput and lowering cost of next generation sequencing, bioinformatics analysis becomes increasingly demanding to understand the large amount of data generated by these sequencing platforms. In this Review, we highlight the major bioinformatics algorithms involved in the analysis of cell-free DNA sequencing data. Firstly, we briefly describe the biological properties of these molecules and provide an overview of the general bioinformatics approach for the analysis of cell-free DNA. Then, we discuss the specific upstream bioinformatics considerations concerning the analysis of sequencing data of circulating cell-free DNA, followed by further detailed elaboration on each key clinical situation in noninvasive prenatal diagnosis and cancer management where downstream bioinformatics analysis is heavily involved. We also discuss bioinformatics analysis as well as clinical applications of the newly developed massively parallel bisulfite sequencing of cell-free DNA. Finally, we offer our perspectives on the future development of bioinformatics in noninvasive diagnosis.

  2. High Throughput Plasmid Sequencing with Illumina and CLC Bio (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Athavale, Ajay [Monsanto

    2016-07-12

    Ajay Athavale (Monsanto) presents "High Throughput Plasmid Sequencing with Illumina and CLC Bio" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  3. High Throughput Plasmid Sequencing with Illumina and CLC Bio (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    SciTech Connect

    Athavale, Ajay

    2012-06-01

    Ajay Athavale (Monsanto) presents "High Throughput Plasmid Sequencing with Illumina and CLC Bio" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  4. Deep sequencing analysis of phage libraries using Illumina platform.

    PubMed

    Matochko, Wadim L; Chu, Kiki; Jin, Bingjie; Lee, Sam W; Whitesides, George M; Derda, Ratmir

    2012-09-01

    This paper presents an analysis of phage-displayed libraries of peptides using Illumina. We describe steps for the preparation of short DNA fragments for deep sequencing and MATLAB software for the analysis of the results. Screening of peptide libraries displayed on the surface of bacteriophage (phage display) can be used to discover peptides that bind to any target. The key step in this discovery is the analysis of peptide sequences present in the library. This analysis is usually performed by Sanger sequencing, which is labor intensive and limited to examination of a few hundred phage clones. On the other hand, Illumina deep-sequencing technology can characterize over 10^7 reads in a single run. We applied Illumina sequencing to analyze phage libraries. Using PCR, we isolated the variable regions from M13KE phage vectors from a phage display library. The PCR primers contained (i) sequences flanking the variable region, (ii) barcodes, and (iii) variable 5'-terminal region. We used this approach to examine how diversity of peptides in phage display libraries changes as a result of amplification of libraries in bacteria. Using HiSeq single-end Illumina sequencing of these fragments, we acquired over 2×10^7 reads, 57 base pairs (bp) in length. Each read contained information about the barcode (6 bp), one complementary region (12 bp) and a variable region (36 bp). We applied this sequencing to a model library of 10^6 unique clones and observed that amplification enriches ∼150 clones, which dominate ∼20% of the library. Deep sequencing, for the first time, characterized the collapse of diversity in phage libraries. The results suggest that screens based on repeated amplification and small-scale sequencing identify a few binding clones and miss thousands of useful clones. The deep sequencing approach described here could identify under-represented clones in phage screens. It could also be instrumental in developing new screening strategies, which can preserve
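
    The kind of statistic quoted above (a small number of clones accounting for a large share of the library) can be computed from a list of variable-region sequences. The Python sketch below is only an illustration: the authors used MATLAB software, the function name is hypothetical, and the three-letter "sequences" are toy placeholders.

```python
from collections import Counter

def top_clone_share(variable_regions, n_top):
    """Fraction of all reads contributed by the n_top most abundant sequences."""
    counts = Counter(variable_regions)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads given")
    top = sum(count for _, count in counts.most_common(n_top))
    return top / total

# Toy library: one dominant clone, one minor clone, two singletons.
toy_reads = ["AAA"] * 6 + ["CCC"] * 2 + ["GGG", "TTT"]
share_top1 = top_clone_share(toy_reads, 1)
share_top2 = top_clone_share(toy_reads, 2)
print(share_top1, share_top2)
```

    Comparing this share before and after amplification is one simple way to express the collapse of diversity that the abstract describes.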

  5. Analysis of Mitochondrial Control Region Using Sanger Sequencing.

    PubMed

    Ballard, David

    2016-01-01

    The analysis of mitochondrial DNA (mtDNA) is an established forensic tool and has been used extensively to aid with both the identification of human remains and evidence recovered from scenes of crime. The biology of mtDNA confers both advantages and disadvantages when using it as a tool for identification. It benefits from a high copy number, which facilitates analysis from samples with highly degraded DNA or trace amounts of DNA, but the maternal mode of inheritance restricts its power of discrimination. With Next Generation Sequencing being used in research and some forensic casework laboratories the scope of mtDNA analysis in forensic casework may expand in the near future. Currently, however, most casework laboratories rely on Sanger sequencing and an established method for analyzing the hypervariable sequence regions is described. PMID:27259738

  6. Complete genomic sequence analysis of norovirus isolated from South Korea.

    PubMed

    Lee, Gyu-Cheol; Jung, Gyoo Seung; Lee, Chan Hee

    2012-10-01

    The complete nucleotide and deduced amino acid sequences of the RNA genome of a recently isolated norovirus (NoV) from Korea, designated Hu/GII-4/CBNU2/2007/KR (CBNU2), were determined and characterized by phylogenetic comparison with several genetically diverse NoV sequences. The RNA genome of CBNU2 is 7,560 nucleotides in length, excluding the 3' poly (A) tract. It includes three open reading frames (ORFs): ORF1, which encodes the nonstructural polyprotein (5-5,104); ORF2, which encodes VP1 (5,085-6,707); and ORF3, which encodes VP2 (6,707-7,513). ORF2-based phylogenetic analysis revealed that CBNU2 belonged to the GII.4 genotype, the most prevalent genotype, and formed a cluster with NoVs isolated from Asian regions, between 2006 and 2008. Comparative analysis with the consensus sequence of 207 completely sequenced NoV genomes showed 47 mismatched nucleotides: 26 in ORF1, 14 in ORF2, and 7 in ORF3, resulting in 8 amino acid changes: 3 in ORF1, 2 in ORF2, and 3 in ORF3. Phylogenetic analysis with full genome ORF1, ORF2, and ORF3 nucleotide sequences obtained from CBNU2 and each of the other representative NoV genomes suggested that CBNU2 had not undergone recombination with any of the other NoVs. A SimPlot analysis further supported this finding.
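
    The ORF coordinates quoted in this abstract can be checked with simple interval arithmetic. The sketch below assumes the coordinates are 1-based and inclusive, which the abstract does not state explicitly; the helper names are illustrative.

```python
# ORF coordinates as quoted in the abstract (assumed 1-based, inclusive).
ORFS = {
    "ORF1": (5, 5104),
    "ORF2": (5085, 6707),
    "ORF3": (6707, 7513),
}


def orf_length(start, end):
    """Length in nucleotides of an inclusive interval."""
    return end - start + 1


def overlap(a, b):
    """Number of nucleotides shared by two inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


for name, (start, end) in ORFS.items():
    length = orf_length(start, end)
    # A length divisible by 3 is consistent with a whole number of codons.
    print(name, length, "nt,", length // 3, "codons, remainder", length % 3)

print("ORF1/ORF2 overlap:", overlap(ORFS["ORF1"], ORFS["ORF2"]), "nt")
print("ORF2/ORF3 overlap:", overlap(ORFS["ORF2"], ORFS["ORF3"]), "nt")
```

    Under that assumption the three ORFs span 5,100, 1,623 and 807 nucleotides, each a multiple of three, with a 20-nucleotide overlap between ORF1 and ORF2 and a single shared nucleotide between ORF2 and ORF3.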

  7. Mitochondrial genome sequence and gene order of Sipunculus nudus give additional support for an inclusion of Sipuncula into Annelida

    PubMed Central

    Mwinyi, Adina; Meyer, Achim; Bleidorn, Christoph; Lieb, Bernhard; Bartolomaeus, Thomas; Podsiadlowski, Lars

    2009-01-01

    Background Mitochondrial genomes are a valuable source of data for analysing phylogenetic relationships. Besides sequence information, mitochondrial gene order may add phylogenetically useful information, too. Sipuncula are unsegmented marine worms, traditionally placed in their own phylum. Recent molecular and morphological findings suggest a close affinity to the segmented Annelida. Results The first complete mitochondrial genome of a member of Sipuncula, Sipunculus nudus, is presented. All 37 genes characteristic for metazoan mtDNA were detected and are encoded on the same strand. The mitochondrial gene order (protein-coding and ribosomal RNA genes) resembles that of annelids, but shows several derivations so far found only in Sipuncula. Sequence based phylogenetic analysis of mitochondrial protein-coding genes results in significant bootstrap support for Annelida sensu lato, combining Annelida together with Sipuncula, Echiura, Pogonophora and Myzostomida. Conclusion The mitochondrial sequence data support a close relationship of Annelida and Sipuncula. Also the most parsimonious explanation of changes in gene order favours a derivation from the annelid gene order. These results complement findings from recent phylogenetic analyses of nuclear encoded genes as well as a report of a segmental neural patterning in Sipuncula. PMID:19149868

  8. Nonparametric survival analysis using Bayesian Additive Regression Trees (BART).

    PubMed

    Sparapani, Rodney A; Logan, Brent R; McCulloch, Robert E; Laud, Purushottam W

    2016-07-20

    Bayesian additive regression trees (BART) provide a framework for flexible nonparametric modeling of relationships of covariates to outcomes. Recently, BART models have been shown to provide excellent predictive performance for both continuous and binary outcomes, exceeding that of their competitors. Software is also readily available for such outcomes. In this article, we introduce modeling that extends the usefulness of BART in medical applications by addressing needs arising in survival analysis. Simulation studies of one-sample and two-sample scenarios, in comparison with long-standing traditional methods, establish face validity of the new approach. We then demonstrate the model's ability to accommodate data from complex regression models with a simulation study of a nonproportional hazards scenario with crossing survival functions and survival function estimation in a scenario where hazards are multiplicatively modified by a highly nonlinear function of the covariates. Using data from a recently published study of patients undergoing hematopoietic stem cell transplantation, we illustrate the use and some advantages of the proposed method in medical investigations. Copyright © 2016 John Wiley & Sons, Ltd. PMID:26854022

  9. Analysis and Visualization Tool for Targeted Amplicon Bisulfite Sequencing on Ion Torrent Sequencers.

    PubMed

    Pabinger, Stephan; Ernst, Karina; Pulverer, Walter; Kallmeyer, Rainer; Valdes, Ana M; Metrustry, Sarah; Katic, Denis; Nuzzo, Angelo; Kriegner, Albert; Vierlinger, Klemens; Weinhaeusel, Andreas

    2016-01-01

    Targeted sequencing of PCR amplicons generated from bisulfite deaminated DNA is a flexible, cost-effective way to study methylation of a sample at single CpG resolution and perform subsequent multi-target, multi-sample comparisons. Currently, no platform specific protocol, support, or analysis solution is provided to perform targeted bisulfite sequencing on a Personal Genome Machine (PGM). Here, we present a novel tool, called TABSAT, for analyzing targeted bisulfite sequencing data generated on Ion Torrent sequencers. The workflow starts with raw sequencing data, performs quality assessment, and uses a tailored version of Bismark to map the reads to a reference genome. The pipeline visualizes results as lollipop plots and is able to deduce specific methylation-patterns present in a sample. The obtained profiles are then summarized and compared between samples. In order to assess the performance of the targeted bisulfite sequencing workflow, 48 samples were used to generate 53 different Bisulfite-Sequencing PCR amplicons from each sample, resulting in 2,544 amplicon targets. We obtained a mean coverage of 282X using 1,196,822 aligned reads. Next, we compared the sequencing results of these targets to the methylation level of the corresponding sites on an Illumina 450k methylation chip. The calculated average Pearson correlation coefficient of 0.91 confirms the sequencing results with one of the industry-leading CpG methylation platforms and shows that targeted amplicon bisulfite sequencing provides an accurate and cost-efficient method for DNA methylation studies, e.g., to provide platform-independent confirmation of Illumina Infinium 450k methylation data. TABSAT offers a novel way to analyze data generated by Ion Torrent instruments and can also be used with data from the Illumina MiSeq platform. It can be easily accessed via the Platomics platform, which offers a web-based graphical user interface along with sample and parameter storage. TABSAT is freely

  10. Analysis and Visualization Tool for Targeted Amplicon Bisulfite Sequencing on Ion Torrent Sequencers

    PubMed Central

    Pabinger, Stephan; Ernst, Karina; Pulverer, Walter; Kallmeyer, Rainer; Valdes, Ana M.; Metrustry, Sarah; Katic, Denis; Nuzzo, Angelo; Kriegner, Albert; Vierlinger, Klemens; Weinhaeusel, Andreas

    2016-01-01

    Targeted sequencing of PCR amplicons generated from bisulfite deaminated DNA is a flexible, cost-effective way to study methylation of a sample at single CpG resolution and perform subsequent multi-target, multi-sample comparisons. Currently, no platform specific protocol, support, or analysis solution is provided to perform targeted bisulfite sequencing on a Personal Genome Machine (PGM). Here, we present a novel tool, called TABSAT, for analyzing targeted bisulfite sequencing data generated on Ion Torrent sequencers. The workflow starts with raw sequencing data, performs quality assessment, and uses a tailored version of Bismark to map the reads to a reference genome. The pipeline visualizes results as lollipop plots and is able to deduce specific methylation-patterns present in a sample. The obtained profiles are then summarized and compared between samples. In order to assess the performance of the targeted bisulfite sequencing workflow, 48 samples were used to generate 53 different Bisulfite-Sequencing PCR amplicons from each sample, resulting in 2,544 amplicon targets. We obtained a mean coverage of 282X using 1,196,822 aligned reads. Next, we compared the sequencing results of these targets to the methylation level of the corresponding sites on an Illumina 450k methylation chip. The calculated average Pearson correlation coefficient of 0.91 confirms the sequencing results with one of the industry-leading CpG methylation platforms and shows that targeted amplicon bisulfite sequencing provides an accurate and cost-efficient method for DNA methylation studies, e.g., to provide platform-independent confirmation of Illumina Infinium 450k methylation data. TABSAT offers a novel way to analyze data generated by Ion Torrent instruments and can also be used with data from the Illumina MiSeq platform. It can be easily accessed via the Platomics platform, which offers a web-based graphical user interface along with sample and parameter storage. TABSAT is freely
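
    The comparison between sequencing-derived methylation levels and array values is reported as a Pearson correlation coefficient. A minimal sketch of that calculation is given below; the function is a plain textbook implementation, not code from TABSAT, and the input values are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical methylation fractions for the same CpG sites on two platforms.
sequencing = [0.10, 0.50, 0.90]
array = [0.20, 0.60, 1.00]
print(round(pearson(sequencing, array), 6))
```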

  11. Solexa sequencing based transcriptome analysis of Helicoverpa armigera larvae.

    PubMed

    Li, Jigang; Li, Xiumin; Chen, Yongli; Yang, Zhongxiang; Guo, Sandui

    2012-12-01

    Helicoverpa armigera (Hübner) is a polyphagous Lepidoptera pest which causes great economic losses in crop production worldwide. In contrast to its agricultural importance, advances in the molecular aspects of this insect are quite limited. In the present study, Illumina's SOLEXA sequencing was adopted to determine the transcriptome of young H. armigera larvae. About 7 gigabases of raw sequence data was generated and assembled into 116,601 contigs with an average length of 389 base pairs after data preprocess. 37,352 of these contigs were annotated by searching against Uniref 100 of UniProt database. The annotated sequences were functionally classified into three groups including biological process (15,632 sequences), cellular component (9,562 sequences) and molecular function (19,258 sequences). KEGG (Kyoto Encyclopedia of Genes and Genomes) analysis showed that 1,409 contigs predicted to encode enzymes with enzyme commission numbers were mapped into 220 KEGG pathways in total. Finally, contigs with simple sequence repeats were derived from this dataset. PMID:23065207

  12. Sequence analysis of chromatin immunoprecipitation data for transcription factors

    PubMed Central

    Fraenkel, Ernest

    2013-01-01

    Chromatin immunoprecipitation (ChIP) experiments allow the location of transcription factors to be determined across the genome. Subsequent analysis of the sequences of the identified regions allows binding to be localized at a higher resolution than can be achieved by current high-throughput experiments without sequence analysis, and may provide important insight into the regulatory programs enacted by the protein of interest. In this chapter we review the tools, workflow, and common pitfalls of such analyses, and recommend strategies for effective motif discovery from these data. PMID:20827592

  13. The PACRAT system: an extensible WWW-based system for correlated sequence retrieval, storage and analysis.

    PubMed

    Ray, W C; Daniels, C J

    2001-01-01

    With PACRAT (Patterns, Analyses, Correlations. Remote Archive Testbed) we present an online database solution to the problem of accessing high-confidence sequences with specific relationships to classes of genes, such as upstream intergenic regions attached to tRNA genes. In addition the software contains a data warehousing and analysis-facilitating suite to streamline the process of analyzing the collected data. An unexpected additional benefit of the system is that it also provides easy access to sequences of lower confidence, and may be of assistance in such things as resolving ORF-call conflicts in genomic annotation projects.

  14. A convenient and adaptable microcomputer environment for DNA and protein sequence manipulation and analysis.

    PubMed Central

    Pustell, J; Kafatos, F C

    1986-01-01

    We describe the further development of a widely used package of DNA and protein sequence analysis programs for microcomputers (1,2,3). The package now provides a screen oriented user interface, and an enhanced working environment with powerful formatting, disk access, and memory management tools. The new GenBank floppy disk database is supported transparently to the user and a similar version of the NBRF protein database is provided. The programs can use sequence file annotation to automatically annotate printouts and translate or extract specified regions from sequences by name. The sequence comparison programs can now perform a 5000 X 5000 bp analysis in 12 minutes on an IBM PC. A program to locate potential protein coding regions in nucleic acids, a digitizer interface, and other additions are also described. PMID:3753784

  15. The sequence and analysis of a Chinese pig genome

    PubMed Central

    2012-01-01

    Background The pig is an economically important food source, amounting to approximately 40% of all meat consumed worldwide. Pigs also serve as an important model organism because of their similarity to humans at the anatomical, physiological and genetic level, making them very useful for studying a variety of human diseases. A pig strain of particular interest is the miniature pig, specifically the Wuzhishan pig (WZSP), as it has been extensively inbred. Its high level of homozygosity offers increased ease for selective breeding for specific traits and a more straightforward understanding of the genetic changes that underlie its biological characteristics. WZSP also serves as a promising means for applications in surgery, tissue engineering, and xenotransplantation. Here, we report the sequencing and analysis of an inbreeding WZSP genome. Results Our results reveal some unique genomic features, including a relatively high level of homozygosity in the diploid genome, an unusual distribution of heterozygosity, an over-representation of tRNA-derived transposable elements, a small amount of porcine endogenous retrovirus, and a lack of type C retroviruses. In addition, we carried out systematic research on gene evolution, together with a detailed investigation of the counterparts of human drug target genes. Conclusion Our results provide the opportunity to more clearly define the genomic character of pig, which could enhance our ability to create more useful pig models. PMID:23587058

  16. Analysis of expressed sequence tags from the Ulva prolifera (Chlorophyta)

    NASA Astrophysics Data System (ADS)

    Niu, Jianfeng; Hu, Haiyan; Hu, Songnian; Wang, Guangce; Peng, Guang; Sun, Song

    2010-01-01

    In 2008, a green tide broke out before the sailing competition of the 29th Olympic Games in Qingdao. The causative species was determined to be Enteromorpha prolifera (Ulva prolifera O. F. Müller), a familiar green macroalga along the coastline of China. Rapid accumulation of a large biomass of floating U. prolifera prompted research on different aspects of this species. In this study, we constructed a nonnormalized cDNA library from the thalli of U. prolifera and acquired 10 072 high-quality expressed sequence tags (ESTs). These ESTs were assembled into 3 519 nonredundant gene groups, including 1 446 clusters and 2 073 singletons. After annotation with the nr database, a large number of genes were found to be related to chloroplast and ribosomal proteins. GO functional classification showed that 1 418 ESTs participated in photosynthesis and 1 359 ESTs were responsible for the generation of precursor metabolites and energy. In addition, rather comprehensive carbon fixation pathways were found in U. prolifera using KEGG. Some stress-related and signal transduction-related genes were also found in this study. Together, the evidence indicated that U. prolifera has the material and energy basis for intense photosynthesis and rapid proliferation. Phylogenetic analysis of cytochrome c oxidase subunit I revealed that this green-tide causative species is most closely affiliated to Pseudendoclonium akinetum (Ulvophyceae).

  17. DNA sequence analysis of newly formed telomeres in yeast.

    PubMed

    Wang, S S; Pluta, A F; Zakian, V A

    1989-01-01

    A plasmid can be maintained in linear form in baker's yeast if it bears telomeric sequences at each end. Linear plasmids bearing cloned telomeric C4A4 repeats at one end (test end) and a natural DNA terminus with approximately 300 bps of C4A2 repeats at the other or control end were introduced by transformation into yeast. Test-end termini of 28 to 112 bps supported telomere formation. During telomere formation, C4A2 repeats were often transferred to test-end termini. To determine in greater detail the fate of test-end sequences on these plasmids after propagation in yeast, test-end telomeres were subcloned into E. coli and sequenced. DNA sequencing established a number of points about the molecular events involved in telomere formation in yeast. The results suggest that there are at least two mechanisms for telomere formation in yeast. One is mediated by a recombination event that requires neither a long stretch of homology nor the RAD52 gene product. The other mechanism is by addition of C1-3A repeats to the termini of linear DNA molecules. The telomeric sequence required to support C1-3A addition need not be at the very end of a molecule for telomere formation.

  18. WBSA: web service for bisulfite sequencing data analysis.

    PubMed

    Liang, Fang; Tang, Bixia; Wang, Yanqing; Wang, Jianfeng; Yu, Caixia; Chen, Xu; Zhu, Junwei; Yan, Jiangwei; Zhao, Wenming; Li, Rujiao

    2014-01-01

    Whole-Genome Bisulfite Sequencing (WGBS) and genome-wide Reduced Representation Bisulfite Sequencing (RRBS) are widely used to study DNA methylation. However, data analysis is complicated, lengthy, and hampered by a lack of seamless analytical pipelines. To address these issues, we developed a convenient, stable, and efficient web service called Web Service for Bisulfite Sequencing Data Analysis (WBSA) to analyze bisulfite sequencing data. WBSA focuses not only on CpG methylation, which is the most common biochemical modification in eukaryotic DNA, but also on non-CG methylation, which has been observed in plants and in human iPS cells, oocytes, neurons and stem cells. WBSA comprises three main modules as follows: WGBS data analysis, RRBS data analysis, and differentially methylated region (DMR) identification. The WGBS and RRBS modules execute read mapping, methylation site identification, annotation, and advanced analysis, whereas the DMR module identifies actual DMRs and annotates their correlations to genes. WBSA can be accessed and used without charge either online or as a local version. WBSA also includes the executables of the Portable Batch System (PBS) and standalone versions that can be downloaded from the website together with the installation instructions. WBSA is available at no charge for academic users at http://wbsa.big.ac.cn.

  19. WBSA: Web Service for Bisulfite Sequencing Data Analysis

    PubMed Central

    Wang, Yanqing; Wang, Jianfeng; Yu, Caixia; Chen, Xu; Zhu, Junwei; Yan, Jiangwei; Zhao, Wenming; Li, Rujiao

    2014-01-01

    Whole-Genome Bisulfite Sequencing (WGBS) and genome-wide Reduced Representation Bisulfite Sequencing (RRBS) are widely used to study DNA methylation. However, data analysis is complicated, lengthy, and hampered by a lack of seamless analytical pipelines. To address these issues, we developed a convenient, stable, and efficient web service called Web Service for Bisulfite Sequencing Data Analysis (WBSA) to analyze bisulfite sequencing data. WBSA focuses not only on CpG methylation, which is the most common biochemical modification in eukaryotic DNA, but also on non-CG methylation, which has been observed in plants and in human iPS cells, oocytes, neurons and stem cells. WBSA comprises three main modules as follows: WGBS data analysis, RRBS data analysis, and differentially methylated region (DMR) identification. The WGBS and RRBS modules execute read mapping, methylation site identification, annotation, and advanced analysis, whereas the DMR module identifies actual DMRs and annotates their correlations to genes. WBSA can be accessed and used without charge either online or as a local version. WBSA also includes the executables of the Portable Batch System (PBS) and standalone versions that can be downloaded from the website together with the installation instructions. WBSA is available at no charge for academic users at http://wbsa.big.ac.cn. PMID:24497972

  20. Complete VAX/VMS DNA/protein sequence analysis system

    SciTech Connect

    Smith, D.W.

    1987-05-01

    A complete yet flexible system of programs and database libraries for analysis of DNA, RNA and protein sequences is implemented for VAX/VMS computers. Types of analysis include 1) construction and analysis of chimeric sequences (cloning in the VAX), 2) multiple analysis of one or more single sequences, 3) search and comparison studies using sequence libraries, and 4) direct input and analysis of experimental data. Published groups of programs, including the Staden, Los Alamos, Zuker, Pearson, and PHYLIP programs, are used. GenBank and EMBL DNA libraries and PIR and Doolittle NEWAT protein libraries are available, with associated programs. The system is tutorial, with online documentation for relevant VAX software, the programs, and the databases. The complete documentation is flexibly maintained on reserve via computer printout placed in 3-ring binders. Command files are used extensively; porting of the entire system to another VAX/VMS system requires modification of a single command. Users of the system are members of a VAX group, with automatic implementation of the system upon login. The present system occupies about 140,000 blocks, and is easily expanded, or contracted, as desired. The UCSD system is used extensively for both teaching and research purposes. Use of microcomputers emulating Tektronix 4014 graphics terminals permits saving of graphics output to disk for subsequent modification to generate high quality publishable figures.

  1. Precessing rotating flows with additional shear: Stability analysis

    NASA Astrophysics Data System (ADS)

    Salhi, A.; Cambon, C.

    2009-03-01

    We consider unbounded precessing rotating flows in which vertical or horizontal shear is induced by the interaction between the solid-body rotation (with angular velocity Ω0) and the additional “precessing” Coriolis force (with angular velocity -ɛΩ0), normal to it. A “weak” shear flow, with rate 2ɛ of the same order of the Poincaré “small” ratio ɛ, is needed for balancing the gyroscopic torque, so that the whole flow satisfies Euler’s equations in the precessing frame (the so-called admissibility conditions). The base flow case with vertical shear (its cross-gradient direction is aligned with the main angular velocity) corresponds to Mahalov’s [Phys. Fluids A 5, 891 (1993)] precessing infinite cylinder base flow (ignoring boundary conditions), while the base flow case with horizontal shear (its cross-gradient direction is normal to both main and precessing angular velocities) corresponds to the unbounded precessing rotating shear flow considered by Kerswell [Geophys. Astrophys. Fluid Dyn. 72, 107 (1993)]. We show that both these base flows satisfy the admissibility conditions and can support disturbances in terms of advected Fourier modes. Because the admissibility conditions cannot select one case with respect to the other, a more physical derivation is sought: Both flows are deduced from Poincaré’s [Bull. Astron. 27, 321 (1910)] basic state of a precessing spheroidal container, in the limit of small ɛ. A rapid distortion theory (RDT) type of stability analysis is then performed for the previously mentioned disturbances, for both base flows. The stability analysis of the Kerswell base flow, using Floquet’s theory, is recovered, and its counterpart for the Mahalov base flow is presented. Typical growth rates are found to be the same for both flows at very small ɛ, but significant differences are obtained regarding growth rates and widths of instability bands, if larger ɛ values, up to 0.2, are considered. Finally, both flow cases

  2. Initial sequence and comparative analysis of the cat genome

    PubMed Central

    Pontius, Joan U.; Mullikin, James C.; Smith, Douglas R.; Lindblad-Toh, Kerstin; Gnerre, Sante; Clamp, Michele; Chang, Jean; Stephens, Robert; Neelam, Beena; Volfovsky, Natalia; Schäffer, Alejandro A.; Agarwala, Richa; Narfström, Kristina; Murphy, William J.; Giger, Urs; Roca, Alfred L.; Antunes, Agostinho; Menotti-Raymond, Marilyn; Yuhki, Naoya; Pecon-Slattery, Jill; Johnson, Warren E.; Bourque, Guillaume; Tesler, Glenn; O’Brien, Stephen J.

    2007-01-01

    The genome sequence (1.9-fold coverage) of an inbred Abyssinian domestic cat was assembled, mapped, and annotated with a comparative approach that involved cross-reference to annotated genome assemblies of six mammals (human, chimpanzee, mouse, rat, dog, and cow). The results resolved chromosomal positions for 663,480 contigs, 20,285 putative feline gene orthologs, and 133,499 conserved sequence blocks (CSBs). Additional annotated features include repetitive elements, endogenous retroviral sequences, nuclear mitochondrial (numt) sequences, micro-RNAs, and evolutionary breakpoints that suggest historic balancing of translocation and inversion incidences in distinct mammalian lineages. Large numbers of single nucleotide polymorphisms (SNPs), deletion insertion polymorphisms (DIPs), and short tandem repeats (STRs), suitable for linkage or association studies were characterized in the context of long stretches of chromosome homozygosity. In spite of the light coverage capturing ∼65% of euchromatin sequence from the cat genome, these comparative insights shed new light on the tempo and mode of gene/genome evolution in mammals, promise several research applications for the cat, and also illustrate that a comparative approach using more deeply covered mammals provides an informative, preliminary annotation of a light (1.9-fold) coverage mammal genome sequence. PMID:17975172

  3. Nematode.net update 2011: addition of data sets and tools featuring next-generation sequencing data.

    PubMed

    Martin, John; Abubucker, Sahar; Heizer, Esley; Taylor, Christina M; Mitreva, Makedonka

    2012-01-01

    Nematode.net (http://nematode.net) has been a publicly available resource for studying nematodes for over a decade. In the past 3 years, we reorganized Nematode.net to provide more user-friendly navigation through the site, a necessity due to the explosion of data from next-generation sequencing platforms. Organism-centric portals containing dynamically generated data are available for over 56 different nematode species. Next-generation data has been added to the various data-mining portals hosted, including NemaBLAST and NemaBrowse. The NemaPath metabolic pathway viewer builds associations using KOs, rather than ECs to provide more accurate and fine-grained descriptions of proteins. Two new features for data analysis and comparative genomics have been added to the site. NemaSNP enables the user to perform population genetics studies in various nematode populations using next-generation sequencing data. HelmCoP (Helminth Control and Prevention) as an independent component of Nematode.net provides an integrated resource for storage, annotation and comparative genomics of helminth genomes to aid in learning more about nematode genomes, as well as drug, pesticide, vaccine and drug target discovery. With this update, Nematode.net will continue to realize its original goal to disseminate diverse bioinformatic data sets and provide analysis tools to the broad scientific community in a useful and user-friendly manner.

  4. De novo sequencing and transcriptome analysis of Ustilaginoidea virens by using Illumina paired-end sequencing and development of simple sequence repeat markers.

    PubMed

    Yu, Mina; Yu, Junjie; Gu, Chenhao; Nie, Yafeng; Chen, Zhiyi; Yin, Xiaole; Liu, Yongfeng

    2014-09-01

    Ustilaginoidea virens is the causal agent of rice false smut, a rice disease of increasing importance worldwide that has caused both quantitative and qualitative rice losses. However, research on the pathogenic mechanism of U. virens is limited. In this study, we report the de novo assembly, annotation, and characterization of the transcriptome of U. virens and the development of simple sequence repeat (SSR) markers. U. virens transcripts of the mycelia and conidia mixture were sequenced using Illumina RNA-seq technology. A total of 52,554,142 clean reads were assembled into 36,496 transcripts representing 18,534 unigenes. Assembled unigenes were annotated through sequence comparison with known protein databases, and 48.48% of the unigenes were without hits in any of these databases. Clusters of orthologous groups for eukaryotic complete genome analysis identified the largest set of genes associated with posttranslational modification, protein turnover and chaperones. Kyoto Encyclopedia of Genes and Genomes pathway analyses identified a number of genes associated with mitogen-activated protein kinase and calcium-calcineurin pathways. The study also identified several putative pathogenicity determinants and candidate effectors in U. virens by using the pathogen-host interaction database. In addition, bioinformatics analysis revealed the presence of 12,298 SSR markers. This study provides a better understanding of the biology of U. virens and is an excellent resource for candidate genes required for pathogenesis discovery.
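
    The abstract does not say which tool or thresholds were used to detect SSRs. As an illustration only, the sketch below finds perfect tandem repeats with a regular expression; the unit lengths (2 to 6 bases) and the minimum of four copies are assumed values, and the function name is hypothetical.

```python
import re


def find_ssrs(sequence, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, unit, copies) for perfect tandem repeats in a DNA string.

    Thresholds are illustrative assumptions, not values from the study.
    """
    # The lazy quantifier prefers the shortest repeat unit at each position.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue  # skip homopolymer runs reported as longer units
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# Toy sequence (hypothetical): an (AC)5 repeat and an (ATC)4 repeat.
toy = "GGTT" + "AC" * 5 + "GG" + "ATC" * 4 + "TT"
for start, unit, copies in find_ssrs(toy):
    print(start, unit, copies)
```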

  5. Network Analysis of Sequence-Function Relationships and Exploration of Sequence Space of TEM β-Lactamases.

    PubMed

    Zeil, Catharina; Widmann, Michael; Fademrecht, Silvia; Vogel, Constantin; Pleiss, Jürgen

    2016-05-01

    The Lactamase Engineering Database (www.LacED.uni-stuttgart.de) was developed to facilitate the classification and analysis of TEM β-lactamases. The current version contains 474 TEM variants. Two hundred fifty-nine variants form a large scale-free network of highly connected point mutants. The network was divided into three subnetworks which were enriched by single phenotypes: one network with predominantly 2be and two networks with 2br phenotypes. Fifteen positions were found to be highly variable, contributing to the majority of the observed variants. Since it is expected that a considerable fraction of the theoretical sequence space is functional, the currently sequenced 474 variants represent only the tip of the iceberg of functional TEM β-lactamase variants which form a huge natural reservoir of highly interconnected variants. Almost 50% of the variants are part of a quartet. Thus, two single mutations that result in functional enzymes can be combined into a functional protein. Most of these quartets consist of the same phenotype, or the mutations are additive with respect to the phenotype. By predicting quartets from triplets, 3,916 unknown variants were constructed. Eighty-seven variants complement multiple quartets and therefore have a high probability of being functional. The construction of a TEM β-lactamase network and subsequent analyses by clustering and quartet prediction are valuable tools to gain new insights into the viable sequence space of TEM β-lactamases and to predict their phenotype. The highly connected sequence space of TEM β-lactamases is ideally suited to network analysis and demonstrates the strengths of network analysis over tree reconstruction methods.
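
    The quartet idea can be made concrete: a quartet consists of a variant, two single-mutation neighbours, and the variant carrying both mutations, and a triplet is a quartet with exactly one corner still unknown. The sketch below is a simplified illustration under stated assumptions: variants are modelled as sets of mutation labels, the labels m1 to m3 are hypothetical, and mutually exclusive substitutions at the same position are ignored. It is not the procedure used by the database authors.

```python
from itertools import combinations


def predict_from_triplets(known_variants):
    """Predict variants that would complete a quartet with three known corners."""
    known = {frozenset(v) for v in known_variants}
    mutations = set().union(*known) if known else set()
    # Candidate base corners: known variants and variants one mutation smaller,
    # so that a quartet whose base corner is the missing one is also found.
    bases = set(known)
    for variant in known:
        for mutation in variant:
            bases.add(variant - {mutation})
    predicted = set()
    for base in bases:
        for a, b in combinations(sorted(mutations - base), 2):
            corners = [base, base | {a}, base | {b}, base | {a, b}]
            missing = [corner for corner in corners if corner not in known]
            if len(missing) == 1:
                predicted.add(missing[0])
    return predicted


# Toy network (hypothetical labels): the empty set stands for the reference sequence.
known = [set(), {"m1"}, {"m2"}, {"m1", "m3"}]
for variant in sorted(predict_from_triplets(known), key=sorted):
    print(sorted(variant))
```

    In this toy network the double mutant m1+m2 and the single mutant m3 are each predicted because the other three corners of their quartets are already known.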

  6. Medical target prediction from genome sequence: combining different sequence analysis algorithms with expert knowledge and input from artificial intelligence approaches.

    PubMed

    Dandekar, T; Du, F; Schirmer, R H; Schmidt, S

    2001-12-01

    By exploiting the rapid increase in available sequence data, the definition of medically relevant protein targets has been improved by a combination of: (i) differential genome analysis (target list); and (ii) analysis of individual proteins (target analysis). Fast sequence comparisons, data mining, and genetic algorithms further promote these procedures. Mycobacterium tuberculosis proteins were chosen as applied examples.

  7. Third-Generation Sequencing and Analysis of Four Complete Pig Liver Esterase Gene Sequences in Clones Identified by Screening BAC Library

    PubMed Central

    Zhou, Qiongqiong; Sun, Wenjuan; Liu, Xiyan; Wang, Xiliang; Xiao, Yuncai; Bi, Dingren; Yin, Jingdong; Shi, Deshi

    2016-01-01

    Aim Pig liver carboxylesterase (PLE) gene sequences in GenBank are incomplete, which has led to difficulties in studying the genetic structure and regulation mechanisms of gene expression of PLE family genes. The aim of this study was to obtain and analyse complete gene sequences of the PLE family by screening a Rongchang pig BAC library and by third-generation PacBio sequencing. Methods After a number of existing incomplete PLE isoform gene sequences were analysed, primers were designed based on conserved regions in PLE exons, and the whole pig genome was used as a template for polymerase chain reaction (PCR) amplification. Specific primers were then selected based on the PCR amplification results. A three-step PCR screening method was used to identify PLE-positive clones by screening a Rongchang pig BAC library, and PacBio third-generation sequencing was performed. BLAST comparisons and other bioinformatics methods were applied for sequence analysis. Results Five PLE-positive BAC clones, designated BAC-10, BAC-70, BAC-75, BAC-119 and BAC-206, were identified. Sequence analysis yielded the complete sequences of four PLE genes, PLE1, PLE-B9, PLE-C4, and PLE-G2. Complete PLE gene sequences were defined as those containing regulatory sequences, exons, and introns. Not only did the PLE exon sequences of the four genes show a high degree of homology, but the intron sequences were also highly similar. Additionally, the regulatory region of the genes contained two 720 bp reverse complement sequences that may have an important function in the regulation of PLE gene expression. Significance This is the first report to confirm the complete sequences of four PLE genes. In addition, the study demonstrates that each PLE isoform is encoded by a single gene and that the various genes exhibit a high degree of sequence homology, suggesting that the PLE family evolved from a single ancestral gene. Obtaining the complete sequences of these PLE genes

  8. Educational Software for the Analysis of DNA and Protein Sequences.

    ERIC Educational Resources Information Center

    Maloy, Stanley; Olson, Sue

    1989-01-01

    Describes the development of the microcomputer-based educational software, DNAzoom, which was designed to introduce undergraduates in molecular biology to computer analysis of DNA and protein sequences. Highlights include graphical presentation of data, the functional use of color, a menu-oriented interface, and students' evaluations of the software.…

  9. Sequence analysis of the genome of carnation (Dianthus caryophyllus L.).

    PubMed

    Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

    2014-06-01

    The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. 'Francesco' was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568,887,315 bp, consisting of 45,088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16,644 bp and 60,737 bp, respectively, and the longest scaffold was 1,287,144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43 266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼ 98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp.
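
    For readers unfamiliar with the N50 statistic quoted above, the following minimal sketch shows the usual definition: the length of the shortest contig or scaffold in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name and the toy lengths are illustrative assumptions and are unrelated to the carnation assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50() requires at least one length")


# Toy example: total length 32, and the two longest sequences (8 + 8) reach half.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # 8
```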

  10. Molecular characterization of Giardia psittaci by multilocus sequence analysis.

    PubMed

    Abe, Niichiro; Makino, Ikuko; Kojima, Atsushi

    2012-12-01

    Multilocus sequence analyses targeting small subunit ribosomal DNA (SSU rDNA), elongation factor 1 alpha (ef1α), glutamate dehydrogenase (gdh), and beta giardin (β-giardin) were performed on Giardia psittaci isolates from three Budgerigars (Melopsittacus undulatus) and four Barred parakeets (Bolborhynchus lineola) kept in individual households or imported from overseas. Nucleotide differences and phylogenetic analyses at four loci indicate the distinction of G. psittaci from the other known Giardia species: Giardia muris, Giardia microti, Giardia ardeae, and Giardia duodenalis assemblages. Furthermore, G. psittaci was related more closely to G. duodenalis than to the other known Giardia species, except for G. microti. Conflicting signals regarded as "double peaks" were found at the same nucleotide positions of the ef1α in all isolates. However, the sequences of the other three loci, including gdh and β-giardin, which are known to be highly variable, from all isolates were also mutually identical at every locus. They showed no double peaks. These results suggest that double peaks found in the ef1α sequences are caused not by mixed infection with genetically different G. psittaci isolates but by allelic sequence heterogeneity (ASH), which is observed in diplomonad lineages including G. duodenalis. No sequence difference was found in any G. psittaci isolates at the gdh and β-giardin loci, suggesting that G. psittaci is indeed not more diverse genetically than other Giardia species. This report is the first to provide evidence related to the genetic characteristics of G. psittaci obtained using multilocus sequence analysis. PMID:22921500

  11. Motion sequence analysis in the presence of figural cues

    PubMed Central

    Sinha, Pawan; Vaina, Lucia M.

    2015-01-01

    The perception of 3D structure in dynamic sequences is believed to be subserved primarily through the use of motion cues. However, real-world sequences contain many figural shape cues besides the dynamic ones. We hypothesize that if figural cues are perceptually significant during sequence analysis, then inconsistencies in these cues over time would lead to percepts of non-rigidity in sequences showing physically rigid objects in motion. We develop an experimental paradigm to test this hypothesis and present results with two patients with impairments in motion perception due to focal neurological damage, as well as two control subjects. Consistent with our hypothesis, the data suggest that figural cues strongly influence the perception of structure in motion sequences, even to the extent of inducing non-rigid percepts in sequences where motion information alone would yield rigid structures. Beyond helping to probe the issue of shape perception, our experimental paradigm might also serve as a possible perceptual assessment tool in a clinical setting. PMID:26028822

  12. Hybrid Additive Manufacturing Technologies - An Analysis Regarding Potentials and Applications

    NASA Astrophysics Data System (ADS)

    Merklein, Marion; Junker, Daniel; Schaub, Adam; Neubauer, Franziska

    Driven by the trend toward mass customization of lightweight construction in industry, conventional manufacturing processes like forming technology and chipping production are being pushed to their limits for economical manufacturing. More flexible processes are needed, and additive manufacturing technology provides them. This toolless production principle offers high geometrical freedom and optimized utilization of the material used. Load-adjusted lightweight components can thus be produced economically in small lot sizes. To compensate for disadvantages such as inadequate accuracy and surface roughness, hybrid machines combining additive and subtractive manufacturing are being developed. This paper summarizes the principles of the most commonly used additive manufacturing processes for metals and the possibilities for integrating them into a hybrid production machine. It is pointed out that, in particular, the integration of deposition processes into a CNC milling center offers high potential for manufacturing larger parts with high accuracy. Furthermore, the combination of additive and subtractive manufacturing allows the production of ready-to-use products within a single machine. Additionally, current research on the integration of additive manufacturing processes into the production chain is analyzed. Given the long manufacturing time of additive production processes, the combination with conventional manufacturing processes like sheet or bulk metal forming seems an effective solution. Large volumes in particular can be produced by conventional processes. In an additional production step, active elements can be applied by additive manufacturing. This principle is also investigated for tool production to reduce chipping of the high-strength material used for forming tools. The aim is the addition of active elements onto a geometrically simple base by using Laser Metal Deposition. That process allows the utilization of several powder materials during one process what

  14. Improved algorithm for analysis of DNA sequences using multiresolution transformation.

    PubMed

    Inbamalar, T M; Sivakumar, R

    2015-01-01

    Bioinformatics and genomic signal processing use computational techniques to solve various biological problems. They aim to study the information allied with genetic materials such as the deoxyribonucleic acid (DNA), the ribonucleic acid (RNA), and the proteins. Fast and precise identification of the protein coding regions in DNA sequence is one of the most important tasks in analysis. Existing digital signal processing (DSP) methods provide less accurate and computationally complex solution with greater background noise. Hence, improvements in accuracy, computational complexity, and reduction in background noise are essential in identification of the protein coding regions in the DNA sequences. In this paper, a new DSP based method is introduced to detect the protein coding regions in DNA sequences. Here, the DNA sequences are converted into numeric sequences using electron ion interaction potential (EIIP) representation. Then discrete wavelet transformation is taken. Absolute value of the energy is found followed by proper threshold. The test is conducted using the data bases available in the National Centre for Biotechnology Information (NCBI) site. The comparative analysis is done and it ensures the efficiency of the proposed system. PMID:26000337
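
    The processing chain outlined in the abstract (EIIP encoding, discrete wavelet transformation, energy, threshold) can be sketched as below. This is a hedged illustration, not the authors' algorithm: the abstract does not name the wavelet, the decomposition level, which coefficients are used, or the threshold, so a single-level Haar transform, the detail coefficients and a caller-supplied threshold are assumptions made here. The EIIP values are the commonly tabulated ones for the four nucleotides.

```python
# Commonly tabulated electron-ion interaction potential (EIIP) values.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def eiip_encode(seq):
    """Map a DNA string to its numeric EIIP sequence."""
    return [EIIP[base] for base in seq.upper()]


def haar_dwt(signal):
    """Single-level Haar DWT (assumed wavelet); returns (approximation, detail).

    A trailing unpaired sample is dropped when the input length is odd.
    """
    scale = 2 ** 0.5
    approx, detail = [], []
    for a, b in zip(signal[0::2], signal[1::2]):
        approx.append((a + b) / scale)
        detail.append((a - b) / scale)
    return approx, detail


def thresholded_energy(coeffs, threshold):
    """Squared magnitude of each coefficient, zeroed below `threshold`."""
    return [c * c if c * c >= threshold else 0.0 for c in coeffs]


# Usage on a short hypothetical sequence; the threshold is an arbitrary placeholder.
signal = eiip_encode("ATGGCGTACGTT")
_, detail = haar_dwt(signal)
print(thresholded_energy(detail, threshold=1e-4))
```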

  15. Analysis of passing sequences, shots and goals in soccer.

    PubMed

    Hughes, Mike; Franks, Ian

    2005-05-01

    Early research into how goals were scored in association football (Reep and Benjamin, 1968) may have shaped the tactics of British football. Most coaches have been affected, to a greater or lesser extent, by the tactics referred to as the "long-ball game" or "direct play", which was a tactic employed as a consequence of this research. Data from these studies, published in the late 1960s, have been reconfirmed by analyses of different FIFA World Cup tournaments by several different research groups. In the present study, the number of passes that led to goals scored in two FIFA World Cup finals were analysed. The results conform to that of previous research, but when these data were normalized with respect to the frequency of the respective lengths of passing sequences, there were more goals scored from longer passing sequences than from shorter passing sequences. Teams produced significantly more shots per possession for these longer passing sequences, but the strike ratio of goals from shots is better for "direct play" than for "possession play". Finally, an analysis of the shooting data for successful and unsuccessful teams for different lengths of passing sequences in the 1990 FIFA World Cup finals indicated that, for successful teams, longer passing sequences produced more goals per possession than shorter passing sequences. For unsuccessful teams, neither tactic had a clear advantage. It was further concluded that the original work of Reep and Benjamin (1968), although a key landmark in football analysis, led only to a partial understanding of the phenomenon that was investigated.

  16. GISH analysis of disomic Brassica napus-Crambe abyssinica chromosome addition lines produced by microspore culture from monosomic addition lines.

    PubMed

    Wang, Youping; Sonntag, Karin; Rudloff, Eicke; Wehling, Peter; Snowdon, Rod J

    2006-02-01

    Two Brassica napus-Crambe abyssinica monosomic addition lines (2n=39, AACC plus a single chromosome from C. abyssinica) were obtained from the F(2) progeny of the asymmetric somatic hybrid. The alien chromosome from C. abyssinica in the addition line was clearly distinguished by genomic in situ hybridization (GISH). Twenty-seven microspore-derived plants from the addition lines were obtained. Fourteen seedlings were determined to be diploid plants (2n=38) arising from spontaneous chromosome doubling, while 13 seedlings were confirmed as haploid plants. Doubled haploid plants produced after treatment with colchicine and two disomic chromosome addition lines (2n=40, AACC plus a single pair of homologous chromosomes from C. abyssinica) could again be identified by GISH analysis. The lines are potentially useful for molecular genetic analysis of novel C. abyssinica genes or alleles contributing to traits relevant for oilseed rape (B. napus) breeding.

  17. Whole-Genome sequencing and genetic variant analysis of a quarter Horse mare

    PubMed Central

    2012-01-01

    Background The catalog of genetic variants in the horse genome originates from a few select animals, the majority originating from the Thoroughbred mare used for the equine genome sequencing project. The purpose of this study was to identify genetic variants, including single nucleotide polymorphisms (SNPs), insertion/deletion polymorphisms (INDELs), and copy number variants (CNVs) in the genome of an individual Quarter Horse mare sequenced by next-generation sequencing. Results Using massively parallel paired-end sequencing, we generated 59.6 Gb of DNA sequence from a Quarter Horse mare resulting in an average of 24.7X sequence coverage. Reads were mapped to approximately 97% of the reference Thoroughbred genome. Unmapped reads were de novo assembled resulting in 19.1 Mb of new genomic sequence in the horse. Using a stringent filtering method, we identified 3.1 million SNPs, 193 thousand INDELs, and 282 CNVs. Genetic variants were annotated to determine their impact on gene structure and function. Additionally, we genotyped this Quarter Horse for mutations of known diseases and for variants associated with particular traits. Functional clustering analysis of genetic variants revealed that most of the genetic variation in the horse's genome was enriched in sensory perception, signal transduction, and immunity and defense pathways. Conclusions This is the first sequencing of a horse genome by next-generation sequencing and the first genomic sequence of an individual Quarter Horse mare. We have increased the catalog of genetic variants for use in equine genomics by the addition of novel SNPs, INDELs, and CNVs. The genetic variants described here will be a useful resource for future studies of genetic variation regulating performance traits and diseases in equids. PMID:22340285

  18. Expressed sequence tag analysis in tef (Eragrostis tef (Zucc) Trotter).

    PubMed

    Yu, Ju-Kyung; Sun, Qi; Rota, Mauricio La; Edwards, Hugh; Tefera, Hailu; Sorrells, Mark E

    2006-04-01

    Tef (Eragrostis tef (Zucc.) Trotter) is the most important cereal crop in Ethiopia; however, there is very little DNA sequence information available for this species. Expressed sequence tags (ESTs) were generated from 4 cDNA libraries: seedling leaf, seedling root, and inflorescence of E. tef and seedling leaf of Eragrostis pilosa, a wild relative of E. tef. Clustering of 3603 sequences produced 530 clusters and 1890 singletons, resulting in 2420 tef unigenes. Approximately 3/4 of tef unigenes matched protein or nucleotide sequences in public databases. Annotation of unigenes associated 68% of the putative tef genes with gene ontology categories. Identification of the translated unigenes for conserved protein domains revealed 389 protein family domains (Pfam), the most frequent of which was protein kinase. A total of 170 ESTs containing simple sequence repeats (EST-SSRs) were identified and 80 EST-SSR markers were developed. In addition, 19 single-nucleotide polymorphism (SNP) and (or) insertion-deletion (indel) and 34 intron fragment length polymorphism (IFLP) markers were developed. The EST database and molecular markers generated in this study will be valuable resources for further tef genetic research.

  19. Construction of an integrated database to support genomic sequence analysis

    SciTech Connect

    Gilbert, W.; Overbeek, R.

    1994-11-01

    The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

  20. Artificial intelligence approach in analysis of DNA sequences.

    PubMed

    Brézillon, P J; Zaraté, P; Saci, F

    1993-01-01

    We present an approach for designing a knowledge-based system, called Sequence Acquisition In Context (SAIC), that will be able to cooperate with a biologist in the analysis of DNA sequences. The main task of the system is the acquisition of the expert knowledge that the biologist uses for solving ambiguities from gel autoradiograms, with the aim of re-using it later for solving similar ambiguities. The various types of expert knowledge constitute what we call the contextual knowledge of the sequence analysis. Contextual knowledge deals with the unavoidable problems that are common in the study of the living material (eg noise on data, difficulties of observations). Indeed, the analysis of DNA sequences from autoradiograms belongs to an emerging and promising area of investigation, namely reasoning with images. The SAIC project is developed in a theoretical framework that is shared with other applications. Not all tasks have the same importance in each application. We use this observation for designing an intelligent assistant system with three applications. In the SAIC project, we focus on knowledge acquisition, human-computer interaction and explanation. The project will benefit research in the two other applications. We also discuss our SAIC project in the context of large international projects that aim to re-use and share knowledge in a repository.

  1. VIROME: a standard operating procedure for analysis of viral metagenome sequences.

    PubMed

    Wommack, K Eric; Bhavsar, Jaysheel; Polson, Shawn W; Chen, Jing; Dumas, Michael; Srinivasiah, Sharath; Furman, Megan; Jamindar, Sanchita; Nasko, Daniel J

    2012-07-30

    One consistent finding among studies using shotgun metagenomics to analyze whole viral communities is that most viral sequences show no significant homology to known sequences. Thus, bioinformatic analyses based on sequence collections such as GenBank nr, which are largely comprised of sequences from known organisms, tend to ignore a majority of sequences within most shotgun viral metagenome libraries. Here we describe a bioinformatic pipeline, the Viral Informatics Resource for Metagenome Exploration (VIROME), that emphasizes the classification of viral metagenome sequences (predicted open-reading frames) based on homology search results against both known and environmental sequences. Functional and taxonomic information is derived from five annotated sequence databases which are linked to the UniRef 100 database. Environmental classifications are obtained from hits against a custom database, MetaGenomes On-Line, which contains 49 million predicted environmental peptides. Each predicted viral metagenomic ORF run through the VIROME pipeline is placed into one of seven ORF classes, thus, every sequence receives a meaningful annotation. Additionally, the pipeline includes quality control measures to remove contaminating and poor quality sequence and assesses the potential amount of cellular DNA contamination in a viral metagenome library by screening for rRNA genes. Access to the VIROME pipeline and analysis results are provided through a web-application interface that is dynamically linked to a relational back-end database. The VIROME web-application interface is designed to allow users flexibility in retrieving sequences (reads, ORFs, predicted peptides) and search results for focused secondary analyses. PMID:23407591

  2. [Statistical analysis of DNA sequences nearby splicing sites].

    PubMed

    Korzinov, O M; Astakhova, T V; Vlasov, P K; Roĭtberg, M A

    2008-01-01

    Recognition of coding regions within eukaryotic genomes is one of the oldest, yet still unsolved, problems of bioinformatics. New high-accuracy methods of splice site recognition are needed to solve this problem. A question of current interest is how to identify specific features of nucleotide sequences near splice sites and to recognize sites in their sequence context. We performed a statistical analysis of a database of human gene fragments and revealed some characteristics of nucleotide sequences in the neighborhood of splice sites. Frequencies of all nucleotides and dinucleotides in the vicinity of splice sites were computed, and nucleotides and dinucleotides with extremely high or low occurrences were identified. The statistical information obtained in this work can be used in further development of methods for splice site annotation and exon-intron structure recognition.
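
    As an illustration of the kind of counting this abstract describes, the sketch below pools fixed-length sequence windows and computes relative nucleotide (k = 1) or dinucleotide (k = 2) frequencies. It assumes the windows around the sites of interest have already been extracted; the function name and the toy windows are hypothetical and do not come from the study's database.

```python
from collections import Counter


def kmer_frequencies(windows, k):
    """Relative frequency of every overlapping k-mer pooled over all windows."""
    counts = Counter()
    for window in windows:
        window = window.upper()
        counts.update(window[i:i + k] for i in range(len(window) - k + 1))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


# Toy windows (hypothetical), e.g. a few bases on either side of candidate sites.
windows = ["AGGT", "AGGT"]
print(kmer_frequencies(windows, k=1))
print(kmer_frequencies(windows, k=2))
```

    Comparing such frequencies against a background computed from windows away from the sites is one simple way to flag over- or under-represented nucleotides and dinucleotides.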

  3. A biostratigraphic sequence analysis in Cretaceous sediments from Eastern Venezuela

    SciTech Connect

    Paredes, I.; Carillo, M.; Fasola, A.; Luna, F.

    1993-02-01

    This paper presents the results of a high-resolution biostratigraphic study, integrated with petrophysical analyses, of the Late Cretaceous sequence in several wells from the Maturin Sub-Basin, Eastern Venezuela. The main objective of this study is to integrate the different faunal and floral assemblages with the sedimentological evolution of the basin using sequential analysis techniques. This technique was applied using mainly terrestrial and marine palynomorphs, which were relatively abundant and diverse compared with the scarce foraminifera and nannofossils. Based on the percentages of abundance and the diversity of the different groups of microfossils, it was possible to establish the maximum flooding surfaces and condensation levels, which allowed the definition of the possible candidates for the sequence boundaries. In addition, the identified bioevents made possible the definition of the chronostratigraphic datums of the sequence under study. The results obtained will help to optimize the exploration and development programs of the oil fields in Eastern Venezuela.

  4. Genome sequencing and analysis of the model grass Brachypodium distachyon

    SciTech Connect

    Yang, Xiaohan; Kalluri, Udaya C; Tuskan, Gerald A

    2010-01-01

    Three subfamilies of grasses, the Ehrhartoideae, Panicoideae and Pooideae, provide the bulk of human nutrition and are poised to become major sources of renewable energy. Here we describe the genome sequence of the wild grass Brachypodium distachyon (Brachypodium), which is, to our knowledge, the first member of the Pooideae subfamily to be sequenced. Comparison of the Brachypodium, rice and sorghum genomes shows a precise history of genome evolution across a broad diversity of the grasses, and establishes a template for analysis of the large genomes of economically important pooid grasses such as wheat. The high-quality genome sequence, coupled with ease of cultivation and transformation, small size and rapid life cycle, will help Brachypodium reach its potential as an important model system for developing new energy and food crops.

  5. Strategy for microbiome analysis using 16S rRNA gene sequence analysis on the Illumina sequencing platform.

    PubMed

    Ram, Jeffrey L; Karim, Aos S; Sendler, Edward D; Kato, Ikuko

    2011-06-01

    Understanding the identity and changes of organisms in the urogenital and other microbiomes of the human body may be key to discovering causes and new treatments of many ailments, such as vaginosis. High-throughput sequencing technologies have recently enabled discovery of the great diversity of the human microbiome. The cost per base of many of these sequencing platforms remains high (thousands of dollars per sample); however, the Illumina Genome Analyzer (IGA) is estimated to have a cost per base less than one-fifth of its nearest competitor. The main disadvantage of the IGA for sequencing PCR-amplified 16S rRNA genes is that the maximum read length of the IGA is only 100 bases, whereas at least 300 bases are needed to obtain phylogenetically informative data down to the genus and species level. In this paper we describe and conduct a pilot test of a multiplex sequencing strategy suitable for achieving total reads of > 300 bases per extracted DNA molecule on the IGA. Results show that all proposed primers produce products of the expected size and that correct sequences can be obtained with all proposed forward primers. Various bioinformatic optimizations of the Illumina Bustard analysis pipeline proved necessary to extract the correct sequence from IGA image data, and these modifications of the data files indicate that further optimization of the analysis pipeline may improve the quality rankings of the data and enable more sequence to be correctly analyzed. The successful application of this method could result in an unprecedentedly deep description (800,000 taxonomic identifications per sample) of the urogenital and other microbiomes in a large number of samples at a reasonable cost per sample. PMID:21361774

  7. Evolution Analysis of Simple Sequence Repeats in Plant Genome

    PubMed Central

    Qin, Zhen; Wang, Yanping; Wang, Qingmei; Li, Aixian; Hou, Fuyun; Zhang, Liming

    2015-01-01

    Simple sequence repeats (SSRs) are widespread units on genome sequences, and play many important roles in plants. In order to reveal the evolution of plant genomes, we investigated the evolutionary regularities of SSRs during the evolution of plant species and the plant kingdom by analysis of twelve sequenced plant genomes. First, in the twelve studied plant genomes, the main SSRs were those containing repeats of 1–3-nucleotide combinations. Second, in mononucleotide SSRs, the A/T percentage gradually increased along with the evolution of plants (except for P. patens). With the increase of SSR repeat number, the percentage of A/T in C. reinhardtii showed no significant change, while the percentage of A/T in terrestrial plant species gradually declined. Third, in dinucleotide SSRs, the percentage of AT/TA increased along with the evolution of the plant kingdom, and the repeat number increased in terrestrial plant species. This trend was more obvious in dicotyledons than in monocotyledons. The percentage of CG/GC showed the opposite pattern to that of AT/TA. Fourth, in trinucleotide SSRs, the percentages of combinations including two or three A/T showed a rising trend along with the evolution of the plant kingdom; meanwhile, with the increase of SSR repeat number in plant species, different species chose different combinations as dominant SSRs. SSRs in C. reinhardtii, P. patens, Z. mays and A. thaliana showed specific patterns related to evolutionary position or specific changes of genome sequences. The results showed that SSRs not only followed a general pattern in the evolution of the plant kingdom, but were also associated with the evolution of the specific genome sequence. The study of the evolutionary regularities of SSRs provided new insights for the analysis of plant genome evolution. PMID:26630570
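
    A simple way to locate the mono-, di- and trinucleotide SSRs discussed above is a regular expression with a backreference. The sketch below is an assumption-laden illustration rather than the method used in the study: the minimum repeat count of 4 is an arbitrary placeholder (SSR surveys usually apply motif-length-specific thresholds), and only the characters A, C, G and T are considered.

```python
import re


def find_ssrs(seq, min_repeats=4):
    """Find perfect repeats of 1-3 nt motifs occurring at least `min_repeats` times.

    Returns a list of (start, motif, repeat_count) tuples. The lazy quantifier
    makes the regex try the shortest motif first, so "AAAA" is reported as a
    mononucleotide repeat rather than as (AA)2.
    """
    pattern = re.compile(r"([ACGT]{1,3}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


# Toy sequence (hypothetical): an (AT)4 repeat followed by an (A)5 run.
print(find_ssrs("GGATATATATCCAAAAAG"))  # [(2, 'AT', 4), (12, 'A', 5)]
```

    Tallying the motifs returned for each genome, for example the share of A/T among mononucleotide hits, would reproduce the kind of percentage comparison the abstract reports.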

  8. CyMATE: a new tool for methylation analysis of plant genomic DNA after bisulphite sequencing.

    PubMed

    Hetzl, Jennifer; Foerster, Andrea M; Raidl, Günther; Mittelsten Scheid, Ortrun

    2007-08-01

    Cytosine methylation is a hallmark of epigenetic information in the DNA of many fungi, vertebrates and plants. The technique of bisulphite genomic sequencing reveals the methylation state of every individual cytosine in a sequence, and thereby provides high-resolution data on epigenetic diversity; however, the manual evaluation and documentation of large amounts of data is laborious and error-prone. While some software is available for facilitating the analysis of mammalian DNA methylation, which is found nearly exclusively at CG sites, there is no software optimally suited for data from DNA with significant non-CG methylation. We describe CyMATE (Cytosine Methylation Analysis Tool for Everyone) for in silico analysis of DNA sequences after bisulphite conversion of plant DNA, in which methylation is more divergent with respect to sequence context and biological relevance. From aligned sequences, CyMATE includes and distinguishes methylation at CG, CHG and CHH (where H = A, C or T), and can extract both quantitative and qualitative data regarding general and pattern-specific methylation per sequence and per position, i.e. data for individual sites in a sequence and the epigenetic divergence within a sample. In addition, it can provide graphical output from alignments in either an overview or a 'zoom-in' view as pdf files. Detailed information, including a quality control of the sequencing data, is provided in text format. We applied CyMATE to the analysis of DNA methylation at transcriptionally silenced promoters in diploid and polyploid Arabidopsis and found significant hypermethylation, high stability of the methylated state independent of chromosome number, and non-redundant patterns of mC distribution. CyMATE is freely available for non-commercial use at http://www.gmi.oeaw.ac.at/CyMATE. PMID:17559516
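    The CG / CHG / CHH distinction in this abstract can be illustrated with a short sketch. It is not CyMATE's implementation: the function names are hypothetical, the read is assumed to be already aligned to the reference without gaps, and only the forward strand is considered. It relies on the standard bisulphite logic that an unmethylated cytosine reads as T after conversion while a methylated cytosine still reads as C.

```python
from collections import defaultdict

def cytosine_context(seq, i):
    """Classify the cytosine at index i as 'CG', 'CHG' or 'CHH' (H = A, C or T).

    Returns None when the downstream bases are missing or ambiguous.
    """
    seq = seq.upper()
    if seq[i] != "C":
        raise ValueError("position %d is not a cytosine" % i)
    nxt = seq[i + 1:i + 3]
    if len(nxt) >= 1 and nxt[0] == "G":
        return "CG"
    if len(nxt) == 2 and nxt[0] in "ACT":
        if nxt[1] == "G":
            return "CHG"
        if nxt[1] in "ACT":
            return "CHH"
    return None

def methylation_by_context(reference, read):
    """Count (methylated, total) cytosines per context for one aligned read."""
    if len(reference) != len(read):
        raise ValueError("reference and read must be the same length")
    counts = defaultdict(lambda: [0, 0])
    for i, base in enumerate(reference.upper()):
        if base != "C":
            continue
        ctx = cytosine_context(reference, i)
        called = read[i].upper()
        if ctx is None or called not in ("C", "T"):
            continue  # skip unclassifiable or uninformative positions
        counts[ctx][1] += 1
        if called == "C":  # C retained after conversion: methylated
            counts[ctx][0] += 1
    return {ctx: tuple(pair) for ctx, pair in counts.items()}

# Toy example (hypothetical sequences): Cs at index 1 (CG), 4 (CHG), 8 (CHH).
summary = methylation_by_context("ACGTCAGTCTTC", "ACGTTAGTCTTC")
```

    Summing such per-read counts across clones would give per-position and per-context methylation frequencies of the kind the abstract describes.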

  9. Congruence analysis of point clouds from unstable stereo image sequences

    NASA Astrophysics Data System (ADS)

    Jepping, C.; Bethmann, F.; Luhmann, T.

    2014-06-01

    This paper deals with the correction of exterior orientation parameters of stereo image sequences over deformed free-form surfaces without control points. Such an imaging situation can occur, for example, during photogrammetric car crash test recordings where onboard high-speed stereo cameras are used to measure 3D surfaces. As a result of such measurements, 3D point clouds of deformed surfaces are generated for a complete stereo sequence. The first objective of this research is the development and investigation of methods for detecting corresponding spatial and temporal tie points within the stereo image sequences (by stereo image matching and 3D point tracking) that are robust enough to handle occlusions and other disturbances reliably. The second objective is the analysis of object deformations in order to detect stable areas (congruence analysis). For this purpose a RANSAC-based method for congruence analysis has been developed. This process is based on the sequential transformation of randomly selected point groups from one epoch to another by using a 3D similarity transformation. The paper gives a detailed description of the congruence analysis. The approach has been tested successfully on synthetic and real image data.

  10. Castor bean organelle genome sequencing and worldwide genetic diversity analysis.

    PubMed

    Rivarola, Maximo; Foster, Jeffrey T; Chan, Agnes P; Williams, Amber L; Rice, Danny W; Liu, Xinyue; Melake-Berhan, Admasu; Huot Creasy, Heather; Puiu, Daniela; Rosovitz, M J; Khouri, Hoda M; Beckstrom-Sternberg, Stephen M; Allan, Gerard J; Keim, Paul; Ravel, Jacques; Rabinowicz, Pablo D

    2011-01-01

    Castor bean is an important oil-producing plant in the Euphorbiaceae family. Its high-quality oil contains up to 90% of the unusual fatty acid ricinoleate, which has many industrial and medical applications. Castor bean seeds also contain ricin, a highly toxic Type 2 ribosome-inactivating protein, which has gained relevance in recent years due to biosafety concerns. In order to gain knowledge on global genetic diversity in castor bean and to ultimately help the development of breeding and forensic tools, we carried out an extensive chloroplast sequence diversity analysis. Taking advantage of the recently published genome sequence of castor bean, we assembled the chloroplast and mitochondrion genomes extracting selected reads from the available whole genome shotgun reads. Using the chloroplast reference genome we used the methylation filtration technique to readily obtain draft genome sequences of 7 geographically and genetically diverse castor bean accessions. These sequence data were used to identify single nucleotide polymorphism markers and phylogenetic analysis resulted in the identification of two major clades that were not apparent in previous population genetic studies using genetic markers derived from nuclear DNA. Two distinct sub-clades could be defined within each major clade and large-scale genotyping of castor bean populations worldwide confirmed previously observed low levels of genetic diversity and showed a broad geographic distribution of each sub-clade.

  11. Castor Bean Organelle Genome Sequencing and Worldwide Genetic Diversity Analysis

    PubMed Central

    Chan, Agnes P.; Williams, Amber L.; Rice, Danny W.; Liu, Xinyue; Melake-Berhan, Admasu; Huot Creasy, Heather; Puiu, Daniela; Rosovitz, M. J.; Khouri, Hoda M.; Beckstrom-Sternberg, Stephen M.; Allan, Gerard J.; Keim, Paul; Ravel, Jacques; Rabinowicz, Pablo D.

    2011-01-01

    Castor bean is an important oil-producing plant in the Euphorbiaceae family. Its high-quality oil contains up to 90% of the unusual fatty acid ricinoleate, which has many industrial and medical applications. Castor bean seeds also contain ricin, a highly toxic Type 2 ribosome-inactivating protein, which has gained relevance in recent years due to biosafety concerns. In order to gain knowledge on global genetic diversity in castor bean and to ultimately help the development of breeding and forensic tools, we carried out an extensive chloroplast sequence diversity analysis. Taking advantage of the recently published genome sequence of castor bean, we assembled the chloroplast and mitochondrion genomes extracting selected reads from the available whole genome shotgun reads. Using the chloroplast reference genome we used the methylation filtration technique to readily obtain draft genome sequences of 7 geographically and genetically diverse castor bean accessions. These sequence data were used to identify single nucleotide polymorphism markers and phylogenetic analysis resulted in the identification of two major clades that were not apparent in previous population genetic studies using genetic markers derived from nuclear DNA. Two distinct sub-clades could be defined within each major clade and large-scale genotyping of castor bean populations worldwide confirmed previously observed low levels of genetic diversity and showed a broad geographic distribution of each sub-clade. PMID:21750729

  12. Infrared thermal facial image sequence registration analysis and verification

    NASA Astrophysics Data System (ADS)

    Chen, Chieh-Li; Jian, Bo-Lin

    2015-03-01

    To study the emotional responses of subjects to the International Affective Picture System (IAPS), infrared thermal facial image sequence is preprocessed for registration before further analysis such that the variance caused by minor and irregular subject movements is reduced. Without affecting the comfort level and inducing minimal harm, this study proposes an infrared thermal facial image sequence registration process that will reduce the deviations caused by the unconscious head shaking of the subjects. A fixed image for registration is produced through the localization of the centroid of the eye region as well as image translation and rotation processes. Thermal image sequencing will then be automatically registered using the two-stage genetic algorithm proposed. The deviation before and after image registration will be demonstrated by image quality indices. The results show that the infrared thermal image sequence registration process proposed in this study is effective in localizing facial images accurately, which will be beneficial to the correlation analysis of psychological information related to the facial area.

  13. On combining protein sequences and nucleic acid sequences in phylogenetic analysis: the homeobox protein case.

    PubMed

    Agosti, D; Jacobs, D; DeSalle, R

    1996-01-01

    Amino acid encoding genes contain character state information that may be useful for phylogenetic analysis on at least two levels. The nucleotide sequence and the translated amino acid sequences have both been employed separately as character states for cladistic studies of various taxa, including studies of the genealogy of genes in multigene families. In essence, amino acid sequences and nucleic acid sequences are two different ways of character coding the information in a gene. Silent positions in the nucleotide sequence (first or third positions in codons that can accrue change without changing the identity of the amino acid that the triplet codes for) may accrue change relatively rapidly and become saturated, losing the pattern of historical divergence. On the other hand, non-silent nucleotide alterations and their accompanying amino acid changes may evolve too slowly to reveal relationships among closely related taxa. In general, the dynamics of sequence change in silent and non-silent positions in protein coding genes result in homoplasy and lack of resolution, respectively. We suggest that the combination of nucleic acid and the translated amino acid coded character states into the same data matrix for phylogenetic analysis addresses some of the problems caused by the rapid change of silent nucleotide positions and overall slow rate of change of non-silent nucleotide positions and slowly changing amino acid positions. One major theoretical problem with this approach is the apparent non-independence of the two sources of characters. However, there are at least three possible outcomes when comparing protein coding nucleic acid sequences with their translated amino acids in a phylogenetic context on a codon by codon basis. First, the two character sets for a codon may be entirely congruent with respect to the information they convey about the relationships of a certain set of taxa. Second, one character set may display no information concerning a phylogenetic

  14. Applications of new sequencing technologies for transcriptome analysis.

    PubMed

    Morozova, Olena; Hirst, Martin; Marra, Marco A

    2009-01-01

    Transcriptome analysis has been a key area of biological inquiry for decades. Over the years, research in the field has progressed from candidate gene-based detection of RNAs using Northern blotting to high-throughput expression profiling driven by the advent of microarrays. Next-generation sequencing technologies have revolutionized transcriptomics by providing opportunities for multidimensional examinations of cellular transcriptomes in which high-throughput expression data are obtained at a single-base resolution. PMID:19715439

  15. ANALYSIS OF EXPRESSED SEQUENCE TAGS FROM THE GREEN ALGA DUNALIELLA SALINA (CHLOROPHYTA)(1).

    PubMed

    Zhao, Rui; Cao, Yu; Xu, Hui; Lv, Linfeng; Qiao, Dairong; Cao, Yi

    2011-12-01

    The unicellular green alga Dunaliella salina (Dunal) Teodor. is a novel model photosynthetic eukaryote for studying photosystems, high salinity acclimation, and carotenoid accumulation. In spite of such significance, there have been limited studies on the Dunaliella genome, transcriptome, and proteome. To further investigate D. salina, a cDNA library was constructed and sequenced. Here, we present the analysis of the 2,282 expressed sequence tags (ESTs) generated together with 3,990 ESTs from dbEST. A total of 4,148 unique sequences (UniSeqs) were identified, of which 56.1% had sequence similarity with Uniprot entries, suggesting that a large number of unique genes may be harbored by Dunaliella. Additionally, protein family domains were identified to further characterize these sequences. We also compared EST sequences with different complete eukaryotic genomes from several animals, plants, and fungi. We observed notable differences between D. salina and other organisms. This EST collection and its annotation provided a significant resource for basic and applied research on D. salina and laid the foundation for a systematic analysis of the transcriptome basis of green algae development and diversification.

  16. Construction of a BAC library of Korean ginseng and initial analysis of BAC-end sequences.

    PubMed

    Hong, C P; Lee, S J; Park, J Y; Plaha, P; Park, Y S; Lee, Y K; Choi, J E; Kim, K Y; Lee, J H; Lee, J; Jin, H; Choi, S R; Lim, Y P

    2004-07-01

    We estimated the genome size of Korean ginseng (Panax ginseng C.A. Meyer), a medicinal herb, constructed a HindIII BAC library, and analyzed BAC-end sequences to provide an initial characterization of the library. The 1C nuclear DNA content of Korean ginseng was estimated to be 3.33 pg (3.12 × 10^3 Mb). The BAC library consists of 106,368 clones with an average size of 98.61 kb, amounting to 3.34 genome equivalents. Sequencing of 2167 BAC clones generated 2492 BAC-end sequences with an average length of 400 bp. Analysis using BLAST and motif searches revealed that 10.2%, 20.9% and 3.8% of the BAC-end sequences contained protein-coding regions, transposable elements and microsatellites, respectively. A comparison of the functional categories represented by the protein-coding regions found in BAC-end sequences with those of Arabidopsis revealed that proteins pertaining to energy metabolism, subcellular localization, cofactor requirement and transport facilitation were more highly represented in the P. ginseng sample. In addition, a sequence encoding a glucosyltransferase-like protein implicated in the ginsenoside biosynthesis pathway was also found. The majority of the transposable element sequences found belonged to the gypsy type (67.6%), followed by copia (11.7%) and LINE (8.0%) retrotransposons, whereas DNA transposons accounted for only 2.1% of the total in our sequence sample. Higher levels of transposable elements than protein-coding regions suggest that mobile elements have played an important role in the evolution of the genome of Korean ginseng, and contributed significantly to its complexity. We also identified 103 microsatellites with 3-38 repeats in their motifs. The BAC library and BAC-end sequences will serve as a useful resource for physical mapping, positional cloning and genome sequencing of P. ginseng.

  17. Effect of Si additions on thermal stability and the phase transition sequence of sputtered amorphous alumina thin films

    SciTech Connect

    Bolvardi, H.; Baben, M. to; Nahif, F.; Music, D.; Schnabel, V.; Shaha, K. P.; Mráz, S.; Schneider, J. M.; Bednarcik, J.; Michalikova, J.

    2015-01-14

    Si-alloyed amorphous alumina coatings having a silicon concentration of 0 to 2.7 at. % were deposited by combinatorial reactive pulsed DC magnetron sputtering of Al and Al-Si (90-10 at. %) split segments in Ar/O₂ atmosphere. The effect of Si alloying on thermal stability of the as-deposited amorphous alumina thin films and the phase formation sequence was evaluated by using differential scanning calorimetry and X-ray diffraction. The thermal stability window of the amorphous phase containing 2.7 at. % of Si was increased by more than 100 °C compared to that of the unalloyed phase. A similar retarding effect of Si alloying was also observed for the α-Al₂O₃ formation temperature, which increased by more than 120 °C. While for the latter retardation, the evidence for the presence of SiO₂ at the grain boundaries was presented previously, this obviously cannot explain the stability enhancement reported here for the amorphous phase. Based on density functional theory molecular dynamics simulations and synchrotron X-ray diffraction experiments for amorphous Al₂O₃ with and without Si incorporation, we suggest that the experimentally identified enhanced thermal stability of amorphous alumina with addition of Si is due to the formation of shorter and stronger Si–O bonds as compared to Al–O bonds.

  18. Treatment of a simulated textile wastewater in a sequencing batch reactor (SBR) with addition of a low-cost adsorbent.

    PubMed

    Santos, Sílvia C R; Boaventura, Rui A R

    2015-06-30

    Color removal from textile wastewaters with a low-cost and consistent technology remains a challenge. Simultaneous biological treatment and adsorption is a known alternative for the treatment of wastewaters containing biodegradable and non-biodegradable contaminants. The present work aims at evaluating the treatability of a simulated textile wastewater by simultaneously combining biological treatment and adsorption in a SBR (sequencing batch reactor), but using a low-cost adsorbent instead of a commercial one. The selected adsorbent was a metal hydroxide sludge (WS) from an electroplating industry. Direct Blue 85 dye (DB) was used in the preparation of the synthetic wastewater. Firstly, adsorption kinetics and equilibrium were studied with respect to several factors (temperature, pH, WS dosage and presence of salts and dyeing auxiliary chemicals in the aqueous media). At 25 °C and pH 4, 7 and 10, maximum DB adsorption capacities in aqueous solution were 600, 339 and 98.7 mg/g, respectively. These values are quite considerable compared to others reported in the literature, but proved to be significantly reduced by the presence of dyeing auxiliary chemicals in the wastewater. The simulated textile wastewater treatment in SBR led to BOD5 removals of 53-79%, but color removal was rather limited (10-18%). The performance was significantly enhanced by the addition of WS, with BOD5 removals above 91% and average color removals of 60-69%.

  19. Targeted Analysis of Whole Genome Sequence Data to Diagnose Genetic Cardiomyopathy

    SciTech Connect

    Golbus, Jessica R.; Puckelwartz, Megan J.; Dellefave-Castillo, Lisa; Fahrenbach, John P.; Nelakuditi, Viswateja; Pesce, Lorenzo L.; Pytel, Peter; McNally, Elizabeth M.

    2014-09-01

    Background—Cardiomyopathy is highly heritable but genetically diverse. At present, genetic testing for cardiomyopathy uses targeted sequencing to simultaneously assess the coding regions of more than 50 genes. New genes are routinely added to panels to improve the diagnostic yield. With the anticipated $1000 genome, it is expected that genetic testing will shift towards comprehensive genome sequencing accompanied by targeted gene analysis. Therefore, we assessed the reliability of whole genome sequencing and targeted analysis to identify cardiomyopathy variants in 11 subjects with cardiomyopathy. Methods and Results—Whole genome sequencing with an average of 37× coverage was combined with targeted analysis focused on 204 genes linked to cardiomyopathy. Genetic variants were scored using multiple prediction algorithms combined with frequency data from public databases. This pipeline yielded 1-14 potentially pathogenic variants per individual. Variants were further analyzed using clinical criteria and/or segregation analysis. Three of three previously identified primary mutations were detected by this analysis. In six subjects for whom the primary mutation was previously unknown, we identified mutations that segregated with disease, had clinical correlates, and/or had additional pathological correlation to provide evidence for causality. For two subjects with previously known primary mutations, we identified additional variants that may act as modifiers of disease severity. In total, we identified the likely pathological mutation in 9 of 11 (82%) subjects. We conclude that these pilot data demonstrate that ~30-40× coverage whole genome sequencing combined with targeted analysis is feasible and sensitive to identify rare variants in cardiomyopathy-associated genes.

  20. Targeted Analysis of Whole Genome Sequence Data to Diagnose Genetic Cardiomyopathy

    PubMed Central

    Dellefave-Castillo, Lisa; Fahrenbach, John P; Nelakuditi, Viswateja; Pesce, Lorenzo L; Pytel, Peter; McNally, Elizabeth M

    2014-01-01

    Background Cardiomyopathy is highly heritable but genetically diverse. At present, genetic testing for cardiomyopathy uses targeted sequencing to simultaneously assess the coding regions of more than 50 genes. New genes are routinely added to panels to improve the diagnostic yield. With the anticipated $1000 genome, it is expected that genetic testing will shift towards comprehensive genome sequencing accompanied by targeted gene analysis. Therefore, we assessed the reliability of whole genome sequencing and targeted analysis to identify cardiomyopathy variants in 11 subjects with cardiomyopathy. Methods and Results Whole genome sequencing with an average of 37× coverage was combined with targeted analysis focused on 204 genes linked to cardiomyopathy. Genetic variants were scored using multiple prediction algorithms combined with frequency data from public databases. This pipeline yielded 1-14 potentially pathogenic variants per individual. Variants were further analyzed using clinical criteria and/or segregation analysis. Three of three previously identified primary mutations were detected by this analysis. In six subjects for whom the primary mutation was previously unknown, we identified mutations that segregated with disease, had clinical correlates, and/or had additional pathological correlation to provide evidence for causality. For two subjects with previously known primary mutations, we identified additional variants that may act as modifiers of disease severity. In total, we identified the likely pathological mutation in 9 of 11 (82%) subjects. Conclusions These pilot data demonstrate that ~30-40× coverage whole genome sequencing combined with targeted analysis is feasible and sensitive to identify rare variants in cardiomyopathy-associated genes. PMID:25179549

  1. Targeted Analysis of Whole Genome Sequence Data to Diagnose Genetic Cardiomyopathy

    DOE PAGES

    Golbus, Jessica R.; Puckelwartz, Megan J.; Dellefave-Castillo, Lisa; Fahrenbach, John P.; Nelakuditi, Viswateja; Pesce, Lorenzo L.; Pytel, Peter; McNally, Elizabeth M.

    2014-09-01

    Background—Cardiomyopathy is highly heritable but genetically diverse. At present, genetic testing for cardiomyopathy uses targeted sequencing to simultaneously assess the coding regions of more than 50 genes. New genes are routinely added to panels to improve the diagnostic yield. With the anticipated $1000 genome, it is expected that genetic testing will shift towards comprehensive genome sequencing accompanied by targeted gene analysis. Therefore, we assessed the reliability of whole genome sequencing and targeted analysis to identify cardiomyopathy variants in 11 subjects with cardiomyopathy. Methods and Results—Whole genome sequencing with an average of 37× coverage was combined with targeted analysis focused on 204 genes linked to cardiomyopathy. Genetic variants were scored using multiple prediction algorithms combined with frequency data from public databases. This pipeline yielded 1-14 potentially pathogenic variants per individual. Variants were further analyzed using clinical criteria and/or segregation analysis. Three of three previously identified primary mutations were detected by this analysis. In six subjects for whom the primary mutation was previously unknown, we identified mutations that segregated with disease, had clinical correlates, and/or had additional pathological correlation to provide evidence for causality. For two subjects with previously known primary mutations, we identified additional variants that may act as modifiers of disease severity. In total, we identified the likely pathological mutation in 9 of 11 (82%) subjects. We conclude that these pilot data demonstrate that ~30-40× coverage whole genome sequencing combined with targeted analysis is feasible and sensitive to identify rare variants in cardiomyopathy-associated genes.

  3. Porosity Measurements and Analysis for Metal Additive Manufacturing Process Control.

    PubMed

    Slotwinski, John A; Garboczi, Edward J; Hebenstreit, Keith M

    2014-01-01

    Additive manufacturing techniques can produce complex, high-value metal parts, with potential applications as critical metal components such as those found in aerospace engines and as customized biomedical implants. Material porosity in these parts is undesirable for aerospace parts - since porosity could lead to premature failure - and desirable for some biomedical implants - since surface-breaking pores allow for better integration with biological tissue. Changes in a part's porosity during an additive manufacturing build may also be an indication of an undesired change in the build process. Here, we present efforts to develop an ultrasonic sensor for monitoring changes in the porosity in metal parts during fabrication on a metal powder bed fusion system. The development of well-characterized reference samples, measurements of the porosity of these samples with multiple techniques, and correlation of ultrasonic measurements with the degree of porosity are presented. A proposed sensor design, measurement strategy, and future experimental plans on a metal powder bed fusion system are also presented. PMID:26601041

  4. Porosity Measurements and Analysis for Metal Additive Manufacturing Process Control

    PubMed Central

    Slotwinski, John A; Garboczi, Edward J; Hebenstreit, Keith M

    2014-01-01

    Additive manufacturing techniques can produce complex, high-value metal parts, with potential applications as critical metal components such as those found in aerospace engines and as customized biomedical implants. Material porosity in these parts is undesirable for aerospace parts - since porosity could lead to premature failure - and desirable for some biomedical implants - since surface-breaking pores allow for better integration with biological tissue. Changes in a part’s porosity during an additive manufacturing build may also be an indication of an undesired change in the build process. Here, we present efforts to develop an ultrasonic sensor for monitoring changes in the porosity in metal parts during fabrication on a metal powder bed fusion system. The development of well-characterized reference samples, measurements of the porosity of these samples with multiple techniques, and correlation of ultrasonic measurements with the degree of porosity are presented. A proposed sensor design, measurement strategy, and future experimental plans on a metal powder bed fusion system are also presented. PMID:26601041

  5. Additional EIPC Study Analysis: Interim Report on High Priority Topics

    SciTech Connect

    Hadley, Stanton W

    2013-11-01

    Between 2010 and 2012 the Eastern Interconnection Planning Collaborative (EIPC) conducted a major long-term resource and transmission study of the Eastern Interconnection (EI). With guidance from a Stakeholder Steering Committee (SSC) that included representatives from the Eastern Interconnection States Planning Council (EISPC) among others, the project was conducted in two phases. Phase 1 involved a long-term capacity expansion analysis that involved creation of eight major futures plus 72 sensitivities. Three scenarios were selected for more extensive transmission-focused evaluation in Phase 2. Five power flow analyses, nine production cost model runs (including six sensitivities), and three capital cost estimations were developed during this second phase. The results from Phases 1 and 2 provided a wealth of data that could be examined further to address energy-related questions. A list of 13 topics was developed for further analysis; this paper discusses the first five.

  6. Sequence analysis of two novel HLA-DMA alleles

    SciTech Connect

    Carrington, M.; Harding, A.

    1994-12-31

    Several novel genes have been mapped recently in the HLA class II region between DQ and DP. Two of these genes, DMA and DMB, are predicted to encode a protein which has a structure similar to that of the DR, DQ, and DP molecules. The function of the DM molecule, however, is unlikely to mimic precisely that of the other class II molecules, since they share a low level of similarity and both DMA and DMB have limited polymorphism. Based on sequences from the third exon, four alleles of DMB and two alleles of DMA were previously characterized. Single-strand conformation polymorphism (SSCP) patterns of amplified DMA exon 3 products indicated the existence of two additional DMA alleles, which were subsequently sequenced and are now reported here. 4 refs., 2 figs.

  7. Environmental impact analysis for the main accidental sequences of ignitor

    SciTech Connect

    Carpignano, A.; Francabandiera, S.; Vella, R.; Zucchetti, M.

    1996-12-31

    A safety analysis study has been applied to the Ignitor machine using Probabilistic Safety Assessment. The main initiating events have been identified, and accident sequences have been studied by means of traditional methods such as Failure Mode and Effect Analysis (FMEA), Fault Trees (FT) and Event Trees (ET). The consequences of the radioactive environmental releases have been assessed in terms of Effective Dose Equivalents (EDEs) to the Most Exposed Individuals (MEI) of the chosen site, by means of a population dose code. Results point out the low environmental impact of the machine. 13 refs., 1 fig., 3 tabs.

  8. The design and analysis of transposon insertion sequencing experiments.

    PubMed

    Chao, Michael C; Abel, Sören; Davis, Brigid M; Waldor, Matthew K

    2016-02-01

    Transposon insertion sequencing (TIS) is a powerful approach that can be extensively applied to the genome-wide definition of loci that are required for bacterial growth under diverse conditions. However, experimental design choices and stochastic biological processes can heavily influence the results of TIS experiments and affect downstream statistical analysis. In this Opinion article, we discuss TIS experimental parameters and how these factors relate to the benefits and limitations of the various statistical frameworks that can be applied to the computational analysis of TIS data.

  9. Disclosure of hydraulic fracturing fluid chemical additives: analysis of regulations.

    PubMed

    Maule, Alexis L; Makey, Colleen M; Benson, Eugene B; Burrows, Isaac J; Scammell, Madeleine K

    2013-01-01

    Hydraulic fracturing is used to extract natural gas from shale formations. The process involves injecting into the ground fracturing fluids that contain thousands of gallons of chemical additives. Companies are not mandated by federal regulations to disclose the identities or quantities of chemicals used during hydraulic fracturing operations on private or public lands. States have begun to regulate hydraulic fracturing fluids by mandating chemical disclosure. These laws have shortcomings including nondisclosure of proprietary or "trade secret" mixtures, insufficient penalties for reporting inaccurate or incomplete information, and timelines that allow for after-the-fact reporting. These limitations leave lawmakers, regulators, public safety officers, and the public uninformed and ill-prepared to anticipate and respond to possible environmental and human health hazards associated with hydraulic fracturing fluids. We explore hydraulic fracturing exemptions from federal regulations, as well as current and future efforts to mandate chemical disclosure at the federal and state level.

  10. Disclosure of hydraulic fracturing fluid chemical additives: analysis of regulations.

    PubMed

    Maule, Alexis L; Makey, Colleen M; Benson, Eugene B; Burrows, Isaac J; Scammell, Madeleine K

    2013-01-01

    Hydraulic fracturing is used to extract natural gas from shale formations. The process involves injecting into the ground fracturing fluids that contain thousands of gallons of chemical additives. Companies are not mandated by federal regulations to disclose the identities or quantities of chemicals used during hydraulic fracturing operations on private or public lands. States have begun to regulate hydraulic fracturing fluids by mandating chemical disclosure. These laws have shortcomings including nondisclosure of proprietary or "trade secret" mixtures, insufficient penalties for reporting inaccurate or incomplete information, and timelines that allow for after-the-fact reporting. These limitations leave lawmakers, regulators, public safety officers, and the public uninformed and ill-prepared to anticipate and respond to possible environmental and human health hazards associated with hydraulic fracturing fluids. We explore hydraulic fracturing exemptions from federal regulations, as well as current and future efforts to mandate chemical disclosure at the federal and state level. PMID:23552653

  11. Risk analysis of sulfites used as food additives in China.

    PubMed

    Zhang, Jian Bo; Zhang, Hong; Wang, Hua Li; Zhang, Ji Yue; Luo, Peng Jie; Zhu, Lei; Wang, Zhu Tian

    2014-02-01

    This study aimed to analyze the risk of sulfites in food consumed by the Chinese people and to assess the health protection capability of the maximum-permitted level (MPL) of sulfites in GB 2760-2011. Sulfites as food additives are overused or abused in many food categories. When the MPL in GB 2760-2011 was used as the sulfite content in food, the intake of sulfites in most surveyed populations was lower than the acceptable daily intake (ADI). Excess intake of sulfites was found in all the surveyed groups when a high percentile of sulfites in food was consumed. Moreover, children aged 1-6 years are at high risk of excess sulfite intake. The primary cause of the excess intake of sulfites in Chinese people is the overuse and abuse of sulfites by the food industry. The current MPL of sulfites in GB 2760-2011 protects the health of most populations.

  12. An editing environment for DNA sequence analysis and annotation

    SciTech Connect

    Uberbacher, E.C.; Xu, Y.; Shah, M.B.; Olman, V.; Parang, M.; Mural, R.

    1998-12-31

    This paper presents a computer system for analyzing and annotating large-scale genomic sequences. The core of the system is a multiple-gene structure identification program, which predicts the most probable gene structures based on the given evidence, including pattern recognition, EST and protein homology information. A graphics-based user interface provides an environment which allows the user to interactively control the evidence to be used in the gene identification process. To overcome the computational bottleneck in the database similarity search used in the gene identification process, the authors have developed an effective way to partition a database into a set of sub-databases of related sequences, and reduced the search problem on a large database to a signature identification problem and a search problem on a much smaller sub-database. This reduces the number of sequences to be searched from N to O(√N) on average, and hence greatly reduces the search time, where N is the number of sequences in the original database. The system provides the user with the ability to facilitate and modify the analysis and modeling in real time.
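    The two-stage search described in this record can be illustrated with a toy sketch. It is not the authors' implementation: the k-mer signatures, the grouping of sorted sequences, and the names `kmers`, `build_index` and `search` are all assumptions made for illustration. The point is only that a query is compared against roughly √N signatures and then against roughly √N members of one sub-database.

```python
from math import isqrt


def kmers(seq, k=3):
    """Return the set of k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_index(database, k=3):
    """Split the database into about sqrt(N) groups, each with a signature.

    Assumption: grouping consecutive sorted sequences stands in for the
    record's partition into sub-databases of related sequences, and the
    signature of a group is simply the union of its members' k-mers.
    """
    ordered = sorted(database)
    size = max(1, isqrt(len(ordered)))
    groups = [ordered[i:i + size] for i in range(0, len(ordered), size)]
    return [(set().union(*(kmers(s, k) for s in g)), g) for g in groups]


def search(index, query, k=3):
    """Stage 1: pick the best-matching signature. Stage 2: scan that group."""
    q = kmers(query, k)
    _, group = max(index, key=lambda entry: len(q & entry[0]))
    return max(group, key=lambda s: jaccard(q, kmers(s, k)))


database = ["AAAAAA", "AAAAAT", "CCCCCC", "CCCCCG", "GGGGGG",
            "GGGGGT", "TTTTTT", "TTTTTA", "ACGTAC"]
index = build_index(database)
best = search(index, "CCCCCG")
```

    With nine sequences the index holds three groups of three, so the query above is scored against three signatures and three sequences instead of all nine.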

  13. A special-purpose processor for gene sequence analysis.

    PubMed

    Fagin, B; Watt, J G; Gross, R

    1993-04-01

    Advances in computational biology have occurred primarily in the areas of software and algorithm development; new designs of hardware to support biological computing are extremely scarce. This is due, we believe, to the presence of a non-trivial knowledge gap between molecular biologists and computer designers. The existence of this gap is unfortunate, as it has long been known that for certain problems, special-purpose computers can achieve significant cost/performance gains over general-purpose machines. We describe one such computer here: a custom accelerator for gene sequence analysis. The accelerator implements a version of the Needleman-Wunsch algorithm for nucleotide sequence alignment. Sequence lengths are constrained only by available memory; the product of sequence lengths in the current implementation can be up to 2^22. The machine is implemented as two NuBus boards connected to a Mac IIf/x, using a mixture of TTL and FPGA technology clocked at 10 MHz. The boards are completely functional, and yield a 15-fold performance improvement over an unassisted host.
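    For readers unfamiliar with the algorithm named in this record, a minimal software sketch of Needleman-Wunsch global alignment scoring follows. The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for illustration; the record does not state which variant or scoring scheme the accelerator uses.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b.

    The table has (len(a) + 1) * (len(b) + 1) cells, which is why memory
    use scales with the product of the two sequence lengths.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty sequence costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch
            )
            up = score[i - 1][j] + gap      # gap inserted in b
            left = score[i][j - 1] + gap    # gap inserted in a
            score[i][j] = max(diagonal, up, left)

    return score[-1][-1]
```

    Each cell depends only on its three neighbours above and to the left, which is the regularity that makes the recurrence a natural fit for dedicated hardware.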

  14. Application of Subspace Clustering in DNA Sequence Analysis.

    PubMed

    Wallace, Tim; Sekmen, Ali; Wang, Xiaofei

    2015-10-01

    Identification and clustering of orthologous genes plays an important role in developing evolutionary models such as validating convergent and divergent phylogeny and predicting functional proteins in newly sequenced species of unverified nucleotide protein mappings. Here, we introduce an application of subspace clustering as applied to orthologous gene sequences and discuss the initial results. The working hypothesis is based upon the concept that genetic changes between nucleotide sequences coding for proteins among selected species and groups may lie within a union of subspaces for clusters of the orthologous groups. Estimates for the subspace dimensions were computed for a small population sample. A series of experiments was performed to cluster randomly selected sequences. The experimental design allows for both false positives and false negatives, and estimates for the statistical significance are provided. The clustering results are consistent with the main hypothesis. A simple random mutation binary tree model is used to simulate speciation events that show the interdependence of the subspace rank versus time and mutation rates. The simple mutation model is found to be largely consistent with the observed subspace clustering singular value results. Our study indicates that the subspace clustering method may be applied in orthology analysis. PMID:26162018

  15. Ichnofabric and siliciclastic depositional systems: Integration for sequence stratigraphic analysis

    SciTech Connect

    Bottjer, D.J.; Droser, M.L.

    1991-03-01

    Much previous research on biogenic sedimentary structures has established how ichnofacies (assemblages of discrete trace fossils) vary within marine depositional systems. However, studies aimed at understanding the distribution of ichnofabric (sedimentary rock fabric resulting from biogenic reworking) have only recently been attempted. Because ichnofabric can be recorded using a semi-quantitative series of ichnofabric indices (ii), its distribution in marine sedimentary rocks can be easily recorded through vertical sequence analysis. Thicknesses of strata recording different ichnofabric indices can be logged from stratigraphic sections or cores. These data are best displayed in histograms as percent of ii recorded from the total thickness measured. These ichnofabric histograms (ichnograms) show variable but distinctive distributions for genetic units such as facies within systems tracts of siliciclastic depositional sequences. An average ichnofabric index for any genetic sedimentary unit can also be computed from the data used to construct ichnograms. Because skeletal fossils are typically much less commonly preserved in siliciclastic than carbonate depositional systems, such ichnofabric analyses have the potential of providing an important new line of evidence for depositional systems and sequence stratigraphic analysis of siliciclastic strata. In petroleum exploration, results from completing analyses of ichnofabric distribution could provide important information including: (1) systems tracts with fine-grained facies that have relatively low ichnofabric values are potential source beds; and (2) petroleum reservoirs that occur in coarse episodically deposited beds are more likely to form in systems tracts with facies that have low rather than high ichnofabric values.
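    The ichnogram and average-index calculations mentioned in this record can be sketched as follows. The input layout (a list of logged intervals, each a thickness and an ichnofabric index) and the thickness-weighted averaging are assumptions; the record does not specify how the average is computed.

```python
from collections import defaultdict


def ichnogram(intervals):
    """Percent of total logged thickness recorded for each ichnofabric index.

    intervals: list of (thickness, ii) pairs from a section or core log.
    """
    total = sum(thickness for thickness, _ in intervals)
    by_index = defaultdict(float)
    for thickness, ii in intervals:
        by_index[ii] += thickness
    return {ii: 100.0 * t / total for ii, t in sorted(by_index.items())}


def average_ii(intervals):
    """Thickness-weighted mean ichnofabric index (an assumed definition)."""
    total = sum(thickness for thickness, _ in intervals)
    return sum(thickness * ii for thickness, ii in intervals) / total


# Hypothetical log: thickness in metres, then ichnofabric index.
log = [(2.0, 1), (1.0, 3), (1.0, 1)]
histogram = ichnogram(log)
mean_index = average_ii(log)
```

    For the hypothetical log above, three of the four logged metres carry index 1, so the histogram assigns 75 percent to index 1 and 25 percent to index 3.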

  16. Expression analysis of a tyrosinase promoter sequence in zebrafish.

    PubMed

    Camp, Esther; Badhwar, Prerna; Mann, Graham J; Lardelli, Michael

    2003-04-01

    Sequence comparisons and functional analysis of the 5' upstream regions of tyrosinase genes have revealed the importance of cis-regulatory elements acting to control the spatiotemporal expression of tyrosinase in the melanocytes and retinal pigmented epithelium of developing embryos. To date there are no reports addressing the control of tyrosinase gene transcription in zebrafish, a vertebrate model organism of increasing importance. To exploit the tyrosinase gene as a marker in zebrafish we set out to clone its promoter and analyse its regulation during embryogenesis. Amplification of a zebrafish tyrosinase complementary DNA fragment by reverse transcriptase polymerase chain reaction allowed us to isolate and sequence a 1041 nt genomic DNA fragment that includes a transcription initiation site and 73 nt of the open reading frame. Bioinformatic analysis of this genomic sequence revealed five E-box motifs, including one CATGTG type E-box present in a putative initiation region. These are conserved positive regulatory elements in vertebrate tyrosinase promoters. We show that a region of 814 nt upstream from the translation start site of the zebrafish tyrosinase gene can drive expression in retinal pigmented epithelium in transiently transgenic zebrafish embryos but that its activity is not restricted to melanin-producing cells. This region is unable to drive transcription in human melanoma cell lines. Ectopic expression from this zebrafish tyrosinase promoter fragment is probably due to the absence of positive and negative cis-regulatory elements, such as a tyrosinase distal element, which is known to function as a pigment cell-specific enhancer.
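    A scan for E-box motifs of the kind reported in this record can be sketched with a regular expression. The sketch assumes the general E-box consensus CANNTG (of which CATGTG is one instance); the function name and the example sequence are hypothetical and are not taken from the zebrafish promoter.

```python
import re

# Lookahead so that overlapping matches are also reported.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_eboxes(sequence):
    """Return (0-based position, motif) for every CANNTG site in sequence."""
    sequence = sequence.upper()
    return [(m.start(), m.group(1)) for m in EBOX.finditer(sequence)]


# Hypothetical fragment containing one CATGTG and one CAGCTG site.
sites = find_eboxes("ttCATGTGaaCAGCTGtt")
```

    Because the reverse complement of CANNTG is again CANNTG, scanning one strand is enough to locate every site under this consensus.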

  17. Nonlinear analysis of correlations in Alu repeat sequences in DNA

    NASA Astrophysics Data System (ADS)

    Xiao, Yi; Huang, Yanzhao; Li, Mingfeng; Xu, Ruizhen; Xiao, Saifeng

    2003-12-01

    We report on a nonlinear analysis of deterministic structures in Alu repeats, one of the richest repetitive DNA sequences in the human genome. Alu repeats contain the recognition sites for the restriction endonuclease AluI, which is what gives them their name. Using the nonlinear prediction method developed in chaos theory, we find that all Alu repeats have novel deterministic structures and show strong nonlinear correlations that are absent from exon and intron sequences. Furthermore, the deterministic structures of Alus of younger subfamilies show panlike shapes. As young Alus can be seen as mutation free copies from the “master genes,” it may be suggested that the deterministic structures of the older subfamilies are results of an evolution from a “panlike” structure to a more diffuse correlation pattern due to mutation.

  18. Introduction to the analysis of environmental sequences: metagenomics with MEGAN.

    PubMed

    Huson, Daniel H; Mitra, Suparna

    2012-01-01

    Metagenomics is the study of microbial organisms using sequencing applied directly to environmental samples. Similarly, in metatranscriptomics and metaproteomics, the RNA and protein sequences of such samples are studied. The analysis of these kinds of data often starts by asking the questions of "who is out there?", "what are they doing?", and "how do they compare?". In this chapter, we describe how these computational questions can be addressed using MEGAN, the MEtaGenome ANalyzer program. We first show how to analyze the taxonomic and functional content of a single dataset and then show how such analyses can be performed in a comparative fashion. We demonstrate how to compare different datasets using ecological indices and other distance measures. The discussion is conducted using a number of published marine datasets comprising metagenomic, metatranscriptomic, metaproteomic, and 16S rRNA data.

  19. Cladistic analysis of iridoviruses based on protein and DNA sequences.

    PubMed

    Wang, J W; Deng, R Q; Wang, X Z; Huang, Y S; Xing, K; Feng, J H; He, J G; Long, Q X

    2003-11-01

    Cladograms of iridoviruses were inferred from bootstrap analysis of molecular data sets comprising all published protein and DNA sequences of the major capsid protein, ATPase and DNA polymerase genes of members of the Iridoviridae family Iridovirus. All data sets yielded cladograms supporting the separation of the Iridovirus, Ranavirus and Lymphocystivirus genera, and the cladogram based on data derived from major capsid proteins further divided both the Iridovirus and Ranavirus genera into two groups. Tests of alternative hypotheses of topological constraints were also performed to further investigate relationships between infectious spleen and kidney necrosis virus (ISKNV), an unclassified fish iridovirus for which the complete genome sequence data is available, and other iridoviruses. Cladograms inferred and results of Shimodaira-Hasegawa tests indicated that ISKNV is more closely related to the Ranavirus genus than it is to the other genera of the family.

  20. ESTPiper – a web-based analysis pipeline for expressed sequence tags

    PubMed Central

    Tang, Zuojian; Choi, Jeong-Hyeon; Hemmerich, Chris; Sarangi, Ankita; Colbourne, John K; Dong, Qunfeng

    2009-01-01

    Background EST sequencing projects are increasing in scale and scope as the genome sequencing technologies migrate from core sequencing centers to individual research laboratories. Effectively, generating EST data is no longer a bottleneck for investigators. However, processing large amounts of EST data remains a non-trivial challenge for many. Web-based EST analysis tools are proving to be the most convenient option for biologists when performing their analysis, so these tools must continuously improve on their utility to keep in step with the growing needs of research communities. We have developed a web-based EST analysis pipeline called ESTPiper, which streamlines typical large-scale EST analysis components. Results The intuitive web interface guides users through each step of base calling, data cleaning, assembly, genome alignment, annotation, analysis of gene ontology (GO), and microarray oligonucleotide probe design. Each step is modularized. Therefore, a user can execute them separately or together in batch mode. In addition, the user has control over the parameters used by the underlying programs. Extensive documentation of ESTPiper's functionality is embedded throughout the web site to facilitate understanding of the required input and interpretation of the computational results. The user can also download intermediate results and port files to separate programs for further analysis. In addition, our server provides a time-stamped description of the run history for reproducibility. The pipeline can also be installed locally, allowing researchers to modify ESTPiper to suit their own needs. Conclusion ESTPiper streamlines the typical process of EST analysis. The pipeline was initially designed in part to support the Daphnia pulex cDNA sequencing project. A web server hosting ESTPiper is provided at to now support projects of all size. The software is also freely available from the authors for local installations. PMID:19383159

  1. Ensemble analysis of adaptive compressed genome sequencing strategies

    PubMed Central

    2014-01-01

    Background Acquiring genomes at single-cell resolution has many applications such as in the study of microbiota. However, deep sequencing and assembly of all of the millions of cells in a sample is prohibitively costly. A property that can come to the rescue is that deep sequencing of every cell should not be necessary to capture all distinct genomes, as the majority of cells are biological replicates. Biologically important samples are often sparse in that sense. In this paper, we propose an adaptive compressed method, also known as distilled sensing, to capture all distinct genomes in a sparse microbial community with reduced sequencing effort. As opposed to group testing in which the number of distinct events is often constant and sparsity is equivalent to rarity of an event, sparsity in our case means scarcity of distinct events in comparison to the data size. Previously, we introduced the problem and proposed a distilled sensing solution based on the breadth first search strategy. We simulated the whole process which constrained our ability to study the behavior of the algorithm for the entire ensemble due to its computational intensity. Results In this paper, we modify our previous breadth first search strategy and introduce the depth first search strategy. Instead of simulating the entire process, which is intractable for a large number of experiments, we provide a dynamic programming algorithm to analyze the behavior of the method for the entire ensemble. The ensemble analysis algorithm recursively calculates the probability of capturing every distinct genome and also the expected total sequenced nucleotides for a given population profile. Our results suggest that the expected total sequenced nucleotides grows in proportion to the log of the number of cells and linearly with the number of distinct genomes. The probability of missing a genome depends on its abundance and the ratio of its size over the maximum genome size in the sample. 
The modified resource

  2. Sequence analysis of mutations and translocations across breast cancer subtypes

    PubMed Central

    Banerji, Shantanu; Cibulskis, Kristian; Rangel-Escareno, Claudia; Brown, Kristin K.; Carter, Scott L.; Frederick, Abbie M.; Lawrence, Michael S.; Sivachenko, Andrey Y.; Sougnez, Carrie; Zou, Lihua; Cortes, Maria L.; Fernandez-Lopez, Juan C.; Peng, Shouyong; Ardlie, Kristin G.; Auclair, Daniel; Bautista-Piña, Veronica; Duke, Fujiko; Francis, Joshua; Jung, Joonil; Maffuz-Aziz, Antonio; Onofrio, Robert C.; Parkin, Melissa; Pho, Nam H.; Quintanar-Jurado, Valeria; Ramos, Alex H.; Rebollar-Vega, Rosa; Rodriguez-Cuevas, Sergio; Romero-Cordoba, Sandra L.; Schumacher, Steven E.; Stransky, Nicolas; Thompson, Kristin M.; Uribe-Figueroa, Laura; Baselga, Jose; Beroukhim, Rameen; Polyak, Kornelia; Sgroi, Dennis C.; Richardson, Andrea L.; Jimenez-Sanchez, Gerardo; Lander, Eric S.; Gabriel, Stacey B.; Garraway, Levi A.; Golub, Todd R.; Melendez-Zajgla, Jorge; Toker, Alex; Getz, Gad; Hidalgo-Miranda, Alfredo; Meyerson, Matthew

    2014-01-01

    Breast carcinoma is the leading cause of cancer-related mortality in women worldwide with an estimated 1.38 million new cases and 458,000 deaths in 2008 alone. This malignancy represents a heterogeneous group of tumours with characteristic molecular features, prognosis, and responses to available therapy. Recurrent somatic alterations in breast cancer have been described including mutations and copy number alterations, notably ERBB2 amplifications, the first successful therapy target defined by a genomic aberration. Prior DNA sequencing studies of breast cancer genomes have revealed additional candidate mutations and gene rearrangements. Here we report the whole-exome sequences of DNA from 103 human breast cancers of diverse subtypes from patients in Mexico and Vietnam compared to matched-normal DNA, together with whole-genome sequences of 22 breast cancer/normal pairs. Beyond confirming recurrent somatic mutations in PIK3CA, TP53, AKT1, GATA3, and MAP3K1, we discovered recurrent mutations in the CBFB transcription factor gene and deletions of its partner RUNX1. Furthermore, we have identified a recurrent MAGI3-AKT3 fusion enriched in triple-negative breast cancer lacking estrogen and progesterone receptors and ERBB2 expression. The Magi3-Akt3 fusion leads to constitutive activation of Akt kinase, which is abolished by treatment with an ATP-competitive Akt small-molecule inhibitor. PMID:22722202

  3. Additional challenges for uncertainty analysis in river engineering

    NASA Astrophysics Data System (ADS)

    Berends, Koen; Warmink, Jord; Hulscher, Suzanne

    2016-04-01

    the proposed intervention. The implicit assumption underlying such analysis is that both models are commensurable. We hypothesize that they are commensurable only to a certain extent. In an idealised study we have demonstrated that prediction performance loss should be expected with increasingly large engineering works. When accounting for parametric uncertainty of floodplain roughness in model identification, we see uncertainty bounds for predicted effects of interventions increase with increasing intervention scale. Calibration of these types of models therefore seems to have a shelf-life, beyond which calibration no longer improves prediction. Therefore a qualification scheme for model use is required that can be linked to model validity. In this study, we characterize model use along three dimensions: extrapolation (using the model with different external drivers), extension (using the model for different output or indicators) and modification (using modified models). Such use of models is expected to have implications for the applicability of surrogate modelling for efficient uncertainty analysis as well, which is recommended for future research. Warmink, J. J.; Straatsma, M. W.; Huthoff, F.; Booij, M. J. & Hulscher, S. J. M. H. 2013. Uncertainty of design water levels due to combined bed form and vegetation roughness in the Dutch river Waal. Journal of Flood Risk Management 6, 302-318. DOI: 10.1111/jfr3.12014

  4. Reproducible Analysis of Sequencing-Based RNA Structure Probing Data with User-Friendly Tools.

    PubMed

    Kielpinski, Lukasz Jan; Sidiropoulos, Nikolaos; Vinther, Jeppe

    2015-01-01

    RNA structure-probing data can improve the prediction of RNA secondary and tertiary structure and allow structural changes to be identified and investigated. In recent years, massive parallel sequencing has dramatically improved the throughput of RNA structure probing experiments, but at the same time also made analysis of the data challenging for scientists without formal training in computational biology. Here, we discuss different strategies for data analysis of massive parallel sequencing-based structure-probing data. To facilitate reproducible and standardized analysis of this type of data, we have made a collection of tools, which allow raw sequencing reads to be converted to normalized probing values using different published strategies. In addition, we also provide tools for visualization of the probing data in the UCSC Genome Browser and for converting RNA coordinates to genomic coordinates and vice versa. The collection is implemented as functions in the R statistical environment and as tools in the Galaxy platform, making them easily accessible for the scientific community. We demonstrate the usefulness of the collection by applying it to the analysis of sequencing-based hydroxyl radical probing data and comparing different normalization strategies.

  5. MEGA: A biologist-centric software for evolutionary analysis of DNA and protein sequences

    PubMed Central

    Kumar, Sudhir; Nei, Masatoshi; Dudley, Joel; Tamura, Koichiro

    2008-01-01

    The Molecular Evolutionary Genetics Analysis (MEGA) software is a desktop application designed for comparative analysis of homologous gene sequences either from multigene families or from different species with a special emphasis on inferring evolutionary relationships and patterns of DNA and protein evolution. In addition to the tools for statistical analysis of data, MEGA provides many convenient facilities for the assembly of sequence data sets from files or web-based repositories, and it includes tools for visual presentation of the results obtained in the form of interactive phylogenetic trees and evolutionary distance matrices. Here we discuss the motivation, design principles, and priorities that have shaped the development of MEGA. We also discuss how MEGA might evolve in the future to assist researchers in their growing need to analyze large datasets using new computational methods. PMID:18417537

  6. HTSstation: A Web Application and Open-Access Libraries for High-Throughput Sequencing Data Analysis

    PubMed Central

    David, Fabrice P. A.; Delafontaine, Julien; Carat, Solenne; Ross, Frederick J.; Lefebvre, Gregory; Jarosz, Yohan; Sinclair, Lucas; Noordermeer, Daan; Rougemont, Jacques; Leleu, Marion

    2014-01-01

    The HTSstation analysis portal is a suite of simple web forms coupled to modular analysis pipelines for various applications of High-Throughput Sequencing including ChIP-seq, RNA-seq, 4C-seq and re-sequencing. HTSstation offers biologists the possibility to rapidly investigate their HTS data using an intuitive web application with heuristically pre-defined parameters. A number of open-source software components have been implemented and can be used to build, configure and run HTS analysis pipelines reactively. Besides, our programming framework empowers developers with the possibility to design their own workflows and integrate additional third-party software. The HTSstation web application is accessible at http://htsstation.epfl.ch. PMID:24475057

  7. MEGA: a biologist-centric software for evolutionary analysis of DNA and protein sequences.

    PubMed

    Kumar, Sudhir; Nei, Masatoshi; Dudley, Joel; Tamura, Koichiro

    2008-07-01

    The Molecular Evolutionary Genetics Analysis (MEGA) software is a desktop application designed for comparative analysis of homologous gene sequences either from multigene families or from different species with a special emphasis on inferring evolutionary relationships and patterns of DNA and protein evolution. In addition to the tools for statistical analysis of data, MEGA provides many convenient facilities for the assembly of sequence data sets from files or web-based repositories, and it includes tools for visual presentation of the results obtained in the form of interactive phylogenetic trees and evolutionary distance matrices. Here we discuss the motivation, design principles and priorities that have shaped the development of MEGA. We also discuss how MEGA might evolve in the future to assist researchers in their growing need to analyze large data sets using new computational methods.

  8. Kinetic analysis of microbial respiratory response to substrate addition

    NASA Astrophysics Data System (ADS)

    Blagodatskaya, Evgenia; Blagodatsky, Sergey; Yuyukina, Tatayna; Kuzyakov, Yakov

    2010-05-01

    The heterotrophic component of CO2 emitted from soil is mainly due to the respiratory activity of soil microorganisms. Field measurements of microbial respiration can be used for estimation of the C-budget in soil, while laboratory estimation of respiration kinetics allows the elucidation of mechanisms of soil C sequestration. Physiological approaches based on 1) time-dependent or 2) substrate-dependent respiratory response of soil microorganisms decomposing the organic substrates make it possible to relate the functional properties of the soil microbial community to decomposition rates of soil organic matter. We used a novel methodology combining (i) microbial growth kinetics and (ii) enzyme affinity to the substrate to show the shift in functional properties of the soil microbial community after amendments with substrates of contrasting availability. We combined the application of 14C labeled glucose as an easily available C source to soil with natural isotope labeling of old and young SOM. The possible contribution of two processes, isotopic fractionation and preferential substrate utilization, to the shifts in δ13C during SOM decomposition in soil after C3-C4 vegetation change was evaluated. Specific growth rate (µ) of soil microorganisms was estimated by fitting the parameters of the equation v(t) = A + B * exp(µ*t) to the measured CO2 evolution rate (v(t)) after glucose addition, where A is the initial rate of non-growth respiration and B is the initial rate of the growing fraction of total respiration. Maximal mineralization rate (Vmax), substrate affinity of microbial enzymes (Ks) and substrate availability (Sn) were determined by Michaelis-Menten kinetics. To study the effect of plant originated C on δ13C signature of SOM we compared the changes in isotopic composition of different C pools in C3 soil under grassland with C3-C4 soil where C4 plant Miscanthus giganteus was grown for 12 years on the plot after grassland. The shift in δ13C caused by planting of M. giganteus
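    The growth-kinetics fit in this record, v(t) = A + B * exp(µ*t), can be sketched without any fitting library. The approach below is an assumption made for illustration, not the authors' procedure: for each candidate µ on a grid the model is linear in A and B, so those two are solved by ordinary least squares and the µ with the smallest squared error is kept. The function name and the synthetic data are hypothetical.

```python
from math import exp


def fit_growth(times, rates, mu_grid):
    """Fit v(t) = A + B * exp(mu * t); return (A, B, mu).

    For a fixed mu the model is a straight line in x = exp(mu * t),
    so A and B follow from the usual least-squares line formulas.
    """
    n = len(times)
    best = None
    for mu in mu_grid:
        x = [exp(mu * t) for t in times]
        sx, sy = sum(x), sum(rates)
        sxx = sum(xi * xi for xi in x)
        sxy = sum(xi * yi for xi, yi in zip(x, rates))
        denom = n * sxx - sx * sx
        if denom == 0:
            continue  # degenerate case, e.g. mu == 0 makes every x equal
        b = (n * sxy - sx * sy) / denom
        a = (sy - b * sx) / n
        sse = sum((a + b * xi - yi) ** 2 for xi, yi in zip(x, rates))
        if best is None or sse < best[0]:
            best = (sse, a, b, mu)
    return best[1], best[2], best[3]


# Synthetic, noise-free respiration rates generated from known parameters.
times = list(range(11))
rates = [1.0 + 0.5 * exp(0.2 * t) for t in times]
a_hat, b_hat, mu_hat = fit_growth(times, rates, [i / 100 for i in range(1, 51)])
```

    On real, noisy measurements the grid resolution limits the precision of µ, so a finer grid or a dedicated nonlinear optimizer would normally be preferred.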

  9. In Silico Genome Comparison and Distribution Analysis of Simple Sequences Repeats in Cassava

    PubMed Central

    Vásquez, Andrea; López, Camilo

    2014-01-01

    We conducted a SSRs density analysis in different cassava genomic regions. The information obtained was useful to establish comparisons between cassava's SSRs genomic distribution and those of poplar, flax, and Jatropha. In general, cassava has a low SSR density (~50 SSRs/Mbp) and has a high proportion of pentanucleotides, (24,2 SSRs/Mbp). It was found that coding sequences have 15,5 SSRs/Mbp, introns have 82,3 SSRs/Mbp, 5′ UTRs have 196,1 SSRs/Mbp, and 3′ UTRs have 50,5 SSRs/Mbp. Through motif analysis of cassava's genome SSRs, the most abundant motif was AT/AT while in intron sequences and UTRs regions it was AG/CT. In addition, in coding sequences the motif AAG/CTT was also found to occur most frequently; in fact, it is the third most used codon in cassava. Sequences containing SSRs were classified according to their functional annotation of Gene Ontology categories. The identified SSRs here may be a valuable addition for genetic mapping and future studies in phylogenetic analyses and genomic evolution. PMID:25374887

  10. Infectious hypodermal and hematopoietic necrosis virus from Brazil: Sequencing, comparative analysis and PCR detection.

    PubMed

    Silva, Douglas C D; Nunes, Allan R D; Teixeira, Dárlio I A; Lima, João Paulo M S; Lanza, Daniel C F

    2014-08-30

    A 3739 nucleotide fragment of Infectious hypodermal and hematopoietic necrosis virus (IHHNV) from Brazil was amplified and sequenced. This fragment contains the entire coding sequences of viral proteins, the full 3' untranslated region (3'UTR) and a partial sequence of 5' untranslated region (5'UTR). The genome organization of IHHNV revealed the three typical major coding domains: a left ORF1 of 2001 bp that codes NS1, a left ORF2 (NS2) of 1091 bp that codes NS2 and a right ORF3 of 990 bp that codes VP. Nucleotide and amino acid sequences of the three viral proteins were compared with putative amino acid sequences of viruses reported from different regions. Comparisons among genomes from different geographic locations reveal 31 nucleotide regions that are 100% similar, distributed throughout the genome. An analysis of secondary structure of UTR regions, revealed regions with high probability to form hairpins, that may be involved in mechanisms of viral replication. Additionally, a maximum likelihood analysis indicates that Brazilian IHHNV belongs to lineage III, in the infectious IHHNV group, and is clustered with IHHNV isolates from Hawaii, China, Taiwan, Vietnam and South Korea. A new nested PCR targeting conserved nucleotide regions is proposed to detect IHHNV.

  11. Core genome conservation of Staphylococcus haemolyticus limits sequence based population structure analysis.

    PubMed

    Cavanagh, Jorunn Pauline; Klingenberg, Claus; Hanssen, Anne-Merethe; Fredheim, Elizabeth Aarag; Francois, Patrice; Schrenzel, Jacques; Flægstad, Trond; Sollid, Johanna Ericson

    2012-06-01

    The notoriously multi-resistant Staphylococcus haemolyticus is an emerging pathogen causing serious infections in immunocompromised patients. Defining the population structure is important to detect outbreaks and spread of antimicrobial resistant clones. Currently, the standard typing technique is pulsed-field gel electrophoresis (PFGE). In this study we describe novel molecular typing schemes for S. haemolyticus using multi locus sequence typing (MLST) and multi locus variable number of tandem repeats (VNTR) analysis. Seven housekeeping genes (MLST) and five VNTR loci (MLVF) were selected for the novel typing schemes. A panel of 45 human and veterinary S. haemolyticus isolates was investigated. The collection had diverse PFGE patterns (38 PFGE types) and was sampled over a 20-year period from eight countries. MLST resolved 17 sequence types (Simpson's index of diversity [SID]=0.877) and MLVF resolved 14 repeat types (SID=0.831). We found low sequence diversity. Phylogenetic analysis clustered the isolates in three (MLST) and one (MLVF) clonal complexes, respectively. Taken together, neither the MLST nor the MLVF scheme was suitable to resolve the population structure of this S. haemolyticus collection. Future MLVF and MLST schemes will benefit from addition of more variable core genome sequences identified by comparing different fully sequenced S. haemolyticus genomes. PMID:22484086
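
    The SID values quoted in this abstract can be illustrated with a short sketch. It assumes the Hunter-Gaston form of Simpson's index of diversity, which is commonly used for typing schemes; the abstract does not state which estimator was applied. The function name and the toy labels are hypothetical and are not data from the study.

```python
from collections import Counter


def simpson_diversity(type_labels):
    """Simpson's index of diversity (Hunter-Gaston form, assumed here).

    D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)),
    where n_j is the number of isolates of type j and N is the total.
    """
    counts = list(Counter(type_labels).values())
    total = sum(counts)
    if total < 2:
        raise ValueError("at least two isolates are required")
    same_type_pairs = sum(c * (c - 1) for c in counts)
    return 1.0 - same_type_pairs / (total * (total - 1))


# Toy example (hypothetical labels): six isolates falling into three types.
toy_labels = ["ST1", "ST1", "ST1", "ST2", "ST2", "ST3"]
toy_sid = simpson_diversity(toy_labels)
```

    The index is the probability that two isolates drawn at random without replacement belong to different types, so a scheme that assigns every isolate its own type scores 1 and a scheme that cannot separate any isolates scores 0.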

  12. A Primary Sequence Analysis of the ARGONAUTE Protein Family in Plants

    PubMed Central

    Rodríguez-Leal, Daniel; Castillo-Cobián, Amanda; Rodríguez-Arévalo, Isaac; Vielle-Calzada, Jean-Philippe

    2016-01-01

    Small RNA (sRNA)-mediated gene silencing represents a conserved regulatory mechanism controlling a wide diversity of developmental processes through interactions of sRNAs with proteins of the ARGONAUTE (AGO) family. On the basis of a large phylogenetic analysis that includes 206 AGO genes belonging to 23 plant species, AGO genes group into four clades corresponding to the phylogenetic distribution proposed for the ten family members of Arabidopsis thaliana. A primary analysis of the corresponding protein sequences resulted in 50 sequences of amino acids (blocks) conserved across their linear length. Protein members of the AGO4/6/8/9 and AGO1/10 clades are more conserved than members of the AGO5 and AGO2/3/7 clades. In addition to blocks containing components of the PIWI, PAZ, and DUF1785 domains, members of the AGO2/3/7 and AGO4/6/8/9 clades possess other consensus block sequences that are exclusive of members within these clades, suggesting unforeseen functional specialization revealed by their primary sequence. We also show that AGO proteins of animal and plant kingdoms share linear sequences of blocks that include motifs involved in posttranslational modifications such as those regulating AGO2 in humans and the PIWI protein AUBERGINE in Drosophila. Our results open possibilities for exploring new structural and functional aspects related to the evolution of AGO proteins within the plant kingdom, and their convergence with analogous proteins in mammals and invertebrates.

  13. Cloning and sequence analysis of candidate human natural killer-enhancing factor genes

    SciTech Connect

    Shau, H.; Butterfield, L.H.; Chiu, R.; Kim, A.

    1994-12-31

    A cytosol factor from human red blood cells enhances natural killer (NK) activity. This factor, termed NK-enhancing factor (NKEF), is a protein of 44000 M_r consisting of two subunits of equal size linked by disulfide bonds. NKEF is expressed in the NK-sensitive erythroleukemic cell line K562. Using an antibody specific for NKEF as a probe for immunoblot screening, we isolated several clones from a λgt11 cDNA library of K562. Additional subcloning and sequencing revealed that the candidate NKEF cDNAs fell into one of two categories of closely related but non-identical genes, referred to as NKEF A and B. They are 88% identical in amino acid sequence and 71% identical in nucleotide sequence. Southern blot analysis suggests that there are two to three NKEF family members in the genome. Analysis of predicted amino acid sequences indicates that both NKEF A and B are cytosol proteins with several phosphorylation sites each, but that they have no glycosylation sites. They are significantly homologous to several other proteins from a wide variety of organisms ranging from prokaryotes to mammals, especially with regard to several well-conserved motifs within the amino acid sequences. The biological functions of these proteins in other species are mostly unknown, but some of them were reported to be induced by oxidative stress. Therefore, as well as for immunoregulation of NK activity, NKEF may be important for cells in coping with oxidative insults. 32 refs., 3 figs.

  14. A Primary Sequence Analysis of the ARGONAUTE Protein Family in Plants.

    PubMed

    Rodríguez-Leal, Daniel; Castillo-Cobián, Amanda; Rodríguez-Arévalo, Isaac; Vielle-Calzada, Jean-Philippe

    2016-01-01

    Small RNA (sRNA)-mediated gene silencing represents a conserved regulatory mechanism controlling a wide diversity of developmental processes through interactions of sRNAs with proteins of the ARGONAUTE (AGO) family. On the basis of a large phylogenetic analysis that includes 206 AGO genes belonging to 23 plant species, AGO genes group into four clades corresponding to the phylogenetic distribution proposed for the ten family members of Arabidopsis thaliana. A primary analysis of the corresponding protein sequences resulted in 50 sequences of amino acids (blocks) conserved across their linear length. Protein members of the AGO4/6/8/9 and AGO1/10 clades are more conserved than members of the AGO5 and AGO2/3/7 clades. In addition to blocks containing components of the PIWI, PAZ, and DUF1785 domains, members of the AGO2/3/7 and AGO4/6/8/9 clades possess other consensus block sequences that are exclusive of members within these clades, suggesting unforeseen functional specialization revealed by their primary sequence. We also show that AGO proteins of animal and plant kingdoms share linear sequences of blocks that include motifs involved in posttranslational modifications such as those regulating AGO2 in humans and the PIWI protein AUBERGINE in Drosophila. Our results open possibilities for exploring new structural and functional aspects related to the evolution of AGO proteins within the plant kingdom, and their convergence with analogous proteins in mammals and invertebrates. PMID:27635128

  17. DANPOS: dynamic analysis of nucleosome position and occupancy by sequencing.

    PubMed

    Chen, Kaifu; Xi, Yuanxin; Pan, Xuewen; Li, Zhaoyu; Kaestner, Klaus; Tyler, Jessica; Dent, Sharon; He, Xiangwei; Li, Wei

    2013-02-01

    Recent developments in next-generation sequencing have enabled whole-genome profiling of nucleosome organizations. Although several algorithms for inferring nucleosome position from a single experimental condition have been available, it remains a challenge to accurately define dynamic nucleosomes associated with environmental changes. Here, we report a comprehensive bioinformatics pipeline, DANPOS, explicitly designed for dynamic nucleosome analysis at single-nucleotide resolution. Using both simulated and real nucleosome data, we demonstrated that bias correction in preliminary data processing and optimal statistical testing significantly enhances the functional interpretation of dynamic nucleosomes. The single-nucleotide resolution analysis of DANPOS allows us to detect all three categories of nucleosome dynamics, such as position shift, fuzziness change, and occupancy change, using a uniform statistical framework. Pathway analysis indicates that each category is involved in distinct biological functions. We also analyzed the influence of sequencing depth and suggest that even 200-fold coverage is probably not enough to identify all the dynamic nucleosomes. Finally, based on nucleosome data from the human hematopoietic stem cells (HSCs) and mouse embryonic stem cells (ESCs), we demonstrated that DANPOS is also robust in defining functional dynamic nucleosomes, not only in promoters, but also in distal regulatory regions in the mammalian genome. PMID:23193179

  18. Integrated visual analysis of protein structures, sequences, and feature data

    PubMed Central

    2015-01-01

    Background: To understand the molecular mechanisms that give rise to a protein's function, biologists often need to (i) find and access all related atomic-resolution 3D structures, and (ii) map sequence-based features (e.g., domains, single-nucleotide polymorphisms, post-translational modifications) onto these structures. Results: To streamline these processes we recently developed Aquaria, a resource offering unprecedented access to protein structure information based on an all-against-all comparison of SwissProt and PDB sequences. In this work, we provide a requirements analysis for several frequently occurring tasks in molecular biology and describe how design choices in Aquaria meet these requirements. Finally, we show how the interface can be used to explore features of a protein and gain biologically meaningful insights in two case studies conducted by domain experts. Conclusions: The user interface design of Aquaria enables biologists to gain unprecedented access to molecular structures and simplifies the generation of insight. The tasks involved in mapping sequence features onto structures can be conducted more easily and quickly using Aquaria. PMID:26329268

  19. Determining physical constraints in transcriptional initiation complexes using DNA sequence analysis

    SciTech Connect

    Shultzaberger, Ryan K.; Chiang, Derek Y.; Moses, Alan M.; Eisen, Michael B.

    2007-07-01

    Eukaryotic gene expression is often under the control of cooperatively acting transcription factors whose binding is limited by structural constraints. By determining these structural constraints, we can understand the "rules" that define functional cooperativity. Conversely, by understanding the rules of binding, we can infer structural characteristics. We have developed an information theory based method for approximating the physical limitations of cooperative interactions by comparing sequence analysis to microarray expression data. When applied to the coordinated binding of the sulfur amino acid regulatory protein Met4 by Cbf1 and Met31, we were able to create a combinatorial model that can correctly identify Met4 regulated genes.

  20. Experience using web services for biological sequence analysis

    PubMed Central

    Attwood, Teresa; Chohan, Shahid Nadeem; Côté, Richard; Cudré-Mauroux, Philippe; Falquet, Laurent; Fernandes, Pedro; Finn, Robert D.; Hupponen, Taavi; Korpelainen, Eija; Labarga, Alberto; Laugraud, Aurelie; Lima, Tania; Pafilis, Evangelos; Pagni, Marco; Pettifer, Steve; Phan, Isabelle; Rahman, Nazim

    2008-01-01

    Programmatic access to data and tools through the web using so-called web services has an important role to play in bioinformatics. In this article, we discuss the most popular approaches based on SOAP/WS-I and REST and describe our experiences, as a cross section of the community, with providing and using web services in the context of biological sequence analysis. We briefly review the main technological approaches as well as best practice hints that are useful for both users and developers. Finally, syntactic and semantic data integration issues with multiple web services are discussed. PMID:18621748

  1. Experience using web services for biological sequence analysis.

    PubMed

    Stockinger, Heinz; Attwood, Teresa; Chohan, Shahid Nadeem; Côté, Richard; Cudré-Mauroux, Philippe; Falquet, Laurent; Fernandes, Pedro; Finn, Robert D; Hupponen, Taavi; Korpelainen, Eija; Labarga, Alberto; Laugraud, Aurelie; Lima, Tania; Pafilis, Evangelos; Pagni, Marco; Pettifer, Steve; Phan, Isabelle; Rahman, Nazim

    2008-11-01

    Programmatic access to data and tools through the web using so-called web services has an important role to play in bioinformatics. In this article, we discuss the most popular approaches based on SOAP/WS-I and REST and describe our experiences, as a cross section of the community, with providing and using web services in the context of biological sequence analysis. We briefly review the main technological approaches as well as best practice hints that are useful for both users and developers. Finally, syntactic and semantic data integration issues with multiple web services are discussed.

  2. Generalization of Entropy Based Divergence Measures for Symbolic Sequence Analysis

    PubMed Central

    Ré, Miguel A.; Azad, Rajeev K.

    2014-01-01

    Entropy based measures have been frequently used in symbolic sequence analysis. A symmetrized and smoothed form of Kullback-Leibler divergence or relative entropy, the Jensen-Shannon divergence (JSD), is of particular interest because of its sharing properties with families of other divergence measures and its interpretability in different domains including statistical physics, information theory and mathematical statistics. The uniqueness and versatility of this measure arise because of a number of attributes including generalization to any number of probability distributions and association of weights to the distributions. Furthermore, its entropic formulation allows its generalization in different statistical frameworks, such as, non-extensive Tsallis statistics and higher order Markovian statistics. We revisit these generalizations and propose a new generalization of JSD in the integrated Tsallis and Markovian statistical framework. We show that this generalization can be interpreted in terms of mutual information. We also investigate the performance of different JSD generalizations in deconstructing chimeric DNA sequences assembled from bacterial genomes including that of E. coli, S. enterica typhi, Y. pestis and H. influenzae. Our results show that the JSD generalizations bring in more pronounced improvements when the sequences being compared are from phylogenetically proximal organisms, which are often difficult to distinguish because of their compositional similarity. While small but noticeable improvements were observed with the Tsallis statistical JSD generalization, relatively large improvements were observed with the Markovian generalization. In contrast, the proposed Tsallis-Markovian generalization yielded more pronounced improvements relative to the Tsallis and Markovian generalizations, specifically when the sequences being compared arose from phylogenetically proximal organisms. PMID:24728338
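
    The standard (Shannon) Jensen-Shannon divergence described in this abstract, including its extension to any number of weighted distributions, can be sketched as follows. This is only the baseline measure; the Tsallis, Markovian and combined generalizations proposed by the authors are not implemented. The function names and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(probabilities):
    """Shannon entropy in bits; zero-probability symbols contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def jensen_shannon_divergence(distributions, weights=None):
    """JSD = H(sum_i w_i * P_i) - sum_i w_i * H(P_i), for any number of P_i."""
    k = len(distributions)
    if weights is None:
        weights = [1.0 / k] * k  # equal weights unless told otherwise
    size = len(distributions[0])
    mixture = [
        sum(w * dist[i] for w, dist in zip(weights, distributions))
        for i in range(size)
    ]
    weighted_entropies = sum(
        w * shannon_entropy(dist) for w, dist in zip(weights, distributions)
    )
    return shannon_entropy(mixture) - weighted_entropies


def base_frequencies(sequence, alphabet="ACGT"):
    """Relative frequency of each symbol of the alphabet in a sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[base] for base in alphabet)
    return [counts[base] / total for base in alphabet]


# Toy comparison of two short, made-up sequences.
p = base_frequencies("AAAACCGT")
q = base_frequencies("GGGGCCAT")
toy_jsd = jensen_shannon_divergence([p, q])
```

    With base-2 logarithms and two equally weighted distributions, the value lies between 0 (identical composition) and 1 bit (no shared symbols). In segmentation settings the weights are often set in proportion to segment lengths, which is an assumption about usage rather than something this abstract specifies.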

  3. Generalization of entropy based divergence measures for symbolic sequence analysis.

    PubMed

    Ré, Miguel A; Azad, Rajeev K

    2014-01-01

    Entropy based measures have been frequently used in symbolic sequence analysis. A symmetrized and smoothed form of Kullback-Leibler divergence or relative entropy, the Jensen-Shannon divergence (JSD), is of particular interest because of its sharing properties with families of other divergence measures and its interpretability in different domains including statistical physics, information theory and mathematical statistics. The uniqueness and versatility of this measure arise because of a number of attributes including generalization to any number of probability distributions and association of weights to the distributions. Furthermore, its entropic formulation allows its generalization in different statistical frameworks, such as, non-extensive Tsallis statistics and higher order Markovian statistics. We revisit these generalizations and propose a new generalization of JSD in the integrated Tsallis and Markovian statistical framework. We show that this generalization can be interpreted in terms of mutual information. We also investigate the performance of different JSD generalizations in deconstructing chimeric DNA sequences assembled from bacterial genomes including that of E. coli, S. enterica typhi, Y. pestis and H. influenzae. Our results show that the JSD generalizations bring in more pronounced improvements when the sequences being compared are from phylogenetically proximal organisms, which are often difficult to distinguish because of their compositional similarity. While small but noticeable improvements were observed with the Tsallis statistical JSD generalization, relatively large improvements were observed with the Markovian generalization. In contrast, the proposed Tsallis-Markovian generalization yielded more pronounced improvements relative to the Tsallis and Markovian generalizations, specifically when the sequences being compared arose from phylogenetically proximal organisms. PMID:24728338

  5. Isolation and sequence analysis of napin seed specific promoter from Iranian Rapeseed (Brassica napus L.).

    PubMed

    Sohrabi, Maryam; Zebarjadi, Alireza; Najaphy, Abdollah; Kahrizi, Danial

    2015-06-01

    Rapeseed (Brassica napus L.) has become an important crop during the last 30 years. In addition to a high lipid level, the seeds also have a significant protein content, which constitutes 20-25% of the dry seed weight. The synthesis of storage proteins is primarily controlled at the transcriptional level, and seed-specific expression has been shown to be conferred by the promoter regions of many storage protein genes. Napin is one of the main storage proteins in the rapeseed embryo and is produced during seed development. Its promoter region, located 5' upstream of the napin gene, has already been isolated (GenBank number, EU416279.1). In the current research, the seed-specific promoter (napin) of Iranian B. napus L. was isolated from genomic DNA and cloned into the pBI121 plant binary vector for use in future research. For this purpose, the napin promoter was amplified by PCR using specific primers, cloned in the pSK(+) vector and sequenced. Sequencing analysis showed that the cloned promoter contained all of the conserved motifs, such as the TATA box (TATAAA), RY repeats (CATGCA), dist-B (TCAAACACC) and prox-B elements (GCCACTTGTC), G-box (CACGTG) and CAAT motifs, which constitute the seed-specific promoter activity; according to this analysis, the seed-specific promoter activity of the cloned sequence was predicted. Based on sequence distances of nucleotide sequences, our sequence had the highest similarity (99.8%) with the B. napus sequence (accession number EU416279.1). Finally, the promoter obtained might be interesting not only as a useful tool for biotechnological application but also for fundamental research. PMID:25797503
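
    A motif check of the kind reported in this abstract can be sketched as an exact-match scan. The motif strings are the ones quoted in the abstract; the function name, the forward-strand-only search and the toy sequence are assumptions made for illustration, and the toy sequence is not the napin promoter.

```python
# Motif sequences as quoted in the abstract above.
SEED_PROMOTER_MOTIFS = {
    "TATA box": "TATAAA",
    "RY repeat": "CATGCA",
    "dist-B": "TCAAACACC",
    "prox-B": "GCCACTTGTC",
    "G-box": "CACGTG",
}


def find_motifs(sequence, motifs=SEED_PROMOTER_MOTIFS):
    """Return 0-based start positions of each motif on the forward strand.

    Overlapping occurrences are reported because the search resumes one
    base after the previous hit.
    """
    seq = sequence.upper()
    hits = {}
    for name, motif in motifs.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)
        hits[name] = positions
    return hits


# Made-up toy sequence containing three of the five motifs.
toy_hits = find_motifs("ggTATAAAccCATGCAttCACGTGaa")
```

    Exact matching is the simplest option; degenerate consensus motifs or reverse-strand hits would need a regular expression or a position weight matrix instead.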

  6. Facile Analysis and Sequencing of Linear and Branched Peptide Boronic Acids by MALDI Mass Spectrometry

    PubMed Central

    Crumpton, Jason; Zhang, Wenyu; Santos, Webster

    2011-01-01

    Interest in peptides incorporating boronic acid moieties is increasing due to their potential as therapeutics/diagnostics for a variety of diseases such as cancer. The utility of peptide boronic acids may be expanded with access to vast libraries that can be deconvoluted rapidly and economically. Unfortunately, current detection protocols using mass spectrometry are laborious and confounded by boronic acid trimerization, which requires time consuming analysis of dehydration products. These issues are exacerbated when the peptide sequence is unknown, as with de novo sequencing, and especially when multiple boronic acid moieties are present. Thus, a rapid, reliable and simple method for peptide identification is of utmost importance. Herein, we report the identification and sequencing of linear and branched peptide boronic acids containing up to five boronic acid groups by matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS). Protocols for preparation of pinacol boronic esters were adapted for efficient MALDI analysis of peptides. Additionally, a novel peptide boronic acid detection strategy was developed in which 2,5-dihydroxybenzoic acid (DHB) served as both matrix and derivatizing agent in a convenient, in situ, on-plate esterification. Finally, we demonstrate that DHB-modified peptide boronic acids from a single bead can be analyzed by MALDI-MSMS analysis, validating our approach for the identification and sequencing of branched peptide boronic acid libraries. PMID:21449540

  7. Review of Current Methods, Applications, and Data Management for the Bioinformatics Analysis of Whole Exome Sequencing

    PubMed Central

    Bao, Riyue; Huang, Lei; Andrade, Jorge; Tan, Wei; Kibbe, Warren A; Jiang, Hongmei; Feng, Gang

    2014-01-01

    The advent of next-generation sequencing technologies has greatly promoted advances in the study of human diseases at the genomic, transcriptomic, and epigenetic levels. Exome sequencing, where the coding region of the genome is captured and sequenced at a deep level, has proven to be a cost-effective method to detect disease-causing variants and discover gene targets. In this review, we outline the general framework of whole exome sequence data analysis. We focus on established bioinformatics tools and applications that support five analytical steps: raw data quality assessment, pre-processing, alignment, post-processing, and variant analysis (detection, annotation, and prioritization). We evaluate the performance of open-source alignment programs and variant calling tools using simulated and benchmark datasets, and highlight the challenges posed by the lack of concordance among variant detection tools. Based on these results, we recommend adopting multiple tools and resources to reduce false positives and increase the sensitivity of variant calling. In addition, we briefly discuss the current status and solutions for big data management, analysis, and summarization in the field of bioinformatics. PMID:25288881

  8. Genome Sequencing and Analysis of Catopsilia pomona nucleopolyhedrovirus: A Distinct Species in Group I Alphabaculovirus

    PubMed Central

    Wang, Jun; Zhu, Zheng; Zhang, Lei; Hou, Dianhai; Wang, Manli; Arif, Basil; Kou, Zheng; Wang, Hualin; Deng, Fei; Hu, Zhihong

    2016-01-01

    The genome sequence of Catopsilia pomona nucleopolyhedrovirus (CapoNPV) was determined by the Roche 454 sequencing system. The genome consisted of 128,058 bp and had an overall G+C content of 40%. There were 130 hypothetical open reading frames (ORFs) potentially encoding proteins of more than 50 amino acids and covering 92% of the genome. Among all the hypothetical ORFs, 37 baculovirus core genes, 23 lepidopteran baculovirus conserved genes and 10 genes conserved in Group I alphabaculoviruses were identified. In addition, the genome included regions of 8 typical baculoviral homologous repeat sequences (hrs). Phylogenetic analysis showed that CapoNPV was in a distinct branch of clade “a” in Group I alphabaculoviruses. Gene parity plot analysis and overall similarity of ORFs indicated that CapoNPV is more closely related to the Group I alphabaculoviruses than to other baculoviruses. Interestingly, CapoNPV lacks the genes encoding the fibroblast growth factor (fgf) and ac30, which are conserved in most lepidopteran and Group I baculoviruses, respectively. Sequence analysis of the F-like protein of CapoNPV showed that some amino acids were inserted into the fusion peptide region and the pre-transmembrane region of the protein. All these unique features imply that CapoNPV represents a member of a new baculovirus species. PMID:27166956

  9. Comparing Apples and Oranges?: Next Generation Sequencing and Its Impact on Microbiome Analysis

    PubMed Central

    Sleator, Roy D.; O’ Driscoll, Aisling; Stanton, Catherine; Cotter, Paul D.; Claesson, Marcus J.

    2016-01-01

    Rapid advancements in sequencing technologies along with falling costs present widespread opportunities for microbiome studies across a vast and diverse array of environments. These impressive technological developments have been accompanied by a considerable growth in the number of methodological variables, including sampling, storage, DNA extraction, primer pairs, sequencing technology, chemistry version, read length, insert size, and analysis pipelines, amongst others. This increase in variability threatens to compromise both the reproducibility and the comparability of studies conducted. Here we perform the first reported study comparing both amplicon and shotgun sequencing for the three leading next-generation sequencing technologies. These were applied to six human stool samples using Illumina HiSeq, MiSeq and Ion PGM shotgun sequencing, as well as amplicon sequencing across two variable 16S rRNA gene regions. Notably, we found that the factor responsible for the greatest variance in microbiota composition was the chosen methodology rather than the natural inter-individual variance, which is commonly one of the most significant drivers in microbiome studies. Amplicon sequencing suffered from this to a large extent, and this issue was particularly apparent when the 16S rRNA V1-V2 region amplicons were sequenced with MiSeq. Somewhat surprisingly, the choice of taxonomic binning software for shotgun sequences proved to be of crucial importance with even greater discriminatory power than sequencing technology and choice of amplicon. Optimal N50 assembly values for the HiSeq were obtained for 10 million reads per sample, whereas the applied MiSeq and PGM sequencing depths proved less sufficient for shotgun sequencing of stool samples. The latter technologies, on the other hand, provide a better basis for functional gene categorisation, possibly due to their longer read lengths. Hence, in addition to highlighting methodological biases, this study demonstrates the

  10. Comparing Apples and Oranges?: Next Generation Sequencing and Its Impact on Microbiome Analysis.

    PubMed

    Clooney, Adam G; Fouhy, Fiona; Sleator, Roy D; O' Driscoll, Aisling; Stanton, Catherine; Cotter, Paul D; Claesson, Marcus J

    2016-01-01

    Rapid advancements in sequencing technologies along with falling costs present widespread opportunities for microbiome studies across a vast and diverse array of environments. These impressive technological developments have been accompanied by a considerable growth in the number of methodological variables, including sampling, storage, DNA extraction, primer pairs, sequencing technology, chemistry version, read length, insert size, and analysis pipelines, amongst others. This increase in variability threatens to compromise both the reproducibility and the comparability of studies conducted. Here we perform the first reported study comparing both amplicon and shotgun sequencing for the three leading next-generation sequencing technologies. These were applied to six human stool samples using Illumina HiSeq, MiSeq and Ion PGM shotgun sequencing, as well as amplicon sequencing across two variable 16S rRNA gene regions. Notably, we found that the factor responsible for the greatest variance in microbiota composition was the chosen methodology rather than the natural inter-individual variance, which is commonly one of the most significant drivers in microbiome studies. Amplicon sequencing suffered from this to a large extent, and this issue was particularly apparent when the 16S rRNA V1-V2 region amplicons were sequenced with MiSeq. Somewhat surprisingly, the choice of taxonomic binning software for shotgun sequences proved to be of crucial importance with even greater discriminatory power than sequencing technology and choice of amplicon. Optimal N50 assembly values for the HiSeq were obtained for 10 million reads per sample, whereas the applied MiSeq and PGM sequencing depths proved less sufficient for shotgun sequencing of stool samples. The latter technologies, on the other hand, provide a better basis for functional gene categorisation, possibly due to their longer read lengths. Hence, in addition to highlighting methodological biases, this study demonstrates the
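
    The N50 assembly statistic mentioned in this abstract can be computed with a short sketch. It uses the usual definition (the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly); the abstract does not say which tool or variant the authors used. The function name and the toy contig lengths are hypothetical.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths.

    Contigs are taken from longest to shortest until their cumulative
    length reaches at least half of the total assembly length; the length
    of the last contig taken is the N50.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Toy assembly (hypothetical lengths): total 54, half 27, and 10 + 9 + 8 = 27.
toy_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```

    A larger N50 indicates that more of the assembly sits in long contigs, which is why it is used to compare assemblies produced at different sequencing depths.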

  11. Comparing Apples and Oranges?: Next Generation Sequencing and Its Impact on Microbiome Analysis.

    PubMed

    Clooney, Adam G; Fouhy, Fiona; Sleator, Roy D; O'Driscoll, Aisling; Stanton, Catherine; Cotter, Paul D; Claesson, Marcus J

    2016-01-01

    Rapid advancements in sequencing technologies along with falling costs present widespread opportunities for microbiome studies across a vast and diverse array of environments. These impressive technological developments have been accompanied by a considerable growth in the number of methodological variables, including sampling, storage, DNA extraction, primer pairs, sequencing technology, chemistry version, read length, insert size, and analysis pipelines, amongst others. This increase in variability threatens to compromise both the reproducibility and the comparability of studies conducted. Here we perform the first reported study comparing both amplicon and shotgun sequencing for the three leading next-generation sequencing technologies. These were applied to six human stool samples using Illumina HiSeq, MiSeq and Ion PGM shotgun sequencing, as well as amplicon sequencing across two variable 16S rRNA gene regions. Notably, we found that the factor responsible for the greatest variance in microbiota composition was the chosen methodology rather than the natural inter-individual variance, which is commonly one of the most significant drivers in microbiome studies. Amplicon sequencing suffered from this to a large extent, and this issue was particularly apparent when the 16S rRNA V1-V2 region amplicons were sequenced with MiSeq. Somewhat surprisingly, the choice of taxonomic binning software for shotgun sequences proved to be of crucial importance with even greater discriminatory power than sequencing technology and choice of amplicon. Optimal N50 assembly values for the HiSeq were obtained for 10 million reads per sample, whereas the applied MiSeq and PGM sequencing depths proved less sufficient for shotgun sequencing of stool samples. The latter technologies, on the other hand, provide a better basis for functional gene categorisation, possibly due to their longer read lengths. Hence, in addition to highlighting methodological biases, this study demonstrates the

  12. On-line procedures for alkylation of cysteine residues with 3-bromopropylamine prior to protein sequence analysis.

    PubMed

    Jue, R A; Hale, J E

    1994-09-01

    We have previously shown that 3-bromopropylamine offers several advantages over other alkylating reagents in the modification and subsequent identification of cysteine residues by protein sequencing. We describe here simple on-sequencer procedures for alkylating cysteines in proteins which employ the reduction of cystines in proteins with tri-n-butylphosphine and concomitant alkylation of the resulting cysteines with 3-bromopropylamine. Addition of an aqueous acetone wash to a modified reaction cycle on the Applied Biosystems 477A sequencer removes excess 3-bromopropylamine. As a result, very little background in the first step of the sequence analysis is seen. Under these conditions, cysteines are readily modified and identified during sequencing. Moreover, very little preview of the next amino acid is observed, which indicates that the N-terminal amino acid is not appreciably alkylated by 3-bromopropylamine. On-sequencer methods have been developed for proteins spotted onto glass fiber filters and proteins electroblotted onto polyvinylidene difluoride membranes.

  13. Streaming Support for Data Intensive Cloud-Based Sequence Analysis

    PubMed Central

    Issa, Shadi A.; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J.; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed

    2013-01-01

    Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation. PMID:23710461
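
    The streaming idea described above (processing reads as they arrive, which works when each read can be handled independently) can be illustrated with a minimal Python sketch. This is not the elastream package and makes no claim about its interface; the function names, the four-line FASTQ layout, the Phred+33 quality encoding and the quality threshold are all assumptions chosen for illustration.

```python
import io
from typing import Iterator, TextIO, Tuple


def iter_fastq(stream: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) one record at a time.

    Assumes the simple four-line FASTQ layout. Records are consumed as
    they arrive, so the whole file never has to be present or in memory.
    """
    while True:
        header = stream.readline()
        if not header:
            return  # end of stream
        seq = stream.readline().rstrip("\n")
        stream.readline()  # '+' separator line, ignored
        qual = stream.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 is an assumption)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def process_stream(stream: TextIO, min_mean_q: float = 20.0) -> Tuple[int, int]:
    """Count reads passing a per-read filter while the data is still arriving.

    Each read is handled independently of the others, which is the
    property that makes this kind of streaming possible.
    """
    kept = total = 0
    for _read_id, _seq, qual in iter_fastq(stream):
        total += 1
        if qual and mean_quality(qual) >= min_mean_q:
            kept += 1
    return kept, total


# Hypothetical two-read input standing in for data arriving over a network.
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n!!!!\n")
result = process_stream(demo)
print(result)
```

    In a real transfer the `io.StringIO` object would be replaced by any file-like object that yields data as it is received, such as a socket wrapped in a text reader.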

  14. Informatics for RNA Sequencing: A Web Resource for Analysis on the Cloud.

    PubMed

    Griffith, Malachi; Walker, Jason R; Spies, Nicholas C; Ainscough, Benjamin J; Griffith, Obi L

    2015-08-01

    Massively parallel RNA sequencing (RNA-seq) has rapidly become the assay of choice for interrogating RNA transcript abundance and diversity. This article provides a detailed introduction to fundamental RNA-seq molecular biology and informatics concepts. We make available open-access RNA-seq tutorials that cover cloud computing, tool installation, relevant file formats, reference genomes, transcriptome annotations, quality-control strategies, expression, differential expression, and alternative splicing analysis methods. These tutorials and additional training resources are accompanied by complete analysis pipelines and test datasets made available without encumbrance at www.rnaseq.wiki.

  15. Informatics for RNA Sequencing: A Web Resource for Analysis on the Cloud

    PubMed Central

    Griffith, Malachi; Walker, Jason R.; Spies, Nicholas C.; Ainscough, Benjamin J.; Griffith, Obi L.

    2015-01-01

    Massively parallel RNA sequencing (RNA-seq) has rapidly become the assay of choice for interrogating RNA transcript abundance and diversity. This article provides a detailed introduction to fundamental RNA-seq molecular biology and informatics concepts. We make available open-access RNA-seq tutorials that cover cloud computing, tool installation, relevant file formats, reference genomes, transcriptome annotations, quality-control strategies, expression, differential expression, and alternative splicing analysis methods. These tutorials and additional training resources are accompanied by complete analysis pipelines and test datasets made available without encumbrance at www.rnaseq.wiki. PMID:26248053

  16. Deep Sequencing Analysis of the Ixodes ricinus Haemocytome

    PubMed Central

    Franta, Zdeněk; Pedra, Joao H. F.; Ribeiro, José M. C.

    2015-01-01

    Background: Ixodes ricinus is the main tick vector of the microbes that cause Lyme disease and tick-borne encephalitis in Europe. Pathogens transmitted by ticks have to overcome innate immunity barriers present in tick tissues, including midgut, salivary glands epithelia and the hemocoel. Molecularly, invertebrate immunity is initiated when pathogen recognition molecules trigger serum or cellular signalling cascades leading to the production of antimicrobials, pathogen opsonization and phagocytosis. We presently aimed at identifying hemocyte transcripts from semi-engorged female I. ricinus ticks by mass sequencing a hemocyte cDNA library and annotating immune-related transcripts based on their hemocyte abundance as well as their ubiquitous distribution. Methodology/principal findings: De novo assembly of 926,596 pyrosequence reads plus 49,328,982 Illumina reads (148 nt length) from a hemocyte library, together with over 189 million Illumina reads from salivary gland and midgut libraries, generated 15,716 extracted coding sequences (CDS); these are displayed in an annotated hyperlinked spreadsheet format. Read mapping allowed the identification and annotation of tissue-enriched transcripts. A total of 327 transcripts were found significantly overexpressed in the hemocyte libraries, including those coding for scavenger receptors, antimicrobial peptides, pathogen recognition proteins, proteases and protease inhibitors. Vitellogenin and lipid metabolism transcription enrichment suggests fat body components. We additionally annotated ubiquitously distributed transcripts associated with immune function, including immune-associated signal transduction proteins and transcription factors, including the STAT transcription factor. Conclusions/significance: This is the first systems biology approach to describe the genes expressed in the haemocytes of this neglected disease vector. A total of 2,860 coding sequences were deposited to GenBank, increasing to 27,547 the number so

  17. Reverse transcriptase domain sequences from tree peony (Paeonia suffruticosa) long terminal repeat retrotransposons: sequence characterization and phylogenetic analysis

    PubMed Central

    Guo, Da-Long; Hou, Xiao-Gai; Jia, Tian

    2014-01-01

    Tree peony is an important horticultural plant worldwide of great ornamental and medicinal value. Long terminal repeat retrotransposons (LTR-retrotransposons) are the major components of most plant genomes and can substantially impact the genome in many ways. It is therefore crucial to understand their sequence characteristics, genetic distribution and transcriptional activity; however, no information about them is available in tree peony. In this study, Ty1-copia-like reverse transcriptase sequences were amplified from tree peony genomic DNA by polymerase chain reaction (PCR) with degenerate oligonucleotide primers corresponding to highly conserved domains of the Ty1-copia-like retrotransposons. PCR fragments of roughly 270 bp were isolated and cloned, and 33 sequences were obtained. According to alignment and phylogenetic analysis, all sequences were divided into six families. The observed difference in the degree of nucleotide sequence similarity indicates a high level of sequence heterogeneity among these clones. Most of these sequences have a frame shift, a stop codon, or both. Dot-blot analysis revealed distribution of these sequences in all the studied tree peony species. However, different hybridization signals were detected among them, which is in agreement with previous systematics studies. Reverse transcriptase PCR (RT-PCR) indicated that Ty1-copia retrotransposons in tree peony were transcriptionally inactive. The results provide basic genetic and evolutionary information about the tree peony genome, and will provide valuable information for the further utilization of retrotransposons in tree peony. PMID:26019529

  18. Analysis of expressed sequence tags from Plasmodium falciparum.

    PubMed

    Chakrabarti, D; Reddy, G R; Dame, J B; Almira, E C; Laipis, P J; Ferl, R J; Yang, T P; Rowe, T C; Schuster, S M

    1994-07-01

    An initiative was undertaken to sequence all genes of the human malaria parasite Plasmodium falciparum in an effort to gain a better understanding at the molecular level of the parasite that inflicts much suffering in the developing world. 550 random complementary DNA clones were partially sequenced from the intraerythrocytic form of the parasite as one of the approaches to analyze the transcribed sequences of its genome. The sequences, after editing, generated 389 expressed sequence tag sites and over 105 kb of DNA sequences. About 32% of these clones showed significant homology with other genes in the database. These clones represent 340 new Plasmodium falciparum expressed sequence tags.

  19. Galaxy Workflows for Web-based Bioinformatics Analysis of Aptamer High-throughput Sequencing Data

    PubMed Central

    Thiel, William H

    2016-01-01

    Development of RNA and DNA aptamers for diagnostic and therapeutic applications is a rapidly growing field. Aptamers are identified through iterative rounds of selection in a process termed SELEX (Systematic Evolution of Ligands by EXponential enrichment). High-throughput sequencing (HTS) revolutionized the modern SELEX process by identifying millions of aptamer sequences across multiple rounds of aptamer selection. However, these vast aptamer HTS datasets necessitated bioinformatics techniques. Herein, we describe a semiautomated approach to analyze aptamer HTS datasets using the Galaxy Project, a web-based open source collection of bioinformatics tools that were originally developed to analyze genome, exome, and transcriptome HTS data. Using a series of Workflows created in the Galaxy webserver, we demonstrate efficient processing of aptamer HTS data and compilation of a database of unique aptamer sequences. Additional Workflows were created to characterize the abundance and persistence of aptamer sequences within a selection and to filter sequences based on these parameters. A key advantage of this approach is that the online nature of the Galaxy webserver and its graphical interface allow for the analysis of HTS data without the need to compile code or install multiple programs.
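
    The steps described above (compiling unique sequences, then characterizing and filtering them by abundance and persistence) can be sketched outside Galaxy in a few lines of Python. This is only an illustration and not the published Workflows: the definitions used here (abundance as total read count across rounds, persistence as the number of selection rounds in which a sequence appears), the function names and the toy reads are assumptions.

```python
from collections import Counter
from typing import Dict, List


def summarize_rounds(rounds: Dict[str, List[str]]) -> Dict[str, Dict[str, int]]:
    """Build a table of unique sequences across selection rounds.

    abundance   = total number of reads of the sequence over all rounds
    persistence = number of rounds in which the sequence was seen
    (both definitions are assumptions made for this sketch)
    """
    summary: Dict[str, Dict[str, int]] = {}
    for reads in rounds.values():
        for seq, count in Counter(reads).items():
            entry = summary.setdefault(seq, {"abundance": 0, "persistence": 0})
            entry["abundance"] += count
            entry["persistence"] += 1
    return summary


def filter_sequences(summary: Dict[str, Dict[str, int]],
                     min_abundance: int,
                     min_persistence: int) -> List[str]:
    """Return the sequences meeting both thresholds, sorted alphabetically."""
    return sorted(
        seq for seq, stats in summary.items()
        if stats["abundance"] >= min_abundance
        and stats["persistence"] >= min_persistence
    )


# Hypothetical reads from three selection rounds.
rounds = {
    "round1": ["AAA", "AAA", "CCC"],
    "round2": ["AAA", "GGG"],
    "round3": ["AAA", "CCC"],
}
summary = summarize_rounds(rounds)
selected = filter_sequences(summary, min_abundance=2, min_persistence=2)
print(selected)
```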

  20. Sequence characterization and comparative analysis of the gastrotropin gene in buffalo (Bubalus bubalis).

    PubMed

    Stafuzza, N B; Borges, M M; Amaral-Trusty, M E J

    2014-01-01

    In this study, we compared the complete sequence of the FABP6 gene from an animal representing the Murrah breed of the river buffalo (Bubalus bubalis) with the gene sequence from different mammals. The buffalo FABP6 gene is 6105 bp in length and is organized into four exons (67, 176, 90, and 54 bp), three introns (1167, 1737, and 2649 bp), a 5′ UTR (93 bp), and a 3′ UTR (72 bp). A total of 22 repetitive elements were identified at the intronic level, and four of these (L1MC, L1M5, MIRb, and Charlie4z) were identified as being exclusive to buffalo. Comparative analysis between the FABP6 gene coding sequence and the amino acid sequence with its homologues from other mammalian species showed a percentage of identity varying from 79 to 98% at the DNA coding level and 70 to 96% at the amino acid level. In addition, the alignment of the gene sequence between the Murrah and the Mediterranean breeds revealed 20 potential single nucleotide polymorphisms, which could be candidates for validation in commercial buffalo populations. PMID:25526214
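
    The segment lengths quoted in the abstract can be checked against the stated gene length with simple arithmetic. The short Python sketch below uses only the numbers given above; reading the four exon figures as coding-only lengths (with the UTRs counted separately) is an assumption, although it is the reading under which the parts sum to the reported total.

```python
# Segment lengths in bp, taken from the abstract.
exons = [67, 176, 90, 54]
introns = [1167, 1737, 2649]
utr5, utr3 = 93, 72

coding_length = sum(exons)           # 387 bp
intron_length = sum(introns)         # 5553 bp
gene_length = utr5 + coding_length + intron_length + utr3

# 387 bp is a whole number of codons (129), as expected for a coding
# sequence if the exon figures exclude the UTRs (an assumption).
codons, remainder = divmod(coding_length, 3)

print(gene_length, coding_length, codons, remainder)
```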

  1. Radar image sequence analysis of inhomogeneous water surfaces

    NASA Astrophysics Data System (ADS)

    Seemann, Joerg; Senet, Christian M.; Dankert, Heiko; Hatten, Helge; Ziemer, Friedwart

    1999-10-01

    The radar backscatter from the ocean surface, called sea clutter, is modulated by the surface wave field. A method was developed to estimate the near-surface current, the water depth and calibrated surface wave spectra from nautical radar image sequences. The algorithm is based on the three-dimensional Fast Fourier Transformation (FFT) of the spatio-temporal sea clutter pattern in the wavenumber-frequency domain. The dispersion relation is used to define a filter to separate the spectral signal of the imaged waves from the background noise component caused by speckle noise. The signal-to-noise ratio (SNR) contains information about the significant wave height. The method has been proved to be reliable for the analysis of homogeneous water surfaces in offshore installations. Radar images are inhomogeneous because of the dependency of the image transfer function (ITF) on the azimuth angle between the wave propagation and the antenna viewing direction. The inhomogeneity of radar imaging is analyzed using image sequences of a homogeneous deep-water surface sampled by a ship-borne radar. Changing water depths in shallow-water regions induce horizontal gradients of the tidal current. Wave refraction occurs due to the spatial variability of the current and water depth. These areas cannot be investigated with the standard method. A new method, based on local wavenumber estimation with the multiple-signal classification (MUSIC) algorithm, is outlined. The MUSIC algorithm provides superior wavenumber resolution on local spatial scales. First results, retrieved from a radar image sequence taken from an installation at a coastal site, are presented.
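
    The dispersion-relation filter mentioned above can be sketched as follows. The abstract does not state the formula, so this Python sketch assumes the standard linear dispersion relation for surface gravity waves with a Doppler shift from a uniform current, omega = sqrt(g k tanh(k d)) + k . U; the function names and the tolerance band are hypothetical and are not taken from the paper.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def dispersion_omega(kx: float, ky: float, depth: float,
                     ux: float = 0.0, uy: float = 0.0) -> float:
    """Angular frequency (rad/s) predicted for wavenumber (kx, ky) in rad/m.

    Assumes linear gravity waves over water of the given depth (m) and a
    uniform near-surface current (ux, uy) in m/s that Doppler-shifts them.
    """
    k = math.hypot(kx, ky)
    intrinsic = math.sqrt(G * k * math.tanh(k * depth))
    return intrinsic + kx * ux + ky * uy


def passes_filter(kx: float, ky: float, omega: float, depth: float,
                  ux: float = 0.0, uy: float = 0.0, tol: float = 0.1) -> bool:
    """Keep a spectral bin only if it lies close to the dispersion shell.

    Bins of the wavenumber-frequency spectrum far from the shell are
    treated as background noise. The tolerance is a hypothetical value.
    """
    return abs(omega - dispersion_omega(kx, ky, depth, ux, uy)) <= tol


# Deep-water example: tanh(k d) is effectively 1, so omega = sqrt(g k).
omega_deep = dispersion_omega(0.1, 0.0, depth=1000.0)
print(omega_deep)
print(passes_filter(0.1, 0.0, 1.0, depth=1000.0))   # near the shell
print(passes_filter(0.1, 0.0, 2.0, depth=1000.0))   # far from the shell
```

    Fitting the current and depth that place the most spectral energy on this shell is one common way such parameters are estimated, but the abstract does not describe the fitting procedure, so it is left out here.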

  2. Development of a merged conjugate addition/oxidative coupling sequence. Application to the enantioselective total synthesis of metacycloprodigiosin and prodigiosin R1.

    PubMed

    Clift, Michael D; Thomson, Regan J

    2009-10-14

    A merged conjugate addition/oxidative coupling sequence that represents an efficient strategy for preparing structurally diverse pyrroles has been developed. Success of the method hinged upon the controlled oxidative coupling of unsymmetrical silyl bis-enol ether intermediates, formed by the 1,4-addition of a Grignard reagent with subsequent enolate trapping by a (chloro)silylenol ether. The process was applied to the first enantioselective syntheses of the biologically active pyrrolophane natural products, metacycloprodigiosin and prodigiosin R1.

  3. Sequence-independent amplification coupled with DNA microarray analysis for detection and genotyping of noroviruses.

    PubMed

    Hu, Yuan; Yan, Huijun; Mammel, Mark; Chen, Haifeng

    2015-12-01

    Noroviruses (NoVs) have high levels of genetic sequence diversity, which leads to difficulties in designing robust universal primers to efficiently amplify specific viral genomes for molecular analysis. Here we describe the practicality of sequence-independent amplification combined with DNA microarray analysis for simultaneous detection and genotyping of human NoVs in fecal specimens. We showed that single primer isothermal linear amplification (Ribo-SPIA) of genogroup I (GI) and genogroup II (GII) NoVs could be run through the same amplification protocol without the need to design and use any virus-specific primers. Related viruses could be subtyped by the unique pattern of hybridization of the amplified product to the microarray. Of 22 clinical fecal specimens obtained from acute gastroenteritis cases and tested as blinded samples, 2 were GI positive, 18 were GII positive, and 2 were negative for NoVs. A NoV GII positive specimen was also identified as having co-occurrence of hepatitis A virus. The study showed that there was 100% concordance for positive NoV detection at genogroup level between the results of Ribo-SPIA/microarray and the phylogenetic analysis of viral sequences of the capsid gene. In addition, 85% genotype agreement was observed for the new assay compared to the results of phylogenetic analysis. PMID:26556029

  4. Sequence-independent amplification coupled with DNA microarray analysis for detection and genotyping of noroviruses.

    PubMed

    Hu, Yuan; Yan, Huijun; Mammel, Mark; Chen, Haifeng

    2015-12-01

    Noroviruses (NoVs) have high levels of genetic sequence diversity, which leads to difficulties in designing robust universal primers to efficiently amplify specific viral genomes for molecular analysis. Here we describe the practicality of sequence-independent amplification combined with DNA microarray analysis for simultaneous detection and genotyping of human NoVs in fecal specimens. We showed that single primer isothermal linear amplification (Ribo-SPIA) of genogroup I (GI) and genogroup II (GII) NoVs could be run through the same amplification protocol without the need to design and use any virus-specific primers. Related viruses could be subtyped by the unique pattern of hybridization of the amplified product to the microarray. Of 22 clinical fecal specimens obtained from acute gastroenteritis cases and tested as blinded samples, 2 were GI positive, 18 were GII positive, and 2 were negative for NoVs. A NoV GII positive specimen was also identified as having co-occurrence of hepatitis A virus. The study showed that there was 100% concordance for positive NoV detection at genogroup level between the results of Ribo-SPIA/microarray and the phylogenetic analysis of viral sequences of the capsid gene. In addition, 85% genotype agreement was observed for the new assay compared to the results of phylogenetic analysis.

  5. Bacterial Genomic Data Analysis in the Next-Generation Sequencing Era.

    PubMed

    Orsini, Massimiliano; Cuccuru, Gianmauro; Uva, Paolo; Fotia, Giorgio

    2016-01-01

    Bacterial genome sequencing is now an affordable choice for many laboratories for applications in research, diagnostic, and clinical microbiology. Nowadays, an overabundance of tools is available for genomic data analysis. However, tools differ in algorithms, languages, hardware requirements, and user interfaces, and combining them, as is necessary for sequence data interpretation, often requires (bio)informatics skills that can be difficult to find in many laboratories. In addition, multiple data sources, exceedingly large dataset sizes, and increasing computational complexity further challenge the accessibility, reproducibility, and transparency of the entire process. In this chapter we will cover the main bioinformatics steps required for a complete bacterial genome analysis using next-generation sequencing data, from the raw sequence data to assembled and annotated genomes. All the tools described are available in the Orione framework (http://orione.crs4.it), which uniquely combines in a transparent way the most used open source bioinformatics tools for microbiology, allowing microbiologists without any specific hardware or informatics skills to conduct data-intensive computational analyses from quality control to microbial gene annotation. PMID:27115645

  6. Implementation of Cloud based Next Generation Sequencing data analysis in a clinical laboratory

    PubMed Central

    2014-01-01

    Background: The introduction of next generation sequencing (NGS) has revolutionized molecular diagnostics, though several challenges remain limiting the widespread adoption of NGS testing into clinical practice. One such difficulty includes the development of a robust bioinformatics pipeline that can handle the volume of data generated by high-throughput sequencing in a cost-effective manner. Analysis of sequencing data typically requires a substantial level of computing power that is often cost-prohibitive to most clinical diagnostics laboratories. Findings: To address this challenge, our institution has developed a Galaxy-based data analysis pipeline which relies on a web-based, cloud-computing infrastructure to process NGS data and identify genetic variants. It provides additional flexibility, needed to control storage costs, resulting in a pipeline that is cost-effective on a per-sample basis. It does not require the usage of EBS disk to run a sample. Conclusions: We demonstrate the validation and feasibility of implementing this bioinformatics pipeline in a molecular diagnostics laboratory. Four samples were analyzed in duplicate pairs and showed 100% concordance in mutations identified. This pipeline is currently being used in the clinic and all identified pathogenic variants confirmed using Sanger sequencing, further validating the software. PMID:24885806

  7. Whale song analyses using bioinformatics sequence analysis approaches

    NASA Astrophysics Data System (ADS)

    Chen, Yian A.; Almeida, Jonas S.; Chou, Lien-Siang

    2005-04-01

    Animal songs are frequently analyzed using discrete hierarchical units, such as units, themes and songs. Because animal songs and bio-sequences may be understood as analogous, bioinformatics analysis tools (DNA/protein sequence alignment and alignment-free methods) are proposed to quantify the theme similarities of the songs of false killer whales recorded off northeast Taiwan. The eighteen themes with discrete units that were identified in an earlier study [Y. A. Chen, master's thesis, University of Charleston, 2001] were compared quantitatively using several distance metrics. These metrics included the scores calculated using the Smith-Waterman algorithm with the repeated procedure, and the standardized Euclidean distance and the angle metrics based on word frequencies. The theme classifications based on different metrics were summarized and compared in dendrograms using cluster analyses. The results agree qualitatively with earlier classifications derived by human observation. These methods further quantify the similarities among themes. These methods could be applied to the analyses of other animal songs on a larger scale. For instance, these techniques could be used to investigate song evolution and cultural transmission by quantifying the dissimilarities of humpback whale songs across different seasons, years, populations, and geographic regions. [Work supported by SC Sea Grant, and Ilan County Government, Taiwan.]
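
    The alignment-free metrics mentioned above can be illustrated with a short Python sketch that compares two themes by the frequencies of their words (runs of k consecutive units). The word length k = 2, the unit labels and the function names are assumptions made for illustration. The plain Euclidean distance is shown rather than the standardized variant, which additionally scales each word by its spread across all themes.

```python
import math
from collections import Counter
from typing import Sequence


def word_counts(units: Sequence[str], k: int = 2) -> Counter:
    """Count overlapping words of k consecutive units in a theme."""
    return Counter(tuple(units[i:i + k]) for i in range(len(units) - k + 1))


def angle_distance(a: Counter, b: Counter) -> float:
    """Angle (radians) between two word-frequency vectors."""
    words = set(a) | set(b)
    dot = sum(a[w] * b[w] for w in words)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a theme is shorter than the word length")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)


def euclidean_distance(a: Counter, b: Counter) -> float:
    """Plain (unstandardized) Euclidean distance between word counts."""
    words = set(a) | set(b)
    return math.sqrt(sum((a[w] - b[w]) ** 2 for w in words))


# Hypothetical themes written as strings of unit labels.
theme1 = word_counts(list("ABABAB"))
theme2 = word_counts(list("ABABAB"))
theme3 = word_counts(list("CDCD"))

same = angle_distance(theme1, theme2)       # identical themes, angle near 0
disjoint = angle_distance(theme1, theme3)   # no shared words, angle pi/2
print(same, disjoint, euclidean_distance(theme1, theme3))
```

    A matrix of such pairwise distances is the kind of input a hierarchical clustering routine needs to draw the dendrograms the abstract refers to.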

  8. Complete nucleotide sequence and transcriptional analysis of snakehead fish retrovirus.

    PubMed Central

    Hart, D; Frerichs, G N; Rambaut, A; Onions, D E

    1996-01-01

    The complete genome of the snakehead fish retrovirus has been cloned and sequenced, and its transcriptional profile in cell culture has been determined. The 11.2-kb provirus displays a complex expression pattern capable of encoding accessory proteins and is unique in the predicted location of the env initiation codon and signal peptide upstream of gag and the common splice donor site. The virus is distinguishable from all known retrovirus groups by the presence of an arginine tRNA primer binding site. The coding regions are highly divergent and show a number of unusual characteristics, including a large Gag coiled-coil region, a Pol domain of unknown function, and a long, lentiviral-like, Env cytoplasmic domain. Phylogenetic analysis of the Pol sequence emphasizes the divergent nature of the virus from the avian and mammalian retroviruses. The snakehead virus is also distinct from a previously characterized complex fish retrovirus, suggesting that discrete groups of these viruses have yet to be identified in the lower vertebrates. PMID:8648695

  9. Cloning and Sequence Analysis of Two Pseudomonas Flavoprotein Xenobiotic Reductases

    PubMed Central

    Blehert, David S.; Fox, Brian G.; Chambliss, Glenn H.

    1999-01-01

    The genes encoding flavin mononucleotide-containing oxidoreductases, designated xenobiotic reductases, from Pseudomonas putida II-B and P. fluorescens I-C that removed nitrite from nitroglycerin (NG) by cleavage of the nitroester bond were cloned, sequenced, and characterized. The P. putida gene, xenA, encodes a 39,702-Da monomeric, NAD(P)H-dependent flavoprotein that removes either the terminal or central nitro groups from NG and that reduces 2-cyclohexen-1-one but did not readily reduce 2,4,6-trinitrotoluene (TNT). The P. fluorescens gene, xenB, encodes a 37,441-Da monomeric, NAD(P)H-dependent flavoprotein that exhibits fivefold regioselectivity for removal of the central nitro group from NG and that transforms TNT but did not readily react with 2-cyclohexen-1-one. Heterologous expression of xenA and xenB was demonstrated in Escherichia coli DH5α. The transcription initiation sites of both xenA and xenB were identified by primer extension analysis. BLAST analyses conducted with the P. putida xenA and the P. fluorescens xenB sequences demonstrated that these genes are similar to several other bacterial genes that encode broad-specificity flavoprotein reductases. The prokaryotic flavoprotein reductases described herein likely shared a common ancestor with old yellow enzyme of yeast, a broad-specificity enzyme which may serve a detoxification role in antioxidant defense systems. PMID:10515912

  10. Complete nucleotide sequence and transcriptional analysis of snakehead fish retrovirus.

    PubMed

    Hart, D; Frerichs, G N; Rambaut, A; Onions, D E

    1996-06-01

    The complete genome of the snakehead fish retrovirus has been cloned and sequenced, and its transcriptional profile in cell culture has been determined. The 11.2-kb provirus displays a complex expression pattern capable of encoding accessory proteins and is unique in the predicted location of the env initiation codon and signal peptide upstream of gag and the common splice donor site. The virus is distinguishable from all known retrovirus groups by the presence of an arginine tRNA primer binding site. The coding regions are highly divergent and show a number of unusual characteristics, including a large Gag coiled-coil region, a Pol domain of unknown function, and a long, lentiviral-like, Env cytoplasmic domain. Phylogenetic analysis of the Pol sequence emphasizes the divergent nature of the virus from the avian and mammalian retroviruses. The snakehead virus is also distinct from a previously characterized complex fish retrovirus, suggesting that discrete groups of these viruses have yet to be identified in the lower vertebrates.

  11. Harmonic Analysis of Sedimentary Cyclic Sequences in Kansas, Midcontinent, USA

    USGS Publications Warehouse

    Merriam, D.F.; Robinson, J.E.

    1997-01-01

    Several stratigraphic sequences in the Upper Carboniferous (Pennsylvanian) in Kansas (Midcontinent, USA) were analyzed quantitatively for periodic repetitions. The sequences were coded by lithologic type into strings of datasets. The strings then were analyzed by an adaptation of a one-dimensional Fourier transform analysis and examined for evidence of periodicity. The method was tested using different states in coding to determine the robustness of the method and data. The most persistent response is in multiples of 8-10 ft (2.5-3.0 m) and probably is dependent on the depositional thickness of the original lithologic units. Other cyclicities occurred in multiples of the basic frequency of 8-10 with persistent ones at 22 and 30 feet (6.5-9.0 m) and large ones at 80 and 160 feet (25-50 m). These levels of thickness relate well to the basic cyclothem and megacyclothem as measured on outcrop. We propose that this approach is a suitable one for analyzing cyclic events in the stratigraphic record.
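
    The approach described above (coding lithologies into a string, then looking for periodicity with a one-dimensional Fourier transform) can be sketched in Python as follows. The paper used its own adaptation of the transform, which the abstract does not detail, so this sketch substitutes a plain discrete Fourier transform; the lithology codes, the 1 ft sampling interval and the synthetic section are hypothetical.

```python
import cmath
from typing import List, Sequence, Tuple

# Hypothetical coding of lithologic types into numeric states.
CODES = {"S": 0.0, "L": 1.0, "C": 2.0}  # e.g. shale, limestone, coal


def power_spectrum(signal: Sequence[float]) -> List[Tuple[int, float]]:
    """Return (frequency index, power) pairs from a plain DFT.

    The mean is removed first so the zero-frequency term does not
    dominate; only indices 1..n/2 are reported.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    spectrum = []
    for f in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * f * t / n)
                    for t, x in enumerate(centered))
        spectrum.append((f, abs(coeff) ** 2 / n))
    return spectrum


def dominant_period(signal: Sequence[float], spacing: float = 1.0) -> float:
    """Thickness of the strongest repetition, in the units of `spacing`."""
    f, _power = max(power_spectrum(signal), key=lambda pair: pair[1])
    return len(signal) * spacing / f


# Synthetic section sampled every 1 ft, repeating every 8 samples.
section = "SSLLCCLL" * 5
coded = [CODES[ch] for ch in section]
period_ft = dominant_period(coded, spacing=1.0)
print(period_ft)
```

    Re-running the analysis with a different assignment of codes to lithologies is one way to probe how sensitive a detected period is to the coding, in the spirit of the robustness test mentioned in the abstract.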

  12. Sequence and comparative analysis of Leuconostoc dairy bacteriophages.

    PubMed

    Kot, Witold; Hansen, Lars H; Neve, Horst; Hammer, Karin; Jacobsen, Susanne; Pedersen, Per D; Sørensen, Søren J; Heller, Knut J; Vogensen, Finn K

    2014-04-17

    Bacteriophages attacking Leuconostoc species may significantly influence the quality of the final product. There is however limited knowledge of this group of phages in the literature. We have determined the complete genome sequences of nine Leuconostoc bacteriophages virulent to either Leuconostoc mesenteroides or Leuconostoc pseudomesenteroides strains. The phages have dsDNA genomes with sizes ranging from 25.7 to 28.4 kb. Comparative genomics analysis helped classify the 9 phages into two classes, which correlates with the host species. High percentage of similarity within the classes on both nucleotide and protein levels was observed. Genome comparison also revealed very high conservation of the overall genomic organization between the classes. The genes were organized in functional modules responsible for replication, packaging, head and tail morphogenesis, cell lysis and regulation and modification, respectively. No lysogeny modules were detected. To our knowledge this report provides the first comparative genomic work done on Leuconostoc dairy phages.

  13. Sequence analysis of the Lactobacillus temperate phage Sha1.

    PubMed

    Yoon, Bo Hyun; Jang, Se Hwan; Chang, Hyo-Ihl

    2011-09-01

    Bacteriophage Sha1, a newly isolated temperate phage from a mitomycin-C-induced lysate of Lactobacillus plantarum isolated from Kimchi, has an isometric head (58 nm × 60 nm) and a long tail (259 nm × 11 nm). The double-strand DNA genome of the phage Sha1 was 41,726 base pairs (bp) long, with a G+C content of 40.61%. Bioinformatic analysis of Sha1 shows that this phage contains 58 putative open reading frames (ORFs). Sha1 can be classified as a member of the large family Siphoviridae by genomic structure and morphology. To our knowledge, this is the first report of genomic sequencing and characterization of temperate phage Sha1 from wild-type L. plantarum isolated from kimchi in Korea. PMID:21701917

  14. Comparative Topological Analysis of Neuronal Arbors via Sequence Representation and Alignment

    NASA Astrophysics Data System (ADS)

    Gillette, Todd Aaron

    neocortical pyramidal cell axons and rodent neocortical dendritic targeting interneurons to be substantially more asymmetric than perisomatic-targeting interneurons. With optimization techniques adapted from the field of genomic alignment, these methods compose a framework with the potential to be made orders of magnitude more efficient. Moreover, the framework is capable of handling expanded sequence representations that include additional branch features, enabling analysis of correspondence and joint conservation of various morphological characteristics.

  15. Human factors review for Severe Accident Sequence Analysis (SASA)

    SciTech Connect

    Krois, P.A.; Haas, P.M.; Manning, J.J.; Bovell, C.R.

    1984-01-01

    The paper will discuss work being conducted during this human factors review including: (1) support of the Severe Accident Sequence Analysis (SASA) Program based on an assessment of operator actions, and (2) development of a descriptive model of operator severe accident management. Research by SASA analysts on the Browns Ferry Unit One (BF1) anticipated transient without scram (ATWS) was supported through a concurrent assessment of operator performance to demonstrate contributions to SASA analyses from human factors data and methods. A descriptive model was developed called the Function Oriented Accident Management (FOAM) model, which serves as a structure for bridging human factors, operations, and engineering expertise and which is useful for identifying needs/deficiencies in the area of accident management. The assessment of human factors issues related to ATWS required extensive coordination with SASA analysts. The analysis was consolidated primarily to six operator actions identified in the Emergency Procedure Guidelines (EPGs) as being the most critical to the accident sequence. These actions were assessed through simulator exercises, qualitative reviews, and quantitative human reliability analyses. The FOAM descriptive model assumes as a starting point that multiple operator/system failures exceed the scope of procedures and necessitate a knowledge-based emergency response by the operators. The FOAM model provides a functionally-oriented structure for assembling human factors, operations, and engineering data and expertise into operator guidance for unconventional emergency responses to mitigate severe accident progression and avoid/minimize core degradation. Operators must also respond to potential radiological release beyond plant protective barriers. Research needs in accident management and potential uses of the FOAM model are described. 11 references, 1 figure.

  16. Impact of next-generation sequencing error on analysis of barcoded plasmid libraries of known complexity and sequence

    PubMed Central

    Deakin, Claire T.; Deakin, Jeffrey J.; Ginn, Samantha L.; Young, Paul; Humphreys, David; Suter, Catherine M.; Alexander, Ian E.; Hallwirth, Claus V.

    2014-01-01

    Barcoded vectors are promising tools for investigating clonal diversity and dynamics in hematopoietic gene therapy. Analysis of clones marked with barcoded vectors requires accurate identification of potentially large numbers of individually rare barcodes, when the exact number, sequence identity and abundance are unknown. This is an inherently challenging application, and the feasibility of using contemporary next-generation sequencing technologies is unresolved. To explore this potential application empirically, without prior assumptions, we sequenced barcode libraries of known complexity. Libraries containing 1, 10 and 100 Sanger-sequenced barcodes were sequenced using an Illumina platform, with a 100-barcode library also sequenced using a SOLiD platform. Libraries containing 1 and 10 barcodes were distinguished from false barcodes generated by sequencing error by a several log-fold difference in abundance. In 100-barcode libraries, however, expected and false barcodes overlapped and could not be resolved by bioinformatic filtering and clustering strategies. In independent sequencing runs multiple false-positive barcodes appeared to be represented at higher abundance than known barcodes, despite their confirmed absence from the original library. Such errors, which potentially impact barcoding studies in an application-dependent manner, are consistent with the existence of both stochastic and systematic error, the mechanism of which is yet to be fully resolved. PMID:25013183
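
The abundance-based separation of expected and false barcodes described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `count_barcodes` and `split_by_abundance`, the `min_log10_gap` threshold, and the toy reads are all hypothetical, and reads are assumed to be already trimmed to the barcode region.

```python
from collections import Counter
import math


def count_barcodes(reads):
    """Count how often each barcode sequence occurs among the reads."""
    return Counter(reads)


def split_by_abundance(counts, min_log10_gap=2.0):
    """Split barcodes at the largest abundance gap between ranked neighbours.

    Returns (high, low) lists of barcodes when the largest gap spans at
    least `min_log10_gap` orders of magnitude (an assumed threshold), or
    None when no such gap exists and the two groups cannot be resolved.
    """
    ranked = counts.most_common()  # most abundant first
    best_gap, best_idx = 0.0, None
    for i in range(len(ranked) - 1):
        gap = math.log10(ranked[i][1] / ranked[i + 1][1])
        if gap > best_gap:
            best_gap, best_idx = gap, i
    if best_idx is None or best_gap < min_log10_gap:
        return None
    high = [bc for bc, _ in ranked[: best_idx + 1]]
    low = [bc for bc, _ in ranked[best_idx + 1:]]
    return high, low


# Toy single-barcode library: one real barcode plus rare error-derived reads.
toy_reads = ["ACGT"] * 10000 + ["ACGA"] * 3 + ["TCGT"] * 2
toy_split = split_by_abundance(count_barcodes(toy_reads))
```

A `None` result corresponds to the situation reported for the 100-barcode libraries, where the abundance distributions of expected and false barcodes overlap and a single cut-off cannot separate them.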

  17. Transcriptome analysis of Emiliania huxleyi cells grown under different conditions using high-throughput sequencing data

    NASA Astrophysics Data System (ADS)

    Andreson, R.; Anlauf, H.; Mackinder, L.; Iglesias-Rodriguez, D.; LaRoche, J.; Lenhard, B.

    2012-04-01

    Coccolithophores are ideal for studying genes responsible for biomineralization processes due to relatively small genome sizes, ability to grow in culture, and as a natural model system for measuring expression of calcification-related genes in two life stages. As Emiliania huxleyi has several annotated calcification-related proteins, we have concentrated on analyzing its genes and promoter areas. Many recent studies have focused primarily on transcriptome analysis of E. huxleyi using nutrient-limited conditions to get more information about up-regulated genes involved in biomineralization and calcification processes. Although there are more than 100,000 EST sequences for E. huxleyi available from these projects in public databases, these data are often insufficient to identify the exact position of the transcription start site (TSS) to perform precise analysis (nucleotide content, motif search) of core promoters and regulatory mechanisms in immediate flanking areas. ESTs are not ideal for these kinds of analyses because the standard technologies of producing 5' EST libraries do not guarantee that the exact 5' end of the transcript will be captured. To determine the extent and accurate positions of 5' ends of transcripts and therefore the positions of core promoters, the Cap analysis of gene expression (CAGE) sequencing method was used for sequencing RNA of E. huxleyi in both stages, calcifying and non-calcifying. In addition, gene expression levels of RNA for 21 samples were retrieved with whole transcriptome shotgun sequencing (RNA-Seq). The collections of reads these methods produced were used to map and annotate genes on several samples and measure the RNA expression levels in different conditions. Although few data are available for closely related organisms, it is possible to compare these results with other species to find conserved regulatory mechanisms between genes related to calcification. Visualization tools allowing browsing of annotated genes

  18. Multiple Comparison Analysis of Two New Genomic Sequences of ILTV Strains from China with Other Strains from Different Geographic Regions.

    PubMed

    Zhao, Yan; Kong, Congcong; Wang, Yunfeng

    2015-01-01

    To date, twenty complete genome sequences of ILTV strains have been published in GenBank, including one strain from China, and nineteen strains from Australia and the United States. To investigate the genomic information on ILTVs from different geographic regions, two additional individual complete genome sequences of WG and K317 strains from China were determined. The genomes of WG and K317 strains were 153,505 and 153,639 bp in length, respectively. Alignments performed on the amino acid sequences of the twelve glycoproteins showed that 13 out of 116 mutational sites were present only among the Chinese strain WG and the Australian strains SA2 and A20. The phylogenetic tree analysis suggested that the WG strain is closely related to the Australian strain SA2. Recombination events were detected and confirmed in different subregions of the WG strain, with the SA2 and K317 strains as parental sequences. In this study, two new complete genome sequences of Chinese ILTV strains were used in comparative analysis with other complete genome sequences of ILTV strains from China, the United States, and Australia. The analysis of genome comparison, phylogenetic trees, and recombination events showed a close relationship between the Chinese strain WG and the Australian strain SA2. The information of the two new complete genome sequences from China will help to facilitate the analysis of phylogenetic relationships and the molecular differences among ILTV strains from different geographic regions.

  19. Multiple Comparison Analysis of Two New Genomic Sequences of ILTV Strains from China with Other Strains from Different Geographic Regions

    PubMed Central

    Zhao, Yan; Kong, Congcong; Wang, Yunfeng

    2015-01-01

    To date, twenty complete genome sequences of ILTV strains have been published in GenBank, including one strain from China, and nineteen strains from Australia and the United States. To investigate the genomic information on ILTVs from different geographic regions, two additional individual complete genome sequences of WG and K317 strains from China were determined. The genomes of WG and K317 strains were 153,505 and 153,639 bp in length, respectively. Alignments performed on the amino acid sequences of the twelve glycoproteins showed that 13 out of 116 mutational sites were present only among the Chinese strain WG and the Australian strains SA2 and A20. The phylogenetic tree analysis suggested that the WG strain is closely related to the Australian strain SA2. Recombination events were detected and confirmed in different subregions of the WG strain, with the SA2 and K317 strains as parental sequences. In this study, two new complete genome sequences of Chinese ILTV strains were used in comparative analysis with other complete genome sequences of ILTV strains from China, the United States, and Australia. The analysis of genome comparison, phylogenetic trees, and recombination events showed a close relationship between the Chinese strain WG and the Australian strain SA2. The information of the two new complete genome sequences from China will help to facilitate the analysis of phylogenetic relationships and the molecular differences among ILTV strains from different geographic regions. PMID:26186451

  20. [Sequencing and analysis of the complete genome sequence of WU polyomavirus in Fuzhou, China].

    PubMed

    Xiu, Wen-qiong; Shen, Xiao-na; Liu, Guang-hua; Xie, Jian-feng; Kang, Yu-lan; Wang, Mei-ai; Zhang, Wen-qing; Weng, Qi-zhu; Yan, Yan-sheng

    2011-03-01

    WU polyomavirus (WUPyV), a new member of the genus Polyomavirus in the family Polyomaviridae, was recently found in patients with respiratory tract infections. In our study, the complete genomes of two WUPyV isolates (FZ18, FZTF) were sequenced and deposited in GenBank (accession nos. FJ890981, FJ890982). The two sequences varied little from each other. Compared with other complete genome sequences of WUPyV in GenBank (strains B0, S1-S4 and CLFF; accession nos. EF444549, EF444550, EF444551, EF444552, EF444553 and EU296475, respectively), the sequence length is 5228 bp, 1 bp shorter than the known sequences. The deleted base pair was at nucleotide position 4536 in the non-coding region of large T antigen (LTAg). The WUPyV genome encodes five proteins: three capsid proteins (VP2, VP1 and VP3), LTAg and small T antigen (STAg). To investigate whether these nucleotide sequences had any unique features, we compared the genome sequences of the two WUPyV isolates from Fuzhou, China with those documented in the GenBank database, using PHYLIP software version 3.65 and the neighbor-joining method. The two WUPyV strains in our study clustered together. Strain FZTF was closer than strain FZ18 to the reference strain B0 from Australia. PMID:21528542

  1. Increasing the Scale of Deep Sequencing Data Analysis with BioHDF

    SciTech Connect

    Smith, Todd

    2010-06-03

    Todd Smith of Geospiza discusses how BioHDF systems can be used with next generation DNA sequencing technologies on June 3, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM

  2. Dried blood spot analysis of creatinine with LC-MS/MS in addition to immunosuppressants analysis.

    PubMed

    Koster, Remco A; Greijdanus, Ben; Alffenaar, Jan-Willem C; Touw, Daan J

    2015-02-01

    In order to monitor creatinine levels or to adjust the dosage of renally excreted or nephrotoxic drugs, the analysis of creatinine in dried blood spots (DBS) could be a useful addition to DBS analysis. We developed an LC-MS/MS method for the analysis of creatinine in the same DBS extract that was used for the analysis of tacrolimus, sirolimus, everolimus, and cyclosporine A in transplant patients with the use of Whatman FTA DMPK-C cards. The method was validated using three different strategies: a seven-point calibration curve using the intercept of the calibration to correct for the natural presence of creatinine in reference samples, a one-point calibration curve at an extremely high concentration in order to diminish the contribution of the natural presence of creatinine, and the use of creatinine-[(2)H3] with an eight-point calibration curve. The validated range for creatinine was 120 to 480 μmol/L (seven-point calibration curve), 116 to 7000 μmol/L (one-point calibration curve), and 1.00 to 400.0 μmol/L for creatinine-[(2)H3] (eight-point calibration curve). The precision and accuracy results for all three validations showed a maximum CV of 14.0% and a maximum bias of -5.9%. Creatinine in DBS was found stable at ambient temperature and 32 °C for 1 week and at -20 °C for 29 weeks. Good correlations were observed between patient DBS samples and routine enzymatic plasma analysis and showed the capability of the DBS method to be used as an alternative for creatinine plasma measurement.

  3. Analysis of separate isolates of Bordetella pertussis repeated DNA sequences.

    PubMed

    McPheat, W L; Hanson, J H; Livey, I; Robertson, J S

    1989-06-01

    Two independent isolates of a Bordetella pertussis repeated DNA unit were sequenced and shown to be an insertion sequence element with five nucleotide differences between the two copies. The sequences were 1053 bp in length with near-perfect terminal inverted repeats of 28 bp, had three open reading frames, and were each flanked by short direct repeats. The two insertion sequences showed considerable homology to two other B. pertussis repeated DNA sequences reported recently: IS481 and a 530 bp repeated DNA unit. The B. pertussis insertion sequence would appear to comprise a group of closely related sequences differing mainly in flanking direct repeats and the terminal inverted repeats. The two isolates reported here, which were from the adenylate cyclase and agglutinogen 2 regions of the genome, were numbered IS481v1 and IS481v2 respectively. PMID:2559151

  4. Data Analysis for Sequencing by Hybridization (SBH) Experiments

    1995-11-28

    SCORES is user-friendly software designed to analyze data from SBH (Sequencing By Hybridization) experiments. In these ANL experiments DNA samples are spotted on a nylon membrane and hybridized with radioactively labeled oligonucleotide probes. An image analysis program (DOTS) calculates a raw value for each DNA dot from images generated by the Molecular Dynamics Phosphorimager. SCORES reads in the DOTS output for each hybridization done for a particular filter. The data for each probe is normalized against a mass probe and scaled properly. These values from 100 or more probes are then used to compute the distance (i.e., degree of similarity) between any two clones on the filter. These calculated distances define clusters of similar clones (cDNA) or contigs (genomic DNA). Histograms of the data at each stage of analysis are used to establish thresholds for further steps. SCORES generates various statistical tables to evaluate the quality of spotting, hybridization of filters, and of individual dots.
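
The normalize, distance, and cluster steps named in this record can be sketched in a few lines of Python. This is an illustration only, not the SCORES implementation: the record does not say how values are scaled, which distance measure is used, or how clusters are formed, so the median scaling, the Euclidean distance, the single-linkage grouping, and every function name below are assumptions.

```python
import math
from statistics import median


def normalize_probe(raw, mass):
    """Divide each dot's raw probe signal by its mass-probe signal,
    then scale so the probe's median ratio is 1 (assumed scaling).
    Signals are assumed to be strictly positive."""
    ratios = [r / m for r, m in zip(raw, mass)]
    mid = median(ratios)
    return [x / mid for x in ratios]


def clone_distance(a, b):
    """Euclidean distance between two clones' probe profiles
    (an assumed choice of distance measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_clones(profiles, threshold):
    """Single-linkage grouping: two clones share a cluster when a chain
    of pairwise distances below `threshold` connects them."""
    parent = list(range(len(profiles)))

    def find(i):
        # Follow parent links to the cluster representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            if clone_distance(profiles[i], profiles[j]) < threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(profiles)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy profiles: one row per clone, one column per (already normalized) probe.
toy_profiles = [[1.0, 0.1, 0.9], [0.9, 0.2, 1.0], [0.1, 1.0, 0.1]]
toy_clusters = cluster_clones(toy_profiles, threshold=0.5)
```

In practice the threshold would be chosen from histograms of the pairwise distances, which is the role the record assigns to the histograms produced at each stage.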

  5. Sequencing and analysis of a South Asian-Indian personal genome

    PubMed Central

    2012-01-01

    Background: With over 1.3 billion people, India is estimated to contain three times more genetic diversity than does Europe. Next-generation sequencing technologies have facilitated the understanding of diversity by enabling whole genome sequencing at greater speed and lower cost. While genomes from people of European and Asian descent have been sequenced, only recently has a single male genome from the Indian subcontinent been published at sufficient depth and coverage. In this study we have sequenced and analyzed the genome of a South Asian Indian female (SAIF) from the Indian state of Kerala. Results: We identified over 3.4 million SNPs in this genome including over 89,873 private variations. Comparison of the SAIF genome with several published personal genomes revealed that this individual shared ~50% of the SNPs with each of these genomes. Analysis of the SAIF mitochondrial genome showed that it was closely related to the U1 haplogroup which has been previously observed in Kerala. We assessed the SAIF genome for SNPs with health and disease consequences and found that the individual was at a higher risk for multiple sclerosis and a few other diseases. In analyzing SNPs that modulate drug response, we found a variation that predicts a favorable response to metformin, a drug used to treat diabetes. SNPs predictive of adverse reaction to warfarin indicated that the SAIF individual is not at risk for bleeding if treated with typical doses of warfarin. In addition, we report the presence of several additional SNPs of medical relevance. Conclusions: This is the first study to report the complete whole genome sequence of a female from the state of Kerala in India. The availability of this complete genome and variants will further aid studies aimed at understanding genetic diversity, identifying clinically relevant changes and assessing disease burden in the Indian population. PMID:22938532

  6. Analysis of DNA structure and sequence requirements for Pseudomonas aeruginosa MutL endonuclease activity.

    PubMed

    Correa, Elisa M E; De Tullio, Luisina; Vélez, Pablo S; Martina, Mariana A; Argaraña, Carlos E; Barra, José L

    2013-12-01

    The hallmark of the mismatch repair system in bacterial and eukaryotic organisms devoid of MutH is the presence of a MutL homologue with endonuclease activity. The aim of this study was to analyse whether different DNA structures affect Pseudomonas aeruginosa MutL (PaMutL) endonuclease activity and to determine if a specific nucleotide sequence is required for this activity. Our results showed that PaMutL was able to nick covalently closed circular plasmids but not linear DNA at high ionic strengths, while the activity on linear DNA was only found below 60 mM salt. In addition, single-stranded DNA, ss/ds DNA boundaries and negative supercoiling were not required for PaMutL nicking activity. Finally, the analysis of the incision sites revealed that PaMutL, as well as the Bacillus thuringiensis MutL homologue, did not show DNA sequence specificity.

  7. What’s in the genome of a filamentous fungus? Analysis of the Neurospora genome sequence

    PubMed Central

    Mannhaupt, Gertrud; Montrone, Corinna; Haase, Dirk; Mewes, H. Werner; Aign, Verena; Hoheisel, Jörg D.; Fartmann, Berthold; Nyakatura, Gerald; Kempken, Frank; Maier, Josef; Schulte, Ulrich

    2003-01-01

    The German Neurospora Genome Project has assembled sequences from ordered cosmid and BAC clones of linkage groups II and V of the genome of Neurospora crassa in 13 and 12 contigs, respectively. Including additional sequences located on other linkage groups, a total of 12 Mb were subjected to a manual gene extraction and annotation process. The genome comprises a small number of repetitive elements, a low degree of segmental duplications and very few paralogous genes. The analysis of the 3218 identified open reading frames provides a first overview of the protein equipment of a filamentous fungus. Significantly, N. crassa possesses a large variety of metabolic enzymes including a substantial number of enzymes involved in the degradation of complex substrates as well as secondary metabolism. While several of these enzymes are specific for filamentous fungi, many are shared exclusively with prokaryotes. PMID:12655011

  8. Analysis of new microsatellite markers developed from reported sequences of Japanese flounder Paralichthys olivaceus

    NASA Astrophysics Data System (ADS)

    Yu, Haiyang; Jiang, Liming; Chen, Wei; Wang, Xubo; Wang, Zhigang; Zhang, Quanqi

    2010-12-01

    The expressed sequence tags (ESTs) of Japanese flounder, Paralichthys olivaceus, were selected from GenBank to identify simple sequence repeats (SSRs) or microsatellites. A bioinformatic analysis of 11111 ESTs identified 751 SSR-containing ESTs, including 440 dinucleotide, 254 trinucleotide, 53 tetranucleotide, 95 pentanucleotide and 40 hexanucleotide microsatellites, respectively. The CA/TG and GA/TC repeats were the most abundant microsatellites. AT-rich types were predominant among trinucleotide and tetranucleotide microsatellites. PCR primers were designed to amplify 10 identified microsatellite loci. The PCR results from eight pairs of primers showed polymorphisms in wild populations. In 30 wild individuals, the mean observed and expected heterozygosities of these 8 polymorphic SSRs were 0.71 and 0.83, respectively, and the average PIC value was 0.8. These microsatellite markers should prove to be a useful addition to the microsatellite markers that are now available for this species.
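
A scan for di- to hexanucleotide repeats in EST sequences, together with the diversity statistics quoted above, can be sketched in Python as follows. The record does not state the search criteria or software the authors used, so the `min_repeats` threshold and all function names are hypothetical. The PIC formula is the commonly used Botstein-style definition, assumed here rather than taken from the record, and the expected heterozygosity omits any small-sample correction.

```python
def is_primitive(motif):
    """True when the motif is not itself a repeat of a shorter unit
    (for example, 'CACA' is just 'CA' twice)."""
    n = len(motif)
    return not any(
        n % d == 0 and motif[:d] * (n // d) == motif for d in range(1, n)
    )


def find_ssrs(seq, min_repeats=5):
    """Return sorted (start, motif, repeat_count) tuples for perfect
    tandem repeats of 2 to 6 bp motifs with at least `min_repeats`
    copies (an assumed threshold)."""
    seq = seq.upper()
    hits = []
    for k in range(2, 7):
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            n = 1
            # Count consecutive copies of the motif starting at i.
            while seq[i + n * k:i + (n + 1) * k] == motif:
                n += 1
            if n >= min_repeats and is_primitive(motif) and set(motif) <= set("ACGT"):
                hits.append((i, motif, n))
                i += n * k  # continue after the repeat tract
            else:
                i += 1
    return sorted(hits)


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2) for allele frequencies p_i at one locus."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content:
    1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1.0 - homozygosity - cross


# Toy EST fragment carrying a (CA)6 dinucleotide repeat.
toy_hits = find_ssrs("TTG" + "CA" * 6 + "GGT")
```

Homopolymer runs are skipped because their motif is not primitive, which keeps a tract such as poly-A from being reported as a dinucleotide repeat.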

  9. DNA sequencing with capillary electrophoresis and single cell analysis with mass spectrometry

    SciTech Connect

    Fung, N.

    1998-03-27

    Since the first demonstration of the laser in the 1960s, lasers have found numerous applications in analytical chemistry. In this work, two different applications are described, namely, DNA sequencing with capillary gel electrophoresis and single cell analysis with mass spectrometry. Two projects are described in which high-speed DNA separations with capillary gel electrophoresis were demonstrated. In the third project, flow cytometry and mass spectrometry were coupled via a laser vaporization/ionization interface and individual mammalian cells were analyzed. First, DNA Sanger fragments were separated by capillary gel electrophoresis. A separation speed of 20 basepairs per minute was demonstrated with a mixed poly(ethylene oxide) (PEO) sieving solution. In addition, a new capillary wall treatment protocol was developed in which bare (or uncoated) capillaries can be used in DNA sequencing. Second, a temperature programming scheme was used to separate DNA Sanger fragments. Third, flow cytometry and mass spectrometry were coupled with a laser vaporization/ionization interface.

  10. Whole genome sequence analysis of unidentified genetically modified papaya for development of a specific detection method.

    PubMed

    Nakamura, Kosuke; Kondo, Kazunari; Akiyama, Hiroshi; Ishigaki, Takumi; Noguchi, Akio; Katsumata, Hiroshi; Takasaki, Kazuto; Futo, Satoshi; Sakata, Kozue; Fukuda, Nozomi; Mano, Junichi; Kitta, Kazumi; Tanaka, Hidenori; Akashi, Ryo; Nishimaki-Mogami, Tomoko

    2016-08-15

    Identification of transgenic sequences in an unknown genetically modified (GM) papaya (Carica papaya L.) by whole genome sequence analysis was demonstrated. Whole genome sequence data were generated for a GM-positive fresh papaya fruit commodity detected in monitoring using real-time polymerase chain reaction (PCR). The sequences obtained were mapped against an open database for papaya genome sequence. Transgenic construct- and event-specific sequences were identified as a GM papaya developed to resist infection from a Papaya ringspot virus. Based on the transgenic sequences, a specific real-time PCR detection method for GM papaya applicable to various food commodities was developed. Whole genome sequence analysis enabled identifying unknown transgenic construct- and event-specific sequences in GM papaya and development of a reliable method for detecting them in papaya food commodities.

  11. Whole genome sequence analysis of unidentified genetically modified papaya for development of a specific detection method.

    PubMed

    Nakamura, Kosuke; Kondo, Kazunari; Akiyama, Hiroshi; Ishigaki, Takumi; Noguchi, Akio; Katsumata, Hiroshi; Takasaki, Kazuto; Futo, Satoshi; Sakata, Kozue; Fukuda, Nozomi; Mano, Junichi; Kitta, Kazumi; Tanaka, Hidenori; Akashi, Ryo; Nishimaki-Mogami, Tomoko

    2016-08-15

    Identification of transgenic sequences in an unknown genetically modified (GM) papaya (Carica papaya L.) by whole genome sequence analysis was demonstrated. Whole genome sequence data were generated for a GM-positive fresh papaya fruit commodity detected in monitoring using real-time polymerase chain reaction (PCR). The sequences obtained were mapped against an open database for papaya genome sequence. Transgenic construct- and event-specific sequences were identified as a GM papaya developed to resist infection from a Papaya ringspot virus. Based on the transgenic sequences, a specific real-time PCR detection method for GM papaya applicable to various food commodities was developed. Whole genome sequence analysis enabled identifying unknown transgenic construct- and event-specific sequences in GM papaya and development of a reliable method for detecting them in papaya food commodities. PMID:27006240

  12. Genome Sequencing and Analysis of the Biomass-Degrading Fungus Trichoderma reesei (syn. Hypocrea jecorina)

    SciTech Connect

    Martinez, Antonio D.; Berka, Randy; Henrissat, Bernard; Saloheimo, Markku; Arvas, Mikko; Baker, Scott E.; Chapman, Jaro d; Chertkov, Olga; Coutinho, Pedro M.; Cullen, Dan; Danchin, Etienne G.; Grigoriev, Igor V.; Harris, Paul; Jackson, Melissa ?.; Kubicek, Christian P.; Han, Cliff F.; Ho, Isaac; Larrando, Luis F.; Lopez de Leon, Alfredo; Magnuson, Jon K.; Merino, Sandy; Misra, Monica; Nelson, Beth; Putnam, Nicholas; Robbertse, Barbara; Salamov, Asaf; Schmoll, Monika; Terry, Astrid ?.; Thayer, Nina; Westerholm-Parvinen, Ann; Schoch, Conrad L.; Yao, Jian ?.; Barbote, Ravi; Nelson, Mary Anne; Detter, Chris J.; Bruce, David; Kuske, Cheryl; Xie, Gary; Richardson, P. M.; Rokhsar, Daniel S.; Lucas, Susan; Rubin, Eddie M.; Dunn-Coleman, Nigel; Ward, Michael ?.; Brettin, T.

    2008-05-01

    A major thrust of the white biotechnology movement involves the development of enzyme systems which depolymerize biomass to simple sugars which are subsequently converted to sustainable biofuels (e.g., ethanol) and chemical intermediates. The fungus Trichoderma reesei (syn. Hypocrea jecorina) represents a paradigm for the industrial production of highly efficient cellulases and hemicellulases needed for hydrolysis of biomass polysaccharides. Herein we describe intriguing attributes of the T. reesei genome in relation to the future of fuel biotechnology. The T. reesei genome sequence was derived using a whole genome shotgun approach combined with finishing work to generate an assembly comprising 89 scaffolds totaling 34 Mbp with few gaps. In total, 9,130 gene models were predicted using a combination of ab initio and sequence similarity-based methods and EST data. Considering the industrial utility and effectiveness of its enzymes, the T. reesei genome surprisingly encodes the fewest cellulases and hemicellulases of any fungus having the ability to hydrolyze plant cell wall polysaccharides and whose genome has been sequenced. Many genes encoding carbohydrate active enzymes are distributed non-randomly in groups or clusters that interestingly lie between regions of synteny with other Sordariomycetes. Additionally, the T. reesei genome contains a multitude of genes encoding biosynthetic pathways for secondary metabolites (possible antibacterial and antifungal compounds) which may promote successful competition and survival in the crowded and competitive soil habitat occupied by T. reesei. Our analysis coupled with the availability of genome sequence data provides a roadmap for construction of enhanced T. reesei strains for industrial applications.

  13. Analysis of the microbiome: Advantages of whole genome shotgun versus 16S amplicon sequencing.

    PubMed

    Ranjan, Ravi; Rani, Asha; Metwally, Ahmed; McGee, Halvor S; Perkins, David L

    2016-01-22

    The human microbiome has emerged as a major player in regulating human health and disease. Translational studies of the microbiome have the potential to indicate clinical applications such as fecal transplants and probiotics. However, one major issue is accurate identification of microbes constituting the microbiota. Studies of the microbiome have frequently utilized sequencing of the conserved 16S ribosomal RNA (rRNA) gene. We present a comparative study of an alternative approach using whole genome shotgun sequencing (WGS). In the present study, we analyzed the human fecal microbiome compiling a total of 194.1 × 10^6 reads from a single sample using multiple sequencing methods and platforms. Specifically, after establishing the reproducibility of our methods with extensive multiplexing, we compared: 1) the 16S rRNA amplicon versus the WGS method, 2) the Illumina HiSeq versus MiSeq platforms, 3) the analysis of reads versus de novo assembled contigs, and 4) the effect of shorter versus longer reads. Our study demonstrates that whole genome shotgun sequencing has multiple advantages compared with the 16S amplicon method including enhanced detection of bacterial species, increased detection of diversity and increased prediction of genes. In addition, increased length, either due to longer reads or the assembly of contigs, improved the accuracy of species detection.

  14. Analysis of the microbiome: Advantages of whole genome shotgun versus 16S amplicon sequencing.

    PubMed

    Ranjan, Ravi; Rani, Asha; Metwally, Ahmed; McGee, Halvor S; Perkins, David L

    2016-01-22

    The human microbiome has emerged as a major player in regulating human health and disease. Translational studies of the microbiome have the potential to indicate clinical applications such as fecal transplants and probiotics. However, one major issue is accurate identification of microbes constituting the microbiota. Studies of the microbiome have frequently utilized sequencing of the conserved 16S ribosomal RNA (rRNA) gene. We present a comparative study of an alternative approach using whole genome shotgun sequencing (WGS). In the present study, we analyzed the human fecal microbiome compiling a total of 194.1 × 10^6 reads from a single sample using multiple sequencing methods and platforms. Specifically, after establishing the reproducibility of our methods with extensive multiplexing, we compared: 1) the 16S rRNA amplicon versus the WGS method, 2) the Illumina HiSeq versus MiSeq platforms, 3) the analysis of reads versus de novo assembled contigs, and 4) the effect of shorter versus longer reads. Our study demonstrates that whole genome shotgun sequencing has multiple advantages compared with the 16S amplicon method including enhanced detection of bacterial species, increased detection of diversity and increased prediction of genes. In addition, increased length, either due to longer reads or the assembly of contigs, improved the accuracy of species detection. PMID:26718401

  15. Sequence-Level Analysis of the Major European Huntington Disease Haplotype.

    PubMed

    Lee, Jong-Min; Kim, Kyung-Hee; Shin, Aram; Chao, Michael J; Abu Elneel, Kawther; Gillis, Tammy; Mysore, Jayalakshmi Srinidhi; Kaye, Julia A; Zahed, Hengameh; Kratter, Ian H; Daub, Aaron C; Finkbeiner, Steven; Li, Hong; Roach, Jared C; Goodman, Nathan; Hood, Leroy; Myers, Richard H; MacDonald, Marcy E; Gusella, James F

    2015-09-01

    Huntington disease (HD) reflects the dominant consequences of a CAG-repeat expansion in HTT. Analysis of common SNP-based haplotypes has revealed that most European HD subjects have distinguishable HTT haplotypes on their normal and disease chromosomes and that ∼50% of the latter share the same major HD haplotype. We reasoned that sequence-level investigation of this founder haplotype could provide significant insights into the history of HD and valuable information for gene-targeting approaches. Consequently, we performed whole-genome sequencing of HD and control subjects from four independent families in whom the major European HD haplotype segregates with the disease. Analysis of the full-sequence-based HTT haplotype indicated that these four families share a common ancestor sufficiently distant to have permitted the accumulation of family-specific variants. Confirmation of new CAG-expansion mutations on this haplotype suggests that unlike most founders of human disease, the common ancestor of HD-affected families with the major haplotype most likely did not have HD. Further, availability of the full sequence data validated the use of SNP imputation to predict the optimal variants for capturing heterozygosity in personalized allele-specific gene-silencing approaches. As few as ten SNPs are capable of revealing heterozygosity in more than 97% of European HD subjects. Extension of allele-specific silencing strategies to the few remaining homozygous individuals is likely to be achievable through additional known SNPs and discovery of private variants by complete sequencing of HTT. These data suggest that the current development of gene-based targeting for HD could be extended to personalized allele-specific approaches in essentially all HD individuals of European ancestry. PMID:26320893

  16. Transcriptome Sequencing and Analysis of Leaf Tissue of Avicennia marina Using the Illumina Platform

    PubMed Central

    Zhang, Wanke; Huang, Rongfeng; Chen, Shouyi; Zheng, Yizhi

    2014-01-01

    Avicennia marina is a widely distributed mangrove species that thrives in high-salinity habitats. It plays a significant role in supporting coastal ecosystems and holds unique potential for studying molecular mechanisms underlying ecological adaptation. Despite and sometimes because of its numerous merits, this species is facing increasing pressure of exploitation and deforestation. Both the study of adaptation mechanisms and conservation efforts require more genomic resources for A. marina. In this study, we used Illumina sequencing of an A. marina foliar cDNA library to generate a transcriptome dataset for gene and marker discovery. We obtained 40 million high-quality reads and assembled them into 91,125 unigenes with a mean length of 463 bp. These unigenes covered most of the publicly available A. marina Sanger ESTs and greatly extended the repertoire of transcripts for this species. A total of 54,497 and 32,637 unigenes were annotated based on homology to sequences in the NCBI non-redundant and the Swiss-Prot protein databases, respectively. Both Gene Ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis revealed some transcriptomic signatures of stress adaptation for this halophytic species. We also detected an extraordinary number of transcripts derived from fungal endophytes and demonstrated the utility of transcriptome sequencing in surveying endophyte diversity without isolating them from plant tissues. Additionally, we identified 3,423 candidate simple sequence repeats (SSRs) from 3,141 unigenes with a density of one SSR locus every 8.25 kb of sequence. Our transcriptomic data will provide valuable resources for ecological, genetic and evolutionary studies in A. marina. PMID:25265387
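
    The SSR identification step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say which tool or thresholds the authors used, so the function name `find_ssrs` and the minimum repeat counts in `MIN_REPEATS` are assumptions chosen only for illustration; the code reports perfect repeats of 1 to 6 bp motifs.

```python
import re

# Hypothetical thresholds (not from the abstract): minimum number of
# tandem copies required for each motif length, 1 bp to 6 bp.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq):
    """Return sorted (start, motif, copies) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (for example "AA" reported as a dinucleotide repeat).
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    example = "TTT" + "AG" * 7 + "CCC" + "ATG" * 5 + "GG"
    for start, motif, copies in find_ssrs(example):
        print(start, motif, copies)
```

    Dividing the total assembled length by the number of loci found would then give an SSR density comparable in kind to the "one SSR locus every 8.25 kb" figure, although the exact value depends on the thresholds chosen.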

  17. Targeted Sequencing and Meta-Analysis of Preterm Birth

    PubMed Central

    Schuster, Jessica; McGonnigal, Bethany; Dewan, Andrew; Padbury, James

    2016-01-01

    Understanding the genetic contribution(s) to the risk of preterm birth may lead to the development of interventions for treatment, prediction and prevention. Twin studies suggest heritability of preterm birth is 36–40%. Large epidemiological analyses support a primary maternal origin for recurrence of preterm birth, with little effect of paternal or fetal genetic factors. We exploited an “extreme phenotype” of preterm birth to leverage the likelihood of genetic discovery. We compared variants identified by targeted sequencing of women with 2–3 generations of preterm birth with term controls without history of preterm birth. We used a meta-genomic, bi-clustering algorithm to identify gene sets coordinately associated with preterm birth. We identified 33 genes including 217 variants from 5 modules that were significantly different between cases and controls. The most frequently identified and connected genes in the exome library were IGF1, ATM and IQGAP2. Likewise, SOS1, RAF1 and AKT3 were most frequent in the haplotype library. Additionally, SERPINB8, AZU1 and WASF3 showed significant differences in abundance of variants in the univariate comparison of cases and controls. The biological processes impacted by these gene sets included: cell motility, migration and locomotion; response to glucocorticoid stimulus; signal transduction; metabolic regulation and control of apoptosis. PMID:27163930

  18. Mapping and analysis of Caenorhabditis elegans transcription factor sequence specificities

    PubMed Central

    Narasimhan, Kamesh; Lambert, Samuel A; Yang, Ally WH; Riddell, Jeremy; Mnaimneh, Sanie; Zheng, Hong; Albu, Mihai; Najafabadi, Hamed S; Reece-Hoyes, John S; Fuxman Bass, Juan I; Walhout, Albertha JM; Weirauch, Matthew T; Hughes, Timothy R

    2015-01-01

    Caenorhabditis elegans is a powerful model for studying gene regulation, as it has a compact genome and a wealth of genomic tools. However, identification of regulatory elements has been limited, as DNA-binding motifs are known for only 71 of the estimated 763 sequence-specific transcription factors (TFs). To address this problem, we performed protein binding microarray experiments on representatives of canonical TF families in C. elegans, obtaining motifs for 129 TFs. Additionally, we predict motifs for many TFs that have DNA-binding domains similar to those already characterized, increasing coverage of binding specificities to 292 C. elegans TFs (∼40%). These data highlight the diversification of binding motifs for the nuclear hormone receptor and C2H2 zinc finger families and reveal unexpected diversity of motifs for T-box and DM families. Motif enrichment in promoters of functionally related genes is consistent with known biology and also identifies putative regulatory roles for unstudied TFs. DOI: http://dx.doi.org/10.7554/eLife.06967.001 PMID:25905672

  19. Improved Efficiency and Reliability of NGS Amplicon Sequencing Data Analysis for Genetic Diagnostic Procedures Using AGSA Software

    PubMed Central

    Poulet, Axel; Privat, Maud; Viala, Sandrine; Decousus, Stephanie; Perin, Axel; Lafarge, Laurence; Ollier, Marie; El Saghir, Nagi S.

    2016-01-01

    Screening for BRCA mutations in women with familial risk of breast or ovarian cancer is an ideal situation for high-throughput sequencing, providing large amounts of low-cost data. However, the 454 (Roche) and Ion Torrent (Thermo Fisher) technologies produce homopolymer-associated indel errors, complicating their use in routine diagnostics. We developed software, named AGSA, which helps to detect false positive mutations in homopolymeric sequences. Seventy-two familial breast cancer cases were analysed in parallel by amplicon 454 pyrosequencing and Sanger dideoxy sequencing for genetic variations of the BRCA genes. All 565 variants detected by dideoxy sequencing were also detected by pyrosequencing. Furthermore, pyrosequencing detected 42 variants that were missed with the Sanger technique. Six amplicons contained homopolymer tracts in the coding sequence that were systematically misread by the software supplied by Roche. Read data plotted as histograms by AGSA software aided the analysis considerably and allowed validation of the majority of homopolymers. As an optimisation, an additional 250 patients were analysed using microfluidic amplification of regions of interest (Access Array, Fluidigm) of the BRCA genes, followed by 454 sequencing and AGSA analysis. AGSA complements a complete line of high-throughput diagnostic sequence analysis, reducing time and costs while increasing reliability, notably for homopolymer tracts. PMID:27656653
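
    To make the homopolymer problem concrete, the sketch below locates homopolymer tracts in a reference sequence and checks whether a called indel position falls inside one. This is not the AGSA method, which the abstract describes only as histogram-based; the function names and the `min_len=4` threshold are assumptions for illustration.

```python
from itertools import groupby


def homopolymer_runs(seq, min_len=4):
    """Return (start, base, length) for each run of one base >= min_len."""
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs


def in_homopolymer(pos, runs):
    """True if the 0-based position lies inside any listed run."""
    return any(start <= pos < start + length for start, _, length in runs)


if __name__ == "__main__":
    reference = "ACG" + "T" * 6 + "A" + "G" * 4 + "CA"
    runs = homopolymer_runs(reference)
    print(runs)
    # An indel called at position 5 would sit in the T tract and
    # could be flagged for manual review.
    print(in_homopolymer(5, runs), in_homopolymer(9, runs))
```

    Indel calls that land inside such tracts are the ones a reviewer would most want to confirm by an independent method such as Sanger sequencing.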

  1. Moment tensor analysis of the Central Italy Earthquake Sequence of September-October 1997

    NASA Astrophysics Data System (ADS)

    Ekström, Göran; Morelli, Andrea; Boschi, Enzo; Dziewonski, Adam M.

    The larger earthquakes in the Umbria-Marche (central Italy) seismic sequence of September-October 1997 are analyzed using long-period seismograms from the Mediterranean seismographic network (MedNet) and additional data from the global seismographic network (GSN). We modify the Harvard centroid-moment tensor (CMT) algorithm to allow moment tensor inversion of long-period waveforms, primarily Rayleigh and Love waves, for small earthquakes at local to regional distances (Δ<15°). For the three largest earthquakes (MW>5.5) in the sequence, moment tensors have previously been determined using teleseismic waveforms and standard methods of analysis; our results agree well with those of earlier studies. We determine additional moment tensors for the largest foreshock and 10 aftershocks with MW>4.2. The earthquakes are characterized by normal faulting mechanisms, with a NE-SW tension axis, and the presumed fault plane dips towards the SW. Only one of the fourteen events studied has a different faulting geometry, indicating instead right-lateral strike-slip faulting on a plane oriented approximately E-W, or left-lateral faulting on a plane oriented N-S. The September 26 mainshock (09:40 UT) accounts for only approximately 50% of the total moment release in the sequence.

  2. Indium-catalyzed, novel route to β,β-disubstituted indanones via tandem Nakamura addition-hydroarylation-decarboxylation sequence.

    PubMed

    Rajesh, Nimmakuri; Prajapati, Dipak

    2015-02-25

    A novel method for the construction of β,β-disubstituted indanones has been developed via a tandem Nakamura addition-hydroarylation-decarboxylation process. Indium(III) triflate was demonstrated to be a versatile multitasking catalyst, which catalyzes three different chemical transformations under one-pot conditions.

  3. Maintaining Respect for the Past and Flexibility for the Future: Additions and Renovations as an Integrated Sequence.

    ERIC Educational Resources Information Center

    Swedberg, Dan

    As an alternative to new construction or consolidation, many rural communities are considering the option of retaining their existing schools, upgrading them through renovations, and providing community-sensitive and effective additions as needed. The feeling of being connected to one's community can be enhanced by the continuity of community…

  4. Analysis of sequences conferring autonomous replication in baker's yeast.

    PubMed

    Kearsey, S

    1983-01-01

    A method is presented for rapid sequencing and mapping of elements which support autonomous replication in yeast. The strategy relies on a novel phage M13 vector which allows detection of ARS (autonomously replicating sequence) function in cloned fragments. Deletion mapping of an ARS element linked to the HO gene of Saccharomyces cerevisiae has identified a 57-bp region 3' to the gene, which is essential for autonomous replication. This region shows sequence homology to other ARS elements.

  5. CloVR: A virtual machine for automated and portable sequence analysis from the desktop using cloud computing

    PubMed Central

    2011-01-01

    Background: Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom-built virtual machines to distribute pre-packaged, pre-configured software. Results: We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition, CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. Conclusion: The CloVR VM and associated architecture lower the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high-throughput data processing. PMID:21878105

  6. The MPI Bioinformatics Toolkit for protein sequence analysis

    PubMed Central

    Biegert, Andreas; Mayer, Christian; Remmert, Michael; Söding, Johannes; Lupas, Andrei N.

    2006-01-01

    The MPI Bioinformatics Toolkit is an interactive web service which offers access to a great variety of public and in-house bioinformatics tools. They are grouped into different sections that support sequence searches, multiple alignment, secondary and tertiary structure prediction and classification. Several public tools are offered in customized versions that extend their functionality. For example, PSI-BLAST can be run against regularly updated standard databases, customized user databases or selectable sets of genomes. Another tool, Quick2D, integrates the results of various secondary structure, transmembrane and disorder prediction programs into one view. The Toolkit provides a friendly and intuitive user interface with an online help facility. As a key feature, various tools are interconnected so that the results of one tool can be forwarded to other tools. One could run PSI-BLAST, parse out a multiple alignment of selected hits and send the results to a cluster analysis tool. The Toolkit framework and the tools developed in-house will be packaged and freely available under the GNU Lesser General Public Licence (LGPL). The Toolkit can be accessed at http://toolkit.tuebingen.mpg.de. PMID:16845021

  7. Automated synthesis and sequence analysis of biological macromolecules

    SciTech Connect

    Smith, L.M.

    1988-03-15

    The traditional distinctions between the fields of physics, chemistry, and biology have blurred with time. As the important questions in biological research have become increasingly detailed and molecular in nature, the techniques needed to answer these questions have drawn increasingly on principles and methods usually ascribed to the fields of physics and chemistry. This fusion has resulted in the instruments and chemistries that constitute the technological foundations of modern biology and that are critical components in the new methods responsible for the explosive growth of modern biology during the last decade. Many of these instruments, such as microscopes and spectrophotometers, have existed for decades; however, technological advances such as the use of imaging methods in NMR have greatly expanded their power and versatility. In the past several years, a new generation of instruments, whose everyday use has had revolutionary consequences, has come into existence. Central among these are the instruments concerned with the synthesis and sequence analysis of the two major biopolymers, protein and DNA. This article contains descriptions of these instruments, the chemistries on which they are based, and some of their manifold applications.

  8. Predictive sequence analysis of the Candidatus Liberibacter asiaticus proteome.

    PubMed

    Cong, Qian; Kinch, Lisa N; Kim, Bong-Hyun; Grishin, Nick V

    2012-01-01

    Candidatus Liberibacter asiaticus (Ca. L. asiaticus) is a parasitic gram-negative bacterium that is closely associated with Huanglongbing (HLB), a worldwide citrus disease. Given the difficulty in culturing the bacterium and thus in its experimental characterization, computational analyses of the whole Ca. L. asiaticus proteome can provide much needed insights into the mechanisms of the disease and guide the development of treatment strategies. In this study, we applied state-of-the-art sequence analysis tools to every Ca. L. asiaticus protein. Our results are available as a public website at http://prodata.swmed.edu/liberibacter_asiaticus/. In particular, we manually curated the results to predict the subcellular localization, spatial structure and function of all Ca. L. asiaticus proteins (http://prodata.swmed.edu/liberibacter_asiaticus/curated/). This extensive information should facilitate the study of Ca. L. asiaticus proteome function and its relationship to disease. Pilot studies based on the information from our website have revealed several potential virulence factors, discussed herein. PMID:22815919

  9. Predictive Sequence Analysis of the Candidatus Liberibacter asiaticus Proteome

    PubMed Central

    Cong, Qian; Kinch, Lisa N.; Kim, Bong-Hyun; Grishin, Nick V.

    2012-01-01

    Candidatus Liberibacter asiaticus (Ca. L. asiaticus) is a parasitic Gram-negative bacterium that is closely associated with Huanglongbing (HLB), a worldwide citrus disease. Given the difficulty in culturing the bacterium and thus in its experimental characterization, computational analyses of the whole Ca. L. asiaticus proteome can provide much needed insights into the mechanisms of the disease and guide the development of treatment strategies. In this study, we applied state-of-the-art sequence analysis tools to every Ca. L. asiaticus protein. Our results are available as a public website at http://prodata.swmed.edu/liberibacter_asiaticus/. In particular, we manually curated the results to predict the subcellular localization, spatial structure and function of all Ca. L. asiaticus proteins (http://prodata.swmed.edu/liberibacter_asiaticus/curated/). This extensive information should facilitate the study of Ca. L. asiaticus proteome function and its relationship to disease. Pilot studies based on the information from our website have revealed several potential virulence factors, discussed herein. PMID:22815919

  10. The DNA sequence and analysis of human chromosome 14.

    PubMed

    Heilig, Roland; Eckenberg, Ralph; Petit, Jean-Louis; Fonknechten, Núria; Da Silva, Corinne; Cattolico, Laurence; Levy, Michaël; Barbe, Valérie; de Berardinis, Véronique; Ureta-Vidal, Abel; Pelletier, Eric; Vico, Virginie; Anthouard, Véronique; Rowen, Lee; Madan, Anup; Qin, Shizhen; Sun, Hui; Du, Hui; Pepin, Kymberlie; Artiguenave, François; Robert, Catherine; Cruaud, Corinne; Brüls, Thomas; Jaillon, Olivier; Friedlander, Lucie; Samson, Gaelle; Brottier, Philippe; Cure, Susan; Ségurens, Béatrice; Anière, Franck; Samain, Sylvie; Crespeau, Hervé; Abbasi, Nissa; Aiach, Nathalie; Boscus, Didier; Dickhoff, Rachel; Dors, Monica; Dubois, Ivan; Friedman, Cynthia; Gouyvenoux, Michel; James, Rose; Madan, Anuradha; Mairey-Estrada, Barbara; Mangenot, Sophie; Martins, Nathalie; Ménard, Manuela; Oztas, Sophie; Ratcliffe, Amber; Shaffer, Tristan; Trask, Barbara; Vacherie, Benoit; Bellemere, Chadia; Belser, Caroline; Besnard-Gonnet, Marielle; Bartol-Mavel, Delphine; Boutard, Magali; Briez-Silla, Stéphanie; Combette, Stephane; Dufossé-Laurent, Virginie; Ferron, Carolyne; Lechaplais, Christophe; Louesse, Claudine; Muselet, Delphine; Magdelenat, Ghislaine; Pateau, Emilie; Petit, Emmanuelle; Sirvain-Trukniewicz, Peggy; Trybou, Arnaud; Vega-Czarny, Nathalie; Bataille, Elodie; Bluet, Elodie; Bordelais, Isabelle; Dubois, Maria; Dumont, Corinne; Guérin, Thomas; Haffray, Sébastien; Hammadi, Rachid; Muanga, Jacqueline; Pellouin, Virginie; Robert, Dominique; Wunderle, Edith; Gauguet, Gilbert; Roy, Alice; Sainte-Marthe, Laurent; Verdier, Jean; Verdier-Discala, Claude; Hillier, LaDeana; Fulton, Lucinda; McPherson, John; Matsuda, Fumihiko; Wilson, Richard; Scarpelli, Claude; Gyapay, Gábor; Wincker, Patrick; Saurin, William; Quétier, Francis; Waterston, Robert; Hood, Leroy; Weissenbach, Jean

    2003-02-01

    Chromosome 14 is one of five acrocentric chromosomes in the human genome. These chromosomes are characterized by a heterochromatic short arm that contains essentially ribosomal RNA genes, and a euchromatic long arm in which most, if not all, of the protein-coding genes are located. The finished sequence of human chromosome 14 comprises 87,410,661 base pairs, representing 100% of its euchromatic portion, in a single continuous segment covering the entire long arm with no gaps. Two loci of crucial importance for the immune system, as well as more than 60 disease genes, have been localized so far on chromosome 14. We identified 1,050 genes and gene fragments, and 393 pseudogenes. On the basis of comparisons with other vertebrate genomes, we estimate that more than 96% of the chromosome 14 genes have been annotated. From an analysis of the CpG island occurrences, we estimate that 70% of these annotated genes are complete at their 5' end. PMID:12508121
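
    The abstract refers to an analysis of CpG island occurrences without stating the criteria used. As an illustrative sketch only, the function below computes the two quantities most commonly used to screen a window for CpG-island character: GC content and the observed/expected CpG ratio. The function name `cpg_stats` is hypothetical, and the thresholds mentioned after the code are the widely cited Gardiner-Garden and Frommer values, not values taken from this record.

```python
def cpg_stats(seq):
    """Return (gc_content, observed_over_expected_cpg) for a DNA window."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself
    gc_content = (c_count + g_count) / n
    # Expected CpG count if C and G occurred independently in the window.
    expected = c_count * g_count / n
    ratio = cpg_count / expected if expected else 0.0
    return gc_content, ratio


if __name__ == "__main__":
    print(cpg_stats("CGCGCGCG"))
    print(cpg_stats("ATATAT"))
```

    Under the commonly cited criteria, a window of at least 200 bp with GC content of at least 0.5 and a ratio of at least 0.6 would be treated as a candidate CpG island.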

  11. Sequence analysis of novel CYP4 transcripts from Mytilus galloprovincialis.

    PubMed

    Ravlić, Sanda; Žučko, Jurica; Tanković, Mirta Smodlaka; Fafanđel, Maja; Bihari, Nevenka

    2015-07-01

    Cytochrome P450 enzymes (CYPs) are essential components of the cellular detoxification system. We identified and characterized seven new cytochrome P450 gene transcript clusters in populations of the bivalve mollusc Mytilus galloprovincialis from three different locations. The phylogenetic analysis identified all transcripts as clusters within the CYP4 branch. Identified clusters, each comprising a number of transcript variants, were designated CYP4Y1, Y2, Y3, Y4, Y5, Y6 and Y7. Transcript clusters CYP4Y2 and Y7, and CYP4Y5 and Y6 showed site specificity, while the transcript clusters CYP4Y1, Y3 and Y4 were present at all investigated locations. Comparison of the amino acid sequences deduced from the transcripts with CYP4s from vertebrate and invertebrate species showed high conservation of the residues and domains essential to the putative functions of the enzyme, such as terminal ω-hydroxylation and prostaglandin hydroxylation. Our results suggest a great expansion of the CYP4Y cDNAs, indicative of CYP4 proteins, in the mussel M. galloprovincialis, presumably as a response to different environmental conditions.

  12. The MPI Bioinformatics Toolkit for protein sequence analysis.

    PubMed

    Biegert, Andreas; Mayer, Christian; Remmert, Michael; Söding, Johannes; Lupas, Andrei N

    2006-07-01

    The MPI Bioinformatics Toolkit is an interactive web service which offers access to a great variety of public and in-house bioinformatics tools. They are grouped into different sections that support sequence searches, multiple alignment, secondary and tertiary structure prediction and classification. Several public tools are offered in customized versions that extend their functionality. For example, PSI-BLAST can be run against regularly updated standard databases, customized user databases or selectable sets of genomes. Another tool, Quick2D, integrates the results of various secondary structure, transmembrane and disorder prediction programs into one view. The Toolkit provides a friendly and intuitive user interface with an online help facility. As a key feature, various tools are interconnected so that the results of one tool can be forwarded to other tools. One could run PSI-BLAST, parse out a multiple alignment of selected hits and send the results to a cluster analysis tool. The Toolkit framework and the tools developed in-house will be packaged and freely available under the GNU Lesser General Public Licence (LGPL). The Toolkit can be accessed at http://toolkit.tuebingen.mpg.de.

  13. Complete genome sequence analysis of novel human bocavirus reveals genetic recombination between human bocavirus 2 and human bocavirus 4.

    PubMed

    Khamrin, Pattara; Okitsu, Shoko; Ushijima, Hiroshi; Maneekarn, Niwat

    2013-07-01

    Epidemiological surveillance of human bocavirus (HBoV) was conducted on fecal specimens collected from hospitalized children with diarrhea in Chiang Mai, Thailand in 2011. By partial sequence analysis of the VP1 gene, an unusual strain of HBoV (CMH-S011-11) was initially identified as HBoV4. The complete genome of CMH-S011-11 was sequenced and analyzed further to clarify whether it was a recombinant strain or a new HBoV variant. Analysis of the complete genome sequence revealed that the coding sequence from NS1 through NP1 to VP1/VP2 was 4795 nucleotides long. Interestingly, the nucleotide sequence of the NS1 gene of CMH-S011-11 was most closely related to the HBoV2 reference strains detected in Pakistan, which contradicted the initial genotyping result for the partial VP1 region in the previous study. In addition, comparison of the NP1 nucleotide sequence of CMH-S011-11 with those of other HBoV1-4 reference strains also revealed a high level of sequence identity with HBoV2. On the other hand, the nucleotide sequence of the VP1/VP2 gene of CMH-S011-11 was most closely related to those of HBoV4 reference strains detected in Nigeria. The overall full-length sequence analysis revealed that CMH-S011-11 grouped within the HBoV4 species, but was located on a separate branch from other HBoV4 prototype strains. Recombination analysis revealed that CMH-S011-11 was the result of recombination between HBoV2 and HBoV4 strains with the break point located near the start codon of VP2.

  14. Advanced accident sequence precursor analysis level 1 models

    SciTech Connect

    Sattison, M.B.; Thatcher, T.A.; Knudsen, J.K.; Schroeder, J.A.; Siu, N.O.

    1996-03-01

    INEL has been involved in the development of plant-specific Accident Sequence Precursor (ASP) models for the past two years. These models were developed for use with the SAPHIRE suite of PRA computer codes. They contained event tree/linked fault tree Level 1 risk models for the following initiating events: general transient, loss-of-offsite-power, steam generator tube rupture, small loss-of-coolant-accident, and anticipated transient without scram. Early in 1995 the ASP models were revised based on review comments from the NRC and an independent peer review. These models were released as Revision 1. The Office of Nuclear Regulatory Research has sponsored several projects at the INEL this fiscal year to further enhance the capabilities of the ASP models. Revision 2 models incorporate more detailed plant information concerning plant response to station blackout conditions, information on battery life, and other unique features gleaned from an Office of Nuclear Reactor Regulation quick review of the Individual Plant Examination submittals. These models are currently being delivered to the NRC as they are completed. A related project is a feasibility study and model development of low power/shutdown (LP/SD) and external event extensions to the ASP models. This project will establish criteria for selection of LP/SD and external initiator operational events for analysis within the ASP program. Prototype models for each pertinent initiating event (loss of shutdown cooling, loss of inventory control, fire, flood, seismic, etc.) will be developed. A third project concerns development of enhancements to SAPHIRE. In relation to the ASP program, a new SAPHIRE module, GEM, was developed as a specific user interface for performing ASP evaluations. This module greatly simplifies the analysis process for determining the conditional core damage probability for a given combination of initiating events and equipment failures or degradations.

  15. Gene annotation and functional analysis of a newly sequenced Synechococcus strain.

    PubMed

    Li, Y; Rao, N N; Yang, Y; Zhang, Y; Gu, Y N

    2015-10-16

    Synechococcus sp. PCC 7336 represents a newly sequenced strain, and its genome is clearly different from that of other Synechococcus strains. In this analysis, local alignment and annotation databases were constructed and combined with various bioinformatic tools to carry out gene annotation and functional analysis of this strain. From this analysis, we identified 5096 protein-coding genes and 47 RNA genes. Of these, 116 genes that were classified into 9 categories were associated with photosynthesis, and type V polymerase proteins that were identified are unique to this strain. An additional 107 genes were closely related to signal transduction pathways, which primarily comprised parts of two-component regulatory systems. Gene Ontology analysis showed that 2377 genes were annotated with a total of 9791 functional categories, and specifically that 41 genes distributed in 4 protein complexes were involved in oxidative phosphorylation. Clusters of orthologous groups classification showed that there were 1463 homologous proteins associated with 17 specific metabolic pathways, and that most of the proteins participated in primary metabolic processes such as binding and catalysis. The phylogenetic tree based on 16S rRNA sequences indicated that Synechococcus PCC 7336 is highly likely to represent a new branch.

  16. Recombinant albumins containing additional peptide sequences smaller than barbourin retain the ability of barbourin-albumin to inhibit platelet aggregation.

    PubMed

    Sheffield, William P; Wilson, Brianna; Eltringham-Smith, Louise J; Gataiance, Sharon; Bhakta, Varsha

    2005-05-01

    The previously described fusion protein BLAH(6) (Marques JA et al., Thromb Haemost 2001; 86: 902-8) is a recombinant protein that combines the small disintegrin barbourin with hexahistidine-tagged rabbit serum albumin (RSA) produced in Pichia pastoris yeast. We sought to determine: (1) if BLAH(6) was immunogenic; and (2) if its barbourin domain could be productively replaced with smaller peptides. Purified BLAH(6) was injected into rabbits, and anti-barbourin antibodies were universally detected in plasma 28 days later; BLAH(6) was, however, equally effective in reducing platelet aggregation in both naive and pre-treated rabbits. Thrombocytopenia was not observed, and complexing BLAH(6) to alpha(IIb)beta(3) had no effect on antibody detection. The barbourin moiety of BLAH(6) was replaced with each of four sequences: PepI (VCKGDWPC); PepII (VCRGDWPC); PepIII (barbourin 41-54); and PepIV (LPSPGDWR). The corresponding fusion proteins were tested for their ability to inhibit ADP-induced platelet aggregation. PepIII-LAH(6) inhibited neither rabbit nor human platelets. PepI-LAH(6) and PepIV-LAH(6) inhibited rabbit platelet aggregation as effectively as BLAH(6), but PepIV-LAH(6) did not inhibit human platelet aggregation. PepI-LAH(6) and PepII-LAH(6) inhibited human platelet aggregation with IC(50)s 10- and 20-fold higher than BLAH(6). Cross-immunoprecipitation assays with human platelet lysates confirmed that all proteins and peptides interacted with the platelet integrin alpha(IIb)beta(3), but with greatly varying affinities. Our results suggest that the antiplatelet activity of BLAH(6) can be retained in albumin fusion proteins in which smaller peptides replace the barbourin domain; these proteins may be less immunogenic than BLAH(6).

  17. Integration of Seismic Sequence Analysis and High Resolution Sequence Stratigraphy for Delineating the Sedimentation Characteristics and Modeling of Baltim Area, Off-Shore Nile Delta, Egypt

    NASA Astrophysics Data System (ADS)

    Nasr El-Deen Badawy, A. M. E. S.; Abu El-Ata, A. S. A.; El-Gendy, N. H.

    2014-12-01

    The current study discusses the Messinian prospectivity of the study area, which is located in the offshore Nile Delta, about 25 km from the Mediterranean Sea shoreline. An integrated exploration approach was applied, using a variety of 2D/3D seismic data, subsurface borehole geologic and log data from selected wells distributed in the study area, and geophysical and biostratigraphic data. The well data comprise well markers and electric logs; the geological data comprise lithostratigraphic information and ditch-sample analysis of the studied interval. The geophysical data include check shots, VSP, velocity cubes and 3D seismic lines. Biostratigraphic data include biozones, benthonic to planktonic ratios, nannofossils and foraminiferal data. Seismic interpretation and seismic stratigraphic analysis, in the form of seismic sequence analysis, seismic facies analysis, seismic unit analysis and geologic confirmation, were carried out with the aid of the Petrel and Kingdom software packages. The seismic lines were interpreted to define the different parasequences and to pick the various smaller sequences for mapping; picking each sequence from the seismic correlation facilitated mapping every sequence laterally. In addition, the structures and isopach of every sequence were interpreted, and seismic attributes were extracted for every sequence to identify the sands present in each sequence and to study the extent of these sands, which act as a reservoir. The integration of all results was taken as a base to produce the various models for the study area. The first was the depositional environmental model, which showed that the area varies from intertidal-littoral southward at the Nidoco wells to inner-middle neritic at the Baltim East wells, then to outer neritic, and changes to bathyal and then to abyssal at the extreme north. The geologic model for the area was constructed

  18. Differentiation of sheep pox and goat poxviruses by sequence analysis and PCR-RFLP of P32 gene.

    PubMed

    Hosamani, Madhusudan; Mondal, Bimalendu; Tembhurne, Prabhakar A; Bandyopadhyay, Santanu Kumar; Singh, Raj Kumar; Rasool, Thaha Jamal

    2004-08-01

    Sheep pox and goat pox are highly contagious viral diseases of small ruminants. These diseases were earlier thought to be caused by a single species of virus, as they are serologically indistinguishable. P32, one of the major immunogenic genes of Capripoxvirus, was isolated and sequenced from two Indian isolates of goat poxvirus (GPV) and a vaccine strain of sheep poxvirus (SPV). The sequences were compared with other P32 sequences of capripoxviruses available in the database. Sequence analysis revealed that sheep pox and goat poxviruses share 97.5 and 94.7% homology at nucleotide and amino acid level, respectively. A major difference between them is the presence of an additional aspartic acid at the 55th position of P32 of sheep poxvirus that is absent in both goat poxvirus and lumpy skin disease virus (LSDV). Further, six unique neutral nucleotide substitutions were observed at positions 77, 275, 403, 552, 867 and 964 in the sequence of goat poxvirus, which can be taken as GPV signature residues. Similar unique nucleotide signatures could also be identified in SPV and LSDV sequences. Phylogenetic analysis showed that members of the Capripoxvirus could be delineated into three distinct clusters of GPV, SPV and LSDV based on the P32 genomic sequence. Using this information, a PCR-RFLP method has been developed for unequivocal genomic differentiation of SPV and GPV.
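
    Pairwise figures such as the 97.5% and 94.7% quoted above are typically obtained by counting identical columns in an alignment. The following is a minimal sketch of that calculation, assuming two pre-aligned strings of equal length with "-" marking gaps; the function name and the toy sequences are illustrative and are not taken from the study.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of compared alignment columns holding the same residue.

    Assumption: columns where both sequences have a gap are skipped;
    a gap opposite a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # uninformative column
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy example: one mismatch in four columns.
print(percent_identity("ACGT", "ACGA"))
```

    Other conventions (for example, excluding all gapped columns) give slightly different percentages, so the denominator should be stated whenever such values are compared.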

  19. D20S16 is a complex interspersed repeated sequence: Genetic and physical analysis of the locus

    SciTech Connect

    Bowden, D.W.; Krawchuk, M.D.; Howard, T.D.

    1995-01-20

    The genomic structure of the D20S16 locus has been evaluated using genetic and physical methods. D20S16, originally detected with the probe CRI-L1214, is a highly informative, complex restriction fragment length polymorphism consisting of two separate allelic systems. The allelic systems have the characteristics of conventional VNTR polymorphisms and are separated by recombination (θ = 0.02, Z_max = 74.82), as demonstrated in family studies. Most of these recombination events are meiotic crossovers and are maternal in origin, but two, including deletion of the locus in a cell line from a CEPH family member, occur without evidence for exchange of flanking markers. DNA sequence analysis suggests that the basis of the polymorphism is variable numbers of a 98-bp sequence tandemly repeated with 87 to 90% sequence similarity between repeats. The 98-bp repeat is a dimer of 49 bp sequence with 45 to 98% identity between the elements. In addition, nonpolymorphic genomic sequences adjacent to the polymorphic 98-bp repeat tracts are also repeated but are not polymorphic, i.e., show no individual to individual variation. Restriction enzyme mapping of cosmids containing the CRI-L1214 sequence suggests that there are multiple interspersed repeats of the CRI-L1214 sequence on chromosome 20. The results of dual-color fluorescence in situ hybridization experiments with interphase nuclei are also consistent with multiple repeats of an interspersed sequence on chromosome 20. 23 refs., 6 figs.
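
    The recombination fraction and maximum LOD score reported above come from linkage analysis of family data. As a hedged illustration of what a LOD score measures, the sketch below uses the textbook two-point formula for phase-known, fully informative meioses; this is a simplification chosen for illustration, not the computation used in the study, and the counts in the example are made up.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """log10 of L(theta) / L(0.5) for phase-known, fully informative meioses.

    L(theta) = theta**r * (1 - theta)**(n - r), with r recombinants in n meioses.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")
    non_recombinants = meioses - recombinants
    log_likelihood = 0.0
    if recombinants:
        if theta == 0.0:
            return float("-inf")  # any recombinant rules out theta = 0
        log_likelihood += recombinants * math.log10(theta)
    if non_recombinants:
        log_likelihood += non_recombinants * math.log10(1.0 - theta)
    # Dividing by the null likelihood 0.5**n adds n * log10(2).
    return log_likelihood + meioses * math.log10(2.0)


# Hypothetical counts: 1 recombinant in 50 meioses, evaluated at theta = 0.02.
print(round(lod_score(1, 50, 0.02), 2))
```

    Under this model the score peaks at theta = r/n, which is why a small estimated recombination fraction together with a large maximum LOD indicates two tightly linked but separable systems.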

  20. A new method of representing DNA sequences which combines ease of visual analysis with machine readability.

    PubMed Central

    Cowin, J E; Jellis, C H; Rickwood, D

    1986-01-01

    A new method of representing DNA sequences has been devised which is termed stave projection. Compared with other formats for showing the base sequences of DNA, this method greatly enhances the ease of visual analysis of the sequences of bases and it is also in a machine readable form. Using this method it is possible to identify and annotate all of the functional features found in DNA sequences. PMID:3003680

  1. Additional Routes to Staphylococcus aureus Daptomycin Resistance as Revealed by Comparative Genome Sequencing, Transcriptional Profiling, and Phenotypic Studies

    PubMed Central

    Song, Yang; Rubio, Aileen; Jayaswal, Radheshyam K.; Silverman, Jared A.; Wilkinson, Brian J.

    2013-01-01

    Daptomycin is an extensively used anti-staphylococcal agent due to the rise in methicillin-resistant Staphylococcus aureus, but the mechanism(s) of resistance is poorly understood. Comparative genome sequencing, transcriptomics, ultrastructure, and cell envelope studies were carried out on two relatively higher level (4 and 8 µg/ml) laboratory-derived daptomycin-resistant strains (strains CB1541 and CB1540, respectively) compared to their parent strain (CB1118; MW2). Several mutations were found in the strains. Both strains had the same mutations in the two-component system genes walK and agrA. In strain CB1540 mutations were also detected in the ribose phosphate pyrophosphokinase (prs) and polyribonucleotide nucleotidyltransferase genes (pnpA), a hypothetical protein gene, and in an intergenic region. In strain CB1541 there were mutations in clpP, an ATP-dependent protease, and two different hypothetical protein genes. The strain CB1540 transcriptome was characterized by upregulation of cap (capsule) operon genes, genes involved in the accumulation of the compatible solute glycine betaine, ure genes of the urease operon, and mscL encoding a mechanosensitive channel. Downregulated genes included smpB, femAB and femH involved in the formation of the pentaglycine interpeptide bridge, genes involved in protein synthesis and fermentation, and spa encoding protein A. Genes altered in their expression common to both transcriptomes included some involved in glycine betaine accumulation, mscL, ure genes, femH, spa and smpB. However, the CB1541 transcriptome was further characterized by upregulation of various heat shock chaperone and protease genes, consistent with a mutation in clpP, and lytM and sceD. Both strains showed slow growth, and strongly decreased autolytic activity that appeared to be mainly due to decreased autolysin production. In contrast to previous common findings, we did not find any mutations in phospholipid biosynthesis genes, and it appears there

  2. De novo sequencing and a comprehensive analysis of purple sweet potato (Impomoea batatas L.) transcriptome.

    PubMed

    Xie, Fuliang; Burklew, Caitlin E; Yang, Yanfang; Liu, Min; Xiao, Peng; Zhang, Baohong; Qiu, Deyou

    2012-07-01

    High-throughput RNA sequencing was performed to comprehensively analyze the transcriptome of the purple sweet potato. A total of 58,800 unigenes were obtained and ranged from 200 nt to 10,380 nt with an average length of 476 nt. The average expression of one unigene was 34 reads per kb per million reads (RPKM) with a maximum expression of 1,935 RPKM. At least 40,280 (68.5%) unigenes were identified to be protein-coding genes, in which 11,978 and 5,184 genes were homologous to Arabidopsis and rice proteins, respectively. Gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) analysis showed that 19,707 (33.5%) unigenes were classified to 1,807 terms of GO including molecular functions, biological processes, and cellular components and 9,970 (17.0%) unigenes were enriched to 11,119 KEGG pathways. We found that at least 3,553 genes may be involved in the biosynthesis pathways of starch, alkaloids, anthocyanin pigments, and vitamins. Additionally, 851 potential simple sequence repeats (SSRs) were identified in all unigenes. Transcriptome sequencing on tuberous roots of the sweet potato yielded substantial transcriptional sequences and potentially useful SSR markers which provide an important data source for sweet potato research. Comparison of two RNA-sequence datasets from the purple and the yellow sweet potato showed that UDP-glucose-flavonoid 3-O-glucosyltransferase was one of the key enzymes in the pathway of anthocyanin biosynthesis and that anthocyanin-3-glucoside might be one of the major components for anthocyanin pigments in the purple sweet potato. This study contributes to understanding of the molecular mechanisms of sweet potato development and metabolism and thereby increases the potential utilization of the sweet potato in food nutrition and pharmacy.
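
    The expression values above are given in RPKM (reads per kb per million reads). A minimal sketch of the standard normalization follows; the function and argument names are illustrative, and the read counts in the example are made up rather than taken from the study.

```python
def rpkm(reads_on_transcript: int, transcript_length_bp: int,
         total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total read count must be positive")
    # 1e9 = 1e3 (bp per kb) * 1e6 (reads per million)
    return reads_on_transcript * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical unigene: 100 reads on a 1,000 nt transcript, 1 million mapped reads.
print(rpkm(100, 1000, 1_000_000))
```

    Because the value is divided by transcript length, a short unigene and a long one with the same RPKM are supported by very different raw read counts.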

  4. Massively parallel sequencing of short tandem repeats-Population data and mixture analysis results for the PowerSeq™ system.

    PubMed

    van der Gaag, Kristiaan J; de Leeuw, Rick H; Hoogenboom, Jerry; Patel, Jaynish; Storts, Douglas R; Laros, Jeroen F J; de Knijff, Peter

    2016-09-01

    Current forensic DNA analysis predominantly involves identification of human donors by analysis of short tandem repeats (STRs) using Capillary Electrophoresis (CE). Recent developments in Massively Parallel Sequencing (MPS) technologies offer new possibilities in analysis of STRs since they might overcome some of the limitations of CE analysis. In this study 17 STRs and Amelogenin were sequenced in high coverage using a prototype version of the Promega PowerSeq™ system for 297 population samples from the Netherlands, Nepal, Bhutan and Central African Pygmies. In addition, 45 two-person mixtures with different minor contributions down to 1% were analysed to investigate the performance of this system for mixed samples. Regarding fragment length, complete concordance between the MPS and CE-based data was found, marking the reliability of MPS PowerSeq™ system. As expected, MPS presented a broader allele range and higher power of discrimination and exclusion rate. The high coverage sequencing data were used to determine stutter characteristics for all loci and stutter ratios were compared to CE data. The separation of alleles with the same length but exhibiting different stutter ratios lowers the overall variation in stutter ratio and helps in differentiation of stutters from genuine alleles in mixed samples. All alleles of the minor contributors were detected in the sequence reads even for the 1% contributions, but analysis of mixtures below 5% without prior information of the mixture ratio is complicated by PCR and sequencing artefacts. PMID:27347657
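
    The stutter ratios discussed above compare the read count of a stutter product with that of its parent allele. The sketch below shows that ratio and one way it might be used to decide whether a candidate in stutter position is more likely a genuine minor-contributor allele; the function names, the tolerance factor and the read counts are hypothetical and are not taken from the study.

```python
def stutter_ratio(stutter_reads: int, parent_reads: int) -> float:
    """Reads supporting the stutter product divided by reads of its parent allele."""
    if parent_reads <= 0:
        raise ValueError("parent allele needs at least one read")
    return stutter_reads / parent_reads


def exceeds_stutter(candidate_reads: int, parent_reads: int,
                    expected_ratio: float, tolerance: float = 2.0) -> bool:
    """True if the candidate has more reads than stutter alone would plausibly explain.

    Assumption: expected_ratio is a locus- and sequence-specific stutter ratio
    estimated from single-source samples; tolerance is an arbitrary safety factor.
    """
    return stutter_ratio(candidate_reads, parent_reads) > expected_ratio * tolerance


# Hypothetical counts: parent allele 1,000 reads, expected stutter ratio 0.08.
print(exceeds_stutter(300, 1000, 0.08))  # candidate well above the stutter range
print(exceeds_stutter(100, 1000, 0.08))  # candidate within the stutter range
```

    Estimating the expected ratio per sequence variant rather than per fragment length is what narrows the spread of ratios, as the abstract notes.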

  5. Analysis of BAC end sequences in oak, a keystone forest tree species, providing insight into the composition of its genome

    PubMed Central

    2011-01-01

    Background: One of the key goals of oak genomics research is to identify genes of adaptive significance. This information may help to improve the conservation of adaptive genetic variation and the management of forests to increase their health and productivity. Deep-coverage large-insert genomic libraries are a crucial tool for attaining this objective. We report herein the construction of a BAC library for Quercus robur, its characterization and an analysis of BAC end sequences. Results: The EcoRI library generated consisted of 92,160 clones, 7% of which had no insert. Levels of chloroplast and mitochondrial contamination were below 3% and 1%, respectively. Mean clone insert size was estimated at 135 kb. The library represents 12 haploid genome equivalents, and the likelihood of finding a particular oak sequence of interest is greater than 99%. Genome coverage was confirmed by PCR screening of the library with 60 unique genetic loci sampled from the genetic linkage map. In total, about 20,000 high-quality BAC end sequences (BESs) were generated by sequencing 15,000 clones. Roughly 5.88% of the combined BAC end sequence length corresponded to known retroelements while ab initio repeat detection methods identified 41 additional repeats. Collectively, characterized and novel repeats account for roughly 8.94% of the genome. Further analysis of the BESs revealed 1,823 putative genes suggesting at least 29,340 genes in the oak genome. BESs were aligned with the genome sequences of Arabidopsis thaliana, Vitis vinifera and Populus trichocarpa. One putative collinear microsyntenic region encoding an alcohol acyl transferase protein was observed between oak and chromosome 2 of V. vinifera. Conclusions: This BAC library provides a new resource for genomic studies, including SSR marker development, physical mapping, comparative genomics and genome sequencing. BES analysis provided insight into the structure of the oak genome. These sequences will be used in the assembly of a
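
    The statement that 12 genome equivalents give a better than 99% chance of recovering a given sequence can be checked with a short calculation. The sketch below assumes clones are sampled uniformly at random from the genome, so that the chance of missing a locus is approximately exp(-coverage); this Poisson approximation and the function names are assumptions made for illustration, not the authors' stated method.

```python
import math


def library_insert_bp(n_clones: int, empty_fraction: float,
                      mean_insert_bp: float) -> float:
    """Total cloned DNA in the library, excluding clones without an insert."""
    return n_clones * (1.0 - empty_fraction) * mean_insert_bp


def prob_sequence_represented(genome_equivalents: float) -> float:
    """Probability that a given locus is in at least one clone (Poisson approximation)."""
    return 1.0 - math.exp(-genome_equivalents)


# Figures from the abstract: 92,160 clones, 7% empty, 135 kb mean insert, 12x coverage.
total_bp = library_insert_bp(92_160, 0.07, 135_000)
print(f"cloned DNA: {total_bp / 1e9:.2f} Gb")
print(f"P(locus represented) at 12x: {prob_sequence_represented(12):.6f}")
```

    Under this approximation the probability already passes 99% at about 4.6 genome equivalents, so a 12-fold library leaves a wide margin for cloning bias and contaminating organellar clones.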

  6. Sequence quality analysis tool for HIV type 1 protease and reverse transcriptase.

    PubMed

    Delong, Allison K; Wu, Mingham; Bennett, Diane; Parkin, Neil; Wu, Zhijin; Hogan, Joseph W; Kantor, Rami

    2012-08-01

    Access to antiretroviral therapy is increasing globally and drug resistance evolution is anticipated. Currently, protease (PR) and reverse transcriptase (RT) sequence generation is increasing, including the use of in-house sequencing assays, and quality assessment prior to sequence analysis is essential. We created a computational HIV PR/RT Sequence Quality Analysis Tool (SQUAT) that runs in the R statistical environment. Sequence quality thresholds are calculated from a large dataset (46,802 PR and 44,432 RT sequences) from the published literature ( http://hivdb.Stanford.edu ). Nucleic acid sequences are read into SQUAT, identified, aligned, and translated. Nucleic acid sequences are flagged if they have >five 1-2-base insertions; >one 3-base insertion; >one deletion; >six PR or >18 RT ambiguous bases; >three consecutive PR or >four RT nucleic acid mutations; >zero stop codons; >three PR or >six RT ambiguous amino acids; >three consecutive PR or >four RT amino acid mutations; >zero unique amino acids; or <0.5% or >15% genetic distance from another submitted sequence. Thresholds are user modifiable. SQUAT output includes a summary report with detailed comments for troubleshooting of flagged sequences, histograms of pairwise genetic distances, neighbor joining phylogenetic trees, and aligned nucleic and amino acid sequences. SQUAT is a stand-alone, free, web-independent tool to ensure use of high-quality HIV PR/RT sequences in interpretation and reporting of drug resistance, while increasing awareness and expertise and facilitating troubleshooting of potentially problematic sequences.
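
    The flagging rules listed above amount to comparing per-sequence counts with upper limits. The sketch below illustrates that idea in Python for a subset of the protease (PR) limits quoted in the abstract. It is not SQUAT's code (SQUAT runs in R), and the metric names and data layout are hypothetical.

```python
# Upper limits for PR sequences, taken from the abstract; a sequence is flagged
# when a count is strictly greater than its limit. Key names are hypothetical.
PR_THRESHOLDS = {
    "insertions_1_2_base": 5,
    "insertions_3_base": 1,
    "deletions": 1,
    "ambiguous_bases": 6,
    "consecutive_nt_mutations": 3,
    "stop_codons": 0,
    "ambiguous_amino_acids": 3,
}


def flag_sequence(metrics: dict, thresholds: dict = PR_THRESHOLDS) -> list:
    """Return the names of all metrics that exceed their threshold."""
    return [name for name, limit in thresholds.items()
            if metrics.get(name, 0) > limit]


# Hypothetical sequence: one stop codon, and ambiguous bases exactly at the limit.
print(flag_sequence({"stop_codons": 1, "ambiguous_bases": 6}))
```

    Keeping the limits in a plain mapping mirrors the abstract's note that thresholds are user modifiable: an RT run would pass a different dictionary (for example, 18 ambiguous bases) instead of changing the function.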

  7. Analysis methods for the determination of anthropogenic additions of P to agricultural soils

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Phosphorus additions and measurement in soil is of concern on lands where biosolids have been applied. Colorimetric analysis for plant-available P may be inadequate for the accurate assessment of soil P. Phosphate additions in a regulatory environment need to be accurately assessed as the reported...

  8. Comparative sequence analysis of cytokine genes from human and nonhuman primates

    SciTech Connect

    Villinger, F.; Brar, S.S.; Mayne, A.

    1995-10-15

    Two major issues severely limit the studies of human recombinant cytokines/growth factors in nonhuman primates. First, assays and reagents specific for the detection and quantitation of human cytokines do not all function when utilized to detect/quantitate the nonhuman primate cytokines. Second, although most of the human cytokines appear to induce similar, if not identical, biologic function when used with cells from nonhuman primates in vitro or in vivo, they invariably induce Ab responses in vivo, precluding their repeated and/or continued use in vivo. Our laboratory has thus initiated studies to clone, sequence, and prepare recombinant cytokines from nonhuman primates and to define assays and reagents for their detection and quantitation at the nucleic acid and protein level. The data that were derived from such studies show that the nonhuman primate cytokines IL-1α, IL-1β, IL-2, IL-4, IL-5, IL-6, IL-8, IL-10, IL-12α, IL-12β, IL-15, IFN-α, IFN-γ, and TNF-α share 93 to 99% homology at the nucleic acid and protein level with the human equivalents. The most prominent differences between human and nonhuman primate cytokine sequences were noted for IL-1α/β, IL-2, IL-8, IFN-α, IFN-γ, and IL-12β. The aligned sequences of cytokines for human and several nonhuman primate species are provided herein, and a phylogenetic analysis of the published sequences of select cytokines from other species, along with those of the nonhuman primates, is described. In addition, comparative analysis of the relative bioactivity of our immunoaffinity-purified recombinant rhesus macaque IL-4, IL-15, and IFN-γ with commercially available human recombinant cytokines is described herein. 40 refs., 5 figs., 2 tabs.

  9. Genomic Analysis Reveals Novel Diversity among the 1976 Philadelphia Legionnaires’ Disease Outbreak Isolates and Additional ST36 Strains

    PubMed Central

    Mercante, Jeffrey W.; Morrison, Shatavia S.; Desai, Heta P.; Raphael, Brian H.; Winchell, Jonas M.

    2016-01-01

    Legionella pneumophila was first recognized as a cause of severe and potentially fatal pneumonia during a large-scale outbreak of Legionnaires’ disease (LD) at a Pennsylvania veterans’ convention in Philadelphia, 1976. The ensuing investigation and recovery of four clinical isolates launched the fields of Legionella epidemiology and scientific research. Only one of the original isolates, “Philadelphia-1”, has been widely distributed or extensively studied. Here we describe the whole-genome sequencing (WGS), complete assembly, and comparative analysis of all Philadelphia LD strains recovered from that investigation, along with L. pneumophila isolates sharing the Philadelphia sequence type (ST36). Analyses revealed that the 1976 outbreak was due to multiple serogroup 1 strains within the same genetic lineage, differentiated by an actively mobilized, self-replicating episome that is shared with L. pneumophila str. Paris, and two large, horizontally-transferred genomic loci, among other polymorphisms. We also found a completely unassociated ST36 strain that displayed remarkable genetic similarity to the historical Philadelphia isolates. This similar strain implies the presence of a potential clonal population, and suggests important implications may exist for considering epidemiological context when interpreting phylogenetic relationships among outbreak-associated isolates. Additional extensive archival research identified the Philadelphia isolate associated with a non-Legionnaire case of “Broad Street pneumonia”, and provided new historical and genetic insights into the 1976 epidemic. This retrospective analysis has underscored the utility of fully-assembled WGS data for Legionella outbreak investigations, highlighting the increased resolution that comes from long-read sequencing and a sequence type-matched genomic data set. PMID:27684472

  10. SFAPS: an R package for structure/function analysis of protein sequences based on informational spectrum method.

    PubMed

    Deng, Su-Ping; Huang, De-Shuang

    2014-10-01

    The R package SFAPS has been developed for structure/function analysis of protein sequences based on the informational spectrum method. The informational spectrum method employs the electron-ion interaction potential parameter as the numerical representation for the protein sequence, and obtains the characteristic frequency of a particular protein interaction after computing the Discrete Fourier Transform for protein sequences. The informational spectrum method is often used to analyze protein sequences, so we developed this software tool, which is implemented as an add-on package to the freely available and widely used statistical language R. Our package is distributed as open source code for Linux, Unix and Microsoft Windows. It is released under the GNU General Public License. The R package, along with its source code and additional material, is freely available at http://mlsbl.tongji.edu.cn/DBdownload.asp.
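
    The two steps described above, numerical encoding followed by a Discrete Fourier Transform, can be sketched in a few lines. The code below is an illustration in Python rather than the SFAPS package itself (which is written in R), and the per-residue values in the example are placeholders for a two-letter toy alphabet, not electron-ion interaction potential values. Subtracting the mean before the transform is also an assumption made here to suppress the zero-frequency term.

```python
import cmath


def informational_spectrum(sequence: str, scale: dict) -> list:
    """Return (frequency, power) pairs for frequencies 1/N .. 0.5 of an encoded sequence.

    scale maps each residue letter to a number; the caller must supply it.
    """
    values = [scale[residue] for residue in sequence]
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    spectrum = []
    for k in range(1, n // 2 + 1):
        # k-th DFT coefficient of the centered numerical sequence
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(centered))
        spectrum.append((k / n, abs(coeff) ** 2))
    return spectrum


# Toy alphabet with placeholder values: a period-2 pattern peaks at frequency 0.5.
toy_scale = {"A": 1.0, "B": 0.0}
peak = max(informational_spectrum("ABABABAB", toy_scale), key=lambda p: p[1])
print(peak[0])
```

    In the method as described, spectra of interacting proteins are compared to find a shared characteristic frequency; that comparison step is not shown here.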

  11. Deep Sequencing Analysis of Apple Infecting Viruses in Korea

    PubMed Central

    Cho, In-Sook; Igori, Davaajargal; Lim, Seungmo; Choi, Gug-Seoun; Hammond, John; Lim, Hyoun-Sub; Moon, Jae Sun

    2016-01-01

    Deep sequencing generated 52 contigs derived from five viruses, Apple chlorotic leaf spot virus (ACLSV), Apple stem grooving virus (ASGV), Apple stem pitting virus (ASPV), Apple green crinkle associated virus (AGCaV), and Apricot latent virus (ApLV), which were identified from eight apple samples showing small leaves and/or growth retardation. Nucleotide (nt) sequence identity of the assembled contigs was from 68% to 99% compared to the reference sequences of the five respective viral genomes. Sequences of ASPV and ASGV were the most abundantly represented by the 52 contigs assembled. The presence of the five viruses in the samples was confirmed by RT-PCR using specific primers based on the sequences of each assembled contig. All five viruses were detected in three of the samples, whereas all samples had mixed infections with at least two viruses. The most frequently detected virus was ASPV, followed by ASGV, ApLV, ACLSV, and AGCaV, all of which were found in mixed infections in the tested samples. AGCaV was identified in assembled contigs ID 1012480 and 93549, which showed 82% and 78% nt sequence identity with ORF1 of AGCaV isolate Aurora-1. ApLV was identified in three assembled contigs, ID 65587, 1802365, and 116777, which showed 77%, 78%, and 76% nt sequence identity respectively with ORF1 of ApLV isolate LA2. Deep sequencing assay was shown to be a valuable and powerful tool for detection and identification of known and unknown viruses in infected apple trees, here identifying ApLV and AGCaV in commercial orchards in Korea for the first time. PMID:27721694

  12. EST sequencing of Onychophora and phylogenomic analysis of Metazoa.

    PubMed

    Roeding, Falko; Hagner-Holler, Silke; Ruhberg, Hilke; Ebersberger, Ingo; von Haeseler, Arndt; Kube, Michael; Reinhardt, Richard; Burmester, Thorsten

    2007-12-01

    Onychophora (velvet worms) represent a small animal taxon considered to be related to Euarthropoda. We have obtained 1873 5' cDNA sequences (expressed sequence tags, ESTs) from the velvet worm Epiperipatus sp., which were assembled into 833 contigs. BLAST similarity searches revealed that 51.9% of the contigs had matches in the protein databases with expectation values lower than 10^-4. Most ESTs had the best hit with proteins from either Chordata or Arthropoda (approximately 40% each). The ESTs included sequences of 27 ribosomal proteins. The orthologous sequences from 28 other species of a broad range of phyla were obtained from the databases, including other EST projects. A concatenated amino acid alignment comprising 5021 positions was constructed, which covers 4259 positions when problematic regions were removed. Bayesian and maximum likelihood methods place Epiperipatus within the monophyletic Ecdysozoa (Onychophora, Arthropoda, Tardigrada and Nematoda), but its exact relation to the Euarthropoda remained unresolved. The "Articulata" concept was not supported. Tardigrada and Nematoda formed a well-supported monophylum, suggesting that Tardigrada are actually Cycloneuralia. In agreement with previous studies, we have demonstrated that random sequencing of cDNAs results in sequence information suitable for phylogenomic approaches to resolve metazoan relationships. PMID:17933557

  14. Analysis of expressed sequence tags (ESTs) from a normalized cDNA library and isolation of EST simple sequence repeats from the invasive cotton mealybug Phenacoccus solenopsis.

    PubMed

    Li, Hui; Lang, Kun-Ling; Fu, Hai-Bin; Shen, Chang-Peng; Wan, Fang-Hao; Chu, Dong

    2015-12-01

    The cotton mealybug, Phenacoccus solenopsis Tinsley, is a serious and invasive pest. At present, genetic resources for studying P. solenopsis are limited, and this negatively affects genetic research on the organism and, consequently, translational work to improve management of this pest. In the present study, expressed sequence tags (ESTs) were analyzed from a normalized complementary DNA library of P. solenopsis. In addition, EST-derived microsatellite loci (also known as simple sequence repeats or SSRs) were isolated and characterized. A total of 1107 high-quality ESTs were acquired from the library. Clustering and assembly analysis resulted in 785 unigenes, which were classified functionally into 23 categories according to the Gene Ontology database. Seven EST-based SSR markers were developed in this study and are expected to be useful in characterizing how this invasive species was introduced, as well as providing insights into its genetic microevolution.
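
    Isolating EST-derived SSRs starts with scanning sequences for short motifs repeated in tandem. The following is a minimal regular-expression sketch of such a scan; the motif length limit (1 to 6 nt) and the minimum of five repeats are assumptions chosen for illustration, not the criteria used in the study, and the example sequence is made up.

```python
import re


def find_ssrs(sequence: str, min_repeats: int = 5, max_motif: int = 6) -> list:
    """Return (start, motif, repeat_count) for each perfect tandem repeat found.

    The lazy quantifier tries the shortest motif first, so a run of AC repeats
    is reported with motif 'AC' rather than 'ACAC'.
    """
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


# Toy sequence containing (AC)5 starting at position 2 (0-based).
print(find_ssrs("GGACACACACACTT"))
```

    Dedicated SSR tools usually apply different minimum repeat numbers per motif length and also report compound or imperfect repeats, which this sketch does not attempt.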

  15. Quantitative Analysis of Polymer Additives with MALDI-TOF MS Using an Internal Standard Approach

    NASA Astrophysics Data System (ADS)

    Schwarzinger, Clemens; Gabriel, Stefan; Beißmann, Susanne; Buchberger, Wolfgang

    2012-06-01

    MALDI-TOF MS is used for the qualitative analysis of seven different polymer additives directly from the polymer without tedious sample pretreatment. Additionally, by using a solid sample preparation technique, which avoids the concentration gradient problems known to occur with dried droplets and by adding tetraphenylporphyrine as an internal standard to the matrix, it is possible to perform quantitative analysis of additives directly from the polymer sample. Calibration curves for Tinuvin 770, Tinuvin 622, Irganox 1024, Irganox 1010, Irgafos 168, and Chimassorb 944 are presented, showing coefficients of determination between 0.911 and 0.990.

  16. Quantitative analysis of polymer additives with MALDI-TOF MS using an internal standard approach.

    PubMed

    Schwarzinger, Clemens; Gabriel, Stefan; Beißmann, Susanne; Buchberger, Wolfgang

    2012-06-01

    MALDI-TOF MS is used for the qualitative analysis of seven different polymer additives directly from the polymer without tedious sample pretreatment. Additionally, by using a solid sample preparation technique, which avoids the concentration gradient problems known to occur with dried droplets and by adding tetraphenylporphyrine as an internal standard to the matrix, it is possible to perform quantitative analysis of additives directly from the polymer sample. Calibration curves for Tinuvin 770, Tinuvin 622, Irganox 1024, Irganox 1010, Irgafos 168, and Chimassorb 944 are presented, showing coefficients of determination between 0.911 and 0.990.
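
    Quantitation with an internal standard, as described above, usually means regressing the ratio of analyte signal to internal-standard signal against known concentration and reporting the coefficient of determination. The sketch below shows that calculation with an ordinary least-squares line; the function names and the calibration data are made up for illustration and are not the authors' data or procedure.

```python
def response_ratios(analyte_signal: list, standard_signal: list) -> list:
    """Analyte peak intensity divided by internal-standard peak intensity, per spectrum."""
    return [a / s for a, s in zip(analyte_signal, standard_signal)]


def linear_fit_r2(x: list, y: list) -> tuple:
    """Least-squares slope, intercept and coefficient of determination R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical calibration: additive concentration (arbitrary units) vs. signals.
concentration = [1.0, 2.0, 3.0, 4.0]
ratios = response_ratios([2.1, 3.9, 6.2, 7.8], [1.0, 1.0, 1.0, 1.0])
slope, intercept, r2 = linear_fit_r2(concentration, ratios)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r2:.4f}")
```

    Dividing by the internal-standard signal is what compensates for shot-to-shot intensity variation, so the ratio rather than the raw analyte intensity is the quantity being calibrated.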

  17. Tissue-specific transcriptome sequencing analysis expands the non-human primate reference transcriptome resource (NHPRTR)

    PubMed Central

    Peng, Xinxia; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Nishida, Andrew; Pipes, Lenore; Bozinoski, Marjan; Thomas, Matthew J.; Kelly, Sara; Weiss, Jeffrey M.; Raveendran, Muthuswamy; Muzny, Donna; Gibbs, Richard A.; Rogers, Jeffrey; Schroth, Gary P.; Katze, Michael G.; Mason, Christopher E.

    2015-01-01

    The non-human primate reference transcriptome resource (NHPRTR, available online at http://nhprtr.org/) aims to generate comprehensive RNA-seq data from a wide variety of non-human primates (NHPs), from lemurs to hominids. In the 2012 Phase I of the NHPRTR project, 19 billion fragments or 3.8 terabases of transcriptome sequences were collected from pools of ∼20 tissues in 15 species and subspecies. Here we describe a major expansion of NHPRTR by adding 10.1 billion fragments of tissue-specific RNA-seq data. For this effort, we selected 11 of the original 15 NHP species and subspecies and constructed total RNA libraries for the same ∼15 tissues in each. The sequence quality is such that 88% of the reads align to human reference sequences, allowing us to compute the full list of expression abundance across all tissues for each species, using the reads mapped to human genes. This update also includes improved transcript annotations derived from RNA-seq data for rhesus and cynomolgus macaques, two of the most commonly used NHP models, and additional RNA-seq data compiled from related projects. Together, these comprehensive reference transcriptomes from multiple primates serve as a valuable community resource for genome annotation, gene dynamics and comparative functional analysis. PMID:25392405

  18. Analysis of transposable elements in the genome of Asparagus officinalis from high coverage sequence data.

    PubMed

    Li, Shu-Fen; Gao, Wu-Jun; Zhao, Xin-Peng; Dong, Tian-Yu; Deng, Chuan-Liang; Lu, Long-Dou

    2014-01-01

    Asparagus officinalis is an economically and nutritionally important vegetable crop that is widely cultivated and is used as a model dioecious species to study plant sex determination and sex chromosome evolution. To improve our understanding of its genome composition, especially with respect to transposable elements (TEs), which make up the majority of the genome, we performed Illumina HiSeq2000 sequencing of both male and female asparagus genomes followed by bioinformatics analysis. We generated 17 Gb of sequence (12× coverage) and assembled it into 163,406 scaffolds with a total cumulative length of 400 Mbp, which represents about 30% of the asparagus genome. Overall, TEs masked about 53% of the A. officinalis assembly. The majority of the identified TEs belonged to LTR retrotransposons, which constitute about 28% of genomic DNA, with Ty1/copia elements being more diverse and accumulated to higher copy numbers than Ty3/gypsy. Compared with LTR retrotransposons, non-LTR retrotransposons and DNA transposons were relatively rare. In addition, comparison of the abundance of the TE groups between male and female genomes showed that the overall TE composition was highly similar, with only slight differences in the abundance of several TE groups, which is consistent with the relatively recent origin of asparagus sex chromosomes. This study greatly improves our knowledge of the repetitive sequence composition of asparagus, which facilitates the identification of TEs responsible for the early evolution of plant sex chromosomes and is helpful for further studies on this dioecious plant.
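
    The sequencing figures quoted above can be cross-checked with simple arithmetic, since coverage is total sequenced bases divided by genome size. The sketch below is only a consistency check. The genome size is inferred from the reported 17 Gb and 12× values and is not a figure stated in the abstract, and the function names are hypothetical.

```python
def implied_genome_size(total_bases, coverage):
    """Genome size implied by coverage = total_bases / genome_size."""
    return total_bases / coverage


def assembled_fraction(assembly_bases, genome_bases):
    """Share of the genome represented by the assembly."""
    return assembly_bases / genome_bases


if __name__ == "__main__":
    genome = implied_genome_size(17e9, 12)       # 17 Gb of sequence at 12x
    fraction = assembled_fraction(400e6, genome)  # 400 Mbp of scaffolds
    print(f"{genome / 1e9:.2f} Gb, {fraction:.0%}")  # prints "1.42 Gb, 28%"
```

    The result, roughly 28%, agrees with the abstract's statement that the scaffolds represent about 30% of the genome.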

  19. Targeted Next‐Generation Sequencing Analysis of 1,000 Individuals with Intellectual Disability

    PubMed Central

    Grozeva, Detelina; Carss, Keren; Spasic‐Boskovic, Olivera; Tejada, Maria‐Isabel; Gecz, Jozef; Shaw, Marie; Corbett, Mark; Haan, Eric; Thompson, Elizabeth; Friend, Kathryn; Hussain, Zaamin; Hackett, Anna; Field, Michael; Renieri, Alessandra; Stevenson, Roger; Schwartz, Charles; Floyd, James A.B.; Bentham, Jamie; Cosgrove, Catherine; Keavney, Bernard; Bhattacharya, Shoumo; Hurles, Matthew

    2015-01-01

    To identify genetic causes of intellectual disability (ID), we screened a cohort of 986 individuals with moderate to severe ID for variants in 565 known or candidate ID‐associated genes using targeted next‐generation sequencing. Likely pathogenic rare variants were found in ∼11% of the cases (113 variants in 107/986 individuals: ∼8% of the individuals had a likely pathogenic loss‐of‐function [LoF] variant, whereas ∼3% had a known pathogenic missense variant). Variants in SETD5, ATRX, CUL4B, MECP2, and ARID1B were the most common causes of ID. This study assessed the value of sequencing a cohort of probands to provide a molecular diagnosis of ID, without the availability of DNA from both parents for de novo sequence analysis. This modeling is clinically relevant as 28% of all UK families with dependent children are single parent households. In conclusion, to diagnose patients with ID in the absence of parental DNA, we recommend investigation of all LoF variants in known genes that cause ID and assessment of a limited list of proven pathogenic missense variants in these genes. This will provide 11% additional diagnostic yield beyond the 10%–15% yield from array CGH alone. PMID:26350204

  20. Survey and analysis of simple sequence repeats (SSRs) in three genomes of Candida species.

    PubMed

    Jia, Dongmei

    2016-06-15

    Simple sequence repeats (SSRs), or microsatellites, which are composed of tandemly repeated short units of 1-6 bp, have received continuous attention. Here, the distribution, composition and polymorphism of microsatellites and compound microsatellites were analyzed in three available genomes of Candida species (Candida dubliniensis, Candida glabrata and Candida orthopsilosis). The results show that there were 118,047, 66,259 and 61,119 microsatellites in the genomes of C. dubliniensis, C. glabrata and C. orthopsilosis, respectively. The SSRs covered more than one third of the genome length in the three species. Microsatellites consisting only of the bases A and/or T, such as (A)n, (T)n, (AT)n, (TA)n, (AAT)n, (TAA)n, (TTA)n, (ATA)n, (ATT)n and (TAT)n, were predominant in the three genomes. Microsatellite lengths were concentrated at 6 bp and 9 bp, both in the three genomes and in their coding sequences. Moreover, the relative abundance (19.89/kbp) and relative density (167.87 bp/kbp) of SSRs in the mitochondrial sequence of C. glabrata were significantly greater than those in any of the genomes or chromosomes of the three species. In addition, the distance between any two adjacent microsatellites was an important factor influencing the formation of compound microsatellites. The analysis may be helpful for further study of the roles of microsatellites in the origin, organization and evolution of Candida genomes.
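
    A minimal sketch of how perfect SSRs with 1-6 bp units might be located, and how relative abundance (SSRs per kbp) and relative density (SSR bases per kbp) could be computed from the hits. The minimum repeat counts, the function names and the reading of the two statistics from their units are assumptions made for illustration; the abstract does not state the detection criteria or the software used in the study.

```python
# Assumed thresholds (not from the abstract): minimum repeats per unit length.
MIN_REPEATS = {1: 6, 2: 3, 3: 3, 4: 3, 5: 3, 6: 3}


def is_primitive(unit):
    """True if the unit is not itself a repetition of a shorter unit."""
    n = len(unit)
    return not any(n % k == 0 and unit == unit[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq):
    """Return sorted (start, end, unit, repeats) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        i = 0
        while i + k <= len(seq):
            unit = seq[i:i + k]
            if set(unit) <= set("ACGT") and is_primitive(unit):
                j = i + k
                while seq[j:j + k] == unit:  # extend the tract unit by unit
                    j += k
                repeats = (j - i) // k
                if repeats >= min_rep:
                    hits.append((i, j, unit, repeats))
                    i = j  # skip past the tract to avoid rotated duplicates
                    continue
            i += 1
    return sorted(hits)


def ssr_stats(seq):
    """Return (SSRs per kbp, SSR-covered bases per kbp)."""
    hits = find_ssrs(seq)
    covered = set()
    for start, end, _unit, _repeats in hits:
        covered.update(range(start, end))  # count overlapping bases once
    kbp = len(seq) / 1000.0
    return len(hits) / kbp, len(covered) / kbp


if __name__ == "__main__":
    demo = "GGC" + "AT" * 4 + "CCGT" + "A" * 7 + "TTG"  # made-up sequence
    for hit in find_ssrs(demo):
        print(hit)
    print(ssr_stats(demo))
```

    Tracts found with different unit lengths may overlap, so the density is computed from the set of covered positions rather than by summing tract lengths.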

  1. Targeted Next-Generation Sequencing Analysis of 1,000 Individuals with Intellectual Disability.

    PubMed

    Grozeva, Detelina; Carss, Keren; Spasic-Boskovic, Olivera; Tejada, Maria-Isabel; Gecz, Jozef; Shaw, Marie; Corbett, Mark; Haan, Eric; Thompson, Elizabeth; Friend, Kathryn; Hussain, Zaamin; Hackett, Anna; Field, Michael; Renieri, Alessandra; Stevenson, Roger; Schwartz, Charles; Floyd, James A B; Bentham, Jamie; Cosgrove, Catherine; Keavney, Bernard; Bhattacharya, Shoumo; Hurles, Matthew; Raymond, F Lucy

    2015-12-01

    To identify genetic causes of intellectual disability (ID), we screened a cohort of 986 individuals with moderate to severe ID for variants in 565 known or candidate ID-associated genes using targeted next-generation sequencing. Likely pathogenic rare variants were found in ∼11% of the cases (113 variants in 107/986 individuals: ∼8% of the individuals had a likely pathogenic loss-of-function [LoF] variant, whereas ∼3% had a known pathogenic missense variant). Variants in SETD5, ATRX, CUL4B, MECP2, and ARID1B were the most common causes of ID. This study assessed the value of sequencing a cohort of probands to provide a molecular diagnosis of ID, without the availability of DNA from both parents for de novo sequence analysis. This modeling is clinically relevant as 28% of all UK families with dependent children are single parent households. In conclusion, to diagnose patients with ID in the absence of parental DNA, we recommend investigation of all LoF variants in known genes that cause ID and assessment of a limited list of proven pathogenic missense variants in these genes. This will provide 11% additional diagnostic yield beyond the 10%-15% yield from array CGH alone.

  2. Analysis of transposable elements in the genome of Asparagus officinalis from high coverage sequence data.

    PubMed

    Li, Shu-Fen; Gao, Wu-Jun; Zhao, Xin-Peng; Dong, Tian-Yu; Deng, Chuan-Liang; Lu, Long-Dou

    2014-01-01

    Asparagus officinalis is an economically and nutritionally important vegetable crop that is widely cultivated and is used as a model dioecious species to study plant sex determination and sex chromosome evolution. To improve our understanding of its genome composition, especially with respect to transposable elements (TEs), which make up the majority of the genome, we performed Illumina HiSeq2000 sequencing of both male and female asparagus genomes followed by bioinformatics analysis. We generated 17 Gb of sequence (12× coverage) and assembled it into 163,406 scaffolds with a total cumulative length of 400 Mbp, which represents about 30% of the asparagus genome. Overall, TEs masked about 53% of the A. officinalis assembly. The majority of the identified TEs belonged to LTR retrotransposons, which constitute about 28% of genomic DNA, with Ty1/copia elements being more diverse and accumulated to higher copy numbers than Ty3/gypsy. Compared with LTR retrotransposons, non-LTR retrotransposons and DNA transposons were relatively rare. In addition, comparison of the abundance of the TE groups between male and female genomes showed that the overall TE composition was highly similar, with only slight differences in the abundance of several TE groups, which is consistent with the relatively recent origin of asparagus sex chromosomes. This study greatly improves our knowledge of the repetitive sequence composition of asparagus, which facilitates the identification of TEs responsible for the early evolution of plant sex chromosomes and is helpful for further studies on this dioecious plant. PMID:24810432

  4. Human retroviruses and AIDS 1996. A compilation and analysis of nucleic acid and amino acid sequences

    SciTech Connect

    Myers, G.; Foley, B.; Korber, B.; Mellors, J.W.; Jeang, K.T.; Wain-Hobson, S.

    1997-04-01

    This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (1) Nucleic Acid Alignments and Sequences; (2) Amino Acid Alignments; (3) Analysis; (4) Related Sequences; and (5) Database Communications. Information within all the parts is updated throughout the year on the Web site, http://hiv-web.lanl.gov. While this publication could take the form of a review or sequence monograph, it is not so conceived. Instead, the literature from which the database is derived has simply been summarized and some elementary computational analyses have been performed upon the data. Interpretation and commentary have been avoided insofar as possible so that the reader can form his or her own judgments concerning the complex information. In addition to the general descriptions of the parts of the compendium, the user should read the individual introductions for each part.

  5. Isolation, complete genome sequencing, and phylogenetic analysis of the first Chuzan virus in China.

    PubMed

    Wang, Fang; Lin, Jun; Chang, Jitao; Cao, Yingying; Qin, Shaomin; Wu, Jianmin; Yu, Li

    2016-02-01

    A Chuzan virus (CHUV), designated GX871 here, was isolated for the first time in China from the blood of sentinel cattle, and its full-length genome was sequenced in this study. The GX871 genome comprised 10 segments and 18914 bp, one base fewer than the CHUV prototype strain K-47 due to a one-base deletion in the 5' non-coding region of segment 8. A frameshift mutation was detected in a short coding region (1010-1026 nt) corresponding to the VP1 protein; this frameshift resulted in a five-amino acid change from 336CVLSY340 to 336YGAKL340. In addition, there was a one-base deletion at 1713 nt and a one-base insertion at 1682 nt in the 3' non-coding region of segment 5. Based on phylogenetic analysis of the deduced VP2 amino acid sequences, Palyam serogroup viruses were classified into three groups. The Chinese CHUV isolate GX871 was categorized into the same group as CHUV prototype strain K-47. The phylogenetic tree based on the partial nucleotide sequences of VP7 was divided into three clusters according to geographical distribution, and this arrangement might define the geographical gene pool of CHUV.

  6. Analysis of Transposable Elements in the Genome of Asparagus officinalis from High Coverage Sequence Data

    PubMed Central

    Li, Shu-Fen; Gao, Wu-Jun; Zhao, Xin-Peng; Dong, Tian-Yu; Deng, Chuan-Liang; Lu, Long-Dou

    2014-01-01

    Asparagus officinalis is an economically and nutritionally important vegetable crop that is widely cultivated and is used as a model dioecious species to study plant sex determination and sex chromosome evolution. To improve our understanding of its genome composition, especially with respect to transposable elements (TEs), which make up the majority of the genome, we performed Illumina HiSeq2000 sequencing of both male and female asparagus genomes followed by bioinformatics analysis. We generated 17 Gb of sequence (12× coverage) and assembled it into 163,406 scaffolds with a total cumulative length of 400 Mbp, which represents about 30% of the asparagus genome. Overall, TEs masked about 53% of the A. officinalis assembly. The majority of the identified TEs belonged to LTR retrotransposons, which constitute about 28% of genomic DNA, with Ty1/copia elements being more diverse and accumulated to higher copy numbers than Ty3/gypsy. Compared with LTR retrotransposons, non-LTR retrotransposons and DNA transposons were relatively rare. In addition, comparison of the abundance of the TE groups between male and female genomes showed that the overall TE composition was highly similar, with only slight differences in the abundance of several TE groups, which is consistent with the relatively recent origin of asparagus sex chromosomes. This study greatly improves our knowledge of the repetitive sequence composition of asparagus, which facilitates the identification of TEs responsible for the early evolution of plant sex chromosomes and is helpful for further studies on this dioecious plant. PMID:24810432

  7. Massively parallel sequencing and analysis of expressed sequence tags in a successful invasive plant

    PubMed Central

    Prentis, Peter J.; Woolfit, Megan; Thomas-Hall, Skye R.; Ortiz-Barrientos, Daniel; Pavasovic, Ana; Lowe, Andrew J.; Schenk, Peer M.

    2010-01-01

    Background: Invasive species pose a significant threat to global economies, agriculture and biodiversity. Despite progress towards understanding the ecological factors associated with plant invasions, limited genomic resources have made it difficult to elucidate the evolutionary and genetic factors responsible for invasiveness. This study presents the first expressed sequence tag (EST) collection for Senecio madagascariensis, a globally invasive plant species. Methods: We used pyrosequencing of one normalized and two subtractive libraries, derived from one native and one invasive population, to generate an EST collection. ESTs were assembled into contigs, annotated by BLAST comparison with the NCBI non-redundant protein database and assigned gene ontology (GO) terms from the Plant GO Slim ontologies. Key Results: Assembly of the 221 746 sequence reads resulted in 12 442 contigs. Over 50 % (6183) of 12 442 contigs showed significant homology to proteins in the NCBI database, representing approx. 4800 independent transcripts. The molecular transducer GO term was significantly over-represented in the native (South African) subtractive library compared with the invasive (Australian) library. Based on NCBI BLAST hits and literature searches, 40 % of the molecular transducer genes identified in the South African subtractive library are likely to be involved in response to biotic stimuli, such as fungal, bacterial and viral pathogens. Conclusions: This EST collection is the first representation of the S. madagascariensis transcriptome and provides an important resource for the discovery of candidate genes associated with plant invasiveness. The over-representation of molecular transducer genes associated with defence responses in the native subtractive library provides preliminary support for aspects of the enemy release and evolution of increased competitive ability hypotheses in this successful invasive. This study highlights the contribution of next-generation sequencing

  8. [Development of laboratory sequence analysis software based on WWW and UNIX].

    PubMed

    Huang, Y; Gu, J R

    2001-01-01

    Sequence analysis tools based on WWW and UNIX were developed in our laboratory to meet the needs of molecular genetics research. General principles of computer analysis of DNA and protein sequences are also briefly discussed in this paper.

  9. Fate of Aegilops speltoides-derived, repetitive DNA sequences in diploid Aegilops species, wheat-Aegilops amphiploids and derived chromosome addition lines.

    PubMed

    Kumar, S; Friebe, B; Gill, B S

    2010-07-01

    The present study reports the cloning and characterization of an Aegilops speltoides-derived subtelomeric repeat, designated as pSp1B16. Clone pSp1B16 has 98% sequence homology with the previously isolated Ae. speltoides repeat Spelt1. The distribution of pSp1B16 and another Ae. speltoides repeat, pGc1R1, was analyzed in diploid Aegilops species, tetra- and hexaploid wheats, wheat-Aegilops amphiploids and derived chromosome addition lines by fluorescence in situ hybridization (FISH). Clones pSp1B16 and pGc1R1 revealed FISH sites in Ae. speltoides, Ae. sharonensis and Triticum timopheevii, whereas additional pGc1R1 FISH sites were observed in Ae. longissima and Ae. caudata. The pSp1B16 and pGc1R1 FISH patterns of the Aegilops chromosomes in the wheat-Aegilops amphiploids and chromosome addition lines are similar to those present in the Aegilops parent accession. We did not observe any evidence of pSp1B16 and pGc1R1 sequence elimination, which is in contrast to previous studies using similar hybrids and repeats. The presented data suggest that the genomic changes in synthetic amphiploids observed in previous studies might be caused by homoeologous recombination, which was suppressed in the amphiploid analyzed in this study.

  10. Developmental sequences for hopping as assessment instruments: a generalizability analysis.

    PubMed

    Painter, M A

    1994-03-01

    The purpose of this study was to investigate the generalizability with which undergraduate kinesiology and elementary education students can rate children's hopping performances according to prelongitudinally validated developmental sequences for the arms, legs, and total body. Twenty observers were assigned to one of four training groups (n = 5): (a) kinesiology students/total-body sequence, (b) kinesiology students/component sequences, (c) elementary education students/total-body sequence, and (d) elementary education students/component sequences. The observers rated five trials of videotaped hopping performances by 10 boys and 10 girls between the ages of 3.5 and 9.0 years. The results suggested that when kinesiology students receive 2 hours of training, one observer can reliably assess leg action in one trial (.80) and arm action in five trials (.80). In contrast, one elementary education student can reliably assess leg action within five trials (.80), but the average score of two observers assessing three trials each is needed to assess arm action (.81). Reliable assessment of total-body action requires two observers for both the kinesiology students (four trials each = .80) and the elementary education students (two trials each = .84).

  11. Computer analysis of phytochrome sequences and reevaluation of the phytochrome secondary structure by Fourier transform infrared spectroscopy.

    PubMed

    Sühnel, J; Hermann, G; Dornberger, U; Fritzsche, H

    1997-07-18

    A repertoire of various methods of computer sequence analysis was applied to phytochromes in order to gain new insights into their structure and function. A statistical analysis of 23 complete phytochrome sequences revealed regions of non-random amino acid composition, which are supposed to be of particular structural or functional importance. All phytochromes other than phyD and phyE from Arabidopsis have at least one such region at the N-terminus between residues 2 and 35. A sequence similarity search of current databases indicated striking homologies between all phytochromes and a hypothetical 84.2-kDa protein from the cyanobacterium Synechocystis. Furthermore, scanning the phytochrome sequences for the occurrence of patterns defined in the PROSITE database detected the signature of the WD repeats of the beta-transducin family within the functionally important 623-779 region (sequence numbering of phyA from Avena) in a number of phytochromes. A multiple sequence alignment performed with 23 complete phytochrome sequences is made available via the IMB Jena World-Wide Web server (http://www.imb-jena.de/PHYTO.html). It can be used as a working tool for future theoretical and experimental studies. Based on the multiple alignment striking sequence differences between phytochromes A and B were detected directly at the N-terminal end, where all phytochromes B have an additional stretch of 15-42 amino acids. There is also a variety of positions with totally conserved but different amino acids in phytochromes A and B. Most of these changes are found in the sequence segment 150-200. It is, therefore, suggested that this region might be of importance in determining the photosensory specificity of the two phytochromes. The secondary structure prediction based on the multiple alignment resulted in a small but significant beta-sheet content. This finding is confirmed by a reevaluation of the secondary structure using FTIR spectroscopy.

  12. SxtA gene sequence analysis of dinoflagellate Alexandrium minutum

    NASA Astrophysics Data System (ADS)

    Norshaha, Safida Anira; Latib, Norhidayu Abdul; Usup, Gires; Yusof, Nurul Yuziana Mohd

    2015-09-01

    The dinoflagellate Alexandrium minutum is typically known for the production of potent neurotoxins such as saxitoxin, affecting the health of human seafood consumers via paralytic shellfish poisoning (PSP). This phenomenon is related to harmful algal blooms (HABs), which are believed to be influenced by environmental and nutritional factors. A previous study revealed that the SxtA gene is the starting gene in the saxitoxin production pathway. The aim of this study was to analyse the sequence of the sxtA gene in A. minutum. The dinoflagellate culture was grown at 26°C with a 16:8-hour light:dark photocycle. After the samples were harvested, RNA was extracted, and complementary DNA (cDNA) was synthesised and amplified by polymerase chain reaction (PCR). The PCR products were then purified and cloned before being sequenced. The SxtA sequence obtained was then analyzed in order to confirm the presence of the SxtA gene in Alexandrium minutum.

  13. Analysis of sequence-dependent curvature in matrix attachment regions.

    PubMed

    Yamamura, J; Nomura, K

    2001-02-01

    Sequence-dependent DNA conformations of matrix attachment regions (MARs) available in a database were calculated using the wedge model, and compared with randomly chosen genes, promoters, enhancers and transposons. The MARs had a longer bent part and higher angle/helical turn than the other regions. It is known that some MAR sequences have A-tracts that cause DNA bending, and we also found many A-tracts in examined MARs. Furthermore, non-random and clustered distribution of A-tracts shown here gave further evidence of the importance of A-tracts for MAR conformations. These results suggest that DNAs of MARs have a characteristic conformation instead of conserved sequence.
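
    As a rough illustration of the A-tract observation above, the sketch below locates A-tracts in a sequence and reports the spacing between consecutive tracts, which is one simple way to look for clustering. It does not implement the wedge-model curvature calculation used in the study. The minimum tract length of 4 and the choice to count T runs (A-tracts on the complementary strand) are assumptions, and the function names are hypothetical.

```python
import re


def find_a_tracts(seq, min_len=4):
    """Return (start, end) for runs of A, or of T, at least min_len long."""
    pattern = re.compile(r"A{%d,}|T{%d,}" % (min_len, min_len))
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]


def gaps_between(tracts):
    """Distances in bp between the end of one tract and the start of the next."""
    return [nxt[0] - cur[1] for cur, nxt in zip(tracts, tracts[1:])]


if __name__ == "__main__":
    demo = "GCAAAAAGCGTTTTGC"  # made-up sequence
    tracts = find_a_tracts(demo)
    print(tracts, gaps_between(tracts))
```

    Short, regular gaps between tracts would be one indication of the clustered distribution the abstract describes.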

  14. Genome Sequence and Analysis of the Soil Cellulolytic Actinomycete Thermobifida fusca

    SciTech Connect

    Lykidis, Athanasios; Mavromatis, Konstantinos; Ivanova, Natalia; Anderson, Iain; Land, Miriam; DiBartolo, Genevieve; Martinez, Michele; Lapidus, Alla; Lucas, Susan; Copeland, Alex; Richardson, Paul; Wilson, David B.; Kyrpides, Nikos

    2007-02-01

    Thermobifida fusca is a moderately thermophilic soil bacterium that belongs to Actinobacteria. It is a major degrader of plant cell walls and has been used as a model organism for the study of secreted, thermostable cellulases. The complete genome sequence showed that T. fusca has a single circular chromosome of 3642249 bp predicted to encode 3117 proteins and 65 RNA species with a coding density of 85 percent. Genome analysis revealed the existence of 29 putative glycoside hydrolases in addition to the previously identified cellulases and xylanases. The glycosyl hydrolases include enzymes predicted to exhibit mainly dextran/starch and xylan degrading functions. T. fusca possesses two protein secretion systems: the sec general secretion system and the twin-arginine translocation system. Several of the secreted cellulases have sequence signatures indicating their secretion may be mediated by the twin-arginine translocation system. T. fusca has extensive transport systems for import of carbohydrates coupled to transcriptional regulators controlling the expression of the transporters and glycosylhydrolases. In addition to providing an overview of the physiology of a soil actinomycete, this study presents insights on the transcriptional regulation and secretion of cellulases which may facilitate the industrial exploitation of these systems.

  15. FACETS: allele-specific copy number and clonal heterogeneity analysis tool for high-throughput DNA sequencing.

    PubMed

    Shen, Ronglai; Seshan, Venkatraman E

    2016-09-19

    Allele-specific copy number analysis (ASCN) from next generation sequencing (NGS) data can greatly extend the utility of NGS beyond the identification of mutations to precisely annotate the genome for the detection of homozygous/heterozygous deletions, copy-neutral loss-of-heterozygosity (LOH), allele-specific gains/amplifications. In addition, as targeted gene panels are increasingly used in clinical sequencing studies for the detection of 'actionable' mutations and copy number alterations to guide treatment decisions, accurate, tumor purity-, ploidy- and clonal heterogeneity-adjusted integer copy number calls are greatly needed to more reliably interpret NGS-based cancer gene copy number data in the context of clinical sequencing. We developed FACETS, an ASCN tool and open-source software with a broad application to whole genome, whole-exome, as well as targeted panel sequencing platforms. It is a fully integrated stand-alone pipeline that includes sequencing BAM file post-processing, joint segmentation of total- and allele-specific read counts, and integer copy number calls corrected for tumor purity, ploidy and clonal heterogeneity, with comprehensive output and integrated visualization. We demonstrate the application of FACETS using The Cancer Genome Atlas (TCGA) whole-exome sequencing of lung adenocarcinoma samples. We also demonstrate its application to a clinical sequencing platform based on a targeted gene panel.

  16. FACETS: allele-specific copy number and clonal heterogeneity analysis tool for high-throughput DNA sequencing

    PubMed Central

    Shen, Ronglai; Seshan, Venkatraman E.

    2016-01-01

    Allele-specific copy number analysis (ASCN) from next generation sequencing (NGS) data can greatly extend the utility of NGS beyond the identification of mutations to precisely annotate the genome for the detection of homozygous/heterozygous deletions, copy-neutral loss-of-heterozygosity (LOH), allele-specific gains/amplifications. In addition, as targeted gene panels are increasingly used in clinical sequencing studies for the detection of ‘actionable’ mutations and copy number alterations to guide treatment decisions, accurate, tumor purity-, ploidy- and clonal heterogeneity-adjusted integer copy number calls are greatly needed to more reliably interpret NGS-based cancer gene copy number data in the context of clinical sequencing. We developed FACETS, an ASCN tool and open-source software with a broad application to whole genome, whole-exome, as well as targeted panel sequencing platforms. It is a fully integrated stand-alone pipeline that includes sequencing BAM file post-processing, joint segmentation of total- and allele-specific read counts, and integer copy number calls corrected for tumor purity, ploidy and clonal heterogeneity, with comprehensive output and integrated visualization. We demonstrate the application of FACETS using The Cancer Genome Atlas (TCGA) whole-exome sequencing of lung adenocarcinoma samples. We also demonstrate its application to a clinical sequencing platform based on a targeted gene panel. PMID:27270079

  18. Sequence analysis of mitochondrial DNA hypervariable regions using infrared fluorescence detection.

    PubMed

    Steffens, D L; Roy, R

    1998-06-01

    The non-coding region of the mitochondrial genome provides an attractive target for human forensic identification studies. Two hypervariable (HV) regions, each approximately 250-350 bp in length, contain the majority of mitochondrial DNA (mtDNA) sequence variability among different individuals. Various approaches to determine mtDNA sequence were evaluated utilizing highly sensitive infrared (IR) fluorescence detection. HV regions were amplified either together or separately and cycle-sequenced using a Thermo Sequenase protocol. An M13 universal primer sequence tail covalently attached to the 5' terminus of an amplification primer facilitated electrophoretic analysis and direct sequencing of the amplification products using IR detection. PMID:9631201

  19. Transcriptome Analysis of the Mud Crab (Scylla paramamosain) by 454 Deep Sequencing: Assembly, Annotation, and Marker Discovery

    PubMed Central

    Ma, Hongyu; Ma, Chunyan; Li, Shujuan; Jiang, Wei; Li, Xincang; Liu, Yuexing; Ma, Lingbo

    2014-01-01

    In this study, we reported the characterization of the first transcriptome of the mud crab (Scylla paramamosain). Pooled cDNAs of four tissue types from twelve wild individuals were sequenced using the Roche 454 FLX platform. Analysis performed included de novo assembly of transcriptome sequences, functional annotation, and molecular marker discovery. A total of 1,314,101 high quality reads with an average length of 411 bp were generated by 454 sequencing on a mixed cDNA library. De novo assembly of these 1,314,101 reads produced 76,778 contigs (consisting of 818,154 reads) with 5.4-fold average sequencing coverage. The remaining 495,947 reads were singletons. A total of 78,268 unigenes were identified based on sequence similarity with known proteins (E≤0.00001) in UniProt and non-redundant protein databases. Meanwhile, 44,433 sequences were identified (E≤0.00001) using a BLASTN search against the NCBI nucleotide database. Gene Ontology (GO) analysis indicated that biosynthetic process, cell part, and ion binding were the most abundant terms in biological process, cellular component, and molecular function categories, respectively. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis revealed that 4,878 unigenes were distributed in 281 different pathways. In addition, 19,011 microsatellites and 37,063 potential single nucleotide polymorphisms were detected from the transcriptome of S. paramamosain. Finally, thirty polymorphic microsatellite markers were developed and used to assess genetic diversity of a wild population of S. paramamosain. So far, existing sequence resources for S. paramamosain are extremely limited. The present study provides a characterization of transcriptome from multiple tissues and individuals, as well as an assessment of genetic diversity of a wild population. These sequence resources will facilitate the investigation of population genetic diversity, the development of genetic maps, and the conduct of molecular marker

  20. Using Whole Genome Analysis to Examine Recombination across Diverse Sequence Types of Staphylococcus aureus

    PubMed Central

    Driebe, Elizabeth M.; Sahl, Jason W.; Roe, Chandler; Bowers, Jolene R.; Schupp, James M.; Gillece, John D.; Kelley, Erin; Price, Lance B.; Pearson, Talima R.; Hepp, Crystal M.; Brzoska, Pius M.; Cummings, Craig A.; Furtado, Manohar R.; Andersen, Paal S.; Stegger, Marc; Engelthaler, David M.; Keim, Paul S.

    2015-01-01

    Staphylococcus aureus is an important clinical pathogen worldwide and understanding this organism's phylogeny and, in particular, the role of recombination, is important both to understand the overall spread of virulent lineages and to characterize outbreaks. To further elucidate the phylogeny of S. aureus, 35 diverse strains were sequenced using whole genome sequencing. In addition, 29 publicly available whole genome sequences were included to create a single nucleotide polymorphism (SNP)-based phylogenetic tree encompassing 11 distinct lineages. All strains of a particular sequence type fell into the same clade with clear groupings of the major clonal complexes of CC8, CC5, CC30, CC45 and CC1. Using a novel analysis method, we plotted the homoplasy density and SNP density across the whole genome and found evidence of recombination throughout the entire chromosome, but when we examined individual clonal lineages we found very little recombination. However, when we analyzed three branches of multiple lineages, we saw intermediate and differing levels of recombination between them. These data demonstrate that in S. aureus, recombination occurs across major lineages that subsequently expand in a clonal manner. Estimated mutation rates for the CC8 and CC5 lineages were different from each other. While the CC8 lineage rate was similar to previous studies, the CC5 lineage was 100-fold greater. Fifty known virulence genes were screened in all genomes in silico to determine their distribution across major clades. Thirty-three genes were present variably across clades, most of which were not constrained by ancestry, indicating horizontal gene transfer or gene loss. PMID:26161978

  1. High-throughput analysis of T-DNA location and structure using sequence capture

    DOE PAGES

    Inagaki, Soichi; Henry, Isabelle M.; Lieberman, Meric C.; Comai, Luca

    2015-10-07

    Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious and high-throughput methods using next-generation sequencing are being developed to address these problems. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA—genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously, using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the needs for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events have to be fully characterized prior to use as commercial products. As a result, our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines.

  2. High-throughput analysis of T-DNA location and structure using sequence capture

    SciTech Connect

    Inagaki, Soichi; Henry, Isabelle M.; Lieberman, Meric C.; Comai, Luca

    2015-10-07

    Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious and high-throughput methods using next-generation sequencing are being developed to address these problems. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA—genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously, using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the needs for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events have to be fully characterized prior to use as commercial products. As a result, our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines.

  3. High-Throughput Analysis of T-DNA Location and Structure Using Sequence Capture

    PubMed Central

    Inagaki, Soichi; Henry, Isabelle M.; Lieberman, Meric C.; Comai, Luca

    2015-01-01

    Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious and high-throughput methods using next-generation sequencing are being developed to address these problems. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA—genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously, using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the needs for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events have to be fully characterized prior to use as commercial products. Our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines. PMID:26445462
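    The abstract mentions a tool that finds read pairs spanning the junction between the genome and T-DNA. The core idea can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' tool: the `Mate` record, the `junction_pairs` function and the reference names are hypothetical, and reads are assumed to have been aligned already to a combined genome-plus-T-DNA reference.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Mate(NamedTuple):
    """One aligned read of a pair (hypothetical minimal record)."""
    ref: str   # name of the reference sequence the read aligned to
    pos: int   # 1-based leftmost alignment position


def junction_pairs(
    pairs: Iterable[Tuple[Mate, Mate]], tdna_refs: Set[str]
) -> List[Tuple[Mate, Mate]]:
    """Return (genome_mate, tdna_mate) for pairs with exactly one mate on T-DNA."""
    hits = []
    for first, second in pairs:
        first_on_tdna = first.ref in tdna_refs
        second_on_tdna = second.ref in tdna_refs
        # A candidate junction pair has one mate on the genome and one on T-DNA.
        if first_on_tdna != second_on_tdna:
            genome, tdna = (second, first) if first_on_tdna else (first, second)
            hits.append((genome, tdna))
    return hits


# Hypothetical example data: only the second pair bridges genome and T-DNA.
example = [
    (Mate("Chr1", 1000), Mate("Chr1", 1250)),
    (Mate("Chr3", 52000), Mate("vector_A", 40)),
    (Mate("vector_A", 10), Mate("vector_A", 300)),
]
candidates = junction_pairs(example, {"vector_A"})
```

    The genome-side positions of the returned pairs could then be clustered to propose insertion sites; reads that themselves cross the junction (split reads) would need separate handling.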

  4. Whole genome sequence and analysis of the Marwari horse breed and its genetic origin

    PubMed Central

    2014-01-01

    Background: The horse (Equus ferus caballus) is one of the earliest domesticated species and has played an important role in the development of human societies over the past 5,000 years. In this study, we characterized the genome of the Marwari horse, a rare breed with unique phenotypic characteristics, including inwardly turned ear tips. It is thought to have originated from the crossbreeding of local Indian ponies with Arabian horses beginning in the 12th century. Results: We generated 101 Gb (~30 × coverage) of whole genome sequences from a Marwari horse using the Illumina HiSeq2000 sequencer. The sequences were mapped to the horse reference genome at a mapping rate of ~98% and with ~95% of the genome having at least 10 × coverage. A total of 5.9 million single nucleotide variations, 0.6 million small insertions or deletions, and 2,569 copy number variation blocks were identified. We confirmed a strong Arabian and Mongolian component in the Marwari genome. Novel variants from the Marwari sequences were annotated, and were found to be enriched in olfactory functions. Additionally, we suggest a potential functional genetic variant in the TSHZ1 gene (p.Ala344>Val) associated with the inward-turning ear tip shape of the Marwari horses. Conclusions: Here, we present an analysis of the Marwari horse genome. This is the first genomic data for an Asian breed, and is an invaluable resource for future studies of genetic variation associated with phenotypes and diseases in horses. PMID:25521865

  5. Sequence analysis of Meq oncogene among Indian isolates of Marek's disease herpesvirus.

    PubMed

    Gupta, Mridula; Deka, Dipak; Ramneek

    2016-09-01

    Marek's disease (MD), caused by Marek's disease virus (MDV), is a highly contagious neoplastic disease of chicken that can be prevented by vaccination. However, in recent years many cases of vaccine failure have been reported worldwide as chickens develop symptoms of MD in spite of proper vaccination. Distinct polymorphism and point mutations in the Meq gene of MDV have been reported to be associated with virulence and oncogenicity. The present study was carried out with the objective to isolate and characterize field isolates of MDV on the basis of the Meq gene. Twenty five samples of suspected cases of MD were collected and processed for virus isolation in duck embryo fibroblast (DEF) primary culture where 28% (7 of 25) samples showed characteristic cytopathic effects of MDV in the form of plaques and syncytia. The presence of MDV in these samples was additionally confirmed by PCR. To analyze diversity in all seven isolates of MDV, a polymorphism study was carried out by cloning and sequencing of the full length of the Meq gene (1020 bp). Sequence homology of 7 isolates with 23 reference strains showed 98.10-99.40% similarity in nucleotide and 95.90-98.50% similarity in amino acid sequences. Six isolates revealed 5 repeat sequences of 4 prolines (PPPP) whereas one isolate revealed only 4 repeats. In phylogenetic analysis, these isolates formed a separate cluster showing close relatedness to the Chinese isolates. The study indicates a high mutation rate in field isolates of MDV that may be a probable cause of vaccination failure. PMID:27617224

  6. Mercury: Next-gen Data Analysis and Annotation Pipeline (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    SciTech Connect

    Sexton, David

    2012-06-01

    David Sexton (Baylor) gives a talk titled "Mercury: Next-gen Data Analysis and Annotation Pipeline" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  7. Mercury: Next-gen Data Analysis and Annotation Pipeline (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Sexton, David [Baylor

    2016-07-12

    David Sexton (Baylor) gives a talk titled "Mercury: Next-gen Data Analysis and Annotation Pipeline" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  8. Phylogenetic Analysis of the Bifidobacterium Genus Using Glycolysis Enzyme Sequences

    PubMed Central

    Brandt, Katelyn; Barrangou, Rodolphe

    2016-01-01

    Bifidobacteria are important members of the human gastrointestinal tract that promote the establishment of a healthy microbial consortium in the gut of infants. Recent studies have established that the Bifidobacterium genus is a polymorphic phylogenetic clade, which encompasses a diversity of species and subspecies that encode a broad range of proteins implicated in complex and non-digestible carbohydrate uptake and catabolism, ranging from human breast milk oligosaccharides, to plant fibers. Recent genomic studies have created a need to properly place Bifidobacterium species in a phylogenetic tree. Current approaches, based on core-genome analyses come at the cost of intensive sequencing and demanding analytical processes. Here, we propose a typing method based on sequences of glycolysis genes and the proteins they encode, to provide insights into diversity, typing, and phylogeny in this complex and broad genus. We show that glycolysis genes occur broadly in these genomes, to encode the machinery necessary for the biochemical spine of the cell, and provide a robust phylogenetic marker. Furthermore, glycolytic sequences-based trees are congruent with both the classical 16S rRNA phylogeny, and core genome-based strain clustering. Furthermore, these glycolysis markers can also be used to provide insights into the adaptive evolution of this genus, especially with regards to trends toward a high GC content. This streamlined method may open new avenues for phylogenetic studies on a broad scale, given the widespread occurrence of the glycolysis pathway in bacteria, and the diversity of the sequences they encode. PMID:27242688

  9. Phylogenetic Analysis of the Bifidobacterium Genus Using Glycolysis Enzyme Sequences.

    PubMed

    Brandt, Katelyn; Barrangou, Rodolphe

    2016-01-01

    Bifidobacteria are important members of the human gastrointestinal tract that promote the establishment of a healthy microbial consortium in the gut of infants. Recent studies have established that the Bifidobacterium genus is a polymorphic phylogenetic clade, which encompasses a diversity of species and subspecies that encode a broad range of proteins implicated in complex and non-digestible carbohydrate uptake and catabolism, ranging from human breast milk oligosaccharides, to plant fibers. Recent genomic studies have created a need to properly place Bifidobacterium species in a phylogenetic tree. Current approaches, based on core-genome analyses come at the cost of intensive sequencing and demanding analytical processes. Here, we propose a typing method based on sequences of glycolysis genes and the proteins they encode, to provide insights into diversity, typing, and phylogeny in this complex and broad genus. We show that glycolysis genes occur broadly in these genomes, to encode the machinery necessary for the biochemical spine of the cell, and provide a robust phylogenetic marker. Furthermore, glycolytic sequences-based trees are congruent with both the classical 16S rRNA phylogeny, and core genome-based strain clustering. Furthermore, these glycolysis markers can also be used to provide insights into the adaptive evolution of this genus, especially with regards to trends toward a high GC content. This streamlined method may open new avenues for phylogenetic studies on a broad scale, given the widespread occurrence of the glycolysis pathway in bacteria, and the diversity of the sequences they encode. PMID:27242688

  10. Analysis of the complete DNA sequence of murine cytomegalovirus.

    PubMed Central

    Rawlinson, W D; Farrell, H E; Barrell, B G

    1996-01-01

    The complete DNA sequence of the Smith strain of murine cytomegalovirus (MCMV) was determined from virion DNA by using a whole-genome shotgun approach. The genome has an overall G+C content of 58.7%, consists of 230,278 bp, and is arranged as a single unique sequence with short (31-bp) terminal direct repeats and several short internal repeats. Significant similarity to the genome of the sequenced human cytomegalovirus (HCMV) strain AD169 is evident, particularly for 78 open reading frames encoded by the central part of the genome. There is a very similar distribution of G+C content across the two genomes. Sequences toward the ends of the MCMV genome encode tandem arrays of homologous glycoproteins (gps) arranged as two gene families. The left end encodes 15 gps that represent one family, and the right end encodes a different family of 11 gps. A homolog (m144) of cellular major histocompatibility complex (MHC) class I genes is located at the end of the genome opposite the HCMV MHC class I homolog (UL18). G protein-coupled receptor (GCR) homologs (M33 and M78) occur in positions congruent with two (UL33 and UL78) of the four putative HCMV GCR homologs. Counterparts of all of the known enzyme homologs in HCMV are present in the MCMV genome, including the phosphotransferase gene (M97), whose product phosphorylates ganciclovir in HCMV-infected cells, and the assembly protein (M80). PMID:8971012
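    The G+C figures quoted above (an overall content of 58.7% and a similar distribution across the two genomes) are base-composition statistics. A minimal Python sketch of how such values could be computed is given here; the function names and the window and step sizes are illustrative assumptions, not the procedure used by the authors.

```python
from typing import Iterator, Tuple


def gc_content(seq: str) -> float:
    """Return the G+C fraction of a DNA sequence, counting only A, C, G and T."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return gc / (gc + at)


def windowed_gc(seq: str, window: int, step: int) -> Iterator[Tuple[int, float]]:
    """Yield (0-based start, G+C fraction) for each full window along the sequence."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, gc_content(seq[start:start + window])


# Toy sequence for illustration only; a real genome would be read from a file.
toy = "GGCCAATT"
overall = gc_content(toy)
profile = list(windowed_gc(toy, window=4, step=4))
```

    Plotting two such window profiles side by side is one simple way to compare the distribution of G+C content along two genomes.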

  11. Analysis of the complete DNA sequence of murine cytomegalovirus.

    PubMed

    Rawlinson, W D; Farrell, H E; Barrell, B G

    1996-12-01

    The complete DNA sequence of the Smith strain of murine cytomegalovirus (MCMV) was determined from virion DNA by using a whole-genome shotgun approach. The genome has an overall G+C content of 58.7%, consists of 230,278 bp, and is arranged as a single unique sequence with short (31-bp) terminal direct repeats and several short internal repeats. Significant similarity to the genome of the sequenced human cytomegalovirus (HCMV) strain AD169 is evident, particularly for 78 open reading frames encoded by the central part of the genome. There is a very similar distribution of G+C content across the two genomes. Sequences toward the ends of the MCMV genome encode tandem arrays of homologous glycoproteins (gps) arranged as two gene families. The left end encodes 15 gps that represent one family, and the right end encodes a different family of 11 gps. A homolog (m144) of cellular major histocompatibility complex (MHC) class I genes is located at the end of the genome opposite the HCMV MHC class I homolog (UL18). G protein-coupled receptor (GCR) homologs (M33 and M78) occur in positions congruent with two (UL33 and UL78) of the four putative HCMV GCR homologs. Counterparts of all of the known enzyme homologs in HCMV are present in the MCMV genome, including the phosphotransferase gene (M97), whose product phosphorylates ganciclovir in HCMV-infected cells, and the assembly protein (M80). PMID:8971012

  12. Functional analysis of bipartite begomovirus coat protein promoter sequences

    SciTech Connect

    Lacatus, Gabriela; Sunter, Garry

    2008-06-20

    We demonstrate that the AL2 gene of Cabbage leaf curl virus (CaLCuV) activates the CP promoter in mesophyll and acts to derepress the promoter in vascular tissue, similar to that observed for Tomato golden mosaic virus (TGMV). Binding studies indicate that sequences mediating repression and activation of the TGMV and CaLCuV CP promoter specifically bind different nuclear factors common to Nicotiana benthamiana, spinach and tomato. However, chromatin immunoprecipitation demonstrates that TGMV AL2 can interact with both sequences independently. Binding of nuclear protein(s) from different crop species to viral sequences conserved in both bipartite and monopartite begomoviruses, including TGMV, CaLCuV, Pepper golden mosaic virus and Tomato yellow leaf curl virus suggests that bipartite begomoviruses bind common host factors to regulate the CP promoter. This is consistent with a model in which AL2 interacts with different components of the cellular transcription machinery that bind viral sequences important for repression and activation of begomovirus CP promoters.

  13. Learning Progressions and Teaching Sequences: A Review and Analysis

    ERIC Educational Resources Information Center

    Duschl, Richard; Maeng, Seungho; Sezen, Asli

    2011-01-01

    Our paper is an analytical review of the design, development and reporting of learning progressions and teaching sequences. Research questions are: (1) what criteria are being used to propose a "hypothetical learning progression/trajectory" and (2) what measurements/evidence are being used to empirically define and refine a "hypothetical learning…

  14. Microbial community analysis of swine wastewater anaerobic lagoons by next-generation DNA sequencing.

    PubMed

    Ducey, Thomas F; Hunt, Patrick G

    2013-06-01

    Anaerobic lagoons are a standard practice for the treatment of swine wastewater. This practice relies heavily on microbiological processes to reduce concentrated organic material and nutrients. Despite this reliance on microbiological processes, research has only recently begun to identify and enumerate the myriad and complex interactions that occur in this microbial ecosystem. To further this line of study, we utilized a next-generation sequencing (NGS) technology to gain a deeper insight into the microbial communities along the water column of four anaerobic swine wastewater lagoons. Analysis of roughly one million 16S rDNA sequences revealed a predominance of operational taxonomic units (OTUs) classified as belonging to the phyla Firmicutes (54.1%) and Proteobacteria (15.8%). At the family level, 33 bacterial families were found in all 12 lagoon sites and accounted for between 30% and 50% of each lagoon's OTUs. Analysis by nonmetric multidimensional scaling (NMS) revealed that TKN, COD, ORP, TSS, and DO were the major environmental variables affecting microbial community structure. Overall, 839 individual genera were classified, with 223 found in all four lagoons. An additional 321 genera were each identified in only one lagoon. The top 25 genera accounted for approximately 20% of the OTUs identified in the study, and the low abundances of most of the genera suggest that most OTUs are present at low levels. Overall, these results demonstrate that anaerobic lagoons have distinct microbial communities which are strongly controlled by the environmental conditions present in each individual lagoon.

  15. XplorSeq: A software environment for integrated management and phylogenetic analysis of metagenomic sequence data

    PubMed Central

    Frank, Daniel N

    2008-01-01

    Background: Advances in automated DNA sequencing technology have accelerated the generation of metagenomic DNA sequences, especially environmental ribosomal RNA gene (rDNA) sequences. As the scale of rDNA-based studies of microbial ecology has expanded, need has arisen for software that is capable of managing, annotating, and analyzing the plethora of diverse data accumulated in these projects. Results: XplorSeq is a software package that facilitates the compilation, management and phylogenetic analysis of DNA sequences. XplorSeq was developed for, but is not limited to, high-throughput analysis of environmental rRNA gene sequences. XplorSeq integrates and extends several commonly used UNIX-based analysis tools by use of a Macintosh OS-X-based graphical user interface (GUI). Through this GUI, users may perform basic sequence import and assembly steps (base-calling, vector/primer trimming, contig assembly), perform BLAST (Basic Local Alignment and Search Tool; [1-3]) searches of NCBI and local databases, create multiple sequence alignments, build phylogenetic trees, assemble Operational Taxonomic Units, estimate biodiversity indices, and summarize data in a variety of formats. Furthermore, sequences may be annotated with user-specified meta-data, which then can be used to sort data and organize analyses and reports. A document-based architecture permits parallel analysis of sequence data from multiple clones or amplicons, with sequences and other data stored in a single file. Conclusion: XplorSeq should benefit researchers who are engaged in analyses of environmental sequence data, especially those with little experience using bioinformatics software. Although XplorSeq was developed for management of rDNA sequence data, it can be applied to most any sequencing project. The application is available free of charge for non-commercial use at . PMID:18840282

  16. Addressing challenges in the production and analysis of Illumina sequencing data.

    PubMed

    Kircher, Martin; Heyn, Patricia; Kelso, Janet

    2011-01-01

    Advances in DNA sequencing technologies have made it possible to generate large amounts of sequence data very rapidly and at substantially lower cost than capillary sequencing. These new technologies have specific characteristics and limitations that require either consideration during project design, or which must be addressed during data analysis. Specialist skills, both at the laboratory and the computational stages of project design and analysis, are crucial to the generation of high quality data from these new platforms. The Illumina sequencers (including the Genome Analyzers I/II/IIe/IIx and the new HiScan and HiSeq) represent a widely used platform providing parallel readout of several hundred million immobilized sequences using fluorescent-dye reversible-terminator chemistry. Sequencing library quality, sample handling, instrument settings and sequencing chemistry have a strong impact on sequencing run quality. The presence of adapter chimeras and adapter sequences at the end of short-insert molecules, as well as increased error rates and short read lengths complicate many computational analyses. We discuss here some of the factors that influence the frequency and severity of these problems and provide solutions for circumventing these. Further, we present a set of general principles for good analysis practice that enable problems with sequencing runs to be identified and dealt with.

  17. FASTAptamer: A Bioinformatic Toolkit for High-throughput Sequence Analysis of Combinatorial Selections

    PubMed Central

    Alam, Khalid K; Chang, Jonathan L; Burke, Donald H

    2015-01-01

    High-throughput sequence (HTS) analysis of combinatorial selection populations accelerates lead discovery and optimization and offers dynamic insight into selection processes. An underlying principle is that selection enriches high-fitness sequences as a fraction of the population, whereas low-fitness sequences are depleted. HTS analysis readily provides the requisite numerical information by tracking the evolutionary trajectory of individual sequences in response to selection pressures. Unlike genomic data, for which a number of software solutions exist, user-friendly tools are not readily available for the combinatorial selections field, leading many users to create custom software. FASTAptamer was designed to address the sequence-level analysis needs of the field. The open source FASTAptamer toolkit counts, normalizes and ranks read counts in a FASTQ file, compares populations for sequence distribution, generates clusters of sequence families, calculates fold-enrichment of sequences throughout the course of a selection and searches for degenerate sequence motifs. While originally designed for aptamer selections, FASTAptamer can be applied to any selection strategy that can utilize next-generation DNA sequencing, such as ribozyme or deoxyribozyme selections, in vivo mutagenesis and various surface display technologies (peptide, antibody fragment, mRNA, etc.). FASTAptamer software, sample data and a user's guide are available for download at http://burkelab.missouri.edu/fastaptamer.html. PMID:25734917
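    The counting, normalizing, ranking and fold-enrichment steps described above can be sketched in a few lines of Python. This is a hedged illustration of the general technique only: it is not FASTAptamer's code or command-line interface, the function names are hypothetical, reads-per-million is assumed as the normalization, and ties are broken alphabetically to give a simple ordinal rank.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def count_and_rank(reads: Iterable[str]) -> List[Tuple[int, str, int, float]]:
    """Return (rank, sequence, read count, reads per million), most abundant first."""
    counts = Counter(reads)
    total = sum(counts.values())
    # Sort by descending count, then alphabetically so the order is deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [
        (rank, seq, n, n * 1e6 / total)
        for rank, (seq, n) in enumerate(ordered, start=1)
    ]


def fold_enrichment(
    rpm_before: Dict[str, float], rpm_after: Dict[str, float]
) -> Dict[str, float]:
    """Ratio of normalized abundance after vs. before, for sequences seen in both rounds."""
    shared = rpm_before.keys() & rpm_after.keys()
    return {seq: rpm_after[seq] / rpm_before[seq] for seq in shared}


# Hypothetical reads from two selection rounds (sequences only, quality lines ignored).
round_1 = ["AAA", "CCC", "AAA", "GGG"]
round_2 = ["AAA", "AAA", "AAA", "CCC"]
table_1 = count_and_rank(round_1)
rpm_1 = {seq: rpm for _, seq, _, rpm in table_1}
rpm_2 = {seq: rpm for _, seq, _, rpm in count_and_rank(round_2)}
enrichment = fold_enrichment(rpm_1, rpm_2)
```

    Normalizing before taking the ratio matters because the two rounds will generally differ in total read count; sequences absent from either round are left out here rather than assigned an arbitrary pseudocount.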

  18. Sequence analysis of 203 kilobases from Saccharomyces cerevisiae chromosome VII.

    PubMed

    Rieger, M; Brückner, M; Schäfer, M; Müller-Auer, S

    1997-09-15

    The nucleotide sequences of five major regions from chromosome VII of Saccharomyces cerevisiae have been determined and analysed. These regions represent 203 kilobases corresponding to approximately one-fifth of the complete yeast chromosome VII. Two fragments originate from the left arm of this chromosome. The first one of about 15.8 kb starts approximately 75 kb from the left telomere and is bordered by the SK18 chromosomal marker. The second fragment covers the 72.6 kb region between the chromosomal markers CYH2 and ALG2. On the right chromosomal arm three regions, a 70.6 kb region between the MSB2 and the KSS1 chromosomal markers and two smaller regions dominated by the KRE11 marker and another one in the vicinity of the SER2 marker were sequenced. We found a total of 114 open reading frames (ORFs), 13 of which were completely overlapping with larger ORFs running in the opposite direction. A total of 44 yeast genes, the physiological functions of which are known, could be precisely mapped on this chromosome. Of the remaining 57 ORFs, 26 shared sequence homologies with known genes, among which were 13 other S. cerevisiae genes and five genes from other organisms. No homology with any sequence in the databases could be found for 31 ORFs. Furthermore, five Ty elements were found, one of which may not be functional due to a frame shift in its Ty1B amino acid sequence. The five chromosomal regions harboured five potential ARS elements and one sigma element together with eight tRNA genes and two snRNAs, one of which is encoded by an intron of a protein-coding gene. PMID:9290212

  19. Quantitative analysis of a deeply sequenced marine microbial metatranscriptome.

    PubMed

    Gifford, Scott M; Sharma, Shalabh; Rinta-Kanto, Johanna M; Moran, Mary Ann

    2011-03-01

    The potential of metatranscriptomic sequencing to provide insights into the environmental factors that regulate microbial activities depends on how fully the sequence libraries capture community expression (that is, sample-sequencing depth and coverage depth), and the sensitivity with which expression differences between communities can be detected (that is, statistical power for hypothesis testing). In this study, we use an internal standard approach to make absolute (per liter) estimates of transcript numbers, a significant advantage over proportional estimates that can be biased by expression changes in unrelated genes. Coastal waters of the southeastern United States contain 1 × 10^12 bacterioplankton mRNA molecules per liter of seawater (~200 mRNA molecules per bacterial cell). Even for the large bacterioplankton libraries obtained in this study (~500,000 possible protein-encoding sequences in each of two libraries after discarding rRNAs and small RNAs from >1 million 454 FLX pyrosequencing reads), sample-sequencing depth was only 0.00001%. Expression levels of 82 genes diagnostic for transformations in the marine nitrogen, phosphorus and sulfur cycles ranged from below detection (<1 × 10^6 transcripts per liter) for 36 genes (for example, phosphonate metabolism gene phnH, dissimilatory nitrate reductase subunit napA) to >2.7 × 10^9 transcripts per liter (ammonia transporter amt and ammonia monooxygenase subunit amoC). Half of the categories for which expression was detected, however, had too few copy numbers for robust statistical resolution, as would be required for comparative (experimental or time-series) expression studies. By representing whole community gene abundance and expression in absolute units (per volume or mass of environment), 'omics' data can be better leveraged to improve understanding of microbially mediated processes in the ocean.
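
    The abstract does not spell out the internal standard calculation, so the following Python sketch shows one common way such an estimate could be made: a known number of standard molecules is added to the sample, and the ratio of molecules added to standard reads recovered converts read counts into absolute numbers. The function name, the formula and every number below are assumptions for illustration, not values from the study.

```python
def transcripts_per_liter(gene_reads, standard_reads,
                          standard_molecules_added, liters_filtered):
    """Scale a gene's read count to absolute transcripts per liter.

    Assumes reads are recovered from the internal standard and from
    natural transcripts with the same efficiency.
    """
    if standard_reads <= 0 or liters_filtered <= 0:
        raise ValueError("standard_reads and liters_filtered must be positive")
    # How many molecules in the sample each sequenced read stands for.
    molecules_per_read = standard_molecules_added / standard_reads
    return gene_reads * molecules_per_read / liters_filtered


# Hypothetical inputs: 50 reads for a gene, 1000 reads of the standard,
# 2e10 standard molecules added, 5 liters of seawater filtered.
estimate = transcripts_per_liter(50, 1000, 2e10, 5)
```

    Because the estimate depends only on the gene's own reads and the standard, it does not shift when unrelated genes change expression, which is the advantage over proportional estimates that the abstract describes.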

  1. Novel technologies applied to the nucleotide sequencing and comparative sequence analysis of the genomes of infectious agents in veterinary medicine.

    PubMed

    Granberg, F; Bálint, Á; Belák, S

    2016-04-01

    Next-generation sequencing (NGS), also referred to as deep, high-throughput or massively parallel sequencing, is a powerful new tool that can be used for the complex diagnosis and intensive monitoring of infectious disease in veterinary medicine. NGS technologies are also being increasingly used to study the aetiology, genomics, evolution and epidemiology of infectious disease, as well as host-pathogen interactions and other aspects of infection biology. This review briefly summarises recent progress and achievements in this field by first introducing a range of novel techniques and then presenting examples of NGS applications in veterinary infection biology. Various work steps and processes for sampling and sample preparation, sequence analysis and comparative genomics, and improving the accuracy of genomic prediction are discussed, as are bioinformatics requirements. Examples of sequencing-based applications and comparative genomics in veterinary medicine are then provided. This review is based on novel references selected from the literature and on experiences of the World Organisation for Animal Health (OIE) Collaborating Centre for the Biotechnology-based Diagnosis of Infectious Diseases in Veterinary Medicine, Uppsala, Sweden. PMID:27217166

  2. Digital fragment analysis of short tandem repeats by high-throughput amplicon sequencing.

    PubMed

    Darby, Brian J; Erickson, Shay F; Hervey, Samuel D; Ellis-Felege, Susan N

    2016-07-01

    High-throughput sequencing has been proposed as a method to genotype microsatellites and overcome the four main technical drawbacks of capillary electrophoresis: amplification artifacts, imprecise sizing, length homoplasy, and limited multiplex capability. The objective of this project was to test a high-throughput amplicon sequencing approach to fragment analysis of short tandem repeats and characterize its advantages and disadvantages against traditional capillary electrophoresis. We amplified and sequenced 12 muskrat microsatellite loci from 180 muskrat specimens and analyzed the sequencing data for precision of allele calling, propensity for amplification or sequencing artifacts, and for evidence of length homoplasy. Of the 294 total alleles, we detected by sequencing, only 164 alleles would have been detected by capillary electrophoresis as the remaining 130 alleles (44%) would have been hidden by length homoplasy. The ability to detect a greater number of unique alleles resulted in the ability to resolve greater population genetic structure. The primary advantages of fragment analysis by sequencing are the ability to precisely size fragments, resolve length homoplasy, multiplex many individuals and many loci into a single high-throughput run, and compare data across projects and across laboratories (present and future) with minimal technical calibration. A significant disadvantage of fragment analysis by sequencing is that the method is only practical and cost-effective when performed on batches of several hundred samples with multiple loci. Future work is needed to optimize throughput while minimizing costs and to update existing microsatellite allele calling and analysis programs to accommodate sequence-aware microsatellite data. PMID:27386092
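
    Length homoplasy, as described above, means that alleles with different sequences share the same fragment length and so cannot be told apart by sizing alone. A minimal Python sketch of that comparison follows; the function name `allele_counts` and the toy allele sequences are hypothetical, while the totals 294 and 164 are taken from the abstract.

```python
from collections import defaultdict


def allele_counts(allele_sequences):
    """Return (alleles by sequence, alleles by length, alleles hidden by homoplasy)."""
    by_length = defaultdict(set)
    for seq in set(allele_sequences):
        by_length[len(seq)].add(seq)
    n_by_sequence = sum(len(seqs) for seqs in by_length.values())
    n_by_length = len(by_length)  # what sizing alone would resolve
    return n_by_sequence, n_by_length, n_by_sequence - n_by_length


# Hypothetical alleles at one locus: two of them share a length of 8 bases.
toy_alleles = ["ACACACAC", "ACACACAT", "ACACACACAC", "ACACAC"]
toy_result = allele_counts(toy_alleles)

# Arithmetic reported in the abstract: 294 alleles by sequence, 164 by length.
hidden = 294 - 164
hidden_percent = round(100 * hidden / 294)
```

    The last two lines reproduce the abstract's figures: 130 alleles, or about 44% of the 294 detected by sequencing, would be hidden by length homoplasy.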

  3. Comparative analysis of antigen-targeting sequences used in DNA vaccines.

    PubMed

    Carvalho, Joana A; Azzoni, Adriano R; Prazeres, Duarte M F; Monteiro, Gabriel A

    2010-03-01

    Plasmid vectors can be optimized by including specific signals that promote antigen targeting to the major antigen presentation and processing pathways, increasing the immunogenicity and potency of DNA vaccines. A pVAX1-based backbone was used to encode the Green Fluorescent Protein (GFP) reporter gene fused either to ISG (Invariant Surface Glycoprotein) or to TSA (trans-sialidase) Trypanosoma brucei genes. The plasmids were further engineered to carry antigen-targeting sequences, which promote protein transport to the extracellular space (secretion signal), lysosomes (LAMP-1) and to the endoplasmic reticulum (adenovirus e1a). Transfection efficiency was not affected by differences in the size between each construct as no differences in the plasmid copy number per cell were found. This finding also suggests that the addition of both ISG gene and targeting sequences did not add sensitive regions prone to nuclease attack to the plasmid. Cells transfected with pVAX1GFP had a significantly higher number of transcripts. This could be a result of lower mRNA stability and/or a lower transcription rate associated with the bigger transcripts. On the other hand, no differences were found between the transcript levels of the ISG-GFP plasmids. Therefore, the addition of these targeting sequences does not affect the maturation/stability of the transcripts. Microscopy analysis showed differences in protein localization and fluorescence levels of cells transfected with pVAX1GFP and ISG constructs. Moreover, cells transfected with the lamp and secretory sequences presented a distinct distribution pattern when compared with ISG protein. Protein expression was quantified by flow cytometry. Higher cell fluorescence was observed in cells expressing the cytoplasmic fusion protein (ISG-GFP or TSA-GFP) compared with cells where the protein was transported to the lysosomal pathway. Protein transport to the endoplasmic reticulum does not lead to a decrease in the mean fluorescence values.

  4. The use of additive and subtractive approaches to examine the nuclear localization sequence of the polyomavirus major capsid protein VP1

    NASA Technical Reports Server (NTRS)

    Chang, D.; Haynes, J. I. 2nd; Brady, J. N.; Consigli, R. A.; Spooner, B. S. (Principal Investigator)

    1992-01-01

    A nuclear localization signal (NLS) has been identified in the N-terminal (Ala1-Pro-Lys-Arg-Lys-Ser-Gly-Val-Ser-Lys-Cys11) amino acid sequence of the polyomavirus major capsid protein VP1. The importance of this amino acid sequence for nuclear transport of VP1 protein was demonstrated by a genetic "subtractive" study using the constructs pSG5VP1 (full-length VP1) and pSG5 delta 5'VP1 (truncated VP1, lacking amino acids Ala1-Cys11). These constructs were used to transfect COS-7 cells, and expression and intracellular localization of the VP1 protein was visualized by indirect immunofluorescence. These studies revealed that the full-length VP1 was expressed and localized in the nucleus, while the truncated VP1 protein was localized in the cytoplasm and not transported to the nucleus. These findings were substantiated by an "additive" approach using FITC-labeled conjugates of synthetic peptides homologous to the NLS of VP1 cross-linked to bovine serum albumin or immunoglobulin G. Both conjugates localized in the nucleus after microinjection into the cytoplasm of 3T6 cells. The importance of individual amino acids found in the basic sequence (Lys3-Arg-Lys5) of the NLS was also investigated. This was accomplished by synthesizing three additional peptides in which lysine-3 was substituted with threonine, arginine-4 was substituted with threonine, or lysine-5 was substituted with threonine. It was found that lysine-3 was crucial for nuclear transport, since substitution of this amino acid with threonine prevented nuclear localization of the microinjected, FITC-labeled conjugate.

  5. Long terminal repeat of murine retroviral DNAs: sequence analysis, host-proviral junctions, and preintegration site.

    PubMed Central

    Van Beveren, C; Rands, E; Chattopadhyay, S K; Lowy, D R; Verma, I M

    1982-01-01

    The nucleotide sequence of the long terminal repeat (LTR) of three murine retroviral DNAs has been determined. The data indicate that the U5 region (sequences originating from the 5' end of the genome) of various LTRs is more conserved than the U3 region (sequences from the 3' end of the genome). The location and sequence of the control elements such as the 5' cap, "TATA-like" sequences, "CCAAT-box," and presumptive polyadenylic acid addition signal AATAAA in the various LTRs are nearly identical. Some murine retroviral DNAs contain a duplication of sequences within the LTR ranging in size from 58 to 100 base pairs. A variant of molecularly cloned Moloney murine sarcoma virus DNA in which one of the two LTRs integrated into the viral DNA was also analyzed. A 4-base-pair duplication was generated at the site of integration of LTR in the viral DNA. The host-viral junction of two molecularly cloned AKR-murine leukemia virus DNAs (clones 623 and 614) was determined. In the case of AKR-623 DNA, a 3- or 4-base-pair direct repeat of cellular sequences flanking the viral DNA was observed. However, AKR-614 DNA contained a 5-base-pair repeat of cellular sequences. The nucleotide sequence of the preintegration site of AKR-623 DNA revealed that the cellular sequences duplicated during integration are present only once. Finally, a striking homology between the sequences flanking the preintegration site and viral LTRs was observed. PMID:6281466

  6. K-mer natural vector and its application to the phylogenetic analysis of genetic sequences

    PubMed Central

    Wen, Jia; Chan, Raymond H.; Yau, Shek-Chung; He, Rong L.; Yau, Stephen S. T.

    2014-01-01

    Based on the well-known k-mer model, we propose a k-mer natural vector model for representing a genetic sequence based on the numbers and distributions of k-mers in the sequence. We show that there exists a one-to-one correspondence between a genetic sequence and its associated k-mer natural vector. The k-mer natural vector method can be easily and quickly used to perform phylogenetic analysis of genetic sequences without requiring evolutionary models or human intervention. Whole or partial genomes can be handled more effectively with our proposed method. It is applied to the phylogenetic analysis of genetic sequences, and the results obtained fully demonstrate that the k-mer natural vector method is a very powerful tool for analysing and annotating genetic sequences and determining evolutionary relationships both in terms of accuracy and efficiency. PMID:24858075
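
    The abstract describes the vector only as built from "the numbers and distributions of k-mers", so the Python sketch below is an illustration under stated assumptions rather than the authors' definition. For each k-mer it records the count, the mean 1-based start position, and a second central moment of the positions scaled by count times number of windows. That scaling, the cut-off at the second moment and the name `kmer_natural_vector` are assumptions.

```python
from collections import defaultdict


def kmer_natural_vector(seq, k):
    """Map each k-mer in seq to (count, mean position, scaled second moment)."""
    n_windows = len(seq) - k + 1
    positions = defaultdict(list)
    for i in range(n_windows):
        positions[seq[i:i + k]].append(i + 1)  # 1-based start position
    vector = {}
    for kmer, pos in positions.items():
        n = len(pos)
        mean = sum(pos) / n
        # Spread of the k-mer's positions; the scaling here is an assumption.
        second_moment = sum((p - mean) ** 2 for p in pos) / (n * n_windows)
        vector[kmer] = (n, mean, second_moment)
    return vector


# Toy sequence: "AC" starts at positions 1 and 4, "CG" at 2, "GA" at 3.
toy_vector = kmer_natural_vector("ACGAC", 2)
```

    Two sequences could then be compared by a distance between their vectors without aligning them, which is consistent with the abstract's claim that no evolutionary model is needed. A vector truncated at the second moment, as here, should not be expected to identify a sequence uniquely.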

  7. Computational methods for the analysis of tag sequences in metagenomics studies.

    PubMed

    Chang, Qin; Luan, Yihui; Chen, Ting; Fuhrman, Jed A; Sun, Fengzhu

    2012-06-01

    Metagenomics commonly refers to the study of genetic materials directly derived from environments without culturing. Several ongoing large-scale metagenomics projects related to human and marine life, as well as pedology studies, have generated enormous amounts of data, posing a key challenge for efficient analysis, as we try to 1) understand microbial organism assemblage under different conditions, 2) compare different communities, and 3) understand how microbial organisms associate with each other and the environment. To address such questions, investigators are using new sequencing technologies, including Sanger, Illumina Solexa, and Roche 454, to sequence either particular genes, called tag sequences, mostly 16S or 18S ribosomal RNA sequences or other conserved genes, or whole metagenome shotgun sequences of all the genetic materials in a given community. In this paper, we review computational methods used for the analysis of tag sequences.

  8. [Highly efficient and rapid capillary electrophoretic analysis of seven organic acid additives in beverages using polymeric ionic liquid as additive].

    PubMed

    Han, Haifeng; Wang, Qing; Liu, Xi; Jiang, Shengxiang

    2012-05-01

    A new capillary electrophoretic method for the rapid and direct separation of seven organic acids in beverages was developed, with poly(1-vinyl-3-butylimidazolium bromide) as a reliable background electrolyte modifier to strongly reverse the direction of the anode electroosmotic flow (EOF). Several factors that affected the separation efficiency were investigated in detail. The optimal running buffer consisted of 125 mmol/L sodium dihydrogen phosphate (pH 6.5) and 0.01 g/L poly(1-vinyl-3-butylimidazolium bromide). Highly efficient separation (105,000 to 636,000 plates/m) was achieved within 4 min, and standard deviations of the migration times (n=3) were lower than 0.0213 min under optimal conditions. The limits of detection (S/N = 3) ranged from 0.001 to 0.05 g/L. The method was applied to a beverage sample (Mirinda), in which sodium citrate, benzoic acid and sorbic acid were determined at concentrations of 2.64, 0.10 and 0.08 g/L, respectively. The recoveries of the three analytes in the sample were 100.3%, 100.7% and 131.7%, respectively. The method is simple, rapid and inexpensive, and can be applied to determine organic acids as additives in beverages.

  9. The sequence and analysis of duplication rich human chromosome 16

    SciTech Connect

    Martin, J; Han, C; Gordon, L A; Terry, A; Prabhakar, S; She, X; Xie, G; Hellsten, U; Chan, Y M; Altherr, M; Couronne, O; Aerts, A; Bajorek, E; Black, S; Blumer, H; Branscomb, E; Brown, N; Bruno, W J; Buckingham, J; Callen, D F; Campbell, C S; Campbell, M L; Campbell, E W; Caoile, C; Challacombe, J F; Chasteen, L A; Chertkov, O; Chi, H C; Christensen, M; Clark, L M; Cohn, J D; Denys, M; Detter, J C; Dickson, M; Dimitrijevic-Bussod, M; Escobar, J; Fawcett, J J; Flowers, D; Fotopulos, D; Glavina, T; Gomez, M; Gonzales, E; Goodstein, D; Goodwin, L A; Grady, D L; Grigoriev, I; Groza, M; Hammon, N; Hawkins, T; Haydu, L; Hildebrand, C E; Huang, W; Israni, S; Jett, J; Jewett, P B; Kadner, K; Kimball, H; Kobayashi, A; Krawczyk, M; Leyba, T; Longmire, J L; Lopez, F; Lou, Y; Lowry, S; Ludeman, T; Manohar, C F; Mark, G A; McMurray, K L; Meincke, L J; Morgan, J; Moyzis, R K; Mundt, M O; Munk, A C; Nandkeshwar, R D; Pitluck, S; Pollard, M; Predki, P; Parson-Quintana, B; Ramirez, L; Rash, S; Retterer, J; Ricke, D O; Robinson, D; Rodriguez, A; Salamov, A; Saunders, E H; Scott, D; Shough, T; Stallings, R L; Stalvey, M; Sutherland, R D; Tapia, R; Tesmer, J G; Thayer, N; Thompson, L S; Tice, H; Torney, D C; Tran-Gyamfi, M; Tsai, M; Ulanovsky, L E; Ustaszewska, A; Vo, N; White, P S; Williams, A L; Wills, P L; Wu, J; Wu, K; Yang, J; DeJong, P; Bruce, D; Doggett, N A; Deaven, L; Schmutz, J; Grimwood, J; Richardson, P; Rokhsar, D S; Eichler, E E; Gilna, P; Lucas, S M; Myers, R M; Rubin, E M; Pennacchio, L A

    2005-04-06

    Human chromosome 16 features one of the highest levels of segmentally duplicated sequence among the human autosomes. We report here the 78,884,754 base pairs of finished chromosome 16 sequence, representing over 99.9% of its euchromatin. Manual annotation revealed 880 protein-coding genes confirmed by 1,637 aligned transcripts, 19 tRNA genes, 341 pseudogenes, and 3 RNA pseudogenes. These genes include metallothionein, cadherin, and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukemia. Several large-scale structural polymorphisms spanning hundreds of kilobase pairs were identified and result in gene content differences among humans. While the segmental duplications of chromosome 16 are enriched in the relatively gene poor pericentromere of the p-arm, some are involved in recent gene duplication and conversion events likely to have had an impact on the evolution of primates and human disease susceptibility.

  10. Drug resistance analysis by next generation sequencing in Leishmania

    PubMed Central

    Leprohon, Philippe; Fernandez-Prada, Christopher; Gazanion, Élodie; Monte-Neto, Rubens; Ouellette, Marc

    2014-01-01

    The use of next generation sequencing has the power to expedite the identification of drug resistance determinants and biomarkers and was applied successfully to drug resistance studies in Leishmania. This allowed the identification of modulation in gene expression, gene dosage alterations, changes in chromosome copy numbers and single nucleotide polymorphisms that correlated with resistance in Leishmania strains derived from the laboratory and from the field. An impressive heterogeneity at the population level was also observed, individual clones within populations often differing in both genotypes and phenotypes, hence complicating the elucidation of resistance mechanisms. This review summarizes the most recent highlights that whole genome sequencing brought to our understanding of Leishmania drug resistance and likely new directions. PMID:25941624

  11. mitoSAVE: mitochondrial sequence analysis of variants in Excel.

    PubMed

    King, Jonathan L; Sajantila, Antti; Budowle, Bruce

    2014-09-01

    The mitochondrial genome (mtGenome) contains genetic information amenable to numerous applications such as medical research, population and evolutionary studies, and human identity testing. However, inconsistent nomenclature assignment makes haplotype comparison difficult and can lead to false exclusion of potentially useful profiles. Massively Parallel Sequencing (MPS) is a platform for sequencing large datasets and potentially whole populations with relative ease. However, the data generated are not easily parsed and interpreted. With this in mind, mitoSAVE has been developed to enable fast conversion of Variant Call Format (VCF) files. mitoSAVE is an Excel-based workbook that converts data within the VCF into mtDNA haplotypes using phylogenetically-established nomenclature as well as rule-based alignments consistent with current forensic standards. mitoSAVE is formatted for human mitochondrial genome; however, it can easily be adapted to support other reasonably small genomes.

  12. The DNA Sequence And Comparative Analysis Of Human Chromosome 5

    SciTech Connect

    Schmutz, Jeremy; Martin, Joel; Terry, Astrid; Couronne, Olivier; Grimwood, Jane; Lowry, Steve; Gordon, Laurie A.; Scott, Duncan; Xie, Gary; Huang, Wayne; Hellsten, Uffe; Tran-Gyamfi, Mary; She, Xinwei; Prabhakar, Shyam; Aerts, Andrea; Altherr, Michael; Bajorek, Eva; Black, Stacey; Branscomb, Elbert; Caoile, Chenier; Challacombe, Jean F.; Chan, Yee Man; Denys, Mirian; Detter, John C.; Escobar, Julio; Flowers, Dave; Fotopulos, Dea; Glavina, Tijana; Gomez, Maria; Gonzales, Eidelyn; Goodstein, David; Grigoriev, Igor; Groza, Matthew; Hammon, Nancy; Hawkins, Trevor; Haydu, Lauren; Israni, Sanjay; Jett, Jamie; Kadner, Kristen; Kimball, Heather; Kobayashi, Arthur; Lopez, Frederick; Lou, Yunian; Martinez, Diego; Medina, Catherine; Morgan, Jenna; Nandkeshwar, Richard; Noonan, James P.; Pitluck, Sam; Pollard, Martin; Predki, Paul; Priest, James; Ramirez, Lucia; Retterer, James; Rodriguez, Alex; Rogers, Stephanie; Salamov, Asaf; Salazar, Angelica; Thayer, Nina; Tice, Hope; Tsai, Ming; Ustaszewska, Anna; Vo, Nu; Wheeler, Jeremy; Wu, Kevin; Yang, Joan; Dickson, Mark; Cheng, Jan-Fang; Eichler, Evan E.; Olsen, Anne; Pennacchio, Len A.; Rokhsar, Daniel S.; Richardson, Paul; Lucas, Susan M.; Myers, Richard M.; Rubin, Edward M.

    2004-08-01

    Chromosome 5 is one of the largest human chromosomes and contains numerous intrachromosomal duplications, yet it has one of the lowest gene densities. This is partially explained by numerous gene-poor regions that display a remarkable degree of noncoding conservation with non-mammalian vertebrates, suggesting that they are functionally constrained. In total, we compiled 177.7 million base pairs of highly accurate finished sequence containing 923 manually curated protein-coding genes including the protocadherin and interleukin gene families. We also completely sequenced versions of the large chromosome-5-specific internal duplications. These duplications are very recent evolutionary events and probably have a mechanistic role in human physiological variation, as deletions in these regions are the cause of debilitating disorders including spinal muscular atrophy.

  13. Analysis of Whole Transcriptome Sequencing Data: Workflow and Software

    PubMed Central

    Yang, In Seok

    2015-01-01

    RNA is a polymeric molecule implicated in various biological processes, such as the coding, decoding, regulation, and expression of genes. Numerous studies have examined RNA features using whole transcriptome sequencing (RNA-seq) approaches. RNA-seq is a powerful technique for characterizing and quantifying the transcriptome and accelerates the development of bioinformatics software. In this review, we introduce routine RNA-seq workflow together with related software, focusing particularly on transcriptome reconstruction and expression quantification. PMID:26865842

  14. Sequence analysis and structural implications of rotavirus capsid proteins.

    PubMed

    Parbhoo, N; Dewar, J B; Gildenhuys, S

    2016-01-01

    Rotavirus is the major cause of severe virus-associated gastroenteritis worldwide in children aged 5 and younger. Many children lose their lives annually due to this infection and the impact is particularly pronounced in developing countries. The mature rotavirus is a non-enveloped triple-layered nucleocapsid containing 11 double stranded RNA segments. Here a global view on the sequence and structure of the three main capsid proteins, VP2, VP6 and VP7, is shown by generating a consensus sequence for each of these rotavirus proteins, for each species, obtained from published data of representative rotavirus genotypes from across the world and across species. Degree of conservation between species was represented on homology models for each of the proteins. VP7 shows the highest level of variation with 14-45 amino acids showing conservation of less than 60%. These changes are localised to the outer surface, alluding to a possible mechanism in evading the immune system. The middle layer, VP6, shows lower variability with only 14-32 sites having lower than 70% conservation. The inner structural layer made up of VP2 showed the lowest variability with only 1-16 sites having less than 70% conservation across species. The results correlate with each protein's multiple structural roles in the infection cycle. Thus, although the nucleotide sequences vary due to the error-prone nature of replication and lack of proofreading, the corresponding amino acid sequences of VP2, VP6 and VP7 remain relatively conserved. Benefits of this knowledge about the conservation include the ability to target proteins at sites that cannot undergo mutational changes without influencing viral fitness, as well as the possibility of studying systems that are highly evolved for structure and function in order to determine how to generate and manipulate such systems for use in various biotechnological applications. PMID:27640436

  15. Structure prediction and analysis of neuraminidase sequence variants.

    PubMed

    Thayer, Kelly M

    2016-07-01

    Analyzing protein structure has become an integral aspect of understanding systems of biochemical import. The laboratory experiment endeavors to introduce protein folding to ascertain structures of proteins for which the structure is unavailable, as well as to critically evaluate the quality of the prediction obtained. The model system used is the highly mutable influenza virus protein neuraminidase, which is the key target in the development of therapeutics. In light of recent pandemics, understanding how mutations confer drug resistance, which translates at the molecular level to understanding how different sequence variants differ, constitutes an area of great interest because of the ramifications in public health. This lab targets upper level undergraduate biochemistry students, and aims to introduce tools to be used to explore protein folding and protein visualization in the context of the neuraminidase case study. Students proceed to critically evaluate the folded models by comparison with crystallographic structures. When validity is established, they fold a neuraminidase sequence for which a structure is not available. Through structural alignment and visual inspection of the 150 loop, students gain molecular insight into two possible conformations of the protein, which are actively being studied. Folding the third chosen sequence mimics a true research environment in allowing students to generate a structure from a sequence for which a structure was not previously available, and to assess whether their particular variant has an open or closed loop. From this vantage, they are then challenged to speculate about the connection between loop conformation and drug susceptibility. © 2016 by The International Union of Biochemistry and Molecular Biology, 44(4):361-376, 2016. PMID:26900942

  16. Molecular Identification of Two Strains of Phellinus sp. by Internal Transcribed Spacer Sequence Analysis

    PubMed Central

    2011-01-01

    Two species of cultivated Phellinus sp. were identified as P. baumii by internal transcribed spacer (ITS) sequence analysis. The fruit bodies of the examined strains were similar to those of naturally occurring strains, having a bracket-like form, yellow-to-orange color, and poroid hymenial surfaces. The DNA sequences of ITS region of both strains showed a homology of 99% with ITS1 to ITS2 sequences of P. (Inonotus) baumii strain PB0806. PMID:22783119

  17. Sorbitol dehydrogenase. Full-length cDNA sequencing reveals a mRNA coding for a protein containing an additional 42 amino acids at the N-terminal end.

    PubMed

    Wen, Y; Bekhor, I

    1993-10-01

    A cDNA clone encoding rat sorbitol dehydrogenase (SDH) was isolated from a rat testis lambda ZAP II cDNA library. The full-length cDNA insert contained 2277 base pairs (bp), starting 182 bp upstream from an ATG codon where translation to the active enzyme SDH is presumed to be initiated. A second ATG codon, however, was found 126 bp upstream, aligned in the same reading frame as that of the active enzyme. Therefore, the coding sequence for SDH can be translated into an additional 42-amino-acid polypeptide linked to the N-terminal amino acid of the enzyme, generating a pre-sorbitol dehydrogenase. The sequence data indicate that the nucleotide environment around this ATG codon is more favorable towards it being the actual open reading frame (ORF) for a pre-SDH than the ATG codon preceding the nucleotide sequence for SDH. Since no known SDH starts with the additional 42 amino acids, it may be that post-translational removal of this polypeptide accompanies the release of the active enzyme. Next, the 3' untranslated region of the cDNA contained a non-coding 1021 bp downstream from the TAA stop codon. The latter sequence included three putative poly(A) signals: one at nucleotides 1362-1367, the second at nucleotides 1465-1470, and the third at nucleotides 2212-2217 [17 bp away from the poly(A) tail]. In addition to the above findings we also report a variance in one of the amino acids in the SDH cDNA sequence. This variance occurs at position 957-960, where threonine is coded for instead of aspartic acid; in the rat testis SDH cDNA, we find the sequence is ACG instead of GAC, as was reported for the rat liver SDH cDNA. Northern-blot hybridization analysis showed that SDH mRNA is a doublet, one band of 4 kb and the other of 2.3-2.4 kb, in both the rat liver and the rat lens, further confirming that the isolated SDH cDNA constituted a full-length cDNA.
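
    The link between the upstream ATG at 126 bp and the additional 42 amino acids is a reading-frame calculation: an offset that is a multiple of three keeps the upstream start codon in frame, and each codon adds one residue. A small Python sketch of that arithmetic, with the hypothetical helper name `upstream_extension`:

```python
def upstream_extension(offset_bp):
    """Residues added by an upstream start codon, or None if it is out of frame."""
    if offset_bp % 3 != 0:
        return None  # a different reading frame, so no N-terminal extension
    return offset_bp // 3


# 126 bp upstream, as reported in the abstract.
extra_residues = upstream_extension(126)
```

    With the abstract's offset of 126 bp the helper returns 42, matching the reported extension; an offset such as 125 bp would fall in a different frame and return None.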

  18. ALVIS: interactive non-aggregative visualization and explorative analysis of multiple sequence alignments.

    PubMed

    Schwarz, Roland F; Tamuri, Asif U; Kultys, Marek; King, James; Godwin, James; Florescu, Ana M; Schultz, Jörg; Goldman, Nick

    2016-05-01

    Sequence Logos and its variants are the most commonly used method for visualization of multiple sequence alignments (MSAs) and sequence motifs. They provide consensus-based summaries of the sequences in the alignment. Consequently, individual sequences cannot be identified in the visualization and covariant sites are not easily discernible. We recently proposed Sequence Bundles, a motif visualization technique that maintains a one-to-one relationship between sequences and their graphical representation and visualizes covariant sites. We here present Alvis, an open-source platform for the joint explorative analysis of MSAs and phylogenetic trees, employing Sequence Bundles as its main visualization method. Alvis combines the power of the visualization method with an interactive toolkit allowing detection of covariant sites, annotation of trees with synapomorphies and homoplasies, and motif detection. It also offers numerical analysis functionality, such as dimension reduction and classification. Alvis is user-friendly, highly customizable and can export results in publication-quality figures. It is available as a full-featured standalone version (http://www.bitbucket.org/rfs/alvis) and its Sequence Bundles visualization module is further available as a web application (http://science-practice.com/projects/sequence-bundles). PMID:26819408

  20. ALVIS: interactive non-aggregative visualization and explorative analysis of multiple sequence alignments

    PubMed Central

    Schwarz, Roland F.; Tamuri, Asif U.; Kultys, Marek; King, James; Godwin, James; Florescu, Ana M.; Schultz, Jörg; Goldman, Nick

    2016-01-01

    Sequence Logos and its variants are the most commonly used method for visualization of multiple sequence alignments (MSAs) and sequence motifs. They provide consensus-based summaries of the sequences in the alignment. Consequently, individual sequences cannot be identified in the visualization and covariant sites are not easily discernible. We recently proposed Sequence Bundles, a motif visualization technique that maintains a one-to-one relationship between sequences and their graphical representation and visualizes covariant sites. We here present Alvis, an open-source platform for the joint explorative analysis of MSAs and phylogenetic trees, employing Sequence Bundles as its main visualization method. Alvis combines the power of the visualization method with an interactive toolkit allowing detection of covariant sites, annotation of trees with synapomorphies and homoplasies, and motif detection. It also offers numerical analysis functionality, such as dimension reduction and classification. Alvis is user-friendly, highly customizable and can export results in publication-quality figures. It is available as a full-featured standalone version (http://www.bitbucket.org/rfs/alvis) and its Sequence Bundles visualization module is further available as a web application (http://science-practice.com/projects/sequence-bundles). PMID:26819408

  1. Analysis of xylem formation in pine by cDNA sequencing

    NASA Technical Reports Server (NTRS)

    Allona, I.; Quinn, M.; Shoop, E.; Swope, K.; St Cyr, S.; Carlis, J.; Riedl, J.; Retzel, E.; Campbell, M. M.; Sederoff, R.; Whetten, R. W.; Davies, E. (Principal Investigator)

    1998-01-01

    Secondary xylem (wood) formation is likely to involve some genes expressed rarely or not at all in herbaceous plants. Moreover, environmental and developmental stimuli influence secondary xylem differentiation, producing morphological and chemical changes in wood. To increase our understanding of xylem formation, and to provide material for comparative analysis of gymnosperm and angiosperm sequences, ESTs were obtained from immature xylem of loblolly pine (Pinus taeda L.). A total of 1,097 single-pass sequences were obtained from 5' ends of cDNAs made from gravistimulated tissue from bent trees. Cluster analysis detected 107 groups of similar sequences, ranging in size from 2 to 20 sequences. A total of 361 sequences fell into these groups, whereas 736 sequences were unique. About 55% of the pine EST sequences show similarity to previously described sequences in public databases. About 10% of the recognized genes encode factors involved in cell wall formation. Sequences similar to cell wall proteins, most known lignin biosynthetic enzymes, and several enzymes of carbohydrate metabolism were found. A number of putative regulatory proteins also are represented. Expression patterns of several of these genes were studied in various tissues and organs of pine. Sequencing novel genes expressed during xylem formation will provide a powerful means of identifying mechanisms controlling this important differentiation pathway.

  2. Complete sequence of the genome of the human isolate of Andes virus CHI-7913: comparative sequence and protein structure analysis.

    PubMed

    Tischler, Nicole D; Fernández, Jorge; Müller, Ilse; Martínez, Rodrigo; Galeno, Héctor; Villagra, Eliecer; Mora, Judith; Ramírez, Eugenio; Rosemblatt, Mario; Valenzuela, Pablo D

    2003-01-01

    We report here the complete genomic sequence of the Chilean human isolate of Andes virus CHI-7913. The S, M, and L genome segment sequences of this isolate are 1,802, 3,641 and 6,466 bases in length, with an overall GC content of 38.7%. These genome segments code for a nucleocapsid protein of 428 amino acids, a glycoprotein precursor protein of 1,138 amino acids and a RNA-dependent RNA polymerase of 2,152 amino acids. In addition, the genome also has other ORFs coding for putative proteins of 34 to 103 amino acids. The encoded proteins have greater than 98% overall similarity with the proteins of Andes virus isolates AH-1 and Chile R123. Among other sequenced Hantavirus, CHI-7913 is more closely related to Sin Nombre virus, with an overall protein similarity of 92%. The characteristics of the encoded proteins of this isolate, such as hydrophobic domains, glycosylation sites, and conserved amino acid motifs shared with other Hantavirus and other members of the Bunyaviridae family, are identified and discussed.

  3. The Complete Genome Sequence and Analysis of the Epsilonproteobacterium Arcobacter butzleri

    PubMed Central

    Miller, William G.; Parker, Craig T.; Rubenfield, Marc; Mendz, George L.; Wösten, Marc M. S. M.; Ussery, David W.; Stolz, John F.; Binnewies, Tim T.; Hallin, Peter F.; Wang, Guilin; Malek, Joel A.; Rogosin, Andrea; Stanker, Larry H.; Mandrell, Robert E.

    2007-01-01

    Background Arcobacter butzleri is a member of the epsilon subdivision of the Proteobacteria and a close taxonomic relative of established pathogens, such as Campylobacter jejuni and Helicobacter pylori. Here we present the complete genome sequence of the human clinical isolate, A. butzleri strain RM4018. Methodology/Principal Findings Arcobacter butzleri is a member of the Campylobacteraceae, but the majority of its proteome is most similar to those of Sulfuromonas denitrificans and Wolinella succinogenes, both members of the Helicobacteraceae, and those of the deep-sea vent Epsilonproteobacteria Sulfurovum and Nitratiruptor. In addition, many of the genes and pathways described here, e.g. those involved in signal transduction and sulfur metabolism, have been identified previously within the epsilon subdivision only in S. denitrificans, W. succinogenes, Sulfurovum, and/or Nitratiruptor, or are unique to the subdivision. In addition, the analyses indicated also that a substantial proportion of the A. butzleri genome is devoted to growth and survival under diverse environmental conditions, with a large number of respiration-associated proteins, signal transduction and chemotaxis proteins and proteins involved in DNA repair and adaptation. To investigate the genomic diversity of A. butzleri strains, we constructed an A. butzleri DNA microarray comprising 2238 genes from strain RM4018. Comparative genomic indexing analysis of 12 additional A. butzleri strains identified both the core genes of A. butzleri and intraspecies hypervariable regions, where <70% of the genes were present in at least two strains. Conclusion/Significance The presence of pathways and loci associated often with non-host-associated organisms, as well as genes associated with virulence, suggests that A. butzleri is a free-living, water-borne organism that might be classified rightfully as an emerging pathogen. The genome sequence and analyses presented in this study are an important first step in

  4. Analysis of a marine picoplankton community by 16S rRNA gene cloning and sequencing.

    PubMed Central

    Schmidt, T M; DeLong, E F; Pace, N R

    1991-01-01

    The phylogenetic diversity of an oligotrophic marine picoplankton community was examined by analyzing the sequences of cloned ribosomal genes. This strategy does not rely on cultivation of the resident microorganisms. Bulk genomic DNA was isolated from picoplankton collected in the north central Pacific Ocean by tangential flow filtration. The mixed-population DNA was fragmented, size fractionated, and cloned into bacteriophage lambda. Thirty-eight clones containing 16S rRNA genes were identified in a screen of 3.2 × 10^4 recombinant phage, and portions of the rRNA gene were amplified by polymerase chain reaction and sequenced. The resulting sequences were used to establish the identities of the picoplankton by comparison with an established data base of rRNA sequences. Fifteen unique eubacterial sequences were obtained, including four from cyanobacteria and eleven from proteobacteria. A single eucaryote related to dinoflagellates was identified; no archaebacterial sequences were detected. The cyanobacterial sequences are all closely related to sequences from cultivated marine Synechococcus strains and with cyanobacterial sequences obtained from the Atlantic Ocean (Sargasso Sea). Several sequences were related to common marine isolates of the gamma subdivision of proteobacteria. In addition to sequences closely related to those of described bacteria, sequences were obtained from two phylogenetic groups of organisms that are not closely related to any known rRNA sequences from cultivated organisms. Both of these novel phylogenetic clusters are proteobacteria, one group within the alpha subdivision and the other distinct from known proteobacterial subdivisions. The rRNA sequences of the alpha-related group are nearly identical to those of some Sargasso Sea picoplankton, suggesting a global distribution of these organisms. PMID:2066334

  5. Multivariate qualitative analysis of banned additives in food safety using surface enhanced Raman scattering spectroscopy

    NASA Astrophysics Data System (ADS)

    He, Shixuan; Xie, Wanyi; Zhang, Wei; Zhang, Liqun; Wang, Yunxia; Liu, Xiaoling; Liu, Yulong; Du, Chunlei

    2015-02-01

    A novel strategy which combines iteratively cubic spline fitting baseline correction method with discriminant partial least squares qualitative analysis is employed to analyze the surface enhanced Raman scattering (SERS) spectroscopy of banned food additives, such as Sudan I dye and Rhodamine B in food, Malachite green residues in aquaculture fish. Multivariate qualitative analysis methods, using the combination of spectra preprocessing iteratively cubic spline fitting (ICSF) baseline correction with principal component analysis (PCA) and discriminant partial least squares (DPLS) classification respectively, are applied to investigate the effectiveness of SERS spectroscopy for predicting the class assignments of unknown banned food additives. PCA cannot be used to predict the class assignments of unknown samples. However, the DPLS classification can discriminate the class assignment of unknown banned additives using the information of differences in relative intensities. The results demonstrate that SERS spectroscopy combined with ICSF baseline correction method and exploratory analysis methodology DPLS classification can be potentially used for distinguishing the banned food additives in field of food safety.

  7. Multivariate qualitative analysis of banned additives in food safety using surface enhanced Raman scattering spectroscopy.

    PubMed

    He, Shixuan; Xie, Wanyi; Zhang, Wei; Zhang, Liqun; Wang, Yunxia; Liu, Xiaoling; Liu, Yulong; Du, Chunlei

    2015-02-25

    A novel strategy which combines iteratively cubic spline fitting baseline correction method with discriminant partial least squares qualitative analysis is employed to analyze the surface enhanced Raman scattering (SERS) spectroscopy of banned food additives, such as Sudan I dye and Rhodamine B in food, Malachite green residues in aquaculture fish. Multivariate qualitative analysis methods, using the combination of spectra preprocessing iteratively cubic spline fitting (ICSF) baseline correction with principal component analysis (PCA) and discriminant partial least squares (DPLS) classification respectively, are applied to investigate the effectiveness of SERS spectroscopy for predicting the class assignments of unknown banned food additives. PCA cannot be used to predict the class assignments of unknown samples. However, the DPLS classification can discriminate the class assignment of unknown banned additives using the information of differences in relative intensities. The results demonstrate that SERS spectroscopy combined with ICSF baseline correction method and exploratory analysis methodology DPLS classification can be potentially used for distinguishing the banned food additives in field of food safety. PMID:25300041

  8. Stimulation of terrestrial ecosystem carbon storage by nitrogen addition: a meta-analysis

    NASA Astrophysics Data System (ADS)

    Yue, Kai; Peng, Yan; Peng, Changhui; Yang, Wanqin; Peng, Xin; Wu, Fuzhong

    2016-01-01

    Elevated nitrogen (N) deposition alters the terrestrial carbon (C) cycle, which is likely to feed back to further climate change. However, how the overall terrestrial ecosystem C pools and fluxes respond to N addition remains unclear. By synthesizing data from multiple terrestrial ecosystems, we quantified the response of C pools and fluxes to experimental N addition using a comprehensive meta-analysis method. Our results showed that N addition significantly stimulated soil total C storage by 5.82% ([2.47%, 9.27%], 95% CI, the same below) and increased the C contents of the above- and below-ground parts of plants by 25.65% [11.07%, 42.12%] and 15.93% [6.80%, 25.85%], respectively. Furthermore, N addition significantly increased aboveground net primary production by 52.38% [40.58%, 65.19%] and litterfall by 14.67% [9.24%, 20.38%] at a global scale. However, the C influx from the plant litter to the soil through litter decomposition and the efflux from the soil due to microbial respiration and soil respiration showed insignificant responses to N addition. Overall, our meta-analysis suggested that N addition will increase soil C storage and plant C in both above- and below-ground parts, indicating that terrestrial ecosystems might act to strengthen as a C sink under increasing N deposition.

  9. Stimulation of terrestrial ecosystem carbon storage by nitrogen addition: a meta-analysis

    PubMed Central

    Yue, Kai; Peng, Yan; Peng, Changhui; Yang, Wanqin; Peng, Xin; Wu, Fuzhong

    2016-01-01

    Elevated nitrogen (N) deposition alters the terrestrial carbon (C) cycle, which is likely to feed back to further climate change. However, how the overall terrestrial ecosystem C pools and fluxes respond to N addition remains unclear. By synthesizing data from multiple terrestrial ecosystems, we quantified the response of C pools and fluxes to experimental N addition using a comprehensive meta-analysis method. Our results showed that N addition significantly stimulated soil total C storage by 5.82% ([2.47%, 9.27%], 95% CI, the same below) and increased the C contents of the above- and below-ground parts of plants by 25.65% [11.07%, 42.12%] and 15.93% [6.80%, 25.85%], respectively. Furthermore, N addition significantly increased aboveground net primary production by 52.38% [40.58%, 65.19%] and litterfall by 14.67% [9.24%, 20.38%] at a global scale. However, the C influx from the plant litter to the soil through litter decomposition and the efflux from the soil due to microbial respiration and soil respiration showed insignificant responses to N addition. Overall, our meta-analysis suggested that N addition will increase soil C storage and plant C in both above- and below-ground parts, indicating that terrestrial ecosystems might act to strengthen as a C sink under increasing N deposition. PMID:26813078

  10. A Proposed Taxonomy of Anaerobic Fungi (Class Neocallimastigomycetes) Suitable for Large-Scale Sequence-Based Community Structure Analysis

    PubMed Central

    Kittelmann, Sandra; Naylor, Graham E.; Koolaard, John P.; Janssen, Peter H.

    2012-01-01

    Anaerobic fungi are key players in the breakdown of fibrous plant material in the rumen, but not much is known about the composition and stability of fungal communities in ruminants. We analyzed anaerobic fungi in 53 rumen samples from farmed sheep (4 different flocks), cattle, and deer feeding on a variety of diets. Denaturing gradient gel electrophoresis fingerprinting of the internal transcribed spacer 1 (ITS1) region of the rrn operon revealed a high diversity of anaerobic fungal phylotypes across all samples. Clone libraries of the ITS1 region were constructed from DNA from 11 rumen samples that had distinctly different fungal communities. A total of 417 new sequences were generated to expand the number and diversity of ITS1 sequences available. Major phylogenetic groups of anaerobic fungi in New Zealand ruminants belonged to the genera Piromyces, Neocallimastix, Caecomyces and Orpinomyces. In addition, sequences forming four novel clades were obtained, which may represent so far undetected genera or species of anaerobic fungi. We propose a revised phylogeny and pragmatic taxonomy for anaerobic fungi, which was tested and proved suitable for analysis of datasets stemming from high-throughput next-generation sequencing methods. Comparing our revised taxonomy to the taxonomic assignment of sequences deposited in the GenBank database, we believe that >29% of ITS1 sequences derived from anaerobic fungal isolates or clones are misnamed at the genus level. PMID:22615827

  11. Molecular identification of veterinary yeast isolates by use of sequence-based analysis of the D1/D2 region of the large ribosomal subunit.

    PubMed

    Garner, Cherilyn D; Starr, Jennifer K; McDonough, Patrick L; Altier, Craig

    2010-06-01

    Conventional methods of yeast identification are often time-consuming and difficult; however, recent studies of sequence-based identification methods have shown promise. Additionally, little is known about the diversity of yeasts identified from various animal species in veterinary diagnostic laboratories. Therefore, in this study, we examined three methods of identification by using 109 yeast samples isolated during a 1-year period from veterinary clinical samples. Comparison of the three methods, namely traditional substrate assimilation, fatty acid profile analysis, and sequence-based analysis of the region spanning the D1 and D2 regions (D1/D2) of the large ribosomal subunit, showed that sequence analysis provided the highest percent identification among the three. Sequence analysis identified 87% of isolates to the species level, whereas substrate assimilation and fatty acid profile analysis identified only 54% and 47%, respectively. Less-stringent criteria for identification increased the percentage of isolates identified to 98% for sequence analysis, 62% for substrate assimilation, and 55% for fatty acid profile analysis. We also found that sequence analysis of the internal transcribed spacer 2 (ITS2) region provided further identification for 36% of yeast not identified to the species level by D1/D2 sequence analysis. Additionally, we identified a large variety of yeast from animal sources, with at least 30 different species among the isolates tested, and with the majority not belonging to the common Candida spp., such as C. albicans, C. glabrata, C. tropicalis, and the C. parapsilosis group. Thus, we determined that sequence analysis of the D1/D2 region was the best method for identification of the variety of yeasts found in a veterinary population.

  12. Molecular Identification of Veterinary Yeast Isolates by Use of Sequence-Based Analysis of the D1/D2 Region of the Large Ribosomal Subunit

    PubMed Central

    Garner, Cherilyn D.; Starr, Jennifer K.; McDonough, Patrick L.; Altier, Craig

    2010-01-01

    Conventional methods of yeast identification are often time-consuming and difficult; however, recent studies of sequence-based identification methods have shown promise. Additionally, little is known about the diversity of yeasts identified from various animal species in veterinary diagnostic laboratories. Therefore, in this study, we examined three methods of identification by using 109 yeast samples isolated during a 1-year period from veterinary clinical samples. Comparison of the three methods—traditional substrate assimilation, fatty acid profile analysis, and sequence-based analysis of the region spanning the D1 and D2 regions (D1/D2) of the large ribosomal subunit—showed that sequence analysis provided the highest percent identification among the three. Sequence analysis identified 87% of isolates to the species level, whereas substrate assimilation and fatty acid profile analysis identified only 54% and 47%, respectively. Less-stringent criteria for identification increased the percentage of isolates identified to 98% for sequence analysis, 62% for substrate assimilation, and 55% for fatty acid profile analysis. We also found that sequence analysis of the internal transcribed spacer 2 (ITS2) region provided further identification for 36% of yeast not identified to the species level by D1/D2 sequence analysis. Additionally, we identified a large variety of yeast from animal sources, with at least 30 different species among the isolates tested, and with the majority not belonging to the common Candida spp., such as C. albicans, C. glabrata, C. tropicalis, and the C. parapsilosis group. Thus, we determined that sequence analysis of the D1/D2 region was the best method for identification of the variety of yeasts found in a veterinary population. PMID:20392917

  13. Sequence and structural analysis of BTB domain proteins

    PubMed Central

    Stogios, Peter J; Downs, Gregory S; Jauhal, Jimmy JS; Nandra, Sukhjeen K; Privé, Gilbert G

    2005-01-01

    Background The BTB domain (also known as the POZ domain) is a versatile protein-protein interaction motif that participates in a wide range of cellular functions, including transcriptional regulation, cytoskeleton dynamics, ion channel assembly and gating, and targeting proteins for ubiquitination. Several BTB domain structures have been experimentally determined, revealing a highly conserved core structure. Results We surveyed the protein architecture, genomic distribution and sequence conservation of BTB domain proteins in 17 fully sequenced eukaryotes. The BTB domain is typically found as a single copy in proteins that contain only one or two other types of domain, and this defines the BTB-zinc finger (BTB-ZF), BTB-BACK-kelch (BBK), voltage-gated potassium channel T1 (T1-Kv), MATH-BTB, BTB-NPH3 and BTB-BACK-PHR (BBP) families of proteins, among others. In contrast, the Skp1 and ElonginC proteins consist almost exclusively of the core BTB fold. There are numerous lineage-specific expansions of BTB proteins, as seen by the relatively large number of BTB-ZF and BBK proteins in vertebrates, MATH-BTB proteins in Caenorhabditis elegans, and BTB-NPH3 proteins in Arabidopsis thaliana. Using the structural homology between Skp1 and the PLZF BTB homodimer, we present a model of a BTB-Cul3 SCF-like E3 ubiquitin ligase complex that shows that the BTB dimer or the T1 tetramer is compatible in this complex. Conclusion Despite widely divergent sequences, the BTB fold is structurally well conserved. The fold has adapted to several different modes of self-association and interactions with non-BTB proteins. PMID:16207353

  14. Analysis of Binary Series to Evaluate Astronomical Forcing of a Middle Permian Chert Sequence in South China

    NASA Astrophysics Data System (ADS)

    Hinnov, L. A.; Yao, X.; Zhou, Y.

    2014-12-01

    We describe a Middle Permian radiolarian chert sequence in South China (Chaohu area), with the sequence of chert and mudstone layers formulated into binary series. Two interpolation approaches were tested: linear interpolation resulting in a "triangle" series, and staircase interpolation resulting in a "boxcar" series. Spectral analysis of the triangle series reveals decimeter chert-mudstone cycles which represent theoretical Middle Permian 32 kyr obliquity cycling. Tuning these cycles to a 32-kyr periodicity reveals that other cm-scale cycles are in the precession index band and have a strong ~400 kyr amplitude modulation. Additional tuning tests further support a hypothesis of astronomical forcing of the chert sequence. Analysis of the boxcar series reveals additional "eccentricity" terms transmitted by the boxcar representation of the modulating precession-scale cycles. An astronomical time scale reconstructed from these results assumes a Roadian/Wordian boundary age of 268.8 Ma for the onset of the first chert layer at the base of the sequence and ends at 264.1 Ma, for a total duration of 4.7 Myrs. We propose that monsoon-controlled upwelling contributed to the development of the chert-mudstone cycles. A seasonal monsoon controlled by astronomical forcing influenced the intensity of upwelling, modulating radiolarian productivity and silica deposition.
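
    The two interpolation schemes named in the abstract above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' procedure: the coding (chert = 1, mudstone = 0), the choice to attach each code to its layer midpoint for the linear case, the function name binary_series and the toy layer thicknesses are all hypothetical.

```python
import numpy as np


def binary_series(thicknesses, codes, step):
    """Sample a layered section on a uniform depth grid.

    thicknesses: layer thicknesses, base to top (any length unit)
    codes:       one binary code per layer (assumed 1 = chert, 0 = mudstone)
    step:        sampling interval in the same unit as thicknesses
    Returns (depth, boxcar, triangle).
    """
    thicknesses = np.asarray(thicknesses, dtype=float)
    codes = np.asarray(codes, dtype=float)

    bounds = np.concatenate(([0.0], np.cumsum(thicknesses)))  # layer boundaries
    mids = 0.5 * (bounds[:-1] + bounds[1:])                   # layer midpoints
    depth = np.arange(0.0, bounds[-1], step)

    # Staircase ("boxcar"): each sample takes the code of the layer it lies in.
    layer_index = np.searchsorted(bounds, depth, side="right") - 1
    boxcar = codes[layer_index]

    # Linear ("triangle"): codes sit at layer midpoints and are joined by ramps.
    triangle = np.interp(depth, mids, codes)
    return depth, boxcar, triangle


# Toy section (hypothetical): four alternating layers sampled every 0.5 units.
depth, boxcar, triangle = binary_series([2, 1, 2, 1], [1, 0, 1, 0], 0.5)
print(boxcar.tolist())
print(triangle.round(2).tolist())
```

    The boxcar series keeps sharp layer boundaries, which introduces harmonics into a spectrum, while the triangle series smooths the transitions; either array can then be passed to a spectral estimator such as np.fft.rfft.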

  15. New approaches for computer analysis of nucleic acid sequences.

    PubMed

    Karlin, S; Ghandour, G; Ost, F; Tavare, S; Korn, L J

    1983-09-01

    A new high-speed computer algorithm is outlined that ascertains within and between nucleic acid and protein sequences all direct repeats, dyad symmetries, and other structural relationships. Large repeats, repeats of high frequency, dyad symmetries of specified stem length and loop distance, and their distributions are determined. Significance of homologies is assessed by a hierarchy of permutation procedures. Applications are made to papovaviruses, the human papillomavirus HPV, lambda phage, the human and mouse mitochondrial genomes, and the human and mouse immunoglobulin kappa-chain genes. PMID:6577449
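
    As a rough illustration of the two structural features named in the abstract above, the Python sketch below finds direct repeats by indexing fixed-length words and finds dyad symmetries by looking for a word followed, within a bounded loop distance, by its reverse complement. It is a naive stand-in written for clarity, not the high-speed algorithm the authors describe, and the function names, parameters and toy sequences are hypothetical.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return seq.translate(COMPLEMENT)[::-1]


def direct_repeats(seq: str, k: int) -> dict[str, list[int]]:
    """Map every k-letter word that occurs more than once to its start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {word: pos for word, pos in positions.items() if len(pos) > 1}


def dyad_symmetries(seq: str, stem: int, max_loop: int) -> list[tuple[int, int]]:
    """Return (left_start, right_start) pairs where a word of length `stem`
    is followed, after 0..max_loop bases, by its reverse complement."""
    hits = []
    for i in range(len(seq) - 2 * stem + 1):
        target = reverse_complement(seq[i:i + stem])
        for loop in range(max_loop + 1):
            j = i + stem + loop
            if j + stem > len(seq):
                break
            if seq[j:j + stem] == target:
                hits.append((i, j))
    return hits


print(direct_repeats("GAATTCAAAAGAATTC", 6))   # {'GAATTC': [0, 10]}
print(dyad_symmetries("ACCGTTTCGGT", 4, 5))    # [(0, 7)]
```

    The abstract assesses significance with permutation procedures; one simple way to approximate that idea would be to rerun these counts on shuffled copies of the sequence and compare the observed counts with the shuffled distribution.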

  16. Rapid ribosomal RNA sequencing and the phylogenetic analysis of protists.

    PubMed

    Johnson, A M; Baverstock, P R

    1989-04-01

    A newly described technique for rapidly obtaining the partial nucleotide sequence of ribosomal RNA is being applied to investigate phylogenetic relationships among living organisms. Alan Johnson and Peter Baverstock describe the importance of this method to parasitology in providing new information on the phylogenetic relationships of parasitic organisms previously placed in groups of convenience. The phylum Apicomplexa, in particular, has been the object of much study using this technique, but the technology is likely to extend soon to the restructuring of the phylogenetic trees of many groups of parasites.

  17. On an Additive Semigraphoid Model for Statistical Networks With Application to Pathway Analysis

    PubMed Central

    Li, Bing; Chun, Hyonho; Zhao, Hongyu

    2014-01-01

    We introduce a nonparametric method for estimating non-gaussian graphical models based on a new statistical relation called additive conditional independence, which is a three-way relation among random vectors that resembles the logical structure of conditional independence. Additive conditional independence allows us to use one-dimensional kernel regardless of the dimension of the graph, which not only avoids the curse of dimensionality but also simplifies computation. It also gives rise to a parallel structure to the gaussian graphical model that replaces the precision matrix by an additive precision operator. The estimators derived from additive conditional independence cover the recently introduced nonparanormal graphical model as a special case, but outperform it when the gaussian copula assumption is violated. We compare the new method with existing ones by simulations and in genetic pathway analysis. PMID:26401064

  18. Analysis of occupational accidents: prevention through the use of additional technical safety measures for machinery

    PubMed Central

    Dźwiarek, Marek; Latała, Agata

    2016-01-01

    This article presents an analysis of results of 1035 serious and 341 minor accidents recorded by Poland's National Labour Inspectorate (PIP) in 2005–2011, in view of their prevention by means of additional safety measures applied by machinery users. Since the analysis aimed at formulating principles for the application of technical safety measures, the analysed accidents should bear additional attributes: the type of machine operation, technical safety measures and the type of events causing injuries. The analysis proved that the executed tasks and injury-causing events were closely connected and there was a relation between casualty events and technical safety measures. In the case of tasks consisting of manual feeding and collecting materials, the injuries usually occur because of the rotating motion of tools or crushing due to a closing motion. Numerous accidents also happened in the course of supporting actions, like removing pollutants, correcting material position, cleaning, etc. PMID:26652689

  19. Transcriptome Analysis of Leaf Tissue of Raphanus sativus by RNA Sequencing

    PubMed Central

    Yin, Yongtai; Wu, Gang; Xia, Heng; Wang, Xiaodong; Fu, Chunhua; Li, Maoteng; Wu, Jiangsheng

    2013-01-01

    Raphanus sativus is not only a popular edible vegetable but also an important source of medicinal compounds. However, the paucity of knowledge about the transcriptome of R. sativus greatly impedes better understanding of the functional genomics and medicinal potential of R. sativus. In this study, the transcriptome sequencing of leaf tissues in R. sativus was performed for the first time. Approximately 22 million clean reads were generated and used for transcriptome assembly. The generated unigenes were subsequently annotated against gene ontology (GO) database. KEGG analysis further revealed two important pathways in the bolting stage of R. sativus including spliceosome assembly and alkaloid synthesis. In addition, a total of 6,295 simple sequence repeats (SSRs) with various motifs were identified in the unigene library of R. sativus. Finally, four unigenes of R. sativus were selected for alignment with their homologs from other plants, and phylogenetic trees for each of the genes were constructed. Taken together, this study will provide a platform to facilitate gene discovery and advance functional genomic research of R. sativus. PMID:24265813
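
    The abstract above reports SSR counts but not the detection criteria. The Python sketch below shows one common way such repeats can be located with regular expressions; the minimum repeat counts in MIN_REPEATS, the function name find_ssrs and the toy sequence are assumptions made for illustration and do not reproduce the study's settings or tool.

```python
import re

# Assumed thresholds: motif length -> minimum number of tandem copies.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for simple sequence repeats."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # Group 2 is the motif; \2 requires at least min_rep - 1 further copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(2)
            # Skip motifs that are just a shorter unit repeated (e.g. "AGAG").
            if any(
                motif == motif[:d] * (unit_len // d)
                for d in range(1, unit_len)
                if unit_len % d == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(1)) // unit_len))
    return sorted(hits)


# Toy unigene (hypothetical): an (AG)7 and a (GAT)5 repeat.
toy = "TTC" + "AG" * 7 + "CCA" + "GAT" * 5 + "C"
print(find_ssrs(toy))  # [(3, 'AG', 7), (20, 'GAT', 5)]
```

    Changing the thresholds changes the number of repeats reported, so counts from different studies are comparable only when the criteria match.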

  20. Transcriptome analysis of leaf tissue of Raphanus sativus by RNA sequencing.

    PubMed

    Zhang, Libin; Jia, Haibo; Yin, Yongtai; Wu, Gang; Xia, Heng; Wang, Xiaodong; Fu, Chunhua; Li, Maoteng; Wu, Jiangsheng

    2013-01-01

    Raphanus sativus is not only a popular edible vegetable but also an important source of medicinal compounds. However, the paucity of knowledge about the transcriptome of R. sativus greatly impedes better understanding of the functional genomics and medicinal potential of R. sativus. In this study, the transcriptome sequencing of leaf tissues in R. sativus was performed for the first time. Approximately 22 million clean reads were generated and used for transcriptome assembly. The generated unigenes were subsequently annotated against the Gene Ontology (GO) database. KEGG analysis further revealed two important pathways in the bolting stage of R. sativus, including spliceosome assembly and alkaloid synthesis. In addition, a total of 6,295 simple sequence repeats (SSRs) with various motifs were identified in the unigene library of R. sativus. Finally, four unigenes of R. sativus were selected for alignment with their homologs from other plants, and phylogenetic trees for each of the genes were constructed. Taken together, this study will provide a platform to facilitate gene discovery and advance functional genomic research of R. sativus.

  1. Genome Sequence and Analysis of the Soil Cellulolytic Actinomycete Thermobifida fusca

    SciTech Connect

    Lykidis, Athanasios; Ivanova, Natalia; Anderson, Iain; Mavromatis, Konstantinos; Copeland, Alex; Richardson, Paul; Lucas, Susan; DiBartolo, Genevieve; Martinez, Michele; Lapidus, Alla; Wilson, David B.; Kyrpides, Nikos

    2006-01-01

    Thermobifida fusca is a moderately thermophilic soil bacterium that belongs to Actinobacteria. It is a major degrader of plant cell walls and has been used as a model organism for the study of secreted, thermostable cellulases. The complete genome sequence showed that T. fusca has a single circular chromosome of 3642249 bp predicted to encode 3117 proteins and 65 RNA species with a coding density of 85 percent. Genome analysis revealed the existence of 29 putative glycoside hydrolases in addition to the previously identified cellulases and xylanases. The glycosyl hydrolases include enzymes predicted to exhibit mainly dextran/starch and xylan degrading functions. T. fusca possesses two protein secretion systems: the sec general secretion system and the twin-arginine translocation system. Several of the secreted cellulases have sequence signatures indicating their secretion may be mediated by the twin-arginine translocation system. T. fusca has extensive transport systems for import of carbohydrates coupled to transcriptional regulators controlling the expression of the transporters and glycosyl hydrolases. In addition to providing an overview of the physiology of a soil actinomycete, this study presents insights on the transcriptional regulation and secretion of cellulases which may facilitate the industrial exploitation of these systems.

  2. Transcript analysis of a goat mesenteric lymph node by deep next-generation sequencing.

    PubMed

    E, G X; Zhao, Y J; Na, R S; Huang, Y F

    2016-01-01

    Deep RNA sequencing (RNA-seq) provides a practical and inexpensive alternative for exploring genomic data in non-model organisms. The functional annotation of non-model mammalian genomes, such as that of goats, is still poor compared to that of humans and mice. In the current study, we performed a whole transcriptome analysis of an intestinal mucous membrane lymph node to comprehensively characterize the transcript catalogue of this tissue in a goat. Using an Illumina HiSeq 4000 sequencing platform, 9.692 GB of raw reads were acquired. A total of 57,526 lymph transcripts were obtained, and the majority of these were mapped to known transcriptional units (42.67%). A comparison of the mRNA expression of the mesenteric lymph nodes during the juvenile and post-adolescent stages revealed 8949 transcripts that were differentially expressed, including 6174 known genes. In addition, we functionally classified these transcripts using Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) terms. A total of 6174 known genes were assigned to 64 GO terms, and 3782 genes were assigned to 303 KEGG pathways, including some related to immunity. Our results reveal the complex transcriptome profile of the lymph node and suggest that the immune system is immature in the mesenteric lymph nodes of juvenile goats. PMID:27173308

  3. Transcript analysis of a goat mesenteric lymph node by deep next-generation sequencing.

    PubMed

    E, G X; Zhao, Y J; Na, R S; Huang, Y F

    2016-01-01

    Deep RNA sequencing (RNA-seq) provides a practical and inexpensive alternative for exploring genomic data in non-model organisms. The functional annotation of non-model mammalian genomes, such as that of goats, is still poor compared to that of humans and mice. In the current study, we performed a whole transcriptome analysis of an intestinal mucous membrane lymph node to comprehensively characterize the transcript catalogue of this tissue in a goat. Using an Illumina HiSeq 4000 sequencing platform, 9.692 GB of raw reads were acquired. A total of 57,526 lymph transcripts were obtained, and the majority of these were mapped to known transcriptional units (42.67%). A comparison of the mRNA expression of the mesenteric lymph nodes during the juvenile and post-adolescent stages revealed 8949 transcripts that were differentially expressed, including 6174 known genes. In addition, we functionally classified these transcripts using Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) terms. A total of 6174 known genes were assigned to 64 GO terms, and 3782 genes were assigned to 303 KEGG pathways, including some related to immunity. Our results reveal the complex transcriptome profile of the lymph node and suggest that the immune system is immature in the mesenteric lymph nodes of juvenile goats.

  4. Design and assembly sequence analysis of option 3 for CETF reference space station

    NASA Technical Reports Server (NTRS)

    Garrett, L. Bernard; Andersen, Gregory C.; Hall, John B., Jr.; Allen, Cheryl L.; Scott, A. D., Jr.; So, Kenneth T.

    1987-01-01

    A design and assembly sequence analysis was conducted on one option of the Dual Keel Space Station examined by a NASA Critical Evaluation Task Force to establish viability of several variations of that option. A goal of the study was to produce and analyze technical data to support Task Force decisions to either examine particular Option 3 variations in more depth or eliminate them from further consideration. An analysis of the phasing assembly showed that use of an Expendable Launch Vehicle in conjunction with the Space Transportation System (STS) can accelerate the buildup of the Station and ease the STS launch rate constraints. The study also showed that use of an Orbital Maneuvering Vehicle on the first flight can significantly benefit Station assembly and, by performing Station subsystem functions, can alleviate the need for operational control and reboost systems during the early flights. In addition to launch and assembly sequencing, the study assessed stability and control, and analyzed node-packaging options and the effects of keel removal on the structural dynamics of the Station. Results of these analyses are presented and discussed.

  5. Expressed sequence tag analysis of functional genes associated with adventitious rooting in Liriodendron hybrids.

    PubMed

    Zhong, Y D; Sun, X Y; Liu, E Y; Li, Y Q; Gao, Z; Yu, F X

    2016-06-24

    Liriodendron hybrids (Liriodendron chinense x L. tulipifera) are important landscaping and afforestation hardwood trees. To date, little genomic research on adventitious rooting has been reported in these hybrids, as well as in the genus Liriodendron. In the present study, we used adventitious roots to construct the first cDNA library for Liriodendron hybrids. A total of 5176 expressed sequence tags (ESTs) were generated and clustered into 2921 unigenes. Among these unigenes, 2547 had significant homology to the non-redundant protein database representing a wide variety of putative functions. Homologs of these genes regulated many aspects of adventitious rooting, including those for auxin signal transduction and root hair development. Results of quantitative real-time polymerase chain reaction showed that AUX1, IRE, and FB1 were highly expressed in adventitious roots and the expression of AUX1, ARF1, NAC1, RHD1, and IRE increased during the development of adventitious roots. Additionally, 181 simple sequence repeats were identified from 166 ESTs and more than 91.16% of these were dinucleotide and trinucleotide repeats. To the best of our knowledge, the present study reports the identification of the genes associated with adventitious rooting in the genus Liriodendron for the first time and provides a valuable resource for future genomic studies. Expression analysis of selected genes could allow us to identify regulatory genes that may be essential for adventitious rooting.
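
    The simple sequence repeat (SSR) screening mentioned in the abstract above can be illustrated with a short sketch. This is not the authors' tool or their thresholds: the function name find_ssrs, the example sequence, and the minimum repeat count are assumptions chosen only to show how dinucleotide and trinucleotide repeats might be located with a regular expression.

```python
import re


def find_ssrs(seq, unit_len, min_repeats):
    """Return (start, unit, repeat_count) for perfect tandem repeats.

    unit_len is the motif length (2 for dinucleotide, 3 for trinucleotide);
    min_repeats is an illustrative threshold, not a published one.
    """
    # A captured motif followed by at least (min_repeats - 1) further copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        # Skip motifs made of a single base (e.g. "AA"), which are
        # mononucleotide runs rather than di- or trinucleotide repeats.
        if len(set(unit)) == 1:
            continue
        hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return hits


# Hypothetical EST fragment containing one (AG)5 and one (CAT)4 repeat.
example = "TTAGAGAGAGAGCCATCATCATCATCGG"
dinucleotide_hits = find_ssrs(example, unit_len=2, min_repeats=4)
trinucleotide_hits = find_ssrs(example, unit_len=3, min_repeats=4)
```

    Dedicated SSR finders typically also handle imperfect and compound repeats, which this sketch ignores.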

  6. Galaxy tools and workflows for sequence analysis with applications in molecular plant pathology

    PubMed Central

    Grüning, Björn A.; Paszkiewicz, Konrad; Pritchard, Leighton

    2013-01-01

    The Galaxy Project offers the popular web browser-based platform Galaxy for running bioinformatics tools and constructing simple workflows. Here, we present a broad collection of additional Galaxy tools for large scale analysis of gene and protein sequences. The motivating research theme is the identification of specific genes of interest in a range of non-model organisms, and our central example is the identification and prediction of “effector” proteins produced by plant pathogens in order to manipulate their host plant. This functional annotation of a pathogen’s predicted capacity for virulence is a key step in translating sequence data into potential applications in plant pathology. This collection includes novel tools, and widely-used third-party tools such as NCBI BLAST+ wrapped for use within Galaxy. Individual bioinformatics software tools are typically available separately as standalone packages, or in online browser-based form. The Galaxy framework enables the user to combine these and other tools to automate organism scale analyses as workflows, without demanding familiarity with command line tools and scripting. Workflows created using Galaxy can be saved and are reusable, so may be distributed within and between research groups, facilitating the construction of a set of standardised, reusable bioinformatic protocols. The Galaxy tools and workflows described in this manuscript are open source and freely available from the Galaxy Tool Shed (http://usegalaxy.org/toolshed or http://toolshed.g2.bx.psu.edu). PMID:24109552

  7. A rapid whole genome sequencing and analysis system supporting genomic epidemiology (7th Annual SFAF Meeting, 2012)

    SciTech Connect

    FitzGerald, Michael

    2012-06-01

    Michael FitzGerald on "A rapid whole genome sequencing and analysis system supporting genomic epidemiology" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  8. A rapid whole genome sequencing and analysis system supporting genomic epidemiology (7th Annual SFAF Meeting, 2012)

    ScienceCinema

    FitzGerald, Michael [Broad Institute]

    2016-07-12

    Michael FitzGerald on "A rapid whole genome sequencing and analysis system supporting genomic epidemiology" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  9. In Vivo Enhancer Analysis Chromosome 16 Conserved Noncoding Sequences

    SciTech Connect

    Pennacchio, Len A.; Ahituv, Nadav; Moses, Alan M.; Nobrega, Marcelo; Prabhakar, Shyam; Shoukry, Malak; Minovitsky, Simon; Visel, Axel; Dubchak, Inna; Holt, Amy; Lewis, Keith D.; Plajzer-Frick, Ingrid; Akiyama, Jennifer; De Val, Sarah; Afzal, Veena; Black, Brian L.; Couronne, Olivier; Eisen, Michael B.; Rubin, Edward M.

    2006-02-01

    The identification of enhancers with predicted specificities in vertebrate genomes remains a significant challenge that is hampered by a lack of experimentally validated training sets. In this study, we leveraged extreme evolutionary sequence conservation as a filter to identify putative gene regulatory elements and characterized the in vivo enhancer activity of human-fish conserved and ultraconserved noncoding elements on human chromosome 16 as well as such elements from elsewhere in the genome. We initially tested 165 of these extremely conserved sequences in a transgenic mouse enhancer assay and observed that 48 percent (79/165) functioned reproducibly as tissue-specific enhancers of gene expression at embryonic day 11.5. While driving expression in a broad range of anatomical structures in the embryo, the majority of the 79 enhancers drove expression in various regions of the developing nervous system. Studying a set of DNA elements that specifically drove forebrain expression, we identified DNA signatures specifically enriched in these elements and used these parameters to rank all ~3,400 human-fugu conserved noncoding elements in the human genome. The testing of the top predictions in transgenic mice resulted in a three-fold enrichment for sequences with forebrain enhancer activity. These data dramatically expand the catalogue of in vivo-characterized human gene enhancers and illustrate the future utility of such training sets for a variety of biological applications including decoding the regulatory vocabulary of the human genome.

  10. Accident sequence precursor analysis level 2/3 model development

    SciTech Connect

    Lui, C.H.; Galyean, W.J.; Brownson, D.A.

    1997-02-01

    The US Nuclear Regulatory Commission's Accident Sequence Precursor (ASP) program currently uses simple Level 1 models to assess the conditional core damage probability for operational events occurring in commercial nuclear power plants (NPP). Since not all accident sequences leading to core damage will result in the same radiological consequences, it is necessary to develop simple Level 2/3 models that can be used to analyze the response of the NPP containment structure in the context of a core damage accident, estimate the magnitude of the resulting radioactive releases to the environment, and calculate the consequences associated with these releases. The simple Level 2/3 model development work was initiated in 1995, and several prototype models have been completed. Once developed, these simple Level 2/3 models are linked to the simple Level 1 models to provide risk perspectives for operational events. This paper describes the methods implemented for the development of these simple Level 2/3 ASP models, and the linkage process to the existing Level 1 models.

  11. Analysis of sequencing and scheduling methods for arrival traffic

    NASA Technical Reports Server (NTRS)

    Neuman, Frank; Erzberger, Heinz

    1990-01-01

    The air traffic control subsystem that performs scheduling is discussed. The function of the scheduling algorithms is to plan automatically the most efficient landing order and to assign optimally spaced landing times to all arrivals. Several important scheduling algorithms are described and the statistical performance of the scheduling algorithms is examined. Scheduling brings order to an arrival sequence for aircraft. First-come-first-served scheduling (FCFS) establishes a fair order, based on estimated times of arrival, and determines proper separations. Because of the randomness of the traffic, gaps will remain in the scheduled sequence of aircraft. These gaps are filled, or partially filled, by time-advancing the leading aircraft after a gap while still preserving the FCFS order. Tightly scheduled groups of aircraft remain with a mix of heavy and large aircraft. Separation requirements differ for different types of aircraft trailing each other. Advantage is taken of this fact through mild reordering of the traffic, thus shortening the groups and reducing average delays. Actual delays for different samples with the same statistical parameters vary widely, especially for heavy traffic.
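
    The first-come-first-served step described in the abstract above can be sketched as follows. The separation values, the two weight classes, and the names Arrival, fcfs_schedule and average_delay are illustrative assumptions, not figures or identifiers from the report; the sketch covers only FCFS ordering with type-dependent separations and omits the time-advance and reordering refinements the abstract mentions.

```python
from dataclasses import dataclass

# Hypothetical minimum separations in seconds, keyed by
# (leading aircraft class, trailing aircraft class).
SEPARATION = {
    ("heavy", "heavy"): 90,
    ("heavy", "large"): 120,
    ("large", "heavy"): 60,
    ("large", "large"): 60,
}


@dataclass
class Arrival:
    callsign: str
    eta: float          # estimated time of arrival, seconds
    weight_class: str   # "heavy" or "large"


def fcfs_schedule(arrivals):
    """Assign landing times in ETA order while honouring separations."""
    schedule = []
    previous = None
    previous_time = None
    for aircraft in sorted(arrivals, key=lambda a: a.eta):
        if previous is None:
            landing_time = aircraft.eta
        else:
            gap = SEPARATION[(previous.weight_class, aircraft.weight_class)]
            # Land at the ETA if possible, otherwise as soon as the
            # required separation behind the previous aircraft allows.
            landing_time = max(aircraft.eta, previous_time + gap)
        schedule.append((aircraft.callsign, landing_time))
        previous, previous_time = aircraft, landing_time
    return schedule


def average_delay(arrivals, schedule):
    """Mean difference between scheduled landing time and ETA."""
    eta = {a.callsign: a.eta for a in arrivals}
    return sum(t - eta[c] for c, t in schedule) / len(schedule)


traffic = [
    Arrival("A", 0, "heavy"),
    Arrival("B", 30, "large"),
    Arrival("C", 200, "large"),
]
plan = fcfs_schedule(traffic)
```

    In this toy sample, the aircraft trailing a heavy leader absorbs all of the delay, which is the kind of effect that mild reordering is meant to reduce.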

  12. Identification and sequence analysis of potyviruses infecting crops in Vietnam.

    PubMed

    Ha, C; Revill, P; Harding, R M; Vu, M; Dale, J L

    2008-01-01

    Fifty-two virus isolates from 13 distinct potyvirus species infecting crops in Vietnam were identified and the 3' region of each genome was sequenced. The viruses were: bean common mosaic virus (BCMV), potato virus Y (PVY), sugarcane mosaic virus (SCMV), sorghum mosaic virus (SrMV), chilli veinal mottle virus (ChiVMV), zucchini yellow mosaic virus (ZYMV), leek yellow stripe virus (LYSV), shallot yellow stripe virus (SYSV), onion yellow dwarf virus (OYDV), turnip mosaic virus (TuMV), dasheen mosaic virus (DsMV), sweet potato feathery mottle virus (SPFMV) and a novel potyvirus infecting chilli, tentatively named chilli ringspot virus (ChiRSV). With the exception of BCMV and PVY, this is the first report of these viruses in Vietnam. Further, rabbit bell (Crotalaria anagyroides) and typhonia (Typhonium trilobatum) were identified as new natural hosts of the peanut stunt virus (PStV) strain of BCMV and of DsMV, respectively. Sequence and phylogenetic analyses of the entire CP-coding region revealed considerable variability in BCMV, SCMV, PVY, ZYMV and DsMV. PMID:17906829

  13. Primary sequence analysis of Clostridium cellulovorans cellulose binding protein A.

    PubMed Central

    Shoseyov, O; Takagi, M; Goldstein, M A; Doi, R H

    1992-01-01

    The cbpA gene for the Clostridium cellulovorans cellulose binding protein (CbpA), which is part of the multisubunit cellulase complex, has been cloned and sequenced. When cbpA was expressed in Escherichia coli, proteins capable of binding to crystalline cellulose and of interacting with anti-CbpA were observed. The cbpA gene consists of 5544 base pairs and encodes a protein containing 1848 amino acids with a molecular mass of 189,036 Da. The open reading frame is preceded by a Gram-positive-type ribosome binding site. A signal peptide sequence of 28 amino acids is present at its N terminus. The encoded protein is highly hydrophobic with extremely high levels of threonine and valine residues. There are two types of putative cellulose binding domains of approximately 100 amino acids that are slightly hydrophilic and eight conserved, highly hydrophobic beta-sheet regions of approximately 140 amino acids. These latter hydrophobic regions may be the CbpA domains that interact with the different enzymatic subunits of the cellulase complex. PMID:1565642

  14. Detection and characterization of Histoplasma capsulatum in a German badger (Meles meles) by ITS sequencing and multilocus sequencing analysis.

    PubMed

    Eisenberg, Tobias; Seeger, Helga; Kasuga, Takao; Eskens, Ulrich; Sauerwald, Claudia; Kaim, Ute

    2013-05-01

    A wild badger (Meles meles) with a severe nodular dermatitis was presented for post mortem examination. Numerous cutaneous granulomas with superficial ulceration, especially on the head, dorsum, and forearms, were found at necropsy. Histopathological examination of the skin revealed a severe granulomatous dermatitis with abundant intralesional round to spherical yeast-like cells, 2-5 μm in diameter, altogether consistent with the clinical appearance of histoplasmosis farciminosi. The structures stained positively with Grocott's methenamine silver and Periodic acid-Schiff stains, but attempts to isolate the etiologic agent at 25 and 37°C failed. DNA was directly extracted from tissue samples and the ribosomal genes ITS1-5.8S-ITS2 were partially sequenced. This revealed 99% identity to sequences from Ajellomyces capsulatus, the teleomorph of Histoplasma capsulatum, which was derived from a human case in Japan, as well as from horses from Egypt and Poland. Phylogenetic multi-locus sequence analysis demonstrated that the fungus in our case belonged to the Eurasian clade, which contains members of the former varieties H. capsulatum var. capsulatum and H. capsulatum var. farciminosum. This is the first study of molecular and phylogenetic aspects of H. capsulatum, as well as evidence for histoplasmosis farciminosi in a badger, further illuminating the role of this rare pathogen in Central Europe. PMID:23035880

  15. IMSA: integrated metagenomic sequence analysis for identification of exogenous reads in a host genomic background.

    PubMed

    Dimon, Michelle T; Wood, Henry M; Rabbitts, Pamela H; Arron, Sarah T

    2013-01-01

    Metagenomics, the study of microbial genomes within diverse environments, is a rapidly developing field. The identification of microbial sequences within a host organism enables the study of human intestinal, respiratory, and skin microbiota, and has allowed the identification of novel viruses in diseases such as Merkel cell carcinoma. There are few publicly available tools for metagenomic high throughput sequence analysis. We present Integrated Metagenomic Sequence Analysis (IMSA), a flexible, fast, and robust computational analysis pipeline that is available for public use. IMSA takes input sequence from high throughput datasets and uses a user-defined host database to filter out host sequence. IMSA then aligns the filtered reads to a user-defined universal database to characterize exogenous reads within the host background. IMSA assigns a score to each node of the taxonomy based on read frequency, and can output this as a taxonomy report suitable for cluster analysis or as a taxonomy map (TaxMap). IMSA also outputs the specific sequence reads assigned to a taxon of interest for downstream analysis. We demonstrate the use of IMSA to detect pathogens and normal flora within sequence data from a primary human cervical cancer carrying HPV16, a primary human cutaneous squamous cell carcinoma carrying HPV 16, the CaSki cell line carrying HPV16, and the HeLa cell line carrying HPV18. PMID:23717627
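
    The two stages described in the abstract above (filter out host sequence, then score taxa by read frequency) can be sketched in miniature. This is not IMSA's code or interface: shared k-mers stand in for real alignment, and the names kmers, filter_host and score_taxa, the value of k, and all sequences are assumptions made for illustration only.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def filter_host(reads, host_seq, k):
    """Keep only reads that share no k-mer with the host sequence."""
    host = kmers(host_seq, k)
    return [read for read in reads if not (kmers(read, k) & host)]


def score_taxa(reads, references, k):
    """Count, per taxon, the reads sharing at least one k-mer with it."""
    index = {taxon: kmers(seq, k) for taxon, seq in references.items()}
    counts = Counter()
    for read in reads:
        read_kmers = kmers(read, k)
        for taxon, taxon_kmers in index.items():
            if read_kmers & taxon_kmers:
                counts[taxon] += 1
    return counts


# Toy data: one host-derived read, two reads from a hypothetical
# "virusX" reference, and one read matching nothing.
reads = ["AAAACC", "TTTTGATT", "GATTACA", "TGTGTGTG"]
non_host = filter_host(reads, host_seq="AAAACCCCGGGG", k=4)
scores = score_taxa(non_host, {"virusX": "CCTTTTGATTACAGG"}, k=4)
```

    A real pipeline would replace the k-mer test with an aligner and would propagate counts up the taxonomy tree; the sketch only shows the filter-then-score ordering.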

  16. Signature Peptide-Enabled Metagenomics (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    SciTech Connect

    McMahon, Ben

    2012-06-01

    Ben McMahon of Los Alamos National Laboratory (LANL) presents "Signature Peptide-Enabled Metagenomics" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  17. Signature Peptide-Enabled Metagenomics (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    McMahon, Ben [LANL]

    2016-07-12

    Ben McMahon of Los Alamos National Laboratory (LANL) presents "Signature Peptide-Enabled Metagenomics" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  18. Combined DECS Analysis and Next-Generation Sequencing Enable Efficient Detection of Novel Plant RNA Viruses.

    PubMed

    Yanagisawa, Hironobu; Tomita, Reiko; Katsu, Koji; Uehara, Takuya; Atsumi, Go; Tateda, Chika; Kobayashi, Kappei; Sekine, Ken-Taro

    2016-03-01

    The presence of high molecular weight double-stranded RNA (dsRNA) within plant cells is an indicator of infection with RNA viruses as these possess genomic or replicative dsRNA. DECS (dsRNA isolation, exhaustive amplification, cloning, and sequencing) analysis has been shown to be capable of detecting unknown viruses. We postulated that a combination of DECS analysis and next-generation sequencing (NGS) would improve detection efficiency and usability of the technique. Here, we describe a model case in which we efficiently detected the presumed genome sequence of Blueberry shoestring virus (BSSV), a member of the genus Sobemovirus, which has not so far been reported. dsRNAs were isolated from BSSV-infected blueberry plants using the dsRNA-binding protein, reverse-transcribed, amplified, and sequenced using NGS. A contig of 4,020 nucleotides (nt) that shared similarities with sequences from other Sobemovirus species was obtained as a candidate of the BSSV genomic sequence. Reverse transcription (RT)-PCR primer sets based on sequences from this contig enabled the detection of BSSV in all BSSV-infected plants tested but not in healthy controls. A recombinant protein encoded by the putative coat protein gene was bound by the BSSV-antibody, indicating that the candidate sequence was that of BSSV itself. Our results suggest that a combination of DECS analysis and NGS, designated here as "DECS-C," is a powerful method for detecting novel plant viruses. PMID:27072419

  19. Combined DECS Analysis and Next-Generation Sequencing Enable Efficient Detection of Novel Plant RNA Viruses

    PubMed Central

    Yanagisawa, Hironobu; Tomita, Reiko; Katsu, Koji; Uehara, Takuya; Atsumi, Go; Tateda, Chika; Kobayashi, Kappei; Sekine, Ken-Taro

    2016-01-01

    The presence of high molecular weight double-stranded RNA (dsRNA) within plant cells is an indicator of infection with RNA viruses as these possess genomic or replicative dsRNA. DECS (dsRNA isolation, exhaustive amplification, cloning, and sequencing) analysis has been shown to be capable of detecting unknown viruses. We postulated that a combination of DECS analysis and next-generation sequencing (NGS) would improve detection efficiency and usability of the technique. Here, we describe a model case in which we efficiently detected the presumed genome sequence of Blueberry shoestring virus (BSSV), a member of the genus Sobemovirus, which has not so far been reported. dsRNAs were isolated from BSSV-infected blueberry plants using the dsRNA-binding protein, reverse-transcribed, amplified, and sequenced using NGS. A contig of 4,020 nucleotides (nt) that shared similarities with sequences from other Sobemovirus species was obtained as a candidate of the BSSV genomic sequence. Reverse transcription (RT)-PCR primer sets based on sequences from this contig enabled the detection of BSSV in all BSSV-infected plants tested but not in healthy controls. A recombinant protein encoded by the putative coat protein gene was bound by the BSSV-antibody, indicating that the candidate sequence was that of BSSV itself. Our results suggest that a combination of DECS analysis and NGS, designated here as “DECS-C,” is a powerful method for detecting novel plant viruses. PMID:27072419

  20. Analysis and Functional Annotation of an Expressed Sequence Tag Collection for Tropical Crop Sugarcane

    PubMed Central

    Vettore, André L.; da Silva, Felipe R.; Kemper, Edson L.; Souza, Glaucia M.; da Silva, Aline M.; Ferro, Maria Inês T.; Henrique-Silva, Flavio; Giglioti, Éder A.; Lemos, Manoel V.F.; Coutinho, Luiz L.; Nobrega, Marina P.; Carrer, Helaine; França, Suzelei C.; Bacci, Maurício; Goldman, Maria Helena S.; Gomes, Suely L.; Nunes, Luiz R.; Camargo, Luis E.A.; Siqueira, Walter J.; Van Sluys, Marie-Anne; Thiemann, Otavio H.; Kuramae, Eiko E.; Santelli, Roberto V.; Marino, Celso L.; Targon, Maria L.P.N.; Ferro, Jesus A.; Silveira, Henrique C.S.; Marini, Danyelle C.; Lemos, Eliana G.M.; Monteiro-Vitorello, Claudia B.; Tambor, José H.M.; Carraro, Dirce M.; Roberto, Patrícia G.; Martins, Vanderlei G.; Goldman, Gustavo H.; de Oliveira, Regina C.; Truffi, Daniela; Colombo, Carlos A.; Rossi, Magdalena; de Araujo, Paula G.; Sculaccio, Susana A.; Angella, Aline; Lima, Marleide M.A.; de Rosa, Vicente E.; Siviero, Fábio; Coscrato, Virginia E.; Machado, Marcos A.; Grivet, Laurent; Di Mauro, Sonia M.Z.; Nobrega, Francisco G.; Menck, Carlos F.M.; Braga, Marilia D.V.; Telles, Guilherme P.; Cara, Frank A.A.; Pedrosa, Guilherme; Meidanis, João; Arruda, Paulo

    2003-01-01

    To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After the processing of the sequences, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% presented no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed a 22% redundancy in the first assembling. This indicated that possibly 33,620 unique genes had been identified and indicated that >90% of the sugarcane expressed genes were tagged. PMID:14613979

  1. NGS-eval: NGS Error analysis and novel sequence VAriant detection tooL.

    PubMed

    May, Ali; Abeln, Sanne; Buijs, Mark J; Heringa, Jaap; Crielaard, Wim; Brandt, Bernd W

    2015-07-01

    Massively parallel sequencing of microbial genetic markers (MGMs) is used to uncover the species composition in a multitude of ecological niches. These sequencing runs often contain a sample with known composition that can be used to evaluate the sequencing quality or to detect novel sequence variants. With NGS-eval, the reads from such (mock) samples can be used to (i) explore the differences between the reads and their references and to (ii) estimate the sequencing error rate. This tool maps these reads to references and calculates as well as visualizes the different types of sequencing errors. Clearly, sequencing errors can only be accurately calculated if the reference sequences are correct. However, even with known strains, it is not straightforward to select the correct references from databases. We previously analysed a pyrosequencing dataset from a mock sample to estimate sequencing error rates and detected sequence variants in our mock community, allowing us to obtain an accurate error estimation. Here, we demonstrate the variant detection and error analysis capability of NGS-eval with Illumina MiSeq reads from the same mock community. While tailored towards the field of metagenomics, this server can be used for any type of MGM-based reads. NGS-eval is available at http://www.ibi.vu.nl/programs/ngsevalwww/.
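
    The error-rate estimate described in the abstract above can be illustrated with a small sketch. It assumes the reads have already been aligned to their references and are supplied as equal-length gapped strings (with "-" marking a gap); the function names and the per-column definition of error rate are assumptions for illustration and are not taken from NGS-eval itself.

```python
def count_errors(aligned_read, aligned_ref):
    """Classify differences in one pairwise alignment.

    Both strings must have the same length; "-" marks a gap.
    """
    if len(aligned_read) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    counts = {"substitution": 0, "insertion": 0, "deletion": 0}
    for read_base, ref_base in zip(aligned_read, aligned_ref):
        if read_base == ref_base:
            continue
        if ref_base == "-":
            counts["insertion"] += 1    # extra base in the read
        elif read_base == "-":
            counts["deletion"] += 1     # reference base missing from the read
        else:
            counts["substitution"] += 1
    return counts


def error_rate(alignments):
    """Total errors divided by total alignment columns over all pairs."""
    errors = 0
    columns = 0
    for aligned_read, aligned_ref in alignments:
        errors += sum(count_errors(aligned_read, aligned_ref).values())
        columns += len(aligned_ref)
    return errors / columns


# Hypothetical alignments of mock-community reads to their references.
pairs = [("ACGT-A", "ACCTGA"), ("AAGT", "A-GT")]
rate = error_rate(pairs)
```

    As the abstract notes, such an estimate is only meaningful when the references are correct: a genuine sequence variant in the mock strain would otherwise be counted as a sequencing error.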

  2. Exploring genome wide bisulfite sequencing for DNA methylation analysis in livestock: a technical assessment.

    PubMed

    Doherty, Rachael; Couldrey, Christine

    2014-01-01

    Recent advances made in "omics" technologies are contributing to a revolution in livestock selection and breeding practices. Epigenetic mechanisms, including DNA methylation, are important determinants for the control of gene expression in mammals. DNA methylation research will help our understanding of how environmental factors contribute to phenotypic variation of complex production and health traits. High-throughput sequencing is a vital tool for the comprehensive analysis of DNA methylation, and bisulfite-based strategies coupled with DNA sequencing allow for quantitative, site-specific methylation analysis at the genome level or genome wide. Reduced representation bisulfite sequencing (RRBS) and more recently whole genome bisulfite sequencing (WGBS) have proven to be effective techniques for studying DNA methylation in both humans and mice. Here we report the development of RRBS and WGBS for use in sheep, the first application of this technology in livestock species. Important technical issues associated with these methodologies including fragment size selection and sequence depth are examined and discussed. PMID:24860595

  3. A convolutional code-based sequence analysis model and its application.

    PubMed

    Liu, Xiao; Geng, Xiaoli

    2013-04-16

    A new approach for encoding DNA sequences as input for DNA sequence analysis is proposed using the error correction coding theory of communication engineering. The encoder was designed as a convolutional code model whose generator matrix is designed based on the degeneracy of codons, with a codon treated in the model as an informational unit. The utility of the proposed model was demonstrated through the analysis of twelve prokaryote and nine eukaryote DNA sequences having different GC contents. Distinct differences in code distances were observed near the initiation and termination sites in the open reading frame, which provided a well-regulated characterization of the DNA sequences. Clearly distinguished period-3 features appeared in the coding regions, and the characteristic average code distances of the analyzed sequences were approximately proportional to their GC contents, particularly in the selected prokaryotic organisms, presenting the potential utility as an added taxonomic characteristic for use in studying the relationships of living organisms.

  4. The sequence and analysis of duplication rich human chromosome 16

    SciTech Connect

    Martin, Joel; Han, Cliff; Gordon, Laurie A.; Terry, Astrid; Prabhakar, Shyam; She, Xinwei; Xie, Gary; Hellsten, Uffe; Man Chan, Yee; Altherr, Michael; Couronne, Olivier; Aerts, Andrea; Bajorek, Eva; Black, Stacey; Blumer, Heather; Branscomb, Elbert; Brown, Nancy C.; Bruno, William J.; Buckingham, Judith M.; Callen, David F.; Campbell, Connie S.; Campbell, Mary L.; Campbell, Evelyn W.; Caoile, Chenier; Challacombe, Jean F.; Chasteen, Leslie A.; Chertkov, Olga; Chi, Han C.; Christensen, Mari; Clark, Lynn M.; Cohn, Judith D.; Denys, Mirian; Detter, John C.; Dickson, Mark; Dimitrijevic-Bussod, Mira; Escobar, Julio; Fawcett, Joseph J.; Flowers, Dave; Fotopulos, Dea; Glavina, Tijana; Gomez, Maria; Gonzales, Eidelyn; Goodstein, David; Goodwin, Lynne A.; Grady, Deborah L.; Grigoriev, Igor; Groza, Matthew; Hammon, Nancy; Hawkins, Trevor; Haydu, Lauren; Hildebrand, Carl E.; Huang, Wayne; Israni, Sanjay; Jett, Jamie; Jewett, Phillip E.; Kadner, Kristen; Kimball, Heather; Kobayashi, Arthur; Krawczyk, Marie-Claude; Leyba, Tina; Longmire, Jonathan L.; Lopez, Frederick; Lou, Yunian; Lowry, Steve; Ludeman, Thom; Mark, Graham A.; Mcmurray, Kimberly L.; Meincke, Linda J.; Morgan, Jenna; Moyzis, Robert K.; Mundt, Mark O.; Munk, A. Christine; Nandkeshwar, Richard D.; Pitluck, Sam; Pollard, Martin; Predki, Paul; Parson-Quintana, Beverly; Ramirez, Lucia; Rash, Sam; Retterer, James; Ricke, Darryl O.; Robinson, Donna L.; Rodriguez, Alex; Salamov, Asaf; Saunders, Elizabeth H.; Scott, Duncan; Shough, Timothy; Stallings, Raymond L.; Stalvey, Malinda; Sutherland, Robert D.; Tapia, Roxanne; Tesmer, Judith G.; Thayer, Nina; Thompson, Linda S.; Tice, Hope; Torney, David C.; Tran-Gyamfi, Mary; Tsai, Ming; Ulanovsky, Levy E.; Ustaszewska, Anna; Vo, Nu; White, P. Scott; Williams, Albert L.; Wills, Patricia L.; Wu, Jung-Rung; Wu, Kevin; Yang, Joan; DeJong, Pieter; Bruce, David; Doggett, Norman; Deaven, Larry; Schmutz, Jeremy; Grimwood, Jane; Richardson, Paul; et al.

    2004-08-01

    We report here the 78,884,754 base pairs of finished human chromosome 16 sequence, representing over 99.9 percent of its euchromatin. Manual annotation revealed 880 protein coding genes confirmed by 1,637 aligned transcripts, 19 tRNA genes, 341 pseudogenes and 3 RNA pseudogenes. These genes include metallothionein, cadherin and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukemia. Several large-scale structural polymorphisms spanning hundreds of kilobasepairs were identified and result in gene content differences across humans. One of the unique features of chromosome 16 is its high level of segmental duplication, ranked among the highest of the human autosomes. While the segmental duplications are enriched in the relatively gene poor pericentromere of the p-arm, some are involved in recent gene duplication and conversion events which are likely to have had an impact on the evolution of primates and human disease susceptibility.

  5. Analysis of simple sequence repeats in mammalian cell cycle genes.

    PubMed

    Trivedi, Seema; Wills, Christopher; Metzgar, David

    2014-01-01

    Simple sequence repeats (SSRs), or microsatellites, are hyper-mutable and can lead to disorders. Here we explore SSR distribution in cell cycle-associated genes [grouped into: checkpoint; regulation; replication, repair, and recombination (RRR); and transition] in humans and orthologues of eight mammals. Among the gene groups studied, transition genes have the highest SSR density. Trinucleotide repeats are not abundant and introns have higher repeat density than exons. Many repeats in human genes are conserved; however, CG motifs are conserved only in regulation genes. SSR variability in cell cycle genes represents a genetic Achilles' heel, yet SSRs are common in all groups of genes. This tolerance may be due to i) positions in introns where they do not disrupt gene function, ii) essential roles in regulation, iii) specific value of adaptability, and/or iv) lack of negative selection pressure. The present study may be useful for further exploration of their medical relevance and potential functionality.
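
    As a rough illustration of how SSRs can be located in a sequence, the sketch below scans for perfect tandem repeats with a regular expression. The thresholds (motif length 1 to 6, at least three copies) and the function name are assumptions chosen for the example. The abstract does not state the criteria the authors used.

```python
import re

def find_ssrs(seq: str, min_unit: int = 1, max_unit: int = 6, min_repeats: int = 3):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        # A motif of unit_len bases followed by at least (min_repeats - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter motif (e.g. "AA").
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)

def ssr_density_per_kb(seq: str) -> float:
    """Number of SSR loci per 1000 bases (one possible density definition)."""
    return 1000.0 * len(find_ssrs(seq)) / len(seq) if seq else 0.0

# Hypothetical sequence with one trinucleotide and one dinucleotide repeat.
example = "TTCAGCAGCAGCAGTTACACACACGG"
print(find_ssrs(example))
```

    Running the same function separately on exon and intron sequences would allow the kind of density comparison the abstract reports, under whatever thresholds are chosen.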

  6. The Sequence and Analysis of Duplication Rich Human Chromosome 16

    DOE R&D Accomplishments Database

    Martin, Joel; Han, Cliff; Gordon, Laurie A.; Terry, Astrid; Prabhakar, Shyam; She, Xinwei; Xie, Gary; Hellsten, Uffe; Man Chan, Yee; Altherr, Michael; Couronne, Olivier; Aerts, Andrea; Bajorek, Eva; Black, Stacey; Blumer, Heather; Branscomb, Elbert; Brown, Nancy C.; Bruno, William J.; Buckingham, Judith M.; Callen, David F.; Campbell, Connie S.; Campbell, Mary L.; Campbell, Evelyn W.; Caoile, Chenier; Challacombe, Jean F.; Chasteen, Leslie A.; Chertkov, Olga; Chi, Han C.; Christensen, Mari; Clark, Lynn M.; Cohn, Judith D.; Denys, Mirian; Detter, John C.; Dickson, Mark; Dimitrijevic-Bussod, Mira; Escobar, Julio; Fawcett, Joseph J.; Flowers, Dave; Fotopulos, Dea; Glavina, Tijana; Gomez, Maria; Gonzales, Eidelyn; Goodstein, David; Goodwin, Lynne A.; Grady, Deborah L.; Grigoriev, Igor; Groza, Matthew; Hammon, Nancy; Hawkins, Trevor; Haydu, Lauren; Hildebrand, Carl E.; Huang, Wayne; Israni, Sanjay; Jett, Jamie; Jewett, Phillip E.; Kadner, Kristen; Kimball, Heather; Kobayashi, Arthur; Krawczyk, Marie-Claude; Leyba, Tina; Longmire, Jonathan L.; Lopez, Frederick; Lou, Yunian; Lowry, Steve; Ludeman, Thom; Mark, Graham A.; Mcmurray, Kimberly L.; Meincke, Linda J.; Morgan, Jenna; Moyzis, Robert K.; Mundt, Mark O.; Munk, A. Christine; Nandkeshwar, Richard D.; Pitluck, Sam; Pollard, Martin; Predki, Paul; Parson-Quintana, Beverly; Ramirez, Lucia; Rash, Sam; Retterer, James; Ricke, Darryl O.; Robinson, Donna L.; Rodriguez, Alex; Salamov, Asaf; Saunders, Elizabeth H.; Scott, Duncan; Shough, Timothy; Stallings, Raymond L.; Stalvey, Malinda; Sutherland, Robert D.; Tapia, Roxanne; Tesmer, Judith G.; Thayer, Nina; Thompson, Linda S.; Tice, Hope; Torney, David C.; Tran-Gyamfi, Mary; Tsai, Ming; Ulanovsky, Levy E.; Ustaszewska, Anna; Vo, Nu; White, P. Scott; Williams, Albert L.; Wills, Patricia L.; Wu, Jung-Rung; Wu, Kevin; Yang, Joan; DeJong, Pieter; Bruce, David; Doggett, Norman; Deaven, Larry; Schmutz, Jeremy; Grimwood, Jane; Richardson, Paul; et al.

    2004-01-01

    We report here the 78,884,754 base pairs of finished human chromosome 16 sequence, representing over 99.9 percent of its euchromatin. Manual annotation revealed 880 protein coding genes confirmed by 1,637 aligned transcripts, 19 tRNA genes, 341 pseudogenes and 3 RNA pseudogenes. These genes include metallothionein, cadherin and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukemia. Several large-scale structural polymorphisms spanning hundreds of kilobasepairs were identified and result in gene content differences across humans. One of the unique features of chromosome 16 is its high level of segmental duplication, ranked among the highest of the human autosomes. While the segmental duplications are enriched in the relatively gene poor pericentromere of the p-arm, some are involved in recent gene duplication and conversion events which are likely to have had an impact on the evolution of primates and human disease susceptibility.

  7. STELLAR DIAMETERS AND TEMPERATURES. III. MAIN-SEQUENCE A, F, G, AND K STARS: ADDITIONAL HIGH-PRECISION MEASUREMENTS AND EMPIRICAL RELATIONS

    SciTech Connect

    Boyajian, Tabetha S.; Jones, Jeremy; White, Russel; McAlister, Harold A.; Gies, Douglas; Von Braun, Kaspar; Van Belle, Gerard; Farrington, Chris; Schaefer, Gail; Ten Brummelaar, Theo A.; Sturmann, Laszlo; Sturmann, Judit; Turner, Nils H.; Goldfinger, P. J.; Vargas, Norm; Ridgway, Stephen

    2013-07-01

    Based on CHARA Array measurements, we present the angular diameters of 23 nearby, main-sequence stars, ranging from spectral types A7 to K0, 5 of which are exoplanet host stars. We derive linear radii, effective temperatures, and absolute luminosities of the stars using Hipparcos parallaxes and measured bolometric fluxes. The new data are combined with previously published values to create an Angular Diameter Anthology of measured angular diameters to main-sequence stars (luminosity classes V and IV). This compilation consists of 125 stars with diameter uncertainties of less than 5%, ranging in spectral types from A to M. The large quantity of empirical data is used to derive color-temperature relations to an assortment of color indices in the Johnson (BVR_J I_J JHK), Cousins (R_C I_C), Kron (R_K I_K), Sloan (griz), and WISE (W_3 W_4) photometric systems. These relations have an average standard deviation of ~3% and are valid for stars with spectral types A0-M4. To derive even more accurate relations for Sun-like stars, we also determined these temperature relations omitting early-type stars (T_eff > 6750 K) that may have biased luminosity estimates because of rapid rotation; for this subset the dispersion is only ~2.5%. We find effective temperatures in agreement within a couple of percent for the interferometrically characterized sample of main-sequence stars compared to those derived via the infrared flux method and spectroscopic analysis.
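
    The derivation of linear radii, effective temperatures, and luminosities from an angular diameter, a parallax, and a bolometric flux follows standard relations: R = θd/2, T_eff = (4 F_bol / (σ θ²))^(1/4), and L = 4π d² F_bol. The sketch below applies them in SI units. The function names and the input values are illustrative assumptions and are not taken from the paper.

```python
import math

SIGMA_SB = 5.670374419e-8          # Stefan-Boltzmann constant, W m^-2 K^-4
PARSEC_M = 3.0856775814913673e16   # metres per parsec
R_SUN_M = 6.957e8                  # nominal solar radius, m
L_SUN_W = 3.828e26                 # nominal solar luminosity, W
MAS_TO_RAD = math.pi / (180.0 * 3600.0 * 1000.0)

def distance_m(parallax_mas: float) -> float:
    """Distance from parallax: d [pc] = 1000 / parallax [mas]."""
    return (1000.0 / parallax_mas) * PARSEC_M

def linear_radius_rsun(theta_mas: float, parallax_mas: float) -> float:
    """R = theta * d / 2, with theta the (limb-darkened) angular diameter."""
    return 0.5 * theta_mas * MAS_TO_RAD * distance_m(parallax_mas) / R_SUN_M

def effective_temperature_k(theta_mas: float, f_bol_w_m2: float) -> float:
    """T_eff = (4 F_bol / (sigma * theta^2))^(1/4), theta in radians."""
    theta_rad = theta_mas * MAS_TO_RAD
    return (4.0 * f_bol_w_m2 / (SIGMA_SB * theta_rad ** 2)) ** 0.25

def luminosity_lsun(f_bol_w_m2: float, parallax_mas: float) -> float:
    """L = 4 pi d^2 F_bol."""
    return 4.0 * math.pi * distance_m(parallax_mas) ** 2 * f_bol_w_m2 / L_SUN_W

# Hypothetical star: 1.0 mas angular diameter at 10 pc (parallax 100 mas).
print(round(linear_radius_rsun(1.0, 100.0), 3))
```

    Note that the temperature depends only on the angular diameter and the bolometric flux, so it is independent of the parallax, while the radius and luminosity inherit the parallax uncertainty.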

  8. Inhibition of protein kinase C catalytic activity by additional regions within the human protein kinase Calpha-regulatory domain lying outside of the pseudosubstrate sequence.

    PubMed Central

    Kirwan, Angie F; Bibby, Ashley C; Mvilongo, Thierry; Riedel, Heimo; Burke, Thomas; Millis, Sherri Z; Parissenti, Amadeo M

    2003-01-01

    The N-terminal pseudosubstrate site within the protein kinase Calpha (PKCalpha)-regulatory domain has long been regarded as the major determinant for autoinhibition of catalytic domain activity. Previously, we observed that the PKC-inhibitory capacity of the human PKCalpha-regulatory domain was only reduced partially on removal of the pseudosubstrate sequence [Parissenti, Kirwan, Kim, Colantonio and Schimmer (1998) J. Biol. Chem. 273, 8940-8945]. This finding suggested that one or more additional region(s) contributes to the inhibition of catalytic domain activity. To assess this hypothesis, we first examined the PKC-inhibitory capacity of a smaller fragment of the PKCalpha-regulatory domain consisting of the C1a, C1b and V2 regions [GST-Ralpha(39-177): this protein contained the full regulatory domain of human PKCalpha fused to glutathione S-transferase (GST), but lacked amino acids 1-38 (including the pseudosubstrate sequence) and amino acids 178-270 (including the C2 region)]. GST-Ralpha(39-177) significantly inhibited PKC in a phorbol-independent manner and could not bind the peptide substrate used in our assays. These results suggested that a region within C1/V2 directly inhibits catalytic domain activity. Providing further in vivo support for this hypothesis, we found that expression of N-terminally truncated pseudosubstrate-less bovine PKCalpha holoenzymes in yeast was capable of inhibiting cell growth in a phorbol-dependent manner. This suggested that additional autoinhibitory force(s) remained within the truncated holoenzymes that could be relieved by phorbol ester. Using tandem PCR-mediated mutagenesis, we observed that mutation of amino acids 33-86 within GST-Ralpha(39-177) dramatically reduced its PKC-inhibitory capacity when protamine was used as substrate. Mutagenesis of a broad range of sequences within C2 (amino acids 159-242) also significantly reduced PKC-inhibitory capacity. Taken together, these observations support strongly the existence of

  9. A phylogenetic analysis of the genus Fragaria (strawberry) using intron-containing sequence from the ADH-1 gene.

    PubMed

    DiMeglio, Laura M; Staudt, Günter; Yu, Hongrun; Davis, Thomas M

    2014-01-01

    The genus Fragaria encompasses species at ploidy levels ranging from diploid to decaploid. The cultivated strawberry, Fragaria×ananassa, and its two immediate progenitors, F. chiloensis and F. virginiana, are octoploids. To elucidate the ancestries of these octoploid species, we performed a phylogenetic analysis using intron-containing sequences of the nuclear ADH-1 gene from 39 germplasm accessions representing nineteen Fragaria species and one outgroup species, Dasiphora fruticosa. All trees from Maximum Parsimony and Maximum Likelihood analyses showed two major clades, Clade A and Clade B. Each of the sampled octoploids contributed alleles to both major clades. All octoploid-derived alleles in Clade A clustered with alleles of diploid F. vesca, with the exception of one octoploid allele that clustered with the alleles of diploid F. mandshurica. All octoploid-derived alleles in clade B clustered with the alleles of only one diploid species, F. iinumae. When gaps encoded as binary characters were included in the Maximum Parsimony analysis, tree resolution was improved with the addition of six nodes, and the bootstrap support was generally higher, rising above the 50% threshold for an additional nine branches. These results, coupled with the congruence of the sequence data and the coded gap data, validate and encourage the employment of sequence sets containing gaps for phylogenetic analysis. Our phylogenetic conclusions, based upon sequence data from the ADH-1 gene located on F. vesca linkage group II, complement and generally agree with those obtained from analyses of protein-encoding genes GBSSI-2 and DHAR located on F. vesca linkage groups V and VII, respectively, but differ from a previous study that utilized rDNA sequences and did not detect the ancestral role of F. iinumae.

  10. A Phylogenetic Analysis of the Genus Fragaria (Strawberry) Using Intron-Containing Sequence from the ADH-1 Gene

    PubMed Central

    DiMeglio, Laura M.; Yu, Hongrun; Davis, Thomas M.

    2014-01-01

    The genus Fragaria encompasses species at ploidy levels ranging from diploid to decaploid. The cultivated strawberry, Fragaria×ananassa, and its two immediate progenitors, F. chiloensis and F. virginiana, are octoploids. To elucidate the ancestries of these octoploid species, we performed a phylogenetic analysis using intron-containing sequences of the nuclear ADH-1 gene from 39 germplasm accessions representing nineteen Fragaria species and one outgroup species, Dasiphora fruticosa. All trees from Maximum Parsimony and Maximum Likelihood analyses showed two major clades, Clade A and Clade B. Each of the sampled octoploids contributed alleles to both major clades. All octoploid-derived alleles in Clade A clustered with alleles of diploid F. vesca, with the exception of one octoploid allele that clustered with the alleles of diploid F. mandshurica. All octoploid-derived alleles in clade B clustered with the alleles of only one diploid species, F. iinumae. When gaps encoded as binary characters were included in the Maximum Parsimony analysis, tree resolution was improved with the addition of six nodes, and the bootstrap support was generally higher, rising above the 50% threshold for an additional nine branches. These results, coupled with the congruence of the sequence data and the coded gap data, validate and encourage the employment of sequence sets containing gaps for phylogenetic analysis. Our phylogenetic conclusions, based upon sequence data from the ADH-1 gene located on F. vesca linkage group II, complement and generally agree with those obtained from analyses of protein-encoding genes GBSSI-2 and DHAR located on F. vesca linkage groups V and VII, respectively, but differ from a previous study that utilized rDNA sequences and did not detect the ancestral role of F. iinumae. PMID:25078607

  11. Household Clustering of Escherichia coli Sequence Type 131 Clinical and Fecal Isolates According to Whole Genome Sequence Analysis

    PubMed Central

    Johnson, James R.; Davis, Gregg; Clabots, Connie; Johnston, Brian D.; Porter, Stephen; DebRoy, Chitrita; Pomputius, William; Ender, Peter T.; Cooperstock, Michael; Slater, Billie Savvas; Banerjee, Ritu; Miller, Sybille; Kisiela, Dagmara; Sokurenko, Evgeni V.; Aziz, Maliha; Price, Lance B.

    2016-01-01

    Background. Within-household sharing of strains from the resistance-associated H30R1 and H30Rx subclones of Escherichia coli sequence type 131 (ST131) has been inferred based on conventional typing data, but it has been assessed minimally using whole genome sequence (WGS) analysis. Methods. Thirty-three clinical and fecal isolates of ST131-H30R1 and ST131-H30Rx, from 20 humans and pets in 6 households, underwent WGS analysis for comparison with 52 published ST131 genomes. Phylogenetic relationships were inferred using a bootstrapped maximum likelihood tree based on core genome sequence polymorphisms. Accessory traits were compared between phylogenetically similar isolates. Results. In the WGS-based phylogeny, isolates clustered strictly by household, in clades that were distributed widely across the phylogeny, interspersed between H30R1 and H30Rx comparison genomes. For only 1 household did the core genome phylogeny place epidemiologically unlinked isolates together with household isolates, but even there multiple differences in accessory genome content clearly differentiated these 2 groups. The core genome phylogeny supported within-household strain sharing, fecal-urethral urinary tract infection pathogenesis (with the entire household potentially providing the fecal reservoir), and instances of host-specific microevolution. In 1 instance, the household's index strain persisted for 6 years before causing a new infection in a different household member. Conclusions. Within-household sharing of E coli ST131 strains was confirmed extensively at the genome level, as was long-term colonization and repeated infections due to an ST131-H30Rx strain. Future efforts toward surveillance and decolonization may need to address not just the affected patient but also other human and animal household members. PMID:27703993

  12. Analysis of zinc in biological samples by flame atomic absorption spectrometry: use of addition calibration technique.

    PubMed

    Dutra, Rosilene L; Cantos, Geny A; Carasek, Eduardo

    2006-01-01

    The quantification of target analytes in complex matrices requires special calibration approaches to compensate for additional capacity or activity in the matrix samples. The standard addition is one of the most important calibration procedures for quantification of analytes in such matrices. However, this technique requires a great number of reagents and material, and it consumes a considerable amount of time throughout the analysis. In this work, a new calibration procedure to analyze biological samples is proposed. The proposed calibration, called the addition calibration technique, was used for the determination of zinc (Zn) in blood serum and erythrocyte samples. The results obtained were compared with those obtained using conventional calibration techniques (standard addition and standard calibration). The proposed addition calibration was validated by recovery tests using blood samples spiked with Zn. The recovery ranges for blood serum and erythrocyte samples were 90-132% and 76-112%, respectively. Statistical comparison of the results obtained by the addition technique and by conventional techniques, using a paired two-tailed Student's t-test and linear regression, demonstrated good agreement among them. PMID:16943611
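
    For orientation, the conventional standard addition method mentioned in this abstract can be sketched as follows: the signal is regressed on the added analyte concentration, and the sample concentration is read from the magnitude of the x-intercept (intercept divided by slope). This is the textbook procedure, not the authors' new addition calibration technique, whose details the abstract does not give. The data values and function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def standard_addition_concentration(added, signal):
    """Sample concentration = intercept / slope (magnitude of the x-intercept)."""
    slope, intercept = fit_line(added, signal)
    return intercept / slope

def recovery_percent(found_spiked, found_unspiked, spike_added):
    """Recovery test: share of the spiked amount that is measured back."""
    return 100.0 * (found_spiked - found_unspiked) / spike_added

# Hypothetical absorbance readings for Zn additions (mg/L) to one sample.
added = [0.0, 0.5, 1.0, 1.5]
signal = [0.20, 0.30, 0.40, 0.50]
print(round(standard_addition_concentration(added, signal), 3))
```

    Because every sample needs its own set of spiked aliquots, this procedure is reagent- and time-intensive, which is the drawback the abstract cites as motivation.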

  13. SAW: a graphical user interface for the analysis of immunoglobulin variable domain sequences.

    PubMed

    Elgavish, R A; Schroeder, H W

    1993-12-01

    The Sequence Analysis Workshop (SAW) is an interactive program for sequence analysis of immunoglobulin variable domains. Sequences for SAW can be obtained from GenBank or from a standard text file. SAW can compare a variable domain to as many as 100 different sequences, calculate the extent of homology, sort the sequences by their degree of similarity, translate the nucleotide codons into amino acids and then display the results in either a graphical or text format. These comparisons allow the investigator to determine the likely germ-line progenitors of a variable domain and to visualize how it differs from other antibody genes by functional region. SAW supports replacement and silent site substitution analysis by either codon or region, thus providing rapid insight into the forces that have shaped mutations. The sequence comparisons can be printed out as an aid for paper analysis or for preparation of figures for publication. SAW is written in Microsoft C for use with the Microsoft Windows graphics environment. The use of color and graphics, the generation of subsidiary windows that contain the results of specific analyses and the mouse-driven control of the program make SAW an easy-to-use tool for immunoglobulin sequence comparison. PMID:8292340

  14. Probabilistic topic modeling for the analysis and classification of genomic sequences

    PubMed Central

    2015-01-01

    Background: Studies on genomic sequences for classification and taxonomic identification have a leading role in the biomedical field and in the analysis of biodiversity. These studies focus on the so-called barcode genes, representing a well-defined region of the whole genome. Recently, alignment-free techniques have been gaining importance because they are able to overcome the drawbacks of sequence alignment techniques. In this paper a new alignment-free method for DNA sequence clustering and classification is proposed. The method is based on k-mer representation and text mining techniques. Methods: The presented method is based on Probabilistic Topic Modeling, a statistical technique originally proposed for text documents. Probabilistic topic models are able to find in a document corpus the topics (recurrent themes) characterizing classes of documents. This technique, applied to DNA sequences representing the documents, exploits the frequency of fixed-length k-mers and builds a generative model for a training group of sequences. This generative model, obtained through the Latent Dirichlet Allocation (LDA) algorithm, is then used to classify a large set of genomic sequences. Results and conclusions: We performed classification of over 7000 16S DNA barcode sequences taken from the Ribosomal Database Project (RDP) repository, training probabilistic topic models. The proposed method is compared to the RDP tool and the Support Vector Machine (SVM) classification algorithm in an extensive set of trials using both complete sequences and short sequence snippets (from 400 bp to 25 bp). Our method reaches very similar results to the RDP classifier and SVM for complete sequences. The most interesting results are obtained when short sequence snippets are considered. In these conditions the proposed method outperforms RDP and SVM with ultra-short sequences and exhibits a smooth decrease of performance, at every taxonomic level, when the sequence length is decreased. PMID:25916734
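
    The k-mer representation underlying this approach can be sketched as below: each sequence becomes a "document" whose "words" are its overlapping fixed-length k-mers, and the corpus becomes a document-term count matrix. The function names and the choice k = 3 are assumptions for illustration. Fitting the LDA model itself is not shown, and the abstract does not state the k used by the authors.

```python
from collections import Counter

VALID_BASES = set("ACGT")

def kmer_tokens(seq: str, k: int):
    """Overlapping k-mers of seq, skipping any that contain ambiguous bases."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)
            if set(seq[i:i + k]) <= VALID_BASES]

def document_term_matrix(seqs, k: int):
    """Return (vocabulary, rows) where rows[i][j] counts vocabulary[j] in seqs[i]."""
    docs = [Counter(kmer_tokens(s, k)) for s in seqs]
    vocab = sorted(set().union(*docs))
    return vocab, [[doc[term] for term in vocab] for doc in docs]

# Two hypothetical short sequences.
vocab, rows = document_term_matrix(["ACGTACGT", "ACGTTTTT"], k=3)
print(vocab)
print(rows)
```

    A matrix of this shape is the usual input to topic-model implementations, so a trained model can then assign topic mixtures to new sequences without aligning them.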

  15. Phylogenetic analysis of beta-papillomaviruses as inferred from nucleotide and amino acid sequence data.

    PubMed

    Gottschling, Marc; Köhler, Anja; Stockfleth, Eggert; Nindl, Ingo

    2007-01-01

    Human papillomaviruses (HPV) of the beta-group seem to be involved in the pathogenesis of non-melanoma skin cancer. Papillomaviruses are host specific and are considered closely co-evolving with their hosts. Evolutionary incongruence between early genes and late genes has been reported among oncogenic genital alpha-papillomaviruses and considerably challenges phylogenetic reconstructions. We investigated the relationships of 29 beta-HPV (25 types plus four putative new types, subtypes, or variants) as inferred from codon aligned and amino acid sequence data of the genes E1, E2, E6, E7, L1, and L2 using likelihood, distance, and parsimony approaches. An analysis of a L1 fragment included additional nucleotide and amino acid sequences from seven non-human beta-papillomaviruses. The evolution of early genes and late genes did not conflict significantly in beta-papillomaviruses based on partition homogeneity tests (p ≥ 0.001). As inferred from the complete genome analyses, beta-papillomaviruses were monophyletic and segregated into four highly supported monophyletic assemblages corresponding to the species 1, 2, 3, and fused 4/5. They basically split into the species 1 and the remainder of beta-papillomaviruses, whose species 3, 4, and 5 constituted the sistergroup of species 2. beta-Papillomaviruses have been isolated from humans, apes, and monkeys, and phylogenetic analyses of the L1 fragment showed non-human papillomaviruses to be highly polyphyletic, nesting within the HPV species. Thus, host and virus phylogenies were not congruent in beta-papillomaviruses, and multiple invasions across species borders may contribute (additionally to host-linked evolution) to their diversification.

  16. Analysis of human mini-exome sequencing data from Genetic Analysis Workshop 17 using a Bayesian hierarchical mixture model

    PubMed Central

    2011-01-01

    Next-generation sequencing technologies are rapidly changing the field of genetic epidemiology and enabling exploration of the full allele frequency spectrum underlying complex diseases. Although sequencing technologies have shifted our focus toward rare genetic variants, statistical methods traditionally used in genetic association studies are inadequate for estimating effects of low minor allele frequency variants. For our study we use the Genetic Analysis Workshop 17 data from 697 unrelated individuals (genotypes for 24,487 autosomal variants from 3,205 genes). We apply a Bayesian hierarchical mixture model to identify genes associated with a simulated binary phenotype using a transformed genotype design matrix weighted by allele frequencies. A Metropolis-Hastings algorithm is used to jointly sample each indicator variable and additive genetic effect pair from its conditional posterior distribution, and remaining parameters are sampled by Gibbs sampling. This method identified 58 genes with a posterior probability greater than 0.8 for being associated with the phenotype. One of these 58 genes, PIK3C2B, was correctly identified as being associated with affected status based on the simulation process. This project demonstrates the utility of Bayesian hierarchical mixture models using a transformed genotype matrix to detect genes containing rare and common variants associated with a binary phenotype. PMID:22373180

  17. Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenous siRNAs in C. elegans.

    PubMed

    Ruby, J Graham; Jan, Calvin; Player, Christopher; Axtell, Michael J; Lee, William; Nusbaum, Chad; Ge, Hui; Bartel, David P

    2006-12-15

    We sequenced approximately 400,000 small RNAs from Caenorhabditis elegans. Another 18 microRNA (miRNA) genes were identified, thereby extending to 112 our tally of confidently identified miRNA genes in C. elegans. Also observed were thousands of endogenous siRNAs generated by RNA-directed RNA polymerases acting preferentially on transcripts associated with spermatogenesis and transposons. In addition, a third class of nematode small RNAs, called 21U-RNAs, was discovered. 21U-RNAs are precisely 21 nucleotides long, begin with a uridine 5'-monophosphate but are diverse in their remaining 20 nucleotides, and appear modified at their 3'-terminal ribose. 21U-RNAs originate from more than 5700 genomic loci dispersed in two broad regions of chromosome IV, primarily between protein-coding genes or within their introns. These loci share a large upstream motif that enables accurate prediction of additional 21U-RNAs. The motif is conserved in other nematodes, presumably because of its importance for producing these diverse, autonomously expressed, small RNAs (dasRNAs).

  18. Sequence analysis of the omp2 region of Chlamydia psittaci strain GPIC: structural and functional implications.

    PubMed

    Hsia, R C; Bavoil, P M

    1996-10-17

    The nucleotide sequence of a 3.1-kb genomic DNA fragment carrying the omp3, omp2 and srp gene homologs from Chlamydia psittaci strain GPIC was determined. A comparative analysis of the GPIC sequence with other chlamydial omp2-linked sequences reveals highly conserved omp3 and omp2 upstream sequences across species, suggesting a unified mechanism of transcription regulation. In contrast, the omp2-srp intergenic segment, which encompasses hypothetical srp transcriptional initiation sites, is relatively less conserved in length and in sequence. Examination of the predicted translation products reveals a high degree of homology within Omp3 and Omp2 across species, with the notable exception of the N-terminal fifth of Omp2. Although the latter segment displays relatively high interspecies sequence variation, it includes a smaller segment, whose high positive charge density is conserved across species, suggesting a conserved structure/function. In contrast to Omp2 and Omp3, a comparative analysis of the predicted amino acid (aa) sequence of the srp product reveals high homology within species, but relatively little across species. A 38-aa segment near the C-terminus of Srp, whose sequence is 64% identical between C. psittaci GPIC and C. trachomatis, is partially truncated in C. psittaci 6BC.

  19. LOESS correction for length variation in gene set-based genomic sequence analysis

    PubMed Central

    Aboukhalil, Anton; Bulyk, Martha L.

    2012-01-01

    Motivation: Sequence analysis algorithms are often applied to sets of DNA, RNA or protein sequences to identify common or distinguishing features. Controlling for sequence length variation is critical to properly score sequence features and identify true biological signals rather than length-dependent artifacts. Results: Several cis-regulatory module discovery algorithms exhibit a substantial dependence between DNA sequence score and sequence length. Our newly developed LOESS method is flexible in capturing diverse score-length relationships and is more effective in correcting DNA sequence scores for length-dependent artifacts, compared with four other approaches. Application of this method to genes co-expressed during Drosophila melanogaster embryonic mesoderm development or neural development scored by the Lever motif analysis algorithm resulted in successful recovery of their biologically validated cis-regulatory codes. The LOESS length-correction method is broadly applicable, and may be useful not only for more accurate inference of cis-regulatory codes, but also for detection of other types of patterns in biological sequences. Availability: Source code and compiled code are available from http://thebrain.bwh.harvard.edu/LM_LOESS/ Contact: mlbulyk@receptor.med.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22492312
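
    The idea of removing a score-length dependence with LOESS can be sketched in plain Python: a locally weighted linear regression (tricube kernel) is fitted to score versus length, and the fitted trend is subtracted from each score. This is a minimal sketch under stated assumptions. The span fraction, the subtraction step, and the function names are illustrative and may differ from the authors' released code.

```python
def loess_fit(x, y, frac=0.5):
    """Locally weighted linear regression evaluated at each x (tricube kernel)."""
    n = len(x)
    k = max(2, int(round(frac * n)))            # neighbours used per local fit
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12               # bandwidth: k-th smallest distance
        w = [max(0.0, 1.0 - (abs(xi - x0) / h) ** 3) ** 3 for xi in x]
        sw = sum(w)                             # > 0 because the point itself has weight 1
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

def length_corrected_scores(lengths, scores, frac=0.5):
    """Residuals after subtracting the LOESS trend of score on length."""
    trend = loess_fit(lengths, scores, frac)
    return [s - t for s, t in zip(scores, trend)]

# Hypothetical scores that depend purely on sequence length.
lengths = [100.0 * i for i in range(1, 11)]
scores = [0.01 * length + 2.0 for length in lengths]
corrected = length_corrected_scores(lengths, scores)
print([round(c, 6) for c in corrected])
```

    In this toy case the scores are an exact linear function of length, so the corrected scores are all approximately zero. With real data, the residuals would retain whatever signal is not explained by length.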

  20. Molecular cloning and sequencing analysis of the interferon β from Coturnix.

    PubMed

    Zheng, Bei; Chang, Wei-Shan

    2014-01-01

    One pair of primers was designed according to Gallus and Meleagris gallopavo interferon β (IFN-β) sequences published in GenBank. The primers and RNA extracted from the spleen of Coturnix were used to amplify Coturnix IFN-β cDNA by real-time polymerase chain reaction (RT-PCR). The product was cloned into the pEasy-T1 vector. The recombinant plasmid was evaluated by PCR and restriction enzyme digestion. The cloned sequences were sequenced, and the sequencing results were compared using NCBI. We successfully obtained a partial Coturnix IFN-β sequence. The sequence was subtyped and subjected to homology analysis. The results suggested that the IFN-β gene of Coturnix shares 88.7% homology with that of chicken and 72.5% homology with that of Anas platyrhynchos, based on the IFN-β sequences registered in GenBank. Analysis of the phylogenetic tree showed that Coturnix and chicken IFN-β are closely related. In this study we thus obtained a partial IFN-β sequence of quail. PMID:26155095

  1. Unamplified cap analysis of gene expression on a single-molecule sequencer

    PubMed Central

    Kanamori-Katayama, Mutsumi; Itoh, Masayoshi; Kawaji, Hideya; Lassmann, Timo; Katayama, Shintaro; Kojima, Miki; Bertin, Nicolas; Kaiho, Ai; Ninomiya, Noriko; Daub, Carsten O.; Carninci, Piero; Forrest, Alistair R.R.; Hayashizaki, Yoshihide

    2011-01-01

    We report the development of a simplified cap