Sample records for global sequence information

  1. Inter-laboratory evaluation of the EUROFORGEN Global ancestry-informative SNP panel by massively parallel sequencing using the Ion PGM™.

    PubMed

    Eduardoff, M; Gross, T E; Santos, C; de la Puente, M; Ballard, D; Strobl, C; Børsting, C; Morling, N; Fusco, L; Hussing, C; Egyed, B; Souto, L; Uacyisrael, J; Syndercombe Court, D; Carracedo, Á; Lareu, M V; Schneider, P M; Parson, W; Phillips, C

    2016-07-01

    The EUROFORGEN Global ancestry-informative SNP (AIM-SNPs) panel is a forensic multiplex of 128 markers designed to differentiate an individual's ancestry from amongst the five continental population groups of Africa, Europe, East Asia, Native America, and Oceania. A custom multiplex of AmpliSeq™ PCR primers was designed for the Global AIM-SNPs to perform massively parallel sequencing using the Ion PGM™ system. This study assessed individual SNP genotyping precision using the Ion PGM™; the forensic sensitivity of the multiplex using dilution series, degraded DNA, and simple mixtures; and the ancestry differentiation power of the final panel design, which required substitution of three original ancestry-informative SNPs with alternatives. Fourteen populations that had not been previously analyzed were genotyped using the custom multiplex, and these studies allowed assessment of genotyping performance by comparison of data across five laboratories. Results indicate that a low level of genotyping error can still occur from sequence misalignment caused by homopolymeric tracts close to the target SNP, despite careful scrutiny of candidate SNPs at the design stage. Such sequence misalignment required the exclusion of component SNP rs2080161 from the Global AIM-SNPs panel. However, the overall genotyping precision and sensitivity of this custom multiplex indicate that the Ion PGM™ assay for the Global AIM-SNPs is highly suitable for forensic ancestry analysis with massively parallel sequencing. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
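    The abstract attributes residual genotyping error to sequence misalignment around homopolymeric tracts near the target SNP. As a minimal sketch of a design-stage screen (not the authors' procedure), the Python below flags a candidate SNP when a long single-base run lies within a fixed flank of it; the function names, the 10-base flank and the 5-base run threshold are hypothetical choices.

```python
def longest_homopolymer_near(seq: str, snp_index: int, flank: int = 10) -> int:
    """Length of the longest single-base run inside the window
    [snp_index - flank, snp_index + flank] (window clipped to the sequence)."""
    start = max(0, snp_index - flank)
    end = min(len(seq), snp_index + flank + 1)
    window = seq[start:end].upper()
    longest = 0
    run = 0
    prev = ""
    for base in window:
        # Extend the current run if the base repeats, otherwise start a new run.
        run = run + 1 if base == prev else 1
        prev = base
        longest = max(longest, run)
    return longest


def flag_snp(seq: str, snp_index: int, flank: int = 10, max_run: int = 5) -> bool:
    """Hypothetical rule: flag the SNP if a run of >= max_run identical bases
    lies within `flank` bases of it."""
    return longest_homopolymer_near(seq, snp_index, flank) >= max_run


print(flag_snp("ACGTAAAAAAAGCTAGC", 12))  # -> True (a 7-base poly-A run is nearby)
```

    Narrowing the flank (for example `flank=1`) would leave this SNP unflagged, so the window size is the main tuning choice in a screen of this kind.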

  2. Poliovirus serotype-specific VP1 sequencing primers.

    PubMed

    Kilpatrick, David R; Iber, Jane C; Chen, Qi; Ching, Karen; Yang, Su-Ju; De, Lina; Mandelbaum, Mark D; Emery, Brian; Campagnoli, Ray; Burns, Cara C; Kew, Olen

    2011-06-01

    The Global Polio Laboratory Network routinely uses poliovirus-specific PCR primers and probes to determine the serotype and genotype of poliovirus isolates obtained as part of global poliovirus surveillance. To provide detailed molecular epidemiologic information, poliovirus isolates are further characterized by sequencing the ~900-nucleotide region encoding the major capsid protein, VP1. It is difficult to obtain high-quality sequence information when clinical or environmental samples contain poliovirus mixtures. As an alternative to conventional methods for resolving poliovirus mixtures, sets of serotype-specific primers were developed for amplifying and sequencing the VP1 regions of individual components of mixed populations of vaccine-vaccine, vaccine-wild, and wild-wild polioviruses. Published by Elsevier B.V.

  3. The global catalogue of microorganisms 10K type strain sequencing project: closing the genomic gaps for the validly published prokaryotic and fungi species.

    PubMed

    Wu, Linhuan; McCluskey, Kevin; Desmeth, Philippe; Liu, Shuangjiang; Hideaki, Sugawara; Yin, Ye; Moriya, Ohkuma; Itoh, Takashi; Kim, Cha Young; Lee, Jung-Sook; Zhou, Yuguang; Kawasaki, Hiroko; Hazbón, Manzour Hernando; Robert, Vincent; Boekhout, Teun; Lima, Nelson; Evtushenko, Lyudmila; Boundy-Mills, Kyria; Bunk, Boyke; Moore, Edward R B; Eurwilaichitr, Lily; Ingsriswang, Supawadee; Shah, Heena; Yao, Su; Jin, Tao; Huang, Jinqun; Shi, Wenyu; Sun, Qinglan; Fan, Guomei; Li, Wei; Li, Xian; Kurtböke, Ipek; Ma, Juncai

    2018-05-01

    Genomic information is essential for taxonomic, phylogenetic, and functional studies to comprehensively decipher the characteristics of microorganisms, to explore microbiomes through metagenomics, and to answer fundamental questions of nature and human life. However, large gaps remain in the available genomic sequencing information published for bacterial and archaeal species, and the gaps are even larger for fungal type strains. The Global Catalogue of Microorganisms (GCM) leads an internationally coordinated effort to sequence type strains and close gaps in the genomic maps of microorganisms. Through this effort, the GCM aims to promote research by deep-mining genomic data.

  4. Integrating Genome-based Informatics to Modernize Global Disease Monitoring, Information Sharing, and Response

    PubMed Central

    Brown, Eric W.; Detter, Chris; Gerner-Smidt, Peter; Gilmour, Matthew W.; Harmsen, Dag; Hendriksen, Rene S.; Hewson, Roger; Heymann, David L.; Johansson, Karin; Ijaz, Kashef; Keim, Paul S.; Koopmans, Marion; Kroneman, Annelies; Wong, Danilo Lo Fo; Lund, Ole; Palm, Daniel; Sawanpanyalert, Pathom; Sobel, Jeremy; Schlundt, Jørgen

    2012-01-01

    The rapid advancement of genome technologies holds great promise for improving the quality and speed of clinical and public health laboratory investigations and for decreasing their cost. The latest generation of genome DNA sequencers can provide highly detailed and robust information on disease-causing microbes, and in the near future these technologies will be suitable for routine use in national, regional, and global public health laboratories. With additional improvements in instrumentation, these next- or third-generation sequencers are likely to replace conventional culture-based and molecular typing methods to provide point-of-care clinical diagnosis and other essential information for quicker and better treatment of patients. Provided there is free sharing of information by all clinical and public health laboratories, these genomic tools could spawn a global system of linked databases of pathogen genomes that would ensure more efficient detection, prevention, and control of endemic, emerging, and other infectious disease outbreaks worldwide. PMID:23092707

  5. Sequence information signal processor for local and global string comparisons

    DOEpatents

    Peterson, John C.; Chow, Edward T.; Waterman, Michael S.; Hunkapillar, Timothy J.

    1997-01-01

    A sequence information signal processing integrated circuit chip designed to perform high-speed calculation of a dynamic programming algorithm based upon the algorithm defined by Waterman and Smith. The signal processing chip of the present invention is designed to be a building block of a linear systolic array, whose performance can be increased by connecting additional sequence information signal processing chips to the array. The chip provides a high-speed, low-cost linear array processor that can locate highly similar global sequences or segments thereof, such as contiguous subsequences from two different DNA or protein sequences. In a preferred embodiment, the chip is implemented using CMOS VLSI technology to provide the equivalent of about 400,000 transistors or 100,000 gates. Each chip provides 16 processing elements and is designed for 16-bit two's-complement operation, giving a maximum score range of -32,768 to +32,767. It is designed to compare sequences as long as 4,194,304 elements without external software, and sequences of unlimited length with the aid of external software. Each sequence can be assigned different deletion and insertion weight functions. Each processor is provided with a similarity measure device which is independently variable. Thus, each processor can contribute to the maximum value score calculation using a different similarity measure.
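    The chip evaluates the Smith-Waterman dynamic programming recurrence in parallel across a systolic array of processing elements. The sequential Python sketch below only illustrates the recurrence itself for a local alignment score, with separate deletion and insertion weights as the abstract describes; the scoring values are assumptions, and nothing here models the chip's hardware or its 16-bit arithmetic.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   deletion: int = -2, insertion: int = -2) -> int:
    """Best local alignment score between a and b (Smith-Waterman recurrence).

    Linear gap weights with separate (assumed) deletion and insertion penalties.
    Only two DP rows are stored because just the best score is returned.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + deletion         # a[i-1] aligned against a gap
            left = curr[j - 1] + insertion  # b[j-1] aligned against a gap
            # The 0 floor is what makes the alignment local.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman("TTACGTTT", "GGACGTGG"))  # -> 8 (the shared "ACGT" segment)
```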

  6. The World Health Organization Global Programme on AIDS proposal for standardization of HIV sequence nomenclature. WHO Network for HIV Isolation and Characterization.

    PubMed

    Korber, B T; Osmanov, S; Esparza, J; Myers, G

    1994-11-01

    The World Health Organization Global Programme on AIDS (WHO/GPA) is conducting a large-scale collaborative study of human immunodeficiency virus type 1 (HIV-1) variation, based in four potential vaccine-trial site countries: Brazil, Rwanda, Thailand, and Uganda. Through the course of this study, it was crucial to keep track of certain attributes of the samples from which the viral nucleotide sequences were derived (e.g., country of origin and viral culture characterization), so that meaningful sequence comparisons could be made. Here we describe a system developed in the context of the WHO/GPA study that summarizes such critical attributes by representing them as standardized characters directly incorporated into sequence names. This nomenclature allows linkage of clinical, phenotypic, and geographic information with molecular data. We propose that other investigators involved in human immunodeficiency virus (HIV) nucleotide sequencing efforts adopt a similar standardized sequence nomenclature to facilitate cross-study sequence comparison. HIV sequence data are being generated at an ever-increasing rate; directly coupled to this increase is our deepening understanding of biological parameters that influence or result from sequence variability. A standardized sequence nomenclature that includes relevant biological information would enable researchers to better utilize the growing body of sequence data, and enhance their ability to interpret the biological implications of their own data through facilitating comparisons with previously published work.

  7. Predicting RNA-protein binding sites and motifs through combining local and global deep convolutional neural networks.

    PubMed

    Pan, Xiaoyong; Shen, Hong-Bin

    2018-05-02

    RNA-binding proteins (RBPs) make up 5-10% of the eukaryotic proteome and play key roles in many biological processes, e.g. gene regulation. Experimental detection of RBP binding sites is still time-consuming and costly. Computational prediction of RBP binding sites using patterns learned from existing annotation knowledge is a faster alternative. From the biological point of view, specific RBPs recognize the local structural context derived from local sequences. However, to the best of our knowledge, deep learning models for this task have so far used only global representations of entire RNA sequences, and local sequence information has been ignored in the model construction process. In this study, we present a computational method, iDeepE, that predicts RNA-protein binding sites from RNA sequences by combining global and local convolutional neural networks (CNNs). For the global CNN, we pad the RNA sequences to the same length. For the local CNN, we split an RNA sequence into multiple overlapping fixed-length subsequences, where each subsequence is a signal channel of the whole sequence. Next, we train deep CNNs on the multiple subsequences and on the padded sequences, respectively, to learn high-level features. Finally, the outputs from the local and global CNNs are combined to improve the prediction. iDeepE performs better than state-of-the-art methods on two large-scale datasets derived from CLIP-seq. We also find that the local CNN runs 1.8 times faster than the global CNN, with comparable performance, when using GPUs. Our results show that iDeepE captures experimentally verified binding motifs. https://github.com/xypan1232/iDeepE. xypan172436@gmail.com or hbshen@sjtu.edu.cn. Supplementary data are available at Bioinformatics online.
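    The two input views described for iDeepE can be sketched as simple preprocessing steps: pad every sequence to one length for the global CNN, and cut it into overlapping fixed-length windows for the local CNN. The window length, stride, 'N' padding symbol and one-hot encoding below are illustrative assumptions, not parameters taken from iDeepE.

```python
def pad_sequence(seq: str, length: int, pad: str = "N") -> str:
    """Global-CNN view: pad (or truncate) an RNA sequence to a fixed length."""
    return (seq + pad * length)[:length]


def split_windows(seq: str, window: int, stride: int, pad: str = "N") -> list[str]:
    """Local-CNN view: overlapping fixed-length subsequences covering seq."""
    if len(seq) < window:
        return [pad_sequence(seq, window, pad)]
    starts = list(range(0, len(seq) - window + 1, stride))
    if starts[-1] + window < len(seq):
        # Add one more (padded) window so the tail of the sequence is covered.
        starts.append(starts[-1] + stride)
    return [pad_sequence(seq[s:s + window], window, pad) for s in starts]


def one_hot(seq: str) -> list[list[int]]:
    """Encode A, C, G, U as 4-dimensional one-hot rows; padding 'N' is all zeros."""
    alphabet = "ACGU"
    return [[1 if base == letter else 0 for letter in alphabet] for base in seq]


print(pad_sequence("ACGU", 6))          # -> ACGUNN
print(split_windows("ACGUACG", 4, 2))   # -> ['ACGU', 'GUAC', 'ACGN']
```

    Each window, once one-hot encoded, would then serve as one channel of the local CNN input, in the spirit of the abstract's description.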

  8. The Present and Future of Whole Genome Sequencing (WGS) and Whole Metagenome Sequencing (WMS) for Surveillance of Antimicrobial Resistant Microorganisms and Antimicrobial Resistance Genes across the Food Chain

    PubMed Central

    Oniciuc, Elena A.; Likotrafiti, Eleni; Alvarez-Molina, Adrián; Alvarez-Ordóñez, Avelino

    2018-01-01

    Antimicrobial resistance (AMR) surveillance is a critical step within risk assessment schemes, as it is the basis for informing global strategies, monitoring the effectiveness of public health interventions, and detecting new trends and emerging threats linked to food. Surveillance of AMR is currently based on the isolation of indicator microorganisms and the phenotypic characterization of clinical, environmental and food strains isolated. However, this approach provides very limited information on the mechanisms driving AMR or on the presence or spread of AMR genes throughout the food chain. Whole-genome sequencing (WGS) of bacterial pathogens has shown potential for epidemiological surveillance, outbreak detection, and infection control. In addition, whole metagenome sequencing (WMS) allows for the culture-independent analysis of complex microbial communities, providing useful information on the occurrence of AMR genes. Both technologies can assist the tracking of AMR genes and mobile genetic elements, providing the necessary information for the implementation of quantitative risk assessments and allowing for the identification of hotspots and routes of transmission of AMR across the food chain. This review article summarizes the information currently available on the use of WGS and WMS for surveillance of AMR in foodborne pathogenic bacteria and food-related samples and discusses future needs that will have to be considered for the routine implementation of these next-generation sequencing methodologies with this aim. In particular, methodological constraints that impede the use of these high-throughput sequencing (HTS) technologies at a global scale are identified, and the standardization of methods and protocols is suggested as a measure to upgrade HTS-based AMR surveillance schemes. PMID:29789467

  9. Working Memory Capacity and Stroop Interference: Global versus Local Indices of Executive Control

    ERIC Educational Resources Information Center

    Meier, Matt E.; Kane, Michael J.

    2013-01-01

    Two experiments examined the relations among working memory capacity (WMC), congruency-sequence effects, proportion-congruency effects, and the color-word Stroop effect to test whether congruency-sequence effects might inform theoretical claims regarding WMC's prediction of Stroop interference. In Experiment 1, subjects completed either a…

  10. Life-cycle analysis of dryland greenhouse gases affected by cropping sequence and nitrogen fertilization

    USDA-ARS's Scientific Manuscript database

    Little information is available about the effect of management practices on net global warming potential (GWP) and greenhouse gas intensity (GHGI) under dryland cropping systems. We evaluated the effects of cropping sequences (conventional till malt barley-fallow [CTB-F], no-till malt barley-pea [NTB-P], a...

  11. On the nature of global classification

    NASA Technical Reports Server (NTRS)

    Wheelis, M. L.; Kandler, O.; Woese, C. R.

    1992-01-01

    Molecular sequencing technology has brought biology into the era of global (universal) classification. Methodologically and philosophically, global classification differs significantly from traditional, local classification. The need for uniformity requires that higher level taxa be defined on the molecular level in terms of universally homologous functions. A global classification should reflect both principal dimensions of the evolutionary process: genealogical relationship and quality and extent of divergence within a group. The ultimate purpose of a global classification is not simply information storage and retrieval; such a system should also function as a heuristic representation of the evolutionary paradigm that exerts a directing influence on the course of biology. The global system envisioned allows paraphyletic taxa. To retain maximal phylogenetic information in these cases, minor notational amendments in existing taxonomic conventions should be adopted.

  12. Multi-scale symbolic transfer entropy analysis of EEG

    NASA Astrophysics Data System (ADS)

    Yao, Wenpo; Wang, Jun

    2017-10-01

    From both global and local perspectives, we symbolize two kinds of EEG and analyze their dynamic and asymmetrical information using multi-scale transfer entropy. A multi-scale process with scale factors from 1 to 199 and a step size of 2 is applied to EEG from healthy people and epileptic patients, and then permutation symbolization with an embedding dimension of 3 and a global approach are used to symbolize the sequences. The forward and reverse symbol sequences are taken as the inputs of transfer entropy. The scale factor intervals in which the two kinds of EEG show satisfactory entropy distinctions are (37, 57) for the permutation approach and (65, 85) for the global approach. At a scale factor of 67, the transfer entropies of the healthy and epileptic subjects under permutation symbolization, 0.1137 and 0.1028, show the largest difference. The corresponding values for global symbolization are 0.0641 and 0.0601, at a scale factor of 165. The results show that permutation symbolization, which takes account of local information, distinguishes the two groups better and is more effective in our multi-scale transfer entropy analysis of EEG.
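    The pipeline in this abstract has three steps: coarse-graining the signal at a scale factor, permutation symbolization with embedding dimension 3, and transfer entropy between symbol sequences. The sketch below uses common textbook forms of each step (non-overlapping block averaging, ordinal patterns, and a plug-in transfer entropy estimate with history length 1); these details are assumptions, and the authors' global symbolization and forward/reverse sequence construction are not reproduced.

```python
import math
from collections import Counter
from itertools import permutations


def coarse_grain(x: list[float], scale: int) -> list[float]:
    """Multi-scale step: average consecutive non-overlapping blocks of `scale` samples."""
    n = len(x) // scale
    return [sum(x[i * scale:(i + 1) * scale]) / scale for i in range(n)]


def permutation_symbols(x: list[float], m: int = 3) -> list[int]:
    """Map each length-m window to the index of its ordinal (rank-order) pattern."""
    patterns = {p: k for k, p in enumerate(permutations(range(m)))}
    symbols = []
    for i in range(len(x) - m + 1):
        window = x[i:i + m]
        # Indices sorted by value; ties keep their original order (stable sort).
        order = tuple(sorted(range(m), key=lambda j: window[j]))
        symbols.append(patterns[order])
    return symbols


def transfer_entropy(source: list[int], target: list[int]) -> float:
    """Plug-in TE(source -> target) in bits, history length 1, on symbol sequences:
    sum p(t1, t0, s0) * log2[p(t1 | t0, s0) / p(t1 | t0)]."""
    if len(source) != len(target):
        raise ValueError("sequences must have equal length")
    triples = Counter(zip(target[1:], target[:-1], source[:-1]))
    pairs_ts = Counter(zip(target[:-1], source[:-1]))
    pairs_tt = Counter(zip(target[1:], target[:-1]))
    singles = Counter(target[:-1])
    n = len(target) - 1
    te = 0.0
    for (t1, t0, s0), c in triples.items():
        p_full = c / pairs_ts[(t0, s0)]
        p_self = pairs_tt[(t1, t0)] / singles[t0]
        te += (c / n) * math.log2(p_full / p_self)
    return te


source = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1]
target = [0] + source[:-1]  # target repeats the source one step later
print(transfer_entropy(source, target))  # positive: the source predicts the target
```

    In the study's setting, each coarse-grained EEG channel would be symbolized first and the resulting symbol sequences passed to the transfer entropy estimate.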

  13. High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features.

    PubMed

    Jones, David T; Kandathil, Shaun M

    2018-04-26

    In addition to substitution frequency data from protein sequence alignments, many state-of-the-art methods for contact prediction rely on additional sources of information, or features, of protein sequences in order to predict residue-residue contacts, such as solvent accessibility, predicted secondary structure, and scores from other contact prediction methods. It is unclear how much of this information is needed to achieve state-of-the-art results. Here, we show that using deep neural network models, simple alignment statistics contain sufficient information to achieve state-of-the-art precision. Our prediction method, DeepCov, uses fully convolutional neural networks operating on amino-acid pair frequency or covariance data derived directly from sequence alignments, without using global statistical methods such as sparse inverse covariance or pseudolikelihood estimation. Comparisons against CCMpred and MetaPSICOV2 show that using pairwise covariance data calculated from raw alignments as input allows us to match or exceed the performance of both of these methods. Almost all of the achieved precision is obtained when considering relatively local windows (around 15 residues) around any member of a given residue pairing; larger window sizes have comparable performance. Assessment on a set of shallow sequence alignments (fewer than 160 effective sequences) indicates that the new method is substantially more precise than CCMpred and MetaPSICOV2 in this regime, suggesting that improved precision is attainable on smaller sequence families. Overall, the performance of DeepCov is competitive with the state of the art, and our results demonstrate that global models, which employ features from all parts of the input alignment when predicting individual contacts, are not strictly needed in order to attain precise contact predictions. DeepCov is freely available at https://github.com/psipred/DeepCov. d.t.jones@ucl.ac.uk.
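    DeepCov's input is built from amino-acid pair frequency or covariance data computed directly from the alignment. A minimal sketch for one column pair (i, j) is the 21 x 21 block cov[a][b] = f_ij(a, b) - f_i(a) f_j(b); the 21-letter alphabet (20 amino acids plus gap) and the absence of sequence weighting are assumptions here, not details stated in the abstract.

```python
AMINO = "ACDEFGHIKLMNPQRSTVWY-"  # assumed alphabet: 20 amino acids plus gap


def pair_covariance(msa: list[str], i: int, j: int) -> list[list[float]]:
    """Covariance block for alignment columns i and j:
    cov[a][b] = f_ij(a, b) - f_i(a) * f_j(b), from unweighted raw frequencies.
    Residues outside AMINO raise KeyError."""
    q = len(AMINO)
    index = {aa: k for k, aa in enumerate(AMINO)}
    n = len(msa)
    f_i = [0.0] * q
    f_j = [0.0] * q
    f_ij = [[0.0] * q for _ in range(q)]
    for seq in msa:
        a, b = index[seq[i]], index[seq[j]]
        f_i[a] += 1 / n
        f_j[b] += 1 / n
        f_ij[a][b] += 1 / n
    return [[f_ij[a][b] - f_i[a] * f_j[b] for b in range(q)] for a in range(q)]


msa = ["AC", "AC", "DE", "DE"]  # toy alignment with two perfectly coupled columns
cov = pair_covariance(msa, 0, 1)
print(cov[AMINO.index("A")][AMINO.index("C")])  # -> 0.25
```

    Stacking such blocks for every column pair would give a pairwise feature map of the kind a fully convolutional network could take as input.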

  14. Skeleton-Based Human Action Recognition With Global Context-Aware Attention LSTM Networks

    NASA Astrophysics Data System (ADS)

    Liu, Jun; Wang, Gang; Duan, Ling-Yu; Abdiyeva, Kamila; Kot, Alex C.

    2018-04-01

    Human action recognition in 3D skeleton sequences has attracted a lot of research attention. Recently, Long Short-Term Memory (LSTM) networks have shown promising performance in this task due to their strengths in modeling the dependencies and dynamics in sequential data. As not all skeletal joints are informative for action recognition, and the irrelevant joints often bring noise which can degrade the performance, we need to pay more attention to the informative ones. However, the original LSTM network does not have explicit attention ability. In this paper, we propose a new class of LSTM network, Global Context-Aware Attention LSTM (GCA-LSTM), for skeleton based action recognition. This network is capable of selectively focusing on the informative joints in each frame of each skeleton sequence by using a global context memory cell. To further improve the attention capability of our network, we also introduce a recurrent attention mechanism, with which the attention performance of the network can be enhanced progressively. Moreover, we propose a stepwise training scheme in order to train our network effectively. Our approach achieves state-of-the-art performance on five challenging benchmark datasets for skeleton based action recognition.

  15. MIPS: analysis and annotation of genome information in 2007

    PubMed Central

    Mewes, H. W.; Dietmann, S.; Frishman, D.; Gregory, R.; Mannhaupt, G.; Mayer, K. F. X.; Münsterkötter, M.; Ruepp, A.; Spannagl, M.; Stümpflen, V.; Rattei, T.

    2008-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) combines automatic processing of large amounts of sequences with manual annotation of selected model genomes. Due to the massive growth of the available data, the depth of annotation varies widely between independent databases. Also, the criteria for the transfer of information from known to orthologous sequences are diverse. Coping with the task of global in-depth genome annotation has become unfeasible. Therefore, our efforts are dedicated to three levels of annotation: (i) the curation of selected genomes, in particular from fungal and plant taxa (e.g. CYGD, MNCDB, MatDB), (ii) the comprehensive, consistent, automatic annotation employing exhaustive methods for the computation of sequence similarities and sequence-related attributes as well as the classification of individual sequences (SIMAP, PEDANT and FunCat) and (iii) the compilation of manually curated databases for protein interactions based on scrutinized information from the literature to serve as an accepted set of reliable annotated interaction data (MPACT, MPPI, CORUM). All databases and tools described as well as the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de). PMID:18158298

  16. MIPS: analysis and annotation of genome information in 2007.

    PubMed

    Mewes, H W; Dietmann, S; Frishman, D; Gregory, R; Mannhaupt, G; Mayer, K F X; Münsterkötter, M; Ruepp, A; Spannagl, M; Stümpflen, V; Rattei, T

    2008-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) combines automatic processing of large amounts of sequences with manual annotation of selected model genomes. Due to the massive growth of the available data, the depth of annotation varies widely between independent databases. Also, the criteria for the transfer of information from known to orthologous sequences are diverse. Coping with the task of global in-depth genome annotation has become unfeasible. Therefore, our efforts are dedicated to three levels of annotation: (i) the curation of selected genomes, in particular from fungal and plant taxa (e.g. CYGD, MNCDB, MatDB), (ii) the comprehensive, consistent, automatic annotation employing exhaustive methods for the computation of sequence similarities and sequence-related attributes as well as the classification of individual sequences (SIMAP, PEDANT and FunCat) and (iii) the compilation of manually curated databases for protein interactions based on scrutinized information from the literature to serve as an accepted set of reliable annotated interaction data (MPACT, MPPI, CORUM). All databases and tools described as well as the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).

  17. A laboratory information management system for DNA barcoding workflows.

    PubMed

    Vu, Thuy Duong; Eberhardt, Ursula; Szöke, Szániszló; Groenewald, Marizeth; Robert, Vincent

    2012-07-01

    This paper presents a laboratory information management system (LIMS) for DNA sequences, built around the needs of a DNA barcoding project at the CBS-KNAW Fungal Biodiversity Centre (Utrecht, the Netherlands). DNA barcoding is a global initiative for species identification through simple DNA sequence markers. We aim to generate barcode data for all strains (or specimens) included in the collection (currently ca. 80 k). The LIMS has been developed to better manage large amounts of sequence data and to keep track of the whole experimental procedure. The system has allowed us to classify strains more efficiently as the quality of sequence data has improved, and as a result, up-to-date taxonomic names have been given to strains and more accurate correlation analyses have been carried out.

  18. Sequence stratigraphy of the Triassic in the Barentsz Sea

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Skjold, L.JU.; Van Veen, P.M.; Gjelberg, J.

    1990-05-01

    A regional study of the Triassic in the Barentsz Sea (20-32°E, 71-74°N) revealed sequences that correlate seismically for hundreds of kilometers. Recent offshore drilling results enabled the authors to establish a biostratigraphic time framework. Comparisons with information from onshore outcrops (such as the Svalbard Archipelago) aided the piecing together of these superregional sequences. Seismic character analysis identified three units with composite progradational patterns (Induan, Olenekian, and Anisian). Fluvial, deltaic, and marine deposits can be distinguished and located relative to the paleocoastlines. Corresponding downlap surfaces suggest the development of condensed intervals, predicted to consist of organic-rich source rocks, as was later confirmed by drilling. Regional predictions based on this sequence-stratigraphic approach have proved valuable when correlating and evaluating well information. The sequences identified also help define third-order sea level curves for the area; these improve published curves thought to have global significance.

  19. Decoding DNA, RNA and peptides with quantum tunnelling

    NASA Astrophysics Data System (ADS)

    di Ventra, Massimiliano; Taniguchi, Masateru

    2016-02-01

    Drugs and treatments could be precisely tailored to an individual patient by extracting their cellular- and molecular-level information. For this approach to be feasible on a global scale, however, information on complete genomes (DNA), transcriptomes (RNA) and proteomes (all proteins) needs to be obtained quickly and at low cost. Quantum mechanical phenomena could potentially be of value here, because the biological information needs to be decoded at an atomic level and quantum tunnelling has recently been shown to be able to differentiate single nucleobases and amino acids in short sequences. Here, we review the different approaches to using quantum tunnelling for sequencing, highlighting the theoretical background to the method and the experimental capabilities demonstrated to date. We also explore the potential advantages of the approach and the technical challenges that must be addressed to deliver practical quantum sequencing devices.

  20. GMDD: a database of GMO detection methods.

    PubMed

    Dong, Wei; Yang, Litao; Shen, Kailin; Kim, Banghyun; Kleter, Gijs A; Marvin, Hans J P; Guo, Rong; Liang, Wanqi; Zhang, Dabing

    2008-06-04

    Since more than one hundred genetically modified organism (GMO) events have been developed and approved for commercialization worldwide, GMO analysis methods are essential for the enforcement of GMO labelling regulations. Protein- and nucleic acid-based detection techniques have been developed and used for GMO identification and quantification. However, information for the harmonization and standardization of GMO analysis methods at the global level is still needed. The GMO Detection method Database (GMDD) has collected almost all previously developed and reported GMO detection methods, which are grouped by strategy (screening-, gene-, construct-, and event-specific), and provides a user-friendly search service for detection methods by GMO event name, exogenous gene, protein information, etc. In this database, users can obtain the sequences of exogenous integrations, which facilitates the design of PCR primers and probes. Information on endogenous genes, certified reference materials, reference molecules, and the validation status of developed methods is also included. Furthermore, registered users can submit new detection methods and sequences to the database, and newly submitted information will be released soon after being checked. GMDD contains comprehensive information on GMO detection methods and should make GMO analysis much easier.

  1. Information theory applications for biological sequence analysis.

    PubMed

    Vinga, Susana

    2014-05-01

    Information theory (IT) addresses the analysis of communication systems and has been widely applied in molecular biology. In particular, alignment-free sequence analysis and comparison greatly benefited from concepts derived from IT, such as entropy and mutual information. This review covers several aspects of IT applications, ranging from genome global analysis and comparison, including block-entropy estimation and resolution-free metrics based on iterative maps, to local analysis, comprising the classification of motifs, prediction of transcription factor binding sites and sequence characterization based on linguistic complexity and entropic profiles. IT has also been applied to high-level correlations that combine DNA, RNA or protein features with sequence-independent properties, such as gene mapping and phenotype analysis, and has also provided models based on communication systems theory to describe information transmission channels at the cell level and also during evolutionary processes. While not exhaustive, this review attempts to categorize existing methods and to indicate their relation with broader transversal topics such as genomic signatures, data compression and complexity, time series analysis and phylogenetic classification, providing a resource for future developments in this promising area.
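    Block entropy, one of the alignment-free measures mentioned in this review, is the Shannon entropy of the k-mer (block) distribution of a sequence. The function below is a naive plug-in estimate written for illustration; it applies no finite-sample bias correction and is not a specific method from the review.

```python
import math
from collections import Counter


def block_entropy(seq: str, k: int) -> float:
    """Plug-in Shannon entropy (bits) of the overlapping k-mer distribution of seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


print(block_entropy("ACGT", 1))      # -> 2.0 (four equiprobable symbols)
print(block_entropy("ACACACAC", 2))  # only two distinct 2-mers, so at most 1 bit
```

    Computing this for increasing k and watching how the entropy grows is one simple way to probe the repetitiveness of a sequence without aligning it to anything.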

  2. Interpretable dimensionality reduction of single cell transcriptome data with deep generative models.

    PubMed

    Ding, Jiarui; Condon, Anne; Shah, Sohrab P

    2018-05-21

    Single-cell RNA-sequencing has great potential to discover cell types, identify cell states, trace development lineages, and reconstruct the spatial organization of cells. However, dimension reduction to interpret structure in single-cell sequencing data remains a challenge. Existing algorithms are either not able to uncover the clustering structures in the data or lose global information such as groups of clusters that are close to each other. We present a robust statistical model, scvis, to capture and visualize the low-dimensional structures in single-cell gene expression data. Simulation results demonstrate that low-dimensional representations learned by scvis preserve both the local and global neighbor structures in the data. In addition, scvis is robust to the number of data points and learns a probabilistic parametric mapping function to add new data points to an existing embedding. We then use scvis to analyze four single-cell RNA-sequencing datasets, exemplifying interpretable two-dimensional representations of the high-dimensional single-cell RNA-sequencing data.

  3. Leptospira species molecular epidemiology in the genomic era.

    PubMed

    Caimi, K; Repetto, S A; Varni, V; Ruybal, P

    2017-10-01

    Leptospirosis is a zoonotic disease whose global burden is increasing, often in relation to climate change. Hundreds of whole-genome sequences from worldwide isolates of Leptospira spp. are now available, together with online tools that make it possible to assign MLST sequence types (STs) directly from raw sequence data. In this work we applied R7L-MLST to nearly 500 genomes and strains from globally distributed collections. All 10 pathogenic species, as well as the intermediate species, were typed using this MLST scheme. The correlation between STs and serogroups observed in our previous work still holds with this larger dataset, supporting the use of MLST as a complementary approach to assist serological classification. Bayesian phylogenetic analysis of concatenated sequences from R7-MLST loci allowed us to resolve taxonomic inconsistencies, but also showed that events such as recombination, gene conversion or lateral gene transfer played an important role in the evolution of the genus Leptospira. Whole-genome sequencing allows us to contribute epidemiological information useful for designing control strategies and diagnostic methods for this disease. Copyright © 2017 Elsevier B.V. All rights reserved.
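    Once allele numbers have been called for each locus (the step the online tools perform from raw sequence data, not shown here), assigning an ST is an exact lookup of the allele profile. The sketch below assumes a simple dictionary-based profile table; the locus names and allele numbers are made up for illustration.

```python
from typing import Dict, Optional

Profile = Dict[str, int]  # locus name -> allele number


def assign_st(profile: Profile, st_table: Dict[int, Profile]) -> Optional[int]:
    """Return the ST whose allele numbers match `profile` at every locus of the
    scheme, or None if the profile is novel or has a missing locus."""
    for st, alleles in st_table.items():
        if all(profile.get(locus) == allele for locus, allele in alleles.items()):
            return st
    return None


# Toy, made-up scheme with two hypothetical loci (real schemes use more loci).
TOY_TABLE = {1: {"locusA": 1, "locusB": 1}, 2: {"locusA": 1, "locusB": 2}}
print(assign_st({"locusA": 1, "locusB": 2}, TOY_TABLE))  # -> 2
```

    Returning None rather than the closest ST keeps novel profiles visible, which matters when a typing scheme is extended to new genomes.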

  4. Expression profiling of the mouse early embryo: Reflections and Perspectives

    PubMed Central

    Ko, Minoru S. H.

    2008-01-01

    The laboratory mouse plays an important role in our understanding of early mammalian development and provides an invaluable model for human early embryos, which are difficult to study for ethical and technical reasons. Comprehensive collections of cDNA clones, their sequences, and complete genome sequence information, accumulated over the last two decades, have given mouse models even more advantages. Here, progress in global gene expression profiling of early mouse embryos and, to some extent, stem cells is reviewed, and future directions and challenges are discussed. The discussion includes a restatement of global gene expression profiles as snapshots of cellular status, and the subsequent distinction between the differentiation state and the physiological state of cells. It then extends to biological problems that can be addressed only through global expression profiling, including a bird’s-eye view of global gene expression changes, a molecular index for developmental potency, cell lineage trajectories, microarray-guided cell manipulation, and the possibility of delineating gene regulatory cascades and networks. PMID:16739220

  5. A generalized global alignment algorithm.

    PubMed

    Huang, Xiaoqiu; Chao, Kun-Mao

    2003-01-22

    Homologous sequences are sometimes similar over some regions but different over other regions. Homologous sequences have a much lower global similarity if the different regions are much longer than the similar regions. We present a generalized global alignment algorithm for comparing sequences with intermittent similarities, an ordered list of similar regions separated by different regions. A generalized global alignment model is defined to handle sequences with intermittent similarities. A dynamic programming algorithm is designed to compute an optimal general alignment in time proportional to the product of sequence lengths and in space proportional to the sum of sequence lengths. The algorithm is implemented as a computer program named GAP3 (Global Alignment Program Version 3). The generalized global alignment model is validated by experimental results produced with GAP3 on both DNA and protein sequences. The GAP3 program extends the ability of standard global alignment programs to recognize homologous sequences of lower similarity. The GAP3 program is freely available for academic use at http://bioinformatics.iastate.edu/aat/align/align.html.
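    GAP3 generalizes standard global alignment, so the standard case is a useful reference point. The sketch below computes a Needleman-Wunsch score with a linear gap penalty in time proportional to the product of the sequence lengths while keeping only two DP rows, i.e., linear space. It does not implement GAP3's model for intermittent similarities, and the scoring values are arbitrary assumptions.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score, linear gap penalty, O(len(a)*len(b)) time, O(len(b)) space."""
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: b aligned against leading gaps
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)  # column 0: a aligned against leading gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr  # only the previous row is needed for the next one
    return prev[-1]


print(global_alignment_score("ACGT", "AGT"))  # 1: three matches, one gap
```

    Keeping two rows suffices for the score; recovering the alignment itself in linear space needs a divide-and-conquer traceback such as Hirschberg's method.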

  6. The Ocean Gene Atlas: exploring the biogeography of plankton genes online.

    PubMed

    Villar, Emilie; Vannier, Thomas; Vernette, Caroline; Lescot, Magali; Cuenca, Miguelangel; Alexandre, Aurélien; Bachelerie, Paul; Rosnet, Thomas; Pelletier, Eric; Sunagawa, Shinichi; Hingamp, Pascal

    2018-05-21

    The Ocean Gene Atlas is a web service to explore the biogeography of genes from marine planktonic organisms. It allows users to query protein or nucleotide sequences against global ocean reference gene catalogs. With just one click, the abundance and location of target sequences are visualized on world maps as well as their taxonomic distribution. Interactive results panels allow for adjusting cutoffs for alignment quality and displaying the abundances of genes in the context of environmental features (temperature, nutrients, etc.) measured at the time of sampling. The ease of use enables non-bioinformaticians to explore quantitative and contextualized information on genes of interest in the global ocean ecosystem. Currently the Ocean Gene Atlas is deployed with (i) the Ocean Microbial Reference Gene Catalog (OM-RGC) comprising 40 million non-redundant mostly prokaryotic gene sequences associated with both Tara Oceans and Global Ocean Sampling (GOS) gene abundances and (ii) the Marine Atlas of Tara Ocean Unigenes (MATOU) composed of >116 million eukaryote unigenes. Additional datasets will be added upon availability of further marine environmental datasets that provide the required complement of sequence assemblies, raw reads and contextual environmental parameters. Ocean Gene Atlas is a freely-available web service at: http://tara-oceans.mio.osupytheas.fr/ocean-gene-atlas/.

  7. Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments.

    PubMed

    Daily, Jeff

    2016-02-10

    Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. A faster intra-sequence local pairwise alignment implementation is described and benchmarked, including new global and semi-global variants. Using a 375 residue query sequence, a speed of 136 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon E5-2670 24-core processor system, the highest reported for an implementation based on Farrar's 'striped' approach. Rognes's SWIPE optimal database search application is still generally the fastest available, running 1.2 to at most 2.4 times faster than Parasail for sequences shorter than 500 amino acids. However, Parasail was faster for longer sequences. For global alignments, Parasail's prefix scan implementation is generally the fastest, faster even than Farrar's 'striped' approach; however, the opal library is faster for single-threaded applications. The software library is designed for 64 bit Linux, OS X, or Windows on processors with SSE2, SSE41, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. Applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.
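    The three alignment modes that Parasail provides differ only in how the dynamic-programming boundaries and the final cell are handled. The scalar sketch below (not Parasail's API, and without SIMD) shows those differences with a linear gap penalty; Parasail and similar libraries typically use affine open/extend gap costs, and the scores here are arbitrary. Each inner-loop iteration is one "cell update", the unit behind the GCUPS figure above.

```python
def align_score(a, b, mode="global", match=2, mismatch=-1, gap=-2):
    """Scalar DP score: 'global' (Needleman-Wunsch), 'semi' (free end gaps), 'local' (Smith-Waterman)."""
    if mode not in ("global", "semi", "local"):
        raise ValueError(f"unknown mode: {mode}")
    free = mode != "global"  # semi-global and local: leading gaps cost nothing
    m = len(b)
    prev = [0 if free else j * gap for j in range(m + 1)]
    best_local = 0
    last_col = [prev[m]]  # last-column scores, used for free trailing gaps in 'semi'
    for i in range(1, len(a) + 1):
        curr = [0 if free else i * gap] + [0] * m
        for j in range(1, m + 1):  # one cell update per iteration
            s = max(prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch),
                    prev[j] + gap,
                    curr[j - 1] + gap)
            if mode == "local":
                s = max(s, 0)  # an alignment may restart anywhere
                best_local = max(best_local, s)
            curr[j] = s
        last_col.append(curr[m])
        prev = curr
    if mode == "local":
        return best_local  # best cell anywhere in the matrix
    if mode == "semi":
        return max(max(prev), max(last_col))  # best cell in last row or last column
    return prev[m]  # global: bottom-right cell


for mode in ("global", "semi", "local"):
    print(mode, align_score("ACGT", "TTACGTTT", mode))
```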

  8. Towards a global cancer knowledge network: dissecting the current international cancer genomic sequencing landscape.

    PubMed

    Vis, D J; Lewin, J; Liao, R G; Mao, M; Andre, F; Ward, R L; Calvo, F; Teh, B T; Camargo, A A; Knoppers, B M; Sawyers, C L; Wessels, L F A; Lawler, M; Siu, L L; Voest, E

    2017-05-01

    While next generation sequencing has enhanced our understanding of the biological basis of malignancy, current knowledge on global practices for sequencing cancer samples is limited. To address this deficiency, we developed a survey to provide a snapshot of current sequencing activities globally, identify barriers to data sharing and use this information to develop sustainable solutions for the cancer research community. A multi-item survey was conducted assessing demographics, clinical data collection, genomic platforms, privacy/ethics concerns, funding sources and data sharing barriers for sequencing initiatives globally. Additionally, respondents were asked to provide the primary intent of their initiative (clinical diagnostic, research or combination). Of 107 initiatives invited to participate, 59 responded (response rate = 55%). Whole exome sequencing (P = 0.03) and whole genome sequencing (P = 0.01) were used less frequently in clinical diagnostic initiatives than in research initiatives. Procedures to identify cancer-specific variants were heterogeneous, with bioinformatics pipelines employing different mutation calling/variant annotation algorithms. Measurement of treatment efficacy varied amongst initiatives, with time on treatment (57%) and RECIST (53%) being the most common; however, other parameters were also employed. Whilst 72% of initiatives indicated data sharing, its scope varied, with a number of restrictions in place (e.g. transfer of raw data). The largest perceived barriers to data harmonization were the lack of financial support (P < 0.01) and bioinformatics concerns (e.g. lack of interoperability) (P = 0.02). Capturing clinical data was more likely to be perceived as a barrier to data sharing by larger initiatives than by smaller initiatives (P = 0.01). These results identify the main barriers, as perceived by the cancer sequencing community, to effective sharing of cancer genomic and clinical data. They highlight the need for greater harmonization of technical, ethical and data capture processes in cancer sample sequencing worldwide, in order to support effective and responsible data sharing for the benefit of patients. © The Author 2017. Published by Oxford University Press on behalf of the European Society for Medical Oncology.

  9. GMDD: a database of GMO detection methods

    PubMed Central

    Dong, Wei; Yang, Litao; Shen, Kailin; Kim, Banghyun; Kleter, Gijs A; Marvin, Hans JP; Guo, Rong; Liang, Wanqi; Zhang, Dabing

    2008-01-01

    Background Since more than one hundred genetically modified organism (GMO) events have been developed and approved for commercialization worldwide, GMO analysis methods are essential for enforcing GMO labelling regulations. Protein- and nucleic acid-based detection techniques have been developed and used for GMO identification and quantification. However, information is still needed to harmonize and standardize GMO analysis methods at the global level. Results The GMO Detection method Database (GMDD) has collected almost all previously developed and reported GMO detection methods, grouped by strategy (screening-, gene-, construct-, and event-specific), and provides a user-friendly search service for detection methods by GMO event name, exogenous gene, protein information, etc. In this database, users can obtain the sequences of exogenous integrations, which facilitates the design of PCR primers and probes. Information on endogenous genes, certified reference materials, reference molecules, and the validation status of developed methods is also included. Furthermore, registered users can submit new detection methods and sequences, and newly submitted information will be released soon after being checked. Conclusion GMDD contains comprehensive information on GMO detection methods and should make GMO analysis much easier. PMID:18522755

  10. Sharing Data to Build a Medical Information Commons: From Bermuda to the Global Alliance.

    PubMed

    Cook-Deegan, Robert; Ankeny, Rachel A; Maxson Jones, Kathryn

    2017-08-31

    The Human Genome Project modeled its open science ethos on nematode biology, most famously through daily release of DNA sequence data based on the 1996 Bermuda Principles. That open science philosophy persists, but daily, unfettered release of data has had to adapt to constraints occasioned by the use of data from individual people, broader use of data not only by scientists but also by clinicians and individuals, the global reach of genomic applications and diverse national privacy and research ethics laws, and the rising prominence of a diverse commercial genomics sector. The Global Alliance for Genomics and Health was established to enable the data sharing that is essential for making meaning of genomic variation. Data-sharing policies and practices will continue to evolve as researchers, health professionals, and individuals strive to construct a global medical and scientific information commons.

  11. iPARTS2: an improved tool for pairwise alignment of RNA tertiary structures, version 2.

    PubMed

    Yang, Chung-Han; Shih, Cheng-Ting; Chen, Kun-Tze; Lee, Po-Han; Tsai, Ping-Han; Lin, Jian-Cheng; Yen, Ching-Yu; Lin, Tiao-Yin; Lu, Chin Lung

    2016-07-08

    Since its first release in 2010, iPARTS has become a valuable tool for globally or locally aligning two RNA 3D structures. It was implemented using a structural alphabet (SA)-based approach, which uses an SA of 23 letters to reduce RNA 3D structures into 1D sequences of SA letters and applies traditional sequence alignment to these SA-encoded sequences to determine their global or local similarity. In this version, we have re-implemented iPARTS as a new web server, iPARTS2, by constructing a totally new SA, which consists of 92 elements, each carrying information on both the base and backbone geometry of a representative nucleotide. This SA is significantly different from the one used in iPARTS, because the latter consists of only 23 elements, each carrying only the backbone geometry information of a representative nucleotide. Our experimental results show that iPARTS2 outperforms its previous version iPARTS and also achieves better accuracy than other popular tools, such as SARA, SETTER and RASS, in RNA alignment quality and function prediction. iPARTS2 takes as input two RNA 3D structures in PDB format and outputs their global or local alignments with a graphical display. iPARTS2 is now available online at http://genome.cs.nthu.edu.tw/iPARTS2/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
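    The structural-alphabet idea, reducing each nucleotide's geometry to a letter and then aligning letter strings, can be sketched as follows. The assumptions are loud ones: each nucleotide is summarized by two pseudo-torsion angles (eta, theta), the 4-letter alphabet and its centroids are invented for illustration (iPARTS2 uses 92 letters covering base and backbone geometry), and plain edit distance stands in for iPARTS2's alignment scoring.

```python
# Hypothetical 4-letter structural alphabet: letter -> (eta, theta) centroid in degrees.
ALPHABET = {"a": (170.0, 220.0), "b": (60.0, 150.0), "c": (240.0, 60.0), "d": (300.0, 300.0)}


def angle_diff(x, y):
    """Smallest difference between two angles in degrees (handles wrap-around)."""
    d = abs(x - y) % 360.0
    return min(d, 360.0 - d)


def encode(angles):
    """Map each nucleotide's (eta, theta) pair to the nearest alphabet letter."""
    letters = []
    for eta, theta in angles:
        letters.append(min(
            ALPHABET,
            key=lambda c: angle_diff(eta, ALPHABET[c][0]) ** 2
            + angle_diff(theta, ALPHABET[c][1]) ** 2,
        ))
    return "".join(letters)


def edit_distance(s, t):
    """Levenshtein distance between two encoded strings (stand-in for SA alignment)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = curr
    return prev[-1]


encoded = encode([(165.0, 225.0), (355.0, 295.0)])
print(encoded, edit_distance(encoded, "abd"))  # ad 1
```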

  12. Global ecological pattern of ammonia-oxidizing archaea.

    PubMed

    Cao, Huiluo; Auguet, Jean-Christophe; Gu, Ji-Dong

    2013-01-01

    The global distribution of ammonia-oxidizing archaea (AOA), which play a pivotal role in the nitrification process, has been confirmed through numerous ecological studies. Although newly available amoA (ammonia monooxygenase subunit A) gene sequences from new environments are accumulating rapidly in public repositories, there is a clear lack of information on the ecological and evolutionary factors shaping AOA community assembly on the global scale. We conducted a meta-analysis of uncultured AOA using about 6,200 archaeal amoA gene sequences to reveal their community distribution patterns across a wide spectrum of physicochemical conditions and habitat types. The sequences were dereplicated at the 95% identity level, resulting in a dataset of 1,476 archaeal amoA gene sequences from eight habitat types: soil, freshwater, freshwater sediment, estuarine sediment, marine water, marine sediment, geothermal systems, and symbiosis. The updated comprehensive amoA phylogeny was composed of three major monophyletic clusters (Nitrosopumilus, Nitrosotalea, Nitrosocaldus) and a non-monophyletic cluster, made up mostly of soil and sediment sequences, that we named Nitrososphaera. Diversity measurements indicated that marine and estuarine sediments, as well as symbionts, might be the largest reservoirs of AOA diversity. Macroevolutionary analyses were further carried out to explore the diversification pattern and rates of nitrifying archaea. In contrast to other habitats, which displayed constant diversification rates, marine planktonic AOA exhibited a very recent and accelerating diversification rate, congruent with the lowest phylogenetic diversity observed in their habitats. This result suggests the existence of AOA communities with different evolutionary histories in different habitats. Based on an up-to-date amoA phylogeny, this analysis provided insights into the possible evolutionary mechanisms and environmental parameters that shape AOA community assembly at the global scale.
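    Dereplication at a fixed identity threshold can be done with a greedy pass that keeps a sequence only if it is less than 95% identical to every representative kept so far. This is a minimal sketch assuming pre-aligned sequences of equal length; the abstract does not say which tool or identity definition was used.

```python
def identity(a, b):
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def dereplicate(seqs, threshold=0.95):
    """Greedy pass in input order: keep s unless it is >= threshold identical to a kept one."""
    reps = []
    for s in seqs:
        if all(identity(s, r) < threshold for r in reps):
            reps.append(s)
    return reps


seqs = ["A" * 20, "A" * 19 + "C", "A" * 18 + "CC", "G" * 20]
print(len(dereplicate(seqs)))  # 3: the one-mismatch variant (95% identical) is merged
```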

  13. Analysis of time in establishing synchronization radio communication system with expanded spectrum conditions for communication with mobile robots

    NASA Astrophysics Data System (ADS)

    Latinovic, T. S.; Kalabic, S. B.; Barz, C. R.; Petrica, P. Paul; Pop-Vădean, A.

    2018-01-01

    This paper analyzes the influence of the Doppler effect on the time required to synchronize pseudorandom sequences in radio communication systems with an expanded (spread) spectrum. It also explores the possibility of using secure wireless communication for modular robots. Wireless communication could be used for both local and global communication. We analyzed a radio communication system integrator, including the two effects that the Doppler shift of the signal has on the time needed to synchronize the received and locally generated pseudorandom sequences. We also analyzed the effect of phase variability between these sequences and the correspondence between their phases and the acquisition interval of the received sequences. Analyzing these effects is essential for signal transmission and for protecting information transfer in communication systems with an expanded spectrum (telecommunications, mobile telephony, Global Navigation Satellite Systems (GNSS), and wireless communication). The results show that wireless communication can provide a safe approach for communication with mobile robots.
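    Synchronization of a received pseudorandom (PN) sequence with a local replica is commonly done by a serial search: correlate the two at each candidate code phase and pick the peak, so acquisition time grows with the number of phases tested. The sketch below generates a length-7 maximal-length sequence with a 3-bit linear feedback shift register (taps chosen for illustration and checked to give period 7) and recovers a code-phase offset; it does not model the Doppler effect, which in practice adds a search over frequency offsets as well.

```python
def lfsr_sequence(taps, state, length):
    """Fibonacci LFSR over GF(2): output the last bit, feed back XOR of tapped bits."""
    state = list(state)
    out = []
    for _ in range(length):
        out.append(state[-1])
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = [feedback] + state[:-1]
    return out


def to_bipolar(bits):
    """Map bits {0, 1} to chips {-1, +1}."""
    return [1 if b else -1 for b in bits]


def acquire_phase(received, local):
    """Serial search: try every cyclic code phase, return (best phase, peak correlation)."""
    n = len(local)
    best_phase, best_corr = 0, None
    for phase in range(n):
        corr = sum(received[i] * local[(i + phase) % n] for i in range(n))
        if best_corr is None or corr > best_corr:
            best_phase, best_corr = phase, corr
    return best_phase, best_corr


pn = to_bipolar(lfsr_sequence(taps=(0, 2), state=(1, 0, 0), length=7))
received = pn[3:] + pn[:3]  # hypothetical received copy, offset by 3 chips
print(acquire_phase(received, pn))  # (3, 7)
```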

  14. Unifying viral genetics and human transportation data to predict the global transmission dynamics of human influenza H3N2.

    PubMed

    Lemey, Philippe; Rambaut, Andrew; Bedford, Trevor; Faria, Nuno; Bielejec, Filip; Baele, Guy; Russell, Colin A; Smith, Derek J; Pybus, Oliver G; Brockmann, Dirk; Suchard, Marc A

    2014-02-01

    Information on global human movement patterns is central to spatial epidemiological models used to predict the behavior of influenza and other infectious diseases. Yet it remains difficult to test which modes of dispersal drive pathogen spread at various geographic scales using standard epidemiological data alone. Evolutionary analyses of pathogen genome sequences increasingly provide insights into the spatial dynamics of influenza viruses, but to date they have largely neglected the wealth of information on human mobility, mainly because no statistical framework exists within which viral gene sequences and empirical data on host movement can be combined. Here, we address this problem by applying a phylogeographic approach to elucidate the global spread of human influenza subtype H3N2 and assess its ability to predict the spatial spread of human influenza A viruses worldwide. Using a framework that estimates the migration history of human influenza while simultaneously testing and quantifying a range of potential predictive variables of spatial spread, we show that the global dynamics of influenza H3N2 are driven by air passenger flows, whereas at more local scales spread is also determined by processes that correlate with geographic distance. Our analyses further confirm a central role for mainland China and Southeast Asia in maintaining a source population for global influenza diversity. By comparing model output with the known pandemic expansion of H1N1 during 2009, we demonstrate that predictions of influenza spatial spread are most accurate when data on human mobility and viral evolution are integrated. In conclusion, the global dynamics of influenza viruses are best explained by combining human mobility data with the spatial information inherent in sampled viral genomes. The integrated approach introduced here offers great potential for epidemiological surveillance through phylogeographic reconstructions and for improving predictive models of disease control.

  15. Rapid detection, classification and accurate alignment of up to a million or more related protein sequences.

    PubMed

    Neuwald, Andrew F

    2009-08-01

    The patterns of sequence similarity and divergence present within functionally diverse, evolutionarily related proteins contain implicit information about corresponding biochemical similarities and differences. A first step toward accessing such information is to statistically analyze these patterns, which, in turn, requires that one first identify and accurately align a very large set of protein sequences. Ideally, the set should include many distantly related, functionally divergent subgroups. Because it is extremely difficult, if not impossible, for fully automated methods to align such sequences correctly, researchers often resort to manual curation based on detailed structural and biochemical information. However, multiply-aligning vast numbers of sequences in this way is clearly impractical. This problem is addressed using Multiply-Aligned Profiles for Global Alignment of Protein Sequences (MAPGAPS). The MAPGAPS program uses a set of multiply-aligned profiles both as a query to detect and classify related sequences and as a template to multiply-align the sequences. It relies on Karlin-Altschul statistics for sensitivity and on PSI-BLAST (and other) heuristics for speed. Using as input a carefully curated multiple-profile alignment for P-loop GTPases, MAPGAPS correctly aligned weakly conserved sequence motifs within 33 distantly related GTPases of known structure. By comparison, the sequence- and structurally based alignment methods hmmalign and PROMALS3D misaligned at least 11 and 23 of these regions, respectively. When applied to a dataset of 65 million protein sequences, MAPGAPS identified, classified and aligned (with comparable accuracy) nearly half a million putative P-loop GTPase sequences. A C++ implementation of MAPGAPS is available at http://mapgaps.igs.umaryland.edu. Supplementary data are available at Bioinformatics online.
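    Karlin-Altschul statistics turn a raw alignment score S into an expected number of chance hits, E = K*m*n*exp(-lambda*S), for query length m and database length n. The sketch below implements that standard formula and the equivalent bit-score form; the lambda and K values are illustrative assumptions (they depend on the scoring system), edge-effect length corrections are ignored, and this is not MAPGAPS code.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalized score S' = (lambda*S - ln K) / ln 2, in bits."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def evalue(raw_score, lam, k, m, n):
    """Expected number of chance hits scoring >= S: E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw_score)


def pvalue(e):
    """Probability of at least one chance hit, P = 1 - exp(-E) (Poisson assumption)."""
    return -math.expm1(-e)


# Illustrative parameters only; lambda and K depend on the scoring matrix and gap costs.
lam, k = 0.267, 0.041
m, n = 350, 1_000_000  # hypothetical query and database lengths (residues)
s = 60
e = evalue(s, lam, k, m, n)
print(round(bit_score(s, lam, k), 1), e, pvalue(e))
```

    The bit-score form gives the same quantity as E = m*n*2^(-S'), which is why bit scores are comparable across scoring systems.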

  16. Whole Genome Sequence Analysis of Salmonella Enteritidis Isolated from Wild Mice

    USDA-ARS's Scientific Manuscript database

    Salmonella Enteritidis is a foodborne pathogen of global concern because it is frequently isolated from foods and patients. Draft genomes of 64 S. Enteritidis strains from the intestines and spleens of mice were reported. The availability of these genomes provides useful information on genomic dive...

  17. Overcoming Species Boundaries in Peptide Identification with Bayesian Information Criterion-driven Error-tolerant Peptide Search (BICEPS)*

    PubMed Central

    Renard, Bernhard Y.; Xu, Buote; Kirchner, Marc; Zickmann, Franziska; Winter, Dominic; Korten, Simone; Brattig, Norbert W.; Tzur, Amit; Hamprecht, Fred A.; Steen, Hanno

    2012-01-01

    Currently, the reliable identification of peptides and proteins is only feasible when thoroughly annotated sequence databases are available. Although sequencing capacities continue to grow, many organisms remain without reliable, fully annotated reference genomes required for proteomic analyses. Standard database search algorithms fail to identify peptides that are not exactly contained in a protein database. De novo searches are generally hindered by their restricted reliability, and current error-tolerant search strategies are limited by global, heuristic tradeoffs between database and spectral information. We propose a Bayesian information criterion-driven error-tolerant peptide search (BICEPS) and offer an open source implementation based on this statistical criterion to automatically balance the information of each single spectrum and the database, while limiting the run time. We show that BICEPS performs as well as current database search algorithms when such algorithms are applied to sequenced organisms, whereas BICEPS only uses a remotely related organism database. For instance, we use a chicken instead of a human database corresponding to an evolutionary distance of more than 300 million years (International Chicken Genome Sequencing Consortium (2004) Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432, 695–716). We demonstrate the successful application to cross-species proteomics with a 33% increase in the number of identified proteins for a filarial nematode sample of Litomosoides sigmodontis. PMID:22493179
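    The Bayesian information criterion trades goodness of fit against model complexity: BIC = k*ln(n) - 2*ln(L), where L is the maximized likelihood, k the number of free parameters and n the number of observations, and the lower value wins. The sketch below shows only this generic trade-off; how BICEPS defines the likelihood of a spectrum-peptide match is not described in the abstract, and the candidate log-likelihoods are hypothetical.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k*ln(n) - 2*ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def pick_model(candidates, n_obs):
    """candidates: name -> (log_likelihood, n_params). Return the name with the lowest BIC."""
    return min(candidates, key=lambda name: bic(*candidates[name], n_obs))


# Hypothetical: "A" = simpler explanation, "B" = extra parameters (e.g. allowed mutations).
print(pick_model({"A": (-120.0, 2), "B": (-117.0, 5)}, n_obs=100))  # A: small gain, big penalty
print(pick_model({"A": (-120.0, 2), "B": (-105.0, 5)}, n_obs=100))  # B: gain outweighs penalty
```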

  18. Evolution of biological sequences implies an extreme value distribution of type I for both global and local pairwise alignment scores.

    PubMed

    Bastien, Olivier; Maréchal, Eric

    2008-08-07

    Confidence in pairwise alignments of biological sequences, obtained by various methods such as Blast or Smith-Waterman, is critical for automatic analyses of genomic data. Two statistical models have been proposed. In the asymptotic limit of long sequences, the Karlin-Altschul model is based on the computation of a P-value, assuming that the number of high-scoring matching regions above a threshold is Poisson distributed. Alternatively, the Lipman-Pearson model is based on the computation of a Z-value from a random score distribution obtained by Monte Carlo simulation. Following the TULIP theorem, Z-values allow the deduction of an upper bound on the P-value (1/Z-value²). Simulated Z-value distributions are known to fit a Gumbel law. This remarkable property had not been demonstrated and had no obvious biological support. We built a model of sequence evolution based on aging, as meant in Reliability Theory, using the fact that the amount of information shared between an initial sequence and the sequences in its lineage (i.e., mutual information in Information Theory) is a decreasing function of time. This quantity is simply measured by a sequence alignment score. In systems aging, the failure rate is related to the system's longevity. The system can be a machine with structured components, or a living entity or population. "Reliability" refers to the ability to operate properly according to a standard. Here, the "reliability" of a sequence refers to its ability to conserve a sufficient functional level at the folded and maturated protein level (positive selection pressure). Homologous sequences were considered as systems 1) having a high redundancy of information, reflected by the magnitude of their alignment scores, and 2) whose components are the amino acids, which can independently be damaged by random DNA mutations. From these assumptions, we deduced that the information shared at each amino acid position evolved at a constant rate, corresponding to the information hazard rate, and that pairwise sequence alignment scores should follow a Gumbel distribution whose parameters can be given a theoretical rationale. In particular, one parameter corresponds to the information hazard rate. The extreme value distribution of alignment scores, assessed from high-scoring segment pairs following the Karlin-Altschul model, can also be deduced from Reliability Theory applied to molecular sequences. It reflects the redundancy of information between homologous sequences under functional conservative pressure. This model also provides a link between concepts of biological sequence analysis and of systems biology.
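    The Lipman-Pearson route described above can be sketched directly: score the real pair, score many shuffled versions, form the Z-value, and compare the TULIP bound 1/Z² with a Gumbel tail fitted to the shuffled scores by the method of moments. The score function is a hypothetical ungapped identity count chosen to keep the example short; it is not expected to follow a Gumbel law closely, so the sketch shows the mechanics only.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def identity_score(a, b):
    """Hypothetical stand-in score: number of identical positions (no gaps)."""
    return sum(x == y for x, y in zip(a, b))


def shuffled_scores(score_fn, a, b, n_shuffles=500, seed=1):
    """Monte Carlo null distribution: score a against random permutations of b."""
    rng = random.Random(seed)
    letters = list(b)
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        scores.append(score_fn(a, "".join(letters)))
    return scores


def z_value(real, null):
    """Z = (real score - mean of null) / standard deviation of null."""
    mu, sd = statistics.mean(null), statistics.stdev(null)
    if sd == 0:
        raise ValueError("null scores have zero variance")
    return (real - mu) / sd


def fit_gumbel(null):
    """Method-of-moments fit of a Gumbel (type I, maximum) law; returns (mu, beta)."""
    beta = statistics.stdev(null) * math.sqrt(6) / math.pi
    return statistics.mean(null) - EULER_GAMMA * beta, beta


def gumbel_sf(x, mu, beta):
    """P(X >= x) under the Gumbel law, computed with expm1 for small tails."""
    return -math.expm1(-math.exp(-(x - mu) / beta))


seq = "ACDEFGHIKLMNPQRSTVWY"
null = shuffled_scores(identity_score, seq, seq)
z = z_value(identity_score(seq, seq), null)
mu, beta = fit_gumbel(null)
print(z, min(1.0, 1 / z ** 2), gumbel_sf(identity_score(seq, seq), mu, beta))
```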

  19. Global catalogue of microorganisms (gcm): a comprehensive database and information retrieval, analysis, and visualization system for microbial resources

    PubMed Central

    2013-01-01

    Background Throughout the long history of industrial and academic research, many microbes have been isolated, characterized and preserved (whenever possible) in culture collections. With the steady accumulation of observational biodiversity data as well as microbial sequencing data, bio-resource centers have to function as data and information repositories serving academia, industry, and regulators on behalf of and for the general public. Hence, the World Data Centre for Microorganisms (WDCM) took on the responsibility of constructing an effective information environment that would promote and sustain microbial research data activities and bridge the gaps currently present within and outside the microbiology communities. Description Strain catalogue information was collected from culture collections through online submission. We developed tools for automatic extraction of strain numbers and species names from various sources, including GenBank, PubMed, and SwissProt. These new tools connect strain catalogue information with the corresponding nucleotide and protein sequences, as well as with genome sequences and references citing a particular strain. All information has been processed and compiled to create a comprehensive database of microbial resources, named the Global Catalogue of Microorganisms (GCM). The current version of GCM contains information on over 273,933 strains, covering 43,436 bacterial, fungal and archaeal species from 52 collections in 25 countries and regions. A number of online analysis and statistical tools have been integrated, together with advanced search functions, which should greatly facilitate exploration of the content of GCM. Conclusion A comprehensive dynamic database of microbial resources has been created, which unveils the resources preserved in culture collections, especially those whose informatics infrastructures are still under development. It should foster cumulative research and facilitate the activities of microbiologists worldwide working in both public and industrial research centres. This database is available from http://gcm.wfcc.info. PMID:24377417
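    The abstract does not describe how strain numbers are extracted. One common approach is pattern matching on culture-collection acronyms followed by a catalogue number; the sketch below uses a hypothetical, deliberately small pattern (ATCC, DSM, JCM, NBRC and CGMCC are real collection acronyms, but real strain designations come in many more formats, and species-name extraction is not shown).

```python
import re

# Hypothetical pattern: collection acronym, optional space or colon, catalogue number.
STRAIN_RE = re.compile(r"\b(ATCC|DSM|JCM|NBRC|CGMCC)[ :]?(\d+)\b")


def extract_strain_numbers(text):
    """Return unique (collection, number) pairs in order of first appearance."""
    found = []
    for collection, number in STRAIN_RE.findall(text):
        if (collection, number) not in found:
            found.append((collection, number))
    return found


text = "Escherichia coli ATCC 25922 (= DSM 1103), also listed as ATCC25922."
print(extract_strain_numbers(text))  # [('ATCC', '25922'), ('DSM', '1103')]
```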

  20. Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Daily, Jeffrey A.

    Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. As a result, a faster intra-sequence pairwise alignment implementation is described and benchmarked. Using a 375 residue query sequence, a speed of 136 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon E5-2670 12-core processor system, the highest reported for an implementation based on Farrar’s ‘striped’ approach. When using only a single thread, parasail was 1.7 times faster than Rognes’s SWIPE. For many score matrices, parasail is faster than BLAST. The software library is designed for 64 bit Linux, OS X, or Windows on processors with SSE2, SSE41, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. In conclusion, applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.

  1. Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments

    DOE PAGES

    Daily, Jeffrey A.

    2016-02-10

    Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. As a result, a faster intra-sequence pairwise alignment implementation is described and benchmarked. Using a 375 residue query sequence, a speed of 136 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon E5-2670 12-core processor system, the highest reported for an implementation based on Farrar’s ‘striped’ approach. When using only a single thread, parasail was 1.7 times faster than Rognes’s SWIPE. For many score matrices, parasail is faster than BLAST. The software library is designed for 64 bit Linux, OS X, or Windows on processors with SSE2, SSE41, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. In conclusion, applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.

  2. An improved stochastic fractal search algorithm for 3D protein structure prediction.

    PubMed

    Zhou, Changjun; Sun, Chuan; Wang, Bin; Wang, Xiaojun

    2018-05-03

    Protein structure prediction (PSP) is a significant area for biological information research, disease treatment, drug development, and related fields. In this paper, the three-dimensional structures of proteins are predicted from known amino acid sequences, and the structure prediction problem is transformed into a typical NP problem by an AB off-lattice model. This work applies a novel improved Stochastic Fractal Search algorithm (ISFS) to solve the problem. The Stochastic Fractal Search algorithm (SFS) is an effective evolutionary algorithm that explores the search space well but sometimes falls into local minima. To avoid this weakness, Lévy flight and internal feedback information are introduced in ISFS. In the experiments, simulations are conducted with the ISFS algorithm on Fibonacci sequences and real peptide sequences. Experimental results demonstrate that ISFS is more efficient and robust in finding the global minimum and avoiding getting stuck in local minima.
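    Lévy flights produce mostly small moves with occasional long jumps, and those jumps help a search escape local minima. The abstract does not give the ISFS update rule, so this sketch assumes Mantegna's algorithm, a common way to draw Lévy-distributed step lengths in metaheuristics, with the usual exponent beta = 1.5. It also builds the Fibonacci AB test sequences commonly used with the AB off-lattice model (S0 = A, S1 = B, S(k+1) = S(k-1) + S(k)). The function names, the angle vector and the step scale are hypothetical placeholders, not the paper's parameters.

```python
import math
import random


def fibonacci_ab(n: int) -> str:
    """Fibonacci AB sequence: S0 = 'A', S1 = 'B', S(k+1) = S(k-1) + S(k)."""
    prev, curr = "A", "B"
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, curr = curr, prev + curr
    return curr


def levy_steps(count: int, beta: float = 1.5, seed=None) -> list:
    """Heavy-tailed step lengths drawn with Mantegna's algorithm (assumed)."""
    rng = random.Random(seed)
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    steps = []
    for _ in range(count):
        u = rng.gauss(0.0, sigma_u)  # numerator: N(0, sigma_u^2)
        v = rng.gauss(0.0, 1.0)      # denominator: N(0, 1)
        steps.append(u / abs(v) ** (1 / beta))
    return steps


# Perturb a hypothetical vector of n - 2 bend angles with small Lévy jumps.
chain = fibonacci_ab(6)
angles = [0.0] * (len(chain) - 2)
step_scale = 0.01  # hypothetical scale factor
angles = [a + step_scale * s
          for a, s in zip(angles, levy_steps(len(angles), seed=0))]
print(fibonacci_ab(4), len(chain))  # ABBAB 13
```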

  3. Collaborative Effort for a Centralized Worldwide Tuberculosis Relational Sequencing Data Platform.

    PubMed

    Starks, Angela M; Avilés, Enrique; Cirillo, Daniela M; Denkinger, Claudia M; Dolinger, David L; Emerson, Claudia; Gallarda, Jim; Hanna, Debra; Kim, Peter S; Liwski, Richard; Miotto, Paolo; Schito, Marco; Zignol, Matteo

    2015-10-15

    Continued progress in addressing challenges associated with detection and management of tuberculosis requires new diagnostic tools. These tools must be able to provide rapid and accurate information for detecting resistance to guide selection of the treatment regimen for each patient. To achieve this goal, globally representative genotypic, phenotypic, and clinical data are needed in a standardized and curated data platform. A global partnership of academic institutions, public health agencies, and nongovernmental organizations has been established to develop a tuberculosis relational sequencing data platform (ReSeqTB) that seeks to increase understanding of the genetic basis of resistance by correlating molecular data with results from drug susceptibility testing and, optimally, associated patient outcomes. These data will inform development of new diagnostics, facilitate clinical decision making, and improve surveillance for drug resistance. ReSeqTB offers an opportunity for collaboration to achieve improved patient outcomes and to advance efforts to prevent and control this devastating disease. Published by Oxford University Press on behalf of the Infectious Diseases Society of America 2015. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  4. Simultaneous and complete genome sequencing of influenza A and B with high coverage by Illumina MiSeq Platform.

    PubMed

    Rutvisuttinunt, Wiriya; Chinnawirotpisan, Piyawan; Simasathien, Sriluck; Shrestha, Sanjaya K; Yoon, In-Kyu; Klungthong, Chonticha; Fernandez, Stefan

    2013-11-01

    Active global surveillance and characterization of influenza viruses are essential for better preparation against possible pandemic events. Obtaining comprehensive information about the influenza genome can improve our understanding of the evolution of influenza viruses and the emergence of new strains, and improve accuracy when designing preventive vaccines. This study investigated the use of deep sequencing on the next-generation sequencing (NGS) Illumina MiSeq Platform to obtain complete genome sequence information from influenza virus isolates. The influenza virus isolates were cultured from 6 acute respiratory clinical specimens collected in Thailand and Nepal. DNA libraries obtained from each viral isolate were mixed, and all were sequenced simultaneously. A total of 2.6 Gbases was obtained at a cluster density of 455±14 K/mm², with 95.76% (8,571,655/8,950,724) of the clusters passing quality control (QC) filters. Approximately 93.7% of all sequences from Read1 and 83.5% from Read2 were of high quality (≥Q30, a standard base-calling QC score). Alignment analysis identified three seasonal influenza A H3N2 strains, one 2009 pandemic influenza A H1N1 strain and two influenza B strains. Nearly the entire genomes of all six virus isolates had a sequence coverage depth of 600-fold or greater. The MiSeq Platform efficiently identified seasonal influenza A H3N2, 2009 pandemic influenza A H1N1 and influenza B in the DNA library mixtures. Copyright © 2013 The Authors. Published by Elsevier B.V. All rights reserved.
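    The QC figures above reduce to simple arithmetic. The pass-filter percentage is passing clusters divided by total clusters. A Phred score Q corresponds to an error probability of 10^(-Q/10), so Q30 means about one wrong base call in 1,000. The sketch below recomputes the reported 95.76% and counts Q30 bases in a FASTQ quality string. It assumes the Phred+33 encoding used by current Illumina pipelines, and the function names are hypothetical.

```python
def percent(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, in percent."""
    return 100.0 * part / whole


def phred_to_error(q: float) -> float:
    """Phred quality Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


def fraction_at_least(quality: str, threshold: int = 30, offset: int = 33) -> float:
    """Fraction of bases in a FASTQ quality string with Q >= threshold.

    Assumes Phred+33 encoding: quality = ord(character) - 33.
    """
    scores = [ord(ch) - offset for ch in quality]
    return sum(s >= threshold for s in scores) / len(scores)


print(round(percent(8_571_655, 8_950_724), 2))  # 95.76
print(phred_to_error(30))                       # ~0.001 (1 error per 1,000 calls)
print(fraction_at_least("II#"))                 # 2 of 3 bases are >= Q30
```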

  5. RNA-Seq Analysis of Cocos nucifera: Transcriptome Sequencing and De Novo Assembly for Subsequent Functional Genomics Approaches

    PubMed Central

    Xia, Wei; Mason, Annaliese S.; Xia, Zhihui; Qiao, Fei; Zhao, Songlin; Tang, Haoru

    2013-01-01

    Background: Cocos nucifera (coconut), a member of the Arecaceae family, is an economically important woody palm grown in tropical regions. Despite its agronomic importance, previous germplasm assessment studies have relied solely on morphological and agronomical traits. Molecular biology techniques have been scarcely used in the assessment of genetic resources and for improvement of important agronomic and quality traits in Cocos nucifera, mostly due to the absence of available sequence information. Methodology/Principal Findings: To provide basic information for molecular breeding and further molecular biological analysis in Cocos nucifera, we applied RNA-seq technology and de novo assembly to gain a global overview of the Cocos nucifera transcriptome from mixed tissue samples. Using Illumina sequencing, we obtained 54.9 million short reads and conducted de novo assembly to obtain 57,304 unigenes with an average length of 752 base pairs. Sequence comparison between assembled unigenes and released cDNA sequences of Cocos nucifera and Elaeis guineensis indicated that the assembled sequences were of high quality. Approximately 99.9% of unigenes were novel compared to the released coconut EST sequences. Using BLASTX, 68.2% of unigenes were successfully annotated based on the GenBank non-redundant (Nr) protein database. The annotated unigenes were then further classified using the Gene Ontology (GO), Clusters of Orthologous Groups (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Conclusions/Significance: Our study provides a large quantity of novel genetic information for Cocos nucifera. This information will act as a valuable resource for further molecular genetic studies and breeding in coconut, as well as for isolation and characterization of functional genes involved in different biochemical pathways in this important tropical crop species. PMID:23555859

  6. RNA-Seq analysis of Cocos nucifera: transcriptome sequencing and de novo assembly for subsequent functional genomics approaches.

    PubMed

    Fan, Haikuo; Xiao, Yong; Yang, Yaodong; Xia, Wei; Mason, Annaliese S; Xia, Zhihui; Qiao, Fei; Zhao, Songlin; Tang, Haoru

    2013-01-01

    Cocos nucifera (coconut), a member of the Arecaceae family, is an economically important woody palm grown in tropical regions. Despite its agronomic importance, previous germplasm assessment studies have relied solely on morphological and agronomical traits. Molecular biology techniques have been scarcely used in the assessment of genetic resources and for improvement of important agronomic and quality traits in Cocos nucifera, mostly due to the absence of available sequence information. To provide basic information for molecular breeding and further molecular biological analysis in Cocos nucifera, we applied RNA-seq technology and de novo assembly to gain a global overview of the Cocos nucifera transcriptome from mixed tissue samples. Using Illumina sequencing, we obtained 54.9 million short reads and conducted de novo assembly to obtain 57,304 unigenes with an average length of 752 base pairs. Sequence comparison between assembled unigenes and released cDNA sequences of Cocos nucifera and Elaeis guineensis indicated that the assembled sequences were of high quality. Approximately 99.9% of unigenes were novel compared to the released coconut EST sequences. Using BLASTX, 68.2% of unigenes were successfully annotated based on the GenBank non-redundant (Nr) protein database. The annotated unigenes were then further classified using the Gene Ontology (GO), Clusters of Orthologous Groups (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Our study provides a large quantity of novel genetic information for Cocos nucifera. This information will act as a valuable resource for further molecular genetic studies and breeding in coconut, as well as for isolation and characterization of functional genes involved in different biochemical pathways in this important tropical crop species.

  7. Spatial and temporal processing in healthy aging: implications for perceptions of driving skills.

    PubMed

    Conlon, Elizabeth; Herkes, Kathleen

    2008-07-01

    Sensitivity to the attributes of a stimulus (form or motion) and accuracy when detecting rapidly presented stimulus information were measured in older (N = 36) and younger (N = 37) groups. Before and after practice, the older group was significantly less sensitive to global motion (but not to form) and less accurate on a rapid sequencing task when detecting the individual elements presented in long but not short sequences. Given these effect sizes, statistical power for the different analyses ranged from 0.5 to 1.00. The reduced sensitivity of older individuals to temporal but not spatial stimuli adds support to previous findings of a selective age-related deficit in temporal processing. Older women were significantly less sensitive than older men, younger men and younger women on the global motion task. Gender effects were evident when complex extraction and integration processes had to be undertaken rapidly in response to global motion stimuli. Significant moderate correlations were found between age, global motion sensitivity and reports of perceptions of other vehicles and road signs when driving. These associations suggest that reduced motion sensitivity may cause functional difficulties for older adults when judging speeds or estimating gaps in traffic while driving.

  8. Global copy number profiling of cancer genomes | Office of Cancer Genomics

    Cancer.gov

    In this article, we introduce a robust and efficient strategy for deriving global and allele-specific copy number alterations (CNA) from cancer whole exome sequencing data based on Log R ratios and B-allele frequencies. Applying the approach to the analysis of over 200 skin cancer samples, we demonstrate its utility for discovering distinct CNA events and for deriving ancillary information such as tumor purity. Availability and implementation: https://github.com/xfwang/CLOSE (contact: xuefeng.wang@stonybrook.edu or michael.krauthammer@yale.edu).
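    The two signals can be sketched from read counts under common definitions. The abstract does not give the paper's exact normalization, so the helpers below are assumptions. The Log R ratio is taken as log2 of tumor over normal read depth after scaling by library size, so a one-copy loss in a pure tumor gives log2(1/2) = -1. The B-allele frequency is the fraction of reads carrying the alternate allele. It sits near 0.5 at a heterozygous SNP and is pushed toward 0 or 1 by loss of heterozygosity. Function names are hypothetical, and this is not the CLOSE package's API.

```python
import math


def log_r_ratio(tumor_depth: float, normal_depth: float,
                tumor_total: float = 1.0, normal_total: float = 1.0) -> float:
    """log2 of library-size-normalized tumor depth over normal depth (assumed definition)."""
    return math.log2((tumor_depth / tumor_total) / (normal_depth / normal_total))


def b_allele_frequency(alt_reads: int, ref_reads: int) -> float:
    """Fraction of reads supporting the B (alternate) allele at a SNP."""
    total = alt_reads + ref_reads
    if total == 0:
        raise ValueError("no reads at this position")
    return alt_reads / total


# In a pure tumor, a one-copy loss halves depth: log2(0.5) = -1.
print(log_r_ratio(50, 100))        # -1.0
print(b_allele_frequency(30, 70))  # 0.3
```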

  9. When Global Structure "Explains Away" Local Grammar: A Bayesian Account of Rule-Induction in Tone Sequences

    ERIC Educational Resources Information Center

    Dawson, Colin; Gerken, LouAnn

    2011-01-01

    While many constraints on learning must be relatively experience-independent, past experience provides a rich source of guidance for subsequent learning. Discovering structure in some domain can inform a learner's future hypotheses about that domain. If a general property accounts for particular sub-patterns, a rational learner should not…

  10. Method and apparatus for determining position using global positioning satellites

    NASA Technical Reports Server (NTRS)

    Ward, John (Inventor); Ward, William S. (Inventor)

    1998-01-01

    A global positioning satellite receiver having an antenna for receiving an L1 signal from a satellite. The L1 signal is processed by a preamplifier stage including a band pass filter and a low noise amplifier and output as a radio frequency (RF) signal. A mixer receives and de-spreads the RF signal in response to a pseudo-random noise code, i.e., Gold code, generated by an internal pseudo-random noise code generator. A microprocessor enters a code tracking loop, during which it addresses the pseudo-random code generator to cause it to sequentially output pseudo-random codes corresponding to satellite codes used to spread the L1 signal, until correlation occurs. When an output of the mixer indicates correlation between the RF signal and the generated pseudo-random codes, the microprocessor enters an operational state which slows the receiver code sequence to stay locked with the satellite code sequence. The output of the mixer is provided to a detector which, in turn, controls certain routines of the microprocessor. The microprocessor outputs pseudo range information according to an interrupt routine in response to detection of correlation. The pseudo range information is to be telemetered to a ground station, which determines the position of the global positioning satellite receiver.
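    The Gold-code correlation that this receiver performs in hardware can be illustrated in software. The sketch generates the GPS C/A Gold code for PRN 1 from two 10-stage shift registers, G1 (feedback taps 3 and 10) and G2 (taps 2, 3, 6, 8, 9 and 10). Following the GPS interface specification, G2 output taps 2 and 6 select PRN 1. The sketch then finds a code delay by sliding a local replica until the correlation peaks. The function names are hypothetical, and the brute-force search stands in for the patent's code tracking loop rather than reproducing its circuitry.

```python
def ca_code(g2_taps=(2, 6), length=1023):
    """GPS C/A Gold code; g2_taps=(2, 6) selects PRN 1 (1-based stage numbers)."""
    g1 = [1] * 10  # stages 1..10 stored at indices 0..9, all start at 1
    g2 = [1] * 10
    chips = []
    for _ in range(length):
        g2_out = g2[g2_taps[0] - 1] ^ g2[g2_taps[1] - 1]  # phase selector
        chips.append(g1[9] ^ g2_out)                     # G1 xor delayed G2
        fb1 = g1[2] ^ g1[9]                              # G1: 1 + x^3 + x^10
        fb2 = g2[1] ^ g2[2] ^ g2[5] ^ g2[7] ^ g2[8] ^ g2[9]  # G2: 1 + x^2 + x^3 + x^6 + x^8 + x^9 + x^10
        g1 = [fb1] + g1[:9]  # shift right, feedback enters stage 1
        g2 = [fb2] + g2[:9]
    return chips


def code_phase(received, replica):
    """Delay (in chips) at which the circular correlation peaks."""
    n = len(replica)
    r = [1 - 2 * b for b in received]  # map bits 0/1 to +1/-1
    c = [1 - 2 * b for b in replica]
    best_lag, best_corr = 0, None
    for lag in range(n):
        corr = sum(r[i] * c[(i - lag) % n] for i in range(n))
        if best_corr is None or corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag


code = ca_code()
delayed = code[-200:] + code[:-200]  # received[i] = code[i - 200]
print(len(code), code[:10])          # 1023 [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(code_phase(delayed, code))     # 200
```

    Only the matching delay gives the full correlation of 1023. At every other delay a C/A code's correlation stays far smaller, which is why the receiver can tell when correlation has occurred.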

  11. Global Linking of Cell Tracks Using the Viterbi Algorithm

    PubMed Central

    Jaldén, Joakim; Gilbert, Penney M.; Blau, Helen M.

    2016-01-01

    Automated tracking of living cells in microscopy image sequences is an important and challenging problem. With this application in mind, we propose a global track linking algorithm, which links cell outlines generated by a segmentation algorithm into tracks. The algorithm adds tracks to the image sequence one at a time, in a way which uses information from the complete image sequence in every linking decision. This is achieved by finding the tracks which give the largest possible increases to a probabilistically motivated scoring function, using the Viterbi algorithm. We also present a novel way to alter previously created tracks when new tracks are created, thus mitigating the effects of error propagation. The algorithm can handle mitosis, apoptosis, and migration in and out of the imaged area, and can also deal with false positives, missed detections, and clusters of jointly segmented cells. The algorithm performance is demonstrated on two challenging datasets acquired using bright-field microscopy, but in principle, the algorithm can be used with any cell type and any imaging technique, presuming there is a suitable segmentation algorithm. PMID:25415983
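    The Viterbi algorithm at the core of this linking step uses dynamic programming to find the highest-scoring path through a chain of states. The sketch is the textbook hidden-Markov-model version, computed in log space, on a toy two-state example with made-up probabilities. It does not implement the authors' probabilistically motivated track-linking score, mitosis handling or track revision, and the function name is hypothetical.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for an HMM (all probabilities > 0)."""
    # score[t][s]: best log-probability of any path ending in state s at time t
    score = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
              for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        score.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: score[t - 1][p] + math.log(trans_p[p][s]))
            score[t][s] = (score[t - 1][prev] + math.log(trans_p[prev][s])
                           + math.log(emit_p[s][observations[t]]))
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, math.exp(score[-1][path[-1]])


# Toy example with hypothetical states H and F.
states = ("H", "F")
start = {"H": 0.6, "F": 0.4}
trans = {"H": {"H": 0.7, "F": 0.3}, "F": {"H": 0.4, "F": 0.6}}
emit = {"H": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
        "F": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}
path, prob = viterbi(["normal", "cold", "dizzy"], states, start, trans, emit)
print(path, round(prob, 5))  # ['H', 'H', 'F'] 0.01512
```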

  12. Role of local sequence in the folding of cellular retinoic acid binding protein I: structural propensities of reverse turns.

    PubMed

    Rotondi, Kenneth S; Gierasch, Lila M

    2003-07-08

    The experiments described here explore the role of local sequence in the folding of cellular retinoic acid binding protein I (CRABP I). This is a 136-residue, 10-stranded, antiparallel beta-barrel protein with seven beta-hairpins and is a member of the intracellular lipid binding protein (iLBP) family. The relative roles of local and global sequence information in governing the folding of this class of proteins are not well-understood. In question is whether the beta-turns are locally defined by short-range interactions within their sequences, and are thus able to play an active role in reducing the conformational space available to the folding chain, or whether the turns are passive, relying upon global forces to form. Short (six- and seven-residue) peptides corresponding to the seven CRABP I turns were analyzed by circular dichroism and NMR for their tendencies to take up the conformations they adopt in the context of the native protein. The results indicate that two of the peptides, encompassing turns III and IV in CRABP I, have a strong intrinsic bias to form native turns. Intriguingly, these turns are on linked hairpins in CRABP I and represent the best-conserved turns in the iLBP family. These results suggest that local sequence may play an important role in narrowing the conformational ensemble of CRABP I during folding.

  13. Evaluation of haplotype diversity of Achatina fulica (Lissachatina) [Bowdich] from Indian sub-continent by means of 16S rDNA sequence and its phylogenetic relationships with other global populations.

    PubMed

    Ayyagari, Vijaya Sai; Sreerama, Krupanidhi

    2017-08-01

    Achatina fulica (Lissachatina fulica) is one of the most invasive species found across the globe, causing significant damage to crops, vegetables, and horticultural plants. This terrestrial snail is native to East Africa and has spread to different parts of the world through introductions. India, a hot spot for the biodiversity of several endemic gastropods, has witnessed an outburst of this snail population in several parts of the country, posing a serious threat to crops and also to human health. With the objective of evaluating the genetic diversity of this snail, we sampled it from different parts of India and analyzed its haplotype diversity by means of 16S rDNA sequence information. In addition, we studied the phylogenetic relationships of the isolates sequenced in the present study in relation to other global populations by Bayesian and maximum-likelihood approaches. Of the isolates sequenced, haplotype 'C' is the predominant one. A new haplotype 'S' from the state of Odisha was observed. The isolates sequenced in the present study clustered with their conspecifics from the Indian sub-continent. Haplotype network analyses were also carried out to study the evolution of different haplotypes. Haplotype 'S' was associated with a Mauritius haplotype 'H', indicating the possibility of multiple introductions of A. fulica to India.
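    Haplotype diversity is commonly summarized with Nei's unbiased estimator, Hd = n/(n - 1) * (1 - sum of p_i^2), where n is the number of sequences and p_i is the frequency of haplotype i. The abstract does not state which index was used, so the sketch below assumes this one. The function name and the sample labels are hypothetical and are not the study's data.

```python
from collections import Counter


def haplotype_diversity(haplotypes: list) -> float:
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of p_i^2)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


# Hypothetical sample: haplotype labels assigned after aligning 16S rDNA sequences.
sample = ["C", "C", "C", "S", "H"]
print(round(haplotype_diversity(sample), 3))  # 0.7
```

    A sample in which every sequence shares one haplotype scores 0. The score rises toward 1 as the sequences spread across more equally frequent haplotypes.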

  14. Global tropospheric chemistry: A plan for action

    NASA Technical Reports Server (NTRS)

    1984-01-01

    Prompted by an increasing awareness of the influence of human activity on the chemistry of the global troposphere, a panel was formed to (1) assess the requirement for a global study of the chemistry of the troposphere; (2) develop a scientific strategy for a comprehensive plan taking into account the existing and projected programs of the government; (3) assess the requirements of a global study in terms of theoretical knowledge, numerical modeling, instrumentation, observing platforms, ground-level observational techniques, and other related needs; and (4) outline the appropriate sequence and coordination required to achieve the most effective utilization of available resources. Part 1 presents a coordinated national blueprint for scientific investigations of biogeochemical cycles in the global troposphere. Part 2 presents much of the background information on present knowledge and on gaps in the understanding of tropospheric chemical cycles and processes, from which the proposed program was developed.

  15. Global tropospheric chemistry: A plan for action

    NASA Astrophysics Data System (ADS)

    1984-10-01

    Prompted by an increasing awareness of the influence of human activity on the chemistry of the global troposphere, a panel was formed to (1) assess the requirement for a global study of the chemistry of the troposphere; (2) develop a scientific strategy for a comprehensive plan taking into account the existing and projected programs of the government; (3) assess the requirements of a global study in terms of theoretical knowledge, numerical modeling, instrumentation, observing platforms, ground-level observational techniques, and other related needs; and (4) outline the appropriate sequence and coordination required to achieve the most effective utilization of available resources. Part 1 presents a coordinated national blueprint for scientific investigations of biogeochemical cycles in the global troposphere. Part 2 presents much of the background information on present knowledge and on gaps in the understanding of tropospheric chemical cycles and processes, from which the proposed program was developed.

  16. Exploiting the explosion of information associated with whole genome sequencing to tackle Shiga toxin-producing Escherichia coli (STEC) in global food production systems

    USDA-ARS's Scientific Manuscript database

    The rates of foodborne disease caused by gastrointestinal pathogens continue to be a concern in both the developed and developing worlds. The growing world population, the increasing complexity of agri-food networks and the wide range of foods now associated with STEC are potential drivers for incre...

  17. The 3,000 rice genomes project

    PubMed Central

    2014-01-01

    Background: Rice, Oryza sativa L., is the staple food for half the world’s population. By 2030, the production of rice must increase by at least 25% in order to keep up with global population growth and demand. Accelerated genetic gains in rice improvement are needed to mitigate the effects of climate change and loss of arable land, as well as to ensure a stable global food supply. Findings: We resequenced a core collection of 3,000 rice accessions from 89 countries. All 3,000 genomes had an average sequencing depth of 14×, with average genome coverages and mapping rates of 94.0% and 92.5%, respectively. From our sequencing efforts, approximately 18.9 million single nucleotide polymorphisms (SNPs) in rice were discovered when aligned to the reference genome of the temperate japonica variety, Nipponbare. Phylogenetic analyses based on SNP data confirmed differentiation of the O. sativa gene pool into 5 varietal groups: indica, aus/boro, basmati/sadri, tropical japonica and temperate japonica. Conclusions: Here, we report an international resequencing effort of 3,000 rice genomes. This data serves as a foundation for large-scale discovery of novel alleles for important rice phenotypes using various bioinformatics and/or genetic approaches. It also helps in understanding the genomic diversity within O. sativa at a higher level of detail. With the release of the sequencing data, the project calls for the global rice community to take advantage of this data as a foundation for establishing a global, public rice genetic/genomic database and information platform for advancing rice breeding technology for future rice improvement. PMID:24872877

  18. Association mining of dependency between time series

    NASA Astrophysics Data System (ADS)

    Hafez, Alaaeldin

    2001-03-01

    Time series analysis is considered a crucial component of strategic control over a broad variety of disciplines in business, science and engineering. Time series data is a sequence of observations collected over intervals of time. Each time series describes a phenomenon as a function of time. Analysis of time series data includes discovering trends (or patterns) in a time series sequence. In the last few years, data mining has emerged and been recognized as a new technology for data analysis. Data mining is the process of discovering potentially valuable patterns, associations, trends, sequences and dependencies in data. Data mining techniques can discover information that many traditional business analysis and statistical techniques fail to deliver. In this paper, we adapt and innovate data mining techniques to analyze time series data. Using data mining techniques, maximal frequent patterns are discovered and used to predict future sequences or trends, where trends describe the behavior of a sequence. In order to include different types of time series (e.g. irregular and non-systematic), we consider past frequent patterns of the same time sequences (local patterns) and of other dependent time sequences (global patterns). We use the word 'dependent' instead of 'similar' to emphasize real-life time series, where two time series sequences could be completely different (in values, shapes, etc.) yet still react to the same conditions in a dependent way. In this paper, we propose the Dependence Mining Technique, which could be used to predict time series sequences. The proposed technique consists of three phases: (a) for all time series sequences, generate their trend sequences; (b) discover maximal frequent trend patterns and generate pattern vectors (to keep information about frequent trend patterns); and (c) use the trend pattern vectors to predict future time series sequences.
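    A small sketch can illustrate the first two phases. The abstract defines neither the trend encoding nor the pattern type, so several choices here are assumptions. Each step is encoded as up (U), down (D) or same (S). Patterns are contiguous substrings, counted with overlaps. A pattern counts as maximal when it is frequent and not contained in a longer frequent pattern. Names and thresholds are hypothetical, and the prediction phase is not shown.

```python
from collections import Counter


def trend_sequence(values) -> str:
    """Encode each consecutive step as U (up), D (down) or S (same)."""
    symbols = []
    for prev, curr in zip(values, values[1:]):
        symbols.append("U" if curr > prev else "D" if curr < prev else "S")
    return "".join(symbols)


def maximal_frequent_patterns(trend: str, min_support: int, max_len: int) -> list:
    """Contiguous patterns seen >= min_support times and not inside a longer frequent one."""
    counts = Counter(trend[i:i + k]
                     for k in range(1, max_len + 1)
                     for i in range(len(trend) - k + 1))
    frequent = [p for p, c in counts.items() if c >= min_support]
    maximal = [p for p in frequent
               if not any(len(q) > len(p) and p in q for q in frequent)]
    return sorted(maximal)


trend = trend_sequence([1, 2, 3, 2, 3, 4, 3])
print(trend)                                   # UUDUUD
print(maximal_frequent_patterns(trend, 2, 3))  # ['UUD']
```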

  19. TotalReCaller: improved accuracy and performance via integrated alignment and base-calling.

    PubMed

    Menges, Fabian; Narzisi, Giuseppe; Mishra, Bud

    2011-09-01

    Currently, re-sequencing approaches use multiple modules serially to interpret raw sequencing data from next-generation sequencing platforms, while remaining oblivious to the genomic information until the final alignment step. Such approaches fail to exploit the full information from both raw sequencing data and the reference genome that can yield better quality sequence reads, SNP calls, variant detection, as well as an alignment at the best possible location in the reference genome. Thus, there is a need for novel reference-guided bioinformatics algorithms for interpreting analog signals representing sequences of the bases ({A, C, G, T}), while simultaneously aligning possible sequence reads to a source reference genome whenever available. Here, we propose a new base-calling algorithm, TotalReCaller, to achieve improved performance. A linear error model for the raw intensity data and Burrows-Wheeler transform (BWT) based alignment are combined utilizing a Bayesian score function, which is then globally optimized over all possible genomic locations using an efficient branch-and-bound approach. The algorithm has been implemented in software and hardware [field-programmable gate array (FPGA)] to achieve real-time performance. Empirical results on real high-throughput Illumina data were used to evaluate TotalReCaller's performance relative to its peers (Bustard, BayesCall, Ibis and Rolexa) based on several criteria, particularly those important in clinical and scientific applications. Namely, it was evaluated for (i) its base-calling speed and throughput, (ii) its read accuracy and (iii) its specificity and sensitivity in variant calling. A software implementation of TotalReCaller, as well as additional information, is available at: http://bioinformatics.nyu.edu/wordpress/projects/totalrecaller/ (contact: fabian.menges@nyu.edu).
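    As a generic illustration of how a Burrows-Wheeler transform supports alignment, the sketch builds the BWT of a short text by sorting its rotations. It then counts exact occurrences of a pattern with FM-index backward search. This is a textbook toy: construction is quadratic, rank queries are naive, and the text is assumed not to contain `$`. The function names are hypothetical. It is not TotalReCaller's aligner, which additionally scores reads with a Bayesian model and branch-and-bound search.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations ('$' marks the end)."""
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """Exact-match count using FM-index backward search over the BWT."""
    # first[c]: number of characters in the text that sort before c
    first = {}
    for i, ch in enumerate(sorted(bwt_str)):
        first.setdefault(ch, i)

    def rank(ch: str, i: int) -> int:
        # Occurrences of ch in bwt_str[:i]; real indexes precompute this.
        return bwt_str[:i].count(ch)

    lo, hi = 0, len(bwt_str)  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + rank(ch, lo)
        hi = first[ch] + rank(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


index = bwt("banana")
print(index)                            # annb$aa
print(count_occurrences(index, "ana"))  # 2
```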

  20. Prediction and identification of sequences coding for orphan enzymes using genomic and metagenomic neighbours

    PubMed Central

    Yamada, Takuji; Waller, Alison S; Raes, Jeroen; Zelezniak, Aleksej; Perchat, Nadia; Perret, Alain; Salanoubat, Marcel; Patil, Kiran R; Weissenbach, Jean; Bork, Peer

    2012-01-01

    Despite the current wealth of sequencing data, one-third of all biochemically characterized metabolic enzymes lack a corresponding gene or protein sequence, and as such can be considered orphan enzymes. They represent a major gap between our molecular and biochemical knowledge, and consequently are not amenable to modern systemic analyses. As 555 of these orphan enzymes have metabolic pathway neighbours, we developed a global framework that utilizes the pathway and (meta)genomic neighbour information to assign candidate sequences to orphan enzymes. For 131 orphan enzymes (37% of those for which (meta)genomic neighbours are available), we associate sequences to them using scoring parameters with an estimated accuracy of 70%, implying functional annotation of 16 345 gene sequences in numerous (meta)genomes. As a case in point, two of these candidate sequences were experimentally validated to encode the predicted activity. In addition, we augmented the currently available genome-scale metabolic models with these new sequence–function associations and were able to expand the models by on average 8%, with a considerable change in the flux connectivity patterns and improved essentiality prediction. PMID:22569339

  1. Ultrafast scene detection and recognition with limited visual information

    PubMed Central

    Hagmann, Carl Erick; Potter, Mary C.

    2016-01-01

    Humans can detect target color pictures of scenes depicting concepts like picnic or harbor in sequences of six or twelve pictures presented as briefly as 13 ms, even when the target is named after the sequence (Potter, Wyble, Hagmann, & McCourt, 2014). Such rapid detection suggests that feedforward processing alone enabled detection without recurrent cortical feedback. There is debate about whether coarse, global, low spatial frequencies (LSFs) provide predictive information to high cortical levels through the rapid magnocellular (M) projection of the visual path, enabling top-down prediction of possible object identities. To test the “Fast M” hypothesis, we compared detection of a named target across five stimulus conditions: unaltered color, blurred color, grayscale, thresholded monochrome, and LSF pictures. The pictures were presented for 13–80 ms in six-picture rapid serial visual presentation (RSVP) sequences. Blurred, monochrome, and LSF pictures were detected less accurately than normal color or grayscale pictures. When the target was named before the sequence, all picture types except LSF resulted in above-chance detection at all durations. Crucially, when the name was given only after the sequence, performance dropped and the monochrome and LSF pictures (but not the blurred pictures) were at or near chance. Thus, without advance information, monochrome and LSF pictures were rarely understood. The results offer only limited support for the Fast M hypothesis, suggesting instead that feedforward processing is able to activate conceptual representations without complementary reentrant processing. PMID:28255263

  2. Effects of global and local contexts on chord processing: An ERP study.

    PubMed

    Zhang, Jingjing; Zhou, Xuefeng; Chang, Ruohan; Yang, Yufang

    2018-01-31

    In real life, the processing of an incoming event is continuously influenced by prior information at multiple timescales. The present study investigated how harmonic contexts at both local and global levels influence the processing of an incoming chord in an event-related potential (ERP) experiment. Chord sequences containing two phrases were presented to musically trained listeners, with the last critical chord either harmonically related or less related to its preceding context at local and/or global levels. ERP data showed an ERAN-like effect for local context in the early time window and an N5-like component for the later interaction between local and global contexts. These results suggest that both local and global contexts influence the processing of an incoming musical event, and that the local effect occurs earlier than the global one. Moreover, the interaction between local and global contexts in the N5 may suggest that musical syntactic integration at the local level takes place prior to integration at the global level. Copyright © 2017 Elsevier Ltd. All rights reserved.

  3. Protein location prediction using atomic composition and global features of the amino acid sequence

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cherian, Betsy Sheena, E-mail: betsy.skb@gmail.com; Nair, Achuthsankar S.

    2010-01-22

    The subcellular location of a protein is useful information for determining its function, screening for drug candidates, vaccine design, annotating gene products and selecting relevant proteins for further studies. Computational prediction of subcellular localization deals with predicting the location of a protein from its amino acid sequence. For a computational localization prediction method to be more accurate, it should exploit all possible relevant biological features that contribute to subcellular localization. In this work, we extracted biological features from the full-length protein sequence to incorporate more biological information. A new biological feature, the distribution of atomic composition, is used effectively together with multiple physicochemical properties, amino acid composition, three-part amino acid composition, and sequence similarity to predict the subcellular location of the protein. Support Vector Machines are designed for four modules, and the prediction is made by a weighted voting system. Our system makes predictions with accuracies of 100%, 82.47% and 88.81% for the self-consistency test, jackknife test and independent data test, respectively. Our results provide evidence that prediction based on biological features derived from the full-length amino acid sequence gives better accuracy than prediction based on features derived from the N-terminal region alone. Treating the features as a distribution over the entire sequence brings out the underlying property distribution in greater detail and enhances prediction accuracy.
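    The composition features and weighted voting can be sketched as follows, under two assumptions. First, "three-part amino acid composition" is taken to mean the composition of the N-terminal, middle and C-terminal thirds of the sequence. Second, each module is assumed to output a (label, weight) pair. The SVM modules themselves and the atomic-composition feature are not reproduced. The function names, the example sequence and the weights are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_composition(seq: str) -> list:
    """Relative frequency of each standard amino acid (20 values)."""
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def three_part_composition(seq: str) -> list:
    """Composition of N-terminal, middle and C-terminal thirds (60 values)."""
    if len(seq) < 3:
        raise ValueError("sequence too short to split into thirds")
    third = len(seq) // 3
    parts = (seq[:third], seq[third:2 * third], seq[2 * third:])
    return [x for part in parts for x in aa_composition(part)]


def weighted_vote(predictions) -> str:
    """predictions: iterable of (location_label, module_weight) pairs."""
    totals = Counter()
    for label, weight in predictions:
        totals[label] += weight
    return max(totals, key=totals.get)


print(len(three_part_composition("MKTAYIAKQRQISFVKSHFSRQ")))  # 60
print(weighted_vote([("nucleus", 0.4), ("cytoplasm", 0.3), ("cytoplasm", 0.2)]))  # cytoplasm
```

    Here the two cytoplasm votes (0.3 + 0.2 = 0.5) outweigh the single nucleus vote (0.4). A weighted scheme lets weaker modules combine to overrule one stronger module.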

  4. Influenza Virus Database (IVDB): an integrated information resource and analysis platform for influenza virus research.

    PubMed

    Chang, Suhua; Zhang, Jiajie; Liao, Xiaoyun; Zhu, Xinxing; Wang, Dahai; Zhu, Jiang; Feng, Tao; Zhu, Baoli; Gao, George F; Wang, Jian; Yang, Huanming; Yu, Jun; Wang, Jing

    2007-01-01

    Frequent outbreaks of highly pathogenic avian influenza and the increasing data available for comparative analysis require a central database specialized in influenza viruses (IVs). We have established the Influenza Virus Database (IVDB) to integrate information and create an analysis platform for genetic, genomic, and phylogenetic studies of the virus. IVDB hosts complete genome sequences of influenza A virus generated by the Beijing Institute of Genomics (BIG) and curates all other published IV sequences after expert annotation. Our Q-Filter system classifies and ranks all nucleotide sequences into seven categories according to sequence content and integrity. IVDB provides a series of tools and viewers for comparative analysis of viral genomes, genes, genetic polymorphisms and phylogenetic relationships. A search system allows users to retrieve combinations of different data types by setting search options. To facilitate analysis of global viral transmission and evolution, the IV Sequence Distribution Tool (IVDT) displays the worldwide geographic distribution of chosen viral genotypes and couples genomic data with epidemiological data. BLAST, multiple sequence alignment and phylogenetic analysis tools are integrated for online data analysis. Furthermore, IVDB offers instant access to pre-computed alignments and polymorphisms of IV genes and proteins, and presents the results as SNP distribution plots and minor allele distributions. IVDB is publicly available at http://influenza.genomics.org.cn.

  5. Genomics of high molecular weight plasmids isolated from an on-farm biopurification system.

    PubMed

    Martini, María C; Wibberg, Daniel; Lozano, Mauricio; Torres Tejerizo, Gonzalo; Albicoro, Francisco J; Jaenicke, Sebastian; van Elsas, Jan Dirk; Petroni, Alejandro; Garcillán-Barcia, M Pilar; de la Cruz, Fernando; Schlüter, Andreas; Pühler, Alfred; Pistorio, Mariano; Lagares, Antonio; Del Papa, María F

    2016-06-20

    The use of biopurification systems (BPS) is an efficient strategy for eliminating pesticides from wastewater polluted by farm activities. BPS environments contain a high microbial density and diversity, facilitating the exchange of information among bacteria mediated by mobile genetic elements (MGEs), which play a key role in bacterial adaptation and evolution in such environments. Here we sequenced and characterized high-molecular-weight plasmids from a bacterial collection of an on-farm BPS. High-throughput sequencing of the plasmid pool yielded several Mb of sequence information. Assembly of the sequence data resulted in six complete replicons. Using in silico analyses, we identified plasmid replication genes whose encoded proteins represent 13 different Pfam families, as well as proteins involved in plasmid conjugation, indicating a large diversity of plasmid replicons and suggesting the occurrence of horizontal gene transfer (HGT) events within the habitat analyzed. In addition, genes conferring resistance to 10 classes of antimicrobial compounds and genes encoding enzymes potentially involved in pesticide and aromatic hydrocarbon degradation were found. Global analysis of the plasmid pool suggests that the analyzed BPS represents a key environment for further studies addressing the dissemination of MGEs carrying catabolic genes and pathway assembly regarding degradation capabilities.

  6. A RESTful application programming interface for the PubMLST molecular typing and genome databases

    PubMed Central

    Bray, James E.; Maiden, Martin C. J.

    2017-01-01

    Molecular typing is used to differentiate microorganisms at the subspecies or strain level for epidemiological investigations, infection control, public health and environmental sampling. DNA sequence-based typing methods require authoritative databases that link sequence variants to nomenclature in order to facilitate communication and comparison of identified types in national or global settings. The PubMLST website (https://pubmlst.org/) fulfils this role for over a hundred microorganisms for which it hosts curated molecular sequence typing data, providing sequence and allelic profile definitions for multi-locus sequence typing (MLST) and single-gene typing approaches. In recent years, these have expanded to cover the whole genome with schemes such as core genome MLST (cgMLST) and whole genome MLST (wgMLST), which catalogue the allelic diversity found in hundreds to thousands of genes. These approaches provide a common nomenclature for high-resolution strain characterization and comparison. Molecular typing information is linked to isolate provenance, phenotype and, increasingly, genome assemblies, providing a resource for outbreak investigation and research into population structure, gene association, global epidemiology and vaccine coverage. A Representational State Transfer (REST) Application Programming Interface (API) has been developed for the PubMLST website to make these large quantities of structured molecular typing and whole genome sequence data available for programmatic access by any third-party application. The API is an integral component of the Bacterial Isolate Genome Sequence Database (BIGSdb) platform that is used to host PubMLST resources, and it exposes all public data within the site. In addition to data browsing, searching and download, the API supports authentication and submission of new data to curator queues. Database URL: http://rest.pubmlst.org/ PMID:29220452
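
    The allelic-profile idea behind MLST can be sketched in Python as an exact lookup from allele numbers to a sequence type (ST). The locus names, allele numbers and ST values below are hypothetical. The sketch does not call the PubMLST REST API, whose endpoints are not described here.

```python
from typing import Dict, Optional, Tuple

# Hypothetical three-locus scheme and profile table (real MLST schemes are larger).
LOCI: Tuple[str, ...] = ("locus1", "locus2", "locus3")
PROFILE_TABLE: Dict[Tuple[int, ...], int] = {
    (1, 3, 2): 11,
    (1, 3, 4): 12,
    (2, 5, 2): 13,
}


def assign_sequence_type(alleles: Dict[str, int]) -> Optional[int]:
    """Return the sequence type (ST) whose allelic profile matches exactly.

    Returns None when a locus is missing or the allele combination is not yet
    defined (a potentially novel profile that would need curator review).
    """
    if any(locus not in alleles for locus in LOCI):
        return None
    profile = tuple(alleles[locus] for locus in LOCI)
    return PROFILE_TABLE.get(profile)


print(assign_sequence_type({"locus1": 1, "locus2": 3, "locus3": 4}))
```

    An exact-match dictionary keyed on the ordered allele tuple mirrors how a shared nomenclature works. Two laboratories that call the same alleles at every locus get the same ST.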

  7. Phytophthora database 2.0: update and future direction.

    PubMed

    Park, Bongsoo; Martin, Frank; Geiser, David M; Kim, Hye-Seon; Mansfield, Michele A; Nikolaeva, Ekaterina; Park, Sook-Young; Coffey, Michael D; Russo, Joseph; Kim, Seong H; Balci, Yilmaz; Abad, Gloria; Burgess, Treena; Grünwald, Niklaus J; Cheong, Kyeongchae; Choi, Jaeyoung; Lee, Yong-Hwan; Kang, Seogchan

    2013-12-01

    The online community resource Phytophthora database (PD) was developed to support accurate and rapid identification of Phytophthora and to help characterize and catalog the diversity and evolutionary relationships within the genus. Since its release in 2008, the sequence database has grown to cover 1 to 12 loci for ≈2,600 isolates (representing 138 described and provisional species). Sequences of multiple mitochondrial loci were added to complement nuclear loci-based phylogenetic analyses and diagnostic tool development. Key characteristics of most newly described and provisional species have been summarized. Other additions to improve the PD functionality include: (i) geographic information system tools that enable users to visualize the geographic origins of chosen isolates on a global-scale map, (ii) a tool for comparing genetic similarity between isolates via microsatellite markers to support population genetic studies, (iii) a comprehensive review of molecular diagnostics tools and relevant references, (iv) sequence alignments used to develop polymerase chain reaction-based diagnostics tools to support their utilization and new diagnostic tool development, and (v) an online community forum for sharing and preserving experience and knowledge accumulated in the global Phytophthora community. Here we present how these improvements can support users and discuss the PD's future direction.

  8. FARME DB: a functional antibiotic resistance element database

    PubMed Central

    Wallace, James C.; Port, Jesse A.; Smith, Marissa N.; Faustman, Elaine M.

    2017-01-01

    Antibiotic resistance (AR) is a major global public health threat, but few resources exist that catalog AR genes outside of a clinical context. Current AR sequence databases are assembled almost exclusively from genomic sequences derived from clinical bacterial isolates and thus do not include many microbial sequences, derived from environmental samples, that confer resistance in functional metagenomic studies. These environmental metagenomic sequences often show little or no similarity to AR sequences from clinical isolates under standard classification criteria. In addition, existing AR databases provide no information about flanking sequences containing regulatory or mobile genetic elements. To help address this issue, we created an annotated database of DNA and protein sequences derived exclusively from environmental metagenomic sequences showing AR in laboratory experiments. Our Functional Antibiotic Resistant Metagenomic Element (FARME) database is a compilation of publicly available DNA sequences and predicted protein sequences conferring AR, as well as regulatory elements, mobile genetic elements and predicted proteins flanking antibiotic resistance genes. FARME is the first database to focus on functional metagenomic AR gene elements. It provides a resource to better understand AR in the 99% of bacteria that cannot be cultured and the relationship between environmental AR sequences and antibiotic resistance genes derived from cultured isolates. Database URL: http://staff.washington.edu/jwallace/farme PMID:28077567

  9. PreCisIon: PREdiction of CIS-regulatory elements improved by gene's positION.

    PubMed

    Elati, Mohamed; Nicolle, Rémy; Junier, Ivan; Fernández, David; Fekih, Rim; Font, Julio; Képès, François

    2013-02-01

    Conventional approaches to predicting transcriptional regulatory interactions usually rely on the definition of a shared motif sequence on the target genes of a transcription factor (TF). These efforts have been frustrated by the limited availability and accuracy of TF binding site motifs, usually represented as position-specific scoring matrices, which may match large numbers of sites and produce an unreliable list of target genes. To improve the prediction of binding sites, we propose additionally using independent knowledge of the genome layout. Indeed, it has been shown that co-regulated genes tend to be either neighbors or periodically spaced along the whole chromosome. This study demonstrates that relative gene positioning carries significant information. This novel type of information is combined with traditional sequence information by a machine learning algorithm called PreCisIon. To optimize this combination, PreCisIon builds a strong gene target classifier by adaptively combining weak classifiers based on either local binding sequence or global gene position. This generic strategy paves the way for optimally incorporating any future advances in gene target prediction based on local sequence, genome layout or novel criteria. With the current state of the art, PreCisIon consistently improves on methods based on sequence information only. This is shown by a cross-validation analysis of the 20 major TFs from two phylogenetically remote model organisms. For Bacillus subtilis and Escherichia coli, respectively, PreCisIon achieves on average an area under the receiver operating characteristic curve of 70 and 60%, a sensitivity of 80 and 70% and a specificity of 60 and 56%. The newly predicted gene targets are shown to be functionally consistent with previously known targets, as assessed by Gene Ontology enrichment analysis or by the relevant literature and databases.
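
    PreCisIon's adaptive combination of weak classifiers resembles boosting. As a stand-in, and not the authors' code, the Python sketch below uses standard discrete AdaBoost. The weak classifiers are one-feature decision stumps over two hypothetical per-gene features: a local binding-site score and a global position score. The toy data are invented.

```python
import math
from typing import List, Sequence, Tuple

# A stump: (feature index, threshold, polarity). It predicts `polarity` when the
# feature value is >= threshold and `-polarity` otherwise.
Stump = Tuple[int, float, int]


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    feature, threshold, polarity = stump
    return polarity if x[feature] >= threshold else -polarity


def train_adaboost(X: List[Sequence[float]], y: List[int],
                   rounds: int) -> List[Tuple[float, Stump]]:
    """Discrete AdaBoost over decision stumps; labels must be +1 (target) or -1."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        best: Stump = (0, 0.0, 1)
        best_err = float("inf")
        # Exhaustively pick the stump with the lowest weighted error.
        for f in range(len(X[0])):
            for threshold in sorted({x[f] for x in X}):
                for polarity in (1, -1):
                    stump = (f, threshold, polarity)
                    err = sum(w for w, xi, yi in zip(weights, X, y)
                              if stump_predict(stump, xi) != yi)
                    if err < best_err:
                        best, best_err = stump, err
        best_err = min(max(best_err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - best_err) / best_err)
        ensemble.append((alpha, best))
        # Up-weight misclassified genes, down-weight correct ones, renormalize.
        weights = [w * math.exp(-alpha * yi * stump_predict(best, xi))
                   for w, xi, yi in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble: List[Tuple[float, Stump]], x: Sequence[float]) -> int:
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical genes: (local binding-site score, global position score).
X = [(0.9, 0.2), (0.7, 0.9), (0.2, 0.1), (0.1, 0.8)]
y = [1, 1, -1, -1]
model = train_adaboost(X, y, rounds=3)
print([predict(model, x) for x in X])
```

    Because each stump looks at only one feature, the ensemble can mix sequence-based and position-based evidence. Each source receives a weight alpha according to how well it performs on the re-weighted data.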

  10. Population genetic structure and natural selection of Plasmodium falciparum apical membrane antigen-1 in Myanmar isolates.

    PubMed

    Kang, Jung-Mi; Lee, Jinyoung; Moe, Mya; Jun, Hojong; Lê, Hương Giang; Kim, Tae Im; Thái, Thị Lam; Sohn, Woon-Mok; Myint, Moe Kyaw; Lin, Khin; Shin, Ho-Joon; Kim, Tong-Soo; Na, Byoung-Kuk

    2018-02-07

    Plasmodium falciparum apical membrane antigen-1 (PfAMA-1) is one of the leading blood-stage malaria vaccine candidates. However, the genetic variation and antigenic diversity identified in global PfAMA-1 are major hurdles to developing an effective vaccine based on this antigen. In this study, the genetic structure of PfAMA-1 and the effect of natural selection on it were analysed among Myanmar P. falciparum isolates. Blood samples were collected from 58 Myanmar patients with falciparum malaria. The full-length PfAMA-1 gene was amplified by polymerase chain reaction and cloned into a TA cloning vector, and the PfAMA-1 sequence of each isolate was determined. Polymorphic characteristics and the effect of natural selection were analysed using the DNASTAR, MEGA4 and DnaSP programs. The polymorphic nature of, and natural selection on, 459 global PfAMA-1 sequences were also analysed. Thirty-seven different PfAMA-1 haplotypes were identified in the 58 Myanmar P. falciparum isolates. Most amino acid changes in Myanmar PfAMA-1 were found in domains I and III. Overall patterns of amino acid changes in Myanmar PfAMA-1 were similar to those in global PfAMA-1, but the frequencies of amino acid changes differed by country. Novel amino acid changes were also identified in Myanmar PfAMA-1. Evidence for natural selection and recombination events was observed in global PfAMA-1. Of 51 commonly identified amino acid changes in global PfAMA-1 sequences, 43 were found in predicted RBC-binding sites, B-cell epitopes or IUR regions. Myanmar PfAMA-1 showed patterns of nucleotide diversity and amino acid polymorphism similar to those of global PfAMA-1. Balancing natural selection and intragenic recombination across PfAMA-1 are likely to play major roles in generating genetic diversity in global PfAMA-1. Most common amino acid changes in global PfAMA-1 were located in predicted B-cell epitopes, where high levels of nucleotide diversity and balancing natural selection were found. These results highlight the strong selective pressure of host immunity on the PfAMA-1 gene and have significant implications for understanding the nature of Myanmar PfAMA-1 along with global PfAMA-1. They also provide useful information for the development of an effective malaria vaccine based on this antigen.
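
    Two of the statistics reported above are haplotype counting and nucleotide diversity (π). Both can be sketched in Python for a toy ungapped alignment. This is a simplified illustration, not the DnaSP implementation, and the sequences are invented.

```python
from collections import Counter
from itertools import combinations
from typing import List


def haplotype_counts(seqs: List[str]) -> Counter:
    """Count identical sequences (haplotypes)."""
    return Counter(seqs)


def nucleotide_diversity(seqs: List[str]) -> float:
    """Nucleotide diversity (pi): mean proportion of differing sites over all pairs.

    Assumes at least two ungapped, equal-length aligned sequences; gaps and
    ambiguity codes, if present, are simply compared as characters.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    differences = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return differences / (len(pairs) * length)


isolates = ["ACGT", "ACGA", "ACGT"]  # toy aligned fragments, not PfAMA-1 data
print(len(haplotype_counts(isolates)), nucleotide_diversity(isolates))
```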

  11. Paralogues of nuclear ribosomal genes conceal phylogenetic signals within the invasive Asian fish tapeworm lineage: evidence from next generation sequencing data.

    PubMed

    Brabec, Jan; Kuchta, Roman; Scholz, Tomáš; Littlewood, D Timothy J

    2016-08-01

    Complete mitochondrial genomes and nuclear rRNA operons of eight geographically distinct isolates of the Asian fish tapeworm Schyzocotyle acheilognathi (syn. Bothriocephalus acheilognathi), representing the parasite's global diversity across four continents, were fully characterised using an Illumina sequencing platform. This cestode species is an extreme example of a highly invasive, globally distributed pathogen of veterinary importance, with exceptionally low host specificity unseen elsewhere among the parasitic flatworms. In addition to eight specimens of S. acheilognathi, we fully characterised its closest known relative and only congeneric species, Schyzocotyle nayarensis, from cyprinids in the Indian subcontinent. Previous nucleotide sequence data on the Asian fish tapeworm were restricted to a single molecular locus of questionable phylogenetic utility, namely the internal transcribed spacers separating the nuclear rRNA genes. The mitogenomic data presented here therefore offer a unique opportunity to gain the first detailed insights into both the intraspecific phylogenetic relationships and the population genetic structure of the parasite, providing key baseline information for future research in the field. Additionally, we identify a previously unnoticed source of error, which has likely misled most previous molecular phylogenetic and population genetic estimates for the Asian fish tapeworm, and demonstrate the limited utility of the nuclear rRNA sequences, including the internal transcribed spacers. Copyright © 2016 Australian Society for Parasitology. Published by Elsevier Ltd. All rights reserved.

  12. Bacterial taxa–area and distance–decay relationships in marine environments

    PubMed Central

    Zinger, L; Boetius, A; Ramette, A

    2014-01-01

    The taxa–area relationship (TAR) and the distance–decay relationship (DDR) both describe spatial turnover of taxa and are central patterns of biodiversity. Here, we compared TAR and DDR of bacterial communities across different marine realms and ecosystems at the global scale. To obtain reliable global estimates for both relationships, we quantified the poorly assessed effects of sequencing depth, rare-taxa removal and number of sampling sites. Slope coefficients of bacterial TARs were within the range of those of plants and animals, whereas slope coefficients of bacterial DDRs were much lower. Slope coefficients were mostly affected by removing rare taxa and by the number of sampling sites considered in the calculations. TAR and DDR slope coefficients were overestimated at sequencing depths below 4000 sequences per sample. Notably, bacterial TAR and DDR patterns did not correlate with each other either within or across ecosystem types, suggesting that (i) TAR cannot be directly derived from DDR and (ii) TAR and DDR may be influenced by different ecological factors. Nevertheless, we found marine bacterial TAR and DDR to be steeper in ecosystems associated with high environmental heterogeneity or spatial isolation, namely marine sediments and coastal environments, than in pelagic ecosystems. Hence, our study provides information on macroecological patterns of marine bacteria, as well as methodological and conceptual insights, at a time when biodiversity surveys increasingly use high-throughput sequencing technologies. PMID:24460915
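
    A TAR slope is often taken as the exponent z of a power law S = cA^z. That fitted form is an assumption here, since the abstract does not state the model. z can be estimated by least squares on log-transformed data. Below is a minimal Python sketch on synthetic data.

```python
import math
from typing import Sequence


def power_law_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of log(y) on log(x), i.e. z in y = c * x**z."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    sxx = sum((a - mean_x) ** 2 for a in lx)
    return sxy / sxx


# Synthetic TAR: taxa richness S = 10 * A**0.25 over nested sampling areas A.
areas = [1, 2, 4, 8, 16]
richness = [10 * a ** 0.25 for a in areas]
print(round(power_law_slope(areas, richness), 6))
```

    The same fit, applied to community similarity versus geographic distance, would give a (negative) DDR slope, assuming a power-law decay.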

  13. Microbial Culturomics Application for Global Health: Noncontiguous Finished Genome Sequence and Description of Pseudomonas massiliensis Strain CB-1T sp. nov. in Brazil.

    PubMed

    Bardet, Lucie; Cimmino, Teresa; Buffet, Clémence; Michelle, Caroline; Rathored, Jaishriram; Tandina, Fatalmoudou; Lagier, Jean-Christophe; Khelaifia, Saber; Abrahão, Jônatas; Raoult, Didier; Rolain, Jean-Marc

    2018-02-01

    Culturomics is a new postgenomics field that explores the microbial diversity of the human gut coupled with a taxono-genomic strategy. Culturomics, and microbiome science more generally, are anticipated to transform global health diagnostics and to inform how gut microbial diversity contributes to human health and disease and, by extension, to personalized medicine. Using culturomics, we describe strain CB1T (= CSUR P1334 = DSM 29075), a new species isolated from a stool specimen from a 37-year-old Brazilian woman. This description includes phenotypic characteristics and the complete genome sequence and annotation. Strain CB1T is a gram-negative, aerobic and motile bacillus that exhibits neither catalase nor oxidase activity and shows 98.3% 16S rRNA sequence similarity with Pseudomonas putida. The 4,723,534-bp genome contains 4239 protein-coding genes and 74 RNA genes, including 15 rRNA genes (5 16S rRNA, 4 23S rRNA, and 6 5S rRNA) and 59 tRNA genes. Strain CB1T was named Pseudomonas massiliensis sp. nov. and classified into the family Pseudomonadaceae. This study demonstrates the usefulness of microbial culturomics in exploring the human microbiota in diverse geographies and offers new promise for incorporating new omics technologies into innovation in diagnostic medicine and global health.

  14. Precision global health in the digital age.

    PubMed

    Flahault, Antoine; Geissbuhler, Antoine; Guessous, Idris; Guérin, Philippe; Bolon, Isabelle; Salathé, Marcel; Escher, Gérard

    2017-04-19

    Precision global health is an approach similar to precision medicine that uses innovation and technology to better target public health interventions on a global scale, with the aim of maximising their effectiveness and relevance. Illustrative examples include the use of remote sensing data to fight vector-borne diseases; large databases of genomic sequences of foodborne pathogens that help identify the origins of outbreaks; social networks and internet search engines for tracking communicable diseases; cell phone data in humanitarian actions; and drones to deliver healthcare services in remote and secluded areas. Open science and data sharing platforms are proposed for fostering international research programmes under fair, ethical and respectful conditions. Innovative education, such as massive open online courses or serious games, can promote wider access to training in public health and improve health literacy. The world is moving towards learning healthcare systems. Professionals are equipped with data collection and decision support devices. They share information, which is complemented by external sources and analysed in real time using machine learning techniques. They allow for the early detection of anomalies and can eventually guide appropriate public health interventions. This article shows how information-driven approaches, enabled by digital technologies, can help improve global health with greater equity.

  15. Collective decision dynamics in the presence of external drivers

    NASA Astrophysics Data System (ADS)

    Bassett, Danielle S.; Alderson, David L.; Carlson, Jean M.

    2012-09-01

    We develop a sequence of models describing information transmission and decision dynamics for a network of individual agents subject to multiple sources of influence. Our general framework is set in the context of an impending natural disaster, where individuals, represented by nodes on the network, must decide whether or not to evacuate. Sources of influence include a one-to-many externally driven global broadcast as well as pairwise interactions, across links in the network, in which agents transmit either continuous opinions or binary actions. We consider both uniform and variable threshold rules on the individual opinion as baseline models for decision making. Our results indicate that (1) social networks lead to clustering and cohesive action among individuals, (2) binary information introduces high temporal variability and stagnation, and (3) information transmission over the network can either facilitate or hinder action adoption, depending on the influence of the global broadcast relative to the social network. Our framework highlights the essential role of local interactions between agents in predicting collective behavior of the population as a whole.
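
    The Python sketch below illustrates one of the model ingredients described above. It assumes continuous opinions, a global broadcast of uniform strength, averaging over network neighbours and a threshold for each agent. The update rule and the parameter names (`broadcast`, `social_weight`) are illustrative assumptions, not the authors' equations. Binary-action transmission is not modelled.

```python
from typing import List


def simulate_evacuation(neighbors: List[List[int]], initial: List[float],
                        thresholds: List[float], broadcast: float,
                        social_weight: float, steps: int) -> List[bool]:
    """Return which agents decide to evacuate after `steps` rounds of opinion updates.

    Each round, every agent's opinion becomes a mix of the global broadcast and the
    mean opinion of its network neighbours (agents without neighbours use their own).
    """
    opinions = list(initial)
    for _ in range(steps):
        updated = []
        for i, nbrs in enumerate(neighbors):
            social = sum(opinions[j] for j in nbrs) / len(nbrs) if nbrs else opinions[i]
            updated.append((1 - social_weight) * broadcast + social_weight * social)
        opinions = updated
    # Threshold rule: evacuate once the opinion reaches the agent's own threshold.
    return [op >= th for op, th in zip(opinions, thresholds)]


# A three-agent chain 0 - 1 - 2 with broadcast strength 0.8.
chain = [[1], [0, 2], [1]]
print(simulate_evacuation(chain, [0.0, 0.0, 1.0], [0.5, 0.9, 0.7],
                          broadcast=0.8, social_weight=0.5, steps=50))
```

    With this particular rule, all opinions converge to the broadcast value. Which agents act therefore depends only on how their thresholds compare with the broadcast. Richer models, such as those the abstract describes, are needed to produce clustering or stagnation.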

  16. Phylo-mLogo: an interactive and hierarchical multiple-logo visualization tool for alignment of many sequences

    PubMed Central

    Shih, Arthur Chun-Chieh; Lee, DT; Peng, Chin-Lin; Wu, Yu-Wei

    2007-01-01

    Background: When hundreds or thousands of sequences, such as epidemic virus sequences or homologous/orthologous sequences of large gene families, are aligned to reconstruct epidemiological history or phylogenies, analyzing and visualizing the alignment of so many sequences has become a new challenge for computational biologists. Although several tools are available for visualizing very long sequence alignments, few of them are applicable to alignments of many sequences. Results: A multiple-logo alignment visualization tool, called Phylo-mLogo, is presented in this paper. Phylo-mLogo calculates the variabilities and homogeneities of aligned sequences by base frequencies or entropies. Unlike traditional sequence logos, Phylo-mLogo not only displays the global logo patterns of the whole alignment of multiple sequences but also shows local homologous logos for each clade hierarchically. In addition, Phylo-mLogo allows the user to focus on important, structurally or functionally constrained sites in the alignment, selected either by the user or by built-in automatic calculation. Conclusion: With Phylo-mLogo, the user can symbolically and hierarchically visualize hundreds of aligned sequences simultaneously and easily check changes at amino acid sites when analyzing many homologous/orthologous or influenza virus sequences. More information about Phylo-mLogo can be found at URL . PMID:17319966
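
    Per-column entropy, the quantity behind logo heights, can be computed either for the whole alignment or for a single clade. The Python sketch below assumes aligned strings of equal length and treats gaps as ordinary symbols. The `members` parameter (clade row indices) is our own device, not Phylo-mLogo's interface.

```python
import math
from collections import Counter
from typing import List, Optional


def column_entropy(alignment: List[str], col: int,
                   members: Optional[List[int]] = None) -> float:
    """Shannon entropy (bits) of one alignment column.

    `members` optionally restricts the calculation to one clade (row indices),
    giving a per-clade 'local' value instead of the global one.
    """
    rows = range(len(alignment)) if members is None else members
    column = [alignment[r][col] for r in rows]
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_content(alignment: List[str], col: int, alphabet_size: int = 4) -> float:
    """Logo stack height: maximum entropy minus observed entropy (no small-sample correction)."""
    return math.log2(alphabet_size) - column_entropy(alignment, col)


aln = ["ACGT", "ACGA", "ACTT", "ACTA"]
print(column_entropy(aln, 2), information_content(aln, 0))
```

    Column 2 is variable across the whole alignment (G/T) but conserved within the clade formed by rows 0 and 1. This global-versus-local contrast is what hierarchical logos are meant to expose.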

  17. Financial time series analysis based on information categorization method

    NASA Astrophysics Data System (ADS)

    Tian, Qiang; Shang, Pengjian; Feng, Guochen

    2014-12-01

    The paper applies the information categorization method to the analysis of financial time series. The method examines the similarity of different sequences by calculating the distances between them. We apply it to quantify the similarity of different stock markets, and report the similarity of the US and Chinese stock markets in the periods 1991-1998 (before the Asian currency crisis), 1999-2006 (after the Asian currency crisis and before the global financial crisis) and 2007-2013 (during and after the global financial crisis). The results show that the similarity between stock markets differs across time periods and that the similarity of the two stock markets became larger after the two crises. We also obtain the similarity of 10 stock indices in three areas; the resulting phylogenetic trees show that the method can distinguish markets in different areas. The results show that satisfactory information can be extracted from financial markets with this method. The information categorization method can be used not only for physiologic time series but also for financial time series.
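
    The information categorization method, as it is commonly described, can be sketched in Python. A series is coded as binary up/down symbols, m-letter words are ranked by frequency, and rank differences are weighted by Shannon entropy. The word length m = 3 and the alphabetical tie-breaking are our assumptions; the paper's exact settings may differ.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, Sequence


def symbolize(series: Sequence[float]) -> str:
    """Binary up/down coding: '1' if a value rises relative to its predecessor, else '0'."""
    return "".join("1" if b > a else "0" for a, b in zip(series, series[1:]))


def word_probabilities(symbols: str, m: int) -> Dict[str, float]:
    """Relative frequency of every possible m-letter word (unseen words get 0)."""
    words = Counter(symbols[i:i + m] for i in range(len(symbols) - m + 1))
    total = sum(words.values())
    if total == 0:
        raise ValueError("series too short for word length m")
    return {"".join(w): words["".join(w)] / total for w in product("01", repeat=m)}


def rank_words(probs: Dict[str, float]) -> Dict[str, int]:
    """Rank 1 = most frequent word; ties broken alphabetically (an arbitrary choice)."""
    ordered = sorted(probs, key=lambda w: (-probs[w], w))
    return {w: r for r, w in enumerate(ordered, start=1)}


def categorization_distance(s1: Sequence[float], s2: Sequence[float], m: int = 3) -> float:
    p1 = word_probabilities(symbolize(s1), m)
    p2 = word_probabilities(symbolize(s2), m)
    r1, r2 = rank_words(p1), rank_words(p2)

    def h(p: float) -> float:  # Shannon term, with 0*log(0) taken as 0
        return -p * math.log(p) if p > 0 else 0.0

    weights = {w: h(p1[w]) + h(p2[w]) for w in p1}
    z = sum(weights.values())
    if z == 0:
        return 0.0  # both series produce a single word: no weighting information
    weighted = sum(abs(r1[w] - r2[w]) * weights[w] / z for w in p1)
    return weighted / (2 ** m - 1)


a = [1, 3, 2, 5, 4, 6, 8, 7, 9, 10, 9, 11]  # toy "index" series
b = [5, 4, 3, 4, 2, 1, 3, 2, 1, 0, 1, 0]
print(categorization_distance(a, b))
```

    A matrix of such pairwise distances between index series could then be fed to a tree-building method to produce trees like the ones the abstract mentions.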

  18. HIITE: HIV-1 incidence and infection time estimator.

    PubMed

    Park, Sung Yong; Love, Tanzy M T; Kapoor, Shivankur; Lee, Ha Youn

    2018-06-15

    Around 2.1 million new HIV-1 infections were reported in 2015, indicating that the HIV-1 epidemic remains a significant global health challenge. Precise incidence assessment strengthens epidemic monitoring efforts and guides strategy optimization for prevention programs. Estimating the onset time of HIV-1 infection can facilitate optimal clinical management, identify key populations largely responsible for epidemic spread and thereby help infer HIV-1 transmission chains. Our goal is to develop a genomic assay that estimates incidence and infection time in a single cross-sectional survey setting. We created a web-based platform, the HIV-1 incidence and infection time estimator (HIITE), which processes envelope gene sequences using hierarchical clustering algorithms and reports the stage of infection, along with the time since infection for incident cases. HIITE's performance was evaluated using envelope gene sequences from 585 incident and 305 chronic specimens collected from global cohorts, including HIV-1 vaccine trial participants. HIITE identified chronically infected individuals as chronic with an error of less than 1% and correctly classified 94% of recently infected individuals as incident. Using a mixed-effect model, an incident specimen's time since infection was estimated from its single-lineage diversity, with a 14% prediction error. HIITE is the first algorithm to provide both key metrics from a single time-point sequence sample. HIITE can assess not only population-level epidemic spread but also individual-level transmission events from a single survey, advancing HIV prevention and intervention programs. The web-based HIITE and its source code are available at http://www.hayounlee.org/software.html. Supplementary data are available at Bioinformatics online.
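
    HIITE's internals are not given here. The hypothetical Python sketch below illustrates the two ingredients named above. It performs single-linkage clustering of aligned envelope sequences at a Hamming-distance cutoff, then computes mean pairwise diversity within one lineage. The cutoff and the sequences are invented.

```python
from itertools import combinations
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def single_linkage_clusters(seqs: List[str], max_dist: int) -> List[List[int]]:
    """Cut a single-linkage tree at max_dist: connected components of 'close' pairs."""
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if hamming(seqs[i], seqs[j]) <= max_dist:
            parent[find(i)] = find(j)
    groups: Dict[int, List[int]] = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def lineage_diversity(seqs: List[str], members: List[int]) -> float:
    """Mean pairwise per-site divergence within one lineage (0.0 for a singleton)."""
    pairs = list(combinations(members, 2))
    if not pairs:
        return 0.0
    length = len(seqs[0])
    return sum(hamming(seqs[i], seqs[j]) for i, j in pairs) / (len(pairs) * length)


envelopes = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT", "GGGGGGGGGG"]  # toy data
clusters = single_linkage_clusters(envelopes, max_dist=1)
largest = max(clusters, key=len)
print(clusters, lineage_diversity(envelopes, largest))
```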

  19. A novel model for DNA sequence similarity analysis based on graph theory.

    PubMed

    Qi, Xingqin; Wu, Qin; Zhang, Yusen; Fuller, Eddie; Zhang, Cun-Quan

    2011-01-01

    Determination of sequence similarity is one of the major steps in computational phylogenetic studies. During evolutionary history, not only mutations of individual nucleotides but also subsequent rearrangements occurred. A major task for computational biologists has been to develop novel mathematical descriptors for similarity analysis that capture information on these various mutation phenomena simultaneously. In this paper, instead of using traditional bases for constructing mathematical descriptors (e.g., nucleotide frequency or geometric representations), we construct novel mathematical descriptors based on graph theory. In particular, for each DNA sequence we set up a weighted directed graph, and the adjacency matrix of this graph is used to induce a representative vector for the sequence. This new approach measures similarity based on both the ordering and the frequency of nucleotides, so that much more information is involved. As an application, the method is tested on a set of 0.9-kb mtDNA sequences from twelve different primate species. All output phylogenetic trees obtained with various distance estimations have the same topology and are generally consistent with results reported in earlier studies, which supports the efficiency of the new method. We also test the new method on a simulated data set, which shows that it performs better than the traditional global alignment method when subsequent rearrangements happen frequently during evolutionary history.
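
    The abstract does not specify how edges are weighted or how the vector is induced. Below is a minimal Python sketch under one assumption: edge u to v is weighted by how often nucleotide v immediately follows u. The 4x4 adjacency matrix is then flattened into a 16-component vector, and sequences are compared by Euclidean distance.

```python
import math
from typing import List

BASES = "ACGT"


def adjacency_matrix(seq: str) -> List[List[float]]:
    """Weighted directed graph on the four nucleotides.

    Edge u -> v is weighted by the fraction of adjacent positions where v follows u,
    so the matrix reflects both nucleotide order and frequency. Only A/C/G/T allowed.
    """
    index = {b: i for i, b in enumerate(BASES)}
    matrix = [[0.0] * 4 for _ in range(4)]
    for u, v in zip(seq, seq[1:]):
        matrix[index[u]][index[v]] += 1.0
    steps = max(len(seq) - 1, 1)
    return [[w / steps for w in row] for row in matrix]


def descriptor(seq: str) -> List[float]:
    """Flatten the 4x4 adjacency matrix into a 16-dimensional vector."""
    return [w for row in adjacency_matrix(seq) for w in row]


def graph_distance(s1: str, s2: str) -> float:
    """Euclidean distance between descriptors (smaller means more similar)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(descriptor(s1), descriptor(s2))))


print(graph_distance("ACGTTGCA", "ACGTTGCT"))
```

    Unlike a plain nucleotide-frequency vector, this descriptor distinguishes, for example, "AACC" from "ACAC", because the two have different adjacent-pair transitions.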

  20. PseKNC: a flexible web server for generating pseudo K-tuple nucleotide composition.

    PubMed

    Chen, Wei; Lei, Tian-Yu; Jin, Dian-Chuan; Lin, Hao; Chou, Kuo-Chen

    2014-07-01

    The pseudo oligonucleotide composition, or pseudo K-tuple nucleotide composition (PseKNC), can be used to represent a DNA or RNA sequence as a discrete model or vector while still retaining considerable sequence-order information, particularly global or long-range sequence-order information, via the physicochemical properties of its constituent oligonucleotides. The PseKNC approach may therefore hold considerable potential for tackling many problems in computational genomics and genome sequence analysis. However, different DNA or RNA problems may require different kinds of PseKNC. Here, we present a flexible and user-friendly web server for PseKNC (at http://lin.uestc.edu.cn/pseknc/default.aspx) with which users can easily generate many different modes of PseKNC according to their needs by selecting various parameters and physicochemical properties. Furthermore, for the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the web server to generate the desired PseKNC without having to follow the complicated mathematical equations, which are presented in this article only for the completeness of the PseKNC formulation and its development. It is anticipated that the PseKNC web server will become a very useful tool in computational genomics and genome sequence analysis. Copyright © 2014 Elsevier Inc. All rights reserved.
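
    Below is a Python sketch of type-I pseudo dinucleotide composition (K = 2), following the commonly published PseDNC form. The vector holds 16 normalized dinucleotide frequencies plus λ sequence-order correlation factors θ_j, weighted by w. The property values are invented placeholders, and this is not the web server's code.

```python
from itertools import product
from typing import Dict, List, Tuple

DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]

# PLACEHOLDER property values (two standardized properties per dinucleotide).
# They are invented for illustration; real PseKNC uses curated physicochemical tables.
PROPERTIES: Dict[str, Tuple[float, float]] = {
    d: (i / 15.0, 1.0 - i / 15.0) for i, d in enumerate(DINUCLEOTIDES)
}


def correlation_factor(seq: str, j: int) -> float:
    """theta_j: mean squared property difference between dinucleotides j positions apart."""
    n_pairs = len(seq) - j - 1
    total = 0.0
    for i in range(n_pairs):
        pa, pb = PROPERTIES[seq[i:i + 2]], PROPERTIES[seq[i + j:i + j + 2]]
        total += sum((x - y) ** 2 for x, y in zip(pa, pb)) / len(pa)
    return total / n_pairs


def pse_dnc(seq: str, lam: int = 2, w: float = 0.05) -> List[float]:
    """16 normalized dinucleotide frequencies followed by lam sequence-order terms."""
    if len(seq) < lam + 2:
        raise ValueError("sequence too short for the chosen lambda")
    n = len(seq) - 1
    counts = {d: 0 for d in DINUCLEOTIDES}
    for i in range(n):
        counts[seq[i:i + 2]] += 1
    freqs = [counts[d] / n for d in DINUCLEOTIDES]
    thetas = [correlation_factor(seq, j) for j in range(1, lam + 1)]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


vector = pse_dnc("ACGTTGCAAGCT", lam=3)
print(len(vector))
```

    The shared denominator makes the 16 + λ components sum to one. The weight w controls how much the long-range order terms matter relative to local composition.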

  1. Molecular Epidemiology of Oyster-Related Human Noroviruses and Their Global Genetic Diversity and Temporal-Geographical Distribution from 1983 to 2014

    PubMed Central

    Yu, Yongxin; Cai, Hui; Hu, Linghao; Lei, Rongwei; Pan, Yingjie; Yan, Shuling

    2015-01-01

    Noroviruses (NoVs) are a leading cause of epidemic and sporadic cases of acute gastroenteritis worldwide. Oysters are well recognized as the main vectors of environmentally transmitted NoVs, and disease outbreaks linked to oyster consumption have been commonly observed. Here, to quantify the genetic diversity, temporal distribution, and circulation of oyster-related NoVs on a global scale, 1,077 oyster-related NoV sequences deposited from 1983 to 2014 were downloaded from both NCBI GenBank and the NoroNet outbreak database and were then screened for quality control. A total of 665 sequences with reliable information were obtained and were subsequently subjected to genotyping and phylogenetic analyses. The results indicated that the majority of oyster-related NoV sequences were obtained from coastal countries and regions and that the numbers of sequences in these regions were unevenly distributed. Moreover, >80% of human NoV genotypes were detected in oyster samples or oyster-related outbreaks. A higher proportion of genogroup I (GI) (34%) was observed for oyster-related sequences than for non-oyster-related outbreaks, where GII strains dominated with an overwhelming majority of >90%, indicating that the prevalences of GI and GII are different in humans and oysters. In addition, a related convergence of the circulation trend was found between oyster-related NoV sequences and human pandemic outbreaks. This suggests that oysters not only act as a vector of NoV through environmental transmission but also serve as an important reservoir of human NoVs. These results highlight the importance of oysters in the persistence and transmission of human NoVs in the environment and have important implications for the surveillance of human NoVs in oyster samples. PMID:26319869

  2. The practical evaluation of DNA barcode efficacy.

    PubMed

    Spouge, John L; Mariño-Ramírez, Leonardo

    2012-01-01

    This chapter describes a workflow for measuring the efficacy of a barcode in identifying species. First, assemble individual sequence databases corresponding to each barcode marker. A controlled collection of taxonomic data is preferable to GenBank data, because GenBank data can be problematic, particularly when comparing barcodes based on more than one marker. To ensure proper controls when evaluating species identification, specimens not having a sequence in every marker database should be discarded. Second, select a computer algorithm for assigning species to barcode sequences. No algorithm has yet improved notably on assigning a specimen to the species of its nearest neighbor within a barcode database. Because global sequence alignments (e.g., with the Needleman-Wunsch algorithm, or some related algorithm) examine entire barcode sequences, they generally produce better species assignments than local sequence alignments (e.g., with BLAST). No neighboring method (e.g., global sequence similarity, global sequence distance, or evolutionary distance based on a global alignment) has yet shown a notable superiority in identifying species. Finally, "the probability of correct identification" (PCI) provides an appropriate measurement of barcode efficacy. The overall PCI for a data set is the average of the species PCIs, taken over all species in the data set. This chapter states explicitly how to calculate PCI, how to estimate its statistical sampling error, and how to use data on PCR failure to set limits on how much improvements in PCR technology can improve species identification.
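    The workflow above can be sketched as follows. Each specimen is assigned to the species of its nearest neighbour by Needleman-Wunsch global alignment score (leave-one-out), per-species PCIs are computed, and the overall PCI is their average. Several details are assumptions made for illustration and are not necessarily the chapter's exact definitions: the scoring scheme (match +1, mismatch -1, linear gap -1), and the rule that a specimen counts as correctly identified only when all of its tied nearest neighbours belong to its own species. The function names are hypothetical.

```python
from collections import defaultdict


def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: prefixes of b against gaps
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def overall_pci(specimens: list[tuple[str, str]]) -> float:
    """Average of per-species PCIs using leave-one-out nearest-neighbour assignment.

    specimens: (species, barcode sequence) pairs; at least two are required.
    Assumed rule: a specimen is correct only if every tied nearest neighbour
    is conspecific, so singleton species can never score as correct.
    """
    outcomes: dict[str, list[float]] = defaultdict(list)
    for i, (species, seq) in enumerate(specimens):
        scored = [(nw_score(seq, other_seq), other_sp)
                  for j, (other_sp, other_seq) in enumerate(specimens) if j != i]
        best = max(score for score, _ in scored)
        nearest = {sp for score, sp in scored if score == best}
        outcomes[species].append(1.0 if nearest == {species} else 0.0)
    species_pci = [sum(v) / len(v) for v in outcomes.values()]
    return sum(species_pci) / len(species_pci)
```

    Averaging over species rather than over specimens keeps heavily sampled species from dominating the overall figure, which matches the definition given in the abstract.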

  3. The Tension Between Data Sharing and the Protection of Privacy in Genomics Research

    PubMed Central

    Kaye, Jane

    2014-01-01

    Next-generation sequencing and global data sharing challenge many of the governance mechanisms currently in place to protect the privacy of research participants. These challenges will make it more difficult to guarantee anonymity for participants, provide information to satisfy the requirements of informed consent, and ensure complete withdrawal from research when requested. To move forward, we need to improve the current governance systems for research so that they are responsive to individual privacy concerns but can also be effective at a global level. We need to develop a system of e-governance that can complement existing governance systems but that places greater reliance on the use of technology to ensure compliance with ethical and legal requirements. These new governance structures must be able to address the concerns of research participants while at the same time ensuring effective data sharing that promotes public trust in genomics research. PMID:22404490

  4. The tension between data sharing and the protection of privacy in genomics research.

    PubMed

    Kaye, Jane

    2012-01-01

    Next-generation sequencing and global data sharing challenge many of the governance mechanisms currently in place to protect the privacy of research participants. These challenges will make it more difficult to guarantee anonymity for participants, provide information to satisfy the requirements of informed consent, and ensure complete withdrawal from research when requested. To move forward, we need to improve the current governance systems for research so that they are responsive to individual privacy concerns but can also be effective at a global level. We need to develop a system of e-governance that can complement existing governance systems but that places greater reliance on the use of technology to ensure compliance with ethical and legal requirements. These new governance structures must be able to address the concerns of research participants while at the same time ensuring effective data sharing that promotes public trust in genomics research.

  5. Clinically actionable mutation profiles in patients with cancer identified by whole-genome sequencing

    PubMed Central

    Mizani, Tuba; Hamblin, Angela; Parton, Marina; Orosz, Zsolt; Athanasou, Nick; Hassan, Bass; Flanagan, Adrienne M.; Ahmed, Ahmed; Winter, Stuart; Harris, Adrian; Popitsch, Niko; Church, David; Taylor, Jenny C.

    2018-01-01

    Next-generation sequencing (NGS) efforts have established catalogs of mutations relevant to cancer development. However, the clinical utility of this information remains largely unexplored. Here, we present the results of the first eight patients recruited into a clinical whole-genome sequencing (WGS) program in the United Kingdom. We performed PCR-free WGS of fresh frozen tumors and germline DNA at 75× and 30×, respectively, using the HiSeq2500 HTv4. Subtracted tumor VCFs and paired germlines were subjected to comprehensive analysis of coding and noncoding regions, integration of germline with somatically acquired variants, and global mutation signatures and pathway analyses. Results were classified into tiers and presented to a multidisciplinary tumor board. WGS results helped to clarify an uncertain histopathological diagnosis in one case, informed or supported the prognosis in two cases (leading to de-escalation of therapy in one), and indicated potential treatments in all eight. Overall, 26 different tier 1 potentially clinically actionable findings were identified using WGS, compared with six SNVs/indels using routine targeted NGS. These initial results demonstrate the potential of WGS to inform future diagnosis, prognosis, and treatment choice in cancer and justify the systematic evaluation of the clinical utility of WGS in larger cohorts of patients with cancer. PMID:29610388

  6. Monitoring microbial responses to ocean deoxygenation in a model oxygen minimum zone.

    PubMed

    Hallam, Steven J; Torres-Beltrán, Mónica; Hawley, Alyse K

    2017-10-31

    Today in Scientific Data, two compendia of geochemical and multi-omic sequence information (DNA, RNA, protein) generated over almost a decade of time series monitoring in a seasonally anoxic coastal marine setting are presented to the scientific community. These data descriptors introduce a model ecosystem for the study of microbial responses to ocean deoxygenation, a phenotype that is currently expanding due to climate change. Public access to this time series information is intended to promote scientific collaborations and the generation of new hypotheses relevant to microbial ecology, biogeochemistry and global change issues.

  7. Understanding the molecular epidemiology and global relationships of Brachyspira hyodysenteriae from swine herds in the United States: a multi-locus sequence typing approach.

    PubMed

    Mirajkar, Nandita S; Gebhart, Connie J

    2014-01-01

    Outbreaks of mucohemorrhagic diarrhea in pigs caused by Brachyspira hyodysenteriae in the late 2000s indicated the re-emergence of Swine Dysentery (SD) in the U.S. Although the clinical disease had been absent in the U.S. since the early 1990s, it continued to cause significant economic losses in other swine-rearing countries worldwide. This study aims to fill the gap in knowledge pertaining to the re-emergence and epidemiology of B. hyodysenteriae in the U.S. and its global relationships using a multi-locus sequence typing (MLST) approach. Fifty-nine post re-emergent isolates originating from a variety of sources in the U.S. were characterized by MLST, analyzed for epidemiological relationships (within and between multiple sites of swine systems), and compared with pre re-emergent isolates from the U.S. Information for an additional 272 global isolates from the MLST database was utilized for international comparisons. Thirteen nucleotide sequence types (STs), including a predominant genotype (ST93), were identified in the post re-emergent U.S. isolates; some of these showed genetic similarity to the pre re-emergent STs, suggesting that the pre re-emergent STs likely played a role in the re-emergence of SD. In the U.S., in general, no more than one ST was found on a site, multiple sites of a common system shared an ST, and STs found in the U.S. were distinct from those identified globally. Of the 110 STs characterized from ten countries, only two were found in more than one country. The U.S. and global populations, identified as clonal and heterogeneous based on STs, showed close relatedness based on amino acid types (AATs). One predicted founder type (AAT9) and multiple predicted subgroup founder types identified for both the U.S. and the global population indicate the potential microevolution of this pathogen. This study elucidates the strain diversity and microevolution of B. hyodysenteriae, and highlights the utility of MLST for epidemiological and surveillance studies.

  8. Genomic Definition of Hypervirulent and Multidrug-Resistant Klebsiella pneumoniae Clonal Groups

    PubMed Central

    Bialek-Davenet, Suzanne; Criscuolo, Alexis; Ailloud, Florent; Passet, Virginie; Jones, Louis; Delannoy-Vieillard, Anne-Sophie; Garin, Benoit; Le Hello, Simon; Arlet, Guillaume; Nicolas-Chanoine, Marie-Hélène; Decré, Dominique

    2014-01-01

    Multidrug-resistant and highly virulent Klebsiella pneumoniae isolates are emerging, but the clonal groups (CGs) corresponding to these high-risk strains have remained imprecisely defined. We aimed to identify K. pneumoniae CGs on the basis of genome-wide sequence variation and to provide a simple bioinformatics tool to extract virulence and resistance gene data from genomic data. We sequenced 48 K. pneumoniae isolates, mostly of serotypes K1 and K2, and compared the genomes with 119 publicly available genomes. A total of 694 highly conserved genes were included in a core-genome multilocus sequence typing scheme, and cluster analysis of the data enabled precise definition of globally distributed hypervirulent and multidrug-resistant CGs. In addition, we created a freely accessible database, BIGSdb-Kp, to enable rapid extraction of medically and epidemiologically relevant information from genomic sequences of K. pneumoniae. Although drug-resistant and virulent K. pneumoniae populations were largely nonoverlapping, isolates with combined virulence and resistance features were detected. PMID:25341126
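    A minimal sketch of the kind of cluster analysis described above: core-genome allele profiles are compared by counting allelic mismatches, and isolates are grouped by single linkage under a mismatch threshold. The abstract does not state the exact clustering rule used to define CGs, so the threshold value, the choice of single linkage, the skipping of missing alleles, and the function names are all assumptions for illustration.

```python
from typing import Optional, Sequence

Profile = Sequence[Optional[int]]  # allele number per core gene; None = missing call


def allelic_mismatches(p: Profile, q: Profile) -> int:
    """Count loci where both alleles are called and differ (missing loci are skipped)."""
    return sum(1 for a, b in zip(p, q) if a is not None and b is not None and a != b)


def single_linkage_groups(profiles: dict[str, Profile], threshold: int) -> list[list[str]]:
    """Group isolates connected by chains of pairs with <= threshold mismatches."""
    ids = list(profiles)
    parent = {i: i for i in ids}  # union-find forest

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for idx, a in enumerate(ids):
        for b in ids[idx + 1:]:
            if allelic_mismatches(profiles[a], profiles[b]) <= threshold:
                parent[find(a)] = find(b)  # merge the two groups

    groups: dict[str, list[str]] = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())
```

    Single linkage lets isolates join a group through intermediates (k1 and k3 below differ at two loci but are linked through k2), which is one reason threshold choice matters when such groups are used as clonal groups.

```python
profiles = {"k1": (1, 1, 1, 1), "k2": (1, 1, 1, 2), "k3": (1, 1, 2, 2), "k4": (5, 5, 5, 5)}
print(single_linkage_groups(profiles, threshold=1))
```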

  9. 3D RNA and functional interactions from evolutionary couplings

    PubMed Central

    Weinreb, Caleb; Riesselman, Adam; Ingraham, John B.; Gross, Torsten; Sander, Chris; Marks, Debora S.

    2016-01-01

    Summary Non-coding RNAs are ubiquitous, but the discovery of new RNA gene sequences far outpaces research on their structure and functional interactions. We mine the evolutionary sequence record to derive precise information about function and structure of RNAs and RNA-protein complexes. As in protein structure prediction, we use maximum entropy global probability models of sequence co-variation to infer evolutionarily constrained nucleotide-nucleotide interactions within RNA molecules, and nucleotide-amino acid interactions in RNA-protein complexes. The predicted contacts allow all-atom blinded 3D structure prediction at good accuracy for several known RNA structures and RNA-protein complexes. For unknown structures, we predict contacts in 160 non-coding RNA families. Beyond 3D structure prediction, evolutionary couplings help identify important functional interactions, e.g., at switch points in riboswitches and at a complex nucleation site in HIV. Aided by accelerating sequence accumulation, evolutionary coupling analysis can accelerate the discovery of functional interactions and 3D structures involving RNA. PMID:27087444

  10. Three ingredients for improved global aftershock forecasts: Tectonic region, time-dependent catalog incompleteness, and inter-sequence variability

    USGS Publications Warehouse

    Page, Morgan T.; Van Der Elst, Nicholas; Hardebeck, Jeanne L.; Felzer, Karen; Michael, Andrew J.

    2016-01-01

    Following a large earthquake, seismic hazard can be orders of magnitude higher than the long‐term average as a result of aftershock triggering. Because of this heightened hazard, emergency managers and the public demand rapid, authoritative, and reliable aftershock forecasts. In the past, U.S. Geological Survey (USGS) aftershock forecasts following large global earthquakes have been released on an ad hoc basis with inconsistent methods, and in some cases aftershock parameters adapted from California. To remedy this, the USGS is currently developing an automated aftershock product based on the Reasenberg and Jones (1989) method that will generate more accurate forecasts. To better capture spatial variations in aftershock productivity and decay, we estimate regional aftershock parameters for sequences within the García et al. (2012) tectonic regions. We find that regional variations for mean aftershock productivity reach almost a factor of 10. We also develop a method to account for the time‐dependent magnitude of completeness following large events in the catalog. In addition to estimating average sequence parameters within regions, we develop an inverse method to estimate the intersequence parameter variability. This allows for a more complete quantification of the forecast uncertainties and Bayesian updating of the forecast as sequence‐specific information becomes available.
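    The Reasenberg and Jones (1989) model referenced above gives the rate of aftershocks with magnitude at least M at time t after a mainshock of magnitude Mm as λ(t, M) = 10^(a + b(Mm - M)) (t + c)^(-p). The sketch below integrates this rate over a forecast window and converts the expected count into the probability of at least one event, assuming a Poisson process. It does not reproduce the USGS regional parameters, the magnitude-of-completeness correction, or the Bayesian updating described in the abstract; the example parameter values are the commonly quoted generic California values and should be treated as placeholders.

```python
import math


def expected_count(a: float, b: float, p: float, c: float,
                   mainshock_mag: float, min_mag: float, t1: float, t2: float) -> float:
    """Integral of the Reasenberg-Jones rate from t1 to t2 (days after the mainshock)."""
    productivity = 10 ** (a + b * (mainshock_mag - min_mag))
    if abs(p - 1.0) < 1e-12:
        # Omori-Utsu integral degenerates to a logarithm when p == 1
        decay = math.log((t2 + c) / (t1 + c))
    else:
        decay = ((t2 + c) ** (1 - p) - (t1 + c) ** (1 - p)) / (1 - p)
    return productivity * decay


def prob_at_least_one(n_expected: float) -> float:
    """Poisson probability of one or more events given the expected count."""
    return 1.0 - math.exp(-n_expected)


# Placeholder generic parameters; forecast M>=5 aftershocks in the first week after an M7
n = expected_count(a=-1.67, b=0.91, p=1.08, c=0.05,
                   mainshock_mag=7.0, min_mag=5.0, t1=0.0, t2=7.0)
print(n, prob_at_least_one(n))
```

    Regional parameter sets such as those estimated in the paper would simply replace a, b, p and c; the inter-sequence variability the authors describe would widen the forecast distribution around this point estimate.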

  11. Twenty-first century vaccinomics innovation systems: capacity building in the global South and the role of Product Development Partnerships (PDPs).

    PubMed

    Huzair, Farah; Borda-Rodriguez, Alexander; Upton, Mary

    2011-09-01

    The availability of sequence information from publicly available complete genomes and data-intensive sciences, together with next-generation sequencing technologies, offers substantial promise for innovation in vaccinology and global public health at the beginning of the 21st century. This article presents an innovation analysis for the nascent field of vaccinomics by describing one of the major challenges in this endeavor: the need for capacities in "vaccinomics innovation systems" to support the developing countries involved in the creation and testing of new vaccines. In particular, we discuss the need to understand how institutional frameworks can enhance capacities, as intrinsic to a systems approach to health technology development. We focus our attention on the global South, meaning the technically less advanced and developing nations in Africa, Asia, and Latin America. This focus is timely and appropriate because the challenge for innovation in postgenomics medicine is much greater in these regions, where basic infrastructures are often under-resourced and new or anticipated institutional relationships can be fragile. Importantly, we examine the role of Product Development Partnerships (PDPs) as a 21st century organizational innovation that contributes to strengthening fragile institutions and building capacity. For vaccinomics innovation systems to stand the test of time in a context of global public health, local communities, knowledge, and cultures need to be collectively taken into account at all stages of programs for vaccinomics-guided vaccine development and delivery in the global South, where the public health needs for rational vaccine development are urgent.

  12. Data Release: DNA barcodes of plant species collected for the Global Genome Initiative for Gardens Program, National Museum of Natural History, Smithsonian Institution

    PubMed Central

    Zúñiga, Jose D.; Gostel, Morgan R.; Mulcahy, Daniel G.; Barker, Katharine; Hill, Asia; Sedaghatpour, Maryam; Vo, Samantha Q.; Funk, Vicki A.; Coddington, Jonathan A.

    2017-01-01

    Abstract The Global Genome Initiative has sequenced and released 1961 DNA barcodes for genetic samples obtained as part of the Global Genome Initiative for Gardens Program. The dataset includes barcodes for 29 plant families and 309 genera that did not have sequences flagged as barcodes in GenBank, and the sequences, which come from officially recognized barcoding genetic markers, meet the data standard of the Consortium for the Barcode of Life. The genetic samples were deposited in the Smithsonian Institution’s National Museum of Natural History Biorepository, and their records were made public through the Global Genome Biodiversity Network’s portal. The DNA barcodes are now available on GenBank. PMID:29118648

  13. Analysis of correlated mutations in HIV-1 protease using spectral clustering.

    PubMed

    Liu, Ying; Eyal, Eran; Bahar, Ivet

    2008-05-15

    The ability of human immunodeficiency virus-1 (HIV-1) protease to develop mutations that confer multi-drug resistance (MDR) has been a major obstacle in designing rational therapies against HIV. Resistance is usually imparted by a cooperative mechanism that can be elucidated by a covariance analysis of sequence data. Identification of such correlated substitutions of amino acids may be obscured by evolutionary noise. HIV-1 protease sequences from patients subjected to different specific treatments (set 1) and from untreated patients (set 2) were subjected to sequence covariance analysis by evaluating the mutual information (MI) between all residue pairs. Spectral clustering of the resulting covariance matrices disclosed two distinctive clusters of correlated residues: the first, observed in set 1 but absent in set 2, contained residues involved in MDR acquisition; the second, referred to as the phylogenetic cluster, included residues that distinguish the various HIV-1 protease subtypes. The MDR cluster occupies sites close to the central symmetry axis of the enzyme, which overlap with the global hinge region identified from coarse-grained normal-mode analysis of the enzyme structure. The phylogenetic cluster, on the other hand, occupies solvent-exposed and highly mobile regions. This study demonstrates (i) the possibility of distinguishing between correlated substitutions resulting from neutral mutations and those induced by MDR through appropriate clustering analysis of sequence covariance data and (ii) a connection between global dynamics and functional substitution of amino acids.
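    The covariance step described above evaluates mutual information between residue positions of a multiple sequence alignment. The sketch below computes MI(i, j) = Σ P(x, y) ln[P(x, y) / (P(x) P(y))] for every pair of alignment columns using raw frequencies. It is a plain illustration under stated assumptions: no pseudocounts, sequence weighting or noise correction are applied (the study's exact preprocessing is not given in the abstract), the diagonal is simply set to zero, and the spectral clustering of the resulting matrix is not shown. The function names are hypothetical.

```python
import math
from collections import Counter


def column_mi(col_i: list[str], col_j: list[str]) -> float:
    """Mutual information (in nats) between two alignment columns."""
    n = len(col_i)
    count_i, count_j = Counter(col_i), Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (x, y), c in count_ij.items():
        p_xy = c / n
        mi += p_xy * math.log(p_xy / ((count_i[x] / n) * (count_j[y] / n)))
    return mi


def mi_matrix(alignment: list[str]) -> list[list[float]]:
    """Pairwise MI for all columns of an equal-length alignment (diagonal set to 0)."""
    length = len(alignment[0])
    cols = [[seq[k] for seq in alignment] for k in range(length)]
    return [[column_mi(cols[i], cols[j]) if i != j else 0.0 for j in range(length)]
            for i in range(length)]
```

    In the toy alignment ["ALK", "ALK", "VMK", "VMK"], positions 0 and 1 co-vary perfectly and score ln 2, while the invariant position 2 carries no information about either and scores zero.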

  14. Analysis of the global transcriptome of longan (Dimocarpus longan Lour.) embryogenic callus using Illumina paired-end sequencing

    PubMed Central

    2013-01-01

    Background Longan is a tropical/subtropical fruit tree of great economic importance in Southeast Asia. Progress in understanding the molecular mechanisms of longan embryogenesis, which is the primary influence on fruit quality and yield, is slowed by a lack of transcriptomic and genomic information. Illumina second-generation sequencing is suitable for generating enormous numbers of transcript sequences that can be used for functional genomic analysis of longan. Results In this study, a longan embryogenic callus (EC) cDNA library was sequenced using an Illumina HiSeq 2000 system. A total of 64,876,258 clean reads comprising 5.84 Gb of nucleotides were assembled into 68,925 unigenes of 448-bp mean length, with unigenes ≥1000 bp accounting for 8.26% of the total. Using BLASTx, 40,634 unigenes were found to have significant similarity with accessions in the Nr and Swiss-Prot databases. Of these, 38,845 unigenes were assigned to 43 GO sub-categories and 17,118 unigenes were classified into 25 COG sub-groups. In addition, 17,306 unigenes mapped to 199 KEGG pathways, with the categories of Metabolic pathways, Plant-pathogen interaction, Biosynthesis of secondary metabolites, and Genetic information processing being well represented. Analyses of unigenes ≥1000 bp revealed 328 embryogenesis-related unigenes as well as numerous unigenes expressed in EC associated with functions of reproductive growth, such as flowering, gametophytogenesis, and fertility, and vegetative growth, such as root and shoot growth. Furthermore, 23 unigenes related to embryogenesis and reproductive and vegetative growth were validated by quantitative real-time PCR (qPCR) in samples from different stages of longan somatic embryogenesis (SE); their differential expression in the various embryogenic cultures indicated their possible roles in longan SE. Conclusions The quantity and variety of expressed EC genes identified in this study is sufficient to serve as a global transcriptome dataset for longan EC and to provide more molecular resources for longan functional genomics. PMID:23957614

  15. Fast online and index-based algorithms for approximate search of RNA sequence-structure patterns

    PubMed Central

    2013-01-01

    Background It is well known that the search for homologous RNAs is more effective if both sequence and structure information is incorporated into the search. However, current tools for searching with RNA sequence-structure patterns cannot fully handle mutations occurring on both these levels or are simply not fast enough for searching large sequence databases because of the high computational costs of the underlying sequence-structure alignment problem. Results We present new fast index-based and online algorithms for approximate matching of RNA sequence-structure patterns supporting a full set of edit operations on single bases and base pairs. Our methods efficiently compute semi-global alignments of structural RNA patterns and substrings of the target sequence whose costs satisfy a user-defined sequence-structure edit distance threshold. For this purpose, we introduce a new computing scheme to optimally reuse the entries of the required dynamic programming matrices for all substrings and combine it with a technique for avoiding the alignment computation of non-matching substrings. Our new index-based methods exploit suffix arrays preprocessed from the target database and achieve running times that are sublinear in the size of the searched sequences. To support the description of RNA molecules that fold into complex secondary structures with multiple ordered sequence-structure patterns, we use fast algorithms for the local or global chaining of approximate sequence-structure pattern matches. The chaining step removes spurious matches from the set of intermediate results, in particular those of patterns with little specificity. In benchmark experiments on the Rfam database, our improved online algorithm is faster than the best previous method by up to a factor of 45. Our best new index-based algorithm achieves a speedup by a factor of 560. Conclusions The presented methods achieve considerable speedups compared to the best previous method. This, together with the expected sublinear running time of the presented index-based algorithms, allows for the first time approximate matching of RNA sequence-structure patterns in large sequence databases. Beyond the algorithmic contributions, we provide RaligNAtor, a robust and well-documented open-source software package implementing the algorithms presented in this manuscript. The RaligNAtor software is available at http://www.zbh.uni-hamburg.de/ralignator. PMID:23865810
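    To illustrate what a semi-global alignment against substrings of a target under an edit-distance threshold means, the sketch below implements the classic sequence-only case (Sellers-style dynamic programming): the whole pattern must be aligned, but the match may start and end anywhere in the text, and every end position whose edit distance is at most k is reported. This is a deliberately simplified stand-in, not the RaligNAtor algorithm: it ignores secondary structure and base-pair edit operations, uses unit costs, and does not use suffix arrays or matrix-entry reuse. The function name is hypothetical.

```python
def approx_matches(pattern: str, text: str, k: int) -> list[tuple[int, int]]:
    """Return (1-based end position, edit distance) for every text position where
    some substring ending there matches the full pattern with distance <= k."""
    m = len(pattern)
    col = list(range(m + 1))  # distance of pattern prefixes against an empty substring
    hits = []
    for j, t in enumerate(text, start=1):
        new = [0]  # row 0 stays 0: a match may start at any text position (free start)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == t else 1
            new.append(min(col[i - 1] + cost,  # match or substitution
                           col[i] + 1,         # extra text character
                           new[i - 1] + 1))    # pattern character left unmatched
        col = new
        if col[m] <= k:  # free end: report every qualifying end position
            hits.append((j, col[m]))
    return hits


print(approx_matches("ACG", "TTACGTT", k=1))
```

    Only one column of the dynamic programming matrix is kept at a time, so memory is proportional to the pattern length while running time is proportional to pattern length times text length; the index-based methods in the abstract aim to beat this linear scan over the target.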

  16. Striatal and Hippocampal Involvement in Motor Sequence Chunking Depends on the Learning Strategy

    PubMed Central

    Lungu, Ovidiu; Monchi, Oury; Albouy, Geneviève; Jubault, Thomas; Ballarin, Emanuelle; Burnod, Yves; Doyon, Julien

    2014-01-01

    Motor sequences can be learned using an incremental approach by starting with a few elements and then adding more as training evolves (e.g., learning a piano piece); conversely, one can use a global approach and practice the whole sequence in every training session (e.g., shifting gears in an automobile). Yet, the neural correlates associated with such learning strategies in motor sequence learning remain largely unexplored to date. Here we used functional magnetic resonance imaging to measure the cerebral activity of individuals executing the same 8-element sequence after they completed a 4-day training regimen (2 sessions each day) following either a global or an incremental strategy. A network composed of striatal and fronto-parietal regions was engaged significantly regardless of the learning strategy, whereas the global training regimen led to additional cerebellar and temporal lobe recruitment. Analysis of chunking/grouping of sequence elements revealed a common prefrontal network in both conditions during the chunk initiation phase, whereas execution of chunk cores led to higher mediotemporal activity (involving the hippocampus) after global than after incremental training. The novelty of our results relates to the recruitment of mediotemporal regions conditional on the learning strategy. Thus, the present findings may have clinical implications, suggesting that patients with lesions to the medial temporal lobe may benefit from using an incremental strategy to learn and consolidate new motor sequences. PMID:25148078

  17. Striatal and hippocampal involvement in motor sequence chunking depends on the learning strategy.

    PubMed

    Lungu, Ovidiu; Monchi, Oury; Albouy, Geneviève; Jubault, Thomas; Ballarin, Emanuelle; Burnod, Yves; Doyon, Julien

    2014-01-01

    Motor sequences can be learned using an incremental approach by starting with a few elements and then adding more as training evolves (e.g., learning a piano piece); conversely, one can use a global approach and practice the whole sequence in every training session (e.g., shifting gears in an automobile). Yet, the neural correlates associated with such learning strategies in motor sequence learning remain largely unexplored to date. Here we used functional magnetic resonance imaging to measure the cerebral activity of individuals executing the same 8-element sequence after they completed a 4-day training regimen (2 sessions each day) following either a global or an incremental strategy. A network composed of striatal and fronto-parietal regions was engaged significantly regardless of the learning strategy, whereas the global training regimen led to additional cerebellar and temporal lobe recruitment. Analysis of chunking/grouping of sequence elements revealed a common prefrontal network in both conditions during the chunk initiation phase, whereas execution of chunk cores led to higher mediotemporal activity (involving the hippocampus) after global than after incremental training. The novelty of our results relates to the recruitment of mediotemporal regions conditional on the learning strategy. Thus, the present findings may have clinical implications, suggesting that patients with lesions to the medial temporal lobe may benefit from using an incremental strategy to learn and consolidate new motor sequences.

  18. In the search for the low-complexity sequences in prokaryotic and eukaryotic genomes: how to derive a coherent picture from global and local entropy measures

    NASA Astrophysics Data System (ADS)

    Acquisti, Claudia; Allegrini, Paolo; Bogani, Patrizia; Buiatti, Marcello; Catanese, Elena; Fronzoni, Leone; Grigolini, Paolo; Mersi, Giuseppe; Palatella, Luigi

    2004-04-01

    We investigate a possible way to connect the presence of Low-Complexity Sequences (LCS) in DNA genomes with the nonstationary properties of base correlations. Under the hypothesis that these variations signal a change in DNA function, we use a new technique, called the Non-Stationarity Entropic Index (NSEI) method, and we prove that this technique is an efficient way to detect functional changes with respect to a random baseline. The remarkable aspect is that NSEI does not require any training data or fitting parameter, the only arbitrary element being the choice of a marker in the sequence. We make this choice on the basis of biological information about LCS distributions in genomes. We show that there is a correlation between changes in LCS content and the ratio of long- to short-range correlation.
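    For readers unfamiliar with the global-versus-local distinction in the title, the sketch below contrasts the Shannon entropy of a whole sequence with entropies computed in sliding windows, a common generic way of flagging low-complexity stretches. It is not the NSEI method, whose details are not given in the abstract; the window size, step, and function names are assumptions chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits per symbol) of the symbol composition of seq."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def window_entropies(seq: str, window: int, step: int = 1) -> list[tuple[int, float]]:
    """Local entropy for each window start position (0-based)."""
    return [(i, shannon_entropy(seq[i:i + window]))
            for i in range(0, len(seq) - window + 1, step)]


seq = "AAAAACGTACGT"
print(shannon_entropy(seq))          # one global value for the whole sequence
print(window_entropies(seq, 4))      # local values; the poly-A run gives windows near 0
```

    A single global value averages over the poly-A run and the more varied tail, whereas the windowed profile localizes the low-complexity stretch; the window size trades positional resolution against noisier per-window estimates.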

  19. DNA enrichment approaches to identify unauthorized genetically modified organisms (GMOs).

    PubMed

    Arulandhu, Alfred J; van Dijk, Jeroen P; Dobnik, David; Holst-Jensen, Arne; Shi, Jianxin; Zel, Jana; Kok, Esther J

    2016-07-01

    With the increased global production of different genetically modified (GM) plant varieties, chances increase that unauthorized GM organisms (UGMOs) may enter the food chain. At the same time, the detection of UGMOs is a challenging task because of the limited sequence information that will generally be available. PCR-based methods are available to detect and quantify known UGMOs in specific cases. If this approach is not feasible, DNA enrichment of the unknown adjacent sequences of known GMO elements is one way to detect the presence of UGMOs in a food or feed product. These enrichment approaches are also known as chromosome walking or gene walking (GW). In recent years, enrichment approaches have been coupled with next generation sequencing (NGS) analysis and implemented in, amongst others, the medical and microbiological fields. The present review will provide an overview of these approaches and an evaluation of their applicability in the identification of UGMOs in complex food or feed samples.

  20. Exploiting the explosion of information associated with whole genome sequencing to tackle Shiga toxin-producing Escherichia coli (STEC) in global food production systems.

    PubMed

    Franz, Eelco; Delaquis, Pascal; Morabito, Stefano; Beutin, Lothar; Gobius, Kari; Rasko, David A; Bono, Jim; French, Nigel; Osek, Jacek; Lindstedt, Bjørn-Arne; Muniesa, Maite; Manning, Shannon; LeJeune, Jeff; Callaway, Todd; Beatson, Scott; Eppinger, Mark; Dallman, Tim; Forbes, Ken J; Aarts, Henk; Pearl, David L; Gannon, Victor P J; Laing, Chad R; Strachan, Norval J C

    2014-09-18

    The rates of foodborne disease caused by gastrointestinal pathogens continue to be a concern in both the developed and developing worlds. The growing world population, the increasing complexity of agri-food networks and the wide range of foods now associated with STEC are potential drivers for increased risk of human disease. It is vital that new developments in technology, such as whole genome sequencing (WGS), are effectively utilized to help address the issues associated with these pathogenic microorganisms. This position paper, arising from an OECD-funded workshop, provides a brief overview of next generation sequencing technologies and software. It then uses the agent-host-environment paradigm as a basis to investigate the potential benefits and pitfalls of WGS in the examination of (1) the evolution and virulence of STEC, (2) epidemiology, from bedside diagnostics to investigations of outbreaks and sporadic cases, and (3) food protection, from routine analysis of foodstuffs to global food networks. A number of key recommendations are made that include: validation and standardization of acquisition, processing and storage of sequence data, including the development of an open access "WGSNET"; building up of sequence databases from both prospective and retrospective isolates; development of a suite of open-access software specific to STEC, accessible to non-bioinformaticians, that promotes understanding of both the computational and biological aspects of the problems at hand; prioritization of research funding to both produce and integrate genotypic and phenotypic information suitable for risk assessment; training to develop a supply of individuals working in bioinformatics/software development; training for clinicians, epidemiologists, the food industry and other stakeholders to ensure uptake of the technology; and, finally, review of progress in the implementation of WGS. Currently, the benefits of WGS are being slowly teased out by academic, government, and industry or private sector researchers around the world. The next phase will require a coordinated international approach to ensure that its potential to help meet the challenge of STEC disease can be realized in a cost-effective and timely manner. Copyright © 2014. Published by Elsevier B.V.

  1. Insights into the phylogeny of Northern Hemisphere Armillaria: Neighbor-net and Bayesian analyses of translation elongation factor 1-α gene sequences.

    PubMed

    Klopfenstein, Ned B; Stewart, Jane E; Ota, Yuko; Hanna, John W; Richardson, Bryce A; Ross-Davis, Amy L; Elías-Román, Rubén D; Korhonen, Kari; Keča, Nenad; Iturritxa, Eugenia; Alvarado-Rosales, Dionicio; Solheim, Halvor; Brazee, Nicholas J; Łakomy, Piotr; Cleary, Michelle R; Hasegawa, Eri; Kikuchi, Taisei; Garza-Ocañas, Fortunato; Tsopelas, Panaghiotis; Rigling, Daniel; Prospero, Simone; Tsykun, Tetyana; Bérubé, Jean A; Stefani, Franck O P; Jafarpour, Saeideh; Antonín, Vladimír; Tomšovský, Michal; McDonald, Geral I; Woodward, Stephen; Kim, Mee-Sook

    2017-01-01

    Armillaria possesses several intriguing characteristics that have inspired wide interest in understanding phylogenetic relationships within and among species of this genus. Nuclear ribosomal DNA sequence-based analyses of Armillaria provide only limited information for phylogenetic studies among widely divergent taxa. More recent studies have shown that translation elongation factor 1-α (tef1) sequences are highly informative for phylogenetic analysis of Armillaria species within diverse global regions. This study used Neighbor-net and coalescence-based Bayesian analyses to examine phylogenetic relationships of newly determined and existing tef1 sequences derived from diverse Armillaria species from across the Northern Hemisphere, with Southern Hemisphere Armillaria species included for reference. Based on the Bayesian analysis of tef1 sequences, Armillaria species from the Northern Hemisphere are generally contained within the following four superclades, which are named according to the specific epithet of the most frequently cited species within the superclade: (i) Socialis/Tabescens (exannulate) superclade including Eurasian A. ectypa, North American A. socialis (A. tabescens), and Eurasian A. socialis (A. tabescens) clades; (ii) Mellea superclade including undescribed annulate North American Armillaria sp. (Mexico) and four separate clades of A. mellea (Europe and Iran, eastern Asia, and two groups from North America); (iii) Gallica superclade including Armillaria Nag E (Japan), multiple clades of A. gallica (Asia and Europe), A. calvescens (eastern North America), A. cepistipes (North America), A. altimontana (western USA), A. nabsnona (North America and Japan), and at least two A. gallica clades (North America); and (iv) Solidipes/Ostoyae superclade including two A. solidipes/ostoyae clades (North America), A. gemina (eastern USA), A. solidipes/ostoyae (Eurasia), A. cepistipes (Europe and Japan), A. sinapina (North America and Japan), and A. borealis (Eurasia) clade 2. Of note is that A. borealis (Eurasia) clade 1 appears basal to the Solidipes/Ostoyae and Gallica superclades. The Neighbor-net analysis showed similar phylogenetic relationships. This study further demonstrates the utility of tef1 for global phylogenetic studies of Armillaria species and provides critical insights into multiple taxonomic issues that warrant further study.

  2. Application of Quaternion in improving the quality of global sequence alignment scores for an ambiguous sequence target in Streptococcus pneumoniae DNA

    NASA Astrophysics Data System (ADS)

    Lestari, D.; Bustamam, A.; Novianti, T.; Ardaneswari, G.

    2017-07-01

    A DNA sequence can be defined as a succession of letters representing the order of nucleotides within DNA, drawn from the four DNA base codes adenine (A), guanine (G), cytosine (C), and thymine (T). The precise sequence is determined using DNA sequencing methods and technologies, which have been developed since the 1970s and have since become highly advanced, high-throughput technologies. DNA sequencing has greatly accelerated biological and medical research and discovery. However, in some cases DNA sequencing can produce ambiguous results in which it is difficult to determine whether a base is A, T, G, or C. To address this problem, in this study we introduce an alternative representation of DNA bases, namely the Quaternion Q = (PA, PT, PG, PC), where PA, PT, PG, and PC are the probabilities that A, T, G, and C, respectively, appear in Q, and PA + PT + PG + PC = 1. Using Quaternion representations, we construct an improved scoring matrix for global sequence alignment by applying a dot product method. This scoring matrix produces higher-quality match and mismatch scores between two DNA base codes. For the implementation, we applied the Needleman-Wunsch global sequence alignment algorithm in Octave to analyze our target sequence, which contains some ambiguous sequence data. The subject sequences are DNA sequences of Streptococcus pneumoniae families obtained from GenBank, while the target DNA sequence was obtained from our collaborator's database. We found that the Quaternion representation improves the quality of the sequence alignment score, and we conclude that the target DNA sequence has maximum similarity with Streptococcus pneumoniae.
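
    The scoring idea can be illustrated with a minimal Python sketch (the study itself used Octave). It assumes a few IUPAC-style ambiguity codes mapped to probability vectors, and a linear mapping of the dot product (the probability that two positions carry the same base) onto a score between -1 and +1; these choices and the gap penalty are illustrative, not the authors' values.

```python
# Sketch: probability-vector ("Quaternion") base encoding + Needleman-Wunsch.
# Assumptions (not from the study): the ambiguity-code table below, the
# linear mapping of the dot product onto [MISMATCH, MATCH], and the gap cost.

# Component order follows Q = (PA, PT, PG, PC); every vector sums to 1.
QUATERNION = {
    "A": (1.0, 0.0, 0.0, 0.0),
    "T": (0.0, 1.0, 0.0, 0.0),
    "G": (0.0, 0.0, 1.0, 0.0),
    "C": (0.0, 0.0, 0.0, 1.0),
    "R": (0.5, 0.0, 0.5, 0.0),      # A or G
    "Y": (0.0, 0.5, 0.0, 0.5),      # T or C
    "N": (0.25, 0.25, 0.25, 0.25),  # any base
}

MATCH, MISMATCH, GAP = 1.0, -1.0, -2.0


def base_score(a: str, b: str) -> float:
    """Dot product = probability that the two positions carry the same base."""
    p_same = sum(x * y for x, y in zip(QUATERNION[a], QUATERNION[b]))
    return MISMATCH + (MATCH - MISMATCH) * p_same


def needleman_wunsch(s: str, t: str) -> float:
    """Optimal global alignment score with a linear gap penalty."""
    n, m = len(s), len(t)
    f = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f[i][0] = i * GAP  # s[:i] aligned against gaps only
    for j in range(1, m + 1):
        f[0][j] = j * GAP  # t[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = max(
                f[i - 1][j - 1] + base_score(s[i - 1], t[j - 1]),  # (mis)match
                f[i - 1][j] + GAP,                                 # gap in t
                f[i][j - 1] + GAP,                                 # gap in s
            )
    return f[n][m]
```

    Under these assumptions an N opposite T contributes -0.5 instead of the -1 of a definite mismatch, so needleman_wunsch("ACGT", "ACGN") returns 2.5 while needleman_wunsch("ACGT", "ACGA") returns 2.0.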

  3. Ethical issues in consumer genome sequencing: Use of consumers' samples and data

    PubMed Central

    Niemiec, Emilia; Howard, Heidi Carmen

    2016-01-01

    High throughput approaches such as whole genome sequencing (WGS) and whole exome sequencing (WES) create an unprecedented amount of data providing powerful resources for clinical care and research. Recently, WGS and WES services have been made available by commercial direct-to-consumer (DTC) companies. The DTC offer of genetic testing (GT) has already brought attention to potentially problematic issues such as the adequacy of consumers' informed consent and transparency of companies' research activities. In this study, we analysed the websites of four DTC GT companies offering WGS and/or WES with regard to their policies governing storage and future use of consumers' data and samples. The results are discussed in relation to recommendations and guiding principles such as the “Statement of the European Society of Human Genetics on DTC GT for health-related purposes” (2010) and the “Framework for responsible sharing of genomic and health-related data” (Global Alliance for Genomics and Health, 2014). The analysis reveals that some companies may store and use consumers' samples or sequencing data for unspecified research and share the data with third parties. Moreover, the companies do not provide sufficient or clear information to consumers about this, which can undermine the validity of the consent process. Furthermore, while all companies state that they provide privacy safeguards for data and mention the limitations of these, information about the possibility of re-identification is lacking. Finally, although the companies that may conduct research do include information regarding proprietary claims and commercialisation of the results, it is not clear whether consumers are aware of the consequences of these policies. These results indicate that DTC GT companies still need to improve the transparency regarding handling of consumers' samples and data, including having an explicit and clear consent process for research activities. PMID:27047756

  4. Identification and analysis of multigene families by comparison of exon fingerprints.

    PubMed

    Brown, N P; Whittaker, A J; Newell, W R; Rawlings, C J; Beck, S

    1995-06-02

    Gene families are often recognised by sequence homology, using similarity searching to find relationships; however, genomic sequence data provide gene architectural information that conventional search methods do not use. In particular, intron positions and phases are expected to be relatively conserved features, because mis-splicing and reading-frame shifts should be selected against. A fast search technique capable of detecting weak sequence homologies apparent at the intron/exon level of gene organization is presented for comparing spliceosomal genes and gene fragments. FINEX compares strings of exons delimited by intron/exon boundary positions and intron phases (exon fingerprints) using a global dynamic programming algorithm with a combined intron phase identity and exon size dissimilarity score. Exon fingerprints are typically two orders of magnitude smaller than their nucleic acid sequence counterparts, giving rise to fast search times: a ranked search against a library of 6755 fingerprints for a typical three-exon fingerprint completes in under 30 seconds on an ordinary workstation, while a worst-case largest fingerprint of 52 exons completes in just over one minute. The short "sequence" length of exon fingerprints in comparisons is compensated for by the large exon alphabet compounded of intron phase types and a wide range of exon sizes, the latter contributing the most information to alignments. FINEX performs better in some searches than conventional methods, finding matches with similar exon organization but low sequence homology. A search using human serum albumin finds all members of the multigene family in the FINEX database at the top of the search ranking, despite very low amino acid percentage identities between family members. The method should complement conventional sequence searching and alignment techniques, offering a means of identifying otherwise hard-to-detect homologies where genomic data are available.
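
    A hypothetical Python sketch of aligning exon fingerprints with global dynamic programming follows. Representing each exon as (length, downstream intron phase), and the particular phase bonus, size-dissimilarity ratio, and gap cost, are assumptions for illustration; FINEX's actual scoring function is not reproduced here.

```python
# Hypothetical exon-fingerprint alignment in the spirit of FINEX.
# Assumptions: each exon is (length in bp, phase of the downstream intron,
# with -1 for the terminal exon); scoring weights and gap cost are invented.
from typing import Dict, List, Tuple

Exon = Tuple[int, int]

PHASE_MATCH, PHASE_MISMATCH, GAP = 1.0, -1.0, -1.0


def exon_score(a: Exon, b: Exon) -> float:
    """Intron-phase identity term minus a size-dissimilarity term in [0, 1)."""
    (len_a, phase_a), (len_b, phase_b) = a, b
    phase_term = PHASE_MATCH if phase_a == phase_b else PHASE_MISMATCH
    size_dissimilarity = abs(len_a - len_b) / max(len_a, len_b)
    return phase_term - size_dissimilarity


def align_fingerprints(x: List[Exon], y: List[Exon]) -> float:
    """Global (Needleman-Wunsch-style) alignment score of two fingerprints."""
    n, m = len(x), len(y)
    f = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f[i][0] = i * GAP
    for j in range(1, m + 1):
        f[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = max(f[i - 1][j - 1] + exon_score(x[i - 1], y[j - 1]),
                          f[i - 1][j] + GAP,
                          f[i][j - 1] + GAP)
    return f[n][m]


def ranked_search(query: List[Exon], library: Dict[str, List[Exon]]) -> List[str]:
    """Library entry names, best-scoring first."""
    return sorted(library, key=lambda name: align_fingerprints(query, library[name]),
                  reverse=True)
```

    For example, with library = {"geneA": [(120, 0), (85, 1), (200, -1)], "geneB": [(300, 2), (40, -1)]}, the query [(118, 0), (90, 1), (205, -1)] ranks geneA first, because its intron phases agree and its exon sizes differ only slightly.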

  5. A draft annotation and overview of the human genome

    PubMed Central

    Wright, Fred A; Lemon, William J; Zhao, Wei D; Sears, Russell; Zhuo, Degen; Wang, Jian-Ping; Yang, Hee-Yung; Baer, Troy; Stredney, Don; Spitzner, Joe; Stutz, Al; Krahe, Ralf; Yuan, Bo

    2001-01-01

    Background The recent draft assembly of the human genome provides a unified basis for describing genomic structure and function. The draft is sufficiently accurate to provide useful annotation, enabling direct observations of previously inferred biological phenomena. Results We report here a functionally annotated human gene index placed directly on the genome. The index is based on the integration of public transcript, protein, and mapping information, supplemented with computational prediction. We describe numerous global features of the genome and examine the relationship of various genetic maps with the assembly. In addition, initial sequence analysis reveals highly ordered chromosomal landscapes associated with paralogous gene clusters and distinct functional compartments. Finally, these annotation data were synthesized to produce observations of gene density and number that accord well with historical estimates. Such a global approach had previously been described only for chromosomes 21 and 22, which together account for 2.2% of the genome. Conclusions We estimate that the genome contains 65,000-75,000 transcriptional units, with exon sequences comprising 4%. The creation of a comprehensive gene index requires the synthesis of all available computational and experimental evidence. PMID:11516338

  6. Building toy models of proteins using coevolutionary information

    NASA Astrophysics Data System (ADS)

    Cheng, Ryan; Raghunathan, Mohit; Onuchic, Jose

    2015-03-01

    Recent developments in global statistical methodologies have advanced the analysis of large collections of protein sequences for coevolutionary information. Coevolution between amino acids in a protein arises from compensatory mutations that are needed to maintain the stability or function of a protein over the course of evolution. This gives rise to quantifiable correlations between amino acid positions within the multiple sequence alignment of a protein family. Here, we use Direct Coupling Analysis (DCA) to infer a Potts model Hamiltonian governing the correlated mutations in a protein family to obtain the sequence-dependent interaction energies of a toy protein model. We demonstrate that this methodology predicts residue-residue interaction energies that are consistent with experimental mutational changes in protein stabilities as well as other computational methodologies. Furthermore, we demonstrate with several examples that DCA could be used to construct a structure-based model that quantitatively agrees with experimental data on folding mechanisms. This work serves as a potential framework for generating models of proteins that are enriched by evolutionary data that can potentially be used to engineer key functional motions and interactions in protein systems. This research has been supported by the NSF INSPIRE award MCB-1241332 and by the CTBP sponsored by the NSF (Grant PHY-1427654).
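
    The Potts Hamiltonian that DCA infers can be evaluated for any sequence once its fields h and couplings J are known. The Python sketch below assumes one common sign convention and uses hand-set, hypothetical parameters in place of values inferred from a multiple sequence alignment; the inference step itself is not shown.

```python
# Sketch of a Potts-model energy as used in DCA-style models.
# Sign convention (an assumption; conventions vary):
#   E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j)
# Lower energy = more favourable under the model.
from typing import Dict, List, Tuple

Fields = List[Dict[str, float]]                                  # h_i(a) per position
Couplings = Dict[Tuple[int, int], Dict[Tuple[str, str], float]]  # J_ij(a, b), i < j


def potts_energy(seq: str, h: Fields, J: Couplings) -> float:
    energy = 0.0
    for i, aa in enumerate(seq):
        energy -= h[i].get(aa, 0.0)                  # single-site field terms
    for (i, j), table in J.items():
        energy -= table.get((seq[i], seq[j]), 0.0)   # pairwise coupling terms
    return energy


def mutation_delta(seq: str, pos: int, new_aa: str, h: Fields, J: Couplings) -> float:
    """Energy change for one substitution; positive = destabilizing in this model."""
    mutant = seq[:pos] + new_aa + seq[pos + 1:]
    return potts_energy(mutant, h, J) - potts_energy(seq, h, J)
```

    With h = [{"K": 0.2}, {}, {"A": 0.1}] and a single coupling J = {(0, 1): {("K", "D"): 1.5}}, replacing D at position 1 of "KDA" by K removes the favourable K-D coupling, so mutation_delta("KDA", 1, "K", h, J) returns about +1.5.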

  7. Cortical neurons of bats respond best to echoes from nearest targets when listening to natural biosonar multi-echo streams.

    PubMed

    Beetz, M Jerome; Hechavarría, Julio C; Kössl, Manfred

    2016-10-27

    Bats orientate in darkness by listening to echoes from their biosonar calls, a behaviour known as echolocation. Recent studies showed that cortical neurons respond in a highly selective manner when stimulated with natural echolocation sequences that contain echoes from single targets. However, it remains unknown how cortical neurons process echolocation sequences containing echo information from multiple objects. In the present study, we used echolocation sequences containing echoes from three, two, or one object separated in depth as stimuli to study neuronal activity in the bat auditory cortex. Neuronal activity was recorded with multi-electrode arrays placed in the dorsal auditory cortex, where neurons tuned to target distance are found. Our results show that target-distance encoding neurons are mostly selective to echoes coming from the closest object, and that the representation of echo information from distant objects is selectively suppressed. This suppression extends over a large part of the dorsal auditory cortex and may override possible parallel processing of multiple objects. The presented data suggest that global cortical suppression might establish a cortical "default mode" that allows selectively focusing on close obstacles even without active attention from the animals.

  8. Cortical neurons of bats respond best to echoes from nearest targets when listening to natural biosonar multi-echo streams

    PubMed Central

    Beetz, M. Jerome; Hechavarría, Julio C.; Kössl, Manfred

    2016-01-01

    Bats orientate in darkness by listening to echoes from their biosonar calls, a behaviour known as echolocation. Recent studies showed that cortical neurons respond in a highly selective manner when stimulated with natural echolocation sequences that contain echoes from single targets. However, it remains unknown how cortical neurons process echolocation sequences containing echo information from multiple objects. In the present study, we used echolocation sequences containing echoes from three, two, or one object separated in depth as stimuli to study neuronal activity in the bat auditory cortex. Neuronal activity was recorded with multi-electrode arrays placed in the dorsal auditory cortex, where neurons tuned to target distance are found. Our results show that target-distance encoding neurons are mostly selective to echoes coming from the closest object, and that the representation of echo information from distant objects is selectively suppressed. This suppression extends over a large part of the dorsal auditory cortex and may override possible parallel processing of multiple objects. The presented data suggest that global cortical suppression might establish a cortical “default mode” that allows selectively focusing on close obstacles even without active attention from the animals. PMID:27786252

  9. ocsESTdb: a database of oil crop seed EST sequences for comparative analysis and investigation of a global metabolic network and oil accumulation metabolism.

    PubMed

    Ke, Tao; Yu, Jingyin; Dong, Caihua; Mao, Han; Hua, Wei; Liu, Shengyi

    2015-01-21

    Oil crop seeds are important sources of fatty acids (FAs) for human and animal nutrition. Despite their importance, an essential bioinformatics resource on gene transcription of oil crops from a comparative perspective has been lacking. In this study, we developed ocsESTdb, the first database of expressed sequence tag (EST) information on seeds of four large-scale oil crops with an emphasis on global metabolic networks and oil accumulation metabolism that target the involved unigenes. A total of 248,522 ESTs and 106,835 unigenes were collected from the cDNA libraries of rapeseed (Brassica napus), soybean (Glycine max), sesame (Sesamum indicum) and peanut (Arachis hypogaea). These unigenes were annotated by a sequence similarity search against databases including TAIR, NR protein database, Gene Ontology, COG, Swiss-Prot, TrEMBL and Kyoto Encyclopedia of Genes and Genomes (KEGG). Five genome-scale metabolic networks that contain different numbers of metabolites and gene-enzyme reaction-association entries were constructed and analysed using Cytoscape and yEd programs. Details of unigene entries, deduced amino acid sequences and putative annotation are available from our database to browse, search and download. Intuitive and graphical representations of EST/unigene sequences, functional annotations, metabolic pathways and metabolic networks are also available. ocsESTdb will be updated regularly and can be freely accessed at http://ocri-genomics.org/ocsESTdb/ . ocsESTdb may serve as a valuable and unique resource for comparative analysis of acyl lipid synthesis and metabolism in oilseed plants. It also may provide vital insights into improving oil content in seeds of oil crop species by transcriptional reconstruction of the metabolic network.

  10. Global DNA methylation analysis using methyl-sensitive amplification polymorphism (MSAP).

    PubMed

    Yaish, Mahmoud W; Peng, Mingsheng; Rothstein, Steven J

    2014-01-01

    DNA methylation is a crucial epigenetic process that helps control gene transcription activity in eukaryotes. Information regarding the methylation status of a regulatory sequence of a particular gene provides important knowledge of this transcriptional control. DNA methylation can be detected using several methods, including sodium bisulfite sequencing and restriction digestion using methylation-sensitive endonucleases. Methyl-Sensitive Amplification Polymorphism (MSAP) is a technique used to study the global DNA methylation status of an organism and hence to distinguish between two individuals based on the DNA methylation status determined by the differential digestion pattern. Therefore, this technique is a useful method for DNA methylation mapping and positional cloning of differentially methylated genes. In this technique, genomic DNA is first digested with a methylation-sensitive restriction enzyme such as HpaII, and then the DNA fragments are ligated to adaptors in order to facilitate their amplification. MspI, a methylation-insensitive isoschizomer of HpaII, is used in a parallel digestion reaction as a loading control in the experiment. Subsequently, these fragments are selectively amplified by fluorescently labeled primers. PCR products from different individuals are compared, and once an interesting polymorphic locus is recognized, the desired DNA fragment can be isolated from a denaturing polyacrylamide gel, sequenced and identified based on DNA sequence similarity to other sequences available in the database. We use the analysis of met1, ddm1, and atmbd9 mutants, and of wild-type plants treated with a cytidine analogue (5-azaC or zebularine), to demonstrate how to assess the genetic modulation of DNA methylation in Arabidopsis. 
It should be noted that despite the fact that MSAP is a reliable technique used to fish for polymorphic methylated loci, its power is limited to the restriction recognition sites of the enzymes used in the genomic DNA digestion.
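
    Comparing the HpaII and MspI lanes across individuals can be sketched in Python as below. The profiles are presence/absence calls per fragment; the interpretation table follows one commonly used reading of the two isoschizomers' differing behaviour at methylated CCGG sites, but conventions vary between studies, and the fragment names in the example are hypothetical.

```python
# Sketch: reading MSAP band patterns and finding polymorphic loci.
# Each locus maps to a (HpaII band, MspI band) presence/absence pair (1/0).
# The interpretation labels follow a commonly used reading of HpaII/MspI
# patterns; conventions differ between studies, so treat them as an assumption.
from typing import Dict, List, Tuple

Pattern = Tuple[int, int]  # (HpaII lane, MspI lane)

INTERPRETATION = {
    (1, 1): "unmethylated CCGG",
    (1, 0): "hemimethylated external C",
    (0, 1): "methylated internal C",
    (0, 0): "no band: external C fully methylated, or CCGG site absent",
}


def interpret(pattern: Pattern) -> str:
    return INTERPRETATION[pattern]


def polymorphic_loci(profiles: Dict[str, Dict[str, Pattern]]) -> List[str]:
    """Loci whose band pattern differs between at least two individuals.

    A locus missing from an individual's profile is treated as (0, 0).
    """
    all_loci = set()
    for profile in profiles.values():
        all_loci.update(profile)
    return sorted(
        locus for locus in all_loci
        if len({p.get(locus, (0, 0)) for p in profiles.values()}) > 1
    )
```

    For instance, with profiles {"wild_type": {"frag_210": (1, 1), "frag_355": (0, 1)}, "ddm1": {"frag_210": (1, 1), "frag_355": (1, 1)}}, only frag_355 is reported as polymorphic, the kind of band one would then excise from the gel and sequence.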

  11. Recognition of coarse-grained protein tertiary structure.

    PubMed

    Lezon, Timothy; Banavar, Jayanth R; Maritan, Amos

    2004-05-15

    A model of the protein backbone is considered in which each residue is characterized by the location of its Cα atom and one of a discrete set of conformational (φ, ψ) states. We investigate the key differences between a description that offers a locally precise fit to known backbone structures and one that provides a globally accurate fit to protein structures. Using a statistical scoring scheme and threading, a protein's local best-fit conformation is highly recognizable, but its global structure cannot be directly determined from an amino acid sequence. The incorporation of information about the conformational states of neighboring residues along the chain allows one to accurately translate the local structure into a global structure. We present a two-step algorithm that recognizes up to 95% of the tested protein native-state structures to within 2.5 Å root-mean-square deviation (RMSD). Copyright 2004 Wiley-Liss, Inc.
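
    Assigning each residue to one of a discrete set of (φ, ψ) states can be sketched as a nearest-centre lookup using periodic angle differences. The three state centres are rough, illustrative Ramachandran regions assumed for this example, not the state set used in the study, and neither the Cα coordinates nor the threading score is modelled.

```python
# Sketch: assigning each residue to the nearest of a small discrete set of
# (phi, psi) states. The three state centres below are rough textbook
# Ramachandran regions chosen for illustration; the paper's own state set
# is not reproduced here.
import math
from typing import List, Tuple

STATES = {
    "alpha_R": (-60.0, -45.0),   # right-handed helix region
    "beta": (-120.0, 130.0),     # extended strand region
    "alpha_L": (60.0, 45.0),     # left-handed helix region
}


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees (wrap-around aware)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_state(phi: float, psi: float) -> str:
    return min(
        STATES,
        key=lambda s: math.hypot(angle_diff(phi, STATES[s][0]),
                                 angle_diff(psi, STATES[s][1])),
    )


def encode_backbone(angles: List[Tuple[float, float]]) -> List[str]:
    """Coarse-grained state string for a chain, one state per residue."""
    return [nearest_state(phi, psi) for phi, psi in angles]
```

    The wrap-around handling matters: a residue at ψ = -170° is only 60° away from a ψ = 130° centre, so nearest_state(-120.0, -170.0) returns "beta".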

  12. The Global Genome Biodiversity Network (GGBN) Data Standard specification

    PubMed Central

    Droege, G.; Barker, K.; Seberg, O.; Coddington, J.; Benson, E.; Berendsohn, W. G.; Bunk, B.; Butler, C.; Cawsey, E. M.; Deck, J.; Döring, M.; Flemons, P.; Gemeinholzer, B.; Güntsch, A.; Hollowell, T.; Kelbert, P.; Kostadinov, I.; Kottmann, R.; Lawlor, R. T.; Lyal, C.; Mackenzie-Dodds, J.; Meyer, C.; Mulcahy, D.; Nussbeck, S. Y.; O'Tuama, É.; Orrell, T.; Petersen, G.; Robertson, T.; Söhngen, C.; Whitacre, J.; Wieczorek, J.; Yilmaz, P.; Zetzsche, H.; Zhang, Y.; Zhou, X.

    2016-01-01

    Genomic samples of non-model organisms are becoming increasingly important in a broad range of studies, from developmental biology and biodiversity analyses to conservation. Genomic sample definition, description, quality, voucher information and metadata all need to be digitized and disseminated across scientific communities. This information needs to be concise and consistent in today’s ever-increasing bioinformatic era, so that complementary data aggregators can easily map databases to one another. In order to facilitate exchange of information on genomic samples and their derived data, the Global Genome Biodiversity Network (GGBN) Data Standard is intended to provide a platform based on a documented agreement to promote the efficient sharing and usage of genomic sample material and associated specimen information in a consistent way. The new data standard presented here builds upon existing standards commonly used within the community, extending them with the capability to exchange data on tissue, environmental and DNA samples as well as sequences. The GGBN Data Standard will reveal and democratize the hidden contents of biodiversity biobanks, for the convenience of everyone in the wider biobanking community. Technical tools exist for data providers to easily map their databases to the standard. Database URL: http://terms.tdwg.org/wiki/GGBN_Data_Standard PMID:27694206

  13. Oligonucleotide indexing of DNA barcodes: identification of tuna and other scombrid species in food products.

    PubMed

    Botti, Sara; Giuffra, Elisabetta

    2010-08-23

    DNA barcodes are a global standard for species identification and have countless applications in the medical, forensic and alimentary fields, but few barcoding methods work efficiently on samples in which DNA is degraded, e.g. foods and archival specimens. This limits the choice of target regions harbouring a sufficient number of diagnostic polymorphisms. The method described here uses existing PCR and sequencing methodologies to detect mitochondrial DNA polymorphisms in complex matrices such as foods. The reported application allowed discrimination among 17 fish species of the Scombridae family with high commercial interest, such as mackerels, bonitos and tunas, which are often present in processed seafood. The approach can easily be upgraded as new genetic diversity information is released, to increase the range of detected species. Primer cocktails are designed for PCR using publicly available sequences of the target region. They are composed of a fixed 5' region and variable 3' cocktail portions that allow amplification of any member of a group of species of interest. The population of short amplicons is directly sequenced and indexed using primers containing a longer 5' region and the non-polymorphic part of the cocktail portion. A 226 bp region of CytB was selected as the target after collection and screening of 148 online sequences; 85 SNPs were found, of which 75 were present in at least two sequences. Primers were also designed for two shorter sub-fragments that could be amplified from highly degraded samples. The test was used on 103 samples of seafood (canned tuna and scomber, tuna salad, tuna sauce) and successfully detected the presence of different or additional species that were not identified on the labelling of canned tuna, tuna salad and sauce samples. The described method is largely independent of the degree of degradation of the DNA source and can thus be applied to processed seafood. 
Moreover, the method is highly flexible: publicly available sequence information on mitochondrial genomes is rapidly increasing for most species, facilitating the choice of target sequences and the improvement of the test's resolution. This is particularly important for discrimination of marine and aquaculture species, for which genome information is still limited.
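
    The SNP screen on the collected CytB sequences can be sketched in Python. It assumes aligned, equal-length sequences and reads "present in at least two sequences" as "some non-majority base occurs in two or more sequences"; that reading of the criterion is an assumption.

```python
# Sketch: counting SNP columns in an alignment and the subset whose minority
# allele occurs in at least two sequences. Reading "present in at least two
# sequences" this way is an assumption about the paper's criterion.
from collections import Counter
from typing import List

BASES = set("ACGT")


def snp_columns(alignment: List[str]) -> List[int]:
    """Indices of columns with more than one distinct base (other symbols ignored)."""
    length = len(alignment[0])
    return [
        i for i in range(length)
        if len({seq[i] for seq in alignment if seq[i] in BASES}) > 1
    ]


def shared_snp_columns(alignment: List[str], min_count: int = 2) -> List[int]:
    """SNP columns where some non-majority base is seen in >= min_count sequences."""
    shared = []
    for i in snp_columns(alignment):
        counts = Counter(seq[i] for seq in alignment if seq[i] in BASES)
        minority = counts.most_common()[1:]  # drop the most frequent base
        if any(c >= min_count for _, c in minority):
            shared.append(i)
    return shared
```

    In the toy alignment ["ACGTA", "ACGTT", "ATGTT", "ACCTA"], columns 1, 2 and 4 are variable, but only column 4 has its minority base in two sequences; the other two are singletons.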

  14. DNA Barcode Sequence Identification Incorporating Taxonomic Hierarchy and within Taxon Variability

    PubMed Central

    Little, Damon P.

    2011-01-01

    For DNA barcoding to succeed as a scientific endeavor, an accurate and expeditious query sequence identification method is needed. Although a global multiple-sequence alignment can be generated for some barcoding markers (e.g. COI, rbcL), not all barcoding markers are as structurally conserved (e.g. matK). Thus, algorithms that depend on global multiple-sequence alignments are not universally applicable. Some sequence identification methods that use local pairwise alignments (e.g. BLAST) are unable to accurately differentiate between highly similar sequences and are not designed to cope with hierarchic phylogenetic relationships or within-taxon variability. Here, I present a novel alignment-free sequence identification algorithm, BRONX, that accounts for observed within-taxon variability and hierarchic relationships among taxa. BRONX identifies short variable segments and corresponding invariant flanking regions in reference sequences. These flanking regions are used to score variable regions in the query sequence without the production of a global multiple-sequence alignment. By incorporating observed within-taxon variability into the scoring procedure, misidentifications arising from shared alleles/haplotypes are minimized. An explicit treatment of more inclusive terminals allows for separate identifications to be made for each taxonomic level and/or for user-defined terminals. BRONX performs better than all other methods when there is imperfect overlap between query and reference sequences (e.g. mini-barcode queries against a full-length barcode database). BRONX consistently produced better identifications at the genus level for all query types. PMID:21857897
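
    A simplified, hypothetical Python sketch of flank-based identification follows. Invariant flanks locate a short variable segment, each taxon stores every segment variant observed among its references, and a query is credited for each region whose segment matches. Exact matching, the hand-picked flanks, and the simple count score are assumptions; BRONX's own scoring and its hierarchical treatment of taxa are not reproduced.

```python
# Simplified, hypothetical flank-based identifier inspired by the description
# of BRONX: invariant flanks locate a short variable segment, and each taxon
# keeps every segment variant seen in its references (within-taxon
# variability). Exact-match scoring and hand-picked flanks are assumptions;
# the hierarchical (multi-rank) treatment is not modelled.
from typing import Dict, List, Optional, Set, Tuple

Flanks = Tuple[str, str]


def segment_between(seq: str, flanks: Flanks) -> Optional[str]:
    """Return the text between the left and right flank, or None if absent."""
    left, right = flanks
    start = seq.find(left)
    if start < 0:
        return None
    start += len(left)
    end = seq.find(right, start)
    if end < 0:
        return None
    return seq[start:end]


def build_index(refs: Dict[str, List[str]],
                regions: List[Flanks]) -> Dict[str, List[Set[str]]]:
    """For each taxon and region, the set of variable-segment variants observed."""
    index: Dict[str, List[Set[str]]] = {}
    for taxon, seqs in refs.items():
        per_region = []
        for flanks in regions:
            variants = set()
            for s in seqs:
                seg = segment_between(s, flanks)
                if seg is not None:
                    variants.add(seg)
            per_region.append(variants)
        index[taxon] = per_region
    return index


def identify(query: str, index: Dict[str, List[Set[str]]],
             regions: List[Flanks]) -> List[str]:
    """Taxa with the most matching segments; regions absent from the query are skipped."""
    query_segments = [segment_between(query, fl) for fl in regions]
    scores = {
        taxon: sum(1 for seg, variants in zip(query_segments, per_region)
                   if seg is not None and seg in variants)
        for taxon, per_region in index.items()
    }
    best = max(scores.values(), default=0)
    if best == 0:
        return []
    return sorted(t for t, s in scores.items() if s == best)
```

    Because regions absent from the query are skipped rather than penalized, a short query covering only one region can still be identified, loosely mirroring the mini-barcode situation described in the abstract.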

  15. Impact of geostationary satellite water vapor channel data on weather analysis and forecasting

    NASA Technical Reports Server (NTRS)

    Velden, Christopher S.

    1995-01-01

    Preliminary results from numerical weather prediction (NWP) impact studies indicate that upper-tropospheric wind information, obtained by tracking motions in sequences of geostationary satellite water vapor imagery, can positively influence forecasts on regional scales and possibly on global scales as well. The data are complementary to cloud-tracked winds, providing coverage in cloud-free regions, and are comparable in quality. First results from GOES-8 winds are encouraging, and further efforts will be directed toward optimizing the use of these data, and their model impact, in NWP. Assuming successful launches of the GOES-J and GMS-5 satellites in 1995, high-quality, high-resolution water vapor imagers will be available to provide nearly complete global upper-tropospheric wind coverage.
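
    The idea of deriving winds by tracking features through an image sequence can be illustrated with a toy Python sketch using exhaustive sum-of-squared-differences block matching. Operational processing adds cross-correlation, height assignment, and quality control; the 4 km pixel size and 30-minute interval in the example are hypothetical.

```python
# Toy sketch of deriving a wind vector by tracking a feature between two
# successive images: exhaustive sum-of-squared-differences (SSD) block
# matching. Operational atmospheric-motion-vector processing adds cross-
# correlation, height assignment and quality control; pixel size and time
# step are hypothetical inputs.
import math
from typing import List, Tuple

Image = List[List[float]]


def ssd(a: Image, b: Image, r0: int, c0: int, r1: int, c1: int, size: int) -> float:
    """SSD between a size x size box at (r0, c0) in a and at (r1, c1) in b."""
    total = 0.0
    for dr in range(size):
        for dc in range(size):
            d = a[r0 + dr][c0 + dc] - b[r1 + dr][c1 + dc]
            total += d * d
    return total


def track(img0: Image, img1: Image, row: int, col: int,
          size: int, search: int) -> Tuple[int, int]:
    """Displacement (d_row, d_col) at which the target box best matches img1."""
    best_cost, best_shift = math.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = row + dy, col + dx
            if r < 0 or c < 0 or r + size > len(img1) or c + size > len(img1[0]):
                continue  # candidate box would fall outside the image
            cost = ssd(img0, img1, row, col, r, c, size)
            if cost < best_cost:
                best_cost, best_shift = cost, (dy, dx)
    return best_shift


def speed_m_per_s(shift: Tuple[int, int], pixel_km: float, minutes: float) -> float:
    """Convert a pixel displacement over a time step into a speed."""
    return math.hypot(*shift) * pixel_km * 1000.0 / (minutes * 60.0)
```

    For example, a feature that moves 1 pixel down and 2 pixels across between two images is recovered as the shift (1, 2), and speed_m_per_s((1, 2), 4.0, 30.0) gives roughly 5 m/s.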

  16. An extended genotyping framework for Salmonella enterica serovar Typhi, the cause of human typhoid

    PubMed Central

    Wong, Vanessa K.; Baker, Stephen; Connor, Thomas R.; Pickard, Derek; Page, Andrew J.; Dave, Jayshree; Murphy, Niamh; Holliman, Richard; Sefton, Armine; Millar, Michael; Dyson, Zoe A.; Dougan, Gordon; Holt, Kathryn E.; Parkhill, Julian; Feasey, Nicholas A.; Kingsley, Robert A.; Thomson, Nicholas R.; Keane, Jacqueline A.; Weill, François-Xavier; Le Hello, Simon; Hawkey, Jane; Edwards, David J.; Harris, Simon R.; Cain, Amy K.; Hadfield, James; Hart, Peter J.; Thieu, Nga Tran Vu; Klemm, Elizabeth J.; Breiman, Robert F.; Watson, Conall H.; Edmunds, W. John; Kariuki, Samuel; Gordon, Melita A.; Heyderman, Robert S.; Okoro, Chinyere; Jacobs, Jan; Lunguya, Octavie; Msefula, Chisomo; Chabalgoity, Jose A.; Kama, Mike; Jenkins, Kylie; Dutta, Shanta; Marks, Florian; Campos, Josefina; Thompson, Corinne; Obaro, Stephen; MacLennan, Calman A.; Dolecek, Christiane; Keddy, Karen H.; Smith, Anthony M.; Parry, Christopher M.; Karkey, Abhilasha; Dongol, Sabina; Basnyat, Buddha; Arjyal, Amit; Mulholland, E. Kim; Campbell, James I.; Dufour, Muriel; Bandaranayake, Don; Toleafoa, Take N.; Singh, Shalini Pravin; Hatta, Mochammad; Newton, Paul N.; Dance, David; Davong, Viengmon; Onsare, Robert S.; Isaia, Lupeoletalalelei; Thwaites, Guy; Wijedoru, Lalith; Crump, John A.; De Pinna, Elizabeth; Nair, Satheesh; Nilles, Eric J.; Thanh, Duy Pham; Turner, Paul; Soeng, Sona; Valcanis, Mary; Powling, Joan; Dimovski, Karolina; Hogg, Geoff; Farrar, Jeremy; Mather, Alison E.; Amos, Ben

    2016-01-01

    The population of Salmonella enterica serovar Typhi (S. Typhi), the causative agent of typhoid fever, exhibits limited DNA sequence variation, which complicates efforts to rationally discriminate individual isolates. Here we utilize data from whole-genome sequences (WGS) of nearly 2,000 isolates sourced from over 60 countries to generate a robust genotyping scheme that is phylogenetically informative and compatible with a range of assays. These data show that, with the exception of the rapidly disseminating H58 subclade (now designated genotype 4.3.1), the global S. Typhi population is highly structured and includes dozens of subclades that display geographical restriction. The genotyping approach presented here can be used to interrogate local S. Typhi populations and help identify recent introductions of S. Typhi into new or previously endemic locations, providing information on their likely geographical source. This approach can be used to classify clinical isolates and provides a universal framework for further experimental investigations. PMID:27703135
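
    A hierarchical, SNP-based scheme of this kind can be sketched as follows: an isolate is assigned the deepest genotype for which it carries the marker allele, together with the marker alleles of all of that genotype's ancestors. The genome positions and alleles below are invented placeholders, not the SNPs that define the published genotypes.

```python
# Sketch of hierarchical SNP-based genotype assignment (e.g. "4.3.1" lies
# inside "4.3", which lies inside "4"). The marker positions and alleles are
# invented for illustration; they are NOT the published scheme's SNPs.
from typing import Dict, Optional, Tuple

Markers = Dict[str, Tuple[int, str]]  # genotype -> (genome position, derived allele)

MARKERS: Markers = {
    "3": (1500, "C"),
    "4": (1000, "T"),
    "4.3": (2000, "A"),
    "4.3.1": (3000, "G"),
}


def parent(genotype: str) -> Optional[str]:
    """'4.3.1' -> '4.3'; top-level genotypes have no parent."""
    return genotype.rsplit(".", 1)[0] if "." in genotype else None


def supported(genotype: str, calls: Dict[int, str], markers: Markers) -> bool:
    """True if the isolate carries the derived allele for the genotype and each marked ancestor."""
    g: Optional[str] = genotype
    while g is not None:
        if g in markers:
            pos, allele = markers[g]
            if calls.get(pos) != allele:
                return False
        g = parent(g)
    return True


def assign_genotype(calls: Dict[int, str], markers: Markers = MARKERS) -> Optional[str]:
    """Deepest supported genotype, or None if no marker matches."""
    hits = [g for g in markers if supported(g, calls, markers)]
    return max(hits, key=lambda g: g.count("."), default=None)
```

    Requiring every ancestor's marker means that an isolate carrying only the "4.3.1" allele, without the "4.3" allele, falls back to genotype "4" rather than being placed in an inconsistent subclade.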

  17. Protein structure and evolution: are they constrained globally by a principle derived from information theory?

    PubMed

    Hatton, Leslie; Warr, Gregory

    2015-01-01

    That the physicochemical properties of amino acids constrain the structure, function and evolution of proteins is not in doubt. However, principles derived from information theory may also set bounds on the structure (and thus also the evolution) of proteins. Here we analyze the global properties of the full set of proteins in release 13-11 of the SwissProt database, showing by experimental test of predictions from information theory that their collective structure exhibits properties that are consistent with their being guided by a conservation principle. This principle (Conservation of Information) defines the global properties of systems composed of discrete components each of which is in turn assembled from discrete smaller pieces. In the system of proteins, each protein is a component, and each protein is assembled from amino acids. Central to this principle is the inter-relationship of the unique amino acid count and total length of a protein and its implications for both average protein length and occurrence of proteins with specific unique amino acid counts. The unique amino acid count is simply the number of distinct amino acids (including those that are post-translationally modified) that occur in a protein, and is independent of the number of times that the particular amino acid occurs in the sequence. Conservation of Information does not operate at the local level (it is independent of the physicochemical properties of the amino acids) where the influences of natural selection are manifest in the variety of protein structure and function that is well understood. Rather, this analysis implies that Conservation of Information would define the global bounds within which the whole system of proteins is constrained; thus it appears to be acting to constrain evolution at a level different from natural selection, a conclusion that appears counter-intuitive but is supported by the studies described herein.
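
    The two per-protein quantities at the centre of the argument, total length and unique amino acid count, are simple to compute. The Python sketch below works on plain one-letter sequences, so it cannot count post-translationally modified residues as distinct; handling them would require extra symbols, which are not modelled here.

```python
# Sketch: total length and unique amino acid count per protein, plus the
# mean length and the distribution of unique counts over a protein set.
# Plain one-letter sequences cannot encode post-translational modifications;
# counting those would need extra symbols (not handled here).
from collections import Counter
from typing import Dict, Iterable, Tuple


def unique_aa_count(seq: str) -> int:
    """Number of distinct residues, independent of how often each occurs."""
    return len(set(seq))


def summarize(proteins: Iterable[str]) -> Tuple[float, Dict[int, int]]:
    """(mean length, {unique count: number of proteins with that count})."""
    seqs = list(proteins)
    mean_length = sum(len(s) for s in seqs) / len(seqs)
    distribution = Counter(unique_aa_count(s) for s in seqs)
    return mean_length, dict(sorted(distribution.items()))
```

    For example, "MKKLLA" has length 6 but a unique amino acid count of 4, and "AAAA" has a count of 1 regardless of its length.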

  18. Global and local pitch perception in children with developmental dyslexia.

    PubMed

    Ziegler, Johannes C; Pech-Georgel, Catherine; George, Florence; Foxton, Jessica M

    2012-03-01

    This study investigated global versus local pitch pattern perception in children with dyslexia aged between 8 and 11 years. Children listened to two consecutive 4-tone pitch sequences while performing a same/different task. On the different trials, sequences either preserved the contour (local condition) or violated it (global condition). Compared to normally developing children, dyslexics showed robust pitch perception deficits in the local but not the global condition. This finding was replicated in a simple pitch direction task, which minimizes sequencing and short-term memory demands. Results are consistent with a left-hemisphere deficit in dyslexia because local pitch changes are supposedly processed by the left hemisphere, whereas global pitch changes are processed by the right hemisphere. The present data suggest a link between impaired pitch processing and abnormal phonological development in children with dyslexia, which makes pitch pattern processing a potent tool for early diagnosis and remediation of dyslexia. Copyright © 2011 Elsevier Inc. All rights reserved.

  19. Taxonomic evaluation of selected Ganoderma species and database sequence validation

    PubMed Central

    Jargalmaa, Suldbold; Eimes, John A.; Park, Myung Soo; Park, Jae Young; Oh, Seung-Yoon

    2017-01-01

    Species in the genus Ganoderma include several ecologically important and pathogenic fungal species whose medicinal and economic value is substantial. Because morphological features are highly similar within the genus, species identification has relied heavily on DNA sequencing using BLAST searches, which are only reliable if the GenBank submissions are accurately labeled. In this study, we examined 113 specimens collected from 1969 to 2016 from various regions in Korea using morphological features and multigene analysis (internal transcribed spacer, translation elongation factor 1-α, and the second largest subunit of RNA polymerase II). These specimens were identified as four Ganoderma species: G. sichuanense, G. cf. adspersum, G. cf. applanatum, and G. cf. gibbosum. With the exception of G. sichuanense, these species were difficult to distinguish based solely on morphological features. However, phylogenetic analysis at three different loci yielded concordant phylogenetic information and supported the four species distinctions with high bootstrap support. A survey of over 600 Ganoderma sequences available on GenBank revealed that 65% of sequences were either misidentified or ambiguously labeled. Here, we suggest corrected annotations for GenBank sequences based on our phylogenetic validation and provide updated global distribution patterns for these Ganoderma species. PMID:28761785

  20. The identification of complete domains within protein sequences using accurate E-values for semi-global alignment

    PubMed Central

    Kann, Maricel G.; Sheetlin, Sergey L.; Park, Yonil; Bryant, Stephen H.; Spouge, John L.

    2007-01-01

    The sequencing of complete genomes has created a pressing need for automated annotation of gene function. Because domains are the basic units of protein function and evolution, a gene can be annotated from a domain database by aligning domains to the corresponding protein sequence. Ideally, complete domains are aligned to protein subsequences, in a ‘semi-global alignment’. Local alignment, which aligns pieces of domains to subsequences, is common in high-throughput annotation applications, however. It is a mature technique, with the heuristics and accurate E-values required for screening large databases and evaluating the screening results. Hidden Markov models (HMMs) provide an alternative theoretical framework for semi-global alignment, but their use is limited because they lack heuristic acceleration and accurate E-values. Our new tool, GLOBAL, overcomes some limitations of previous semi-global HMMs: it has accurate E-values and the possibility of the heuristic acceleration required for high-throughput applications. Moreover, according to a standard of truth based on protein structure, two semi-global HMM alignment tools (GLOBAL and HMMer) had comparable performance in identifying complete domains, but distinctly outperformed two tools based on local alignment. When searching for complete protein domains, therefore, GLOBAL avoids disadvantages commonly associated with HMMs, yet maintains their superior retrieval performance. PMID:17596268
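
    To make the semi-global idea concrete, here is a minimal dynamic-programming sketch. Every residue of the domain must be aligned, while unaligned residues at either end of the protein cost nothing. The toy match/mismatch/linear-gap scores are an assumption standing in for the substitution matrices or profile HMMs that GLOBAL and HMMer actually use, and no E-value is computed.

```python
def semi_global_score(domain, protein, match=1, mismatch=-1, gap=-2):
    """Best score aligning ALL of `domain` to any substring of `protein`."""
    n = len(protein)
    prev = [0] * (n + 1)  # empty domain prefix: skipping leading protein residues is free
    for i in range(1, len(domain) + 1):
        cur = [i * gap] + [0] * n  # domain residues before the protein starts are gapped
        for j in range(1, n + 1):
            s = match if domain[i - 1] == protein[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s,   # align domain[i-1] with protein[j-1]
                         prev[j] + gap,     # domain residue against a gap
                         cur[j - 1] + gap)  # protein residue against a gap
        prev = cur
    return max(prev)  # trailing protein residues are free as well


if __name__ == "__main__":
    print(semi_global_score("ACD", "WWACDWW"))
```

    A local aligner would be free to report only "AC" from a truncated match. The semi-global recurrence instead forces the final "D" to be scored, which is what favors complete domains.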

  1. Global and Local Pitch Perception in Children with Developmental Dyslexia

    ERIC Educational Resources Information Center

    Ziegler, Johannes C.; Pech-Georgel, Catherine; George, Florence; Foxton, Jessica M.

    2012-01-01

    This study investigated global versus local pitch pattern perception in children with dyslexia aged between 8 and 11 years. Children listened to two consecutive 4-tone pitch sequences while performing a same/different task. On the different trials, sequences either preserved the contour (local condition) or they violated the contour (global…

  2. Molecular epidemiology of oyster-related human noroviruses and their global genetic diversity and temporal-geographical distribution from 1983 to 2014.

    PubMed

    Yu, Yongxin; Cai, Hui; Hu, Linghao; Lei, Rongwei; Pan, Yingjie; Yan, Shuling; Wang, Yongjie

    2015-11-01

    Noroviruses (NoVs) are a leading cause of epidemic and sporadic cases of acute gastroenteritis worldwide. Oysters are well recognized as the main vectors of environmentally transmitted NoVs, and disease outbreaks linked to oyster consumption have been commonly observed. Here, to quantify the genetic diversity, temporal distribution, and circulation of oyster-related NoVs on a global scale, 1,077 oyster-related NoV sequences deposited from 1983 to 2014 were downloaded from both NCBI GenBank and the NoroNet outbreak database and were then screened for quality control. A total of 665 sequences with reliable information were obtained and were subsequently subjected to genotyping and phylogenetic analyses. The results indicated that the majority of oyster-related NoV sequences were obtained from coastal countries and regions and that the numbers of sequences in these regions were unevenly distributed. Moreover, >80% of human NoV genotypes were detected in oyster samples or oyster-related outbreaks. A higher proportion of genogroup I (GI) (34%) was observed for oyster-related sequences than for non-oyster-related outbreaks, where GII strains dominated with an overwhelming majority of >90%, indicating that the prevalences of GI and GII are different in humans and oysters. In addition, a related convergence of the circulation trend was found between oyster-related NoV sequences and human pandemic outbreaks. This suggests that oysters not only act as a vector of NoV through environmental transmission but also serve as an important reservoir of human NoVs. These results highlight the importance of oysters in the persistence and transmission of human NoVs in the environment and have important implications for the surveillance of human NoVs in oyster samples. Copyright © 2015, American Society for Microbiology. All Rights Reserved.

  3. Extraction of High Molecular Weight DNA from Fungal Rust Spores for Long Read Sequencing.

    PubMed

    Schwessinger, Benjamin; Rathjen, John P

    2017-01-01

    Wheat rust fungi are complex organisms with a complete life cycle that involves two different host plants and five different spore types. During the asexual infection cycle on wheat, rusts produce massive amounts of dikaryotic urediniospores. These spores are dikaryotic (two nuclei) with each nucleus containing one haploid genome. This dikaryotic state is likely to contribute to their evolutionary success, making them some of the major wheat pathogens globally. Despite this, most published wheat rust genomes are highly fragmented and contain very little haplotype-specific sequence information. Current long-read sequencing technologies hold great promise to provide more contiguous and haplotype-phased genome assemblies. Long reads are able to span repetitive regions and phase structural differences between the haplomes. This increased genome resolution enables the identification of complex loci and the study of genome evolution beyond simple nucleotide polymorphisms. Long-read technologies require pure high molecular weight DNA as an input for sequencing. Here, we describe a DNA extraction protocol for rust spores that yields pure double-stranded DNA molecules with molecular weight of >50 kilo-base pairs (kbp). The isolated DNA is of sufficient purity for PacBio long-read sequencing, but may require additional purification for other sequencing technologies such as Nanopore and 10× Genomics.

  4. Multi-modulus algorithm based on global artificial fish swarm intelligent optimization of DNA encoding sequences.

    PubMed

    Guo, Y C; Wang, H; Wu, H P; Zhang, M Q

    2015-12-21

    To address the large mean square error (MSE) and slow convergence of the constant modulus algorithm (CMA) when equalizing multi-modulus signals, a multi-modulus algorithm (MMA) based on global artificial fish swarm (GAFS) intelligent optimization of DNA encoding sequences (GAFS-DNA-MMA) was proposed. To improve the convergence rate and reduce the MSE, the proposed algorithm adopted an encoding method based on DNA nucleotide chains to represent possible solutions to the problem. The GAFS algorithm, with its fast convergence and global search ability, was then used to find the best sequence, and the real and imaginary parts of the initial optimal weight vector of the MMA were obtained through DNA coding of that sequence. The simulation results show that the proposed algorithm has a faster convergence speed and smaller MSE than the CMA, the MMA, and the AFS-DNA-MMA.
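
    For reference, the baseline MMA that GAFS-DNA-MMA builds on adapts the equalizer taps by stochastic gradient descent on separate dispersion errors for the real and imaginary parts. The sketch below shows only that standard update, with the output taken as y = w^T x and the cost (y_R^2 - R_R)^2 + (y_I^2 - R_I)^2. The GAFS search and DNA encoding that choose the initial weight vector are not reproduced. The step size and the 16-QAM levels used to derive R are illustrative assumptions.

```python
def dispersion_constant(levels):
    """R = E[a^4] / E[a^2] over equiprobable real (or imaginary) constellation levels."""
    return sum(a ** 4 for a in levels) / sum(a ** 2 for a in levels)


def mma_update(w, x, mu, r_real, r_imag):
    """One MMA stochastic-gradient step; returns (new taps, equalizer output)."""
    y = sum(wk * xk for wk, xk in zip(w, x))  # equalizer output y = w^T x
    # Per-component dispersion errors e_R = y_R(y_R^2 - R_R), e_I = y_I(y_I^2 - R_I)
    e = complex(y.real * (y.real ** 2 - r_real), y.imag * (y.imag ** 2 - r_imag))
    # The cost gradient with respect to w is proportional to e * conj(x)
    new_w = [wk - mu * e * xk.conjugate() for wk, xk in zip(w, x)]
    return new_w, y


if __name__ == "__main__":
    R = dispersion_constant([-3, -1, 1, 3])  # 16-QAM real/imaginary levels
    taps, out = mma_update([1 + 0j, 0j], [1 + 0j, 0j], 0.01, R, R)
    print(R, taps, out)
```

    In the proposed method, only the starting value of `w` would differ: it comes from the GAFS-optimized DNA sequence instead of a fixed center-spike initialization.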

  5. Approximate likelihood calculation on a phylogeny for Bayesian estimation of divergence times.

    PubMed

    dos Reis, Mario; Yang, Ziheng

    2011-07-01

    The molecular clock provides a powerful way to estimate species divergence times. If information on some species divergence times is available from the fossil or geological record, it can be used to calibrate a phylogeny and estimate divergence times for all nodes in the tree. The Bayesian method provides a natural framework to incorporate different sources of information concerning divergence times, such as information in the fossil and molecular data. Current models of sequence evolution are intractable in a Bayesian setting, and Markov chain Monte Carlo (MCMC) is used to generate the posterior distribution of divergence times and evolutionary rates. This method is computationally expensive, as it involves the repeated calculation of the likelihood function. Here, we explore the use of Taylor expansion to approximate the likelihood during MCMC iteration. The approximation is much faster than conventional likelihood calculation. However, the approximation is expected to be poor when the proposed parameters are far from the likelihood peak. We explore the use of parameter transforms (square root, logarithm, and arcsine) to improve the approximation to the likelihood curve. We found that the new methods, particularly the arcsine-based transform, provided very good approximations under relaxed clock models and also under the global clock model when the global clock is not seriously violated. The approximation is poorer for analysis under the global clock when the global clock is seriously wrong and should thus not be used. The results suggest that the approximate method may be useful for Bayesian dating analysis using large data sets.
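
    The idea of replacing the likelihood with a Taylor expansion around its peak can be illustrated on a one-parameter problem. This sketch uses the Jukes-Cantor (JC69) distance between two sequences, an assumption chosen for simplicity rather than the multi-branch setting of the paper. It builds a second-order expansion by finite differences and also offers the same expansion on a square-root scale, one of the transforms the abstract mentions.

```python
import math


def jc69_loglik(d, n, x):
    """JC69 log-likelihood (up to a constant) of distance d: n sites, x differences."""
    p = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))
    return x * math.log(p / 3.0) + (n - x) * math.log(1.0 - p)


def jc69_mle(n, x):
    """Closed-form maximum-likelihood distance under JC69 (requires x/n < 0.75)."""
    return -0.75 * math.log(1.0 - 4.0 * x / (3.0 * n))


def quadratic_approx(f, theta0, h=1e-4):
    """Second-order Taylor expansion of f around theta0, derivatives by finite differences."""
    f0 = f(theta0)
    grad = (f(theta0 + h) - f(theta0 - h)) / (2 * h)
    hess = (f(theta0 + h) - 2 * f0 + f(theta0 - h)) / h ** 2
    return lambda theta: f0 + grad * (theta - theta0) + 0.5 * hess * (theta - theta0) ** 2


def approx_on_sqrt_scale(f, d0):
    """Same expansion, but carried out in s = sqrt(d) and mapped back to d."""
    approx = quadratic_approx(lambda s: f(s * s), math.sqrt(d0))
    return lambda d: approx(math.sqrt(d))


if __name__ == "__main__":
    n, x = 100, 20  # hypothetical data: 20 differences over 100 sites
    f = lambda d: jc69_loglik(d, n, x)
    d_hat = jc69_mle(n, x)
    raw, sq = quadratic_approx(f, d_hat), approx_on_sqrt_scale(f, d_hat)
    for d in (d_hat, 1.05 * d_hat, 3 * d_hat):
        print(round(d, 4), f(d), raw(d), sq(d))
```

    Near the peak the quadratic is almost exact, and far from it the error grows quickly. This mirrors the abstract's caveat that the approximation is poor when proposed parameters are far from the likelihood peak. Within an MCMC run, the cheap expansion would replace every exact likelihood evaluation.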

  6. High-cost, high-capacity backbone for global brain communication.

    PubMed

    van den Heuvel, Martijn P; Kahn, René S; Goñi, Joaquín; Sporns, Olaf

    2012-07-10

    Network studies of human brain structural connectivity have identified a specific set of brain regions that are both highly connected and highly central. Recent analyses have shown that these putative hub regions are mutually and densely interconnected, forming a "rich club" within the human brain. Here we show that the set of pathways linking rich club regions forms a central high-cost, high-capacity backbone for global brain communication. Diffusion tensor imaging (DTI) data of two sets of 40 healthy subjects were used to map structural brain networks. The contributions to network cost and communication capacity of global cortico-cortical connections were assessed through measures of their topology and spatial embedding. Rich club connections were found to be more costly than predicted by their density alone and accounted for 40% of the total communication cost. Furthermore, 69% of all minimally short paths between node pairs were found to travel through the rich club and a large proportion of these communication paths consisted of ordered sequences of edges ("path motifs") that first fed into, then traversed, and finally exited the rich club, while passing through nodes of increasing and then decreasing degree. The prevalence of short paths that follow such ordered degree sequences suggests that neural communication might take advantage of strategies for dynamic routing of information between brain regions, with an important role for a highly central rich club. Taken together, our results show that rich club connections make an important contribution to interregional signal traffic, forming a central high-cost, high-capacity backbone for global brain communication.
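
    A rich club is commonly quantified with the rich-club coefficient phi(k): the density of connections among nodes whose degree exceeds k, phi(k) = 2 E_{>k} / (N_{>k} (N_{>k} - 1)). The abstract does not state which variant the authors used. The plain-Python sketch below therefore computes only this unnormalized coefficient on an undirected, unweighted edge list. In practice the value is usually compared against degree-preserving randomized networks.

```python
from collections import defaultdict


def rich_club_coefficient(edges, k):
    """Unnormalized rich-club coefficient phi(k) for an undirected simple graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    Returns None when fewer than two nodes have degree > k.
    """
    edges = list(edges)
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    rich = {node for node, d in degree.items() if d > k}
    n_rich = len(rich)
    if n_rich < 2:
        return None
    e_rich = sum(1 for u, v in edges if u in rich and v in rich)
    return 2.0 * e_rich / (n_rich * (n_rich - 1))


if __name__ == "__main__":
    star = [(0, 1), (0, 2), (0, 3), (0, 4)]
    print(rich_club_coefficient(star, 0), rich_club_coefficient(star, 1))
```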

  7. HubAlign: an accurate and efficient method for global alignment of protein-protein interaction networks.

    PubMed

    Hashemifar, Somaye; Xu, Jinbo

    2014-09-01

    High-throughput experimental techniques have produced a large amount of protein-protein interaction (PPI) data. The study of PPI networks, such as comparative analysis, should benefit the understanding of life processes and diseases at the molecular level. One way of comparative analysis is to align PPI networks to identify conserved or species-specific subnetwork motifs. A few methods have been developed for global PPI network alignment, but the problem remains challenging in terms of both accuracy and efficiency. This paper presents a novel global network alignment algorithm, denoted as HubAlign, that makes use of both network topology and sequence homology information, based upon the observation that topologically important proteins in a PPI network are usually much more conserved and thus more likely to be aligned. HubAlign uses a minimum-degree heuristic algorithm to estimate the topological and functional importance of a protein from the global network topology information. Then HubAlign aligns topologically important proteins first and gradually extends the alignment to the whole network. Extensive tests indicate that HubAlign greatly outperforms several popular methods in terms of both accuracy and efficiency, especially in detecting functionally similar proteins. HubAlign is freely available for non-commercial purposes at http://ttic.uchicago.edu/~hashemifar/software/HubAlign.zip. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
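
    The abstract does not spell out the minimum-degree heuristic, so what follows is only a hypothetical simplification in its spirit, not HubAlign's published scoring. Nodes are removed in order of current degree. Each removed node passes its accumulated weight, plus one, evenly to its remaining neighbors, so nodes that absorb many peripheral proteins end up with high scores. A seed-and-extend aligner could then pair high-scoring nodes across the two networks first.

```python
from collections import defaultdict


def hub_scores(edges):
    """Hypothetical hub score via minimum-degree elimination with weight transfer."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    weight = {node: 0.0 for node in adj}
    score = {}
    while adj:
        # Remove the node with the smallest current degree; break ties by
        # smaller accumulated weight, then by label for determinism.
        u = min(adj, key=lambda n: (len(adj[n]), weight[n], str(n)))
        nbrs = adj.pop(u)
        score[u] = weight[u]
        for v in nbrs:
            adj[v].discard(u)
            weight[v] += (weight[u] + 1.0) / len(nbrs)
    return score


if __name__ == "__main__":
    print(hub_scores([(0, 1), (0, 2), (0, 3), (0, 4)]))
```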

  8. DNA barcode goes two-dimensions: DNA QR code web server.

    PubMed

    Liu, Chang; Shi, Linchun; Xu, Xiaolan; Li, Huan; Xing, Hang; Liang, Dong; Jiang, Kun; Pang, Xiaohui; Song, Jingyuan; Chen, Shilin

    2012-01-01

    The DNA barcoding technology uses a standard region of DNA sequence for species identification and discovery. At present, "DNA barcode" actually refers to DNA sequences, which are not amenable to information storage, recognition, and retrieval. Our aim is to identify the best symbology that can represent DNA barcode sequences in practical applications. A comprehensive set of sequences for five DNA barcode markers ITS2, rbcL, matK, psbA-trnH, and CO1 was used as the test data. Fifty-three different types of one-dimensional and ten two-dimensional barcode symbologies were compared based on different criteria, such as coding capacity, compression efficiency, and error detection ability. The quick response (QR) code was found to have the largest coding capacity and relatively high compression ratio. To facilitate the further usage of QR code-based DNA barcodes, a web server was developed and is accessible at http://qrfordna.dnsalias.org. The web server allows users to retrieve the QR code for a species of interest, convert a DNA sequence to and from a QR code, and perform species identification based on local and global sequence similarities. In summary, the first comprehensive evaluation of various barcode symbologies has been carried out. The QR code has been found to be the most appropriate symbology for DNA barcode sequences. A web server has also been constructed to allow biologists to utilize QR codes in practical DNA barcoding applications.
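
    Coding capacity matters because a raw DNA string spends a full byte per base. A common trick is to pack each nucleotide into two bits before handing the bytes to a QR encoder, which is a fourfold saving over ASCII. It is shown here as an illustrative sketch, not the server's actual encoding, which the abstract does not describe. The sketch assumes an unambiguous A/C/G/T sequence; IUPAC ambiguity codes would need a wider alphabet.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str):
    """Pack an A/C/G/T string into bytes, 4 bases per byte; returns (length, data)."""
    seq = seq.upper()
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            byte |= CODE[base] << (2 * (3 - j))  # first base in the high bits
        out.append(byte)
    return len(seq), bytes(out)


def unpack(length: int, data: bytes) -> str:
    """Inverse of pack; `length` removes the padding in the last byte."""
    bases = []
    for i in range(length):
        shift = 2 * (3 - i % 4)
        bases.append(BASES[(data[i // 4] >> shift) & 3])
    return "".join(bases)


if __name__ == "__main__":
    n, data = pack("ACGTTGCAAC")
    print(n, data, unpack(n, data))
```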

  9. ATtRACT-a database of RNA-binding proteins and associated motifs.

    PubMed

    Giudice, Girolamo; Sánchez-Cabo, Fátima; Torroja, Carlos; Lara-Pezzi, Enrique

    2016-01-01

    RNA-binding proteins (RBPs) play a crucial role in key cellular processes, including RNA transport, splicing, polyadenylation and stability. Understanding the interaction between RBPs and RNA is key to improving our knowledge of RNA processing, localization and regulation in a global manner. Despite advances in recent years, a unified non-redundant resource that includes information on experimentally validated motifs, RBPs and integrated tools to exploit this information is lacking. Here, we developed a database named ATtRACT (available at http://attract.cnic.es) that compiles information on 370 RBPs and 1583 RBP consensus binding motifs, 192 of which are not present in any other database. To populate ATtRACT we (i) extracted and hand-curated experimentally validated data from the CISBP-RNA, SpliceAid-F and RBPDB databases, (ii) integrated and updated the unavailable ASD database and (iii) extracted information from Protein-RNA complexes present in the Protein Data Bank through computational analyses. ATtRACT also provides efficient algorithms to search for a specific motif and scan one or more RNA sequences at a time. It also allows discovering de novo motifs enriched in a set of related sequences and comparing them with the motifs included in the database. Database URL: http://attract.cnic.es. © The Author(s) 2016. Published by Oxford University Press.
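
    Scanning an RNA sequence for a consensus motif can be sketched in two steps. First, translate the IUPAC ambiguity codes into a regular expression. Then search with a zero-width lookahead so that overlapping hits are also reported. This is a simplified stand-in, not ATtRACT's own search algorithm, which the abstract does not describe, and it ignores position weight matrices and hit scores.

```python
import re

# IUPAC nucleotide ambiguity codes, written for RNA (U instead of T)
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]", "Y": "[CU]", "S": "[CG]", "W": "[AU]",
    "K": "[GU]", "M": "[AC]", "B": "[CGU]", "D": "[AGU]",
    "H": "[ACU]", "V": "[ACG]", "N": "[ACGU]",
}


def motif_to_regex(motif: str) -> str:
    """Translate an IUPAC consensus motif into a regular expression."""
    return "".join(IUPAC_RNA[c] for c in motif.upper().replace("T", "U"))


def scan(sequence: str, motif: str) -> list:
    """Return 0-based start positions of all (possibly overlapping) motif hits."""
    seq = sequence.upper().replace("T", "U")
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(seq)]


if __name__ == "__main__":
    print(scan("AAUGCAUGUUGCAUG", "UGCAUG"))
```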

  10. Mitochondrial Disease Sequence Data Resource (MSeqDR): a global grass-roots consortium to facilitate deposition, curation, annotation, and integrated analysis of genomic data for the mitochondrial disease clinical and research communities.

    PubMed

    Falk, Marni J; Shen, Lishuang; Gonzalez, Michael; Leipzig, Jeremy; Lott, Marie T; Stassen, Alphons P M; Diroma, Maria Angela; Navarro-Gomez, Daniel; Yeske, Philip; Bai, Renkui; Boles, Richard G; Brilhante, Virginia; Ralph, David; DaRe, Jeana T; Shelton, Robert; Terry, Sharon F; Zhang, Zhe; Copeland, William C; van Oven, Mannis; Prokisch, Holger; Wallace, Douglas C; Attimonelli, Marcella; Krotoski, Danuta; Zuchner, Stephan; Gai, Xiaowu

    2015-03-01

    Success rates for genomic analyses of highly heterogeneous disorders can be greatly improved if a large cohort of patient data is assembled to enhance collective capabilities for accurate sequence variant annotation, analysis, and interpretation. Indeed, molecular diagnostics requires the establishment of robust data resources to enable data sharing that informs accurate understanding of genes, variants, and phenotypes. The "Mitochondrial Disease Sequence Data Resource (MSeqDR) Consortium" is a grass-roots effort facilitated by the United Mitochondrial Disease Foundation to identify and prioritize specific genomic data analysis needs of the global mitochondrial disease clinical and research community. A central Web portal (https://mseqdr.org) facilitates the coherent compilation, organization, annotation, and analysis of sequence data from both nuclear and mitochondrial genomes of individuals and families with suspected mitochondrial disease. This Web portal provides users with a flexible and expandable suite of resources to enable variant-, gene-, and exome-level sequence analysis in a secure, Web-based, and user-friendly fashion. Users can also elect to share data with other MSeqDR Consortium members, or even the general public, either by custom annotation tracks or through the use of a convenient distributed annotation system (DAS) mechanism. A range of data visualization and analysis tools are provided to facilitate user interrogation and understanding of genomic, and ultimately phenotypic, data of relevance to mitochondrial biology and disease. Currently available tools for nuclear and mitochondrial gene analyses include an MSeqDR GBrowse instance that hosts optimized mitochondrial disease and mitochondrial DNA (mtDNA) specific annotation tracks, as well as an MSeqDR locus-specific database (LSDB) that curates variant data on more than 1300 genes that have been implicated in mitochondrial disease and/or encode mitochondria-localized proteins. 
MSeqDR is integrated with a diverse array of mtDNA data analysis tools that are both freestanding and incorporated into an online exome-level dataset curation and analysis resource (GEM.app) that is being optimized to support needs of the MSeqDR community. In addition, MSeqDR supports mitochondrial disease phenotyping and ontology tools, and provides variant pathogenicity assessment features that enable community review, feedback, and integration with the public ClinVar variant annotation resource. A centralized Web-based informed consent process is being developed, with implementation of a Global Unique Identifier (GUID) system to integrate data deposited on a given individual from different sources. Community-based data deposition into MSeqDR has already begun. Future efforts will enhance capabilities to incorporate phenotypic data that enhance genomic data analyses. MSeqDR will fill the existing void in bioinformatics tools and centralized knowledge that are necessary to enable efficient nuclear and mtDNA genomic data interpretation by a range of stakeholders across both clinical diagnostic and research settings. Ultimately, MSeqDR is focused on empowering the global mitochondrial disease community to better define and explore mitochondrial diseases. Copyright © 2014 Elsevier Inc. All rights reserved.

  11. Mitochondrial Disease Sequence Data Resource (MSeqDR): A global grass-roots consortium to facilitate deposition, curation, annotation, and integrated analysis of genomic data for the mitochondrial disease clinical and research communities

    PubMed Central

    Falk, Marni J.; Shen, Lishuang; Gonzalez, Michael; Leipzig, Jeremy; Lott, Marie T.; Stassen, Alphons P.M.; Diroma, Maria Angela; Navarro-Gomez, Daniel; Yeske, Philip; Bai, Renkui; Boles, Richard G.; Brilhante, Virginia; Ralph, David; DaRe, Jeana T.; Shelton, Robert; Terry, Sharon; Zhang, Zhe; Copeland, William C.; van Oven, Mannis; Prokisch, Holger; Wallace, Douglas C.; Attimonelli, Marcella; Krotoski, Danuta; Zuchner, Stephan; Gai, Xiaowu

    2014-01-01

    Success rates for genomic analyses of highly heterogeneous disorders can be greatly improved if a large cohort of patient data is assembled to enhance collective capabilities for accurate sequence variant annotation, analysis, and interpretation. Indeed, molecular diagnostics requires the establishment of robust data resources to enable data sharing that informs accurate understanding of genes, variants, and phenotypes. The “Mitochondrial Disease Sequence Data Resource (MSeqDR) Consortium” is a grass-roots effort facilitated by the United Mitochondrial Disease Foundation to identify and prioritize specific genomic data analysis needs of the global mitochondrial disease clinical and research community. A central Web portal (https://mseqdr.org) facilitates the coherent compilation, organization, annotation, and analysis of sequence data from both nuclear and mitochondrial genomes of individuals and families with suspected mitochondrial disease. This Web portal provides users with a flexible and expandable suite of resources to enable variant-, gene-, and exome-level sequence analysis in a secure, Web-based, and user-friendly fashion. Users can also elect to share data with other MSeqDR Consortium members, or even the general public, either by custom annotation tracks or through use of a convenient distributed annotation system (DAS) mechanism. A range of data visualization and analysis tools are provided to facilitate user interrogation and understanding of genomic, and ultimately phenotypic, data of relevance to mitochondrial biology and disease. Currently available tools for nuclear and mitochondrial gene analyses include an MSeqDR GBrowse instance that hosts optimized mitochondrial disease and mitochondrial DNA (mtDNA) specific annotation tracks, as well as an MSeqDR locus-specific database (LSDB) that curates variant data on more than 1,300 genes that have been implicated in mitochondrial disease and/or encode mitochondria-localized proteins. 
MSeqDR is integrated with a diverse array of mtDNA data analysis tools that are both freestanding and incorporated into an online exome-level dataset curation and analysis resource (GEM.app) that is being optimized to support needs of the MSeqDR community. In addition, MSeqDR supports mitochondrial disease phenotyping and ontology tools, and provides variant pathogenicity assessment features that enable community review, feedback, and integration with the public ClinVar variant annotation resource. A centralized Web-based informed consent process is being developed, with implementation of a Global Unique Identifier (GUID) system to integrate data deposited on a given individual from different sources. Community-based data deposition into MSeqDR has already begun. Future efforts will enhance capabilities to incorporate phenotypic data that enhance genomic data analyses. MSeqDR will fill the existing void in bioinformatics tools and centralized knowledge that are necessary to enable efficient nuclear and mtDNA genomic data interpretation by a range of stakeholders across both clinical diagnostic and research settings. Ultimately, MSeqDR is focused on empowering the global mitochondrial disease community to better define and explore mitochondrial disease. PMID:25542617

  12. Connected Component Model for Multi-Object Tracking.

    PubMed

    He, Zhenyu; Li, Xin; You, Xinge; Tao, Dacheng; Tang, Yuan Yan

    2016-08-01

    In multi-object tracking, it is critical to explore the data associations by exploiting the temporal information from a sequence of frames rather than only the information from two adjacent frames. Since directly obtaining data associations from multiple frames is an NP-hard multi-dimensional assignment (MDA) problem, most existing methods solve this MDA problem either by developing complicated approximate algorithms or by simplifying MDA to a 2D assignment problem based upon information extracted only from adjacent frames. In this paper, we show that the relation between associations of two observations is an equivalence relation in the data association problem, based on the spatial-temporal constraint that the trajectories of different objects must be disjoint. Therefore, the MDA problem can be equivalently divided into independent subproblems by equivalence partitioning. In contrast to existing works for solving the MDA problem, we develop a connected component model (CCM) by exploiting the constraints of the data association and the equivalence relation on the constraints. Based upon CCM, we can efficiently obtain the global solution of the MDA problem for multi-object tracking by optimizing a sequence of independent data association subproblems. Experiments on challenging public data sets demonstrate that our algorithm outperforms the state-of-the-art approaches.
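
    Because association is an equivalence relation, observations linked by candidate associations fall into disjoint classes, and each class can be solved separately. A minimal union-find sketch of that partitioning step follows. The candidate links are assumed to be given, for example from a hypothetical gating step on spatial-temporal distance. Solving each resulting subproblem is left out.

```python
def partition(num_observations, links):
    """Split observations 0..num_observations-1 into connected components."""
    parent = list(range(num_observations))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two equivalence classes

    groups = {}
    for i in range(num_observations):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    print(partition(6, [(0, 1), (1, 2), (3, 4)]))
```

    Each returned group is an independent data association subproblem. Subproblems can be optimized in any order or in parallel, which is where the efficiency of the decomposition comes from.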

  13. Complete Sequence and Molecular Epidemiology of IncK Epidemic Plasmid Encoding blaCTX-M-14

    PubMed Central

    Cottell, Jennifer L.; Webber, Mark A.; Coldham, Nick G.; Taylor, Dafydd L.; Cerdeño-Tárraga, Anna M.; Hauser, Heidi; Thomson, Nicholas R.; Woodward, Martin J.

    2011-01-01

    Antimicrobial drug resistance is a global challenge for the 21st century with the emergence of resistant bacterial strains worldwide. Transferable resistance to β-lactam antimicrobial drugs, mediated by production of extended-spectrum β-lactamases (ESBLs), is of particular concern. In 2004, an ESBL-carrying IncK plasmid (pCT) was isolated from cattle in the United Kingdom. Sequencing revealed a 93,629-bp plasmid encoding a single antimicrobial drug resistance gene, blaCTX-M-14. From this information, PCRs identifying novel features of pCT were designed and applied to isolates from several countries, showing that the plasmid has disseminated worldwide in bacteria from humans and animals. Complete DNA sequences can be used as a platform to develop rapid epidemiologic tools to identify and trace the spread of plasmids in clinically relevant pathogens, thus facilitating a better understanding of their distribution and ability to transfer between bacteria of humans and animals. PMID:21470454

  14. Auditory perception in the child.

    PubMed

    Nicolay-Pirmolin, M

    2003-01-01

    The development of auditory perception in the infant starts in utero and continues up to the age of 9-10 years. We shall examine the various stages, the various acoustic parameters and the segmental level. Three stages are important: from 7 months onwards: first perceptual reorganization; between 7 and 12 months: second perceptual reorganization; from 10 to 24 months: segmentation of the spoken word. We will note the evolution between 2 and 6 years and between 6 and 9 years, 9 years being the critical age at which children switch from global to analytic processing of utterances. We will then examine musical perception and note that, at the prelinguistic level, the same perceptual units handle verbal and musical sequences. The stages of musical perception are parallel to those for speech. Bigand posed the question: "should we see in these hierarchies, and in their importance to perception, the manifestation of an overall cognitive constraint restricting the handling of long sequences of acoustic events (including language) and why not even for all processes dealing with symbolic information".

  15. Adaptive correlation filter-based video stabilization without accumulative global motion estimation

    NASA Astrophysics Data System (ADS)

    Koh, Eunjin; Lee, Chanyong; Jeong, Dong Gil

    2014-12-01

    We present a digital video stabilization approach that provides both robustness and efficiency for practical applications. In this approach, we adopt a stabilization model that efficiently maintains spatio-temporal information from past input frames and can track the original stabilization position. Because of this stabilization model, the proposed method does not need accumulative global motion estimation and can recover the original position even if interframe motion estimation fails. It can also handle damaged or interrupted video sequences. Moreover, because it is simple and well suited to parallel implementation, we readily implemented it on a commercial field programmable gate array and on a graphics processing unit board with compute unified device architecture. Experimental results show that the proposed approach is both fast and robust.
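
    The paper's adaptive correlation filter is not specified in the abstract. As a simpler stand-in, the sketch below estimates the integer interframe translation by exhaustive cross-correlation over a small search window; a stabilizer would then shift the current frame by the negative of this estimate. Frames are assumed to be small grayscale images stored as lists of lists, and the brute-force search is chosen for clarity, not speed.

```python
def estimate_shift(prev, curr, max_shift=3):
    """Return integer (dy, dx) maximizing sum(prev[y][x] * curr[y + dy][x + dx])."""
    h, w = len(prev), len(prev[0])
    best_score, best_shift = None, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = 0.0
            # Only sum over pixels where the shifted position stays inside the frame
            for y in range(max(0, -dy), min(h, h - dy)):
                for x in range(max(0, -dx), min(w, w - dx)):
                    score += prev[y][x] * curr[y + dy][x + dx]
            if best_score is None or score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift


if __name__ == "__main__":
    prev = [[0] * 8 for _ in range(8)]
    curr = [[0] * 8 for _ in range(8)]
    for y, x in [(3, 3), (3, 4), (4, 3)]:
        prev[y][x] = 1
        curr[y + 1][x + 2] = 1  # same pattern moved down 1 and right 2
    print(estimate_shift(prev, curr))
```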

  16. Application Architecture of Avian Influenza Research Collaboration Network in Korea e-Science

    NASA Astrophysics Data System (ADS)

    Choi, Hoon; Lee, Junehawk

    In pursuit of the globalization of the AI e-Science environment, KISTI is working to extend the AI research community to AI research institutes in neighboring countries and to share the AI e-Science environment with them in the near future. In this paper we introduce the application architecture of the AI research collaboration network (AIRCoN), a global e-Science environment for AI research conducted by KISTI. It consists of an AI virus sequence information sharing system to meet the data requirements of the research community, an integrated analysis environment for analyzing the mutation patterns of AI viruses and their risks, an epidemic modeling and simulation environment for establishing an effective national readiness strategy against AI pandemics, and a knowledge portal for sharing expertise in epidemic studies and unpublished research results with community members.

  17. The Essential Component in DNA-Based Information Storage System: Robust Error-Tolerating Module

    PubMed Central

    Yim, Aldrin Kay-Yuen; Yu, Allen Chi-Shing; Li, Jing-Woei; Wong, Ada In-Chun; Loo, Jacky F. C.; Chan, King Ming; Kong, S. K.; Yip, Kevin Y.; Chan, Ting-Fung

    2014-01-01

    The size of digital data is ever increasing and is expected to grow to 40,000 EB by 2020, yet the estimated global information storage capacity in 2011 was <300 EB, indicating that most of the data are transient. DNA, as a very stable nano-molecule, is an ideal massive storage device for long-term data archiving. The two most notable illustrations are from Church et al. and Goldman et al., whose approaches are well optimized for most sequencing platforms: short synthesized DNA fragments without homopolymers. Here, we suggest improvements to the error-handling methodology that could enable the integration of DNA-based computational processes, e.g., algorithms based on self-assembly of DNA. As a proof of concept, a picture of 438 bytes was encoded into DNA with a low-density parity-check error-correction code. We salvaged a significant portion of sequencing reads carrying mutations generated during DNA synthesis and sequencing and successfully reconstructed the entire picture. A modular programming framework, DNAcodec, with an eXtensible Markup Language-based data format was also introduced. Our experiments demonstrated the practicability of long DNA message recovery with high error tolerance, which opens the field to biocomputing and synthetic biology. PMID:25414846
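    The abstract rests on the idea that digital bytes can be written as nucleotides. The sketch below is a minimal illustration assuming a simple, hypothetical 2-bits-per-base mapping (00→A, 01→C, 10→G, 11→T); it is not the DNAcodec format and omits the LDPC error correction the authors use. It also reports the longest homopolymer run, since the abstract notes that well-optimized schemes avoid homopolymers.

```python
# Hypothetical 2-bit-per-base mapping for illustration only;
# this is NOT the DNAcodec scheme and has no error correction.
BASES = "ACGT"


def encode(data: bytes) -> str:
    """Map each byte to four nucleotides, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def decode(seq: str) -> bytes:
    """Inverse of encode(); the sequence length must be a multiple of 4."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    values = [BASES.index(base) for base in seq]
    out = bytearray()
    for i in range(0, len(values), 4):
        byte = 0
        for v in values[i:i + 4]:
            byte = (byte << 2) | v
        out.append(byte)
    return bytes(out)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    best = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        prev = base
        best = max(best, run)
    return best
```

    With this naive mapping, encode(b"Hi") gives "CAGACGGC", but a zero byte becomes "AAAA", a homopolymer run of 4. That is the kind of pattern practical DNA storage encodings add constraints or coding layers to avoid.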

  18. MUFOLD-SS: New deep inception-inside-inception networks for protein secondary structure prediction.

    PubMed

    Fang, Chao; Shang, Yi; Xu, Dong

    2018-05-01

    Protein secondary structure prediction can provide important information for protein 3D structure prediction and protein function. Deep learning offers a new opportunity to significantly improve prediction accuracy. In this article, a new deep neural network architecture, named the Deep inception-inside-inception (Deep3I) network, is proposed for protein secondary structure prediction and implemented as a software tool, MUFOLD-SS. The input to MUFOLD-SS is a carefully designed feature matrix corresponding to the primary amino acid sequence of a protein, which consists of a rich set of information derived from individual amino acids as well as the context of the protein sequence. Specifically, the feature matrix is a composition of physico-chemical properties of amino acids, the PSI-BLAST profile, and the HHBlits profile. MUFOLD-SS is composed of a sequence of nested inception modules and maps the input matrix to either eight states or three states of secondary structure. The architecture of MUFOLD-SS enables effective processing of local and global interactions between amino acids in making accurate predictions. In extensive experiments on multiple datasets, MUFOLD-SS significantly outperformed the best existing methods and other deep neural networks. MUFOLD-SS can be downloaded from http://dslsrv8.cs.missouri.edu/~cf797/MUFoldSS/download.html. © 2018 Wiley Periodicals, Inc.
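    The abstract mentions predicting either eight or three secondary-structure states. The sketch below shows one widely used convention for collapsing the eight DSSP states to three (H, G, I to helix; E, B to strand; everything else to coil), plus the Q3 per-residue accuracy. Other reductions exist, and the abstract does not say which one MUFOLD-SS uses, so treat this mapping as an assumption.

```python
# One common 8-to-3 reduction of DSSP states; assumed here, not taken
# from the MUFOLD-SS paper.
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",  # alpha, 3-10 and pi helices -> helix
    "E": "E", "B": "E",            # extended strand, isolated bridge -> strand
    "T": "C", "S": "C", "C": "C", "-": "C", " ": "C",  # turn, bend, coil -> coil
}


def reduce_states(dssp8: str) -> str:
    """Convert an 8-state DSSP string to its 3-state equivalent."""
    return "".join(EIGHT_TO_THREE[s] for s in dssp8)


def q3(predicted: str, reference: str) -> float:
    """Fraction of residues whose 3-state label matches the reference."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)
```

    For example, reduce_states("HGIEBTSC") returns "HHHEECCC". A 3-state predictor would then be scored by comparing its output with the reduced reference using q3().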

  19. Toward a mtDNA locus-specific mutation database using the LOVD platform.

    PubMed

    Elson, Joanna L; Sweeney, Mary G; Procaccio, Vincent; Yarham, John W; Salas, Antonio; Kong, Qing-Peng; van der Westhuizen, Francois H; Pitceathly, Robert D S; Thorburn, David R; Lott, Marie T; Wallace, Douglas C; Taylor, Robert W; McFarland, Robert

    2012-09-01

    The Human Variome Project (HVP) is a global effort to collect and curate all human genetic variation affecting health. Mutations of mitochondrial DNA (mtDNA) are an important cause of neurogenetic disease in humans; however, identification of the pathogenic mutations responsible can be problematic. In this article, we provide explanations as to why and suggest how such difficulties might be overcome. We put forward a case in support of a new Locus-Specific Mutation Database (LSDB) implemented using the Leiden Open-source Variation Database (LOVD) system that will not only list primary mutations, but also present the evidence supporting their role in disease. Critically, we feel that this new database should have the capacity to store information on the observed phenotypes alongside the genetic variation, thereby facilitating our understanding of the complex and variable presentation of mtDNA disease. LOVD supports fast queries of both seen and hidden data and allows storage of sequence variants from high-throughput sequence analysis. The LOVD platform will allow construction of a secure mtDNA database; one that can fully utilize currently available data, as well as that being generated by high-throughput sequencing, to link genotype with phenotype, enhancing our understanding of mitochondrial disease, with a view to providing better prognostic information. © 2012 Wiley Periodicals, Inc.

  20. Toward a mtDNA Locus-Specific Mutation Database Using the LOVD Platform

    PubMed Central

    Elson, Joanna L.; Sweeney, Mary G.; Procaccio, Vincent; Yarham, John W.; Salas, Antonio; Kong, Qing-Peng; van der Westhuizen, Francois H.; Pitceathly, Robert D.S.; Thorburn, David R.; Lott, Marie T.; Wallace, Douglas C.; Taylor, Robert W.; McFarland, Robert

    2015-01-01

    The Human Variome Project (HVP) is a global effort to collect and curate all human genetic variation affecting health. Mutations of mitochondrial DNA (mtDNA) are an important cause of neurogenetic disease in humans; however, identification of the pathogenic mutations responsible can be problematic. In this article, we provide explanations as to why and suggest how such difficulties might be overcome. We put forward a case in support of a new Locus-Specific Mutation Database (LSDB) implemented using the Leiden Open-source Variation Database (LOVD) system that will not only list primary mutations, but also present the evidence supporting their role in disease. Critically, we feel that this new database should have the capacity to store information on the observed phenotypes alongside the genetic variation, thereby facilitating our understanding of the complex and variable presentation of mtDNA disease. LOVD supports fast queries of both seen and hidden data and allows storage of sequence variants from high-throughput sequence analysis. The LOVD platform will allow construction of a secure mtDNA database; one that can fully utilize currently available data, as well as that being generated by high-throughput sequencing, to link genotype with phenotype, enhancing our understanding of mitochondrial disease, with a view to providing better prognostic information. PMID:22581690

  1. ORION: a web server for protein fold recognition and structure prediction using evolutionary hybrid profiles

    PubMed Central

    Ghouzam, Yassine; Postic, Guillaume; Guerin, Pierre-Edouard; de Brevern, Alexandre G.; Gelly, Jean-Christophe

    2016-01-01

    Protein structure prediction based on comparative modeling is the most efficient way to produce structural models when it can be performed. ORION is a dedicated webserver based on a new strategy that performs this task. The identification by ORION of suitable templates is performed using an original profile-profile approach that combines sequence and structure evolution information. Structure evolution information is encoded into profiles using structural features, such as solvent accessibility and local conformation (with Protein Blocks), which give an accurate description of the local protein structure. ORION has recently been improved, increasing the quality of its results by 5%. The ORION web server accepts a single protein sequence as input and searches for homologous protein structures within minutes. Various databases such as PDB, SCOP and HOMSTRAD can be mined to find an appropriate structural template. For the modeling step, a protein 3D structure can be directly obtained from the selected template by MODELLER and displayed with global and local quality model estimation measures. The sequence and the predicted structure of 4 examples from the CAMEO server and a recent CASP11 target from the 'Hard' category (T0818-D1) are shown as pertinent examples. Our web server is accessible at http://www.dsimb.inserm.fr/ORION/. PMID:27319297

  2. ORION: a web server for protein fold recognition and structure prediction using evolutionary hybrid profiles.

    PubMed

    Ghouzam, Yassine; Postic, Guillaume; Guerin, Pierre-Edouard; de Brevern, Alexandre G; Gelly, Jean-Christophe

    2016-06-20

    Protein structure prediction based on comparative modeling is the most efficient way to produce structural models when it can be performed. ORION is a dedicated webserver based on a new strategy that performs this task. The identification by ORION of suitable templates is performed using an original profile-profile approach that combines sequence and structure evolution information. Structure evolution information is encoded into profiles using structural features, such as solvent accessibility and local conformation (with Protein Blocks), which give an accurate description of the local protein structure. ORION has recently been improved, increasing the quality of its results by 5%. The ORION web server accepts a single protein sequence as input and searches for homologous protein structures within minutes. Various databases such as PDB, SCOP and HOMSTRAD can be mined to find an appropriate structural template. For the modeling step, a protein 3D structure can be directly obtained from the selected template by MODELLER and displayed with global and local quality model estimation measures. The sequence and the predicted structure of 4 examples from the CAMEO server and a recent CASP11 target from the 'Hard' category (T0818-D1) are shown as pertinent examples. Our web server is accessible at http://www.dsimb.inserm.fr/ORION/.

  3. Cross cultural differences in unconscious knowledge.

    PubMed

    Kiyokawa, Sachiko; Dienes, Zoltán; Tanaka, Daisuke; Yamada, Ayumi; Crowe, Louise

    2012-07-01

    Previous studies have indicated cross cultural differences in conscious processes, such that Asians have a global preference and Westerners a more analytical one. We investigated whether these biases also apply to unconscious knowledge. In Experiment 1, Japanese and UK participants memorized strings of large (global) letters made out of small (local) letters. The strings constituted one sequence of letters at a global level and a different sequence at a local level. Implicit learning occurred at the global and not the local level for the Japanese but equally at both levels for the English. In Experiment 2, the Japanese preference for global over local processing persisted even when structure existed only at the local but not global level. In Experiment 3, Japanese and UK participants were asked to attend to just one of the levels, global or local. Now the cultural groups performed similarly, indicating that the bias largely reflects preference rather than ability (although the data left room for residual ability differences). In Experiment 4, the greater global advantage of Japanese rather than English participants was confirmed for strings made of Japanese kana rather than Roman letters. That is, the cultural difference is not due to familiarity with the sequence elements. In sum, we show for the first time that cultural biases strongly affect the type of unconscious knowledge people acquire. Copyright © 2012 Elsevier B.V. All rights reserved.

  4. Proteomics technique opens new frontiers in mobilome research.

    PubMed

    Davidson, Andrew D; Matthews, David A; Maringer, Kevin

    2017-01-01

    A large proportion of the genome of most eukaryotic organisms consists of highly repetitive mobile genetic elements. The sum of these elements is called the "mobilome," which in eukaryotes is made up mostly of transposons. Transposable elements contribute to disease, evolution, and normal physiology by mediating genetic rearrangement, and through the "domestication" of transposon proteins for cellular functions. Although 'omics studies of mobilome genomes and transcriptomes are common, technical challenges have hampered high-throughput global proteomics analyses of transposons. In a recent paper, we overcame these technical hurdles using a technique called "proteomics informed by transcriptomics" (PIT), and thus published the first unbiased global mobilome-derived proteome for any organism (using cell lines derived from the mosquito Aedes aegypti). In this commentary, we describe our methods in more detail, and summarise our major findings. We also use new genome sequencing data to show that, in many cases, the specific genomic element expressing a given protein can be identified using PIT. This proteomic technique therefore represents an important technological advance that will open new avenues of research into the role that proteins derived from transposons and other repetitive and sequence diverse genetic elements, such as endogenous retroviruses, play in health and disease.

  5. The Global Genome Biodiversity Network (GGBN) Data Standard specification.

    PubMed

    Droege, G; Barker, K; Seberg, O; Coddington, J; Benson, E; Berendsohn, W G; Bunk, B; Butler, C; Cawsey, E M; Deck, J; Döring, M; Flemons, P; Gemeinholzer, B; Güntsch, A; Hollowell, T; Kelbert, P; Kostadinov, I; Kottmann, R; Lawlor, R T; Lyal, C; Mackenzie-Dodds, J; Meyer, C; Mulcahy, D; Nussbeck, S Y; O'Tuama, É; Orrell, T; Petersen, G; Robertson, T; Söhngen, C; Whitacre, J; Wieczorek, J; Yilmaz, P; Zetzsche, H; Zhang, Y; Zhou, X

    2016-01-01

    Genomic samples of non-model organisms are becoming increasingly important in a broad range of studies, from developmental biology and biodiversity analyses to conservation. Genomic sample definition, description, quality, voucher information and metadata all need to be digitized and disseminated across scientific communities. This information needs to be concise and consistent in today's ever-increasing bioinformatic era, for complementary data aggregators to easily map databases to one another. In order to facilitate exchange of information on genomic samples and their derived data, the Global Genome Biodiversity Network (GGBN) Data Standard is intended to provide a platform based on a documented agreement to promote the efficient sharing and usage of genomic sample material and associated specimen information in a consistent way. The new data standard presented here builds upon existing standards commonly used within the community, extending them with the capability to exchange data on tissue, environmental and DNA samples as well as sequences. The GGBN Data Standard will reveal and democratize the hidden contents of biodiversity biobanks, for the convenience of everyone in the wider biobanking community. Technical tools exist for data providers to easily map their databases to the standard. Database URL: http://terms.tdwg.org/wiki/GGBN_Data_Standard. © The Author(s) 2016. Published by Oxford University Press.

  6. Genome and Transcriptome Sequencing of the Ostreid herpesvirus 1 From Tomales Bay, California

    NASA Astrophysics Data System (ADS)

    Burge, C. A.; Langevin, S.; Closek, C. J.; Roberts, S. B.; Friedman, C. S.

    2016-02-01

    Mass mortalities of larval and seed bivalve molluscs attributed to the Ostreid herpesvirus 1 (OsHV-1) occur globally. OsHV-1 was fully sequenced and characterized as a member of the family Malacoherpesviridae. Multiple strains of OsHV-1 exist and may vary in virulence, e.g. OsHV-1 µvar. For most global variants of OsHV-1, sequence data are limited to PCR-based sequencing of segments, including two recent genomes. In the United States, OsHV-1 has been detected only in adjacent embayments in California, Tomales and Drakes bays. Limited DNA sequence data of OsHV-1 infecting oysters in Tomales Bay indicate that the virus detected there is similar but not identical to any one global variant of OsHV-1. In order to better understand both strain variation and virulence of OsHV-1 infecting oysters in Tomales Bay, we used genomic and transcriptomic sequencing. Metagenomic sequencing (Illumina MiSeq) was conducted on infected oysters (n=4 per year) collected in 2003, 2007, and 2014; full OsHV-1 genome sequences and low overall microbial diversity were obtained from highly infected oysters. Increased microbial diversity was detected in three of four samples sequenced from 2003, where qPCR-based genome copy numbers of OsHV-1 were lower. Expression analysis (SOLiD RNA sequencing) of OsHV-1 genes expressed in oyster larvae at 24 hours post exposure revealed a nearly complete transcriptome with several highly expressed genes, similar to recent transcriptomic analyses of other OsHV-1 variants. Taken together, our results indicate that genome and transcriptome sequencing may be powerful tools for understanding both strain variation and virulence of non-culturable marine viruses.

  7. Integrated biostratigraphic and sequence stratigraphic framework for Upper Cretaceous strata of the eastern Gulf Coastal Plain, USA

    USGS Publications Warehouse

    Mancini, E.A.; Puckett, T.M.; Tew, B.H.

    1996-01-01

    Upper Cretaceous (Santonian-Maastrichtian stages) strata of the eastern US Gulf Coastal Plain represent a relatively complete section of marine to nonmarine mixed siliciclastic and carbonate sediments. This section includes three depositional sequences which display characteristic systems tracts and distinct physical defining surfaces. The marine lithofacies are rich in calcareous nannoplankton and planktonic foraminifera which can be used for biostratigraphic zonation. Integration of this zonation with the lithostratigraphy and sequence stratigraphy of these strata results in a framework that can be used for local and regional intrabasin correlation and potentially for global interbasin correlation. Only the synchronous maximum flooding surfaces of these depositional sequences, however, have chronostratigraphic significance. The sequence boundaries and initial flooding surfaces are diachronous, and their use for correlation can produce conflicting results. The availability of high resolution biostratigraphy is critical for global correlation of depositional sequences. © 1996 Academic Press Limited.

  8. HydroSHEDS: A global comprehensive hydrographic dataset

    NASA Astrophysics Data System (ADS)

    Wickel, B. A.; Lehner, B.; Sindorf, N.

    2007-12-01

    The Hydrological data and maps based on SHuttle Elevation Derivatives at multiple Scales (HydroSHEDS) is an innovative product that, for the first time, provides hydrographic information in a consistent and comprehensive format for regional and global-scale applications. HydroSHEDS offers a suite of geo-referenced data sets, including stream networks, watershed boundaries, drainage directions, and ancillary data layers such as flow accumulations, distances, and river topology information. The goal of developing HydroSHEDS was to generate key data layers to support regional and global watershed analyses, hydrological modeling, and freshwater conservation planning at a quality, resolution and extent that had previously been unachievable. Available resolutions range from 3 arc-second (approx. 90 meters at the equator) to 5 arc-minute (approx. 10 km at the equator) with seamless near-global extent. HydroSHEDS is derived from elevation data of the Shuttle Radar Topography Mission (SRTM) at 3 arc-second resolution. The original SRTM data have been hydrologically conditioned using a sequence of automated procedures. Existing methods of data improvement and newly developed algorithms have been applied, including void filling, filtering, stream burning, and upscaling techniques. Manual corrections were made where necessary. Preliminary quality assessments indicate that the accuracy of HydroSHEDS significantly exceeds that of existing global watershed and river maps. HydroSHEDS was developed by the Conservation Science Program of the World Wildlife Fund (WWF) in partnership with the U.S. Geological Survey (USGS), the International Centre for Tropical Agriculture (CIAT), The Nature Conservancy (TNC), and the Center for Environmental Systems Research (CESR) of the University of Kassel, Germany.
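    HydroSHEDS provides drainage directions and flow accumulations derived from conditioned elevation data. The sketch below illustrates the generic D8 idea on a tiny grid: each cell drains to the neighbour with the steepest downward slope, with diagonal distances scaled by √2, and flow accumulation counts the upstream cells that drain through each cell. It assumes an already conditioned (pit-free) grid and represents directions as hypothetical (row, col) offsets. It is a textbook illustration, not the HydroSHEDS processing chain or its direction encoding.

```python
import math

# Neighbour offsets (row, col) for the eight D8 directions.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_directions(dem):
    """For each cell, the (dr, dc) offset of its steepest downslope neighbour,
    or None if no neighbour is lower (a pit or an outlet)."""
    rows, cols = len(dem), len(dem[0])
    dirs = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best_slope = 0.0
            for dr, dc in OFFSETS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dist = math.sqrt(2) if dr and dc else 1.0
                    slope = (dem[r][c] - dem[nr][nc]) / dist
                    if slope > best_slope:
                        best_slope = slope
                        dirs[r][c] = (dr, dc)
    return dirs


def flow_accumulation(dirs):
    """Number of upstream cells draining through each cell (excluding itself)."""
    rows, cols = len(dirs), len(dirs[0])
    acc = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Walk downstream; every step goes strictly downhill, so it ends.
            cr, cc = r, c
            while dirs[cr][cc] is not None:
                dr, dc = dirs[cr][cc]
                cr, cc = cr + dr, cc + dc
                acc[cr][cc] += 1
    return acc
```

    On the 3×3 grid [[3, 2, 1], [4, 3, 2], [5, 4, 3]], the top-right cell has no lower neighbour and receives flow from all 8 other cells, while the centre cell drains diagonally toward it. Walking every path is quadratic in the worst case; production tools typically use a topological ordering instead.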

  9. Spatiotemporal coding of inputs for a system of globally coupled phase oscillators

    NASA Astrophysics Data System (ADS)

    Wordsworth, John; Ashwin, Peter

    2008-12-01

    We investigate the spatiotemporal coding of low amplitude inputs to a simple system of globally coupled phase oscillators with coupling function g(ϕ) = -sin(ϕ + α) + r sin(2ϕ + β) that has robust heteroclinic cycles (slow switching between cluster states). The inputs correspond to detuning of the oscillators. It was recently noted that globally coupled phase oscillators can encode their frequencies in the form of spatiotemporal codes of a sequence of cluster states [P. Ashwin, G. Orosz, J. Wordsworth, and S. Townley, SIAM J. Appl. Dyn. Syst. 6, 728 (2007)]. Concentrating on the case of N=5 oscillators we show in detail how the spatiotemporal coding can be used to resolve all of the information that relates the individual inputs to each other, provided that a long enough time series is considered. We investigate robustness to noise and find remarkable stability, especially of the temporal coding, even for noise of a magnitude comparable to the inputs.
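    The coupling function above defines a globally coupled system. The sketch below is a minimal forward-Euler simulation, assuming the common all-to-all form dθ_i/dt = ω_i + (1/N) Σ_j g(θ_i − θ_j), with the detuning inputs entering through ω_i. The default α, β and r values are illustrative placeholders, not parameters confirmed by this abstract, and Euler stepping is chosen for brevity rather than accuracy.

```python
import math


def coupling(phi, alpha, beta, r):
    """g(phi) = -sin(phi + alpha) + r * sin(2*phi + beta), as in the abstract."""
    return -math.sin(phi + alpha) + r * math.sin(2 * phi + beta)


def step(theta, omega, dt, alpha, beta, r):
    """One forward-Euler step of the assumed model
    d(theta_i)/dt = omega_i + (1/N) * sum_j g(theta_i - theta_j)."""
    n = len(theta)
    new = []
    for i in range(n):
        drift = sum(coupling(theta[i] - theta[j], alpha, beta, r) for j in range(n)) / n
        new.append((theta[i] + dt * (omega[i] + drift)) % (2 * math.pi))
    return new


def simulate(theta0, omega, steps, dt=0.01, alpha=1.8, beta=-2.0, r=0.2):
    """Integrate for a fixed number of steps.

    The parameter defaults are illustrative placeholders only.
    """
    theta = list(theta0)
    for _ in range(steps):
        theta = step(theta, omega, dt, alpha, beta, r)
    return theta
```

    With identical frequencies and identical initial phases every oscillator receives the same drift, so the phases stay equal. Small detunings such as omega = [1.0 + eps for eps in inputs] break this symmetry, and that is the regime the abstract studies.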

  10. Cystic echinococcosis in South America: systematic review of species and genotypes of Echinococcus granulosus sensu lato in humans and natural domestic hosts.

    PubMed

    Cucher, Marcela Alejandra; Macchiaroli, Natalia; Baldi, Germán; Camicia, Federico; Prada, Laura; Maldonado, Lucas; Avila, Héctor Gabriel; Fox, Adolfo; Gutiérrez, Ariana; Negro, Perla; López, Raúl; Jensen, Oscar; Rosenzvit, Mara; Kamenetzky, Laura

    2016-02-01

    To systematically review publications on Echinococcus granulosus sensu lato species/genotypes reported in domestic intermediate and definitive hosts in South America and in human cases worldwide, taking into account those articles where DNA sequencing was performed; and to analyse the density of each type of livestock that can act as intermediate host, and features of medical importance such as cyst organ location. Literature search in numerous databases. We included only articles where samples were genotyped by sequencing since to date it is the most accurate method to unambiguously identify all E. granulosus s. l. genotypes. Also, we report new E. granulosus s. l. samples from Argentina and Uruguay analysed by sequencing of the cox1 gene. In South America, five countries have cystic echinococcosis cases for which sequencing data are available: Argentina, Brazil, Chile, Peru and Uruguay, adding up to 1534 cases. E. granulosus s. s. (G1) accounts for most of the global burden of human and livestock cases. Also, E. canadensis (G6) plays a significant role in human cystic echinococcosis. Likewise, worldwide analysis of human cases showed that 72.9% are caused by E. granulosus s. s. (G1) and 12.2% and 9.6% by E. canadensis G6 and G7, respectively. E. granulosus s. s. (G1) accounts for most of the global burden followed by E. canadensis (G6 and G7) in South America and worldwide. This information should be taken into account to suit local cystic echinococcosis control and prevention programmes according to each molecular epidemiological situation. © 2015 John Wiley & Sons Ltd.

  11. Sequence space coverage, entropy of genomes and the potential to detect non-human DNA in human samples

    PubMed Central

    Liu, Zhandong; Venkatesh, Santosh S; Maley, Carlo C

    2008-01-01

    Background: Genomes store information for building and maintaining organisms. Complete sequencing of many genomes provides the opportunity to study and compare global information properties of those genomes. Results: We have analyzed aspects of the information content of Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, Saccharomyces cerevisiae, and Escherichia coli (K-12) genomes. Virtually all possible (> 98%) 12 bp oligomers appear in vertebrate genomes while < 2% of 19 bp oligomers are present. Other species showed different ranges of > 98% to < 2% of possible oligomers in D. melanogaster (12–17 bp), C. elegans (11–17 bp), A. thaliana (11–17 bp), S. cerevisiae (10–16 bp) and E. coli (9–15 bp). Frequencies of unique oligomers in the genomes follow similar patterns. We identified a set of 2.6 M 15-mers that are more than 1 nucleotide different from all 15-mers in the human genome and so could be used as probes to detect microbes in human samples. In a human sample, these probes would detect 100% of the 433 currently fully sequenced prokaryotes and 75% of the 3065 fully sequenced viruses. The human genome is significantly more compact in sequence space than a random genome. We identified the most frequent 5- to 20-mers in the human genome, which may prove useful as PCR primers. We also identified a bacterium, Anaeromyxobacter dehalogenans, which has an exceptionally low diversity of oligomers given the size of its genome and its GC content. The entropy of coding regions in the human genome is significantly higher than non-coding regions and chromosomes. However, chromosomes 1, 2, 9, 12 and 14 have a relatively high proportion of coding DNA without high entropy, and chromosome 20 is the opposite with a low frequency of coding regions but relatively high entropy.
Conclusion: Measures of the frequency of oligomers are useful for designing PCR assays and for identifying chromosomes and organisms with hidden structure that had not been previously recognized. This information may be used to detect novel microbes in human tissues. PMID:18973670
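    The coverage figures above ask what fraction of the 4^k possible k-mers actually occurs in a genome. The sketch below computes that fraction for any sequence. It also computes the Shannon entropy of the empirical k-mer distribution, which is one plausible entropy measure; the abstract does not specify the authors' exact definition, so that part is an assumption.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping k-mers over A/C/G/T, skipping windows with other symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def coverage(seq, k):
    """Fraction of the 4**k possible k-mers that occur at least once."""
    return len(kmer_counts(seq, k)) / 4 ** k


def shannon_entropy(seq, k):
    """Shannon entropy (bits) of the empirical k-mer frequency distribution.

    One plausible definition; not necessarily the measure used in the paper.
    """
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())
```

    For a whole genome the dictionary approach becomes memory-hungry at k near 15 to 19, where a bit array indexed by a 2-bit encoding of each k-mer is the usual alternative.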

  12. Sequence space coverage, entropy of genomes and the potential to detect non-human DNA in human samples.

    PubMed

    Liu, Zhandong; Venkatesh, Santosh S; Maley, Carlo C

    2008-10-30

    Genomes store information for building and maintaining organisms. Complete sequencing of many genomes provides the opportunity to study and compare global information properties of those genomes. We have analyzed aspects of the information content of Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, Saccharomyces cerevisiae, and Escherichia coli (K-12) genomes. Virtually all possible (> 98%) 12 bp oligomers appear in vertebrate genomes while < 2% of 19 bp oligomers are present. Other species showed different ranges of > 98% to < 2% of possible oligomers in D. melanogaster (12-17 bp), C. elegans (11-17 bp), A. thaliana (11-17 bp), S. cerevisiae (10-16 bp) and E. coli (9-15 bp). Frequencies of unique oligomers in the genomes follow similar patterns. We identified a set of 2.6 M 15-mers that are more than 1 nucleotide different from all 15-mers in the human genome and so could be used as probes to detect microbes in human samples. In a human sample, these probes would detect 100% of the 433 currently fully sequenced prokaryotes and 75% of the 3065 fully sequenced viruses. The human genome is significantly more compact in sequence space than a random genome. We identified the most frequent 5- to 20-mers in the human genome, which may prove useful as PCR primers. We also identified a bacterium, Anaeromyxobacter dehalogenans, which has an exceptionally low diversity of oligomers given the size of its genome and its GC content. The entropy of coding regions in the human genome is significantly higher than non-coding regions and chromosomes. However, chromosomes 1, 2, 9, 12 and 14 have a relatively high proportion of coding DNA without high entropy, and chromosome 20 is the opposite with a low frequency of coding regions but relatively high entropy.
Measures of the frequency of oligomers are useful for designing PCR assays and for identifying chromosomes and organisms with hidden structure that had not been previously recognized. This information may be used to detect novel microbes in human tissues.

  13. RNA Sequencing Analysis of the Gametophyte Transcriptome from the Liverwort, Marchantia polymorpha

    PubMed Central

    Sharma, Niharika; Jung, Chol-Hee; Bhalla, Prem L.; Singh, Mohan B.

    2014-01-01

    The liverwort Marchantia polymorpha is a member of the most basal lineage of land plants (embryophytes) and likely retains many ancestral morphological, physiological and molecular characteristics. Despite its phylogenetic importance and the availability of previous EST studies, M. polymorpha’s lack of economic importance limits accessible genomic resources for this species. We employed Illumina RNA-Seq technology to sequence the gametophyte transcriptome of M. polymorpha. cDNA libraries from 6 different male and female developmental tissues were sequenced to delineate a global view of the M. polymorpha transcriptome. Approximately 80 million short reads were obtained and assembled into a non-redundant set of 46,533 transcripts (≥ 200 bp) from 46,070 loci. The average length and the N50 length of the transcripts were 757 bp and 471 bp, respectively. Sequence comparison of assembled transcripts with non-redundant proteins from embryophytes resulted in the annotation of 43% of the transcripts. The transcripts were also compared with M. polymorpha expressed sequence tags (ESTs), and approximately 69.5% of the transcripts appeared to be novel. Twenty-one percent of the transcripts were assigned GO terms to improve annotation. In addition, 6,112 simple sequence repeats (SSRs) were identified as potential molecular markers, which may be useful in studies of genetic diversity. A comparative genomics approach revealed that a substantial proportion of the genes (35.5%) expressed in M. polymorpha were conserved across phylogenetically related species, such as Selaginella and Physcomitrella, and identified 580 genes that are potentially unique to liverworts. Our study presents an extensive amount of novel sequence information for M. polymorpha.
This information will serve as a valuable genomics resource for further molecular, developmental and comparative evolutionary studies, as well as for the isolation and characterization of functional genes that are involved in sex differentiation and sexual reproduction in this liverwort. PMID:24841988
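    The abstract reports both the mean and the N50 transcript length. N50 is the length L such that transcripts of length L or longer together contain at least half of all assembled bases. The sketch below implements that standard definition. The input lengths in the usage note are made up for illustration and are not data from the study.

```python
def n50(lengths):
    """Standard N50: the smallest length L such that sequences of length >= L
    contain at least half of the total assembled length."""
    if not lengths:
        raise ValueError("no lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

    For the made-up lengths [2, 3, 4, 5, 6, 7, 8, 9, 10], the total is 54 and the three longest sequences (10 + 9 + 8 = 27) reach half of it, so the N50 is 8.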

  14. SVM-Based Prediction of Propeptide Cleavage Sites in Spider Toxins Identifies Toxin Innovation in an Australian Tarantula

    PubMed Central

    Wong, Emily S. W.; Hardy, Margaret C.; Wood, David; Bailey, Timothy; King, Glenn F.

    2013-01-01

    Spider neurotoxins are commonly used as pharmacological tools and are a popular source of novel compounds with therapeutic and agrochemical potential. Since venom peptides are inherently toxic, the host spider must employ strategies to avoid adverse effects prior to venom use. It is partly for this reason that most spider toxins encode a protective proregion that upon enzymatic cleavage is excised from the mature peptide. In order to identify the mature toxin sequence directly from toxin transcripts, without resorting to protein sequencing, the propeptide cleavage site in the toxin precursor must be predicted bioinformatically. We evaluated different machine learning strategies (support vector machines, hidden Markov model and decision tree) and developed an algorithm (SpiderP) for prediction of propeptide cleavage sites in spider toxins. Our strategy uses a support vector machine (SVM) framework that combines both local and global sequence information. Our method is superior or comparable to current tools for prediction of propeptide sequences in spider toxins. Evaluation of the SVM method on an independent test set of known toxin sequences yielded 96% sensitivity and 100% specificity. Furthermore, we sequenced five novel peptides (not used to train the final predictor) from the venom of the Australian tarantula Selenotypus plumipes to test the accuracy of the predictor and found 80% sensitivity and 99.6% 8-mer specificity. Finally, we used the predictor together with homology information to predict and characterize seven groups of novel toxins from the deeply sequenced venom gland transcriptome of S. plumipes, which revealed structural complexity and innovations in the evolution of the toxins. The precursor prediction tool (SpiderP) is freely available on ArachnoServer (http://www.arachnoserver.org/spiderP.html), a web portal to a comprehensive relational database of spider toxins. 
All training data, test data, and scripts used are available from the SpiderP website. PMID:23894279
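The abstract says SpiderP's SVM combines local and global sequence information. Below is a minimal sketch of one way such a feature vector could be built. It assumes two inputs: a one-hot window around each candidate cleavage site (local), and the amino-acid composition of the whole precursor (global). The window size, the function names and the example precursor are hypothetical, not SpiderP's published feature set, and the SVM training step is omitted.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_window(seq: str, site: int, half_width: int = 4) -> List[float]:
    """Local features: one-hot encode the residues around a candidate site.

    The site is taken to lie between seq[site - 1] and seq[site]; window
    positions outside the sequence are encoded as all-zero vectors (padding).
    """
    features: List[float] = []
    for pos in range(site - half_width, site + half_width):
        vec = [0.0] * len(AMINO_ACIDS)
        if 0 <= pos < len(seq) and seq[pos] in INDEX:
            vec[INDEX[seq[pos]]] = 1.0
        features.extend(vec)
    return features


def composition(seq: str) -> List[float]:
    """Global features: fraction of each standard residue in the precursor."""
    counts = [0.0] * len(AMINO_ACIDS)
    for aa in seq:
        if aa in INDEX:
            counts[INDEX[aa]] += 1
    total = sum(counts) or 1.0
    return [c / total for c in counts]


def site_features(seq: str, site: int, half_width: int = 4) -> List[float]:
    """Concatenate the local window features with the global composition."""
    return one_hot_window(seq, site, half_width) + composition(seq)


if __name__ == "__main__":
    precursor = "MKTSLLVLAALLLAEREDAKCRKLFGGCTKDSDCCKHLGCK"  # hypothetical precursor
    print(len(site_features(precursor, site=19)))  # 8 positions x 20 + 20 = 180
```

Each candidate site in a precursor would yield one such vector, and a classifier would then score which site is the propeptide cleavage point.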

  15. A weighted sampling algorithm for the design of RNA sequences with targeted secondary structure and nucleotide distribution.

    PubMed

    Reinharz, Vladimir; Ponty, Yann; Waldispühl, Jérôme

    2013-07-01

    The design of RNA sequences folding into predefined secondary structures is a milestone for many synthetic biology and gene therapy studies. Most of the current software uses similar local search strategies (i.e. a random seed is progressively adapted to acquire the desired folding properties) and, more importantly, does not allow the user to explicitly control the nucleotide distribution, such as the GC-content, of the designed sequences. However, the latter is an important criterion for large-scale applications, as it could presumably be used to design sequences with better transcription rates and/or structural plasticity. In this article, we introduce IncaRNAtion, a novel algorithm to design RNA sequences folding into target secondary structures with a predefined nucleotide distribution. IncaRNAtion uses a global sampling approach and weighted sampling techniques. We show that our approach is fast (i.e. running time comparable to or better than local search methods), seedless (we remove the bias of the seed in local search heuristics) and successfully generates high-quality sequences (i.e. thermodynamically stable) for any GC-content. To complete this study, we develop a hybrid method combining our global sampling approach with local search strategies. Remarkably, our glocal methodology outperforms both local and global approaches for sampling sequences with a specific GC-content and target structure. IncaRNAtion is available at csb.cs.mcgill.ca/incarnation/. Supplementary data are available at Bioinformatics online.
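IncaRNAtion samples sequences that are compatible with a target structure, under weights that control the GC-content. The sketch below is a deliberately simplified stand-in for that idea, not IncaRNAtion itself. It ignores stacking energies and samples each base pair and each unpaired position independently. Every G or C contributes a factor of a hypothetical parameter `gc_weight` to the sampling weight, so values above 1 push the GC-content up and values below 1 push it down.

```python
import random
from typing import Dict, List

PAIRS = ["GC", "CG", "AU", "UA", "GU", "UG"]  # canonical and wobble base pairs
BASES = ["A", "C", "G", "U"]


def parse_dot_bracket(structure: str) -> Dict[int, int]:
    """Map each paired position to its partner (both directions)."""
    stack: List[int] = []
    pairs: Dict[int, int] = {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            pairs[j], pairs[i] = i, j
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def gc_count(s: str) -> int:
    return sum(1 for ch in s if ch in "GC")


def sample_sequence(structure: str, gc_weight: float, rng: random.Random) -> str:
    """Sample a structure-compatible sequence; each G/C multiplies the weight by gc_weight."""
    pairs = parse_dot_bracket(structure)
    pair_weights = [gc_weight ** gc_count(p) for p in PAIRS]
    base_weights = [gc_weight ** gc_count(b) for b in BASES]
    seq = [""] * len(structure)
    for i in range(len(structure)):
        if seq[i]:
            continue  # already filled as the partner of an earlier '('
        if i in pairs:
            left, right = rng.choices(PAIRS, weights=pair_weights)[0]
            seq[i], seq[pairs[i]] = left, right
        else:
            seq[i] = rng.choices(BASES, weights=base_weights)[0]
    return "".join(seq)


if __name__ == "__main__":
    rng = random.Random(0)
    target = "((((....))))"  # hypothetical hairpin target
    for weight in (0.5, 1.0, 4.0):
        s = sample_sequence(target, weight, rng)
        print(weight, s, round(gc_count(s) / len(s), 2))
```

Because every sampled pair is drawn from the allowed pair list, each output can form the target structure by construction. Whether it actually folds into that structure would still need to be checked with a folding tool.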

  16. Automated sequence-specific protein NMR assignment using the memetic algorithm MATCH.

    PubMed

    Volk, Jochen; Herrmann, Torsten; Wüthrich, Kurt

    2008-07-01

    MATCH (Memetic Algorithm and Combinatorial Optimization Heuristics) is a new memetic algorithm for automated sequence-specific polypeptide backbone NMR assignment of proteins. MATCH employs local optimization for tracing partial sequence-specific assignments within a global, population-based search environment, where the simultaneous application of local and global optimization heuristics guarantees high efficiency and robustness. MATCH thus makes combined use of the two predominant concepts in use for automated NMR assignment of proteins. Dynamic transition and inherent mutation are new techniques that enable automatic adaptation to variable quality of the experimental input data. The concept of dynamic transition is incorporated in all major building blocks of the algorithm, where it enables switching between local and global optimization heuristics at any time during the assignment process. Inherent mutation restricts the intrinsically required randomness of the evolutionary algorithm to those regions of the conformation space that are compatible with the experimental input data. Using intact and artificially deteriorated APSY-NMR input data of proteins, MATCH performed sequence-specific resonance assignment with high efficiency and robustness.
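MATCH embeds local optimization inside a global, population-based evolutionary search. The sketch below illustrates that generic memetic pattern only. It does not reproduce MATCH's assignment scoring, dynamic transition or inherent mutation. Bit strings are evolved toward a hypothetical target: crossover and mutation provide the global search, and a greedy bit-flip hill climber refines every offspring.

```python
import random
from typing import List

Genome = List[int]


def fitness(genome: Genome, target: Genome) -> int:
    """Number of positions that agree with the target (higher is better)."""
    return sum(1 for g, t in zip(genome, target) if g == t)


def local_search(genome: Genome, target: Genome, max_flips: int) -> Genome:
    """Greedy first-improvement hill climbing using single bit flips."""
    best = genome[:]
    best_fit = fitness(best, target)
    flips = 0
    improved = True
    while improved and flips < max_flips:
        improved = False
        for i in range(len(best)):
            best[i] ^= 1
            new_fit = fitness(best, target)
            if new_fit > best_fit:
                best_fit = new_fit
                flips += 1
                improved = True
                break
            best[i] ^= 1  # undo a flip that did not help
    return best


def memetic_search(target: Genome, pop_size: int = 20, generations: int = 50,
                   max_flips: int = 2, seed: int = 1) -> Genome:
    """Population-based global search with local refinement of each offspring."""
    rng = random.Random(seed)
    n = len(target)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda g: fitness(g, target), reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection keeps the fitter half
        children: List[Genome] = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]  # one-point crossover (global exploration)
            child[rng.randrange(n)] ^= 1  # point mutation
            children.append(local_search(child, target, max_flips))  # local refinement
        pop = parents + children
    return max(pop, key=lambda g: fitness(g, target))


if __name__ == "__main__":
    goal = [1, 0, 0, 1] * 8  # hypothetical 32-bit target
    best = memetic_search(goal)
    print(fitness(best, goal), "/", len(goal))
```

The `max_flips` cap keeps local refinement cheap, so the population-level operators still do most of the exploration.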

  17. A world without bacterial meningitis: how genomic epidemiology can inform vaccination strategy.

    PubMed

    Rodrigues, Charlene M C; Maiden, Martin C J

    2018-01-01

    Bacterial meningitis remains an important cause of global morbidity and mortality. Although effective vaccinations exist and are being increasingly used worldwide, bacterial diversity threatens their impact and the ultimate goal of eliminating the disease. Through genomic epidemiology, we can appreciate bacterial population structure and its consequences for transmission dynamics, virulence, antimicrobial resistance, and development of new vaccines. Here, we review what has been learned from genomic epidemiological studies following the rapid implementation of whole genome sequencing, and how these insights can help to optimise preventative strategies for bacterial meningitis.

  18. Evolutionary diversification of type 2 porcine reproductive and respiratory syndrome virus.

    PubMed

    Brar, Manreetpal Singh; Shi, Mang; Murtaugh, Michael P; Leung, Frederick Chi-Ching

    2015-07-01

    Porcine reproductive and respiratory syndrome virus (PRRSV) is one of the leading swine pathogens causing tremendous economic loss to the global swine industry due to its virulence, pathogenesis, infectivity and transmissibility. Although formally recognized only two and a half decades ago, molecular dating estimates indicate a more ancient evolutionary history, which involved divergence into two genotypes (type 1 and type 2) prior to the 'initial' outbreaks of the late 1980s. Type 2 PRRSV circulates primarily in North America and Asia. The relatively greater availability of sequence data for this genotype from widespread geographical territories has enabled a better understanding of the evolving genotype. However, there are a number of challenges in terms of the vastness of data available and what this indicates in the context of viral diversity. Accordingly, here we revisit the mechanisms by which PRRSV generates variability, describe a means of organizing type 2 diversity captured in voluminous ORF5 sequences in a phylogenetic framework and provide a holistic view of known global type 2 diversity in the same setting. The consequences of the expanding diversity for control measures such as vaccination are discussed, as well as the contribution of modified live vaccines to the circulation of field isolates. We end by highlighting some limitations of current molecular epidemiology studies in relation to inferring PRRSV diversity, and what steps can be taken to overcome these and additionally enable PRRSV sequence data to be informative about viral phenotypic traits such as virulence.

  19. Global analysis of gene expression profiles in developing physic nut (Jatropha curcas L.) seeds.

    PubMed

    Jiang, Huawu; Wu, Pingzhi; Zhang, Sheng; Song, Chi; Chen, Yaping; Li, Meiru; Jia, Yongxia; Fang, Xiaohua; Chen, Fan; Wu, Guojiang

    2012-01-01

    Physic nut (Jatropha curcas L.) is an oilseed plant species with high potential utility as a biofuel. Furthermore, following recent sequencing of its genome and the availability of expressed sequence tag (EST) libraries, it is a valuable model plant for studying carbon assimilation in endosperms of oilseed plants. There have been several transcriptomic analyses of developing physic nut seeds using ESTs, but they have provided limited information on the accumulation of stored resources in the seeds. We applied next-generation Illumina sequencing technology to analyze global gene expression profiles of developing physic nut seeds 14, 19, 25, 29, 35, 41, and 45 days after pollination (DAP). The acquired profiles reveal the key genes, and their expression timeframes, involved in major metabolic processes including: carbon flow, starch metabolism, and synthesis of storage lipids and proteins in the developing seeds. The main period of storage reserve synthesis in the seeds appears to be 29-41 DAP, and the fatty acid composition of the developing seeds is consistent with relative expression levels of different isoforms of acyl-ACP thioesterase and fatty acid desaturase genes. Several transcription factor genes whose expression coincides with storage reserve deposition correspond to those known to regulate the process in Arabidopsis. The results will facilitate searches for genes that influence de novo lipid synthesis, accumulation and their regulatory networks in developing physic nut seeds, and other oil seeds. Thus, they will be helpful in attempts to modify these plants for efficient biofuel production.

  20. Global characterization of Artemisia annua glandular trichome transcriptome using 454 pyrosequencing

    PubMed Central

    Wang, Wei; Wang, Yejun; Zhang, Qing; Qi, Yan; Guo, Dianjing

    2009-01-01

    Background Glandular trichomes produce a wide variety of commercially important secondary metabolites in many plant species. The most prominent anti-malarial drug artemisinin, a sesquiterpene lactone, is produced in glandular trichomes of Artemisia annua. However, only limited genomic information is currently available in this non-model plant species. Results We present a global characterization of A. annua glandular trichome transcriptome using 454 pyrosequencing. Sequencing runs using two normalized cDNA collections from glandular trichomes yielded 406,044 expressed sequence tags (average length = 210 nucleotides), which assembled into 42,678 contigs and 147,699 singletons. Performing a second sequencing run only increased the number of genes identified by ~30%, indicating that massively parallel pyrosequencing provides deep coverage of the A. annua trichome transcriptome. By BLAST search against the NCBI non-redundant protein database, putative functions were assigned to over 28,573 unigenes, including previously undescribed enzymes likely involved in sesquiterpene biosynthesis. Comparison with ESTs derived from trichome collections of other plant species revealed expressed genes in common functional categories across different plant species. RT-PCR analysis confirmed the expression of selected unigenes and novel transcripts in A. annua glandular trichomes. Conclusion The presence of contigs corresponding to enzymes for terpenoid and flavonoid biosynthesis suggests important metabolic activity in A. annua glandular trichomes. Our comprehensive survey of genes expressed in glandular trichomes will facilitate new gene discovery and shed light on the regulatory mechanism of artemisinin metabolism and trichome function in A. annua. PMID:19818120

  1. Direct mapping of symbolic DNA sequence into frequency domain in global repeat map algorithm

    PubMed Central

    Glunčić, Matko; Paar, Vladimir

    2013-01-01

    The main feature of the global repeat map (GRM) algorithm (www.hazu.hr/grm/software/win/grm2012.exe) is its ability to identify a broad variety of repeats of unbounded length that can be arbitrarily distant in sequences as large as human chromosomes. Its efficacy is due to the use of the complete set of a K-string ensemble, which enables a new method of direct mapping of symbolic DNA sequence into the frequency domain, with straightforward identification of repeats as peaks in the GRM diagram. In this way, we obtain a very fast, efficient and highly automated repeat-finding tool. The method is robust to substitutions and insertions/deletions, as well as to various complexities of the sequence pattern. We present several case studies of GRM use, in order to illustrate its capabilities: identification of α-satellite tandem repeats and higher order repeats (HORs), identification of Alu dispersed repeats and of Alu tandems, identification of Period 3 pattern in exons, implementation of ‘magnifying glass’ effect, identification of complex HOR pattern, identification of inter-tandem transitional dispersed repeat sequences and identification of long segmental duplications. The GRM algorithm is convenient for use, in particular, in cases of large repeat units, of highly mutated and/or complex repeats, and of global repeat maps for large genomic sequences (chromosomes and genomes). PMID:22977183
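A simplified sketch of the idea behind a repeat map is shown below. As a simplifying assumption, the diagram is built as a histogram of distances between consecutive occurrences of identical K-strings; the published GRM algorithm uses a complete K-string ensemble and further processing that this toy omits. A tandem array with unit length L then appears as a peak at distance L. The function name and the repeat unit are hypothetical.

```python
from collections import Counter
from typing import Dict


def repeat_map(seq: str, k: int, max_distance: int) -> Counter:
    """Histogram of distances between consecutive occurrences of each k-string."""
    last_seen: Dict[str, int] = {}
    hist: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i : i + k]
        if kmer in last_seen:
            d = i - last_seen[kmer]
            if d <= max_distance:
                hist[d] += 1  # one vote for a repeat of period d
        last_seen[kmer] = i
    return hist


if __name__ == "__main__":
    unit = "ACGTTGCAAT"  # hypothetical 10-bp repeat unit
    print(repeat_map(unit * 20, k=8, max_distance=50).most_common(1))  # → [(10, 183)]
```

In this toy, substitutions in some copies would only remove votes from the peak rather than shift it, which loosely mirrors the robustness to mutation described above.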

  2. A Global Comparison of the Human and T. brucei Degradomes Gives Insights about Possible Parasite Drug Targets

    PubMed Central

    Mashiyama, Susan T.; Koupparis, Kyriacos; Caffrey, Conor R.; McKerrow, James H.; Babbitt, Patricia C.

    2012-01-01

    We performed a genome-level computational study of sequence and structure similarity, the latter using crystal structures and models, of the proteases of Homo sapiens and the human parasite Trypanosoma brucei. Using sequence and structure similarity networks to summarize the results, we constructed global views that show visually the relative abundance and variety of proteases in the degradome landscapes of these two species, and provide insights into evolutionary relationships between proteases. The results also indicate how broadly these sequence sets are covered by three-dimensional structures. These views facilitate cross-species comparisons and offer clues for drug design from knowledge about the sequences and structures of potential drug targets and their homologs. Two protease groups (“M32” and “C51”) that are very different in sequence from human proteases are examined in structural detail, illustrating the application of this global approach in mining new pathogen genomes for potential drug targets. Based on our analyses, a human ACE2 inhibitor was selected for experimental testing on one of these parasite proteases, TbM32, and was shown to inhibit it. These sequence and structure data, along with interactive versions of the protein similarity networks generated in this study, are available at http://babbittlab.ucsf.edu/resources.html. PMID:23236535
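A sequence similarity network connects proteins whose pairwise similarity passes a threshold, so that families appear as connected components. The minimal sketch below makes several assumptions: the pairwise scores are precomputed, higher scores mean more similar, and a single score cutoff is applied. In practice the scores might come from an all-versus-all search such as BLAST. The protein names and scores are hypothetical placeholders, and nodes are grouped with union-find.

```python
from typing import Dict, List, Tuple


def connected_components(nodes: List[str], scores: Dict[Tuple[str, str], float],
                         threshold: float) -> List[List[str]]:
    """Group nodes linked by edges whose score is at least `threshold`."""
    parent = {n: n for n in nodes}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        if score >= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb  # merge the two clusters
    groups: Dict[str, List[str]] = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    # Largest clusters first; ties broken by member names.
    return sorted(groups.values(), key=lambda g: (-len(g), g))


if __name__ == "__main__":
    nodes = ["human_1", "human_2", "parasite_1", "parasite_2"]  # hypothetical proteins
    scores = {("human_1", "human_2"): 80.0, ("human_1", "parasite_1"): 45.0,
              ("parasite_2", "human_2"): 5.0}
    print(connected_components(nodes, scores, threshold=40.0))
```

A parasite protein left as a singleton at a given threshold, like `parasite_2` here, corresponds to the kind of sequence-divergent group the study flags as a possible drug target.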

  3. Targeted reconstruction of T cell receptor sequence from single cell RNA-seq links CDR3 length to T cell differentiation state

    PubMed Central

    Yates, Kathleen B.; Bi, Kevin; Darko, Samuel; Godec, Jernej; Gerdemann, Ulrike; Swadling, Leo; Douek, Daniel C.; Klenerman, Paul; Barnes, Eleanor J.; Sharpe, Arlene H.

    2017-01-01

    The T cell compartment must contain diversity in both T cell receptor (TCR) repertoire and cell state to provide effective immunity against pathogens. However, it remains unclear how differences in the TCR contribute to heterogeneity in T cell state. Single cell RNA-sequencing (scRNA-seq) can allow simultaneous measurement of TCR sequence and global transcriptional profile from single cells. However, current methods for TCR inference from scRNA-seq are limited in their sensitivity and require long sequencing reads, thus increasing the cost and decreasing the number of cells that can be feasibly analyzed. Here we present TRAPeS, a publicly available tool that can efficiently extract TCR sequence information from short-read scRNA-seq libraries. We apply it to investigate heterogeneity in the CD8+ T cell response in humans and mice, and show that it is accurate and more sensitive than existing approaches. Coupling TRAPeS with transcriptome analysis of CD8+ T cells specific for a single epitope from Yellow Fever Virus (YFV), we show that the recently described ‘naive-like’ memory population has significantly longer CDR3 regions and greater divergence from germline sequence than do effector-memory phenotype cells. This suggests that TCR usage is associated with the differentiation state of the CD8+ T cell response to YFV. PMID:28934479

  4. Clonal evolution in relapsed and refractory diffuse large B-cell lymphoma is characterized by high dynamics of subclones.

    PubMed

    Melchardt, Thomas; Hufnagl, Clemens; Weinstock, David M; Kopp, Nadja; Neureiter, Daniel; Tränkenschuh, Wolfgang; Hackl, Hubert; Weiss, Lukas; Rinnerthaler, Gabriel; Hartmann, Tanja N; Greil, Richard; Weigert, Oliver; Egle, Alexander

    2016-08-09

    Little information is available about the role of specific mutations in clonal evolution and clinical outcome at relapse of diffuse large B-cell lymphoma (DLBCL). We therefore analyzed formalin-fixed, paraffin-embedded tumor samples from first diagnosis and from relapsed or refractory disease in 28 patients, using next-generation sequencing of the exons of 104 coding genes. Non-synonymous mutations were present in 74 of the 104 genes tested. Primary tumor samples showed a median of 8 non-synonymous mutations (range: 0-24) within this gene set. Lower numbers of non-synonymous mutations in the primary tumor were associated with longer median overall survival (OS) than higher numbers (28 versus 15 months, p=0.031). We observed three patterns of clonal evolution during relapse of disease: large global change, subclonal selection, and no or minimal change, the last possibly suggesting preprogrammed resistance. We conclude that targeted re-sequencing is a feasible and informative approach to characterize the molecular pattern of relapse, and that it provides novel insights into the dynamics of individual genes.

  5. A powerful and flexible statistical framework for testing hypotheses of allele-specific gene expression from RNA-seq data

    PubMed Central

    Skelly, Daniel A.; Johansson, Marnie; Madeoy, Jennifer; Wakefield, Jon; Akey, Joshua M.

    2011-01-01

    Variation in gene expression is thought to make a significant contribution to phenotypic diversity among individuals within populations. Although high-throughput cDNA sequencing offers a unique opportunity to delineate the genome-wide architecture of regulatory variation, new statistical methods need to be developed to capitalize on the wealth of information contained in RNA-seq data sets. To this end, we developed a powerful and flexible hierarchical Bayesian model that combines information across loci to allow both global and locus-specific inferences about allele-specific expression (ASE). We applied our methodology to a large RNA-seq data set obtained in a diploid hybrid of two diverse Saccharomyces cerevisiae strains, as well as to RNA-seq data from an individual human genome. Our statistical framework accurately quantifies levels of ASE with specified false-discovery rates, achieving high reproducibility between independent sequencing platforms. We pinpoint loci that show unusual and biologically interesting patterns of ASE, including allele-specific alternative splicing and transcription termination sites. Our methodology provides a rigorous, quantitative, and high-resolution tool for profiling ASE across whole genomes. PMID:21873452
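The framework described above pools information across loci to support both global and locus-specific inferences about ASE. The sketch below is a much simpler illustration of that borrowing-strength idea, using empirical Bayes rather than the authors' hierarchical model. It fits one Beta prior to the per-locus reference-allele fractions by the method of moments, then reports each locus's posterior mean. This shrinks low-coverage loci toward the genome-wide average. The function names and read counts are hypothetical.

```python
from statistics import mean, pvariance
from typing import List, Tuple


def fit_beta_moments(fractions: List[float]) -> Tuple[float, float]:
    """Method-of-moments Beta(alpha, beta) fit to observed allele fractions.

    Simplification: binomial sampling noise is not removed from the variance,
    so the fitted prior is somewhat wider than the true between-locus spread.
    """
    m = mean(fractions)
    v = pvariance(fractions)
    if v <= 0 or v >= m * (1 - m):
        raise ValueError("fractions too uniform or too dispersed for a Beta fit")
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common


def shrunken_ase(counts: List[Tuple[int, int]]) -> List[float]:
    """Posterior mean reference-allele fraction for each (ref_reads, alt_reads) locus."""
    fractions = [r / (r + a) for r, a in counts]
    alpha, beta = fit_beta_moments(fractions)
    return [(alpha + r) / (alpha + beta + r + a) for r, a in counts]


if __name__ == "__main__":
    counts = [(50, 50), (60, 40), (45, 55), (9, 1), (55, 45)]  # hypothetical loci
    for (r, a), post in zip(counts, shrunken_ase(counts)):
        print(r, a, round(r / (r + a), 3), round(post, 3))
```

Each posterior mean is a weighted average of the locus's raw fraction and the prior mean. The weight on the raw fraction grows with read depth, so the 10-read locus moves much further toward the average than the 100-read loci.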

  6. Genome of a Low-Salinity Ammonia-Oxidizing Archaeon Determined by Single-Cell and Metagenomic Analysis

    PubMed Central

    Blainey, Paul C.; Mosier, Annika C.; Potanina, Anastasia; Francis, Christopher A.; Quake, Stephen R.

    2011-01-01

    Ammonia-oxidizing archaea (AOA) are thought to be among the most abundant microorganisms on Earth and may significantly impact the global nitrogen and carbon cycles. We sequenced the genome of AOA in an enrichment culture from low-salinity sediments in San Francisco Bay using single-cell and metagenomic genome sequence data. Five single cells were isolated inside an integrated microfluidic device using laser tweezers, the cells' genomic DNA was amplified by multiple displacement amplification (MDA) in 50 nL volumes and then sequenced by high-throughput DNA pyrosequencing. This microscopy-based approach to single-cell genomics minimizes contamination and allows correlation of high-resolution cell images with genomic sequences. Statistical properties of coverage across the five single cells, in combination with the contrasting properties of the metagenomic dataset allowed the assembly of a high-quality draft genome. The genome of this AOA, which we designate Candidatus Nitrosoarchaeum limnia SFB1, is ∼1.77 Mb with >2100 genes and a G+C content of 32%. Across the entire genome, the average nucleotide identity to Nitrosopumilus maritimus, the only AOA in pure culture, is ∼70%, suggesting this AOA represents a new genus of Crenarchaeota. Phylogenetically, the 16S rRNA and ammonia monooxygenase subunit A (amoA) genes of this AOA are most closely related to sequences reported from a wide variety of freshwater ecosystems. Like N. maritimus, the low-salinity AOA genome appears to have an ammonia oxidation pathway distinct from ammonia oxidizing bacteria (AOB). In contrast to other described AOA, these low-salinity AOA appear to be motile, based on the presence of numerous motility- and chemotaxis-associated genes in the genome. This genome data will be used to inform targeted physiological and metabolic studies of this novel group of AOA, which may ultimately advance our understanding of AOA metabolism and their impacts on the global carbon and nitrogen cycles. 
PMID:21364937

  7. Genome of a low-salinity ammonia-oxidizing archaeon determined by single-cell and metagenomic analysis.

    PubMed

    Blainey, Paul C; Mosier, Annika C; Potanina, Anastasia; Francis, Christopher A; Quake, Stephen R

    2011-02-22

    Ammonia-oxidizing archaea (AOA) are thought to be among the most abundant microorganisms on Earth and may significantly impact the global nitrogen and carbon cycles. We sequenced the genome of AOA in an enrichment culture from low-salinity sediments in San Francisco Bay using single-cell and metagenomic genome sequence data. Five single cells were isolated inside an integrated microfluidic device using laser tweezers, the cells' genomic DNA was amplified by multiple displacement amplification (MDA) in 50 nL volumes and then sequenced by high-throughput DNA pyrosequencing. This microscopy-based approach to single-cell genomics minimizes contamination and allows correlation of high-resolution cell images with genomic sequences. Statistical properties of coverage across the five single cells, in combination with the contrasting properties of the metagenomic dataset allowed the assembly of a high-quality draft genome. The genome of this AOA, which we designate Candidatus Nitrosoarchaeum limnia SFB1, is ∼1.77 Mb with >2100 genes and a G+C content of 32%. Across the entire genome, the average nucleotide identity to Nitrosopumilus maritimus, the only AOA in pure culture, is ∼70%, suggesting this AOA represents a new genus of Crenarchaeota. Phylogenetically, the 16S rRNA and ammonia monooxygenase subunit A (amoA) genes of this AOA are most closely related to sequences reported from a wide variety of freshwater ecosystems. Like N. maritimus, the low-salinity AOA genome appears to have an ammonia oxidation pathway distinct from ammonia oxidizing bacteria (AOB). In contrast to other described AOA, these low-salinity AOA appear to be motile, based on the presence of numerous motility- and chemotaxis-associated genes in the genome. This genome data will be used to inform targeted physiological and metabolic studies of this novel group of AOA, which may ultimately advance our understanding of AOA metabolism and their impacts on the global carbon and nitrogen cycles.

  8. EEG microstates during resting represent personality differences.

    PubMed

    Schlegel, Felix; Lehmann, Dietrich; Faber, Pascal L; Milz, Patricia; Gianotti, Lorena R R

    2012-01-01

    We investigated the spontaneous brain electric activity of 13 skeptics and 16 believers in paranormal phenomena; they were university students assessed with a self-report scale about paranormal beliefs. The 33-channel EEG recordings during no-task rest were processed as sequences of momentary potential distribution maps. Based on the maps at peak times of Global Field Power, the sequences were parsed into segments of quasi-stable potential distribution, the 'microstates'. The microstates were clustered into four classes of map topographies (A-D). Analysis of the microstate parameters (time coverage, occurrence frequency and duration), as well as of the temporal sequence (syntax) of the microstate classes, revealed significant differences: believers had higher coverage and occurrence of class B, tended toward decreased coverage and occurrence of class C, and showed a predominant sequence of microstate concatenations from A to C to B to A that was reversed in skeptics (A to B to C to A). Microstates of different topographies, putative "atoms of thought", are hypothesized to represent different types of information processing. The study demonstrates that personality differences can be detected in resting EEG microstate parameters and microstate syntax. Microstate analysis yielded no conclusive evidence for the hypothesized relation between paranormal belief and schizophrenia.
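Global Field Power is commonly computed as the spatial standard deviation of the potentials across all electrodes at each time point. The maps at GFP peaks are then assigned to classes, and the resulting class sequence gives the microstate syntax. The minimal sketch below covers two of these steps: detecting GFP peaks and counting class-to-class transitions. It assumes data laid out as `eeg[t][channel]`, omits the clustering of maps into classes A-D, and uses hypothetical function names.

```python
from collections import Counter
from statistics import pstdev
from typing import List, Sequence


def global_field_power(sample: Sequence[float]) -> float:
    """GFP at one time point: standard deviation of the potentials across electrodes.

    pstdev subtracts the mean across channels, so this equals GFP computed
    against the average reference.
    """
    return pstdev(sample)


def gfp_peaks(eeg: List[Sequence[float]]) -> List[int]:
    """Time indices where the GFP curve has a local maximum."""
    gfp = [global_field_power(s) for s in eeg]
    return [t for t in range(1, len(gfp) - 1)
            if gfp[t] > gfp[t - 1] and gfp[t] >= gfp[t + 1]]


def transition_counts(labels: Sequence[str]) -> Counter:
    """Count transitions between consecutive, distinct microstate classes."""
    # Collapse runs first, so that A A C C B becomes A C B.
    collapsed = [lab for i, lab in enumerate(labels) if i == 0 or lab != labels[i - 1]]
    return Counter(zip(collapsed, collapsed[1:]))


if __name__ == "__main__":
    eeg = [[0.0, 0.0, 0.0], [1.0, -1.0, 0.0], [2.0, -2.0, 0.0],
           [1.0, -1.0, 0.0], [0.0, 0.0, 0.0]]  # hypothetical 3-channel recording
    print(gfp_peaks(eeg))  # → [2]
    print(transition_counts("AACCBBAACB"))
```

A predominant A to C to B to A syntax, as reported for believers, would show up as larger counts for the (A, C), (C, B) and (B, A) transitions than for their reverses.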

  9. Morphological and molecular characterization of fungal pathogen, Magnaporthe oryzae

    NASA Astrophysics Data System (ADS)

    Hasan, Nor'Aishah; Rafii, Mohd Y.; Rahim, Harun A.; Ali, Nusaibah Syd; Mazlan, Norida; Abdullah, Shamsiah

    2016-02-01

    Rice is arguably the most crucial food crop, supplying a quarter of calorie intake. The fungal pathogen Magnaporthe oryzae causes blast disease in gramineous hosts, including rice species. The disease causes outbreaks and is a constant threat to cereal production, and global rice yields, including in Malaysia, decline by almost 10-30%. Because M. oryzae and its host are a model system in plant disease research, and because of the importance of the disease to world agriculture, the rice blast pathosystem has been the subject of intense interest. Therefore, in this study, our prime objective was to isolate samples of M. oryzae from diseased leaves obtained from MARDI Seberang Perai, Penang, Malaysia. Molecular identification was performed by sequence analysis of the internal transcribed spacer (ITS) region of nuclear ribosomal RNA genes. The phylogenetic affiliation of the isolated samples was analyzed by comparing their ITS sequences with those deposited in the GenBank database. The sequence of the isolate showed at least 99% nucleotide identity with the corresponding GenBank sequence for M. oryzae. Microscopic observation showed that the conidia had structural characteristics similar to those of M. oryzae. The findings of this study provide useful information for breeding programs, epidemiology studies and improved disease management.
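The isolate was identified by an ITS sequence sharing at least 99% nucleotide identity with a GenBank entry. Tools define percent identity in slightly different ways. The minimal sketch below assumes the two sequences are already aligned, with gaps written as '-'. It counts identical columns over all alignment columns that are not gaps in both sequences. The function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of informative alignment columns where both sequences agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # a gap in both sequences carries no information
        columns += 1
        if x == y:
            matches += 1  # a base opposite a gap counts as a mismatch
    if columns == 0:
        raise ValueError("alignment has no informative columns")
    return 100.0 * matches / columns


if __name__ == "__main__":
    print(round(percent_identity("ACGT-A", "ACGTTA"), 2))  # → 83.33
```

Under this definition, a 500-column ITS alignment would need no more than 5 differing columns to reach 99% identity.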

  10. Morphological and molecular characterization of fungal pathogen, Magnaporthe oryzae

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hasan, Nor’Aishah; Rafii, Mohd Y.; Department of Crop Science, Universiti Putra Malaysia

    2016-02-01

    Rice is arguably the most crucial food crop, supplying a quarter of calorie intake. The fungal pathogen Magnaporthe oryzae causes blast disease in gramineous hosts, including rice species. The disease causes outbreaks and is a constant threat to cereal production, and global rice yields, including in Malaysia, decline by almost 10-30%. Because M. oryzae and its host are a model system in plant disease research, and because of the importance of the disease to world agriculture, the rice blast pathosystem has been the subject of intense interest. Therefore, in this study, our prime objective was to isolate samples of M. oryzae from diseased leaves obtained from MARDI Seberang Perai, Penang, Malaysia. Molecular identification was performed by sequence analysis of the internal transcribed spacer (ITS) region of nuclear ribosomal RNA genes. The phylogenetic affiliation of the isolated samples was analyzed by comparing their ITS sequences with those deposited in the GenBank database. The sequence of the isolate showed at least 99% nucleotide identity with the corresponding GenBank sequence for M. oryzae. Microscopic observation showed that the conidia had structural characteristics similar to those of M. oryzae. The findings of this study provide useful information for breeding programs, epidemiology studies and improved disease management.

  11. Haplogroup relationships between domestic and wild sheep resolved using a mitogenome panel.

    PubMed

    Meadows, J R S; Hiendleder, S; Kijas, J W

    2011-04-01

    Five haplogroups have been identified in domestic sheep through global surveys of mitochondrial (mt) sequence variation; however, these group classifications are often based on small fragments of the complete mtDNA sequence, namely the partial control region or the cytochrome B gene. This study presents the complete mitogenome from representatives of each haplogroup identified in domestic sheep, plus a sample of their wild relatives. Comparison of the sequences successfully resolved the relationships between the haplogroups and provided insight into their relationship with wild sheep. The five haplogroups were characterised as branching independently, a radiation that shared a common ancestor 920,000 ± 190,000 years ago based on protein coding sequence. The utility of various mtDNA components to inform the true relationship between sheep was also examined with Bayesian, maximum likelihood and partitioned Bremer support analyses. The control region was found to be the mtDNA component that contributed the most support to the tree generated using the complete data set. This study provides the nucleus of an mtDNA mitogenome panel, which can be used to assess additional mitogenomes and serve as a reference set to evaluate small fragments of the mtDNA.

  12. Haplogroup relationships between domestic and wild sheep resolved using a mitogenome panel

    PubMed Central

    Meadows, J R S; Hiendleder, S; Kijas, J W

    2011-01-01

    Five haplogroups have been identified in domestic sheep through global surveys of mitochondrial (mt) sequence variation; however, these group classifications are often based on small fragments of the complete mtDNA sequence, namely the partial control region or the cytochrome B gene. This study presents the complete mitogenome from representatives of each haplogroup identified in domestic sheep, plus a sample of their wild relatives. Comparison of the sequences successfully resolved the relationships between the haplogroups and provided insight into their relationship with wild sheep. The five haplogroups were characterised as branching independently, a radiation that shared a common ancestor 920,000 ± 190,000 years ago based on protein coding sequence. The utility of various mtDNA components to inform the true relationship between sheep was also examined with Bayesian, maximum likelihood and partitioned Bremer support analyses. The control region was found to be the mtDNA component that contributed the most support to the tree generated using the complete data set. This study provides the nucleus of an mtDNA mitogenome panel, which can be used to assess additional mitogenomes and serve as a reference set to evaluate small fragments of the mtDNA. PMID:20940734

  13. CEQer: a graphical tool for copy number and allelic imbalance detection from whole-exome sequencing data.

    PubMed

    Piazza, Rocco; Magistroni, Vera; Pirola, Alessandra; Redaelli, Sara; Spinelli, Roberta; Redaelli, Serena; Galbiati, Marta; Valletta, Simona; Giudici, Giovanni; Cazzaniga, Giovanni; Gambacorti-Passerini, Carlo

    2013-01-01

    Copy number alterations (CNAs) are common events in leukaemias and solid tumors. Comparative genomic hybridization (CGH) is currently the gold-standard technique to analyze CNAs; however, CGH analysis requires dedicated instruments and is able to perform only low-resolution loss of heterozygosity (LOH) analyses. Here we present CEQer (Comparative Exome Quantification analyzer), a new graphical, event-driven tool for coupled CNA/allelic-imbalance (AI) analysis of exome sequencing data. Using case-control matched exome data, CEQer performs a comparative digital exonic quantification to generate CNA data and couples this information with exome-wide LOH and allelic imbalance detection. These data are used to build mixed statistical/heuristic models that allow the identification of CNA/AI events. To test our tool, we initially used in silico generated data; we then performed whole-exome sequencing of 20 leukemic specimens and corresponding matched controls and analyzed the results using CEQer. Taken together, these analyses showed that the combined use of comparative digital exon quantification and LOH/AI allows the generation of very accurate CNA data. Therefore, we propose CEQer as an efficient, robust and user-friendly graphical tool for the identification of CNA/AI in the context of whole-exome sequencing data.
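CEQer derives CNA data from a comparative digital quantification of exons in matched case and control exomes. The sketch below shows only a generic first step of such a comparison, not CEQer's own statistical/heuristic models. Per-exon read counts are normalized by each library's total over the shared exons, and a log2 case/control ratio is reported. In a pure diploid sample, values near +0.58 (log2 of 3/2) or -1 (log2 of 1/2) would be consistent with a single-copy gain or loss. The exon names, counts and pseudocount are hypothetical.

```python
import math
from typing import Dict


def exon_log2_ratios(case: Dict[str, int], control: Dict[str, int],
                     pseudocount: float = 0.5) -> Dict[str, float]:
    """Library-size-normalized log2(case/control) ratio for each shared exon."""
    shared = sorted(set(case) & set(control))
    case_total = sum(case[e] for e in shared)
    control_total = sum(control[e] for e in shared)
    if case_total == 0 or control_total == 0:
        raise ValueError("both libraries need reads on the shared exons")
    ratios: Dict[str, float] = {}
    for exon in shared:
        # The pseudocount avoids taking the log of zero for exons with no reads.
        case_frac = (case[exon] + pseudocount) / case_total
        control_frac = (control[exon] + pseudocount) / control_total
        ratios[exon] = math.log2(case_frac / control_frac)
    return ratios


if __name__ == "__main__":
    case = {"exon_1": 100, "exon_2": 200, "exon_3": 100}  # hypothetical read counts
    control = {"exon_1": 100, "exon_2": 100, "exon_3": 100}
    for exon, ratio in exon_log2_ratios(case, control).items():
        print(exon, round(ratio, 3))
```

Normalizing by the total makes a large gain in one exon depress the ratios of the others slightly, which is one reason real tools use more robust normalization and segment neighbouring exons before calling events.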

  14. DNA Barcode Goes Two-Dimensions: DNA QR Code Web Server

    PubMed Central

    Li, Huan; Xing, Hang; Liang, Dong; Jiang, Kun; Pang, Xiaohui; Song, Jingyuan; Chen, Shilin

    2012-01-01

    The DNA barcoding technology uses a standard region of DNA sequence for species identification and discovery. At present, “DNA barcode” actually refers to DNA sequences, which are not amenable to information storage, recognition, and retrieval. Our aim is to identify the best symbology that can represent DNA barcode sequences in practical applications. A comprehensive set of sequences for five DNA barcode markers ITS2, rbcL, matK, psbA-trnH, and CO1 was used as the test data. Fifty-three different types of one-dimensional and ten two-dimensional barcode symbologies were compared based on different criteria, such as coding capacity, compression efficiency, and error detection ability. The quick response (QR) code was found to have the largest coding capacity and relatively high compression ratio. To facilitate the further usage of QR code-based DNA barcodes, a web server was developed and is accessible at http://qrfordna.dnsalias.org. The web server allows users to retrieve the QR code for a species of interest, convert a DNA sequence to and from a QR code, and perform species identification based on local and global sequence similarities. In summary, the first comprehensive evaluation of various barcode symbologies has been carried out. The QR code has been found to be the most appropriate symbology for DNA barcode sequences. A web server has also been constructed to allow biologists to utilize QR codes in practical DNA barcoding applications. PMID:22574113
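    Compression matters here because an unambiguous DNA sequence carries only two bits per base. A minimal sketch, assuming a plain A/C/G/T sequence (no IUPAC ambiguity codes), packs a barcode into bytes before it would be handed to any QR encoder; the byte layout (4-byte length header, four bases per byte) and the function names are hypothetical and are not taken from the web server.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack A/C/G/T at 2 bits per base after a 4-byte big-endian length header."""
    out = bytearray(len(seq).to_bytes(4, "big"))
    for i in range(0, len(seq), 4):
        byte = 0
        # The first base of each group of four occupies the two highest bits.
        for j, base in enumerate(seq[i:i + 4].upper()):
            byte |= CODE[base] << (6 - 2 * j)
        out.append(byte)
    return bytes(out)


def unpack(data: bytes) -> str:
    """Invert pack(); the header tells how many bases the last byte really holds."""
    n = int.from_bytes(data[:4], "big")
    bases = []
    for byte in data[4:]:
        for j in range(4):
            bases.append(BASES[(byte >> (6 - 2 * j)) & 3])
    return "".join(bases[:n])
```

    With this layout a ~650 bp CO1 barcode packs into 4 + 163 = 167 bytes, comfortably below the 2,953-byte binary capacity of the largest QR symbol at low error correction.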

  15. Sequencing our way towards understanding global eukaryotic biodiversity

    PubMed Central

    Bik, Holly M.; Porazinska, Dorota L.; Creer, Simon; Caporaso, J. Gregory; Knight, Rob; Thomas, W. Kelley

    2011-01-01

    Microscopic eukaryotes are abundant, diverse, and fill critical ecological roles across every ecosystem on earth, yet there is a well-recognized gap in our understanding of their global biodiversity. Fundamental advances in DNA sequencing and bioinformatics now allow accurate en masse biodiversity assessments of microscopic eukaryotes from environmental samples. Despite a promising outlook, the field of eukaryotic marker gene surveys faces significant challenges: how to generate data that is most useful to the community, especially in the face of evolving sequencing technology and bioinformatics pipelines, and how to incorporate an expanding number of target genes. PMID:22244672

  16. On the recovery of missing low and high frequency information from bandlimited reflectivity data

    NASA Astrophysics Data System (ADS)

    Sacchi, M. D.; Ulrych, T. J.

    2007-12-01

    During the last two decades, the seismic exploration community has made an important effort to retrieve broad-band seismic data by means of deconvolution and inversion. In general, the problem can be stated as one of spectral reconstruction: given limited spectral information about the earth's reflectivity sequence, one attempts to create a broad-band estimate of the Fourier spectrum of the unknown reflectivity. Techniques based on the principle of parsimony can be used effectively to retrieve a sparse spike sequence and, consequently, a broad-band signal. Alternatively, continuation methods, e.g., autoregressive modeling, can be used to extrapolate the recorded bandwidth of the seismic signal. The goal of this paper is to examine under what conditions the recovery of low and high frequencies from band-limited and noisy signals is possible. At the heart of the methods we discuss is the celebrated non-Gaussian assumption, so important in many modern signal processing methods such as ICA. Spectral recovery from limited information tends to work when the reflectivity consists of a few well-isolated events. Results degrade as the number of reflectors increases and as the SNR and the bandwidth of the source wavelet decrease. Constraints and information-based priors can be used to stabilize the recovery but, as in all inverse problems, the solution is nonunique, and effort is required to understand the level of recovery that is achievable, always keeping the physics of the problem in mind. In this paper we survey methods to recover broad-band reflectivity sequences and examine the role these techniques can play in processing and inversion as applied to exploration and global seismology.
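    The autoregressive continuation idea can be sketched for the favourable case the abstract describes, a reflectivity of a few isolated spikes. The DFT of K spikes is a sum of K complex exponentials, so it obeys an exact order-K linear-prediction recursion; fitting that recursion on the recorded band and running it outwards recovers missing bins. This is a noise-free toy under that assumption, with hypothetical function names; real data need higher orders, regularisation and the caution the authors urge.

```python
import numpy as np


def spike_spectrum(times, amps, n_bins, n_samples):
    """DFT bins 0..n_bins-1 of a sparse spike series of length n_samples."""
    k = np.arange(n_bins)[:, None]
    return (amps[None, :] * np.exp(-2j * np.pi * k * times[None, :] / n_samples)).sum(axis=1)


def fit_predictor(x, order):
    """Least-squares forward linear-prediction (AR) coefficients for a complex series."""
    # Row n holds x[n-1], ..., x[n-order]; the target is x[n].
    A = np.array([x[n - order:n][::-1] for n in range(order, len(x))])
    b = x[order:]
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coeffs


def extrapolate(x, coeffs, n_extra):
    """Run the AR recursion forward to predict n_extra further bins."""
    out = list(x)
    order = len(coeffs)
    for _ in range(n_extra):
        recent = np.array(out[-order:][::-1])
        out.append(coeffs @ recent)
    return np.array(out)
```

    High frequencies come from running the recursion forward from the recorded band; low frequencies come from fitting and running it on the reversed band, which is again a sum of exponentials.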

  17. The role of temporal context in norm-based encoding of faces.

    PubMed

    Van Rensbergen, Bram; Op de Beeck, Hans P

    2014-02-01

    Research shows that the human brain encodes faces in terms of how they relate to a prototypical face, a phenomenon referred to as norm-based encoding. The goal of this study was to examine the effect of short-term exposure on the development of the norm, independently of global, long-term exposure. We achieved this by varying the sequence of presentation of the stimuli while keeping global exposure constant. We found that a systematic manipulation of the average face in a set of 10 preceding trials can shift this norm toward that average. However, there was no effect of order or recency among these trials; thus, there was no evidence that the last faces mattered more than the first. This suggests that the position of the face norm is modified by information that is integrated across multiple recent faces.

  18. Diversity of Babesia bovis merozoite surface antigen genes in the Philippines.

    PubMed

    Tattiyapong, Muncharee; Sivakumar, Thillaiampalam; Ybanez, Adrian Patalinghug; Ybanez, Rochelle Haidee Daclan; Perez, Zandro Obligado; Guswanto, Azirwan; Igarashi, Ikuo; Yokoyama, Naoaki

    2014-02-01

    Babesia bovis is the causative agent of fatal babesiosis in cattle. In the present study, we investigated the genetic diversity of B. bovis among Philippine cattle, based on the genes that encode merozoite surface antigens (MSAs). Forty-one B. bovis-positive blood DNA samples from cattle were used to amplify the msa-1, msa-2b, and msa-2c genes. In phylogenetic analyses, the msa-1, msa-2b, and msa-2c gene sequences generated from Philippine B. bovis-positive DNA samples were found in six, three, and four different clades, respectively. All of the msa-1 and most of the msa-2b sequences were found in clades that were formed only by Philippine msa sequences in the respective phylograms. While all the msa-1 sequences from the Philippines showed similarity to those formed by Australian msa-1 sequences, the msa-2b sequences showed similarity to either Australian or Mexican msa-2b sequences. In contrast, msa-2c sequences from the Philippines were distributed across all the clades of the phylogram, although one clade was formed exclusively by Philippine msa-2c sequences. Similarities among the deduced amino acid sequences of MSA-1, MSA-2b, and MSA-2c from the Philippines were 62.2-100, 73.1-100, and 67.3-100%, respectively. The present findings demonstrate that B. bovis populations are genetically diverse in the Philippines. This information will provide a good foundation for the future design and implementation of improved immunological preventive methodologies against bovine babesiosis in the Philippines. The study has also generated a set of data that will be useful for further understanding of the global genetic diversity of this important parasite. © 2013.

  19. A Phylogeny-Based Global Nomenclature System and Automated Annotation Tool for H1 Hemagglutinin Genes from Swine Influenza A Viruses

    PubMed Central

    Macken, Catherine A.; Lewis, Nicola S.; Van Reeth, Kristien; Brown, Ian H.; Swenson, Sabrina L.; Simon, Gaëlle; Saito, Takehiko; Berhane, Yohannes; Ciacci-Zanella, Janice; Pereda, Ariel; Davis, C. Todd; Donis, Ruben O.; Webby, Richard J.

    2016-01-01

    ABSTRACT The H1 subtype of influenza A viruses (IAVs) has been circulating in swine since the 1918 human influenza pandemic. Over time, and aided by further introductions from nonswine hosts, swine H1 viruses have diversified into three genetic lineages. Due to limited global data, these H1 lineages were named based on colloquial context, leading to a proliferation of inconsistent regional naming conventions. In this study, we propose rigorous phylogenetic criteria to establish a globally consistent nomenclature of swine H1 virus hemagglutinin (HA) evolution. These criteria applied to a data set of 7,070 H1 HA sequences led to 28 distinct clades as the basis for the nomenclature. We developed and implemented a web-accessible annotation tool that can assign these biologically informative categories to new sequence data. The annotation tool assigned the combined data set of 7,070 H1 sequences to the correct clade more than 99% of the time. Our analyses indicated that 87% of the swine H1 viruses from 2010 to the present had HAs that belonged to 7 contemporary cocirculating clades. Our nomenclature and web-accessible classification tool provide an accurate method for researchers, diagnosticians, and health officials to assign clade designations to HA sequences. The tool can be updated readily to track evolving nomenclature as new clades emerge, ensuring continued relevance. A common global nomenclature facilitates comparisons of IAVs infecting humans and pigs, within and between regions, and can provide insight into the diversity of swine H1 influenza virus and its impact on vaccine strain selection, diagnostic reagents, and test performance, thereby simplifying communication of such data. IMPORTANCE A fundamental goal in the biological sciences is the definition of groups of organisms based on evolutionary history and the naming of those groups. For influenza A viruses (IAVs) in swine, understanding the hemagglutinin (HA) genetic lineage of a circulating strain aids in vaccine antigen selection and allows for inferences about vaccine efficacy. Previous reporting of H1 virus HA in swine relied on colloquial names, frequently with incriminating and stigmatizing geographic toponyms, making comparisons between studies challenging. To overcome this, we developed an adaptable nomenclature using measurable criteria for historical and contemporary evolutionary patterns of H1 global swine IAVs. We also developed a web-accessible tool that classifies viruses according to this nomenclature. This classification system will aid agricultural production and pandemic preparedness through the identification of important changes in swine IAVs and provides terminology enabling discussion of swine IAVs in a common context among animal and human health initiatives. PMID:27981236
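    Assigning a new HA sequence to an existing clade can be illustrated with a simple nearest-reference classifier over k-mer profiles. This is explicitly not the method of the published annotation tool, which the abstract does not describe; it is a minimal, alignment-free sketch under the assumption that each clade is represented by one or more labelled reference sequences, and the clade labels and helper names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def kmer_profile(seq: str, k: int = 3) -> Dict[str, float]:
    """Unit-length vector of overlapping k-mer counts (sequence must be >= k long)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {kmer: c / norm for kmer, c in counts.items()}


def cosine(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Cosine similarity of two unit-length sparse profiles."""
    return sum(v * q.get(kmer, 0.0) for kmer, v in p.items())


def assign_clade(query: str, references: List[Tuple[str, str]], k: int = 3) -> str:
    """Clade label of the reference sequence with the most similar k-mer profile."""
    qp = kmer_profile(query, k)
    best_label, _ = max(references, key=lambda ref: cosine(qp, kmer_profile(ref[1], k)))
    return best_label
```

    A phylogeny-based tool would place the query on a reference tree instead; the k-mer shortcut trades that rigour for speed and needs validation against known assignments.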

  20. Genetic relationships and epidemiological links between wild type 1 poliovirus isolates in Pakistan and Afghanistan

    PubMed Central

    2012-01-01

    Background/Aim Efforts have been made to eliminate wild poliovirus transmission since 1988, when the World Health Organization began its global eradication campaign. Since then, the incidence of polio has decreased significantly. However, serotypes 1 and 3 still circulate endemically in Pakistan and Afghanistan. Both countries constitute a single epidemiologic block representing one of the three remaining major global reservoirs of poliovirus transmission. In this study we used genetic sequence data to investigate transmission links among viruses from diverse locations during 2005-2007. Methods To find the origins and routes of wild type 1 poliovirus circulation, polioviruses were isolated from faecal samples of Acute Flaccid Paralysis (AFP) patients. We used viral culture, two intratypic differentiation methods (PCR and ELISA) to characterize isolates as vaccine or wild type 1, and nucleic acid sequencing of the entire VP1 region of the poliovirus genome to determine genetic relatedness. Results One hundred eleven wild type 1 poliovirus isolates were sequenced to study genetic variation. Considering the 15% divergence of the sequences from Sabin 1, phylogenetic analysis with MEGA software revealed active inter- and intra-country transmission of many genetically distinct strains of wild poliovirus type 1 belonging to genotype SOAS, which is indigenous to this region. Grouping wild type 1 polioviruses by nucleotide sequence homology yielded three distinct clusters, A, B and C, with multiple chains of transmission, together with some silent circulation represented by orphan lineages. Conclusion Our results show that wild type 1 polioviruses were persistently transmitted in Pakistan and Afghanistan during 2005-2007. The epidemiologic information provided by the sequence data can contribute to better poliomyelitis control strategies for these critical areas, which are associated with high-risk population groups including migrants, internally displaced people, and refugees. The implication of this study is that high-quality mass immunization with oral polio vaccine (OPV) must be maintained to interrupt chains of virus transmission in both countries and to support substantial progress in the Eastern Mediterranean region. PMID:22353446
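    Grouping isolates into clusters by nucleotide sequence homology can be sketched as single-linkage clustering on pairwise p-distances: two isolates share a cluster if they are connected by a chain of pairs within the threshold. The abstract does not state the clustering rule or threshold used, so both the method and the threshold here are assumptions, and the isolate names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def single_linkage(seqs: Dict[str, str], threshold: float) -> List[List[str]]:
    """Clusters of isolate IDs linked by chains of pairwise p-distance <= threshold."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)
    clusters: Dict[str, List[str]] = {}
    for name in seqs:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(c) for c in clusters.values())
```

    Single linkage tends to chain sequences into long clusters, which suits tracing transmission chains but can merge distinct lineages; a tree-based grouping is the more usual choice in outbreak work.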

  1. Santa Barbara Basin Study Extends Global Climate Record

    NASA Astrophysics Data System (ADS)

    Hopkins, Sarah; Kennett, James; Nicholson, Craig; Pak, Dorothy; Sorlien, Christopher; Behl, Richard; Normark, William; Sliter, Ray; Hill, Tessa; Schimmelmann, Arndt; Cannariato, Kevin

    2006-05-01

    A fundamental goal of Earth science is to understand the remarkable instability of late Quaternary global climate prior to the beginning of the Holocene, about 11,000 years ago. This unusual climate behavior was characterized by millennial-scale climate oscillations on suborbital timescales, and a distinctive 'sawtooth' pattern of very abrupt glacial and stadial terminations (within decades) followed by more gradual global cooling [e.g., Dansgaard et al., 1993; Hendy and Kennett, 1999]. The fact that both major (glacial) and minor (stadial) cooling periods in Earth's climate were terminated by similar abrupt warming episodes suggests a common mechanism driving such rapid changes in global climate. Understanding the causes of this instability is crucial given developing concerns about global warming, yet knowledge about this climate behavior has been essentially confined to the last 150,000 years or so, owing to the absence of available sequences of sufficient age and chronological resolution. The high-resolution paleoclimate record from the Greenland ice cores is limited to about 110 thousand years ago (ka), and although Antarctic ice cores now extend back to more than 740 ka [European Project for Ice Coring in Antarctica, 2004], these latter cores primarily provide information about high-latitude conditions at much lower resolution than is required to address abrupt climate change.

  2. From Principal Component to Direct Coupling Analysis of Coevolution in Proteins: Low-Eigenvalue Modes are Needed for Structure Prediction

    PubMed Central

    Cocco, Simona; Monasson, Remi; Weigt, Martin

    2013-01-01

    Various approaches have explored the covariation of residues in multiple-sequence alignments of homologous proteins to extract functional and structural information. Among those are principal component analysis (PCA), which identifies the most correlated groups of residues, and direct coupling analysis (DCA), a global inference method based on the maximum entropy principle, which aims at predicting residue-residue contacts. In this paper, inspired by the statistical physics of disordered systems, we introduce the Hopfield-Potts model to naturally interpolate between these two approaches. The Hopfield-Potts model allows us to identify relevant ‘patterns’ of residues from the knowledge of the eigenmodes and eigenvalues of the residue-residue correlation matrix. We show how the computation of such statistical patterns makes it possible to accurately predict residue-residue contacts with a much smaller number of parameters than DCA. This dimensional reduction allows us to avoid overfitting and to extract contact information from multiple-sequence alignments of reduced size. In addition, we show that low-eigenvalue correlation modes, discarded by PCA, are important to recover structural information: the corresponding patterns are highly localized, that is, they are concentrated in few sites, which we find to be in close contact in the three-dimensional protein fold. PMID:23990764
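    The core Hopfield-Potts idea, rebuilding couplings from a few eigenmodes of the residue-residue correlation matrix including low-eigenvalue ones, can be sketched in a simplified Gaussian setting. This is an assumption-heavy toy, not the paper's estimator: continuous variables stand in for q-state Potts residues, and ranking modes by λ - 1 - ln λ (zero at λ = 1, large for λ far above or below 1) is an illustrative choice. Each kept mode contributes (1 - 1/λ) v vᵀ, attractive for λ > 1 and repulsive for λ < 1.

```python
import numpy as np


def pattern_couplings(data: np.ndarray, n_modes: int) -> np.ndarray:
    """Couplings rebuilt from the n_modes correlation eigenmodes farthest from lambda = 1.

    data: samples x sites array of continuous values (Gaussian simplification).
    """
    C = np.corrcoef(data, rowvar=False)
    lam, V = np.linalg.eigh(C)
    score = lam - 1.0 - np.log(lam)        # 0 at lambda = 1, grows for lambda >> 1 or << 1
    keep = np.argsort(score)[::-1][:n_modes]
    J = np.zeros_like(C)
    for mu in keep:
        v = V[:, mu]
        J += (1.0 - 1.0 / lam[mu]) * np.outer(v, v)  # weight > 0: attractive, < 0: repulsive
    np.fill_diagonal(J, 0.0)
    return J
```

    Keeping every mode gives I - C⁻¹ off the diagonal, i.e. the mean-field DCA couplings -(C⁻¹)ᵢⱼ in this Gaussian setting. In the toy below, sites 0 and 5 are nearly identical, and a single low-eigenvalue mode, localized on those two sites, is enough to recover the planted "contact".

```python
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 6))
X[:, 5] = X[:, 0] + 0.1 * rng.normal(size=2000)
J1 = pattern_couplings(X, 1)
print(np.unravel_index(np.argmax(np.abs(J1)), J1.shape))
```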

  3. Direct identification of antibiotic resistance genes on single plasmid molecules using CRISPR/Cas9 in combination with optical DNA mapping

    NASA Astrophysics Data System (ADS)

    Müller, Vilhelm; Rajer, Fredrika; Frykholm, Karolin; Nyberg, Lena K.; Quaderi, Saair; Fritzsche, Joachim; Kristiansson, Erik; Ambjörnsson, Tobias; Sandegren, Linus; Westerlund, Fredrik

    2016-12-01

    Bacterial plasmids are extensively involved in the rapid global spread of antibiotic resistance. We here present an assay, based on optical DNA mapping of single plasmids in nanofluidic channels, which provides detailed information about the plasmids present in a bacterial isolate. In a single experiment, we obtain the number of different plasmids in the sample, the size of each plasmid, an optical barcode that can be used to identify and trace the plasmid of interest, and information about which plasmid carries a specific resistance gene. Gene identification is done using CRISPR/Cas9 loaded with a guide-RNA (gRNA) complementary to the gene of interest, which linearizes the circular plasmids at a specific location that is identified using the optical DNA maps. We demonstrate the principle on clinically relevant extended spectrum beta-lactamase (ESBL) producing isolates. We discuss how the gRNA sequence can be varied to obtain the desired information. The gRNA can either be very specific, to identify a homogeneous group of genes, or general, to detect several groups of genes at the same time. Finally, we demonstrate an example in which we use a combination of two gRNA sequences to identify carbapenemase-encoding genes in two previously uncharacterized clinical bacterial samples.
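    Where the Cas9 cut falls, and hence where the linearized molecule begins relative to the optical map, can be sketched from sequence alone. The sketch assumes SpCas9 with an NGG PAM immediately 3' of a 20 nt protospacer, whose blunt cut lies 3 bp upstream of the PAM; it scans only the forward strand, and the function names are hypothetical rather than part of the assay's software.

```python
from typing import List


def cas9_cut_sites(plasmid: str, protospacer: str) -> List[int]:
    """0-based cut positions on a circular plasmid (forward strand only).

    Assumes an NGG PAM directly 3' of the protospacer; the blunt cut falls
    between the 3rd and 4th bases upstream of the PAM.
    """
    n, k = len(plasmid), len(protospacer)
    # Extend the sequence so targets spanning the origin are still found.
    ext = plasmid + plasmid[:k + 3]
    sites = []
    for start in range(n):
        target = ext[start:start + k]
        pam_gg = ext[start + k + 1:start + k + 3]  # the "GG" of N-G-G
        if target == protospacer and pam_gg == "GG":
            sites.append((start + k - 3) % n)
    return sites


def linearize(plasmid: str, cut: int) -> str:
    """Linear molecule produced by a single cut, read from the cut onwards."""
    return plasmid[cut:] + plasmid[:cut]
```

    On an optical map, the cut shows up as the end of the linear molecule, so the distance from that end to landmarks in the barcode locates the targeted gene on the plasmid.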

  4. Complementary molecular information changes our perception of food web structure

    PubMed Central

    Wirta, Helena K.; Hebert, Paul D. N.; Kaartinen, Riikka; Prosser, Sean W.; Várkonyi, Gergely; Roslin, Tomas

    2014-01-01

    How networks of ecological interactions are structured has a major impact on their functioning. However, accurately resolving both the nodes of the webs and the links between them is fraught with difficulties. We ask whether the new resolution conferred by molecular information changes perceptions of network structure. To probe a network of antagonistic interactions in the High Arctic, we use two complementary sources of molecular data: parasitoid DNA sequenced from the tissues of their hosts and host DNA sequenced from the gut of adult parasitoids. The information added by molecular analysis radically changes the properties of interaction structure. Overall, three times as many interaction types were revealed by combining molecular information from parasitoids and hosts with rearing data, versus rearing data alone. At the species level, our results alter the perceived host specificity of parasitoids, the parasitoid load of host species, and the web-wide role of predators with a cryptic lifestyle. As the northernmost network of host–parasitoid interactions quantified, our data point exerts high leverage on global comparisons of food web structure. However, how we view its structure will depend on what information we use: compared with variation among networks quantified at other sites, the properties of our web vary as much or much more depending on the techniques used to reconstruct it. We thus urge ecologists to combine multiple pieces of evidence in assessing the structure of interaction webs, and suggest that current perceptions of interaction structure may be strongly affected by the methods used to construct them. PMID:24449902

  5. Nuclear 28S rDNA phylogeny supports the basal placement of Noctiluca scintillans (Dinophyceae; Noctilucales) in dinoflagellates.

    PubMed

    Ki, Jang-Seu

    2010-05-01

    Noctiluca scintillans (Macartney) Kofoid et Swezy, 1921 is an unarmoured heterotrophic dinoflagellate with a global distribution, and has been considered one of the ancestral taxa among dinoflagellates. Recently, 18S rDNA, actin, alpha-, beta-tubulin, and Hsp90-based phylogenies have shown the basal position of the noctilucids. However, the relationships of dinoflagellates in the basal lineages are still controversial. Although the nuclear rDNA (e.g. 18S, ITS-5.8S, and 28S) contains much genetic information, the rDNA sequences of N. scintillans have not yet been sufficiently characterized. Here the author sequenced a long-range nuclear rDNA, spanning from the 18S to the D5 region of the 28S rDNA, of N. scintillans. The present N. scintillans isolate had a nearly identical genotype (>99.0% similarity) to other Noctiluca sequences from different geographic origins. Nucleotide divergence in the partial 28S rDNA was significantly higher (p<0.05) than in the 18S rDNA, demonstrating that the 28S rDNA is more variable. The 28S rDNA phylogeny of 17 selected dinoflagellates, with two perkinsids and two apicomplexans as outgroups, showed that N. scintillans and Oxyrrhis marina formed a clade that diverged separately from core dinoflagellates. Copyright (c) 2009 Elsevier GmbH. All rights reserved.
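    Comparing nucleotide divergence between rDNA regions starts from a per-pair distance on aligned sequences. A minimal sketch computes the uncorrected p-distance (skipping gapped columns) and the Jukes-Cantor (JC69) correction for multiple substitutions; the abstract does not say which distance model or statistical test was used, so JC69 is an illustrative choice.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites in two aligned sequences, skipping gapped columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def jc69(p: float) -> float:
    """Jukes-Cantor corrected distance d = -3/4 ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

    Averaging such distances over sequence pairs separately for the 18S and 28S alignments gives the region-wise divergences that a significance test would then compare.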

  6. APADB: a database for alternative polyadenylation and microRNA regulation events

    PubMed Central

    Müller, Sören; Rycak, Lukas; Afonso-Grunz, Fabian; Winter, Peter; Zawada, Adam M.; Damrath, Ewa; Scheider, Jessica; Schmäh, Juliane; Koch, Ina; Kahl, Günter; Rotter, Björn

    2014-01-01

    Alternative polyadenylation (APA) is a widespread mechanism that contributes to the sophisticated dynamics of gene regulation. Approximately 50% of all protein-coding human genes harbor multiple polyadenylation (PA) sites; their selective and combinatorial use gives rise to transcript variants with 3′ untranslated regions (3′UTRs) of differing length. Shortened variants escape UTR-mediated regulation by microRNAs (miRNAs), especially in cancer, where global 3′UTR shortening accelerates disease progression, dedifferentiation and proliferation. Here we present APADB, a database of vertebrate PA sites determined by 3′ end sequencing, using massive analysis of complementary DNA ends. APADB provides (A)PA sites for coding and non-coding transcripts of human, mouse and chicken genes. For human and mouse, several tissue types, including different cancer specimens, are available. APADB records the loss of predicted miRNA binding sites and visualizes next-generation sequencing reads that support each PA site in a genome browser. The database tables can either be browsed according to organism and tissue or searched for a gene of interest. APADB is the largest database of APA in human, chicken and mouse. The stored information provides experimental evidence for thousands of PA sites and APA events. APADB combines 3′ end sequencing data with prediction algorithms of miRNA binding sites, allowing further improvement of those prediction algorithms. Current databases lack correct information about 3′UTR lengths, especially for chicken, and APADB provides the information needed to close this gap. Database URL: http://tools.genxpro.net/apadb/ PMID:25052703
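    Recording the loss of predicted miRNA binding sites amounts to asking which seed matches lie beyond a proximal PA site. A minimal sketch, assuming perfect 7mer matches to miRNA nucleotides 2-8 (the canonical seed) and UTR coordinates in which the shortened isoform ends just before `proximal_pa`; APADB's own prediction algorithms are not described in the abstract, and these helper names are hypothetical.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna: str) -> str:
    """7mer target site matching miRNA nucleotides 2-8 (reverse complement, RNA alphabet)."""
    return mirna[1:8].translate(_COMPLEMENT)[::-1]


def site_positions(utr: str, site: str) -> List[int]:
    """0-based start positions of every exact site occurrence in the 3'UTR."""
    return [i for i in range(len(utr) - len(site) + 1) if utr.startswith(site, i)]


def sites_lost(utr: str, mirna: str, proximal_pa: int) -> List[int]:
    """Seed sites that extend past the proximal PA site and so are missing
    from the shortened 3'UTR isoform."""
    site = seed_site(mirna)
    return [i for i in site_positions(utr, site) if i + len(site) > proximal_pa]
```

    For let-7a (UGAGGUAGUAGGUUGUAUAGUU) the seed site is CUACCUC; in a toy UTR with two such sites, a proximal PA site at position 15 keeps the first and loses the second.

```python
let7a = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "AAAA" + "CUACCUC" + "A" * 10 + "CUACCUC" + "AAA"
print(sites_lost(utr, let7a, 15))
```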

  7. Quantification of the effects of eustasy, subsidence, and sediment supply on Miocene sequences, mid-Atlantic margin of the United States

    USGS Publications Warehouse

    Browning, J.V.; Miller, K.G.; McLaughlin, P.P.; Kominz, M.A.; Sugarman, P.J.; Monteverde, D.; Feigenson, M.D.; Hernandez, J.C.

    2006-01-01

    We use backstripping to quantify the roles of variations in global sea level (eustasy), subsidence, and sediment supply on the development of the Miocene stratigraphic record of the mid-Atlantic continental margin of the United States (New Jersey, Delaware, and Maryland). Eustasy is a primary influence on sequence patterns, determining the global template of sequences (i.e., times when sequences can be preserved) and explaining similarities in Miocene sequence architecture on margins throughout the world. Sequences can be correlated throughout the mid-Atlantic region with Sr-isotopic chronology (±0.6 m.y. to ±1.2 m.y.). Eight Miocene sequences correlate regionally and can be correlated to global δ18O increases, indicating glacioeustatic control. This margin is dominated by passive subsidence with little evidence for active tectonic overprints, except possibly in Maryland during the early Miocene. However, early Miocene sequences in New Jersey and Delaware display a patchwork distribution that is attributable to minor (tens of meters) intervals of excess subsidence. Backstripping quantifies that excess subsidence began in Delaware at ca. 21 Ma and continued until 12 Ma, with maximum rates from ca. 21-16 Ma. We attribute this enhanced subsidence to local flexural response to the progradation of thick sequences offshore and adjacent to this area. Removing this excess subsidence in Delaware yields a record that is remarkably similar to New Jersey eustatic estimates. We conclude that sea-level rise and fall is a first-order control on accommodation providing similar timing on all margins to the sequence record. Tectonic changes due to movement of the crust can overprint the record, resulting in large gaps in the stratigraphic record. Smaller differences in sequences can be attributed to local flexural loading effects, particularly in regions experiencing large-scale progradation. © 2006 Geological Society of America.
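    The backstripping step can be illustrated with one commonly quoted Airy-isostasy form: the water-loaded tectonic subsidence is Y = S(ρm - ρs)/(ρm - ρw) + Wd - ΔSL·ρm/(ρm - ρw), where S is sediment thickness, ρs its bulk density, Wd paleowater depth and ΔSL sea level relative to present. This is a minimal sketch, not the authors' procedure: it omits decompaction and the flexural loading the study invokes, and the default densities are illustrative assumptions.

```python
def airy_tectonic_subsidence(sediment_thickness: float, sediment_density: float,
                             water_depth: float, sea_level_change: float = 0.0,
                             mantle_density: float = 3300.0,
                             water_density: float = 1030.0) -> float:
    """One-dimensional Airy backstripping (no decompaction, no flexure).

    Lengths in metres, densities in kg/m^3; sea_level_change is sea level
    relative to present (positive = higher than today).
    """
    denom = mantle_density - water_density
    return (sediment_thickness * (mantle_density - sediment_density) / denom
            + water_depth
            - sea_level_change * mantle_density / denom)
```

    For 1,000 m of sediment at 2,400 kg/m³ under 100 m of water this gives about 496 m of tectonic subsidence; a sea level 10 m above present lowers the estimate by about 14.5 m, which is why eustatic and tectonic contributions have to be separated.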

  8. A nationwide database linking information on the hosts with sequence data of their virus strains: A useful tool for the eradication of bovine viral diarrhea (BVD) in Switzerland.

    PubMed

    Stalder, Hanspeter; Hug, Corinne; Zanoni, Reto; Vogt, Hans-Rudolf; Peterhans, Ernst; Schweizer, Matthias; Bachofen, Claudia

    2016-06-15

    Pestiviruses infect a wide variety of animals of the order Artiodactyla, with bovine viral diarrhea virus (BVDV) being an economically important pathogen of livestock globally. BVDV is maintained in the cattle population by infecting fetuses early in gestation and, thus, by generating persistently infected (PI) animals that efficiently transmit the virus throughout their lifetime. In 2008, Switzerland started a national control campaign with the aim of eradicating BVDV from all bovines in the country by searching for and eliminating every PI animal. Unlike previous eradication programs, all animals of the entire population were tested for virus within one year, followed by testing of each newborn calf in the subsequent four years. Overall, 3,855,814 animals were tested from 2008 through 2011, 20,553 of which returned an initial BVDV-positive result. We were able to obtain samples from at least 36% of all initially positive animals. We sequenced the 5' untranslated region (UTR) of more than 7400 pestiviral strains and compiled the sequence data in a database together with an array of information on the PI animals, including the location of the farm where they were born, their dams, and the locations where the animals had lived. To our knowledge, this is the largest database combining viral sequences with animal data for an endemic viral disease. Using unique identification tags, the different datasets within the database were connected to run diverse molecular epidemiological analyses. The large sets of animal and sequence data made it possible to run analyses in both directions, i.e., starting from a likely epidemiological link, or starting from related sequences. We present the results of three epidemiological investigations in detail and a compilation of 122 individual investigations that show the usefulness of such a database in a country-wide BVD eradication program. Copyright © 2015 Elsevier B.V. All rights reserved.
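    The two directions of analysis, from related sequences to a shared location and from a location to the sequences found there, can be sketched as joins on the animal ID. The record layout, field names and the "identical 5'UTR sequence" criterion for relatedness are hypothetical simplifications; the abstract does not describe the database schema or the similarity rule used.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical shapes: sequences maps animal ID -> 5'UTR sequence;
# animals maps animal ID -> {"farms": [farm IDs where the animal lived]}.


def linked_pairs(sequences: Dict[str, str],
                 animals: Dict[str, dict]) -> List[Tuple[str, str, List[str]]]:
    """Sequence-first: pairs of PI animals with identical sequences that shared a farm."""
    by_seq = defaultdict(list)
    for animal_id, seq in sequences.items():
        by_seq[seq].append(animal_id)
    pairs = []
    for ids in by_seq.values():
        for a, b in combinations(sorted(ids), 2):
            shared = set(animals[a]["farms"]) & set(animals[b]["farms"])
            if shared:
                pairs.append((a, b, sorted(shared)))
    return sorted(pairs)


def sequences_on_farm(farm: str, sequences: Dict[str, str],
                      animals: Dict[str, dict]) -> List[str]:
    """Location-first: distinct viral sequences from animals that lived on a farm."""
    return sorted({sequences[a] for a, rec in animals.items()
                   if farm in rec["farms"] and a in sequences})
```

    A production analysis would replace the identity test with a distance threshold or phylogenetic clustering and weight links by the dates of residence.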

  9. Transcriptome analysis of carnation (Dianthus caryophyllus L.) based on next-generation sequencing technology.

    PubMed

    Tanase, Koji; Nishitani, Chikako; Hirakawa, Hideki; Isobe, Sachiko; Tabata, Satoshi; Ohmiya, Akemi; Onozaki, Takashi

    2012-07-02

    Carnation (Dianthus caryophyllus L.), in the family Caryophyllaceae, can be found in a wide range of colors and is a model system for studies of flower senescence. In addition, it is one of the most important flowers in the global floriculture industry. However, few genomics resources, such as sequences and markers are available for carnation or other members of the Caryophyllaceae. To increase our understanding of the genetic control of important characters in carnation, we generated an expressed sequence tag (EST) database for a carnation cultivar important in horticulture by high-throughput sequencing using 454 pyrosequencing technology. We constructed a normalized cDNA library and a 3'-UTR library of carnation, obtaining a total of 1,162,126 high-quality reads. These reads were assembled into 300,740 unigenes consisting of 37,844 contigs and 262,896 singlets. The contigs were searched against an Arabidopsis sequence database, and 61.8% (23,380) of them had at least one BLASTX hit. These contigs were also annotated with Gene Ontology (GO) and were found to cover a broad range of GO categories. Furthermore, we identified 17,362 potential simple sequence repeats (SSRs) in 14,291 of the unigenes. We focused on gene discovery in the areas of flower color and ethylene biosynthesis. Transcripts were identified for almost every gene involved in flower chlorophyll and carotenoid metabolism and in anthocyanin biosynthesis. Transcripts were also identified for every step in the ethylene biosynthesis pathway. We present the first large-scale sequence data set for carnation, generated using next-generation sequencing technology. The large EST database generated from these sequences is an informative resource for identifying genes involved in various biological processes in carnation and provides an EST resource for understanding the genetic diversity of this plant.
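    The search for simple sequence repeats in unigenes can be sketched with regular-expression backreferences that find a di- or trinucleotide motif followed by further copies of itself. The minimum repeat counts are hypothetical (the abstract does not give the study's SSR criteria), only perfect repeats are found, and homopolymer runs are skipped.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical minimum repeat counts per motif length (2 = di-, 3 = trinucleotide).
MIN_REPEATS: Dict[int, int] = {2: 6, 3: 5}


def find_ssrs(seq: str, min_repeats: Optional[Dict[int, int]] = None) -> List[Tuple[int, str, int]]:
    """Return (start, motif, repeat_count) for perfect di- and trinucleotide SSRs."""
    thresholds = min_repeats or MIN_REPEATS
    hits = []
    for k, n in thresholds.items():
        # A k-base motif followed by at least n - 1 further copies of itself.
        pattern = re.compile(rf"([ACGT]{{{k}}})\1{{{n - 1},}}")
        for m in pattern.finditer(seq.upper()):
            motif = m.group(1)
            if len(set(motif)) == 1:
                continue  # homopolymer run, not a di/trinucleotide repeat
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)
```

    A repeat may be reported under a rotated motif (e.g. TGA rather than GAT) depending on where the match starts; dedicated SSR tools normalise motifs and also handle imperfect repeats.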

  10. Transcriptome analysis of carnation (Dianthus caryophyllus L.) based on next-generation sequencing technology

    PubMed Central

    2012-01-01

    Background Carnation (Dianthus caryophyllus L.), in the family Caryophyllaceae, can be found in a wide range of colors and is a model system for studies of flower senescence. In addition, it is one of the most important flowers in the global floriculture industry. However, few genomics resources, such as sequences and markers are available for carnation or other members of the Caryophyllaceae. To increase our understanding of the genetic control of important characters in carnation, we generated an expressed sequence tag (EST) database for a carnation cultivar important in horticulture by high-throughput sequencing using 454 pyrosequencing technology. Results We constructed a normalized cDNA library and a 3’-UTR library of carnation, obtaining a total of 1,162,126 high-quality reads. These reads were assembled into 300,740 unigenes consisting of 37,844 contigs and 262,896 singlets. The contigs were searched against an Arabidopsis sequence database, and 61.8% (23,380) of them had at least one BLASTX hit. These contigs were also annotated with Gene Ontology (GO) and were found to cover a broad range of GO categories. Furthermore, we identified 17,362 potential simple sequence repeats (SSRs) in 14,291 of the unigenes. We focused on gene discovery in the areas of flower color and ethylene biosynthesis. Transcripts were identified for almost every gene involved in flower chlorophyll and carotenoid metabolism and in anthocyanin biosynthesis. Transcripts were also identified for every step in the ethylene biosynthesis pathway. Conclusions We present the first large-scale sequence data set for carnation, generated using next-generation sequencing technology. The large EST database generated from these sequences is an informative resource for identifying genes involved in various biological processes in carnation and provides an EST resource for understanding the genetic diversity of this plant. PMID:22747974

  11. Visual management of large scale data mining projects.

    PubMed

    Shah, I; Hunter, L

    2000-01-01

    This paper describes a unified framework for visualizing the preparations for, and results of, hundreds of machine learning experiments. These experiments were designed to improve the accuracy of enzyme functional predictions from sequence, and in many cases were successful. Our system provides graphical user interfaces for defining and exploring training datasets and various representational alternatives, for inspecting the hypotheses induced by various types of learning algorithms, for visualizing the global results, and for inspecting in detail the results for specific training sets (functions) and examples (proteins). The visualization tools serve as a navigational aid through a large amount of sequence data and induced knowledge. They provided significant help in understanding both the significance and the underlying biological explanations of our successes and failures. Using these visualizations, it was possible to efficiently identify weaknesses of the modular sequence representations and induction algorithms, which suggested better learning strategies. The context in which our data mining visualization toolkit was developed was the problem of accurately predicting enzyme function from protein sequence data. Previous work demonstrated that approximately 6% of enzyme protein sequences are likely to be assigned incorrect functions on the basis of sequence similarity alone. To test the hypothesis that more detailed sequence analysis using machine learning techniques and modular domain representations could address many of these failures, we designed a series of more than 250 experiments using information-theoretic decision tree induction and naive Bayesian learning on local sequence domain representations of problematic enzyme function classes. In more than half of these cases, our methods were able to perfectly discriminate among the various possible functions of similar sequences. We developed and tested our visualization techniques on this application.

  12. Antimicrobial resistance surveillance in the genomic age.

    PubMed

    McArthur, Andrew G; Tsang, Kara K

    2017-01-01

    The loss of effective antimicrobials is reducing our ability to protect the global population from infectious disease. However, the field of antibiotic drug discovery and the public health monitoring of antimicrobial resistance (AMR) is beginning to exploit the power of genome and metagenome sequencing. The creation of novel AMR bioinformatics tools and databases and their continued development will advance our understanding of the molecular mechanisms and threat severity of antibiotic resistance, while simultaneously improving our ability to accurately predict and screen for antibiotic resistance genes within environmental, agricultural, and clinical settings. To do so, efforts must be focused toward exploiting the advancements of genome sequencing and information technology. Currently, AMR bioinformatics software and databases reflect different scopes and functions, each with its own strengths and weaknesses. A review of the available tools reveals common approaches and reference data but also reveals gaps in our curated data, models, algorithms, and data-sharing tools that must be addressed to conquer the limitations and areas of unmet need within the AMR research field before DNA sequencing can be fully exploited for AMR surveillance and improved clinical outcomes. © 2016 New York Academy of Sciences.

  13. Multiple vehicle tracking in aerial video sequence using driver behavior analysis and improved deterministic data association

    NASA Astrophysics Data System (ADS)

    Zhang, Xunxun; Xu, Hongke; Fang, Jianwu

    2018-01-01

    With the rapid development of unmanned aerial vehicle technology, multiple vehicle tracking (MVT) in aerial video sequences has received widespread interest because it can provide the required traffic information. Due to camera motion and complex backgrounds, MVT in aerial video sequences poses unique challenges. We propose an efficient MVT algorithm based on a driver behavior-based Kalman filter (DBKF) and an improved deterministic data association (IDDA) method. First, a hierarchical image registration method is proposed to compensate for camera motion. Next, to improve the accuracy of state estimation, we propose the DBKF module, which incorporates driver behavior into the Kalman filter by introducing an artificial potential field to reflect that behavior. Then, to implement data association, a local optimization method is designed instead of a global one. By introducing an adaptive operating strategy, the proposed IDDA method can also handle vehicles that suddenly appear or disappear. Finally, comprehensive experiments on the DARPA VIVID and KIT AIS data sets demonstrate that the proposed algorithm generates satisfactory and superior results.

  14. Sequence stratigraphic applications to deep-water exploration in the Makassar Strait, offshore East Kalimantan, Indonesia

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Malacek, S.J.; Reaves, C.M.; Atmadja, W.S.

    1994-07-01

    A sequence stratigraphic study was conducted to help evaluate the exploration potential of the Makassar PSC, offshore East Kalimantan, Indonesia. The PSC is on the present-day slope in water depths of 500-3000 ft and borders the large oil and gas fields of the Mahakam delta. The study provided important insights into reservoir distribution, trapping style, and seismic hydrocarbon indicators. Lowstand deposition on a slope modified by growth faulting and shale diapirism controlled reservoir distribution within the prospective late Miocene section. Three major lowstand intervals can be seismically defined and tied to deep-water sands in nearby wells where log character and biostratigraphic data support the seismic system tract interpretation. The three intervals appear to correlate with third-order global lowstand events and are consistent with existing sequence stratigraphic schemes for the shelf and upper slope in the Makassar area. Seismic mapping delineated lowstand features, including incised valleys and intraslope to basin-floor thicks. Regional information on the positions of middle-late Miocene delta lobes and shelf edges helped complete the picture of sand sources, transport routes, and depocenters.

  15. Candida auris

    MedlinePlus

    ... auris infection spread globally? CDC conducted whole genome sequencing of C. auris specimens from countries in the ... Asia, southern Africa, and South America. Whole genome sequencing produces detailed DNA fingerprints of organisms. CDC found ...

  16. An integrated global regulatory network of hematopoietic precursor cell self-renewal and differentiation.

    PubMed

    You, Yanan; Cuevas-Diaz Duran, Raquel; Jiang, Lihua; Dong, Xiaomin; Zong, Shan; Snyder, Michael; Wu, Jia Qian

    2018-06-12

    Systematic study of the regulatory mechanisms of Hematopoietic Stem Cell and Progenitor Cell (HSPC) self-renewal is fundamentally important for understanding hematopoiesis and for manipulating HSPCs for therapeutic purposes. Previously, we characterized gene expression and identified important transcription factors (TFs) regulating the switch between self-renewal and differentiation in a multipotent Hematopoietic Progenitor Cell (HPC) line, EML (Erythroid, Myeloid, and Lymphoid) cells. Herein, we report binding maps for additional TFs (SOX4 and STAT3) obtained by chromatin immunoprecipitation (ChIP)-Sequencing, to address the underlying mechanisms regulating the self-renewal properties of the lineage-CD34+ subpopulation (Lin-CD34+ EML cells). Furthermore, we applied the Assay for Transposase Accessible Chromatin (ATAC)-Sequencing to globally identify the open chromatin regions associated with TF binding in the self-renewing Lin-CD34+ EML cells. Mass spectrometry (MS) was also used to quantify relative protein expression levels. Finally, by integrating the protein-protein interaction database, we built an expanded transcriptional regulatory and interaction network. We found that MAPK (Mitogen-activated protein kinase) pathway and TGF-β/SMAD signaling pathway components were highly enriched among the binding targets of these TFs in Lin-CD34+ EML cells. The present study integrates regulatory information at multiple levels to paint a more comprehensive picture of the HSPC self-renewal mechanisms.

  17. Proteomics technique opens new frontiers in mobilome research

    PubMed Central

    Davidson, Andrew D.; Matthews, David A.

    2017-01-01

    ABSTRACT A large proportion of the genome of most eukaryotic organisms consists of highly repetitive mobile genetic elements. The sum of these elements is called the “mobilome,” which in eukaryotes is made up mostly of transposons. Transposable elements contribute to disease, evolution, and normal physiology by mediating genetic rearrangement, and through the “domestication” of transposon proteins for cellular functions. Although ‘omics studies of mobilome genomes and transcriptomes are common, technical challenges have hampered high-throughput global proteomics analyses of transposons. In a recent paper, we overcame these technical hurdles using a technique called “proteomics informed by transcriptomics” (PIT), and thus published the first unbiased global mobilome-derived proteome for any organism (using cell lines derived from the mosquito Aedes aegypti). In this commentary, we describe our methods in more detail and summarise our major findings. We also use new genome sequencing data to show that, in many cases, the specific genomic element expressing a given protein can be identified using PIT. This proteomic technique therefore represents an important technological advance that will open new avenues of research into the role that proteins derived from transposons and other repetitive and sequence-diverse genetic elements, such as endogenous retroviruses, play in health and disease. PMID:28932623

  18. Neisseria meningitidis; clones, carriage, and disease.

    PubMed

    Read, R C

    2014-05-01

    Neisseria meningitidis, the cause of meningococcal disease, has been the subject of sophisticated molecular epidemiological investigation as a consequence of the significant public health threat posed by this organism. Multilocus sequence typing and whole genome sequencing classify the organism into clonal complexes. Extensive phenotypic, genotypic and epidemiological information is available on the PubMLST website. The human nasopharynx is the sole ecological niche of this species, and carrier isolates show extensive genetic diversity as compared with hyperinvasive lineages. Horizontal gene exchange and recombination events within the meningococcal genome during residence in the human nasopharynx result in antigenic diversity even within clonal complexes, so that individual clones may express, for example, more than one capsular polysaccharide (serogroup). Successful clones are capable of wide global dissemination, and may be associated with explosive epidemics of invasive disease. © 2014 The Author Clinical Microbiology and Infection © 2014 European Society of Clinical Microbiology and Infectious Diseases.

  19. Citrus sinensis annotation project (CAP): a comprehensive database for sweet orange genome.

    PubMed

    Wang, Jia; Chen, Dijun; Lei, Yang; Chang, Ji-Wei; Hao, Bao-Hai; Xing, Feng; Li, Sen; Xu, Qiang; Deng, Xiu-Xin; Chen, Ling-Ling

    2014-01-01

    Citrus is one of the most important and widely grown fruit crops, ranking first in global production among all fruit crops. Sweet orange accounts for more than half of Citrus production, both as fresh fruit and as processed juice. We have sequenced the draft genome of a double-haploid sweet orange (C. sinensis cv. Valencia), and constructed the Citrus sinensis annotation project (CAP) to store and visualize the sequenced genomic and transcriptome data. CAP provides GBrowse-based organization of sweet orange genomic data, which integrates ab initio gene prediction, EST, RNA-seq and RNA-paired end tag (RNA-PET) evidence-based gene annotation. Furthermore, we provide a user-friendly web interface to show the predicted protein-protein interactions (PPIs) and metabolic pathways in sweet orange. CAP provides comprehensive information beneficial to researchers studying sweet orange and other woody plants, and is freely available at http://citrus.hzau.edu.cn/.

  20. A Method for WD40 Repeat Detection and Secondary Structure Prediction

    PubMed Central

    Wang, Yang; Jiang, Fan; Zhuo, Zhu; Wu, Xian-Hui; Wu, Yun-Dong

    2013-01-01

    WD40-repeat proteins (WD40s), one of the largest protein families in eukaryotes, play vital roles in assembling protein-protein/DNA/RNA complexes. WD40s fold into similar β-propeller structures despite their diversified sequences. A program, WDSP (WD40 repeat protein Structure Predictor), has been developed to accurately identify WD40 repeats and predict their secondary structures. The method is designed specifically for WD40 proteins by incorporating both local residue information and non-local family-specific structural features. It overcomes the problem of highly diversified protein sequences and variable loops. In addition, WDSP achieves better prediction in identifying multiple WD40-domain proteins by taking the global combination of repeats into consideration. In secondary structure prediction, the average Q3 accuracy of WDSP in a jack-knife test reaches 93.7%. A disease-related protein, LRRK2, was used as a representative example to demonstrate the structure prediction. PMID:23776530

  1. Transcriptome analysis and related databases of Lactococcus lactis.

    PubMed

    Kuipers, Oscar P; de Jong, Anne; Baerends, Richard J S; van Hijum, Sacha A F T; Zomer, Aldert L; Karsens, Harma A; den Hengst, Chris D; Kramer, Naomi E; Buist, Girbe; Kok, Jan

    2002-08-01

    Several complete genome sequences of Lactococcus lactis and their annotations will become available in the near future, in addition to the already published genome sequence of L. lactis ssp. lactis IL 1403. This will allow intraspecies comparative genomics studies as well as functional genomics studies aimed at a better understanding of physiological processes and regulatory networks operating in lactococci. This paper describes the initial set-up of a DNA-microarray facility in our group, to enable transcriptome analysis of various Gram-positive bacteria, including a ssp. lactis and a ssp. cremoris strain of Lactococcus lactis. Moreover, a global description is given of the hardware and software requirements for such a set-up, highlighting the crucial integration of relevant bioinformatics tools and methods. This includes the development of MolGenIS, an information system for transcriptome data storage and retrieval, and LactococCye, a metabolic pathway/genome database of Lactococcus lactis.

  2. Sensitive Next-Generation Sequencing Method Reveals Deep Genetic Diversity of HIV-1 in the Democratic Republic of the Congo.

    PubMed

    Rodgers, Mary A; Wilkinson, Eduan; Vallari, Ana; McArthur, Carole; Sthreshley, Larry; Brennan, Catherine A; Cloherty, Gavin; de Oliveira, Tulio

    2017-03-15

    As the epidemiological epicenter of the human immunodeficiency virus (HIV) pandemic, the Democratic Republic of the Congo (DRC) is a reservoir of circulating HIV strains exhibiting high levels of diversity and recombination. In this study, we characterized HIV specimens collected in two rural areas of the DRC between 2001 and 2003 to identify rare strains of HIV. The env gp41 region was sequenced and characterized for 172 HIV-positive specimens. The env sequences were predominantly subtype A (43.02%), but 7 other subtypes (33.14%), 20 circulating recombinant forms (CRFs; 11.63%), and 20 unclassified (11.63%) sequences were also found. Of the rare and unclassified subtypes, 18 specimens were selected for next-generation sequencing (NGS) by a modified HIV-switching mechanism at the 5' end of the RNA template (SMART) method to obtain full-genome sequences. NGS produced 14 new complete genomes, which included pure subtype C (n = 2), D (n = 1), F1 (n = 1), H (n = 3), and J (n = 1) genomes. The two subtype C genomes and one of the subtype H genomes branched basal to their respective subtype branches but had no evidence of recombination. The remaining 6 genomes were complex recombinants of 2 or more subtypes, including subtypes A1, F, G, H, J, and K and unclassified fragments, including one subtype CRF25 isolate, which branched basal to all CRF25 references. Notably, all recombinant subtype H fragments branched basal to the H clade. Spatial-geographical analysis indicated that the diverse sequences identified here did not expand globally. The full-genome and subgenomic sequences identified in our study population significantly increase the documented diversity of the strains involved in the continually evolving HIV-1 pandemic. IMPORTANCE Very little is known about the ancestral HIV-1 strains that founded the global pandemic, and very few complete genome sequences are available from patients in the Congo Basin, where HIV-1 expanded early in the global pandemic. 
By sequencing a subgenomic fragment of the HIV-1 envelope from study participants in the DRC, we identified rare variants for complete genome sequencing. The basal branching of some of the complete genome sequences that we recovered suggests that these strains are more closely related to ancestral HIV-1 strains than to previously reported strains and is evidence that the local diversification of HIV in the DRC continues to outpace the diversity of global strains decades after the emergence of the pandemic. Copyright © 2017 Rodgers et al.

  3. 2-D to 3-D global/local finite element analysis of cross-ply composite laminates

    NASA Technical Reports Server (NTRS)

    Thompson, D. Muheim; Griffin, O. Hayden, Jr.

    1990-01-01

    An example of two-dimensional to three-dimensional global/local finite element analysis of a laminated composite plate with a hole is presented. The 'zoom' technique of global/local analysis is used, in which displacements at the global/local interface from the two-dimensional global model are applied to the edges of the three-dimensional local model. Three hole diameters (one, three, and six inches) are considered in order to compare the effect of hole size on the three-dimensional stress state around the hole. In addition, three different stacking sequences are analyzed for the six-inch hole case in order to study the effect of stacking sequence. The existence of a 'critical' hole size, at which the interlaminar stresses are maximum, is indicated. Dispersion of plies at the same angle, as opposed to clustering, is found to reduce the magnitude of some interlaminar stress components and increase others.

  4. Global Regulatory Pathways in the Alphaproteobacteria

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    A major goal for microbiologists in the twenty-first century is to develop an understanding of the microbial cell in all its complexity. In addition to understanding the function of individual gene products, we need to focus on how the cell regulates gene expression at a global level to respond to different environmental parameters. Development of genomic technologies such as complete genome sequencing, proteomics, and global comparisons of mRNA expression patterns allows us to begin to address this issue. This proposal focuses on a number of phylogenetically related bacteria that are involved in environmentally important processes such as carbon sequestration and bioremediation. Genome sequencing projects of a number of these bacteria have revealed the presence of a small family of regulatory genes found thus far only in the alpha-proteobacteria. These genes encode proteins that are related to the global regulatory protein RosR in Rhizobium etli, which is involved in determining nodulation competitiveness in this bacterium. Our goal is to examine the function of the proteins encoded by this gene family in several of the bacteria containing homologs to RosR. We will construct gene disruption mutations in a number of these bacteria and characterize the resulting mutant strains using two-dimensional gel electrophoresis and genetic and biochemical techniques. We will thus determine whether the other proteins also function as global regulators of gene expression. Using proteomics methods, we will identify the specific proteins whose expression varies depending on the presence or absence of the RosR homolog. Over fifty loci regulated by RosR have been identified in R. etli using transposon mutagenesis; this will serve as our benchmark against which we will compare the other regulons. We expect to identify genes regulated by RosR homologs in several bacterial species, including, but not limited to, Rhodopseudomonas palustris and Sphingomonas aromaticivorans. In this way we will provide valuable information on gene regulation in this group of bacteria, expand our understanding of the evolution of global regulatory pathways, and develop methods for comparative regulon analysis among microbes.

  5. DNA motif alignment by evolving a population of Markov chains.

    PubMed

    Bi, Chengpeng

    2009-01-30

    Deciphering cis-regulatory elements, or de novo motif finding, in genomes still remains elusive, although much algorithmic effort has been expended. Markov chain Monte Carlo (MCMC) methods such as Gibbs motif samplers have been widely employed to solve the de novo motif-finding problem through local sequence alignment. Nonetheless, MCMC-based motif samplers, like EM, still suffer from local maxima. Consequently, as a prerequisite for finding good local alignments, these motif algorithms are often run independently many times, but without information exchange between the different chains. A new algorithm design enabling such information exchange would therefore be worthwhile. This paper presents a novel motif-finding algorithm that evolves a population of Markov chains with information exchange (PMC), each of which is initialized as a random alignment and run by the Metropolis-Hastings sampler (MHS). Each chain is progressively updated through a series of stochastically sampled local alignments. Specifically, the PMC motif algorithm performs stochastic sampling according to a population-based proposal distribution rather than individual ones, and adaptively evolves the population as a whole towards a global maximum. Alignment information is exchanged by taking advantage of the pooled motif site distributions. A distinct method that runs multiple independent Markov chains (IMC) without information exchange, dubbed the IMC motif algorithm, is also devised for comparison with its PMC counterpart. Experimental studies demonstrate that performance can be improved when pooled information is used to run a population of motif samplers. The new PMC algorithm improved convergence and outperformed other popular algorithms tested on simulated and biological motif sequences.

  6. Global Analysis of Gene Expression Profiles in Developing Physic Nut (Jatropha curcas L.) Seeds

    PubMed Central

    Jiang, Huawu; Wu, Pingzhi; Zhang, Sheng; Song, Chi; Chen, Yaping; Li, Meiru; Jia, Yongxia; Fang, Xiaohua; Chen, Fan; Wu, Guojiang

    2012-01-01

    Background Physic nut (Jatropha curcas L.) is an oilseed plant species with high potential utility as a biofuel. Furthermore, following the recent sequencing of its genome and the availability of expressed sequence tag (EST) libraries, it is a valuable model plant for studying carbon assimilation in endosperms of oilseed plants. There have been several transcriptomic analyses of developing physic nut seeds using ESTs, but they have provided limited information on the accumulation of stored resources in the seeds. Methodology/Principal Findings We applied next-generation Illumina sequencing technology to analyze global gene expression profiles of developing physic nut seeds 14, 19, 25, 29, 35, 41, and 45 days after pollination (DAP). The acquired profiles reveal the key genes, and their expression timeframes, involved in major metabolic processes including carbon flow, starch metabolism, and synthesis of storage lipids and proteins in the developing seeds. The main period of storage reserve synthesis in the seeds appears to be 29–41 DAP, and the fatty acid composition of the developing seeds is consistent with the relative expression levels of different isoforms of acyl-ACP thioesterase and fatty acid desaturase genes. Several transcription factor genes whose expression coincides with storage reserve deposition correspond to those known to regulate the process in Arabidopsis. Conclusions/Significance The results will facilitate searches for genes that influence de novo lipid synthesis, accumulation and their regulatory networks in developing physic nut seeds, and other oil seeds. Thus, they will be helpful in attempts to modify these plants for efficient biofuel production. PMID:22574177

  7. A paleomagnetic record in loess-paleosol sequences since late Pleistocene in the arid Central Asia

    NASA Astrophysics Data System (ADS)

    Li, Guanhua; Xia, Dunsheng; Appel, Erwin; Wang, Youjun; Jia, Jia; Yang, Xiaoqiang

    2018-03-01

    Geomagnetic excursions during the Brunhes epoch have become a leading topic in paleomagnetic research, as they provide key information about Earth's interior dynamics and could serve as another tool for stratigraphic correlation among different lithologies. Loess-paleosol sequences provide good archives for decoding geomagnetic excursions. However, the detailed pattern of these excursions has not been sufficiently clarified because of pedogenic influence. In this study, paleomagnetic analysis was performed on loess-paleosol sequences on the northern piedmont of the Tianshan Mountains (northwestern China). Radiocarbon and luminescence dating constrained the loess section chronologically to mainly the last c. 130 ka, a period that includes several distinct geomagnetic excursions. The rock magnetic properties of this loess section are dominated by magnetite and maghemite in a pseudo-single-domain state. The rock magnetic properties and magnetic anisotropy indicate only weak pedogenic influence on the magnetic record. The stable component of remanent magnetization derived from thermal demagnetization revealed two intervals of directional anomalies with corresponding intensity lows in the Brunhes epoch. Age control in the key layers indicates that these anomalies are likely associated with the Laschamp and Blake excursions, respectively. In addition, relative paleointensity in the loess section is broadly compatible with other regional and global relative paleointensity records and indicates two low-paleointensity zones, possibly corresponding to the Blake and Laschamp excursions, respectively. As a result, this study suggests that the loess section may have the potential to record short-lived excursions, which largely reflect variations in the dipole components in the global archives.

  8. Investigating effects of communications modulation technique on targeting performance

    NASA Astrophysics Data System (ADS)

    Blasch, Erik; Eusebio, Gerald; Huling, Edward

    2006-05-01

    One of the key challenges facing the global war on terrorism (GWOT) and urban operations is the increased need for rapid and diverse information from distributed sources. For users to get adequate information on target types and movements, they need reliable data. To facilitate reliable computational intelligence, we explore the communication modulation tradeoffs affecting information distribution and accumulation. In this analysis, we examine the modulation techniques of Orthogonal Frequency Division Multiplexing (OFDM), Direct Sequence Spread Spectrum (DSSS), and statistical time-division multiple access (TDMA) as a function of the bit error rate and jitter that affect targeting performance. In the analysis, we simulate a Link 16 with a simple bandpass frequency shift keying (PSK) technique using different signal-to-noise ratios. The communications transfer delay and accuracy tradeoffs are assessed in terms of their effects on targeting performance.

  9. Review and International Recommendation of Methods for Typing Neisseria gonorrhoeae Isolates and Their Implications for Improved Knowledge of Gonococcal Epidemiology, Treatment, and Biology

    PubMed Central

    Unemo, Magnus; Dillon, Jo-Anne R.

    2011-01-01

    Summary: Gonorrhea, which may become untreatable due to multiple resistance to available antibiotics, remains a public health problem worldwide. Precise methods for typing Neisseria gonorrhoeae, together with epidemiological information, are crucial for an enhanced understanding regarding issues involving epidemiology, test of cure and contact tracing, identifying core groups and risk behaviors, and recommending effective antimicrobial treatment, control, and preventive measures. This review evaluates methods for typing N. gonorrhoeae isolates and recommends various methods for different situations. Phenotypic typing methods, as well as some now-outdated DNA-based methods, have limited usefulness in differentiating between strains of N. gonorrhoeae. Genotypic methods based on DNA sequencing are preferred, and the selection of the appropriate genotypic method should be guided by its performance characteristics and whether short-term epidemiology (microepidemiology) or long-term and/or global epidemiology (macroepidemiology) matters are being investigated. Currently, for microepidemiological questions, the best methods for fast, objective, portable, highly discriminatory, reproducible, typeable, and high-throughput characterization are N. gonorrhoeae multiantigen sequence typing (NG-MAST) or full- or extended-length porB gene sequencing. However, pulsed-field gel electrophoresis (PFGE) and Opa typing can be valuable in specific situations, i.e., extreme microepidemiology, despite their limitations. For macroepidemiological studies and phylogenetic studies, DNA sequencing of chromosomal housekeeping genes, such as multilocus sequence typing (MLST), provides a more nuanced understanding. PMID:21734242

  10. Gene discovery using next-generation pyrosequencing to develop ESTs for Phalaenopsis orchids

    PubMed Central

    2011-01-01

    Background Orchids are one of the most diverse angiosperm families, but few genomic resources are available for these non-model plants. In addition to its ecological significance, Phalaenopsis is economically important to the floriculture industry worldwide. We aimed to use massively parallel 454 pyrosequencing for a global characterization of the Phalaenopsis transcriptome. Results To maximize sequence diversity, we pooled RNA from 10 samples of different tissues, various developmental stages, and biotic- or abiotic-stressed plants. We obtained 206,960 expressed sequence tags (ESTs) with an average read length of 228 bp. These reads were assembled into 8,233 contigs and 34,630 singletons. The unigenes were searched against the NCBI non-redundant (NR) protein database. Based on sequence similarity with known proteins, these analyses identified 22,234 different genes (E-value cutoff, e-7). Assembled sequences were annotated with Gene Ontology, Gene Family and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Among these annotations, over 780 unigenes encoding putative transcription factors were identified. Conclusion Pyrosequencing was effective in identifying a large set of unigenes from Phalaenopsis. The informative EST dataset we developed constitutes a much-needed resource for the discovery of genes involved in various biological processes in Phalaenopsis and other orchid species. These transcribed sequences will narrow the gap between studies of model organisms with many genomic resources and species that are important for ecological and evolutionary studies. PMID:21749684

  11. Molecular epidemiology of HIV: tracking AIDS pandemic.

    PubMed

    Takebe, Yutaka; Kusagawa, Shigeru; Motomura, Kazushi

    2004-04-01

    The human immunodeficiency virus (HIV) and acquired immunodeficiency syndrome (AIDS) epidemic is a global threat to maternal and child health, especially in developing countries. It is estimated that 800 000 children are infected and 580 000 children die of AIDS-related illnesses every year. Molecular epidemiology has been a useful tool for analyzing the origin of HIV and tracking the course of its global spread. This article provides an overview of recent advances in the molecular epidemiology of HIV across the world and discusses their biological implications. The phylogeny and recombinant structure of HIV strains are analyzed on the basis of near full-length or partial nucleotide sequence information. Using genotype classification of HIV as a molecular marker, the origin and genesis of the HIV epidemic are investigated. HIV-1 group M, the major HIV group responsible for the current AIDS pandemic, began its expansion in the human population approximately 70 years ago and diversified rapidly over time; it now comprises a number of different subtypes and circulating recombinant forms (CRFs). Notably, recent studies revealed that new recombinant strains are arising continually and have become a powerful force in the spread of HIV-1 across the globe. The global dissemination of HIV is a dramatic and deadly example of recent genome emergence and expansion. Molecular epidemiological investigation is expected to provide information critical for prevention and future vaccine strategies.

  12. Music Perception in Dementia.

    PubMed

    Golden, Hannah L; Clark, Camilla N; Nicholas, Jennifer M; Cohen, Miriam H; Slattery, Catherine F; Paterson, Ross W; Foulkes, Alexander J M; Schott, Jonathan M; Mummery, Catherine J; Crutch, Sebastian J; Warren, Jason D

    2017-01-01

    Despite much recent interest in music and dementia, music perception has not been widely studied across dementia syndromes using an information processing approach. Here we addressed this issue in a cohort of 30 patients representing major dementia syndromes of typical Alzheimer's disease (AD, n = 16), logopenic aphasia (LPA, an Alzheimer variant syndrome; n = 5), and progressive nonfluent aphasia (PNFA; n = 9) in relation to 19 healthy age-matched individuals. We designed a novel neuropsychological battery to assess perception of musical patterns in the dimensions of pitch and temporal information (requiring detection of notes that deviated from the established pattern based on local or global sequence features) and musical scene analysis (requiring detection of a familiar tune within polyphonic harmony). Performance on these tests was referenced to generic auditory (timbral) deviance detection and recognition of familiar tunes and adjusted for general auditory working memory performance. Relative to healthy controls, patients with AD and LPA had group-level deficits of global pitch (melody contour) processing while patients with PNFA as a group had deficits of local (interval) as well as global pitch processing. There was substantial individual variation within syndromic groups. Taking working memory performance into account, no specific deficits of musical temporal processing, timbre processing, musical scene analysis, or tune recognition were identified. The findings suggest that particular aspects of music perception such as pitch pattern analysis may open a window on the processing of information streams in major dementia syndromes. The potential selectivity of musical deficits for particular dementia syndromes and particular dimensions of processing warrants further systematic investigation.

  13. Learning Sequences of Actions in Collectives of Autonomous Agents

    NASA Technical Reports Server (NTRS)

    Tumer, Kagan; Agogino, Adrian K.; Wolpert, David H.; Clancy, Daniel (Technical Monitor)

    2001-01-01

    In this paper we focus on the problem of designing a collective of autonomous agents that individually learn sequences of actions such that the resultant sequence of joint actions achieves a predetermined global objective. We are particularly interested in instances of this problem where centralized control is either impossible or impractical. For single-agent systems in similar domains, machine learning methods (e.g., reinforcement learners) have been used successfully. However, applying such solutions directly to multi-agent systems often proves problematic, as agents may work at cross-purposes, have difficulty evaluating their contribution to achieving the global objective, or both. Accordingly, the crucial design step in multi-agent systems centers on determining the private objectives of each agent so that, as the agents strive for those objectives, the system reaches a good global solution. In this work we consider a version of this problem involving multiple autonomous agents in a grid world. We use concepts from collective intelligence to design goals for the agents that are 'aligned' with the global goal and are 'learnable' in that agents can readily see how their behavior affects their utility. We show that reinforcement learning agents using those goals outperform both 'natural' extensions of single-agent algorithms and global reinforcement learning solutions based on 'team games'.

  14. Comparative genomics of citric-acid producing Aspergillus niger ATCC 1015 versus enzyme-producing CBS 513.88

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Andersen, Mikael R.; Salazar, Margarita; Schaap, Peter

    2011-06-01

    The filamentous fungus Aspergillus niger exhibits great diversity in its phenotype. It is found globally, both as marine and terrestrial strains, produces both organic acids and hydrolytic enzymes in high amounts, and some isolates exhibit pathogenicity. Although the genome of an industrial enzyme-producing A. niger strain (CBS 513.88) has already been sequenced, the versatility and diversity of this species compel additional exploration. We therefore undertook whole genome sequencing of the acidogenic A. niger wild type strain (ATCC 1015) and produced a genome sequence of very high quality. Only 15 gaps are present in the sequence, and half the telomeric regions have been elucidated. Moreover, sequence information from ATCC 1015 was used to improve the genome sequence of CBS 513.88. Chromosome-level comparisons uncovered several genome rearrangements, deletions, a clear case of strain-specific horizontal gene transfer, and 0.8 megabase of novel sequence. The density of single nucleotide polymorphisms between the two strains was exceptionally high (average: 7.8, maximum: 160 SNPs/kb). High variation within the species was confirmed with exo-metabolite profiling and phylogenetics. Detailed lists of alleles were generated, and genotypic differences were observed to accumulate in metabolic pathways essential to acid production and protein synthesis. A transcriptome analysis revealed up-regulation of the electron transport chain, specifically the alternative oxidative pathway, in ATCC 1015, while CBS 513.88 showed significant up-regulation of genes associated with the biosynthesis of amino acids that are abundant in glucoamylase A, tRNA-synthases and protein transporters.

  15. Genetic diversity of porcine circovirus type 2 (PCV2) in Thailand during 2009-2015.

    PubMed

    Thangthamniyom, Nattarat; Sangthong, Pradit; Poolperm, Pariwat; Thanantong, Narut; Boonsoongnern, Alongkot; Hansoongnern, Payuda; Semkum, Ploypailin; Petcharat, Nantawan; Lekcharoensuk, Porntippa

    2017-09-01

    Porcine circovirus type 2 (PCV2), the essential cause of porcine circovirus associated disease (PCVAD), has evolved rapidly and has been reported worldwide. However, no genetic information on PCV2 in Thailand has been available since 2011. Herein, we studied the occurrence and genetic diversity of PCV2 in Thailand and its relationship to global PCV2 based on ORF2 sequences. The results showed that 306 samples (44.09%) from 56 farms (80%) were PCV2 positive by PCR. Phylogenetic trees constructed by both neighbor-joining and Bayesian inference yielded similar topologies for the ORF2 sequences. Thai PCV2 strains comprise four clusters: PCV2a (5.5%), PCV2b (29.41%), intermediate clade 1 (IM1) PCV2b (11.03%) and PCV2d (54.41%). The genetic shift of PCV2 in Thailand has mirrored the global situation; the shift from PCV2b to PCV2d was clearly observed during 2013-2014. Viruses genetically similar to the first PCV2 reported in 2004 are still circulating in Thailand. The first Thai PCV2b and PCV2d strains were closely related to those from neighboring countries. Haplotype network analysis revealed the relationship between PCV2 in Thailand and in other countries. These results indicate that the genetic diversity of PCV2 in Thailand is caused by genetic drift of local strains and the intermittent introduction of new strains or genotypes from other countries. The genetic evolution of PCV2 in Thailand is similar to that occurring globally. Copyright © 2017 Elsevier B.V. All rights reserved.

  16. Spatial effects in real networks: Measures, null models, and applications

    NASA Astrophysics Data System (ADS)

    Ruzzenenti, Franco; Picciolo, Francesco; Basosi, Riccardo; Garlaschelli, Diego

    2012-12-01

    Spatially embedded networks are shaped by a combination of purely topological (space-independent) and space-dependent formation rules. While it is quite easy to artificially generate networks where the relative importance of these two factors can be varied arbitrarily, it is much more difficult to disentangle these two architectural effects in real networks. Here we propose a solution to this problem, by introducing global and local measures of spatial effects that, through a comparison with adequate null models, effectively filter out the spurious contribution of nonspatial constraints. Our filtering allows us to consistently compare different embedded networks or different historical snapshots of the same network. As a challenging application we analyze the World Trade Web, whose topology is known to depend on geographic distances but is also strongly determined by nonspatial constraints (degree sequence or gross domestic product). Remarkably, we are able to detect weak but significant spatial effects both locally and globally in the network, showing that our method succeeds in retrieving spatial information even when nonspatial factors dominate. We finally relate our results to the economic literature on gravity models and trade globalization.

  17. CBS Genome Atlas Database: a dynamic storage for bioinformatic results and sequence data.

    PubMed

    Hallin, Peter F; Ussery, David W

    2004-12-12

    Currently, new bacterial genomes are being published on a monthly basis. With the growing amount of genome sequence data, there is a demand for a flexible and easy-to-maintain structure for storing sequence data and results from bioinformatic analysis. More than 150 sequenced bacterial genomes are now available, and comparisons of properties for taxonomically similar organisms are not readily available to many biologists. In addition to the most basic information, such as AT content, chromosome length, tRNA count and rRNA count, a large number of more complex calculations are needed to perform detailed comparative genomics. DNA structural calculations such as curvature and stacking energy, and DNA composition measures such as base skews, oligo skews and repeats at the local and global level, are just a few of the analyses presented on the CBS Genome Atlas Web page. Complex analyses, changing methods and the frequent addition of new models require a dynamic database layout. Using basic tools such as the GNU Make system, csh, Perl and MySQL, we have created a flexible database environment for storing and maintaining such results for a collection of complete microbial genomes. Currently, these results amount to more than 220 pieces of information. The backbone of this solution is a program package written in Perl, which enables administrators to synchronize and update the database content. The MySQL database has been connected to the CBS web server via PHP4 to present dynamic web content to users outside the center. This solution is tightly fitted to the existing server infrastructure, and the approach proposed here may serve as a template for other research groups facing similar database issues. A web-based user interface dynamically linked to the Genome Atlas Database can be accessed via www.cbs.dtu.dk/services/GenomeAtlas/.
This paper has a supplemental information page which links to the examples presented: www.cbs.dtu.dk/services/GenomeAtlas/suppl/bioinfdatabase.

  18. Graph pyramids for protein function prediction

    PubMed Central

    2015-01-01

    Background Uncovering the hidden organizational characteristics and regularities among biological sequences is the key to a detailed understanding of an underlying biological phenomenon. Pattern recognition from nucleic acid sequences is therefore an important task for protein function prediction. As proteins from the same family exhibit similar characteristics, homology-based approaches predict protein functions via protein classification. However, conventional classification approaches mostly rely on global features, considering only strong protein similarity matches, which leads to a significant loss of prediction accuracy. Methods Here we construct the Protein-Protein Similarity (PPS) network, which captures the subtle properties of protein families. The proposed method considers local as well as global features, by examining the interactions among 'weakly interacting proteins' in the PPS network and by using hierarchical graph analysis via the graph pyramid. Different underlying properties of the protein families are uncovered by operating the proposed graph-based features at various pyramid levels. Results Experimental results on benchmark data sets show that the proposed hierarchical voting algorithm using the graph pyramid improves computational efficiency as well as protein classification accuracy. Quantitatively, among 14,086 test sequences, the proposed method misclassified on average only 21.1 sequences, whereas the baseline global feature matching method based on BLAST scores misclassified 362.9 sequences. With each correctly classified test sequence, the fast incremental learning ability of the proposed method further enhances the training model. As a result, it achieved more than 96% protein classification accuracy using only 20% of the training data per class. PMID:26044522

  19. Graph pyramids for protein function prediction.

    PubMed

    Sandhan, Tushar; Yoo, Youngjun; Choi, Jin; Kim, Sun

    2015-01-01

    Uncovering the hidden organizational characteristics and regularities among biological sequences is the key to a detailed understanding of an underlying biological phenomenon. Pattern recognition from nucleic acid sequences is therefore an important task for protein function prediction. As proteins from the same family exhibit similar characteristics, homology-based approaches predict protein functions via protein classification. However, conventional classification approaches mostly rely on global features, considering only strong protein similarity matches, which leads to a significant loss of prediction accuracy. Here we construct the Protein-Protein Similarity (PPS) network, which captures the subtle properties of protein families. The proposed method considers local as well as global features, by examining the interactions among 'weakly interacting proteins' in the PPS network and by using hierarchical graph analysis via the graph pyramid. Different underlying properties of the protein families are uncovered by operating the proposed graph-based features at various pyramid levels. Experimental results on benchmark data sets show that the proposed hierarchical voting algorithm using the graph pyramid improves computational efficiency as well as protein classification accuracy. Quantitatively, among 14,086 test sequences, the proposed method misclassified on average only 21.1 sequences, whereas the baseline global feature matching method based on BLAST scores misclassified 362.9 sequences. With each correctly classified test sequence, the fast incremental learning ability of the proposed method further enhances the training model. As a result, it achieved more than 96% protein classification accuracy using only 20% of the training data per class.

  20. The Applied Development of a Tiered Multilocus Sequence Typing (MLST) Scheme for Dichelobacter nodosus.

    PubMed

    Blanchard, Adam M; Jolley, Keith A; Maiden, Martin C J; Coffey, Tracey J; Maboni, Grazieli; Staley, Ceri E; Bollard, Nicola J; Warry, Andrew; Emes, Richard D; Davies, Peers L; Tötemeyer, Sabine

    2018-01-01

    Dichelobacter nodosus (D. nodosus) is the causative pathogen of ovine footrot, a disease that has a significant welfare and financial impact on the global sheep industry. Previous studies into the phylogenetics of D. nodosus have focused on Australia and Scandinavia, meaning that the current diversity of the United Kingdom (U.K.) population, and its relationship to the global population, are poorly understood. Numerous epidemiological methods are available for bacterial typing; however, few account for whole genome diversity or provide the opportunity for future application of new computational techniques. Multilocus sequence typing (MLST) measures nucleotide variation within several loci that accumulate variation slowly, enabling the designation of allele numbers that together determine a sequence type. The use of whole genome sequence data enables the application of MLST, and also of core and whole genome MLST, for higher levels of strain discrimination with a negligible increase in experimental cost. An MLST database was developed alongside a seven-locus scheme using publicly available whole genome data from the Sequence Read Archive. Sequence type designation and strain discrimination were compared to previously published data to ensure reproducibility. Multiple D. nodosus isolates from U.K. farms were directly compared to populations from other countries. The U.K. isolates define new clades within the global population of D. nodosus and predominantly consist of serogroups A, B and H; however, serogroups C, D, E, and I were also found. The scheme is publicly available at https://pubmlst.org/dnodosus/.
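    The allele-numbering and sequence-type logic of MLST described in the abstract above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the PubMLST implementation: the class name `MLSTScheme` and the placeholder locus names are hypothetical, and the sketch assumes each locus sequence has already been extracted and trimmed to its reference boundaries. Every distinct sequence at a locus receives the next allele number, and every distinct combination of allele numbers (the allelic profile) receives the next sequence type.

```python
from typing import Dict, Iterable, Tuple

# Placeholder locus names for a hypothetical seven-locus scheme; these are NOT
# the loci of the published D. nodosus scheme.
DEFAULT_LOCI = ("locus1", "locus2", "locus3", "locus4", "locus5", "locus6", "locus7")


class MLSTScheme:
    """Assign allele numbers per locus and a sequence type (ST) per allelic profile."""

    def __init__(self, loci: Iterable[str] = DEFAULT_LOCI) -> None:
        self.loci: Tuple[str, ...] = tuple(loci)
        # locus -> {allele sequence -> allele number}
        self.alleles: Dict[str, Dict[str, int]] = {locus: {} for locus in self.loci}
        # allelic profile (one allele number per locus) -> ST number
        self.profiles: Dict[Tuple[int, ...], int] = {}

    def allele_number(self, locus: str, seq: str) -> int:
        table = self.alleles[locus]
        if seq not in table:
            # Any nucleotide difference, even a single SNP, defines a new allele
            table[seq] = len(table) + 1
        return table[seq]

    def sequence_type(self, locus_seqs: Dict[str, str]) -> Tuple[int, Tuple[int, ...]]:
        """Return (ST, allelic profile) for one isolate's locus sequences."""
        profile = tuple(self.allele_number(locus, locus_seqs[locus]) for locus in self.loci)
        if profile not in self.profiles:
            self.profiles[profile] = len(self.profiles) + 1
        return self.profiles[profile], profile


# Toy usage with a two-locus scheme and made-up sequences
scheme = MLSTScheme(loci=("locA", "locB"))
print(scheme.sequence_type({"locA": "ACGT", "locB": "TTGA"}))  # (1, (1, 1))
print(scheme.sequence_type({"locA": "ACGT", "locB": "TTGC"}))  # (2, (1, 2))
```

    Because a single-nucleotide difference at any locus produces a new allele number, two isolates share a sequence type only when their sequences are identical at every locus in the scheme.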

  1. Genomic analysis of expressed sequence tags in American black bear Ursus americanus

    PubMed Central

    2010-01-01

    Background Species of the bear family (Ursidae) are important organisms for research in molecular evolution, comparative physiology and conservation biology, but relatively little genetic sequence information is available for this group. Here we report the development and analyses of the first large-scale Expressed Sequence Tag (EST) resource for the American black bear (Ursus americanus). Results Comprehensive analyses of molecular functions, alternative splicing, and tissue-specific expression of 38,757 black bear EST sequences were conducted using the dog genome as a reference. We identified 18 genes, involved in functions such as lipid catabolism, cell cycle, and vesicle-mediated transport, that show rapid evolution in the bear lineage. Three genes, Phospholamban (PLN), cysteine glycine-rich protein 3 (CSRP3) and Troponin I type 3 (TNNI3), are related to heart contraction, and defects in these genes in humans lead to heart disease. Two genes, biphenyl hydrolase-like (BPHL) and CSRP3, contain positively selected sites in bear. Global analysis of evolution rates of hibernation-related genes in bear showed that they are largely conserved and slowly evolving genes, rather than novel and fast-evolving genes. Conclusion We provide a genomic resource for an important mammalian organism, and our study sheds new light on the possible functions and evolution of bear genes. PMID:20338065

  2. Complete chloroplast genome sequence of a major allogamous forage species, perennial ryegrass (Lolium perenne L.).

    PubMed

    Diekmann, Kerstin; Hodkinson, Trevor R; Wolfe, Kenneth H; van den Bekerom, Rob; Dix, Philip J; Barth, Susanne

    2009-06-01

    Lolium perenne L. (perennial ryegrass) is globally one of the most important forage and grassland crops. We sequenced the chloroplast (cp) genome of Lolium perenne cultivar Cashel. The L. perenne cp genome is 135 282 bp with a typical quadripartite structure. It contains genes for 76 unique proteins, 30 tRNAs and four rRNAs. As in other grasses, the genes accD, ycf1 and ycf2 are absent. The genome is of average size within its subfamily Pooideae and of medium size within the Poaceae. Genome size differences are mainly due to length variations in non-coding regions. However, considerable length differences of 1-27 codons in comparison of L. perenne to other Poaceae and 1-68 codons among all Poaceae were also detected. Within the cp genome of this outcrossing cultivar, 10 insertion/deletion polymorphisms and 40 single nucleotide polymorphisms were detected. Two of the polymorphisms involve tiny inversions within hairpin structures. By comparing the genome sequence with RT-PCR products of transcripts for 33 genes, 31 mRNA editing sites were identified, five of them unique to Lolium. The cp genome sequence of L. perenne is available under Accession number AM777385 at the European Molecular Biology Laboratory, National Center for Biotechnology Information and DNA DataBank of Japan.

  3. Genomic analysis of expressed sequence tags in American black bear Ursus americanus.

    PubMed

    Zhao, Sen; Shao, Chunxuan; Goropashnaya, Anna V; Stewart, Nathan C; Xu, Yichi; Tøien, Øivind; Barnes, Brian M; Fedorov, Vadim B; Yan, Jun

    2010-03-26

    Species of the bear family (Ursidae) are important organisms for research in molecular evolution, comparative physiology and conservation biology, but relatively little genetic sequence information is available for this group. Here we report the development and analyses of the first large-scale Expressed Sequence Tag (EST) resource for the American black bear (Ursus americanus). Comprehensive analyses of molecular functions, alternative splicing, and tissue-specific expression of 38,757 black bear EST sequences were conducted using the dog genome as a reference. We identified 18 genes, involved in functions such as lipid catabolism, cell cycle, and vesicle-mediated transport, that show rapid evolution in the bear lineage. Three genes, Phospholamban (PLN), cysteine glycine-rich protein 3 (CSRP3) and Troponin I type 3 (TNNI3), are related to heart contraction, and defects in these genes in humans lead to heart disease. Two genes, biphenyl hydrolase-like (BPHL) and CSRP3, contain positively selected sites in bear. Global analysis of evolution rates of hibernation-related genes in bear showed that they are largely conserved and slowly evolving genes, rather than novel and fast-evolving genes. We provide a genomic resource for an important mammalian organism, and our study sheds new light on the possible functions and evolution of bear genes.

  4. CEQer: A Graphical Tool for Copy Number and Allelic Imbalance Detection from Whole-Exome Sequencing Data

    PubMed Central

    Piazza, Rocco; Magistroni, Vera; Pirola, Alessandra; Redaelli, Sara; Spinelli, Roberta; Redaelli, Serena; Galbiati, Marta; Valletta, Simona; Giudici, Giovanni; Cazzaniga, Giovanni; Gambacorti-Passerini, Carlo

    2013-01-01

    Copy number alterations (CNAs) are common events in leukaemias and solid tumors. Comparative Genome Hybridization (CGH) is currently the gold standard technique for analyzing CNAs; however, CGH analysis requires dedicated instruments and can perform only low-resolution Loss of Heterozygosity (LOH) analyses. Here we present CEQer (Comparative Exome Quantification analyzer), a new graphical, event-driven tool for coupled CNA/allelic-imbalance (AI) analysis of exome sequencing data. Using case-control matched exome data, CEQer performs a comparative digital exonic quantification to generate CNA data and couples this information with exome-wide LOH and allelic imbalance detection. These data are used to build mixed statistical/heuristic models that allow the identification of CNA/AI events. To test our tool, we initially used in silico generated data; we then performed whole-exome sequencing on 20 leukemic specimens and corresponding matched controls and analyzed the results using CEQer. Taken together, these analyses showed that the combined use of comparative digital exon quantification and LOH/AI detection generates highly accurate CNA data. We therefore propose CEQer as an efficient, robust and user-friendly graphical tool for the identification of CNA/AI events in whole-exome sequencing data. PMID:24124457
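    The idea of comparative digital exonic quantification can be illustrated, in its simplest generic form, as a library-size-normalized log2 ratio of case to control read depth per exon. The sketch below is an assumption-laden simplification, not CEQer's actual statistical/heuristic model: the function names `exon_log2_ratios` and `call_cna`, the pseudocount and the symmetric ±0.4 calling threshold are all hypothetical, and LOH and allelic imbalance are ignored entirely.

```python
import math
from typing import Dict


def exon_log2_ratios(
    case_counts: Dict[str, int],
    control_counts: Dict[str, int],
    pseudocount: float = 0.5,  # hypothetical; avoids log(0) for zero-coverage exons
) -> Dict[str, float]:
    """Per-exon log2(case/control) read depth, each normalized by its library size."""
    if case_counts.keys() != control_counts.keys():
        raise ValueError("case and control must cover the same exons")
    case_total = sum(case_counts.values())
    control_total = sum(control_counts.values())
    ratios = {}
    for exon in case_counts:
        case_frac = (case_counts[exon] + pseudocount) / case_total
        control_frac = (control_counts[exon] + pseudocount) / control_total
        ratios[exon] = math.log2(case_frac / control_frac)
    return ratios


def call_cna(ratios: Dict[str, float], threshold: float = 0.4) -> Dict[str, str]:
    """Label each exon as gain, loss or neutral using a symmetric, hypothetical cutoff."""
    calls = {}
    for exon, r in ratios.items():
        if r >= threshold:
            calls[exon] = "gain"
        elif r <= -threshold:
            calls[exon] = "loss"
        else:
            calls[exon] = "neutral"
    return calls


# Toy counts for three exons in a matched case/control pair
case = {"e1": 100, "e2": 200, "e3": 100}
control = {"e1": 100, "e2": 100, "e3": 200}
print(call_cna(exon_log2_ratios(case, control)))
# {'e1': 'neutral', 'e2': 'gain', 'e3': 'loss'}
```

    In a diploid genome, a single-copy loss gives an expected ratio of log2(1/2) = -1 and a single-copy gain log2(3/2) ≈ 0.58; admixture with normal cells pulls these values toward zero, which is one reason a practical tool combines depth ratios with allelic information rather than relying on depth alone.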

  5. Diverse Array of New Viral Sequences Identified in Worldwide Populations of the Asian Citrus Psyllid (Diaphorina citri) Using Viral Metagenomics

    PubMed Central

    Nouri, Shahideh; Salem, Nidá; Nigg, Jared C.

    2015-01-01

    ABSTRACT The Asian citrus psyllid, Diaphorina citri, is the natural vector of the causal agent of Huanglongbing (HLB), or citrus greening disease. Together, HLB and D. citri represent a major threat to world citrus production. As there is no cure for HLB, insect vector management is considered one strategy to help control the disease, and D. citri viruses might be useful for this purpose. In this study, we used a metagenomic approach to analyze viral sequences associated with the global population of D. citri. By sequencing small RNAs and the transcriptome, coupled with bioinformatics analysis, we showed that the virus-like sequences of D. citri are diverse. We identified novel viral sequences belonging to the picornavirus superfamily, the Reoviridae, Parvoviridae, and Bunyaviridae families, and an unclassified positive-sense single-stranded RNA virus. Moreover, a Wolbachia prophage-related sequence was identified. This is the first comprehensive survey to assess the viral community from worldwide populations of an agricultural insect pest. Our results provide valuable information on new putative viruses, some of which may have the potential to be used as biocontrol agents. IMPORTANCE Insects are the most species-rich group of animals and are hosts to, and vectors of, a great variety of known and unknown viruses. Some of these most likely have the potential to be important fundamental and/or practical resources. In this study, we used high-throughput next-generation sequencing (NGS) technology and bioinformatics analysis to identify putative viruses associated with Diaphorina citri, the Asian citrus psyllid. D. citri is the vector of the bacterium causing Huanglongbing (HLB), currently the most serious threat to citrus worldwide. Here, we report several novel viral sequences associated with D. citri. PMID:26676774

  6. Diverse Array of New Viral Sequences Identified in Worldwide Populations of the Asian Citrus Psyllid (Diaphorina citri) Using Viral Metagenomics.

    PubMed

    Nouri, Shahideh; Salem, Nidá; Nigg, Jared C; Falk, Bryce W

    2015-12-16

    The Asian citrus psyllid, Diaphorina citri, is the natural vector of the causal agent of Huanglongbing (HLB), or citrus greening disease. Together, HLB and D. citri represent a major threat to world citrus production. As there is no cure for HLB, insect vector management is considered one strategy to help control the disease, and D. citri viruses might be useful for this purpose. In this study, we used a metagenomic approach to analyze viral sequences associated with the global population of D. citri. By sequencing small RNAs and the transcriptome, coupled with bioinformatics analysis, we showed that the virus-like sequences of D. citri are diverse. We identified novel viral sequences belonging to the picornavirus superfamily, the Reoviridae, Parvoviridae, and Bunyaviridae families, and an unclassified positive-sense single-stranded RNA virus. Moreover, a Wolbachia prophage-related sequence was identified. This is the first comprehensive survey to assess the viral community from worldwide populations of an agricultural insect pest. Our results provide valuable information on new putative viruses, some of which may have the potential to be used as biocontrol agents. Insects are the most species-rich group of animals and are hosts to, and vectors of, a great variety of known and unknown viruses. Some of these most likely have the potential to be important fundamental and/or practical resources. In this study, we used high-throughput next-generation sequencing (NGS) technology and bioinformatics analysis to identify putative viruses associated with Diaphorina citri, the Asian citrus psyllid. D. citri is the vector of the bacterium causing Huanglongbing (HLB), currently the most serious threat to citrus worldwide. Here, we report several novel viral sequences associated with D. citri. Copyright © 2016, American Society for Microbiology. All Rights Reserved.

  7. TIA: algorithms for development of identity-linked SNP islands for analysis by massively parallel DNA sequencing.

    PubMed

    Farris, M Heath; Scott, Andrew R; Texter, Pamela A; Bartlett, Marta; Coleman, Patricia; Masters, David

    2018-04-11

    Single nucleotide polymorphisms (SNPs) located within the human genome have been shown to have utility as markers of identity in the differentiation of DNA from individual contributors. Massively parallel DNA sequencing (MPS) technologies and human genome SNP databases allow for the design of suites of identity-linked target regions, amenable to sequencing in a multiplexed and massively parallel manner. Therefore, tools are needed for leveraging the genotypic information found within SNP databases for the discovery of genomic targets that can be evaluated on MPS platforms. The SNP island target identification algorithm (TIA) was developed as a user-tunable system to leverage SNP information within databases. Using data within the 1000 Genomes Project SNP database, human genome regions were identified that contain globally ubiquitous identity-linked SNPs and that were amenable to targeted resequencing on MPS platforms. Algorithmic filters were used to exclude target regions that did not conform to user-tunable SNP island target characteristics. To validate the accuracy of TIA for discovering these identity-linked SNP islands within the human genome, SNP island target regions were amplified from 70 contributor genomic DNA samples using the polymerase chain reaction. Multiplexed amplicons were sequenced using the Illumina MiSeq platform, and the resulting sequences were analyzed for SNP variations. A total of 166 putative identity-linked SNPs were targeted in the identified genomic regions. Of the 309 SNPs that provided discerning power across individual SNP profiles, 74 previously undefined SNPs were identified during evaluation of targets from individual genomes. Overall, DNA samples of 70 individuals were uniquely identified using a subset of the suite of identity-linked SNP islands.
TIA offers a tunable genome search tool for the discovery of targeted genomic regions that are scalable in the population frequency and numbers of SNPs contained within the SNP island regions. It also allows the definition of sequence length and sequence variability of the target region as well as the less variable flanking regions for tailoring to MPS platforms. As shown in this study, TIA can be used to discover identity-linked SNP islands within the human genome, useful for differentiating individuals by targeted resequencing on MPS technologies.
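    The user-tunable filtering idea behind TIA can be sketched as a scan over known SNP positions. This is a hypothetical simplification, not the published TIA algorithm: the `SNP` record, the function `find_snp_islands`, and its parameters and default values (island length, minimum number of common SNPs, minor-allele-frequency cutoff, variant-free flank length) are illustrative assumptions. A candidate island starts at each SNP and is kept only if it contains enough globally common SNPs and its flanking regions, where primers would sit, contain no known variants.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SNP:
    pos: int    # genomic coordinate (single chromosome assumed)
    maf: float  # minor allele frequency pooled across populations (assumed input)


def find_snp_islands(
    snps: List[SNP],
    island_len: int = 150,  # hypothetical target-region length in bp
    min_snps: int = 3,      # hypothetical minimum number of common SNPs per island
    min_maf: float = 0.2,   # hypothetical "globally common" MAF cutoff
    flank_len: int = 30,    # hypothetical variant-free flank for primer placement
) -> List[Tuple[int, int, int]]:
    """Return (start, end, n_common_snps) for each window passing all filters."""
    snps = sorted(snps, key=lambda s: s.pos)
    islands = []
    for i, first in enumerate(snps):
        start = first.pos
        end = start + island_len - 1
        # SNPs inside the candidate window [start, end]
        inside = [s for s in snps[i:] if s.pos <= end]
        n_common = sum(1 for s in inside if s.maf >= min_maf)
        if n_common < min_snps:
            continue
        # Reject the window if any known variant falls in either flank
        flank_hit = any(
            start - flank_len <= s.pos < start or end < s.pos <= end + flank_len
            for s in snps
        )
        if not flank_hit:
            islands.append((start, end, n_common))
    return islands


# Toy SNP set: three common SNPs clustered near position 1000
snps = [SNP(1000, 0.30), SNP(1050, 0.40), SNP(1100, 0.25), SNP(5000, 0.30), SNP(5010, 0.05)]
print(find_snp_islands(snps))  # [(1000, 1149, 3)]
```

    Adding a rare variant at position 990 to this toy set disqualifies the window starting at 1000 (its upstream flank is no longer clean) and yields the shifted island (990, 1139, 3) instead. Raising `min_maf` or `min_snps` trades the number of candidate islands for per-island discriminating power, which mirrors the kind of tunability the abstract describes.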

  8. Integration of Temporal and Ordinal Information During Serial Interception Sequence Learning

    PubMed Central

    Gobel, Eric W.; Sanchez, Daniel J.; Reber, Paul J.

    2011-01-01

    The expression of expert motor skills typically involves learning to perform a precisely timed sequence of movements (e.g., language production, music performance, athletic skills). Research examining incidental sequence learning has previously relied on a perceptually-cued task that gives participants exposure to repeating motor sequences but does not require timing of responses for accuracy. Using a novel perceptual-motor sequence learning task, learning a precisely timed cued sequence of motor actions is shown to occur without explicit instruction. Participants learned a repeating sequence through practice and showed sequence-specific knowledge via a performance decrement when switched to an unfamiliar sequence. In a second experiment, the integration of representation of action order and timing sequence knowledge was examined. When either action order or timing sequence information was selectively disrupted, performance was reduced to levels similar to completely novel sequences. Unlike prior sequence-learning research that has found timing information to be secondary to learning action sequences, when the task demands require accurate action and timing information, an integrated representation of these types of information is acquired. These results provide the first evidence for incidental learning of fully integrated action and timing sequence information in the absence of an independent representation of action order, and suggest that this integrative mechanism may play a material role in the acquisition of complex motor skills. PMID:21417511

  9. Soil Communities of Central Park, New York City: A Biodiversity Melting Pot

    NASA Astrophysics Data System (ADS)

    Ramirez, K. S.; Leff, J. W.; Wall, D. H.; Fierer, N.

    2013-12-01

    The majority of Earth's biodiversity lives in and makes up the soil, but the majority of soil biodiversity has yet to be characterized or even quantified. This may be especially true of urban soil systems. The last decade of advances in molecular, technical and bioinformatic techniques has contributed greatly to our understanding of belowground biodiversity, from global distribution to species counts. Yet, much of this work has been done in 'natural' systems, and it is not known whether established patterns of distribution, especially in relation to soil factors, hold up in urban soils. Urban soils are intensively managed and disturbed, often by effects unique to urban settings. It remains unclear how urban pressures influence soil biodiversity, or whether there is a defined or typical 'urban soil community'. Here we describe a study to examine the total soil biodiversity (Bacteria, Archaea and Eukarya) of Central Park, New York City and test for patterns of distribution and relationships to soil characteristics. We then compare the biodiversity of Central Park to 57 global soils, spanning a number of biomes from Alaska to Antarctica. In this way we can identify similarities and differences in soil communities of Central Park to soils from 'natural' systems. To generate a broad-scale survey of total soil biodiversity, 596 soil samples were collected from across Central Park (3.41 km²). Soils varied greatly in vegetation cover and soil characteristics (pH, moisture, soil C and soil N). Using high-throughput Illumina sequencing technology we characterized the complete soil community from 16S rRNA (Bacteria and Archaea) and 18S rRNA gene sequences (Eukarya). Samples were rarefied to 40,000 sequences per sample. To compare Central Park to the 57 global soils, the complete soil community of the global soils was also characterized using Illumina sequencing technology, and these samples were likewise rarefied to 40,000 sequences per sample.
The total measured biodiversity in Central Park was high: >540,000 bacterial and archaeal species and >97,000 eukaryotic species (as determined using a 97% sequence similarity cutoff). The most dominant bacterial phyla include Proteobacteria, Acidobacteria, Bacteroidetes, Verrucomicrobia and Actinobacteria, and Archaea represent 1-8% of the sequences. Additionally, the distribution patterns of Acidobacteria, and consequently beta-diversity, were strongly related to soil pH. The most dominant eukaryotic taxa include many Protists (Rhizaria, Gregarinia), Fungi (Basidiomycota, Ascomycota), and Metazoa (Nematodes, Rotifers, Arthropods and Annelids). No single soil factor could predict eukaryotic distribution. Central Park soil diversity was strikingly similar to the diversity of the 57 global soils: Central Park and the global soils had similar alpha diversity and taxon abundances. Interestingly, there was significant overlap in a number of dominant species between Central Park and the global soils. Together these results represent the most comprehensive analysis of soil biodiversity conducted to date. Our data suggest that even well-studied locations like Central Park harbor very high levels of unexplored biodiversity, and that Central Park biodiversity is comparable to soil biodiversity found globally.
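    Rarefying to a fixed depth (40,000 sequences per sample in this study) means randomly subsampling each sample's reads without replacement so that all samples are compared at equal sequencing effort. A minimal sketch, assuming per-sample read counts are held as a taxon-to-count dictionary; the function name and `seed` parameter are illustrative, and this is not the authors' pipeline. It relies on the `counts=` argument of `random.sample`, available from Python 3.9.

```python
import random
from collections import Counter
from typing import Dict, Optional


def rarefy(counts: Dict[str, int], depth: int,
           seed: Optional[int] = None) -> Optional[Dict[str, int]]:
    """Subsample a taxon -> read-count table to exactly `depth` reads, without replacement.

    Returns None when the sample has fewer than `depth` reads; such samples
    are usually dropped rather than compared at a lower depth.
    """
    if sum(counts.values()) < depth:
        return None
    rng = random.Random(seed)
    # Each taxon appears in the population as many times as it has reads.
    drawn = rng.sample(list(counts), depth, counts=list(counts.values()))
    return dict(Counter(drawn))
```

    Because the draw is random, two rarefactions of the same sample differ slightly; fixing `seed` makes a run reproducible.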

  10. Ecological Consistency of SSU rRNA-Based Operational Taxonomic Units at a Global Scale

    PubMed Central

    Schmidt, Thomas S. B.; Matias Rodrigues, João F.; von Mering, Christian

    2014-01-01

    Operational Taxonomic Units (OTUs), usually defined as clusters of similar 16S/18S rRNA sequences, are the most widely used basic diversity units in large-scale characterizations of microbial communities. However, it remains unclear how well the various proposed OTU clustering algorithms approximate ‘true’ microbial taxa. Here, we explore the ecological consistency of OTUs – based on the assumption that, like true microbial taxa, they should show measurable habitat preferences (niche conservatism). In a global and comprehensive survey of available microbial sequence data, we systematically parse sequence annotations to obtain broad ecological descriptions of sampling sites. Based on these, we observe that sequence-based microbial OTUs generally show high levels of ecological consistency. However, different OTU clustering methods result in marked differences in the strength of this signal. Assuming that ecological consistency can serve as an objective external benchmark for cluster quality, we conclude that hierarchical complete linkage clustering, which provided the most ecologically consistent partitions, should be the default choice for OTU clustering. To our knowledge, this is the first approach to assess cluster quality using an external, biologically meaningful parameter as a benchmark, on a global scale. PMID:24763141
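    Complete linkage, the method this study recommends, merges two clusters only when every pair of members lies within the cutoff, so no OTU can contain two sequences more divergent than the threshold. A minimal sketch of agglomerative complete-linkage clustering at a 97% identity cutoff (distance 0.03), assuming the sequences are already aligned to equal length and using a simple mismatch (p-)distance; real OTU pipelines use proper alignments and far faster algorithms.

```python
from itertools import combinations
from typing import List


def p_distance(a: str, b: str) -> float:
    """Fraction of mismatched positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def complete_linkage_otus(seqs: List[str], cutoff: float = 0.03) -> List[List[int]]:
    """Agglomerative clustering with complete linkage; returns lists of sequence indices."""
    n = len(seqs)
    d = [[p_distance(seqs[i], seqs[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while True:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Complete linkage: the cluster distance is the LARGEST member-to-member distance.
            link = max(d[i][j] for i in clusters[a] for j in clusters[b])
            if link <= cutoff and (best is None or link < best[0]):
                best = (link, a, b)
        if best is None:
            return clusters  # no pair of clusters can merge without exceeding the cutoff
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
```

    With `s0 = "A"*100`, `s1 = "A"*98 + "CC"` and `s2 = "CC" + "A"*98`, both `s1` and `s2` are within 0.02 of `s0` but 0.04 from each other, so complete linkage returns `[[0, 1], [2]]`, whereas single linkage would place all three in one OTU.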

  11. Development of a single nucleotide polymorphism barcode to genotype Plasmodium vivax infections.

    PubMed

    Baniecki, Mary Lynn; Faust, Aubrey L; Schaffner, Stephen F; Park, Daniel J; Galinsky, Kevin; Daniels, Rachel F; Hamilton, Elizabeth; Ferreira, Marcelo U; Karunaweera, Nadira D; Serre, David; Zimmerman, Peter A; Sá, Juliana M; Wellems, Thomas E; Musset, Lise; Legrand, Eric; Melnikov, Alexandre; Neafsey, Daniel E; Volkman, Sarah K; Wirth, Dyann F; Sabeti, Pardis C

    2015-03-01

    Plasmodium vivax, one of the five species of Plasmodium parasites that cause human malaria, is responsible for 25-40% of malaria cases worldwide. Malaria global elimination efforts will benefit from accurate and effective genotyping tools that will provide insight into the population genetics and diversity of this parasite. The recent sequencing of P. vivax isolates from South America, Africa, and Asia presents a new opportunity by uncovering thousands of novel single nucleotide polymorphisms (SNPs). Genotyping a selection of these SNPs provides a robust, low-cost method of identifying parasite infections through their unique genetic signature or barcode. Based on our experience in generating a SNP barcode for P. falciparum using High Resolution Melting (HRM), we have developed a similar tool for P. vivax. We selected globally polymorphic SNPs from available P. vivax genome sequence data that were located in putatively selectively neutral sites (i.e., intergenic, intronic, or 4-fold degenerate coding). From these candidate SNPs we defined a barcode consisting of 42 SNPs. We analyzed the performance of the 42-SNP barcode on 87 P. vivax clinical samples from parasite populations in South America (Brazil, French Guiana), Africa (Ethiopia) and Asia (Sri Lanka). We found that the P. vivax barcode is robust, as it requires only a small quantity of DNA (limit of detection 0.3 ng/μl) to yield reproducible genotype calls, and detects polymorphic genotypes with high sensitivity. The markers are informative across all clinical samples evaluated (average minor allele frequency > 0.1). Population genetic and statistical analyses show the barcode captures high degrees of population diversity and differentiates geographically distinct populations. Our 42-SNP barcode provides a robust, informative, and standardized genetic marker set that accurately identifies a genomic signature for P. vivax infections.
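    The abstract summarizes barcode informativeness by the minor allele frequency (MAF) of each SNP and uses the barcode to tell infections apart. A minimal sketch of both calculations, assuming each barcode is stored as a string with one allele call per SNP position (42 characters for this barcode) and a hypothetical `"N"` code for failed calls; it assumes biallelic SNPs and is not the authors' HRM analysis software.

```python
from collections import Counter
from typing import List, Optional

MISSING = "N"  # hypothetical code for a failed genotype call


def minor_allele_frequency(barcodes: List[str], pos: int) -> Optional[float]:
    """Frequency of the non-major allele at one SNP position (equals MAF for biallelic SNPs)."""
    calls = [b[pos] for b in barcodes if b[pos] != MISSING]
    if not calls:
        return None
    major = max(Counter(calls).values())
    return 1 - major / len(calls)


def barcode_distance(a: str, b: str) -> int:
    """Number of positions where both barcodes have a call and the calls differ."""
    return sum(x != y for x, y in zip(a, b) if x != MISSING and y != MISSING)
```

    Averaging `minor_allele_frequency` over all positions gives a panel-level figure comparable in spirit to the "average minor allele frequency > 0.1" reported above; a `barcode_distance` of 0 between two samples is consistent with, but does not prove, the same parasite genotype.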

  12. Development of a Single Nucleotide Polymorphism Barcode to Genotype Plasmodium vivax Infections

    PubMed Central

    Baniecki, Mary Lynn; Faust, Aubrey L.; Schaffner, Stephen F.; Park, Daniel J.; Galinsky, Kevin; Daniels, Rachel F.; Hamilton, Elizabeth; Ferreira, Marcelo U.; Karunaweera, Nadira D.; Serre, David; Zimmerman, Peter A.; Sá, Juliana M.; Wellems, Thomas E.; Musset, Lise; Legrand, Eric; Melnikov, Alexandre; Neafsey, Daniel E.; Volkman, Sarah K.; Wirth, Dyann F.; Sabeti, Pardis C.

    2015-01-01

    Plasmodium vivax, one of the five species of Plasmodium parasites that cause human malaria, is responsible for 25–40% of malaria cases worldwide. Malaria global elimination efforts will benefit from accurate and effective genotyping tools that will provide insight into the population genetics and diversity of this parasite. The recent sequencing of P. vivax isolates from South America, Africa, and Asia presents a new opportunity by uncovering thousands of novel single nucleotide polymorphisms (SNPs). Genotyping a selection of these SNPs provides a robust, low-cost method of identifying parasite infections through their unique genetic signature or barcode. Based on our experience in generating a SNP barcode for P. falciparum using High Resolution Melting (HRM), we have developed a similar tool for P. vivax. We selected globally polymorphic SNPs from available P. vivax genome sequence data that were located in putatively selectively neutral sites (i.e., intergenic, intronic, or 4-fold degenerate coding). From these candidate SNPs we defined a barcode consisting of 42 SNPs. We analyzed the performance of the 42-SNP barcode on 87 P. vivax clinical samples from parasite populations in South America (Brazil, French Guiana), Africa (Ethiopia) and Asia (Sri Lanka). We found that the P. vivax barcode is robust, as it requires only a small quantity of DNA (limit of detection 0.3 ng/μl) to yield reproducible genotype calls, and detects polymorphic genotypes with high sensitivity. The markers are informative across all clinical samples evaluated (average minor allele frequency > 0.1). Population genetic and statistical analyses show the barcode captures high degrees of population diversity and differentiates geographically distinct populations. Our 42-SNP barcode provides a robust, informative, and standardized genetic marker set that accurately identifies a genomic signature for P. vivax infections. PMID:25781890

  13. Quantification of the epitope diversity of HIV-1-specific binding antibodies by peptide microarrays for global HIV-1 vaccine development

    DOE PAGES

    Stephenson, Kathryn E.; Neubauer, George H.; Reimer, Ulf; ...

    2014-11-14

    An effective vaccine against human immunodeficiency virus type 1 (HIV-1) will have to provide protection against a vast array of different HIV-1 strains. Current methods to measure HIV-1-specific binding antibodies following immunization typically focus on determining the magnitude of antibody responses, but the epitope diversity of antibody responses has remained largely unexplored. Here we describe the development of a global HIV-1 peptide microarray that contains 6564 peptides from across the HIV-1 proteome and covers the majority of HIV-1 sequences in the Los Alamos National Laboratory global HIV-1 sequence database. Using this microarray, we quantified the magnitude, breadth, and depth of IgG binding to linear HIV-1 sequences in HIV-1-infected humans and HIV-1-vaccinated humans, rhesus monkeys and guinea pigs. The microarray measured potentially important differences in antibody epitope diversity, particularly regarding the depth of epitope variants recognized at each binding site. Our data suggest that the global HIV-1 peptide microarray may be a useful tool for both preclinical and clinical HIV-1 research.
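    The abstract names three summary measures (magnitude, breadth, depth) without defining them. The sketch below uses one plausible set of definitions purely for illustration: a peptide counts as recognized when its background-subtracted signal exceeds a threshold, breadth is the number of binding sites with at least one recognized peptide, depth is the number of recognized variant peptides per site, and magnitude is the highest recognized signal. These definitions, the threshold, and the site labels are assumptions, not the authors' published scoring.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def epitope_summary(peptides: List[Tuple[str, float]],
                    threshold: float) -> Tuple[float, int, Dict[str, int]]:
    """peptides: (binding_site_id, background-subtracted signal), one entry per variant peptide.

    Returns (magnitude, breadth, depth_per_site) under the assumed definitions above.
    """
    depth: Dict[str, int] = defaultdict(int)
    magnitude = 0.0
    for site, signal in peptides:
        if signal > threshold:  # peptide counted as recognized
            depth[site] += 1
            magnitude = max(magnitude, signal)
    breadth = len(depth)  # sites with at least one recognized variant
    return magnitude, breadth, dict(depth)
```

    Under these definitions, two sera with the same breadth can still differ in depth, which is the kind of distinction the abstract highlights.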

  14. Quantification of the epitope diversity of HIV-1-specific binding antibodies by peptide microarrays for global HIV-1 vaccine development

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stephenson, Kathryn E.; Neubauer, George H.; Reimer, Ulf

    An effective vaccine against human immunodeficiency virus type 1 (HIV-1) will have to provide protection against a vast array of different HIV-1 strains. Current methods to measure HIV-1-specific binding antibodies following immunization typically focus on determining the magnitude of antibody responses, but the epitope diversity of antibody responses has remained largely unexplored. Here we describe the development of a global HIV-1 peptide microarray that contains 6564 peptides from across the HIV-1 proteome and covers the majority of HIV-1 sequences in the Los Alamos National Laboratory global HIV-1 sequence database. Using this microarray, we quantified the magnitude, breadth, and depth of IgG binding to linear HIV-1 sequences in HIV-1-infected humans and HIV-1-vaccinated humans, rhesus monkeys and guinea pigs. The microarray measured potentially important differences in antibody epitope diversity, particularly regarding the depth of epitope variants recognized at each binding site. Our data suggest that the global HIV-1 peptide microarray may be a useful tool for both preclinical and clinical HIV-1 research.

  15. The critical role of acute flaccid paralysis surveillance in the Global Polio Eradication Initiative.

    PubMed

    Tangermann, Rudolf H; Lamoureux, Christine; Tallis, Graham; Goel, Ajay

    2017-05-01

    Acute flaccid paralysis (AFP) surveillance is a key strategy used by the Global Polio Eradication Initiative (GPEI) to measure progress towards reaching the global eradication goal. Supported by a global polio laboratory network, AFP surveillance is conducted in 179 of 194 WHO member states. Active surveillance visits to priority health facilities are used to assure all children <15 years with AFP are detected, followed by stool specimen collection and testing for poliovirus in WHO-accredited polio laboratories. The quality of AFP surveillance is regularly monitored with standardized surveillance quality indicators. In highest risk countries and areas, the sensitivity of AFP surveillance is enhanced by environmental surveillance (testing of sewage samples). Genetic sequencing of detected poliovirus isolates yields programmatically important information on polio transmission pathways. AFP surveillance is one of the most valuable assets of the GPEI, with the potential to serve as a platform to build integrated disease surveillance systems. Continued support to maintain AFP surveillance systems will be essential, to reliably monitor the completion of global polio eradication, and to assure that a key resource for building surveillance capacity is transitioned post-eradication to support other health priorities. © The Author 2017. Published by Oxford University Press on behalf of Royal Society of Tropical Medicine and Hygiene. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  16. VitisExpDB: a database resource for grape functional genomics.

    PubMed

    Doddapaneni, Harshavardhan; Lin, Hong; Walker, M Andrew; Yao, Jiqiang; Civerolo, Edwin L

    2008-02-28

    The family Vitaceae consists of many different grape species that grow in a range of climatic conditions. In the past few years, several studies have generated functional genomic information on different Vitis species and cultivars, including the European grape vine, Vitis vinifera. Our goal is to develop a comprehensive web data source for Vitaceae. VitisExpDB is an online MySQL-PHP driven relational database that houses annotated EST and gene expression data for V. vinifera and non-vinifera grape species and varieties. Currently, the database stores approximately 320,000 EST sequences derived from 8 species/hybrids, their annotation (BLAST top match) details and Gene Ontology based structured vocabulary. Putative homologs for each EST in other species and varieties along with information on their percent nucleotide identities, phylogenetic relationship and common primers can be retrieved. The database also includes information on probe sequence and annotation features of the high density 60-mer gene expression chip consisting of a non-redundant set of approximately 20,000 ESTs. Finally, the database includes 14 processed global microarray expression profile sets. Data from 12 of these expression profile sets have been mapped onto metabolic pathways. A user-friendly web interface with multiple search indices and extensively hyperlinked result features that permit efficient data retrieval has been developed. Several online bioinformatics tools that interact with the database along with other sequence analysis tools have been added. In addition, users can submit their ESTs to the database. The developed database provides a genomic resource to the grape community for functional analysis of genes in the collection and for grape genome annotation and gene function identification. The VitisExpDB database is available through our website http://cropdisease.ars.usda.gov/vitis_at/main-page.htm.

  17. VitisExpDB: A database resource for grape functional genomics

    PubMed Central

    Doddapaneni, Harshavardhan; Lin, Hong; Walker, M Andrew; Yao, Jiqiang; Civerolo, Edwin L

    2008-01-01

    Background The family Vitaceae consists of many different grape species that grow in a range of climatic conditions. In the past few years, several studies have generated functional genomic information on different Vitis species and cultivars, including the European grape vine, Vitis vinifera. Our goal is to develop a comprehensive web data source for Vitaceae. Description VitisExpDB is an online MySQL-PHP driven relational database that houses annotated EST and gene expression data for V. vinifera and non-vinifera grape species and varieties. Currently, the database stores ~320,000 EST sequences derived from 8 species/hybrids, their annotation (BLAST top match) details and Gene Ontology based structured vocabulary. Putative homologs for each EST in other species and varieties along with information on their percent nucleotide identities, phylogenetic relationship and common primers can be retrieved. The database also includes information on probe sequence and annotation features of the high density 60-mer gene expression chip consisting of a non-redundant set of ~20,000 ESTs. Finally, the database includes 14 processed global microarray expression profile sets. Data from 12 of these expression profile sets have been mapped onto metabolic pathways. A user-friendly web interface with multiple search indices and extensively hyperlinked result features that permit efficient data retrieval has been developed. Several online bioinformatics tools that interact with the database along with other sequence analysis tools have been added. In addition, users can submit their ESTs to the database. Conclusion The developed database provides a genomic resource to the grape community for functional analysis of genes in the collection and for grape genome annotation and gene function identification. The VitisExpDB database is available through our website . PMID:18307813

  18. Complete Genome Sequences of Isolates of Enterococcus faecium Sequence Type 117, a Globally Disseminated Multidrug-Resistant Clone

    PubMed Central

    Tedim, Ana P.; Lanza, Val F.; Manrique, Marina; Pareja, Eduardo; Ruiz-Garbajosa, Patricia; Cantón, Rafael; Baquero, Fernando; Tobes, Raquel

    2017-01-01

    The emergence of nosocomial infections by multidrug-resistant sequence type 117 (ST117) Enterococcus faecium has been reported in several European countries. ST117 has been detected in Spanish hospitals as one of the main causes of bloodstream infections. We analyzed genome variations of ST117 strains isolated in Madrid and describe the first ST117 closed genome sequences. PMID:28360174

  19. Massive Collection of Full-Length Complementary DNA Clones and Microarray Analyses: Keys to Rice Transcriptome Analysis

    NASA Astrophysics Data System (ADS)

    Kikuchi, Shoshi

    2009-02-01

    Completion of the high-precision genome sequence analysis of rice led to the collection of about 35,000 full-length cDNA clones and the determination of their complete sequences. Mapping of these full-length cDNA sequences has given us information on (1) the number of genes expressed in the rice genome; (2) the start and end positions and exon-intron structures of rice genes; (3) alternative transcripts; (4) possible encoded proteins; (5) non-protein-coding (np) RNAs; (6) the density of gene localization on the chromosome; (7) setting the parameters of gene prediction programs; and (8) the construction of a microarray system that monitors global gene expression. Manual curation for rice gene annotation by using mapping information on full-length cDNA and EST assemblies has revealed about 32,000 expressed genes in the rice genome. Analysis of major gene families, such as those encoding membrane transport proteins (pumps, ion channels, and secondary transporters), along with the evolution from bacteria to higher animals and plants, reveals how gene numbers have increased through adaptation to circumstances. Family-based gene annotation also gives us a new way of comparing organisms. Massive amounts of data on gene expression under many kinds of physiological conditions are being accumulated in rice oligoarrays (22K and 44K) based on full-length cDNA sequences. Cluster analyses of genes that have the same promoter cis-elements, that have similar expression profiles, or that encode enzymes in the same metabolic pathways or signal transduction cascades give us clues to understanding the networks of gene expression in rice. As a tool for that purpose, we recently developed "RiCES", a tool for searching for cis-elements in the promoter regions of clustered genes.
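    RiCES searches for cis-elements in the promoter regions of clustered genes. The sketch below is not RiCES; it only illustrates the basic operation such a tool builds on: counting occurrences of an IUPAC-coded motif on both strands of each promoter in a gene cluster. The promoter sequences and motifs are illustrative inputs, and overlapping hits are counted.

```python
import re
from typing import Dict

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
         "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def motif_regex(motif: str) -> "re.Pattern[str]":
    # Zero-width lookahead so overlapping occurrences are all counted.
    return re.compile("(?=" + "".join(IUPAC[c] for c in motif.upper()) + ")")


def count_motif(promoters: Dict[str, str], motif: str) -> Dict[str, int]:
    """Count hits of an IUPAC motif on both strands of each gene's promoter sequence."""
    fwd_motif = motif.upper()
    rc_motif = fwd_motif.translate(COMPLEMENT)[::-1]  # reverse complement
    fwd, rev = motif_regex(fwd_motif), motif_regex(rc_motif)
    out = {}
    for gene, seq in promoters.items():
        s = seq.upper()
        hits = len(fwd.findall(s))
        if rc_motif != fwd_motif:  # palindromic motifs would otherwise be counted twice
            hits += len(rev.findall(s))
        out[gene] = hits
    return out
```

    A cluster-level comparison could then ask whether the motif is more frequent in the promoters of one expression cluster than in the rest of the genome.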

  20. Upper Cretaceous sequences and sea-level history, New Jersey Coastal Plain

    USGS Publications Warehouse

    Miller, K.G.; Sugarman, P.J.; Browning, J.V.; Kominz, M.A.; Olsson, R.K.; Feigenson, M.D.; Hernandez, J.C.

    2004-01-01

    We developed a Late Cretaceous sea-level estimate from Upper Cretaceous sequences at Bass River and Ancora, New Jersey (ODP [Ocean Drilling Program] Leg 174AX). We dated 11-14 sequences by integrating Sr isotope and biostratigraphy (age resolution ±0.5 m.y.) and then estimated paleoenvironmental changes within the sequences from lithofacies and biofacies analyses. Sequences generally shallow upsection from middle-neritic to inner-neritic paleodepths, as shown by the transition from thin basal glauconite shelf sands (transgressive systems tracts [TST]), to medial-prodelta silty clays (highstand systems tracts [HST]), and finally to upper-delta-front quartz sands (HST). Sea-level estimates obtained by backstripping (accounting for paleodepth variations, sediment loading, compaction, and basin subsidence) indicate that large (>25 m) and rapid (≪1 m.y.) sea-level variations occurred during the Late Cretaceous greenhouse world. The fact that the timing of Upper Cretaceous sequence boundaries in New Jersey is similar to the sea-level lowering records of Exxon Production Research Company (EPR), northwest European sections, and Russian platform outcrops points to a global cause. Because backstripping, seismicity, seismic stratigraphic data, and sediment-distribution patterns all indicate minimal tectonic effects on the New Jersey Coastal Plain, we interpret that we have isolated a eustatic signature. The only known mechanism that can explain such global changes, glacio-eustasy, is consistent with foraminiferal δ¹⁸O data. Either continental ice sheets paced sea-level changes during the Late Cretaceous, or our understanding of causal mechanisms for global sea-level change is fundamentally flawed. Comparison of our eustatic history with published ice-sheet models and Milankovitch predictions suggests that small (5-10 × 10⁶ km³), ephemeral, and areally restricted Antarctic ice sheets paced the Late Cretaceous global sea-level change.
New Jersey and Russian eustatic estimates are typically one-half of the EPR amplitudes, though this difference varies through time, yielding markedly different eustatic curves. We conclude that New Jersey provides the best available estimate for Late Cretaceous sea-level variations. © 2004 Geological Society of America.
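    Backstripping removes the effect of sediment loading from the observed stratigraphy so that the remaining subsidence can be compared with tectonic models, leaving a sea-level term. The sketch below uses the widely cited one-dimensional Airy (local isostasy) form, Y = S(ρm − ρs)/(ρm − ρw) + Wd − ΔSL·ρm/(ρm − ρw), where S is the decompacted sediment thickness, ρs its mean density, Wd the paleowater depth, and ΔSL the sea-level change relative to present. The authors' exact formulation and parameters are not stated here; the default mantle and water densities are assumptions, sign conventions are assumed (depths positive downward, ΔSL positive above present), and decompaction itself is not modeled.

```python
def airy_tectonic_subsidence(S: float, rho_s: float, Wd: float, dSL: float,
                             rho_m: float = 3330.0, rho_w: float = 1030.0) -> float:
    """Tectonic (water-loaded) subsidence Y in metres under local Airy isostasy.

    S: decompacted sediment thickness (m); rho_s: mean sediment bulk density (kg/m^3);
    Wd: paleowater depth (m); dSL: sea level relative to present (m).
    Default densities are assumed illustrative values.
    """
    return S * (rho_m - rho_s) / (rho_m - rho_w) + Wd - dSL * rho_m / (rho_m - rho_w)


def sea_level_from_subsidence(Y: float, S: float, rho_s: float, Wd: float,
                              rho_m: float = 3330.0, rho_w: float = 1030.0) -> float:
    """Same equation solved for dSL when Y comes from an independent subsidence model."""
    return (S * (rho_m - rho_s) / (rho_m - rho_w) + Wd - Y) * (rho_m - rho_w) / rho_m
```

    Solving for ΔSL this way only isolates eustasy if the tectonic subsidence model is right, which is why the abstract stresses the evidence for minimal tectonic effects on the New Jersey Coastal Plain.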

  1. MODIS Snow-Cover Products

    NASA Technical Reports Server (NTRS)

    Hall, Dorothy K.; Riggs, George A.; Salomonson, Vincent V.; DiGirolamo, Nicolo; Bayr, Klaus J.; Houser, Paul (Technical Monitor)

    2001-01-01

    On December 18, 1999, the Terra satellite was launched with a complement of five instruments including the Moderate Resolution Imaging Spectroradiometer (MODIS). Many geophysical products are derived from MODIS data, including global snow-cover products. These products have been available through the National Snow and Ice Data Center (NSIDC) Distributed Active Archive Center (DAAC) since September 13, 2000. MODIS snow-cover products represent a potential improvement over the currently available operational products, mainly because the MODIS products are global, have 500-m resolution, and can distinguish most snow from clouds. Also, the snow-mapping algorithms are automated, which means that a consistent data set is generated for long-term climate studies that require snow-cover information. Extensive quality assurance (QA) information is stored with the product. The snow product suite starts with a 500-m resolution swath snow-cover map, which is gridded to the Integerized Sinusoidal Grid to produce daily and eight-day composite tile products. The sequence then proceeds to a climate-modeling grid product at 5-km spatial resolution, with both daily and eight-day composite products. A case study from March 6, 2000, involving MODIS data and field and aircraft measurements, is presented. Near-term enhancements include daily snow albedo and fractional snow cover.
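    The eight-day composite combines eight daily classifications of each grid cell into one value. The sketch below assumes a "maximum snow extent" rule (a cell is snow if snow was observed on any day, otherwise clear ground if any day was clear, otherwise cloud); treat this priority order as an assumption rather than a statement of the operational algorithm. The string class labels are illustrative and are not the integer codes used in the MODIS product files.

```python
from typing import List, Sequence

# Illustrative labels only; the real products use integer codes.
SNOW, NO_SNOW, CLOUD = "snow", "no_snow", "cloud"


def composite_pixel(daily: Sequence[str]) -> str:
    """Combine one cell's daily classes using the assumed snow > clear > cloud priority."""
    if SNOW in daily:
        return SNOW
    if NO_SNOW in daily:
        return NO_SNOW
    return CLOUD  # every day in the period was cloudy


def composite_tile(days: List[List[str]]) -> List[str]:
    """days[d][p] is the class of cell p on day d; returns one composite class per cell."""
    return [composite_pixel(column) for column in zip(*days)]
```

    The rule favors detecting snow at the cost of possibly retaining transient snow, which is one reason per-day QA information matters when interpreting composites.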

  2. Adaptive Local Realignment of Protein Sequences.

    PubMed

    DeBlasio, Dan; Kececioglu, John

    2018-06-11

    While mutation rates can vary markedly over the residues of a protein, multiple sequence alignment tools typically use the same values for their scoring-function parameters across a protein's entire length. We present a new approach, called adaptive local realignment, that, in contrast, automatically adapts to the diversity of mutation rates along protein sequences. This builds upon a recent technique known as parameter advising, which finds global parameter settings for an aligner, to now adaptively find local settings. Our approach in essence identifies local regions with low estimated accuracy, constructs a set of candidate realignments using a carefully chosen collection of parameter settings, and replaces the region if a realignment has higher estimated accuracy. This new method of local parameter advising, when combined with prior methods for global advising, boosts alignment accuracy by as much as 26% over the best default setting on hard-to-align protein benchmarks, and by 6.4% over global advising alone. Adaptive local realignment has been implemented within the Opal aligner using the Facet accuracy estimator.
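    The three steps described above (flag regions with low estimated accuracy, realign each flagged region under every candidate parameter setting, keep a realignment only if its estimated accuracy is higher) can be written as a generic loop. This is a schematic sketch, not the Opal/Facet implementation: the alignment is assumed to be pre-split into consecutive regions, and `estimate`, `realign`, `settings`, and `threshold` are placeholders standing in for an accuracy estimator, an aligner, and a chosen parameter set.

```python
from typing import Callable, List, Sequence, TypeVar

R = TypeVar("R")  # one region of an alignment (representation left abstract)
P = TypeVar("P")  # one aligner parameter setting


def adaptive_local_realignment(regions: List[R],
                               estimate: Callable[[R], float],
                               realign: Callable[[R, P], R],
                               settings: Sequence[P],
                               threshold: float) -> List[R]:
    """Replace low-accuracy regions by their best-scoring candidate realignment."""
    result: List[R] = []
    for region in regions:
        current = estimate(region)
        if current >= threshold:
            result.append(region)  # estimated accuracy is acceptable; keep as is
            continue
        best, best_score = region, current
        for setting in settings:
            candidate = realign(region, setting)
            score = estimate(candidate)
            if score > best_score:  # only accept a strict improvement
                best, best_score = candidate, score
        result.append(best)
    return result
```

    With toy stand-ins such as integer "regions", `estimate = lambda r: -abs(r - 10)` and `realign = lambda r, s: r + s`, the loop keeps well-scored regions unchanged and leaves a poor region alone when no candidate improves on it, mirroring the "replace only if better" rule in the abstract.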

  3. Genomic Epidemiology of Global Carbapenemase-Producing Enterobacter spp., 2008-2014.

    PubMed

    Peirano, Gisele; Matsumura, Yasufumi; Adams, Mark D; Bradford, Patricia; Motyl, Mary; Chen, Liang; Kreiswirth, Barry N; Pitout, Johann D D

    2018-06-01

    We performed whole-genome sequencing on 170 clinical carbapenemase-producing Enterobacter spp. isolates collected globally during 2008-2014. The most common carbapenemase was VIM, followed by New Delhi metallo-β-lactamase (NDM), Klebsiella pneumoniae carbapenemase, oxacillin 48, and IMP. The isolates were of predominantly 2 species (E. xiangfangensis and E. hormaechei subsp. steigerwaltii) and 4 global clones (sequence type [ST] 114, ST93, ST90, and ST78) with different clades within ST114 and ST90. Particular genetic structures surrounding carbapenemase genes were circulating locally in various institutions within the same or between different STs in Greece, Guatemala, Italy, Spain, Serbia, and Vietnam. We found a common NDM genetic structure (NDM-GE-U.S.), previously described on pNDM-U.S. from Klebsiella pneumoniae ATCC BAA-214, in 14 different clones obtained from 6 countries spanning 4 continents. Our study highlights the importance of surveillance programs using whole-genome sequencing in providing insight into the molecular epidemiology of carbapenemase-producing Enterobacter spp.

  4. Evolution of a global regulator: Lrp in four orders of γ-Proteobacteria.

    PubMed

    Unoarumhi, Yvette; Blumenthal, Robert M; Matson, Jyl S

    2016-05-20

    Bacterial global regulators each regulate the expression of several hundred genes. In Escherichia coli, the top seven global regulators together control over half of all genes. Leucine-responsive regulatory protein (Lrp) is one of these top seven global regulators. Lrp orthologs are very widely distributed, among both Bacteria and Archaea. Surprisingly, even within the class γ-Proteobacteria (which includes E. coli), Lrp is a global regulator in some orders and a local regulator in others. This raises questions about the evolution of Lrp and, more broadly, of global regulators. We examined Lrp sequences from four bacterial orders of the γ-Proteobacteria using phylogenetic and Logo analyses. The orders studied were Enterobacteriales and Vibrionales, in which Lrp plays a global role in tested species; Pasteurellales, in which Lrp is a local regulator in the tested species; and Alteromonadales, an order closely related to the other three but in which Lrp has not yet been studied. For comparison, we analyzed the Lrp paralog AsnC, which in all tested cases is a local regulator. The Lrp and AsnC phylogenetic clusters each divided, as expected, into subclusters representing the Enterobacteriales, Vibrionales, and Pasteurellales. However, the Alteromonadales did not yield coherent clusters for either Lrp or AsnC. Logo analysis revealed signatures associated with globally vs. locally acting Lrp orthologs, providing testable hypotheses for which portions of Lrp are responsible for a global vs. local role. These candidate regions include both ends of the Lrp polypeptide but not, interestingly, the highly conserved helix-turn-helix motif responsible for DNA sequence specificity. Lrp and AsnC have conserved sequence signatures that allow their unambiguous annotation, at least in γ-Proteobacteria. Among Lrp orthologs, specific residues correlated with global vs.
local regulatory roles, and can now be tested to determine which are functionally relevant and which simply reflect divergence. In the Alteromonadales, it appears that there are different subgroups of Lrp orthologs, one of which may act globally while the other may act locally. These results suggest experiments to improve our understanding of the evolution of bacterial global regulators.
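    Sequence logos, as used in the Logo analysis above, display the information content of each alignment column. A minimal sketch of the standard per-column calculation for aligned protein sequences, IC = log2(20) − H with H = −Σ p·log2 p over the residues observed in the column; gap characters (`-`) are skipped, and no small-sample correction is applied, which logo tools often add. Comparing these profiles between globally and locally acting orthologs is one way to locate candidate signature positions; the function name is illustrative.

```python
import math
from collections import Counter
from typing import List


def column_information(seqs: List[str], alphabet_size: int = 20) -> List[float]:
    """Per-column information content (bits) of aligned, equal-length sequences."""
    ic = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs if s[i] != "-")
        n = sum(counts.values())
        if n == 0:
            ic.append(0.0)  # all-gap column carries no residue information
            continue
        h = -sum((c / n) * math.log2(c / n) for c in counts.values())
        ic.append(math.log2(alphabet_size) - h)
    return ic
```

    A fully conserved column scores log2(20) ≈ 4.32 bits; a column split 2:1:1 among three residues has an entropy of 1.5 bits and therefore scores about 2.82 bits.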

  5. Phylogenetic Distribution of CRISPR-Cas Systems in Antibiotic-Resistant Pseudomonas aeruginosa.

    PubMed

    van Belkum, Alex; Soriaga, Leah B; LaFave, Matthew C; Akella, Srividya; Veyrieras, Jean-Baptiste; Barbu, E Magda; Shortridge, Dee; Blanc, Bernadette; Hannum, Gregory; Zambardi, Gilles; Miller, Kristofer; Enright, Mark C; Mugnier, Nathalie; Brami, Daniel; Schicklin, Stéphane; Felderman, Martina; Schwartz, Ariel S; Richardson, Toby H; Peterson, Todd C; Hubby, Bolyn; Cady, Kyle C

    2015-11-24

    Pseudomonas aeruginosa is an antibiotic-refractory pathogen with a large genome and extensive genotypic diversity. Historically, P. aeruginosa has been a major model system for understanding the molecular mechanisms underlying the function of type I clustered regularly interspaced short palindromic repeat (CRISPR) and CRISPR-associated protein (CRISPR-Cas)-based bacterial immune systems. However, little is known about the phylogenetic distribution of these CRISPR-Cas systems or their potential role in molding the P. aeruginosa accessory genome and antibiotic resistance elements. Computational approaches were used to identify and characterize CRISPR-Cas systems within 672 genomes, and in the process we identified a previously unreported and putatively mobile type I-C P. aeruginosa CRISPR-Cas system. Furthermore, genomes harboring noninhibited type I-F and I-E CRISPR-Cas systems were on average ~300 kb smaller than those without a CRISPR-Cas system. In silico analysis demonstrated that the accessory genome (n = 22,036 genes) harbored the majority of identified CRISPR-Cas targets. We also assembled a global spacer library that aided the identification of difficult-to-characterize mobile genetic elements within next-generation sequencing (NGS) data and allowed CRISPR typing of a majority of P. aeruginosa strains. In summary, our analysis demonstrated that CRISPR-Cas systems play an important role in shaping the accessory genomes of globally distributed P. aeruginosa isolates. P. aeruginosa is both an antibiotic-refractory pathogen and an important model system for type I CRISPR-Cas bacterial immune systems. By combining the sequences of 672 newly and previously sequenced genomes, we were able to provide a global view of the phylogenetic distribution, conservation, and potential targets of these systems. This analysis identified a new and putatively mobile P. aeruginosa CRISPR-Cas subtype, characterized the diverse distribution of known CRISPR-inhibiting genes, and provided a potential new use for CRISPR spacer libraries in accessory genome analysis. Our data demonstrated the importance of CRISPR-Cas systems in modulating the accessory genomes of globally distributed strains while also providing substantial data for subsequent genomic and experimental studies in multiple fields. Understanding why certain genotypes of P. aeruginosa are clinically prevalent and adept at horizontally acquiring virulence and antibiotic resistance elements is of major clinical and economic importance. Copyright © 2015 van Belkum et al.

  6. Multiplex PCR-Based Next-Generation Sequencing and Global Diversity of Seoul Virus in Humans and Rats.

    PubMed

    Kim, Won-Keun; No, Jin Sun; Lee, Seung-Ho; Song, Dong Hyun; Lee, Daesang; Kim, Jeong-Ah; Gu, Se Hun; Park, Sunhye; Jeong, Seong Tae; Kim, Heung-Chul; Klein, Terry A; Wiley, Michael R; Palacios, Gustavo; Song, Jin-Won

    2018-02-01

    Seoul virus (SEOV) poses a worldwide public health threat. This virus, which is harbored by Rattus norvegicus and R. rattus rats, is the causative agent of hemorrhagic fever with renal syndrome (HFRS) in humans, which has been reported in Asia, Europe, the Americas, and Africa. Defining SEOV genome sequences plays a critical role in developing preventive and therapeutic strategies against this uniquely worldwide hantavirus. We applied multiplex PCR-based next-generation sequencing to obtain SEOV genome sequences from clinical and reservoir host specimens. Epidemiologic surveillance of R. norvegicus rats in South Korea during 2000-2016 showed that the seroprevalence of enzootic SEOV infection did not differ significantly by sex, weight (age), or season. Viral loads of SEOV in rats showed wide dissemination in tissues and dynamic circulation among populations. Phylogenetic analyses revealed the global diversity of SEOV and possible genomic configurations resulting from genetic exchange.

  7. A communal catalogue reveals Earth's multiscale microbial diversity.

    PubMed

    Thompson, Luke R; Sanders, Jon G; McDonald, Daniel; Amir, Amnon; Ladau, Joshua; Locey, Kenneth J; Prill, Robert J; Tripathi, Anupriya; Gibbons, Sean M; Ackermann, Gail; Navas-Molina, Jose A; Janssen, Stefan; Kopylova, Evguenia; Vázquez-Baeza, Yoshiki; González, Antonio; Morton, James T; Mirarab, Siavash; Zech Xu, Zhenjiang; Jiang, Lingjing; Haroon, Mohamed F; Kanbar, Jad; Zhu, Qiyun; Jin Song, Se; Kosciolek, Tomasz; Bokulich, Nicholas A; Lefler, Joshua; Brislawn, Colin J; Humphrey, Gregory; Owens, Sarah M; Hampton-Marcell, Jarrad; Berg-Lyons, Donna; McKenzie, Valerie; Fierer, Noah; Fuhrman, Jed A; Clauset, Aaron; Stevens, Rick L; Shade, Ashley; Pollard, Katherine S; Goodwin, Kelly D; Jansson, Janet K; Gilbert, Jack A; Knight, Rob

    2017-11-23

    Our growing awareness of the microbial world's importance and diversity contrasts starkly with our limited understanding of its fundamental structure. Despite recent advances in DNA sequencing, a lack of standardized protocols and common analytical frameworks impedes comparisons among studies, hindering the development of global inferences about microbial life on Earth. Here we present a meta-analysis of microbial community samples collected by hundreds of researchers for the Earth Microbiome Project. Coordinated protocols and new analytical methods, particularly the use of exact sequences instead of clustered operational taxonomic units, enable bacterial and archaeal ribosomal RNA gene sequences to be followed across multiple studies and allow us to explore patterns of diversity at an unprecedented scale. The result is both a reference database giving global context to DNA sequence data and a framework for incorporating data from future studies, fostering increasingly complete characterization of Earth's microbial diversity.

  8. Transgenerational epigenetics: Inheritance of global cytosine methylation and methylation-related epigenetic markers in the shrub Lavandula latifolia.

    PubMed

    Herrera, Carlos M; Alonso, Conchita; Medrano, Mónica; Pérez, Ricardo; Bazaga, Pilar

    2018-04-01

    The ecological and evolutionary significance of natural epigenetic variation (i.e., variation not based on DNA sequence variants) will depend critically on whether epigenetic states are transmitted from parents to offspring, but little is known about epigenetic inheritance in nonmodel plants. We present a quantitative analysis of transgenerational transmission of global DNA cytosine methylation (= proportion of all genomic cytosines that are methylated) and individual epigenetic markers (= methylation status of anonymous MSAP markers) in the shrub Lavandula latifolia. Methods based on parent-offspring correlations and parental variance component estimation were applied to epigenetic features of field-grown plants ('maternal parents') and greenhouse-grown progenies. Transmission of genetic markers (AFLP) was also assessed for reference. Maternal parents differed significantly in global DNA cytosine methylation (range = 21.7-36.7%). Greenhouse-grown maternal families differed significantly in global methylation, and their differences were significantly related to maternal origin. Methylation-sensitive amplified polymorphism (MSAP) markers exhibited significant transgenerational transmission, as denoted by a significant maternal variance component of marker scores in greenhouse families and significant mother-offspring correlations of marker scores. Although transmission-related measurements for global methylation and MSAP markers were quantitatively lower than those for the AFLP markers taken as reference, this study revealed extensive transgenerational transmission of genome-wide global cytosine methylation and anonymous epigenetic markers in L. latifolia. The similarity of results for global cytosine methylation and epigenetic markers lends robustness to this conclusion and stresses the value of considering both types of information in epigenetic studies of nonmodel plants. © 2018 Botanical Society of America.

  9. MGIS: managing banana (Musa spp.) genetic resources information and high-throughput genotyping data

    PubMed Central

    Guignon, V.; Sempere, G.; Sardos, J.; Hueber, Y.; Duvergey, H.; Andrieu, A.; Chase, R.; Jenny, C.; Hazekamp, T.; Irish, B.; Jelali, K.; Adeka, J.; Ayala-Silva, T.; Chao, C.P.; Daniells, J.; Dowiya, B.; Effa effa, B.; Gueco, L.; Herradura, L.; Ibobondji, L.; Kempenaers, E.; Kilangi, J.; Muhangi, S.; Ngo Xuan, P.; Paofa, J.; Pavis, C.; Thiemele, D.; Tossou, C.; Sandoval, J.; Sutanto, A.; Vangu Paka, G.; Yi, G.; Van den houwe, I.; Roux, N.

    2017-01-01

    Large-scale characterization of the genetic diversity held in genebanks is under way, owing to advances in next-generation sequencing (NGS)-based technologies that produce high-density genetic markers for large numbers of samples at low cost. Genebank users should be able to identify and select germplasm from the global genepool based on a combination of passport, genotypic and phenotypic data. To facilitate this, a new generation of information systems is being designed to handle data efficiently and link it with other external resources such as genome or breeding databases. The Musa Germplasm Information System (MGIS), the database for global ex situ-held banana genetic resources, has been developed to address these needs in a user-friendly way. In developing MGIS, we selected a generic database schema (Chado), the robust content management system Drupal for the user interface, and Tripal, a set of Drupal modules that links the Chado schema to Drupal. MGIS supports germplasm collection examination, accession browsing, advanced searches, and germplasm orders. Additionally, we developed unique graphical interfaces to compare accessions and to explore them based on their taxonomic information. Accession-based data have been enriched with publications, genotyping studies and associated genotyping datasets reporting on germplasm use. Finally, an interoperability layer has been implemented to facilitate links with complementary databases such as the Banana Genome Hub and the MusaBase breeding database. Database URL: https://www.crop-diversity.org/mgis/ PMID:29220435

  10. ANME-2D Archaea Catalyze Methane Oxidation in Deep Subsurface Sediments Independent of Nitrate Reduction

    NASA Astrophysics Data System (ADS)

    Hernsdorf, A. W.; Amano, Y.; Suzuki, Y.; Ise, K.; Thomas, B. C.; Banfield, J. F.

    2015-12-01

    Terrestrial sediments are an important global reservoir for methane. Microorganisms in the deep subsurface play a critical role in the methane cycle, yet much remains to be learned about their diversity and metabolism. To gain more comprehensive insight into the microbiology of the methane cycle in the deep subsurface, we conducted a genome-resolved study of samples collected from the Horonobe Underground Research Laboratory (HURL), Japan. Groundwater samples were obtained from three boreholes at depths between 140 m and 250 m in two consecutive years. Groundwater was filtered, metagenomic DNA was extracted and sequenced, and the sequence data were assembled. Based on the sequences of phylogenetically informative genes on the assembled fragments, we detected a high degree of overlap in community composition across a vertical transect within one borehole at the two sampling times. However, there was comparatively little similarity among communities across boreholes. Spatial and temporal abundance patterns were used in combination with tetranucleotide signatures of assembled genome fragments to bin the data and reconstruct over 200 unique draft genomes, of which 137 are considered to be of high quality (>90% complete). The deepest samples from one borehole were highly dominated by an archaeon identified as ANME-2D; this organism was also present at lower abundance in all other samples from that borehole. Also abundant in these microbial communities were novel members of the Gammaproteobacteria, Saccharibacteria (TM7), and Tenericutes. Notably, a ~2 Mbp draft genome was reconstructed for the ANME-2D archaeon. As expected, the genome encodes all of the genes predicted to be involved in the reverse methanogenesis pathway. In contrast with the previously reported ANME-2D genome, the HURL ANME-2D genome lacks the capacity to reduce nitrate. However, we identified many multiheme cytochromes with closest similarity to those of the known Fe-reducing/oxidizing archaeon Ferroglobus placidus. Thus, we suggest that ANME-2D may couple methane oxidation to the reduction of ferric iron minerals in the sediment and may be generally important as a link between the iron and methane cycles in deep subsurface environments. Such information has important implications for modeling the global carbon cycle.

  11. Global Transcriptome Analysis of the Tentacle of the Jellyfish Cyanea capillata Using Deep Sequencing and Expressed Sequence Tags: Insight into the Toxin- and Degenerative Disease-Related Transcripts

    PubMed Central

    Liu, Dan; Wang, Qianqian; Ruan, Zengliang; He, Qian; Zhang, Liming

    2015-01-01

    Background: Jellyfish contain diverse toxins and other bioactive components. However, large-scale identification of novel toxins and bioactive components from jellyfish has been hampered by the low efficiency of traditional isolation and purification methods. Results: We performed de novo transcriptome sequencing of the tentacle tissue of the jellyfish Cyanea capillata. A total of 51,304,108 reads were obtained and assembled into 50,536 unigenes. Of these, 21,357 unigenes had homologues in public databases, but the remaining unigenes had no significant matches, owing to the limited sequence information available and to species-specific novel sequences. Functional annotation of the unigenes also revealed general characteristics of the gene expression profile in the tentacle of C. capillata. A primary goal of this study was to identify putative toxin transcripts. As expected, screening identified many transcripts encoding proteins similar to several well-known toxin families, including phospholipases, metalloproteases, serine proteases and serine protease inhibitors. In addition, some transcripts resembled molecules with potential toxic activities, including cnidarian CfTX-like toxins with hemolytic activity, plancitoxin-1, venom toxin-like peptide-6, histamine-releasing factor, neprilysin, dipeptidyl peptidase 4, vascular endothelial growth factor A, angiotensin-converting enzyme-like and endothelin-converting enzyme 1-like proteins. Most of these molecules have not been previously reported in jellyfish. Interestingly, we also characterized a number of transcripts with similarities to proteins relevant to several degenerative diseases, including Huntington’s, Alzheimer’s and Parkinson’s diseases. This is the first description of degenerative disease-associated genes in jellyfish. Conclusion: We obtained a well-categorized and annotated transcriptome of the C. capillata tentacle that will be a valuable resource for further understanding jellyfish at the molecular level and the molecular mechanisms underlying jellyfish stinging. The findings of this study may also be used in comparative studies of gene expression profiles among different jellyfish species. PMID:26551022

  12. Global Transcriptome Analysis of the Tentacle of the Jellyfish Cyanea capillata Using Deep Sequencing and Expressed Sequence Tags: Insight into the Toxin- and Degenerative Disease-Related Transcripts.

    PubMed

    Liu, Guoyan; Zhou, Yonghong; Liu, Dan; Wang, Qianqian; Ruan, Zengliang; He, Qian; Zhang, Liming

    2015-01-01

    Jellyfish contain diverse toxins and other bioactive components. However, large-scale identification of novel toxins and bioactive components from jellyfish has been hampered by the low efficiency of traditional isolation and purification methods. We performed de novo transcriptome sequencing of the tentacle tissue of the jellyfish Cyanea capillata. A total of 51,304,108 reads were obtained and assembled into 50,536 unigenes. Of these, 21,357 unigenes had homologues in public databases, but the remaining unigenes had no significant matches, owing to the limited sequence information available and to species-specific novel sequences. Functional annotation of the unigenes also revealed general characteristics of the gene expression profile in the tentacle of C. capillata. A primary goal of this study was to identify putative toxin transcripts. As expected, screening identified many transcripts encoding proteins similar to several well-known toxin families, including phospholipases, metalloproteases, serine proteases and serine protease inhibitors. In addition, some transcripts resembled molecules with potential toxic activities, including cnidarian CfTX-like toxins with hemolytic activity, plancitoxin-1, venom toxin-like peptide-6, histamine-releasing factor, neprilysin, dipeptidyl peptidase 4, vascular endothelial growth factor A, angiotensin-converting enzyme-like and endothelin-converting enzyme 1-like proteins. Most of these molecules have not been previously reported in jellyfish. Interestingly, we also characterized a number of transcripts with similarities to proteins relevant to several degenerative diseases, including Huntington's, Alzheimer's and Parkinson's diseases. This is the first description of degenerative disease-associated genes in jellyfish. We obtained a well-categorized and annotated transcriptome of the C. capillata tentacle that will be a valuable resource for further understanding jellyfish at the molecular level and the molecular mechanisms underlying jellyfish stinging. The findings of this study may also be used in comparative studies of gene expression profiles among different jellyfish species.

  13. Analysis of Pvama1 genes from China-Myanmar border reveals little regional genetic differentiation of Plasmodium vivax populations.

    PubMed

    Zhu, Xiaotong; Zhao, Pan; Wang, Si; Liu, Fei; Liu, Jun; Wang, Jian; Yang, Zhaoqing; Yan, Guiyun; Fan, Qi; Cao, Yaming; Cui, Liwang

    2016-11-29

    On the premise that parasite genetic diversity diminishes as malaria incidence declines, the analysis of polymorphic antigenic markers may provide important information about the impact of malaria control on local parasite populations. Here we evaluated the genetic diversity of the Plasmodium vivax apical membrane antigen 1 (Pvama1) gene in a parasite population from the China-Myanmar border and compared it with global P. vivax populations. We performed evolutionary analyses to examine the genetic diversity, natural selection, and population differentiation of 73 Pvama1 sequences acquired from the China-Myanmar border as well as 615 publicly available Pvama1 sequences from seven global P. vivax populations. A total of 308 Pvama1 haplotypes were identified among the global P. vivax isolates. The overall nucleotide diversity of the Pvama1 gene among the 73 China-Myanmar border isolates was 0.008, with 41 haplotypes identified (Hd = 0.958). Domain I (DI) harbored the majority (26/33) of the polymorphic sites. The McDonald-Kreitman test showed significant positive selection across the ectodomain and DI of Pvama1. Fixation index (FST) estimates between the China-Myanmar border population and those from Thailand (0.01) and Myanmar (0.10) showed only slight geographical genetic differentiation. Notably, the Sal-I haplotype was not detected in any of the analyzed global isolates, whereas the Belem strain was restricted to the Thai population. The detected mutations map outside the overlapping region of the predicted B-cell epitopes and intrinsically unstructured/disordered regions. This study revealed high genetic diversity of Pvama1 in the P. vivax population from the China-Myanmar border, with DI displaying stronger diversifying selection than other domains. Population subdivision among parasite populations from the Greater Mekong Subregion was low.
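    The two diversity statistics quoted in this abstract, haplotype diversity (Hd) and nucleotide diversity (π), have standard definitions: Hd = n/(n-1) × (1 - Σ p_i²) over haplotype frequencies p_i, and π is the mean proportion of differing sites across all pairs of aligned sequences. The sketch below implements those textbook formulas on a hypothetical toy alignment; it is not the software the authors used, and it assumes gap-free, equal-length sequences rather than handling missing data.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs: list[str]) -> float:
    """Nei's unbiased haplotype diversity: Hd = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def nucleotide_diversity(seqs: list[str]) -> float:
    """Nucleotide diversity (pi): mean proportion of differing sites over all sequence pairs.

    Assumes gap-free sequences of equal length (a simplification).
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    differences = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return differences / (len(pairs) * length)


# Hypothetical toy alignment, not real Pvama1 data.
toy = ["ACGT", "ACGT", "ACGA", "TCGA"]
print(round(haplotype_diversity(toy), 3), round(nucleotide_diversity(toy), 3))  # 0.833 0.292
```

    The study's reported values (π = 0.008, Hd = 0.958) come from its own data and are not reproduced by this toy example.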

  14. A two-locus global DNA barcode for land plants: the coding rbcL gene complements the non-coding trnH-psbA spacer region.

    PubMed

    Kress, W John; Erickson, David L

    2007-06-06

    A useful DNA barcode requires sufficient sequence variation to distinguish between species and ease of application across a broad range of taxa. Discovery of a DNA barcode for land plants has been limited by intrinsically lower rates of sequence evolution in plant genomes than in animal genomes. This low rate has complicated the trade-off in finding a locus that is universal and readily sequenced yet has sufficiently high sequence divergence at the species level. Here, a global plant DNA barcode system is evaluated by comparing the universality and degree of sequence divergence of nine putative barcode loci, including coding and non-coding regions, singly and in pairs, across a phylogenetically diverse set of 48 genera (two species per genus). No single locus could discriminate between the two species of a pair in more than 79% of genera, whereas discrimination increased to nearly 88% when the non-coding trnH-psbA spacer was paired with one of three coding loci, including rbcL. In silico trials using DNA sequences from GenBank further evaluated the discriminatory power of a subset of these loci. These trials supported the earlier observation that trnH-psbA coupled with rbcL can correctly identify and discriminate among related species. A combination of the non-coding trnH-psbA spacer region and a portion of the coding rbcL gene is recommended as a two-locus global land plant barcode that provides the necessary universality and species discrimination.

  15. A Comprehensive Approach to Sequence-oriented IsomiR annotation (CASMIR): demonstration with IsomiR profiling in colorectal neoplasia.

    PubMed

    Wu, Chung Wah; Evans, Jared M; Huang, Shengbing; Mahoney, Douglas W; Dukek, Brian A; Taylor, William R; Yab, Tracy C; Smyrk, Thomas C; Jen, Jin; Kisiel, John B; Ahlquist, David A

    2018-05-25

    MicroRNA (miRNA) profiling is an important step in studying biological associations and identifying marker candidates. miRNAs exist as isoforms, called isomiRs, which may exhibit distinct properties. With conventional profiling methods, limitations of assay and analysis platforms may compromise isomiR interrogation. We introduce a comprehensive approach to sequence-oriented isomiR annotation (CASMIR) that allows unbiased identification of global isomiRs from small RNA sequencing data. In this approach, small RNA reads are kept as independent sequences instead of being summarized under miRNA names. IsomiR features are identified through stepwise local alignment against canonical forms and precursor sequences. By customizing the reference database, CASMIR can be applied to isomiR annotation across species. To demonstrate its application, we investigated isomiR profiles in normal and neoplastic human colorectal epithelia. We also ran miRDeep2, a popular miRNA analysis algorithm, to validate the isomiRs annotated by CASMIR. With CASMIR, specific and biologically relevant isomiR patterns could be identified. We note that specific isomiRs are often more abundant than their canonical forms. We identify isomiRs that are commonly up-regulated in both colorectal cancer and advanced adenoma, and illustrate the advantages of targeting isomiRs rather than canonical forms as potential biomarkers. Studying miRNAs at the isomiR level could yield new insights into miRNA biology and inform assay design for specific isomiRs. CASMIR facilitates comprehensive annotation of isomiR features in small RNA sequencing data for isomiR profiling and differential expression analysis.

  16. A functional genomics tool for the Pacific bluefin tuna: Development of a 44K oligonucleotide microarray from whole-genome sequencing data for global transcriptome analysis.

    PubMed

    Yasuike, Motoshige; Fujiwara, Atushi; Nakamura, Yoji; Iwasaki, Yuki; Nishiki, Issei; Sugaya, Takuma; Shimizu, Akio; Sano, Motohiko; Kobayashi, Takanori; Ototake, Mitsuru

    2016-02-01

    Bluefin tunas are among the most important fishery resources worldwide. Because of their high market value, bluefin tuna farming has grown rapidly in recent years. At present, the most common form of tuna farming is based on stocking wild-caught fish, which has raised concerns about the negative impact of tuna farming on wild stocks. Recently, the full reproductive cycle of the Pacific bluefin tuna (PBT), Thunnus orientalis, has been completed under aquaculture conditions, but production bottlenecks remain because very little biological information on bluefin tunas is available. Functional genomics approaches promise to rapidly increase our knowledge of biological processes in the bluefin tuna. Here, we describe the development of the first 44K PBT oligonucleotide microarray (oligo-array), based on whole-genome shotgun (WGS) sequencing and large-scale expressed sequence tag (EST) data. We also present an initial 44K PBT oligo-array experiment using in vitro cultured peripheral blood leukocytes (PBLs) stimulated with immunostimulants such as lipopolysaccharide (LPS, a cell wall component of Gram-negative bacteria) or polyinosinic:polycytidylic acid (poly I:C, a synthetic mimic of viral infection). This pilot 44K PBT oligo-array analysis successfully distinguished the immune processes in LPS- and poly I:C-stimulated PBLs. We therefore expect this oligo-array to provide an excellent opportunity to analyze global gene expression profiles for a better understanding of disease and stress, as well as of reproduction, development, and the influence of nutrition on tuna aquaculture production. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.

  17. groHMM: a computational tool for identifying unannotated and cell type-specific transcription units from global run-on sequencing data.

    PubMed

    Chae, Minho; Danko, Charles G; Kraus, W Lee

    2015-07-16

    Global run-on coupled with deep sequencing (GRO-seq) provides extensive information on the location and function of coding and non-coding transcripts, including primary microRNAs (miRNAs), long non-coding RNAs (lncRNAs), and enhancer RNAs (eRNAs), as well as yet-undiscovered classes of transcripts. However, few computational tools tailored to this new type of sequencing data are available, limiting the use of GRO-seq data for identifying novel transcription units. Here, we present groHMM, a computational tool in R that defines the boundaries of transcription units de novo using a two-state hidden Markov model (HMM). A systematic performance comparison between groHMM and two existing peak-calling methods tuned to identify broad regions (SICER and HOMER), using existing GRO-seq data from MCF-7 breast cancer cells, favors our approach. To demonstrate the broader utility of our approach, we used groHMM to annotate a diverse array of transcription units (i.e., primary transcripts) from four GRO-seq data sets derived from a variety of cell types, including non-transformed human cells (cardiomyocytes and lung fibroblasts), transformed human cells (LNCaP and MCF-7 cancer cells), and non-mammalian cells (from flies and worms). As an example of the utility of groHMM and its application to questions about the transcriptome, we show how groHMM can be used to analyze cell type-specific enhancers as defined by newly annotated enhancer transcripts. Our results show that groHMM can reveal new insights into cell type-specific transcription by identifying novel transcription units, and can serve as a complete and useful tool for evaluating functional genomic elements in cells.
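    The core idea of a two-state HMM for transcription units is to label each genomic bin as "transcribed" or "background" and decode the most likely label sequence, so contiguous runs of the transcribed state become unit boundaries. The sketch below illustrates that idea with Viterbi decoding over binned read counts. It is a hypothetical Python illustration, not groHMM's R implementation or API: the Poisson emissions, the rates, the self-transition probability, and the counts are all assumptions chosen for clarity.

```python
import math


def poisson_logpmf(k: int, lam: float) -> float:
    """Log-probability of observing k reads in a bin under a Poisson(lam) model."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_two_state(counts, rates=(0.5, 5.0), p_stay=0.95):
    """Most likely state path for binned read counts (0 = background, 1 = transcribed).

    Emissions are Poisson with one rate per state; both states share the
    self-transition probability p_stay and start with equal probability.
    All parameter values are illustrative assumptions, not groHMM defaults.
    """
    if not counts:
        return []
    log_trans = [[math.log(p_stay), math.log(1 - p_stay)],
                 [math.log(1 - p_stay), math.log(p_stay)]]
    # score[s]: best log-probability of any state path ending in state s at this bin.
    score = [math.log(0.5) + poisson_logpmf(counts[0], rates[s]) for s in (0, 1)]
    backpointers = []
    for k in counts[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            # Pick the better predecessor state for state s.
            prev = 0 if score[0] + log_trans[0][s] >= score[1] + log_trans[1][s] else 1
            pointers.append(prev)
            new_score.append(score[prev] + log_trans[prev][s] + poisson_logpmf(k, rates[s]))
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def transcribed_segments(path):
    """Convert a 0/1 state path into half-open (start, end) bin intervals of state 1."""
    segments, start = [], None
    for i, state in enumerate(path):
        if state == 1 and start is None:
            start = i
        elif state == 0 and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(path)))
    return segments


# Hypothetical binned read counts along one strand.
counts = [0, 1, 0, 6, 7, 5, 8, 0, 1, 0]
path = viterbi_two_state(counts)
print(path)                        # [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
print(transcribed_segments(path))  # [(3, 7)]
```

    A high self-transition probability penalizes frequent state switches, which is what lets a two-state model call broad, contiguous transcription units rather than narrow peaks; in practice the emission and transition parameters would be estimated from data rather than fixed as here.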

  18. Comparative genomics of citric-acid-producing Aspergillus niger ATCC 1015 versus enzyme-producing CBS 513.88

    PubMed Central

    Andersen, Mikael R.; Salazar, Margarita P.; Schaap, Peter J.; van de Vondervoort, Peter J.I.; Culley, David; Thykaer, Jette; Frisvad, Jens C.; Nielsen, Kristian F.; Albang, Richard; Albermann, Kaj; Berka, Randy M.; Braus, Gerhard H.; Braus-Stromeyer, Susanna A.; Corrochano, Luis M.; Dai, Ziyu; van Dijck, Piet W.M.; Hofmann, Gerald; Lasure, Linda L.; Magnuson, Jon K.; Menke, Hildegard; Meijer, Martin; Meijer, Susan L.; Nielsen, Jakob B.; Nielsen, Michael L.; van Ooyen, Albert J.J.; Pel, Herman J.; Poulsen, Lars; Samson, Rob A.; Stam, Hein; Tsang, Adrian; van den Brink, Johannes M.; Atkins, Alex; Aerts, Andrea; Shapiro, Harris; Pangilinan, Jasmyn; Salamov, Asaf; Lou, Yigong; Lindquist, Erika; Lucas, Susan; Grimwood, Jane; Grigoriev, Igor V.; Kubicek, Christian P.; Martinez, Diego; van Peij, Noël N.M.E.; Roubos, Johannes A.; Nielsen, Jens; Baker, Scott E.

    2011-01-01

    The filamentous fungus Aspergillus niger exhibits great phenotypic diversity. It is found globally, as both marine and terrestrial strains, produces both organic acids and hydrolytic enzymes in high amounts, and some isolates are pathogenic. Although the genome of an industrial enzyme-producing A. niger strain (CBS 513.88) has already been sequenced, the versatility and diversity of this species compel additional exploration. We therefore undertook whole-genome sequencing of the acidogenic A. niger wild-type strain (ATCC 1015) and produced a genome sequence of very high quality. Only 15 gaps remain in the sequence, and half of the telomeric regions have been resolved. Moreover, sequence information from ATCC 1015 was used to improve the genome sequence of CBS 513.88. Chromosome-level comparisons uncovered several genome rearrangements, deletions, a clear case of strain-specific horizontal gene transfer, and 0.8 Mb of novel sequence. The density of single nucleotide polymorphisms (SNPs) between the two strains was exceptionally high (average: 7.8, maximum: 160 SNPs/kb). High variation within the species was confirmed by exo-metabolite profiling and phylogenetics. Detailed lists of alleles were generated, and genotypic differences were observed to accumulate in metabolic pathways essential to acid production and protein synthesis. In the protein-producing CBS 513.88 strain, a transcriptome analysis supported up-regulation of genes associated with biosynthesis of amino acids that are abundant in glucoamylase A, of tRNA synthases, and of protein transporters. Our results and data sets from this integrative systems biology analysis provide a snapshot of fungal evolution and will support further optimization of cell factories based on filamentous fungi. PMID:21543515

  19. Taxonomic annotation of public fungal ITS sequences from the built environment – a report from an April 10–11, 2017 workshop (Aberdeen, UK)

    PubMed Central

    Nilsson, R. Henrik; Taylor, Andy F. S.; Adams, Rachel I.; Baschien, Christiane; Bengtsson-Palme, Johan; Cangren, Patrik; Coleine, Claudia; Daniel, Heide-Marie; Glassman, Sydney I.; Hirooka, Yuuri; Irinyi, Laszlo; Iršėnaitė, Reda; Martin-Sanchez, Pedro M.; Meyer, Wieland; Oh, Seung-Yoon; Sampaio, Jose Paulo; Seifert, Keith A.; Sklenář, Frantisek; Stubbe, Dirk; Suh, Sung-Oui; Summerbell, Richard; Svantesson, Sten; Unterseher, Martin; Visagie, Cobus M.; Weiss, Michael; Woudenberg, Joyce HC; Wurzbacher, Christian; Van den Wyngaert, Silke; Yilmaz, Neriman; Yurkov, Andrey; Kõljalg, Urmas; Abarenkov, Kessy

    2018-01-01

    Recent DNA-based studies have shown that the built environment is surprisingly rich in fungi. These indoor fungi – whether transient visitors or more persistent residents – may hold clues to the rising levels of human allergies and other medical and building-related health problems observed globally. The taxonomic identity of these fungi is crucial in such pursuits. Molecular identification of the built mycobiome is no trivial undertaking, however, given the large number of unidentified, misidentified, and technically compromised fungal sequences in public sequence databases. In addition, the sequence metadata required to make informed taxonomic decisions – such as country and host/substrate of collection – are often lacking even from reference and ex-type sequences. Here we report on a taxonomic annotation workshop (April 10–11, 2017) organized at the James Hutton Institute/University of Aberdeen (UK) to facilitate reproducible studies of the built mycobiome. The 32 participants reviewed public fungal ITS barcode sequences related to the built mycobiome for taxonomic and nomenclatural correctness, technical quality, and metadata availability. A total of 19,508 changes – including 4,783 name changes, 14,121 metadata annotations, and the removal of 99 technically compromised sequences – were implemented in the UNITE database for molecular identification of fungi (https://unite.ut.ee/) and shared with a range of other databases and downstream resources. Among the genera that saw the largest number of changes were Penicillium, Talaromyces, Cladosporium, Acremonium, and Alternaria, all of them of significant importance in both culture-based and culture-independent surveys of the built environment. PMID:29559822

  20. Plasmid Flux in Escherichia coli ST131 Sublineages, Analyzed by Plasmid Constellation Network (PLACNET), a New Method for Plasmid Reconstruction from Whole Genome Sequences

    PubMed Central

    Garcillán-Barcia, M. Pilar; Mora, Azucena; Blanco, Jorge; Coque, Teresa M.; de la Cruz, Fernando

    2014-01-01

    Bacterial whole genome sequence (WGS) methods are rapidly overtaking classical sequence analysis. Many bacterial sequencing projects focus on mobilome changes, since macroevolutionary events, such as the acquisition or loss of mobile genetic elements (mainly plasmids), play essential roles in adaptive evolution. Existing WGS analysis protocols do not sort contigs into plasmids and the main chromosome, which hampers full analysis of plasmid sequences. We developed a method (called plasmid constellation networks, or PLACNET) that identifies, visualizes and analyzes plasmids in WGS projects by creating a network of contig interactions, thus allowing comprehensive plasmid analysis within WGS datasets. The workflow of the method is based on three types of data: assembly information (including scaffold links and coverage), comparison to reference sequences, and plasmid-diagnostic sequence features. The resulting network is pruned by expert analysis to eliminate confounding data and is implemented in a Cytoscape-based graphic representation. To demonstrate PLACNET sensitivity and efficacy, the plasmidome of the Escherichia coli lineage ST131 was analyzed. ST131 is a globally spread clonal group of extraintestinal pathogenic E. coli (ExPEC), comprising different sublineages with the ability to acquire and spread antibiotic resistance and virulence genes via plasmids. Results show that plasmids are in flux throughout the evolution of this lineage, which is wide open to plasmid exchange. MOBF12/IncF plasmids were pervasive, by themselves adding more than 350 protein families to the ST131 pangenome. Nearly 50% of the most frequent γ-proteobacterial plasmid groups were found in our limited sample of ten analyzed ST131 genomes, which represent the main ST131 sublineages. PMID:25522143
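    PLACNET is described above as a network of contig interactions built from scaffold links, reference comparisons and plasmid-diagnostic features, then pruned by expert analysis. The Python sketch below illustrates only the graph idea: contigs joined by a scaffold link, or by hits to the same reference plasmid, are merged into connected components with a union-find structure. The input format and the grouping rule are assumptions for illustration, not PLACNET's implementation; coverage, diagnostic features and expert pruning are omitted.

```python
from collections import defaultdict


def contig_groups(contigs, scaffold_links, reference_hits):
    """Group contigs into candidate replicons (connected components).

    contigs: iterable of contig IDs.
    scaffold_links: iterable of (contig_a, contig_b) pairs from the assembly;
        both IDs must appear in `contigs`.
    reference_hits: dict mapping contig ID -> ID of its best-matching reference
        plasmid (hypothetical input; contigs without a hit are simply omitted).
    Returns a list of sorted contig-ID lists, ordered by their first member.
    """
    parent = {c: c for c in contigs}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Edges from assembly scaffold links.
    for a, b in scaffold_links:
        union(a, b)

    # Edges between contigs that hit the same reference plasmid.
    by_reference = defaultdict(list)
    for contig, ref in reference_hits.items():
        by_reference[ref].append(contig)
    for members in by_reference.values():
        for other in members[1:]:
            union(members[0], other)

    groups = defaultdict(list)
    for c in parent:
        groups[find(c)].append(c)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])
```

    In a real workflow, each resulting component would still need the kind of expert review the record describes before being called a plasmid.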

  1. Plasmid flux in Escherichia coli ST131 sublineages, analyzed by plasmid constellation network (PLACNET), a new method for plasmid reconstruction from whole genome sequences.

    PubMed

    Lanza, Val F; de Toro, María; Garcillán-Barcia, M Pilar; Mora, Azucena; Blanco, Jorge; Coque, Teresa M; de la Cruz, Fernando

    2014-12-01

    Bacterial whole genome sequence (WGS) methods are rapidly overtaking classical sequence analysis. Many bacterial sequencing projects focus on mobilome changes, since macroevolutionary events, such as the acquisition or loss of mobile genetic elements (mainly plasmids), play essential roles in adaptive evolution. Existing WGS analysis protocols do not sort contigs into plasmids and the main chromosome, which hampers full analysis of plasmid sequences. We developed a method (called plasmid constellation networks, or PLACNET) that identifies, visualizes and analyzes plasmids in WGS projects by creating a network of contig interactions, thus allowing comprehensive plasmid analysis within WGS datasets. The workflow of the method is based on three types of data: assembly information (including scaffold links and coverage), comparison to reference sequences, and plasmid-diagnostic sequence features. The resulting network is pruned by expert analysis to eliminate confounding data and is implemented in a Cytoscape-based graphic representation. To demonstrate PLACNET sensitivity and efficacy, the plasmidome of the Escherichia coli lineage ST131 was analyzed. ST131 is a globally spread clonal group of extraintestinal pathogenic E. coli (ExPEC), comprising different sublineages with the ability to acquire and spread antibiotic resistance and virulence genes via plasmids. Results show that plasmids are in flux throughout the evolution of this lineage, which is wide open to plasmid exchange. MOBF12/IncF plasmids were pervasive, by themselves adding more than 350 protein families to the ST131 pangenome. Nearly 50% of the most frequent γ-proteobacterial plasmid groups were found in our limited sample of ten analyzed ST131 genomes, which represent the main ST131 sublineages.

  2. First Report on Circulation of Echinococcus ortleppi in the one Humped Camel (Camelus dromedaries), Sudan

    PubMed Central

    2013-01-01

    Background: The Echinococcus granulosus (EG) complex, the cause of cystic echinococcosis (CE), infects humans and several other animal species worldwide, and the disease is therefore of public health importance. Ten genetic variants (genotypes G1-G10), distinguished on the basis of genetic diversity, are distributed worldwide. The objective of this study was to provide sequence data and a phylogeny of EG isolates recovered from the Sudanese one-humped camel (Camelus dromedaries). Fifty hydatid cyst samples were collected from one-humped camels (Camelus dromedaries) at the Taboul slaughterhouse, central Sudan. DNA was extracted from protoscolices and/or associated germinal layers of hydatid cysts using a commercial kit. The mitochondrial NADH dehydrogenase subunit 1 (NADH1) gene and the cytochrome c oxidase subunit 1 (cox1) gene were used as targets for polymerase chain reaction (PCR) amplification. The PCR products were purified and partially sequenced. The sequences were then analyzed phylogenetically and compared with those of known EG strains circulating globally. Results: The identity of the PCR products was confirmed as NADH1 and cox1 nucleotide sequences using the Basic Local Alignment Search Tool (BLAST) of NCBI (National Center for Biotechnology Information, Bethesda, MD). The phylogenetic analysis showed that 98% (n = 49) of the isolates clustered with Echinococcus canadensis genotype 6 (G6), whereas only one isolate (2%) clustered with Echinococcus ortleppi (G5). Conclusions: This investigation expands the existing sequence data from EG isolates recovered from camels in Sudan. The circulation of the cattle genotype (G5) in the one-humped camel is reported here for the first time. PMID:23800362

  3. First report on circulation of Echinococcus ortleppi in the one humped camel (Camelus dromedaries), Sudan.

    PubMed

    Ahmed, Mohamed E; Eltom, Kamal H; Musa, Nasreen O; Ali, Ibtisam A; Elamin, Fatima M; Grobusch, Martin P; Aradaib, Imadeldin E

    2013-06-25

    The Echinococcus granulosus (EG) complex, the cause of cystic echinococcosis (CE), infects humans and several other animal species worldwide, and the disease is therefore of public health importance. Ten genetic variants (genotypes G1-G10), distinguished on the basis of genetic diversity, are distributed worldwide. The objective of this study was to provide sequence data and a phylogeny of EG isolates recovered from the Sudanese one-humped camel (Camelus dromedaries). Fifty hydatid cyst samples were collected from one-humped camels (Camelus dromedaries) at the Taboul slaughterhouse, central Sudan. DNA was extracted from protoscolices and/or associated germinal layers of hydatid cysts using a commercial kit. The mitochondrial NADH dehydrogenase subunit 1 (NADH1) gene and the cytochrome c oxidase subunit 1 (cox1) gene were used as targets for polymerase chain reaction (PCR) amplification. The PCR products were purified and partially sequenced. The sequences were then analyzed phylogenetically and compared with those of known EG strains circulating globally. The identity of the PCR products was confirmed as NADH1 and cox1 nucleotide sequences using the Basic Local Alignment Search Tool (BLAST) of NCBI (National Center for Biotechnology Information, Bethesda, MD). The phylogenetic analysis showed that 98% (n = 49) of the isolates clustered with Echinococcus canadensis genotype 6 (G6), whereas only one isolate (2%) clustered with Echinococcus ortleppi (G5). This investigation expands the existing sequence data from EG isolates recovered from camels in Sudan. The circulation of the cattle genotype (G5) in the one-humped camel is reported here for the first time.

  4. Genetics of coronary artery disease: discovery, biology and clinical translation

    PubMed Central

    Khera, Amit V.; Kathiresan, Sekar

    2018-01-01

    Coronary artery disease (CAD) is the leading global cause of mortality. Although CAD has long been recognized as heritable, recent advances have started to unravel its genetic architecture. Common variant association studies have linked about 60 genetic loci to coronary risk. Large-scale gene sequencing efforts and functional studies have facilitated a better understanding of causal risk factors, elucidated underlying biology and informed the development of new therapeutics. Moving forward, genetic testing could enable precision medicine approaches by identifying subgroups of patients at increased risk of CAD, or those with a specific driving pathophysiology, in whom a therapeutic or preventive approach would be most useful. PMID:28286336

  5. Synchronization in neural nets

    NASA Technical Reports Server (NTRS)

    Vidal, Jacques J.; Haggerty, John

    1988-01-01

    The paper presents an artificial neural network concept (Synchronizable Oscillator Networks) in which the firing instants of individual neurons, in the form of point processes, constitute the only information transmitted between connected neurons. In the model, neurons fire spontaneously and regularly in the absence of perturbation. When interaction is present, scheduled firings are advanced or delayed by the firings of neighboring neurons. Networks of such neurons become global oscillators that exhibit multiple synchronizing attractors. From arbitrary initial states, energy-minimization learning procedures can make the network converge to oscillatory modes that satisfy multi-dimensional constraints. Such networks can directly represent routing and scheduling problems that consist of ordering sequences of events.
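    The firing-time coupling described above can be illustrated with a small event-driven simulation. This is a swapped-in model, not Vidal and Haggerty's: it uses the Mirollo-Strogatz form of all-to-all pulse coupling, in which a firing only advances the other oscillators (no delays) through a concave state function. The parameters `eps` (pulse strength) and `b` (curvature) are hypothetical.

```python
import math


def simulate_pulse_coupled(phases, eps=0.1, b=3.0, max_events=1000):
    """Event-driven simulation of all-to-all pulse-coupled oscillators.

    Each phase rises at unit rate and the oscillator fires on reaching 1.
    A firing raises every other oscillator's state x = f(phase) by eps, where f
    is concave and increasing (Mirollo-Strogatz form). Oscillators pushed to
    x >= 1 fire with the sender and are absorbed into its group.
    Returns (final phases, number of firing events until all phases agree).
    """
    scale = math.exp(b) - 1.0

    def f(phase):  # phase -> state, concave and increasing on [0, 1]
        return math.log1p(scale * phase) / b

    def f_inv(x):  # state -> phase
        return math.expm1(b * x) / scale

    phases = list(phases)
    for event in range(max_events):
        if max(phases) - min(phases) < 1e-12:
            return phases, event  # fully synchronized
        leader = max(phases)
        dt = 1.0 - leader  # time until the leading oscillator(s) fire
        new_phases = []
        for p in phases:
            if p >= leader - 1e-12:
                new_phases.append(0.0)  # fired: reset phase
                continue
            x = f(p + dt) + eps  # advance by the received pulse
            new_phases.append(0.0 if x >= 1.0 else f_inv(x))
        phases = new_phases
    return phases, max_events
```

    With two oscillators started at phases 0.4 and 0.0, the lead shrinks after each firing until the trailing oscillator is pushed over threshold and absorbed, after which both fire together.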

  6. India's Computational Biology Growth and Challenges.

    PubMed

    Chakraborty, Chiranjib; Bandyopadhyay, Sanghamitra; Agoramoorthy, Govindasamy

    2016-09-01

    India's computational science is growing swiftly, driven by the rapid expansion of internet and information technology services. The bioinformatics sector of India has been transforming rapidly and has gained a competitive position in the global bioinformatics market. Bioinformatics is widely used across India to address a wide range of biological issues. Recently, computational researchers and biologists have been collaborating on projects such as database development, sequence analysis, genomic prospects and algorithm development. In this paper, we present the Indian computational biology scenario, highlighting bioinformatics-related educational activities, manpower development, the internet boom, the service industry, research activities, and conferences and trainings undertaken by the corporate and government sectors. Nonetheless, this new field of science faces many challenges.

  7. Complex and dynamic landscape of RNA polyadenylation revealed by PAS-Seq

    PubMed Central

    Shepard, Peter J.; Choi, Eun-A; Lu, Jente; Flanagan, Lisa A.; Hertel, Klemens J.; Shi, Yongsheng

    2011-01-01

    Alternative polyadenylation (APA) of mRNAs has emerged as an important mechanism for post-transcriptional gene regulation in higher eukaryotes. Although microarrays have recently been used to characterize APA globally, they have a number of serious limitations that prevent comprehensive and highly quantitative analysis. To better characterize APA and its regulation, we have developed a deep sequencing-based method called Poly(A) Site Sequencing (PAS-Seq) for quantitatively profiling RNA polyadenylation at the transcriptome level. PAS-Seq not only accurately and comprehensively identifies poly(A) junctions in mRNAs and noncoding RNAs, but also provides quantitative information on the relative abundance of polyadenylated RNAs. PAS-Seq analyses of human and mouse transcriptomes showed that 40%–50% of all expressed genes produce alternatively polyadenylated mRNAs. Furthermore, our study detected evolutionarily conserved polyadenylation of histone mRNAs and revealed novel features of mitochondrial RNA polyadenylation. Finally, PAS-Seq analyses of mouse embryonic stem (ES) cells, neural stem/progenitor (NSP) cells, and neurons not only identified more poly(A) sites than are recorded in the entire mouse EST database, but also detected significant changes in the global APA profile that lead to lengthening of 3′ untranslated regions (UTRs) in many mRNAs during stem cell differentiation. Together, our PAS-Seq analyses revealed a complex landscape of RNA polyadenylation in mammalian cells and the dynamic regulation of APA during stem cell differentiation. PMID:21343387

  8. Identification of radiation responsive genes and transcriptome profiling via complete RNA sequencing in a stable radioresistant U87 glioblastoma model.

    PubMed

    Doan, Ninh B; Nguyen, Ha S; Alhajala, Hisham S; Jaber, Basem; Al-Gizawiy, Mona M; Ahn, Eun-Young Erin; Mueller, Wade M; Chitambar, Christopher R; Mirza, Shama P; Schmainda, Kathleen M

    2018-05-04

    The absence of major progress in the treatment of glioblastoma (GBM) is partly attributable to our poor understanding of both GBM tumor biology and the acquisition of treatment resistance in recurrent GBMs. Recurrent GBMs are characterized by their resistance to radiation. In this study, we used an established stable U87 radioresistant GBM model and total RNA sequencing to shed light on global mRNA expression changes following irradiation. We identified many genes whose expression was altered in our radioresistant GBM model and that have never before been reported to be associated with the development of radioresistant GBM; these genes warrant concerted further investigation to understand their roles in radioresistance. These genes were enriched in various biological processes such as inflammatory response, cell migration, positive regulation of epithelial to mesenchymal transition, angiogenesis, apoptosis, positive regulation of T-cell migration, positive regulation of macrophage chemotaxis, T-cell antigen processing and presentation, and microglial cell activation involved in immune response. These findings furnish crucial information for elucidating the molecular mechanisms associated with radioresistance in GBM. Therapeutically, given the global alterations of multiple biological pathways observed in irradiated GBM cells, an effective GBM therapy may require a combination of agents targeting multiple implicated pathways in order to substantially improve overall GBM survival.

  9. Incidental and clinically actionable genetic variants in 1005 whole exomes and genomes from Qatar.

    PubMed

    Jain, Abhinav; Gandhi, Shrey; Koshy, Remya; Scaria, Vinod

    2018-03-20

    Incidental findings in genomic data have been studied in great detail in recent years, especially in population-scale data sets. However, little is known about the frequency of such findings in ethnic groups not previously covered by global sequencing studies, specifically those of the Middle East. The availability of whole exome and genome data sets for a highly consanguineous Arab population from Qatar motivated us to explore incidental findings in this population-scale data. The sequence data of 1005 Qatari individuals were systematically analyzed for incidental genetic variants in the 59 genes recommended by the American College of Medical Genetics and Genomics. We identified four genetic variants that were pathogenic or likely pathogenic. These variants occurred in six individuals, suggesting a frequency of 0.59% in the population, much lower than that previously reported for European and African populations. Our analysis identified a variant in the RYR1 gene, associated with malignant hyperthermia, that has a significantly higher frequency in this population than globally. Evaluation of the allele frequencies of these variants suggested enrichment in sub-populations, especially in individuals of Sub-Saharan African ancestry. The present study thus provides information on pathogenicity and frequency that could aid genomic medicine. To the best of our knowledge, this is the first comprehensive analysis of incidental genetic findings in any Arab population, and it suggests ethnic differences in incidental findings.

  10. Strongly-motivated positive affects induce faster responses to local than global information of visual stimuli: an approach using large-size Navon letters.

    PubMed

    Noguchi, Yasuki; Tomoike, Kouta

    2016-01-12

    Recent studies argue that strongly-motivated positive emotions (e.g. desire) narrow the scope of attention. This argument is mainly based on the observation that, while humans normally respond faster to global than to local information in a visual stimulus (global advantage), positive affects eliminated the global advantage by selectively speeding responses to local (but not global) information. In other words, narrowing of attentional scope was evidenced only indirectly, by the elimination of the global advantage (equal processing speed for global and local information). No study has directly shown that strongly-motivated positive affects induce faster responses to local than to global information while excluding a bias toward global information (global advantage) in a baseline (emotionally neutral) condition. In the present study, we addressed this issue by eliminating the global advantage in a baseline (neutral) state. Inducing positive affects in this state resulted in faster responses to local than to global information. Our results provide direct evidence that positive affects of high motivational intensity narrow the scope of attention.

  11. Sequence conservation, HLA-E-Restricted peptide, and best-defined CTL/CD8+ epitopes in gag P24 (capsid) of HIV-1 subtype B

    NASA Astrophysics Data System (ADS)

    Prasetyo, Afiono Agung; Dharmawan, Ruben; Sari, Yulia; Sariyatun, Ratna

    2017-02-01

    Human immunodeficiency virus type 1 (HIV-1) remains a global health problem. Continued study of HIV-1 genetic and immunological profiles is important for finding strategies against the virus. This study aimed to analyse sequence conservation, HLA-E-restricted peptides, and best-defined CTL/CD8+ epitopes in p24 (capsid) of HIV-1 subtype B worldwide. The p24-coding sequences from 3,557 HIV subtype B isolates were aligned using MUSCLE and analysed. Several highly conserved regions (sequence conservation ≥95%) were observed. Two relatively long stretches with 100% conservation were observed at bases 349-356 and 550-557 of p24 (HXB2 numbering). The consensus of all aligned isolates was identical to consensus B in the Los Alamos HIV Database. The HLA-E-restricted peptide at amino acids (aa) 14-22 of HIV-1 p24 (AISPRTLNA) was found in 55.9% (1,987/3,557) of HIV-1 subtype B isolates worldwide. Forty-four best-defined CTL/CD8+ epitopes were observed, of which the B*4801-restricted VKNWMTETL epitope (aa 181-189 of p24) was the most frequent, found in 94.9% of isolates. These results add to the information available on HIV-1 subtype B and may benefit further work aimed at developing diagnostic and therapeutic strategies against the virus.
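    Per-position conservation of the kind reported above can be computed from an alignment as the fraction of sequences sharing the most common character in each column, after which fully conserved stretches are runs of columns at or above a threshold. The Python sketch below assumes equal-length aligned sequences and counts gap characters like any other symbol; its positions are 1-based alignment columns, not HXB2 coordinates, and it is not the authors' analysis.

```python
from collections import Counter


def column_conservation(alignment):
    """Fraction of sequences carrying the most common character in each column."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment)
        top_count = counts.most_common(1)[0][1]
        scores.append(top_count / len(alignment))
    return scores


def conserved_runs(scores, threshold=0.95, min_length=1):
    """Return 1-based inclusive (start, end) runs where every score >= threshold."""
    runs, start = [], None
    for i, s in enumerate(scores + [-1.0]):  # sentinel closes a trailing run
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            if i - start >= min_length:
                runs.append((start + 1, i))
            start = None
    return runs
```

    Setting `threshold=1.0` and a suitable `min_length` would pick out 100%-conserved stretches like those the record reports.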

  12. Reliable transformation system for Microbotryum lychnidis-dioicae informed by genome and transcriptome project.

    PubMed

    Toh, Su San; Treves, David S; Barati, Michelle T; Perlin, Michael H

    2016-10-01

    Microbotryum lychnidis-dioicae is a member of a species complex infecting host plants in the Caryophyllaceae. It is used as a model system in many areas of research, but attempts to make this organism tractable for reverse genetic approaches have not been fruitful. Here, we exploited the recently obtained genome sequence and transcriptome analysis to inform our design of constructs for use in Agrobacterium-mediated transformation techniques currently available for other fungi. Reproducible transformation was demonstrated at the genomic, transcriptional and functional levels. Moreover, these initial proof-of-principle experiments provide evidence that supports the findings from initial global transcriptome analysis regarding expression from the respective promoters under different growth conditions of the fungus. The technique thus provides for the first time the ability to stably introduce transgenes and over-express target M. lychnidis-dioicae genes.

  13. From barcoding single individuals to metabarcoding biological communities: towards an integrative approach to the study of global biodiversity.

    PubMed

    Cristescu, Melania E

    2014-10-01

    DNA-based species identification, known as barcoding, has transformed the traditional approach to biodiversity science. The field is transitioning from barcoding individuals to metabarcoding communities. This revolution involves new sequencing technologies, bioinformatics pipelines, computational infrastructure, and experimental designs. In this dynamic genomics landscape, metabarcoding studies remain insular, and biodiversity estimates depend on the particular methods used. In this opinion article, I discuss the need for a coordinated advancement of DNA-based species identification that integrates taxonomic and barcoding information. Such an approach would facilitate access to almost 3 centuries of taxonomic knowledge and 1 decade of building barcode repositories. Conservation projects are time-sensitive, research funding is becoming restricted, and informed decisions depend on our ability to embrace integrative approaches to biodiversity science. Copyright © 2014 Elsevier Ltd. All rights reserved.

  14. Segmentation and tracking in echocardiographic sequences: active contours guided by optical flow estimates

    NASA Technical Reports Server (NTRS)

    Mikic, I.; Krucinski, S.; Thomas, J. D.

    1998-01-01

    This paper presents a method for segmentation and tracking of cardiac structures in ultrasound image sequences. The developed algorithm is based on the active contour framework. This approach requires the initial contour to be placed close to the desired position in the image, usually an object outline. The best contour shape and position are then calculated on the assumption that, at this configuration, a global energy function associated with the contour attains its minimum. Active contours can be used for tracking by taking the solution from the previous frame as the initial position in the current frame. Such an approach, however, fails for large displacements of the object of interest. This paper presents a technique that incorporates information on pixel velocities (optical flow) into the estimate of the initial contour to enable tracking of fast-moving objects. The algorithm was tested on several ultrasound image sequences, each covering one complete cardiac cycle. The contour successfully tracked the boundaries of the mitral valve leaflets, the aortic root and the endocardial borders of the left ventricle. The algorithm-generated outlines were compared against manual tracings by expert physicians. The automated method produced contours that were within the bounds of intraobserver variability.
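    The flow-guided initialization described above amounts to moving each point of the previous frame's converged contour by the pixel velocity at that point before the active contour is re-minimized. The Python sketch below shows only that initialization step, assuming a precomputed dense flow field indexed as `flow[row][col] -> (dx, dy)` and nearest-pixel sampling; the energy minimization itself and the paper's exact flow estimator are not reproduced.

```python
def propagate_contour(contour, flow, width, height):
    """Shift contour points by the optical flow sampled at the nearest pixel.

    contour: list of (x, y) points from the previous frame's converged contour.
    flow: flow[row][col] -> (dx, dy) pixel velocity between the two frames
          (assumed to be precomputed by some optical-flow method).
    width, height: image size in pixels.
    Returns the initial contour for the current frame, clamped to the image.
    """
    moved = []
    for x, y in contour:
        # Nearest pixel, clamped so points on the border still sample the field.
        col = min(max(int(round(x)), 0), width - 1)
        row = min(max(int(round(y)), 0), height - 1)
        dx, dy = flow[row][col]
        nx = min(max(x + dx, 0.0), width - 1.0)
        ny = min(max(y + dy, 0.0), height - 1.0)
        moved.append((nx, ny))
    return moved
```

    Bilinear interpolation of the flow would be a smoother alternative to nearest-pixel sampling; the simpler choice keeps the sketch short.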

  15. The role of tRNA and ribosome competition in coupling the expression of different mRNAs in Saccharomyces cerevisiae

    PubMed Central

    Chu, Dominique; Barnes, David J.; von der Haar, Tobias

    2011-01-01

    Protein synthesis translates information from messenger RNAs into functional proteomes. Because the resources required by the translational machinery are finite, both the overall protein synthesis activity of a cell and the activity on individual mRNAs are controlled by the allocation of limiting resources. Upon introduction of heterologous sequences into an organism (for example, for bioprocessing or synthetic biology), limiting resources may also become overstretched, negatively affecting both endogenous and heterologous gene expression. In this study, we present a mean-field model of translation in Saccharomyces cerevisiae for the investigation of two particular translational resources, namely ribosomes and aminoacylated tRNAs. We first use comparisons of experiments with heterologous sequences and simulations of the same conditions to calibrate our model, and then analyse the behaviour of the translational system in yeast upon introduction of different types of heterologous sequences. Our main findings are that competition for ribosomes, rather than tRNAs, limits global translation in this organism; that tRNA aminoacylation levels exert at most weak control over translational activity; and that decoding speeds and codon adaptation exert strong control over local (mRNA-specific) translation rates. PMID:21558172

  16. Differential impact of continuous theta-burst stimulation over left and right DLPFC on planning.

    PubMed

    Kaller, Christoph P; Heinze, Katharina; Frenkel, Annekathrein; Läppchen, Claus H; Unterrainer, Josef M; Weiller, Cornelius; Lange, Rüdiger; Rahm, Benjamin

    2013-01-01

    Most neuroimaging studies on planning report bilateral activations of the dorsolateral prefrontal cortex (dlPFC). Recently, these concurrent activations of left and right dlPFC have been shown to double dissociate with different cognitive demands imposed by the planning task: Higher demands on the extraction of task-relevant information led to stronger activation in left dlPFC, whereas higher demands on the integration of interdependent information into a coherent action sequence entailed stronger activation of right dlPFC. Here, we used continuous theta-burst stimulation (cTBS) to investigate the supposed causal structure-function mapping underlying this double dissociation. Two groups of healthy subjects (left-lateralized stimulation, n = 26; right-lateralized stimulation, n = 26) were tested within-subject on a variant of the Tower of London task following either real cTBS over dlPFC or sham stimulation over posterior parietal cortex. Results revealed that, irrespective of specific task demands, cTBS over left and right dlPFC was associated with a global decrease and increase, respectively, in initial planning times compared to sham stimulation. Moreover, no interaction between task demands and stimulation type (real vs. sham) and/or stimulation side (left vs. right hemisphere) was found. Together, and contrary to expectations from previous neuroimaging data, lateralized cTBS did not lead to planning-parameter-specific changes in performance, but instead revealed a global asymmetric pattern of faster versus slower task processing after left versus right cTBS. This global asymmetry in the absence of any task-parameter-specific impact of cTBS suggests that different levels of information processing may span colocalized but independent axes of functional lateralization in the dlPFC. Copyright © 2011 Wiley Periodicals, Inc.

  17. Global rotational motion and displacement estimation of digital image stabilization based on the oblique vectors matching algorithm

    NASA Astrophysics Data System (ADS)

    Yu, Fei; Hui, Mei; Zhao, Yue-jin

    2009-08-01

    An image block matching algorithm based on motion vectors of correlative pixels in the oblique direction is presented for digital image stabilization. Digital image stabilization is a new generation of image stabilization technique that obtains the relative motion among frames of dynamic image sequences by digital image processing. In this method, the matching parameters are calculated from the vectors projected in the oblique direction. Because these vectors contain both transverse and vertical information from the image blocks at the same time, better matching information can be obtained by performing the correlation operation in the oblique direction. An iterative weighted least squares method is then used to eliminate block-matching errors, with weights related to the pixels' rotational angle. The center of rotation and the global motion estimate of the shaking image are obtained by weighted least squares from the estimates of blocks chosen evenly across the image, and the shaking image can then be stabilized using the center of rotation and the global motion estimate. The algorithm can also run in real time by using simulated annealing in the block-matching search. A DSP-based image processing system was used to test the algorithm; its core processor is the TI TMS320C6416, and a CCD camera with a resolution of 720×576 pixels provided the input video signal. Experimental results show that the algorithm runs on the real-time processing system and achieves accurate matching precision.
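    The step above that recovers the center of rotation and the global motion from many block estimates by weighted least squares can be sketched as a weighted 2D rigid fit. The Python below is a generic closed-form weighted Procrustes estimate, not the paper's oblique-vector matching or its angle-dependent weighting; block centers, motion vectors and weights are hypothetical inputs supplied by the caller, and only a single pass (no iterative reweighting) is shown.

```python
import math


def estimate_rotation(points, vectors, weights=None):
    """Weighted least-squares rigid motion (rotation + translation) from block vectors.

    points: block centers (x, y) in the reference frame.
    vectors: measured motion vector (dx, dy) of each block.
    weights: per-block weights (hypothetical; uniform if omitted).
    Returns (theta, (tx, ty), center); center is None for a pure translation.
    The model is q = R(theta) p + t, with t = (I - R) c for a rotation about c.
    """
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    moved = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(points, vectors)]

    def centroid(pts):
        return (sum(w * x for w, (x, _) in zip(weights, pts)) / total,
                sum(w * y for w, (_, y) in zip(weights, pts)) / total)

    (px, py), (qx, qy) = centroid(points), centroid(moved)
    cross = dot = 0.0
    for w, (x, y), (u, v) in zip(weights, points, moved):
        ax, ay, bx, by = x - px, y - py, u - qx, v - qy
        cross += w * (ax * by - ay * bx)  # weighted cross products
        dot += w * (ax * bx + ay * by)    # weighted dot products
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)

    # Translation mapping the rotated source centroid onto the target centroid.
    tx, ty = qx - (c * px - s * py), qy - (s * px + c * py)

    # Solve (I - R) center = t; det(I - R) = 2(1 - cos theta).
    det = 2.0 * (1.0 - c)
    if det < 1e-12:
        return theta, (tx, ty), None
    center = (((1 - c) * tx - s * ty) / det, (s * tx + (1 - c) * ty) / det)
    return theta, (tx, ty), center
```

    Reweighting the blocks by their residuals and refitting would turn this single pass into an iterative weighted scheme of the general kind the record mentions.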

  18. Comparative genomics of citric-acid producing Aspergillus niger ATCC 1015 versus enzyme-producing CBS 513.88

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Grigoriev, Igor V.; Baker, Scott E.; Andersen, Mikael R.

    2011-04-28

    The filamentous fungus Aspergillus niger exhibits great diversity in its phenotype. It is found globally, both as marine and terrestrial strains, produces both organic acids and hydrolytic enzymes in high amounts, and some isolates exhibit pathogenicity. Although the genome of an industrial enzyme-producing A. niger strain (CBS 513.88) has already been sequenced, the versatility and diversity of this species compel additional exploration. We therefore undertook whole genome sequencing of the acidogenic A. niger wild type strain (ATCC 1015) and produced a genome sequence of very high quality. Only 15 gaps are present in the sequence, and half the telomeric regions have been elucidated. Moreover, sequence information from ATCC 1015 was used to improve the genome sequence of CBS 513.88. Chromosome-level comparisons uncovered several genome rearrangements, deletions, a clear case of strain-specific horizontal gene transfer, and identification of 0.8 megabase of novel sequence. Single nucleotide polymorphisms per kilobase (SNPs/kb) between the two strains were found to be exceptionally high (average: 7.8, maximum: 160 SNPs/kb). High variation within the species was confirmed with exo-metabolite profiling and phylogenetics. Detailed lists of alleles were generated, and genotypic differences were observed to accumulate in metabolic pathways essential to acid production and protein synthesis. A transcriptome analysis revealed up-regulation of the electron transport chain, specifically the alternative oxidative pathway, in ATCC 1015, while CBS 513.88 showed significant up-regulation of genes relevant to glucoamylase A production, such as tRNA-synthases and protein transporters.
Our results and datasets from this integrative systems biology analysis resulted in a snapshot of fungal evolution and will support further optimization of cell factories based on filamentous fungi.[Supplemental materials (10 figures, three text documents and 16 tables) have been made available. The whole genome sequence for A. niger ATCC 1015 is available from NBCI under acc. no ACJE00000000. The up-dated sequence for A. niger CBS 513.88 is available from EMBL under acc. no AM269948-AM270415. The sequence data from the phylogeny study has been submitted to NCBI (GU296686-296739). Microarray data from this study is submitted to GEO as series GSE10983. Accession for reviewers is possible through: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi token GSE10983] The dsmM_ANIGERa_coll511030F library and platform information is deposited at GEO under number GPL6758« less

  19. Effects of informed consent for individual genome sequencing on relevant knowledge.

    PubMed

    Kaphingst, K A; Facio, F M; Cheng, M-R; Brooks, S; Eidem, H; Linn, A; Biesecker, B B; Biesecker, L G

    2012-11-01

    Increasing availability of individual genomic information suggests that patients will need knowledge about genome sequencing to make informed decisions, but prior research is limited. In this study, we examined genome sequencing knowledge before and after informed consent among 311 participants enrolled in the ClinSeq™ sequencing study. An exploratory factor analysis of knowledge items yielded two factors (sequencing limitations knowledge; sequencing benefits knowledge). In multivariable analysis, high pre-consent sequencing limitations knowledge scores were significantly related to education [odds ratio (OR): 8.7, 95% confidence interval (CI): 2.45, 31.10 for post-graduate education, and OR: 3.9, 95% CI: 1.05, 14.61 for a college degree, compared with less than a college degree] and race/ethnicity (OR: 2.4, 95% CI: 1.09, 5.38 for non-Hispanic Whites compared with other racial/ethnic groups). Mean scores increased significantly from pre- to post-consent for both the sequencing limitations knowledge subscale (6.9 to 7.7, p < 0.0001) and the sequencing benefits knowledge subscale (7.0 to 7.5, p < 0.0001); the increase in knowledge did not differ by sociodemographic characteristics. This study highlights gaps in genome sequencing knowledge and underscores the need to target educational efforts toward participants with less education or from minority racial/ethnic groups. The informed consent process improved genome sequencing knowledge. Future studies could examine how genome sequencing knowledge influences informed decision making. © 2012 John Wiley & Sons A/S.

  20. Phylogenomics of Brazilian epidemic isolates of Mycobacterium abscessus subsp. bolletii reveals relationships of global outbreak strains

    PubMed Central

    Davidson, Rebecca M.; Hasan, Nabeeh A.; de Moura, Vinicius Calado Nogueira; Duarte, Rafael Silva; Jackson, Mary; Strong, Michael

    2013-01-01

    Rapidly growing, non-tuberculous mycobacteria (NTM) in the Mycobacterium abscessus (MAB) species are emerging pathogens that cause various diseases including skin and respiratory infections. The species has undergone recent taxonomic nomenclature refinement and is currently recognized as two subspecies, M. abscessus subsp. abscessus (MAB-A) and M. abscessus subsp. bolletii (MAB-B). The recently reported outbreaks of MAB-B in surgical patients in Brazil from 2004 to 2009 and in cystic fibrosis patients in the United Kingdom (UK) from 2006 to 2012 underscore the need to investigate the genetic diversity of clinical MAB strains. To this end, we sequenced the genomes of two Brazilian MAB-B epidemic isolates (CRM-0019 and CRM-0020) derived from an outbreak of skin infections in Rio de Janeiro, two unrelated MAB strains from patients with pulmonary infections in the United States (US) (NJH8 and NJH11) and one type MAB-B strain (CCUG 48898), and compared them to 25 publicly available genomes of globally diverse MAB strains. Genome-wide analyses of 27,598 core genome single nucleotide polymorphisms (SNPs) revealed that the two Brazilian-derived CRM strains are nearly indistinguishable from one another and are more closely related to UK outbreak isolates infecting CF patients than to strains from the US, Malaysia or France. Comparative genomic analyses of six closely related outbreak strains revealed geographic-specific large-scale insertion/deletion variation that corresponds to bacteriophage insertions and recombination hotspots. Our study integrates new genome sequence data with existing genomic information to explore the global diversity of infectious M. abscessus isolates and to compare clinically relevant outbreak strains from different continents. PMID:24055961

  1. Whole-genome sequencing and analyses identify high genetic heterogeneity, diversity and endemicity of rotavirus genotype P[6] strains circulating in Africa.

    PubMed

    Nyaga, Martin M; Tan, Yi; Seheri, Mapaseka L; Halpin, Rebecca A; Akopov, Asmik; Stucker, Karla M; Fedorova, Nadia B; Shrivastava, Susmita; Duncan Steele, A; Mwenda, Jason M; Pickett, Brett E; Das, Suman R; Jeffrey Mphahlele, M

    2018-05-18

    Rotavirus A (RVA) exhibits a wide genotype diversity globally. Little is known about the genetic composition of genotype P[6] from Africa. This study investigated possible evolutionary mechanisms leading to genetic diversity of genotype P[6] VP4 sequences. Phylogenetic analyses of 167 P[6] VP4 full-length sequences were conducted, which included six porcine-origin sequences. Of the 167 sequences, 57 were newly acquired through whole genome sequencing as part of this study. The other 110 sequences were all the publicly available global P[6] VP4 full-length sequences downloaded from GenBank. The strength of association between the phenotypic features and the phylogeny was also determined. A number of reassortment events and mixed infections of RVA genotype P[6] strains were observed in this study. Phylogenetic analyses demonstrated the extensive genetic diversity that exists among human P[6] strains, porcine-like strains, and their concomitant clades/subclades, and estimated that the P[6] VP4 gene has a higher substitution rate, with a mean of 1.05E-3 substitutions/site/year. Further, the phylogenetic analyses indicated that genotype P[6] strains were endemic in Africa, characterised by extensive genetic diversity and long-term local evolution of the viruses. This was also supported by phylogeographic clustering and G-genotype clustering of the P[6] strains when Bayesian Tip-association Significance testing (BaTS) was applied, clearly supporting that the viruses evolved locally in Africa rather than through spatial mixing among different regions. Overall, the results demonstrated that multiple mechanisms such as reassortment events, various mutations and possibly interspecies transmission account for the enormous diversity of genotype P[6] strains in Africa. These findings highlight the need for continued global surveillance of rotavirus diversity. Copyright © 2018 Elsevier B.V. All rights reserved.

  2. Inter-rater reliability and aspects of validity of the parent-infant relationship global assessment scale (PIR-GAS)

    PubMed Central

    2013-01-01

    Background The Parent-Infant Relationship Global Assessment Scale (PIR-GAS) signifies a conceptually relevant development in the multi-axial, developmentally sensitive classification system DC:0-3R for preschool children. However, information about the reliability and validity of the PIR-GAS is scarce. A review of the available empirical studies suggests that in research, PIR-GAS ratings can be based on a ten-minute videotaped interaction sequence. The qualifications of raters may also vary widely across studies. Methods To test whether this use of the PIR-GAS still allows a reliable assessment of the parent-infant relationship, our study compared PIR-GAS ratings based on a full-information procedure across multiple settings with ratings of a ten-minute video made by two medical doctoral candidates. For each mother-child dyad at a family day hospital (N = 48), we obtained two video ratings and one full-information rating at admission to therapy and at discharge. This pre-post design allowed us to replicate our findings across the two measurement points. We focused on the inter-rater reliability between the video coders, as well as between the video and full-information procedures, including mean differences and correlations between the raters. Additionally, we examined aspects of the validity of video and full-information ratings based on their correlation with measures of child and maternal psychopathology. Results Our results showed that ten-minute video and full-information PIR-GAS ratings were not interchangeable. Most results at admission could be replicated with the data obtained at discharge. We concluded that a higher degree of standardization of the assessment procedure should increase the reliability of the PIR-GAS, and a more thorough theoretical foundation of the manual should increase its validity. PMID:23705962

  3. Children inhibit global information when the forest is dense and local information when the forest is sparse.

    PubMed

    Krakowski, Claire-Sara; Borst, Grégoire; Vidal, Julie; Houdé, Olivier; Poirel, Nicolas

    2018-09-01

    Visual environments are composed of global shapes and local details that compete for attentional resources. In adults, the global level is processed more rapidly than the local level, and global information must be inhibited in order to process local information when the local information and global information are in conflict. Compared with adults, children present less of a bias toward global visual information and appear to be more sensitive to the density of local elements that constitute the global level. The current study aimed, for the first time, to investigate the key role of inhibition during global/local processing in children. By including two different conditions of global saliency during a negative priming procedure, the results showed that when the global level was salient (dense hierarchical figures), 7-year-old children and adults needed to inhibit the global level to process the local information. However, when the global level was less salient (sparse hierarchical figures), only children needed to inhibit the local level to process the global information. These results confirm a weaker global bias and the greater impact of saliency in children than in adults. Moreover, the results indicate that, regardless of age, inhibition of the most salient hierarchical level is systematically required to select the less salient but more relevant level. These findings have important implications for future research in this area. Copyright © 2018 Elsevier Inc. All rights reserved.

  4. Inhibition in motor imagery: a novel action mode switching paradigm.

    PubMed

    Rieger, Martina; Dahm, Stephan F; Koch, Iring

    2017-04-01

    Motor imagery requires that actual movements are prevented (i.e., inhibited) from execution. To investigate at what level inhibition takes place in motor imagery, we developed a novel action mode switching paradigm. Participants imagined (indicating only start and end) and executed movements from start buttons to target buttons, and we analyzed trial sequence effects. Trial sequences depended on current action mode (imagination or execution), previous action mode (pure blocks/same mode, mixed blocks/same mode, or mixed blocks/other mode), and movement sequence (action repetition, hand repetition, or hand alternation). Results provided evidence for global inhibition (indicated by switch benefits in execution-imagination (E-I)-sequences in comparison to I-I-sequences), effector-specific inhibition (indicated by hand repetition costs after an imagination trial), and target inhibition (indicated by target repetition benefits in I-I-sequences). No evidence for subthreshold motor activation or action-specific inhibition (inhibition of the movement of an effector to a specific target) was obtained. Two of the three observed mechanisms (global inhibition and effector-specific inhibition) are active inhibition mechanisms. In conclusion, motor imagery is not simply a weaker form of execution, as is often implied in views focusing on similarities between imagination and execution.

  5. Thai Youths and Global Warming: Media Information, Awareness, and Lifestyle Activities

    ERIC Educational Resources Information Center

    Chokriensukchai, Kanchana; Tamang, Ritendra

    2010-01-01

    This study examines the exposure of Thai youths to media information on global warming, the relationship between exposure to global warming information and awareness of global warming, and the relationship between that awareness and lifestyle activities that contribute to global warming. A focus group of eight Thai youths provided information that…

  6. Seismic sequence stratigraphy of Miocene deposits related to eustatic, tectonic and climatic events, Cap Bon Peninsula, northeastern Tunisia

    NASA Astrophysics Data System (ADS)

    Gharsalli, Ramzi; Zouaghi, Taher; Soussi, Mohamed; Chebbi, Riadh; Khomsi, Sami; Bédir, Mourad

    2013-09-01

    The Cap Bon Peninsula, in northeastern Tunisia, is located in the Maghrebian Alpine foreland, in the north of the Pelagian block. Because of its paleoposition at the edge of the southern Tethyan margin during the Cenozoic, this peninsula constitutes a geological entity that fossilized the eustatic, tectonic and climatic interactions. The surface and subsurface study carried out in the Cap Bon onshore area and the surrounding offshore of Hammamet concerns the Miocene deposits of the Langhian-to-Messinian interval. Considering the basin and platform positions, sequence and seismic stratigraphy studies were conducted to identify seven third-order seismic sequences in the subsurface (SM1-SM7), six depositional sequences in the Zinnia-1 petroleum well (SDM1-SDM6), and five depositional sequences in the El Oudiane section of Jebel Abderrahmane (SDM1-SDM5). Each sequence shows a succession of high-frequency systems tracts and parasequences. These sequences are separated by remarkable sequence boundaries and maximum flooding surfaces (SB and MFS) that have been correlated with the eustatic cycles and supercycles of the Global Sea Level Chart of Haq et al. (1987). The sequences have also been correlated with the Sequence Chronostratigraphic Chart of Hardenbol et al. (1998) for European basins, which reveals some major differences in the number and size of sequences. The major discontinuities that bound the sequences resulted from the interplay between tectonic and climatic phenomena. It therefore appears judicious to relate these chronological surfaces to eustatic and/or local tectonic activity and to global eustatic and climatic controls.

  7. 77 FR 18266 - Meeting of the Department of Justice Global Justice Information Sharing Initiative Federal...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-03-27

    ... Department of Justice Global Justice Information Sharing Initiative Federal Advisory Committee AGENCY: Office... meeting of the Department of Justice (DOJ) Global Justice Information Sharing Initiative (Global) Federal Advisory Committee (GAC) to discuss the Global Initiative, as described at www.it.ojp.gov/global . DATES...

  8. Discovery of genes related to insecticide resistance in Bactrocera dorsalis by functional genomic analysis of a de novo assembled transcriptome.

    PubMed

    Hsu, Ju-Chun; Chien, Ting-Ying; Hu, Chia-Cheng; Chen, Mei-Ju May; Wu, Wen-Jer; Feng, Hai-Tung; Haymer, David S; Chen, Chien-Yu

    2012-01-01

    Insecticide resistance has recently become a critical concern for control of many insect pest species. Genome sequencing and global quantification of gene expression through analysis of the transcriptome can provide useful information relevant to this challenging problem. The oriental fruit fly, Bactrocera dorsalis, is one of the world's most destructive agricultural pests, and recently it has been used as a target for studies of genetic mechanisms related to insecticide resistance. However, prior to this study, the molecular data available for this species were largely limited to genes identified through homology. To provide a broader pool of gene sequences of potential interest with regard to insecticide resistance, this study uses whole transcriptome analysis developed through de novo assembly of short reads generated by next-generation sequencing (NGS). The transcriptome of B. dorsalis was initially constructed using Illumina's Solexa sequencing technology. Qualified reads were assembled into contigs and potential splicing variants (isotigs). A total of 29,067 isotigs have putative homologues in the non-redundant (nr) protein database from NCBI, and 11,073 of these correspond to distinct D. melanogaster proteins in the RefSeq database. Approximately 5,546 isotigs contain coding sequences that are at least 80% complete and appear to represent B. dorsalis genes. We observed a strong correlation between the completeness of the assembled sequences and the expression intensity of the transcripts. The assembled sequences were also used to identify large numbers of genes potentially belonging to families related to insecticide resistance. A total of 90 P450-, 42 GST- and 37 COE-related genes, representing three major enzyme families involved in insecticide metabolism and resistance, were identified. In addition, 36 isotigs were discovered to contain target site sequences related to four classes of resistance genes. Identified sequence motifs were also analyzed to characterize putative polypeptide translational products and associate them with specific genes and protein functions.

  9. Massively parallel sequencing of 32 forensic markers using the Precision ID GlobalFiler™ NGS STR Panel and the Ion PGM™ System.

    PubMed

    Wang, Zheng; Zhou, Di; Wang, Hui; Jia, Zhenjun; Liu, Jing; Qian, Xiaoqin; Li, Chengtao; Hou, Yiping

    2017-11-01

    Massively parallel sequencing (MPS) technologies have proved capable of sequencing the majority of the key forensic STR markers. With MPS, not only repeat-length size but also sequence variation can be detected. Recently, Thermo Fisher Scientific designed an advanced MPS 32-plex panel, the Precision ID GlobalFiler™ NGS STR Panel, whose primer set has been designed specifically for MPS technologies and whose data analysis is supported by a new version of the HID STR Genotyper Plugin (V4.0). In this study, a series of experiments evaluating concordance, reliability, sensitivity of detection, mixture analysis, and the ability to analyze case-type and challenged samples was conducted. In addition, 106 unrelated Han individuals were sequenced to perform genetic analyses of allelic diversity. As expected, MPS detected broader allele variation and achieved a higher power of discrimination and exclusion rate. MPS results were concordant with current capillary electrophoresis methods, and complete single-source profiles could be obtained consistently from as little as 100 pg of input DNA. Moreover, this MPS panel could be applied to case-type samples, and partial STR genotypes of the minor contributor could be detected in mixtures of up to 19:1. These results indicate that the Precision ID GlobalFiler™ NGS STR Panel is reliable, robust and reproducible and has the potential to be used as a tool for human forensics. Copyright © 2017 Elsevier B.V. All rights reserved.

  10. SDSS-IV MaNGA: Spatially Resolved Star Formation Main Sequence and LI(N)ER Sequence

    NASA Astrophysics Data System (ADS)

    Hsieh, B. C.; Lin, Lihwai; Lin, J. H.; Pan, H. A.; Hsu, C. H.; Sánchez, S. F.; Cano-Díaz, M.; Zhang, K.; Yan, R.; Barrera-Ballesteros, J. K.; Boquien, M.; Riffel, R.; Brownstein, J.; Cruz-González, I.; Hagen, A.; Ibarra, H.; Pan, K.; Bizyaev, D.; Oravetz, D.; Simmons, A.

    2017-12-01

    We present our study on the spatially resolved Hα and $M_*$ relation for 536 star-forming and 424 quiescent galaxies taken from the MaNGA survey. We show that the star formation rate surface density ($\Sigma_{\mathrm{SFR}}$), derived based on the Hα emissions, is strongly correlated with the $M_*$ surface density ($\Sigma_*$) on kiloparsec scales for star-forming galaxies and can be directly connected to the global star-forming sequence. This suggests that the global main sequence may be a consequence of a more fundamental relation on small scales. On the other hand, our result suggests that ∼20% of quiescent galaxies in our sample still have star formation activities in the outer region with lower specific star formation rate (SSFR) than typical star-forming galaxies. Meanwhile, we also find a tight correlation between $\Sigma_{\mathrm{H}\alpha}$ and $\Sigma_*$ for LI(N)ER regions, named the resolved “LI(N)ER” sequence, in quiescent galaxies, which is consistent with the scenario that LI(N)ER emissions are primarily powered by the hot, evolved stars as suggested in the literature.

  11. Super Normal Vector for Human Activity Recognition with Depth Cameras.

    PubMed

    Yang, Xiaodong; Tian, YingLi

    2017-05-01

    The advent of cost-effective, easy-to-operate depth cameras has facilitated a variety of visual recognition tasks, including human activity recognition. This paper presents a novel framework for recognizing human activities from video sequences captured by depth cameras. We extend the surface normal to a polynormal by assembling local neighboring hypersurface normals from a depth sequence to jointly characterize local motion and shape information. We then propose a general scheme of super normal vector (SNV) to aggregate the low-level polynormals into a discriminative representation, which can be viewed as a simplified version of the Fisher kernel representation. To globally capture the spatial layout and temporal order, an adaptive spatio-temporal pyramid is introduced to subdivide a depth video into a set of space-time cells. In extensive experiments, the proposed approach achieves superior performance to state-of-the-art methods on four public benchmark datasets: MSRAction3D, MSRDailyActivity3D, MSRGesture3D, and MSRActionPairs3D.

  12. A compendium of multi-omic sequence information from the Saanich Inlet water column

    DOE PAGES

    Hawley, Alyse K.; Torres-Beltran, Monica; Zaikova, Elena; ...

    2017-10-31

    Microbial communities play vital roles in earth’s geochemical cycles. Within marine oxygen minimum zones (OMZs), gradients of oxygen, nitrate and sulfide create redox gradients that drive biogeochemical cycling of carbon, nitrogen and sulphur. Climate-change-induced expansion and intensification of OMZs and associated biogeochemical activities have significant implications for greenhouse gas production, namely nitrous oxide and methane. Next generation sequencing technologies have enabled observations of changes in microbial community structure and expression of RNA and protein along these redox gradients within OMZs. Here, we present a multi-omic time series dataset from Saanich Inlet spanning six years, including high spatial resolution small subunit ribosomal RNA tags, metagenomes, metatranscriptomes, and metaproteomes. This compendium provides paired multi-omic datasets over multiple time points, providing a basis for exploring shifts in microbial community interactions and regulation of metabolic activities both along redox gradients and over time, with implications for global climate models.

  13. A compendium of multi-omic sequence information from the Saanich Inlet water column

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hawley, Alyse K.; Torres-Beltran, Monica; Zaikova, Elena

    Microbial communities play vital roles in earth’s geochemical cycles. Within marine oxygen minimum zones (OMZs), gradients of oxygen, nitrate and sulfide create redox gradients that drive biogeochemical cycling of carbon, nitrogen and sulphur. Climate-change-induced expansion and intensification of OMZs and associated biogeochemical activities have significant implications for greenhouse gas production, namely nitrous oxide and methane. Next generation sequencing technologies have enabled observations of changes in microbial community structure and expression of RNA and protein along these redox gradients within OMZs. Here, we present a multi-omic time series dataset from Saanich Inlet spanning six years, including high spatial resolution small subunit ribosomal RNA tags, metagenomes, metatranscriptomes, and metaproteomes. This compendium provides paired multi-omic datasets over multiple time points, providing a basis for exploring shifts in microbial community interactions and regulation of metabolic activities both along redox gradients and over time, with implications for global climate models.

  14. WheatGenome.info: an integrated database and portal for wheat genome information.

    PubMed

    Lai, Kaitao; Berkman, Paul J; Lorenc, Michal Tadeusz; Duran, Chris; Smits, Lars; Manoli, Sahana; Stiller, Jiri; Edwards, David

    2012-02-01

    Bread wheat (Triticum aestivum) is one of the most important crop plants, globally providing staple food for a large proportion of the human population. However, improvement of this crop has been limited due to its large and complex genome. Advances in genomics are supporting wheat crop improvement. We provide a variety of web-based systems hosting wheat genome and genomic data to support wheat research and crop improvement. WheatGenome.info is an integrated database resource which includes multiple web-based applications. These include a GBrowse2-based wheat genome viewer with a BLAST search portal, TAGdb for searching wheat second-generation genome sequence data, wheat autoSNPdb, links to wheat genetic maps using CMap and CMap3D, and a wheat genome Wiki to allow interaction between diverse wheat genome sequencing activities. This system includes links to a variety of wheat genome resources hosted at other research organizations. This integrated database aims to accelerate wheat genome research and is freely accessible via the web interface at http://www.wheatgenome.info/.

  15. Recognizing Chinese characters in digital ink from non-native language writers using hierarchical models

    NASA Astrophysics Data System (ADS)

    Bai, Hao; Zhang, Xi-wen

    2017-06-01

    When Chinese is learned as a second language, its characters are taught step by step, from strokes to radicals and components and then to their complex relations. Chinese characters written in digital ink by non-native writers are often seriously deformed, so global recognition approaches perform poorly. A progressive bottom-up approach based on hierarchical models is therefore presented. The hierarchical information includes strokes and hierarchical components, and each Chinese character is modeled as a hierarchical tree. The strokes of each ink character are classified with Hidden Markov Models and concatenated into a stroke symbol sequence. The structure of the components in the ink character is then extracted. Based on the extraction result and the stroke symbol sequence, candidate characters are traversed and scored. Finally, the candidate recognition results are listed in descending order of score. The method is validated on 19815 handwritten Chinese character samples written by foreign students.
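
    The per-stroke HMM classification step can be sketched as follows, as a minimal illustration rather than the authors' implementation. The abstract does not specify the HMM topology, the features or the stroke inventory, so this sketch assumes a hypothetical 4-direction chain code as the observation alphabet, hand-set two-state left-to-right models for three stroke types, and classification by the highest forward-algorithm likelihood; the model names and parameters are illustrative only.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM) for a discrete-output HMM."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][obs[t]]
                     for s in range(n)]
        total = sum(alpha)
        log_p += math.log(total)  # accumulate the per-step scaling factors
        alpha = [a / total for a in alpha]
    return log_p


def classify_stroke(obs, models):
    """Return the name of the stroke model with the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))


# Hypothetical 4-direction chain code: 0=right, 1=down, 2=left, 3=up.
LEFT_TO_RIGHT = [[0.8, 0.2], [0.0, 1.0]]  # two-state left-to-right topology
models = {
    "horizontal": ([1.0, 0.0], LEFT_TO_RIGHT,
                   [[0.85, 0.05, 0.05, 0.05], [0.85, 0.05, 0.05, 0.05]]),
    "vertical": ([1.0, 0.0], LEFT_TO_RIGHT,
                 [[0.05, 0.85, 0.05, 0.05], [0.05, 0.85, 0.05, 0.05]]),
    "vertical_hook": ([1.0, 0.0], LEFT_TO_RIGHT,
                      [[0.05, 0.85, 0.05, 0.05], [0.05, 0.05, 0.85, 0.05]]),
}

# Each stroke is classified separately; the labels form the stroke symbol sequence.
strokes = [[0, 0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 2, 2]]
symbol_sequence = [classify_stroke(s, models) for s in strokes]
print(symbol_sequence)  # ['horizontal', 'vertical', 'vertical_hook']
```

    Rescaling the forward variables at every step keeps long strokes from underflowing to zero; the later component-extraction and candidate-scoring stages described in the abstract are not modelled here.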

  16. Examination of Triacylglycerol Biosynthetic Pathways via De Novo Transcriptomic and Proteomic Analyses in an Unsequenced Microalga

    PubMed Central

    Guarnieri, Michael T.; Nag, Ambarish; Smolinski, Sharon L.; Darzins, Al; Seibert, Michael; Pienkos, Philip T.

    2011-01-01

    Biofuels derived from algal lipids represent an opportunity to dramatically impact the global energy demand for transportation fuels. Systems biology analyses of oleaginous algae could greatly accelerate the commercialization of algal-derived biofuels by elucidating the key components involved in lipid productivity and leading to the initiation of hypothesis-driven strain-improvement strategies. However, higher-level systems biology analyses, such as transcriptomics and proteomics, are highly dependent upon available genomic sequence data, and the lack of these data has hindered the pursuit of such analyses for many oleaginous microalgae. In order to examine the triacylglycerol biosynthetic pathway in the unsequenced oleaginous microalga, Chlorella vulgaris, we have established a strategy with which to bypass the necessity for genomic sequence information by using the transcriptome as a guide. Our results indicate an upregulation of both fatty acid and triacylglycerol biosynthetic machinery under oil-accumulating conditions, and demonstrate the utility of a de novo assembled transcriptome as a search model for proteomic analysis of an unsequenced microalga. PMID:22043295

  17. Phylogenetic evidence for multiple intertypic recombinations in enterovirus B81 strains isolated in Tibet, China

    PubMed Central

    Hu, Lan; Zhang, Yong; Hong, Mei; Zhu, Shuangli; Yan, Dongmei; Wang, Dongyan; Li, Xiaolei; Zhu, Zhen; Tsewang; Xu, Wenbo

    2014-01-01

    Enterovirus B81 (EV-B81) is a newly identified serotype within the species enterovirus B (EV-B). To date, only eight nucleotide sequences of EV-B81 have been published, and only one full-length genome sequence (the prototype strain) has been made available in the GenBank database. Here, we report the full-length genome sequences of two EV-B81 strains isolated in the Tibet Autonomous Region of China during acute flaccid paralysis surveillance activities, together with an antibody seroprevalence study conducted in two prefectures of Tibet. Sequence comparison and phylogenetic dendrogram analysis revealed high variability among the global EV-B81 strains and frequent intertypic recombination in the non-structural protein region of EV-B serotypes, suggesting high genetic diversity of EV-B81. However, low positive rates and low titers of neutralizing antibodies against EV-B81 were detected: nearly 68% of children under the age of five had no neutralizing antibodies against EV-B81. Hence, the extent of transmission and the exposure of the population to this EV type are very limited. Although little is known about the biological and pathogenic properties of EV-B81, because research in this field has been scarce owing to the limited number of isolates, our study provides basic information for further studies of EV-B81. PMID:25112835

  18. Comparison of Campylobacter jejuni isolates from human, food, veterinary and environmental sources in Iceland using PFGE, MLST and fla-SVR sequencing.

    PubMed

    Magnússon, S H; Guðmundsdóttir, S; Reynisson, E; Rúnarsson, A R; Harðardóttir, H; Gunnarson, E; Georgsson, F; Reiersen, J; Marteinsson, V Th

    2011-10-01

    Campylobacter jejuni isolates from various sources in Iceland were genotyped with the aim of assessing the genetic diversity, population structure, source distribution and campylobacter transmission routes to humans. A total of 584 Campylobacter isolates were collected from clinical cases, food, animals and the environment in Iceland in 1999-2002, during a national Campylobacter epidemic. All isolates were characterized by pulsed-field gel electrophoresis (PFGE), and a selected subset of 52 isolates representing the diversity of the identified PFGE types was further genotyped using multilocus sequence typing (MLST) and fla-SVR sequencing to gain better insight into the population structure. The results show substantial diversity within the Icelandic Campylobacter population. The majority of human Campylobacter infections originated from domestic chicken and cattle isolates. MLST showed the isolates to be distributed among previously reported and common sequence type complexes in the MLST database. Genotyping of Campylobacter from various sources has not previously been reported from Iceland, and the results of the study gave valuable insight into the population structure of Camp. jejuni in Iceland, source distribution and transmission routes to humans. The geographical isolation of Iceland in the North Atlantic provides new information on Campylobacter population dynamics on a global scale. Journal of Applied Microbiology © 2011 The Society for Applied Microbiology No claim to Icelandic Government works.

  19. Genome Sequence and Transcriptome Analyses of Chrysochromulina tobin: Metabolic Tools for Enhanced Algal Fitness in the Prominent Order Prymnesiales (Haptophyceae)

    PubMed Central

    Hovde, Blake T.; Deodato, Chloe R.; Hunsperger, Heather M.; Ryken, Scott A.; Yost, Will; Jha, Ramesh K.; Patterson, Johnathan; Monnat, Raymond J.; Barlow, Steven B.; Starkenburg, Shawn R.; Cattolico, Rose Ann

    2015-01-01

    Haptophytes are recognized as seminal players in aquatic ecosystem function. These algae are important in global carbon sequestration, form destructive harmful blooms, and given their rich fatty acid content, serve as a highly nutritive food source to a broad range of eco-cohorts. Haptophyte dominance in both fresh and marine waters is supported by the mixotrophic nature of many taxa. Despite their importance, the nuclear genome sequence of only one haptophyte, Emiliania huxleyi (Isochrysidales), is available. Here we report the draft genome sequence of Chrysochromulina tobin (Prymnesiales), and transcriptome data collected at seven time points over a 24-hour light/dark cycle. The nuclear genome of C. tobin is small (59 Mb), compact (∼40% of the genome is protein coding) and encodes approximately 16,777 genes. Genes important to fatty acid synthesis, modification, and catabolism show distinct patterns of expression when monitored over the circadian photoperiod. The C. tobin genome harbors the first hybrid polyketide synthase/non-ribosomal peptide synthase gene complex reported for an algal species, and encodes potential anti-microbial peptides and proteins involved in multidrug and toxic compound extrusion. A new haptophyte xanthorhodopsin was also identified, together with two “red” RuBisCO activases that are shared across many algal lineages. The Chrysochromulina tobin genome sequence provides new information on the evolutionary history, ecology and economic importance of haptophytes. PMID:26397803

  20. From psychological need satisfaction to intentional behavior: testing a motivational sequence in two behavioral contexts.

    PubMed

    Hagger, Martin S; Chatzisarantis, Nikos L D; Harris, Jemma

    2006-02-01

    The present study tested a motivational sequence in which global-level psychological need satisfaction from self-determination theory influenced intentions and behavior directly and indirectly through contextual-level motivation and situational-level decision-making constructs from the theory of planned behavior. Two samples of university students (N = 511) completed measures of global-level psychological need satisfaction, contextual-level autonomous motivation, and situational-level attitudes, subjective norms, perceived behavioral control, intentions, and behavior in two behavioral contexts: exercise and dieting. A structural equation model supported the proposed sequence in both samples. The indirect effect was present for exercise behavior, whereas both direct and indirect effects were found for dieting behavior. Findings independently supported the component theories and provided a comprehensive integrated explanation of volitional behavior.

  1. Low Diversity Cryptococcus neoformans Variety grubii Multilocus Sequence Types from Thailand Are Consistent with an Ancestral African Origin

    PubMed Central

    Simwami, Sitali P.; Khayhan, Kantarawee; Henk, Daniel A.; Aanensen, David M.; Boekhout, Teun; Hagen, Ferry; Brouwer, Annemarie E.; Harrison, Thomas S.; Donnelly, Christl A.; Fisher, Matthew C.

    2011-01-01

    The global burden of HIV-associated cryptococcal meningitis is estimated at nearly one million cases per year, causing up to a third of all AIDS-related deaths. Molecular epidemiology constitutes the main methodology for understanding the factors underpinning the emergence of this understudied, yet increasingly important, group of pathogenic fungi. Cryptococcus species are notable in the degree to which virulence differs amongst lineages, and highly virulent emerging lineages are changing patterns of human disease both temporally and spatially. Cryptococcus neoformans variety grubii (Cng, serotype A) constitutes the most ubiquitous cause of cryptococcal meningitis worldwide; however, patterns of molecular diversity are understudied across some regions experiencing significant burdens of disease. We compared 183 clinical and environmental isolates of Cng from one such region, Thailand, Southeast Asia, against a global MLST database of 77 Cng isolates. Population genetic analyses showed that Thailand isolates from 11 provinces were highly homogeneous, consisting of the same genetic background (globally known as VNI) and exhibiting only ten nearly identical sequence types (STs), with three (STs 44, 45 and 46) dominating our sample. This population contains significantly less diversity than the global population of Cng, specifically that of Africa. Genetic diversity in Cng was significantly subdivided at the continental level, with nearly half (47%) of the global STs unique to a genetically diverse and recombining population in Botswana. These patterns of diversity, when combined with evidence from haplotypic networks and coalescent analyses of global populations, are highly suggestive of an expansion of the Cng VNI clade out of Africa, leading to a limited number of genotypes founding the Asian populations.
Divergence time testing estimates the time to the most recent common ancestor of the African and Asian populations to be 6,920 years ago (95% HPD 122.96 - 27,177.76). Further high-density sampling of global Cng STs is now necessary to resolve the temporal sequence underlying the global emergence of this human pathogen. PMID:21573144

  2. Low diversity Cryptococcus neoformans variety grubii multilocus sequence types from Thailand are consistent with an ancestral African origin.

    PubMed

    Simwami, Sitali P; Khayhan, Kantarawee; Henk, Daniel A; Aanensen, David M; Boekhout, Teun; Hagen, Ferry; Brouwer, Annemarie E; Harrison, Thomas S; Donnelly, Christl A; Fisher, Matthew C

    2011-04-01

    The global burden of HIV-associated cryptococcal meningitis is estimated at nearly one million cases per year, causing up to a third of all AIDS-related deaths. Molecular epidemiology constitutes the main methodology for understanding the factors underpinning the emergence of this understudied, yet increasingly important, group of pathogenic fungi. Cryptococcus species are notable in the degree to which virulence differs amongst lineages, and highly virulent emerging lineages are changing patterns of human disease both temporally and spatially. Cryptococcus neoformans variety grubii (Cng, serotype A) constitutes the most ubiquitous cause of cryptococcal meningitis worldwide; however, patterns of molecular diversity are understudied across some regions experiencing significant burdens of disease. We compared 183 clinical and environmental isolates of Cng from one such region, Thailand, Southeast Asia, against a global MLST database of 77 Cng isolates. Population genetic analyses showed that Thailand isolates from 11 provinces were highly homogeneous, consisting of the same genetic background (globally known as VNI) and exhibiting only ten nearly identical sequence types (STs), with three (STs 44, 45 and 46) dominating our sample. This population contains significantly less diversity than the global population of Cng, specifically that of Africa. Genetic diversity in Cng was significantly subdivided at the continental level, with nearly half (47%) of the global STs unique to a genetically diverse and recombining population in Botswana. These patterns of diversity, when combined with evidence from haplotypic networks and coalescent analyses of global populations, are highly suggestive of an expansion of the Cng VNI clade out of Africa, leading to a limited number of genotypes founding the Asian populations.
Divergence time testing estimates the time to the most recent common ancestor of the African and Asian populations to be 6,920 years ago (95% HPD 122.96 - 27,177.76). Further high-density sampling of global Cng STs is now necessary to resolve the temporal sequence underlying the global emergence of this human pathogen.

  3. Sequence stratigraphy as a scientific enterprise: the evolution and persistence of conflicting paradigms

    NASA Astrophysics Data System (ADS)

    Miall, Andrew D.; Miall, Charlene E.

    2001-08-01

    In the 1970s, seismic stratigraphy represented a new paradigm in geological thought. The development of new techniques for analyzing seismic-reflection data constituted a "crisis," as conceptualized by T.S. Kuhn, and stimulated a revolution in stratigraphy. We analyze here a specific subset of the new ideas, those pertaining to the concept of global-eustasy and the global cycle chart published by Vail et al. [Vail, P.R., Mitchum, R.M., Jr., Todd, R.G., Widmier, J.M., Thompson, S., III, Sangree, J.B., Bubb, J.N., Hatlelid, W.G., 1977. Seismic stratigraphy and global changes of sea-level. In: Payton, C.E. (Ed.), Seismic Stratigraphy—Applications to Hydrocarbon Exploration, Am. Assoc. Pet. Geol. Mem. 26, pp. 49-212.] The global-eustasy model posed two challenges to the "normal science" of stratigraphy then underway: (1) that sequence stratigraphy, as exemplified by the global cycle chart, constitutes a superior standard of geologic time to that assembled from conventional chronostratigraphic evidence, and (2) that stratigraphic processes are dominated by the effects of eustasy, to the exclusion of other allogenic mechanisms, including tectonism. While many stratigraphers now doubt the universal validity of the global-eustasy model, a group of sequence researchers led by Vail still adheres to it, in what we term the global-eustasy paradigm, and the two conceptual approaches have evolved into two conflicting paradigms. Those who assert that there are multiple processes generating stratigraphic sequences (possibly including eustatic processes) are adherents of what we term the complexity paradigm. Followers of this paradigm argue that tests of the global cycle chart amount to little more than circular reasoning. A new body of work documenting the European sequence record was published in 1998 by de Graciansky et al. These workers largely follow the global-eustasy paradigm.
Citation and textual analysis of this work indicates that they have not responded to any of the scientific problems identified by the opposing group. These researchers have developed their own descriptive and interpretive language that is largely self-referential. Through the use of philosophical and sociological assumptions about the nature of human activity, and in particular the work of Thomas Kuhn, we have attempted to illustrate (1) how the preconceptions of geologists shape their observations in nature; (2) how the working environment can contribute to the consensus that develops around a theoretical approach with a concomitant disregard for anomalous data that may arise; (3) how a theoretical argument can be accepted by the geological community in the absence of "proofs" such as documentation and primary data; (4) how the definition of a situation and the use or non-use of geological language "texts" can direct geological interpretive processes in one direction or another; and (5) how citation patterns and clusters of interrelated "invisible colleges" of geologists can extend or thwart the advancement of geological knowledge.

  4. The Global Reciprocal Reprogramming between Mycobacteriophage SWU1 and Mycobacterium Reveals the Molecular Strategy of Subversion and Promotion of Phage Infection

    PubMed Central

    Fan, Xiangyu; Duan, Xiangke; Tong, Yan; Huang, Qinqin; Zhou, Mingliang; Wang, Huan; Zeng, Lanying; Young, Ry F.; Xie, Jianping

    2016-01-01

    Bacteriophages are the viruses of bacteria and have contributed extensively to our understanding of life and modern biology. Phage-mediated bacterial growth inhibition represents an immense untapped source of novel antimicrobials. Insights into the interaction between mycobacteriophages and their Mycobacterium host will inform better use of mycobacteriophages. In this study, RNA sequencing (RNA-seq) was used to explore the global response of Mycobacterium smegmatis mc2155 at an early phase of infection with mycobacteriophage SWU1, the key host metabolic processes of M. smegmatis mc2155 shut off by SWU1, and the phage proteins responsible. The RNA-seq results were confirmed by real-time PCR and functional assays. A total of 1,174 genes of M. smegmatis mc2155 (16.9% of the entire coding capacity) were differentially regulated by phage infection. These genes belong to six functional categories: (i) signal transduction, (ii) cell energetics, (iii) cell wall biosynthesis, (iv) DNA, RNA, and protein biosynthesis, (v) iron uptake, (vi) central metabolism. The transcription patterns of phage SWU1 were also characterized. This study provides the first global view of the reciprocal reprogramming between a mycobacteriophage and its Mycobacterium host. PMID:26858712

  5. Contrasting introduction scenarios among continents in the worldwide invasion of the banana fungal pathogen Mycosphaerella fijiensis.

    PubMed

    Robert, S; Ravigne, V; Zapater, M-F; Abadie, C; Carlier, J

    2012-03-01

    Reconstructing and characterizing introduction routes is a key step towards understanding the ecological and evolutionary factors underlying successful invasions and disease emergence. Here, we aimed to decipher scenarios of introduction and stochastic demographic events associated with the global spread of an emerging disease of bananas caused by the destructive fungal pathogen Mycosphaerella fijiensis. We analysed the worldwide population structure of this fungus using 21 microsatellites and 8 sequence-based markers on 735 individuals from 37 countries. Our analyses designated South-East Asia as the source of the global invasion and supported the location of the centre of origin of M. fijiensis within this area. We confirmed the occurrence of bottlenecks upon introduction into other continents followed by widespread founder events within continents. Furthermore, this study suggested contrasting introduction scenarios of the pathogen between the African and American continents. While potential signatures of admixture resulting from multiple introductions were detected in America, all the African samples examined seem to descend from a single successful founder event. In combination with historical information, our study reveals an original and unprecedented global scenario of invasion for this recently emerging disease caused by a wind-dispersed pathogen. © 2012 Blackwell Publishing Ltd.

  6. Clustering analysis of proteins from microbial genomes at multiple levels of resolution.

    PubMed

    Zaslavsky, Leonid; Ciufo, Stacy; Fedorov, Boris; Tatusova, Tatiana

    2016-08-31

    Microbial genomes at the National Center for Biotechnology Information (NCBI) represent a large collection of more than 35,000 assemblies. Several complexities are associated with the data: sampling density varies greatly, since human pathogens are densely sampled while other bacteria are less represented; different protein families occur in annotations with different frequencies; and the quality of genome annotation varies greatly. In order to extract useful information from these complex data, the analysis needs to be performed at multiple levels of phylogenomic resolution and protein similarity, with an adequate sampling strategy. Protein clustering is used to construct meaningful and stable groups of similar proteins to be used for analysis and functional annotation. Our approach is to create protein clusters at three levels. First, tight clusters in groups of closely related genomes (species-level clades) are constructed using a combined approach that takes into account both sequence similarity and genome context. Second, clustroids of conservative in-clade clusters are organized into seed global clusters. Finally, global protein clusters are built around the seed clusters. We propose filtering strategies that limit the protein set included in global clustering. The in-clade clustering procedure, the subsequent selection of clustroids, and their organization into seed global clusters provide a robust representation and a high rate of compression. Seed protein clusters are further extended by adding related proteins. Extended seed clusters include a significant part of the data and represent all major known cell machinery. The remaining part, consisting of non-conservative (unique) or rapidly evolving proteins, proteins from rare genomes, or proteins resulting from low-quality annotation, does not group together well. Processing these proteins requires significant computational resources and results in a large number of questionable clusters.
The developed filtering strategies allow such peripheral proteins to be identified and excluded, limiting the protein dataset used in global clustering. Overall, the proposed methodology allows relevant data to be obtained at different levels of detail and data redundancy to be eliminated while keeping biologically interesting variations.
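
    The abstract does not define how clustroids are chosen. A common definition of a clustroid is the cluster member with the smallest total distance to all other members (a medoid). The Python sketch below implements that generic definition only, assuming an arbitrary pairwise distance; the toy `hamming` distance over equal-length strings is a hypothetical stand-in and not the NCBI similarity measure.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def clustroid(members: Sequence[T], distance: Callable[[T, T], float]) -> T:
    """Return the member whose summed distance to all members is smallest.

    This is the generic medoid-style definition of a clustroid; it is an
    illustrative assumption, not the procedure used in the study.
    """
    if not members:
        raise ValueError("cluster is empty")
    return min(members, key=lambda m: sum(distance(m, other) for other in members))


def hamming(a: str, b: str) -> int:
    """Toy distance: number of mismatched positions in equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Usage: three toy peptide fragments; "MKTAY" differs from each of the
# others at one position, so it has the smallest summed distance.
print(clustroid(["MKTAY", "MKTAF", "MKSAY"], hamming))
```

    Because only pairwise distances are needed, the same function works with any similarity-derived distance, at a cost quadratic in cluster size.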

  7. A Two-Locus Global DNA Barcode for Land Plants: The Coding rbcL Gene Complements the Non-Coding trnH-psbA Spacer Region

    PubMed Central

    Kress, W. John; Erickson, David L.

    2007-01-01

    Background A useful DNA barcode requires sufficient sequence variation to distinguish between species and ease of application across a broad range of taxa. Discovery of a DNA barcode for land plants has been limited by intrinsically lower rates of sequence evolution in plant genomes than those observed in animals. This low rate has complicated the trade-off in finding a locus that is universal and readily sequenced and has sufficiently high sequence divergence at the species level. Methodology/Principal Findings Here, a global plant DNA barcode system is evaluated by comparing universal application and degree of sequence divergence for nine putative barcode loci, including coding and non-coding regions, singly and in pairs across a phylogenetically diverse set of 48 genera (two species per genus). No single locus could discriminate among species in a pair in more than 79% of genera, whereas discrimination increased to nearly 88% when the non-coding trnH-psbA spacer was paired with one of three coding loci, including rbcL. In silico trials were conducted in which DNA sequences from GenBank were used to further evaluate the discriminatory power of a subset of these loci. These trials supported the earlier observation that trnH-psbA coupled with rbcL can correctly identify and discriminate among related species. Conclusions/Significance A combination of the non-coding trnH-psbA spacer region and a portion of the coding rbcL gene is recommended as a two-locus global land plant barcode that provides the necessary universality and species discrimination. PMID:17551588

  8. Rapid characterization of the 2015 Mw 7.8 Gorkha, Nepal, earthquake sequence and its seismotectonic context

    USGS Publications Warehouse

    Hayes, Gavin; Briggs, Richard; Barnhart, William D.; Yeck, William; McNamara, Daniel E.; Wald, David J.; Nealy, Jennifer; Benz, Harley M.; Gold, Ryan D.; Jaiswal, Kishor S.; Marano, Kristin; Earle, Paul S.; Hearne, Mike; Smoczyk, Gregory M.; Wald, Lisa A.; Samsonov, Sergey

    2015-01-01

    Earthquake response and related information products are important for placing recent seismic events into context and particularly for understanding the impact earthquakes can have on the regional community and its infrastructure. These tools are even more useful if they are available quickly, ahead of detailed information from the areas affected by such earthquakes. Here we provide an overview of the response activities and related information products generated and provided by the U.S. Geological Survey National Earthquake Information Center in association with the 2015 M 7.8 Gorkha, Nepal, earthquake. This group monitors global earthquakes 24 hours/day, 7 days/week to provide rapid information on the location and size of recent events and to characterize the source properties, tectonic setting, and potential fatalities and economic losses associated with significant earthquakes. We present the timeline over which these products became available, discuss what they tell us about the seismotectonics of the Gorkha earthquake and its aftershocks, and examine how their information is used today, and might be used in the future, to help mitigate the impact of such natural disasters.

  9. A communal catalogue reveals Earth’s multiscale microbial diversity

    DOE PAGES

    Thompson, Luke R.; Sanders, Jon G.; McDonald, Daniel; ...

    2017-11-01

    Our growing awareness of the importance and diversity of the microbial world contrasts starkly with our limited understanding of its fundamental structure. Despite remarkable advances in DNA sequence generation, a lack of standardized protocols and common analytical frameworks impedes useful comparison between studies, hindering development of global inferences about microbial life on Earth. Here, we show that with coordinated protocols, exact microbial 16S rRNA gene sequences can be followed across scores of individual studies, revealing patterns of diversity, community structure, and life history strategy at a planetary scale. Using 27,751 crowdsourced environmental samples comprising more than 2.2 billion reads, we find sharp divides between host-associated and free-living communities. We show that the distribution of taxonomic and sequence diversity follows consistent trends across sample types and along gradients of environmental parameters, highlighting some of the global evolutionary patterns and ecological principles that underpin Earth’s microbiome. This dataset provides the most complete environmental survey of our microbial world to date, and serves as a growing reference to provide immediate global context to future microbial surveys.

  10. A communal catalogue reveals Earth’s multiscale microbial diversity

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Thompson, Luke R.; Sanders, Jon G.; McDonald, Daniel

    Our growing awareness of the importance and diversity of the microbial world contrasts starkly with our limited understanding of its fundamental structure. Despite remarkable advances in DNA sequence generation, a lack of standardized protocols and common analytical frameworks impedes useful comparison between studies, hindering development of global inferences about microbial life on Earth. Here, we show that with coordinated protocols, exact microbial 16S rRNA gene sequences can be followed across scores of individual studies, revealing patterns of diversity, community structure, and life history strategy at a planetary scale. Using 27,751 crowdsourced environmental samples comprising more than 2.2 billion reads, we find sharp divides between host-associated and free-living communities. We show that the distribution of taxonomic and sequence diversity follows consistent trends across sample types and along gradients of environmental parameters, highlighting some of the global evolutionary patterns and ecological principles that underpin Earth’s microbiome. This dataset provides the most complete environmental survey of our microbial world to date, and serves as a growing reference to provide immediate global context to future microbial surveys.

  11. Chronology of Eocene-Miocene sequences on the New Jersey shallow shelf: implications for regional, interregional, and global correlations

    USGS Publications Warehouse

    Browning, James V.; Miller, Kenneth G.; Sugarman, Peter J.; Barron, John; McCarthy, Francine M.G.; Kulhanek, Denise K.; Katz, Miriam E.; Feigenson, Mark D.

    2013-01-01

    Integrated Ocean Drilling Program Expedition 313 continuously cored and logged latest Eocene to early-middle Miocene sequences at three sites (M27, M28, and M29) on the inner-middle continental shelf offshore New Jersey, providing an opportunity to evaluate the ages, global correlations, and significance of sequence boundaries. We provide a chronology for these sequences using integrated strontium isotopic stratigraphy and biostratigraphy (primarily calcareous nannoplankton, diatoms, and dinocysts [dinoflagellate cysts]). Despite challenges posed by shallow-water sediments, age resolution is typically ±0.5 m.y. and in many sequences is as good as ±0.25 m.y. Three Oligocene sequences were sampled at Site M27 on sequence bottomsets. Fifteen early to early-middle Miocene sequences were dated at Sites M27, M28, and M29 across clinothems in topsets, foresets (where the sequences are thickest), and bottomsets. A few sequences have coarse (∼1 m.y.) or little age constraint due to barren zones; we constrain the age estimates of these less well dated sequences by applying the principle of superposition, i.e., sediments above sequence boundaries in any site are younger than the sediments below the sequence boundaries at other sites. Our age control provides constraints on the timing of deposition in the clinothem; sequences on the topsets are generally the youngest in the clinothem, whereas the bottomsets generally are the oldest. The greatest amount of time is represented on foresets, although we have no evidence for a correlative conformity. 
Our chronology provides a baseline for regional and interregional correlations and sea-level reconstructions: (1) we correlate a major increase in sedimentation rate precisely with the timing of the middle Miocene climate changes associated with the development of a permanent East Antarctic Ice Sheet; and (2) the timing of sequence boundaries matches the deep-sea oxygen isotopic record, implicating glacioeustasy as a major driver for forming sequence boundaries.

  12. Random variability explains apparent global clustering of large earthquakes

    USGS Publications Warehouse

    Michael, A.J.

    2011-01-01

    The occurrence of five Mw ≥ 8.5 earthquakes since 2004 has created a debate over whether or not we are in a global cluster of large earthquakes, temporarily raising risks above long-term levels. I use three classes of statistical tests to determine if the record of M ≥ 7 earthquakes since 1900 can reject a null hypothesis of independent random events with a constant rate plus localized aftershock sequences. The data cannot reject this null hypothesis. Thus, the temporal distribution of large global earthquakes is well described by a random process, plus localized aftershocks, and apparent clustering is due to random variability. Therefore, the risk of future events has not increased, except within ongoing aftershock sequences, and should be estimated from the longest possible record of events.

  13. Smooth quantile normalization.

    PubMed

    Hicks, Stephanie C; Okrah, Kwame; Paulson, Joseph N; Quackenbush, John; Irizarry, Rafael A; Bravo, Héctor Corrada

    2018-04-01

    Between-sample normalization is a critical step in genomic data analysis to remove systematic bias and unwanted technical variation in high-throughput data. Global normalization methods are based on the assumption that observed variability in global properties is due to technical factors and is unrelated to the biology of interest. For example, some methods correct for differences in sequencing read counts by scaling features to have similar median values across samples, but these fail to reduce other forms of unwanted technical variation. Methods such as quantile normalization transform the statistical distributions across samples to be the same and assume that global differences in the distribution are induced only by technical variation. However, it remains unclear how to proceed with normalization if these assumptions are violated, for example, if there are global differences in the statistical distributions between biological conditions or groups, and external information, such as negative or control features, is not available. Here, we introduce a generalization of quantile normalization, referred to as smooth quantile normalization (qsmooth), which is based on the assumption that the statistical distribution of each sample should be the same (or have the same distributional shape) within biological groups or conditions, while allowing that they may differ between groups. We illustrate the advantages of our method on several high-throughput datasets with global differences in distributions corresponding to different biological conditions. We also perform a Monte Carlo simulation study to illustrate the bias-variance tradeoff and root mean squared error of qsmooth compared to other global normalization methods. A software implementation is available from https://github.com/stephaniehicks/qsmooth.
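
    Standard quantile normalization, the method that qsmooth generalizes, can be sketched in a few lines: sort each sample, average the sorted samples rank by rank to obtain a reference distribution, and then give each feature the reference value at its within-sample rank. The Python sketch below shows only this standard, single-reference version (not qsmooth, which allows the distribution to differ between biological groups). Breaking ties by original position is a simplifying assumption; many implementations average tied ranks instead.

```python
from typing import List, Sequence


def quantile_normalize(samples: Sequence[Sequence[float]]) -> List[List[float]]:
    """Standard quantile normalization across samples (not qsmooth).

    `samples` is a list of samples, each a list of feature values of equal
    length. After normalization every sample has the same sorted values:
    the rank-wise mean of the sorted input samples.
    """
    n_features = len(samples[0])
    if any(len(s) != n_features for s in samples):
        raise ValueError("all samples must have the same number of features")

    # Reference distribution: mean of the sorted samples at each rank.
    sorted_samples = [sorted(s) for s in samples]
    reference = [sum(vals) / len(vals) for vals in zip(*sorted_samples)]

    normalized = []
    for s in samples:
        # Feature indices ordered by value; ties keep their original order.
        order = sorted(range(n_features), key=lambda i: s[i])
        out = [0.0] * n_features
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Usage: two samples with three features each.
print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
```

    The forced identity of distributions is exactly the assumption the abstract questions: if two biological groups truly differ in distribution, this single reference removes that difference along with technical variation.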

  14. Global Transmission Dynamics of Measles in the Measles Elimination Era.

    PubMed

    Furuse, Yuki; Oshitani, Hitoshi

    2017-04-16

    Although there have been many epidemiological reports of the inter-country transmission of measles, systematic analysis of the global transmission dynamics of the measles virus (MV) is limited. In this study, we applied phylogeographic analysis to characterize the global transmission dynamics of the MV using large-scale genetic sequence data (7,456 sequences) from 115 countries between 1954 and 2015. These analyses reveal the spatial and temporal characteristics of global transmission of the virus, especially in Australia, China, India, Japan, the UK, and the USA in the period since 1990. Transmission is frequently observed not only within the same region but also between distant, frequently visited areas. Export frequencies from measles-endemic countries, such as China, India, and Japan, are high but decreasing, while those from countries where measles is no longer endemic, such as Australia, the UK, and the USA, are low but slightly increasing. The world is heading toward measles eradication, but the disease is still transmitted regionally and globally. Our analysis reveals that countries wherein measles is endemic and those having eliminated the disease (apart from occasional outbreaks) both remain a source of global transmission in this measles elimination era. It is therefore crucial to maintain vigilance in efforts to monitor and eradicate measles globally.

  15. Computer-aided system for detecting runway incursions

    NASA Astrophysics Data System (ADS)

    Sridhar, Banavar; Chatterji, Gano B.

    1994-07-01

    A synthetic vision system for enhancing the pilot's ability to navigate and control the aircraft on the ground is described. The system uses the onboard airport database and images acquired by external sensors. Additional navigation information needed by the system is provided by the Inertial Navigation System and the Global Positioning System. The various functions of the system, such as image enhancement, map generation, obstacle detection, collision avoidance, guidance, etc., are identified. The available technologies, some of which were developed at NASA, that are applicable to the aircraft ground navigation problem are noted. Example images of a truck crossing the runway while the aircraft flies close to the runway centerline are described. These images are from a sequence of images acquired during one of the several flight experiments conducted by NASA to acquire data to be used for the development and verification of the synthetic vision concepts. These experiments provide a realistic database including video and infrared images, motion states from the Inertial Navigation System and the Global Positioning System, and camera parameters.

  16. A Microbiological Revolution Meets an Ancient Disease: Improving the Management of Tuberculosis with Genomics

    PubMed Central

    Wlodarska, Marta; Johnston, James C.; Gardy, Jennifer L.

    2015-01-01

    SUMMARY Tuberculosis (TB) is an ancient disease with an enormous global impact. Despite declining global incidence, the diagnosis, phenotyping, and epidemiological investigation of TB require significant clinical microbiology laboratory resources. Current methods for the detection and characterization of Mycobacterium tuberculosis consist of a series of laboratory tests varying in speed and performance, each of which yields incremental information about the disease. Since the sequencing of the first M. tuberculosis genome in 1998, genomic tools have aided in the diagnosis, treatment, and control of TB. Here we summarize genomics-based methods that are positioned to be introduced in the modern clinical TB laboratory, and we highlight how recent advances in genomics will improve the detection of antibiotic resistance-conferring mutations and the understanding of M. tuberculosis transmission dynamics and epidemiology. We imagine the future TB clinic as one that relies heavily on genomic interrogation of the M. tuberculosis isolate, allowing for more rapid diagnosis of TB and real-time monitoring of outbreak emergence. PMID:25810419

  17. Tectonic collision and uplift of Wallacea triggered the global songbird radiation

    NASA Astrophysics Data System (ADS)

    Moyle, Robert G.; Oliveros, Carl H.; Andersen, Michael J.; Hosner, Peter A.; Benz, Brett W.; Manthey, Joseph D.; Travers, Scott L.; Brown, Rafe M.; Faircloth, Brant C.

    2016-08-01

    Songbirds (oscine passerines) are the most species-rich and cosmopolitan bird group, comprising almost half of global avian diversity. Songbirds originated in Australia, but the evolutionary trajectory from a single species in an isolated continent to worldwide proliferation is poorly understood. Here, we combine the first comprehensive genome-scale DNA sequence data set for songbirds, fossil-based time calibrations, and geologically informed biogeographic reconstructions to provide a well-supported evolutionary hypothesis for the group. We show that songbird diversification began in the Oligocene, but accelerated in the early Miocene, at approximately half the age of most previous estimates. This burst of diversification occurred coincident with extensive island formation in Wallacea, which provided the first dispersal corridor out of Australia, and resulted in independent waves of songbird expansion through Asia to the rest of the globe. Our results reconcile songbird evolution with Earth history and link a major radiation of terrestrial biodiversity to early diversification within an isolated Australian continent.

  18. Tectonic collision and uplift of Wallacea triggered the global songbird radiation.

    PubMed

    Moyle, Robert G; Oliveros, Carl H; Andersen, Michael J; Hosner, Peter A; Benz, Brett W; Manthey, Joseph D; Travers, Scott L; Brown, Rafe M; Faircloth, Brant C

    2016-08-30

    Songbirds (oscine passerines) are the most species-rich and cosmopolitan bird group, comprising almost half of global avian diversity. Songbirds originated in Australia, but the evolutionary trajectory from a single species in an isolated continent to worldwide proliferation is poorly understood. Here, we combine the first comprehensive genome-scale DNA sequence data set for songbirds, fossil-based time calibrations, and geologically informed biogeographic reconstructions to provide a well-supported evolutionary hypothesis for the group. We show that songbird diversification began in the Oligocene, but accelerated in the early Miocene, at approximately half the age of most previous estimates. This burst of diversification occurred coincident with extensive island formation in Wallacea, which provided the first dispersal corridor out of Australia, and resulted in independent waves of songbird expansion through Asia to the rest of the globe. Our results reconcile songbird evolution with Earth history and link a major radiation of terrestrial biodiversity to early diversification within an isolated Australian continent.

  19. Genetic epidemiology of type 2 diabetes and cardiovascular diseases in Africa.

    PubMed

    Tekola-Ayele, Fasil; Adeyemo, Adebowale A; Rotimi, Charles N

    2013-01-01

    The burdens of type 2 diabetes (T2D) and cardiovascular diseases (CVD) are increasing in Africa. T2D and CVD are the result of the complex interaction between inherited characteristics, lifestyle, and environmental factors. The epidemic of obesity is largely behind the exploding global incidence of T2D. However, not all obese individuals develop diabetes, and a positive family history is a powerful risk factor for diabetes and CVD. Recent implementations of high throughput genotyping and sequencing approaches have advanced our understanding of the genetic basis of diabetes and CVD by identifying several genomic loci that were not previously linked to the pathobiology of these diseases. However, African populations have not been adequately represented in these global genomic efforts. Here, we summarize the state of knowledge of the genetic epidemiology of T2D and CVD in Africa and highlight new genomic initiatives that promise to inform disease etiology, public health and clinical medicine in Africa. © 2013.

  20. Embedding strategies for effective use of information from multiple sequence alignments.

    PubMed Central

    Henikoff, S.; Henikoff, J. G.

    1997-01-01

    We describe a new strategy for utilizing multiple sequence alignment information to detect distant relationships in searches of sequence databases. A single sequence representing a protein family is enriched by replacing conserved regions with position-specific scoring matrices (PSSMs) or consensus residues derived from multiple alignments of family members. In comprehensive tests of these and other family representations, PSSM-embedded queries produced the best results overall when used with a special version of the Smith-Waterman searching algorithm. Moreover, embedding consensus residues instead of PSSMs improved performance with readily available single sequence query searching programs, such as BLAST and FASTA. Embedding PSSMs or consensus residues into a representative sequence improves searching performance by extracting multiple alignment information from motif regions while retaining single sequence information where alignment is uncertain. PMID:9070452
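
    The embedding idea can be illustrated with a minimal sketch, assuming an ungapped block of aligned family members, log2-odds scores against a uniform background of 1/20 per amino acid, and a pseudocount of 1 per residue. These scoring choices, the toy block, and the helper names (build_pssm, consensus, score, embed_consensus) are illustrative assumptions, not the sequence-weighting or pseudocount scheme used by Henikoff and Henikoff.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(block, pseudocount=1.0):
    """Return one {residue: log2-odds score} dict per alignment column.

    Assumes an ungapped block of equal-length sequences and a uniform
    background frequency of 1/20 per amino acid (illustrative choices).
    """
    length = len(block[0])
    if any(len(seq) != length for seq in block):
        raise ValueError("all sequences in the block must have equal length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(block) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in block)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def consensus(pssm):
    """Highest-scoring residue in each column."""
    return "".join(max(column, key=column.get) for column in pssm)


def score(pssm, segment):
    """Sum of column scores for a query segment as long as the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


def embed_consensus(representative, start, pssm):
    """Replace the motif region starting at `start` with consensus residues."""
    end = start + len(pssm)
    return representative[:start] + consensus(pssm) + representative[end:]


# Hypothetical motif block from four family members.
block = ["GKSTL", "GKTTL", "GRSTL", "GKSSL"]
pssm = build_pssm(block)
print(consensus(pssm))                        # GKSTL
print(embed_consensus("MAGRTTLPQ", 2, pssm))  # MAGKSTLPQ
```

    Embedding the consensus string yields an ordinary sequence that any single-sequence search program can accept, whereas using the PSSM columns themselves requires a search program that scores positions individually, which parallels the trade-off the abstract describes.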

  1. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea

    DOE PAGES

    Bowers, Robert M.; Kyrpides, Nikos C.; Stepanauskas, Ramunas; ...

    2017-08-08

    Here, we present two standards developed by the Genomic Standards Consortium (GSC) for reporting bacterial and archaeal genome sequences. Both are extensions of the Minimum Information about Any (x) Sequence (MIxS). The standards are the Minimum Information about a Single Amplified Genome (MISAG) and the Minimum Information about a Metagenome-Assembled Genome (MIMAG), including, but not limited to, assembly quality, and estimates of genome completeness and contamination. These standards can be used in combination with other GSC checklists, including the Minimum Information about a Genome Sequence (MIGS), Minimum Information about a Metagenomic Sequence (MIMS), and Minimum Information about a Marker Gene Sequence (MIMARKS). Community-wide adoption of MISAG and MIMAG will facilitate more robust comparative genomic analyses of bacterial and archaeal diversity.
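
    The abstract names assembly quality, completeness, and contamination without giving thresholds. A minimal sketch of a tiered quality call, assuming the draft-quality thresholds commonly cited from the full MIMAG standard (high-quality: >90% completeness, <5% contamination, 23S, 16S and 5S rRNA genes present and at least 18 tRNAs; medium-quality: at least 50% completeness, <10% contamination; low-quality: <50% completeness, <10% contamination). These values should be checked against the published checklist; the GenomeReport fields and the "unclassified" label are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GenomeReport:
    """Hypothetical summary of one MAG or SAG assembly."""
    completeness: float   # percent, e.g. from a marker-gene estimate
    contamination: float  # percent
    has_all_rrnas: bool   # 23S, 16S and 5S rRNA genes all present
    trna_count: int       # number of distinct tRNAs detected


def mimag_tier(report: GenomeReport) -> str:
    """Assign a draft-quality tier using the assumed thresholds above."""
    if report.contamination >= 10:
        return "unclassified"  # above the contamination limit of every tier
    if (report.completeness > 90 and report.contamination < 5
            and report.has_all_rrnas and report.trna_count >= 18):
        return "high-quality draft"
    if report.completeness >= 50:
        return "medium-quality draft"
    return "low-quality draft"


print(mimag_tier(GenomeReport(95.2, 1.3, True, 20)))   # high-quality draft
print(mimag_tier(GenomeReport(95.2, 1.3, False, 20)))  # medium-quality draft
print(mimag_tier(GenomeReport(42.0, 2.0, False, 9)))   # low-quality draft
```

    The second example shows why the rRNA and tRNA checks matter under these assumed rules: a nearly complete, clean genome still falls to medium quality if its rRNA operon is missing.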

  2. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bowers, Robert M.; Kyrpides, Nikos C.; Stepanauskas, Ramunas

    Here, we present two standards developed by the Genomic Standards Consortium (GSC) for reporting bacterial and archaeal genome sequences. Both are extensions of the Minimum Information about Any (x) Sequence (MIxS). The standards are the Minimum Information about a Single Amplified Genome (MISAG) and the Minimum Information about a Metagenome-Assembled Genome (MIMAG), including, but not limited to, assembly quality, and estimates of genome completeness and contamination. These standards can be used in combination with other GSC checklists, including the Minimum Information about a Genome Sequence (MIGS), Minimum Information about a Metagenomic Sequence (MIMS), and Minimum Information about a Marker Gene Sequence (MIMARKS). Community-wide adoption of MISAG and MIMAG will facilitate more robust comparative genomic analyses of bacterial and archaeal diversity.

  3. Survey of local and global biological network alignment: the need to reconcile the two sides of the same coin.

    PubMed

    Guzzi, Pietro Hiram; Milenkovic, Tijana

    2018-05-01

    Analogous to genomic sequence alignment that allows for across-species transfer of biological knowledge between conserved sequence regions, biological network alignment can be used to guide the knowledge transfer between conserved regions of molecular networks of different species. Hence, biological network alignment can be used to redefine the traditional notion of a sequence-based homology to a new notion of network-based homology. Analogous to genomic sequence alignment, there exist local and global biological network alignments. Here, we survey prominent and recent computational approaches of each network alignment type and discuss their (dis)advantages. Then, as it was recently shown that the two approach types are complementary, in the sense that they capture different slices of cellular functioning, we discuss the need to reconcile the two network alignment types and present a recent first step in this direction. We conclude with some open research problems on this topic and comment on the usefulness of network alignment in other domains besides computational biology.
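
    One widely used way to score a global network alignment is edge correctness (EC): the fraction of edges in the first network whose endpoints are mapped onto an edge of the second network. The sketch below assumes undirected edges and a one-to-one node mapping, as a global aligner produces; the toy graphs and the function name are hypothetical, and EC is only one of several alignment-quality measures in use.

```python
def edge_correctness(edges1, edges2, mapping):
    """Fraction of G1 edges whose mapped endpoints form an edge in G2.

    Edges are undirected pairs; `mapping` is a one-to-one dict from
    G1 nodes to G2 nodes (unmapped nodes simply conserve no edges).
    """
    if not edges1:
        return 0.0
    g2 = {frozenset(edge) for edge in edges2}
    conserved = sum(
        1
        for u, v in edges1
        if u in mapping and v in mapping
        and frozenset((mapping[u], mapping[v])) in g2
    )
    return conserved / len(edges1)


# Hypothetical toy networks: a triangle aligned onto a path.
g1 = [("a", "b"), ("b", "c"), ("c", "a")]
g2 = [("x", "y"), ("y", "z")]
alignment = {"a": "x", "b": "y", "c": "z"}
print(edge_correctness(g1, g2, alignment))  # 2 of 3 edges conserved
```

    A local aligner may instead map several small, possibly overlapping subnetworks, so a single one-to-one EC score fits the global case more naturally than the local one.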

  4. Phylogeography of Influenza A(H3N2) Virus in Peru, 2010-2012.

    PubMed

    Pollett, Simon; Nelson, Martha I; Kasper, Matthew; Tinoco, Yeny; Simons, Mark; Romero, Candice; Silva, Marita; Lin, Xudong; Halpin, Rebecca A; Fedorova, Nadia; Stockwell, Timothy B; Wentworth, David; Holmes, Edward C; Bausch, Daniel G

    2015-08-01

    It remains unclear whether lineages of influenza A(H3N2) virus can persist in the tropics and seed temperate areas. We used viral gene sequence data sampled from Peru to test this source-sink model for a Latin American country. Viruses were obtained during 2010-2012 from influenza surveillance cohorts in Cusco, Tumbes, Puerto Maldonado, and Lima. Specimens positive for influenza A(H3N2) virus were randomly selected and underwent hemagglutinin sequencing and phylogeographic analyses. Analysis of 389 hemagglutinin sequences from Peru and 2,192 global sequences demonstrated interseasonal extinction of Peruvian lineages. Extensive mixing occurred with global clades, but some spatial structure was observed at all sites; this structure was weakest in Lima and Puerto Maldonado, indicating that these locations may experience greater viral traffic. The broad diversity and co-circulation of many simultaneous lineages of H3N2 virus in Peru suggests that this country should not be overlooked as a potential source for novel pandemic strains.

  5. Phylogeography of Influenza A(H3N2) Virus in Peru, 2010–2012

    PubMed Central

    Nelson, Martha I.; Kasper, Matthew; Tinoco, Yeny; Simons, Mark; Romero, Candice; Silva, Marita; Lin, Xudong; Halpin, Rebecca A.; Fedorova, Nadia; Stockwell, Timothy B.; Wentworth, David; Holmes, Edward C.; Bausch, Daniel G.

    2015-01-01

    It remains unclear whether lineages of influenza A(H3N2) virus can persist in the tropics and seed temperate areas. We used viral gene sequence data sampled from Peru to test this source–sink model for a Latin American country. Viruses were obtained during 2010–2012 from influenza surveillance cohorts in Cusco, Tumbes, Puerto Maldonado, and Lima. Specimens positive for influenza A(H3N2) virus were randomly selected and underwent hemagglutinin sequencing and phylogeographic analyses. Analysis of 389 hemagglutinin sequences from Peru and 2,192 global sequences demonstrated interseasonal extinction of Peruvian lineages. Extensive mixing occurred with global clades, but some spatial structure was observed at all sites; this structure was weakest in Lima and Puerto Maldonado, indicating that these locations may experience greater viral traffic. The broad diversity and co-circulation of many simultaneous lineages of H3N2 virus in Peru suggests that this country should not be overlooked as a potential source for novel pandemic strains. PMID:26196599

  6. A method for partitioning the information contained in a protein sequence between its structure and function.

    PubMed

    Possenti, Andrea; Vendruscolo, Michele; Camilloni, Carlo; Tiana, Guido

    2018-05-23

    Proteins employ the information stored in the genetic code and translated into their sequences to carry out well-defined functions in the cellular environment. Their capacity to encode such functions is controlled by the balance between the amount of information supplied by the sequence and the amount left after the protein has folded into its structure. We study the amount of information necessary to specify the protein structure, providing an estimate that takes into account the thermodynamic properties of protein folding. We thus show that the information remaining in the protein sequence after encoding its structure (the 'information gap') is very close to what is needed to encode its function and interactions. Then, by predicting the information gap directly from the protein sequence, we show that it may be possible to use these insights from information theory to discriminate between ordered and disordered proteins, to identify unknown functions, and to optimize artificially designed protein sequences. This article is protected by copyright. All rights reserved. © 2018 Wiley Periodicals, Inc.

  7. Neisseria gonorrhoeae molecular typing for understanding sexual networks and antimicrobial resistance transmission: A systematic review.

    PubMed

    Town, Katy; Bolt, Hikaru; Croxford, Sara; Cole, Michelle; Harris, Simon; Field, Nigel; Hughes, Gwenda

    2018-06-01

    Neisseria gonorrhoeae (NG) is a significant global public health concern due to rising diagnosis rates and antimicrobial resistance. Molecular data combined with epidemiological data have been used to understand the distribution and spread of NG, as well as relationships between cases in sexual networks, but the public health value gained from these studies is unclear. We conducted a systematic review to examine how molecular epidemiological studies have informed understanding of sexual networks and NG transmission, and subsequent public health interventions. Five research databases were systematically searched up to 31st March 2017 for studies that used sequence-based DNA typing methods, including whole genome sequencing, and linked molecular data to patient-level epidemiological data. Data were extracted and summarised to identify common themes. Of the 49 studies included, 82% used NG Multi-antigen Sequence Typing. Gender and sexual orientation were commonly used to characterise sexual networks that were inferred using molecular clusters; clusters predominantly of one patient group often contained a small number of isolates from other patient groups. Suggested public health applications included using these data to target interventions at specific populations, confirm outbreaks, and inform partner management, but these were mainly untested. Combining molecular and epidemiological data has provided insight into sexual mixing patterns and the dissemination of NG, but few studies have applied these findings to design or evaluate public health interventions. Future studies should focus on the application of molecular epidemiology in public health practice to provide evidence for how to prevent and control NG. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.

  8. Integrating biogeochemistry with multiomic sequence information in a model oxygen minimum zone

    PubMed Central

    Hawley, Alyse K.; Katsev, Sergei; Torres-Beltran, Monica; Bhatia, Maya P.; Kheirandish, Sam; Michiels, Céline C.; Capelle, David; Lavik, Gaute; Doebeli, Michael; Crowe, Sean A.; Hallam, Steven J.

    2016-01-01

    Microorganisms are the most abundant lifeform on Earth, mediating global fluxes of matter and energy. Over the past decade, high-throughput molecular techniques generating multiomic sequence information (DNA, mRNA, and protein) have transformed our perception of this microcosmos, conceptually linking microorganisms at the individual, population, and community levels to a wide range of ecosystem functions and services. Here, we develop a biogeochemical model that describes metabolic coupling along the redox gradient in Saanich Inlet—a seasonally anoxic fjord with biogeochemistry analogous to oxygen minimum zones (OMZs). The model reproduces measured biogeochemical process rates as well as DNA, mRNA, and protein concentration profiles across the redox gradient. Simulations make predictions about the role of ubiquitous OMZ microorganisms in mediating carbon, nitrogen, and sulfur cycling. For example, nitrite “leakage” during incomplete sulfide-driven denitrification by SUP05 Gammaproteobacteria is predicted to support inorganic carbon fixation and intense nitrogen loss via anaerobic ammonium oxidation. This coupling creates a metabolic niche for nitrous oxide reduction that completes denitrification by currently unidentified community members. These results quantitatively improve previous conceptual models describing microbial metabolic networks in OMZs. Beyond OMZ-specific predictions, model results indicate that geochemical fluxes are robust indicators of microbial community structure and reciprocally, that gene abundances and geochemical conditions largely determine gene expression patterns. The integration of real observational data, including geochemical profiles and process rate measurements as well as metagenomic, metatranscriptomic and metaproteomic sequence data, into a biogeochemical model, as shown here, enables holistic insight into the microbial metabolic network driving nutrient and energy flow at ecosystem scales. PMID:27655888

  9. Integrating biogeochemistry with multiomic sequence information in a model oxygen minimum zone.

    PubMed

    Louca, Stilianos; Hawley, Alyse K; Katsev, Sergei; Torres-Beltran, Monica; Bhatia, Maya P; Kheirandish, Sam; Michiels, Céline C; Capelle, David; Lavik, Gaute; Doebeli, Michael; Crowe, Sean A; Hallam, Steven J

    2016-10-04

    Microorganisms are the most abundant lifeform on Earth, mediating global fluxes of matter and energy. Over the past decade, high-throughput molecular techniques generating multiomic sequence information (DNA, mRNA, and protein) have transformed our perception of this microcosmos, conceptually linking microorganisms at the individual, population, and community levels to a wide range of ecosystem functions and services. Here, we develop a biogeochemical model that describes metabolic coupling along the redox gradient in Saanich Inlet, a seasonally anoxic fjord with biogeochemistry analogous to oxygen minimum zones (OMZs). The model reproduces measured biogeochemical process rates as well as DNA, mRNA, and protein concentration profiles across the redox gradient. Simulations make predictions about the role of ubiquitous OMZ microorganisms in mediating carbon, nitrogen, and sulfur cycling. For example, nitrite "leakage" during incomplete sulfide-driven denitrification by SUP05 Gammaproteobacteria is predicted to support inorganic carbon fixation and intense nitrogen loss via anaerobic ammonium oxidation. This coupling creates a metabolic niche for nitrous oxide reduction that completes denitrification by currently unidentified community members. These results quantitatively improve previous conceptual models describing microbial metabolic networks in OMZs. Beyond OMZ-specific predictions, model results indicate that geochemical fluxes are robust indicators of microbial community structure and reciprocally, that gene abundances and geochemical conditions largely determine gene expression patterns. The integration of real observational data, including geochemical profiles and process rate measurements as well as metagenomic, metatranscriptomic and metaproteomic sequence data, into a biogeochemical model, as shown here, enables holistic insight into the microbial metabolic network driving nutrient and energy flow at ecosystem scales.

  10. Ultra-deep sequencing reveals high prevalence and broad structural diversity of hepatitis B surface antigen mutations in a global population

    PubMed Central

    Gencay, Mikael; Hübner, Kirsten; Gohl, Peter; Seffner, Anja; Weizenegger, Michael; Neofytos, Dionysios; Batrla, Richard; Woeste, Andreas; Kim, Hyon-suk; Westergaard, Gaston; Reinsch, Christine; Brill, Eva; Thu Thuy, Pham Thi; Hoang, Bui Huu; Sonderup, Mark; Spearman, C. Wendy; Pabinger, Stephan; Gautier, Jérémie; Brancaccio, Giuseppina; Fasano, Massimo; Santantonio, Teresa; Gaeta, Giovanni B.; Nauck, Markus; Kaminski, Wolfgang E.

    2017-01-01

    The diversity of the hepatitis B surface antigen (HBsAg) has a significant impact on the performance of diagnostic screening tests and the clinical outcome of hepatitis B infection. Neutralizing or diagnostic antibodies against the HBsAg are directed towards its highly conserved major hydrophilic region (MHR), in particular towards its “a” determinant subdomain. Here, we explored, on a global scale, the genetic diversity of the HBsAg MHR in a large, multi-ethnic cohort of randomly selected subjects with HBV infection from four continents. A total of 1553 HBsAg positive blood samples of subjects originating from 20 different countries across Africa, America, Asia and central Europe were characterized for amino acid variation in the MHR. Using highly sensitive ultra-deep sequencing, we found 72.8% of the successfully sequenced subjects (n = 1391) demonstrated amino acid sequence variation in the HBsAg MHR. This indicates that the global variation frequency in the HBsAg MHR is threefold higher than previously reported. The majority of the amino acid mutations were found in the HBV genotypes B (28.9%) and C (25.4%). Collectively, we identified 345 distinct amino acid mutations in the MHR. Among these, we report 62 previously unknown mutations, which extends the worldwide pool of currently known HBsAg MHR mutations by 22%. Importantly, topological analysis identified the “a” determinant upstream flanking region as the structurally most diverse subdomain of the HBsAg MHR. The highest prevalence of “a” determinant region mutations was observed in subjects from Asia, followed by the African, American and European cohorts, respectively. Finally, we found that more than half (59.3%) of all HBV subjects investigated carried multiple MHR mutations. Together, this worldwide ultra-deep sequencing based genotyping study reveals that the global prevalence and structural complexity of variation in the hepatitis B surface antigen have, to date, been significantly underappreciated. PMID:28472040

  11. Ultra-deep sequencing reveals high prevalence and broad structural diversity of hepatitis B surface antigen mutations in a global population.

    PubMed

    Gencay, Mikael; Hübner, Kirsten; Gohl, Peter; Seffner, Anja; Weizenegger, Michael; Neofytos, Dionysios; Batrla, Richard; Woeste, Andreas; Kim, Hyon-Suk; Westergaard, Gaston; Reinsch, Christine; Brill, Eva; Thu Thuy, Pham Thi; Hoang, Bui Huu; Sonderup, Mark; Spearman, C Wendy; Pabinger, Stephan; Gautier, Jérémie; Brancaccio, Giuseppina; Fasano, Massimo; Santantonio, Teresa; Gaeta, Giovanni B; Nauck, Markus; Kaminski, Wolfgang E

    2017-01-01

    The diversity of the hepatitis B surface antigen (HBsAg) has a significant impact on the performance of diagnostic screening tests and the clinical outcome of hepatitis B infection. Neutralizing or diagnostic antibodies against the HBsAg are directed towards its highly conserved major hydrophilic region (MHR), in particular towards its "a" determinant subdomain. Here, we explored, on a global scale, the genetic diversity of the HBsAg MHR in a large, multi-ethnic cohort of randomly selected subjects with HBV infection from four continents. A total of 1553 HBsAg positive blood samples of subjects originating from 20 different countries across Africa, America, Asia and central Europe were characterized for amino acid variation in the MHR. Using highly sensitive ultra-deep sequencing, we found 72.8% of the successfully sequenced subjects (n = 1391) demonstrated amino acid sequence variation in the HBsAg MHR. This indicates that the global variation frequency in the HBsAg MHR is threefold higher than previously reported. The majority of the amino acid mutations were found in the HBV genotypes B (28.9%) and C (25.4%). Collectively, we identified 345 distinct amino acid mutations in the MHR. Among these, we report 62 previously unknown mutations, which extends the worldwide pool of currently known HBsAg MHR mutations by 22%. Importantly, topological analysis identified the "a" determinant upstream flanking region as the structurally most diverse subdomain of the HBsAg MHR. The highest prevalence of "a" determinant region mutations was observed in subjects from Asia, followed by the African, American and European cohorts, respectively. Finally, we found that more than half (59.3%) of all HBV subjects investigated carried multiple MHR mutations. Together, this worldwide ultra-deep sequencing based genotyping study reveals that the global prevalence and structural complexity of variation in the hepatitis B surface antigen have, to date, been significantly underappreciated.

  12. Global versus Local Regulatory Roles for Lrp-Related Proteins: Haemophilus influenzae as a Case Study

    PubMed Central

    Friedberg, Devorah; Midkiff, Michael; Calvo, Joseph M.

    2001-01-01

    Lrp (leucine-responsive regulatory protein) plays a global regulatory role in Escherichia coli, affecting expression of dozens of operons. Numerous lrp-related genes have been identified in different bacteria and archaea, including asnC, an E. coli gene that was the first reported member of this family. Pairwise comparisons of the amino acid sequences of the corresponding proteins show an average sequence identity of only 29% for the vast majority of comparisons. By contrast, Lrp-related proteins from enteric bacteria show more than 97% amino acid identity. Is the global regulatory role associated with E. coli Lrp limited to enteric bacteria? To probe this question we investigated LrfB, an Lrp-related protein from Haemophilus influenzae that shares 75% sequence identity with E. coli Lrp (the highest sequence identity among the 42 sequences compared). A strain of H. influenzae carrying an lrfB null allele grew at the wild-type growth rate but with a filamentous morphology. A comparison of two-dimensional (2D) electrophoretic patterns of proteins from the parent and mutant strains showed only two differences (comparable studies with lrp+ and lrp− E. coli strains by others showed 20 differences). The abundance of LrfB in H. influenzae, estimated by Western blotting experiments, was about 130 dimers per cell (compared to 3,000 dimers per E. coli cell). LrfB expressed in E. coli replaced Lrp as a repressor of the lrp gene but acted only to a limited extent as an activator of the ilvIH operon. Thus, although LrfB resembles Lrp sufficiently to perform some of its functions, its low abundance is consonant with a more local role in regulating only a few genes, a view consistent with the results of the 2D electrophoretic analysis. We speculate that an Lrp having a global regulatory role evolved to help enteric bacteria adapt to their ecological niches and that it is unlikely that Lrp-related proteins in other organisms have a broad regulatory function. PMID:11395465
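
    Pairwise identity figures such as the 29%, 75% and 97% quoted above depend on how identity is computed. A minimal sketch, assuming the two sequences are already aligned and that identity is the number of identical residues divided by the number of gap-free aligned columns; other conventions (for example, dividing by the length of the shorter sequence) give different numbers, and the abstract does not say which one was used. The function name and example fragments are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Identical residues / gap-free aligned columns, as a percentage."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # columns with a gap in either sequence are not compared
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical aligned fragments: 5 identities over 6 gap-free columns.
print(round(percent_identity("MKV-LSE", "MRVALSE"), 1))  # 83.3
```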

  13. [Study on ITS sequences of Aconitum vilmorinianum and its medicinal adulterant].

    PubMed

    Zhang, Xiao-nan; Du, Chun-hua; Fu, De-huan; Gao, Li; Zhou, Pei-jun; Wang, Li

    2012-09-01

    This study aimed to analyze and compare the ITS sequences of Aconitum vilmorinianum and its medicinal adulterant Aconitum austroyunnanense. Total genomic DNA was extracted from the sample materials by an improved CTAB method; ITS sequences were amplified by PCR, sequenced directly, and analyzed using DNAStar, ClustalX 1.81 and MEGA 4.0. In the ITS1 sequences, 299 conserved sites, 19 variable sites and 13 informative sites were found; in the 5.8S sequences, 162 conserved sites, 2 variable sites and 1 informative site; and in the ITS2 sequences, 217 conserved sites, 3 variable sites and 1 informative site. Comparison of the ITS sequence data matrix showed no base transitions or transversions in the 5.8S sequences, 2 transitions and 1 transversion in the ITS1 sequences, and only 1 transversion in the ITS2 sequences. By analyzing the ITS sequence data matrix from 2 populations of Aconitum vilmorinianum and 3 populations of Aconitum austroyunnanense, we found a stable informative site at position 596, within the ITS2 region: the base was C in all samples of Aconitum vilmorinianum and A in all samples of Aconitum austroyunnanense. Aconitum vilmorinianum and Aconitum austroyunnanense can therefore be distinguished by their ITS sequences, and ITS1 contains more variable sites than ITS2.
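
    The site categories reported above can be counted directly from an alignment. A minimal sketch, assuming the definitions as commonly stated for tools such as MEGA: a conserved site has one base in all sequences, a variable site has at least two, and a parsimony-informative site has at least two bases that each occur in at least two sequences. Columns containing gaps or ambiguity codes are skipped (a complete-deletion assumption). The diagnostic_sites helper mirrors the position-596 finding by listing columns where each species is fixed for a different base; the toy alignment and all function names are hypothetical.

```python
from collections import Counter


def classify_sites(alignment, skip_chars="-?N"):
    """Count conserved, variable and parsimony-informative columns."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    conserved = variable = informative = 0
    for col in range(length):
        bases = [seq[col].upper() for seq in alignment]
        if any(base in skip_chars for base in bases):
            continue  # complete deletion: ignore gap/ambiguity columns
        counts = Counter(bases)
        if len(counts) == 1:
            conserved += 1
            continue
        variable += 1
        # Informative: at least two states, each seen in at least two sequences.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    return {"conserved": conserved, "variable": variable,
            "informative": informative}


def diagnostic_sites(group_a, group_b):
    """1-based columns where each group is fixed for a different base."""
    sites = []
    for col in range(len(group_a[0])):
        states_a = {seq[col] for seq in group_a}
        states_b = {seq[col] for seq in group_b}
        if len(states_a) == 1 and len(states_b) == 1 and states_a != states_b:
            sites.append(col + 1)
    return sites


# Hypothetical toy ITS fragments for two species.
species_1 = ["ACGTA", "ACGTA"]
species_2 = ["ATGCA", "ATGCG"]
print(classify_sites(species_1 + species_2))
# {'conserved': 2, 'variable': 3, 'informative': 2}
print(diagnostic_sites(species_1, species_2))  # [2, 4]
```

    On the toy alignment, column 5 is variable but not informative because its minority base G occurs in only one sequence.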

  14. GSDC: A Unique Data Center in Korea for HEP research

    NASA Astrophysics Data System (ADS)

    Ahn, Sang-Un

    2017-04-01

    Global Science experimental Data hub Center (GSDC) at the Korea Institute of Science and Technology Information (KISTI) is a unique data center in South Korea established to promote fundamental research by supporting it with expertise in Information and Communication Technology (ICT) and with infrastructure for High Performance Computing (HPC), High Throughput Computing (HTC) and networking. GSDC has supported various research fields in South Korea that deal with large-scale data, e.g. the RENO experiment for neutrino research, the LIGO experiment for gravitational-wave detection, genome sequencing projects for biomedical research, and HEP experiments such as CDF at FNAL, Belle at KEK, and STAR at BNL. In particular, GSDC has run a Tier-1 center for the ALICE experiment at the LHC at CERN since 2013. In this talk, we present an overview of the computing infrastructure that GSDC runs for these research fields, and we discuss the data center infrastructure management system deployed at GSDC.

  15. Score distributions of gapped multiple sequence alignments down to the low-probability tail

    NASA Astrophysics Data System (ADS)

    Fieth, Pascal; Hartmann, Alexander K.

    2016-08-01

    Assessing the significance of alignment scores of optimally aligned DNA or amino acid sequences can be achieved via knowledge of the score distribution of random sequences. But this requires obtaining the distribution in the biologically relevant high-scoring region, where the probabilities are exponentially small. For gapless local alignments of infinitely long sequences this distribution is known analytically to follow a Gumbel distribution. Distributions for gapped local alignments and global alignments of finite lengths can only be obtained numerically. To obtain results for the small-probability region, specific statistical mechanics-based rare-event algorithms can be applied. In previous studies, this was achieved for pairwise alignments. They showed that, contrary to results from previous simple sampling studies, strong deviations from the Gumbel distribution occur in the case of finite sequence lengths. Here we extend the studies to multiple sequence alignments with gaps, which are much more relevant for practical applications in molecular biology. We study the distributions of scores over a large range of the support, reaching probabilities as small as 10^-160, for global and local (sum-of-pair scores) multiple alignments. We find that even after suitable rescaling, eliminating the sequence-length dependence, the distributions for multiple alignment differ from the pairwise alignment case. Furthermore, we also show that the previously discussed Gaussian correction to the Gumbel distribution needs to be refined, also for the case of pairwise alignments.
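
    For gapless local alignments, the analytic Gumbel result is usually written in the Karlin-Altschul form P(S >= s) ~ 1 - exp(-K*m*n*exp(-lambda*s)), where m and n are the sequence lengths and lambda and K depend on the scoring system and residue composition. The sketch below evaluates that tail and contrasts it with simple sampling of random DNA sequence pairs. The match/mismatch scores and the lambda and K values are placeholders rather than fitted parameters, the function names are hypothetical, and simple sampling cannot resolve probabilities much below 1/samples, which is why rare-event algorithms are needed; those algorithms are not implemented here.

```python
import math
import random


def gumbel_tail(s, lam, K, m, n):
    """Karlin-Altschul (Gumbel) tail: P(S >= s) ~ 1 - exp(-K*m*n*exp(-lam*s))."""
    return -math.expm1(-K * m * n * math.exp(-lam * s))


def best_gapless_local_score(a, b, match=1, mismatch=-1):
    """Best ungapped local alignment score: a Kadane-style scan of every diagonal."""
    best = 0
    for offset in range(-(len(a) - 1), len(b)):
        running = 0
        # Only positions i where j = i + offset falls inside b.
        for i in range(max(0, -offset), min(len(a), len(b) - offset)):
            step = match if a[i] == b[i + offset] else mismatch
            running = max(0, running + step)
            best = max(best, running)
    return best


def empirical_tail(threshold, samples=500, length=40, seed=1):
    """Simple-sampling estimate of P(S >= threshold) for random DNA pairs."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        a = "".join(rng.choice("ACGT") for _ in range(length))
        b = "".join(rng.choice("ACGT") for _ in range(length))
        if best_gapless_local_score(a, b) >= threshold:
            hits += 1
    return hits / samples


# Placeholder parameters, not fitted to this scoring scheme.
print(gumbel_tail(20, lam=0.3, K=0.1, m=100, n=100))
print(empirical_tail(8))
```

    gumbel_tail uses math.expm1 so that very small tail probabilities are not lost to rounding when computing 1 - exp(-x) for tiny x.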

  16. Who Learns More? Cultural Differences in Implicit Sequence Learning

    PubMed Central

    Fu, Qiufang; Dienes, Zoltan; Shang, Junchen; Fu, Xiaolan

    2013-01-01

    Background It is well documented that East Asians differ from Westerners in conscious perception and attention. However, few studies have explored cultural differences in unconscious processes such as implicit learning. Methodology/Principal Findings The global-local Navon letters were adopted in the serial reaction time (SRT) task, during which Chinese and British participants were instructed to respond to global or local letters, to investigate whether culture influences what people acquire in implicit sequence learning. Our results showed that from the beginning the British expressed a greater local bias in perception than the Chinese, confirming a cultural difference in perception. Further, over extended exposure, the Chinese learned the target regularity better than the British when the targets were global, indicating a global advantage for Chinese participants in implicit learning. Moreover, Chinese participants acquired greater unconscious knowledge of an irrelevant regularity than British participants, indicating that the Chinese were more sensitive to contextual regularities than the British. Conclusions/Significance The results suggest that cultural biases can profoundly influence both what people consciously perceive and what they unconsciously learn. PMID:23940773

  17. Draft Genome Sequence of Pseudomonas putida CA-3, a Bacterium Capable of Styrene Degradation and Medium-Chain-Length Polyhydroxyalkanoate Synthesis

    PubMed Central

    Almeida, Eduardo L.; Margassery, Lekha M.; O’Leary, Niall

    2018-01-01

    ABSTRACT Pseudomonas putida strain CA-3 is an industrial bioreactor isolate capable of synthesizing biodegradable polyhydroxyalkanoate polymers via the metabolism of styrene and other unrelated carbon sources. The pathways involved are subject to regulation by global cellular processes. The draft genome sequence is 6,177,154 bp long and contains 5,608 predicted coding sequences. PMID:29371359

  18. Complete genome sequence of a ciprofloxacin-resistant Salmonella enterica subsp. enterica serovar Kentucky sequence type 198 strain, PU131, isolated from a human patient in Washington State.

    USDA-ARS's Scientific Manuscript database

    A ciprofloxacin resistant (CipR) Salmonella enterica subsp. enterica serovar Kentucky ST198 has rapidly and extensively disseminated globally to become a major food-safety and public health concern. Here, we report a complete genome sequence of a CipR S. Kentucky ST198 strain PU131 isolated from a ...

  19. A multilevel ant colony optimization algorithm for classical and isothermic DNA sequencing by hybridization with multiplicity information available.

    PubMed

    Kwarciak, Kamil; Radom, Marcin; Formanowicz, Piotr

    2016-04-01

    The classical approach to sequencing by hybridization takes into account only binary information about sequence composition: a given element from an oligonucleotide library either is or is not part of the target sequence. However, DNA chip technology has since been developed that makes it possible to obtain partial information about the multiplicity of each oligonucleotide of which the analyzed sequence consists. Currently, it is not possible to obtain exact data of this type, but even partial information should be very useful. Two realistic multiplicity information models are considered in this paper. The first, called "one and many", assumes that it is possible to determine whether a given oligonucleotide occurs in the reconstructed sequence once or more than once. According to the second model, called "one, two and many", a biochemical experiment can reveal whether a given oligonucleotide is present in the analyzed sequence once, twice, or at least three times. An ant colony optimization algorithm has been implemented to verify these models and to compare them with existing algorithms for sequencing by hybridization that use the additional information. The proposed algorithm solves the problem for instances containing any kind of hybridization error. Computational experiments confirm that using even partial information about multiplicity increases the quality of reconstructed sequences. They also show that the more precise model yields better solutions and that the ant colony optimization algorithm outperforms the existing ones. Test data sets and the proposed ant colony optimization algorithm are available at: http://bioserver.cs.put.poznan.pl/download/ACO4mSBH.zip. Copyright © 2016 Elsevier Ltd. All rights reserved.
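To make the two multiplicity information models concrete, the Python sketch below derives, for a known target sequence, the spectrum that each model would report: every oligonucleotide of length k present in the target, labelled with its occurrence count capped at "many". The function names, the choice k = 3, the string labels, and the counting of overlapping occurrences are assumptions for illustration; the sketch ignores hybridization errors and is not the authors' ant colony optimization implementation.

```python
from collections import Counter


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count overlapping occurrences of every k-mer (oligonucleotide) in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def multiplicity_spectrum(sequence: str, k: int, model: str) -> dict[str, str]:
    """Label each k-mer with the multiplicity information a given model would report.

    "binary"       -> classical SBH: presence only
    "one_many"     -> "one and many": once vs. more than once
    "one_two_many" -> "one, two and many": once, twice, or at least three times
    """
    labels = {}
    for kmer, n in kmer_counts(sequence, k).items():
        if model == "binary":
            labels[kmer] = "present"
        elif model == "one_many":
            labels[kmer] = "one" if n == 1 else "many"
        elif model == "one_two_many":
            labels[kmer] = {1: "one", 2: "two"}.get(n, "many")
        else:
            raise ValueError(f"unknown model: {model}")
    return labels


if __name__ == "__main__":
    target = "ACGACGACGT"  # ACG occurs 3 times, CGA and GAC twice, CGT once
    for m in ("binary", "one_many", "one_two_many"):
        print(m, multiplicity_spectrum(target, 3, m))
```

In this toy target, the "one and many" model cannot tell CGA (two copies) from ACG (three copies), while the "one, two and many" model can; that extra resolution is what the more precise model contributes to reconstruction.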

  20. Development and expansion of high-quality control region databases to improve forensic mtDNA evidence interpretation.

    PubMed

    Irwin, Jodi A; Saunier, Jessica L; Strouss, Katharine M; Sturk, Kimberly A; Diegoli, Toni M; Just, Rebecca S; Coble, Michael D; Parson, Walther; Parsons, Thomas J

    2007-06-01

    In an effort to increase the quantity, breadth and availability of mtDNA databases suitable for forensic comparisons, we have developed a high-throughput process to generate approximately 5000 control region sequences per year from regional US populations, global populations from which the current US population is derived and global populations currently under-represented in available forensic databases. The system utilizes robotic instrumentation for all laboratory steps from pre-extraction through sequence detection, and a rigorous eight-step, multi-laboratory data review process with entirely electronic data transfer. Over the past 3 years, nearly 10,000 control region sequences have been generated using this approach. These data are being made publicly available and should further address the need for consistent, high-quality mtDNA databases for forensic testing.

  1. A global reference for human genetic variation

    PubMed Central

    2016-01-01

    The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies. PMID:26432245

  2. Sequencing and Characterization of the Invasive Sycamore Lace Bug Corythucha ciliata (Hemiptera: Tingidae) Transcriptome

    PubMed Central

    Qu, Cheng; Fu, Ningning; Xu, Yihua

    2016-01-01

    The sycamore lace bug, Corythucha ciliata (Hemiptera: Tingidae), is an invasive forestry pest rapidly expanding in many countries. This pest poses a considerable threat to urban forestry ecosystems, especially to Platanus spp. However, its molecular biology and biochemistry are poorly understood. This study reports the first C. ciliata transcriptome, encompassing three different life stages (nymphs, adult females (AF) and adult males (AM)). In total, 26.53 GB of clean data and 60,879 unigenes were obtained from three RNA-seq libraries. These unigenes were annotated and classified using Nr (NCBI non-redundant protein sequences), Nt (NCBI non-redundant nucleotide sequences), Pfam (Protein family), KOG/COG (Clusters of Orthologous Groups of proteins), Swiss-Prot (a manually annotated and reviewed protein sequence database), and KO (KEGG Ortholog database). Pairwise comparisons among the three samples revealed a large number of differentially expressed genes. Dramatic differences in global gene expression profiles were found between distinct life stages (nymphs and AF, nymphs and AM) and between sexes (AF and AM), with some of the significantly differentially expressed genes (DEGs) related to metamorphosis, digestion, immunity and sex differences. The differential expression of 16 randomly selected unigenes was validated by quantitative real-time PCR (qRT-PCR). In addition, 17,462 potential simple sequence repeat molecular markers were identified in these transcriptome resources. This comprehensive C. ciliata transcriptomic information can be used to promote the development of environmentally friendly methods to disrupt the processes of metamorphosis, digestion, immunity and sex differences. PMID:27494615

  3. Conscious Vision Proceeds from Global to Local Content in Goal-Directed Tasks and Spontaneous Vision.

    PubMed

    Campana, Florence; Rebollo, Ignacio; Urai, Anne; Wyart, Valentin; Tallon-Baudry, Catherine

    2016-05-11

    The reverse hierarchy theory (Hochstein and Ahissar, 2002) makes strong, but so far untested, predictions on conscious vision. In this theory, local details encoded in lower-order visual areas are unconsciously processed before being automatically and rapidly combined into global information in higher-order visual areas, where conscious percepts emerge. Contingent on current goals, local details can afterward be consciously retrieved. This model therefore predicts that (1) global information is perceived faster than local details, (2) global information is computed regardless of task demands during early visual processing, and (3) spontaneous vision is dominated by global percepts. We designed novel textured stimuli that are, as opposed to the classic Navon's letters, truly hierarchical (i.e., where global information is solely defined by local information but where local and global orientations can still be manipulated separately). In line with the predictions, observers were systematically faster at reporting global than local properties of those stimuli. Second, global information could be decoded from magneto-encephalographic data during early visual processing regardless of task demands. Last, spontaneous subjective reports were dominated by global information, and the frequency and speed of spontaneous global perception correlated with the accuracy and speed in the global task. No such correlation was observed for local information. We therefore show that information at different levels of the visual hierarchy is not equally likely to become conscious; rather, conscious percepts emerge preferentially at a global level. We further show that spontaneous reports can be reliable and are tightly linked to objective performance at the global level.
    SIGNIFICANCE STATEMENT Is information encoded at different levels of the visual system (local details in low-level areas vs global shapes in high-level areas) equally likely to become conscious? We designed new hierarchical stimuli and provide the first empirical evidence based on behavioral and MEG data that global information encoded at high levels of the visual hierarchy dominates perception. This result held both in the presence and in the absence of task demands. The preferential emergence of percepts at high levels can account for two properties of conscious vision, namely, the dominance of global percepts and the feeling of visual richness reported independently of the perception of local details. Copyright © 2016 the authors.

  4. First Genome Sequence of Leptospira interrogans Serovar Pomona, Isolated from a Bovine Abortion

    PubMed Central

    Varni, Vanina; Koval, Ariel; Nagel, Ariel; Ruybal, Paula

    2016-01-01

    Leptospirosis is a widespread zoonosis and a re-emergent disease of global distribution with major relevance in veterinary production. Here, we report the whole-genome sequence of Leptospira interrogans serovar Pomona strain AKRFB, isolated from a bovine abortion during a leptospirosis outbreak in Argentina. PMID:27198013

  5. Global genotype flow in Cercospora beticola populations confirmed through genotyping-by-sequencing

    USDA-ARS's Scientific Manuscript database

    Genotyping-by-sequencing (GBS) was conducted on 333 Cercospora isolates collected from Beta vulgaris (sugar beet, table beet and Swiss chard) in the USA and Europe. Cercospora beticola was confirmed as the species predominantly isolated from leaves with Cercospora leaf spot (CLS) symptoms. However, ...

  6. DNA Barcode Analysis of Thrips (Thysanoptera) Diversity in Pakistan Reveals Cryptic Species Complexes.

    PubMed

    Iftikhar, Romana; Ashfaq, Muhammad; Rasool, Akhtar; Hebert, Paul D N

    2016-01-01

    Although thrips are globally important crop pests and vectors of viral disease, species identifications are difficult because of their small size and inconspicuous morphological differences. Sequence variation in the mitochondrial COI-5' (DNA barcode) region has proven effective for the identification of species in many groups of insect pests. We analyzed barcode sequence variation among 471 thrips from various plant hosts in north-central Pakistan. The Barcode Index Number (BIN) system assigned these sequences to 55 BINs, while the Automatic Barcode Gap Discovery detected 56 partitions, a count that coincided with the number of monophyletic lineages recognized by Neighbor-Joining analysis and Bayesian inference. Congeneric species showed an average of 19% sequence divergence (range = 5.6% - 27%) at COI, while intraspecific distances averaged 0.6% (range = 0.0% - 7.6%). BIN analysis suggested that all intraspecific divergence >3.0% actually involved a species complex. In fact, sequences for three major pest species (Haplothrips reuteri, Thrips palmi, Thrips tabaci), and one predatory thrips (Aeolothrips intermedius) showed deep intraspecific divergences, providing evidence that each is a cryptic species complex. The study compiles the first barcode reference library for the thrips of Pakistan, and examines global haplotype diversity in four important pest thrips.

  7. High resolution depth reconstruction from monocular images and sparse point clouds using deep convolutional neural network

    NASA Astrophysics Data System (ADS)

    Dimitrievski, Martin; Goossens, Bart; Veelaert, Peter; Philips, Wilfried

    2017-09-01

    Understanding the 3D structure of the environment is advantageous for many tasks in the field of robotics and autonomous vehicles. From the robot's point of view, 3D perception is often formulated as a depth image reconstruction problem. In the literature, dense depth images are often recovered deterministically from stereo image disparities. Other systems use an expensive LiDAR sensor to produce accurate but semi-sparse depth images. With the advent of deep learning, there have also been attempts to estimate depth using only monocular images. In this paper we combine the best of both worlds, focusing on a combination of monocular images and low-cost LiDAR point clouds. We explore the idea that very sparse depth information accurately captures the global scene structure, while variations in image patches can be used to reconstruct local depth at high resolution. The main contribution of this paper is a supervised learning depth reconstruction system based on a deep convolutional neural network. The network is trained on RGB image patches reinforced with sparse depth information, and the output is a depth estimate for each pixel. Using image and point cloud data from the KITTI vision dataset, we are able to learn a correspondence between local RGB information and local depth while preserving the global scene structure. Our results are evaluated on sequences from the KITTI dataset and on our own recordings using a low-cost camera and LiDAR setup.

  8. Seasonal variations in shallow Alaska seismicity and stress modulation from GRACE derived hydrological loading

    NASA Astrophysics Data System (ADS)

    Johnson, C. W.; Fu, Y.; Burgmann, R.

    2017-12-01

    Shallow (≤50 km), low magnitude (M≥2.0) seismicity in southern Alaska is examined for seasonal variations during the annual hydrological cycle. The seismicity is declustered with a spatio-temporal epidemic type aftershock sequence (ETAS) model. The removal of aftershock sequences allows detailed investigation of seismicity rate changes, as water and ice loads modulate crustal stresses throughout the year. The GRACE surface loads are obtained from the JPL mass concentration blocks (mascons) global land and ocean solutions. The data product is smoothed with a 9° Gaussian filter and interpolated on a 25 km grid. To inform the surface loading model, the global solutions are limited to the region from -160° to -120° and 50° to 70°. The stress changes are calculated using a 1D spherical layered earth model at depth intervals of 10 km from 10 to 50 km in the study region. To evaluate the induced seasonal stresses, we use >30 years of earthquake focal mechanisms to constrain the background stress field orientation and assess the stress change with respect to the principal stress orientation. The background stress field is assumed to control the preferred orientation of faulting, and stress field perturbations are expected to increase or decrease seismicity. The number of excess earthquakes is calculated with respect to the background seismicity rates. Here, we present preliminary results for the shallow seismicity variations and quantify the seasonal stresses associated with changes in hydrological loading.

  9. Virioplankton Assemblage Structure in the Lower River and Ocean Continuum of the Amazon

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Silva, Bruno S. de O.; Coutinho, Felipe H.; Gregoracci, Gustavo B.

    ABSTRACT The Amazon River watershed and its associated plume comprise a vast continental and oceanic area. The microbial activities along this continuum contribute substantially to global carbon and nutrient cycling, and yet there is a dearth of information on the diversity, abundance, and possible roles of viruses in this globally important river. The aim of this study was to elucidate the diversity and structure of virus assemblages of the Amazon River-ocean continuum. Environmental viral DNA sequences were obtained for 12 locations along the river’s lower reach (n = 5) and plume (n = 7). Sequence assembly yielded 29,358 scaffolds, encoding 82,546 viral proteins, with 15 new complete viral genomes. Despite the spatial connectivity mediated by the river, virome analyses and physical-chemical water parameters clearly distinguished river and plume ecosystems. Bacteriophages were ubiquitous in the continuum and were more abundant in the transition region. Eukaryotic viruses occurred mostly in the river, while the plume had more viruses of autotrophic organisms (Prochlorococcus, Synechococcus) and heterotrophic bacteria (Pelagibacter). The viral families Microviridae and Myoviridae were the most abundant and occurred throughout the continuum. The major functions of the genes in the continuum involved viral structures and life cycles, and viruses from plume locations and Tapajós River showed the highest levels of functional diversity. The distribution patterns of the viral assemblages were defined not only by the occurrence of possible hosts but also by water physical and chemical parameters, especially salinity. The findings presented here help to improve understanding of the possible roles of viruses in the organic matter cycle along the river-ocean continuum. IMPORTANCE The Amazon River forms a vast plume in the Atlantic Ocean that can extend for more than 1,000 km. Microbial communities promote a globally relevant carbon sink system in the plume.
Despite the importance of viruses for the global carbon cycle, the diversity and the possible roles of viruses in the Amazon are poorly understood. The present work assesses, for the first time, the abundance and diversity of viruses simultaneously in the river and ocean in order to elucidate their possible roles. DNA sequence assembly yielded 29,358 scaffolds, encoding 82,546 viral proteins, with 15 new complete viral genomes from the 12 river and ocean locations. Viral diversity was clearly distinguished by river and ocean. Bacteriophages were the most abundant and occurred throughout the continuum. Viruses that infect eukaryotes were more abundant in the river, whereas phages appeared to have strong control over the host prokaryotic populations in the plume.

  10. Virioplankton Assemblage Structure in the Lower River and Ocean Continuum of the Amazon.

    PubMed

    Silva, Bruno S de O; Coutinho, Felipe H; Gregoracci, Gustavo B; Leomil, Luciana; de Oliveira, Louisi S; Fróes, Adriana; Tschoeke, Diogo; Soares, Ana Carolina; Cabral, Anderson S; Ward, Nicholas D; Richey, Jeffrey E; Krusche, Alex V; Yager, Patricia L; de Rezende, Carlos Eduardo; Thompson, Cristiane C; Thompson, Fabiano L

    2017-01-01

    The Amazon River watershed and its associated plume comprise a vast continental and oceanic area. The microbial activities along this continuum contribute substantially to global carbon and nutrient cycling, and yet there is a dearth of information on the diversity, abundance, and possible roles of viruses in this globally important river. The aim of this study was to elucidate the diversity and structure of virus assemblages of the Amazon River-ocean continuum. Environmental viral DNA sequences were obtained for 12 locations along the river's lower reach (n = 5) and plume (n = 7). Sequence assembly yielded 29,358 scaffolds, encoding 82,546 viral proteins, with 15 new complete viral genomes. Despite the spatial connectivity mediated by the river, virome analyses and physical-chemical water parameters clearly distinguished river and plume ecosystems. Bacteriophages were ubiquitous in the continuum and were more abundant in the transition region. Eukaryotic viruses occurred mostly in the river, while the plume had more viruses of autotrophic organisms (Prochlorococcus, Synechococcus) and heterotrophic bacteria (Pelagibacter). The viral families Microviridae and Myoviridae were the most abundant and occurred throughout the continuum. The major functions of the genes in the continuum involved viral structures and life cycles, and viruses from plume locations and Tapajós River showed the highest levels of functional diversity. The distribution patterns of the viral assemblages were defined not only by the occurrence of possible hosts but also by water physical and chemical parameters, especially salinity. The findings presented here help to improve understanding of the possible roles of viruses in the organic matter cycle along the river-ocean continuum. IMPORTANCE The Amazon River forms a vast plume in the Atlantic Ocean that can extend for more than 1,000 km. Microbial communities promote a globally relevant carbon sink system in the plume.
Despite the importance of viruses for the global carbon cycle, the diversity and the possible roles of viruses in the Amazon are poorly understood. The present work assesses, for the first time, the abundance and diversity of viruses simultaneously in the river and ocean in order to elucidate their possible roles. DNA sequence assembly yielded 29,358 scaffolds, encoding 82,546 viral proteins, with 15 new complete viral genomes from the 12 river and ocean locations. Viral diversity was clearly distinguished by river and ocean. Bacteriophages were the most abundant and occurred throughout the continuum. Viruses that infect eukaryotes were more abundant in the river, whereas phages appeared to have strong control over the host prokaryotic populations in the plume.

  11. Virioplankton Assemblage Structure in the Lower River and Ocean Continuum of the Amazon

    PubMed Central

    Silva, Bruno S. de O.; Coutinho, Felipe H.; Gregoracci, Gustavo B.; Leomil, Luciana; de Oliveira, Louisi S.; Fróes, Adriana; Tschoeke, Diogo; Soares, Ana Carolina; Cabral, Anderson S.; Ward, Nicholas D.; Richey, Jeffrey E.; Krusche, Alex V.; Yager, Patricia L.; de Rezende, Carlos Eduardo; Thompson, Cristiane C.

    2017-01-01

    ABSTRACT The Amazon River watershed and its associated plume comprise a vast continental and oceanic area. The microbial activities along this continuum contribute substantially to global carbon and nutrient cycling, and yet there is a dearth of information on the diversity, abundance, and possible roles of viruses in this globally important river. The aim of this study was to elucidate the diversity and structure of virus assemblages of the Amazon River-ocean continuum. Environmental viral DNA sequences were obtained for 12 locations along the river’s lower reach (n = 5) and plume (n = 7). Sequence assembly yielded 29,358 scaffolds, encoding 82,546 viral proteins, with 15 new complete viral genomes. Despite the spatial connectivity mediated by the river, virome analyses and physical-chemical water parameters clearly distinguished river and plume ecosystems. Bacteriophages were ubiquitous in the continuum and were more abundant in the transition region. Eukaryotic viruses occurred mostly in the river, while the plume had more viruses of autotrophic organisms (Prochlorococcus, Synechococcus) and heterotrophic bacteria (Pelagibacter). The viral families Microviridae and Myoviridae were the most abundant and occurred throughout the continuum. The major functions of the genes in the continuum involved viral structures and life cycles, and viruses from plume locations and Tapajós River showed the highest levels of functional diversity. The distribution patterns of the viral assemblages were defined not only by the occurrence of possible hosts but also by water physical and chemical parameters, especially salinity. The findings presented here help to improve understanding of the possible roles of viruses in the organic matter cycle along the river-ocean continuum. IMPORTANCE The Amazon River forms a vast plume in the Atlantic Ocean that can extend for more than 1,000 km. Microbial communities promote a globally relevant carbon sink system in the plume. 
Despite the importance of viruses for the global carbon cycle, the diversity and the possible roles of viruses in the Amazon are poorly understood. The present work assesses, for the first time, the abundance and diversity of viruses simultaneously in the river and ocean in order to elucidate their possible roles. DNA sequence assembly yielded 29,358 scaffolds, encoding 82,546 viral proteins, with 15 new complete viral genomes from the 12 river and ocean locations. Viral diversity was clearly distinguished by river and ocean. Bacteriophages were the most abundant and occurred throughout the continuum. Viruses that infect eukaryotes were more abundant in the river, whereas phages appeared to have strong control over the host prokaryotic populations in the plume. PMID:28989970

  12. EOS Data and Information System (EOSDIS). [Landsat satellites]

    NASA Technical Reports Server (NTRS)

    1992-01-01

    In the past decade, science and technology have reached levels that permit assessments of global environmental change. Scientific success in understanding global environmental change depends on integration and management of numerous data sources. The Global Change Data and Information System (GCDIS) must provide for the management of data, information dissemination, and technology transfer. The Earth Observing System Data and Information System (EOSDIS) is NASA's portion of this global change information system.

  13. Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications

    PubMed Central

    Yilmaz, Pelin; Kottmann, Renzo; Field, Dawn; Knight, Rob; Cole, James R; Amaral-Zettler, Linda; Gilbert, Jack A; Karsch-Mizrachi, Ilene; Johnston, Anjanette; Cochrane, Guy; Vaughan, Robert; Hunter, Christopher; Park, Joonhong; Morrison, Norman; Rocca-Serra, Philippe; Sterk, Peter; Arumugam, Manimozhiyan; Bailey, Mark; Baumgartner, Laura; Birren, Bruce W; Blaser, Martin J; Bonazzi, Vivien; Booth, Tim; Bork, Peer; Bushman, Frederic D; Buttigieg, Pier Luigi; Chain, Patrick S G; Charlson, Emily; Costello, Elizabeth K; Huot-Creasy, Heather; Dawyndt, Peter; DeSantis, Todd; Fierer, Noah; Fuhrman, Jed A; Gallery, Rachel E; Gevers, Dirk; Gibbs, Richard A; Gil, Inigo San; Gonzalez, Antonio; Gordon, Jeffrey I; Guralnick, Robert; Hankeln, Wolfgang; Highlander, Sarah; Hugenholtz, Philip; Jansson, Janet; Kau, Andrew L; Kelley, Scott T; Kennedy, Jerry; Knights, Dan; Koren, Omry; Kuczynski, Justin; Kyrpides, Nikos; Larsen, Robert; Lauber, Christian L; Legg, Teresa; Ley, Ruth E; Lozupone, Catherine A; Ludwig, Wolfgang; Lyons, Donna; Maguire, Eamonn; Methé, Barbara A; Meyer, Folker; Muegge, Brian; Nakielny, Sara; Nelson, Karen E; Nemergut, Diana; Neufeld, Josh D; Newbold, Lindsay K; Oliver, Anna E; Pace, Norman R; Palanisamy, Giriprakash; Peplies, Jörg; Petrosino, Joseph; Proctor, Lita; Pruesse, Elmar; Quast, Christian; Raes, Jeroen; Ratnasingham, Sujeevan; Ravel, Jacques; Relman, David A; Assunta-Sansone, Susanna; Schloss, Patrick D; Schriml, Lynn; Sinha, Rohini; Smith, Michelle I; Sodergren, Erica; Spor, Aymé; Stombaugh, Jesse; Tiedje, James M; Ward, Doyle V; Weinstock, George M; Wendel, Doug; White, Owen; Whiteley, Andrew; Wilke, Andreas; Wortman, Jennifer R; Yatsunenko, Tanya; Glöckner, Frank Oliver

    2012-01-01

    Here we present a standard developed by the Genomic Standards Consortium (GSC) for reporting marker gene sequences—the minimum information about a marker gene sequence (MIMARKS). We also introduce a system for describing the environment from which a biological sample originates. The ‘environmental packages’ apply to any genome sequence of known origin and can be used in combination with MIMARKS and other GSC checklists. Finally, to establish a unified standard for describing sequence data and to provide a single point of entry for the scientific community to access and learn about GSC checklists, we present the minimum information about any (x) sequence (MIxS). Adoption of MIxS will enhance our ability to analyze natural genetic diversity documented by massive DNA sequencing efforts from myriad ecosystems in our ever-changing biosphere. PMID:21552244

  14. High-Throughput Next-Generation Sequencing of Polioviruses

    PubMed Central

    Montmayeur, Anna M.; Schmidt, Alexander; Zhao, Kun; Magaña, Laura; Iber, Jane; Castro, Christina J.; Chen, Qi; Henderson, Elizabeth; Ramos, Edward; Shaw, Jing; Tatusov, Roman L.; Dybdahl-Sissoko, Naomi; Endegue-Zanga, Marie Claire; Adeniji, Johnson A.; Oberste, M. Steven; Burns, Cara C.

    2016-01-01

    ABSTRACT The poliovirus (PV) is currently targeted for worldwide eradication and containment. Sanger-based sequencing of the viral protein 1 (VP1) capsid region is currently the standard method for PV surveillance. However, the whole-genome sequence is sometimes needed for higher-resolution global surveillance. In this study, we optimized whole-genome sequencing protocols for poliovirus isolates and FTA cards using next-generation sequencing (NGS), aiming for high sequence coverage, efficiency, and throughput. We found that DNase treatment of poliovirus RNA followed by random reverse transcription (RT), amplification, and the use of the Nextera XT DNA library preparation kit produced significantly better results than other preparations. The average viral reads per total reads, a measurement of efficiency, was as high as 84.2% ± 15.6%. PV genomes covering >99 to 100% of the reference length were obtained and validated with Sanger sequencing. A total of 52 PV genomes were generated, multiplexing as many as 64 samples in a single Illumina MiSeq run. This high-throughput, sequence-independent NGS approach facilitated the detection of a diverse range of PVs, especially vaccine-derived polioviruses (VDPV), circulating VDPV, and immunodeficiency-related VDPV. In contrast to results from previous studies on other viruses, our results showed that filtration and nuclease treatment did not discernibly increase the sequencing efficiency of PV isolates. However, DNase treatment after nucleic acid extraction to remove host DNA significantly improved the sequencing results. This NGS method has been successfully implemented to generate PV genomes for molecular epidemiology of the most recent PV isolates. Additionally, the ability to obtain full PV genomes from FTA cards will aid in facilitating global poliovirus surveillance. PMID:27927929

  15. Prediction of Human Activity by Discovering Temporal Sequence Patterns.

    PubMed

    Li, Kang; Fu, Yun

    2014-08-01

    Early prediction of ongoing human activity has become increasingly valuable in a wide variety of time-critical applications. To build an effective representation for prediction, human activities can be characterized as a complex temporal composition of constituent simple actions and interacting objects. Unlike early detection of short-duration simple actions, we propose a novel framework for long-duration complex activity prediction by discovering three key aspects of activity: Causality, Context-cue, and Predictability. The major contributions of our work are: (1) a general framework that systematically addresses the problem of complex activity prediction by mining temporal sequence patterns; (2) a probabilistic suffix tree (PST) that models causal relationships between constituent actions, capturing both large- and small-order Markov dependencies between action units; (3) modeling of the context-cue, especially information about interacting objects, through sequential pattern mining (SPM), where a series of action and object co-occurrences is encoded as a complex symbolic sequence; and (4) a predictive accumulative function (PAF) that depicts the predictability of each kind of activity. The effectiveness of our approach is evaluated in two experimental scenarios, each with two data sets: action-only prediction and context-aware prediction. Our method achieves superior performance for predicting global activity classes and local action units.
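
    To make the idea of capturing both short- and long-range Markov dependencies concrete, here is a minimal sketch of a variable-order next-action predictor that backs off from the longest previously seen context to shorter ones. It is a simplified stand-in, not the authors' PST: it omits the pruning and probability smoothing a real probabilistic suffix tree uses, and the class name, the `max_order` parameter, and the action labels are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Hashable, Optional, Sequence, Tuple


class VariableOrderPredictor:
    """Hypothetical simplified suffix-context model (not the authors' PST)."""

    def __init__(self, max_order: int = 3) -> None:
        self.max_order = max_order
        # Context (tuple of preceding actions) -> counts of the action that followed it.
        self.counts: DefaultDict[Tuple[Hashable, ...], Counter] = defaultdict(Counter)

    def fit(self, sequence: Sequence[Hashable]) -> "VariableOrderPredictor":
        for i, action in enumerate(sequence):
            # Record the action under every context length 0..max_order that fits.
            for k in range(min(self.max_order, i) + 1):
                self.counts[tuple(sequence[i - k:i])][action] += 1
        return self

    def predict(self, history: Sequence[Hashable]) -> Optional[Hashable]:
        # Try the longest suffix of the history first, then back off to shorter ones.
        for k in range(min(self.max_order, len(history)), -1, -1):
            context = tuple(history[len(history) - k:])
            if context in self.counts:
                return self.counts[context].most_common(1)[0][0]
        return None  # nothing has been observed yet


# Usage with made-up action labels.
actions = ["reach", "grasp", "lift", "reach", "grasp", "lift", "reach", "grasp"]
model = VariableOrderPredictor(max_order=3).fit(actions)
print(model.predict(["reach", "grasp"]))  # lift
```

    An unseen long context such as ["unseen", "grasp"] falls back to the shorter context ("grasp",), which is the behavior that lets one model mix low- and high-order dependencies.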

  16. Extensive Geographic Mosaicism in Avian Influenza Viruses from Gulls in the Northern Hemisphere

    PubMed Central

    Wille, Michelle; Robertson, Gregory J.; Whitney, Hugh; Bishop, Mary Anne; Runstadler, Jonathan A.; Lang, Andrew S.

    2011-01-01

    Due to limited interaction of migratory birds between Eurasia and America, two independent avian influenza virus (AIV) gene pools have evolved. There is evidence of low-frequency reassortment between these regions, which has major implications for global AIV dynamics. Indeed, all currently circulating lineages of the PB1 and PA segments in North America are of Eurasian origin. Large-scale analyses of intercontinental reassortment have shown that viruses isolated from Charadriiformes (gulls, terns, and shorebirds) are the major contributors to these outsider events. To clarify the role of gulls in AIV dynamics, specifically in the movement of genes between geographic regions, we sequenced six gull AIVs isolated in Alaska and analyzed them along with 142 other available gull virus sequences. Basic investigations of host species and the locations and times of isolation reveal biases in the available sequence information. Despite these biases, our analyses reveal a high frequency of geographic reassortment in gull viruses isolated in America. This intercontinental gene mixing is not found in viruses isolated from gulls in Eurasia. This study demonstrates that gulls are important vectors for geographically reassorted viruses, particularly in America, and that more surveillance effort should be directed at this group of birds. PMID:21697989

  17. Genetic characterization of Anaplasma marginale strains from Tunisia using single and multiple gene typing reveals novel variants with an extensive genetic diversity.

    PubMed

    Ben Said, Mourad; Ben Asker, Alaa; Belkahia, Hanène; Ghribi, Raoua; Selmi, Rachid; Messadi, Lilia

    2018-05-12

    Anaplasma marginale, which is responsible for bovine anaplasmosis in tropical and subtropical regions, is a tick-borne obligate intraerythrocytic bacterium of cattle and wild ruminants. In Tunisia, information about the genetic diversity and phylogeny of A. marginale strains is limited to msp4 gene analysis. The purpose of this study was to investigate A. marginale isolates infecting 16 cattle located in different bioclimatic areas of northern Tunisia using single gene analysis and multilocus sequence typing based on seven partial genes (dnaA, ftsZ, groEL, lipA, secY, recA and sucB). The single gene analysis confirmed the presence of different and novel heterogeneous A. marginale strains infecting cattle in northern Tunisia. The concatenated sequence analysis showed phylogeographical resolution at the global level, with most of the Tunisian sequence types (STs) forming a cluster separate from a South African isolate and from all New World isolates and strains. By combining the characteristics of each single locus with those of the multilocus scheme, these results provide a more detailed understanding of the diversity and evolution of Tunisian A. marginale strains. Copyright © 2018 Elsevier GmbH. All rights reserved.

  18. Genomic Tools in Groundnut Breeding Program: Status and Perspectives

    PubMed Central

    Janila, P.; Variath, Murali T.; Pandey, Manish K.; Desmae, Haile; Motagi, Babu N.; Okori, Patrick; Manohar, Surendra S.; Rathnakumar, A. L.; Radhakrishnan, T.; Liao, Boshou; Varshney, Rajeev K.

    2016-01-01

    Groundnut, a nutrient-rich food legume, is cultivated worldwide. It is valued for its good-quality cooking oil, energy- and protein-rich food, and nutrient-rich fodder. Globally, groundnut improvement programs have developed varieties to meet the preferences of farmers, traders, processors, and consumers. Enhanced yield, tolerance to biotic and abiotic stresses, and quality parameters have been the target traits. A surge in genetic information on groundnut was facilitated by the development of molecular markers, genetic and physical maps, generation of expressed sequence tags (EST), discovery of genes, and identification of quantitative trait loci (QTL) for some important biotic and abiotic stresses and quality traits. The first groundnut variety developed using marker-assisted breeding (MAB) was registered in 2003. Since then, the USA, China, Japan, and India have begun to use genomic tools in routine groundnut improvement programs. Introgression lines that combine foliar fungal disease resistance and early maturity were developed using MAB. Establishment of marker-trait associations (MTA) paved the way to integrating genomic tools into groundnut breeding for accelerated genetic gain. Genomic selection (GS) tools are employed to improve drought tolerance and pod yield, which are governed by several minor-effect QTLs. A draft genome sequence and low-cost genotyping tools such as genotyping by sequencing (GBS) are expected to accelerate the use of genomic tools to enhance genetic gains for target traits in groundnut. PMID:27014312

  19. Information capacity of nucleotide sequences and its applications.

    PubMed

    Sadovsky, M G

    2006-05-01

    The information capacity of nucleotide sequences is defined through the specific entropy of a sequence's frequency dictionary, determined with respect to another dictionary containing the most probable continuations of shorter strings. This measure distinguishes a sequence both from a random one and from an ordered one. A comparison of sequences based on their information capacity is studied. Order within genetic entities is found at length scales ranging from 3 to 8. Some other applications of the developed methodology to genetics, bioinformatics, and molecular biology are discussed.
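
    The abstract does not give the formula, so the following is a sketch under stated assumptions: that the "most probable continuation" dictionary is the reconstruction of q-mer frequencies from (q-1)-mer frequencies, f~(a m b) = f(a m) f(m b) / f(m), and that information capacity is the relative entropy (Kullback-Leibler divergence) of the observed q-mer dictionary from that reconstruction. The function names are hypothetical, k-mers are counted on the sequence closed into a ring so that the dictionaries are mutually consistent, and no per-symbol normalization is applied, so values need not match the paper's.

```python
import math
from collections import Counter
from typing import Dict


def kmer_freqs(seq: str, k: int) -> Dict[str, float]:
    """Frequency dictionary of k-mers, read on the sequence closed into a ring."""
    if k == 0:
        return {"": 1.0}
    n = len(seq)
    ring = seq + seq[:k - 1]  # wrap around so each of the n positions starts a k-mer
    counts = Counter(ring[i:i + k] for i in range(n))
    return {word: c / n for word, c in counts.items()}


def information_capacity(seq: str, q: int) -> float:
    """Relative entropy of observed q-mer frequencies vs. their reconstruction.

    Assumed reconstruction: f~(a m b) = f(a m) * f(m b) / f(m), i.e. the most
    probable continuation of the (q-1)-mer dictionary.
    """
    if q < 2 or len(seq) < q:
        raise ValueError("need q >= 2 and len(seq) >= q")
    fq, fq1, fq2 = (kmer_freqs(seq, k) for k in (q, q - 1, q - 2))
    total = 0.0
    for word, f in fq.items():
        prefix, suffix, middle = word[:-1], word[1:], word[1:-1]
        f_rec = fq1[prefix] * fq1[suffix] / fq2[middle]
        total += f * math.log(f / f_rec)  # natural log, so the result is in nats
    return total


print(information_capacity("ACGT" * 3, 2))  # ln 4, about 1.386
print(information_capacity("A" * 8, 3))     # 0.0
```

    A value of zero means the q-mers are fully predictable from shorter words (as for the all-A sequence); in the ACGT repeat every observed 2-mer is four times more frequent than independent letters would predict, giving ln 4.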

  20. CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction.

    PubMed

    Cui, Xuefeng; Lu, Zhiwu; Wang, Sheng; Jing-Yan Wang, Jim; Gao, Xin

    2016-06-15

    Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite advances in recent decades in sequence alignment, threading and alignment-free methods, protein homology detection remains a challenging open problem. Recently, network methods that try to find transitive paths in the protein structure space have demonstrated the importance of incorporating network information from the structure space. Yet current methods merge the sequence space and the structure space into a single space and thus introduce inconsistency in combining different sources of information. We present a novel network-based protein homology detection method, CMsearch, based on cross-modal learning. Instead of exploring a single network built from a mixture of sequence and structure space information, CMsearch builds two separate networks to represent the sequence space and the structure space. It then learns sequence-structure correlation by simultaneously taking sequence information, structure information, sequence space information and structure space information into consideration. We tested CMsearch on two challenging tasks, protein homology detection and protein structure prediction, by querying all 8332 PDB40 proteins. Our results demonstrate that CMsearch is insensitive to the similarity metrics used to define the sequence and the structure spaces. By using HMM-HMM alignment as the sequence similarity metric, CMsearch clearly outperforms state-of-the-art homology detection methods and the CASP-winning template-based protein structure prediction methods. Our program is freely available for download from http://sfb.kaust.edu.sa/Pages/Software.aspx. Contact: xin.gao@kaust.edu.sa. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  1. MDR/Omni-band Reconfigurable Terminal: Design Concept

    DTIC Science & Technology

    1998-09-01

    tasks, databases and major communications flows. Global issues relevant to most of the blocks are then covered. Finally the planned sequence of...and event logger that are detailed in later paragraphs. The BITE (built-in test equipment)/Debugger detail can be found separately in the Global Issues paragraphs...conditions. Every part of the simulator has a BITE/Debugger component, the general description of which is given in Global Issues. Simulator control

  2. Location of core diagnostic information across various sequences in brain MRI and implications for efficiency of MRI scanner utilization.

    PubMed

    Sharma, Aseem; Chatterjee, Arindam; Goyal, Manu; Parsons, Matthew S; Bartel, Seth

    2015-04-01

    Targeting redundancy within MRI can improve its cost-effective utilization. We sought to quantify potential redundancy in our brain MRI protocols. In this retrospective review, we included 207 consecutive adults who underwent brain MRI and reviewed their medical records to document the clinical indication, the core diagnostic information provided by MRI, and its clinical impact. Contributory imaging abnormalities constituted positive core diagnostic information, whereas the absence of imaging abnormalities constituted negative core diagnostic information. The senior author selected the core sequences deemed sufficient for extraction of core diagnostic information. To validate the selection of core sequences, four readers assessed the relative ease of extracting core diagnostic information from the core sequences. Potential redundancy was calculated by comparing the average number of core sequences to the average number of sequences obtained. Scanning had been performed using 9.4±2.8 sequences over 37.3±12.3 minutes. Core diagnostic information was deemed extractable from 2.1±1.1 core sequences, with an assumed scanning time of 8.6±4.8 minutes, reflecting a potential redundancy of 74.5%±19.1%. Potential redundancy was lowest in scans obtained for treatment planning (14.9%±25.7%) and highest in scans obtained for follow-up of benign diseases (81.4%±12.6%). In 97.4% of cases, all four readers considered core diagnostic information either easily extractable from the core sequences or as easy to extract from them as from the entire study. With only one MRI lacking clinical impact (0.48%), overutilization did not seem to contribute to potential redundancy. High potential redundancy that can be targeted for more efficient scanner utilization exists in brain MRI protocols.
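
    The redundancy figure can be made concrete with a small calculation. As a sketch, assume potential redundancy for a single scan is 1 - (core sequences / sequences obtained). Plugging the reported means directly into that ratio gives 1 - 2.1/9.4, about 77.7%, rather than the reported 74.5%; a gap of that kind would be consistent with averaging per-scan percentages, because a mean of ratios generally differs from a ratio of means. The code computes both; the function names and scan counts are hypothetical, not study data.

```python
from statistics import mean, stdev
from typing import List, Tuple


def potential_redundancy(core: int, obtained: int) -> float:
    """Assumed per-scan definition: share of acquired sequences beyond the core set."""
    if obtained <= 0 or not 0 <= core <= obtained:
        raise ValueError("expected obtained > 0 and 0 <= core <= obtained")
    return 1.0 - core / obtained


def mean_of_ratios(scans: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Mean and sample SD (in %) of per-scan redundancy; needs at least two scans."""
    values = [100.0 * potential_redundancy(core, obtained) for core, obtained in scans]
    return mean(values), stdev(values)


def ratio_of_means(scans: List[Tuple[int, int]]) -> float:
    """Redundancy (in %) computed from the average core and average obtained counts."""
    return 100.0 * (1.0 - mean(c for c, _ in scans) / mean(o for _, o in scans))


# Hypothetical (core sequences, sequences obtained) pairs, not study data.
scans = [(2, 8), (1, 10), (4, 5)]
print(mean_of_ratios(scans))  # mean about 61.7%
print(ratio_of_means(scans))  # about 69.6%, a different number
print(round(100 * (1 - 2.1 / 9.4), 1))  # 77.7, from the abstract's reported means
```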

  3. Draft Genome Sequence of Pseudomonas putida CA-3, a Bacterium Capable of Styrene Degradation and Medium-Chain-Length Polyhydroxyalkanoate Synthesis.

    PubMed

    Almeida, Eduardo L; Margassery, Lekha M; O'Leary, Niall; Dobson, Alan D W

    2018-01-25

    Pseudomonas putida strain CA-3 is an industrial bioreactor isolate capable of synthesizing biodegradable polyhydroxyalkanoate polymers via the metabolism of styrene and other unrelated carbon sources. The pathways involved are subject to regulation by global cellular processes. The draft genome sequence is 6,177,154 bp long and contains 5,608 predicted coding sequences. Copyright © 2018 Almeida et al.

  4. Placental fetal stem segmentation in a sequence of histology images

    NASA Astrophysics Data System (ADS)

    Athavale, Prashant; Vese, Luminita A.

    2012-02-01

    Recent research in perinatal pathology argues that analyzing properties of the placenta may reveal important information on how certain diseases progress. One important property is the structure of the placental fetal stems. Analysis of the fetal stems in a placenta could be useful in the study and diagnosis of some diseases, such as autism. To study the fetal stem structure effectively, we need to automatically and accurately track fetal stems through a sequence of digitized hematoxylin and eosin (H&E) stained histology slides. Achieving this goal poses many problems, including the large size of the images, misalignment of consecutive H&E slides, unpredictable inaccuracies of manual tracing, and very complicated texture patterns of various tissue types without clear characteristics. In this paper we propose a novel algorithm to achieve automatic tracing of the fetal stem in a sequence of H&E images, based on an inaccurate manual segmentation of a fetal stem in one of the images. This algorithm combines global affine registration, local non-affine registration and a novel 'dynamic' version of the active contours model without edges. We first apply global affine image registration to all the images, based on displacement, scaling and rotation. This gives us the approximate location of the corresponding fetal stem in the image that needs to be traced. We then apply the affine registration algorithm "locally" near this location. At this point, we use a fast non-affine registration based on an L2 similarity measure and diffusion regularization to obtain a better location of the fetal stem. Finally, we account for inaccuracies in the initial tracing. This is achieved through a novel dynamic version of the active contours model without edges, in which the coefficients of the fitting terms are computed iteratively to ensure that we obtain a unique stem in the segmentation. The segmentation thus obtained can then be used as an initial guess to obtain segmentation in the rest of the images in the sequence. This constitutes an important step in the extraction and understanding of the fetal stem vasculature.

  5. The Global Awareness Curriculum in International Business Programs: A Critical Perspective

    ERIC Educational Resources Information Center

    Witte, Anne E.

    2010-01-01

    Designing educational sequences that enhance the cognitive, behavioral, and critical skills of a diverse learning community seeking global competencies requires mindfulness of different international educational models, a tailored curriculum designed to build different types of awareness learning, and clarity in targeted outputs keeping in mind a…

  6. Feasibility of physical map construction from fingerprinted bacterial artificial chromosome libraries of polyploid plant species

    PubMed Central

    2010-01-01

    Background: The presence of closely related genomes in polyploid species makes the assembly of total genomic sequence from shotgun sequence reads produced by the current sequencing platforms exceedingly difficult, if not impossible. Genomes of polyploid species could be sequenced following the ordered-clone sequencing approach employing contigs of bacterial artificial chromosome (BAC) clones and BAC-based physical maps. Although BAC contigs can currently be constructed for virtually any diploid organism with the SNaPshot high-information-content-fingerprinting (HICF) technology, it is currently unknown whether this is also true for polyploid species. It is possible that BAC clones from orthologous regions of homoeologous chromosomes would share numerous restriction fragments and therefore be included in common contigs. Because of this and other concerns, physical mapping utilizing the SNaPshot HICF of BAC libraries of polyploid species has not been pursued, and the possibility of doing so has not been assessed. The sole exception has been common wheat, an allohexaploid in which it is possible to construct single-chromosome or single-chromosome-arm BAC libraries from DNA of flow-sorted chromosomes and bypass the obstacles created by polyploidy. Results: The potential of the SNaPshot HICF technology for physical mapping of polyploid plants utilizing global BAC libraries was evaluated by assembling contigs of fingerprinted clones in an in silico merged BAC library composed of single-chromosome libraries of two wheat homoeologous chromosome arms, 3AS and 3DS, and complete chromosome 3B. Because the chromosome arm origin of each clone was known, it was possible to estimate the fidelity of contig assembly. Depending on the library, on average 97.78% or more of the clones were from a single chromosome arm. A large portion of the remaining clones was shown to be library contamination from other chromosomes, a feature that is unavoidable during the construction of single-chromosome BAC libraries. Conclusions: The negligibly low level of incorporation of clones from homoeologous chromosome arms into a contig during contig assembly suggested that it is feasible to construct contigs and physical maps using global BAC libraries of wheat and almost certainly also of other plant polyploid species with genome sizes comparable to that of wheat. Because of the high purity of the resulting assembled contigs, they can be directly used for genome sequencing. It is currently unknown but possible that equally good BAC contigs can also be constructed for polyploid species containing smaller, more gene-rich genomes. PMID:20170511

  7. Genome Sequence and Transcriptome Analyses of Chrysochromulina tobin: Metabolic Tools for Enhanced Algal Fitness in the Prominent Order Prymnesiales (Haptophyceae)

    DOE PAGES

    Hovde, Blake T.; Deodato, Chloe R.; Hunsperger, Heather M.; ...

    2015-09-23

    Haptophytes are recognized as seminal players in aquatic ecosystem function. These algae are important in global carbon sequestration, form destructive harmful blooms, and, given their rich fatty acid content, serve as a highly nutritive food source to a broad range of eco-cohorts. Haptophyte dominance in both fresh and marine waters is supported by the mixotrophic nature of many taxa. Despite their importance, the nuclear genome sequence of only one haptophyte, Emiliania huxleyi (Isochrysidales), is available. Here we report the draft genome sequence of Chrysochromulina tobin (Prymnesiales), and transcriptome data collected at seven time points over a 24-hour light/dark cycle. The nuclear genome of C. tobin is small (59 Mb), compact (~40% of the genome is protein coding) and encodes approximately 16,777 genes. Genes important to fatty acid synthesis, modification, and catabolism show distinct patterns of expression when monitored over the circadian photoperiod. The C. tobin genome harbors the first hybrid polyketide synthase/non-ribosomal peptide synthase gene complex reported for an algal species, and encodes potential anti-microbial peptides and proteins involved in multidrug and toxic compound extrusion. A new haptophyte xanthorhodopsin was also identified, together with two “red” RuBisCO activases that are shared across many algal lineages. In conclusion, the Chrysochromulina tobin genome sequence provides new information on the evolutionary history, ecology and economic importance of haptophytes.

  8. Homology-based Modeling of Rhodopsin-like Family Members in the Inactive State: Structural Analysis and Deduction of Tips for Modeling and Optimization.

    PubMed

    Pappalardo, Matteo; Rayan, Mahmoud; Abu-Lafi, Saleh; Leonardi, Martha E; Milardi, Danilo; Guccione, Salvatore; Rayan, Anwar

    2017-08-01

    Modeling G-protein-coupled receptors (GPCRs) is an emerging field of research, since high-quality models used in receptor structure-based strategies might facilitate the discovery of interesting drug candidates. The findings from a quantitative analysis of eighteen resolved structures of rhodopsin family "A" receptors crystallized with antagonists and 153 pairs of structures are described. A strategy termed endeca-amino acid fragmentation was used to analyze the structural models, aiming to detect the relationship between sequence identity and root-mean-square deviation (RMSD) at each transmembrane domain. Moreover, we applied the leave-one-out strategy to study the likelihood of helix shifts. The type of correlation between sequence identity and RMSD was studied using the aforementioned set of receptors as representatives of membrane proteins and 98 serine proteases with 4753 pairs of structures as representatives of globular proteins. Data analysis using the fragmentation strategy revealed some degree of correlation between sequence identity and the global RMSD of 11-amino-acid windows. However, spatial conservation is not always close to the endoplasmic side, as was previously reported. A comparative study with globular proteins shows that GPCRs have a higher standard deviation and a higher slope in the correlation between sequence identity and RMSD. The information extracted in this paper could be incorporated into modeling protocols when using techniques for model optimization and refinement. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
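
    The pair counts quoted above are consistent with comparing every structure with every other one: n structures give n(n - 1)/2 unordered pairs, so 18 receptors give 18 × 17 / 2 = 153 pairs and 98 serine proteases give 98 × 97 / 2 = 4753 pairs. A quick check (the helper name is illustrative only):

```python
from itertools import combinations
from math import comb


def pair_count(n: int) -> int:
    # Unordered pairs among n structures: n * (n - 1) / 2
    return comb(n, 2)


# Cross-check the closed form by enumerating the pairs explicitly.
print(pair_count(18), len(list(combinations(range(18), 2))))  # 153 153
print(pair_count(98))  # 4753
```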

  9. Complete genome sequence of mumps viruses isolated from patients with parotitis, pancreatitis and encephalitis in India.

    PubMed

    Vaidya, Sunil R; Chowdhury, Deepika T; Jadhav, Santoshkumar M; Hamde, Venkat S

    2016-04-01

    Limited information is available regarding the epidemiology of mumps in India. Mumps vaccine is not included in the Universal Immunization Program of India, and complete genome sequences of Indian mumps virus (MuV) isolates were not available; hence this study was performed. Five isolates from patients with bilateral parotitis and pancreatitis from Maharashtra, a MuV isolate from a patient with unilateral parotitis from Tamil Nadu, and a MuV isolate from a patient with encephalitis from Uttar Pradesh were genotyped by the standard World Health Organization protocol, and their complete genomes were subsequently sequenced. Indian MuV genomes were compared with published MuV genomes, including reference genotypes and eight vaccine strains, to identify genetic differences. SH gene analysis revealed that five MuV isolates belonged to genotype C and two belonged to genotype G. The percent nucleotide divergence (PND) was 1.1% among the five genotype C strains and 2.2% between the two genotype G strains. A comparison with the widely used mumps Jeryl Lynn vaccine strain revealed 54, 54, 53, 49, 49, 38, and 49 amino acid substitutions in the Indian isolates Chennai-2012, Kushinagar-2013, Pune-2008, Osmanabad-2012a, Osmanabad-2012b, Pune-1986 and Pune-2012, respectively. This study reports the complete genome sequences of Indian MuV strains obtained in 1986, 2008, 2012 and 2013, which may be useful for further studies in India and globally. Copyright © 2016 Elsevier B.V. All rights reserved.
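
    As an illustration of the divergence figures, here is a sketch that computes percent nucleotide divergence as the uncorrected share of differing sites between pre-aligned, gap-free sequences, averaged over all pairs in a group. The abstract does not state how PND was computed (alignment, gap handling, or genome region), so this definition and the function names are assumptions, and the example sequences are made up.

```python
from itertools import combinations
from statistics import mean
from typing import Sequence


def pnd(a: str, b: str) -> float:
    """Percent of aligned positions at which two gap-free sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be pre-aligned and of equal, nonzero length")
    mismatches = sum(x != y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * mismatches / len(a)


def mean_pairwise_pnd(group: Sequence[str]) -> float:
    """Average PND over every pair of sequences in a group (e.g. one genotype)."""
    if len(group) < 2:
        raise ValueError("need at least two sequences")
    return mean(pnd(a, b) for a, b in combinations(group, 2))


# Made-up toy sequences, not mumps data.
print(pnd("ACGT", "ACGA"))                          # 25.0
print(mean_pairwise_pnd(["AAAA", "AAAT", "AATT"]))  # about 33.3
```

    For a group of two strains, as with genotype G above, the mean over pairs reduces to a single pairwise value.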

  10. Genome Sequence and Transcriptome Analyses of Chrysochromulina tobin: Metabolic Tools for Enhanced Algal Fitness in the Prominent Order Prymnesiales (Haptophyceae)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hovde, Blake T.; Deodato, Chloe R.; Hunsperger, Heather M.

    Haptophytes are recognized as seminal players in aquatic ecosystem function. These algae are important in global carbon sequestration, form destructive harmful blooms, and, given their rich fatty acid content, serve as a highly nutritive food source to a broad range of eco-cohorts. Haptophyte dominance in both fresh and marine waters is supported by the mixotrophic nature of many taxa. Despite their importance, the nuclear genome sequence of only one haptophyte, Emiliania huxleyi (Isochrysidales), is available. Here we report the draft genome sequence of Chrysochromulina tobin (Prymnesiales), and transcriptome data collected at seven time points over a 24-hour light/dark cycle. The nuclear genome of C. tobin is small (59 Mb), compact (~40% of the genome is protein coding) and encodes approximately 16,777 genes. Genes important to fatty acid synthesis, modification, and catabolism show distinct patterns of expression when monitored over the circadian photoperiod. The C. tobin genome harbors the first hybrid polyketide synthase/non-ribosomal peptide synthase gene complex reported for an algal species, and encodes potential anti-microbial peptides and proteins involved in multidrug and toxic compound extrusion. A new haptophyte xanthorhodopsin was also identified, together with two “red” RuBisCO activases that are shared across many algal lineages. In conclusion, the Chrysochromulina tobin genome sequence provides new information on the evolutionary history, ecology and economic importance of haptophytes.

  11. Transcriptome-Assisted Label-Free Quantitative Proteomics Analysis Reveals Novel Insights into Piper nigrum-Phytophthora capsici Phytopathosystem

    PubMed Central

    Mahadevan, Chidambareswaren; Krishnan, Anu; Saraswathy, Gayathri G.; Surendran, Arun; Jaleel, Abdul; Sakuntala, Manjula

    2016-01-01

    Black pepper (Piper nigrum L.), a tropical spice crop of global acclaim, is susceptible to Phytophthora capsici, an oomycete pathogen that causes the highly destructive foot rot disease. A systematic understanding of this phytopathosystem has not been possible owing to the lack of genome or proteome information. In this study, we describe an integrated transcriptome-assisted label-free quantitative proteomics pipeline to study the basal immune components of black pepper when challenged with P. capsici. We report a global identification of 532 novel leaf proteins from black pepper, of which 518 were functionally annotated using the BLAST2GO tool. Label-free quantitation of the protein datasets revealed 194 proteins common to the diseased and control datasets, of which 22 showed significant up-regulation and 134 showed significant down-regulation. Ninety-three proteins were identified exclusively in P. capsici-infected leaf tissues and 245 were expressed only in mock-infected (control) samples. In-depth analysis of our data gives novel insights into the regulatory pathways of black pepper that are compromised during infection. Differential down-regulation was observed in a number of critical pathways, such as carbon fixation in photosynthetic organisms, cyanoamino acid metabolism, fructose and mannose metabolism, glutathione metabolism, and phenylpropanoid biosynthesis. The proteomics results were validated with real-time qRT-PCR analysis. We were also able to identify the complete coding sequences for all the proteins, of which a few selected genes were cloned and sequence-characterized for further confirmation. Our study is the first report of a quantitative proteomics dataset in black pepper and provides convincing evidence of the effectiveness of a transcriptome-based label-free proteomics approach for elucidating the host response to biotic stress in a non-model spice crop like P. nigrum, for which genome information is unavailable. Our dataset will serve as a useful resource for future studies in this plant. Data are available via ProteomeXchange with identifier PXD003887. PMID:27379110

  12. Transcriptome-Assisted Label-Free Quantitative Proteomics Analysis Reveals Novel Insights into Piper nigrum-Phytophthora capsici Phytopathosystem.

    PubMed

    Mahadevan, Chidambareswaren; Krishnan, Anu; Saraswathy, Gayathri G; Surendran, Arun; Jaleel, Abdul; Sakuntala, Manjula

    2016-01-01

    Black pepper (Piper nigrum L.), a tropical spice crop of global acclaim, is susceptible to Phytophthora capsici, an oomycete pathogen that causes the highly destructive foot rot disease. A systematic understanding of this phytopathosystem has not been possible owing to the lack of genome or proteome information. In this study, we describe an integrated transcriptome-assisted label-free quantitative proteomics pipeline to study the basal immune components of black pepper when challenged with P. capsici. We report a global identification of 532 novel leaf proteins from black pepper, of which 518 were functionally annotated using the BLAST2GO tool. Label-free quantitation of the protein datasets revealed 194 proteins common to the diseased and control datasets, of which 22 showed significant up-regulation and 134 showed significant down-regulation. Ninety-three proteins were identified exclusively in P. capsici-infected leaf tissues and 245 were expressed only in mock-infected (control) samples. In-depth analysis of our data gives novel insights into the regulatory pathways of black pepper that are compromised during infection. Differential down-regulation was observed in a number of critical pathways, such as carbon fixation in photosynthetic organisms, cyanoamino acid metabolism, fructose and mannose metabolism, glutathione metabolism, and phenylpropanoid biosynthesis. The proteomics results were validated with real-time qRT-PCR analysis. We were also able to identify the complete coding sequences for all the proteins, of which a few selected genes were cloned and sequence-characterized for further confirmation. Our study is the first report of a quantitative proteomics dataset in black pepper and provides convincing evidence of the effectiveness of a transcriptome-based label-free proteomics approach for elucidating the host response to biotic stress in a non-model spice crop like P. nigrum, for which genome information is unavailable. Our dataset will serve as a useful resource for future studies in this plant. Data are available via ProteomeXchange with identifier PXD003887.

  13. National Earthquake Information Center Seismic Event Detections on Multiple Scales

    NASA Astrophysics Data System (ADS)

    Patton, J.; Yeck, W. L.; Benz, H.; Earle, P. S.; Soto-Cordero, L.; Johnson, C. E.

    2017-12-01

    The U.S. Geological Survey National Earthquake Information Center (NEIC) monitors seismicity on local, regional, and global scales using automatic picks from more than 2,000 near-real-time seismic stations. This presents unique challenges in automated event detection due to the high variability in data quality, network geometry and density, and distance-dependent variability in observed seismic signals. To lower the overall detection threshold while minimizing false detection rates, NEIC has begun to test the incorporation of new detection and picking algorithms, including multiband (Lomax et al., 2012) and kurtosis (Baillard et al., 2014) pickers, and a new Bayesian associator (Glass 3.0). The Glass 3.0 associator allows simultaneous processing of variably scaled detection grids, each with a unique set of nucleation criteria (e.g., nucleation threshold, minimum associated picks, nucleation phases) to meet specific monitoring goals. We test the efficacy of these new tools on event detection in networks of various scales and geometries, compare our results with previous catalogs, and discuss lessons learned. For example, we find that on local and regional scales, rapid nucleation of small events may require event nucleation with both P and higher-amplitude secondary phases (e.g., S or Lg). We provide examples of the implementation of a scale-independent associator for an induced seismicity sequence (local scale), a large aftershock sequence (regional scale), and for monitoring global seismicity.
    Baillard, C., Crawford, W. C., Ballu, V., Hibert, C., & Mangeney, A. (2014). An automatic kurtosis-based P- and S-phase picker designed for local seismic networks. Bulletin of the Seismological Society of America, 104(1), 394-409.
    Lomax, A., Satriano, C., & Vassallo, M. (2012). Automatic picker developments and optimization: FilterPicker, a robust, broadband picker for real-time seismic monitoring and earthquake early-warning. Seismological Research Letters, 83(3), 531-540. doi:10.1785/gssrl.83.3.531.

  14. Mars Global Surveyor Mission: Environmental Assessment

    NASA Technical Reports Server (NTRS)

    1995-01-01

    This environmental assessment addresses the proposed action to complete the integration and launch the Mars Global Surveyor (MGS) spacecraft from Cape Canaveral Air Station (CCAS), Florida, during the launch window in November 1996. Mars Global Surveyor is part of the Solar System Exploration Program to the inner planets designed to maintain a sufficient level of scientific investigation and accomplishment so that the United States retains a leading position in solar system exploration through the end of the century. The Program consists of a specific sequence of missions, based on technological readiness, launch opportunities, rapidity of data return, and a balance of scientific disciplines. The purpose of the MGS mission would be to deliver a spacecraft platform to a low-altitude polar orbit around Mars where it would collect global observations of basic geological, geophysical, and climatological processes of the planet. To satisfy this purpose, the MGS mission would support a scientific set of objectives. Detailed global maps of surface topography, the distribution of minerals, the planet's mass, size, and shape, the characterization of Mars gravitational and magnetic fields, and the monitoring of global weather, collected over the period of one Martian year (about two Earth years), would help answer some of the questions about the evolution of Mars. Such an investigation would help scientists better understand the current state of water on Mars, the evolution of the planet's atmosphere, and the factors that led to major changes in the Martian climate. It would also provide much needed information on the magnetic field of Mars. Data collected from this mission would provide insight into the evolution of both Earth and the solar system, as well as demonstrate technological approaches that could be applicable to future Mars missions.

  15. Genomic Sequencing of Bordetella pertussis for Epidemiology and Global Surveillance of Whooping Cough.

    PubMed

    Bouchez, Valérie; Guglielmini, Julien; Dazas, Mélody; Landier, Annie; Toubiana, Julie; Guillot, Sophie; Criscuolo, Alexis; Brisse, Sylvain

    2018-06-01

    Bordetella pertussis causes whooping cough, a highly contagious respiratory disease that is reemerging in many world regions. The spread of antigen-deficient strains may threaten acellular vaccine efficacy. Dynamics of strain transmission are poorly defined because of shortcomings in current strain genotyping methods. Our objective was to develop a whole-genome genotyping strategy with sufficient resolution for local epidemiologic questions and sufficient reproducibility to enable international comparisons of clinical isolates. We defined a core genome multilocus sequence typing scheme comprising 2,038 loci and demonstrated its congruence with whole-genome single-nucleotide polymorphism variation. Most cases of intrafamilial groups of isolates or of multiple isolates recovered from the same patient were distinguished from temporally and geographically cocirculating isolates. However, epidemiologically unrelated isolates were sometimes nearly indistinguishable. We set up a publicly accessible core genome multilocus sequence typing database to enable global comparisons of B. pertussis isolates, opening the way for internationally coordinated surveillance.

  16. Unique features of a global human ectoparasite identified through sequencing of the bed bug genome.

    PubMed

    Benoit, Joshua B; Adelman, Zach N; Reinhardt, Klaus; Dolan, Amanda; Poelchau, Monica; Jennings, Emily C; Szuter, Elise M; Hagan, Richard W; Gujar, Hemant; Shukla, Jayendra Nath; Zhu, Fang; Mohan, M; Nelson, David R; Rosendale, Andrew J; Derst, Christian; Resnik, Valentina; Wernig, Sebastian; Menegazzi, Pamela; Wegener, Christian; Peschel, Nicolai; Hendershot, Jacob M; Blenau, Wolfgang; Predel, Reinhard; Johnston, Paul R; Ioannidis, Panagiotis; Waterhouse, Robert M; Nauen, Ralf; Schorn, Corinna; Ott, Mark-Christoph; Maiwald, Frank; Johnston, J Spencer; Gondhalekar, Ameya D; Scharf, Michael E; Peterson, Brittany F; Raje, Kapil R; Hottel, Benjamin A; Armisén, David; Crumière, Antonin Jean Johan; Refki, Peter Nagui; Santos, Maria Emilia; Sghaier, Essia; Viala, Sèverine; Khila, Abderrahman; Ahn, Seung-Joon; Childers, Christopher; Lee, Chien-Yueh; Lin, Han; Hughes, Daniel S T; Duncan, Elizabeth J; Murali, Shwetha C; Qu, Jiaxin; Dugan, Shannon; Lee, Sandra L; Chao, Hsu; Dinh, Huyen; Han, Yi; Doddapaneni, Harshavardhan; Worley, Kim C; Muzny, Donna M; Wheeler, David; Panfilio, Kristen A; Vargas Jentzsch, Iris M; Vargo, Edward L; Booth, Warren; Friedrich, Markus; Weirauch, Matthew T; Anderson, Michelle A E; Jones, Jeffery W; Mittapalli, Omprakash; Zhao, Chaoyang; Zhou, Jing-Jiang; Evans, Jay D; Attardo, Geoffrey M; Robertson, Hugh M; Zdobnov, Evgeny M; Ribeiro, Jose M C; Gibbs, Richard A; Werren, John H; Palli, Subba R; Schal, Coby; Richards, Stephen

    2016-02-02

    The bed bug, Cimex lectularius, has re-established itself as a ubiquitous human ectoparasite throughout much of the world during the past two decades. This global resurgence is likely linked to increased international travel and commerce in addition to widespread insecticide resistance. Analyses of the C. lectularius sequenced genome (650 Mb) and 14,220 predicted protein-coding genes provide a comprehensive representation of genes that are linked to traumatic insemination, a reduced chemosensory repertoire of genes related to obligate hematophagy, host-symbiont interactions, and several mechanisms of insecticide resistance. In addition, we document the presence of multiple putative lateral gene transfer events. Genome sequencing and annotation establish a solid foundation for future research on mechanisms of insecticide resistance, human-bed bug and symbiont-bed bug associations, and unique features of bed bug biology that contribute to the unprecedented success of C. lectularius as a human ectoparasite.

  17. Unique features of a global human ectoparasite identified through sequencing of the bed bug genome

    PubMed Central

    Benoit, Joshua B.; Adelman, Zach N.; Reinhardt, Klaus; Dolan, Amanda; Poelchau, Monica; Jennings, Emily C.; Szuter, Elise M.; Hagan, Richard W.; Gujar, Hemant; Shukla, Jayendra Nath; Zhu, Fang; Mohan, M.; Nelson, David R.; Rosendale, Andrew J.; Derst, Christian; Resnik, Valentina; Wernig, Sebastian; Menegazzi, Pamela; Wegener, Christian; Peschel, Nicolai; Hendershot, Jacob M.; Blenau, Wolfgang; Predel, Reinhard; Johnston, Paul R.; Ioannidis, Panagiotis; Waterhouse, Robert M.; Nauen, Ralf; Schorn, Corinna; Ott, Mark-Christoph; Maiwald, Frank; Johnston, J. Spencer; Gondhalekar, Ameya D.; Scharf, Michael E.; Peterson, Brittany F.; Raje, Kapil R.; Hottel, Benjamin A.; Armisén, David; Crumière, Antonin Jean Johan; Refki, Peter Nagui; Santos, Maria Emilia; Sghaier, Essia; Viala, Sèverine; Khila, Abderrahman; Ahn, Seung-Joon; Childers, Christopher; Lee, Chien-Yueh; Lin, Han; Hughes, Daniel S. T.; Duncan, Elizabeth J.; Murali, Shwetha C.; Qu, Jiaxin; Dugan, Shannon; Lee, Sandra L.; Chao, Hsu; Dinh, Huyen; Han, Yi; Doddapaneni, Harshavardhan; Worley, Kim C.; Muzny, Donna M.; Wheeler, David; Panfilio, Kristen A.; Vargas Jentzsch, Iris M.; Vargo, Edward L.; Booth, Warren; Friedrich, Markus; Weirauch, Matthew T.; Anderson, Michelle A. E.; Jones, Jeffery W.; Mittapalli, Omprakash; Zhao, Chaoyang; Zhou, Jing-Jiang; Evans, Jay D.; Attardo, Geoffrey M.; Robertson, Hugh M.; Zdobnov, Evgeny M.; Ribeiro, Jose M. C.; Gibbs, Richard A.; Werren, John H.; Palli, Subba R.; Schal, Coby; Richards, Stephen

    2016-01-01

    The bed bug, Cimex lectularius, has re-established itself as a ubiquitous human ectoparasite throughout much of the world during the past two decades. This global resurgence is likely linked to increased international travel and commerce in addition to widespread insecticide resistance. Analyses of the C. lectularius sequenced genome (650 Mb) and 14,220 predicted protein-coding genes provide a comprehensive representation of genes that are linked to traumatic insemination, a reduced chemosensory repertoire of genes related to obligate hematophagy, host–symbiont interactions, and several mechanisms of insecticide resistance. In addition, we document the presence of multiple putative lateral gene transfer events. Genome sequencing and annotation establish a solid foundation for future research on mechanisms of insecticide resistance, human–bed bug and symbiont–bed bug associations, and unique features of bed bug biology that contribute to the unprecedented success of C. lectularius as a human ectoparasite. PMID:26836814

  18. Improvement of training set structure in fusion data cleaning using Time-Domain Global Similarity method

    NASA Astrophysics Data System (ADS)

    Liu, J.; Lan, T.; Qin, H.

    2017-10-01

    Traditional data cleaning identifies dirty data by classifying original data sequences, which is a class-imbalanced problem because, for most diagnostic systems in Magnetic Confinement Fusion (MCF) devices, the proportion of incorrect data is much smaller than the proportion of correct data. When machine learning algorithms classify diagnostic data using a class-imbalanced training set, most classifiers are biased towards the majority class and show very poor classification rates on the minority class. By transforming the direct classification of original data sequences into a classification of the physical similarity between data sequences, this paper investigates the class-balancing effect of the Time-Domain Global Similarity (TDGS) method on training set structure. The impact of the improved training set structure on the data cleaning performance of the TDGS method is also demonstrated with an application example in the EAST POlarimetry-INTerferometry (POINT) system.

  19. 78 FR 17232 - Meeting of the Global Justice Information Sharing Initiative Federal Advisory Committee

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-03-20

    ... DEPARTMENT OF JUSTICE Office of Justice Programs [OJP (BJA) Docket No. 1616] Meeting of the Global Justice Information Sharing Initiative Federal Advisory Committee AGENCY: Office of Justice Programs (OJP... Information Sharing Initiative (Global) Federal Advisory Committee (GAC) to discuss the Global Initiative, as...

  20. Using small RNA (sRNA) deep sequencing to understand global virus distribution in plants

    USDA-ARS's Scientific Manuscript database

    Small RNAs (sRNAs), a class of regulatory RNAs, have been used to serve as the specificity determinants of suppressing gene expression in plants and animals. Next generation sequencing (NGS) uncovered the sRNA landscape in most organisms including their associated microbes. In the current study, w...

  1. Genome Sequences of Multidrug-Resistant, Colistin-Susceptible and -Resistant Klebsiella pneumoniae Clinical Isolates from Pakistan

    PubMed Central

    Crawford, Matthew A.; Timme, Ruth; Lomonaco, Sara; Lascols, Christine; Fisher, Debra J.; Sharma, Shashi K.; Strain, Errol; Allard, Marc W.; Brown, Eric W.; McFarland, Melinda A.; Croley, Tim; Hammack, Thomas S.; Weigel, Linda M.; Anderson, Kevin; Hodge, David R.; Pillai, Segaran P.; Morse, Stephen A.; Khan, Erum

    2016-01-01

    The emergence and spread of colistin resistance among multidrug-resistant (MDR) Klebsiella pneumoniae represent a critical threat to global health. Here, we report the complete genome sequences of 10 MDR, colistin-susceptible and -resistant K. pneumoniae clinical isolates obtained in Pakistan between 2010 and 2013. PMID:27979956

  2. Detection of distorted frames in retinal video-sequences via machine learning

    NASA Astrophysics Data System (ADS)

    Kolar, Radim; Liberdova, Ivana; Odstrcilik, Jan; Hracho, Michal; Tornow, Ralf P.

    2017-07-01

    This paper describes the detection of distorted frames in retinal video sequences based on a set of global features extracted from each frame. The feature vector is subsequently used in a classification step, in which three types of classifiers are tested. The best classification accuracy, 96%, was achieved with the support vector machine approach.

  3. A global meta-analysis of Tuber ITS rDNA sequences: species diversity, host associations and long-distance dispersal

    Treesearch

    Gregory M. Bonito; Andrii P. Gryganskyi; James M. Trappe; Rytas Vilgalys

    2010-01-01

    Truffles (Tuber) are ectomycorrhizal fungi characterized by hypogeous fruitbodies. Their biodiversity, host associations and geographical distributions are not well documented. ITS rDNA sequences of Tuber are commonly recovered from molecular surveys of fungal communities, but most remain insufficiently identified making it...

  4. Low diversity in the mitogenome of sperm whales revealed by next-generation sequencing

    Treesearch

    Alana Alexander; Debbie Steel; Beth Slikas; Kendra Hoekzema; Colm Carraher; Matthew Parks; Richard Cronn; C. Scott Baker

    2012-01-01

    Large population sizes and global distributions generally associate with high mitochondrial DNA control region (CR) diversity. The sperm whale (Physeter macrocephalus) is an exception, showing low CR diversity relative to other cetaceans; however, diversity levels throughout the remainder of the sperm whale mitogenome are unknown. We sequenced 20...

  5. Genetic characterization of the non-structural protein-3 gene of bluetongue virus serotype-2 isolate from India.

    PubMed

    Pudupakam, Raghavendra Sumanth; Raghunath, Shobana; Pudupakam, Meghanath; Daggupati, Sreenivasulu

    2017-03-01

    Sequence analysis and phylogenetic studies based on non-structural protein-3 (NS3) gene are important in understanding the evolution and epidemiology of bluetongue virus (BTV). This study was aimed at characterizing the NS3 gene sequence of Indian BTV serotype-2 (BTV2) to elucidate its genetic relationship to global BTV isolates. The NS3 gene of BTV2 was amplified from infected BHK-21 cell cultures, cloned and subjected to sequence analysis. The generated NS3 gene sequence was compared with the corresponding sequences of different BTV serotypes across the world, and a phylogenetic relationship was established. The NS3 gene of BTV2 showed moderate levels of variability in comparison to different BTV serotypes, with nucleotide sequence identities ranging from 81% to 98%. The region showed high sequence homology of 93-99% at amino acid level with various BTV serotypes. The PPXY/PTAP late domain motifs, glycosylation sites, hydrophobic domains, and the amino acid residues critical for virus-host interactions were conserved in NS3 protein. Phylogenetic analysis revealed that BTV isolates segregate into four topotypes and that the Indian BTV2 in subclade IA is closely related to Asian and Australian origin strains. Analysis of the NS3 gene indicated that Indian BTV2 isolate is closely related to strains from Asia and Australia, suggesting a common origin of infection. Although the pattern of evolution of BTV2 isolate is different from other global isolates, the deduced amino acid sequence of NS3 protein demonstrated high molecular stability.

  6. Genetic characterization of the non-structural protein-3 gene of bluetongue virus serotype-2 isolate from India

    PubMed Central

    Pudupakam, Raghavendra Sumanth; Raghunath, Shobana; Pudupakam, Meghanath; Daggupati, Sreenivasulu

    2017-01-01

    Aim: Sequence analysis and phylogenetic studies based on non-structural protein-3 (NS3) gene are important in understanding the evolution and epidemiology of bluetongue virus (BTV). This study was aimed at characterizing the NS3 gene sequence of Indian BTV serotype-2 (BTV2) to elucidate its genetic relationship to global BTV isolates. Materials and Methods: The NS3 gene of BTV2 was amplified from infected BHK-21 cell cultures, cloned and subjected to sequence analysis. The generated NS3 gene sequence was compared with the corresponding sequences of different BTV serotypes across the world, and a phylogenetic relationship was established. Results: The NS3 gene of BTV2 showed moderate levels of variability in comparison to different BTV serotypes, with nucleotide sequence identities ranging from 81% to 98%. The region showed high sequence homology of 93-99% at amino acid level with various BTV serotypes. The PPXY/PTAP late domain motifs, glycosylation sites, hydrophobic domains, and the amino acid residues critical for virus-host interactions were conserved in NS3 protein. Phylogenetic analysis revealed that BTV isolates segregate into four topotypes and that the Indian BTV2 in subclade IA is closely related to Asian and Australian origin strains. Conclusion: Analysis of the NS3 gene indicated that Indian BTV2 isolate is closely related to strains from Asia and Australia, suggesting a common origin of infection. Although the pattern of evolution of BTV2 isolate is different from other global isolates, the deduced amino acid sequence of NS3 protein demonstrated high molecular stability. PMID:28435199

  7. Sequencing actions: an information-search study of tradeoffs of priorities against spatiotemporal constraints.

    PubMed

    Gärling, T

    1996-09-01

    How people choose between sequences of actions was investigated in an everyday errand-planning task. In this task subjects chose the preferred sequence of performing a number of errands in a fictitious environment. Two experiments were conducted with undergraduate students serving as subjects. One group searched for information about each alternative; the same information was directly available to another group. In Experiment 1 the results showed that for two errands subjects took into account all attributes describing the errands, suggesting a tradeoff between priority, wait time, and travel distance, with priority being the most important. Consistent with this finding, predominantly intra-alternative information search was observed. These results were replicated in Experiment 2 for three errands. In addition, choice outcomes, information search, and sequence of responding suggested that for more than two actions, sequence choices are made in stages.

  8. Global Maritime Awareness

    DTIC Science & Technology

    2009-06-01

    to maritime information. Mission: Act as a Maritime Awareness Coordinator and data critical to building situational awareness. We are... Maritime Awareness Technical Sub-committee (NMATS) July 2008. Desired Outcome: Maritime Information Exchange. Vision: Global maritime information... Global Maritime Situational Awareness Initiatives: 1. Information Hubs 2. MSSIS (Maritime Safety & Security Information Systems

  9. Genomic Epidemiology of Vibrio cholerae O1 Associated with Floods, Pakistan, 2010

    PubMed Central

    Shah, Muhammad Ali; Mutreja, Ankur; Thomson, Nicholas; Baker, Stephen; Parkhill, Julian; Dougan, Gordon; Bokhari, Habib

    2014-01-01

    In August 2010, Pakistan experienced major floods and a subsequent cholera epidemic. To clarify the population dynamics and transmission of Vibrio cholerae in Pakistan, we sequenced the genomes of all V. cholerae O1 El Tor isolates and compared the sequences to a global collection of 146 V. cholerae strains. Within the global phylogeny, all isolates from Pakistan formed 2 new subclades (PSC-1 and PSC-2), lying in the third transmission wave of the seventh-pandemic lineage that could be distinguished by signature deletions and their antimicrobial susceptibilities. Geographically, PSC-1 isolates originated from the coast, whereas PSC-2 isolates originated from inland areas flooded by the Indus River. Single-nucleotide polymorphism accumulation analysis correlated river flow direction with the spread of PSC-2. We found at least 2 sources of cholera in Pakistan during the 2010 epidemic and illustrate the value of a global genomic data bank in contextualizing cholera outbreaks. PMID:24378019

  10. Genomic epidemiology of Vibrio cholerae O1 associated with floods, Pakistan, 2010.

    PubMed

    Shah, Muhammad Ali; Mutreja, Ankur; Thomson, Nicholas; Baker, Stephen; Parkhill, Julian; Dougan, Gordon; Bokhari, Habib; Wren, Brendan W

    2014-01-01

    In August 2010, Pakistan experienced major floods and a subsequent cholera epidemic. To clarify the population dynamics and transmission of Vibrio cholerae in Pakistan, we sequenced the genomes of all V. cholerae O1 El Tor isolates and compared the sequences to a global collection of 146 V. cholerae strains. Within the global phylogeny, all isolates from Pakistan formed 2 new subclades (PSC-1 and PSC-2), lying in the third transmission wave of the seventh-pandemic lineage that could be distinguished by signature deletions and their antimicrobial susceptibilities. Geographically, PSC-1 isolates originated from the coast, whereas PSC-2 isolates originated from inland areas flooded by the Indus River. Single-nucleotide polymorphism accumulation analysis correlated river flow direction with the spread of PSC-2. We found at least 2 sources of cholera in Pakistan during the 2010 epidemic and illustrate the value of a global genomic data bank in contextualizing cholera outbreaks.

  11. The limit space of a Cauchy sequence of globally hyperbolic spacetimes

    NASA Astrophysics Data System (ADS)

    Noldus, Johan

    2004-02-01

    In this second paper, I construct a limit space of a Cauchy sequence of globally hyperbolic spacetimes. In section 2, I work gradually towards a construction of the limit space. I prove that the limit space is unique up to isometry. I also show that, in general, the limit space has quite complicated causal behaviour. This work prepares the final paper in which I shall study in more detail properties of the limit space and the moduli space of (compact) globally hyperbolic spacetimes (cobordisms). As a fait divers, I give in this paper a suitable definition of dimension of a Lorentz space in agreement with the one given by Gromov in the Riemannian case. The difference in philosophy between Lorentzian and Riemannian geometry is one of relativism versus absolutism. In the latter every point distinguishes itself while in the former in general two elements get distinguished by a third, different, one.

  12. Misconceptions on Missing Data in RAD-seq Phylogenetics with a Deep-scale Example from Flowering Plants.

    PubMed

    Eaton, Deren A R; Spriggs, Elizabeth L; Park, Brian; Donoghue, Michael J

    2017-05-01

    Restriction-site associated DNA (RAD) sequencing and related methods rely on the conservation of enzyme recognition sites to isolate homologous DNA fragments for sequencing, with the consequence that mutations disrupting these sites lead to missing information. There is thus a clear expectation for how missing data should be distributed, with fewer loci recovered between more distantly related samples. This observation has led to a related expectation: that RAD-seq data are insufficiently informative for resolving deeper scale phylogenetic relationships. Here we investigate the relationship between missing information among samples at the tips of a tree and information at edges within it. We re-analyze and review the distribution of missing data across ten RAD-seq data sets and carry out simulations to determine expected patterns of missing information. We also present new empirical results for the angiosperm clade Viburnum (Adoxaceae, with a crown age >50 Ma) for which we examine phylogenetic information at different depths in the tree and with varied sequencing effort. The total number of loci, the proportion that are shared, and phylogenetic informativeness varied dramatically across the examined RAD-seq data sets. Insufficient or uneven sequencing coverage accounted for similar proportions of missing data as dropout from mutation-disruption. Simulations reveal that mutation-disruption, which results in phylogenetically distributed missing data, can be distinguished from the more stochastic patterns of missing data caused by low sequencing coverage. In Viburnum, doubling sequencing coverage nearly doubled the number of parsimony informative sites, and increased by >10X the number of loci with data shared across >40 taxa. Our analysis leads to a set of practical recommendations for maximizing phylogenetic information in RAD-seq studies. 
[hierarchical redundancy; phylogenetic informativeness; quartet informativeness; Restriction-site associated DNA (RAD) sequencing; sequencing coverage; Viburnum.]. © The authors 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For permissions, please e-mail: journals.permission@oup.com.

  13. Cenozoic global sea level, sequences, and the New Jersey transect: Results from coastal plain and continental slope drilling

    USGS Publications Warehouse

    Miller, K.G.; Mountain, Gregory S.; Browning, J.V.; Kominz, M.; Sugarman, P.J.; Christie-Blick, N.; Katz, M.E.; Wright, J.D.

    1998-01-01

    The New Jersey Sea Level Transect was designed to evaluate the relationships among global sea level (eustatic) change, unconformity-bounded sequences, and variations in subsidence, sediment supply, and climate on a passive continental margin. By sampling and dating Cenozoic strata from coastal plain and continental slope locations, we show that sequence boundaries correlate (within ±0.5 myr) regionally (onshore-offshore) and interregionally (New Jersey-Alabama-Bahamas), implicating a global cause. Sequence boundaries correlate with δ18O increases for at least the past 42 myr, consistent with an ice volume (glacioeustatic) control, although a causal relationship is not required because of uncertainties in ages and correlations. Evidence for a causal connection is provided by preliminary Miocene data from slope Site 904 that directly link δ18O increases with sequence boundaries. We conclude that variation in the size of ice sheets has been a primary control on the formation of sequence boundaries since ~42 Ma. We speculate that prior to this, the growth and decay of small ice sheets caused small-amplitude sea level changes (<20 m) in this supposedly ice-free world because Eocene sequence boundaries also appear to correlate with minor δ18O increases. Subsidence estimates (backstripping) indicate amplitudes of short-term (million-year scale) lowerings that are consistent with estimates derived from δ18O studies (25-50 m in the Oligocene-middle Miocene and 10-20 m in the Eocene) and a long-term lowering of 150-200 m over the past 65 myr, consistent with estimates derived from volume changes on mid-ocean ridges. Although our results are consistent with the general number and timing of Paleocene to middle Miocene sequences published by workers at Exxon Production Research Company, our estimates of sea level amplitudes are substantially lower than theirs. 
Lithofacies patterns within sequences follow repetitive, predictable patterns: (1) coastal plain sequences consist of basal transgressive sands overlain by regressive highstand silts and quartz sands; and (2) although slope lithofacies variations are subdued, reworked sediments constitute lowstand deposits, causing the strongest, most extensive seismic reflections. Despite a primary eustatic control on sequence boundaries, New Jersey sequences were also influenced by changes in tectonics, sediment supply, and climate. During the early to middle Eocene, low siliciclastic and high pelagic input associated with warm climates resulted in widespread carbonate deposition and thin sequences. Late middle Eocene and earliest Oligocene cooling events curtailed carbonate deposition in the coastal plain and slope, respectively, resulting in a switch to siliciclastic sedimentation. In onshore areas, Oligocene sequences are thin owing to low siliciclastic and pelagic input, and their distribution is patchy, reflecting migration or progradation of depocenters; in contrast, Miocene onshore sequences are thicker, reflecting increased sediment supply, and they are more complete downdip owing to simple tectonics. We conclude that the New Jersey margin provides a natural laboratory for unraveling complex interactions of eustasy, tectonics, changes in sediment supply, and climate change.

  14. Early Miocene sequence development across the New Jersey margin

    USGS Publications Warehouse

    Monteverde, D.H.; Mountain, Gregory S.; Miller, K.G.

    2008-01-01

    Sequence stratigraphy provides an understanding of the interplay between eustasy, sediment supply and accommodation in the sedimentary construction of passive margins. We used this approach to follow the early to middle Miocene growth of the New Jersey margin and analyse the connection between relative changes of sea level and variable sediment supply. Eleven candidate sequence boundaries were traced in high-resolution multi-channel seismic profiles across the inner margin and matched to geophysical log signatures and lithologic changes in ODP Leg 150X onshore coreholes. Chronologies at these drill sites were then used to assign ages to the intervening seismic sequences. We conclude that the regional and global correlation of early Miocene sequences suggests a dominant role of global sea-level change, but that margin progradation was controlled by localized sediment contribution and that local conditions played a large role in sequence formation and preservation. Lowstand deposits were regionally restricted and their locations point to both single and multiple sediment sources. The distribution of highstand deposits, by contrast, documents redistribution by along-shelf currents. We find no evidence that sea level fell below the elevation of the clinoform rollover, and the existence of extensive lowstand deposits seaward of this inflection point indicates efficient cross-shelf sediment transport mechanisms despite the apparent lack of well-developed fluvial drainage. © 2008 The Authors. Journal compilation © 2008 Blackwell Publishing.

  15. Barcoded NS31/AML2 primers for sequencing of arbuscular mycorrhizal communities in environmental samples

    PubMed Central

    Morgan, Benjamin S. T.; Egerton-Warburton, Louise M.

    2017-01-01

    Premise of the study: Arbuscular mycorrhizal fungi (AMF) are globally important root symbioses that enhance plant growth and nutrition and influence ecosystem structure and function. To better characterize levels of AMF diversity relevant to ecosystem function, deeper sequencing depth in environmental samples is needed. In this study, Illumina barcoded primers and a bioinformatics pipeline were developed and applied to study AMF diversity and community structure in environmental samples. Methods: Libraries of small subunit ribosomal RNA fragment amplicons were amplified from environmental DNA using a single-step PCR reaction with barcoded NS31/AML2 primers. Amplicons were sequenced on an Illumina MiSeq sequencer using version 2, 2 × 250-bp paired-end chemistry, and analyzed using QIIME and RDP Classifier. Results: Sequencing captured 196 to 6416 operational taxonomic units (OTUs; depending on clustering parameters) representing nine AMF genera. Regardless of clustering parameters, ∼20 OTUs dominated AMF communities (78–87% reads) with the remaining reads distributed among other OTUs. Analyses also showed significant biogeographic differences in AMF communities and that community composition could be linked to specific edaphic factors. Discussion: Barcoded NS31/AML2 primers and Illumina MiSeq sequencing provide a powerful approach to address AMF diversity and variations in fungal assemblages across host plants, ecosystems, and responses to environmental drivers including global change. PMID:28924511

  16. The clinical trial landscape in oncology and connectivity of somatic mutational profiles to targeted therapies.

    PubMed

    Patterson, Sara E; Liu, Rangjiao; Statz, Cara M; Durkin, Daniel; Lakshminarayana, Anuradha; Mockus, Susan M

    2016-01-16

    Precision medicine in oncology relies on rapid associations between patient-specific variations and targeted therapeutic efficacy. Due to the advancement of genomic analysis, a vast literature characterizing cancer-associated molecular aberrations and relative therapeutic relevance has been published. However, data are not uniformly reported or readily available, and accessing relevant information in a clinically acceptable time-frame is a daunting proposition, hampering connections between patients and appropriate therapeutic options. One important therapeutic avenue for oncology patients is through clinical trials. Accordingly, a global view into the availability of targeted clinical trials would provide insight into strengths and weaknesses and potentially enable research focus. However, data regarding the landscape of clinical trials in oncology are not readily available, and as a result, a comprehensive understanding of clinical trial availability is difficult. To support clinical decision-making, we have developed a data loader and mapper that connects sequence information from oncology patients to data stored in an in-house database, the JAX Clinical Knowledgebase (JAX-CKB), which can be queried readily to access comprehensive data for clinical reporting via customized reporting queries. JAX-CKB functions as a repository to house expertly curated clinically relevant data surrounding our 358-gene panel, the JAX Cancer Treatment Profile (JAX CTP), and supports annotation of functional significance of molecular variants. Through queries of data housed in JAX-CKB, we have analyzed the landscape of clinical trials relevant to our 358-gene targeted sequencing panel to evaluate strengths and weaknesses in current molecular targeting in oncology. Through this analysis, we have identified patient indications, molecular aberrations, and targeted therapy classes that have strong or weak representation in clinical trials.
Here, we describe the development and dissemination of system methods for associating patient genomic sequence data with clinically relevant information, facilitating interpretation and providing a mechanism for informing therapeutic decision-making. Additionally, through customized queries, we have the capability to rapidly analyze the landscape of targeted therapies in clinical trials, enabling a unique view into current therapeutic availability in oncology.

  17. Elman RNN based classification of protein sequences on account of their mutual information.

    PubMed

    Mishra, Pooja; Nath Pandey, Paras

    2012-10-21

    In the present work, we estimate residue correlation within protein sequences using the mutual information (MI) of adjacent residues, based on structural and solvent accessibility properties of amino acids. Estimation of long-range correlation between nonadjacent residues is improved by constructing a mutual information vector (MIV) for each protein sequence, so that every sequence is associated with its corresponding MIVs. These MIVs are given to an Elman RNN to classify the protein sequences. The modeling power of MIVs was shown to be significantly better, giving a new approach to alignment-free classification of protein sequences. We also conclude that MIVs based on structural and solvent-accessibility properties are better predictors. Copyright © 2012 Elsevier Ltd. All rights reserved.
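
    To make the MIV idea concrete, here is a minimal sketch, assuming residues are first mapped to a small property alphabet and that the vector's k-th entry is the mutual information between classes at positions i and i+k. The three-class solvent-accessibility grouping is hypothetical (the abstract does not list the authors' classes), and the Elman RNN classification step is omitted.

```python
import math
from collections import Counter

# Hypothetical 3-class grouping by typical solvent accessibility; the
# property classes actually used by the authors are not given.
CLASS = {
    **dict.fromkeys("ACFGILMVW", "B"),  # mostly buried
    **dict.fromkeys("HPSTY", "I"),      # intermediate
    **dict.fromkeys("DEKNQR", "E"),     # mostly exposed
}


def mutual_information(pairs: list[tuple[str, str]]) -> float:
    """Plug-in estimate of I(X;Y) in bits from observed (x, y) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def mutual_information_vector(seq: str, max_gap: int) -> list[float]:
    """MI between residue classes at positions i and i+k, for k = 1..max_gap."""
    classes = [CLASS[aa] for aa in seq.upper()]
    return [
        mutual_information(list(zip(classes, classes[k:])))
        for k in range(1, max_gap + 1)
    ]


print(mutual_information_vector("ADADADADA", 2))
```

    A fixed-length vector like this is what makes the approach alignment-free: sequences of different lengths all map to `max_gap` numbers that a classifier can consume.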

  18. Image encryption using random sequence generated from generalized information domain

    NASA Astrophysics Data System (ADS)

    Xia-Yan, Zhang; Guo-Ji, Zhang; Xuan, Li; Ya-Zhou, Ren; Jie-Hua, Wu

    2016-05-01

    A novel image encryption method based on the random sequence generated from the generalized information domain and permutation-diffusion architecture is proposed. The random sequence is generated by reconstruction from the generalized information file and discrete trajectory extraction from the data stream. The trajectory address sequence is used to generate a P-box to shuffle the plain image while random sequences are treated as keystreams. A new factor called drift factor is employed to accelerate and enhance the performance of the random sequence generator. An initial value is introduced to make the encryption method an approximately one-time pad. Experimental results show that the random sequences pass the NIST statistical test with a high ratio and extensive analysis demonstrates that the new encryption scheme has superior security.
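
    The permutation-diffusion architecture described above can be sketched in a few lines: a random "trajectory" sequence is argsorted into a P-box that shuffles pixel positions, and a keystream is then XOR-chained over the shuffled pixels starting from an initial value. This is a toy illustration only: Python's seeded `random.Random` stands in for the paper's generalized-information-domain generator and drift factor, which are not reproduced here, and it is not cryptographically secure.

```python
import random


def _keystreams(key: int, n: int) -> tuple[list[int], list[int]]:
    # Stand-in PRNG (NOT cryptographically secure); assumption, not the paper's generator.
    rng = random.Random(key)
    trajectory = [rng.random() for _ in range(n)]
    pbox = sorted(range(n), key=lambda i: trajectory[i])  # argsort gives a permutation
    keystream = [rng.randrange(256) for _ in range(n)]
    return pbox, keystream


def encrypt(pixels: list[int], key: int, iv: int = 0) -> list[int]:
    pbox, ks = _keystreams(key, len(pixels))
    shuffled = [pixels[j] for j in pbox]          # permutation stage
    out, prev = [], iv
    for p, k in zip(shuffled, ks):                # diffusion stage, chained on ciphertext
        c = p ^ k ^ prev
        out.append(c)
        prev = c
    return out


def decrypt(cipher: list[int], key: int, iv: int = 0) -> list[int]:
    pbox, ks = _keystreams(key, len(cipher))
    shuffled, prev = [], iv
    for c, k in zip(cipher, ks):                  # undo diffusion
        shuffled.append(c ^ k ^ prev)
        prev = c
    pixels = [0] * len(cipher)
    for i, j in enumerate(pbox):                  # undo permutation
        pixels[j] = shuffled[i]
    return pixels


image = [(7 * i) % 256 for i in range(64)]        # hypothetical 8x8 grayscale image, flattened
assert decrypt(encrypt(image, 1234), 1234) == image
```

    Chaining each ciphertext byte into the next step is what spreads a one-pixel change across the rest of the output.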

  19. World data centre for microorganisms: an information infrastructure to explore and utilize preserved microbial strains worldwide

    PubMed Central

    Wu, Linhuan; Sun, Qinglan; Desmeth, Philippe; Sugawara, Hideaki; Xu, Zhenghong; McCluskey, Kevin; Smith, David; Alexander, Vasilenko; Lima, Nelson; Ohkuma, Moriya; Robert, Vincent; Zhou, Yuguang; Li, Jianhui; Fan, Guomei; Ingsriswang, Supawadee; Ozerskaya, Svetlana; Ma, Juncai

    2017-01-01

    The World Data Centre for Microorganisms (WDCM) was established 50 years ago as the data center of the World Federation for Culture Collections (WFCC)–Microbial Resource Center (MIRCEN). WDCM aims to provide integrated information services using big data technology for microbial resource centers and microbiologists all over the world. Here, we provide an overview of WDCM including all of its integrated services. Culture Collections Information Worldwide (CCINFO) provides metadata on 708 culture collections from 72 countries and regions. Global Catalogue of Microorganisms (GCM) gathers strain catalogue information and provides a data retrieval, analysis, and visualization system of microbial resources. Currently, GCM includes >368 000 strains from 103 culture collections in 43 countries and regions. Analyzer of Bioresource Citation (ABC) is a data mining tool extracting strain related publications, patents, nucleotide sequences and genome information from public data sources to form a knowledge base. Reference Strain Catalogue (RSC) maintains a database of strains listed in International Organization for Standardization (ISO) and other international or regional standards. RSC allocates a unique identifier to strains recommended for use in diagnosis and quality control, and hence serves as a valuable cross-platform reference. WDCM provides free access to all these services at www.wdcm.org. PMID:28053166

  20. Task planning with uncertainty for robotic systems. Thesis

    NASA Technical Reports Server (NTRS)

    Cao, Tiehua

    1993-01-01

    In a practical robotic system, it is important to represent and plan sequences of operations and to be able to choose an efficient sequence from them for a specific task. During the generation and execution of task plans, different kinds of uncertainty may occur and erroneous states need to be handled to ensure the efficiency and reliability of the system. An approach to task representation, planning, and error recovery for robotic systems is demonstrated. Our approach to task planning is based on an AND/OR net representation, which is then mapped to a Petri net representation of all feasible geometric states and associated feasibility criteria for net transitions. Task decomposition of robotic assembly plans based on this representation is performed on the Petri net for robotic assembly tasks, and the inheritance of the properties of liveness, safeness, and reversibility at all levels of decomposition is explored. This approach provides a framework for robust execution of tasks through the properties of traceability and viability. Uncertainty in robotic systems is modeled by local fuzzy variables, fuzzy marking variables, and global fuzzy variables, which are incorporated in fuzzy Petri nets. Analysis of properties and reasoning about uncertainty are investigated using fuzzy reasoning structures built into the net. Two applications of fuzzy Petri nets, robot task sequence planning and sensor-based error recovery, are explored. In the first application, the search space for feasible and complete task sequences with correct precedence relationships is reduced via the use of global fuzzy variables in reasoning about subgoals. In the second application, sensory verification operations are modeled by mutually exclusive transitions to reason about local and global fuzzy variables on-line and automatically select a retry or an alternative error recovery sequence when errors occur.
Task sequencing and task execution with error recovery capability for one and multiple soft components in robotic systems are investigated.
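
    The Petri net machinery this thesis builds on (transitions that consume and produce tokens, reachable markings, and properties such as safeness) can be illustrated with an ordinary place/transition net. This is a minimal sketch only: it does not model the thesis's fuzzy marking variables, and the place and transition names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field

Marking = dict[str, int]


@dataclass
class Transition:
    name: str
    inputs: Marking = field(default_factory=dict)   # tokens consumed per place
    outputs: Marking = field(default_factory=dict)  # tokens produced per place


def enabled(m: Marking, t: Transition) -> bool:
    return all(m.get(p, 0) >= n for p, n in t.inputs.items())


def fire(m: Marking, t: Transition) -> Marking:
    new = dict(m)
    for p, n in t.inputs.items():
        new[p] = new.get(p, 0) - n
    for p, n in t.outputs.items():
        new[p] = new.get(p, 0) + n
    return new


def freeze(m: Marking) -> tuple:
    # Drop empty places so equal markings compare equal.
    return tuple(sorted((p, n) for p, n in m.items() if n))


def reachable(initial: Marking, transitions: list[Transition]) -> set:
    """Breadth-first enumeration of reachable markings (terminates only for bounded nets)."""
    start = freeze(initial)
    seen, queue = {start}, deque([start])
    while queue:
        m = dict(queue.popleft())
        for t in transitions:
            if enabled(m, t):
                nxt = freeze(fire(m, t))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen


def is_safe(markings: set) -> bool:
    """Safe: no reachable marking holds more than one token in any place."""
    return all(n <= 1 for m in markings for _, n in m)


# Hypothetical one-step assembly: join part_a and part_b while the gripper is free.
assemble = Transition("assemble", {"part_a": 1, "part_b": 1, "gripper_free": 1},
                      {"assembled": 1, "gripper_free": 1})
states = reachable({"part_a": 1, "part_b": 1, "gripper_free": 1}, [assemble])
print(len(states), is_safe(states))  # 2 True
```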

  1. Geographic Patterns of Genetic Variation in a Broadly Distributed Marine Vertebrate: New Insights into Loggerhead Turtle Stock Structure from Expanded Mitochondrial DNA Sequences

    PubMed Central

    Shamblin, Brian M.; Bolten, Alan B.; Abreu-Grobois, F. Alberto; Bjorndal, Karen A.; Cardona, Luis; Carreras, Carlos; Clusa, Marcel; Monzón-Argüello, Catalina; Nairn, Campbell J.; Nielsen, Janne T.; Nel, Ronel; Soares, Luciano S.; Stewart, Kelly R.; Vilaça, Sibelle T.; Türkozan, Oguz; Yilmaz, Can; Dutton, Peter H.

    2014-01-01

    Previous genetic studies have demonstrated that natal homing shapes the stock structure of marine turtle nesting populations. However, widespread sharing of common haplotypes based on short segments of the mitochondrial control region often limits resolution of the demographic connectivity of populations. Recent studies employing longer control region sequences to resolve haplotype sharing have focused on regional assessments of genetic structure and phylogeography. Here we synthesize available control region sequences for loggerhead turtles from the Mediterranean Sea, Atlantic, and western Indian Ocean basins. These data represent six of the nine globally significant regional management units (RMUs) for the species and include novel sequence data from Brazil, Cape Verde, South Africa and Oman. Genetic tests of differentiation among 42 rookeries represented by short sequences (380 bp haplotypes from 3,486 samples) and 40 rookeries represented by long sequences (∼800 bp haplotypes from 3,434 samples) supported the distinction of the six RMUs analyzed as well as recognition of at least 18 demographically independent management units (MUs) with respect to female natal homing. A total of 59 haplotypes were resolved. These haplotypes belonged to two highly divergent global lineages, with haplogroup I represented primarily by CC-A1, CC-A4, and CC-A11 variants and haplogroup II represented by CC-A2 and derived variants. Geographic distribution patterns of haplogroup II haplotypes and the nested position of CC-A11.6 from Oman among the Atlantic haplotypes point to recent colonization of the Indian Ocean from the Atlantic for both global lineages. The haplotypes we confirmed for western Indian Ocean RMUs allow reinterpretation of previous mixed stock analysis and further suggest that contemporary migratory connectivity between the Indian and Atlantic Oceans occurs on a broader scale than previously hypothesized.
This study represents a valuable model for conducting comprehensive international cooperative data management and research in marine ecology. PMID:24465810

  2. Joint deep shape and appearance learning: application to optic pathway glioma segmentation

    NASA Astrophysics Data System (ADS)

    Mansoor, Awais; Li, Ien; Packer, Roger J.; Avery, Robert A.; Linguraru, Marius George

    2017-03-01

    Automated tissue characterization is one of the major applications of computer-aided diagnosis systems. Deep learning techniques have recently demonstrated impressive performance for image patch-based tissue characterization. However, existing patch-based tissue classification techniques struggle to exploit useful shape information. Local and global shape knowledge, such as regional boundary changes, diameter, and volumetrics, can be useful in classifying tissues, especially in scenarios where the appearance signature does not provide significant classification information. In this work, we present a deep neural network-based method for the automated segmentation of tumors referred to as optic pathway gliomas (OPG) located within the anterior visual pathway (AVP; optic nerve, chiasm or tracts) using joint shape and appearance learning. Voxel intensity values of commonly used MRI sequences are generally not indicative of OPG. To be considered an OPG, current clinical practice dictates that some portion of the AVP must demonstrate shape enlargement. The method proposed in this work integrates multi-sequence magnetic resonance images (T1, T2, and FLAIR) along with local boundary changes to train a deep neural network. For training and evaluation purposes, we used a dataset of multi-sequence MRI obtained from 20 subjects (10 controls, 10 NF1+OPG). To the best of our knowledge, this is the first deep representation learning-based approach designed to merge shape and multi-channel appearance data for glioma detection. In our experiments, mean misclassification errors of 2.39% and 0.48% were observed for glioma and control patches extracted from the AVP, respectively. Moreover, an overall Dice similarity coefficient of 0.87 ± 0.13 (0.93 ± 0.06 for healthy tissue, 0.78 ± 0.18 for glioma tissue) demonstrates the potential of the proposed method for accurate localization and early detection of OPG.
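
    The Dice similarity coefficient quoted above measures overlap between a predicted and a reference segmentation, 2|A∩B| / (|A| + |B|). A minimal sketch on flattened binary masks follows; reporting it as mean ± sample standard deviation across cases is an assumption about how the abstract's "0.87 ± 0.13" was aggregated, and the masks are hypothetical.

```python
import statistics


def dice(pred: list[int], truth: list[int]) -> float:
    """Dice similarity 2|A∩B| / (|A| + |B|) for two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    size = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Both masks empty: treated as perfect agreement (a common convention).
    return 1.0 if size == 0 else 2 * inter / size


def summarize(scores: list[float]) -> str:
    """Mean ± sample standard deviation across cases."""
    return f"{statistics.mean(scores):.2f} ± {statistics.stdev(scores):.2f}"


print(dice([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
print(summarize([0.5, 1.0]))             # 0.75 ± 0.35
```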

  3. Contextual cueing in naturalistic scenes: Global and local contexts.

    PubMed

    Brockmole, James R; Castelhano, Monica S; Henderson, John M

    2006-07-01

    In contextual cueing, the position of a target within a group of distractors is learned over repeated exposure to a display with reference to a few nearby items rather than to the global pattern created by the elements. The authors contrasted the role of global and local contexts for contextual cueing in naturalistic scenes. Experiment 1 showed that learned target positions transfer when local information is altered but not when global information is changed. Experiment 2 showed that scene-target covariation is learned more slowly when local, but not global, information is repeated across trials than when global but not local information is repeated. Thus, in naturalistic scenes, observers are biased to associate target locations with global contexts. Copyright 2006 APA, all rights reserved.

  4. Providing Global Change Information for Decision-Making: Capturing and Presenting Provenance

    NASA Technical Reports Server (NTRS)

    Ma, Xiaogang; Fox, Peter; Tilmes, Curt; Jacobs, Katherine; Waple, Anne

    2014-01-01

    Global change information demands access to data sources and well-documented provenance to provide evidence needed to build confidence in scientific conclusions and, in specific applications, to ensure the information's suitability for use in decision-making. A new generation of Web technology, the Semantic Web, provides tools for that purpose. The topic of global change covers changes in the global environment (including alterations in climate, land productivity, oceans or other water resources, atmospheric composition and/or chemistry, and ecological systems) that may alter the capacity of the Earth to sustain life and support human systems. Data and findings associated with global change research are of great public, government, and academic concern and are used in policy and decision-making, which makes the provenance of global change information especially important. In addition, since different types of decisions benefit from different types of information, understanding how to capture and present the provenance of global change information is becoming more of an imperative in adaptive planning.

  5. Information Avoidance Tendencies, Threat Management Resources, and Interest in Genetic Sequencing Feedback.

    PubMed

    Taber, Jennifer M; Klein, William M P; Ferrer, Rebecca A; Lewis, Katie L; Harris, Peter R; Shepperd, James A; Biesecker, Leslie G

    2015-08-01

    Information avoidance is a defensive strategy that undermines receipt of potentially beneficial but threatening health information and may especially occur when threat management resources are unavailable. We examined whether individual differences in information avoidance predicted intentions to receive genetic sequencing results for preventable and unpreventable (i.e., more threatening) disease and, secondarily, whether threat management resources of self-affirmation or optimism mitigated any effects. Participants (N = 493) in an NIH study (ClinSeq®) piloting the use of genome sequencing reported intentions to receive (optional) sequencing results and completed individual difference measures of information avoidance, self-affirmation, and optimism. Information avoidance tendencies corresponded with lower intentions to learn results, particularly for unpreventable diseases. The association was weaker among individuals higher in self-affirmation or optimism, but only for results regarding preventable diseases. Information avoidance tendencies may influence decisions to receive threatening health information; threat management resources hold promise for mitigating this association.

  6. Analysis of consequences of non-synonymous SNP in feed conversion ratio associated TGF-β receptor type 3 gene in chicken.

    PubMed

    Rasal, Kiran D; Shah, Tejas M; Vaidya, Megha; Jakhesara, Subhash J; Joshi, Chaitanya G

    2015-06-01

    Recent advances in high-throughput sequencing technology have opened new ways to study genome-wide variation in many organisms and its consequences. In the present study, mutations in TGFBR3 that showed significant association with the feed conversion ratio (FCR) trait in chicken exome sequencing were analyzed further. Of four SNPs, one nsSNP, p.Val451Leu, was found in the coding region of TGFBR3. In silico tools SnpSift and PANTHER predicted it to be deleterious (0.04) and tolerated, respectively, while I-Mutant indicated that protein stability decreased. The TGFBR3 I-TASSER model has a C-score of 0.85 and was validated using PROCHECK. Based on MD simulation, the mutant protein structure deviated from the native structure with an RMSD of 0.08 Å due to changes in the H-bonding distances of the mutant residue. Docking of TGFBR3 with its interacting partner TGFBR2 indicated that the mutant required more global energy. The present study therefore provides useful information about functional SNPs that have an impact on FCR traits.
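
    The RMSD figure above summarizes how far the mutant structure's atoms sit from their native positions. A minimal sketch of the calculation, assuming the two coordinate sets list the same atoms in the same order and are already superposed (no optimal-rotation fit such as Kabsch is performed here); the coordinates are hypothetical.

```python
import math

Coord = tuple[float, float, float]


def rmsd(a: list[Coord], b: list[Coord]) -> float:
    """Root-mean-square deviation over matched atoms, in the input units (e.g. Å).

    Assumes the structures are already superposed; no rotation is fitted.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    sq = sum((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
             for (x1, y1, z1), (x2, y2, z2) in zip(a, b))
    return math.sqrt(sq / len(a))


# Hypothetical 4-atom fragment; only the last atom moves, by (0.3, 0.4, 0) Å.
native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]
mutant = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (4.8, 0.4, 0.0)]
print(round(rmsd(native, mutant), 3))  # 0.25
```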

  7. Infrared interferometric observations of nearby exozodiacal disks: current status and perspectives

    NASA Astrophysics Data System (ADS)

    Defrère, D.; Absil, O.; di Folco, E.; Coudé du Foresto, V.; Mérand, A.; Augereau, J.-C.

    2010-10-01

    Directly detecting exozodiacal dust in the inner part of extrasolar planetary systems is now feasible thanks to advances in high-precision near-infrared interferometry. Investigating this region around nearby stars provides unique information to understand the global architecture of planetary systems and to define the population of stars suitable for future exo-Earth characterization missions. Over the last few years, a survey of nearby main-sequence stars has been ongoing at the CHARA array using the FLUOR beam combiner. The goal of this survey is to directly probe the inner part of circumstellar disks in order to detect the signature of hot dust accounting for about 1% of the near-infrared stellar flux. In this paper, we present the status of this survey and provide the first statistical results about the occurrence of bright exozodiacal disks around nearby main-sequence stars. We also report on the first H-band interferometric observations of the exozodiacal disk around Vega, which have been obtained with IOTA/IONIC, and discuss the implications for the disk properties.

  8. Worldwide Distribution of Cytochrome P450 Alleles: A Meta-analysis of Population-scale Sequencing Projects.

    PubMed

    Zhou, Y; Ingelman-Sundberg, M; Lauschke, V M

    2017-10-01

    Genetic polymorphisms in cytochrome P450 (CYP) genes can result in altered metabolic activity toward a plethora of clinically important medications. Thus, single nucleotide variants and copy number variations in CYP genes are major determinants of drug pharmacokinetics and toxicity and constitute pharmacogenetic biomarkers for drug dosing, efficacy, and safety. Strikingly, the distribution of CYP alleles differs considerably between populations with important implications for personalized drug therapy and healthcare programs. To provide a global distribution map of CYP alleles with clinical importance, we integrated whole-genome and exome sequencing data from 56,945 unrelated individuals of five major human populations. By combining this dataset with population-specific linkage information, we derive the frequencies of 176 CYP haplotypes, providing an extensive resource for major genetic determinants of drug metabolism. Furthermore, we aggregated this dataset into spectra of predicted functional variability in the respective populations and discuss the implications for population-adjusted pharmacological treatment strategies. © 2017 The Authors Clinical Pharmacology & Therapeutics published by Wiley Periodicals, Inc. on behalf of American Society for Clinical Pharmacology and Therapeutics.

  9. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bowers, Robert M.; Kyrpides, Nikos C.; Stepanauskas, Ramunas

    We present two standards developed by the Genomic Standards Consortium (GSC) for reporting bacterial and archaeal genome sequences. Both are extensions of the Minimum Information about Any (x) Sequence (MIxS). The standards are the Minimum Information about a Single Amplified Genome (MISAG) and the Minimum Information about a Metagenome-Assembled Genome (MIMAG), which cover, among other things, assembly quality and estimates of genome completeness and contamination. These standards can be used in combination with other GSC checklists, including the Minimum Information about a Genome Sequence (MIGS), Minimum Information about a Metagenomic Sequence (MIMS), and Minimum Information about a Marker Gene Sequence (MIMARKS). Community-wide adoption of MISAG and MIMAG will facilitate more robust comparative genomic analyses of bacterial and archaeal diversity.
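
    Because MISAG and MIMAG tie reporting to completeness and contamination estimates, a common practical step is assigning each genome to a draft-quality tier. The abstract does not state the thresholds; the values below are the tiers as we understand them from the published checklist (high-quality: >90% complete, <5% contamination, 23S/16S/5S rRNA genes and at least 18 tRNAs; medium: ≥50% complete, <10% contamination; low: <50% complete, <10% contamination). Treat them as an assumption to verify against the standard before use.

```python
def mimag_tier(completeness: float, contamination: float,
               has_23s_16s_5s: bool, n_trnas: int) -> str:
    """Draft-quality tier for a SAG/MAG (thresholds assumed, see lead-in)."""
    if (completeness > 90 and contamination < 5
            and has_23s_16s_5s and n_trnas >= 18):
        return "high-quality draft"
    if completeness >= 50 and contamination < 10:
        return "medium-quality draft"
    if contamination < 10:
        return "low-quality draft"
    return "outside MIMAG draft tiers"  # e.g. contamination >= 10%


print(mimag_tier(95.2, 1.3, True, 20))   # high-quality draft
print(mimag_tier(95.2, 1.3, False, 20))  # medium-quality draft
```

    Note that a nearly complete, clean genome still drops to the medium tier if the rRNA genes or tRNAs are missing, which is a frequent outcome for short-read MAGs.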

  10. Unique features of a global human ectoparasite identified through sequencing of the bed bug genome

    USDA-ARS's Scientific Manuscript database

    The bed bug, Cimex lectularius, has re-established itself as a ubiquitous human ectoparasite throughout much of the world during the last two decades. This global resurgence is likely linked to increased international travel and commerce and widespread insecticide resistance. Analyses of the C. le...

  11. Students' Communication, Argumentation and Knowledge in a Citizens' Conference on Global Warming

    ERIC Educational Resources Information Center

    Albe, Virginie; Gombert, Marie-Jose

    2012-01-01

    An empirical study on 12th-grade students' engagement in a global warming debate held as a citizens' conference is reported. Within the design-based research methodology, an interdisciplinary teaching sequence integrating an initiation to non-violent communication was developed. Students' debates were analyzed according to three dimensions:…

  12. Genome of the Asian longhorned beetle (Anoplophora glabripennis), a globally significant invasive species, reveals key functional and evolutionary innovations at the beetle-plant interface

    USDA-ARS's Scientific Manuscript database

    The Asian longhorned beetle (Anoplophora glabripennis; AGLAB) is a globally significant invasive species capable of inflicting severe feeding damage on many important orchard, ornamental and forest trees. Genome sequencing, annotation, gene expression assays, and functional and comparative genomic s...

  13. Assessing Global Awareness over Short-Term Study Abroad Sequence: A Factor Analysis

    ERIC Educational Resources Information Center

    Kurt, Mark R.; Olitsky, Neal H.; Geis, Paul

    2013-01-01

    Academic study abroad programs are uniquely equipped to give students the opportunities to achieve outcomes for global citizenship (Langran, Langran, and Ozment 2009). These programs take students outside the confines of their home institutions and expose students to new cultures and languages while integrating academic content to enhance the…

  14. Contextualizing the Intermediate Financial Accounting Courses in the Global Financial Crisis

    ERIC Educational Resources Information Center

    Bloom, Robert; Webinger, Mariah

    2011-01-01

    This paper represents an attempt to incorporate concepts and issues stemming from the global financial crisis (GFC) into the typical two-course Intermediate Accounting sequence as taught in North American colleges and universities. The teaching approach that the authors advocate embeds the GFC throughout these courses. The main expected outcome…

  15. Diversity and Genome Analysis of Australian and Global Oilseed Brassica napus L. Germplasm Using Transcriptomics and Whole Genome Re-sequencing.

    PubMed

    Malmberg, M Michelle; Shi, Fan; Spangenberg, German C; Daetwyler, Hans D; Cogan, Noel O I

    2018-01-01

    Intensive breeding of Brassica napus has resulted in relatively low diversity, such that B. napus would benefit from germplasm improvement schemes that sustain diversity. As such, samples representative of global germplasm pools need to be assessed for existing population structure, diversity and linkage disequilibrium (LD). Complexity reduction genotyping-by-sequencing (GBS) methods, including GBS-transcriptomics (GBS-t), enable cost-effective screening of a large number of samples, while whole genome re-sequencing (WGR) delivers the ability to generate large numbers of unbiased genomic single nucleotide polymorphisms (SNPs), and identify structural variants (SVs). Furthermore, the development of genomic tools based on whole genomes representative of global oilseed diversity and oriented to the reference genome has substantial industry relevance and will be highly beneficial for canola breeding. As recent studies have focused on European and Chinese varieties, a global diversity panel as well as a substantial number of Australian spring types were included in this study. Focusing on industry relevance, 633 varieties were initially genotyped using GBS-t to examine population structure using 61,037 SNPs. Subsequently, 149 samples representative of global diversity were selected for WGR, and both data sets were used for a side-by-side evaluation of diversity and LD. The WGR data were further used to develop genomic resources consisting of a list of 4,029,750 high-confidence SNPs annotated using SnpEff, and SVs in the form of 10,976 deletions and 2,556 insertions. These resources form the basis of a reliable and repeatable system allowing greater integration between canola genomics studies, with a strong focus on breeding germplasm and industry applicability.
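
    Linkage disequilibrium between two SNPs is usually summarized as r². When genotypes are unphased, a common approximation is the squared Pearson correlation of allele dosages (0, 1, 2) across samples. The abstract does not say which LD estimator was used, so this sketch is an assumption, and the genotype vectors are hypothetical.

```python
def genotype_r2(snp1: list[int], snp2: list[int]) -> float:
    """Squared Pearson correlation of allele dosages (0/1/2) across samples."""
    n = len(snp1)
    if n != len(snp2) or n < 2:
        raise ValueError("need two equal-length genotype vectors with >= 2 samples")
    m1, m2 = sum(snp1) / n, sum(snp2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(snp1, snp2))
    v1 = sum((a - m1) ** 2 for a in snp1)
    v2 = sum((b - m2) ** 2 for b in snp2)
    if v1 == 0 or v2 == 0:
        raise ValueError("monomorphic SNP: r^2 is undefined")
    return cov * cov / (v1 * v2)


print(genotype_r2([0, 1, 2], [2, 1, 0]))        # 1.0 (perfect, opposite phase)
print(genotype_r2([0, 2, 0, 2], [0, 0, 2, 2]))  # 0.0 (no association)
```

    Plotting r² against physical distance between SNP pairs gives the LD decay curves typically used to compare germplasm pools.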

  16. International Perspectives in LIS Education: Global Education, Research, and Collaboration at the SJSU School of Information

    ERIC Educational Resources Information Center

    Hirsh, Sandra; Simmons, Michelle Holschuh; Christensen, Paul; Sellar, Melanie; Stenström, Cheryl; Hagar, Christine; Bernier, Anthony; Faires, Debbie; Fisher, Jane; Alman, Susan

    2015-01-01

    The IFLA Trend Report identified five trends that will impact the information environment (IFLA, 2015), such as access to information with new technologies, online education for global learning, hyper-connected communities, and the global information environment. The faculty at San José State University (SJSU) School of Information (iSchool) is…

  17. Effects of Sequences of Cognitions on Group Performance Over Time

    PubMed Central

    Molenaar, Inge; Chiu, Ming Ming

    2017-01-01

    Extending past research showing that sequences of low cognitions (low-level processing of information) and high cognitions (high-level processing of information through questions and elaborations) influence the likelihoods of subsequent high and low cognitions, this study examines whether sequences of cognitions are related to group performance over time; 54 primary school students (18 triads) discussed and wrote an essay about living in another country (32,375 turns of talk). Content analysis and statistical discourse analysis showed that within each lesson, groups with more low cognitions or more sequences of low cognition followed by high cognition added more essay words. Groups with more high cognitions, sequences of low cognition followed by low cognition, or sequences of high cognition followed by an action followed by low cognition, showed different words and sequences, suggestive of new ideas. The links between cognition sequences and group performance over time can inform facilitation and assessment of student discussions. PMID:28490854

  18. Effects of Sequences of Cognitions on Group Performance Over Time.

    PubMed

    Molenaar, Inge; Chiu, Ming Ming

    2017-04-01

    Extending past research showing that sequences of low cognitions (low-level processing of information) and high cognitions (high-level processing of information through questions and elaborations) influence the likelihoods of subsequent high and low cognitions, this study examines whether sequences of cognitions are related to group performance over time; 54 primary school students (18 triads) discussed and wrote an essay about living in another country (32,375 turns of talk). Content analysis and statistical discourse analysis showed that within each lesson, groups with more low cognitions or more sequences of low cognition followed by high cognition added more essay words. Groups with more high cognitions, sequences of low cognition followed by low cognition, or sequences of high cognition followed by an action followed by low cognition, showed different words and sequences, suggestive of new ideas. The links between cognition sequences and group performance over time can inform facilitation and assessment of student discussions.

  19. Discovery and information-theoretic characterization of transcription factor binding sites that act cooperatively.

    PubMed

    Clifford, Jacob; Adami, Christoph

    2015-09-02

    Transcription factor binding to the surface of DNA regulatory regions is one of the primary mechanisms regulating gene expression levels. A probabilistic approach to modeling protein-DNA interactions at the sequence level is through position weight matrices (PWMs), which estimate the joint probability of a DNA binding site sequence by assuming positional independence within the DNA sequence. Here we construct conditional PWMs that depend on the motif signatures in the flanking DNA sequence, by conditioning known binding site loci on the presence or absence of additional binding sites in the flanking sequence of each site's locus. Pooling known sites with similar flanking sequence patterns allows for the estimation of the conditional distribution function over the binding site sequences. We apply our model to the Dorsal transcription factor binding sites active in patterning the dorsal-ventral axis in Drosophila development. We find that those binding sites that cooperate with nearby Twist sites on average contain about 0.5 bits of information about the presence of Twist transcription factor binding sites in the flanking sequence. We also find that Dorsal binding site detectors conditioned on flanking sequence information make better predictions about what is a Dorsal site relative to background DNA than detection without information about flanking sequence features.
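
    The PWM approach described above treats each position independently: column frequencies from aligned sites give a per-position probability, and a candidate sequence is scored by summed log-odds against background. The conditioning step can be mimicked by partitioning sites on a flanking-context label and fitting one PWM per group. A minimal sketch, assuming a uniform background, a pseudocount of 0.5, and uppercase ACGT sites; the sites and labels are hypothetical, and this does not reproduce the authors' 0.5-bit estimate.

```python
import math
from collections import defaultdict

BASES = "ACGT"


def build_pwm(sites: list[str], pseudocount: float = 0.5) -> list[dict[str, float]]:
    """Per-column base probabilities from equal-length aligned sites (ACGT only)."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned to the same length")
    pwm = []
    for i in range(length):
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[i]] += 1
        total = sum(counts.values())
        pwm.append({b: c / total for b, c in counts.items()})
    return pwm


def score(pwm: list[dict[str, float]], seq: str, background: float = 0.25) -> float:
    """Log-odds score in bits, assuming positional independence (needs pseudocount > 0)."""
    return sum(math.log2(col[b] / background) for col, b in zip(pwm, seq))


def information_content(pwm: list[dict[str, float]]) -> float:
    """Total information in bits: sum over columns of 2 - H(column)."""
    return sum(2 + sum(p * math.log2(p) for p in col.values() if p > 0) for col in pwm)


def conditional_pwms(sites: list[str], context: list[bool],
                     pseudocount: float = 0.5) -> dict[bool, list[dict[str, float]]]:
    """One PWM per flanking-context label (e.g. partner site nearby or not)."""
    groups = defaultdict(list)
    for site, flag in zip(sites, context):
        groups[flag].append(site)
    return {flag: build_pwm(g, pseudocount) for flag, g in groups.items()}


pwm = build_pwm(["ACGT", "ACGT", "ACGA"])
print(score(pwm, "ACGT") > score(pwm, "TTTT"))  # True
```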

  20. Optimal network alignment with graphlet degree vectors.

    PubMed

    Milenković, Tijana; Ng, Weng Leong; Hayes, Wayne; Przulj, Natasa

    2010-06-30

    Important biological information is encoded in the topology of biological networks. Comparative analyses of biological networks are proving to be valuable, as they can lead to transfer of knowledge between species and give deeper insights into biological function, disease, and evolution. We introduce a new method that uses the Hungarian algorithm to produce optimal global alignment between two networks using any cost function. We design a cost function based solely on network topology and use it in our network alignment. Our method can be applied to any two networks, not just biological ones, since it is based only on network topology. We use our new method to align protein-protein interaction networks of two eukaryotic species and demonstrate that our alignment exposes large and topologically complex regions of network similarity. At the same time, our alignment is biologically valid, since many of the aligned protein pairs perform the same biological function. From the alignment, we predict function of yet unannotated proteins, many of which we validate in the literature. Also, we apply our method to find topological similarities between metabolic networks of different species and build phylogenetic trees based on our network alignment score. The phylogenetic trees obtained in this way bear a striking resemblance to the ones obtained by sequence alignments. Our method detects topologically similar regions in large networks that are statistically significant. It does this independent of protein sequence or any other information external to network topology.
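
    The sketch below shows the optimal-assignment step with the standard O(n³) Hungarian algorithm, using row and column potentials. The paper's cost function is built from graphlet degree vectors. To keep the example short, plain degree difference stands in as the topological cost here. That substitution is our assumption and is not the authors' cost function. The solver requires the smaller network to supply the rows.

```python
def hungarian(cost):
    """Minimum-cost assignment of each row to a distinct column (rows <= columns)."""
    n, m = len(cost), len(cost[0])
    if n > m:
        raise ValueError("need rows <= columns; transpose the matrix")
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)  # row and column potentials
    p = [0] * (m + 1)    # p[j]: row (1-based) matched to column j, 0 = free
    way = [0] * (m + 1)  # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while True:  # flip matches along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    match = [0] * n
    for j in range(1, m + 1):
        if p[j]:
            match[p[j] - 1] = j - 1
    return match


def align(g1, g2):
    """Align adjacency dicts g1 -> g2 using degree difference as a stand-in cost."""
    nodes1, nodes2 = sorted(g1), sorted(g2)
    cost = [[abs(len(g1[a]) - len(g2[b])) for b in nodes2] for a in nodes1]
    return {nodes1[i]: nodes2[j] for i, j in enumerate(hungarian(cost))}


g1 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
g2 = {"x": {"y"}, "y": {"x", "z"}, "z": {"y"}}
mapping = align(g1, g2)
print(mapping)
```

    Any cost matrix can be passed to `hungarian`. This reflects the abstract's point that the optimal alignment works "using any cost function".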

  1. Application of wavelet analysis in determining the periodicity of global warming

    NASA Astrophysics Data System (ADS)

    Feng, Xiao

    2018-04-01

    During the last two decades of the twentieth century, the global average temperature rose by 0.48 °C relative to 100 years earlier. Since then, global warming has become a prominent topic. Global warming will have complex potential impacts on humans and the Earth, and the negative impacts far outweigh the positive ones. The most obvious external manifestation of global warming is temperature. This study therefore uses wavelet analysis to study the characteristics of the temperature time series, determine the periodicity of the series, identify the trend of temperature change, and predict the extent of future global warming, so that the necessary precautionary measures can be taken.
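
    The abstract does not name the wavelet or its parameters. The sketch below assumes a complex Morlet wavelet with central frequency 6 and the usual Morlet scale-to-period conversion. It takes the dominant period as the peak of the time-averaged (global) wavelet power. A synthetic sine with a period of 20 samples stands in for a temperature record. Only time points far enough from both ends are averaged, which avoids edge effects.

```python
import cmath
import math

OMEGA0 = 6.0  # Morlet central frequency; a common default, assumed here


def morlet(eta):
    """Complex Morlet mother wavelet (tiny admissibility correction omitted)."""
    return math.pi ** -0.25 * cmath.exp(1j * OMEGA0 * eta) * math.exp(-0.5 * eta * eta)


def scale_for_period(period):
    """Invert the Morlet relation period = factor * scale."""
    factor = 4 * math.pi / (OMEGA0 + math.sqrt(2 + OMEGA0 ** 2))
    return period / factor


def global_wavelet_power(x, periods, support=3.0):
    """Time-averaged wavelet power at each candidate period."""
    n = len(x)
    scales = [scale_for_period(p) for p in periods]
    edge = int(math.ceil(support * max(scales)))  # skip points the widest wavelet overhangs
    if 2 * edge >= n:
        raise ValueError("series too short for the largest period")
    mean = sum(x) / n
    xc = [v - mean for v in x]  # remove the mean level first
    power = []
    for s in scales:
        half = int(math.ceil(support * s))  # wavelet truncated at +/- support * scale
        # Conjugated wavelet sampled at integer offsets, computed once per scale.
        kernel = [morlet(k / s).conjugate() for k in range(-half, half + 1)]
        total = 0.0
        for t in range(edge, n - edge):
            acc = sum(xc[t + k] * kernel[k + half] for k in range(-half, half + 1))
            total += abs(acc) ** 2 / s  # squared 1/sqrt(scale) normalisation
        power.append(total / (n - 2 * edge))
    return power


def dominant_period(x, periods):
    power = global_wavelet_power(x, periods)
    return periods[power.index(max(power))]


series = [math.sin(2 * math.pi * t / 20) for t in range(300)]
periods = list(range(10, 31))
best = dominant_period(series, periods)
print(best)
```

    With a real temperature series, a trend would normally be removed first. Peaks in the global power spectrum would also be checked against a noise background before being called periodicities.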

  2. L-GRAAL: Lagrangian graphlet-based network aligner.

    PubMed

    Malod-Dognin, Noël; Pržulj, Nataša

    2015-07-01

    Discovering and understanding patterns in networks of protein-protein interactions (PPIs) is a central problem in systems biology. Alignments between these networks aid functional understanding because they uncover important information, such as evolutionarily conserved pathways, protein complexes and functional orthologs. A few methods have been proposed for global PPI network alignment, but because of the NP-completeness of the underlying sub-graph isomorphism problem, producing topologically and biologically accurate alignments remains a challenge. We introduce a novel global network alignment tool, Lagrangian GRAphlet-based ALigner (L-GRAAL), which directly optimizes both protein and interaction functional conservation, using a novel alignment search heuristic based on integer programming and Lagrangian relaxation. We compare L-GRAAL with state-of-the-art network aligners on the largest available PPI networks from BioGRID and observe that L-GRAAL uncovers the largest common sub-graphs between the networks, as measured by edge-correctness and symmetric sub-structures scores, which allow more functional information to be transferred across networks. We assess the biological quality of the protein mappings using the semantic similarity of their Gene Ontology annotations and observe that L-GRAAL best uncovers functionally conserved proteins. Furthermore, we introduce for the first time a measure of the semantic similarity of the mapped interactions and show that L-GRAAL also best uncovers functionally conserved interactions. In addition, we illustrate the ability of L-GRAAL to predict new PPIs on the PPI networks of baker's yeast and human. Finally, L-GRAAL's results are the first to show that topological information is more important than sequence information for uncovering functionally conserved interactions. L-GRAAL is coded in C++. Software is available at: http://bio-nets.doc.ic.ac.uk/L-GRAAL/. Contact: n.malod-dognin@imperial.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
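
    Edge correctness, one of the topological scores the abstract mentions, is commonly defined as the fraction of edges in the first network that the alignment maps onto edges of the second. The minimal sketch below assumes undirected edges given as node pairs and a mapping that covers every node of the first network. The function name is ours.

```python
def edge_correctness(edges1, edges2, mapping):
    """Fraction of G1's edges whose mapped endpoints are adjacent in G2."""
    target = {frozenset(e) for e in edges2}  # undirected: order does not matter
    conserved = sum(
        1 for a, b in edges1 if frozenset((mapping[a], mapping[b])) in target
    )
    return conserved / len(edges1)


# Toy example: a triangle aligned onto a path keeps two of its three edges.
triangle = [("a", "b"), ("b", "c"), ("c", "a")]
path = [("x", "y"), ("y", "z")]
ec = edge_correctness(triangle, path, {"a": "x", "b": "y", "c": "z"})
print(ec)
```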

  3. TRAP: automated classification, quantification and annotation of tandemly repeated sequences.

    PubMed

    Sobreira, Tiago José P; Durham, Alan M; Gruber, Arthur

    2006-02-01

    TRAP, the Tandem Repeats Analysis Program, is a Perl program that provides a unified set of analyses for the selection, classification, quantification and automated annotation of tandemly repeated sequences. TRAP uses the results of the Tandem Repeats Finder program to perform a global analysis of the satellite content of DNA sequences, permitting researchers to easily assess the tandem repeat content for both individual sequences and whole genomes. The results can be generated in convenient formats such as HTML and comma-separated values. TRAP can also be used to automatically generate annotation data in the format of feature table and GFF files.
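
    TRAP's classification and annotation steps are not reproduced here. The sketch below illustrates only the "satellite content" quantity. It assumes that repeats arrive as simplified (start, end, period) tuples, a hypothetical stand-in for Tandem Repeats Finder results rather than that program's real output format. Overlapping repeats are merged before computing the covered fraction, so no base is counted twice.

```python
from collections import Counter


def repeat_content(seq_length, repeats):
    """Fraction of a sequence covered by tandem repeats, plus counts per period.

    repeats: hypothetical (start, end, period) tuples, 1-based and inclusive.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted((s, e) for s, e, _ in repeats):
        if cur_end is not None and start <= cur_end + 1:
            cur_end = max(cur_end, end)  # overlapping/adjacent: extend the block
            continue
        if cur_end is not None:
            covered += cur_end - cur_start + 1
        cur_start, cur_end = start, end
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    per_period = Counter(period for _, _, period in repeats)
    return covered / seq_length, dict(per_period)


fraction, counts = repeat_content(100, [(1, 10, 2), (5, 20, 2), (51, 60, 5)])
print(fraction, counts)
```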

  4. TANGLE: Two-Level Support Vector Regression Approach for Protein Backbone Torsion Angle Prediction from Primary Sequences

    PubMed Central

    Song, Jiangning; Tan, Hao; Wang, Mingjun; Webb, Geoffrey I.; Akutsu, Tatsuya

    2012-01-01

    The protein backbone torsion angles Phi and Psi describe rotations around the Cα-N bond (Phi) and the Cα-C bond (Psi). Because the linked peptide bonds are planar and rigid, these two angles essentially determine the backbone geometry of a protein. Accordingly, accurate prediction of backbone torsion angles from sequence information can assist protein structure prediction. In this study, we develop a new approach called TANGLE (Torsion ANGLE predictor) to predict protein backbone torsion angles from amino acid sequences. TANGLE uses a two-level support vector regression approach to perform real-valued torsion angle prediction using a variety of features derived from amino acid sequences, including evolutionary profiles in the form of position-specific scoring matrices, predicted secondary structure, solvent accessibility and natively disordered regions, as well as other global sequence features. When evaluated on a large benchmark dataset of 1,526 non-homologous proteins, the mean absolute errors (MAEs) of Phi and Psi angle prediction are 27.8° and 44.6°, respectively, which are 1% and 3% lower, respectively, than those of ANGLOR, one of the state-of-the-art prediction tools. Moreover, TANGLE's predictions are significantly better than those of a random predictor built on an amino acid-specific basis, with p < 1.46e-147 and 7.97e-150, respectively, by the Wilcoxon signed-rank test. As a complementary approach to current torsion angle prediction algorithms, TANGLE should prove useful in predicting protein structural properties and in assisting protein fold recognition by applying the predicted torsion angles as restraints. TANGLE is freely accessible at http://sunflower.kuicr.kyoto-u.ac.jp/~sjn/TANGLE/. PMID:22319565
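
    The abstract does not say how angular errors are measured. A common convention for torsion angles, assumed here, is to take the shorter way around the circle. Under that convention, predicting 170° for an observed -170° counts as a 20° error rather than 340°. The sketch below computes an MAE on that basis. The function names are ours.

```python
def angular_error(pred, obs):
    """Absolute difference between two angles (degrees), taken around the circle."""
    d = abs(pred - obs) % 360.0
    return min(d, 360.0 - d)


def angular_mae(preds, obs):
    """Mean absolute error over paired predicted and observed angles."""
    return sum(angular_error(p, o) for p, o in zip(preds, obs)) / len(preds)


print(angular_error(170, -170))  # 20.0, not 340
print(angular_mae([170, -60], [-170, -50]))
```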

  5. Defining the Estimated Core Genome of Bacterial Populations Using a Bayesian Decision Model

    PubMed Central

    van Tonder, Andries J.; Mistry, Shilan; Bray, James E.; Hill, Dorothea M. C.; Cody, Alison J.; Farmer, Chris L.; Klugman, Keith P.; von Gottberg, Anne; Bentley, Stephen D.; Parkhill, Julian; Jolley, Keith A.; Maiden, Martin C. J.; Brueggemann, Angela B.

    2014-01-01

    The bacterial core genome is of intense interest and the volume of whole genome sequence data in the public domain available to investigate it has increased dramatically. The aim of our study was to develop a model to estimate the bacterial core genome from next-generation whole genome sequencing data and use this model to identify novel genes associated with important biological functions. Five bacterial datasets were analysed, comprising 2096 genomes in total. We developed a Bayesian decision model to estimate the number of core genes, calculated pairwise evolutionary distances (p-distances) based on nucleotide sequence diversity, and plotted the median p-distance for each core gene relative to its genome location. We designed visually-informative genome diagrams to depict areas of interest in genomes. Case studies demonstrated how the model could identify areas for further study, e.g. 25% of the core genes with higher sequence diversity in the Campylobacter jejuni and Neisseria meningitidis genomes encoded hypothetical proteins. The core gene with the highest p-distance value in C. jejuni was annotated in the reference genome as a putative hydrolase, but further work revealed that it shared sequence homology with beta-lactamase/metallo-beta-lactamases (enzymes that provide resistance to a range of broad-spectrum antibiotics) and thioredoxin reductase genes (which reduce oxidative stress and are essential for DNA replication) in other C. jejuni genomes. Our Bayesian model of estimating the core genome is principled, easy to use and can be applied to large genome datasets. This study also highlighted the lack of knowledge currently available for many core genes in bacterial genomes of significant global public health importance. PMID:25144616
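
    The p-distance is the proportion of aligned sites at which two sequences differ. The sketch below computes it and takes the median over all allele pairs of one gene, mirroring the "median p-distance for each core gene" step. Skipping gapped columns (pairwise deletion) is our assumption, because the abstract does not state how gaps were handled. The sequences are toy values.

```python
from itertools import combinations
from statistics import median


def p_distance(a, b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap in either sequence are skipped (an assumption).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        differing += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def median_p_distance(alleles):
    """Median p-distance over all pairs of aligned alleles of one gene."""
    return median(p_distance(a, b) for a, b in combinations(alleles, 2))


print(median_p_distance(["ACGT", "ACGA", "TCGA"]))
```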

  6. Global assessment of genomic variation in cattle by genome resequencing and high-throughput genotyping

    PubMed Central

    2011-01-01

    Background: Integration of genomic variation with phenotypic information is an effective approach for uncovering genotype-phenotype associations. This requires accurate identification of the different types of variation in individual genomes. Results: We report the integration of the whole genome sequence of a single Holstein Friesian bull with data from single nucleotide polymorphism (SNP) and comparative genomic hybridization (CGH) array technologies to determine a comprehensive spectrum of genomic variation. The performance of resequencing SNP detection was assessed by combining SNPs that were identified to be either in identity by descent (IBD) or in copy number variation (CNV) with results from SNP array genotyping. Coding insertions and deletions (indels) were found to be enriched for sizes that are multiples of 3 and were located near the N- and C-termini of proteins. For larger indels, a combination of split-read and read-pair approaches proved complementary in finding different signatures. CNVs were identified on the basis of the depth of sequenced reads, and by using SNP and CGH arrays. Conclusions: Our results provide high-resolution mapping of diverse classes of genomic variation in an individual bovine genome and demonstrate that structural variation surpasses sequence variation as the main component of genomic variability. Better accuracy of SNP detection was achieved with little loss of sensitivity when algorithms that implemented mapping quality were used. IBD regions were found to be instrumental for calculating resequencing SNP accuracy, while SNP detection within CNVs tended to be less reliable. CNV discovery was affected dramatically by platform resolution and coverage biases. The combined data for this study showed that, at a moderate level of sequencing coverage, an ensemble of platforms and tools can be applied together to maximize the accurate detection of sequence and structural variants. PMID:22082336

  7. New Insights into the Function and Global Distribution of Polyethylene Terephthalate (PET)-Degrading Bacteria and Enzymes in Marine and Terrestrial Metagenomes.

    PubMed

    Danso, Dominik; Schmeisser, Christel; Chow, Jennifer; Zimmermann, Wolfgang; Wei, Ren; Leggewie, Christian; Li, Xiangzhen; Hazen, Terry; Streit, Wolfgang R

    2018-04-15

    Polyethylene terephthalate (PET) is one of the most important synthetic polymers used today. Unfortunately, the polymers accumulate in nature and to date no highly active enzymes are known that can degrade it at high velocity. Enzymes involved in PET degradation are mainly α- and β-hydrolases, like cutinases and related enzymes (EC 3.1.1). Currently, only a small number of such enzymes are well characterized. In this work, a search algorithm was developed that identified 504 possible PET hydrolase candidate genes from various databases. A further global search that comprised more than 16 Gb of sequence information within 108 marine and 25 terrestrial metagenomes obtained from the Integrated Microbial Genome (IMG) database detected 349 putative PET hydrolases. Heterologous expression of four such candidate enzymes verified the function of these enzymes and confirmed the usefulness of the developed search algorithm. In this way, two novel and thermostable enzymes with high potential for downstream application were partially characterized. Clustering of 504 novel enzyme candidates based on amino acid similarities indicated that PET hydrolases mainly occur in the phyla of Actinobacteria, Proteobacteria, and Bacteroidetes. Within the Proteobacteria, the Betaproteobacteria, Deltaproteobacteria, and Gammaproteobacteria were the main hosts. Remarkably enough, in the marine environment, bacteria affiliated with the phylum Bacteroidetes appear to be the main hosts of PET hydrolase genes, rather than Actinobacteria or Proteobacteria, as observed for the terrestrial metagenomes. Our data further imply that PET hydrolases are truly rare enzymes. The highest occurrence of 1.5 hits/Mb was observed in sequences from a sample site containing crude oil. IMPORTANCE Polyethylene terephthalate (PET) accumulates in our environment without significant microbial conversion. 
Although a few PET hydrolases are already known, it is still unknown how frequently they appear and with which main bacterial phyla they are affiliated. In this study, deep sequence mining of protein databases and metagenomes demonstrated that PET hydrolases indeed occur at very low frequencies in the environment. Furthermore, it was possible to link them to phyla that were previously not known to harbor such enzymes. This work contributes novel knowledge on the phylogenetic relationships, the recent evolution, and the global distribution of PET hydrolases. Finally, we describe the biochemical traits of four novel PET hydrolases. Copyright © 2018 Danso et al.

  8. New Insights into the Function and Global Distribution of Polyethylene Terephthalate (PET)-Degrading Bacteria and Enzymes in Marine and Terrestrial Metagenomes

    PubMed Central

    Danso, Dominik; Schmeisser, Christel; Chow, Jennifer; Wei, Ren; Leggewie, Christian; Li, Xiangzhen

    2018-01-01

    Polyethylene terephthalate (PET) is one of the most important synthetic polymers used today. Unfortunately, the polymers accumulate in nature and to date no highly active enzymes are known that can degrade it at high velocity. Enzymes involved in PET degradation are mainly α- and β-hydrolases, like cutinases and related enzymes (EC 3.1.1). Currently, only a small number of such enzymes are well characterized. In this work, a search algorithm was developed that identified 504 possible PET hydrolase candidate genes from various databases. A further global search that comprised more than 16 Gb of sequence information within 108 marine and 25 terrestrial metagenomes obtained from the Integrated Microbial Genome (IMG) database detected 349 putative PET hydrolases. Heterologous expression of four such candidate enzymes verified the function of these enzymes and confirmed the usefulness of the developed search algorithm. In this way, two novel and thermostable enzymes with high potential for downstream application were partially characterized. Clustering of 504 novel enzyme candidates based on amino acid similarities indicated that PET hydrolases mainly occur in the phyla of Actinobacteria, Proteobacteria, and Bacteroidetes. Within the Proteobacteria, the Betaproteobacteria, Deltaproteobacteria, and Gammaproteobacteria were the main hosts. Remarkably enough, in the marine environment, bacteria affiliated with the phylum Bacteroidetes appear to be the main hosts of PET hydrolase genes, rather than Actinobacteria or Proteobacteria, as observed for the terrestrial metagenomes. Our data further imply that PET hydrolases are truly rare enzymes. The highest occurrence of 1.5 hits/Mb was observed in sequences from a sample site containing crude oil. IMPORTANCE Polyethylene terephthalate (PET) accumulates in our environment without significant microbial conversion. 
Although a few PET hydrolases are already known, it is still unknown how frequently they appear and with which main bacterial phyla they are affiliated. In this study, deep sequence mining of protein databases and metagenomes demonstrated that PET hydrolases indeed occur at very low frequencies in the environment. Furthermore, it was possible to link them to phyla that were previously not known to harbor such enzymes. This work contributes novel knowledge on the phylogenetic relationships, the recent evolution, and the global distribution of PET hydrolases. Finally, we describe the biochemical traits of four novel PET hydrolases. PMID:29427431

  9. [Hepatitis B virus genotype E infection in Turkey: the detection of the first case].

    PubMed

    Sayan, Murat; Sanlıdağ, Tamer; Akçalı, Sinem; Arıkan, Ayşe

    2014-10-01

    Hepatitis B virus (HBV) infection is a major global health problem. Ten HBV genotypes (A-J) are currently identified on the basis of nucleic acid sequence heterogeneity, and these genotypes have been shown to have distinct geographic distributions. Previous studies indicated that genotype D is the predominant type among hepatitis B patients in different regions of Turkey. However, recent studies indicate that other HBV genotypes are being seen with increasing frequency. Although epidemiological and clinical information on genotype E infection is currently limited, genotype E infection is known to be common in West and Central Africa. This report presents the first case of HBV genotype E infection in Turkey. A 22-year-old Nigerian male employee who had resided in Manisa for five years was admitted to Celal Bayar University Hospital, Manisa, Turkey, for a routine check-up. Because HBsAg was positive, other HBV markers were tested with a repeated serum sample. Laboratory findings were as follows: HBsAg (+), anti-HBs (-), HBeAg (-), anti-HBe (+), anti-HBc (+), anti-HCV (-), anti-HIV (-), ALT 44 U/L and AST 45 U/L. The HBV DNA level was 700 IU/ml by real-time PCR (Artus HBV QS RGQ, Qiagen, Germany). HBV DNA isolated from the patient's serum sample was amplified by PCR, and the HBV polymerase gene segment was directly sequenced. The UPGMA method was used for phylogenetic analysis, and the Inno-LIPA HBV genotyping method (Innogenetics, Belgium) was used to detect possible infection with multiple HBV genotypes. On the basis of these methods, the virus was identified as genotype E. The partial sequences of the HBV polymerase gene were deposited in the international DNA data bank (GenBank) as a contribution to global HBV surveillance. This report emphasizes that HBV genotypes other than D can be found in Turkey. 
Because the patient had been an inactive HBsAg carrier before residing in Turkey, this case was regarded as an imported HBV genotype E case. In conclusion, detecting different HBV genotypes and characterizing their epidemiology and molecular features are important both for national and global HBV surveillance and for better clinical management.
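
    The report names UPGMA for the phylogenetic step. The algorithm repeatedly merges the two closest clusters, places the new node at half their distance, and updates distances to the merged cluster as size-weighted averages. The sketch below implements it. The labels and distance matrix are toy values, not HBV data. A real analysis would start from distances computed on the aligned polymerase-gene sequences.

```python
def upgma(labels, dist):
    """Build a rooted ultrametric tree with UPGMA and return it as Newick text.

    labels: taxon names; dist: symmetric matrix of pairwise distances.
    """
    def key(a, b):
        return (a, b) if a < b else (b, a)

    # Active clusters: id -> (Newick text, number of taxa, node height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(labels)}
    d = {(i, j): dist[i][j] for i in range(len(labels)) for j in range(i + 1, len(labels))}
    next_id = len(labels)
    while len(clusters) > 1:
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])  # closest pair
        del d[(i, j)]
        ni, si, hi = clusters.pop(i)
        nj, sj, hj = clusters.pop(j)
        height = dij / 2  # new node sits halfway between the two clusters
        node = f"({ni}:{height - hi:g},{nj}:{height - hj:g})"
        for k in clusters:
            # Size-weighted average distance from cluster k to the merged cluster.
            dk = (si * d.pop(key(i, k)) + sj * d.pop(key(j, k))) / (si + sj)
            d[key(k, next_id)] = dk
        clusters[next_id] = (node, si + sj, height)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


toy = [[0, 2, 6, 10], [2, 0, 6, 10], [6, 6, 0, 10], [10, 10, 10, 0]]
tree = upgma(["A", "B", "C", "D"], toy)
print(tree)
```

    UPGMA assumes a constant rate of evolution across lineages, which is why every tip ends up at the same distance from the root.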

  10. Large-scale parallel 454 sequencing reveals host ecological group specificity of arbuscular mycorrhizal fungi in a boreonemoral forest.

    PubMed

    Opik, M; Metsis, M; Daniell, T J; Zobel, M; Moora, M

    2009-10-01

    * Limited knowledge of the diversity of arbuscular mycorrhizal fungi (AMF) in natural ecosystems is a major bottleneck in mycorrhizal ecology. Here, we aimed to apply 454 sequencing, which provides a new level of descriptive power, to assess AMF diversity in a boreonemoral forest. * 454 sequencing reads of the small subunit ribosomal RNA (SSU rRNA) gene of Glomeromycota were assigned to sequence groups by BLAST searches against a custom-made annotated sequence database. * We detected 47 AMF taxa in the roots of 10 plant species in a 10 × 10 m plot, almost as many as the number of plant species in the whole forest studied. AMF communities in the roots of forest specialist plant species differed significantly from those in the roots of habitat generalist plant species. Forest plant species hosted 22 specialist AMF taxa, and the generalist plants shared all but one AMF taxon with forest plants, including globally distributed generalist fungi. AMF taxa that have been recorded globally only in forest ecosystems were significantly over-represented in the roots of forest plant species. * Our findings suggest that partner specificity in AM symbiosis may occur at the level of ecological groups, rather than at the species level, of both plant and fungal partners.

  11. The Global Invertebrate Genomics Alliance (GIGA): Developing Community Resources to Study Diverse Invertebrate Genomes

    PubMed Central

    2014-01-01

    Over 95% of all metazoan (animal) species comprise the “invertebrates,” but very few genomes from these organisms have been sequenced. We have, therefore, formed a “Global Invertebrate Genomics Alliance” (GIGA). Our intent is to build a collaborative network of diverse scientists to tackle major challenges (e.g., species selection, sample collection and storage, sequence assembly, annotation, analytical tools) associated with genome/transcriptome sequencing across a large taxonomic spectrum. We aim to promote standards that will facilitate comparative approaches to invertebrate genomics and collaborations across the international scientific community. Candidate study taxa include species from Porifera, Ctenophora, Cnidaria, Placozoa, Mollusca, Arthropoda, Echinodermata, Annelida, Bryozoa, and Platyhelminthes, among others. GIGA will target 7000 noninsect/nonnematode species, with an emphasis on marine taxa because of the unrivaled phyletic diversity in the oceans. Priorities for selecting invertebrates for sequencing will include, but are not restricted to, their phylogenetic placement; relevance to organismal, ecological, and conservation research; and their importance to fisheries and human health. We highlight benefits of sequencing both whole genomes (DNA) and transcriptomes and also suggest policies for genomic-level data access and sharing based on transparency and inclusiveness. The GIGA Web site (http://giga.nova.edu) has been launched to facilitate this collaborative venture. PMID:24336862

  12. The Global Invertebrate Genomics Alliance (GIGA): developing community resources to study diverse invertebrate genomes.

    PubMed

    Bracken-Grissom, Heather; Collins, Allen G; Collins, Timothy; Crandall, Keith; Distel, Daniel; Dunn, Casey; Giribet, Gonzalo; Haddock, Steven; Knowlton, Nancy; Martindale, Mark; Medina, Mónica; Messing, Charles; O'Brien, Stephen J; Paulay, Gustav; Putnam, Nicolas; Ravasi, Timothy; Rouse, Greg W; Ryan, Joseph F; Schulze, Anja; Wörheide, Gert; Adamska, Maja; Bailly, Xavier; Breinholt, Jesse; Browne, William E; Diaz, M Christina; Evans, Nathaniel; Flot, Jean-François; Fogarty, Nicole; Johnston, Matthew; Kamel, Bishoy; Kawahara, Akito Y; Laberge, Tammy; Lavrov, Dennis; Michonneau, François; Moroz, Leonid L; Oakley, Todd; Osborne, Karen; Pomponi, Shirley A; Rhodes, Adelaide; Santos, Scott R; Satoh, Nori; Thacker, Robert W; Van de Peer, Yves; Voolstra, Christian R; Welch, David Mark; Winston, Judith; Zhou, Xin

    2014-01-01

    Over 95% of all metazoan (animal) species comprise the "invertebrates," but very few genomes from these organisms have been sequenced. We have, therefore, formed a "Global Invertebrate Genomics Alliance" (GIGA). Our intent is to build a collaborative network of diverse scientists to tackle major challenges (e.g., species selection, sample collection and storage, sequence assembly, annotation, analytical tools) associated with genome/transcriptome sequencing across a large taxonomic spectrum. We aim to promote standards that will facilitate comparative approaches to invertebrate genomics and collaborations across the international scientific community. Candidate study taxa include species from Porifera, Ctenophora, Cnidaria, Placozoa, Mollusca, Arthropoda, Echinodermata, Annelida, Bryozoa, and Platyhelminthes, among others. GIGA will target 7000 noninsect/nonnematode species, with an emphasis on marine taxa because of the unrivaled phyletic diversity in the oceans. Priorities for selecting invertebrates for sequencing will include, but are not restricted to, their phylogenetic placement; relevance to organismal, ecological, and conservation research; and their importance to fisheries and human health. We highlight benefits of sequencing both whole genomes (DNA) and transcriptomes and also suggest policies for genomic-level data access and sharing based on transparency and inclusiveness. The GIGA Web site (http://giga.nova.edu) has been launched to facilitate this collaborative venture.

  13. Model-free aftershock forecasts constructed from similar sequences in the past

    NASA Astrophysics Data System (ADS)

    van der Elst, N.; Page, M. T.

    2017-12-01

    The basic premise behind aftershock forecasting is that sequences in the future will be similar to those in the past. Forecast models typically use empirically tuned parametric distributions to approximate past sequences, and project those distributions into the future to make a forecast. While parametric models do a good job of describing average outcomes, they are not explicitly designed to capture the full range of variability between sequences, and can suffer from over-tuning of the parameters. In particular, parametric forecasts may produce a high rate of "surprises": sequences that land outside the forecast range. Here we present a non-parametric forecast method that cuts out the parametric "middleman" between training data and forecast. The method is based on finding past sequences that are similar to the target sequence, and evaluating their outcomes. We quantify similarity as the Poisson probability that the observed event count in a past sequence reflects the same underlying intensity as the observed event count in the target sequence. Event counts are defined in terms of differential magnitude relative to the mainshock. The forecast is then constructed from the distribution of past sequences' outcomes, weighted by their similarity. We compare the similarity forecast with the Reasenberg and Jones (RJ95) method for a set of 2807 global aftershock sequences of M≥6 mainshocks. We implement a sequence-specific RJ95 forecast using a global average prior and Bayesian updating, but do not propagate epistemic uncertainty. The RJ95 forecast is somewhat more precise than the similarity forecast: 90% of observed sequences fall within a factor of two of the median RJ95 forecast value, whereas the fraction is 85% for the similarity forecast. However, the surprise rate is much higher for the RJ95 forecast: 10% of observed sequences fall in the upper 2.5% of the (Poissonian) forecast range. The surprise rate is less than 3% for the similarity forecast. 
The similarity forecast may be useful to emergency managers and non-specialists when confidence or expertise in parametric forecasting may be lacking. The method makes over-tuning impossible, and minimizes the rate of surprises. At the least, this forecast constitutes a useful benchmark for more precisely tuned parametric forecasts.
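
    The abstract defines similarity only as "the Poisson probability" that two counts share one intensity. The sketch below assumes one standard choice, the conditional binomial test for equal Poisson rates, and uses its two-sided p-value as each past sequence's weight. The forecast is then a set of similarity-weighted quantiles of past outcomes. The (early count, final outcome) pairs, function names and quantile levels are hypothetical. The magnitude-relative definition of event counts is omitted.

```python
import math


def poisson_similarity(n_past, n_target):
    """Two-sided p-value that two counts come from one Poisson intensity.

    Conditional binomial test: given the total, each count is
    Binomial(total, 0.5) when the two intensities are equal.
    """
    total = n_past + n_target
    if total == 0:
        return 1.0
    denom = 2 ** total
    probs = [math.comb(total, k) / denom for k in range(total + 1)]
    observed = probs[n_past]
    # Sum every split at least as unlikely as the observed one.
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-9)))


def similarity_forecast(target_count, past, levels=(0.025, 0.5, 0.975)):
    """Similarity-weighted quantiles of past outcomes.

    past: (early_count, final_outcome) pairs from earlier sequences.
    """
    weighted = sorted(
        (outcome, poisson_similarity(early, target_count)) for early, outcome in past
    )
    total = sum(w for _, w in weighted)
    if total == 0:
        raise ValueError("no past sequence resembles the target")
    quantiles = []
    for level in levels:
        cumulative = 0.0
        for outcome, w in weighted:
            cumulative += w
            if cumulative >= level * total:
                quantiles.append(outcome)
                break
    return quantiles


history = [(3, 10), (4, 12), (30, 500)]  # hypothetical past sequences
forecast = similarity_forecast(3, history)
print(forecast)
```

    The past sequence with 30 early events is so unlike the target's 3 that its outcome of 500 carries almost no weight in the forecast range.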

  14. Effort in Multitasking: Local and Global Assessment of Effort.

    PubMed

    Kiesel, Andrea; Dignath, David

    2017-01-01

    When performing multiple tasks in succession, self-organization of task order might be superior to externally controlled task schedules, because self-organization allows processing modes to be optimized, thus reducing switch costs, and it increases commitment to task goals. However, self-organization is an additional executive control process that is not required if task order is externally specified, and as such it is considered time-consuming and effortful. To compare self-organized and externally controlled task scheduling, we suggest assessing global subjective and objective measures of effort in addition to local performance measures. In our new experimental approach, we combined characteristics of dual-tasking and task-switching settings and compared local and global measures of effort in a condition with free choice of task sequence and a condition with cued task sequence. In a multitasking environment, participants chose the task order while the task requirement of the not-yet-performed task remained the same. This task preview allowed participants to work on the previously non-chosen items in parallel and resulted in faster responses and fewer errors in task switch trials than in task repetition trials. The free-choice group profited more from this task preview than the cued group when considering local performance measures. Nevertheless, the free-choice group invested more effort than the cued group when considering global measures. Thus, self-organization in task scheduling seems to be effortful even in conditions in which it is beneficial for task processing. In a second experiment, we reduced the possibility of task preview for the not-yet-performed tasks in order to hinder efficient self-organization. Here, neither local nor global measures revealed substantial differences between the free-choice and cued task sequence conditions. 
Based on the results of both experiments, we suggest that global assessment of effort in addition to local performance measures might be a useful tool for multitasking research.

  15. Cerebral responses to local and global auditory novelty under general anesthesia

    PubMed Central

    Uhrig, Lynn; Janssen, David; Dehaene, Stanislas; Jarraya, Béchir

    2017-01-01

    Primate brains can detect a variety of unexpected deviations in auditory sequences. The local-global paradigm dissociates two hierarchical levels of auditory predictive coding by examining the brain responses to first-order (local) and second-order (global) sequence violations. Using the macaque model, we previously demonstrated that, in the awake state, local violations cause focal auditory responses while global violations activate a brain circuit comprising prefrontal, parietal and cingulate cortices. Here we used the same local-global auditory paradigm to clarify the encoding of hierarchical auditory regularities in anesthetized monkeys and compared their brain responses, measured with fMRI, to those obtained in the awake state. Both propofol, a GABAA agonist, and ketamine, an NMDA antagonist, left the cortical response to auditory inputs intact or even enhanced it. The local effect vanished during propofol anesthesia and shifted spatially during ketamine anesthesia compared with wakefulness. Under increasing levels of propofol, we observed a progressive disorganization of the global effect in prefrontal, parietal and cingulate cortices, and under ketamine anesthesia the global effect was completely suppressed. Anesthesia also suppressed thalamic activations to the global effect. These results suggest that anesthesia preserves initial auditory processing but disturbs both short-term and long-term auditory predictive coding mechanisms. The disorganization of auditory novelty processing under anesthesia relates to a loss of thalamic responses to novelty and to a disruption of higher-order functional cortical networks in parietal, prefrontal and cingulate cortices. PMID:27502046

  16. LookSeq: a browser-based viewer for deep sequencing data.

    PubMed

    Manske, Heinrich Magnus; Kwiatkowski, Dominic P

    2009-11-01

    Sequencing a genome to great depth can be highly informative about heterogeneity within an individual or a population. Here we address the problem of how to visualize the multiple layers of information contained in deep sequencing data. We propose an interactive AJAX-based web viewer for browsing large data sets of aligned sequence reads. By enabling seamless browsing and fast zooming, the LookSeq program assists the user to assimilate information at different levels of resolution, from an overview of a genomic region to fine details such as heterogeneity within the sample. A specific problem, particularly if the sample is heterogeneous, is how to depict information about structural variation. LookSeq provides a simple graphical representation of paired sequence reads that is more revealing about potential insertions and deletions than are conventional methods.

  17. Exploring origins, invasion history and genetic diversity of Imperata cylindrica (L.) P. Beauv. (Cogongrass) in the United States using genotyping by sequencing

    USDA-ARS's Scientific Manuscript database

    Imperata cylindrica (Cogongrass, Speargrass) is a diploid C4 grass that is a noxious weed in 73 countries and constitutes a significant threat to global biodiversity and sustainable agriculture. We used a cost-effective genotyping-by-sequencing (GBS) approach to identify the reproductive system, gene...

  18. Complete Genome Sequences of Four Isolates of Plutella xylostella Granulovirus.

    PubMed

    Spence, Robert J; Noune, Christopher; Hauxwell, Caroline

    2016-06-30

    Granuloviruses are widespread pathogens of Plutella xylostella L. (diamondback moth) and potential biopesticides for control of this global insect pest. We report the complete genomes of four Plutella xylostella granulovirus isolates from China, Malaysia, and Taiwan exhibiting pairs of noncoding, homologous repeat regions with significant sequence variation but equivalent length. Copyright © 2016 Spence et al.

  19. Global Distribution of Polaromonas Phylotypes - Evidence for a Highly Successful Dispersal Capacity

    PubMed Central

    Darcy, John L.; Lynch, Ryan C.; King, Andrew J.; Robeson, Michael S.; Schmidt, Steven K.

    2011-01-01

    Bacteria from the genus Polaromonas are dominant phylotypes in clone libraries and culture collections from polar and high-elevation environments. Although Polaromonas has been found on six continents, we do not know whether the same phylotypes exist in all locations or whether they exhibit genetic isolation-by-distance patterns. To examine their biogeographic distribution, we analyzed all available long-read 16S rRNA gene sequences of Polaromonas phylotypes from glacial and periglacial environments across the globe. Using genetic isolation-by-geographic-distance analyses, including Mantel tests and Mantel correlograms, we found that Polaromonas phylotypes are globally distributed and show weak isolation-by-distance patterns at global scales. More focused analyses using discrete, equally sampled distance classes revealed that only two distance classes (out of 12 total) showed significant spatial structuring. Overall, our analyses show that most Polaromonas phylotypes are truly globally distributed, but that some as yet unknown environmental variable may be selecting for unique phylotypes at a minority of our global sites. Analyses of aerobiological and genomic data suggest that Polaromonas phylotypes are globally distributed as dormant cells through high-elevation air currents: Polaromonas phylotypes are common in air and snow samples from high altitudes, and a glacial-ice metagenome and the two sequenced Polaromonas genomes contain the gene hipA, suggesting that Polaromonas can form dormant cells. PMID:21897856
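
    A Mantel test, as used above, asks whether two distance matrices (for example, geographic and genetic distances between sites) are correlated. The sketch below is a minimal version under stated assumptions: symmetric matrices with zero diagonals, Pearson correlation of the upper-triangle entries, and a one-sided permutation p-value. The function names are hypothetical, and the Mantel correlograms used by the authors are not reproduced.

```python
import math
import random


def _upper(m):
    """Flatten the strict upper triangle of a square matrix (list of lists)."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def _pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """Return (r, p): observed Pearson r between the two matrices and a one-sided
    p-value from permuting rows and columns of dist_b together."""
    n = len(dist_a)
    flat_a = _upper(dist_a)
    r_obs = _pearson(flat_a, _upper(dist_b))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        # Relabel sites: the same permutation is applied to rows and columns
        permuted = [[dist_b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if _pearson(flat_a, _upper(permuted)) >= r_obs:
            hits += 1
    # Count the observed arrangement as one of the permutations
    return r_obs, (hits + 1) / (permutations + 1)
```

    Permuting whole rows and columns, rather than individual matrix cells, keeps each site's set of distances intact, which respects the non-independence of entries in a distance matrix.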

  20. Phylogeny and Haplotype Analysis of Fungi Within the Fusarium incarnatum-equiseti Species Complex.

    PubMed

    Ramdial, H; Latchoo, R K; Hosein, F N; Rampersad, S N

    2017-01-01

    Fusarium spp. are ranked among the top 10 most economically and scientifically important plant-pathogenic fungi in the world and are associated with plant diseases that include fruit decay of a number of crops. Fusarium isolates infecting bell pepper in Trinidad were identified based on sequence comparisons of the translation elongation factor gene (EF-1a) with sequences of the Fusarium incarnatum-equiseti species complex (FIESC) verified in the FUSARIUM-ID database. Eighty-two isolates were identified as belonging to one of four phylogenetic species within the subclades FIESC-1, FIESC-15, FIESC-16, and FIESC-26, with the majority of isolates belonging to FIESC-15. DNA polymorphism and phylogenetic inference were compared between sequences of the internal transcribed spacer region (ITS1-5.8S-ITS2) and EF-1a sequences for Trinidad isolates and FUSARIUM-ID type species. The ITS sequences were less informative, had lower haplotype diversity and a more restricted haplotype distribution, and resulted in poor resolution and taxon placement in the consensus maximum-likelihood tree. EF-1a sequences enabled strongly supported phylogenetic inference, with highly resolved branching patterns of the 30 phylogenetic species within the FIESC and placement of representative Trinidad isolates. Therefore, a global phylogeny was inferred from EF-1a sequences representing 11 countries, and separation into distinct Incarnatum and Equiseti clades was again evident. In total, 42 haplotypes were identified: 12 were shared and the remaining 30 were unique. The most diverse haplotype was represented by sequences from China, Indonesia, Malaysia, and Trinidad and consisted exclusively of F. incarnatum isolates. Spain had the highest haplotype diversity, perhaps because both F. equiseti and F. incarnatum sequences were represented. It was followed by the United States, which also contributed both F. equiseti and F. incarnatum sequences to the data set, and then by Southeast Asia (China, Indonesia, Malaysia, Thailand, and the Philippines) and Trinidad, both of which were represented only by F. incarnatum sequences. Trinidad shared two haplotypes with China and one haplotype with the United States, all involving F. incarnatum isolates only. The findings of this study are important for devising disease management strategies and for understanding the phylogenetic relationships among members of the FIESC.
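
    Haplotype diversity is commonly reported with Nei's unbiased estimator, Hd = n/(n - 1) × (1 - Σ p_i²), where p_i is the frequency of haplotype i among n sequences. The abstract does not state which estimator was used, so the sketch below assumes this one; the haplotype labels in the usage are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: Hd = n/(n-1) * (1 - sum(p_i^2)),
    where p_i is the frequency of haplotype i among n sampled sequences."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)
```

    A sample in which every sequence carries the same haplotype gives Hd = 0, while a sample in which every sequence is a different haplotype gives Hd = 1; this is why a country contributing sequences from both species, and hence more distinct haplotypes, tends to score higher.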

  1. Using Local States To Drive the Sampling of Global Conformations in Proteins

    PubMed Central

    2016-01-01

    Conformational changes associated with protein function often occur beyond the time scale currently accessible to unbiased molecular dynamics (MD) simulations, so that different approaches have been developed to accelerate their sampling. Here we investigate how the knowledge of backbone conformations preferentially adopted by protein fragments, as contained in precalculated libraries known as structural alphabets (SA), can be used to explore the landscape of protein conformations in MD simulations. We find that (a) enhancing the sampling of native local states in both metadynamics and steered MD simulations allows the recovery of global folded states in small proteins; (b) folded states can still be recovered when the amount of information on the native local states is reduced by using a low-resolution version of the SA, where states are clustered into macrostates; and (c) sequences of SA states derived from collections of structural motifs can be used to sample alternative conformations of preselected protein regions. The present findings have potential impact on several applications, ranging from protein model refinement to protein folding and design. PMID:26808351

  2. A global experimental dataset for assessing grain legume production

    PubMed Central

    Cernay, Charles; Pelzer, Elise; Makowski, David

    2016-01-01

    Grain legume crops are a significant component of human diets and animal feed and play an important role in the environment, but the global diversity of agricultural legume species is currently underexploited. Experimental assessments of grain legume performance are required to identify potential species with high yields. Here, we introduce a dataset including results of field experiments published in 173 articles. The selected experiments were carried out over five continents on 39 grain legume species. The dataset includes measurements of grain yield, aerial biomass, crop nitrogen content, residual soil nitrogen content and water use. When available, yields for cereals and oilseeds grown after grain legumes in the crop sequence are also included. The dataset is arranged into a relational database with nine structured tables and 198 standardized attributes. Tillage, fertilization, pest and irrigation management are systematically recorded for each of the 8,581 crop × field site × growing season × treatment combinations. The dataset is freely reusable and easy to update. We anticipate that it will provide valuable information for assessing grain legume production worldwide. PMID:27676125

  3. Gene Editing in Humans: Towards a Global and Inclusive Debate for Responsible Research


    PubMed Central

    de Lecuona, Itziar; Casado, María; Marfany, Gemma; Lopez Baroni, Manuel; Escarrabill, Mar

    2017-01-01

    In December 2016, the Opinion Group of the Bioethics and Law Observatory (OBD) of the University of Barcelona launched a Declaration on Bioethics and Gene Editing in Humans analyzing the use of genome editing techniques and their social, ethical, and legal implications through a multidisciplinary approach. It focuses on CRISPR/Cas9, a genome modification technique that enables researchers to edit specific sections of the DNA sequence of humans and other living beings. This technique has generated expectations and worries that deserve an interdisciplinary analysis and an informed social debate. The research work developed by the OBD presents a set of recommendations addressed to different stakeholders and aims at being a tool to learn more about CRISPR/Cas9 while finding an appropriate ethical and legal framework for this new technology. This article gathers and compares reports that have been published in Europe and the USA since the OBD Declaration. It aims at being a tool to foster a global and interdisciplinary discussion of this new genome editing technology. PMID:29259532

  4. Integrating common and rare genetic variation in diverse human populations.

    PubMed

    Altshuler, David M; Gibbs, Richard A; Peltonen, Leena; Altshuler, David M; Gibbs, Richard A; Peltonen, Leena; Dermitzakis, Emmanouil; Schaffner, Stephen F; Yu, Fuli; Peltonen, Leena; Dermitzakis, Emmanouil; Bonnen, Penelope E; Altshuler, David M; Gibbs, Richard A; de Bakker, Paul I W; Deloukas, Panos; Gabriel, Stacey B; Gwilliam, Rhian; Hunt, Sarah; Inouye, Michael; Jia, Xiaoming; Palotie, Aarno; Parkin, Melissa; Whittaker, Pamela; Yu, Fuli; Chang, Kyle; Hawes, Alicia; Lewis, Lora R; Ren, Yanru; Wheeler, David; Gibbs, Richard A; Muzny, Donna Marie; Barnes, Chris; Darvishi, Katayoon; Hurles, Matthew; Korn, Joshua M; Kristiansson, Kati; Lee, Charles; McCarrol, Steven A; Nemesh, James; Dermitzakis, Emmanouil; Keinan, Alon; Montgomery, Stephen B; Pollack, Samuela; Price, Alkes L; Soranzo, Nicole; Bonnen, Penelope E; Gibbs, Richard A; Gonzaga-Jauregui, Claudia; Keinan, Alon; Price, Alkes L; Yu, Fuli; Anttila, Verneri; Brodeur, Wendy; Daly, Mark J; Leslie, Stephen; McVean, Gil; Moutsianas, Loukas; Nguyen, Huy; Schaffner, Stephen F; Zhang, Qingrun; Ghori, Mohammed J R; McGinnis, Ralph; McLaren, William; Pollack, Samuela; Price, Alkes L; Schaffner, Stephen F; Takeuchi, Fumihiko; Grossman, Sharon R; Shlyakhter, Ilya; Hostetter, Elizabeth B; Sabeti, Pardis C; Adebamowo, Clement A; Foster, Morris W; Gordon, Deborah R; Licinio, Julio; Manca, Maria Cristina; Marshall, Patricia A; Matsuda, Ichiro; Ngare, Duncan; Wang, Vivian Ota; Reddy, Deepa; Rotimi, Charles N; Royal, Charmaine D; Sharp, Richard R; Zeng, Changqing; Brooks, Lisa D; McEwen, Jean E

    2010-09-02

    Despite great progress in identifying genetic variants that influence human disease, most inherited risk remains unexplained. A more complete understanding requires genome-wide studies that fully examine less common alleles in populations with a wide range of ancestry. To inform the design and interpretation of such studies, we genotyped 1.6 million common single nucleotide polymorphisms (SNPs) in 1,184 reference individuals from 11 global populations, and sequenced ten 100-kilobase regions in 692 of these individuals. This integrated data set of common and rare alleles, called 'HapMap 3', includes both SNPs and copy number polymorphisms (CNPs). We characterized population-specific differences among low-frequency variants, measured the improvement in imputation accuracy afforded by the larger reference panel, especially in imputing SNPs with a minor allele frequency of

  5. Tracking Virus Particles in Fluorescence Microscopy Images Using Multi-Scale Detection and Multi-Frame Association.

    PubMed

    Jaiswal, Astha; Godinez, William J; Eils, Roland; Lehmann, Maik Jorg; Rohr, Karl

    2015-11-01

    Automatic fluorescent particle tracking is an essential task to study the dynamics of a large number of biological structures at a sub-cellular level. We have developed a probabilistic particle tracking approach based on multi-scale detection and two-step multi-frame association. The multi-scale detection scheme allows coping with particles in close proximity. For finding associations, we have developed a two-step multi-frame algorithm, which is based on a temporally semiglobal formulation as well as spatially local and global optimization. In the first step, reliable associations are determined for each particle individually in local neighborhoods. In the second step, the global spatial information over multiple frames is exploited jointly to determine optimal associations. The multi-scale detection scheme and the multi-frame association finding algorithm have been combined with a probabilistic tracking approach based on the Kalman filter. We have successfully applied our probabilistic tracking approach to synthetic as well as real microscopy image sequences of virus particles and quantified the performance. We found that the proposed approach outperforms previous approaches.
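
    The Kalman-filter component of such a tracker can be illustrated for a single image coordinate. This is a minimal constant-velocity filter in plain Python, assuming a diagonal process noise and hypothetical noise values (q, r); it does not reproduce the authors' multi-scale detection or two-step multi-frame association.

```python
from dataclasses import dataclass


@dataclass
class ConstantVelocityKalman1D:
    """Kalman filter for one coordinate of a particle moving at roughly constant velocity.

    State is (position, velocity); only position is measured (H = [1, 0]).
    """
    pos: float = 0.0
    vel: float = 0.0
    p00: float = 100.0  # covariance entries; large initial values mean a weak prior
    p01: float = 0.0
    p11: float = 100.0
    q: float = 1e-4     # process noise variance (hypothetical tuning value)
    r: float = 1.0      # measurement noise variance (hypothetical tuning value)

    def predict(self, dt: float = 1.0) -> float:
        # x' = F x with F = [[1, dt], [0, 1]]
        self.pos += dt * self.vel
        # P' = F P F^T + Q, using a simple diagonal Q for brevity
        p00 = self.p00 + 2 * dt * self.p01 + dt * dt * self.p11 + self.q
        p01 = self.p01 + dt * self.p11
        p11 = self.p11 + self.q
        self.p00, self.p01, self.p11 = p00, p01, p11
        return self.pos

    def update(self, z: float) -> float:
        # Innovation and its variance for a position-only measurement
        y = z - self.pos
        s = self.p00 + self.r
        k0, k1 = self.p00 / s, self.p01 / s  # Kalman gain K = P' H^T / S
        self.pos += k0 * y
        self.vel += k1 * y
        # P = (I - K H) P'
        p00 = (1 - k0) * self.p00
        p01 = (1 - k0) * self.p01
        p11 = self.p11 - k1 * self.p01
        self.p00, self.p01, self.p11 = p00, p01, p11
        return self.pos
```

    For a noiseless track moving 2 units per frame (calling predict() and then update(2.0 * t) for t in range(30)), the estimates should settle near position 58 and velocity 2. In a full tracker, the predicted position would be used to score candidate detections in the next frame before the update step.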

  6. Using Local States To Drive the Sampling of Global Conformations in Proteins.

    PubMed

    Pandini, Alessandro; Fornili, Arianna

    2016-03-08

    Conformational changes associated with protein function often occur beyond the time scale currently accessible to unbiased molecular dynamics (MD) simulations, so that different approaches have been developed to accelerate their sampling. Here we investigate how the knowledge of backbone conformations preferentially adopted by protein fragments, as contained in precalculated libraries known as structural alphabets (SA), can be used to explore the landscape of protein conformations in MD simulations. We find that (a) enhancing the sampling of native local states in both metadynamics and steered MD simulations allows the recovery of global folded states in small proteins; (b) folded states can still be recovered when the amount of information on the native local states is reduced by using a low-resolution version of the SA, where states are clustered into macrostates; and (c) sequences of SA states derived from collections of structural motifs can be used to sample alternative conformations of preselected protein regions. The present findings have potential impact on several applications, ranging from protein model refinement to protein folding and design.

  7. The effect of OPC Factor on energy levels in healthy adults ages 45-65: a phase IIb randomized controlled trial.

    PubMed

    LaRiccia, Patrick J; Farrar, John T; Sammel, Mary D; Gallo, Joseph J

    2008-07-01

    To determine the efficacy of the food supplement OPC Factor in increasing energy levels in healthy adults aged 45 to 65. Randomized, placebo-controlled, triple-blind crossover study. Twenty-five (25) healthy adults recruited from the University of Pennsylvania Health System. OPC Factor™ (AlivenLabs, Lebanon, TN), a food supplement that contains oligomeric proanthocyanidins from grape seeds and pine bark along with other nutrient supplements including vitamins and minerals, was given in the form of an effervescent powder. The placebo was similar in appearance and taste. Five outcome measurements were performed: (1) Energy subscale scores of the Activation-Deactivation Adjective Check List (AD ACL); (2) one (1) global question on percent energy change (Global Energy Percent Change); (3) one (1) global question on energy change measured on a Likert scale (Global Energy Scale Change); (4) one (1) global question on percent overall status change (Global Overall Status Percent Change); and (5) one (1) global question on overall status change measured on a Likert scale (Global Overall Status Scale Change). There were no carryover/period effects in the groups randomized to the Placebo/Active Product sequence versus the Active Product/Placebo sequence. Examination of the AD ACL Energy subscale scores for the Active Product versus Placebo comparison revealed no significant difference in either the intention-to-treat (IT) analysis or the treatment-received (TR) analysis. However, Global Energy Percent Change (p = 0.06) and Global Energy Scale Change (p = 0.09) both approached conventional levels of statistical significance for the active product in the IT analysis. Global Energy Percent Change (p = 0.05) and Global Energy Scale Change (p = 0.04) reached statistical significance in the TR analysis. A cumulative percent responders analysis graph indicated greater response rates for the active product. OPC Factor may increase energy levels in healthy adults aged 45-65 years. A larger study is recommended. ClinicalTrials.gov identifier: NCT03318019.

  8. Advances in Setaria genomics for genetic improvement of cereals and bioenergy grasses.

    PubMed

    Muthamilarasan, Mehanathan; Prasad, Manoj

    2015-01-01

    Recent advances in Setaria genomics appear promising for the genetic improvement of cereals and biofuel crops, helping to provide multiple securities for a steadily increasing global population. The prominent attributes of foxtail millet (Setaria italica, cultivated) and green foxtail (S. viridis, wild), including small genome size, short life cycle, inbreeding nature, close genetic relatedness to several cereals, millets and bioenergy grasses, and potential abiotic stress tolerance, have highlighted these two Setaria species as a novel model system for studying C4 photosynthesis, stress biology and biofuel traits. Accordingly, studies have been performed on the structural and functional genomics of these plants to develop genetic and genomic resources, and to delineate the physiology and molecular biology of stress tolerance, for the improvement of millets, cereals and bioenergy grasses. The release of the foxtail millet genome sequence has provided a new dimension to Setaria genomics, resulting in large-scale development of genetic and genomic tools, construction of informative databases, and genome-wide association and functional genomic studies. In this context, this review discusses the advances made in Setaria genomics, which have generated considerable knowledge that could be used to improve millets, cereals and biofuel crops. The review also highlights the nutritional potential of foxtail millet in providing health benefits to the global population and provides preliminary information on introgressing these nutritional properties into graminaceous species through molecular breeding and transgene-based approaches.

  9. Haplotype estimation using sequencing reads.

    PubMed

    Delaneau, Olivier; Howie, Bryan; Cox, Anthony J; Zagury, Jean-François; Marchini, Jonathan

    2013-10-03

    High-throughput sequencing technologies produce short sequence reads that can contain phase information if they span two or more heterozygous genotypes. This information is not routinely used by current methods that infer haplotypes from genotype data. We have extended the SHAPEIT2 method to use phase-informative sequencing reads to improve phasing accuracy. Our model incorporates the read information in a probabilistic model through base quality scores within each read. The method is primarily designed for high-coverage sequence data or data sets that already have genotypes called. One important application is phasing of single samples sequenced at high coverage for use in medical sequencing and studies of rare diseases. Our method can also use existing panels of reference haplotypes. We tested the method by using a mother-father-child trio sequenced at high coverage by Illumina together with the low-coverage sequence data from the 1000 Genomes Project (1000GP). We found that use of phase-informative reads increases the mean distance between switch errors by 22% from 274.4 kb to 328.6 kb. We also used male chromosome X haplotypes from the 1000GP samples to simulate sequencing reads with varying insert size, read length, and base error rate. When using short 100 bp paired-end reads, we found that using mixtures of insert sizes produced the best results. When using longer reads with high error rates (5-20 kb reads with 4%-15% error per base), phasing performance was substantially improved. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
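
    The switch-error metric above can be sketched as follows. This assumes a single sample, phase given as the allele (0 or 1) carried on one haplotype at each heterozygous site, and one simple definition of mean distance between switches (the span of the sites divided by the number of switch errors). The function name is hypothetical and this is not SHAPEIT2 code.

```python
def switch_errors(true_hap, inferred_hap, positions):
    """Count switch errors between consecutive heterozygous sites.

    true_hap / inferred_hap: alleles (0/1) carried on one haplotype at each
    heterozygous site; the other haplotype carries the complement.
    positions: genomic coordinates (bp) of those sites, in order.
    Returns (number of switches, mean distance in bp between switches).
    """
    if not (len(true_hap) == len(inferred_hap) == len(positions)):
        raise ValueError("inputs must have equal length")
    # At each site, is the inferred phase aligned with the truth or flipped?
    aligned = [t == i for t, i in zip(true_hap, inferred_hap)]
    # A switch error occurs wherever alignment changes between adjacent sites
    n_switches = sum(a != b for a, b in zip(aligned, aligned[1:]))
    span = positions[-1] - positions[0] if positions else 0
    mean_distance = span / n_switches if n_switches else float("inf")
    return n_switches, mean_distance
```

    Because only changes in alignment are counted, an inferred haplotype that is the exact complement of the truth has no switch errors: swapping the labels of the two haplotypes does not alter the phasing.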

  10. A public HTLV-1 molecular epidemiology database for sequence management and data mining.

    PubMed

    Araujo, Thessika Hialla Almeida; Souza-Brito, Leandro Inacio; Libin, Pieter; Deforche, Koen; Edwards, Dustin; de Albuquerque-Junior, Antonio Eduardo; Vandamme, Anne-Mieke; Galvao-Castro, Bernardo; Alcantara, Luiz Carlos Junior

    2012-01-01

    It is estimated that 15 to 20 million people are infected with human T-cell lymphotropic virus type 1 (HTLV-1). At present, more than 2,000 unique HTLV-1 isolate sequences have been published. A central database aggregating sequence information on a range of epidemiological aspects, including HTLV-1 infection, pathogenesis, origins, and evolutionary dynamics, would be useful to scientists and physicians worldwide. Here we describe a database that collects and annotates sequence data and can be accessed through a user-friendly search interface. The HTLV-1 Molecular Epidemiology Database website is available at http://htlv1db.bahia.fiocruz.br/. All data were obtained from publications available in GenBank or through contact with the authors. The database was developed using Apache Webserver 2.1.6 and the MySQL database management system. The web page interfaces were developed in HTML, with server-side scripting written in PHP. The HTLV-1 Molecular Epidemiology Database is hosted on the Gonçalo Moniz/FIOCRUZ Research Center server. There are currently 2,457 registered sequences, of which 2,024 (82.37%) represent unique isolates. Of these sequences, 803 (39.67%) contain information about clinical status (TSP/HAM, 17.19%; ATL, 7.41%; asymptomatic, 12.89%; other diseases, 2.17%; and no information, 60.32%). Further, 7.26% of sequences contain information on patient gender, while 5.23% provide the age of the patient. The HTLV-1 Molecular Epidemiology Database retrieves and stores annotated HTLV-1 proviral sequences from clinical, epidemiological, and geographical studies. The collected sequences and related information are now accessible on a publicly available, user-friendly website. This open-access database will support clinical research and vaccine development related to viral genotype.

  11. A method for automatically extracting infectious disease-related primers and probes from the literature

    PubMed Central

    2010-01-01

    Background Primer and probe sequences are the main components of nucleic acid-based detection systems. Biologists use primers and probes for different tasks, some related to the diagnosis and prescription of infectious diseases. The biological literature is the main information source for empirically validated primer and probe sequences, so it is becoming increasingly important for researchers to be able to navigate this information. In this paper, we present a four-phase method for extracting and annotating primer/probe sequences from the literature. These phases are: (1) convert each document into a tree of paper sections, (2) detect candidate sequences using a set of finite state machine-based recognizers, (3) refine problem sequences using a rule-based expert system, and (4) annotate the extracted sequences with their related organism/gene information. Results We tested our approach using a test set composed of 297 manuscripts. The extracted sequences and their organism/gene annotations were manually evaluated by a panel of molecular biologists. The results of the evaluation show that our approach is suitable for automatically extracting DNA sequences, achieving precision and recall of 97.98% and 95.77%, respectively. In addition, 76.66% of the detected sequences were correctly annotated with their organism name. The system also provided correct gene-related information for 46.18% of the sequences assigned a correct organism name. Conclusions We believe that the proposed method can facilitate routine tasks for biomedical researchers using molecular methods to diagnose and prescribe different infectious diseases. In addition, the proposed method can be expanded to detect and extract other biological sequences from the literature. The extracted information can also be used to readily update available primer/probe databases or to create new databases from scratch. PMID:20682041
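
    Phase (2), detecting candidate sequences with a finite-state recognizer, can be approximated by a two-state scanner (outside a sequence / inside a run of nucleotide codes). This sketch is not the authors' recognizer set: it assumes candidates are uppercase runs of IUPAC nucleotide codes at least 15 characters long, and both the alphabet and the length cutoff are hypothetical choices.

```python
# Uppercase IUPAC nucleotide codes; accepting ambiguity codes is an assumption
DNA_ALPHABET = set("ACGTURYKMSWBDHVN")


def find_candidate_sequences(text: str, min_len: int = 15) -> list[str]:
    """Scan text once, emitting runs of >= min_len nucleotide codes as candidates."""
    candidates: list[str] = []
    run: list[str] = []
    for ch in text + " ":  # trailing sentinel flushes the last run
        if ch in DNA_ALPHABET:
            # OUTSIDE -> INSIDE, or stay INSIDE: extend the current run
            run.append(ch)
        else:
            # INSIDE -> OUTSIDE: keep the run only if it is long enough
            if len(run) >= min_len:
                candidates.append("".join(run))
            run = []
    return candidates
```

    The length cutoff is what keeps short uppercase words such as "DNA" (whose letters are all valid IUPAC codes) from being reported; the rule-based refinement in phase (3) would then handle harder cases such as sequences broken by spaces or line wraps.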

  12. The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families.

    PubMed

    Yooseph, Shibu; Sutton, Granger; Rusch, Douglas B; Halpern, Aaron L; Williamson, Shannon J; Remington, Karin; Eisen, Jonathan A; Heidelberg, Karla B; Manning, Gerard; Li, Weizhong; Jaroszewski, Lukasz; Cieplak, Piotr; Miller, Christopher S; Li, Huiying; Mashiyama, Susan T; Joachimiak, Marcin P; van Belle, Christopher; Chandonia, John-Marc; Soergel, David A; Zhai, Yufeng; Natarajan, Kannan; Lee, Shaun; Raphael, Benjamin J; Bafna, Vineet; Friedman, Robert; Brenner, Steven E; Godzik, Adam; Eisenberg, David; Dixon, Jack E; Taylor, Susan S; Strausberg, Robert L; Frazier, Marvin; Venter, J Craig

    2007-03-01

    Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains that were previously categorized as kingdom specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that heretofore lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature.
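
    Sequence similarity clustering of this kind can be illustrated with single-linkage clustering over a pairwise similarity graph. The sketch assumes a toy similarity (the Jaccard index of shared 3-mers) and a hypothetical threshold of 0.5; the GOS analysis used its own alignment-based pipeline, which is not reproduced here, and the sequence names in the usage are hypothetical.

```python
from itertools import combinations


def kmers(seq: str, k: int = 3) -> set[str]:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared k-mers divided by total distinct k-mers (0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(seqs: dict[str, str], threshold: float = 0.5, k: int = 3) -> list[set[str]]:
    """Single-linkage clustering: sequences are linked when their k-mer Jaccard
    similarity reaches `threshold`; clusters are connected components (union-find)."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    profiles = {name: kmers(s, k) for name, s in seqs.items()}
    for a, b in combinations(seqs, 2):
        if jaccard(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two components

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())
```

    Single linkage lets a cluster grow through chains of moderately similar sequences, which suits exploratory family discovery but can merge distant members; all-pairs comparison is also quadratic, so datasets of GOS size need indexing or prefiltering.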

  13. Defining a Core Genome Multilocus Sequence Typing Scheme for the Global Epidemiology of Vibrio parahaemolyticus

    PubMed Central

    Jolley, Keith A.; Reed, Elizabeth; Martinez-Urtaza, Jaime

    2017-01-01

    Vibrio parahaemolyticus is an important human foodborne pathogen whose transmission is associated with the consumption of contaminated seafood, with a growing number of infections reported worldwide in recent years. A multilocus sequence typing (MLST) database for V. parahaemolyticus was created in 2008, and a large number of clones have been identified, including clones causing severe outbreaks worldwide (sequence type 3 [ST3]), recurrent outbreaks in certain regions (e.g., ST36), or spreading to other regions where they are nonendemic (e.g., ST88 or ST189). The current MLST scheme uses the sequences of 7 genes to generate an ST, which provides a powerful tool for inferring the population structure of this pathogen, although with limited resolution, especially compared to pulsed-field gel electrophoresis (PFGE). Whole-genome sequencing (WGS) has become routine for trace-back investigations, with core genome MLST (cgMLST) analysis as one of the most straightforward ways to explore complex genomic data in an epidemiological context. Therefore, there is a need for a new, portable, standardized, and more advanced system that provides higher resolution and discriminatory power among V. parahaemolyticus strains using WGS data. We sequenced 92 V. parahaemolyticus genomes and used the genome of strain RIMD 2210633 (4,832 genes in total) as a reference to determine which genes were suitable for establishing a V. parahaemolyticus cgMLST scheme. This analysis identified 2,254 core genes suitable for use in the cgMLST scheme. To evaluate the performance of this scheme, we performed a cgMLST analysis of the 92 newly sequenced genomes plus an additional 142 strains with genomes available at NCBI. cgMLST analysis was able to distinguish related and unrelated strains, including those with the same ST, clearly showing its enhanced resolution over conventional MLST analysis. It also distinguished outbreak-related from non-outbreak-related strains within the same ST. The sequences obtained in this work were deposited in and are available from the public database (http://pubmlst.org/vparahaemolyticus). Applying this cgMLST scheme to V. parahaemolyticus strains provided by different laboratories around the world will reveal the global picture of the epidemiology, spread, and evolution of this pathogen, and the scheme will become a powerful tool for outbreak investigations, allowing unambiguous comparison of strains with global coverage. PMID:28330888
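
    In cgMLST, each isolate is reduced to an allele number per core locus, and isolates are compared by counting loci with different alleles. The sketch below assumes profiles stored as dictionaries and skips loci missing from either isolate; the function names and locus names are hypothetical placeholders, not loci from the published scheme.

```python
def allelic_distance(profile_a: dict[str, int], profile_b: dict[str, int]) -> int:
    """Number of loci at which two allele profiles differ.

    Profiles map locus name -> allele number; loci missing from either
    profile (e.g. not called in an assembly) are skipped rather than counted.
    """
    shared = profile_a.keys() & profile_b.keys()
    return sum(profile_a[locus] != profile_b[locus] for locus in shared)


def distance_matrix(profiles: dict[str, dict[str, int]]) -> dict[tuple[str, str], int]:
    """Pairwise allelic distances for every ordered pair of isolates."""
    names = sorted(profiles)
    return {(x, y): allelic_distance(profiles[x], profiles[y]) for x in names for y in names}
```

    Ignoring missing loci, rather than counting them as differences, is one common convention: it keeps incomplete assemblies from inflating distances, but the number of loci actually compared should be reported alongside the results.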

  14. [Learning and Repetitive Reproduction of Memorized Sequences by the Right and the Left Hand].

    PubMed

    Bobrova, E V; Lyakhovetskii, V A; Bogacheva, I N

    2015-01-01

    An important stage in learning a new skill is the repetitive reproduction of one and the same sequence of movements, which plays a significant role in forming movement stereotypes. Two groups of right-handers repeatedly memorized (6-10 repetitions) sequences of hand transitions among 6 positions, guided by the experimenter, first with the right hand (RH) and then with the left hand (LH), or vice versa. Random sequences previously unknown to the volunteers were reproduced in the 11 series. Modified sequences, in which the same element positions were presented in a different order, were tested in the 2nd and 3rd series. The processes of repetitive sequence reproduction were similar for the RH and LH. However, learning of the modified sequences differed: information about the elements' positions, regardless of reproduction order, was used only when the LH performed the task first. This information was not used when the LH followed the RH or when the RH performed the task. Consequently, the type of information coding activated by the LH helped subjects learn the positions of sequence elements, while the type activated by the RH prevented such learning. This is presumably connected with the predominant role of the right hemisphere in positional coding and motor learning.

  15. The influence of mutation, recombination, population history, and selection on patterns of genetic diversity in Neisseria meningitidis.

    PubMed

    Jolley, K A; Wilson, D J; Kriz, P; McVean, G; Maiden, M C J

    2005-03-01

    Patterns of genetic diversity within populations of human pathogens, shaped by the ecology of host-microbe interactions, contain important information about the epidemiological history of infectious disease. Exploiting this information, however, requires a systematic approach that distinguishes the genetic signal generated by epidemiological processes from the effects of other forces, such as recombination, mutation, and population history. Here, a variety of quantitative techniques were employed to investigate multilocus sequence information from isolate collections of Neisseria meningitidis, a major cause of meningitis and septicemia worldwide. This allowed quantitative evaluation of alternative explanations for the observed population structure. A coalescent-based approach was employed to estimate the rate of mutation, the rate of recombination, and the size distribution of recombination fragments from samples of disease-associated and carried meningococci obtained in the Czech Republic in 1993 and from a global collection of disease-associated isolates obtained from 1937 to 1996. The parameter estimates were used to reject a model in which genetic structure arose by chance in small populations, and analysis of molecular variation showed that geographically restricted gene flow was unlikely to be the cause of the genetic structure. The genetic differentiation between disease and carriage isolate collections indicated that, whereas certain genotypes were overrepresented among the disease-isolate collections (the "hyperinvasive" lineages), disease-associated and carried meningococci exhibited remarkably little differentiation at the level of individual nucleotide polymorphisms. In combination, these results indicated the repeated action of natural selection on meningococcal populations, possibly arising from the coevolutionary dynamic of host-pathogen interactions.

  16. The Global Drought Information System - A Decision Support Tool with Global Applications

    NASA Astrophysics Data System (ADS)

    Arndt, D. S.; Brewer, M.; Heim, R. R., Jr.

    2014-12-01

    Drought is a natural hazard which can cause famine in developing countries and severe economic hardship in developed countries. Given current concerns about the increasing frequency and magnitude of droughts in many regions of the world, especially in light of expected climate change, timely drought monitoring and dissemination of early warning information on a global scale is a critical concern and an important adaptation and mitigation strategy. While a number of nations and a few continental-scale efforts have developed drought information systems, a global drought early warning system (GDEWS) remains elusive, despite the benefits highlighted by ministers to the Global Earth Observation System of Systems in 2008. In an effort to begin a process of drought monitoring with international collaboration, the National Integrated Drought Information System's (NIDIS) U.S. Drought Portal, a web-based information system created to address drought services and early warning in the United States, including drought monitoring, forecasting, impacts, mitigation, research, and education, volunteered to develop a prototype Global Drought Monitoring Portal (GDMP). Through integration of data and information at the global level, and with four continental-level partners, the GDMP has proven successful as a tool to monitor drought around the globe. At a past meeting between NIDIS, the World Meteorological Organization, and the Global Earth Observation System of Systems, it was recommended that the GDMP form the basis for a Global Drought Information System (GDIS). Currently, GDIS activities are focused on providing operational global drought monitoring products and assessments, incorporating additional drought monitoring information, especially from areas without regional or continental-scale input, and incorporating drought-specific climate forecast information from the World Climate Research Programme.
Additional GDIS pilot activities are underway with an emphasis on information and decision making, and how to effectively provide drought early warning. This talk will provide an update on the status of GDIS and its role in international drought monitoring.

  17. Time-Extended Policies in Multi-Agent Reinforcement Learning

    NASA Technical Reports Server (NTRS)

    Tumer, Kagan; Agogino, Adrian K.

    2004-01-01

    Reinforcement learning methods perform well in many domains where a single agent needs to take a sequence of actions to perform a task. These methods use sequences of single-time-step rewards to create a policy that tries to maximize a time-extended utility, which is a (possibly discounted) sum of these rewards. In this paper we build on our previous work showing how these methods can be extended to a multi-agent environment where each agent creates its own policy that works toward maximizing a time-extended global utility over all agents' actions. We show improved methods for creating time-extended utilities for the agents that are both "aligned" with the global utility and "learnable." We then show how to create single-time-step rewards while avoiding the pitfall in which rewards aligned with the global reward lead to utilities that are not aligned with the global utility. Finally, we apply these reward functions to the multi-agent Gridworld problem. We explicitly quantify a utility's learnability and alignment, and show that reinforcement learning agents using the prescribed reward functions successfully trade off learnability and alignment. As a result, they outperform both global (e.g., team games) and local (e.g., "perfectly learnable") reinforcement learning solutions by as much as an order of magnitude.
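The time-extended utility described above is a (possibly discounted) sum of single-time-step rewards. The minimal sketch below assumes a plain list of per-step rewards and a hypothetical discount factor `gamma`; it illustrates only that summation and does not reproduce the authors' aligned or learnable agent utilities.

```python
from typing import Sequence


def time_extended_utility(rewards: Sequence[float], gamma: float = 1.0) -> float:
    """Return sum_t gamma**t * r_t over a sequence of single-time-step rewards.

    gamma = 1.0 gives the undiscounted sum; 0 <= gamma < 1 discounts later rewards.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    utility = 0.0
    # Accumulate from the last step backwards (Horner form): u = r_t + gamma * u.
    for reward in reversed(rewards):
        utility = reward + gamma * utility
    return utility


if __name__ == "__main__":
    print(time_extended_utility([1.0, 1.0, 1.0], gamma=0.5))  # 1.75
```

Evaluating backwards avoids computing powers of `gamma` explicitly while giving the same value as the forward sum.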

  18. PulseNet International: Vision for the implementation of whole genome sequencing (WGS) for global food-borne disease surveillance

    PubMed Central

    Nadon, Celine; Van Walle, Ivo; Gerner-Smidt, Peter; Campos, Josefina; Chinen, Isabel; Concepcion-Acevedo, Jeniffer; Gilpin, Brent; Smith, Anthony M.; Kam, Kai Man; Perez, Enrique; Trees, Eija; Kubota, Kristy; Takkinen, Johanna; Nielsen, Eva Møller; Carleton, Heather

    2017-01-01

    PulseNet International is a global network dedicated to laboratory-based surveillance for food-borne diseases. The network comprises the national and regional laboratory networks of Africa, Asia Pacific, Canada, Europe, Latin America and the Caribbean, the Middle East, and the United States. The PulseNet International vision is the standardised use of whole genome sequencing (WGS) to identify and subtype food-borne bacterial pathogens worldwide, replacing traditional methods to strengthen preparedness and response, reduce global social and economic disease burden, and save lives. To meet the needs of real-time surveillance, the PulseNet International network will standardise subtyping via WGS using whole genome multilocus sequence typing (wgMLST), which delivers sufficiently high resolution and epidemiological concordance, plus unambiguous nomenclature for the purposes of surveillance. Standardised protocols, validation studies, quality control programmes, database and nomenclature development, and training should support the implementation and decentralisation of WGS. Ideally, WGS data collected for surveillance purposes should be publicly available, in real time where possible, respecting data protection policies. WGS data are suitable for surveillance and outbreak purposes and for answering scientific questions pertaining to source attribution, antimicrobial resistance, transmission patterns, and virulence, which will further enable the protection and improvement of public health with respect to food-borne disease. PMID:28662764
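wgMLST (like the cgMLST scheme in the preceding record) reduces each genome to an allele number per locus, and isolates are then compared by counting loci with different alleles. The sketch below assumes profiles stored as dictionaries with hypothetical locus names, and it ignores loci missing in either isolate; that missing-data convention is an assumption, and PulseNet's actual pipeline and nomenclature are not reproduced here.

```python
from itertools import combinations
from typing import Dict, Mapping, Optional, Tuple

# An allele profile maps locus name -> allele number; None marks a missing call.
Profile = Mapping[str, Optional[int]]


def allelic_distance(a: Profile, b: Profile) -> int:
    """Count loci called in both profiles whose allele numbers differ.

    Loci absent or uncalled (None) in either profile are skipped.
    """
    shared = set(a) & set(b)
    return sum(
        1
        for locus in shared
        if a[locus] is not None and b[locus] is not None and a[locus] != b[locus]
    )


def distance_matrix(profiles: Mapping[str, Profile]) -> Dict[Tuple[str, str], int]:
    """Pairwise allelic distances for every pair of isolates (names sorted)."""
    return {
        (x, y): allelic_distance(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }


if __name__ == "__main__":
    # Hypothetical three-locus profiles for illustration only.
    profiles = {
        "iso1": {"locus1": 1, "locus2": 4, "locus3": 7},
        "iso2": {"locus1": 1, "locus2": 5, "locus3": None},
        "iso3": {"locus1": 2, "locus2": 5, "locus3": 7},
    }
    print(distance_matrix(profiles))
```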

  19. Detection of Rotational Sequences for Global Oscillation Modes inside the Sun

    NASA Technical Reports Server (NTRS)

    Wolff, Charles L.; Niemann, Hasso B. (Technical Monitor)

    2002-01-01

    A very simple mathematical sequence is detected in a half century of thermal radio flux from the Sun. Since the only known physical cause of the sequence is global oscillations trapped in the nonconvecting solar interior, g-modes and probably r-modes are active. If so, their rotation frequencies are detected and some previously reported difference frequencies are confirmed with high confidence. All angular harmonics with $2 \le l \le 7$ are detected, as well as some others up to the limit $l \le 14$ resolvable by the observations (a Fourier spectrum of the 10.7 cm flux time series). The mean sidereal rotation of the nonconvecting interior is 428.2 nHz as averaged by g-modes and 429.8 nHz by the r-modes, indicating that g-mode energy is slightly more centrally concentrated. Helioseismology measures such rotation rates near 0.36R (R = solar radius), so the global modes would have about half their kinetic energy above and below that level. This, together with the known log(r) energy dependence of most modes, implies that these oscillations are significantly reflected near 0.18R, the same level at which sound speed measurements display a maximum departure from theoretical models.
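To put the reported sidereal rotation frequencies in more familiar units, the period is simply P = 1/f. The short sketch below performs that conversion; the only assumption is a 86,400 s day. It gives roughly 27.03 days for 428.2 nHz and 26.93 days for 429.8 nHz.

```python
SECONDS_PER_DAY = 86_400.0


def sidereal_period_days(frequency_nhz: float) -> float:
    """Convert a rotation frequency in nanohertz to a period in days (P = 1 / f)."""
    if frequency_nhz <= 0:
        raise ValueError("frequency must be positive")
    frequency_hz = frequency_nhz * 1e-9
    return 1.0 / frequency_hz / SECONDS_PER_DAY


if __name__ == "__main__":
    for label, f in [("g-modes", 428.2), ("r-modes", 429.8)]:
        print(f"{label}: {f} nHz -> {sidereal_period_days(f):.2f} days")
```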

  20. The peanut genome consortium and peanut genome sequence: Creating a better future through global food security

    USDA-ARS's Scientific Manuscript database

    The competitiveness of peanuts in domestic and global markets has been threatened by losses in productivity and quality that are attributed to diseases, pests, environmental stresses and allergy or food safety issues. The U.S. Peanut Genome Initiative (PGI) was launched in 2004, and expanded to a gl...

  1. The Evolution of Strain Typing in the Mycobacterium tuberculosis Complex.

    PubMed

    Merker, Matthias; Kohl, Thomas A; Niemann, Stefan; Supply, Philip

    2017-01-01

    Tuberculosis (TB) is a contagious disease with a complex epidemiology. Therefore, molecular typing (genotyping) of Mycobacterium tuberculosis complex (MTBC) strains is of primary importance to effectively guide outbreak investigations, define transmission dynamics and assist global epidemiological surveillance of the disease. Large-scale genotyping is also needed to get better insights into the biological diversity and the evolution of the pathogen. Thanks to its shorter turnaround and simple numerical nomenclature system, mycobacterial interspersed repetitive unit-variable-number tandem repeat (MIRU-VNTR) typing, based on 24 standardized plus 4 hypervariable loci, optionally combined with spoligotyping, has replaced IS6110 DNA fingerprinting over the last decade as a gold standard among classical strain typing methods for many applications. With the continuous progress and decreasing costs of next-generation sequencing (NGS) technologies, typing based on whole genome sequencing (WGS) is now increasingly performed for near complete exploitation of the available genetic information. However, some important challenges remain, such as the lack of standardization of WGS analysis pipelines, the need for databases to share WGS data at a global level, and a better understanding of the relevant genomic distances for defining clusters of recent TB transmission in different epidemiological contexts. This chapter provides an overview of the evolution of genotyping methods over the last three decades, which culminated in the development of WGS-based methods. It addresses the relative advantages and limitations of these techniques, indicates current challenges and potential directions for facilitating standardization of WGS-based typing, and provides suggestions on which method to use depending on the specific research question.
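Defining transmission clusters from WGS typically means linking isolates whose pairwise SNP distance falls at or below some threshold. The sketch below implements single-linkage clustering with a union-find structure; the isolate names, distances and the 5-SNP threshold in the example are hypothetical, and, as the abstract notes, the appropriate threshold remains an open question that depends on epidemiological context.

```python
from typing import Dict, Hashable, Iterable, List, Mapping, Set, Tuple


def snp_clusters(
    isolates: Iterable[Hashable],
    snp_distances: Mapping[Tuple[Hashable, Hashable], int],
    threshold: int,
) -> List[Set[Hashable]]:
    """Group isolates by single linkage at a SNP-distance threshold.

    Two isolates are linked when their pairwise SNP distance is <= threshold;
    clusters are the connected components of the resulting link graph.
    """
    parent: Dict[Hashable, Hashable] = {iso: iso for iso in isolates}

    def find(x: Hashable) -> Hashable:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in snp_distances.items():
        if d <= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    groups: Dict[Hashable, Set[Hashable]] = {}
    for iso in parent:
        groups.setdefault(find(iso), set()).add(iso)
    return list(groups.values())


if __name__ == "__main__":
    dists = {("A", "B"): 3, ("B", "C"): 4, ("A", "C"): 7,
             ("C", "D"): 40, ("A", "D"): 45, ("B", "D"): 42}
    print(snp_clusters(["A", "B", "C", "D"], dists, threshold=5))
```

Single linkage lets A and C join the same cluster through B even though their direct distance (7) exceeds the threshold; other linkage rules would give different groupings.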

  2. pDHS-SVM: A prediction method for plant DNase I hypersensitive sites based on support vector machine.

    PubMed

    Zhang, Shanxin; Zhou, Zhiping; Chen, Xinmeng; Hu, Yong; Yang, Lindong

    2017-08-07

    DNase I hypersensitive sites (DHSs) are accessible chromatin regions that are hypersensitive to cleavage by DNase I endonuclease. DHSs are indicative of cis-regulatory DNA elements (CREs), all of which play important roles in global gene expression regulation. Recognizing DHSs in a genome therefore helps in discovering CREs. To accelerate such investigations, cost-effective computational methods for identifying DHSs are an important complement to experimental DHS mapping. However, tools for identifying DHSs in plant genomes are lacking. Here we present pDHS-SVM, a computational predictor for identifying plant DHSs. To integrate global sequence-order information and local DNA properties, reverse complement k-mer and dinucleotide-based auto covariance features of DNA sequences were used to construct the feature space. In this work, fifteen physical-chemical properties of dinucleotides were used and a Support Vector Machine (SVM) was employed. To further improve the performance of the predictor and to extract an optimized subset of nucleotide physical-chemical properties beneficial for DHS prediction, a heuristic nucleotide physical-chemical property selection algorithm was introduced. With the optimized subset of properties, experimental results on Arabidopsis thaliana and rice (Oryza sativa) showed that pDHS-SVM could achieve accuracies of up to 87.00% and 85.79%, respectively. These results indicate the effectiveness of the proposed method for predicting DHSs. Furthermore, pDHS-SVM could provide a helpful complement for predicting CREs in plant genomes. The implementation of pDHS-SVM is freely available as source code at https://github.com/shanxinzhang/pDHS-SVM. Copyright © 2017 Elsevier Ltd. All rights reserved.
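Reverse complement k-mer features pool each k-mer with its reverse complement, so a sequence and its opposite strand yield the same feature vector. The sketch below shows one plausible version with frequency normalization; the choice of k, the normalization and the handling of non-ACGT characters are assumptions, and the dinucleotide auto covariance features and the SVM itself are not included.

```python
from itertools import product
from typing import List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmers(k: int) -> List[str]:
    """All k-mers over ACGT, keeping one representative per reverse-complement pair."""
    kmers = set()
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        kmers.add(min(kmer, reverse_complement(kmer)))
    return sorted(kmers)


def rc_kmer_vector(seq: str, k: int) -> List[float]:
    """Normalized frequencies of canonical k-mers in seq.

    Each k-mer and its reverse complement share one feature.
    Windows containing characters other than A/C/G/T are skipped.
    """
    seq = seq.upper()
    index = {kmer: i for i, kmer in enumerate(canonical_kmers(k))}
    counts = [0] * len(index)
    total = 0
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if set(kmer) <= set("ACGT"):
            counts[index[min(kmer, reverse_complement(kmer))]] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]


if __name__ == "__main__":
    print(canonical_kmers(2))          # 10 canonical dinucleotides
    print(rc_kmer_vector("ACGT", 2))
```

For k = 2 the 16 dinucleotides collapse to 10 features, because AT, CG, GC and TA are their own reverse complements.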

  3. U.S. Global Change Research Program National Climate Assessment Global Change Information System

    NASA Technical Reports Server (NTRS)

    Tilmes, Curt

    2012-01-01

    The program: a) Coordinates Federal research to better understand and prepare the nation for global change. b) Prioritizes and supports cutting-edge scientific work in global change. c) Assesses the state of scientific knowledge and the Nation's readiness to respond to global change. d) Communicates research findings to inform, educate, and engage the global community.

  4. Sequencing consolidates molecular markers with plant breeding practice.

    PubMed

    Yang, Huaan; Li, Chengdao; Lam, Hon-Ming; Clements, Jonathan; Yan, Guijun; Zhao, Shancen

    2015-05-01

    Many molecular markers have been developed using contemporary sequencing technologies, but few of them have been successfully applied in breeding; we therefore present a review of how sequencing can facilitate marker-assisted selection in plant breeding. The growing global population and shrinking arable land area require efficient plant breeding. Novel strategies assisted by certain markers have proven effective for genetic gains. Fortunately, cutting-edge sequencing technologies bring us a deluge of genomes and genetic variations, highlighting the potential for marker development. However, a large gap still exists between the potential of molecular markers and actual plant breeding practices. In this review, we discuss marker-assisted breeding from a historical perspective, describe the road from crop sequencing to breeding, and highlight how sequencing facilitates the application of markers in breeding practice.

  5. Earth science information: Planning for the integration and use of global change information

    NASA Technical Reports Server (NTRS)

    Lousma, Jack R.

    1992-01-01

    Activities and accomplishments of the first six months of the 1992 technical program of the Consortium for International Earth Science Information Network (CIESIN) have focused on four main missions: (1) the development and implementation of plans for initiation of the Socioeconomic Data and Applications Center (SEDAC) as part of the EOSDIS Program; (2) the pursuit and development of a broad-based global change information cooperative by providing systems analysis and integration between natural science and social science databases held by numerous federal agencies and other sources; (3) the fostering of scientific research into the human dimensions of global change and the provision of integration between natural science and social science data and information; and (4) CIESIN's service as a gateway for global change data and information distribution through development of the Global Change Research Information Office and other comprehensive knowledge-sharing systems.

  6. Modeling genome coverage in single-cell sequencing

    PubMed Central

    Daley, Timothy; Smith, Andrew D.

    2014-01-01

    Motivation: Single-cell DNA sequencing is necessary for examining genetic variation at the cellular level, which remains hidden in bulk sequencing experiments. But because they begin with such small amounts of starting material, the amount of information obtained from a single-cell sequencing experiment is highly sensitive to the choice of protocol and to variability in library preparation. In particular, the fraction of the genome represented in single-cell sequencing libraries exhibits extreme variability due to quantitative biases in amplification and loss of genetic material. Results: We propose a method to predict the genome coverage of a deep sequencing experiment using information from an initial shallow sequencing experiment mapped to a reference genome. The observed coverage statistics are used in a non-parametric empirical Bayes Poisson model to estimate the gain in coverage from deeper sequencing. This approach allows researchers to estimate statistical features of deep sequencing experiments without actually sequencing deeply, providing a basis for optimizing and comparing single-cell sequencing protocols or screening libraries. Availability and implementation: The method is available as part of the preseq software package. Source code is available at http://smithlabresearch.org/preseq. Contact: andrewds@usc.edu Supplementary information: Supplementary material is available at Bioinformatics online. PMID:25107873
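A classical nonparametric empirical Bayes estimator under a Poisson sampling model is the Good-Toulmin series, which predicts how many new positions extra sequencing will cover from the coverage histogram of a shallow run. The sketch below implements only that raw series and restricts it to modest extrapolation (t at most 1), where it is reliable; it is not the preseq implementation, which adds stabilization for longer extrapolations that this sketch omits. The example histogram is hypothetical.

```python
from typing import Mapping


def good_toulmin_gain(coverage_histogram: Mapping[int, int], t: float) -> float:
    """Expected number of newly covered positions from extra sequencing.

    coverage_histogram[j] is the number of reference positions covered by
    exactly j reads in the shallow experiment. t is the additional amount of
    sequencing relative to the shallow run (t = 0.5 means 50% more reads).
    Gain = sum_j (-1)**(j + 1) * t**j * n_j.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("the raw Good-Toulmin series is unstable for t > 1")
    return sum(
        (-1) ** (j + 1) * t**j * n_j
        for j, n_j in coverage_histogram.items()
        if j >= 1
    )


def predicted_coverage(coverage_histogram: Mapping[int, int], t: float) -> float:
    """Positions covered now plus the expected gain from extra sequencing."""
    covered_now = sum(n for j, n in coverage_histogram.items() if j >= 1)
    return covered_now + good_toulmin_gain(coverage_histogram, t)


if __name__ == "__main__":
    # Hypothetical histogram: 10 positions seen once, 4 twice, 1 three times.
    hist = {1: 10, 2: 4, 3: 1}
    print(predicted_coverage(hist, 0.5))  # 15 + 4.125 = 19.125
```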

  7. A short review of paleoenvironments for Lower Beaufort (Upper Permian) Karoo sequences from southern to central Africa: A major Gondwana Lacustrine episode

    NASA Astrophysics Data System (ADS)

    Yemane, K.; Kelts, K.

    This paper compares Karoo deposits within the Lower Beaufort (Late Permian) time interval from southern to central Africa. Facies aspects are summarized for selected sequences and depositional environments assessed in connection with the palaeogeography. The comparison shows that the thickness of Lower Beaufort sequences varies greatly; sequences are over a kilometre thick at the southern tip, but thickness decreases drastically to the north, northwest and northeast, and the sequences are commonly absent from the western part of the subcontinent. Depositional environments are continental except for small estuarine intervals from a sequence in Tanzania. The commonest lithologies comprise mudstones, siltstones, arkoses and carbonates. In spite of the dominance of fluvial facies, the records preserved by intervals of lacustrine sequences suggest that large lakes were major features of the palaeogeography, and that lacustrine environments may have been dominant depositional environments. The Lower Beaufort landscape is generally interpreted as an expansive cratonic lowland with meandering rivers and streams crossing vast floodplains, which were indented by concomitant shallow lakes of various sizes. The lakes of the Karoo tectono-sedimentary terrain were often ephemeral and closely linked with fluvial processes, but large anoxic lakes are also documented. On the other hand, giant freshwater lakes covered large areas of the Zambezian tectono-sedimentary terrain and may have been locally connected. Evidence from abundant freshwater fossil assemblages, particularly from the Zambezian tectono-sedimentary terrain, suggests that in spite of the generally semi-arid global climate of the Upper Permian, seasonal precipitation (monsoonal?) supplied enough moisture to sustain large perennial lakes. Because of the unique nature of the Permian continental configuration and palaeogeography, however, modern analogues of such large systems are lacking.
The general lithological and palaeontological correlability of Lower Beaufort sequences suggests a similar regional palaeoclimate, whereas the differences in distribution are taken to be a result of tectonic control. From the widespread occurrence of lake deposits in the African subcontinent over a relatively long interval, we conclude that lake deposits provide more information for understanding Karoo palaeogeography than previously thought, since such lacustrine sequences should hold sensitive, high-resolution records for palaeoenvironmental interpretation.

  8. GMOMETHODS: the European Union database of reference methods for GMO analysis.

    PubMed

    Bonfini, Laura; Van den Bulcke, Marc H; Mazzara, Marco; Ben, Enrico; Patak, Alexandre

    2012-01-01

    In order to provide reliable and harmonized information on methods for GMO (genetically modified organism) analysis we have published a database called "GMOMETHODS" that supplies information on PCR assays validated according to the principles and requirements of ISO 5725 and/or the International Union of Pure and Applied Chemistry protocol. In addition, the database contains methods that have been verified by the European Union Reference Laboratory for Genetically Modified Food and Feed in the context of compliance with a European Union legislative act. The web application provides search capabilities to retrieve primer and probe sequence information on the available methods. It further supplies core data required by analytical labs to carry out GM tests and comprises information on the applied reference material and plasmid standards. The GMOMETHODS database currently contains 118 different PCR methods allowing identification of 51 single GM events and 18 taxon-specific genes in a sample. It also provides screening assays for detection of eight different genetic elements commonly used for the development of GMOs. The application is referred to by the Biosafety Clearing House, a global mechanism set up by the Cartagena Protocol on Biosafety to facilitate the exchange of information on Living Modified Organisms. The publication of the GMOMETHODS database can be considered an important step toward worldwide standardization and harmonization in GMO analysis.

  9. Sequence information gain based motif analysis.

    PubMed

    Maynou, Joan; Pairó, Erola; Marco, Santiago; Perera, Alexandre

    2015-11-09

    The detection of regulatory regions in candidate sequences is essential for understanding the regulation of a particular gene and the mechanisms involved. This paper proposes a novel methodology based on information-theoretic metrics for finding regulatory sequences in promoter regions. This methodology (SIGMA) has been tested on genomic sequence data for Homo sapiens and Mus musculus. SIGMA has been compared with different publicly available alternatives for motif detection, such as MEME/MAST, Biostrings (Bioconductor package), MotifRegressor, and previous work such as Qresiduals projections or information-theoretic detectors. Comparative results, in the form of Receiver Operating Characteristic curves, show that for 70% of the studied transcription factor binding sites, the SIGMA detector performs better and behaves more robustly than the methods compared, while having a similar computational time. The performance of SIGMA can be explained by its parametric simplicity in modelling the non-linear co-variability in the binding motif positions. Sequence Information Gain based Motif Analysis is a generalisation of a non-linear, information-theoretic model for detecting cis-regulatory sequences. This generalisation allows us to detect transcription factor binding sites with maximum performance regardless of the covariability observed in the positions of the training set of sequences. SIGMA is freely available to the public at http://b2slab.upc.edu.
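As a point of reference for the information-theoretic scoring discussed above, the sketch below computes the positional information content of a set of aligned binding sites, IC = log2(4) - H per column. This is a generic, per-position motif statistic chosen for illustration, not the SIGMA model; in particular it treats positions independently and so ignores the non-linear covariability between positions that SIGMA is designed to capture. The example sites are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def position_information_content(sites: Sequence[str]) -> List[float]:
    """Information content (bits) of each column of equal-length aligned DNA sites.

    IC = log2(4) - H, where H is the Shannon entropy of the observed base
    frequencies in that column (no pseudocounts or background correction).
    """
    if not sites:
        raise ValueError("need at least one site")
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    ic = []
    for column in zip(*(s.upper() for s in sites)):
        counts = Counter(column)
        n = len(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        ic.append(2.0 - entropy)
    return ic


if __name__ == "__main__":
    # Three fully conserved columns and one uniformly variable column.
    print(position_information_content(["ACGT", "ACGA", "ACGC", "ACGG"]))
```

Conserved columns score the maximum of 2 bits and a column with all four bases equally frequent scores 0, which is why such profiles highlight the core of a binding site.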

  10. Construction of phylogenetic trees by kernel-based comparative analysis of metabolic networks.

    PubMed

    Oh, S June; Joung, Je-Gun; Chang, Jeong-Ho; Zhang, Byoung-Tak

    2006-06-06

    Inferring the tree of life requires knowledge of the characteristics shared by species descended from a common ancestor, which serve as measuring criteria, and a method to calculate the distance between the resulting values of each measure. Conventional phylogenetic analysis based on genomic sequences provides information about the genetic relationships between different organisms. In contrast, comparative analysis of metabolic pathways in different organisms can yield insights into their functional relationships under different physiological conditions. However, evaluating the similarities or differences between metabolic networks is a computationally challenging problem, and systematic methods of doing this are desirable. Here we introduce a graph-kernel method for computing the similarity between metabolic networks in polynomial time, and use it to profile metabolic pathways and to construct phylogenetic trees. To compare the structures of metabolic networks in organisms, we adopted the exponential graph kernel, which is a kernel-based approach with a labeled graph that includes a label matrix and an adjacency matrix. To construct the phylogenetic trees, we used the unweighted pair-group method with arithmetic mean (UPGMA), a hierarchical clustering algorithm. We applied the kernel-based network profiling method in a comparative analysis of nine carbohydrate metabolic networks from 81 biological species encompassing Archaea, Eukaryota, and Eubacteria. The resulting phylogenetic hierarchies generally support the tripartite scheme of three domains rather than the two domains of prokaryotes and eukaryotes. By combining the kernel machines with metabolic information, the method infers the context of biosphere development that covers physiological events required for adaptation by genetic reconstruction.
The results show that one may obtain a global view of the tree of life by comparing the metabolic pathway structures using meta-level information rather than sequence information. This method may yield further information about biological evolution, such as the history of horizontal transfer of each gene, by studying the detailed structure of the phylogenetic tree constructed by the kernel-based method.
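The tree-building step can be sketched as UPGMA over a distance matrix. The graph-kernel values are assumed to be precomputed; the `kernel_distance` helper uses the standard kernel-induced feature-space distance, which is an assumption about how similarities become distances and not necessarily the authors' conversion. The species labels and distances in the example are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def kernel_distance(k_xx: float, k_yy: float, k_xy: float) -> float:
    """Feature-space distance implied by a kernel: sqrt(k(x,x) + k(y,y) - 2 k(x,y))."""
    return math.sqrt(max(k_xx + k_yy - 2.0 * k_xy, 0.0))


def upgma(labels: Sequence[str], dist: Sequence[Sequence[float]]) -> Tree:
    """Build a rooted tree (nested tuples) by UPGMA from a symmetric distance matrix."""
    if not labels:
        raise ValueError("need at least one label")
    # Active clusters: id -> (subtree, number of leaves it contains).
    clusters = {i: (label, 1) for i, label in enumerate(labels)}
    # Distances between active clusters, keyed by the unordered pair of ids.
    d = {frozenset((i, j)): dist[i][j] for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    while len(clusters) > 1:
        # Merge the closest pair of active clusters.
        pair = min(d, key=d.get)
        a, b = sorted(pair)
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        del d[pair]
        # New distance = leaf-count-weighted mean of the two merged distances.
        for c in clusters:
            d_ac = d.pop(frozenset((a, c)))
            d_bc = d.pop(frozenset((b, c)))
            d[frozenset((next_id, c))] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree


if __name__ == "__main__":
    labels = ["A", "B", "C", "D"]
    dist = [[0, 2, 6, 10], [2, 0, 6, 10], [6, 6, 0, 10], [10, 10, 10, 0]]
    print(upgma(labels, dist))  # ('D', ('C', ('A', 'B')))
```

Weighting by leaf count is what makes the method "unweighted": every original species contributes equally to the distance of the cluster it belongs to.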

  11. Information empowerment: predeparture resource training for students in global health.

    PubMed

    Rana, Gurpreet K

    2014-04-01

    The Taubman Health Sciences Library (THL) collaborates with health sciences schools to provide information skills instruction for students preparing for international experiences. THL enhances students' global health learning through predeparture instruction for students who are involved in global health research, clinical internships, and international collaborations. This includes teaching international literature searching skills, providing country-specific data sources, building awareness of relevant mobile resources, and encouraging investigation of international news. Information skills empower the creation of stronger global partnerships. Use of information resources has enhanced international research and training experiences, built lifelong learning foundations, and contributed to the university's global engagement. THL continues to assess predeparture instruction.

  12. Predicting residue-wise contact orders in proteins by support vector regression.

    PubMed

    Song, Jiangning; Burrage, Kevin

    2006-10-03

    The residue-wise contact order (RWCO) describes the sequence separations between a residue of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable information for reconstructing the three-dimensional structure of a protein from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships. We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance, including local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) of 0.55 between the predicted and observed RWCO values and a root mean square error (RMSE) of 0.82, based on a well-defined dataset of 680 protein sequences.
Moreover, by incorporating global features such as molecular weight and amino acid composition, we could further improve the prediction performance, raising the CC to 0.57 and lowering the RMSE to 0.79. In addition, adding the secondary structure predicted by PSIPRED was found to significantly improve the prediction performance and yielded the best prediction accuracy, with a CC of 0.60 and an RMSE of 0.78, which is at least comparable to other existing methods. The SVR method shows prediction performance competitive with, or at least comparable to, the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is well suited to estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces that support vector regression is a powerful tool for extracting the protein sequence-structure relationship and for estimating protein structural profiles from amino acid sequences.
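To make the predicted quantity concrete, the sketch below computes a hard-cutoff variant of the residue-wise contact order from residue coordinates: for residue i, the sum of sequence separations |i - j| over contacting residues j, divided by chain length L. The 12 Å cutoff, the minimum separation of 3 and the binary contact definition are hypothetical choices; published RWCO definitions use specific atoms and a smoothed contact function that are not reproduced here. The sketch shows what the SVR predicts, not how it predicts it.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def residue_wise_contact_order(
    coords: Sequence[Coord],
    cutoff: float = 12.0,
    min_separation: int = 3,
) -> List[float]:
    """Hard-cutoff residue-wise contact order for each residue.

    RWCO_i = (1/L) * sum of |i - j| over residues j with |i - j| >= min_separation
    whose coordinates lie within `cutoff` of residue i; L is the chain length.
    """
    length = len(coords)
    rwco = []
    for i in range(length):
        total = 0
        for j in range(length):
            sep = abs(i - j)
            # Only sufficiently distant-in-sequence residues count as contacts.
            if sep >= min_separation and math.dist(coords[i], coords[j]) <= cutoff:
                total += sep
        rwco.append(total / length)
    return rwco


if __name__ == "__main__":
    # Toy chain: five residues 1 unit apart on a straight line.
    chain = [(float(i), 0.0, 0.0) for i in range(5)]
    print(residue_wise_contact_order(chain, cutoff=10.0))
```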

  13. Simian T Lymphotropic Virus 1 Infection of Papio anubis: tax Sequence Heterogeneity and T Cell Recognition.

    PubMed

    Termini, James M; Magnani, Diogo M; Maxwell, Helen S; Lauer, William; Castro, Iris; Pecotte, Jerilyn; Barber, Glen N; Watkins, David I; Desrosiers, Ronald C

    2017-10-15

    Baboons naturally infected with simian T lymphotropic virus (STLV) are a potentially useful model system for the study of vaccination against human T lymphotropic virus (HTLV). Here we expanded the number of available full-length baboon STLV-1 sequences from one to three and related the T cell responses that recognize the immunodominant Tax protein to the tax sequences present in two individual baboons. Continuously growing T cell lines were established from two baboons, animals 12141 and 12752. Next-generation sequencing (NGS) of complete STLV genome sequences from these T cell lines revealed them to be closely related but distinct from each other and from the baboon STLV-1 sequence in the NCBI sequence database. Overlapping peptides corresponding to each unique Tax sequence and to the reference baboon Tax sequence were used to analyze recognition by T cells from each baboon using intracellular cytokine staining (ICS). Individual baboons expressed more gamma interferon and tumor necrosis factor alpha in response to Tax peptides corresponding to their own STLV-1 sequence than in response to Tax peptides corresponding to the reference baboon STLV-1 sequence. Thus, our analyses revealed distinct but closely related STLV-1 genome sequences in two baboons, extremely low heterogeneity of STLV sequences within each baboon, no evidence for superinfection within each baboon, and a ready ability of T cells in each baboon to recognize circulating Tax sequences. While amino acid substitutions that result in escape from CD8+ T cell recognition were not observed, premature stop codons were observed in 7% and 56% of tax sequences from peripheral blood mononuclear cells from animals 12141 and 12752, respectively. IMPORTANCE It has been estimated that approximately 100,000 people suffer serious morbidity and 10,000 people die each year from the consequences associated with human T lymphotropic virus (HTLV) infection. There are no antiviral drugs and no preventive vaccine.
A preventive vaccine would significantly impact the global burden associated with HTLV infections. Here we provide fundamental information on the simian T lymphotropic virus (STLV) naturally transmitted in a colony of captive baboons. The limited viral sequence heterogeneity in individual baboons, the identity of the viral gene product that is the major target of cellular immune responses, the persistence of viral amino acid sequences that are the major targets of cellular immune responses, and the emergence in vivo of truncated variants in the major target of cellular immune responses all parallel what are seen with HTLV infection of humans. These results justify the use of STLV-infected baboons as a model system for vaccine development efforts. Copyright © 2017 American Society for Microbiology.

  14. Revealing impaired pathways in the an11 mutant by high-throughput characterization of Petunia axillaris and Petunia inflata transcriptomes.

    PubMed

    Zenoni, Sara; D'Agostino, Nunzio; Tornielli, Giovanni B; Quattrocchio, Francesca; Chiusano, Maria L; Koes, Ronald; Zethof, Jan; Guzzo, Flavia; Delledonne, Massimo; Frusciante, Luigi; Gerats, Tom; Pezzotti, Mario

    2011-10-01

    Petunia is an excellent model system, especially for genetic, physiological and molecular studies. Thus far, however, genome-wide expression analysis has rarely been applied because of the lack of sequence information. We applied next-generation sequencing to generate, through de novo read assembly, a large catalogue of transcripts for Petunia axillaris and Petunia inflata. On the basis of both transcriptomes, comprehensive microarray chips for gene expression analysis were established and used for the analysis of global- and organ-specific gene expression in Petunia axillaris and Petunia inflata and to explore the molecular basis of the seed coat defects in a Petunia hybrida mutant, anthocyanin 11 (an11), lacking a WD40-repeat (WDR) transcription regulator. Among the transcripts differentially expressed in an11 seeds compared with wild type, many expected targets of AN11 were found, but also several interesting new candidates that might play a role in morphogenesis of the seed coat. Our results validate the combination of next-generation sequencing with microarray analysis strategies to identify the transcriptome of two petunia species without previous knowledge of their genome, and to develop comprehensive chips as useful tools for the analysis of gene expression in P. axillaris, P. inflata and P. hybrida. © 2011 The Authors. The Plant Journal © 2011 Blackwell Publishing Ltd.

  15. Mapping autosomal recessive intellectual disability: combined microarray and exome sequencing identifies 26 novel candidate genes in 192 consanguineous families.

    PubMed

    Harripaul, R; Vasli, N; Mikhailov, A; Rafiq, M A; Mittal, K; Windpassinger, C; Sheikh, T I; Noor, A; Mahmood, H; Downey, S; Johnson, M; Vleuten, K; Bell, L; Ilyas, M; Khan, F S; Khan, V; Moradi, M; Ayaz, M; Naeem, F; Heidari, A; Ahmed, I; Ghadami, S; Agha, Z; Zeinali, S; Qamar, R; Mozhdehipanah, H; John, P; Mir, A; Ansar, M; French, L; Ayub, M; Vincent, J B

    2018-04-01

    Approximately 1% of the global population is affected by intellectual disability (ID), and the majority receive no molecular diagnosis. Previous studies have indicated high levels of genetic heterogeneity, with estimates of more than 2500 autosomal ID genes, the majority of which are autosomal recessive (AR). Here, we combined microarray genotyping, homozygosity-by-descent (HBD) mapping, copy number variation (CNV) analysis, and whole exome sequencing (WES) to identify disease genes/mutations in 192 multiplex Pakistani and Iranian consanguineous families with non-syndromic ID. We identified definite or candidate mutations (or CNVs) in 51% of families in 72 different genes, including 26 not previously reported for autosomal recessive ID (ARID). The new ARID genes include nine with loss-of-function mutations (ABI2, MAPK8, MPDZ, PIDD1, SLAIN1, TBC1D23, TRAPPC6B, UBA7 and USP44), and missense mutations include the first reports of variants in BDNF or TET1 associated with ID. The genes identified also showed overlap with de novo gene sets for other neuropsychiatric disorders. Transcriptional studies showed prominent expression in the prenatal brain. The high yield of AR mutations for ID indicated that this approach has excellent clinical potential and should inform clinical diagnostics, including clinical whole exome and genome sequencing, for populations in which consanguinity is common. As with other AR disorders, the findings are also relevant to outbred populations.

  16. Differentiation of Trichophyton rubrum clinical isolates from Japanese and Chinese patients by randomly amplified polymorphic DNA and DNA sequence analysis of the non-transcribed spacer region of the rRNA gene.

    PubMed

    Yang, Xiumin; Sugita, Takashi; Takashima, Masako; Hiruma, Masataro; Li, Ruoyu; Sudo, Hajime; Ogawa, Hideoki; Ikeda, Shigaku

    2009-04-01

    Trichophyton rubrum is the most common pathogen causing dermatophytosis worldwide. Recent genetic investigations showed that the microorganism originated in Africa and then spread to Europe and North America via Asia. We investigated the intraspecific diversity of T. rubrum isolated from two closely located Asian countries, Japan and China. A total of 150 clinical isolates of T. rubrum obtained from Japanese and Chinese patients were analyzed by randomly amplified polymorphic DNA (RAPD) and DNA sequence analysis of the non-transcribed spacer (NTS) region in the rRNA gene. RAPD analysis divided the 150 strains into two major clusters, A and B. Of the Japanese isolates, 30% belonged to cluster A and 70% belonged to cluster B, whereas 91% of the Chinese isolates were in cluster A. The NTS region of the rRNA gene was divided into four major groups (I-IV) based on DNA sequencing. The majority of Japanese isolates were type IV (51%), and the majority of Chinese isolates were type III (75%). These results suggest that although Japan and China are neighboring countries, the origins of T. rubrum isolates from these countries may not be identical. These findings provide information useful for tracing the global transmission routes of T. rubrum.

  17. Recombination-Mediated Host Adaptation by Avian Staphylococcus aureus

    PubMed Central

    Murray, Susan; Pascoe, Ben; Méric, Guillaume; Mageiros, Leonardos; Yahara, Koji; Hitchings, Matthew D.; Friedmann, Yasmin; Wilkinson, Thomas S.; Gormley, Fraser J.; Mack, Dietrich; Bray, James E.; Lamble, Sarah; Bowden, Rory; Jolley, Keith A.; Maiden, Martin C.J.; Wendlandt, Sarah; Schwarz, Stefan; Corander, Jukka; Fitzgerald, J. Ross

    2017-01-01

    Staphylococcus aureus are globally disseminated among farmed chickens causing skeletal muscle infections, dermatitis, and septicaemia. The emergence of poultry-associated lineages has involved zoonotic transmission from humans to chickens but questions remain about the specific adaptations that promote proliferation of chicken pathogens. We characterized genetic variation in a population of genome-sequenced S. aureus isolates of poultry and human origin. Genealogical analysis identified a dominant poultry-associated sequence cluster within the CC5 clonal complex. Poultry and human CC5 isolates were significantly distinct from each other and more recombination events were detected in the poultry isolates. We identified 44 recombination events in 33 genes along the branch extending to the poultry-specific CC5 cluster, and 47 genes were found more often in CC5 poultry isolates compared with those from humans. Many of these gene sequences were common in chicken isolates from other clonal complexes suggesting horizontal gene transfer among poultry associated lineages. Consistent with functional predictions for putative poultry-associated genes, poultry isolates showed enhanced growth at 42 °C and greater erythrocyte lysis on chicken blood agar in comparison with human isolates. By combining phenotype information with evolutionary analyses of staphylococcal genomes, we provide evidence of adaptation, following a human-to-poultry host transition. This has important implications for the emergence and dissemination of new pathogenic clones associated with modern agriculture. PMID:28338786

  18. Gene Structures, Evolution and Transcriptional Profiling of the WRKY Gene Family in Castor Bean (Ricinus communis L.).

    PubMed

    Zou, Zhi; Yang, Lifu; Wang, Danhua; Huang, Qixing; Mo, Yeyong; Xie, Guishui

    2016-01-01

    WRKY proteins comprise one of the largest transcription factor families in plants and are key regulators of many plant processes. This study presents the characterization of 58 WRKY genes from the castor bean (Ricinus communis L., Euphorbiaceae) genome. Compared with the automatic genome annotation, one more WRKY-encoding locus was identified and 20 out of the 57 predicted gene models were manually corrected. All RcWRKY genes were shown to contain at least one intron in their coding sequences. According to the structural features of the present WRKY domains, the identified RcWRKY genes were assigned to three previously defined groups (I-III). Although castor bean underwent no recent whole-genome duplication event like physic nut (Jatropha curcas L., Euphorbiaceae), comparative genomics analysis indicated that one gene loss, one intron loss and one recent proximal duplication occurred in the RcWRKY gene family. The expression of all 58 RcWRKY genes was supported by ESTs and/or RNA sequencing reads derived from roots, leaves, flowers, seeds and endosperms. Further global expression profiles with RNA sequencing data revealed diverse expression patterns among various tissues. Results obtained from this study not only provide valuable information for future functional analysis and utilization of the castor bean WRKY genes, but also provide a useful reference to investigate the gene family expansion and evolution in Euphorbiaceae plants.

  19. High-Throughput Development of SSR Markers from Pea (Pisum sativum L.) Based on Next Generation Sequencing of a Purified Chinese Commercial Variety

    PubMed Central

    Zhang, Xiaoyan; Hu, Jinguo; Bao, Shiying; Hao, Junjie; Li, Ling; He, Yuhua; Jiang, Junye; Wang, Fang; Tian, Shufang; Zong, Xuxiao

    2015-01-01

    Pea (Pisum sativum L.) is an important food legume globally, and is the plant species that J.G. Mendel used to lay the foundation of modern genetics. However, genomics resources of pea are limited compared with other crop species. Application of marker assisted selection (MAS) in pea breeding has lagged behind many other crops. Development of a large number of novel and reliable SSR (simple sequence repeat) or microsatellite markers will help both basic and applied genomics research of this crop. The Illumina HiSeq 2500 System was used to uncover 8,899 putative SSR containing sequences, and 3,275 non-redundant primers were designed to amplify these SSRs. Among the 1,644 SSRs that were randomly selected for primer validation, 841 yielded reliable amplifications of detectable polymorphisms among 24 genotypes of cultivated pea (Pisum sativum L.) and wild relatives (P. fulvum Sm.) originating from diverse geographical locations. The dataset indicated that the allele number per locus ranged from 2 to 10, and that the polymorphism information content (PIC) ranged from 0.08 to 0.82 with an average of 0.38. These 1,644 novel SSR markers were also tested for polymorphism between genotypes G0003973 and G0005527. Finally, 33 polymorphic SSR markers were anchored on the genetic linkage map of G0003973 × G0005527 F2 population. PMID:26440522

  20. An Easy Phylogenetically Informative Method to Trace the Globally Invasive Potamopyrgus Mud Snail from River's eDNA.

    PubMed

    Clusa, Laura; Ardura, Alba; Gower, Fiona; Miralles, Laura; Tsartsianidou, Valentina; Zaiko, Anastasija; Garcia-Vazquez, Eva

    2016-01-01

    Potamopyrgus antipodarum (New Zealand mud snail) is a prosobranch mollusk native to New Zealand with a wide invasive distribution range. Its non-indigenous populations are reported from Australia, Asia, Europe and North America. Being an extremely tolerant species, Potamopyrgus is capable of surviving across a wide range of salinity and temperature conditions, which explains its high invasiveness and successful spread outside the native range. Here we report the first finding of Potamopyrgus antipodarum in a basin of the Cantabrian corridor in North Iberia (Bay of Biscay, Spain). Two haplotypes already described in Europe were found in different sectors of River Nora (Nalon basin), suggesting secondary introductions from earlier established invasive populations. To enhance surveillance of the species and track its further spread in the region, we developed a specific set of primers for the genus Potamopyrgus that amplify a fragment of 16S rDNA. The sequences obtained from PCR on DNA extracted from tissue and water samples (environmental DNA, eDNA) were identical in each location, suggesting clonal reproduction of the introduced individuals. Multiple introduction events from different source populations were inferred from our sequence data. The eDNA tool developed here can be used to trace New Zealand mud snail populations outside their native range, and to inventory mud snail population assemblages in the native settings if high-throughput sequencing methodologies are employed.

  1. An Easy Phylogenetically Informative Method to Trace the Globally Invasive Potamopyrgus Mud Snail from River’s eDNA

    PubMed Central

    Clusa, Laura; Ardura, Alba; Gower, Fiona; Miralles, Laura; Tsartsianidou, Valentina; Zaiko, Anastasija; Garcia-Vazquez, Eva

    2016-01-01

    Potamopyrgus antipodarum (New Zealand mud snail) is a prosobranch mollusk native to New Zealand with a wide invasive distribution range. Its non-indigenous populations are reported from Australia, Asia, Europe and North America. Being an extremely tolerant species, Potamopyrgus is capable of surviving across a wide range of salinity and temperature conditions, which explains its high invasiveness and successful spread outside the native range. Here we report the first finding of Potamopyrgus antipodarum in a basin of the Cantabrian corridor in North Iberia (Bay of Biscay, Spain). Two haplotypes already described in Europe were found in different sectors of River Nora (Nalon basin), suggesting secondary introductions from earlier established invasive populations. To enhance surveillance of the species and track its further spread in the region, we developed a specific set of primers for the genus Potamopyrgus that amplify a fragment of 16S rDNA. The sequences obtained from PCR on DNA extracted from tissue and water samples (environmental DNA, eDNA) were identical in each location, suggesting clonal reproduction of the introduced individuals. Multiple introduction events from different source populations were inferred from our sequence data. The eDNA tool developed here can be used to trace New Zealand mud snail populations outside their native range, and to inventory mud snail population assemblages in the native settings if high-throughput sequencing methodologies are employed. PMID:27706172

  2. Draft Genome Sequences of 12 Clinical and Environmental Methicillin-Resistant Strains of Staphylococcus pseudintermedius Isolated from a Veterinary Teaching Hospital in Washington State

    PubMed Central

    Shah, Devendra H.; Jones, Lisa P.; Paul, Narayan

    2018-01-01

    Methicillin-resistant Staphylococcus pseudintermedius (MRSP) is a globally emergent multidrug-resistant pathogen of dogs associated with nosocomial transmission in dogs and with potential zoonotic impacts. Here, we report the draft whole-genome sequences of 12 hospital-associated MRSP strains and their resistance genotypes and phenotypes. PMID:29650582

  3. Complete Whole-Genome Sequence of Salmonella enterica subsp. enterica Serovar Java NCTC5706.

    PubMed

    Fazal, Mohammed-Abbas; Alexander, Sarah; Burnett, Edward; Deheer-Graham, Ana; Oliver, Karen; Holroyd, Nancy; Parkhill, Julian; Russell, Julie E

    2016-11-03

    Salmonellae are a significant cause of morbidity and mortality globally. Here, we report the first complete genome sequence for Salmonella enterica subsp. enterica serovar Java strain NCTC5706. This strain is of historical significance, having been isolated in the pre-antibiotic era and deposited into the National Collection of Type Cultures in 1939. © Crown copyright 2016.

  4. Finding the missing honey bee genes: lessons learned from a genome upgrade.

    PubMed

    Elsik, Christine G; Worley, Kim C; Bennett, Anna K; Beye, Martin; Camara, Francisco; Childers, Christopher P; de Graaf, Dirk C; Debyser, Griet; Deng, Jixin; Devreese, Bart; Elhaik, Eran; Evans, Jay D; Foster, Leonard J; Graur, Dan; Guigo, Roderic; Hoff, Katharina Jasmin; Holder, Michael E; Hudson, Matthew E; Hunt, Greg J; Jiang, Huaiyang; Joshi, Vandita; Khetani, Radhika S; Kosarev, Peter; Kovar, Christie L; Ma, Jian; Maleszka, Ryszard; Moritz, Robin F A; Munoz-Torres, Monica C; Murphy, Terence D; Muzny, Donna M; Newsham, Irene F; Reese, Justin T; Robertson, Hugh M; Robinson, Gene E; Rueppell, Olav; Solovyev, Victor; Stanke, Mario; Stolle, Eckart; Tsuruda, Jennifer M; Vaerenbergh, Matthias Van; Waterhouse, Robert M; Weaver, Daniel B; Whitfield, Charles W; Wu, Yuanqing; Zdobnov, Evgeny M; Zhang, Lan; Zhu, Dianhui; Gibbs, Richard A

    2014-01-30

    The first generation of genome sequence assemblies and annotations have had a significant impact upon our understanding of the biology of the sequenced species, the phylogenetic relationships among species, the study of populations within and across species, and have informed the biology of humans. As only a few Metazoan genomes are approaching finished quality (human, mouse, fly and worm), there is room for improvement of most genome assemblies. The honey bee (Apis mellifera) genome, published in 2006, was noted for its bimodal GC content distribution that affected the quality of the assembly in some regions and for fewer genes in the initial gene set (OGSv1.0) compared to what would be expected based on other sequenced insect genomes. Here, we report an improved honey bee genome assembly (Amel_4.5) with a new gene annotation set (OGSv3.2), and show that the honey bee genome contains a number of genes similar to that of other insect genomes, contrary to what was suggested in OGSv1.0. The new genome assembly is more contiguous and complete and the new gene set includes ~5000 more protein-coding genes, 50% more than previously reported. About 1/6 of the additional genes were due to improvements to the assembly, and the remainder were inferred based on new RNAseq and protein data. Lessons learned from this genome upgrade have important implications for future genome sequencing projects. Furthermore, the improvements significantly enhance genomic resources for the honey bee, a key model for social behavior and essential to global ecology through pollination.

  5. Current sequencing technology makes microhaplotypes a powerful new type of genetic marker for forensics.

    PubMed

    Kidd, Kenneth K; Pakstis, Andrew J; Speed, William C; Lagacé, Robert; Chang, Joseph; Wootton, Sharon; Haigh, Eva; Kidd, Judith R

    2014-09-01

    SNPs that are molecularly very close (<10 kb) will generally have extremely low recombination rates, much less than 10⁻⁴. Multiple haplotypes will often exist because of the history of the origins of the variants at the different sites, rare recombinants, and the vagaries of random genetic drift and/or selection. Such multiallelic haplotype loci are potentially important in forensic work for individual identification, for defining ancestry, and for identifying familial relationships. The new DNA sequencing capabilities currently available make possible continuous runs of a few hundred base pairs so that we can now determine the allelic combination of multiple SNPs on each chromosome of an individual, i.e., the phase, for multiple SNPs within a small segment of DNA. Therefore, we have begun to identify regions, encompassing two to four SNPs with an extent of <200 bp, that define multiallelic haplotype loci. We have identified candidate regions and have collected pilot data on many candidate microhaplotype loci. Here we present 31 microhaplotype loci that have at least three alleles, have high heterozygosity, are globally informative, and are statistically independent at the population level. This study of microhaplotype loci (microhaps) provides proof of principle that such markers exist and validates their usefulness for ancestry inference, lineage-clan-family inference, and individual identification. The true value of microhaplotypes will come with sequencing methods that can establish alleles unambiguously, including disentangling of mixtures, because a single sequencing run on a single strand of DNA will encompass all of the SNPs. Copyright © 2014 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.

  6. Whole-exome sequencing of primary plasma cell leukemia discloses heterogeneous mutational patterns.

    PubMed

    Cifola, Ingrid; Lionetti, Marta; Pinatel, Eva; Todoerti, Katia; Mangano, Eleonora; Pietrelli, Alessandro; Fabris, Sonia; Mosca, Laura; Simeon, Vittorio; Petrucci, Maria Teresa; Morabito, Fortunato; Offidani, Massimo; Di Raimondo, Francesco; Falcone, Antonietta; Caravita, Tommaso; Battaglia, Cristina; De Bellis, Gianluca; Palumbo, Antonio; Musto, Pellegrino; Neri, Antonino

    2015-07-10

    Primary plasma cell leukemia (pPCL) is a rare and aggressive form of plasma cell dyscrasia and may represent a valid model for high-risk multiple myeloma (MM). To provide novel information concerning the mutational profile of this disease, we performed whole-exome sequencing of a prospective series of 12 pPCL cases included in a Phase II multicenter clinical trial and previously characterized at clinical and molecular levels. We identified 1,928 coding somatic non-silent variants in 1,643 genes, with a mean of 166 variants per sample, and only a few variants and genes recurrent in two or more samples. An excess of C > T transitions and the presence of two main mutational signatures (related to APOBEC over-activity and aging) occurring in different translocation groups were observed. We identified 14 candidate cancer driver genes, mainly involved in cell-matrix adhesion, cell cycle, genome stability, RNA metabolism and protein folding. Furthermore, integration of mutation data with copy number alteration profiles revealed biallelically disrupted genes with potential tumor suppressor functions. Globally, cadherin/Wnt signaling, extracellular matrix and cell cycle checkpoint were the most affected functional pathways. Sequencing results were finally combined with gene expression data to better elucidate the biological relevance of mutated genes. This study represents the first whole-exome sequencing screen of pPCL and revealed a remarkable genetic heterogeneity of mutational patterns. This may contribute to understanding the pathogenetic mechanisms associated with this aggressive form of PC dyscrasia and potentially with high-risk MM.

  7. Finding the missing honey bee genes: lessons learned from a genome upgrade

    PubMed Central

    2014-01-01

    Background The first generation of genome sequence assemblies and annotations have had a significant impact upon our understanding of the biology of the sequenced species, the phylogenetic relationships among species, the study of populations within and across species, and have informed the biology of humans. As only a few Metazoan genomes are approaching finished quality (human, mouse, fly and worm), there is room for improvement of most genome assemblies. The honey bee (Apis mellifera) genome, published in 2006, was noted for its bimodal GC content distribution that affected the quality of the assembly in some regions and for fewer genes in the initial gene set (OGSv1.0) compared to what would be expected based on other sequenced insect genomes. Results Here, we report an improved honey bee genome assembly (Amel_4.5) with a new gene annotation set (OGSv3.2), and show that the honey bee genome contains a number of genes similar to that of other insect genomes, contrary to what was suggested in OGSv1.0. The new genome assembly is more contiguous and complete and the new gene set includes ~5000 more protein-coding genes, 50% more than previously reported. About 1/6 of the additional genes were due to improvements to the assembly, and the remainder were inferred based on new RNAseq and protein data. Conclusions Lessons learned from this genome upgrade have important implications for future genome sequencing projects. Furthermore, the improvements significantly enhance genomic resources for the honey bee, a key model for social behavior and essential to global ecology through pollination. PMID:24479613

  8. The salt-responsive transcriptome of chickpea roots and nodules via deepSuperSAGE

    PubMed Central

    2011-01-01

    Background The combination of high-throughput transcript profiling and next-generation sequencing technologies is a prerequisite for genome-wide comprehensive transcriptome analysis. Our recent innovation of deepSuperSAGE is based on an advanced SuperSAGE protocol and its combination with massively parallel pyrosequencing on Roche's 454 sequencing platform. As a demonstration of the power of this combination, we have chosen the salt stress transcriptomes of roots and nodules of the third most important legume crop, chickpea (Cicer arietinum L.). While our report is more technology-oriented, it nevertheless addresses a major worldwide problem for crops generally: high salinity. Together with low temperatures and water stress, high salinity is responsible for crop losses of millions of tons of various legume (and other) crops. Continuously deteriorating environmental conditions will combine with salinity stress to further compromise crop yields. As a good example of such stress-exposed crop plants, we started to characterize salt stress responses of chickpeas at the transcriptome level. Results We used deepSuperSAGE to detect early global transcriptome changes in salt-stressed chickpea. The salt stress responses of 86,919 transcripts representing 17,918 unique 26 bp deepSuperSAGE tags (UniTags) from roots of the salt-tolerant variety INRAT-93 two hours after treatment with 25 mM NaCl were characterized. Additionally, the expression of 57,281 transcripts representing 13,115 UniTags was monitored in nodules of the same plants. From a total of 144,200 analyzed 26 bp tags in roots and nodules together, 21,401 unique transcripts were identified. Of these, only 363 and 106 specific transcripts, respectively, were commonly up- or down-regulated (>3.0-fold) under salt stress in both organs, indicating a differential, organ-specific response to stress. Profiting from recent pioneering work on massive cDNA sequencing in chickpea, more than 9,400 UniTags could be linked to UniProt entries. Additionally, gene ontology (GO) category over-representation analysis made it possible to filter out enriched biological processes among the differentially expressed UniTags. Subsequently, the gathered information was further cross-checked with stress-related pathways. From several filtered pathways, here we focus, as examples, on transcripts associated with the generation and scavenging of reactive oxygen species (ROS), as well as on transcripts involved in Na+ homeostasis. Although both processes are already very well characterized in other plants, the information generated in the present work is of high value. Information on expression profiles and sequence similarity for several hundred transcripts of potential interest is now available. Conclusions This report demonstrates that the combination of the high-throughput transcriptome profiling technology SuperSAGE with one of the next-generation sequencing platforms allows deep insights into the first molecular reactions of a plant exposed to salinity. Cross-validation with recent reports enriched the information about the salt stress dynamics of more than 9,000 chickpea ESTs, and enlarged their pool of alternative transcript isoforms. As an example of the high resolution of the employed technology that we coin deepSuperSAGE, we demonstrate that ROS-scavenging and -generating pathways undergo strong global transcriptome changes in chickpea roots and nodules as early as 2 hours after onset of moderate salt stress (25 mM NaCl). Additionally, a set of more than 15 candidate transcripts is proposed as potential components of the salt overly sensitive (SOS) pathway in chickpea. Newly identified transcript isoforms are potential targets for breeding novel cultivars with high salinity tolerance. We demonstrate that these targets can be integrated into breeding schemes by microarrays and RT-PCR assays downstream of the generation of 26 bp tags by SuperSAGE. PMID:21320317

  9. The salt-responsive transcriptome of chickpea roots and nodules via deepSuperSAGE.

    PubMed

    Molina, Carlos; Zaman-Allah, Mainassara; Khan, Faheema; Fatnassi, Nadia; Horres, Ralf; Rotter, Björn; Steinhauer, Diana; Amenc, Laurie; Drevon, Jean-Jacques; Winter, Peter; Kahl, Günter

    2011-02-14

    The combination of high-throughput transcript profiling and next-generation sequencing technologies is a prerequisite for genome-wide comprehensive transcriptome analysis. Our recent innovation of deepSuperSAGE is based on an advanced SuperSAGE protocol and its combination with massively parallel pyrosequencing on Roche's 454 sequencing platform. As a demonstration of the power of this combination, we have chosen the salt stress transcriptomes of roots and nodules of the third most important legume crop chickpea (Cicer arietinum L.). While our report is more technology-oriented, it nevertheless addresses a major world-wide problem for crops generally: high salinity. Together with low temperatures and water stress, high salinity is responsible for crop losses of millions of tons of various legume (and other) crops. Continuously deteriorating environmental conditions will combine with salinity stress to further compromise crop yields. As a good example for such stress-exposed crop plants, we started to characterize salt stress responses of chickpeas on the transcriptome level. We used deepSuperSAGE to detect early global transcriptome changes in salt-stressed chickpea. The salt stress responses of 86,919 transcripts representing 17,918 unique 26 bp deepSuperSAGE tags (UniTags) from roots of the salt-tolerant variety INRAT-93 two hours after treatment with 25 mM NaCl were characterized. Additionally, the expression of 57,281 transcripts representing 13,115 UniTags was monitored in nodules of the same plants. From a total of 144,200 analyzed 26 bp tags in roots and nodules together, 21,401 unique transcripts were identified. 
Of these, only 363 and 106 specific transcripts, respectively, were commonly up- or down-regulated (>3.0-fold) under salt stress in both organs, witnessing a differential organ-specific response to stress.Profiting from recent pioneer works on massive cDNA sequencing in chickpea, more than 9,400 UniTags were able to be linked to UniProt entries. Additionally, gene ontology (GO) categories over-representation analysis enabled to filter out enriched biological processes among the differentially expressed UniTags. Subsequently, the gathered information was further cross-checked with stress-related pathways. From several filtered pathways, here we focus exemplarily on transcripts associated with the generation and scavenging of reactive oxygen species (ROS), as well as on transcripts involved in Na+ homeostasis. Although both processes are already very well characterized in other plants, the information generated in the present work is of high value. Information on expression profiles and sequence similarity for several hundreds of transcripts of potential interest is now available. This report demonstrates, that the combination of the high-throughput transcriptome profiling technology SuperSAGE with one of the next-generation sequencing platforms allows deep insights into the first molecular reactions of a plant exposed to salinity. Cross validation with recent reports enriched the information about the salt stress dynamics of more than 9,000 chickpea ESTs, and enlarged their pool of alternative transcripts isoforms. As an example for the high resolution of the employed technology that we coin deepSuperSAGE, we demonstrate that ROS-scavenging and -generating pathways undergo strong global transcriptome changes in chickpea roots and nodules already 2 hours after onset of moderate salt stress (25 mM NaCl). Additionally, a set of more than 15 candidate transcripts are proposed to be potential components of the salt overly sensitive (SOS) pathway in chickpea. 
Newly identified transcript isoforms are potential targets for breeding novel cultivars with high salinity tolerance. We demonstrate that these targets can be integrated into breeding schemes by micro-arrays and RT-PCR assays downstream of the generation of 26 bp tags by SuperSAGE.
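The >3.0-fold filter mentioned above can be illustrated with a minimal Python sketch. It assumes tag counts are first scaled to tags per million and that a pseudocount of 1 guards against division by zero; neither normalization detail is stated in the abstract, and the tag names and counts are hypothetical.

```python
# Sketch: call SuperSAGE UniTags up- or down-regulated by fold change.
# Assumptions (not from the study): per-million normalization and a
# pseudocount of 1; tag IDs and counts are made up for illustration.
# The abstract's totals are consistent: 86,919 root + 57,281 nodule = 144,200 tags.

FOLD_THRESHOLD = 3.0  # the ">3.0-fold" cut-off quoted in the abstract


def per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw tag counts so that each library sums to one million."""
    total = sum(counts.values())
    return {tag: n * 1_000_000 / total for tag, n in counts.items()}


def classify(control: dict[str, int], stressed: dict[str, int],
             threshold: float = FOLD_THRESHOLD, pseudo: float = 1.0) -> dict[str, str]:
    """Return {tag: 'up' | 'down'} for tags changing more than `threshold`-fold."""
    c, s = per_million(control), per_million(stressed)
    calls = {}
    for tag in sorted(set(c) | set(s)):  # sorted for a deterministic result order
        ratio = (s.get(tag, 0.0) + pseudo) / (c.get(tag, 0.0) + pseudo)
        if ratio > threshold:
            calls[tag] = "up"
        elif ratio < 1 / threshold:
            calls[tag] = "down"
    return calls
```

For example, `classify({"tagA": 100, "tagB": 500, "tagC": 400}, {"tagA": 500, "tagB": 100, "tagC": 400})` returns `{'tagA': 'up', 'tagB': 'down'}`; tagC is unchanged and receives no call.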

  10. Mitochondrial genome sequencing reveals potential origins of the scabies mite Sarcoptes scabiei infesting two iconic Australian marsupials.

    PubMed

    Fraser, Tamieka A; Shao, Renfu; Fountain-Jones, Nicholas M; Charleston, Michael; Martin, Alynn; Whiteley, Pam; Holme, Roz; Carver, Scott; Polkinghorne, Adam

    2017-11-28

    Debilitating skin infestations caused by the mite, Sarcoptes scabiei, have a profound impact on human and animal health globally. In Australia, this impact is evident across different segments of Australian society, with a growing recognition that it can contribute to rapid declines of native Australian marsupials. Cross-host transmission has been suggested to play a significant role in the epidemiology and origin of mite infestations in different species but a chronic lack of genetic resources has made further inferences difficult. To investigate the origins and molecular epidemiology of S. scabiei in Australian wildlife, we sequenced the mitochondrial genomes of S. scabiei from diseased wombats (Vombatus ursinus) and koalas (Phascolarctos cinereus) spanning New South Wales, Victoria and Tasmania, and compared them with the recently sequenced mitochondrial genome sequences of S. scabiei from humans. We found unique S. scabiei haplotypes among individual wombat and koala hosts with high sequence similarity (99.1% - 100%). Phylogenetic analysis of near full-length mitochondrial genomes revealed three clades of S. scabiei (one human and two marsupial), with no apparent geographic or host species pattern, suggestive of multiple introductions. The availability of additional mitochondrial gene sequences also enabled a re-evaluation of a range of putative molecular markers of S. scabiei, revealing that cox1 is the most informative gene for molecular epidemiological investigations. Utilising this gene target, we provide additional evidence to support cross-host transmission between different animal hosts. Our results suggest a history of parasite invasion through colonisation of Australia from hosts across the globe and the potential for cross-host transmission being a common feature of the epidemiology of this neglected pathogen. If this is the case, comparable patterns may exist elsewhere in the 'New World'. 
This work provides a basis for expanded molecular studies into mange epidemiology in humans and animals in Australia and other geographic regions.
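Haplotype similarity figures such as the 99.1% - 100% range above are pairwise percent identities over aligned sequences. A minimal sketch follows; it assumes the mitochondrial sequences are already aligned and that gapped columns are skipped (other conventions count gaps as mismatches). The host names and 10 bp toy haplotypes are hypothetical.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences, skipping gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # assumption: gapped columns are excluded, not mismatches
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def identity_range(haplotypes: dict[str, str]) -> tuple[float, float]:
    """Minimum and maximum pairwise identity across named, aligned haplotypes."""
    values = [percent_identity(a, b) for a, b in combinations(haplotypes.values(), 2)]
    return min(values), max(values)


def unique_haplotypes(haplotypes: dict[str, str]) -> int:
    """Number of distinct sequences among the sampled hosts."""
    return len(set(haplotypes.values()))
```

With `{"wombat_1": "ACGTACGTAC", "wombat_2": "ACGTACGTAT", "koala_1": "ACGTACGTAC"}`, `identity_range` returns `(90.0, 100.0)` and `unique_haplotypes` returns 2.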

  11. A typing scheme for the honeybee pathogen Melissococcus plutonius allows detection of disease transmission events and a study of the distribution of variants.

    PubMed

    Haynes, Edward; Helgason, Thorunn; Young, J Peter W; Thwaites, Richard; Budge, Giles E

    2013-08-01

    Melissococcus plutonius is the bacterial pathogen that causes European Foulbrood of honeybees, a globally important honeybee brood disease. We have used next-generation sequencing to identify highly polymorphic regions in an otherwise genetically homogeneous organism, and used these loci to create a modified MLST scheme. This synthesis of a proven typing scheme format with next-generation sequencing combines reliability and low costs with insights only available from high-throughput sequencing technologies. Using this scheme, we show that the global distribution of M. plutonius variants is not uniform. We use the scheme in epidemiological studies to trace movements of infective material around England, insights that would have been impossible to confirm without the typing scheme. We also demonstrate the persistence of local variants over time. © 2013 Crown copyright. Reproduced with the permission of the Controller of Her Majesty's Stationery Office/Queen’s Printer for Scotland and Food and Environment Research Agency.
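An MLST-style scheme assigns each locus an allele number and each combination of allele numbers (the allelic profile) a sequence type (ST). The sketch below assumes exact sequence matching against a hypothetical allele table; the actual M. plutonius loci, alleles and STs are not given in the abstract, so every locus name, sequence and ST here is a placeholder.

```python
from collections import defaultdict

# Hypothetical allele tables: locus -> {allele sequence: allele number}.
ALLELES = {
    "locusA": {"ACGT": 1, "ACGA": 2},
    "locusB": {"TTGC": 1, "TTGG": 2},
}
# Hypothetical ST definitions: allelic profile (loci in sorted order) -> ST.
SEQUENCE_TYPES = {(1, 1): 1, (1, 2): 2, (2, 2): 3}


def allelic_profile(isolate: dict[str, str]) -> tuple:
    """Allele number at each locus (sorted by name); None marks a novel allele."""
    return tuple(ALLELES[locus].get(isolate[locus]) for locus in sorted(ALLELES))


def sequence_type(isolate: dict[str, str]):
    """ST for a known profile, or None if any allele or the profile is new."""
    profile = allelic_profile(isolate)
    if None in profile:
        return None
    return SEQUENCE_TYPES.get(profile)


def group_by_st(isolates: dict[str, dict[str, str]]) -> dict:
    """Group isolates by ST; a shared ST is consistent with (not proof of) a link."""
    groups = defaultdict(list)
    for name, seqs in isolates.items():
        groups[sequence_type(seqs)].append(name)
    return dict(groups)
```

Grouping isolates by ST in this way is what lets a typing scheme flag candidate transmission events between sampling sites, which would then need epidemiological confirmation.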

  12. Whole-genome sequencing of giant pandas provides insights into demographic history and local adaptation.

    PubMed

    Zhao, Shancen; Zheng, Pingping; Dong, Shanshan; Zhan, Xiangjiang; Wu, Qi; Guo, Xiaosen; Hu, Yibo; He, Weiming; Zhang, Shanning; Fan, Wei; Zhu, Lifeng; Li, Dong; Zhang, Xuemei; Chen, Quan; Zhang, Hemin; Zhang, Zhihe; Jin, Xuelin; Zhang, Jinguo; Yang, Huanming; Wang, Jian; Wang, Jun; Wei, Fuwen

    2013-01-01

    The panda lineage dates back to the late Miocene and ultimately leads to only one extant species, the giant panda (Ailuropoda melanoleuca). Although global climate change and anthropogenic disturbances are recognized to shape animal population demography, their contribution to panda population dynamics remains largely unknown. We sequenced the whole genomes of 34 pandas at an average 4.7-fold coverage and used this data set together with the previously deep-sequenced panda genome to reconstruct a continuous demographic history of pandas from their origin to the present. We identify two population expansions, two bottlenecks and two divergences. Evidence indicated that, whereas global changes in climate were the primary drivers of population fluctuation for millions of years, human activities likely underlie recent population divergence and serious decline. We identified three distinct panda populations that show genetic adaptation to their environments. However, in all three populations, anthropogenic activities have negatively affected pandas for 3,000 years.
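To give a feel for what an average 4.7-fold coverage means per site, the idealized Lander-Waterman model treats read depth at each base as Poisson-distributed with a mean equal to the average coverage. This is a textbook approximation, not the study's analysis; real data deviate from it because of GC bias, mappability and uneven library complexity.

```python
import math


def prob_depth_at_least(mean_depth: float, k: int) -> float:
    """P(depth >= k) at a site when depth ~ Poisson(mean_depth)."""
    below = sum(
        math.exp(-mean_depth) * mean_depth**i / math.factorial(i) for i in range(k)
    )
    return 1.0 - below


def expected_fraction_covered(mean_depth: float) -> float:
    """Lander-Waterman expectation that a base is covered by at least one read."""
    return 1.0 - math.exp(-mean_depth)
```

Under this model, at 4.7-fold about 99.1% of sites are expected to carry at least one read, about 85% at least three, and only roughly half at least five, which is why genotypes from such low-coverage genomes carry appreciable per-site uncertainty.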

  13. The Global Statistical Response of the Outer Radiation Belt During Geomagnetic Storms

    NASA Astrophysics Data System (ADS)

    Murphy, K. R.; Watt, C. E. J.; Mann, I. R.; Jonathan Rae, I.; Sibeck, D. G.; Boyd, A. J.; Forsyth, C. F.; Turner, D. L.; Claudepierre, S. G.; Baker, D. N.; Spence, H. E.; Reeves, G. D.; Blake, J. B.; Fennell, J.

    2018-05-01

    Using the total radiation belt electron content calculated from Van Allen Probe phase space density, the time-dependent and global response of the outer radiation belt during storms is statistically studied. Using phase space density reduces the impact of adiabatic changes in the main phase, allowing adiabatic and nonadiabatic effects to be separated and revealing a clear modality and repeatable sequence of events in storm time radiation belt electron dynamics. This sequence exhibits a marked dependence on the first adiabatic invariant (μ), with distinct behavior in the seed (150 MeV/G), relativistic (1,000 MeV/G), and ultrarelativistic (4,000 MeV/G) populations. The outer radiation belt statistically shows an initial phase dominated by loss followed by a second phase of rapid acceleration, while the seed population shows little loss and immediate enhancement. The timing of the transition to acceleration is also strongly μ dependent: it occurs at low μ first and appears to be repeatable from storm to storm.
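The μ values quoted above (150, 1,000 and 4,000 MeV/G) label electron populations by their first adiabatic invariant, mu = p_perp^2 / (2 m0 B). A minimal sketch of the conversion between kinetic energy and μ for an electron follows; the 100 nT field strength used in the example is an illustrative assumption, not a value from the study.

```python
import math

M0C2_MEV = 0.511  # electron rest energy m0*c^2 in MeV


def mu_from_energy(kinetic_mev: float, b_gauss: float, pitch_deg: float = 90.0) -> float:
    """First adiabatic invariant mu = p_perp^2 / (2 m0 B), in MeV/G.

    Uses the relativistic relation (p c)^2 = Ek * (Ek + 2 m0 c^2), so with
    energies in MeV and B in gauss the result comes out in MeV/G.
    """
    pc_sq = kinetic_mev * (kinetic_mev + 2.0 * M0C2_MEV)
    p_perp_sq = pc_sq * math.sin(math.radians(pitch_deg)) ** 2
    return p_perp_sq / (2.0 * M0C2_MEV * b_gauss)


def energy_from_mu(mu: float, b_gauss: float) -> float:
    """Kinetic energy (MeV) of an equatorially mirroring electron with a given mu.

    Solves Ek^2 + 2 m0c^2 Ek - 2 m0c^2 B mu = 0 for the positive root.
    """
    return -M0C2_MEV + math.sqrt(M0C2_MEV**2 + 2.0 * M0C2_MEV * b_gauss * mu)


B_EXAMPLE = 100e-9 * 1e4  # illustrative 100 nT field expressed in gauss (1 T = 1e4 G)
```

For instance, a 1 MeV electron at 90° pitch angle in this assumed field has μ ≈ 1,978 MeV/G, and `energy_from_mu(1000.0, B_EXAMPLE)` gives about 0.62 MeV, showing that a fixed μ maps to different energies as B changes along the drift path.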

  14. Information theory-based algorithm for in silico prediction of PCR products with whole genomic sequences as templates.

    PubMed

    Cao, Youfang; Wang, Lianjie; Xu, Kexue; Kou, Chunhai; Zhang, Yulei; Wei, Guifang; He, Junjian; Wang, Yunfang; Zhao, Liping

    2005-07-26

    A new algorithm for assessing similarity between primer and template has been developed based on the hypothesis that annealing of a primer to its template is an information transfer process. The primer sequence is converted to a vector of the full potential hydrogen bond numbers (3 for G or C, 2 for A or T), while the template sequence is converted to a vector of the actual hydrogen bond numbers formed after primer annealing. The former is treated as the source information and the latter as the destination information. An information coefficient is calculated as a measure of the fidelity of this information transfer process and thus as a measure of similarity between the primer and a potential annealing site on the template. Successful prediction of PCR products from whole genomic sequences with a computer program based on the algorithm demonstrated the potential of this new algorithm in areas like in silico PCR and gene finding.
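The source and destination vectors described above are straightforward to construct. The sketch assumes that a mismatched base forms no hydrogen bonds (ignoring wobble pairs such as G-T) and, because the abstract does not give the formula for the information coefficient, it uses a hypothetical stand-in score: the fraction of potential hydrogen bonds actually formed. It is not the authors' coefficient.

```python
# Potential hydrogen bonds per primer base: 3 for G/C, 2 for A/T.
POTENTIAL = {"G": 3, "C": 3, "A": 2, "T": 2}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def source_vector(primer: str) -> list[int]:
    """Full potential hydrogen bond numbers of the primer (source information)."""
    return [POTENTIAL[base] for base in primer.upper()]


def destination_vector(primer: str, site: str) -> list[int]:
    """Bonds actually formed with a template site (destination information).

    `site` is read 5'->3' on the strand the primer anneals to, so pairing is
    antiparallel: primer[i] faces site[-1 - i]. Assumption: a mismatch forms 0 bonds.
    """
    primer, site = primer.upper(), site.upper()
    if len(primer) != len(site):
        raise ValueError("site must be as long as the primer")
    return [
        POTENTIAL[p] if COMPLEMENT[p] == t else 0
        for p, t in zip(primer, reversed(site))
    ]


def bond_fraction(primer: str, site: str) -> float:
    """Hypothetical stand-in score: share of potential bonds actually formed."""
    return sum(destination_vector(primer, site)) / sum(source_vector(primer))


def scan(primer: str, template: str, min_score: float = 0.8) -> list[tuple[int, float]]:
    """Slide the primer along a template strand; report sites at or above min_score."""
    n = len(primer)
    hits = []
    for start in range(len(template) - n + 1):
        score = bond_fraction(primer, template[start:start + n])
        if score >= min_score:
            hits.append((start, score))
    return hits
```

Here `scan("ACGTG", "TTTCACGTTTT")` returns `[(3, 1.0)]`, the one site that is the exact reverse complement of the primer. A full in silico PCR would additionally pair forward-primer hits with reverse-primer hits on the opposite strand within a product-size limit.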

  15. Single-trial decoding of auditory novelty responses facilitates the detection of residual consciousness

    PubMed Central

    King, J.R.; Faugeras, F.; Gramfort, A.; Schurger, A.; El Karoui, I.; Sitt, J.D.; Rohaut, B.; Wacongne, C.; Labyt, E.; Bekinschtein, T.; Cohen, L.; Naccache, L.; Dehaene, S.

    2017-01-01

    Detecting residual consciousness in unresponsive patients is a major clinical concern and a challenge for theoretical neuroscience. To tackle this issue, we recently designed a paradigm that dissociates two electroencephalographic (EEG) responses to auditory novelty. Whereas a local change in pitch automatically elicits a mismatch negativity (MMN), a change in the global sound sequence leads to a late P300b response. The latter component is thought to be present only when subjects consciously perceive the global novelty. Unfortunately, it can be difficult to detect because individual variability is high, especially in clinical recordings. Here, we show that multivariate pattern classifiers can extract subject-specific EEG patterns and predict single-trial local or global novelty responses. We first validate our method with 38 high-density EEG, MEG and intracranial EEG recordings. We empirically demonstrate that our approach circumvents the issues associated with multiple comparisons and individual variability while improving the statistics. Moreover, we confirm in control subjects that local responses are robust to distraction whereas global responses depend on attention. We then investigate 104 vegetative state (VS), minimally conscious state (MCS) and conscious state (CS) patients recorded with high-density EEG. For the local response, the proportion of significant decoding scores (M = 60%) does not vary with the state of consciousness. By contrast, for the global response, only 14% of the VS patients' EEG recordings presented a significant effect, compared with 31% of the MCS patients' recordings and 52% of the CS patients' recordings. In conclusion, single-trial multivariate decoding of novelty responses provides valuable information in non-communicating patients and paves the way towards real-time monitoring of the state of consciousness. PMID:23859924
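Single-trial decoding, as described above, trains a classifier on some trials, tests whether it predicts the condition of held-out trials, and judges significance against chance. The sketch below swaps in a deliberately simple nearest-centroid decoder with leave-one-out cross-validation and a label-permutation test; the abstract does not specify these choices, and the two-feature "trials" are synthetic stand-ins for multichannel EEG data.

```python
import math
import random
from statistics import fmean


def centroid(rows):
    """Mean feature vector of a list of trials."""
    return [fmean(column) for column in zip(*rows)]


def loo_accuracy(trials, labels):
    """Leave-one-out accuracy of a nearest-centroid decoder."""
    correct = 0
    for i, (trial, label) in enumerate(zip(trials, labels)):
        train = [(t, l) for j, (t, l) in enumerate(zip(trials, labels)) if j != i]
        classes = sorted({l for _, l in train})
        centroids = {c: centroid([t for t, l in train if l == c]) for c in classes}
        predicted = min(classes, key=lambda c: math.dist(trial, centroids[c]))
        correct += predicted == label
    return correct / len(trials)


def permutation_p_value(trials, labels, n_perm=200, seed=0):
    """Observed accuracy and the share of label shuffles scoring at least as well."""
    rng = random.Random(seed)
    observed = loo_accuracy(trials, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        hits += loo_accuracy(trials, shuffled) >= observed
    return observed, (hits + 1) / (n_perm + 1)


# Synthetic, well-separated "trials" (two features each) for illustration only.
TRIALS = [[0.1, 0.2], [0.0, -0.1], [-0.2, 0.1], [0.2, 0.0],
          [5.0, 5.1], [4.9, 5.2], [5.2, 4.8], [5.1, 5.0]]
LABELS = ["standard"] * 4 + ["deviant"] * 4
```

On these separable toy trials the decoder scores 1.0 and the permutation p-value is small; with real recordings the same loop would run on sensor-by-time features, where scores near chance are the norm and the permutation test decides significance.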

  16. Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering.

    PubMed

    Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor; Essex, M

    2015-05-01

    To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty were compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than for windows of 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice.
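Sliding-window analysis cuts the alignment into fixed-width windows, and the clustering statistic is the share of sequences falling in clusters that meet the bootstrap threshold. The sketch below shows both steps. Cluster memberships and support values are hypothetical inputs (in practice they would come from ML trees), and the 10,800-column alignment with 100 and 200 bp steps used in the check is an assumption that merely happens to yield 99 and 45 windows; the abstract gives neither the alignment length nor the step size.

```python
def sliding_windows(alignment_length: int, width: int, step: int) -> list[tuple[int, int]]:
    """0-based, end-exclusive (start, end) pairs for windows that fit entirely."""
    return [(s, s + width) for s in range(0, alignment_length - width + 1, step)]


def fraction_in_clusters(clusters, n_sequences: int, threshold: float = 0.80) -> float:
    """Share of sequences in clusters whose bootstrap support is >= threshold.

    `clusters` is a list of (set_of_sequence_ids, bootstrap_support) pairs.
    """
    clustered = set()
    for members, support in clusters:
        if support >= threshold:
            clustered.update(members)
    return len(clustered) / n_sequences
```

For example, with clusters `[({"s1", "s2"}, 0.95), ({"s3", "s4"}, 0.60), ({"s5", "s6", "s7"}, 0.80)]` among 10 sequences, five sequences sit in supported clusters, giving 0.5.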

  17. Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering

    PubMed Central

    Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor

    2015-01-01

    To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty were compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than for windows of 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice. PMID:25560745

  18. Niche specialization of terrestrial archaeal ammonia oxidizers.

    PubMed

    Gubry-Rangin, Cécile; Hai, Brigitte; Quince, Christopher; Engel, Marion; Thomson, Bruce C; James, Phillip; Schloter, Michael; Griffiths, Robert I; Prosser, James I; Nicol, Graeme W

    2011-12-27

    Soil pH is a major determinant of microbial ecosystem processes and potentially a major driver of evolution, adaptation, and diversity of ammonia oxidizers, which control soil nitrification. Archaea are major components of soil microbial communities and contribute significantly to ammonia oxidation in some soils. To determine whether pH drives evolutionary adaptation and community structure of soil archaeal ammonia oxidizers, sequences of amoA, a key functional gene of ammonia oxidation, were examined in soils at global, regional, and local scales. Globally distributed database sequences clustered into 18 well-supported phylogenetic lineages that dominated specific soil pH ranges classified as acidic (pH <5), acido-neutral (5 ≤ pH <7), or alkalinophilic (pH ≥ 7). To determine whether patterns were reproduced at regional and local scales, amoA gene fragments were amplified from DNA extracted from 47 soils in the United Kingdom (pH 3.5-8.7), including a pH-gradient formed by seven soils at a single site (pH 4.5-7.5). High-throughput sequencing and analysis of amoA gene fragments identified an additional, previously undiscovered phylogenetic lineage and revealed similar pH-associated distribution patterns at global, regional, and local scales, which were most evident for the five most abundant clusters. Archaeal amoA abundance and diversity increased with soil pH, which was the only physicochemical characteristic measured that significantly influenced community structure. These results suggest evolution based on specific adaptations to soil pH and niche specialization, resulting in a global distribution of archaeal lineages that have important consequences for soil ecosystem function and nitrogen cycling.
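The three pH classes above are defined by simple cut-offs, so assigning sequences to classes and finding each lineage's dominant class is straightforward. A minimal sketch follows; the lineage labels and soil pH values are hypothetical.

```python
from collections import Counter, defaultdict


def ph_class(ph: float) -> str:
    """Soil pH class using the cut-offs stated in the abstract."""
    if ph < 5:
        return "acidic"          # pH < 5
    if ph < 7:
        return "acido-neutral"   # 5 <= pH < 7
    return "alkalinophilic"      # pH >= 7


def dominant_class(records) -> dict[str, str]:
    """Most frequent pH class per lineage from (lineage, soil_pH) records.

    Ties are broken by Counter.most_common, i.e. first-encountered class wins.
    """
    by_lineage = defaultdict(Counter)
    for lineage, ph in records:
        by_lineage[lineage][ph_class(ph)] += 1
    return {lineage: counts.most_common(1)[0][0] for lineage, counts in by_lineage.items()}
```

For example, `dominant_class([("lineage1", 4.2), ("lineage1", 4.8), ("lineage1", 5.5), ("lineage2", 7.4), ("lineage2", 7.9)])` returns `{'lineage1': 'acidic', 'lineage2': 'alkalinophilic'}`.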

  19. RNA-Seq for Bacterial Gene Expression.

    PubMed

    Poulsen, Line Dahl; Vinther, Jeppe

    2018-06-01

    RNA sequencing (RNA-seq) has become the preferred method for global quantification of bacterial gene expression. With the continued improvements in sequencing technology and data analysis tools, the most labor-intensive and expensive part of an RNA-seq experiment is the preparation of sequencing libraries, which is also essential for the quality of the data obtained. Here, we present a straightforward and inexpensive basic protocol for preparation of strand-specific RNA-seq libraries from bacterial RNA as well as a computational pipeline for the data analysis of sequencing reads. The protocol is based on the Illumina platform and allows easy multiplexing of samples and the removal of sequencing reads that are PCR duplicates. © 2018 by John Wiley & Sons, Inc.
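The abstract does not say how PCR duplicates are recognized in this protocol. One common approach, sketched below under that assumption, tags each molecule with a unique molecular identifier (UMI) and treats reads that share chromosome, 5' mapping position, strand and UMI as copies of a single molecule. The `Read` fields and example values are hypothetical.

```python
from typing import NamedTuple


class Read(NamedTuple):
    name: str
    chrom: str
    start: int    # 5' mapping position
    strand: str   # '+' or '-'; kept separate because the libraries are strand-specific
    umi: str      # hypothetical unique molecular identifier


def remove_pcr_duplicates(reads: list[Read]) -> list[Read]:
    """Keep the first read seen for each (chrom, start, strand, UMI) key."""
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.start, read.strand, read.umi)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept
```

Including the strand in the key matters for strand-specific data: reads starting at the same coordinate on opposite strands come from different molecules and must not be collapsed.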

  20. The tendency of unconscious thought toward global processing style.

    PubMed

    Li, Jiansheng; Wang, Fan; Shen, Mowei; Fan, Gang

    2017-08-01

    This study explored whether unconscious thought has a tendency to process information globally. In three experiments, a Navon task was used to activate global or local processing styles. Findings showed that in the unconscious-thought groups, those performing the local Navon task showed poorer decision-making performance than those performing the global Navon task (Experiment 1), reported that their judgments were based on partial attributes (Experiment 2), and evaluated a target individual mainly on the basis of information consistent with stereotypes (Experiment 3). These results showed that when presented with distracter tasks, conscious thought activates local processing, which impairs its ability to process information globally. However, this impairment would not occur if global processing were activated instead. This study supports the idea that unconscious thought has a tendency to process information globally. Copyright © 2017 Elsevier Inc. All rights reserved.
