Sample records for large-scale sequencing efforts

  1. Filling Gaps in Biodiversity Knowledge for Macrofungi: Contributions and Assessment of an Herbarium Collection DNA Barcode Sequencing Project

    PubMed Central

    Osmundson, Todd W.; Robert, Vincent A.; Schoch, Conrad L.; Baker, Lydia J.; Smith, Amy; Robich, Giovanni; Mizzan, Luca; Garbelotto, Matteo M.

    2013-01-01

    Despite recent advances spearheaded by molecular approaches and novel technologies, species description and DNA sequence information are significantly lagging for fungi compared to many other groups of organisms. Large scale sequencing of vouchered herbarium material can aid in closing this gap. Here, we describe an effort to obtain broad ITS sequence coverage of the approximately 6000 macrofungal-species-rich herbarium of the Museum of Natural History in Venice, Italy. Our goals were to investigate issues related to large sequencing projects, develop heuristic methods for assessing the overall performance of such a project, and evaluate the prospects of such efforts to reduce the current gap in fungal biodiversity knowledge. The effort generated 1107 sequences submitted to GenBank, including 416 previously unrepresented taxa and 398 sequences exhibiting a best BLAST match to an unidentified environmental sequence. Specimen age and taxon affected sequencing success, and subsequent work on failed specimens showed that an ITS1 mini-barcode greatly increased sequencing success without greatly reducing the discriminating power of the barcode. Similarity comparisons and nonmetric multidimensional scaling ordinations based on pairwise distance matrices proved to be useful heuristic tools for validating the overall accuracy of specimen identifications, flagging potential misidentifications, and identifying taxa in need of additional species-level revision. Comparison of within- and among-species nucleotide variation showed a strong increase in species discriminating power at 1–2% dissimilarity, and identified potential barcoding issues (same sequence for different species and vice-versa). All sequences are linked to a vouchered specimen, and results from this study have already prompted revisions of species-sequence assignments in several taxa. PMID:23638077
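    The similarity-based heuristics described above can be illustrated with a small sketch. The code below is not the authors' actual pipeline; it is a minimal, illustrative version of the two conflict classes the abstract mentions (different species sharing near-identical barcodes, and conspecific specimens separated by more than the 1-2% dissimilarity range). The toy records and the 2% threshold are assumptions for illustration.

```python
# Illustrative sketch (not the published pipeline): flag potential
# barcoding conflicts from pairwise p-distances between ITS barcodes.
from itertools import combinations

def p_distance(a, b):
    """Proportion of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def flag_conflicts(records, threshold=0.02):
    """records: list of (species, sequence) pairs, all the same length.
    Returns two conflict lists:
      - 'lumped': different species closer than the threshold
      - 'split':  same species farther apart than the threshold
    The 2% threshold mirrors the 1-2% dissimilarity range in the abstract."""
    lumped, split = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        d = p_distance(s1, s2)
        if sp1 != sp2 and d < threshold:
            lumped.append((sp1, sp2, d))
        elif sp1 == sp2 and d >= threshold:
            split.append((sp1, sp2, d))
    return lumped, split
```

In practice such flags would only prompt re-examination of the voucher, as the paper describes, not automatic reassignment.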

  2. Filling gaps in biodiversity knowledge for macrofungi: contributions and assessment of an herbarium collection DNA barcode sequencing project.

    PubMed

    Osmundson, Todd W; Robert, Vincent A; Schoch, Conrad L; Baker, Lydia J; Smith, Amy; Robich, Giovanni; Mizzan, Luca; Garbelotto, Matteo M

    2013-01-01

    Despite recent advances spearheaded by molecular approaches and novel technologies, species description and DNA sequence information are significantly lagging for fungi compared to many other groups of organisms. Large scale sequencing of vouchered herbarium material can aid in closing this gap. Here, we describe an effort to obtain broad ITS sequence coverage of the approximately 6000 macrofungal-species-rich herbarium of the Museum of Natural History in Venice, Italy. Our goals were to investigate issues related to large sequencing projects, develop heuristic methods for assessing the overall performance of such a project, and evaluate the prospects of such efforts to reduce the current gap in fungal biodiversity knowledge. The effort generated 1107 sequences submitted to GenBank, including 416 previously unrepresented taxa and 398 sequences exhibiting a best BLAST match to an unidentified environmental sequence. Specimen age and taxon affected sequencing success, and subsequent work on failed specimens showed that an ITS1 mini-barcode greatly increased sequencing success without greatly reducing the discriminating power of the barcode. Similarity comparisons and nonmetric multidimensional scaling ordinations based on pairwise distance matrices proved to be useful heuristic tools for validating the overall accuracy of specimen identifications, flagging potential misidentifications, and identifying taxa in need of additional species-level revision. Comparison of within- and among-species nucleotide variation showed a strong increase in species discriminating power at 1-2% dissimilarity, and identified potential barcoding issues (same sequence for different species and vice-versa). All sequences are linked to a vouchered specimen, and results from this study have already prompted revisions of species-sequence assignments in several taxa.

  3. About TCGA - TCGA

    Cancer.gov

    Find out about The Cancer Genome Atlas (TCGA), a comprehensive and coordinated effort to accelerate our understanding of the molecular basis of cancer through the application of genome analysis technologies, including large-scale genome sequencing.

  4. Home - The Cancer Genome Atlas - Cancer Genome - TCGA

    Cancer.gov

    The Cancer Genome Atlas (TCGA) is a comprehensive and coordinated effort to accelerate our understanding of the molecular basis of cancer through the application of genome analysis technologies, including large-scale genome sequencing.

  5. HASP server: a database and structural visualization platform for comparative models of influenza A hemagglutinin proteins.

    PubMed

    Ambroggio, Xavier I; Dommer, Jennifer; Gopalan, Vivek; Dunham, Eleca J; Taubenberger, Jeffery K; Hurt, Darrell E

    2013-06-18

    Influenza A viruses possess RNA genomes that mutate frequently in response to immune pressures. The mutations in the hemagglutinin genes are particularly significant, as the hemagglutinin proteins mediate attachment and fusion to host cells, thereby influencing viral pathogenicity and species specificity. Large-scale influenza A genome sequencing efforts have been ongoing to understand past epidemics and pandemics and anticipate future outbreaks. Sequencing efforts thus far have generated nearly 9,000 distinct hemagglutinin amino acid sequences. Comparative models for all publicly available influenza A hemagglutinin protein sequences (8,769 to date) were generated using the Rosetta modeling suite. The C-alpha root mean square deviations between a randomly chosen test set of models and their crystallographic templates were less than 2 Å, suggesting that the modeling protocols yielded high-quality results. The models were compiled into an online resource, the Hemagglutinin Structure Prediction (HASP) server. The HASP server was designed as a scientific tool for researchers to visualize hemagglutinin protein sequences of interest in a three-dimensional context. With a built-in molecular viewer, hemagglutinin models can be compared side-by-side and navigated by a corresponding sequence alignment. The models and alignments can be downloaded for offline use and further analysis. The modeling protocols used in the HASP server scale well for large amounts of sequences and will keep pace with expanded sequencing efforts. The conservative approach to modeling and the intuitive search and visualization interfaces allow researchers to quickly analyze hemagglutinin sequences of interest in the context of the most highly related experimental structures, and allow them to directly compare hemagglutinin sequences to each other simultaneously in their two- and three-dimensional contexts. 
The models and methodology have shown utility in current research efforts and the ongoing aim of the HASP server is to continue to accelerate influenza A research and have a positive impact on global public health.

  6. Report on the Human Genome Initiative for the Office of Health and Environmental Research

    DOE R&D Accomplishments Database

    Tinoco, I.; Cahill, G.; Cantor, C.; Caskey, T.; Dulbecco, R.; Engelhardt, D. L.; Hood, L.; Lerman, L. S.; Mendelsohn, M. L.; Sinsheimer, R. L.; Smith, T.; Soll, D.; Stormo, G.; White, R. L.

    1987-04-01

    The report urges DOE and the Nation to commit to a large, multi-year, multidisciplinary, technological undertaking to order and sequence the human genome. This effort will first require significant innovation in general capability to manipulate DNA, major new analytical methods for ordering and sequencing, theoretical developments in computer science and mathematical biology, and great expansions in our ability to store and manipulate the information and to interface it with other large and diverse genetic databases. The actual ordering and sequencing involves the coordinated processing of some 3 billion bases from a reference human genome. Science is poised on the rudimentary edge of being able to read and understand human genes. A concerted, broadly based, scientific effort to provide new methods of sufficient power and scale should transform this activity from an inefficient one-gene-at-a-time, single laboratory effort into a coordinated, worldwide, comprehensive reading of "the book of man". The effort will be extraordinary in scope and magnitude, but so will be the benefit to biological understanding, new technology and the diagnosis and treatment of human disease.

  7. Study designs for identification of rare disease variants in complex diseases: the utility of family-based designs.

    PubMed

    Ionita-Laza, Iuliana; Ottman, Ruth

    2011-11-01

    The recent progress in sequencing technologies makes possible large-scale medical sequencing efforts to assess the importance of rare variants in complex diseases. The results of such efforts depend heavily on the use of efficient study designs and analytical methods. We introduce here a unified framework for association testing of rare variants in family-based designs or designs based on unselected affected individuals. This framework allows us to quantify the enrichment in rare disease variants in families containing multiple affected individuals and to investigate the optimal design of studies aiming to identify rare disease variants in complex traits. We show that for many complex diseases with small values for the overall sibling recurrence risk ratio, such as Alzheimer's disease and most cancers, sequencing affected individuals with a positive family history of the disease can be extremely advantageous for identifying rare disease variants. In contrast, for complex diseases with large values of the sibling recurrence risk ratio, sequencing unselected affected individuals may be preferable.

  8. Elimination sequence optimization for SPAR

    NASA Technical Reports Server (NTRS)

    Hogan, Harry A.

    1986-01-01

    SPAR is a large-scale computer program for finite element structural analysis. The program allows user specification of the order in which the joints of a structure are to be eliminated since this order can have significant influence over solution performance, in terms of both storage requirements and computer time. An efficient elimination sequence can improve performance by over 50% for some problems. Obtaining such sequences, however, requires the expertise of an experienced user and can take hours of tedious effort to effect. Thus, an automatic elimination sequence optimizer would enhance productivity by reducing the analysts' problem definition time and by lowering computer costs. Two possible methods for automating the elimination sequence specifications were examined. Several algorithms based on the graph theory representations of sparse matrices were studied with mixed results. Significant improvement in the program performance was achieved, but sequencing by an experienced user still yields substantially better results. The initial results provide encouraging evidence that the potential benefits of such an automatic sequencer would be well worth the effort.

  9. Large-Scale Sequencing: The Future of Genomic Sciences Colloquium

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Margaret Riley; Merry Buckley

    2009-01-01

    Genetic sequencing and the various molecular techniques it has enabled have revolutionized the field of microbiology. Examining and comparing the genetic sequences borne by microbes - including bacteria, archaea, viruses, and microbial eukaryotes - provides researchers with insights into the processes microbes carry out, their pathogenic traits, and new ways to use microorganisms in medicine and manufacturing. Until recently, sequencing entire microbial genomes has been laborious and expensive, and the decision to sequence the genome of an organism was made on a case-by-case basis by individual researchers and funding agencies. Now, thanks to new technologies, the cost and effort of sequencing are within reach for even the smallest facilities, and the ability to sequence the genomes of a significant fraction of microbial life may be possible. The availability of numerous microbial genomes will enable unprecedented insights into microbial evolution, function, and physiology. However, the current ad hoc approach to gathering sequence data has resulted in an unbalanced and highly biased sampling of microbial diversity. A well-coordinated, large-scale effort to target the breadth and depth of microbial diversity would result in the greatest impact. The American Academy of Microbiology convened a colloquium to discuss the scientific benefits of engaging in a large-scale, taxonomically-based sequencing project. A group of individuals with expertise in microbiology, genomics, informatics, ecology, and evolution deliberated on the issues inherent in such an effort and generated a set of specific recommendations for how best to proceed. The vast majority of microbes are presently uncultured and, thus, pose significant challenges to such a taxonomically-based approach to sampling genome diversity. However, we have yet to even scratch the surface of the genomic diversity among cultured microbes. 
A coordinated sequencing effort of cultured organisms is an appropriate place to begin, since not only are their genomes available, but they are also accompanied by data on environment and physiology that can be used to understand the resulting data. As single cell isolation methods improve, there should be a shift toward incorporating uncultured organisms and communities into this effort. Efforts to sequence cultivated isolates should target characterized isolates from culture collections for which biochemical data are available, as well as other cultures of lasting value from personal collections. The genomes of type strains should be among the first targets for sequencing, but creative culture methods, novel cell isolation, and sorting methods would all be helpful in obtaining organisms we have not yet been able to cultivate for sequencing. The data that should be provided for strains targeted for sequencing will depend on the phylogenetic context of the organism and the amount of information available about its nearest relatives. Annotation is an important part of transforming genome sequences into useful resources, but it represents the most significant bottleneck to the field of comparative genomics right now and must be addressed. Furthermore, there is a need for more consistency in both annotation and archiving annotation data. As new annotation tools become available over time, re-annotation of genomes should be implemented, taking advantage of advancements in annotation techniques in order to capitalize on the genome sequences and increase both the societal and scientific benefit of genomics work. Given the proper resources, the knowledge and ability exist to be able to select model systems, some simple, some less so, and dissect them so that we may understand the processes and interactions at work in them. 
Colloquium participants suggest a five-pronged, coordinated initiative to exhaustively describe six different microbial ecosystems, designed to describe all the gene diversity, across genomes. In this effort, sequencing should be complemented by other experimental data, particularly transcriptomics and metabolomics data, all of which should be gathered and curated continuously. Systematic genomics efforts like the ones outlined in this document would significantly broaden our view of biological diversity and have major effects on science. This has to be backed up with examples. Considering these potential impacts and the need for acquiescence from both the public and scientists to get such projects funded and functioning, education and training will be crucial. New collaborations within the scientific community will also be necessary.

  10. Beyond Linear Sequence Comparisons: The use of genome-level characters for phylogenetic reconstruction

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Boore, Jeffrey L.

    2004-11-27

    Although the phylogenetic relationships of many organisms have been convincingly resolved by the comparisons of nucleotide or amino acid sequences, others have remained equivocal despite great effort. Now that large-scale genome sequencing projects are sampling many lineages, it is becoming feasible to compare large data sets of genome-level features and to develop this as a tool for phylogenetic reconstruction that has advantages over conventional sequence comparisons. Although it is unlikely that these will address a large number of evolutionary branch points across the broad tree of life due to the infeasibility of such sampling, they have great potential for convincingly resolving many critical, contested relationships for which no other data seems promising. However, it is important that we recognize potential pitfalls, establish reasonable standards for acceptance, and employ rigorous methodology to guard against a return to earlier days of scenario-driven evolutionary reconstructions.

  11. TaqMan Real-Time PCR Assays To Assess Arbuscular Mycorrhizal Responses to Field Manipulation of Grassland Biodiversity: Effects of Soil Characteristics, Plant Species Richness, and Functional Traits

    PubMed Central

    König, Stephan; Wubet, Tesfaye; Dormann, Carsten F.; Hempel, Stefan; Renker, Carsten; Buscot, François

    2010-01-01

    Large-scale (temporal and/or spatial) molecular investigations of the diversity and distribution of arbuscular mycorrhizal fungi (AMF) require considerable sampling efforts and high-throughput analysis. To facilitate such efforts, we have developed a TaqMan real-time PCR assay to detect and identify AMF in environmental samples. First, we screened the diversity in clone libraries, generated by nested PCR, of the nuclear ribosomal DNA internal transcribed spacer (ITS) of AMF in environmental samples. We then generated probes and forward primers based on the detected sequences, enabling AMF sequence type-specific detection in TaqMan multiplex real-time PCR assays. In comparisons to conventional clone library screening and Sanger sequencing, the TaqMan assay approach provided similar accuracy but higher sensitivity with cost and time savings. The TaqMan assays were applied to analyze the AMF community composition within plots of a large-scale plant biodiversity manipulation experiment, the Jena Experiment, primarily designed to investigate the interactive effects of plant biodiversity on element cycling and trophic interactions. The results show that environmental variables hierarchically shape AMF communities and that the sequence type spectrum is strongly affected by previous land use and disturbance, which appears to favor disturbance-tolerant members of the genus Glomus. The AMF species richness of disturbance-associated communities can be largely explained by richness of plant species and plant functional groups, while plant productivity and soil parameters appear to have only weak effects on the AMF community. PMID:20418424

  12. Comparative immunogenomics of molluscs.

    PubMed

    Schultz, Jonathan H; Adema, Coen M

    2017-10-01

    Comparative immunology, studying both vertebrates and invertebrates, provided the earliest descriptions of phagocytosis as a general immune mechanism. However, the large scale of animal diversity challenges all-inclusive investigations and the field of immunology has developed by mostly emphasizing study of a few vertebrate species. In addressing the lack of comprehensive understanding of animal immunity, especially that of invertebrates, comparative immunology helps toward management of invertebrates that are food sources, agricultural pests, pathogens, or transmit diseases, and helps interpret the evolution of animal immunity. Initial studies showed that the Mollusca (second largest animal phylum), and invertebrates in general, possess innate defenses but lack the lymphocytic immune system that characterizes vertebrate immunology. Recognizing the reality of both common and taxon-specific immune features, and applying up-to-date cell and molecular research capabilities, in-depth studies of a select number of bivalve and gastropod species continue to reveal novel aspects of molluscan immunity. The genomics era heralded a new stage of comparative immunology; large-scale efforts yielded an initial set of full molluscan genome sequences that is available for analyses of full complements of immune genes and regulatory sequences. Next-generation sequencing (NGS), due to lower cost and effort required, allows individual researchers to generate large sequence datasets for growing numbers of molluscs. RNAseq provides expression profiles that enable discovery of immune genes and genome sequences reveal distribution and diversity of immune factors across molluscan phylogeny. Although computational de novo sequence assembly will benefit from continued development and automated annotation may require some experimental validation, NGS is a powerful tool for comparative immunology, especially increasing coverage of the extensive molluscan diversity. 
To date, immunogenomics has revealed new levels of complexity of molluscan defense by indicating sequence heterogeneity in individual snails and bivalves, and members of expanded immune gene families are expressed differentially to generate pathogen-specific defense responses.

  13. BAC sequencing using pooled methods.

    PubMed

    Saski, Christopher A; Feltus, F Alex; Parida, Laxmi; Haiminen, Niina

    2015-01-01

    Shotgun sequencing and assembly of a large, complex genome can be expensive, and it is challenging to accurately reconstruct the true genome sequence. Repetitive DNA arrays, paralogous sequences, polyploidy, and heterozygosity are the main factors that plague de novo genome sequencing projects, which typically result in highly fragmented assemblies from which it is difficult to extract biological meaning. Targeted, sub-genomic sequencing offers complexity reduction by removing distal segments of the genome and a systematic mechanism for exploring prioritized genomic content through BAC sequencing. If one isolates and sequences the genome fraction that encodes the relevant biological information, it is possible to reduce the overall sequencing cost and effort by targeting a genomic segment. This chapter describes the sub-genome assembly protocol for an organism based upon a BAC tiling path derived from a genome-scale physical map or from fine mapping using BACs to target sub-genomic regions. Methods described include BAC isolation and mapping, DNA sequencing, and sequence assembly.

  14. PASTA for Proteins.

    PubMed

    Collins, Kodi; Warnow, Tandy

    2018-06-19

    PASTA is a multiple sequence alignment method that uses divide-and-conquer plus iteration to enable base alignment methods to scale with high accuracy to large sequence datasets. By default, PASTA included MAFFT L-INS-i; our new extension of PASTA enables the use of MAFFT G-INS-i, MAFFT Homologs, CONTRAlign, and ProbCons. We analyzed the performance of each base method and PASTA using these base methods on 224 datasets from BAliBASE 4 with at least 50 sequences. We show that PASTA enables the most accurate base methods to scale to larger datasets at reduced computational effort, and generally improves alignment and tree accuracy on the largest BAliBASE datasets. PASTA is available at https://github.com/kodicollins/pasta and has also been integrated into the original PASTA repository at https://github.com/smirarab/pasta. Supplementary data are available at Bioinformatics online.
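    The divide-and-conquer step can be sketched in miniature. The code below is a deliberate simplification: it splits a guide tree (here a nested tuple of leaf names, an assumed toy representation) into leaf subsets no larger than the base method can handle, by recursing into the root's two subtrees. PASTA's actual decomposition uses centroid edges and then merges the subset alignments via transitivity; none of that machinery is shown here.

```python
# Simplified sketch of PASTA-style dataset decomposition (illustrative only;
# PASTA splits at centroid edges and merges subalignments via transitivity).

def leaves(tree):
    """Collect leaf names from a nested-tuple binary guide tree."""
    if isinstance(tree, tuple):
        return [name for child in tree for name in leaves(child)]
    return [tree]

def decompose(tree, max_size):
    """Recursively split the leaf set at internal edges until every
    subset has at most max_size leaves (the base aligner's capacity)."""
    lv = leaves(tree)
    if len(lv) <= max_size or not isinstance(tree, tuple):
        return [lv]
    left, right = tree  # binary guide tree assumed
    return decompose(left, max_size) + decompose(right, max_size)
```

Each returned subset would then be aligned by the chosen base method (MAFFT L-INS-i, ProbCons, etc.), and the subset alignments merged and re-estimated iteratively.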

  15. Trends in structural coverage of the protein universe and the impact of the Protein Structure Initiative

    PubMed Central

    Khafizov, Kamil; Madrid-Aliste, Carlos; Almo, Steven C.; Fiser, Andras

    2014-01-01

    The exponential growth of protein sequence data provides an ever-expanding body of unannotated and misannotated proteins. The National Institutes of Health-supported Protein Structure Initiative and related worldwide structural genomics efforts facilitate functional annotation of proteins through structural characterization. Recently there have been profound changes in the taxonomic composition of sequence databases, which are effectively redefining the scope and contribution of these large-scale structure-based efforts. The faster-growing bacterial genomic entries have overtaken the eukaryotic entries over the last 5 y, but also have become more redundant. Despite the enormous increase in the number of sequences, the overall structural coverage of proteins—including proteins for which reliable homology models can be generated—on the residue level has increased from 30% to 40% over the last 10 y. Structural genomics efforts contributed ∼50% of this new structural coverage, despite determining only ∼10% of all new structures. Based on current trends, it is expected that ∼55% structural coverage (the level required for significant functional insight) will be achieved within 15 y, whereas without structural genomics efforts, realizing this goal will take approximately twice as long. PMID:24567391

  16. Trends in structural coverage of the protein universe and the impact of the Protein Structure Initiative.

    PubMed

    Khafizov, Kamil; Madrid-Aliste, Carlos; Almo, Steven C; Fiser, Andras

    2014-03-11

    The exponential growth of protein sequence data provides an ever-expanding body of unannotated and misannotated proteins. The National Institutes of Health-supported Protein Structure Initiative and related worldwide structural genomics efforts facilitate functional annotation of proteins through structural characterization. Recently there have been profound changes in the taxonomic composition of sequence databases, which are effectively redefining the scope and contribution of these large-scale structure-based efforts. The faster-growing bacterial genomic entries have overtaken the eukaryotic entries over the last 5 y, but also have become more redundant. Despite the enormous increase in the number of sequences, the overall structural coverage of proteins--including proteins for which reliable homology models can be generated--on the residue level has increased from 30% to 40% over the last 10 y. Structural genomics efforts contributed ∼50% of this new structural coverage, despite determining only ∼10% of all new structures. Based on current trends, it is expected that ∼55% structural coverage (the level required for significant functional insight) will be achieved within 15 y, whereas without structural genomics efforts, realizing this goal will take approximately twice as long.

  17. Megabase sequencing of human genome by ordered-shotgun-sequencing (OSS) strategy

    NASA Astrophysics Data System (ADS)

    Chen, Ellson Y.

    1997-05-01

    So far we have used the OSS strategy to sequence over 2 megabases of DNA in large-insert clones from regions of human X chromosomes with different characteristic levels of GC content. The method starts by randomly fragmenting a BAC, YAC or PAC to 8-12 kb pieces and subcloning those into lambda phage. Insert-ends of these clones are sequenced and overlapped to create a partial map. Complete sequencing is then done on a minimal tiling path of selected subclones, recursively focusing on those at the edges of contigs to facilitate mergers of clones across the entire target. To reduce manual labor, PCR processes have been adapted to prepare sequencing templates throughout the entire operation. The streamlined process can thus lend itself to further automation. The OSS approach is suitable for large-scale genomic sequencing, providing considerable flexibility in the choice of subclones or regions for more or less intensive sequencing. For example, subclones containing contaminating host cell DNA or cloning vector can be recognized and ignored with minimal sequencing effort; regions overlapping a neighboring clone already sequenced need not be redone; and segments containing tandem repeats or long repetitive sequences can be spotted early on and targeted for additional attention.
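    Once the insert-end map places subclones on the target, choosing a minimal tiling path is essentially a greedy interval-cover problem. The sketch below is an assumed, simplified model (subclones as mapped intervals on a linear target), not the published protocol: at each step it picks, among subclones starting inside the region covered so far, the one extending coverage furthest.

```python
# Illustrative greedy minimal tiling path over mapped subclone intervals.
# Coordinates and names are hypothetical; real OSS selection also weighs
# clone quality, repeats, and contig edges as described in the abstract.

def minimal_tiling_path(subclones, target_len):
    """subclones: list of (name, start, end) intervals on [0, target_len].
    Returns a minimal list of subclone names covering the whole target."""
    clones = sorted(subclones, key=lambda c: c[1])
    path, covered, i = [], 0, 0
    while covered < target_len:
        best = None
        # among subclones starting within covered territory,
        # take the one reaching furthest right
        while i < len(clones) and clones[i][1] <= covered:
            if best is None or clones[i][2] > best[2]:
                best = clones[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError("gap in coverage at position %d" % covered)
        path.append(best[0])
        covered = best[2]
    return path
```
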

  18. TRDistiller: a rapid filter for enrichment of sequence datasets with proteins containing tandem repeats.

    PubMed

    Richard, François D; Kajava, Andrey V

    2014-06-01

    The dramatic growth of sequencing data evokes an urgent need to improve bioinformatics tools for large-scale proteome analysis. Over the last two decades, the foremost efforts of computer scientists were devoted to proteins with aperiodic sequences having globular 3D structures. However, a large portion of proteins contain periodic sequences representing arrays of repeats that are directly adjacent to each other (so called tandem repeats or TRs). These proteins frequently fold into elongated fibrous structures carrying different fundamental functions. Algorithms specific to the analysis of these regions are urgently required since the conventional approaches developed for globular domains have had limited success when applied to the TR regions. The protein TRs are frequently not perfect, containing a number of mutations, and some of them cannot be easily identified. To detect such "hidden" repeats several algorithms have been developed. However, the most sensitive among them are time-consuming and, therefore, inappropriate for large-scale proteome analysis. To speed up the TR detection we developed a rapid filter that is based on the comparison of composition and order of short strings in the adjacent sequence motifs. Tests show that our filter discards up to 22.5% of proteins which are known to be without TRs while keeping almost all (99.2%) TR-containing sequences. Thus, we are able to decrease the size of the initial sequence dataset enriching it with TR-containing proteins which allows a faster subsequent TR detection by other methods. The program is available upon request.
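    The core idea of such a filter, comparing the short-string composition of adjacent sequence windows, can be sketched as follows. This is not the TRDistiller implementation (which is available from the authors on request); it is a minimal illustration with assumed window size, k-mer length, and threshold, showing why tandem-repeat regions score high: adjacent windows of a repeat array share most of their k-mers.

```python
# Illustrative k-mer composition filter, in the spirit of TRDistiller
# (not the published code). Window, k, and threshold are assumptions.

def kmer_counts(s, k=2):
    """Multiset of overlapping length-k substrings."""
    counts = {}
    for i in range(len(s) - k + 1):
        counts[s[i:i + k]] = counts.get(s[i:i + k], 0) + 1
    return counts

def similarity(c1, c2):
    """Fraction of k-mer mass shared between two count dicts."""
    shared = sum(min(c1.get(kmer, 0), c2.get(kmer, 0)) for kmer in c1)
    total = max(sum(c1.values()), sum(c2.values()))
    return shared / total if total else 0.0

def may_contain_tr(seq, window=20, threshold=0.5):
    """Keep a sequence for slower TR detectors if any two adjacent
    windows share a large fraction of their k-mer composition."""
    for i in range(0, len(seq) - 2 * window + 1, window):
        a = kmer_counts(seq[i:i + window])
        b = kmer_counts(seq[i + window:i + 2 * window])
        if similarity(a, b) >= threshold:
            return True
    return False
```

Sequences failing the test are discarded, shrinking the dataset passed to the slower, sensitive TR detectors, which is exactly the enrichment role the abstract describes.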

  19. A confidence interval analysis of sampling effort, sequencing depth, and taxonomic resolution of fungal community ecology in the era of high-throughput sequencing.

    PubMed

    Oono, Ryoko

    2017-01-01

    High-throughput sequencing technology has helped microbial community ecologists explore ecological and evolutionary patterns at unprecedented scales. The benefits of a large sample size still typically outweigh that of greater sequencing depths per sample for accurate estimations of ecological inferences. However, excluding or not sequencing rare taxa may mislead the answers to the questions 'how and why are communities different?' This study evaluates the confidence intervals of ecological inferences from high-throughput sequencing data of foliar fungal endophytes as case studies through a range of sampling efforts, sequencing depths, and taxonomic resolutions to understand how technical and analytical practices may affect our interpretations. Increasing sampling size reliably decreased confidence intervals across multiple community comparisons. However, the effects of sequencing depths on confidence intervals depended on how rare taxa influenced the dissimilarity estimates among communities and did not significantly decrease confidence intervals for all community comparisons. A comparison of simulated communities under random drift suggests that sequencing depths are important in estimating dissimilarities between microbial communities under neutral selective processes. Confidence interval analyses reveal important biases as well as biological trends in microbial community studies that otherwise may be ignored when communities are only compared for statistically significant differences.
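    A confidence interval on a community dissimilarity can be obtained by bootstrap resampling of the samples, which is one standard way to quantify how sampling effort constrains such inferences; the sketch below is illustrative and not the paper's analysis. Bray-Curtis dissimilarity and the percentile bootstrap are assumed choices here.

```python
# Illustrative percentile-bootstrap CI on mean between-group
# Bray-Curtis dissimilarity (not the analysis pipeline of the paper).
import random

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1 - 2 * shared / (sum(a) + sum(b))

def bootstrap_ci(samples_a, samples_b, n_boot=1000, alpha=0.05, seed=0):
    """samples_a, samples_b: lists of per-sample taxon abundance vectors.
    Resamples each group with replacement and returns a percentile CI
    on the mean between-group dissimilarity."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        ra = [rng.choice(samples_a) for _ in samples_a]
        rb = [rng.choice(samples_b) for _ in samples_b]
        ds = [bray_curtis(x, y) for x in ra for y in rb]
        stats.append(sum(ds) / len(ds))
    stats.sort()
    return (stats[int(alpha / 2 * n_boot)],
            stats[int((1 - alpha / 2) * n_boot) - 1])
```

With more samples per group, the resampled means vary less and the interval narrows, mirroring the abstract's finding that sample size reliably shrinks confidence intervals.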

  20. A confidence interval analysis of sampling effort, sequencing depth, and taxonomic resolution of fungal community ecology in the era of high-throughput sequencing

    PubMed Central

    2017-01-01

    High-throughput sequencing technology has helped microbial community ecologists explore ecological and evolutionary patterns at unprecedented scales. The benefits of a large sample size still typically outweigh those of greater sequencing depths per sample for accurate estimation of ecological inferences. However, excluding or not sequencing rare taxa may mislead the answers to the questions ‘how and why are communities different?’ This study evaluates the confidence intervals of ecological inferences from high-throughput sequencing data of foliar fungal endophytes as case studies through a range of sampling efforts, sequencing depths, and taxonomic resolutions to understand how technical and analytical practices may affect our interpretations. Increasing sampling size reliably decreased confidence intervals across multiple community comparisons. However, the effects of sequencing depths on confidence intervals depended on how rare taxa influenced the dissimilarity estimates among communities and did not significantly decrease confidence intervals for all community comparisons. A comparison of simulated communities under random drift suggests that sequencing depths are important in estimating dissimilarities between microbial communities under neutral selective processes. Confidence interval analyses reveal important biases as well as biological trends in microbial community studies that otherwise may be ignored when communities are only compared for statistically significant differences. PMID:29253889

  1. Insights from Human/Mouse genome comparisons

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pennacchio, Len A.

    2003-03-30

    Large-scale public genomic sequencing efforts have provided a wealth of vertebrate sequence data poised to provide insights into mammalian biology. These include deep genomic sequence coverage of human, mouse, rat, zebrafish, and two pufferfish (Fugu rubripes and Tetraodon nigroviridis) (Aparicio et al. 2002; Lander et al. 2001; Venter et al. 2001; Waterston et al. 2002). In addition, a high-priority has been placed on determining the genomic sequence of chimpanzee, dog, cow, frog, and chicken (Boguski 2002). While only recently available, whole genome sequence data have provided the unique opportunity to globally compare complete genome contents. Furthermore, the shared evolutionary ancestry of vertebrate species has allowed the development of comparative genomic approaches to identify ancient conserved sequences with functionality. Accordingly, this review focuses on the initial comparison of available mammalian genomes and describes various insights derived from such analysis.

  2. Applications of species accumulation curves in large-scale biological data analysis.

    PubMed

    Deng, Chao; Daley, Timothy; Smith, Andrew D

    2015-09-01

    The species accumulation curve, or collector's curve, of a population gives the expected number of observed species or distinct classes as a function of sampling effort. Species accumulation curves allow researchers to assess and compare diversity across populations or to evaluate the benefits of additional sampling. Traditional applications have focused on ecological populations but emerging large-scale applications, for example in DNA sequencing, are orders of magnitude larger and present new challenges. We developed a method to estimate accumulation curves for predicting the complexity of DNA sequencing libraries. This method uses rational function approximations to a classical non-parametric empirical Bayes estimator due to Good and Toulmin [Biometrika, 1956, 43, 45-63]. Here we demonstrate how the same approach can be highly effective in other large-scale applications involving biological data sets. These include estimating microbial species richness, immune repertoire size, and k-mer diversity for genome assembly applications. We show how the method can be modified to address populations containing an effectively infinite number of species where saturation cannot practically be attained. We also introduce a flexible suite of tools implemented as an R package that make these methods broadly accessible.
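    The classical estimator underlying this work can be sketched directly: from the frequency-of-frequencies of an observed sample, the Good-Toulmin alternating series predicts how many new species an additional t-fold sampling effort would yield. The series is numerically stable only for small t, which is precisely the limitation the paper's rational-function approximations address; the sketch below is the basic estimator only, not the authors' method.

```python
from collections import Counter

def frequency_of_frequencies(observations):
    """f[k] = number of distinct species observed exactly k times."""
    return Counter(Counter(observations).values())

def good_toulmin(observations, t):
    """Expected number of NEW species discovered if sampling effort is
    increased by a fraction t (0 < t <= 1), via the alternating series
    sum_k (-1)^(k+1) * t^k * f_k  (Good & Toulmin, 1956)."""
    f = frequency_of_frequencies(observations)
    return sum((-1) ** (k + 1) * t ** k * fk for k, fk in f.items())
```

    For t = 1 (doubling the sample) the series reduces to f1 - f2 + f3 - ..., driven by the singletons and doubletons, which is why rare classes dominate complexity prediction for sequencing libraries.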

  3. Applications of species accumulation curves in large-scale biological data analysis

    PubMed Central

    Deng, Chao; Daley, Timothy; Smith, Andrew D

    2016-01-01

    The species accumulation curve, or collector’s curve, of a population gives the expected number of observed species or distinct classes as a function of sampling effort. Species accumulation curves allow researchers to assess and compare diversity across populations or to evaluate the benefits of additional sampling. Traditional applications have focused on ecological populations but emerging large-scale applications, for example in DNA sequencing, are orders of magnitude larger and present new challenges. We developed a method to estimate accumulation curves for predicting the complexity of DNA sequencing libraries. This method uses rational function approximations to a classical non-parametric empirical Bayes estimator due to Good and Toulmin [Biometrika, 1956, 43, 45–63]. Here we demonstrate how the same approach can be highly effective in other large-scale applications involving biological data sets. These include estimating microbial species richness, immune repertoire size, and k-mer diversity for genome assembly applications. We show how the method can be modified to address populations containing an effectively infinite number of species where saturation cannot practically be attained. We also introduce a flexible suite of tools implemented as an R package that make these methods broadly accessible. PMID:27252899

  4. Genomic encyclopedia of bacteria and archaea: sequencing a myriad of type strains.

    PubMed

    Kyrpides, Nikos C; Hugenholtz, Philip; Eisen, Jonathan A; Woyke, Tanja; Göker, Markus; Parker, Charles T; Amann, Rudolf; Beck, Brian J; Chain, Patrick S G; Chun, Jongsik; Colwell, Rita R; Danchin, Antoine; Dawyndt, Peter; Dedeurwaerdere, Tom; DeLong, Edward F; Detter, John C; De Vos, Paul; Donohue, Timothy J; Dong, Xiu-Zhu; Ehrlich, Dusko S; Fraser, Claire; Gibbs, Richard; Gilbert, Jack; Gilna, Paul; Glöckner, Frank Oliver; Jansson, Janet K; Keasling, Jay D; Knight, Rob; Labeda, David; Lapidus, Alla; Lee, Jung-Sook; Li, Wen-Jun; Ma, Juncai; Markowitz, Victor; Moore, Edward R B; Morrison, Mark; Meyer, Folker; Nelson, Karen E; Ohkuma, Moriya; Ouzounis, Christos A; Pace, Norman; Parkhill, Julian; Qin, Nan; Rossello-Mora, Ramon; Sikorski, Johannes; Smith, David; Sogin, Mitch; Stevens, Rick; Stingl, Uli; Suzuki, Ken-Ichiro; Taylor, Dorothea; Tiedje, Jim M; Tindall, Brian; Wagner, Michael; Weinstock, George; Weissenbach, Jean; White, Owen; Wang, Jun; Zhang, Lixin; Zhou, Yu-Guang; Field, Dawn; Whitman, William B; Garrity, George M; Klenk, Hans-Peter

    2014-08-01

    Microbes hold the key to life. They hold the secrets to our past (as the descendants of the earliest forms of life) and the prospects for our future (as we mine their genes for solutions to some of the planet's most pressing problems, from global warming to antibiotic resistance). However, the piecemeal approach that has defined efforts to study microbial genetic diversity for over 20 years and in over 30,000 genome projects risks squandering that promise. These efforts have covered less than 20% of the diversity of the cultured archaeal and bacterial species, which represent just 15% of the overall known prokaryotic diversity. Here we call for the funding of a systematic effort to produce a comprehensive genomic catalog of all cultured Bacteria and Archaea by sequencing, where available, the type strain of each species with a validly published name (currently ~11,000). This effort will provide an unprecedented level of coverage of our planet's genetic diversity, allow for the large-scale discovery of novel genes and functions, and lead to an improved understanding of microbial evolution and function in the environment.

  5. Genomic Encyclopedia of Bacteria and Archaea: Sequencing a Myriad of Type Strains

    PubMed Central

    Kyrpides, Nikos C.; Hugenholtz, Philip; Eisen, Jonathan A.; Woyke, Tanja; Göker, Markus; Parker, Charles T.; Amann, Rudolf; Beck, Brian J.; Chain, Patrick S. G.; Chun, Jongsik; Colwell, Rita R.; Danchin, Antoine; Dawyndt, Peter; Dedeurwaerdere, Tom; DeLong, Edward F.; Detter, John C.; De Vos, Paul; Donohue, Timothy J.; Dong, Xiu-Zhu; Ehrlich, Dusko S.; Fraser, Claire; Gibbs, Richard; Gilbert, Jack; Gilna, Paul; Glöckner, Frank Oliver; Jansson, Janet K.; Keasling, Jay D.; Knight, Rob; Labeda, David; Lapidus, Alla; Lee, Jung-Sook; Li, Wen-Jun; MA, Juncai; Markowitz, Victor; Moore, Edward R. B.; Morrison, Mark; Meyer, Folker; Nelson, Karen E.; Ohkuma, Moriya; Ouzounis, Christos A.; Pace, Norman; Parkhill, Julian; Qin, Nan; Rossello-Mora, Ramon; Sikorski, Johannes; Smith, David; Sogin, Mitch; Stevens, Rick; Stingl, Uli; Suzuki, Ken-ichiro; Taylor, Dorothea; Tiedje, Jim M.; Tindall, Brian; Wagner, Michael; Weinstock, George; Weissenbach, Jean; White, Owen; Wang, Jun; Zhang, Lixin; Zhou, Yu-Guang; Field, Dawn; Whitman, William B.; Garrity, George M.; Klenk, Hans-Peter

    2014-01-01

    Microbes hold the key to life. They hold the secrets to our past (as the descendants of the earliest forms of life) and the prospects for our future (as we mine their genes for solutions to some of the planet's most pressing problems, from global warming to antibiotic resistance). However, the piecemeal approach that has defined efforts to study microbial genetic diversity for over 20 years and in over 30,000 genome projects risks squandering that promise. These efforts have covered less than 20% of the diversity of the cultured archaeal and bacterial species, which represent just 15% of the overall known prokaryotic diversity. Here we call for the funding of a systematic effort to produce a comprehensive genomic catalog of all cultured Bacteria and Archaea by sequencing, where available, the type strain of each species with a validly published name (currently ~11,000). This effort will provide an unprecedented level of coverage of our planet's genetic diversity, allow for the large-scale discovery of novel genes and functions, and lead to an improved understanding of microbial evolution and function in the environment. PMID:25093819

  6. Stable isotope probing to study functional components of complex microbial ecosystems.

    PubMed

    Mazard, Sophie; Schäfer, Hendrik

    2014-01-01

    This protocol presents a method of dissecting the DNA or RNA of key organisms involved in a specific biochemical process within a complex ecosystem. Stable isotope probing (SIP) allows the labelling and separation of nucleic acids from community members that are involved in important biochemical transformations, yet are often not the most numerically abundant members of a community. This pure culture-independent technique circumvents limitations of traditional microbial isolation techniques or data mining from large-scale whole-community metagenomic studies to tease out the identities and genomic repertoires of microorganisms participating in biological nutrient cycles. SIP experiments can be applied to virtually any ecosystem and biochemical pathway under investigation provided a suitable stable isotope substrate is available. This versatile methodology allows a wide range of analyses to be performed, from fatty-acid analyses, community structure and ecology studies, and targeted metagenomics involving nucleic acid sequencing. SIP experiments provide an effective alternative to large-scale whole-community metagenomic studies by specifically targeting the organisms or biochemical transformations of interest, thereby reducing the sequencing effort and time-consuming bioinformatics analyses of large datasets.

  7. Back to the future: the human protein index (HPI) and the agenda for post-proteomic biology.

    PubMed

    Anderson, N G; Matheson, A; Anderson, N L

    2001-01-01

    The effort to produce an index of all human proteins (the human protein index, or HPI) began twenty years ago, before the initiation of the human genome program. Because DNA sequencing technology is inherently simpler and more scalable than protein analytical technology, and because the finiteness of genomes invited a spirit of rapid conquest, the notion of genome sequencing has displaced that of protein databases in the minds of most molecular biologists for the last decade. However, now that the human genome sequence is nearing completion, a major realignment is under way that brings proteins back to the center of biological thinking. Using an influx of new and improved protein technologies--from mass spectrometry to re-engineered two-dimensional (2-D) gel systems--the original objectives of the HPI have been expanded and the time frame for its execution radically shortened. Several additional large-scale technology efforts flowing from the HPI are also described.

  8. A scalable double-barcode sequencing platform for characterization of dynamic protein-protein interactions.

    PubMed

    Schlecht, Ulrich; Liu, Zhimin; Blundell, Jamie R; St Onge, Robert P; Levy, Sasha F

    2017-05-25

    Several large-scale efforts have systematically catalogued protein-protein interactions (PPIs) of a cell in a single environment. However, little is known about how the protein interactome changes across environmental perturbations. Current technologies, which assay one PPI at a time, are too low throughput to make it practical to study protein interactome dynamics. Here, we develop a highly parallel protein-protein interaction sequencing (PPiSeq) platform that uses a novel double barcoding system in conjunction with the dihydrofolate reductase protein-fragment complementation assay in Saccharomyces cerevisiae. PPiSeq detects PPIs at a rate that is on par with current assays and, in contrast with current methods, quantitatively scores PPIs with enough accuracy and sensitivity to detect changes across environments. Both PPI scoring and the bulk of strain construction can be performed with cell pools, making the assay scalable and easily reproduced across environments. PPiSeq is therefore a powerful new tool for large-scale investigations of dynamic PPIs.

  9. Identification of Variant-Specific Functions of PIK3CA by Rapid Phenotyping of Rare Mutations | Office of Cancer Genomics

    Cancer.gov

    Large-scale sequencing efforts are uncovering the complexity of cancer genomes, which are composed of causal "driver" mutations that promote tumor progression along with many more pathologically neutral "passenger" events. The majority of mutations, both in known cancer drivers and uncharacterized genes, are generally of low occurrence, highlighting the need to functionally annotate the long tail of infrequent mutations present in heterogeneous cancers.

  10. The Porcelain Crab Transcriptome and PCAD, the Porcelain Crab Microarray and Sequence Database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tagmount, Abderrahmane; Wang, Mei; Lindquist, Erika

    2010-01-27

    Background: With the emergence of a completed genome sequence of the freshwater crustacean Daphnia pulex, construction of genomic-scale sequence databases for additional crustacean sequences is important for comparative genomics and annotation. Porcelain crabs, genus Petrolisthes, have been powerful crustacean models for environmental and evolutionary physiology with respect to thermal adaptation and understanding responses of marine organisms to climate change. Here, we present a large-scale EST sequencing and cDNA microarray database project for the porcelain crab Petrolisthes cinctipes. Methodology/Principal Findings: A set of ~30K unique sequences (UniSeqs) representing ~19K clusters was generated from ~98K high-quality ESTs from a set of tissue-specific non-normalized and mixed-tissue normalized cDNA libraries from the porcelain crab Petrolisthes cinctipes. Homology for each UniSeq was assessed using BLAST, InterProScan, GO and KEGG database searches. Approximately 66% of the UniSeqs had homology in at least one of the databases. All EST and UniSeq sequences along with annotation results and coordinated cDNA microarray datasets have been made publicly accessible at the Porcelain Crab Array Database (PCAD), a feature-enriched version of the Stanford and Longhorn Array Databases. Conclusions/Significance: The EST project presented here represents the third largest sequencing effort for any crustacean, and the largest effort for any crab species. Our assembly and clustering results suggest that our porcelain crab EST data set is equally diverse to the much larger EST set generated in the Daphnia pulex genome sequencing project, and thus will be an important resource to the Daphnia research community. Our homology results support the Pancrustacea hypothesis and suggest that Malacostraca may be ancestral to Branchiopoda and Hexapoda. Our results also suggest that our cDNA microarrays cover as much of the transcriptome as can reasonably be captured in EST library sequencing approaches, and thus represent a rich resource for studies of environmental genomics.

  11. A compact, in vivo screen of all 6-mers reveals drivers of tissue-specific expression and guides synthetic regulatory element design.

    PubMed

    Smith, Robin P; Riesenfeld, Samantha J; Holloway, Alisha K; Li, Qiang; Murphy, Karl K; Feliciano, Natalie M; Orecchia, Lorenzo; Oksenberg, Nir; Pollard, Katherine S; Ahituv, Nadav

    2013-07-18

    Large-scale annotation efforts have improved our ability to coarsely predict regulatory elements throughout vertebrate genomes. However, it is unclear how complex spatiotemporal patterns of gene expression driven by these elements emerge from the activity of short, transcription factor binding sequences. We describe a comprehensive promoter extension assay in which the regulatory potential of all 6 base-pair (bp) sequences was tested in the context of a minimal promoter. To enable this large-scale screen, we developed algorithms that use a reverse-complement aware decomposition of the de Bruijn graph to design a library of DNA oligomers incorporating every 6-bp sequence exactly once. Our library multiplexes all 4,096 unique 6-mers into 184 double-stranded 15-bp oligomers, which is sufficiently compact for in vivo testing. We injected each multiplexed construct into zebrafish embryos and scored GFP expression in 15 tissues at two developmental time points. Twenty-seven constructs produced consistent expression patterns, with the majority doing so in only one tissue. Functional sequences are enriched near biologically relevant genes, match motifs for developmental transcription factors, and are required for enhancer activity. By concatenating tissue-specific functional sequences, we generated completely synthetic enhancers for the notochord, epidermis, spinal cord, forebrain and otic lateral line, and show that short regulatory sequences do not always function modularly. This work introduces a unique in vivo catalog of short, functional regulatory sequences and demonstrates several important principles of regulatory element organization. Furthermore, we provide resources for designing compact, reverse-complement aware k-mer libraries.
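    The library-design step above builds on de Bruijn sequences, in which every k-mer occurs exactly once. As a point of reference, the classic Lyndon-word (FKM) construction below generates such a sequence over the DNA alphabet; the authors' reverse-complement-aware decomposition and multiplexing into 15-bp oligomers are more involved and are not reproduced here.

```python
def de_bruijn(alphabet, k):
    """Cyclic de Bruijn sequence containing every k-mer over `alphabet`
    exactly once, via the classic Lyndon-word (FKM) construction."""
    a = [0] * (k + 1)
    seq = []

    def db(t, p):
        if t > k:
            # Append the Lyndon word only if its length divides k.
            if k % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, len(alphabet)):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return ''.join(alphabet[i] for i in seq)
```

    For k = 6 over {A, C, G, T} the cyclic sequence has length 4^6 = 4096, the minimum possible; cutting and pairing such a sequence into short double-stranded oligomers, while exploiting reverse complementarity, is what compresses the screen to 184 constructs.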

  12. Genomic Encyclopedia of Bacteria and Archaea: Sequencing a Myriad of Type Strains

    DOE PAGES

    Kyrpides, Nikos C.; Hugenholtz, Philip; Eisen, Jonathan A.; ...

    2014-08-05

    Microbes hold the key to life. They hold the secrets to our past (as the descendants of the earliest forms of life) and the prospects for our future (as we mine their genes for solutions to some of the planet's most pressing problems, from global warming to antibiotic resistance). However, the piecemeal approach that has defined efforts to study microbial genetic diversity for over 20 years and in over 30,000 genome projects risks squandering that promise. These efforts have covered less than 20% of the diversity of the cultured archaeal and bacterial species, which represent just 15% of the overall known prokaryotic diversity. Here we call for the funding of a systematic effort to produce a comprehensive genomic catalog of all cultured Bacteria and Archaea by sequencing, where available, the type strain of each species with a validly published name (currently ~11,000). This effort will provide an unprecedented level of coverage of our planet's genetic diversity, allow for the large-scale discovery of novel genes and functions, and lead to an improved understanding of microbial evolution and function in the environment.

  13. Optical mapping and its potential for large-scale sequencing projects.

    PubMed

    Aston, C; Mishra, B; Schwartz, D C

    1999-07-01

    Physical mapping has been rediscovered as an important component of large-scale sequencing projects. Restriction maps provide landmark sequences at defined intervals, and high-resolution restriction maps can be assembled from ensembles of single molecules by optical means. Such optical maps can be constructed from both large-insert clones and genomic DNA, and are used as a scaffold for accurately aligning sequence contigs generated by shotgun sequencing.

  14. The Challenge of Large-Scale Literacy Improvement

    ERIC Educational Resources Information Center

    Levin, Ben

    2010-01-01

    This paper discusses the challenge of making large-scale improvements in literacy in schools across an entire education system. Despite growing interest and rhetoric, there are very few examples of sustained, large-scale change efforts around school-age literacy. The paper reviews 2 instances of such efforts, in England and Ontario. After…

  15. Using relational databases for improved sequence similarity searching and large-scale genomic analyses.

    PubMed

    Mackey, Aaron J; Pearson, William R

    2004-10-01

    Relational databases are designed to integrate diverse types of information and manage large sets of search results, greatly simplifying genome-scale analyses. They are essential for the management and analysis of large-scale sequence data, and can also improve the statistical significance of similarity searches by focusing on subsets of sequence libraries most likely to contain homologs. This unit describes using relational databases to improve the efficiency of sequence similarity searching and to demonstrate various large-scale genomic analyses of homology-related data. It covers the installation and use of a simple protein sequence database, seqdb_demo, which serves as the basis for the other protocols: basic use of the database to generate a novel sequence library subset, extending seqdb_demo to store sequence similarity search results, and using various kinds of stored search results to address aspects of comparative genomic analysis.
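    The subset-library idea can be sketched with a minimal, hypothetical schema (this is not the actual seqdb_demo schema): store annotated sequences in one table, then pull out a focused subset, e.g. one taxon, to search against instead of the whole library.

```python
import sqlite3

# Hypothetical minimal schema: sequences plus the annotations used to
# define library subsets for focused similarity searches.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE protein (
        acc      TEXT PRIMARY KEY,
        taxon    TEXT NOT NULL,
        length   INTEGER NOT NULL,
        sequence TEXT NOT NULL
    );
""")
con.executemany(
    "INSERT INTO protein VALUES (?, ?, ?, ?)",
    [("P1", "E. coli", 5, "MKVLA"),
     ("P2", "E. coli", 4, "MGGT"),
     ("P3", "H. sapiens", 6, "MSTNPK")],
)

# Extract a taxon-restricted subset library; searching only this subset
# shrinks the search space, which improves E-value statistics.
subset = con.execute(
    "SELECT acc, sequence FROM protein WHERE taxon = ?", ("E. coli",)
).fetchall()
```

    The same table can later be joined against stored search results, which is the pattern the unit's comparative-genomics protocols rely on.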

  16. Bioinformatics and genomic analysis of transposable elements in eukaryotic genomes.

    PubMed

    Janicki, Mateusz; Rooke, Rebecca; Yang, Guojun

    2011-08-01

    A major portion of most eukaryotic genomes consists of transposable elements (TEs). During evolution, TEs have introduced profound changes to genome size, structure, and function. As integral parts of genomes, the dynamic presence of TEs will continue to be a major force in reshaping genomes. Early computational analyses of TEs in genome sequences focused on filtering out "junk" sequences to facilitate gene annotation. When the high abundance and diversity of TEs in eukaryotic genomes were recognized, these early efforts transformed into the systematic genome-wide categorization and classification of TEs. The availability of genomic sequence data reversed the classical genetic approaches to discovering new TE families and superfamilies. Curated TE databases and their accurate annotation of genome sequences in turn facilitated the studies on TEs in a number of frontiers including: (1) TE-mediated changes of genome size and structure, (2) the influence of TEs on genome and gene functions, (3) TE regulation by host, (4) the evolution of TEs and their population dynamics, and (5) genomic scale studies of TE activity. Bioinformatics and genomic approaches have become an integral part of large-scale studies on TEs to extract information with pure in silico analyses or to assist wet lab experimental studies. The current revolution in genome sequencing technology facilitates further progress in the existing frontiers of research and the emergence of new initiatives. The rapid generation of large sequence datasets at record low costs on a routine basis is challenging the computing industry on storage capacity and manipulation speed, and the bioinformatics community to improve algorithms and their implementations.

  17. Testing of the NASA Hypersonics Project Combined Cycle Engine Large Scale Inlet Mode Transition Experiment (CCE LlMX)

    NASA Technical Reports Server (NTRS)

    Saunders, J. D.; Stueber, T. J.; Thomas, S. R.; Suder, K. L.; Weir, L. J.; Sanders, B. W.

    2012-01-01

    Status on an effort to develop Turbine Based Combined Cycle (TBCC) propulsion is described. This propulsion technology can enable reliable and reusable space launch systems. TBCC propulsion offers improved performance and safety over rocket propulsion. The potential to realize aircraft-like operations and reduced maintenance are additional benefits. Among the most critical TBCC enabling technologies are: 1) mode transition from turbine to scramjet propulsion, 2) high Mach turbine engines and 3) TBCC integration. To address these TBCC challenges, the effort is centered on a propulsion mode transition experiment and includes analytical research. The test program, the Combined-Cycle Engine Large Scale Inlet Mode Transition Experiment (CCE LIMX), was conceived to integrate TBCC propulsion with proposed hypersonic vehicles. The goals address: (1) dual inlet operability and performance, (2) mode-transition sequences enabling a switch between turbine and scramjet flow paths, and (3) turbine engine transients during transition. Four test phases are planned from which a database can be used to both validate design and analysis codes and characterize operability and integration issues for TBCC propulsion. In this paper we discuss the research objectives, features of the CCE hardware and test plans, and the status of the parametric inlet characterization testing, which began in 2011. This effort is sponsored by the NASA Fundamental Aeronautics Hypersonics project.

  18. SNP-VISTA: An interactive SNP visualization tool

    PubMed Central

    Shah, Nameeta; Teplitsky, Michael V; Minovitsky, Simon; Pennacchio, Len A; Hugenholtz, Philip; Hamann, Bernd; Dubchak, Inna L

    2005-01-01

    Background Recent advances in sequencing technologies promise to provide a better understanding of the genetics of human disease as well as the evolution of microbial populations. Single Nucleotide Polymorphisms (SNPs) are established genetic markers that aid in the identification of loci affecting quantitative traits and/or disease in a wide variety of eukaryotic species. With today's technological capabilities, it has become possible to re-sequence a large set of appropriate candidate genes in individuals with a given disease in an attempt to identify causative mutations. In addition, SNPs have been used extensively in efforts to study the evolution of microbial populations, and the recent application of random shotgun sequencing to environmental samples enables more extensive SNP analysis of co-occurring and co-evolving microbial populations. The program is available at [1]. Results We have developed and present two modifications of an interactive visualization tool, SNP-VISTA, to aid in the analyses of the following types of data: A. Large-scale re-sequence data of disease-related genes for discovery of associated and/or causative alleles (GeneSNP-VISTA). B. Massive amounts of ecogenomics data for studying homologous recombination in microbial populations (EcoSNP-VISTA). The main features and capabilities of SNP-VISTA are: 1) mapping of SNPs to gene structure; 2) classification of SNPs, based on their location in the gene, frequency of occurrence in samples and allele composition; 3) clustering, based on user-defined subsets of SNPs, highlighting haplotypes as well as recombinant sequences; 4) integration of protein evolutionary conservation visualization; and 5) display of automatically calculated recombination points that are user-editable. Conclusion The main strength of SNP-VISTA is its graphical interface and use of visual representations, which support interactive exploration and hence better understanding of large-scale SNP data by the user. PMID:16336665

  19. Intra-reach headwater fish assemblage structure

    USGS Publications Warehouse

    McKenna, James E.

    2017-01-01

    Large-scale conservation efforts can take advantage of modern large databases and regional modeling and assessment methods. However, these broad-scale efforts often assume uniform average habitat conditions and/or species assemblages within stream reaches.

  20. New generation pharmacogenomic tools: a SNP linkage disequilibrium Map, validated SNP assay resource, and high-throughput instrumentation system for large-scale genetic studies.

    PubMed

    De La Vega, Francisco M; Dailey, David; Ziegle, Janet; Williams, Julie; Madden, Dawn; Gilbert, Dennis A

    2002-06-01

    Since public and private efforts announced the first draft of the human genome last year, researchers have reported great numbers of single nucleotide polymorphisms (SNPs). We believe that the availability of well-mapped, quality SNP markers constitutes the gateway to a revolution in genetics and personalized medicine that will lead to better diagnosis and treatment of common complex disorders. A new generation of tools and public SNP resources for pharmacogenomic and genetic studies--specifically for candidate-gene, candidate-region, and whole-genome association studies--will form part of the new scientific landscape. This will only be possible through the greater accessibility of SNP resources and superior high-throughput instrumentation-assay systems that enable affordable, highly productive large-scale genetic studies. We are contributing to this effort by developing a high-quality linkage disequilibrium SNP marker map and an accompanying set of ready-to-use, validated SNP assays across every gene in the human genome. This effort incorporates both the public sequence and SNP data sources, and Celera Genomics' human genome assembly and enormous resource of physically mapped SNPs (approximately 4,000,000 unique records). This article discusses our approach and methodology for designing the map, choosing quality SNPs, designing and validating these assays, and obtaining population frequency of the polymorphisms. We also discuss an advanced, high-performance SNP assay chemistry--a new generation of the TaqMan probe-based 5' nuclease assay--and a high-throughput instrumentation-software system for large-scale genotyping. We provide the new SNP map and validation information, validated SNP assays and reagents, and instrumentation systems as a novel resource for genetic discoveries.

  1. Transfer of movement sequences: bigger is better.

    PubMed

    Dean, Noah J; Kovacs, Attila J; Shea, Charles H

    2008-02-01

    Experiment 1 was conducted to determine if proportional transfer from "small to large" scale movements is as effective as transferring from "large to small." We hypothesize that the learning of larger scale movement will require the participant to learn to manage the generation, storage, and dissipation of forces better than when practicing smaller scale movements. Thus, we predict an advantage for transfer of larger scale movements to smaller scale movements relative to transfer from smaller to larger scale movements. Experiment 2 was conducted to determine if adding a load to a smaller scale movement would enhance later transfer to a larger scale movement sequence. It was hypothesized that the added load would require the participants to consider the dynamics of the movement to a greater extent than without the load. The results replicated earlier findings of effective transfer from large to small movements, but consistent with our hypothesis, transfer was less effective from small to large (Experiment 1). However, when a load was added during acquisition, transfer from small to large was enhanced even though the load was removed during the transfer test. These results are consistent with the notion that the transfer asymmetry noted in Experiment 1 was due to factors related to movement dynamics that were enhanced during practice of the larger scale movement sequence, but not during the practice of the smaller scale movement sequence. The finding that movement structure is unaffected by transfer direction, whereas movement dynamics are influenced by it, is consistent with hierarchical models of sequence production.

  2. The Human Genome Project: big science transforms biology and medicine.

    PubMed

    Hood, Leroy; Rowen, Lee

    2013-01-01

    The Human Genome Project has transformed biology through its integrated big science approach to deciphering a reference human genome sequence along with the complete sequences of key model organisms. The project exemplifies the power, necessity and success of large, integrated, cross-disciplinary efforts - so-called 'big science' - directed towards complex major objectives. In this article, we discuss the ways in which this ambitious endeavor led to the development of novel technologies and analytical tools, and how it brought the expertise of engineers, computer scientists and mathematicians together with biologists. It established an open approach to data sharing and open-source software, thereby making the data resulting from the project accessible to all. The genome sequences of microbes, plants and animals have revolutionized many fields of science, including microbiology, virology, infectious disease and plant biology. Moreover, deeper knowledge of human sequence variation has begun to alter the practice of medicine. The Human Genome Project has inspired subsequent large-scale data acquisition initiatives such as the International HapMap Project, 1000 Genomes, and The Cancer Genome Atlas, as well as the recently announced Human Brain Project and the emerging Human Proteome Project.

  3. The Human Genome Project: big science transforms biology and medicine

    PubMed Central

    2013-01-01

    The Human Genome Project has transformed biology through its integrated big science approach to deciphering a reference human genome sequence along with the complete sequences of key model organisms. The project exemplifies the power, necessity and success of large, integrated, cross-disciplinary efforts - so-called ‘big science’ - directed towards complex major objectives. In this article, we discuss the ways in which this ambitious endeavor led to the development of novel technologies and analytical tools, and how it brought the expertise of engineers, computer scientists and mathematicians together with biologists. It established an open approach to data sharing and open-source software, thereby making the data resulting from the project accessible to all. The genome sequences of microbes, plants and animals have revolutionized many fields of science, including microbiology, virology, infectious disease and plant biology. Moreover, deeper knowledge of human sequence variation has begun to alter the practice of medicine. The Human Genome Project has inspired subsequent large-scale data acquisition initiatives such as the International HapMap Project, 1000 Genomes, and The Cancer Genome Atlas, as well as the recently announced Human Brain Project and the emerging Human Proteome Project. PMID:24040834

  4. A new strategy for genome assembly using short sequence reads and reduced representation libraries.

    PubMed

    Young, Andrew L; Abaan, Hatice Ozel; Zerbino, Daniel; Mullikin, James C; Birney, Ewan; Margulies, Elliott H

    2010-02-01

    We have developed a novel approach for using massively parallel short-read sequencing to generate fast and inexpensive de novo genomic assemblies comparable to those generated by capillary-based methods. The ultrashort (<100 base) sequences generated by this technology pose specific biological and computational challenges for de novo assembly of large genomes. To account for this, we devised a method for experimentally partitioning the genome using reduced representation (RR) libraries prior to assembly. We use two restriction enzymes independently to create a series of overlapping fragment libraries, each containing a tractable subset of the genome. Together, these libraries allow us to reassemble the entire genome without the need for a reference sequence. As proof of concept, we applied this approach to sequence and assemble the majority of the 125-Mb Drosophila melanogaster genome. We subsequently demonstrate the accuracy of our assembly method with meaningful comparisons against the currently available D. melanogaster reference genome (dm3). The ease of assembly and accuracy for comparative genomics suggest that our approach will scale to future mammalian genome-sequencing efforts, saving both time and money without sacrificing quality.
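The reduced-representation idea can be sketched with a toy digest: cut a sequence at a restriction site and keep only fragments inside a size window, as gel size selection would. The sites, window, and sequence below are illustrative only, not the enzymes or parameters the authors used.

```python
import re

def digest(genome: str, site: str) -> list[str]:
    """Cut a sequence at every occurrence of a restriction site.

    Toy model: real enzymes cut within the site at a fixed offset, but
    splitting at the site boundary is enough to show how a reduced
    representation (RR) library partitions a genome.
    """
    return [frag for frag in re.split(site, genome) if frag]

def rr_library(genome: str, site: str, min_len: int, max_len: int) -> list[str]:
    """Keep only fragments in a size window, mimicking gel size selection."""
    return [f for f in digest(genome, site) if min_len <= len(f) <= max_len]

genome = "ACGTGAATTCTTTTGAATTCACGTACGTGAATTCGG"  # made-up sequence
# Two independent digests: different sites expose different, overlapping
# subsets of the genome, which is what lets the libraries be reassembled
# together without a reference.
lib1 = rr_library(genome, "GAATTC", 2, 12)  # EcoRI-like site
lib2 = rr_library(genome, "ACGT", 2, 12)    # hypothetical second site
```

Because each enzyme misses the regions inside its own recognition sites, the two libraries cover complementary stretches; their union approaches full genome coverage.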

  5. TriAnnot: A Versatile and High Performance Pipeline for the Automated Annotation of Plant Genomes

    PubMed Central

    Leroy, Philippe; Guilhot, Nicolas; Sakai, Hiroaki; Bernard, Aurélien; Choulet, Frédéric; Theil, Sébastien; Reboux, Sébastien; Amano, Naoki; Flutre, Timothée; Pelegrin, Céline; Ohyanagi, Hajime; Seidel, Michael; Giacomoni, Franck; Reichstadt, Mathieu; Alaux, Michael; Gicquello, Emmanuelle; Legeai, Fabrice; Cerutti, Lorenzo; Numa, Hisataka; Tanaka, Tsuyoshi; Mayer, Klaus; Itoh, Takeshi; Quesneville, Hadi; Feuillet, Catherine

    2012-01-01

    In support of the international effort to obtain a reference sequence of the bread wheat genome and to provide plant communities dealing with large and complex genomes with a versatile, easy-to-use online automated annotation tool, we have developed the TriAnnot pipeline. Its modular architecture allows for the annotation and masking of transposable elements, the structural and functional annotation of protein-coding genes with evidence-based quality indexing, and the identification of conserved non-coding sequences and molecular markers. The TriAnnot pipeline is parallelized on a 712-CPU computing cluster that can run a 1-Gb sequence annotation in less than 5 days. It is accessible through a web interface for small-scale analyses or through a server for large-scale annotations. The performance of TriAnnot was evaluated in terms of sensitivity, specificity, and general fitness using curated reference sequence sets from rice and wheat. In less than 8 h, TriAnnot was able to predict more than 83% of the 3,748 CDS from rice chromosome 1 with a fitness of 67.4%. On a set of 12 reference Mb-sized contigs from wheat chromosome 3B, TriAnnot predicted and annotated 93.3% of the genes, among which 54% were perfectly identified in accordance with the reference annotation. It also allowed the curation of 12 genes based on new biological evidence, increasing the percentage of perfect gene predictions to 63%. TriAnnot systematically showed a higher fitness than other annotation pipelines that are not optimized for wheat. As it is easily adaptable to the annotation of other plant genomes, TriAnnot should become a useful resource for the annotation of large and complex genomes in the future. PMID:22645565

  6. Cloud computing for genomic data analysis and collaboration.

    PubMed

    Langmead, Ben; Nellore, Abhinav

    2018-04-01

    Next-generation sequencing has made major strides in the past decade. Studies based on large sequencing data sets are growing in number, and public archives for raw sequencing data have been doubling in size every 18 months. Leveraging these data requires researchers to use large-scale computational resources. Cloud computing, a model whereby users rent computers and storage from large data centres, is a solution that is gaining traction in genomics research. Here, we describe how cloud computing is used in genomics for research and large-scale collaborations, and argue that its elasticity, reproducibility and privacy features make it ideally suited for the large-scale reanalysis of publicly available archived data, including privacy-protected data.

  7. PinAPL-Py: A comprehensive web-application for the analysis of CRISPR/Cas9 screens.

    PubMed

    Spahn, Philipp N; Bath, Tyler; Weiss, Ryan J; Kim, Jihoon; Esko, Jeffrey D; Lewis, Nathan E; Harismendy, Olivier

    2017-11-20

    Large-scale genetic screens using CRISPR/Cas9 technology have emerged as a major tool for functional genomics. As the approach has grown in popularity, experimental biologists frequently acquire large sequencing datasets for which they often do not have an easy analysis option. While a few bioinformatic tools have been developed for this purpose, their utility is still hindered either by limited functionality or by the requirement of bioinformatic expertise. To make sequencing data analysis of CRISPR/Cas9 screens more accessible to a wide range of scientists, we developed a Platform-independent Analysis of Pooled Screens using Python (PinAPL-Py), which is operated as an intuitive web service. PinAPL-Py implements state-of-the-art tools and statistical models, assembled in a comprehensive workflow covering sequence quality control, automated sgRNA sequence extraction, alignment, sgRNA enrichment/depletion analysis and gene ranking. The workflow is set up to use a variety of popular sgRNA libraries as well as custom libraries that can be easily uploaded. Various analysis options are offered, suitable to analyze a large variety of CRISPR/Cas9 screening experiments. Analysis output includes ranked lists of sgRNAs and genes, and publication-ready plots. PinAPL-Py helps to advance genome-wide screening efforts by combining comprehensive functionality with user-friendly implementation. PinAPL-Py is freely accessible at http://pinapl-py.ucsd.edu with instructions and test datasets.
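The enrichment/depletion step of such a workflow can be illustrated with a minimal read-count sketch: normalize sgRNA counts to reads per million and take a log2 fold change between treated and control samples. PinAPL-Py's actual statistics are more sophisticated; the pseudocount, normalization, and sgRNA names here are assumptions for illustration only.

```python
import math

def enrichment(counts_treated: dict, counts_control: dict, pseudo: float = 1.0) -> dict:
    """log2 fold change per sgRNA after reads-per-million normalization.

    A pseudocount avoids division by zero for dropout guides. This is
    only the core idea behind enrichment/depletion scoring, not the
    model any particular tool ships with.
    """
    t_total = sum(counts_treated.values())
    c_total = sum(counts_control.values())
    lfc = {}
    for sgrna in counts_control:
        t = (counts_treated.get(sgrna, 0) + pseudo) / t_total * 1e6
        c = (counts_control[sgrna] + pseudo) / c_total * 1e6
        lfc[sgrna] = math.log2(t / c)
    return lfc

# Hypothetical counts: sgA enriched under selection, sgC depleted.
control = {"sgA": 100, "sgB": 100, "sgC": 100}
treated = {"sgA": 400, "sgB": 100, "sgC": 25}
scores = enrichment(treated, control)
```

Ranking guides by this score, then aggregating guide scores per gene, yields the kind of ranked sgRNA and gene lists the abstract describes.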

  8. National Science Foundation-sponsored workshop report. Draft plan for soybean genomics.

    PubMed

    Stacey, Gary; Vodkin, Lila; Parrott, Wayne A; Shoemaker, Randy C

    2004-05-01

    Recent efforts to coordinate and define a research strategy for soybean (Glycine max) genomics began with the establishment of a Soybean Genetics Executive Committee, which will serve as a communication focal point between the soybean research community and granting agencies. Secondly, a workshop was held to define a strategy to incorporate existing tools into a framework for advancing soybean genomics research. This workshop identified and ranked research priorities essential to making more informed decisions as to how to proceed with large scale sequencing and other genomics efforts. Most critical among these was the need to finalize a physical map and to obtain a better understanding of genome microstructure. Addressing these research needs will require pilot work on new technologies to demonstrate an ability to discriminate between recently duplicated regions in the soybean genome and pilot projects to analyze an adequate amount of random genome sequence to identify and catalog common repeats. The development of additional markers, reverse genetics tools, and bioinformatics is also necessary. Successful implementation of these goals will require close coordination among various working groups.

  9. Sequencing Data Discovery and Integration for Earth System Science with MetaSeek

    NASA Astrophysics Data System (ADS)

    Hoarfrost, A.; Brown, N.; Arnosti, C.

    2017-12-01

    Microbial communities play a central role in biogeochemical cycles. Sequencing data resources from environmental sources have grown exponentially in recent years, and represent a singular opportunity to investigate microbial interactions with Earth system processes. Carrying out such meta-analyses depends on our ability to discover and curate sequencing data into large-scale integrated datasets. However, such integration efforts are currently challenging and time-consuming, with sequencing data scattered across multiple repositories and metadata that is not easily or comprehensively searchable. MetaSeek is a sequencing data discovery tool that integrates sequencing metadata from all the major data repositories, allowing the user to search and filter on datasets in a lightweight application with an intuitive, easy-to-use web-based interface. Users can save and share curated datasets, while other users can browse these data integrations or use them as a jumping off point for their own curation. Missing and/or erroneous metadata are inferred automatically where possible, and where not possible, users are prompted to contribute to the improvement of the sequencing metadata pool by correcting and amending metadata errors. Once an integrated dataset has been curated, users can follow simple instructions to download their raw data and quickly begin their investigations. In addition to the online interface, the MetaSeek database is easily queryable via an open API, further enabling users and facilitating integrations of MetaSeek with other data curation tools. This tool lowers the barriers to curation and integration of environmental sequencing data, clearing the path forward to illuminating the ecosystem-scale interactions between biological and abiotic processes.

  10. 2012 U.S. Department of Energy: Joint Genome Institute: Progress Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gilbert, David

    2013-01-01

    The mission of the U.S. Department of Energy Joint Genome Institute (DOE JGI) is to serve the diverse scientific community as a user facility, enabling the application of large-scale genomics and analysis of plants, microbes, and communities of microbes to address the DOE mission goals in bioenergy and the environment. The DOE JGI's sequencing efforts fall under the Eukaryote Super Program, which includes the Plant and Fungal Genomics Programs; and the Prokaryote Super Program, which includes the Microbial Genomics and Metagenomics Programs. In 2012, several projects made news for their contributions to energy and environment research.

  11. MAINTAINING DATA QUALITY IN THE PERFORMANCE OF A LARGE SCALE INTEGRATED MONITORING EFFORT

    EPA Science Inventory

    Macauley, John M. and Linda C. Harwell. In press. Maintaining Data Quality in the Performance of a Large Scale Integrated Monitoring Effort (Abstract). To be presented at EMAP Symposium 2004: Integrated Monitoring and Assessment for Effective Water Quality Management, 3-7 May 200...

  12. The Use of Weighted Graphs for Large-Scale Genome Analysis

    PubMed Central

    Zhou, Fang; Toivonen, Hannu; King, Ross D.

    2014-01-01

    There is an acute need for better tools to extract knowledge from the growing flood of sequence data. For example, thousands of complete genomes have been sequenced, and their metabolic networks inferred. Such data should enable a better understanding of evolution. However, most existing network analysis methods are based on pair-wise comparisons, and these do not scale to thousands of genomes. Here we propose the use of weighted graphs as a data structure to enable large-scale phylogenetic analysis of networks. We have developed three types of weighted graph for enzymes: taxonomic (these summarize phylogenetic importance), isoenzymatic (these summarize enzymatic variety/redundancy), and sequence-similarity (these summarize sequence conservation); and we applied these types of weighted graph to survey prokaryotic metabolism. To demonstrate the utility of this approach we have compared and contrasted the large-scale evolution of metabolism in Archaea and Eubacteria. Our results provide evidence for limits to the contingency of evolution. PMID:24619061
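One way to picture the weighted-graph idea is an enzyme co-occurrence graph: nodes are enzymes, and an edge weight counts the genomes in which both enzymes appear, so thousands of genomes collapse into a single structure and downstream analysis no longer requires pairwise genome comparisons. This is a simplified stand-in for the paper's taxonomic weighting, with made-up genome names and EC numbers.

```python
from collections import Counter
from itertools import combinations

def build_weighted_graph(genomes: dict) -> Counter:
    """Collapse many genomes into one weighted enzyme co-occurrence graph.

    genomes maps genome name -> set of enzyme (EC) identifiers.
    Edge weight = number of genomes containing both enzymes, so the
    result summarizes the whole collection in one pass.
    """
    edges = Counter()
    for enzymes in genomes.values():
        # sorted() gives each unordered pair one canonical key
        for a, b in combinations(sorted(enzymes), 2):
            edges[(a, b)] += 1
    return edges

genomes = {
    "g1": {"EC1.1.1.1", "EC2.7.1.1"},
    "g2": {"EC1.1.1.1", "EC2.7.1.1", "EC4.1.2.13"},
    "g3": {"EC2.7.1.1", "EC4.1.2.13"},
}
graph = build_weighted_graph(genomes)
```

Building the graph is linear in the total enzyme content of the collection, which is what makes the approach tractable for thousands of genomes.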

  13. PLAN-IT: Knowledge-Based Mission Sequencing

    NASA Astrophysics Data System (ADS)

    Biefeld, Eric W.

    1987-02-01

    Mission sequencing consumes a large amount of time and manpower during a space exploration effort. Plan-It is a knowledge-based approach to assist in mission sequencing. Plan-It uses a combined frame and blackboard architecture. This paper reports on the approach implemented by Plan-It and the current applications of Plan-It for sequencing at NASA.

  14. HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing.

    PubMed

    Wan, Shixiang; Zou, Quan

    2017-01-01

    Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. The extreme growth of next-generation sequencing output has created a shortage of efficient approaches for aligning ultra-large sets of biological sequences of different types. Distributed and parallel computing is a crucial technique for accelerating ultra-large (e.g., files larger than 1 GB) sequence analyses. Based on HAlign and the Spark distributed computing system, we implemented the highly cost-efficient and time-efficient HAlign-II tool to address ultra-large multiple biological sequence alignment and phylogenetic tree construction. Experiments on DNA and protein data sets larger than 1 GB showed that HAlign-II saves both time and space and outperforms current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences, shows extremely high memory efficiency, and scales well with increases in computing resources. HAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II, with open-source code and datasets, is available at http://lab.malab.cn/soft/halign.
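The parallelism that such tools exploit is easy to picture: pairwise sequence scores are independent of one another, so they can be farmed out to workers. The sketch below uses a Python thread pool in place of HAlign-II's Spark cluster and an ungapped Hamming distance in place of real alignment; both are deliberate simplifications.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

def hamming(a: str, b: str) -> int:
    """Mismatch count between equal-length sequences (no gaps)."""
    return sum(x != y for x, y in zip(a, b))

def pairwise_distances(seqs: dict, workers: int = 4) -> dict:
    """Score all sequence pairs concurrently.

    Each pair is an independent task, which is exactly the property a
    distributed system like Spark exploits at cluster scale.
    """
    pairs = list(combinations(sorted(seqs), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        dists = pool.map(lambda p: hamming(seqs[p[0]], seqs[p[1]]), pairs)
    return dict(zip(pairs, dists))

seqs = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TCGTACGA"}
d = pairwise_distances(seqs)
```

The resulting distance matrix is the usual input to distance-based phylogenetic tree construction (e.g., neighbor joining).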

  15. Rainbow: a tool for large-scale whole-genome sequencing data analysis using cloud computing.

    PubMed

    Zhao, Shanrong; Prenger, Kurt; Smith, Lance; Messina, Thomas; Fan, Hongtao; Jaeger, Edward; Stephens, Susan

    2013-06-27

    Technical improvements have decreased sequencing costs and, as a result, the size and number of genomic datasets have increased rapidly. Because of the lower cost, large amounts of sequence data are now being produced by small to midsize research groups. Crossbow is a software tool that can detect single nucleotide polymorphisms (SNPs) in whole-genome sequencing (WGS) data from a single subject; however, Crossbow has a number of limitations when applied to multiple subjects from large-scale WGS projects. The data storage and CPU resources that are required for large-scale whole genome sequencing data analyses are too large for many core facilities and individual laboratories to provide. To help meet these challenges, we have developed Rainbow, a cloud-based software package that can assist in the automation of large-scale WGS data analyses. Here, we evaluated the performance of Rainbow by analyzing 44 different whole-genome-sequenced subjects. Rainbow has the capacity to process genomic data from more than 500 subjects in two weeks using cloud computing provided by Amazon Web Services. The time includes the import and export of the data using the Amazon Import/Export service. The average cost of processing a single sample in the cloud was less than 120 US dollars. Compared with Crossbow, the main improvements incorporated into Rainbow include the ability: (1) to handle BAM as well as FASTQ input files; (2) to split large sequence files for better load balance downstream; (3) to log the running metrics in data processing and monitor multiple Amazon Elastic Compute Cloud (EC2) instances; and (4) to merge SOAPsnp outputs for multiple individuals into a single file to facilitate downstream genome-wide association studies. Rainbow is a scalable, cost-effective, and open-source tool for large-scale WGS data analysis. For human WGS data sequenced by either the Illumina HiSeq 2000 or HiSeq 2500 platforms, Rainbow can be used straight out of the box. Rainbow is available for third-party implementation and use, and can be downloaded from http://s3.amazonaws.com/jnj_rainbow/index.html.
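Improvement (2), splitting large sequence files for better load balance, hinges on one detail: a FASTQ record is exactly four lines, so splits must land on four-line boundaries or records are corrupted. A minimal sketch of that idea (not Rainbow's actual splitter):

```python
def split_fastq(lines: list, chunk_reads: int):
    """Split FASTQ lines into chunks of at most chunk_reads reads.

    Stepping in strides of 4 keeps every record (header, sequence,
    separator, qualities) intact, so each chunk can be shipped to a
    separate worker and aligned independently.
    """
    chunk = []
    for i in range(0, len(lines), 4):
        chunk.extend(lines[i:i + 4])
        if len(chunk) == chunk_reads * 4:
            yield chunk
            chunk = []
    if chunk:  # final, possibly smaller chunk
        yield chunk

# Five toy reads; with chunk_reads=2 we expect chunks of 2, 2, and 1 reads.
fastq = []
for n in range(5):
    fastq += [f"@read{n}", "ACGT", "+", "IIII"]
chunks = list(split_fastq(fastq, 2))
```

A production splitter would stream from disk rather than hold the file in memory, but the boundary logic is the same.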

  16. Rainbow: a tool for large-scale whole-genome sequencing data analysis using cloud computing

    PubMed Central

    2013-01-01

    Background Technical improvements have decreased sequencing costs and, as a result, the size and number of genomic datasets have increased rapidly. Because of the lower cost, large amounts of sequence data are now being produced by small to midsize research groups. Crossbow is a software tool that can detect single nucleotide polymorphisms (SNPs) in whole-genome sequencing (WGS) data from a single subject; however, Crossbow has a number of limitations when applied to multiple subjects from large-scale WGS projects. The data storage and CPU resources that are required for large-scale whole genome sequencing data analyses are too large for many core facilities and individual laboratories to provide. To help meet these challenges, we have developed Rainbow, a cloud-based software package that can assist in the automation of large-scale WGS data analyses. Results Here, we evaluated the performance of Rainbow by analyzing 44 different whole-genome-sequenced subjects. Rainbow has the capacity to process genomic data from more than 500 subjects in two weeks using cloud computing provided by Amazon Web Services. The time includes the import and export of the data using the Amazon Import/Export service. The average cost of processing a single sample in the cloud was less than 120 US dollars. Compared with Crossbow, the main improvements incorporated into Rainbow include the ability: (1) to handle BAM as well as FASTQ input files; (2) to split large sequence files for better load balance downstream; (3) to log the running metrics in data processing and monitor multiple Amazon Elastic Compute Cloud (EC2) instances; and (4) to merge SOAPsnp outputs for multiple individuals into a single file to facilitate downstream genome-wide association studies. Conclusions Rainbow is a scalable, cost-effective, and open-source tool for large-scale WGS data analysis. For human WGS data sequenced by either the Illumina HiSeq 2000 or HiSeq 2500 platforms, Rainbow can be used straight out of the box. Rainbow is available for third-party implementation and use, and can be downloaded from http://s3.amazonaws.com/jnj_rainbow/index.html. PMID:23802613

  17. The Need for Large-Scale, Longitudinal Empirical Studies in Middle Level Education Research

    ERIC Educational Resources Information Center

    Mertens, Steven B.; Caskey, Micki M.; Flowers, Nancy

    2016-01-01

    This essay describes and discusses the ongoing need for large-scale, longitudinal, empirical research studies focused on middle grades education. After a statement of the problem and concerns, the essay describes and critiques several prior middle grades efforts and research studies. Recommendations for future research efforts to inform policy…

  18. DOE Joint Genome Institute 2008 Progress Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gilbert, David

    2009-03-12

    While initially a virtual institute, the driving force behind the creation of the DOE Joint Genome Institute in Walnut Creek, California in the Fall of 1999 was the Department of Energy's commitment to sequencing the human genome. With the publication in 2004 of a trio of manuscripts describing the finished 'DOE Human Chromosomes', the Institute successfully completed its human genome mission. In the time between the creation of the Department of Energy Joint Genome Institute (DOE JGI) and completion of the Human Genome Project, sequencing and its role in biology spread to fields extending far beyond what could be imagined when the Human Genome Project first began. Accordingly, the targets of the DOE JGI's sequencing activities changed, moving from a single human genome to the genomes of large numbers of microbes, plants, and other organisms, and the community of users of DOE JGI data similarly expanded and diversified. Transitioning into operating as a user facility, the DOE JGI modeled itself after other DOE user facilities, such as synchrotron light sources and supercomputer facilities, empowering the science of large numbers of investigators working in areas of relevance to energy and the environment. The JGI's approach to being a user facility is based on the concept that by focusing state-of-the-art sequencing and analysis capabilities on the best peer-reviewed ideas drawn from a broad community of scientists, the DOE JGI will effectively encourage creative approaches to DOE mission areas and produce important science. This clearly has occurred, only partially reflected in the fact that the DOE JGI has played a major role in more than 45 papers published in just the past three years alone in Nature and Science. The involvement of a large and engaged community of users working on important problems has helped maximize the impact of JGI science. A seismic technological change is presently underway at the JGI. 
The Sanger capillary-based sequencing process that dominated how sequencing was done in the last decade is being replaced by a variety of new processes and sequencing instruments. The JGI, with an increasing number of next-generation sequencers, whose throughput is 100- to 1,000-fold greater than the Sanger capillary-based sequencers, is increasingly focused in new directions on projects of scale and complexity not previously attempted. These new directions for the JGI come, in part, from the 2008 National Research Council report on the goals of the National Plant Genome Initiative as well as the 2007 National Research Council report on the New Science of Metagenomics. Both reports outline a crucial need for systematic large-scale surveys of the plant and microbial components of the biosphere as well as an increasing need for large-scale analysis capabilities to meet the challenge of converting sequence data into knowledge. The JGI is extensively discussed in both reports as vital to progress in these fields of major national interest. JGI's future plan for plants and microbes includes a systematic approach for investigation of these organisms at a scale requiring the special capabilities of the JGI to generate, manage, and analyze the datasets. JGI will generate and provide not only community access to these plant and microbial datasets, but also the tools for analyzing them. These activities will produce essential knowledge that will be needed if we are to be able to respond to the world's energy and environmental challenges. As the JGI Plant and Microbial programs advance, the JGI as a user facility is also evolving. The Institute has been highly successful in bending its technical and analytical skills to help users solve large complex problems of major importance, and that effort will continue unabated. 
    The JGI will increasingly move from a central focus on 'one-off' user projects coming from small user communities to much larger scale projects driven by systematic and problem-focused approaches to selection of sequencing targets. Entire communities of scientists working in a particular field, such as feedstock improvement or biomass degradation, will be users of this information. Despite this new emphasis, an investigator-initiated user program will remain. This program in the future will replace small projects that increasingly can be accomplished without the involvement of JGI, with imaginative large-scale 'Grand Challenge' projects of foundational relevance to energy and the environment that require a new scale of sequencing and analysis capabilities. Close interactions with the DOE Bioenergy Research Centers, and with other DOE institutions that may follow, will also play a major role in shaping aspects of how the JGI operates as a user facility. Based on increased availability of high-throughput sequencing, the JGI will increasingly provide to users, in addition to DNA sequencing, an array of both pre- and post-sequencing value-added capabilities to accelerate their science.

  19. The World Health Organization Global Programme on AIDS proposal for standardization of HIV sequence nomenclature. WHO Network for HIV Isolation and Characterization.

    PubMed

    Korber, B T; Osmanov, S; Esparza, J; Myers, G

    1994-11-01

    The World Health Organization Global Programme on AIDS (WHO/GPA) is conducting a large-scale collaborative study of human immunodeficiency virus type 1 (HIV-1) variation, based in four potential vaccine-trial site countries: Brazil, Rwanda, Thailand, and Uganda. Through the course of this study, it was crucial to keep track of certain attributes of the samples from which the viral nucleotide sequences were derived (e.g., country of origin and viral culture characterization), so that meaningful sequence comparisons could be made. Here we describe a system developed in the context of the WHO/GPA study that summarizes such critical attributes by representing them as standardized characters directly incorporated into sequence names. This nomenclature allows linkage of clinical, phenotypic, and geographic information with molecular data. We propose that other investigators involved in human immunodeficiency virus (HIV) nucleotide sequencing efforts adopt a similar standardized sequence nomenclature to facilitate cross-study sequence comparison. HIV sequence data are being generated at an ever-increasing rate; directly coupled to this increase is our deepening understanding of biological parameters that influence or result from sequence variability. A standardized sequence nomenclature that includes relevant biological information would enable researchers to better utilize the growing body of sequence data, and enhance their ability to interpret the biological implications of their own data through facilitating comparisons with previously published work.
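The core idea, encoding sample attributes as fixed, delimited fields of the sequence name itself so they travel with the sequence through any alignment or database, can be sketched as a pair of encode/parse helpers. The field order and codes below are hypothetical illustrations, not the actual WHO/GPA character definitions.

```python
def make_name(isolate: str, country: str, year: int, culture: str) -> str:
    """Encode sample attributes as fixed fields in a sequence name.

    Fields (hypothetical scheme): ISO-style country code, two-digit
    year, isolate identifier, one-letter culture-status flag.
    """
    return f"{country}.{year % 100:02d}.{isolate}.{culture}"

def parse_name(name: str) -> dict:
    """Recover the encoded attributes from a standardized name."""
    country, yy, isolate, culture = name.split(".")
    return {"country": country, "year": int(yy),
            "isolate": isolate, "culture": culture}

# "P" as a culture flag is a made-up example code.
name = make_name("HIV1234", "BR", 1993, "P")
info = parse_name(name)
```

Because every field sits at a fixed position, sequences from different studies can be grouped by country, year, or culture status without consulting an external metadata table, which is precisely the cross-study comparability the proposal aims for.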

  20. ISS-based Development of Elements and Operations for Robotic Assembly of A Space Solar Power Collector

    NASA Technical Reports Server (NTRS)

    Valinia, Azita; Moe, Rud; Seery, Bernard D.; Mankins, John C.

    2013-01-01

    We present a concept for an ISS-based optical system assembly demonstration designed to advance technologies related to future large in-space optical facilities deployment, including space solar power collectors and large-aperture astronomy telescopes. The large solar power collector problem is not unlike the large astronomical telescope problem, but it should, at least in principle, be easier, given the tolerances involved. We strive in this application to leverage heavily the work done on the NASA Optical Testbed Integration on ISS Experiment (OpTIIX) effort to erect a 1.5 m imaging telescope on the International Space Station (ISS). Specifically, we examine a robotic assembly sequence for constructing a large (meter-diameter) slightly aspheric or spherical primary reflector, composed of hexagonal mirror segments affixed to a lightweight rigidizing backplane structure. This approach, together with a structured robot assembler, will be shown to be scalable to the area and areal densities required for large-scale solar concentrator arrays.

  1. Genetics of Resistant Hypertension: the Missing Heritability and Opportunities.

    PubMed

    Teixeira, Samantha K; Pereira, Alexandre C; Krieger, Jose E

    2018-05-19

    Blood pressure regulation in humans has long been known to be a genetically determined trait. Efforts to identify causal genetic modulators of this trait have so far been largely unfruitful. Despite the recent advances of genome-wide genetic studies, loci associated with hypertension or blood pressure still explain a very low percentage of the overall variation of blood pressure in the general population. This has precluded the translation of discoveries in the genetics of human hypertension to clinical use. Here, we propose the combined use of resistant hypertension as a trait for mapping genetic determinants in humans and the integration of new large-scale technologies to approach the multidimensional nature of the problem in model systems. New large-scale efforts in the genetic and genomic arenas are paving the way for an increased and granular understanding of the genetic determinants of hypertension. New technologies for whole-genome sequencing and large-scale forward genetic screens can help prioritize genes and gene pathways for downstream characterization and large-scale population studies, and guided pharmacological design can be used to drive discoveries to translational application through better risk stratification and new therapeutic approaches. Although significant challenges remain in the mapping and identification of genetic determinants of hypertension, new large-scale technological approaches have been proposed to surpass some of the shortcomings that have limited progress in the area for the last three decades. The incorporation of these technologies into hypertension research may significantly help in understanding inter-individual blood pressure variation and in deploying new phenotyping and treatment approaches for the condition.

  2. Large-scale sequencing trials begin

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Roberts, L.

    1990-12-07

    As genome sequencing gets under way, investigators are grappling not just with new techniques but also with questions about what is acceptable accuracy and when data should be released. Four groups are embarking on projects that could make or break the human genome project. They are setting out to sequence the longest stretches of DNA ever tackled, several million bases each, and to do it faster and cheaper than anyone has before. If these groups can't pull it off, then prospects for knocking off the entire human genome, all 3 billion bases, in 15 years and for $3 billion will look increasingly unlikely. Harvard's Walter Gilbert is first tackling the genome of Mycoplasma capricolum. At Stanford, David Botstein and Ron Davis are sequencing Saccharomyces cerevisiae. In a collaborative effort, Robert Waterston at Washington University and John Sulston at the Medical Research Council lab in Cambridge, England, have already started on the nematode Caenorhabditis elegans. And in the only longstanding project of the bunch, University of Wisconsin geneticist Fred Blattner is already several hundred kilobases into the Escherichia coli genome.

  3. GAP Final Technical Report 12-14-04

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Andrew J. Bordner, PhD, Senior Research Scientist

    2004-12-14

    The Genomics Annotation Platform (GAP) was designed to develop new tools for high-throughput functional annotation and characterization of protein sequences and structures resulting from genomics and structural proteomics, and to benchmark and apply those tools. Furthermore, this platform integrated genome-scale sequence and structural analysis and prediction tools with the advanced structure prediction and bioinformatics environment of ICM. The development of GAP was primarily oriented towards the annotation of new biomolecular structures using both structural and sequence data. Even though the amount of protein X-ray crystal data is growing exponentially, the volume of sequence data is growing even more rapidly. This trend was exploited by leveraging the wealth of sequence data to provide functional annotation for protein structures. The additional information provided by GAP is expected to assist the majority of the commercial users of ICM, who are involved in drug discovery, in identifying promising drug targets as well as in devising strategies for the rational design of therapeutics directed at the protein of interest. GAP also provided valuable tools for biochemistry education and structural genomics centers. In addition, GAP incorporates many novel prediction and analysis methods not available in other molecular modeling packages. This development led to signing the first Molsoft agreement in the structural genomics annotation area, with the University of Oxford Structural Genomics Center. This commercial agreement validated the Molsoft efforts under the GAP project and provided the basis for further development of the large-scale functional annotation platform.

  4. Precision medicine in the age of big data: The present and future role of large-scale unbiased sequencing in drug discovery and development.

    PubMed

    Vicini, P; Fields, O; Lai, E; Litwack, E D; Martin, A-M; Morgan, T M; Pacanowski, M A; Papaluca, M; Perez, O D; Ringel, M S; Robson, M; Sakul, H; Vockley, J; Zaks, T; Dolsten, M; Søgaard, M

    2016-02-01

    High-throughput molecular and functional profiling of patients is a key driver of precision medicine. DNA and RNA characterization has been enabled at unprecedented cost and scale through rapid, disruptive progress in sequencing technology, but challenges persist in data management and interpretation. We analyze the state of the art of large-scale unbiased sequencing (LUS) in drug discovery and development, including technology, application, ethical, regulatory, policy and commercial considerations, and discuss issues of LUS implementation in clinical and regulatory practice. © 2015 American Society for Clinical Pharmacology and Therapeutics.

  5. Genetics of coronary artery disease: discovery, biology and clinical translation

    PubMed Central

    Khera, Amit V.; Kathiresan, Sekar

    2018-01-01

    Coronary artery disease (CAD) is the leading global cause of mortality. The disease has long been recognized to be heritable, and recent advances have started to unravel its genetic architecture. Common variant association studies have linked about 60 genetic loci to coronary risk. Large-scale gene sequencing efforts and functional studies have facilitated a better understanding of causal risk factors, elucidated underlying biology and informed the development of new therapeutics. Moving forward, genetic testing could enable precision medicine approaches by identifying subgroups of patients at increased risk of CAD or those with a specific driving pathophysiology in whom a therapeutic or preventive approach is most useful. PMID:28286336

  6. What does physics have to do with cancer?

    PubMed Central

    Michor, Franziska; Liphardt, Jan; Ferrari, Mauro; Widom, Jonathan

    2013-01-01

    Large-scale cancer genomics, proteomics and RNA-sequencing efforts are currently mapping in fine detail the genetic and biochemical alterations that occur in cancer. However, it is becoming clear that it is difficult to integrate and interpret these data and to translate them into treatments. This difficulty is compounded by the recognition that cancer cells evolve, and that initiation, progression and metastasis are influenced by a wide variety of factors. To help tackle this challenge, the US National Cancer Institute Physical Sciences-Oncology Centers initiative is bringing together physicists, cancer biologists, chemists, mathematicians and engineers. How are we beginning to address cancer from the perspective of the physical sciences? PMID:21850037

  7. The Comprehensive Microbial Resource.

    PubMed

    Peterson, J D; Umayam, L A; Dickinson, T; Hickey, E K; White, O

    2001-01-01

    One challenge presented by large-scale genome sequencing efforts is effective display of uniform information to the scientific community. The Comprehensive Microbial Resource (CMR) contains robust annotation of all complete microbial genomes and allows for a wide variety of data retrievals. The bacterial information has been placed on the Web at http://www.tigr.org/CMR for retrieval using standard web browsing technology. Retrievals can be based on protein properties such as molecular weight or hydrophobicity, GC-content, functional role assignments and taxonomy. The CMR also has special web-based tools to allow data mining using pre-run homology searches, whole genome dot-plots, batch downloading and traversal across genomes using a variety of datatypes.
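    The property-based retrievals the CMR describes (molecular weight, hydrophobicity, GC-content, taxonomy) amount to filtering records on computed attributes. As a minimal, hypothetical sketch of one such query (the genome names and sequences below are invented, not CMR data), a GC-content filter might look like:

    ```python
    def gc_content(seq: str) -> float:
        """Fraction of G and C bases in a DNA sequence."""
        seq = seq.upper()
        if not seq:
            return 0.0
        return (seq.count("G") + seq.count("C")) / len(seq)

    # Invented example records standing in for genome entries:
    genomes = {
        "genome_A": "ATGCGCGCTAGC",  # GC-rich
        "genome_B": "ATATATATTA",    # AT-rich
    }

    # Retrieve the genomes whose GC-content exceeds 50%:
    high_gc = [name for name, seq in genomes.items() if gc_content(seq) > 0.5]
    ```

    A real resource would evaluate such predicates server-side over a database rather than in application code, but the retrieval semantics are the same.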

  8. Genomic Encyclopedia of Type Strains, Phase I: The one thousand microbial genomes (KMG-I) project

    DOE PAGES

    Kyrpides, Nikos C.; Woyke, Tanja; Eisen, Jonathan A.; ...

    2014-06-15

    The Genomic Encyclopedia of Bacteria and Archaea (GEBA) project was launched by the JGI in 2007 as a pilot project with the objective of sequencing 250 bacterial and archaeal genomes. The two major goals of that project were (a) to test the hypothesis that there are many benefits to using the phylogenetic diversity of organisms in the tree of life as a primary criterion for generating their genome sequences and (b) to develop the necessary framework, technology and organization for large-scale sequencing of microbial isolate genomes. While the GEBA pilot project has not yet been entirely completed, both of the original goals have already been successfully accomplished, leading the way for the next phase of the project. Here we propose taking the GEBA project to the next level, by generating high quality draft genomes for 1,000 bacterial and archaeal strains. This represents a combined 16-fold increase in both scale and speed as compared to the GEBA pilot project (250 isolate genomes in 4+ years). We will follow a similar approach for organism selection and sequencing prioritization as was done for the GEBA pilot project (i.e. phylogenetic novelty, availability and growth of cultures of type strains and DNA extraction capability), focusing on type strains as this ensures reproducibility of our results and provides the strongest linkage between genome sequences and other knowledge about each strain. In turn, this project will constitute a pilot phase of a larger effort that will target the genome sequences of all available type strains of the Bacteria and Archaea.

  9. Genomic Encyclopedia of Type Strains, Phase I: The one thousand microbial genomes (KMG-I) project

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kyrpides, Nikos C.; Woyke, Tanja; Eisen, Jonathan A.

    The Genomic Encyclopedia of Bacteria and Archaea (GEBA) project was launched by the JGI in 2007 as a pilot project with the objective of sequencing 250 bacterial and archaeal genomes. The two major goals of that project were (a) to test the hypothesis that there are many benefits to using the phylogenetic diversity of organisms in the tree of life as a primary criterion for generating their genome sequences and (b) to develop the necessary framework, technology and organization for large-scale sequencing of microbial isolate genomes. While the GEBA pilot project has not yet been entirely completed, both of the original goals have already been successfully accomplished, leading the way for the next phase of the project. Here we propose taking the GEBA project to the next level, by generating high quality draft genomes for 1,000 bacterial and archaeal strains. This represents a combined 16-fold increase in both scale and speed as compared to the GEBA pilot project (250 isolate genomes in 4+ years). We will follow a similar approach for organism selection and sequencing prioritization as was done for the GEBA pilot project (i.e. phylogenetic novelty, availability and growth of cultures of type strains and DNA extraction capability), focusing on type strains as this ensures reproducibility of our results and provides the strongest linkage between genome sequences and other knowledge about each strain. In turn, this project will constitute a pilot phase of a larger effort that will target the genome sequences of all available type strains of the Bacteria and Archaea.

  10. Exploring the feasibility of using copy number variants as genetic markers through large-scale whole genome sequencing experiments

    USDA-ARS?s Scientific Manuscript database

    Copy number variants (CNV) are large scale duplications or deletions of genomic sequence that are caused by a diverse set of molecular phenomena that are distinct from single nucleotide polymorphism (SNP) formation. Due to their different mechanisms of formation, CNVs are often difficult to track us...

  11. DISRUPTION OF LARGE-SCALE NEURAL NETWORKS IN NON-FLUENT/AGRAMMATIC VARIANT PRIMARY PROGRESSIVE APHASIA ASSOCIATED WITH FRONTOTEMPORAL DEGENERATION PATHOLOGY

    PubMed Central

    Grossman, Murray; Powers, John; Ash, Sherry; McMillan, Corey; Burkholder, Lisa; Irwin, David; Trojanowski, John Q.

    2012-01-01

    Non-fluent/agrammatic primary progressive aphasia (naPPA) is a progressive neurodegenerative condition most prominently associated with slowed, effortful speech. A clinical imaging marker of naPPA is disease centered in the left inferior frontal lobe. We used multimodal imaging to assess large-scale neural networks underlying effortful expression in 15 patients with sporadic naPPA due to frontotemporal lobar degeneration (FTLD) spectrum pathology. Effortful speech in these patients is related in part to impaired grammatical processing, and to phonologic speech errors. Gray matter (GM) imaging shows frontal and anterior-superior temporal atrophy, most prominently in the left hemisphere. Diffusion tensor imaging reveals reduced fractional anisotropy in several white matter (WM) tracts mediating projections between left frontal and other GM regions. Regression analyses suggest disruption of three large-scale GM-WM neural networks in naPPA that support fluent, grammatical expression. These findings emphasize the role of large-scale neural networks in language, and demonstrate associated language deficits in naPPA. PMID:23218686

  12. Mantis: A Fast, Small, and Exact Large-Scale Sequence-Search Index.

    PubMed

    Pandey, Prashant; Almodaresi, Fatemeh; Bender, Michael A; Ferdman, Michael; Johnson, Rob; Patro, Rob

    2018-06-18

    Sequence-level searches on large collections of RNA sequencing experiments, such as the NCBI Sequence Read Archive (SRA), would enable one to ask many questions about the expression or variation of a given transcript in a population. Existing approaches, such as the sequence Bloom tree, suffer from fundamental limitations of the Bloom filter, resulting in slow build and query times, less-than-optimal space usage, and potentially large numbers of false-positives. This paper introduces Mantis, a space-efficient system that uses new data structures to index thousands of raw-read experiments and facilitates large-scale sequence searches. In our evaluation, index construction with Mantis is 6× faster and yields a 20% smaller index than the state-of-the-art split sequence Bloom tree (SSBT). For queries, Mantis is 6-108× faster than SSBT and has no false-positives or -negatives. For example, Mantis was able to search for all 200,400 known human transcripts in an index of 2,652 RNA sequencing experiments in 82 min; SSBT took close to 4 days. Copyright © 2018 Elsevier Inc. All rights reserved.
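    Mantis itself is built on counting quotient filters and a compressed color-class representation; the sketch below is only a naive inverted k-mer index, included to illustrate the exact-search semantics described above: a query transcript matches an experiment when at least a fraction theta of its k-mers is present. All experiment IDs and sequences are invented.

    ```python
    from collections import defaultdict

    K = 4  # toy k-mer length for the example; real tools use much larger k

    def kmers(seq, k=K):
        """Set of overlapping k-mers in seq."""
        return {seq[i:i + k] for i in range(len(seq) - k + 1)}

    def build_index(experiments):
        """Invert the data: map each k-mer to the experiment IDs containing it."""
        index = defaultdict(set)
        for exp_id, seq in experiments.items():
            for km in kmers(seq):
                index[km].add(exp_id)
        return index

    def search(index, transcript, theta=1.0):
        """Experiments containing at least a fraction theta of the query's k-mers."""
        qk = kmers(transcript)
        hits = defaultdict(int)
        for km in qk:
            for exp_id in index.get(km, ()):
                hits[exp_id] += 1
        return {e for e, n in hits.items() if n >= theta * len(qk)}

    # Invented experiment IDs and (tiny) read data:
    experiments = {"SRR_1": "ACGTACGTGG", "SRR_2": "TTTTCCCCAA"}
    idx = build_index(experiments)
    matches = search(idx, "ACGTACGT")  # exact membership, so no false positives
    ```

    Because membership here is exact, this toy index, like Mantis, returns no false positives or negatives; the cost is the memory footprint, which Mantis addresses with its compact data structures rather than a plain hash map.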

  13. Using SQL Databases for Sequence Similarity Searching and Analysis.

    PubMed

    Pearson, William R; Mackey, Aaron J

    2017-09-13

    Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc.
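    The workflow described above, loading similarity-search results into a relational database and summarizing them with SQL, can be sketched with Python's built-in sqlite3 module. The schema and rows below are invented for illustration and do not reproduce the unit's seqdb_demo or search_demo databases.

    ```python
    import sqlite3

    # Invented schema and rows representing BLAST-style similarity hits:
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE hits (query TEXT, subject TEXT, organism TEXT, evalue REAL)"
    )
    con.executemany(
        "INSERT INTO hits VALUES (?, ?, ?, ?)",
        [
            ("ecoli_recA", "HSA_RAD51", "H. sapiens", 1e-40),
            ("ecoli_recA", "SCE_RAD51", "S. cerevisiae", 1e-45),
            ("ecoli_lacZ", "HSA_GLB1", "H. sapiens", 1e-20),
        ],
    )

    # Best (lowest e-value) hit per query; with a single MIN() aggregate,
    # SQLite returns the other columns from the minimizing row.
    best = con.execute(
        "SELECT query, subject, MIN(evalue) FROM hits GROUP BY query ORDER BY query"
    ).fetchall()
    ```

    Summaries like "best homolog per query" are one SQL statement here, whereas the same question against flat search output requires ad hoc scripting for every new query.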

  14. FOUNTAIN: A JAVA open-source package to assist large sequencing projects

    PubMed Central

    Buerstedde, Jean-Marie; Prill, Florian

    2001-01-01

    Background Better automation, lower cost per reaction and a heightened interest in comparative genomics have led to a dramatic increase in DNA sequencing activities. Although the large sequencing projects of specialized centers are supported by in-house bioinformatics groups, many smaller laboratories face difficulties managing the appropriate processing and storage of their sequencing output. The challenges include documentation of clones, templates and sequencing reactions, and the storage, annotation and analysis of the large number of generated sequences. Results We describe here a new program, named FOUNTAIN, for the management of large sequencing projects. FOUNTAIN uses the JAVA computer language and data storage in a relational database. Starting with a collection of sequencing objects (clones), the program generates and stores information related to the different stages of the sequencing project using a web browser interface for user input. The generated sequences are subsequently imported and annotated based on BLAST searches against the public databases. In addition, simple algorithms to cluster sequences and determine putative polymorphic positions are implemented. Conclusions A simple, but flexible and scalable software package is presented to facilitate data generation and storage for large sequencing projects. Open source and largely platform and database independent, FOUNTAIN is intended to be improved and extended in a community effort. PMID:11591214

  15. RNA Interference for Functional Genomics and Improvement of Cotton (Gossypium sp.)

    PubMed Central

    Abdurakhmonov, Ibrokhim Y.; Ayubov, Mirzakamol S.; Ubaydullaeva, Khurshida A.; Buriev, Zabardast T.; Shermatov, Shukhrat E.; Ruziboev, Haydarali S.; Shapulatov, Umid M.; Saha, Sukumar; Ulloa, Mauricio; Yu, John Z.; Percy, Richard G.; Devor, Eric J.; Sharma, Govind C.; Sripathi, Venkateswara R.; Kumpatla, Siva P.; van der Krol, Alexander; Kater, Hake D.; Khamidov, Khakimdjan; Salikhov, Shavkat I.; Jenkins, Johnie N.; Abdukarimov, Abdusattor; Pepper, Alan E.

    2016-01-01

    RNA interference (RNAi) is a powerful technology for discovering the functions of genetic sequences, and has become a valuable tool for functional genomics of cotton (Gossypium sp.). The rapid adoption of RNAi has displaced earlier antisense technology. RNAi has aided in the discovery of the function and biological roles of many key cotton genes involved in fiber development, fertility and somatic embryogenesis, resistance to important biotic and abiotic stresses, and oil and seed quality improvements, as well as key agronomic traits including yield and maturity. Here, we comparatively review seminal research efforts in previously used antisense approaches and currently applied breakthrough RNAi studies in cotton, analyzing developed RNAi methodologies, achievements, limitations, and future needs in the functional characterization of cotton genes. We also highlight needed efforts in the development of RNAi-based cotton cultivars, and their safety and risk assessment, small and large-scale field trials, and commercialization. PMID:26941765

  16. RNA Interference for Functional Genomics and Improvement of Cotton (Gossypium sp.).

    PubMed

    Abdurakhmonov, Ibrokhim Y; Ayubov, Mirzakamol S; Ubaydullaeva, Khurshida A; Buriev, Zabardast T; Shermatov, Shukhrat E; Ruziboev, Haydarali S; Shapulatov, Umid M; Saha, Sukumar; Ulloa, Mauricio; Yu, John Z; Percy, Richard G; Devor, Eric J; Sharma, Govind C; Sripathi, Venkateswara R; Kumpatla, Siva P; van der Krol, Alexander; Kater, Hake D; Khamidov, Khakimdjan; Salikhov, Shavkat I; Jenkins, Johnie N; Abdukarimov, Abdusattor; Pepper, Alan E

    2016-01-01

    RNA interference (RNAi) is a powerful technology for discovering the functions of genetic sequences, and has become a valuable tool for functional genomics of cotton (Gossypium sp.). The rapid adoption of RNAi has displaced earlier antisense technology. RNAi has aided in the discovery of the function and biological roles of many key cotton genes involved in fiber development, fertility and somatic embryogenesis, resistance to important biotic and abiotic stresses, and oil and seed quality improvements, as well as key agronomic traits including yield and maturity. Here, we comparatively review seminal research efforts in previously used antisense approaches and currently applied breakthrough RNAi studies in cotton, analyzing developed RNAi methodologies, achievements, limitations, and future needs in the functional characterization of cotton genes. We also highlight needed efforts in the development of RNAi-based cotton cultivars, and their safety and risk assessment, small and large-scale field trials, and commercialization.

  17. Proteomics meets blue biotechnology: a wealth of novelties and opportunities.

    PubMed

    Hartmann, Erica M; Durighello, Emie; Pible, Olivier; Nogales, Balbina; Beltrametti, Fabrizio; Bosch, Rafael; Christie-Oleza, Joseph A; Armengaud, Jean

    2014-10-01

    Blue biotechnology, in which aquatic environments provide the inspiration for various products such as food additives, aquaculture, biosensors, green chemistry, bioenergy, and pharmaceuticals, holds enormous promise. Large-scale efforts to sequence aquatic genomes and metagenomes, as well as campaigns to isolate new organisms and culture-based screenings, are helping to push the boundaries of known organisms. Mass spectrometry-based proteomics can complement 16S gene sequencing in the effort to discover new organisms of potential relevance to blue biotechnology by facilitating the rapid screening of microbial isolates and by providing in depth profiles of the proteomes and metaproteomes of marine organisms, both model cultivable isolates and, more recently, exotic non-cultivable species and communities. Proteomics has already contributed to blue biotechnology by identifying aquatic proteins with potential applications to food fermentation, the textile industry, and biomedical drug development. In this review, we discuss historical developments in blue biotechnology, the current limitations to the known marine biosphere, and the ways in which mass spectrometry can expand that knowledge. We further speculate about directions that research in blue biotechnology will take given current and near-future technological advancements in mass spectrometry. Copyright © 2014 Elsevier B.V. All rights reserved.

  18. Simrank: Rapid and sensitive general-purpose k-mer search tool

    PubMed Central

    2011-01-01

    Background Terabyte-scale collections of string-encoded data are expected from consortia efforts such as the Human Microbiome Project http://nihroadmap.nih.gov/hmp. Intra- and inter-project data similarity searches are enabled by rapid k-mer matching strategies. Software applications for sequence database partitioning, guide tree estimation, molecular classification and alignment acceleration have benefited from embedded k-mer searches as sub-routines. However, a rapid, general-purpose, open-source, flexible, stand-alone k-mer tool has not been available. Results Here we present a stand-alone utility, Simrank, which allows users to rapidly identify the database strings most similar to query strings. Performance testing of Simrank and related tools against DNA, RNA, protein and human-language datasets found Simrank 10X to 928X faster, depending on the dataset. Conclusions Simrank provides molecular ecologists with a high-throughput, open source choice for comparing large sequence sets to find similarity. PMID:21524302
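    The k-mer matching idea behind tools like Simrank can be illustrated in a few lines: rank database strings by the fraction of the query's k-mers they contain. This is a simplified, hypothetical sketch (Simrank's actual implementation, defaults, and optimizations differ), with invented sequence data.

    ```python
    def kmer_set(s: str, k: int) -> set:
        """All overlapping k-mers of s."""
        return {s[i:i + k] for i in range(len(s) - k + 1)}

    def rank_by_shared_kmers(query: str, database: dict, k: int = 7):
        """Rank database strings by the fraction of the query's k-mers they
        share; a toy stand-in for Simrank-style k-mer similarity search."""
        q = kmer_set(query, k)
        scores = [(name, len(q & kmer_set(seq, k)) / len(q))
                  for name, seq in database.items()]
        return sorted(scores, key=lambda t: t[1], reverse=True)

    # Invented sequences: one identical to the query, one unrelated.
    database = {"identical": "ACGTACGTACGT", "unrelated": "TTTTTTTTTTTT"}
    ranked = rank_by_shared_kmers("ACGTACGTACGT", database)
    ```

    Because k-mer sets can be hashed and intersected without alignment, this kind of scoring scales to the terabyte-sized collections the abstract describes far better than pairwise alignment would.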

  19. The African Genome Variation Project shapes medical genetics in Africa

    PubMed Central

    Gurdasani, Deepti; Carstensen, Tommy; Tekola-Ayele, Fasil; Pagani, Luca; Tachmazidou, Ioanna; Hatzikotoulas, Konstantinos; Karthikeyan, Savita; Iles, Louise; Pollard, Martin O.; Choudhury, Ananyo; Ritchie, Graham R. S.; Xue, Yali; Asimit, Jennifer; Nsubuga, Rebecca N.; Young, Elizabeth H.; Pomilla, Cristina; Kivinen, Katja; Rockett, Kirk; Kamali, Anatoli; Doumatey, Ayo P.; Asiki, Gershim; Seeley, Janet; Sisay-Joof, Fatoumatta; Jallow, Muminatou; Tollman, Stephen; Mekonnen, Ephrem; Ekong, Rosemary; Oljira, Tamiru; Bradman, Neil; Bojang, Kalifa; Ramsay, Michele; Adeyemo, Adebowale; Bekele, Endashaw; Motala, Ayesha; Norris, Shane A.; Pirie, Fraser; Kaleebu, Pontiano; Kwiatkowski, Dominic; Tyler-Smith, Chris; Rotimi, Charles; Zeggini, Eleftheria; Sandhu, Manjinder S.

    2014-01-01

    Given the importance of Africa to studies of human origins and disease susceptibility, detailed characterisation of African genetic diversity is needed. The African Genome Variation Project (AGVP) provides a resource to help design, implement and interpret genomic studies in sub-Saharan Africa (SSA) and worldwide. The AGVP represents dense genotypes from 1,481 and whole genome sequences (WGS) from 320 individuals across SSA. Using this resource, we find novel evidence of complex, regionally distinct hunter-gatherer and Eurasian admixture across SSA. We identify new loci under selection, including for malaria and hypertension. We show that modern imputation panels can identify association signals at highly differentiated loci across populations in SSA. Using WGS, we show further improvement in imputation accuracy supporting efforts for large-scale sequencing of diverse African haplotypes. Finally, we present an efficient genotype array design capturing common genetic variation in Africa, showing for the first time that such designs are feasible. PMID:25470054

  20. Automation, parallelism, and robotics for proteomics.

    PubMed

    Alterovitz, Gil; Liu, Jonathan; Chow, Jijun; Ramoni, Marco F

    2006-07-01

    The speed of the human genome project (Lander, E. S., Linton, L. M., Birren, B., Nusbaum, C. et al., Nature 2001, 409, 860-921) was made possible, in part, by developments in automation of sequencing technologies. Before these technologies, sequencing was a laborious, expensive, and personnel-intensive task. Similarly, automation and robotics are changing the field of proteomics today. Proteomics is defined as the effort to understand and characterize proteins in the categories of structure, function and interaction (Englbrecht, C. C., Facius, A., Comb. Chem. High Throughput Screen. 2005, 8, 705-715). As such, this field nicely lends itself to automation technologies since these methods often require large economies of scale in order to achieve cost and time-saving benefits. This article describes some of the technologies and methods being applied in proteomics in order to facilitate automation within the field as well as in linking proteomics-based information with other related research areas.

  1. Advances in DNA sequencing technologies for high resolution HLA typing.

    PubMed

    Cereb, Nezih; Kim, Hwa Ran; Ryu, Jaejun; Yang, Soo Young

    2015-12-01

    This communication describes our experience in large-scale G group-level high resolution HLA typing using three different DNA sequencing platforms - ABI 3730 xl, Illumina MiSeq and PacBio RS II. Recent advances in DNA sequencing technologies, so-called next generation sequencing (NGS), have brought breakthroughs in deciphering the genetic information in all living species at a large scale and at an affordable level. The NGS DNA indexing system allows sequencing multiple genes for large number of individuals in a single run. Our laboratory has adopted and used these technologies for HLA molecular testing services. We found that each sequencing technology has its own strengths and weaknesses, and their sequencing performances complement each other. HLA genes are highly complex and genotyping them is quite challenging. Using these three sequencing platforms, we were able to meet all requirements for G group-level high resolution and high volume HLA typing. Copyright © 2015 American Society for Histocompatibility and Immunogenetics. Published by Elsevier Inc. All rights reserved.

  2. Critical Issues in Large-Scale Assessment: A Resource Guide.

    ERIC Educational Resources Information Center

    Redfield, Doris

    The purpose of this document is to provide practical guidance and support for the design, development, and implementation of large-scale assessment systems that are grounded in research and best practice. Information is included about existing large-scale testing efforts, including national testing programs, state testing programs, and…

  3. Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses

    PubMed Central

    Liu, Bo; Madduri, Ravi K; Sotomayor, Borja; Chard, Kyle; Lacinski, Lukasz; Dave, Utpal J; Li, Jianqiang; Liu, Chunchen; Foster, Ian T

    2014-01-01

    The coming deluge of genome data presents significant challenges: storing and processing large-scale genome data, providing easy access to biomedical analysis tools, and enabling efficient data sharing and retrieval. The variability in data volume results in variable computing and storage requirements; biomedical researchers are therefore pursuing more reliable, dynamic and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analysis tools preconfigured for immediate use by researchers (via user-specific tools integration), automatic deployment on the Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via the HTCondor scheduler), and support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases as well as a performance evaluation are presented to validate the feasibility of the proposed approach. PMID:24462600

  4. Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses.

    PubMed

    Liu, Bo; Madduri, Ravi K; Sotomayor, Borja; Chard, Kyle; Lacinski, Lukasz; Dave, Utpal J; Li, Jianqiang; Liu, Chunchen; Foster, Ian T

    2014-06-01

    The coming deluge of genome data presents significant challenges: storing and processing large-scale genome data, providing easy access to biomedical analysis tools, and enabling efficient data sharing and retrieval. The variability in data volume results in variable computing and storage requirements; biomedical researchers are therefore pursuing more reliable, dynamic and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analysis tools preconfigured for immediate use by researchers (via user-specific tools integration), automatic deployment on the Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via the HTCondor scheduler), and support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases as well as a performance evaluation are presented to validate the feasibility of the proposed approach. Copyright © 2014 Elsevier Inc. All rights reserved.

  5. Targeted isolation, sequence assembly and characterization of two white spruce (Picea glauca) BAC clones for terpenoid synthase and cytochrome P450 genes involved in conifer defence reveal insights into a conifer genome

    PubMed Central

    2009-01-01

    Background Conifers are a large group of gymnosperm trees which are separated from the angiosperms by more than 300 million years of independent evolution. Conifer genomes are extremely large and contain considerable amounts of repetitive DNA. Currently, conifer sequence resources exist predominantly as expressed sequence tags (ESTs) and full-length (FL)cDNAs. There is no genome sequence available for a conifer or any other gymnosperm. Conifer defence-related genes often group into large families with closely related members. The goals of this study are to assess the feasibility of targeted isolation and sequence assembly of conifer BAC clones containing specific genes from two large gene families, and to characterize large segments of genomic DNA sequence for the first time from a conifer. Results We used a PCR-based approach to identify BAC clones for two target genes, a terpene synthase (3-carene synthase; 3CAR) and a cytochrome P450 (CYP720B4) from a non-arrayed genomic BAC library of white spruce (Picea glauca). Shotgun genomic fragments isolated from the BAC clones were sequenced to a depth of 15.6- and 16.0-fold coverage, respectively. Assembly and manual curation yielded sequence scaffolds of 172 kbp (3CAR) and 94 kbp (CYP720B4) long. Inspection of the genomic sequences revealed the intron-exon structures, the putative promoter regions and putative cis-regulatory elements of these genes. Sequences related to transposable elements (TEs), high complexity repeats and simple repeats were prevalent and comprised approximately 40% of the sequenced genomic DNA. An in silico simulation of the effect of sequencing depth on the quality of the sequence assembly provides direction for future efforts of conifer genome sequencing. Conclusion We report the first targeted cloning, sequencing, assembly, and annotation of large segments of genomic DNA from a conifer. We demonstrate that genomic BAC clones for individual members of multi-member gene families can be isolated in a gene-specific fashion. The results of the present work provide important new information about the structure and content of conifer genomic DNA that will guide future efforts to sequence and assemble conifer genomes. PMID:19656416

  6. Targeted isolation, sequence assembly and characterization of two white spruce (Picea glauca) BAC clones for terpenoid synthase and cytochrome P450 genes involved in conifer defence reveal insights into a conifer genome.

    PubMed

    Hamberger, Björn; Hall, Dawn; Yuen, Mack; Oddy, Claire; Hamberger, Britta; Keeling, Christopher I; Ritland, Carol; Ritland, Kermit; Bohlmann, Jörg

    2009-08-06

    Conifers are a large group of gymnosperm trees that are separated from the angiosperms by more than 300 million years of independent evolution. Conifer genomes are extremely large and contain considerable amounts of repetitive DNA. Currently, conifer sequence resources exist predominantly as expressed sequence tags (ESTs) and full-length (FL) cDNAs. There is no genome sequence available for a conifer or any other gymnosperm. Conifer defence-related genes often group into large families with closely related members. The goals of this study are to assess the feasibility of targeted isolation and sequence assembly of conifer BAC clones containing specific genes from two large gene families, and to characterize large segments of genomic DNA sequence for the first time from a conifer. We used a PCR-based approach to identify BAC clones for two target genes, a terpene synthase (3-carene synthase; 3CAR) and a cytochrome P450 (CYP720B4), from a non-arrayed genomic BAC library of white spruce (Picea glauca). Shotgun genomic fragments isolated from the BAC clones were sequenced to a depth of 15.6- and 16.0-fold coverage, respectively. Assembly and manual curation yielded sequence scaffolds of 172 kbp (3CAR) and 94 kbp (CYP720B4) in length. Inspection of the genomic sequences revealed the intron-exon structures, the putative promoter regions and putative cis-regulatory elements of these genes. Sequences related to transposable elements (TEs), high-complexity repeats and simple repeats were prevalent and comprised approximately 40% of the sequenced genomic DNA. An in silico simulation of the effect of sequencing depth on the quality of the sequence assembly provides direction for future efforts of conifer genome sequencing. We report the first targeted cloning, sequencing, assembly, and annotation of large segments of genomic DNA from a conifer. We demonstrate that genomic BAC clones for individual members of multi-member gene families can be isolated in a gene-specific fashion. 
The results of the present work provide important new information about the structure and content of conifer genomic DNA that will guide future efforts to sequence and assemble conifer genomes.

  7. Random access in large-scale DNA data storage.

    PubMed

    Organick, Lee; Ang, Siena Dumas; Chen, Yuan-Jyue; Lopez, Randolph; Yekhanin, Sergey; Makarychev, Konstantin; Racz, Miklos Z; Kamath, Govinda; Gopalan, Parikshit; Nguyen, Bichlien; Takahashi, Christopher N; Newman, Sharon; Parker, Hsing-Yeh; Rashtchian, Cyrus; Stewart, Kendall; Gupta, Gagan; Carlson, Robert; Mulligan, John; Carmean, Douglas; Seelig, Georg; Ceze, Luis; Strauss, Karin

    2018-03-01

    Synthetic DNA is durable and can encode digital data with high density, making it an attractive medium for data storage. However, recovering stored data at large scale currently requires all the DNA in a pool to be sequenced, even if only a subset of the information needs to be extracted. Here, we encode and store 35 distinct files (over 200 MB of data) in more than 13 million DNA oligonucleotides, and show that we can recover each file individually and with no errors, using a random access approach. We design and validate a large library of primers that enable individual recovery of all files stored within the DNA. We also develop an algorithm that greatly reduces the sequencing read coverage required for error-free decoding by maximizing information from all sequence reads. These advances demonstrate a viable, large-scale system for DNA data storage and retrieval.
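
    The random-access scheme described above can be sketched in miniature: each file is assigned a unique primer pair at encoding time, and retrieval amplifies only the strands flanked by that pair. The sketch below is illustrative only; the file names, primer sequences, and payloads are invented, and real systems add addressing, redundancy, and error-correcting codes that are omitted here.

```python
# Hypothetical sketch of primer-based random access in a DNA storage pool.
# File IDs, primer sequences, and payloads are invented for illustration.

# Each file is assigned a unique (forward, reverse) primer pair at encoding time.
PRIMERS = {
    "file_a": ("ACGTACGT", "TTGGCCAA"),
    "file_b": ("GGATCCGA", "CCTTAAGG"),
}

def encode(file_id, payload_chunks):
    """Flank every payload chunk with the file's primer landing sites."""
    fwd, rev = PRIMERS[file_id]
    return [fwd + chunk + rev for chunk in payload_chunks]

def random_access(pool, file_id):
    """Model PCR selection: keep only strands carrying this file's primers."""
    fwd, rev = PRIMERS[file_id]
    return [s[len(fwd):-len(rev)] for s in pool
            if s.startswith(fwd) and s.endswith(rev)]

pool = encode("file_a", ["AAAA", "CCCC"]) + encode("file_b", ["GGGG"])
print(random_access(pool, "file_a"))  # ['AAAA', 'CCCC']
```

    In the actual system the selection happens chemically (PCR enrichment before sequencing) rather than by string matching afterwards, which is what makes retrieval cost independent of total pool size.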

  8. Large-Scale Biomonitoring of Remote and Threatened Ecosystems via High-Throughput Sequencing

    PubMed Central

    Gibson, Joel F.; Shokralla, Shadi; Curry, Colin; Baird, Donald J.; Monk, Wendy A.; King, Ian; Hajibabaei, Mehrdad

    2015-01-01

    Biodiversity metrics are critical for assessment and monitoring of ecosystems threatened by anthropogenic stressors. Existing sorting and identification methods are too expensive and labour-intensive to be scaled up to meet management needs. Alternately, a high-throughput DNA sequencing approach could be used to determine biodiversity metrics from bulk environmental samples collected as part of a large-scale biomonitoring program. Here we show that both morphological and DNA sequence-based analyses are suitable for recovery of individual taxonomic richness, estimation of proportional abundance, and calculation of biodiversity metrics using a set of 24 benthic samples collected in the Peace-Athabasca Delta region of Canada. The high-throughput sequencing approach was able to recover all metrics with a higher degree of taxonomic resolution than morphological analysis. The reduced cost and increased capacity of DNA sequence-based approaches will finally allow environmental monitoring programs to operate at the geographical and temporal scale required by industrial and regulatory end-users. PMID:26488407

  9. Genetic Population Structure of Tectura paleacea: Implications for the Mechanisms Regulating Population Structure in Patchy Coastal Habitats

    PubMed Central

    Begovic, Emina; Lindberg, David R.

    2011-01-01

    The seagrass limpet Tectura paleacea (Gastropoda; Patellogastropoda) belongs to a seagrass-obligate lineage that has shifted from the Caribbean in the late Miocene, across the Isthmus of Panama prior to the closing of the Panamanian seaway, and then northward to its modern Baja California – Oregon distribution. To address whether larval entrainment by seagrass beds contributes to population structuring, populations were sampled at six California/Oregon localities approximately 2 degrees latitude apart during two post-settlement periods in July 2002 and June 2003. Partial cytochrome b (Cytb) sequences were obtained from 20 individuals (10 per year) from each population to determine the levels of population subdivision/connectivity. From the 120 individuals sequenced, there were 81 unique haplotypes, with the greatest haplotype diversity occurring in southern populations. The only significant genetic break detected was consistent with a peri-Point Conception (PPC) biogeographic boundary, while populations north and south of Point Conception were each panmictic. The data further indicate that populations found south of the PPC biogeographic boundary originated from northern populations. This pattern of population structure suggests that seagrass patches are not entraining the larvae of T. paleacea by altering flow regimes within their environment, a process hypothesized to produce extensive genetic subdivision on fine geographic scales. In contrast to the haplotype data, morphological patterns vary significantly over very fine geographic scales in ways that are inconsistent with the observed patterns of genetic population structure, indicating that morphological variation in T. paleacea might be attributed to differential ecophenotypic expression in response to local habitat variability throughout its distribution. 
These results suggest that highly localized conservation efforts may not be as effective as large-scale conservation efforts in near shore marine environments. PMID:21490969

  10. ReSeqTools: an integrated toolkit for large-scale next-generation sequencing based resequencing analysis.

    PubMed

    He, W; Zhao, S; Liu, X; Dong, S; Lv, J; Liu, D; Wang, J; Meng, Z

    2013-12-04

    Large-scale next-generation sequencing (NGS)-based resequencing detects sequence variations, constructs evolutionary histories, and identifies phenotype-related genotypes. However, NGS-based resequencing studies generate extraordinarily large amounts of data, making computation difficult, and effective use and analysis of these data remains a challenge for individual researchers. Here, we introduce ReSeqTools, a full-featured toolkit for NGS (Illumina sequencing)-based resequencing analysis that processes raw data, interprets mapping results, and identifies and annotates sequence variations. ReSeqTools provides abundant scalable functions for routine resequencing analysis, organized into modules that facilitate customization of the analysis pipeline. ReSeqTools is designed to use compressed data files as input or output to save storage space, enabling faster and more computationally efficient large-scale resequencing studies in a user-friendly manner. It offers practical functions and generates useful statistics throughout the analysis pipeline, which significantly simplifies resequencing analysis. Its integrated algorithms and sub-functions provide a solid foundation for the special demands of resequencing projects, and users can combine these functions to construct their own pipelines for other purposes.
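
    The compressed-input/compressed-output pattern described above can be illustrated with a minimal sketch that assumes nothing about ReSeqTools' actual interfaces: stream a gzipped FASTQ, apply a trivial read-length filter, and write gzipped output using only the standard library. The function name and filter are invented for the example.

```python
# Minimal sketch of a compressed-I/O pipeline step (NOT ReSeqTools' API):
# stream a gzipped FASTQ, keep reads above a length cutoff, write gzipped output.
import gzip

def filter_fastq(src, dst, min_len=30):
    """Copy reads of at least min_len bases from src to dst (both .gz files).

    Returns the number of reads kept. FASTQ records are 4 lines each:
    header, sequence, separator, qualities.
    """
    kept = 0
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        while True:
            record = [fin.readline() for _ in range(4)]
            if not record[0]:           # end of file
                break
            if len(record[1].rstrip("\n")) >= min_len:
                fout.write("".join(record))
                kept += 1
    return kept
```

    Streaming record-by-record keeps memory use constant regardless of file size, which is the point of compressed, pipeline-friendly I/O for population-scale data.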

  11. Defining precision: The precision medicine initiative trials NCI-MPACT and NCI-MATCH.

    PubMed

    Coyne, Geraldine O'Sullivan; Takebe, Naoko; Chen, Alice P

    "Precision" trials, using rationally incorporated biomarker targets and molecularly selective anticancer agents, have become of great interest to both patients and their physicians. In the endeavor to test the cornerstone premise of precision oncotherapy, that is, determining if modulating a specific molecular aberration in a patient's tumor with a correspondingly specific therapeutic agent improves clinical outcomes, the design of clinical trials with embedded genomic characterization platforms which guide therapy are an increasing challenge. The National Cancer Institute Precision Medicine Initiative is an unprecedented large interdisciplinary collaborative effort to conceptualize and test the feasibility of trials incorporating sequencing platforms and large-scale bioinformatics processing that are not currently uniformly available to patients. National Cancer Institute-Molecular Profiling-based Assignment of Cancer Therapy and National Cancer Institute-Molecular Analysis for Therapy Choice are 2 genomic to phenotypic trials under this National Cancer Institute initiative, where treatment is selected according to predetermined genetic alterations detected using next-generation sequencing technology across a broad range of tumor types. In this article, we discuss the objectives and trial designs that have enabled the public-private partnerships required to complete the scale of both trials, as well as interim trial updates and strategic considerations that have driven data analysis and targeted therapy assignment, with the intent of elucidating further the benefits of this treatment approach for patients. Copyright © 2017. Published by Elsevier Inc.

  12. Comparative genomic characterization of citrus-associated Xylella fastidiosa strains.

    PubMed

    da Silva, Vivian S; Shida, Cláudio S; Rodrigues, Fabiana B; Ribeiro, Diógenes C D; de Souza, Alessandra A; Coletta-Filho, Helvécio D; Machado, Marcos A; Nunes, Luiz R; de Oliveira, Regina Costa

    2007-12-21

    The xylem-inhabiting bacterium Xylella fastidiosa (Xf) is the causal agent of Pierce's disease (PD) in vineyards and citrus variegated chlorosis (CVC) in orange trees. Both of these economically devastating diseases are caused by distinct strains of this complex group of microorganisms, which has motivated researchers to conduct extensive genomic sequencing projects with Xf strains. This sequence information, along with other molecular tools, has been used to estimate the evolutionary history of the group and provide clues to understand the capacity of Xf to infect different hosts, causing a variety of symptoms. Nonetheless, although significant amounts of information have been generated from Xf strains, a large proportion of these efforts has concentrated on the study of North American strains, limiting our understanding of the genomic composition of South American strains - which is particularly important for CVC-associated strains. This paper describes the first genome-wide comparison among South American Xf strains, involving 6 distinct citrus-associated bacteria. Comparative analyses performed through a microarray-based approach allowed identification and characterization of large mobile genetic elements that seem to be exclusive to South American strains. Moreover, a large-scale sequencing effort, based on Suppressive Subtraction Hybridization (SSH), identified 290 new ORFs, distributed in 135 Groups of Orthologous Elements, throughout the genomes of these bacteria. Results from microarray-based comparisons provide further evidence concerning the activity of horizontally transferred elements, reinforcing their importance as major mediators in the evolution of Xf. Moreover, the microarray-based genomic profiles showed similarity between Xf strains 9a5c and Fb7, which is unexpected, given the geographical and chronological differences associated with the isolation of these microorganisms. 
    The newly identified ORFs, obtained by SSH, represent an approximately 10% increase in our current knowledge of the South American Xf gene pool and include new putative virulence factors, as well as novel potential markers for strain identification. Surprisingly, this list of novel elements includes sequences previously believed to be unique to North American strains, pointing to the necessity of revising the list of specific markers that may be used for identification of distinct Xf strains.

  13. Cyclicity in Upper Mississippian Bangor Limestone, Blount County, Alabama

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bronner, R.L.

    1988-01-01

    The Upper Mississippian (Chesterian) Bangor Limestone in Alabama consists of a thick, complex sequence of carbonate platform deposits. A continuous core through the Bangor on Blount Mountain in north-central Alabama provides the opportunity to analyze the unit for cyclicity and to identify controls on vertical facies sequence. Lithologies from the core represent four general environments of deposition: (1) subwave-base, open marine, (2) shoal, (3) lagoon, and (4) peritidal. Analysis of the vertical sequence of lithologies in the core indicates the presence of eight large-scale cycles dominated by subtidal deposits, but defined on the basis of peritidal caps. These large-scale cycles can be subdivided into 16 small-scale cycles that may be entirely subtidal but illustrate upward shallowing followed by rapid deepening. Large-scale cycles range from 33 to 136 ft thick, averaging 68 ft; small-scale cycles range from 5 to 80 ft thick and average 34 ft. Small-scale cycles have an average duration of approximately 125,000 years, which is compatible with Milankovitch periodicity. The large-scale cycles have an average duration of approximately 250,000 years, which may simply reflect variations in amplitude of sea level fluctuation or the influence of tectonic subsidence along the southeastern margin of the North American craton.

  14. An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data.

    PubMed

    Jun, Goo; Wing, Mary Kate; Abecasis, Gonçalo R; Kang, Hyun Min

    2015-06-01

    The analysis of next-generation sequencing data is computationally and statistically challenging because of the massive volume of data and imperfect data quality. We present GotCloud, a pipeline for efficiently detecting and genotyping high-quality variants from large-scale sequencing data. GotCloud automates sequence alignment, sample-level quality control, variant calling, filtering of likely artifacts using machine-learning techniques, and genotype refinement using haplotype information. The pipeline can process thousands of samples in parallel and requires fewer computational resources than current alternatives. Experiments with whole-genome and exome-targeted sequence data generated by the 1000 Genomes Project show that the pipeline provides effective filtering against false positive variants and high power to detect true variants. Our pipeline has already contributed to variant detection and genotyping in several large-scale sequencing projects, including the 1000 Genomes Project and the NHLBI Exome Sequencing Project. We hope it will now prove useful to many medical sequencing studies. © 2015 Jun et al.; Published by Cold Spring Harbor Laboratory Press.

  15. Preliminary Validation of a New Measure of Negative Response Bias: The Temporal Memory Sequence Test.

    PubMed

    Hegedish, Omer; Kivilis, Naama; Hoofien, Dan

    2015-01-01

    The Temporal Memory Sequence Test (TMST) is a new measure of negative response bias (NRB) that was developed to enrich the forced-choice paradigm. The TMST does not resemble the common structure of forced-choice tests and is presented as a temporal recall memory test. The validation sample consisted of 81 participants: 21 healthy control participants, 20 coached simulators, and 40 patients with acquired brain injury (ABI). The TMST had high reliability and significantly high positive correlations with the Test of Memory Malingering and Word Memory Test effort scales. Moreover, the TMST effort scales exhibited high negative correlations with the Glasgow Coma Scale, thus validating the previously reported association between probable malingering and mild traumatic brain injury. A suggested cutoff score yielded acceptable classification rates in the ABI group as well as in the simulator and control groups. The TMST appears to be a promising measure of NRB detection, with respectable rates of reliability and construct and criterion validity.

  16. The power and promise of RNA-seq in ecology and evolution.

    PubMed

    Todd, Erica V; Black, Michael A; Gemmell, Neil J

    2016-03-01

    Reference is regularly made to the power of new genomic sequencing approaches. Using powerful technology, however, is not the same as having the necessary power to address a research question with statistical robustness. In the rush to adopt new and improved genomic research methods, limitations of technology and experimental design may be initially neglected. Here, we review these issues with regard to RNA sequencing (RNA-seq). RNA-seq adds large-scale transcriptomics to the toolkit of ecological and evolutionary biologists, enabling differential gene expression (DE) studies in nonmodel species without the need for prior genomic resources. High biological variance is typical of field-based gene expression studies and means that larger sample sizes are often needed to achieve the same degree of statistical power as clinical studies based on data from cell lines or inbred animal models. Sequencing costs have plummeted, yet RNA-seq studies still underutilize biological replication. Finite research budgets force a trade-off between sequencing effort and replication in RNA-seq experimental design. However, clear guidelines for negotiating this trade-off, while taking into account study-specific factors affecting power, are currently lacking. Study designs that prioritize sequencing depth over replication fail to capitalize on the power of RNA-seq technology for DE inference. Significant recent research effort has gone into developing statistical frameworks and software tools for power analysis and sample size calculation in the context of RNA-seq DE analysis. We synthesize progress in this area and derive an accessible rule-of-thumb guide for designing powerful RNA-seq experiments relevant in eco-evolutionary and clinical settings alike. © 2016 John Wiley & Sons Ltd.
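
    The replication-versus-depth trade-off discussed above can be made concrete with a toy simulation: draw negative binomial counts for two groups at a fixed fold change and estimate, by repeated testing, how detection power grows with the number of biological replicates. All parameters (mean count, dispersion, fold change) are invented for illustration, and a simple t-test on log counts stands in for a proper differential-expression method; real designs should use dedicated RNA-seq power tools.

```python
# Toy simulation of RNA-seq power vs. biological replication.
# Parameters are illustrative only, not recommendations.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def sim_power(n_reps, fold_change=2.0, mean=100, dispersion=0.4,
              n_sims=500, alpha=0.05):
    """Fraction of simulated DE genes detected with n_reps per group."""
    def nb(mu, size):
        # Negative binomial with variance = mu + dispersion * mu^2,
        # a standard overdispersed model for RNA-seq counts.
        r = 1.0 / dispersion
        return rng.negative_binomial(r, r / (r + mu), size)
    hits = 0
    for _ in range(n_sims):
        a = np.log1p(nb(mean, n_reps))                # control group
        b = np.log1p(nb(mean * fold_change, n_reps))  # treated group
        if stats.ttest_ind(a, b).pvalue < alpha:
            hits += 1
    return hits / n_sims

for n in (3, 6, 12):
    print(n, "replicates -> power", round(sim_power(n), 2))
```

    Under this toy model, power climbs steeply with replicates at a fixed per-sample depth, which is the qualitative point the review makes about prioritizing replication over sequencing depth.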

  17. Pair-barcode high-throughput sequencing for large-scale multiplexed sample analysis

    PubMed Central

    2012-01-01

    Background Multiplexing has become a major limitation of next-generation sequencing (NGS) when applied to low-complexity samples. Physical space segregation allows limited multiplexing, while the existing barcode approach only permits simultaneous analysis of up to several dozen samples. Results Here we introduce pair-barcode sequencing (PBS), an economical and flexible barcoding technique that permits parallel analysis of large-scale multiplexed samples. In two pilot runs using a SOLiD sequencer (Applied Biosystems Inc.), 32 independent pair-barcoded miRNA libraries were simultaneously analyzed using the combination of 4 unique forward barcodes and 8 unique reverse barcodes. Over 174,000,000 reads were generated, and about 64% of them were assigned to both barcodes. After mapping all reads to pre-miRNAs in miRBase, different miRNA expression patterns were captured for the two clinical groups. The strong correlation between different barcode pairs and the high consistency of miRNA expression in two independent runs demonstrate that the PBS approach is valid. Conclusions By employing the PBS approach in NGS, large-scale multiplexed pooled samples can be analyzed in parallel, allowing high-throughput sequencing to economically meet the needs of samples with low sequencing-throughput demands. PMID:22276739

  18. Pair-barcode high-throughput sequencing for large-scale multiplexed sample analysis.

    PubMed

    Tu, Jing; Ge, Qinyu; Wang, Shengqin; Wang, Lei; Sun, Beili; Yang, Qi; Bai, Yunfei; Lu, Zuhong

    2012-01-25

    Multiplexing has become a major limitation of next-generation sequencing (NGS) when applied to low-complexity samples. Physical space segregation allows limited multiplexing, while the existing barcode approach only permits simultaneous analysis of up to several dozen samples. Here we introduce pair-barcode sequencing (PBS), an economical and flexible barcoding technique that permits parallel analysis of large-scale multiplexed samples. In two pilot runs using a SOLiD sequencer (Applied Biosystems Inc.), 32 independent pair-barcoded miRNA libraries were simultaneously analyzed using the combination of 4 unique forward barcodes and 8 unique reverse barcodes. Over 174,000,000 reads were generated, and about 64% of them were assigned to both barcodes. After mapping all reads to pre-miRNAs in miRBase, different miRNA expression patterns were captured for the two clinical groups. The strong correlation between different barcode pairs and the high consistency of miRNA expression in two independent runs demonstrate that the PBS approach is valid. By employing the PBS approach in NGS, large-scale multiplexed pooled samples can be analyzed in parallel, allowing high-throughput sequencing to economically meet the needs of samples with low sequencing-throughput demands.
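
    The combinatorial arithmetic behind the pair-barcode design (4 forward × 8 reverse barcodes addressing 4 × 8 = 32 libraries) can be sketched as follows; the barcode sequences themselves are invented for the example, and reads that fail to match on either end go unassigned, as roughly a third did in the pilot runs.

```python
# Illustrative pair-barcode demultiplexing: 4 forward x 8 reverse barcodes
# address 32 libraries. Barcode sequences are invented for this sketch.
from itertools import product

FWD = ["AAC", "AGT", "CAG", "CTT"]            # 4 forward barcodes
REV = ["ACA", "AGG", "CAT", "CCG",
       "GAA", "GTC", "TAG", "TTC"]            # 8 reverse barcodes

# Each (forward, reverse) pair identifies exactly one library.
LIBRARY = {pair: i for i, pair in enumerate(product(FWD, REV))}

def assign(read):
    """Return the library index for a read, or None if either end's
    barcode is unrecognized (such reads go unassigned)."""
    f, r = read[:3], read[-3:]
    return LIBRARY.get((f, r))

print(len(LIBRARY))                      # 32 addressable libraries
print(assign("AAC" + "TTTTTT" + "ACA"))  # library 0
```

    Because library identity is the pair, n forward and m reverse barcodes address n × m libraries with only n + m oligos, which is what lets the scheme scale past single-barcode multiplexing.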

  19. Whole Genome Amplification and Reduced-Representation Genome Sequencing of Schistosoma japonicum Miracidia.

    PubMed

    Shortt, Jonathan A; Card, Daren C; Schield, Drew R; Liu, Yang; Zhong, Bo; Castoe, Todd A; Carlton, Elizabeth J; Pollock, David D

    2017-01-01

    In areas where schistosomiasis control programs have been implemented, morbidity and prevalence have been greatly reduced. However, to sustain these reductions and move towards interruption of transmission, new tools for disease surveillance are needed. Genomic methods have the potential to help trace the sources of new infections and allow us to monitor drug resistance. Large-scale genotyping efforts for schistosome species have been hindered by cost, limited numbers of established target loci, and the small amount of DNA obtained from miracidia, the life stage most readily acquired from humans. Here, we present a method using next-generation sequencing to provide high-resolution genomic data from S. japonicum for population-based studies. We applied whole genome amplification followed by double digest restriction site associated DNA sequencing (ddRADseq) to individual S. japonicum miracidia preserved on Whatman FTA cards. We found that we could effectively and consistently survey hundreds of thousands of variants from 10,000 to 30,000 loci from archived miracidia as old as six years. An analysis of variation from eight miracidia obtained from three hosts in two villages in Sichuan showed clear population structuring by village and host even within this limited sample. This high-resolution sequencing approach yields three orders of magnitude more information than the microsatellite genotyping methods employed over the last decade, creating the potential to answer detailed questions about the sources of human infections and to monitor drug resistance. Costs per sample range from $50 to $200, depending on the amount of sequence information desired, and we expect these costs can be reduced further given continued reductions in sequencing costs, improvement of protocols, and parallelization. This approach provides new promise for applying modern genome-scale sampling to S. japonicum surveillance and could be extended to other schistosome species and other parasitic helminths.

  20. The Comprehensive Microbial Resource

    PubMed Central

    Peterson, Jeremy D.; Umayam, Lowell A.; Dickinson, Tanja; Hickey, Erin K.; White, Owen

    2001-01-01

    One challenge presented by large-scale genome sequencing efforts is effective display of uniform information to the scientific community. The Comprehensive Microbial Resource (CMR) contains robust annotation of all complete microbial genomes and allows for a wide variety of data retrievals. The bacterial information has been placed on the Web at http://www.tigr.org/CMR for retrieval using standard web browsing technology. Retrievals can be based on protein properties such as molecular weight or hydrophobicity, GC-content, functional role assignments and taxonomy. The CMR also has special web-based tools to allow data mining using pre-run homology searches, whole genome dot-plots, batch downloading and traversal across genomes using a variety of datatypes. PMID:11125067

  1. DNA barcoding as a tool for coral reef conservation

    NASA Astrophysics Data System (ADS)

    Neigel, J.; Domingo, A.; Stake, J.

    2007-09-01

    DNA Barcoding (DBC) is a method for taxonomic identification of animals that is based entirely on the 5' portion of the mitochondrial gene cytochrome oxidase subunit I (COI-5). It can be especially useful for identification of larval forms or incomplete specimens lacking diagnostic morphological characters. DBC can also facilitate the discovery of species and the definition of “molecular taxonomic units” in problematic groups. However, DBC is not a panacea for coral reef taxonomy. In two of the most ecologically important groups on coral reefs, the Anthozoa and Porifera, COI-5 sequences have diverged too little to be diagnostic for all species. Other problems for DBC include paraphyly in mitochondrial gene trees and lack of differentiation between hybrids and their maternal ancestors. DBC also depends on the availability of databases of COI-5 sequences, which are still in early stages of development. A global effort to barcode all fish species has demonstrated the importance of large-scale coordination and is yielding promising results. Whether COI-5 by itself is sufficient for species assignments remains a contentious question; it is generally advantageous to use sequences from multiple loci.
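
    The core assignment step of barcoding can be sketched as a nearest-reference search under an uncorrected p-distance with a divergence threshold; the species names, toy sequences, and the 2% cutoff below are illustrative assumptions, not values from the article.

```python
# Sketch of barcode-based identification: assign a query COI-5 sequence to
# the reference species with the smallest p-distance, rejecting matches
# above a divergence threshold. Names, sequences, and cutoff are invented.
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b)) / len(a)

def identify(query, references, max_dist=0.02):
    """Nearest-reference species, or None if nothing is within max_dist."""
    species, dist = min(
        ((sp, p_distance(query, seq)) for sp, seq in references.items()),
        key=lambda t: t[1])
    return species if dist <= max_dist else None

refs = {"Acropora sp. A": "ACGTACGTAC", "Porites sp. B": "ACGTTCGGAC"}
print(identify("ACGTACGTAC", refs))   # exact match -> 'Acropora sp. A'
print(identify("TTTTTTTTTT", refs))   # too divergent -> None
```

    The threshold is exactly where groups like Anthozoa and Porifera break the method: when interspecific COI-5 divergence falls below any workable cutoff, nearest-reference assignment can no longer separate species.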

  2. The African Genome Variation Project shapes medical genetics in Africa

    NASA Astrophysics Data System (ADS)

    Gurdasani, Deepti; Carstensen, Tommy; Tekola-Ayele, Fasil; Pagani, Luca; Tachmazidou, Ioanna; Hatzikotoulas, Konstantinos; Karthikeyan, Savita; Iles, Louise; Pollard, Martin O.; Choudhury, Ananyo; Ritchie, Graham R. S.; Xue, Yali; Asimit, Jennifer; Nsubuga, Rebecca N.; Young, Elizabeth H.; Pomilla, Cristina; Kivinen, Katja; Rockett, Kirk; Kamali, Anatoli; Doumatey, Ayo P.; Asiki, Gershim; Seeley, Janet; Sisay-Joof, Fatoumatta; Jallow, Muminatou; Tollman, Stephen; Mekonnen, Ephrem; Ekong, Rosemary; Oljira, Tamiru; Bradman, Neil; Bojang, Kalifa; Ramsay, Michele; Adeyemo, Adebowale; Bekele, Endashaw; Motala, Ayesha; Norris, Shane A.; Pirie, Fraser; Kaleebu, Pontiano; Kwiatkowski, Dominic; Tyler-Smith, Chris; Rotimi, Charles; Zeggini, Eleftheria; Sandhu, Manjinder S.

    2015-01-01

    Given the importance of Africa to studies of human origins and disease susceptibility, detailed characterization of African genetic diversity is needed. The African Genome Variation Project provides a resource with which to design, implement and interpret genomic studies in sub-Saharan Africa and worldwide. The African Genome Variation Project represents dense genotypes from 1,481 individuals and whole-genome sequences from 320 individuals across sub-Saharan Africa. Using this resource, we find novel evidence of complex, regionally distinct hunter-gatherer and Eurasian admixture across sub-Saharan Africa. We identify new loci under selection, including loci related to malaria susceptibility and hypertension. We show that modern imputation panels (sets of reference genotypes from which unobserved or missing genotypes in study sets can be inferred) can identify association signals at highly differentiated loci across populations in sub-Saharan Africa. Using whole-genome sequencing, we demonstrate further improvements in imputation accuracy, strengthening the case for large-scale sequencing efforts of diverse African haplotypes. Finally, we present an efficient genotype array design capturing common genetic variation in Africa.

  3. The African Genome Variation Project shapes medical genetics in Africa.

    PubMed

    Gurdasani, Deepti; Carstensen, Tommy; Tekola-Ayele, Fasil; Pagani, Luca; Tachmazidou, Ioanna; Hatzikotoulas, Konstantinos; Karthikeyan, Savita; Iles, Louise; Pollard, Martin O; Choudhury, Ananyo; Ritchie, Graham R S; Xue, Yali; Asimit, Jennifer; Nsubuga, Rebecca N; Young, Elizabeth H; Pomilla, Cristina; Kivinen, Katja; Rockett, Kirk; Kamali, Anatoli; Doumatey, Ayo P; Asiki, Gershim; Seeley, Janet; Sisay-Joof, Fatoumatta; Jallow, Muminatou; Tollman, Stephen; Mekonnen, Ephrem; Ekong, Rosemary; Oljira, Tamiru; Bradman, Neil; Bojang, Kalifa; Ramsay, Michele; Adeyemo, Adebowale; Bekele, Endashaw; Motala, Ayesha; Norris, Shane A; Pirie, Fraser; Kaleebu, Pontiano; Kwiatkowski, Dominic; Tyler-Smith, Chris; Rotimi, Charles; Zeggini, Eleftheria; Sandhu, Manjinder S

    2015-01-15

    Given the importance of Africa to studies of human origins and disease susceptibility, detailed characterization of African genetic diversity is needed. The African Genome Variation Project provides a resource with which to design, implement and interpret genomic studies in sub-Saharan Africa and worldwide. The African Genome Variation Project represents dense genotypes from 1,481 individuals and whole-genome sequences from 320 individuals across sub-Saharan Africa. Using this resource, we find novel evidence of complex, regionally distinct hunter-gatherer and Eurasian admixture across sub-Saharan Africa. We identify new loci under selection, including loci related to malaria susceptibility and hypertension. We show that modern imputation panels (sets of reference genotypes from which unobserved or missing genotypes in study sets can be inferred) can identify association signals at highly differentiated loci across populations in sub-Saharan Africa. Using whole-genome sequencing, we demonstrate further improvements in imputation accuracy, strengthening the case for large-scale sequencing efforts of diverse African haplotypes. Finally, we present an efficient genotype array design capturing common genetic variation in Africa.

  4. A negative genetic interaction map in isogenic cancer cell lines reveals cancer cell vulnerabilities

    PubMed Central

    Vizeacoumar, Franco J; Arnold, Roland; Vizeacoumar, Frederick S; Chandrashekhar, Megha; Buzina, Alla; Young, Jordan T F; Kwan, Julian H M; Sayad, Azin; Mero, Patricia; Lawo, Steffen; Tanaka, Hiromasa; Brown, Kevin R; Baryshnikova, Anastasia; Mak, Anthony B; Fedyshyn, Yaroslav; Wang, Yadong; Brito, Glauber C; Kasimer, Dahlia; Makhnevych, Taras; Ketela, Troy; Datti, Alessandro; Babu, Mohan; Emili, Andrew; Pelletier, Laurence; Wrana, Jeff; Wainberg, Zev; Kim, Philip M; Rottapel, Robert; O'Brien, Catherine A; Andrews, Brenda; Boone, Charles; Moffat, Jason

    2013-01-01

Improved efforts are necessary to define the functional impact of the cancer mutations currently being revealed through large-scale sequencing efforts. Using genome-scale pooled shRNA screening technology, we mapped negative genetic interactions across a set of isogenic cancer cell lines and confirmed hundreds of these interactions in orthogonal co-culture competition assays, generating a high-confidence genetic interaction network of differentially essential (DiE) genes. The network uncovered examples of conserved genetic interactions, densely connected functional modules derived from comparative genomics with model-system data, functions for uncharacterized genes in the human genome, and targetable vulnerabilities. Finally, we demonstrate the general applicability of DiE gene signatures in determining the genetic dependencies of other, non-isogenic cancer cell lines. For example, the PTEN−/− DiE genes yield a signature that can preferentially classify PTEN-dependent genotypes across a series of non-isogenic cell lines derived from breast, pancreatic and ovarian cancers. Our reference network suggests that many cancer vulnerabilities remain to be discovered through systematic derivation of a network of differentially essential genes in an isogenic cancer cell model. PMID:24104479
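The differential-essentiality idea above can be reduced to a toy calculation: a gene is a candidate DiE gene when its shRNAs drop out much more strongly in the mutant line than in the isogenic parental line. Gene names, scores, and the cutoff below are hypothetical; the actual screen used genome-scale libraries and statistical confirmation.

```python
# Toy sketch of calling differentially essential (DiE) genes from pooled
# shRNA dropout scores in an isogenic pair. All values are hypothetical.

def die_genes(mutant, parental, cutoff=1.0):
    """Each dict maps gene -> log2 fold-change of shRNA abundance after
    selection (more negative = stronger dropout = more essential).
    A gene is called DiE when it drops out much more in the mutant."""
    return sorted(g for g in mutant if parental[g] - mutant[g] >= cutoff)

mutant   = {"GENE_A": -2.5, "GENE_B": -0.2, "GENE_C": -1.4}
parental = {"GENE_A": -0.3, "GENE_B": -0.1, "GENE_C": -1.3}
print(die_genes(mutant, parental))  # only GENE_A drops out selectively
```

Note that GENE_C is essential in both backgrounds, so it is not differentially essential; the comparison against the parental line is what isolates genotype-specific vulnerabilities.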

  5. De Novo Protein Structure Prediction

    NASA Astrophysics Data System (ADS)

    Hung, Ling-Hong; Ngan, Shing-Chung; Samudrala, Ram

    An unparalleled amount of sequence data is being made available from large-scale genome sequencing efforts. The data provide a shortcut to the determination of the function of a gene of interest, as long as there is an existing sequenced gene with similar sequence and of known function. This has spurred structural genomic initiatives with the goal of determining as many protein folds as possible (Brenner and Levitt, 2000; Burley, 2000; Brenner, 2001; Heinemann et al., 2001). The purpose of this is twofold: First, the structure of a gene product can often lead to direct inference of its function. Second, since the function of a protein is dependent on its structure, direct comparison of the structures of gene products can be more sensitive than the comparison of sequences of genes for detecting homology. Presently, structural determination by crystallography and NMR techniques is still slow and expensive in terms of manpower and resources, despite attempts to automate the processes. Computer structure prediction algorithms, while not providing the accuracy of the traditional techniques, are extremely quick and inexpensive and can provide useful low-resolution data for structure comparisons (Bonneau and Baker, 2001). Given the immense number of structures which the structural genomic projects are attempting to solve, there would be a considerable gain even if the computer structure prediction approach were applicable to a subset of proteins.

  6. Captured metagenomics: large-scale targeting of genes based on ‘sequence capture’ reveals functional diversity in soils

    PubMed Central

    Manoharan, Lokeshwaran; Kushwaha, Sandeep K.; Hedlund, Katarina; Ahrén, Dag

    2015-01-01

Microbial enzyme diversity is key to understanding many ecosystem processes. Whole-metagenome sequencing (WMG) yields information on functional genes, but it is costly and inefficient because of the large amount of sequencing required. In this study, we applied a captured metagenomics technique to functional genes in soil microorganisms as an alternative to WMG. Large-scale targeting of functional genes coding for enzymes related to organic matter degradation was applied to two agricultural soil communities through captured metagenomics. Captured metagenomics uses custom-designed, hybridization-based oligonucleotide probes that enrich functional genes of interest in metagenomic libraries, so that only probe-bound DNA fragments are sequenced. The captured metagenomes were highly enriched for the targeted genes while maintaining their target diversity, and their taxonomic distribution correlated well with traditional ribosomal sequencing. The captured metagenomes were enriched for genes related to organic matter degradation at least five-fold over similar, publicly available soil WMG projects. This target-enrichment technique also preserves the functional representation of the soils, thereby facilitating comparative metagenomics projects. Here, we present the first study to apply the captured metagenomics approach at a large scale; this novel method allows deep investigation of central ecosystem processes by studying functional gene abundances. PMID:26490729
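The enrichment claim above is a ratio of on-target read fractions, which takes one line to compute. The read counts below are made up for illustration; they are not the study's numbers.

```python
# Back-of-envelope check of target enrichment: compare the on-target read
# fraction of a captured library with that of a whole-metagenome (WMG)
# shotgun library. All read counts are invented.

def fold_enrichment(on_target, total, wmg_on_target, wmg_total):
    """Ratio of on-target fractions: captured library vs. WMG baseline."""
    return (on_target / total) / (wmg_on_target / wmg_total)

# 40% on-target after capture vs. 2% in a shotgun library -> ~20-fold
print(round(fold_enrichment(400_000, 1_000_000, 20_000, 1_000_000), 1))
```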

  7. Novel genomic findings in multiple myeloma identified through routine diagnostic sequencing.

    PubMed

    Ryland, Georgina L; Jones, Kate; Chin, Melody; Markham, John; Aydogan, Elle; Kankanige, Yamuna; Caruso, Marisa; Guinto, Jerick; Dickinson, Michael; Prince, H Miles; Yong, Kwee; Blombery, Piers

    2018-05-14

Multiple myeloma is a genomically complex haematological malignancy in which many genomic alterations are recognised as important for diagnosis, prognosis and therapeutic decision making. Here, we provide a summary of genomic findings identified through routine diagnostic next-generation sequencing at our centre. A cohort of 86 patients with multiple myeloma underwent diagnostic sequencing using a custom hybridisation-based panel targeting 104 genes. Sequence variants, genome-wide copy number changes and structural rearrangements were detected using an in-house bioinformatics pipeline. At least one mutation was found in 69 (80%) patients. Frequently mutated genes included TP53 (36%), KRAS (22.1%), NRAS (15.1%), FAM46C/DIS3 (8.1%) and TET2/FGFR3 (5.8%), including multiple mutations not previously described in myeloma. Importantly, we observed TP53 mutations in the absence of a 17p deletion in 8% of the cohort, highlighting the need for sequencing-based assessment in addition to cytogenetics to identify these high-risk patients. Multiple novel copy number changes and immunoglobulin heavy chain translocations are also discussed. Our results demonstrate that many clinically relevant genomic findings in multiple myeloma have not yet been identified through large-scale sequencing efforts, and they provide important mechanistic insights into plasma cell pathobiology.
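Per-gene mutation frequencies of the kind reported above (e.g. TP53 in 36% of patients) are a simple tally over per-patient variant calls. The miniature cohort below is invented for illustration.

```python
# Sketch of tallying per-gene mutation frequencies across a cohort, in the
# spirit of the panel summary above. Patients and calls are invented.

from collections import Counter

def gene_frequencies(cohort):
    """cohort: list of per-patient sets of mutated genes.
    Returns gene -> percent of patients with >=1 mutation in that gene."""
    counts = Counter(g for genes in cohort for g in set(genes))
    n = len(cohort)
    return {g: round(100 * c / n, 1) for g, c in counts.items()}

cohort = [{"TP53", "KRAS"}, {"TP53"}, {"NRAS"}, set()]
print(gene_frequencies(cohort))  # TP53 mutated in 2 of 4 patients -> 50.0
```

Counting patients rather than variants (via `set(genes)`) matters: a patient with two TP53 mutations still contributes once to the TP53 frequency.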

  8. ESRI applications of GIS technology: Mineral resource development

    NASA Technical Reports Server (NTRS)

    Derrenbacher, W.

    1981-01-01

    The application of geographic information systems technology to large scale regional assessment related to mineral resource development, identifying candidate sites for related industry, and evaluating sites for waste disposal is discussed. Efforts to develop data bases were conducted at scales ranging from 1:3,000,000 to 1:25,000. In several instances, broad screening was conducted for large areas at a very general scale with more detailed studies subsequently undertaken in promising areas windowed out of the generalized data base. Increasingly, the systems which are developed are structured as the spatial framework for the long-term collection, storage, referencing, and retrieval of vast amounts of data about large regions. Typically, the reconnaissance data base for a large region is structured at 1:250,000 scale, data bases for smaller areas being structured at 1:25,000, 1:50,000 or 1:63,360. An integrated data base for the coterminous US was implemented at a scale of 1:3,000,000 for two separate efforts.

  9. Entomological Collections in the Age of Big Data.

    PubMed

    Short, Andrew Edward Z; Dikow, Torsten; Moreau, Corrie S

    2018-01-07

    With a million described species and more than half a billion preserved specimens, the large scale of insect collections is unequaled by those of any other group. Advances in genomics, collection digitization, and imaging have begun to more fully harness the power that such large data stores can provide. These new approaches and technologies have transformed how entomological collections are managed and utilized. While genomic research has fundamentally changed the way many specimens are collected and curated, advances in technology have shown promise for extracting sequence data from the vast holdings already in museums. Efforts to mainstream specimen digitization have taken root and have accelerated traditional taxonomic studies as well as distribution modeling and global change research. Emerging imaging technologies such as microcomputed tomography and confocal laser scanning microscopy are changing how morphology can be investigated. This review provides an overview of how the realization of big data has transformed our field and what may lie in store.

  10. Integrating 400 million variants from 80,000 human samples with extensive annotations: towards a knowledge base to analyze disease cohorts.

    PubMed

    Hakenberg, Jörg; Cheng, Wei-Yi; Thomas, Philippe; Wang, Ying-Chih; Uzilov, Andrew V; Chen, Rong

    2016-01-08

    Data from a plethora of high-throughput sequencing studies is readily available to researchers, providing genetic variants detected in a variety of healthy and disease populations. While each individual cohort helps gain insights into polymorphic and disease-associated variants, a joint perspective can be more powerful in identifying polymorphisms, rare variants, disease-associations, genetic burden, somatic variants, and disease mechanisms. We have set up a Reference Variant Store (RVS) containing variants observed in a number of large-scale sequencing efforts, such as 1000 Genomes, ExAC, Scripps Wellderly, UK10K; various genotyping studies; and disease association databases. RVS holds extensive annotations pertaining to affected genes, functional impacts, disease associations, and population frequencies. RVS currently stores 400 million distinct variants observed in more than 80,000 human samples. RVS facilitates cross-study analysis to discover novel genetic risk factors, gene-disease associations, potential disease mechanisms, and actionable variants. Due to its large reference populations, RVS can also be employed for variant filtration and gene prioritization. A web interface to public datasets and annotations in RVS is available at https://rvs.u.hpc.mssm.edu/.
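The variant-filtration use case described above amounts to discarding cohort variants that are too common in the reference populations. The store contents, variant IDs, and frequency cutoff below are hypothetical, not RVS data or its API.

```python
# Sketch of frequency-based variant filtration against a reference store:
# keep only variants that are rare (or unseen) in reference populations.
# Store contents, IDs, and the cutoff are hypothetical.

def filter_rare(candidates, store, max_af=0.01):
    """candidates: variant IDs from a disease cohort.
    store: variant ID -> population allele frequency (absent = unseen)."""
    return [v for v in candidates if store.get(v, 0.0) <= max_af]

store = {"chr1:12345:A>G": 0.25, "chr2:555:C>T": 0.002}
candidates = ["chr1:12345:A>G", "chr2:555:C>T", "chr3:777:G>A"]
print(filter_rare(candidates, store))  # the common chr1 variant is removed
```

This is why the size of the reference population matters: the larger the store, the more confidently a variant absent from it can be treated as genuinely rare rather than merely unsampled.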

  11. A Glance at Microsatellite Motifs from 454 Sequencing Reads of Watermelon Genomic DNA

    USDA-ARS?s Scientific Manuscript database

    A single 454 (Life Sciences Sequencing Technology) run of Charleston Gray watermelon (Citrullus lanatus var. lanatus) genomic DNA was performed and sequence data were assembled. A large scale identification of simple sequence repeat (SSR) was performed and SSR sequence data were used for the develo...

  12. How well could existing sensors detect the deployment of a solar radiation management (SRM) geoengineering effort?

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hurd, Alan J.

    2016-04-29

    While the stated reason for asking this question is “to understand better our ability to warn policy makers in the unlikely event of an unanticipated SRM geoengineering deployment or large-scale field experiment”, my colleagues and I felt that motives would be important context because the scale of any meaningful SRM deployment would be so large that covert deployment seems impossible. However, several motives emerged that suggest a less-than-global effort might be important.

  13. Large Composite Structures Processing Technologies for Reusable Launch Vehicles

    NASA Technical Reports Server (NTRS)

    Clinton, R. G., Jr.; Vickers, J. H.; McMahon, W. M.; Hulcher, A. B.; Johnston, N. J.; Cano, R. J.; Belvin, H. L.; McIver, K.; Franklin, W.; Sidwell, D.

    2001-01-01

Significant efforts have been devoted to establishing the technology foundation to enable the progression to large scale composite structures fabrication. We are not capable today of fabricating many of the composite structures envisioned for the second generation reusable launch vehicle (RLV). Conventional 'aerospace' manufacturing and processing methodologies (fiber placement, autoclave, tooling) will require substantial investment and lead time to scale up. Out-of-autoclave process techniques will require aggressive efforts to mature the selected technologies and to scale up. Focused composite processing technology development and demonstration programs utilizing the building block approach are required to enable envisioned second generation RLV large composite structures applications. Government/industry partnerships have demonstrated success in this area and represent the best combination of skills and capabilities to achieve this goal.

  14. Evolutionary dynamics of selfish DNA explains the abundance distribution of genomic subsequences

    PubMed Central

    Sheinman, Michael; Ramisch, Anna; Massip, Florian; Arndt, Peter F.

    2016-01-01

    Since the sequencing of large genomes, many statistical features of their sequences have been found. One intriguing feature is that certain subsequences are much more abundant than others. In fact, abundances of subsequences of a given length are distributed with a scale-free power-law tail, resembling properties of human texts, such as Zipf’s law. Despite recent efforts, the understanding of this phenomenon is still lacking. Here we find that selfish DNA elements, such as those belonging to the Alu family of repeats, dominate the power-law tail. Interestingly, for the Alu elements the power-law exponent increases with the length of the considered subsequences. Motivated by these observations, we develop a model of selfish DNA expansion. The predictions of this model qualitatively and quantitatively agree with the empirical observations. This allows us to estimate parameters for the process of selfish DNA spreading in a genome during its evolution. The obtained results shed light on how evolution of selfish DNA elements shapes non-trivial statistical properties of genomes. PMID:27488939
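The abundance statistic discussed above is straightforward to compute: count every length-k subsequence and rank k-mers by abundance. The toy "genome" below, with one dominant repeat element standing in for an Alu-like family, is invented for illustration.

```python
# Count occurrences of every length-k subsequence and rank by abundance;
# a dominant repeat family fattens the tail of the distribution.

from collections import Counter

def kmer_abundances(seq, k):
    """Return (k-mer, count) pairs, most abundant first."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return counts.most_common()

# Toy genome: unique background plus one heavily expanded repeat element.
genome = "ACGT" * 3 + "GGGCC" * 10 + "TTAACG"
top = kmer_abundances(genome, 3)
print(top[:3])  # repeat-derived 3-mers dominate the head of the ranking
```

On a real genome, plotting abundance against rank on log-log axes exposes the power-law tail that the study attributes to selfish DNA expansion.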

  15. G2S: a web-service for annotating genomic variants on 3D protein structures.

    PubMed

    Wang, Juexin; Sheridan, Robert; Sumer, S Onur; Schultz, Nikolaus; Xu, Dong; Gao, Jianjiong

    2018-06-01

Accurately mapping and annotating genomic locations on 3D protein structures is a key step in structure-based analysis of genomic variants detected by recent large-scale sequencing efforts. There are several mapping resources currently available, but none of them provides a web API (Application Programming Interface) that supports programmatic access. We present G2S, a real-time web API that provides automated mapping of genomic variants on 3D protein structures. G2S can align genomic locations of variants, protein locations, or protein sequences to protein structures and retrieve the mapped residues from structures. The G2S API uses a REST-inspired design and can be used by various clients such as web browsers, command terminals, programming languages and other bioinformatics tools to bring 3D structures into genomic variant analysis. The web server and source code are freely available at https://g2s.genomenexus.org. Contact: g2s@genomenexus.org. Supplementary data are available at Bioinformatics online.
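Programmatic access to a REST-style service of this kind boils down to composing a GET URL from variant coordinates. The route name and parameter keys below are placeholders invented for illustration, not G2S's documented endpoints; the real API is described at https://g2s.genomenexus.org.

```python
# Sketch of building a GET request for a variant-to-structure mapping
# service. Route and parameter names are hypothetical placeholders.

from urllib.parse import urlencode

def build_query(base, chrom, pos, ref, alt):
    """Compose a GET URL asking which 3D-structure residues a genomic
    variant maps to (hypothetical route and parameter names)."""
    params = urlencode({"chromosome": chrom, "position": pos,
                        "ref": ref, "alt": alt})
    return f"{base}/map-variant?{params}"

url = build_query("https://g2s.genomenexus.org", "17", 7577121, "C", "T")
print(url)
```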

  16. Cumulative effects of restoration efforts on ecological characteristics of an open water area within the Upper Mississippi River

    USGS Publications Warehouse

    Gray, B.R.; Shi, W.; Houser, J.N.; Rogala, J.T.; Guan, Z.; Cochran-Biederman, J. L.

    2011-01-01

Ecological restoration efforts in large rivers generally aim to ameliorate ecological effects associated with large-scale modification of those rivers. This study examined whether the effects of restoration efforts, specifically those of island construction, within a largely open-water restoration area of the Upper Mississippi River (UMR) might be seen at the spatial scale of that 3476-ha area. The cumulative effects of island construction, when observed over multiple years, were postulated to have made the restoration area increasingly similar to a positive reference area (a proximate area comprising contiguous backwater areas) and increasingly different from two negative reference areas. The negative reference areas represented the Mississippi River main channel in an area proximate to the restoration area and an open water area in a related Mississippi River reach that has seen relatively little restoration effort. Inferences on the effects of restoration were made by comparing constrained and unconstrained models of summer chlorophyll a (CHL), summer inorganic suspended solids (ISS) and counts of benthic mayfly larvae. Constrained models forced trends in means or in both means and sampling variances to become, over time, increasingly similar to those in the positive reference area and increasingly dissimilar to those in the negative reference areas. Trends were estimated over 12- (mayflies) or 14-year sampling periods, and were evaluated using model information criteria. Based on these methods, restoration effects were observed for CHL and mayflies while evidence in favour of restoration effects on ISS was equivocal. These findings suggest that the cumulative effects of island building at relatively large spatial scales within large rivers may be estimated using data from large-scale surveillance monitoring programs. Published in 2010 by John Wiley & Sons, Ltd.

  17. Discovery of phosphonic acid natural products by mining the genomes of 10,000 actinomycetes.

    PubMed

    Ju, Kou-San; Gao, Jiangtao; Doroghazi, James R; Wang, Kwo-Kwang A; Thibodeaux, Christopher J; Li, Steven; Metzger, Emily; Fudala, John; Su, Joleen; Zhang, Jun Kai; Lee, Jaeheon; Cioni, Joel P; Evans, Bradley S; Hirota, Ryuichi; Labeda, David P; van der Donk, Wilfred A; Metcalf, William W

    2015-09-29

    Although natural products have been a particularly rich source of human medicines, activity-based screening results in a very high rate of rediscovery of known molecules. Based on the large number of natural product biosynthetic genes in microbial genomes, many have proposed "genome mining" as an alternative approach for discovery efforts; however, this idea has yet to be performed experimentally on a large scale. Here, we demonstrate the feasibility of large-scale, high-throughput genome mining by screening a collection of over 10,000 actinomycetes for the genetic potential to make phosphonic acids, a class of natural products with diverse and useful bioactivities. Genome sequencing identified a diverse collection of phosphonate biosynthetic gene clusters within 278 strains. These clusters were classified into 64 distinct groups, of which 55 are likely to direct the synthesis of unknown compounds. Characterization of strains within five of these groups resulted in the discovery of a new archetypical pathway for phosphonate biosynthesis, the first (to our knowledge) dedicated pathway for H-phosphinates, and 11 previously undescribed phosphonic acid natural products. Among these compounds are argolaphos, a broad-spectrum antibacterial phosphonopeptide composed of aminomethylphosphonate in peptide linkage to a rare amino acid N(5)-hydroxyarginine; valinophos, an N-acetyl l-Val ester of 2,3-dihydroxypropylphosphonate; and phosphonocystoximate, an unusual thiohydroximate-containing molecule representing a new chemotype of sulfur-containing phosphonate natural products. Analysis of the genome sequences from the remaining strains suggests that the majority of the phosphonate biosynthetic repertoire of Actinobacteria has been captured at the gene level. This dereplicated strain collection now provides a reservoir of numerous, as yet undiscovered, phosphonate natural products.

  19. Assessing the effects of fire disturbances on ecosystems: A scientific agenda for research and management

    USGS Publications Warehouse

    Schmoldt, D.L.; Peterson, D.L.; Keane, R.E.; Lenihan, J.M.; McKenzie, D.; Weise, D.R.; Sandberg, D.V.

    1999-01-01

    A team of fire scientists and resource managers convened 17-19 April 1996 in Seattle, Washington, to assess the effects of fire disturbance on ecosystems. Objectives of this workshop were to develop scientific recommendations for future fire research and management activities. These recommendations included a series of numerically ranked scientific and managerial questions and responses focusing on (1) links among fire effects, fuels, and climate; (2) fire as a large-scale disturbance; (3) fire-effects modeling structures; and (4) managerial concerns, applications, and decision support. At the present time, understanding of fire effects and the ability to extrapolate fire-effects knowledge to large spatial scales are limited, because most data have been collected at small spatial scales for specific applications. Although we clearly need more large-scale fire-effects data, it will be more expedient to concentrate efforts on improving and linking existing models that simulate fire effects in a georeferenced format while integrating empirical data as they become available. A significant component of this effort should be improved communication between modelers and managers to develop modeling tools to use in a planning context. Another component of this modeling effort should improve our ability to predict the interactions of fire and potential climatic change at very large spatial scales. The priority issues and approaches described here provide a template for fire science and fire management programs in the next decade and beyond.

  20. Omics approaches in food safety: fulfilling the promise?

    PubMed Central

    Bergholz, Teresa M.; Moreno Switt, Andrea I.; Wiedmann, Martin

    2014-01-01

    Genomics, transcriptomics, and proteomics are rapidly transforming our approaches to detection, prevention and treatment of foodborne pathogens. Microbial genome sequencing in particular has evolved from a research tool into an approach that can be used to characterize foodborne pathogen isolates as part of routine surveillance systems. Genome sequencing efforts will not only improve outbreak detection and source tracking, but will also create large amounts of foodborne pathogen genome sequence data, which will be available for data mining efforts that could facilitate better source attribution and provide new insights into foodborne pathogen biology and transmission. While practical uses and application of metagenomics, transcriptomics, and proteomics data and associated tools are less prominent, these tools are also starting to yield practical food safety solutions. PMID:24572764

  1. Sequence analysis reveals genomic factors affecting EST-SSR primer performance and polymorphism

    USDA-ARS?s Scientific Manuscript database

    Search for simple sequence repeat (SSR) motifs and design of flanking primers in expressed sequence tag (EST) sequences can be easily done at a large scale using bioinformatics programs. However, failed amplification and/or detection, along with lack of polymorphism, is often seen among randomly sel...

  2. A resource of large-scale molecular markers for monitoring Agropyron cristatum chromatin introgression in wheat background based on transcriptome sequences.

    PubMed

    Zhang, Jinpeng; Liu, Weihua; Lu, Yuqing; Liu, Qunxing; Yang, Xinming; Li, Xiuquan; Li, Lihui

    2017-09-20

Agropyron cristatum is a wild grass of the tribe Triticeae and serves as a gene donor for wheat improvement. However, very few markers can be used to monitor A. cristatum chromatin introgressions in wheat. Here, we report a resource of large-scale molecular markers for tracking alien introgressions in wheat based on transcriptome sequences. By aligning A. cristatum unigenes with the Chinese Spring reference genome sequences, we designed 9602 A. cristatum expressed sequence tag-sequence-tagged site (EST-STS) markers for PCR amplification and experimental screening. As a result, 6063 polymorphic EST-STS markers were specific for the A. cristatum P genome in the single-recipient wheat background. A total of 4956 randomly selected polymorphic EST-STS markers were further tested in eight wheat variety backgrounds, and 3070 markers displaying stable and polymorphic amplification were validated. These markers covered more than 98% of the A. cristatum genome, with a marker distribution density of approximately 1.28 cM. As an application case, all EST-STS markers were validated on the A. cristatum 6P chromosome, and the markers were successfully applied in the tracking of alien A. cristatum chromatin. Altogether, this study provides a universal method of large-scale molecular marker development to monitor wild relative chromatin in wheat.

  3. Oklahoma experiences largest earthquake during ongoing regional wastewater injection hazard mitigation efforts

    USGS Publications Warehouse

    Yeck, William; Hayes, Gavin; McNamara, Daniel E.; Rubinstein, Justin L.; Barnhart, William; Earle, Paul; Benz, Harley M.

    2017-01-01

The 3 September 2016, Mw 5.8 Pawnee earthquake was the largest recorded earthquake in the state of Oklahoma. Seismic and geodetic observations of the Pawnee sequence, including precise hypocenter locations and moment tensor modeling, show that the Pawnee earthquake occurred on a previously unknown left-lateral strike-slip basement fault that intersects the mapped right-lateral Labette fault zone. The Pawnee earthquake is part of an unprecedented increase in the earthquake rate in Oklahoma that is largely considered the result of the deep injection of waste fluids from oil and gas production. If this is, indeed, the case for the M5.8 Pawnee earthquake, then this would be the largest event to have been induced by fluid injection. Since 2015, Oklahoma has undergone wide-scale mitigation efforts primarily aimed at reducing injection volumes. Thus far in 2016, the rate of M3 and greater earthquakes has decreased as compared to 2015, while the cumulative moment (the energy released by earthquakes) has increased. This highlights the difficulty of earthquake hazard mitigation efforts given the poorly understood long-term diffusive effects of wastewater injection and their connection to seismicity.

  4. Evaluation of 16S rRNA amplicon sequencing using two next-generation sequencing technologies for phylogenetic analysis of the rumen bacterial community in steers

    USDA-ARS?s Scientific Manuscript database

    Next generation sequencing technologies have vastly changed the approach of sequencing of the 16S rRNA gene for studies in microbial ecology. Three distinct technologies are available for large-scale 16S sequencing. All three are subject to biases introduced by sequencing error rates, amplificatio...

  6. CLAST: CUDA implemented large-scale alignment search tool.

    PubMed

    Yano, Masahiro; Mori, Hiroshi; Akiyama, Yutaka; Yamada, Takuji; Kurokawa, Ken

    2014-12-11

    Metagenomics is a powerful methodology to study microbial communities, but it is highly dependent on nucleotide sequence similarity searching against sequence databases. Metagenomic analyses with next-generation sequencing technologies produce enormous numbers of reads from microbial communities, and many reads are derived from microbes whose genomes have not yet been sequenced, limiting the usefulness of existing sequence similarity search tools. Therefore, there is a clear need for a sequence similarity search tool that can rapidly detect weak similarity in large datasets. We developed a tool, which we named CLAST (CUDA implemented large-scale alignment search tool), that enables analyses of millions of reads and thousands of reference genome sequences, and runs on NVIDIA Fermi architecture graphics processing units. CLAST has four main advantages over existing alignment tools. First, CLAST was capable of identifying sequence similarities ~80.8 times faster than BLAST and 9.6 times faster than BLAT. Second, CLAST executes global alignment as the default (local alignment is also an option), enabling CLAST to assign reads to taxonomic and functional groups based on evolutionarily distant nucleotide sequences with high accuracy. Third, CLAST does not need a preprocessed sequence database like Burrows-Wheeler Transform-based tools, and this enables CLAST to incorporate large, frequently updated sequence databases. Fourth, CLAST requires <2 GB of main memory, making it possible to run CLAST on a standard desktop computer or server node. CLAST achieved very high speed (similar to the Burrows-Wheeler Transform-based Bowtie 2 for long reads) and sensitivity (equal to BLAST, BLAT, and FR-HIT) without the need for extensive database preprocessing or a specialized computing platform. Our results demonstrate that CLAST has the potential to be one of the most powerful and realistic approaches to analyze the massive amount of sequence data from next-generation sequencing technologies.
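    The global-alignment default described above can be illustrated with a textbook Needleman-Wunsch score computation. This is an illustrative sketch only, not CLAST's CUDA implementation, and the scoring parameters are arbitrary:

```python
# Minimal Needleman-Wunsch global alignment score (textbook version).
# Global alignment scores the full length of both sequences, which is why
# it penalizes reads that only match a reference locally.
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # leading gaps in b
    for j in range(1, m + 1):
        score[0][j] = j * gap          # leading gaps in a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[n][m]

print(global_align("GATTACA", "GATTACA"))  # identical sequences score len * match = 7
```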

  7. Multi-OMICs and Genome Editing Perspectives on Liver Cancer Signaling Networks.

    PubMed

    Lin, Shengda; Yin, Yi A; Jiang, Xiaoqian; Sahni, Nidhi; Yi, Song

    2016-01-01

    The advent of the human genome sequence and the resulting ~20,000 genes provide a crucial framework for a transition from traditional biology to an integrative "OMICs" arena (Lander et al., 2001; Venter et al., 2001; Kitano, 2002). This has ushered in a revolution for cancer research, which has now entered a big data era. In the past decade, facilitated by next-generation sequencing, there have been a huge number of large-scale sequencing efforts, such as The Cancer Genome Atlas (TCGA), the HapMap, and the 1000 Genomes Project. As a result, a deluge of genomic information has become available from patients stricken by a variety of cancer types. The list of cancer-associated genes is ever expanding. New discoveries are being made on how frequent and highly penetrant mutations, such as those in telomerase reverse transcriptase (TERT) and TP53, function in cancer initiation, progression, and metastasis. Most genes with relatively frequent but weakly penetrant cancer mutations still remain to be characterized. In addition, genes that harbor rare but highly penetrant cancer-associated mutations continue to emerge. Here, we review recent advances related to cancer genomics, proteomics, and systems biology and suggest new perspectives in targeted therapy and precision medicine.

  8. Cloud computing for comparative genomics

    PubMed Central

    2010-01-01

    Background Large comparative genomics studies and tools are becoming increasingly more compute-expensive as the number of available genome sequences continues to rise. The capacity and cost of local computing infrastructures are likely to become prohibitive with the increase, especially as the breadth of questions continues to rise. Alternative computing architectures, in particular cloud computing environments, may help alleviate this increasing pressure and enable fast, large-scale, and cost-effective comparative genomics strategies going forward. To test this, we redesigned a typical comparative genomics algorithm, the reciprocal smallest distance algorithm (RSD), to run within Amazon's Elastic Computing Cloud (EC2). We then employed the RSD-cloud for ortholog calculations across a wide selection of fully sequenced genomes. Results We ran more than 300,000 RSD-cloud processes within the EC2. These jobs were farmed simultaneously to 100 high capacity compute nodes using the Amazon Web Service Elastic Map Reduce and included a wide mix of large and small genomes. The total computation time took just under 70 hours and cost a total of $6,302 USD. Conclusions The effort to transform existing comparative genomics algorithms from local compute infrastructures is not trivial. However, the speed and flexibility of cloud computing environments provides a substantial boost with manageable cost. The procedure designed to transform the RSD algorithm into a cloud-ready application is readily adaptable to similar comparative genomics problems. PMID:20482786

  9. Cloud computing for comparative genomics.

    PubMed

    Wall, Dennis P; Kudtarkar, Parul; Fusaro, Vincent A; Pivovarov, Rimma; Patil, Prasad; Tonellato, Peter J

    2010-05-18

    Large comparative genomics studies and tools are becoming increasingly more compute-expensive as the number of available genome sequences continues to rise. The capacity and cost of local computing infrastructures are likely to become prohibitive with the increase, especially as the breadth of questions continues to rise. Alternative computing architectures, in particular cloud computing environments, may help alleviate this increasing pressure and enable fast, large-scale, and cost-effective comparative genomics strategies going forward. To test this, we redesigned a typical comparative genomics algorithm, the reciprocal smallest distance algorithm (RSD), to run within Amazon's Elastic Computing Cloud (EC2). We then employed the RSD-cloud for ortholog calculations across a wide selection of fully sequenced genomes. We ran more than 300,000 RSD-cloud processes within the EC2. These jobs were farmed simultaneously to 100 high capacity compute nodes using the Amazon Web Service Elastic Map Reduce and included a wide mix of large and small genomes. The total computation time took just under 70 hours and cost a total of $6,302 USD. The effort to transform existing comparative genomics algorithms from local compute infrastructures is not trivial. However, the speed and flexibility of cloud computing environments provides a substantial boost with manageable cost. The procedure designed to transform the RSD algorithm into a cloud-ready application is readily adaptable to similar comparative genomics problems.
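    The reciprocal smallest distance decision rule at the heart of RSD can be sketched as below, assuming the pairwise evolutionary distances have already been estimated (the published algorithm derives them from BLAST hits and maximum-likelihood estimation; the gene names and distances here are made up):

```python
# Sketch of the reciprocal smallest distance (RSD) ortholog decision rule.
# dist[(a, b)] = estimated evolutionary distance between gene a in genome A
# and gene b in genome B (assumed precomputed; placeholder values below).
def rsd_orthologs(dist):
    genes_a = {a for a, _ in dist}
    genes_b = {b for _, b in dist}
    best_for_a = {a: min(genes_b, key=lambda b: dist[(a, b)]) for a in genes_a}
    best_for_b = {b: min(genes_a, key=lambda a: dist[(a, b)]) for b in genes_b}
    # Keep a pair only if each member is the other's smallest-distance match.
    return {(a, b) for a, b in best_for_a.items() if best_for_b[b] == a}

dist = {("a1", "b1"): 0.1, ("a1", "b2"): 0.7,
        ("a2", "b1"): 0.4, ("a2", "b2"): 0.2}
print(rsd_orthologs(dist))  # {('a1', 'b1'), ('a2', 'b2')}
```

    Because each gene's candidate set is independent, this per-gene computation is what the authors farmed out across EC2 nodes.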

  10. Lateral fluid flow in a compacting sand-shale sequence: South Caspian basin.

    USGS Publications Warehouse

    Bredehoeft, J.D.; Djevanshir, R.D.; Belitz, K.R.

    1988-01-01

    The South Caspian basin contains both sands and shales that have pore-fluid pressures substantially in excess of hydrostatic fluid pressure. Pore-pressure data from the South Caspian basin demonstrate that large differences in excess hydraulic head exist between sand and shale. The data indicate that sands are acting as drains for overlying and underlying compacting shales and that fluid flows laterally through the sand on a regional scale from the basin interior northward to points of discharge. The major driving force for the fluid movement is shale compaction. We present a first-order mathematical analysis in an effort to test if the permeability of the sands required to support a regional flow system is reasonable. The results of the analysis suggest regional sand permeabilities ranging from 1 to 30 md; a range that seems reasonable. This result supports the thesis that lateral fluid flow is occurring on a regional scale within the South Caspian basin. If vertical conduits for flow exist within the basin, they are sufficiently impermeable that they do not provide a major outlet for the regional flow system. The lateral fluid flow within the sands implies that the stratigraphic sequence is divided into horizontal units that are hydraulically isolated from one another, a conclusion that has important implications for oil and gas migration.-Authors

  11. Museum genomics: low-cost and high-accuracy genetic data from historical specimens.

    PubMed

    Rowe, Kevin C; Singhal, Sonal; Macmanes, Matthew D; Ayroles, Julien F; Morelli, Toni Lyn; Rubidge, Emily M; Bi, Ke; Moritz, Craig C

    2011-11-01

    Natural history collections are unparalleled repositories of geographical and temporal variation in faunal conditions. Molecular studies offer an opportunity to uncover much of this variation; however, genetic studies of historical museum specimens typically rely on extracting highly degraded and chemically modified DNA samples from skins, skulls or other dried samples. Despite this limitation, obtaining short fragments of DNA sequences using traditional PCR amplification of DNA has been the primary method for genetic study of historical specimens. Few laboratories have succeeded in obtaining genome-scale sequences from historical specimens and then only with considerable effort and cost. Here, we describe a low-cost approach using high-throughput next-generation sequencing to obtain reliable genome-scale sequence data from a traditionally preserved mammal skin and skull using a simple extraction protocol. We show that single-nucleotide polymorphisms (SNPs) from the genome sequences obtained independently from the skin and from the skull are highly repeatable compared to a reference genome. © 2011 Blackwell Publishing Ltd.

  12. The 3,000 rice genomes project

    PubMed Central

    2014-01-01

    Background Rice, Oryza sativa L., is the staple food for half the world’s population. By 2030, the production of rice must increase by at least 25% in order to keep up with global population growth and demand. Accelerated genetic gains in rice improvement are needed to mitigate the effects of climate change and loss of arable land, as well as to ensure a stable global food supply. Findings We resequenced a core collection of 3,000 rice accessions from 89 countries. All 3,000 genomes had an average sequencing depth of 14×, with average genome coverages and mapping rates of 94.0% and 92.5%, respectively. From our sequencing efforts, approximately 18.9 million single nucleotide polymorphisms (SNPs) in rice were discovered when aligned to the reference genome of the temperate japonica variety, Nipponbare. Phylogenetic analyses based on SNP data confirmed differentiation of the O. sativa gene pool into 5 varietal groups – indica, aus/boro, basmati/sadri, tropical japonica and temperate japonica. Conclusions Here, we report an international resequencing effort of 3,000 rice genomes. This data serves as a foundation for large-scale discovery of novel alleles for important rice phenotypes using various bioinformatics and/or genetic approaches. It also serves to understand the genomic diversity within O. sativa at a higher level of detail. With the release of the sequencing data, the project calls for the global rice community to take advantage of this data as a foundation for establishing a global, public rice genetic/genomic database and information platform for advancing rice breeding technology for future rice improvement. PMID:24872877
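    The reported 14× average depth implies several gigabases of sequence per accession. A back-of-envelope check, assuming a rice genome size of roughly 380 Mb (an assumed figure, not stated in the abstract):

```python
# Rough per-accession sequencing yield implied by the reported 14x depth.
genome_size = 380e6            # bases; assumed approximate rice genome size
mean_depth = 14                # reported average sequencing depth
total_bases = genome_size * mean_depth
print(total_bases / 1e9)       # ~5.3 Gb of raw sequence per accession
```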

  13. From the Battlefield to the Bedside: Supporting Warfighter and Civilian Health With the "ART" of Whole Genome Sequencing for Antibiotic Resistance and Outbreak Investigations.

    PubMed

    Lesho, Emil; Lin, Xiaoxu; Clifford, Robert; Snesrud, Erik; Onmus-Leone, Fatma; Appalla, Lakshmi; Ong, Ana; Maybank, Rosslyn; Nielsen, Lindsey; Kwak, Yoon; Hinkle, Mary; Turco, John; Marin, Juan A; Hooks, Sally; Matthews, Stacy; Hyland, Stephen; Little, Jered; Waterman, Paige; McGann, Patrick

    2016-07-01

    Awareness, responsiveness, and throughput characterize an approach for enhancing the clinical impact of whole genome sequencing for austere environments and for large geographically dispersed health systems. This Department of Defense approach is informing interagency efforts linking antibiograms of multidrug-resistant organisms to their genome sequences in a public database. Reprint & Copyright © 2016 Association of Military Surgeons of the U.S.

  14. Simple chained guide trees give high-quality protein multiple sequence alignments

    PubMed Central

    Boyce, Kieran; Sievers, Fabian; Higgins, Desmond G.

    2014-01-01

    Guide trees are used to decide the order of sequence alignment in the progressive multiple sequence alignment heuristic. These guide trees are often the limiting factor in making large alignments, and considerable effort has been expended over the years in making them quickly or accurately. In this article we show that, at least for protein families with large numbers of sequences that can be benchmarked with known structures, simple chained guide trees give the most accurate alignments. These also happen to be the fastest and simplest guide trees to construct, computationally. Such guide trees have a striking effect on the accuracy of alignments produced by some of the most widely used alignment packages. There is a marked increase in accuracy and a marked decrease in computational time, once the number of sequences goes much above a few hundred. This is true even if the order of sequences in the guide tree is random. PMID:25002495
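    A chained ("caterpillar") guide tree is trivial to construct: each sequence is joined to the growing subtree one at a time, so alignment order is simply input order. A minimal sketch in Newick notation; the helper name is hypothetical, and whether a particular aligner accepts such a tree (e.g. via Clustal Omega's guide-tree input option) should be checked against its documentation:

```python
# Build a chained (caterpillar/ladder) guide tree in Newick format.
# Sequences are merged strictly in input order, one at a time.
def chained_guide_tree(names):
    if len(names) == 1:
        return names[0] + ";"
    tree = names[0]
    for name in names[1:]:
        tree = "({},{})".format(tree, name)   # join next leaf to current subtree
    return tree + ";"

print(chained_guide_tree(["s1", "s2", "s3", "s4"]))  # (((s1,s2),s3),s4);
```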

  15. Liquid Crystalline Polymers

    DTIC Science & Technology

    1990-02-28

    include energy costs, time required for cooling, large volume changes, and degradation. For many high-temperature LCPs, the latter may be the most...LCPs)- high local (microscopic) orientational order, which is retained in the solid state-has significant implications in a range of DOD applications...that yield and maintain specific mer sequences. * Continue efforts to measure mer sequence distribution, e.g., by multinuclei NMR. * Develop high

  16. DNA barcoding at riverscape scales: Assessing biodiversity among fishes of the genus Cottus (Teleostei) in northern Rocky Mountain streams

    Treesearch

    Michael K. Young; Kevin S. McKelvey; Kristine L. Pilgrim; Michael K. Schwartz

    2013-01-01

    There is growing interest in broad-scale biodiversity assessments that can serve as benchmarks for identifying ecological change. Genetic tools have been used for such assessments for decades, but spatial sampling considerations have largely been ignored. Here, we demonstrate how intensive sampling efforts across a large geographical scale can influence identification...

  17. School Improvement Networks as a Strategy for Large-Scale Education Reform: The Role of Educational Environments

    ERIC Educational Resources Information Center

    Glazer, Joshua L.; Peurach, Donald J.

    2013-01-01

    The development and scale-up of school improvement networks is among the most important educational innovations of the last decade, and current federal, state, and district efforts attempt to use school improvement networks as a mechanism for supporting large-scale change. The potential of improvement networks, however, rests on the extent to…

  18. Targeted enrichment strategies for next-generation plant biology

    Treesearch

    Richard Cronn; Brian J. Knaus; Aaron Liston; Peter J. Maughan; Matthew Parks; John V. Syring; Joshua Udall

    2012-01-01

    The dramatic advances offered by modern DNA sequencers continue to redefine the limits of what can be accomplished in comparative plant biology. Even with recent achievements, however, plant genomes present obstacles that can make it difficult to execute large-scale population and phylogenetic studies on next-generation sequencing platforms. Factors like large genome...

  19. Large-Scale 3D Printing: The Way Forward

    NASA Astrophysics Data System (ADS)

    Jassmi, Hamad Al; Najjar, Fady Al; Ismail Mourad, Abdel-Hamid

    2018-03-01

    Research on small-scale 3D printing has rapidly evolved, where numerous industrial products have been tested and successfully applied. Nonetheless, research on large-scale 3D printing, directed to large-scale applications such as construction and automotive manufacturing, still demands a great deal of effort. Large-scale 3D printing is considered an interdisciplinary topic and requires establishing a blended knowledge base from numerous research fields including structural engineering, materials science, mechatronics, software engineering, artificial intelligence and architectural engineering. This review article summarizes key topics of relevance to new research trends on large-scale 3D printing, particularly pertaining to (1) technological solutions of additive construction (i.e. the 3D printers themselves), (2) materials science challenges, and (3) new design opportunities.

  20. Whole Genome Amplification and Reduced-Representation Genome Sequencing of Schistosoma japonicum Miracidia

    PubMed Central

    Shortt, Jonathan A.; Card, Daren C.; Schield, Drew R.; Liu, Yang; Zhong, Bo; Castoe, Todd A.

    2017-01-01

    Background In areas where schistosomiasis control programs have been implemented, morbidity and prevalence have been greatly reduced. However, to sustain these reductions and move towards interruption of transmission, new tools for disease surveillance are needed. Genomic methods have the potential to help trace the sources of new infections, and allow us to monitor drug resistance. Large-scale genotyping efforts for schistosome species have been hindered by cost, limited numbers of established target loci, and the small amount of DNA obtained from miracidia, the life stage most readily acquired from humans. Here, we present a method using next generation sequencing to provide high-resolution genomic data from S. japonicum for population-based studies. Methodology/Principal Findings We applied whole genome amplification followed by double digest restriction site associated DNA sequencing (ddRADseq) to individual S. japonicum miracidia preserved on Whatman FTA cards. We found that we could effectively and consistently survey hundreds of thousands of variants from 10,000 to 30,000 loci from archived miracidia as old as six years. An analysis of variation from eight miracidia obtained from three hosts in two villages in Sichuan showed clear population structuring by village and host even within this limited sample. Conclusions/Significance This high-resolution sequencing approach yields three orders of magnitude more information than microsatellite genotyping methods that have been employed over the last decade, creating the potential to answer detailed questions about the sources of human infections and to monitor drug resistance. Costs per sample range from $50-$200, depending on the amount of sequence information desired, and we expect these costs can be reduced further given continued reductions in sequencing costs, improvement of protocols, and parallelization. This approach provides new promise for applying modern genome-scale sampling to S. japonicum surveillance, and could be extended to other schistosome species and other parasitic helminths. PMID:28107347

  1. An Efficient and Versatile Means for Assembling and Manufacturing Systems in Space

    NASA Technical Reports Server (NTRS)

    Dorsey, John T.; Doggett, William R.; Hafley, Robert A.; Komendera, Erik; Correll, Nikolaus; King, Bruce

    2012-01-01

    Within NASA Space Science, Exploration and the Office of Chief Technologist, there are Grand Challenges and advanced future exploration, science and commercial mission applications that could benefit significantly from large-span and large-area structural systems. Of particular and persistent interest to the Space Science community is the desire for large (in the 10-50 meter range for main aperture diameter) space telescopes that would revolutionize space astronomy. Achieving these systems will likely require on-orbit assembly, but previous approaches for assembling large-scale telescope truss structures and systems in space have been perceived as very costly because they require high precision and custom components. These components rely on a large number of mechanical connections and supporting infrastructure that are unique to each application. In this paper, a new assembly paradigm that mitigates these concerns is proposed and described. A new assembly approach, developed to implement the paradigm, is developed incorporating: Intelligent Precision Jigging Robots, Electron-Beam welding, robotic handling/manipulation, operations assembly sequence and path planning, and low precision weldable structural elements. Key advantages of the new assembly paradigm, as well as concept descriptions and ongoing research and technology development efforts for each of the major elements are summarized.

  2. Scaling up the 454 Titanium Library Construction and Pooling of Barcoded Libraries

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Phung, Wilson; Hack, Christopher; Shapiro, Harris

    2009-03-23

    We have been developing a high throughput 454 library construction process at the Joint Genome Institute to meet the needs of de novo sequencing a large number of microbial and eukaryote genomes, EST, and metagenome projects. We have been focusing efforts in three areas: (1) modifying the current process to allow the construction of 454 standard libraries on a 96-well format; (2) developing a robotic platform to perform the 454 library construction; and (3) designing molecular barcodes to allow pooling and sorting of many different samples. In the development of a high throughput process to scale up the number of libraries by adapting the process to a 96-well plate format, the key process change involves the replacement of gel electrophoresis for size selection with Solid Phase Reversible Immobilization (SPRI) beads. Although the standard deviation of the insert sizes increases, the overall quality sequence and distribution of the reads in the genome has not changed. The manual process of constructing 454 shotgun libraries on 96-well plates is a time-consuming, labor-intensive, and ergonomically hazardous process; we have been experimenting to program a BioMek robot to perform the library construction. This will not only enable library construction to be completed in a single day, but will also minimize any ergonomic risk. In addition, we have implemented a set of molecular barcodes (AKA Multiple Identifiers or MID) and a pooling process that allows us to sequence many targets simultaneously. Here we will present the testing of pooling a set of selected fosmids derived from the endomycorrhizal fungus Glomus intraradices. By combining the robotic library construction process and the use of molecular barcodes, it is now possible to sequence hundreds of fosmids that represent a minimal tiling path of this genome. Here we present the progress and the challenges of developing these scaled-up processes.
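    Pooling barcoded libraries implies a demultiplexing step after sequencing: reads are sorted back to their source sample by their MID prefix. A toy sketch with made-up barcodes and reads, not the JGI's actual pipeline:

```python
# Toy demultiplexer: assign each read to a sample by its molecular barcode
# (MID) prefix, then trim the barcode. Barcodes and reads are illustrative.
def demultiplex(reads, barcodes):
    pools = {sample: [] for sample in barcodes}
    pools["unassigned"] = []
    for read in reads:
        for sample, mid in barcodes.items():
            if read.startswith(mid):
                pools[sample].append(read[len(mid):])  # trim the MID
                break
        else:
            pools["unassigned"].append(read)           # no barcode matched
    return pools

barcodes = {"fosmid_A": "ACGT", "fosmid_B": "TGCA"}
reads = ["ACGTGGGG", "TGCACCCC", "NNNNAAAA"]
print(demultiplex(reads, barcodes))
```

    A production demultiplexer would also tolerate sequencing errors in the barcode (e.g. allow one mismatch), which is why real MID sets are designed with a minimum mutual edit distance.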

  3. Large scale wind tunnel investigation of a folding tilt rotor

    NASA Technical Reports Server (NTRS)

    1972-01-01

    A twenty-five foot diameter folding tilt rotor was tested in a large scale wind tunnel to determine its aerodynamic characteristics in unfolded, partially folded, and fully folded configurations. During the tests, the rotor completed over forty start/stop sequences. After completing the sequences in a stepwise manner, smooth start/stop transitions were made in approximately two seconds. Wind tunnel speeds up through seventy-five knots were used, at which point the rotor mast angle was increased to four degrees, corresponding to a maneuver condition of one and one-half g.

  4. Metagenomic and near full-length 16S rRNA sequence data in support of the phylogenetic analysis of the rumen bacterial community in steers

    USDA-ARS?s Scientific Manuscript database

    Next generation sequencing technologies have vastly changed the approach of sequencing of the 16S rRNA gene for studies in microbial ecology. Three distinct technologies are available for large-scale 16S sequencing. All three are subject to biases introduced by sequencing error rates, amplificatio...

  5. Enabling large-scale next-generation sequence assembly with Blacklight

    PubMed Central

    Couger, M. Brian; Pipes, Lenore; Squina, Fabio; Prade, Rolf; Siepel, Adam; Palermo, Robert; Katze, Michael G.; Mason, Christopher E.; Blood, Philip D.

    2014-01-01

    Summary A variety of extremely challenging biological sequence analyses were conducted on the XSEDE large shared memory resource Blacklight, using current bioinformatics tools and encompassing a wide range of scientific applications. These include genomic sequence assembly, very large metagenomic sequence assembly, transcriptome assembly, and sequencing error correction. The data sets used in these analyses included uncategorized fungal species, reference microbial data, very large soil and human gut microbiome sequence data, and primate transcriptomes, composed of both short-read and long-read sequence data. A new parallel command execution program was developed on the Blacklight resource to handle some of these analyses. These results, initially reported at XSEDE13 and expanded here, represent significant advances for their respective scientific communities. The breadth and depth of the results achieved demonstrate the ease of use, versatility, and unique capabilities of the Blacklight XSEDE resource for scientific analysis of genomic and transcriptomic sequence data, and the power of these resources, together with XSEDE support, in meeting the most challenging scientific problems. PMID:25294974

  6. Large scale ab initio modeling of structurally uncharacterized antimicrobial peptides reveals known and novel folds.

    PubMed

    Kozic, Mara; Fox, Stephen J; Thomas, Jens M; Verma, Chandra S; Rigden, Daniel J

    2018-05-01

    Antimicrobial resistance within a wide range of infectious agents is a severe and growing public health threat. Antimicrobial peptides (AMPs) are among the leading alternatives to current antibiotics, exhibiting broad spectrum activity. Their activity is determined by numerous properties such as cationic charge, amphipathicity, size, and amino acid composition. Currently, only around 10% of known AMP sequences have experimentally solved structures. To improve our understanding of the AMP structural universe we have carried out large scale ab initio 3D modeling of structurally uncharacterized AMPs that revealed similarities between predicted folds of the modeled sequences and structures of characterized AMPs. Two of the peptides whose models matched known folds are Lebocin Peptide 1A (LP1A) and Odorranain M, predicted to form β-hairpins but, interestingly, to lack the intramolecular disulfide bonds, cation-π or aromatic interactions that generally stabilize such AMP structures. Other examples include Ponericin Q42, Latarcin 4a, Kassinatuerin 1, Ceratotoxin D, and CPF-B1 peptide, which have α-helical folds, as well as mixed αβ folds of human Histatin 2 peptide and Garvicin A which are, to the best of our knowledge, the first linear αββ fold AMPs lacking intramolecular disulfide bonds. In addition to fold matches to experimentally derived structures, unique folds were also obtained, namely for Microcin M and Ipomicin. These results help in understanding the range of protein scaffolds that naturally bear antimicrobial activity and may facilitate protein design efforts towards better AMPs. © 2018 The Authors Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.

  7. QuickRNASeq lifts large-scale RNA-seq data analyses to the next level of automation and interactive visualization.

    PubMed

    Zhao, Shanrong; Xi, Li; Quan, Jie; Xi, Hualin; Zhang, Ying; von Schack, David; Vincent, Michael; Zhang, Baohong

    2016-01-08

    RNA sequencing (RNA-seq), a next-generation sequencing technique for transcriptome profiling, is being increasingly used, in part driven by the decreasing cost of sequencing. Nevertheless, the analysis of the massive amounts of data generated by large-scale RNA-seq remains a challenge. Multiple algorithms pertinent to basic analyses have been developed, and there is an increasing need to automate the use of these tools so as to obtain results in an efficient and user-friendly manner. Increased automation and improved visualization of the results will help make the results and findings of the analyses readily available to experimental scientists. By combining the best open source tools developed for RNA-seq data analyses and the most advanced web 2.0 technologies, we have implemented QuickRNASeq, a pipeline for large-scale RNA-seq data analyses and visualization. The QuickRNASeq workflow consists of three main steps. In Step #1, each individual sample is processed, including mapping RNA-seq reads to a reference genome, counting the numbers of mapped reads, quality control of the aligned reads, and SNP (single nucleotide polymorphism) calling. Step #1 is computationally intensive, and can be processed in parallel. In Step #2, the results from individual samples are merged, and an integrated and interactive project report is generated. All analyses results in the report are accessible via a single HTML entry webpage. Step #3 is the data interpretation and presentation step. The rich visualization features implemented here allow end users to interactively explore the results of RNA-seq data analyses, and to gain more insights into RNA-seq datasets. In addition, we used a real world dataset to demonstrate the simplicity and efficiency of QuickRNASeq in RNA-seq data analyses and interactive visualizations. The seamless integration of automated capabilities with interactive visualizations in QuickRNASeq is not available in other published RNA-seq pipelines. The high degree of automation and interactivity in QuickRNASeq leads to a substantial reduction in the time and effort required prior to further downstream analyses and interpretation of the analyses findings. QuickRNASeq advances primary RNA-seq data analyses to the next level of automation, and is mature for public release and adoption.
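    The workflow shape described above (independent per-sample processing, then a merge) can be sketched as follows. The `process_sample` body and its counts are placeholders, not QuickRNASeq's actual tools; Step #1's per-sample independence is what makes it parallelizable (e.g. with a process pool or a cluster scheduler):

```python
# Sketch of the per-sample/merge workflow shape described for QuickRNASeq.
def process_sample(sample_id):
    # Step #1 (per sample): in the real pipeline this would be read mapping,
    # read counting, QC, and SNP calling. Placeholder counts stand in here.
    counts = {"geneX": len(sample_id)}
    return sample_id, counts

def merge(per_sample_results):
    # Step #2: merge individual sample results into one matrix for the report.
    return dict(per_sample_results)

samples = ["s01", "s02", "s03"]
# map() could be swapped for a process pool, since samples are independent.
matrix = merge(map(process_sample, samples))
print(sorted(matrix))  # ['s01', 's02', 's03']
```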

  8. Legume genome evolution viewed through the Medicago truncatula and Lotus japonicus genomes

    PubMed Central

    Cannon, Steven B.; Sterck, Lieven; Rombauts, Stephane; Sato, Shusei; Cheung, Foo; Gouzy, Jérôme; Wang, Xiaohong; Mudge, Joann; Vasdewani, Jayprakash; Schiex, Thomas; Spannagl, Manuel; Monaghan, Erin; Nicholson, Christine; Humphray, Sean J.; Schoof, Heiko; Mayer, Klaus F. X.; Rogers, Jane; Quétier, Francis; Oldroyd, Giles E.; Debellé, Frédéric; Cook, Douglas R.; Retzel, Ernest F.; Roe, Bruce A.; Town, Christopher D.; Tabata, Satoshi; Van de Peer, Yves; Young, Nevin D.

    2006-01-01

    Genome sequencing of the model legumes, Medicago truncatula and Lotus japonicus, provides an opportunity for large-scale sequence-based comparison of two genomes in the same plant family. Here we report synteny comparisons between these species, including details about chromosome relationships, large-scale synteny blocks, microsynteny within blocks, and genome regions lacking clear correspondence. The Lotus and Medicago genomes share a minimum of 10 large-scale synteny blocks, each with substantial collinearity and frequently extending the length of whole chromosome arms. The proportion of genes syntenic and collinear within each synteny block is relatively homogeneous. Medicago–Lotus comparisons also indicate similar and largely homogeneous gene densities, although gene-containing regions in Mt occupy 20–30% more space than Lj counterparts, primarily because of larger numbers of Mt retrotransposons. Because the interpretation of genome comparisons is complicated by large-scale genome duplications, we describe synteny, synonymous substitutions and phylogenetic analyses to identify and date a probable whole-genome duplication event. There is no direct evidence for any recent large-scale genome duplication in either Medicago or Lotus but instead a duplication predating speciation. Phylogenetic comparisons place this duplication within the Rosid I clade, clearly after the split between legumes and Salicaceae (poplar). PMID:17003129

  9. Transcriptome Sequencing and Developmental Regulation of Gene Expression in Anopheles aquasalis

    PubMed Central

    Silva, Maria C. P.; Lopes, Adriana R.; Barros, Michele S.; Sá-Nunes, Anderson; Kojin, Bianca B.; Carvalho, Eneas; Suesdek, Lincoln; Silva-Neto, Mário Alberto C.; James, Anthony A.; Capurro, Margareth L.

    2014-01-01

    Background Anopheles aquasalis is a major malaria vector in coastal areas of South and Central America where it breeds preferentially in brackish water. This species is very susceptible to Plasmodium vivax and it has been already incriminated as responsible vector in malaria outbreaks. There has been no high-throughput investigation into the sequencing of An. aquasalis genes, transcripts and proteins despite its epidemiological relevance. Here we describe the sequencing, assembly and annotation of the An. aquasalis transcriptome. Methodology/Principal Findings A total of 419 thousand cDNA sequence reads, encompassing 164 million nucleotides, were assembled in 7544 contigs of ≥2 sequences, and 1999 singletons. The majority of the An. aquasalis transcripts encode proteins with their closest counterparts in another neotropical malaria vector, An. darlingi. Several analyses in different protein databases were used to annotate and predict the putative functions of the deduced An. aquasalis proteins. Larval and adult-specific transcripts were represented by 121 and 424 contig sequences, respectively. Fifty-one transcripts were only detected in blood-fed females. The data also reveal a list of transcripts up- or down-regulated in adult females after a blood meal. Transcripts associated with immunity, signaling networks and blood feeding and digestion are discussed. Conclusions/Significance This study represents the first large-scale effort to sequence the transcriptome of An. aquasalis. It provides valuable information that will facilitate studies on the biology of this species and may lead to novel strategies to reduce malaria transmission on the South American continent. The An. aquasalis transcriptome is accessible at http://exon.niaid.nih.gov/transcriptome/An_aquasalis/Anaquexcel.xlsx. PMID:25033462

  10. Scaling NASA Applications to 1024 CPUs on Origin 3K

    NASA Technical Reports Server (NTRS)

    Taft, Jim

    2002-01-01

    The long and highly successful joint SGI-NASA research effort in ever larger SSI systems was to a large degree the result of the successful development of the MLP scalable parallel programming paradigm developed at ARC: 1) MLP scaling in real production codes justified ever larger systems at NAS; 2) MLP scaling on the 256p Origin 2000 gave SGI the impetus to productize 256p systems; 3) MLP scaling on 512p gave SGI the confidence to build the 1024p O3K; and 4) the history of MLP success led to an IBM Star Cluster-based MLP effort.

  11. Evaluation of Targeted Sequencing for Transcriptional Analysis of Archival Formalin-Fixed Paraffin-Embedded (FFPE) Samples

    EPA Science Inventory

    Next-generation sequencing provides unprecedented access to genomic information in archival FFPE tissue samples. However, costs and technical challenges related to RNA isolation and enrichment limit use of whole-genome RNA-sequencing for large-scale studies of FFPE specimens. Rec...

  12. A Comprehensive, Automatically Updated Fungal ITS Sequence Dataset for Reference-Based Chimera Control in Environmental Sequencing Efforts.

    PubMed

    Nilsson, R Henrik; Tedersoo, Leho; Ryberg, Martin; Kristiansson, Erik; Hartmann, Martin; Unterseher, Martin; Porter, Teresita M; Bengtsson-Palme, Johan; Walker, Donald M; de Sousa, Filipe; Gamper, Hannes Andres; Larsson, Ellen; Larsson, Karl-Henrik; Kõljalg, Urmas; Edgar, Robert C; Abarenkov, Kessy

    2015-01-01

    The nuclear ribosomal internal transcribed spacer (ITS) region is the most commonly chosen genetic marker for the molecular identification of fungi in environmental sequencing and molecular ecology studies. Several analytical issues complicate such efforts, one of which is the formation of chimeric (artificially joined) DNA sequences during PCR amplification or sequence assembly. Several software tools are currently available for chimera detection, but these rely to various degrees on the presence of a chimera-free reference dataset for optimal performance. However, no such dataset is available for use with the fungal ITS region. This study introduces a comprehensive, automatically updated reference dataset for fungal ITS sequences based on the UNITE database for the molecular identification of fungi. This dataset supports chimera detection throughout the fungal kingdom and for full-length ITS sequences as well as partial (ITS1 or ITS2 only) datasets. The performance of the dataset on a large set of artificial chimeras was above 99.5%, and we subsequently used the dataset to remove nearly 1,000 compromised fungal ITS sequences from public circulation. The dataset is available at http://unite.ut.ee/repository.php and is subject to web-based third-party curation.
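
    The reference-based detection idea can be sketched in miniature: if the two halves of a query sequence find different best-matching references, the query is a chimera candidate. Everything below (the k-mer scoring, the toy reference sequences, the function names) is an illustrative stand-in, not the UNITE pipeline or any published chimera-detection tool:

```python
def kmers(s, k=4):
    """All k-length substrings of s, as a set."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}

def best_match(fragment, references):
    """Naive best hit: the reference sharing the most k-mers with the fragment."""
    return max(references, key=lambda name: len(kmers(fragment) & kmers(references[name])))

def looks_chimeric(query, references):
    """Reference-based check in miniature: if the two halves of a query have
    different best-matching references, flag it as a candidate chimera."""
    mid = len(query) // 2
    return best_match(query[:mid], references) != best_match(query[mid:], references)

# Two toy "reference ITS" sequences and a query stitched together from both.
refs = {
    "species_A": "ACACACACACGTGTGTGTGT",
    "species_B": "TTAATTAATTCCGGCCGGCC",
}
chimera = "ACACACACACCCGGCCGGCC"   # 5' half from A, 3' half from B
flagged = looks_chimeric(chimera, refs)
```

    Real tools score alignments rather than shared k-mers and handle partial overlaps, but the split-and-compare logic is the same.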

  13. A Comprehensive, Automatically Updated Fungal ITS Sequence Dataset for Reference-Based Chimera Control in Environmental Sequencing Efforts

    PubMed Central

    Nilsson, R. Henrik; Tedersoo, Leho; Ryberg, Martin; Kristiansson, Erik; Hartmann, Martin; Unterseher, Martin; Porter, Teresita M.; Bengtsson-Palme, Johan; Walker, Donald M.; de Sousa, Filipe; Gamper, Hannes Andres; Larsson, Ellen; Larsson, Karl-Henrik; Kõljalg, Urmas; Edgar, Robert C.; Abarenkov, Kessy

    2015-01-01

    The nuclear ribosomal internal transcribed spacer (ITS) region is the most commonly chosen genetic marker for the molecular identification of fungi in environmental sequencing and molecular ecology studies. Several analytical issues complicate such efforts, one of which is the formation of chimeric (artificially joined) DNA sequences during PCR amplification or sequence assembly. Several software tools are currently available for chimera detection, but these rely to various degrees on the presence of a chimera-free reference dataset for optimal performance. However, no such dataset is available for use with the fungal ITS region. This study introduces a comprehensive, automatically updated reference dataset for fungal ITS sequences based on the UNITE database for the molecular identification of fungi. This dataset supports chimera detection throughout the fungal kingdom and for full-length ITS sequences as well as partial (ITS1 or ITS2 only) datasets. The performance of the dataset on a large set of artificial chimeras was above 99.5%, and we subsequently used the dataset to remove nearly 1,000 compromised fungal ITS sequences from public circulation. The dataset is available at http://unite.ut.ee/repository.php and is subject to web-based third-party curation. PMID:25786896

  14. XLID-Causing Mutations and Associated Genes Challenged in Light of Data From Large-Scale Human Exome Sequencing

    PubMed Central

    Piton, Amélie; Redin, Claire; Mandel, Jean-Louis

    2013-01-01

    Because of the unbalanced sex ratio (1.3–1.4 to 1) observed in intellectual disability (ID) and the identification of large ID-affected families showing X-linked segregation, much attention has been focused on the genetics of X-linked ID (XLID). Mutations causing monogenic XLID have now been reported in over 100 genes, most of which are commonly included in XLID diagnostic gene panels. Nonetheless, the boundary between true mutations and rare non-disease-causing variants often remains elusive. The sequencing of a large number of control X chromosomes, required for avoiding false-positive results, was not systematically possible in the past. Such information is now available thanks to large-scale sequencing projects such as the National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project, which provides variation information on 10,563 X chromosomes from the general population. We used this NHLBI cohort to systematically reassess the implication of 106 genes proposed to be involved in monogenic forms of XLID. We particularly question the implication in XLID of ten of them (AGTR2, MAGT1, ZNF674, SRPX2, ATP6AP2, ARHGEF6, NXF5, ZCCHC12, ZNF41, and ZNF81), in which truncating variants or previously published mutations are observed at a relatively high frequency within this cohort. We also highlight 15 other genes (CCDC22, CLIC2, CNKSR2, FRMPD4, HCFC1, IGBP1, KIAA2022, KLF8, MAOA, NAA10, NLGN3, RPL10, SHROOM4, ZDHHC15, and ZNF261) for which replication studies are warranted. We propose that similar reassessment of reported mutations (and genes) with the use of data from large-scale human exome sequencing would be relevant for a wide range of other genetic diseases. PMID:23871722
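
    The core of the reassessment logic is a frequency argument: a variant observed too often among 10,563 control X chromosomes cannot plausibly cause a rare, highly penetrant monogenic disorder. A minimal sketch with illustrative numbers and a hypothetical helper name (the paper's actual filtering criteria are richer):

```python
def implausible_as_monogenic(carrier_count, chromosomes_sampled, disease_prevalence):
    """Flag a putative disease mutation whose population allele frequency
    exceeds what the disorder's prevalence could possibly allow.
    The threshold logic and all figures here are illustrative, not quoted
    from the paper."""
    allele_freq = carrier_count / chromosomes_sampled
    return allele_freq > disease_prevalence

# A "mutation" seen on 12 of 10,563 control X chromosomes (freq ~1.1e-3)
# cannot cause a fully penetrant disorder affecting ~1 in 10,000 people.
flag = implausible_as_monogenic(12, 10563, 1e-4)
```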

  15. Genetics of Triglycerides and the Risk of Atherosclerosis.

    PubMed

    Dron, Jacqueline S; Hegele, Robert A

    2017-07-01

    Plasma triglycerides are routinely measured with a lipid profile, and elevated plasma triglycerides are commonly encountered in the clinic. The confounded nature of this trait, which is correlated with numerous other metabolic perturbations, including depressed high-density lipoprotein cholesterol (HDL-C), has thwarted efforts to directly implicate triglycerides as causal in atherogenesis. Human genetic approaches involving large-scale populations and high-throughput genomic assessment under a Mendelian randomization framework have undertaken to sort out questions of causality. We review recent large-scale meta-analyses of cohorts and population-based sequencing studies designed to address whether common and rare variants in genes whose products are determinants of plasma triglycerides are also associated with clinical cardiovascular endpoints. The studied loci include genes encoding lipoprotein lipase and proteins that interact with it, such as apolipoprotein (apo) A-V, apo C-III and angiopoietin-like proteins 3 and 4, and common polymorphisms identified in genome-wide association studies. Triglyceride-raising variant alleles of these genes showed generally strong associations with clinical cardiovascular endpoints. However, in most cases, a second lipid disturbance (usually depressed HDL-C) was concurrently associated. While the findings collectively shift our understanding towards a potential causal role for triglycerides, we still cannot rule out the possibilities that triglycerides are a component of a joint phenotype with low HDL-C or that they are but markers of deeper causal metabolic disturbances that are not routinely measured in epidemiological-scale genetic studies.

  16. A Large Set of Newly Created Interspecific Saccharomyces Hybrids Increases Aromatic Diversity in Lager Beers

    PubMed Central

    Mertens, Stijn; Steensels, Jan; Saels, Veerle; De Rouck, Gert; Aerts, Guido

    2015-01-01

    Lager beer is the most consumed alcoholic beverage in the world. Its production process is marked by a fermentation conducted at low (8 to 15°C) temperatures and by the use of Saccharomyces pastorianus, an interspecific hybrid between Saccharomyces cerevisiae and the cold-tolerant Saccharomyces eubayanus. Recent whole-genome-sequencing efforts revealed that the currently available lager yeasts belong to one of only two archetypes, “Saaz” and “Frohberg.” This limited genetic variation likely reflects that all lager yeasts descend from only two separate interspecific hybridization events, which may also explain the relatively limited aromatic diversity between the available lager beer yeasts compared to, for example, wine and ale beer yeasts. In this study, 31 novel interspecific yeast hybrids were developed, resulting from large-scale robot-assisted selection and breeding between carefully selected strains of S. cerevisiae (six strains) and S. eubayanus (two strains). Interestingly, many of the resulting hybrids showed a broader temperature tolerance than their parental strains and reference S. pastorianus yeasts. Moreover, they combined a high fermentation capacity with a desirable aroma profile in laboratory-scale lager beer fermentations, thereby successfully enriching the currently available lager yeast biodiversity. Pilot-scale trials further confirmed the industrial potential of these hybrids and identified one strain, hybrid H29, which combines a fast fermentation, high attenuation, and the production of a complex, desirable fruity aroma. PMID:26407881

  17. Evaluating stream trout habitat on large-scale aerial color photographs

    Treesearch

    Wallace J. Greentree; Robert C. Aldrich

    1976-01-01

    Large-scale aerial color photographs were used to evaluate trout habitat by studying stream and streambank conditions. Ninety-two percent of these conditions could be identified correctly on the color photographs. Color photographs taken 1 year apart showed that rehabilitation efforts resulted in stream vegetation changes. Water depth was correlated with film density:...

  18. Global biogeographic sampling of bacterial secondary metabolism

    PubMed Central

    Charlop-Powers, Zachary; Owen, Jeremy G; Reddy, Boojala Vijay B; Ternei, Melinda A; Guimarães, Denise O; de Frias, Ulysses A; Pupo, Monica T; Seepe, Prudy; Feng, Zhiyang; Brady, Sean F

    2015-01-01

    Recent bacterial (meta)genome sequencing efforts suggest the existence of an enormous untapped reservoir of natural-product-encoding biosynthetic gene clusters in the environment. Here we use pyrosequencing of PCR amplicons derived from both nonribosomal peptide adenylation domains and polyketide ketosynthase domains to compare biosynthetic diversity in soil microbiomes from around the globe. We see large differences in domain populations from all except the most proximal and biome-similar samples, suggesting that most microbiomes will encode largely distinct collections of bacterial secondary metabolites. Our data indicate that both geographic distance and biome type correlate with the biosynthetic diversity found in soil environments. By assigning reads to known gene clusters we identify hotspots of biomedically relevant biosynthetic diversity. These observations not only provide new insights into the natural world, they also provide a road map for guiding future natural-product discovery efforts. DOI: http://dx.doi.org/10.7554/eLife.05048.001 PMID:25599565

  19. Scalable Kernel Methods and Algorithms for General Sequence Analysis

    ERIC Educational Resources Information Center

    Kuksa, Pavel

    2011-01-01

    Analysis of large-scale sequential data has become an important task in machine learning and pattern recognition, inspired in part by numerous scientific and technological applications such as the document and text classification or the analysis of biological sequences. However, current computational methods for sequence comparison still lack…

  20. Evaluating Agronomic Performance and Investigating Molecular Structure of Drought and Heat Tolerant Wild Alfalfa (Medicago sativa L.) Collection from the Southeastern Turkey.

    PubMed

    Basbag, Mehmet; Aydin, Ali; Sakiroglu, Muhammet

    2017-02-01

    Drought is a major stress factor for agricultural production, including alfalfa production. One way to counterbalance the yield losses is the introgression of drought-tolerant germplasm into breeding programs. As an effort to exploit such germplasm, 16 individual plants were selected from their natural habitats in southeastern Turkey and clonally propagated in field trials, with the ultimate goal of using the germplasm as parents for releasing a synthetic cultivar. Forage yield and forage quality traits were evaluated, and molecular genetic diversity among genotypes was determined using inter simple sequence repeat markers. Genotypes varied in traits from growth habit to yield and quality, indicating sufficient phenotypic variation for diverse breeding efforts (for grazing or harvesting) and long-term selection schemes. A large amount of genetic variation was observed even with a limited number of markers and genotypes. However, no spatial genetic structure was observed at the scale of the study when genetic variation was linked to geographic origin. We conclude that ex situ natural variation provides a wealth of germplasm that could be incorporated into breeding programs aiming to improve drought tolerance. We also suggest an extensive collection of seeds or plant tissue from unique plants with desirable traits, rather than further effort on spatially dense germplasm sampling within narrow regions.

  1. Genome resequencing in Populus: Revealing large-scale genome variation and implications on specialized-trait genomics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Muchero, Wellington; Labbe, Jessy L; Priya, Ranjan

    2014-01-01

    To date, Populus ranks among a few plant species with a complete genome sequence and other highly developed genomic resources. With the first genome sequence among all tree species, Populus has been adopted as a suitable model organism for genomic studies in trees. However, far from being just a model species, Populus is a key renewable economic resource that plays a significant role in providing raw materials for the biofuel and pulp and paper industries. Therefore, aside from leading frontiers of basic tree molecular biology and ecological research, Populus leads frontiers in addressing global economic challenges related to fuel and fiber production. The latter fact suggests that research aimed at improving the quality and quantity of Populus as a raw material will likely drive more targeted and deeper research to unlock the economic potential tied to the molecular biology of this tree species. Advances in genome sequence-driven technologies, such as resequencing of individual genotypes, which in turn facilitates large-scale SNP discovery and the identification of large-scale polymorphisms, are key determinants of future success in these initiatives. In this treatise we discuss the implications of genome sequence-enabled technologies for Populus genomic and genetic studies of complex and specialized traits.

  2. Application of a 5-tiered scheme for standardized classification of 2,360 unique mismatch repair gene variants in the InSiGHT locus-specific database.

    PubMed

    Thompson, Bryony A; Spurdle, Amanda B; Plazzer, John-Paul; Greenblatt, Marc S; Akagi, Kiwamu; Al-Mulla, Fahd; Bapat, Bharati; Bernstein, Inge; Capellá, Gabriel; den Dunnen, Johan T; du Sart, Desiree; Fabre, Aurelie; Farrell, Michael P; Farrington, Susan M; Frayling, Ian M; Frebourg, Thierry; Goldgar, David E; Heinen, Christopher D; Holinski-Feder, Elke; Kohonen-Corish, Maija; Robinson, Kristina Lagerstedt; Leung, Suet Yi; Martins, Alexandra; Moller, Pal; Morak, Monika; Nystrom, Minna; Peltomaki, Paivi; Pineda, Marta; Qi, Ming; Ramesar, Rajkumar; Rasmussen, Lene Juel; Royer-Pokora, Brigitte; Scott, Rodney J; Sijmons, Rolf; Tavtigian, Sean V; Tops, Carli M; Weber, Thomas; Wijnen, Juul; Woods, Michael O; Macrae, Finlay; Genuardi, Maurizio

    2014-02-01

    The clinical classification of hereditary sequence variants identified in disease-related genes directly affects clinical management of patients and their relatives. The International Society for Gastrointestinal Hereditary Tumours (InSiGHT) undertook a collaborative effort to develop, test and apply a standardized classification scheme to constitutional variants in the Lynch syndrome-associated genes MLH1, MSH2, MSH6 and PMS2. Unpublished data submission was encouraged to assist in variant classification and was recognized through microattribution. The scheme was refined by multidisciplinary expert committee review of the clinical and functional data available for variants, applied to 2,360 sequence alterations, and disseminated online. Assessment using validated criteria altered classifications for 66% of 12,006 database entries. Clinical recommendations based on transparent evaluation are now possible for 1,370 variants that were not obviously protein truncating from nomenclature. This large-scale endeavor will facilitate the consistent management of families suspected to have Lynch syndrome and demonstrates the value of multidisciplinary collaboration in the curation and classification of variants in public locus-specific databases.
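
    The tiering itself can be sketched as a mapping from a posterior probability of pathogenicity to one of five classes. The cut-offs below follow the IARC convention that variant-classification schemes of this kind build on; treat them as illustrative rather than as figures quoted from this paper:

```python
def classify(p_pathogenic):
    """Map a posterior probability of pathogenicity to a 5-tier class.
    Cut-offs follow the common IARC-style convention; illustrative only."""
    if p_pathogenic > 0.99:
        return 5  # pathogenic
    if p_pathogenic >= 0.95:
        return 4  # likely pathogenic
    if p_pathogenic >= 0.05:
        return 3  # uncertain significance
    if p_pathogenic >= 0.001:
        return 2  # likely not pathogenic
    return 1      # not pathogenic / little clinical significance

tiers = [classify(p) for p in (0.9999, 0.97, 0.5, 0.01, 0.0001)]
```

    Only classes 1, 2, 4 and 5 translate into actionable clinical recommendations; class 3 variants are the ones that keep demanding new functional and clinical data.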

  3. Evolutionary and molecular foundations of multiple contemporary functions of the nitroreductase superfamily

    PubMed Central

    Akiva, Eyal; Copp, Janine N.; Tokuriki, Nobuhiko; Babbitt, Patricia C.

    2017-01-01

    Insight regarding how diverse enzymatic functions and reactions have evolved from ancestral scaffolds is fundamental to understanding chemical and evolutionary biology, and for the exploitation of enzymes for biotechnology. We undertook an extensive computational analysis using a unique and comprehensive combination of tools that include large-scale phylogenetic reconstruction to determine the sequence, structural, and functional relationships of the functionally diverse flavin mononucleotide-dependent nitroreductase (NTR) superfamily (>24,000 sequences from all domains of life, 54 structures, and >10 enzymatic functions). Our results suggest an evolutionary model in which contemporary subgroups of the superfamily have diverged in a radial manner from a minimal flavin-binding scaffold. We identified the structural design principle for this divergence: insertions at key positions in the minimal scaffold, combined with the fixation of key residues, have led to functional specialization. These results will aid future efforts to delineate the emergence of functional diversity in enzyme superfamilies, provide clues for functional inference for superfamily members of unknown function, and facilitate rational redesign of the NTR scaffold. PMID:29078300

  4. Application of DNA barcodes in wildlife conservation in Tropical East Asia.

    PubMed

    Wilson, John-James; Sing, Kong-Wah; Lee, Ping-Shin; Wee, Alison K S

    2016-10-01

    Over the past 50 years, Tropical East Asia has lost more biodiversity than any other tropical region. Tropical East Asia is a megadiverse region with an acute taxonomic impediment. DNA barcodes are short standardized DNA sequences used for taxonomic purposes and have the potential to lessen the challenges of biodiversity inventory and assessment in regions where they are most needed. We reviewed DNA barcoding efforts in Tropical East Asia relative to other tropical regions. We suggest DNA barcodes (or metabarcodes from next-generation sequencers) may be especially useful for characterizing and connecting species-level biodiversity units in inventories encompassing taxa lacking formal description (particularly arthropods) and in large-scale, minimal-impact approaches to vertebrate monitoring and population assessments through secondary sources of DNA (invertebrate-derived DNA and environmental DNA). We suggest interest and capacity for DNA barcoding are slowly growing in Tropical East Asia, particularly among the younger generation of researchers who can connect with the barcoding analogy and understand the need for new approaches to the conservation challenges being faced. © 2016 Society for Conservation Biology.

  5. Advances in QTL Mapping in Pigs

    PubMed Central

    Rothschild, Max F.; Hu, Zhi-liang; Jiang, Zhihua

    2007-01-01

    Over the past 15 years, advances in the porcine genetic linkage map and the discovery of useful candidate genes have yielded valuable gene and trait information. Early use of exotic breed crosses, and now commercial breed crosses, for quantitative trait loci (QTL) scans and candidate gene analyses has led to 110 publications identifying 1,675 QTL. Additionally, these studies continue to identify genes associated with economically important traits such as growth rate, leanness, feed intake, meat quality, litter size, and disease resistance. A well-developed QTL database, PigQTLdb, now serves as a valuable tool for summarizing and pinpointing in silico regions of interest to researchers. The commercial pig industry is actively incorporating these markers in marker-assisted selection along with traditional performance information to improve traits of economic importance. The long-awaited sequencing efforts are also now beginning to provide sequence data for both comparative genomics and large-scale single nucleotide polymorphism (SNP) association studies. While these advances are all positive, the development of useful new trait families and the measurement of new or underlying traits still limit future discoveries. A review of these developments is presented. PMID:17384738

  6. Application of a five-tiered scheme for standardized classification of 2,360 unique mismatch repair gene variants lodged on the InSiGHT locus-specific database

    PubMed Central

    Plazzer, John-Paul; Greenblatt, Marc S.; Akagi, Kiwamu; Al-Mulla, Fahd; Bapat, Bharati; Bernstein, Inge; Capellá, Gabriel; den Dunnen, Johan T.; du Sart, Desiree; Fabre, Aurelie; Farrell, Michael P.; Farrington, Susan M.; Frayling, Ian M.; Frebourg, Thierry; Goldgar, David E.; Heinen, Christopher D.; Holinski-Feder, Elke; Kohonen-Corish, Maija; Robinson, Kristina Lagerstedt; Leung, Suet Yi; Martins, Alexandra; Moller, Pal; Morak, Monika; Nystrom, Minna; Peltomaki, Paivi; Pineda, Marta; Qi, Ming; Ramesar, Rajkumar; Rasmussen, Lene Juel; Royer-Pokora, Brigitte; Scott, Rodney J.; Sijmons, Rolf; Tavtigian, Sean V.; Tops, Carli M.; Weber, Thomas; Wijnen, Juul; Woods, Michael O.; Macrae, Finlay; Genuardi, Maurizio

    2015-01-01

    Clinical classification of sequence variants identified in hereditary disease genes directly affects clinical management of patients and their relatives. The International Society for Gastrointestinal Hereditary Tumours (InSiGHT) undertook a collaborative effort to develop, test and apply a standardized classification scheme to constitutional variants in the Lynch Syndrome genes MLH1, MSH2, MSH6 and PMS2. Unpublished data submission was encouraged to assist variant classification, and recognized by microattribution. The scheme was refined by multidisciplinary expert committee review of clinical and functional data available for variants, applied to 2,360 sequence alterations, and disseminated online. Assessment using validated criteria altered classifications for 66% of 12,006 database entries. Clinical recommendations based on transparent evaluation are now possible for 1,370 variants not obviously protein-truncating from nomenclature. This large-scale endeavor will facilitate consistent management of suspected Lynch Syndrome families, and demonstrates the value of multidisciplinary collaboration for curation and classification of variants in public locus-specific databases. PMID:24362816

  7. Covariant Evolutionary Event Analysis for Base Interaction Prediction Using a Relational Database Management System for RNA.

    PubMed

    Xu, Weijia; Ozer, Stuart; Gutell, Robin R

    2009-01-01

    With increasingly large numbers of properly aligned sequences, comparative sequence analysis can accurately identify not only common structures formed by standard base pairing but also new types of structural elements and constraints. However, traditional methods are too computationally expensive to perform well on large-scale alignments and are less effective with sequences from diverse phylogenetic classifications. We propose a new approach that utilizes coevolutionary rates among pairs of nucleotide positions, using the phylogenetic and evolutionary relationships of the organisms of the aligned sequences. With a novel data schema to manage relevant information within a relational database, our method, implemented with Microsoft SQL Server 2005, showed 90% sensitivity in identifying base-pair interactions among 16S ribosomal RNA sequences from Bacteria, at a scale 40 times larger, and with 50% better sensitivity, than a previous study. The results also indicated covariation signals for a few sets of cross-strand base-stacking pairs in secondary-structure helices, and other subtle constraints in the RNA structure.
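
    A common covariation statistic underlying this kind of base-pair prediction is the mutual information between two alignment columns: positions that change in concert, as paired bases do through compensatory mutations, score high. A self-contained sketch on a toy alignment (not the paper's SQL Server implementation):

```python
import math
from collections import Counter

def column(alignment, i):
    return [row[i] for row in alignment]

def mutual_information(alignment, i, j):
    """Covariation signal between alignment columns i and j:
    high MI suggests the positions change together, as base pairs do."""
    n = len(alignment)
    pi = Counter(column(alignment, i))
    pj = Counter(column(alignment, j))
    pij = Counter(zip(column(alignment, i), column(alignment, j)))
    mi = 0.0
    for (a, b), c in pij.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi

# Columns 0 and 3 covary (A..U vs G..C, compensatory); columns 0 and 1 do not.
aln = ["AGGU", "AGCU", "GGAC", "GGUC", "ACGU", "GCCC"]
strong = mutual_information(aln, 0, 3)
weak = mutual_information(aln, 0, 1)
```

    Phylogeny-aware methods like the paper's go further, weighting changes by the evolutionary events that produced them rather than treating rows as independent.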

  8. Covariant Evolutionary Event Analysis for Base Interaction Prediction Using a Relational Database Management System for RNA

    PubMed Central

    Xu, Weijia; Ozer, Stuart; Gutell, Robin R.

    2010-01-01

    With increasingly large numbers of properly aligned sequences, comparative sequence analysis can accurately identify not only common structures formed by standard base pairing but also new types of structural elements and constraints. However, traditional methods are too computationally expensive to perform well on large-scale alignments and are less effective with sequences from diverse phylogenetic classifications. We propose a new approach that utilizes coevolutionary rates among pairs of nucleotide positions, using the phylogenetic and evolutionary relationships of the organisms of the aligned sequences. With a novel data schema to manage relevant information within a relational database, our method, implemented with Microsoft SQL Server 2005, showed 90% sensitivity in identifying base-pair interactions among 16S ribosomal RNA sequences from Bacteria, at a scale 40 times larger, and with 50% better sensitivity, than a previous study. The results also indicated covariation signals for a few sets of cross-strand base-stacking pairs in secondary-structure helices, and other subtle constraints in the RNA structure. PMID:20502534

  9. Fast selection of miRNA candidates based on large-scale pre-computed MFE sets of randomized sequences.

    PubMed

    Warris, Sven; Boymans, Sander; Muiser, Iwe; Noback, Michiel; Krijnen, Wim; Nap, Jan-Peter

    2014-01-13

    Small RNAs are important regulators of genome function, yet their prediction in genomes is still a major computational challenge. Statistical analyses of pre-miRNA sequences indicated that their 2D structure tends to have a minimal free energy (MFE) significantly lower than the MFE values of equivalently randomized sequences with the same nucleotide composition, in contrast to other classes of non-coding RNA. The computation of many MFEs is, however, too intensive to allow genome-wide screenings. Using a local grid infrastructure, MFE distributions of random sequences were pre-calculated on a large scale. These distributions follow a normal distribution and can be used to determine the MFE distribution for any given sequence composition by interpolation, allowing on-the-fly calculation of the normal distribution for any candidate sequence composition. The speedup achieved makes genome-wide screening with this characteristic of a pre-miRNA sequence practical. Although this property alone is not sufficiently discriminative to distinguish miRNAs from other sequences, the MFE-based P-value should be added to the parameters of choice to be included in the selection of potential miRNA candidates for experimental verification.
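
    The screening idea reduces to a lookup-and-test: retrieve the pre-computed mean and standard deviation of random-sequence MFEs for the candidate's composition (interpolating between tabulated compositions), then ask how improbable the candidate's MFE is under that normal distribution. A sketch with hypothetical table values; the real pre-computed sets come from large-scale grid runs:

```python
import math

# Hypothetical pre-computed table: GC-content bin -> (mean MFE, std of MFE)
# for randomized sequences. All values here are made up for illustration.
PRECOMPUTED = {0.4: (-25.0, 4.0), 0.5: (-30.0, 4.5), 0.6: (-35.0, 5.0)}

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def mfe_p_value(candidate_mfe, gc_content):
    """Interpolate mean/std between tabulated compositions, then return the
    probability that a random sequence has an MFE at least this low."""
    bins = sorted(PRECOMPUTED)
    lo = max(b for b in bins if b <= gc_content)   # assumes gc within table range
    hi = min(b for b in bins if b >= gc_content)
    if lo == hi:
        mu, sd = PRECOMPUTED[lo]
    else:
        t = (gc_content - lo) / (hi - lo)
        mu = PRECOMPUTED[lo][0] + t * (PRECOMPUTED[hi][0] - PRECOMPUTED[lo][0])
        sd = PRECOMPUTED[lo][1] + t * (PRECOMPUTED[hi][1] - PRECOMPUTED[lo][1])
    return normal_cdf(candidate_mfe, mu, sd)

# A candidate whose MFE is far below the random mean gets a small P-value.
p = mfe_p_value(-45.0, 0.5)
```

    Because only one folding computation per candidate is needed (against a cached distribution), this is what makes the genome-wide screen tractable.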

  10. Estimation of pairwise sequence similarity of mammalian enhancers with word neighbourhood counts.

    PubMed

    Göke, Jonathan; Schulz, Marcel H; Lasserre, Julia; Vingron, Martin

    2012-03-01

    The identity of cells and tissues is to a large degree governed by transcriptional regulation. A major part is accomplished by the combinatorial binding of transcription factors at regulatory sequences, such as enhancers. Even though binding of transcription factors is sequence-specific, estimating the sequence similarity of two functionally similar enhancers is very difficult. However, a similarity measure for regulatory sequences is crucial to detect and understand functional similarities between two enhancers and will facilitate large-scale analyses like clustering, prediction and classification of genome-wide datasets. We present the standardized alignment-free sequence similarity measure N2, a flexible framework that is defined for word neighbourhoods. We explore the usefulness of adding reverse complement words as well as words including mismatches into the neighbourhood. On simulated enhancer sequences as well as functional enhancers in mouse development, N2 is shown to outperform previous alignment-free measures. N2 is flexible, faster than competing methods and less susceptible to single sequence noise and the occurrence of repetitive sequences. Experiments on the mouse enhancers reveal that enhancers active in different tissues can be separated by pairwise comparison using N2. N2 represents an improvement over previous alignment-free similarity measures without compromising speed, which makes it a good candidate for large-scale sequence comparison of regulatory sequences. The software is part of the open-source C++ library SeqAn (www.seqan.de) and a compiled version can be downloaded at http://www.seqan.de/projects/alf.html. Supplementary data are available at Bioinformatics online.
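
    The flavor of such alignment-free comparison can be illustrated with a toy version: standardized k-mer count vectors in which each word's reverse complement is folded into its neighbourhood, compared by cosine similarity. This is a simplified stand-in, not the N2 statistic as defined in the paper:

```python
import math
from itertools import product

COMP = str.maketrans("ACGT", "TGCA")

def revcomp(word):
    return word.translate(COMP)[::-1]

def kmer_vector(seq, k=3):
    """Count each k-mer together with its reverse complement
    (a minimal 'word neighbourhood'), then centre and scale the counts."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = {w: 0 for w in words}
    for i in range(len(seq) - k + 1):
        w = seq[i:i + k]
        if w in counts:
            counts[w] += 1
            counts[revcomp(w)] += 1
    vec = [counts[w] for w in words]
    mu = sum(vec) / len(vec)
    sd = math.sqrt(sum((v - mu) ** 2 for v in vec) / len(vec)) or 1.0
    return [(v - mu) / sd for v in vec]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

s1 = "ACGTACGTACGTTTGACGT"
sim_self = cosine(kmer_vector(s1), kmer_vector(s1))  # identical sequences score highest
```

    The full N2 framework additionally supports mismatch neighbourhoods and a proper statistical standardization; the point here is only that no alignment is ever computed.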

  11. Shallowing-upward cyclic patterns within larger-scale transgressive-regressive (T-R) sedimentary sequences, St. Peter through Decorah Formations, Ordovician, Iowa area

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Witzke, B.J.

    1993-03-01

    Four large-scale (2--8 Ma) T-R sedimentary sequences of M. Ord. age (late Chaz.-Sherm.) were delimited by Witzke & Kolata (1980) in the Iowa area, each bounded by local to regional unconformity/disconformity surfaces. These encompass both siliciclastic and carbonate intervals, in ascending order: (1) St. Peter-Glenwood fms., (2) Platteville Fm., (3) Decorah Fm., (4) Dunleith/upper Decorah fms. Finer-scale resolution of depth-related depositional features has led to regional recognition of smaller-scale shallowing-upward cyclicity contained within each large-scale sequence. Such smaller-scale cyclicity encompasses stratigraphic intervals of 1--10 m thickness, with estimated durations of 0.5--1.5 Ma. The St. Peter Sandst. has long been regarded as a classic transgressive sheet sand. However, four discrete shallowing-upward packages characterize the St. Peter-Glenwood interval regionally (IA, MN, NB, KS), including western facies displaying coarsening-upward sandstone packages with condensed conodont-rich brown shale and phosphatic sediments in their lower part (local oolitic ironstone), commonly above pyritic hardgrounds. Regional continuity of small-scale cyclic patterns in M. Ord. strata of the Iowa area may suggest eustatic controls; this can be tested through inter-regional comparisons.

  12. iDNA-Prot: Identification of DNA Binding Proteins Using Random Forest with Grey Model

    PubMed Central

    Lin, Wei-Zhong; Fang, Jian-An; Xiao, Xuan; Chou, Kuo-Chen

    2011-01-01

    DNA-binding proteins play crucial roles in various cellular processes. Developing high-throughput tools for rapidly and effectively identifying DNA-binding proteins is one of the major challenges in the field of genome annotation. Although many efforts have been made in this regard, further effort is needed to enhance the prediction power. By incorporating into the general form of pseudo amino acid composition features extracted from protein sequences via the “grey model”, and by adopting the random forest operation engine, we proposed a new predictor, called iDNA-Prot, for identifying uncharacterized proteins as DNA-binding or non-DNA-binding based on their amino acid sequence information alone. The overall success rate of iDNA-Prot, obtained via jackknife tests on a newly constructed stringent benchmark dataset in which none of the included proteins shares high pairwise sequence identity with any other in the same subset, was 83.96%. In addition to achieving a high success rate, the computational time for iDNA-Prot is remarkably short in comparison with the relevant existing predictors. Hence it is anticipated that iDNA-Prot may become a useful high-throughput tool for large-scale analysis of DNA-binding proteins. As a user-friendly web server, iDNA-Prot is freely accessible to the public at http://icpr.jci.edu.cn/bioinfo/iDNA-Prot or http://www.jci-bioinfo.cn/iDNA-Prot. Moreover, for the convenience of experimental scientists, a step-by-step guide is provided on how to use the web server to obtain the desired results. PMID:21935457
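    The feature-extraction step can be sketched with the classical amino acid composition (AAC), which pseudo amino acid composition extends with sequence-order terms (in iDNA-Prot, derived from the grey model). Only plain AAC is shown here; the grey-model terms are omitted.

```python
# Hedged sketch: 20-dimensional amino acid composition feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def amino_acid_composition(protein):
    """Residue frequencies in a fixed alphabet order; sums to 1."""
    length = len(protein)
    return [protein.count(aa) / length for aa in AMINO_ACIDS]
```

    Vectors like this one would then be fed to a classifier such as a random forest; the pseudo components appended by the full method capture order information that plain composition discards.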

  13. BioPig: a Hadoop-based analytic toolkit for large-scale sequence data.

    PubMed

    Nordberg, Henrik; Bhatia, Karan; Wang, Kai; Wang, Zhong

    2013-12-01

    The recent revolution in sequencing technologies has led to an exponential growth of sequence data. As a result, many current bioinformatics tools become obsolete as they fail to scale with the data. To tackle this 'data deluge', here we introduce the BioPig sequence analysis toolkit as one solution that scales with both data and computation. We built BioPig on the Apache Hadoop MapReduce system and the Pig data flow language. Compared with traditional serial and MPI-based algorithms, BioPig has three major advantages: first, BioPig's programmability greatly reduces development time for parallel bioinformatics applications; second, testing BioPig with up to 500 Gb of sequence demonstrates that it scales automatically with the size of the data; and finally, BioPig can be ported without modification to many Hadoop infrastructures, as tested with the Magellan system at the National Energy Research Scientific Computing Center and the Amazon Elastic Compute Cloud. In summary, BioPig represents a novel programming framework with the potential to greatly accelerate data-intensive bioinformatics analysis.
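    The MapReduce pattern that BioPig builds on can be sketched in plain Python: a mapper emits (k-mer, 1) pairs from each read, a shuffle groups pairs by key, and a reducer sums the counts. The function names are illustrative, not BioPig's API; in the real system the map and reduce stages run distributed across a Hadoop cluster.

```python
# Hedged sketch of MapReduce-style k-mer counting.
from collections import defaultdict

def mapper(read, k=4):
    """Emit (k-mer, 1) for every window of the read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1

def shuffle(pairs):
    """Group emitted values by key, as the framework's shuffle phase does."""
    groups = defaultdict(list)
    for key, val in pairs:
        groups[key].append(val)
    return groups

def reducer(key, values):
    return key, sum(values)

def kmer_count(reads, k=4):
    pairs = (pair for read in reads for pair in mapper(read, k))
    return dict(reducer(key, vals) for key, vals in shuffle(pairs).items())
```

    Because each stage only sees independent key groups, the same logic parallelizes across machines; a Pig script expresses the equivalent dataflow declaratively.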

  14. GLAD: a system for developing and deploying large-scale bioinformatics grid.

    PubMed

    Teo, Yong-Meng; Wang, Xianbing; Ng, Yew-Kwong

    2005-03-01

    Grid computing is used to solve large-scale bioinformatics problems with gigabyte-scale databases by distributing the computation across multiple platforms. To date, developing bioinformatics grid applications has been extremely tedious: one must design and implement the component algorithms and parallelization techniques for different classes of problems, and access remotely located sequence database files of varying formats across the grid. In this study, we propose a grid programming toolkit, GLAD (Grid Life sciences Applications Developer), which facilitates the development and deployment of bioinformatics applications on a grid. GLAD has been developed using ALiCE (Adaptive scaLable Internet-based Computing Engine), a Java-based grid middleware that exploits task-based parallelism. Two benchmark bioinformatics applications, distributed sequence comparison and distributed progressive multiple sequence alignment, have been developed using GLAD.

  15. Low-Cost Ultra-Wide Genotyping Using Roche/454 Pyrosequencing for Surveillance of HIV Drug Resistance

    PubMed Central

    Dudley, Dawn M.; Chin, Emily N.; Bimber, Benjamin N.; Sanabani, Sabri S.; Tarosso, Leandro F.; Costa, Priscilla R.; Sauer, Mariana M.; Kallas, Esper G.; O'Connor, David H.

    2012-01-01

    Background Great efforts have been made to increase accessibility of HIV antiretroviral therapy (ART) in low and middle-income countries. The threat of wide-scale emergence of drug resistance could severely hamper ART scale-up efforts. Population-based surveillance of transmitted HIV drug resistance ensures the use of appropriate first-line regimens to maximize efficacy of ART programs where drug options are limited. However, traditional HIV genotyping is extremely expensive, providing a cost barrier to wide-scale and frequent HIV drug resistance surveillance. Methods/Results We have developed a low-cost laboratory-scale next-generation sequencing-based genotyping method to monitor drug resistance. We designed primers specifically to amplify protease and reverse transcriptase from Brazilian HIV subtypes and developed a multiplexing scheme using multiplex identifier tags to minimize cost while providing more robust data than traditional genotyping techniques. Using this approach, we characterized drug resistance from plasma in 81 HIV infected individuals collected in São Paulo, Brazil. We describe the complexities of analyzing next-generation sequencing data and present a simplified open-source workflow to analyze drug resistance data. From this data, we identified drug resistance mutations in 20% of treatment naïve individuals in our cohort, which is similar to frequencies identified using traditional genotyping in Brazilian patient samples. Conclusion The developed ultra-wide sequencing approach described here allows multiplexing of at least 48 patient samples per sequencing run, 4 times more than the current genotyping method. This method is also 4-fold more sensitive (5% minimal detection frequency vs. 20%) at a cost 3–5× less than the traditional Sanger-based genotyping method. Lastly, by using a benchtop next-generation sequencer (Roche/454 GS Junior), this approach can be more easily implemented in low-resource settings. 
    These data provide proof-of-concept that next-generation HIV drug resistance genotyping is a feasible and low-cost alternative to current genotyping methods and may be particularly beneficial for in-country surveillance of transmitted drug resistance. PMID:22574170
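    The sensitivity gain cited above (a 5% minimal detection frequency versus ~20% for Sanger genotyping) comes down to counting bases across deep read pileups at each position. The pileup, reference base, and threshold below are illustrative assumptions, not the study's actual analysis workflow.

```python
# Hedged sketch: minor-variant detection at a 5% frequency threshold.
from collections import Counter

def site_frequencies(bases):
    """Relative frequency of each base observed at one alignment column."""
    total = len(bases)
    return {b: n / total for b, n in Counter(bases).items()}

def call_minor_variants(bases, reference_base, min_freq=0.05):
    """Report non-reference bases at or above the detection threshold."""
    freqs = site_frequencies(bases)
    return {b: f for b, f in freqs.items()
            if b != reference_base and f >= min_freq}
```

    With hundreds of reads per site, a 6% drug-resistant subpopulation is called, while one at 3% falls below the threshold; Sanger traces would miss both.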

  16. Extensive Geographic Mosaicism in Avian Influenza Viruses from Gulls in the Northern Hemisphere

    PubMed Central

    Wille, Michelle; Robertson, Gregory J.; Whitney, Hugh; Bishop, Mary Anne; Runstadler, Jonathan A.; Lang, Andrew S.

    2011-01-01

    Due to limited interaction of migratory birds between Eurasia and America, two independent avian influenza virus (AIV) gene pools have evolved. There is evidence of low-frequency reassortment between these regions, which has major implications for global AIV dynamics. Indeed, all currently circulating lineages of the PB1 and PA segments in North America are of Eurasian origin. Large-scale analyses of intercontinental reassortment have shown that viruses isolated from Charadriiformes (gulls, terns, and shorebirds) are the major contributor to these events. To clarify the role of gulls in AIV dynamics, specifically in the movement of genes between geographic regions, we have sequenced six gull AIVs isolated in Alaska and analyzed these along with 142 other available gull virus sequences. Basic investigations of host species and the locations and times of isolation reveal biases in the available sequence information. Despite these biases, our analyses reveal a high frequency of geographic reassortment in gull viruses isolated in America. This intercontinental gene mixing is not found in the viruses isolated from gulls in Eurasia. This study demonstrates that gulls are important as vectors for geographically reassorted viruses, particularly in America, and that more surveillance effort should be placed on this group of birds. PMID:21697989

  17. Comparison of the theoretical and real-world evolutionary potential of a genetic circuit

    NASA Astrophysics Data System (ADS)

    Razo-Mejia, M.; Boedicker, J. Q.; Jones, D.; DeLuna, A.; Kinney, J. B.; Phillips, R.

    2014-04-01

    With the development of next-generation sequencing technologies, many large-scale experimental efforts aim to map genotypic variability among individuals. This natural variability in populations fuels many fundamental biological processes, ranging from evolutionary adaptation and speciation to the spread of genetic diseases and drug resistance. An interesting and important component of this variability is present within the regulatory regions of genes. As these regions evolve, accumulated mutations lead to modulation of gene expression, which may have consequences for the phenotype. A simple model system in which the link between genetic variability, gene regulation and function can be studied in detail has been missing. In this article we develop a model to explore how the sequence of the wild-type lac promoter dictates the fold-change in gene expression. The model combines single-base-pair-resolution maps of transcription factor and RNA polymerase binding energies with a comprehensive thermodynamic model of gene regulation. The model was validated by predicting and then measuring the variability of lac operon regulation in a collection of natural isolates. We then use the model to analyze the sensitivity of the regulatory output to the promoter sequence, and predict the potential for regulation to evolve via point mutations in the promoter region.
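    The flavor of such a thermodynamic model can be shown with the textbook fold-change expression for simple repression, in which expression depends on repressor copy number R, the number of nonspecific genomic binding sites N_NS, and the repressor-operator binding energy in units of kT. This is a generic sketch of the modeling framework, not the article's full promoter-resolution model, and the parameter values below are illustrative assumptions.

```python
# Hedged sketch: thermodynamic fold-change for simple repression.
import math

def fold_change_simple_repression(R, delta_eps_kT, N_NS=4.6e6):
    """fold-change = 1 / (1 + (R / N_NS) * exp(-delta_eps / kT)).

    delta_eps_kT is the repressor-operator binding energy in kT units
    (more negative = tighter binding = stronger repression).
    """
    return 1.0 / (1.0 + (R / N_NS) * math.exp(-delta_eps_kT))
```

    A point mutation in the operator that weakens binding (raises delta_eps_kT) moves the fold-change back toward 1, i.e. no repression, which is how sequence-level energy maps translate into regulatory phenotype.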

  18. A rapid and cost-effective method for sequencing pooled cDNA clones by using a combination of transposon insertion and Gateway technology.

    PubMed

    Morozumi, Takeya; Toki, Daisuke; Eguchi-Ogawa, Tomoko; Uenishi, Hirohide

    2011-09-01

    Large-scale cDNA-sequencing projects require an efficient strategy for mass sequencing. Here we describe a method for sequencing pooled cDNA clones using a combination of transposon insertion and Gateway technology. Our method reduces the number of shotgun clones that are unsuitable for reconstruction of cDNA sequences, and has the advantage of reducing the total costs of the sequencing project.

  19. A De-Novo Genome Analysis Pipeline (DeNoGAP) for large-scale comparative prokaryotic genomics studies.

    PubMed

    Thakur, Shalabh; Guttman, David S

    2016-06-30

    Comparative analysis of whole-genome sequence data from closely related prokaryotic species or strains is becoming an increasingly important and accessible approach for addressing both fundamental and applied biological questions. While a number of excellent tools have been developed for this task, most scale poorly when faced with hundreds of genome sequences, and many require extensive manual curation. We have developed a de novo genome analysis pipeline (DeNoGAP) for the automated, iterative and high-throughput analysis of data from comparative genomics projects involving hundreds of whole genome sequences. The pipeline is designed to perform reference-assisted and de novo gene prediction, homolog protein family assignment, ortholog prediction, functional annotation, and pan-genome analysis using a range of proven tools and databases. While most existing methods scale quadratically with the number of genomes, since they rely on pairwise comparisons among predicted protein sequences, DeNoGAP scales linearly, since homology assignment is based on iteratively refined hidden Markov models. This iterative clustering strategy enables DeNoGAP to handle a very large number of genomes using minimal computational resources. Moreover, the modular structure of the pipeline permits easy updates as new analysis programs become available. DeNoGAP integrates bioinformatics tools and databases for comparative analysis of a large number of genomes. The pipeline offers tools and algorithms for annotation and analysis of completed and draft genome sequences. The pipeline was developed using Perl, BioPerl and SQLite on Ubuntu Linux version 12.04 LTS. The software package includes a script for automated installation of the necessary external programs on Ubuntu Linux; the pipeline should also be compatible with other Linux and Unix systems once the necessary external programs are installed.
DeNoGAP is freely available at https://sourceforge.net/projects/denogap/ .
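    The linear-versus-quadratic scaling argument can be sketched with greedy centroid clustering: each incoming sequence is compared only against current cluster representatives rather than against every other sequence. DeNoGAP's actual assignment uses iteratively refined HMM profiles; the simple identity-to-representative rule below is an illustrative stand-in.

```python
# Hedged sketch: greedy clustering against representatives (linear-ish
# scaling), instead of all-vs-all pairwise comparison (quadratic scaling).

def identity(a, b):
    """Fraction of matching positions over the shorter sequence."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n

def greedy_cluster(seqs, threshold=0.8):
    clusters = []  # list of (representative, members)
    for s in seqs:
        for rep, members in clusters:
            if identity(s, rep) >= threshold:
                members.append(s)  # join the first sufficiently similar cluster
                break
        else:
            clusters.append((s, [s]))  # found no match: start a new cluster
    return clusters
```

    In the HMM variant, each cluster's representative is a probabilistic profile that is re-trained as members accumulate, which improves the assignment of divergent homologs without revisiting pairwise comparisons.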

  20. The proximal-to-distal sequence in upper-limb motions on multiple levels and time scales.

    PubMed

    Serrien, Ben; Baeyens, Jean-Pierre

    2017-10-01

    The proximal-to-distal sequence is a phenomenon that can be observed in a large variety of upper-limb motions in both humans and other mammals. The mechanisms behind this sequence are not completely understood, and motor control theories able to explain this phenomenon are currently incomplete. The aim of this narrative review is to take a theoretical constraints-led approach to the proximal-to-distal sequence and provide a broad multidisciplinary overview of the relevant literature. This sequence exists at multiple levels (brain, spine, muscles, kinetics and kinematics) and on multiple time scales (motion, motor learning and development, growth and possibly even evolution). We hypothesize that the proximodistal spatiotemporal direction on each time scale and level provides part of the organismic constraints that guide the dynamics at the other levels and time scales. The constraints-led approach in this review may serve as a first step towards integration of evidence and as a framework for further experimentation to reveal the dynamics of the proximal-to-distal sequence. Copyright © 2017 Elsevier B.V. All rights reserved.

  1. Understanding protein evolution: from protein physics to Darwinian selection.

    PubMed

    Zeldovich, Konstantin B; Shakhnovich, Eugene I

    2008-01-01

    Efforts in whole-genome sequencing and structural proteomics start to provide a global view of the protein universe, the set of existing protein structures and sequences. However, approaches based on the selection of individual sequences have not been entirely successful at the quantitative description of the distribution of structures and sequences in the protein universe because evolutionary pressure acts on the entire organism, rather than on a particular molecule. In parallel to this line of study, studies in population genetics and phenomenological molecular evolution established a mathematical framework to describe the changes in genome sequences in populations of organisms over time. Here, we review both microscopic (physics-based) and macroscopic (organism-level) models of protein-sequence evolution and demonstrate that bridging the two scales provides the most complete description of the protein universe starting from clearly defined, testable, and physiologically relevant assumptions.

  2. [Computational chemistry in structure-based drug design].

    PubMed

    Cao, Ran; Li, Wei; Sun, Han-Zi; Zhou, Yu; Huang, Niu

    2013-07-01

    Today, the understanding of the sequence and structure of biologically relevant targets is growing rapidly, and researchers from many disciplines, physics and computational science in particular, are making significant contributions to modern biology and drug discovery. However, it remains challenging to rationally design small-molecule ligands with desired biological characteristics based on the structural information of the drug targets, which demands more accurate calculation of ligand binding free energy. With rapid advances in computer power and extensive efforts in algorithm development, physics-based computational chemistry approaches are playing increasingly important roles in structure-based drug design. Here we review newly developed computational chemistry methods in structure-based drug design as well as representative applications, including binding-site druggability assessment, large-scale virtual screening of chemical databases, and lead compound optimization. Importantly, we also address the current bottlenecks and propose practical solutions.

  3. Selectivity by host plants affects the distribution of arbuscular mycorrhizal fungi: evidence from ITS rDNA sequence metadata.

    PubMed

    Yang, Haishui; Zang, Yanyan; Yuan, Yongge; Tang, Jianjun; Chen, Xin

    2012-04-12

    Arbuscular mycorrhizal fungi (AMF) can form obligate symbioses with the vast majority of land plants, and AMF distribution patterns have received increasing attention from researchers. At the local scale, the distribution of AMF is well documented. Studies at large scales, however, are limited because intensive sampling is difficult. Here, we used ITS rDNA sequence metadata obtained from public databases to study the distribution of AMF at continental and global scales. We also used these sequence metadata to investigate whether host plant is the main factor that affects the distribution of AMF at large scales. We defined 305 ITS virtual taxa (ITS-VTs) among all sequences of the Glomeromycota by using a comprehensive maximum likelihood phylogenetic analysis. Each host taxonomic order averaged about 53% specific ITS-VTs, and approximately 60% of the ITS-VTs were host-specific. Those ITS-VTs with wide host ranges showed wide geographic distributions. Most ITS-VTs occurred in only one type of host functional group. The distributions of most ITS-VTs were limited with respect to ecosystem, continent, biogeographical realm, and climatic zone. Non-metric multidimensional scaling (NMDS) analysis showed that AMF community composition differed among host functional groups, and among ecosystems, continents, biogeographical realms, and climatic zones. The Mantel test showed that AMF community composition was significantly correlated with plant community composition across ecosystems, continents, biogeographical realms, and climatic zones. Structural equation modeling (SEM) showed that the effects of ecosystem, continent, biogeographical realm, and climatic zone on AMF distribution were mainly indirect, whereas host plants had strong direct effects on AMF. The distribution of AMF as indicated by ITS rDNA sequences showed a pattern of high endemism at large scales.
This pattern indicates high specificity of AMF for host at different scales (plant taxonomic order and functional group) and high selectivity from host plants for AMF. The effects of ecosystemic, biogeographical, continental and climatic factors on AMF distribution might be mediated by host plants.
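    The Mantel test used above correlates two distance matrices (here, AMF community distances and plant community distances) and assesses significance by permuting the rows and columns of one matrix. A minimal permutation-based sketch, with toy matrices and a fixed seed as illustrative assumptions:

```python
# Hedged sketch of a Mantel test (Pearson correlation of distance
# matrices, significance by matrix permutation).
import random

def _upper(mat, order=None):
    """Upper-triangle entries, optionally after permuting rows/columns."""
    n = len(mat)
    idx = order or list(range(n))
    return [mat[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]

def _pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

def mantel(d1, d2, permutations=999, seed=0):
    rng = random.Random(seed)
    x = _upper(d1)
    r_obs = _pearson(x, _upper(d2))
    n = len(d1)
    hits = sum(
        _pearson(x, _upper(d2, rng.sample(range(n), n))) >= r_obs
        for _ in range(permutations)
    )
    return r_obs, (hits + 1) / (permutations + 1)
```

    Permuting one matrix breaks any shared structure while preserving its value distribution, so the permutation p-value reflects how unusual the observed matrix correlation is.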

  4. Development and Evaluation of a 9K SNP Array for Peach by Internationally Coordinated SNP Detection and Validation in Breeding Germplasm

    PubMed Central

    Scalabrin, Simone; Gilmore, Barbara; Lawley, Cynthia T.; Gasic, Ksenija; Micheletti, Diego; Rosyara, Umesh R.; Cattonaro, Federica; Vendramin, Elisa; Main, Dorrie; Aramini, Valeria; Blas, Andrea L.; Mockler, Todd C.; Bryant, Douglas W.; Wilhelm, Larry; Troggio, Michela; Sosinski, Bryon; Aranzana, Maria José; Arús, Pere; Iezzoni, Amy; Morgante, Michele; Peace, Cameron

    2012-01-01

    Although a large number of single nucleotide polymorphism (SNP) markers covering the entire genome are needed to enable molecular breeding efforts such as genome wide association studies, fine mapping, genomic selection and marker-assisted selection in peach [Prunus persica (L.) Batsch] and related Prunus species, only a limited number of genetic markers, including simple sequence repeats (SSRs), have been available to date. To address this need, an international consortium (The International Peach SNP Consortium; IPSC) has pursued a coordinated effort to perform genome-scale SNP discovery in peach using next generation sequencing platforms to develop and characterize a high-throughput Illumina Infinium® SNP genotyping array platform. We performed whole genome re-sequencing of 56 peach breeding accessions using the Illumina and Roche/454 sequencing technologies. Polymorphism detection algorithms identified a total of 1,022,354 SNPs. Validation with the Illumina GoldenGate® assay was performed on a subset of the predicted SNPs, verifying ∼75% of genic (exonic and intronic) SNPs, whereas only about a third of intergenic SNPs were verified. Conservative filtering was applied to arrive at a set of 8,144 SNPs that were included on the IPSC peach SNP array v1, distributed over all eight peach chromosomes with an average spacing of 26.7 kb between SNPs. Use of this platform to screen a total of 709 accessions of peach in two separate evaluation panels identified a total of 6,869 (84.3%) polymorphic SNPs. The almost 7,000 SNPs verified as polymorphic through extensive empirical evaluation represent an excellent source of markers for future studies in genetic relatedness, genetic mapping, and dissecting the genetic architecture of complex agricultural traits. The IPSC peach SNP array v1 is commercially available and we expect that it will be used worldwide for genetic studies in peach and related stone fruit and nut species. PMID:22536421

  5. Large-scale benchmarking reveals false discoveries and count transformation sensitivity in 16S rRNA gene amplicon data analysis methods used in microbiome studies.

    PubMed

    Thorsen, Jonathan; Brejnrod, Asker; Mortensen, Martin; Rasmussen, Morten A; Stokholm, Jakob; Al-Soud, Waleed Abu; Sørensen, Søren; Bisgaard, Hans; Waage, Johannes

    2016-11-25

    There is an immense scientific interest in the human microbiome and its effects on human physiology, health, and disease. A common approach for examining bacterial communities is high-throughput sequencing of 16S rRNA gene hypervariable regions, aggregating sequence-similar amplicons into operational taxonomic units (OTUs). Strategies for detecting differential relative abundance of OTUs between sample conditions include classical statistical approaches as well as a plethora of newer methods, many borrowing from the related field of RNA-seq analysis. This effort is complicated by unique data characteristics, including sparsity, sequencing depth variation, and nonconformity of read counts to theoretical distributions, which is often exacerbated by exploratory and/or unbalanced study designs. Here, we assess the robustness of available methods for (1) inference in differential relative abundance analysis and (2) beta-diversity-based sample separation, using a rigorous benchmarking framework based on large clinical 16S microbiome datasets from different sources. Running more than 380,000 full differential relative abundance tests on real datasets with permuted case/control assignments and in silico-spiked OTUs, we identify large differences in method performance on a range of parameters, including false positive rates, sensitivity to sparsity and case/control balances, and spike-in retrieval rate. In large datasets, methods with the highest false positive rates also tend to have the best detection power. For beta-diversity-based sample separation, we show that library size normalization has very little effect and that the distance metric is the most important factor in terms of separation power. Our results, generalizable to datasets from different sequencing platforms, demonstrate how the choice of method considerably affects analysis outcome. 
Here, we give recommendations for tools that exhibit low false positive rates, have good retrieval power across effect sizes and case/control proportions, and have low sparsity bias. Result output from some commonly used methods should be interpreted with caution. We provide an easily extensible framework for benchmarking of new methods and future microbiome datasets.
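    The core benchmarking trick above, permuting case/control labels so that no true signal exists and measuring how often a method still calls significance, can be sketched directly. The difference-in-means permutation test and simulated data below are illustrative stand-ins, not any of the benchmarked methods.

```python
# Hedged sketch: estimate a test's false positive rate by running it on
# data with randomly assigned (signal-free) case/control labels.
import random

def perm_pvalue(case, ctrl, n_perm=200, rng=None):
    """Permutation p-value for a difference-in-means test."""
    rng = rng or random.Random(0)
    obs = abs(sum(case) / len(case) - sum(ctrl) / len(ctrl))
    pooled = case + ctrl
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        a, b = pooled[:len(case)], pooled[len(case):]
        if abs(sum(a) / len(a) - sum(b) / len(b)) >= obs:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def false_positive_rate(n_features=50, n_samples=10, alpha=0.05, seed=1):
    rng = random.Random(seed)
    fp = 0
    for _ in range(n_features):
        data = [rng.gauss(0, 1) for _ in range(2 * n_samples)]  # no signal
        if perm_pvalue(data[:n_samples], data[n_samples:], rng=rng) <= alpha:
            fp += 1
    return fp / n_features
```

    A well-calibrated test should call roughly alpha of the null features significant; rates far above alpha on real permuted microbiome data are the false-discovery behavior the benchmark flags.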

  6. Scalable Analysis Methods and In Situ Infrastructure for Extreme Scale Knowledge Discovery

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bethel, Wes

    2016-07-24

    The primary challenge motivating this team’s work is the widening gap between the ability to compute information and to store it for subsequent analysis. This gap adversely impacts science code teams, who are able to perform analysis on only a small fraction of the data they compute, resulting in the very real likelihood of lost or missed science, when results are computed but not analyzed. Our approach is to perform as much analysis or visualization processing on data while it is still resident in memory, an approach known as in situ processing. The idea of in situ processing was not new at the start of this effort in 2014, but efforts in that space were largely ad hoc, and there was no concerted effort within the research community aimed at fostering production-quality software tools suitable for use by DOE science projects. In large part, our objective was to produce and enable the use of production-quality in situ methods and infrastructure, at scale, on DOE HPC facilities, though we expected to have impact beyond DOE due to the widespread nature of the challenges, which affect virtually all large-scale computational science efforts. To achieve that objective, we assembled a unique team of researchers consisting of representatives from DOE national laboratories, academia, and industry, engaged in software technology R&D, and formed close partnerships with DOE science code teams, to produce software technologies that were shown to run effectively at scale on DOE HPC platforms.

  7. The evolution of surface magnetic fields in young solar-type stars II: the early main sequence (250-650 Myr)

    NASA Astrophysics Data System (ADS)

    Folsom, C. P.; Bouvier, J.; Petit, P.; Lèbre, A.; Amard, L.; Palacios, A.; Morin, J.; Donati, J.-F.; Vidotto, A. A.

    2018-03-01

    There is a large change in the surface rotation rates of sun-like stars on the pre-main sequence and early main sequence. Since these stars have dynamo-driven magnetic fields, this implies a strong evolution of their magnetic properties over this time period. The spin-down of these stars is controlled by interactions between stellar winds and magnetic fields, thus magnetic evolution in turn plays an important role in rotational evolution. We present here the second part of a study investigating the evolution of large-scale surface magnetic fields in this critical time period. We observed stars in open clusters and stellar associations with known ages between 120 and 650 Myr, and used spectropolarimetry and Zeeman Doppler Imaging to characterize their large-scale magnetic field strength and geometry. We report 15 stars with magnetic detections here. These stars have masses from 0.8 to 0.95 M⊙ and rotation periods from 0.326 to 10.6 d, and we find large-scale magnetic field strengths from 8.5 to 195 G with a wide range of geometries. We find a clear trend towards decreasing magnetic field strength with age, and a power-law decrease in magnetic field strength with Rossby number. There is some tentative evidence for saturation of the large-scale magnetic field strength at Rossby numbers below 0.1, although the saturation point is not yet well defined. Comparing to younger classical T Tauri stars, we support the hypothesis that differences in internal structure produce large differences in observed magnetic fields; however, for weak-lined T Tauri stars this is less clear.

  8. Living laboratory: whole-genome sequencing as a learning healthcare enterprise.

    PubMed

    Angrist, M; Jamal, L

    2015-04-01

    With the proliferation of affordable large-scale human genomic data come profound and vexing questions about management of such data and their clinical uncertainty. These issues challenge the view that genomic research on human beings can (or should) be fully segregated from clinical genomics, either conceptually or practically. Here, we argue that the sharp distinction between clinical care and research is especially problematic in the context of large-scale genomic sequencing of people with suspected genetic conditions. Core goals of both enterprises (e.g. understanding genotype-phenotype relationships; generating an evidence base for genomic medicine) are more likely to be realized at a population scale if both those ordering and those undergoing sequencing for diagnostic reasons are routinely and longitudinally studied. Rather than relying on expensive and lengthy randomized clinical trials and meta-analyses, we propose leveraging nascent clinical-research hybrid frameworks into a broader, more permanent instantiation of exploratory medical sequencing. Such an investment could enlighten stakeholders about the real-life challenges posed by whole-genome sequencing, such as establishing the clinical actionability of genetic variants, returning 'off-target' results to families, developing effective service delivery models and monitoring long-term outcomes. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  9. Large-scale chromosome folding versus genomic DNA sequences: A discrete double Fourier transform technique.

    PubMed

    Chechetkin, V R; Lobzin, V V

    2017-08-07

    State-of-the-art techniques combining imaging methods and high-throughput genomic mapping tools have led to significant progress in detailing the chromosome architecture of various organisms. However, a gap still remains between the rapidly growing structural data on chromosome folding and the large-scale genome organization. Could part of the information on chromosome folding be obtained directly from the underlying genomic DNA sequences abundantly stored in the databanks? To answer this question, we developed an original discrete double Fourier transform (DDFT) technique. DDFT serves for the detection of large-scale genome regularities associated with domains/units at the different levels of hierarchical chromosome folding. The method is versatile and can be applied to both genomic DNA sequences and corresponding physico-chemical parameters such as base-pairing free energy. The latter characteristic is closely related to replication and transcription and can also be used for the assessment of temperature or supercoiling effects on chromosome folding. We tested the method on the genome of E. coli K-12 and found good correspondence with the annotated domains/units established experimentally. As a brief illustration of further abilities of DDFT, a study of the large-scale genome organization of bacteriophage PHIX174 and the bacterium Caulobacter crescentus is also included. The combined experimental, modeling, and bioinformatic DDFT analysis should yield more complete knowledge of chromosome architecture and genome organization. Copyright © 2017 Elsevier Ltd. All rights reserved.
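    The general idea of a double Fourier transform on a genome can be sketched as follows: encode the sequence as a numeric indicator signal, take its power spectrum, then transform that spectrum again so that regularly spaced spectral peaks (i.e. periodically repeated units) show up as peaks in the second spectrum. This illustrates the general approach only, and is not the published DDFT definition.

```python
# Hedged sketch: indicator encoding, DFT, then DFT of the power spectrum.
import cmath

def dft(signal):
    """Naive O(n^2) discrete Fourier transform (fine for short examples)."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * f * t / n)
                for t in range(n)) for f in range(n)]

def double_power_spectrum(seq, base="A"):
    """Power spectrum of the power spectrum of a base-indicator signal."""
    indicator = [1.0 if b == base else 0.0 for b in seq]
    first = [abs(c) ** 2 for c in dft(indicator)]
    first[0] = 0.0  # drop the DC component before the second pass
    return [abs(c) ** 2 for c in dft(first)]
```

    For a sequence built from a repeated 4-base unit, the first spectrum has peaks every n/4 frequencies, and the second transform converts that regular spacing into a strong low-frequency peak, which is the kind of hierarchical regularity the method looks for at genome scale.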

  10. mySyntenyPortal: an application package to construct websites for synteny block analysis.

    PubMed

    Lee, Jongin; Lee, Daehwan; Sim, Mikang; Kwon, Daehong; Kim, Juyeon; Ko, Younhee; Kim, Jaebum

    2018-06-05

    Advances in sequencing technologies have facilitated large-scale comparative genomics based on whole-genome sequencing. Constructing and investigating conserved genomic regions among multiple species (called synteny blocks) is essential in comparative genomics. However, this requires significant amounts of computational resources and time in addition to bioinformatics skills. Many web interfaces have been developed to make such tasks easier. However, these web interfaces cannot be customized for users who want to use their own set of genome sequences or their own definition of synteny blocks. To resolve this limitation, we present mySyntenyPortal, a stand-alone application package for constructing websites for synteny block analyses from users' own genome data. mySyntenyPortal provides both command-line and web-based interfaces to build and manage websites for large-scale comparative genomic analyses. These websites can also be easily published and accessed by other users. To demonstrate the usability of mySyntenyPortal, we present an example study of building websites to compare the genomes of three mammalian species (human, mouse, and cow) and show how they can be easily utilized to identify potential genes affected by genome rearrangements. mySyntenyPortal will contribute to extended comparative genomic analyses based on large-scale whole-genome sequences by providing unique functionality to support easy creation of interactive websites for synteny block analyses from users' own genome data.

  11. Phylotranscriptomic consolidation of the jawed vertebrate timetree.

    PubMed

    Irisarri, Iker; Baurain, Denis; Brinkmann, Henner; Delsuc, Frédéric; Sire, Jean-Yves; Kupfer, Alexander; Petersen, Jörn; Jarek, Michael; Meyer, Axel; Vences, Miguel; Philippe, Hervé

    2017-09-01

    Phylogenomics is extremely powerful but introduces new challenges, as no agreement exists on "standards" for data selection, curation, and tree inference. We use jawed vertebrates (Gnathostomata) as a model to address these issues. Despite considerable efforts to resolve their evolutionary history and macroevolution, few studies have included the full phylogenetic diversity of gnathostomes, and some relationships remain controversial. We tested a novel bioinformatic pipeline to assemble large and accurate phylogenomic datasets from RNA sequencing and found this phylotranscriptomic approach successful and highly cost-effective. Increasing sequencing effort up to ca. 10 Gbp allows more genes to be recovered, but shallower sequencing (1.5 Gbp) is sufficient to obtain thousands of full-length orthologous transcripts. We reconstruct a robust and strongly supported timetree of jawed vertebrates using 7,189 nuclear genes from 100 taxa, including 23 new transcriptomes from previously unsampled key species. Gene jackknifing of genomic data corroborates the robustness of our tree and allows genome-wide divergence times to be calculated by overcoming gene sampling bias. Mitochondrial genomes prove insufficient to resolve the deepest relationships because of limited signal and among-lineage rate heterogeneity. Our analyses emphasize the importance of large curated nuclear datasets for increasing the accuracy of phylogenomics and provide a reference framework for the evolutionary history of jawed vertebrates.

  12. SWARM : a scientific workflow for supporting Bayesian approaches to improve metabolic models.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shi, X.; Stevens, R.; Mathematics and Computer Science

    2008-01-01

    With the exponential growth of complete genome sequences, sequence analysis is becoming a powerful approach for building genome-scale metabolic models. These models can be used to study individual molecular components and their relationships, and eventually to study cells as systems. However, constructing genome-scale metabolic models manually is time-consuming and labor-intensive; as a result, far fewer genome-scale metabolic models are available than the hundreds of available genome sequences. To tackle this problem, we designed SWARM, a scientific workflow that can be used to improve genome-scale metabolic models in a high-throughput fashion. SWARM deals with a range of issues including the integration of data across distributed resources, data format conversions, data updates, and data provenance. Taken together, SWARM streamlines the whole modeling process: extracting data from various resources, deriving training datasets to train a set of predictors, applying Bayesian techniques to assemble the predictors, inferring on the ensemble of predictors to fill in missing data, and eventually improving draft metabolic networks automatically. By enhancing metabolic model construction, SWARM enables scientists to generate many genome-scale metabolic models within a short period of time and with less effort.

  13. Identification of differentially methylated sites with weak methylation effect

    USDA-ARS?s Scientific Manuscript database

    DNA methylation is an epigenetic alteration crucial for regulating stress responses. Identifying large-scale DNA methylation at single nucleotide resolution is made possible by whole genome bisulfite sequencing. An essential task following the generation of bisulfite sequencing data is to detect dif...

  14. DNA Data Bank of Japan (DDBJ) for genome scale research in life science

    PubMed Central

    Tateno, Y.; Imanishi, T.; Miyazaki, S.; Fukami-Kobayashi, K.; Saitou, N.; Sugawara, H.; Gojobori, T.

    2002-01-01

    The DNA Data Bank of Japan (DDBJ, http://www.ddbj.nig.ac.jp) has made an effort to collect as much data as possible, mainly from Japanese researchers. The increase rates of the data we collected, annotated, and released to the public in the past year are 43% for the number of entries and 52% for the number of bases. The increase rates have accelerated even after the human genome was sequenced, because sequencing technology has been remarkably advanced and simplified, and research in life science has shifted from the gene scale to the genome scale. In addition, we have developed the Genome Information Broker (GIB, http://gib.genes.nig.ac.jp), which now includes more than 50 complete microbial genomes as well as Arabidopsis genome data. We have also developed a database of the human genome, the Human Genomics Studio (HGS, http://studio.nig.ac.jp). HGS provides sets of sequences that are as continuous as possible for any of the 24 chromosomes. Both GIB and HGS have been updated to incorporate newly available data and retrieval tools. PMID:11752245

  15. Pigeonpea genomics initiative (PGI): an international effort to improve crop productivity of pigeonpea (Cajanus cajan L.)

    PubMed Central

    Penmetsa, R. V.; Dutta, S.; Kulwal, P. L.; Saxena, R. K.; Datta, S.; Sharma, T. R.; Rosen, B.; Carrasquilla-Garcia, N.; Farmer, A. D.; Dubey, A.; Saxena, K. B.; Gao, J.; Fakrudin, B.; Singh, M. N.; Singh, B. P.; Wanjari, K. B.; Yuan, M.; Srivastava, R. K.; Kilian, A.; Upadhyaya, H. D.; Mallikarjuna, N.; Town, C. D.; Bruening, G. E.; He, G.; May, G. D.; McCombie, R.; Jackson, S. A.; Singh, N. K.; Cook, D. R.

    2009-01-01

    Pigeonpea (Cajanus cajan), an important food legume crop in the semi-arid regions of the world and the second most important pulse crop in India, has an average crop productivity of 780 kg/ha. The relatively low crop yields may be attributed to the non-availability of improved cultivars, poor crop husbandry, and exposure to a number of biotic and abiotic stresses in pigeonpea growing regions. Narrow genetic diversity in cultivated germplasm has further hampered the effective utilization of conventional breeding as well as the development and utilization of genomic tools, resulting in pigeonpea often being referred to as an ‘orphan crop legume’. To enable genomics-assisted breeding in this crop, the pigeonpea genomics initiative (PGI) was initiated in late 2006 with funding from the Indian Council of Agricultural Research under the umbrella of the Indo-US agricultural knowledge initiative, and was further expanded with financial support from the US National Science Foundation’s Plant Genome Research Program and the Generation Challenge Program. As a result of the PGI, the last 3 years have witnessed significant progress in the development of both genetic and genomic resources in this crop through effective collaborations and coordination of genomics activities across several institutes and countries. For instance, 25 mapping populations segregating for a number of biotic and abiotic stresses have been developed or are under development. An 11X-genome-coverage bacterial artificial chromosome (BAC) library comprising 69,120 clones has been developed, of which 50,000 clones were end-sequenced to generate 87,590 BAC-end sequences (BESs). About 10,000 expressed sequence tags (ESTs) from Sanger sequencing and ca. 2 million short ESTs by 454/FLX sequencing have been generated. A variety of molecular markers have been developed from BESs, microsatellite or simple sequence repeat (SSR)-enriched libraries, and the mining of ESTs and genomic amplicon sequences. 
Of about 21,000 SSRs identified, 6,698 SSRs are under analysis along with 670 orthologous genes using a GoldenGate SNP (single nucleotide polymorphism) genotyping platform, while large-scale SNP discovery using Solexa, a next-generation sequencing technology, is in progress. Similarly, a diversity array technology array comprising ca. 15,000 features has been developed. In addition, >600 unique nucleotide binding site (NBS) domain-containing members of the NBS-leucine-rich repeat disease resistance homologs were cloned in pigeonpea; 960 BACs containing these sequences were identified by filter hybridization, and BES physical maps were developed using high-information-content fingerprinting. To further enrich the genomic resources, the sequenced soybean genome is being analyzed to establish anchor points between the pigeonpea and soybean genomes. In addition, Solexa sequencing is being used to explore the feasibility of generating a whole-genome sequence. In summary, the collaborative efforts of several research groups under the umbrella of the PGI are making significant progress in improving molecular tools in pigeonpea and should significantly benefit pigeonpea genetics and breeding. As these efforts come to fruition and are expanded (depending on funding), pigeonpea will move from an ‘orphan legume crop’ to one where genomics-assisted breeding approaches for sustainable crop improvement are routine. PMID:20976284

  16. Leveraging premalignant biology for immune-based cancer prevention.

    PubMed

    Spira, Avrum; Disis, Mary L; Schiller, John T; Vilar, Eduardo; Rebbeck, Timothy R; Bejar, Rafael; Ideker, Trey; Arts, Janine; Yurgelun, Matthew B; Mesirov, Jill P; Rao, Anjana; Garber, Judy; Jaffee, Elizabeth M; Lippman, Scott M

    2016-09-27

    Prevention is an essential component of cancer eradication. Next-generation sequencing of cancer genomes and epigenomes has defined large numbers of driver mutations and molecular subgroups, leading to therapeutic advances. By comparison, there is a relative paucity of such knowledge in premalignant neoplasia, which inherently limits the potential to develop precision prevention strategies. Studies on the interplay between germ-line and somatic events have elucidated genetic processes underlying premalignant progression and preventive targets. Emerging data hint at the immune system's ability to intercept premalignancy and prevent cancer. Genetically engineered mouse models have identified mechanisms by which genetic drivers and other somatic alterations recruit inflammatory cells and induce changes in normal cells to create and interact with the premalignant tumor microenvironment to promote oncogenesis and immune evasion. These studies are currently limited to only a few lesion types and patients. In this Perspective, we advocate a large-scale collaborative effort to systematically map the biology of premalignancy and the surrounding cellular response. By bringing together scientists from diverse disciplines (e.g., biochemistry, omics, and computational biology; microbiology, immunology, and medical genetics; engineering, imaging, and synthetic chemistry; and implementation science), we can drive a concerted effort focused on cancer vaccines to reprogram the immune response to prevent, detect, and reject premalignancy. Lynch syndrome, clonal hematopoiesis, and cervical intraepithelial neoplasia, which also serve as models for inherited, blood, and viral premalignancies, respectively, are ideal scenarios in which to launch this initiative.

  17. Fast selection of miRNA candidates based on large-scale pre-computed MFE sets of randomized sequences

    PubMed Central

    2014-01-01

    Background Small RNAs are important regulators of genome function, yet their prediction in genomes is still a major computational challenge. Statistical analyses of pre-miRNA sequences indicated that their 2D structure tends to have a minimal free energy (MFE) significantly lower than the MFE values of equivalently randomized sequences with the same nucleotide composition, in contrast to other classes of non-coding RNA. The computation of many MFEs is, however, too intensive to allow for genome-wide screenings. Results Using a local grid infrastructure, MFE distributions of random sequences were pre-calculated on a large scale. These distributions follow a normal distribution and can be used to determine the MFE distribution for any given sequence composition by interpolation, allowing on-the-fly calculation of the normal distribution for any candidate sequence. Conclusion The speedup achieved makes genome-wide screening with this characteristic of a pre-miRNA sequence practical. Although this particular property alone is not sufficiently discriminative to distinguish miRNAs from other sequences, the MFE-based P-value should be added to the parameters of choice to be included in the selection of potential miRNA candidates for experimental verification. PMID:24418292
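    The workflow described above — look up a pre-computed normal MFE distribution for a sequence's composition, interpolate between table points, and convert a candidate's observed MFE into a P-value — can be sketched as follows. The table values, function names, and the use of GC fraction alone as the composition key are invented for illustration; the real resource indexes full nucleotide composition:

```python
import math

# Hypothetical pre-computed table: GC fraction of a fixed-length sequence ->
# (mean, std) of MFE in kcal/mol over many shuffled versions. Values invented.
MFE_TABLE = {0.3: (-18.0, 4.0), 0.5: (-25.0, 4.5), 0.7: (-33.0, 5.0)}

def interpolate(gc):
    """Linearly interpolate (mean, std) between the nearest table points."""
    xs = sorted(MFE_TABLE)
    if gc <= xs[0]:
        return MFE_TABLE[xs[0]]
    if gc >= xs[-1]:
        return MFE_TABLE[xs[-1]]
    for lo, hi in zip(xs, xs[1:]):
        if lo <= gc <= hi:
            w = (gc - lo) / (hi - lo)
            (m0, s0), (m1, s1) = MFE_TABLE[lo], MFE_TABLE[hi]
            return m0 + w * (m1 - m0), s0 + w * (s1 - s0)

def mfe_p_value(mfe, gc):
    """P(a random sequence folds at least this stably) under the fitted normal."""
    mean, std = interpolate(gc)
    z = (mfe - mean) / std
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p = mfe_p_value(-38.0, 0.5)   # candidate far more stable than random -> small p
```

    Because only the normal parameters are stored, the expensive shuffling-and-folding step happens once offline, and each genome-wide candidate costs a table lookup plus one error-function evaluation.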

  18. Quantitative Tracking of Combinatorially Engineered Populations with Multiplexed Binary Assemblies.

    PubMed

    Zeitoun, Ramsey I; Pines, Gur; Grau, William C; Gill, Ryan T

    2017-04-21

    Advances in synthetic biology and genomics have enabled full-scale genome engineering efforts on laboratory time scales. However, the absence of sufficient approaches for mapping engineered genomes at system-wide scales onto performance has limited the adoption of more sophisticated algorithms for engineering complex biological systems. Here we report on the development and application of a robust approach to quantitatively map combinatorially engineered populations at scales up to several dozen target sites. This approach works by assembling genome engineered sites with cell-specific barcodes into a format compatible with high-throughput sequencing technologies. This approach, called barcoded-TRACE (bTRACE) was applied to assess E. coli populations engineered by recursive multiplex recombineering across both 6-target sites and 31-target sites. The 31-target library was then tracked throughout growth selections in the presence and absence of isopentenol (a potential next-generation biofuel). We also use the resolution of bTRACE to compare the influence of technical and biological noise on genome engineering efforts.

  19. WImpiBLAST: web interface for mpiBLAST to help biologists perform large-scale annotation using high performance computing.

    PubMed

    Sharma, Parichit; Mantri, Shrikant S

    2014-01-01

    The function of a newly sequenced gene can be discovered by determining its sequence homology with known proteins. BLAST is the most extensively used sequence analysis program for sequence similarity search in large databases of sequences. With the advent of next generation sequencing technologies it has now become possible to study genes and their expression at a genome-wide scale through RNA-seq and metagenome sequencing experiments. Functional annotation of all the genes is done by sequence similarity search against multiple protein databases. This annotation task is computationally very intensive and can take days to obtain complete results. The program mpiBLAST, an open-source parallelization of BLAST that achieves superlinear speedup, can be used to accelerate large-scale annotation by using supercomputers and high performance computing (HPC) clusters. Although many parallel bioinformatics applications using the Message Passing Interface (MPI) are available in the public domain, researchers are reluctant to use them due to lack of expertise in the Linux command line and relevant programming experience. With these limitations, it becomes difficult for biologists to use mpiBLAST for accelerating annotation. No web interface is available in the open-source domain for mpiBLAST. We have developed WImpiBLAST, a user-friendly open-source web interface for parallel BLAST searches. It is implemented in Struts 1.3 using a Java backbone and runs atop the open-source Apache Tomcat Server. WImpiBLAST supports script creation and job submission features and also provides a robust job management interface for system administrators. It combines script creation and modification features with job monitoring and management through the Torque resource manager on a Linux-based HPC cluster. Use case information highlights the acceleration of annotation analysis achieved by using WImpiBLAST. 
Here, we describe the WImpiBLAST web interface features and architecture, explain design decisions, describe workflows and provide a detailed analysis.

  20. WImpiBLAST: Web Interface for mpiBLAST to Help Biologists Perform Large-Scale Annotation Using High Performance Computing

    PubMed Central

    Sharma, Parichit; Mantri, Shrikant S.

    2014-01-01

    The function of a newly sequenced gene can be discovered by determining its sequence homology with known proteins. BLAST is the most extensively used sequence analysis program for sequence similarity search in large databases of sequences. With the advent of next generation sequencing technologies it has now become possible to study genes and their expression at a genome-wide scale through RNA-seq and metagenome sequencing experiments. Functional annotation of all the genes is done by sequence similarity search against multiple protein databases. This annotation task is computationally very intensive and can take days to obtain complete results. The program mpiBLAST, an open-source parallelization of BLAST that achieves superlinear speedup, can be used to accelerate large-scale annotation by using supercomputers and high performance computing (HPC) clusters. Although many parallel bioinformatics applications using the Message Passing Interface (MPI) are available in the public domain, researchers are reluctant to use them due to lack of expertise in the Linux command line and relevant programming experience. With these limitations, it becomes difficult for biologists to use mpiBLAST for accelerating annotation. No web interface is available in the open-source domain for mpiBLAST. We have developed WImpiBLAST, a user-friendly open-source web interface for parallel BLAST searches. It is implemented in Struts 1.3 using a Java backbone and runs atop the open-source Apache Tomcat Server. WImpiBLAST supports script creation and job submission features and also provides a robust job management interface for system administrators. It combines script creation and modification features with job monitoring and management through the Torque resource manager on a Linux-based HPC cluster. Use case information highlights the acceleration of annotation analysis achieved by using WImpiBLAST. 
Here, we describe the WImpiBLAST web interface features and architecture, explain design decisions, describe workflows and provide a detailed analysis. PMID:24979410

  1. Large-scale DNA Barcode Library Generation for Biomolecule Identification in High-throughput Screens.

    PubMed

    Lyons, Eli; Sheridan, Paul; Tremmel, Georg; Miyano, Satoru; Sugano, Sumio

    2017-10-24

    High-throughput screens allow for the identification of specific biomolecules with characteristics of interest. In barcoded screens, DNA barcodes are linked to target biomolecules in a manner allowing for the target molecules making up a library to be identified by sequencing the DNA barcodes using Next Generation Sequencing. To be useful in experimental settings, the DNA barcodes in a library must satisfy certain constraints related to GC content, homopolymer length, Hamming distance, and blacklisted subsequences. Here we report a novel framework to quickly generate large-scale libraries of DNA barcodes for use in high-throughput screens. We show that our framework dramatically reduces the computation time required to generate large-scale DNA barcode libraries, compared with a naïve approach to DNA barcode library generation. As a proof of concept, we demonstrate that our framework is able to generate a library consisting of one million DNA barcodes for use in a fragment antibody phage display screening experiment. We also report generating a general-purpose one billion DNA barcode library, the largest such library yet reported in the literature. Our results demonstrate the value of our novel large-scale DNA barcode library generation framework for use in high-throughput screening applications.
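    The constraints listed above (GC content, homopolymer length, pairwise Hamming distance, blacklisted subsequences) can be made concrete with a naive rejection-sampling generator — essentially the slow baseline that a fast framework like the one described must outperform. All thresholds and the blacklist below are invented for illustration:

```python
import itertools
import random

# Illustrative constraint values; the paper's actual thresholds may differ.
GC_RANGE = (0.4, 0.6)
MAX_HOMOPOLYMER = 3
MIN_HAMMING = 3
BLACKLIST = ("GCGC", "TATA")   # stand-ins for adapter-like motifs

def valid(bc):
    """Check the per-barcode constraints named in the abstract."""
    gc = sum(b in "GC" for b in bc) / len(bc)
    if not GC_RANGE[0] <= gc <= GC_RANGE[1]:
        return False
    if any(sum(1 for _ in run) > MAX_HOMOPOLYMER
           for _, run in itertools.groupby(bc)):
        return False
    return not any(s in bc for s in BLACKLIST)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def naive_library(n, length=10, seed=0):
    """Naive rejection sampling: draw random barcodes, keep only those that
    pass the filters and stay MIN_HAMMING away from everything accepted."""
    rng = random.Random(seed)
    lib = []
    while len(lib) < n:
        bc = "".join(rng.choice("ACGT") for _ in range(length))
        if valid(bc) and all(hamming(bc, x) >= MIN_HAMMING for x in lib):
            lib.append(bc)
    return lib

lib = naive_library(50)
```

    The pairwise Hamming check makes this approach quadratic in library size, which is why it breaks down at the million-to-billion-barcode scales the paper targets.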

  2. Theme II Joint Work Plan -2017 Collaboration and Knowledge Sharing on Large-scale Demonstration Projects

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Xiaoliang; Stauffer, Philip H.

    This effort is designed to expedite learning from existing and planned large-scale demonstration projects and their associated research through effective knowledge sharing among participants in the US and China.

  3. Identification of tissue-specific, abiotic stress-responsive gene expression patterns in wine grape (Vitis vinifera L.) based on curation and mining of large-scale EST data sets

    PubMed Central

    2011-01-01

    Background Abiotic stresses, such as water deficit and soil salinity, result in changes in physiology, nutrient use, and vegetative growth in vines, and ultimately, yield and flavor in berries of wine grape, Vitis vinifera L. Large-scale expressed sequence tags (ESTs) were generated, curated, and analyzed to identify major genetic determinants responsible for stress-adaptive responses. Although roots serve as the first site of perception and/or injury for many types of abiotic stress, EST sequencing in root tissues of wine grape exposed to abiotic stresses has been extremely limited to date. To overcome this limitation, large-scale EST sequencing was conducted from root tissues exposed to multiple abiotic stresses. Results A total of 62,236 expressed sequence tags (ESTs) were generated from leaf, berry, and root tissues from vines subjected to abiotic stresses and compared with 32,286 ESTs sequenced from 20 public cDNA libraries. Curation to correct annotation errors, clustering and assembly of the berry and leaf ESTs with currently available V. vinifera full-length transcripts and ESTs yielded a total of 13,278 unique sequences, with 2302 singletons and 10,976 mapped to V. vinifera gene models. Of these, 739 transcripts were found to have significant differential expression in stressed leaves and berries including 250 genes not described previously as being abiotic stress responsive. In a second analysis of 16,452 ESTs from a normalized root cDNA library derived from roots exposed to multiple, short-term, abiotic stresses, 135 genes with root-enriched expression patterns were identified on the basis of their relative EST abundance in roots relative to other tissues. 
Conclusions The large-scale analysis of relative EST frequency counts among a diverse collection of 23 different cDNA libraries from leaf, berry, and root tissues of wine grape exposed to a variety of abiotic stress conditions revealed distinct, tissue-specific expression patterns, previously unrecognized stress-induced genes, and many novel genes with root-enriched mRNA expression, improving our understanding of root biology and the manipulation of rootstock traits in wine grape. mRNA abundance estimates based on EST library-enriched expression patterns showed only modest correlations between microarray and quantitative, real-time reverse transcription-polymerase chain reaction (qRT-PCR) methods, highlighting the need for deep-sequencing expression profiling methods. PMID:21592389

  4. XLID-causing mutations and associated genes challenged in light of data from large-scale human exome sequencing.

    PubMed

    Piton, Amélie; Redin, Claire; Mandel, Jean-Louis

    2013-08-08

    Because of the unbalanced sex ratio (1.3-1.4 to 1) observed in intellectual disability (ID) and the identification of large ID-affected families showing X-linked segregation, much attention has been focused on the genetics of X-linked ID (XLID). Mutations causing monogenic XLID have now been reported in over 100 genes, most of which are commonly included in XLID diagnostic gene panels. Nonetheless, the boundary between true mutations and rare non-disease-causing variants often remains elusive. The sequencing of a large number of control X chromosomes, required for avoiding false-positive results, was not systematically possible in the past. Such information is now available thanks to large-scale sequencing projects such as the National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project, which provides variation information on 10,563 X chromosomes from the general population. We used this NHLBI cohort to systematically reassess the implication of 106 genes proposed to be involved in monogenic forms of XLID. We particularly question the implication in XLID of ten of them (AGTR2, MAGT1, ZNF674, SRPX2, ATP6AP2, ARHGEF6, NXF5, ZCCHC12, ZNF41, and ZNF81), in which truncating variants or previously published mutations are observed at a relatively high frequency within this cohort. We also highlight 15 other genes (CCDC22, CLIC2, CNKSR2, FRMPD4, HCFC1, IGBP1, KIAA2022, KLF8, MAOA, NAA10, NLGN3, RPL10, SHROOM4, ZDHHC15, and ZNF261) for which replication studies are warranted. We propose that similar reassessment of reported mutations (and genes) with the use of data from large-scale human exome sequencing would be relevant for a wide range of other genetic diseases. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
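    The core filter behind this kind of reassessment — compare each candidate gene's truncating-variant frequency in the control X chromosomes against what a highly penetrant monogenic XLID gene could plausibly tolerate in the general population — can be sketched as below. The gene names, variant counts, and frequency ceiling are invented placeholders, not values from the study:

```python
# Sketch of frequency-based gene reassessment with invented placeholder data.
X_CHROMS = 10563                 # X chromosomes in the NHLBI ESP control cohort
MAX_PLAUSIBLE_FREQ = 5e-4        # illustrative ceiling, not the paper's cutoff

truncating_counts = {"GENE_A": 9, "GENE_B": 0, "GENE_C": 2}

def questionable_genes(counts, n_chroms=X_CHROMS, cutoff=MAX_PLAUSIBLE_FREQ):
    """Flag genes whose truncating-variant frequency in controls is too high
    to be compatible with a fully penetrant monogenic XLID gene."""
    return sorted(g for g, k in counts.items() if k / n_chroms > cutoff)

flags = questionable_genes(truncating_counts)   # -> ["GENE_A"]
```

    Real analyses must also weigh penetrance, variant annotation quality, and cohort ascertainment, but the arithmetic above is the first-pass sanity check that a putative disease gene should not carry common truncating variants in healthy controls.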

  5. Nanowire-nanopore transistor sensor for DNA detection during translocation

    NASA Astrophysics Data System (ADS)

    Xie, Ping; Xiong, Qihua; Fang, Ying; Qing, Quan; Lieber, Charles

    2011-03-01

    Nanopore sequencing was proposed more than a decade ago as a promising low-cost, high-throughput sequencing technique. Because the small ionic current signal is incompatible with the fast translocation speed, and because of the technical difficulties of large-scale nanopore integration for direct ionic-current sequencing, alternative methods relying on integrated DNA sensors, such as capacitive coupling or tunnelling current, have been proposed; none of them, however, had been experimentally demonstrated. Here we show, for the first time, an amplified sensor signal experimentally recorded from a nanowire-nanopore field-effect transistor sensor during DNA translocation. Independent multi-channel recording was also demonstrated for the first time. Our results suggest that the signal arises from a highly localized potential change caused by DNA translocation under non-balanced buffer conditions. Because this method may produce larger signals for smaller nanopores, we hope our experiment can be a starting point for a new generation of nanopore sequencing devices with larger signals, higher bandwidth, and large-scale multiplexing capability, finally realizing the ultimate goal of low-cost, high-throughput sequencing.

  6. A large set of newly created interspecific Saccharomyces hybrids increases aromatic diversity in lager beers.

    PubMed

    Mertens, Stijn; Steensels, Jan; Saels, Veerle; De Rouck, Gert; Aerts, Guido; Verstrepen, Kevin J

    2015-12-01

    Lager beer is the most consumed alcoholic beverage in the world. Its production process is marked by a fermentation conducted at low (8 to 15°C) temperatures and by the use of Saccharomyces pastorianus, an interspecific hybrid between Saccharomyces cerevisiae and the cold-tolerant Saccharomyces eubayanus. Recent whole-genome-sequencing efforts revealed that the currently available lager yeasts belong to one of only two archetypes, "Saaz" and "Frohberg." This limited genetic variation likely reflects that all lager yeasts descend from only two separate interspecific hybridization events, which may also explain the relatively limited aromatic diversity between the available lager beer yeasts compared to, for example, wine and ale beer yeasts. In this study, 31 novel interspecific yeast hybrids were developed, resulting from large-scale robot-assisted selection and breeding between carefully selected strains of S. cerevisiae (six strains) and S. eubayanus (two strains). Interestingly, many of the resulting hybrids showed a broader temperature tolerance than their parental strains and reference S. pastorianus yeasts. Moreover, they combined a high fermentation capacity with a desirable aroma profile in laboratory-scale lager beer fermentations, thereby successfully enriching the currently available lager yeast biodiversity. Pilot-scale trials further confirmed the industrial potential of these hybrids and identified one strain, hybrid H29, which combines a fast fermentation, high attenuation, and the production of a complex, desirable fruity aroma. Copyright © 2015, American Society for Microbiology. All Rights Reserved.

  7. Concept For Generation Of Long Pseudorandom Sequences

    NASA Technical Reports Server (NTRS)

    Wang, C. C.

    1990-01-01

    Conceptual very-large-scale integrated (VLSI) digital circuit performs exponentiation in finite field. Algorithm that generates unusually long sequences of pseudorandom numbers executed by digital processor that includes such circuits. Concepts particularly advantageous for such applications as spread-spectrum communications, cryptography, and generation of ranging codes, synthetic noise, and test data, where usually desirable to make pseudorandom sequences as long as possible.
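    The tech brief's central operation, exponentiation in a finite field, can be illustrated with a minimal GF(2^4) sketch: repeated multiplication by a primitive element visits every nonzero field element before repeating, which is what makes field exponentiation a source of maximal-length (2^m − 1) pseudorandom sequences. The field size, the polynomial x⁴ + x + 1, and the generator below are chosen for illustration, not taken from the NASA design:

```python
def gf_mul(a, b, poly=0b10011, m=4):
    """Multiply two elements of GF(2^m), reducing by a primitive polynomial."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << m):      # degree overflow: reduce modulo the polynomial
            a ^= poly
    return r

def power_sequence(g=0b0010, poly=0b10011, m=4):
    """Successive powers of a generator g sweep all 2^m - 1 nonzero elements."""
    x, seq = 1, []
    for _ in range((1 << m) - 1):
        x = gf_mul(x, g, poly, m)
        seq.append(x)
    return seq

seq = power_sequence()        # 15 distinct nonzero elements, ending at g^15 = 1
```

    A hardware exponentiator over a much larger field (say GF(2^127)) follows the same algebra, which is why such circuits can produce extremely long sequences from compact state.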

  8. Why Do Phylogenomic Data Sets Yield Conflicting Trees? Data Type Influences the Avian Tree of Life more than Taxon Sampling.

    PubMed

    Reddy, Sushma; Kimball, Rebecca T; Pandey, Akanksha; Hosner, Peter A; Braun, Michael J; Hackett, Shannon J; Han, Kin-Lan; Harshman, John; Huddleston, Christopher J; Kingston, Sarah; Marks, Ben D; Miglia, Kathleen J; Moore, William S; Sheldon, Frederick H; Witt, Christopher C; Yuri, Tamaki; Braun, Edward L

    2017-09-01

    Phylogenomics, the use of large-scale data matrices in phylogenetic analyses, has been viewed as the ultimate solution to the problem of resolving difficult nodes in the tree of life. However, it has become clear that analyses of these large genomic data sets can also result in conflicting estimates of phylogeny. Here, we use the early divergences in Neoaves, the largest clade of extant birds, as a "model system" to understand the basis for incongruence among phylogenomic trees. We were motivated by the observation that trees from two recent avian phylogenomic studies exhibit conflicts. Those studies used different strategies: 1) collecting many characters [~42 megabase pairs (Mbp) of sequence data] from 48 birds, sometimes including only one taxon for each major clade; and 2) collecting fewer characters (~0.4 Mbp) from 198 birds, selected to subdivide long branches. However, the studies also used different data types: the taxon-poor data matrix comprised 68% non-coding sequences whereas coding exons dominated the taxon-rich data matrix. This difference raises the question of whether the primary reason for incongruence is the number of sites, the number of taxa, or the data type. To test among these alternative hypotheses we assembled a novel, large-scale data matrix comprising 90% non-coding sequences from 235 bird species. Although increased taxon sampling appeared to have a positive impact on phylogenetic analyses, the most important variable was data type. Indeed, by analyzing different subsets of the taxa in our data matrix we found that increased taxon sampling actually resulted in increased congruence with the tree from the previous taxon-poor study (which had a majority of non-coding data) instead of the taxon-rich study (which largely used coding data). 
We suggest that the observed differences in the estimates of topology for these studies reflect data-type effects due to violations of the models used in phylogenetic analyses, some of which may be difficult to detect. If incongruence among trees estimated using phylogenomic methods largely reflects problems with model fit, developing more "biologically-realistic" models is likely to be critical for efforts to reconstruct the tree of life. [Birds; coding exons; GTR model; model fit; Neoaves; non-coding DNA; phylogenomics; taxon sampling.]. © The Author(s) 2017. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  9. A New Omics Data Resource of Pleurocybella porrigens for Gene Discovery

    PubMed Central

    Dohra, Hideo; Someya, Takumi; Takano, Tomoyuki; Harada, Kiyonori; Omae, Saori; Hirai, Hirofumi; Yano, Kentaro; Kawagishi, Hirokazu

    2013-01-01

    Background: Pleurocybella porrigens is a mushroom-forming fungus that has traditionally been consumed as food in Japan. In 2004, 55 people were poisoned by eating the mushroom, and 17 of them died of acute encephalopathy. Since then, the Japanese government has been warning people to take precautions against eating the P. porrigens mushroom. Unfortunately, despite these efforts, the molecular mechanism of the encephalopathy remains elusive, and genome and transcriptome sequence data for P. porrigens and related species are not available in public databases. To obtain omics data for P. porrigens, we sequenced the genome and transcriptome of its fruiting bodies and mycelia by next-generation sequencing. Methodology/Principal Findings: Short reads of genomic DNA and mRNA from P. porrigens were generated on an Illumina Genome Analyzer. The genomic reads were de novo assembled into scaffolds using Velvet. Comparisons of genome signatures among Agaricales showed that P. porrigens has a unique genome signature. Transcriptome sequences were assembled into contigs (unigenes), and the biological functions of the unigenes were predicted by Gene Ontology and KEGG pathway analyses. The majority of unigenes appear to be novel genes without significant counterparts in public omics databases. Conclusions: Functional analyses of the unigenes suggest the existence of numerous novel genes in the basidiomycetes. These results indicate that omics information such as genome, transcriptome, and metabolome data for basidiomycetes is scarce in current databases. The large-scale omics information on P. porrigens provided by this research offers a new data resource for gene discovery in basidiomycetes. PMID:23936076
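The genome-signature comparison mentioned above is commonly based on oligonucleotide (e.g. tetranucleotide) frequency vectors. A minimal sketch of this idea, assuming simple k-mer counting and a Manhattan distance rather than the authors' exact procedure:

```python
from collections import Counter
from itertools import product

def genome_signature(seq, k=4):
    """Return normalized k-mer frequencies (a simple 'genome signature')."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    # A fixed ordering over all 4^k possible k-mers yields comparable vectors
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts.get(m, 0) / total for m in kmers]

def signature_distance(sig_a, sig_b):
    """Manhattan distance between two signatures; smaller = more similar."""
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b))
```

Signatures computed this way can be compared across species to flag genomes, like that of P. porrigens, that sit far from their relatives.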

  10. Microfluidic single-cell whole-transcriptome sequencing.

    PubMed

    Streets, Aaron M; Zhang, Xiannian; Cao, Chen; Pang, Yuhong; Wu, Xinglong; Xiong, Liang; Yang, Lu; Fu, Yusi; Zhao, Liang; Tang, Fuchou; Huang, Yanyi

    2014-05-13

    Single-cell whole-transcriptome analysis is a powerful tool for quantifying gene expression heterogeneity in populations of cells. Many techniques have therefore been developed recently to perform transcriptome sequencing (RNA-Seq) on individual cells. To probe subtle biological variation between samples with limiting amounts of RNA, however, more precise and sensitive methods are still required. We adapted a previously developed strategy for single-cell RNA-Seq that has shown promise for superior sensitivity and implemented the chemistry in a microfluidic platform for single-cell whole-transcriptome analysis. In this approach, single cells are captured and lysed in a microfluidic device, where mRNAs with poly(A) tails are reverse-transcribed into cDNA. Double-stranded cDNA is then collected and sequenced on a next-generation sequencing platform. We prepared 94 libraries consisting of single mouse embryonic cells and technical replicates of extracted RNA and thoroughly characterized the performance of this technology. Microfluidic implementation increased mRNA detection sensitivity and improved measurement precision compared with tube-based protocols. With 0.2 M reads per cell, we were able to reconstruct a majority of the bulk transcriptome from 10 single cells. We also quantified variation between and within different types of mouse embryonic cells and found that enhanced measurement precision, detection sensitivity, and experimental throughput aided the distinction between biological variability and technical noise. With this work, we validated the advantages of an early approach to single-cell RNA-Seq and showed that the benefits of combining microfluidic technology with high-throughput sequencing will be valuable for large-scale efforts in single-cell transcriptome analysis.
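Distinguishing biological variability from technical noise, as described above, can be illustrated by comparing a gene's cell-to-cell variation with the variation seen across technical replicates of the same extracted RNA. A toy coefficient-of-variation sketch (the study's actual statistical treatment is more thorough):

```python
import statistics

def cv(values):
    """Coefficient of variation: sample stdev relative to the mean."""
    mean = statistics.mean(values)
    return statistics.stdev(values) / mean if mean else float("inf")

def exceeds_technical_noise(gene_counts_cells, gene_counts_replicates):
    """Flag a gene as biologically variable if its cell-to-cell CV exceeds
    the CV observed across technical replicates of the same RNA sample."""
    return cv(gene_counts_cells) > cv(gene_counts_replicates)
```

For example, a gene with counts [5, 50, 5, 60] across cells but [30, 32, 29, 31] across replicates would be flagged as varying beyond technical noise.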

  11. Taking Teacher Learning to Scale: Sharing Knowledge and Spreading Ideas across Geographies

    ERIC Educational Resources Information Center

    Klein, Emily J.; Jaffe-Walter, Reva; Riordan, Megan

    2016-01-01

    This research reports data from case studies of three intermediary organizations facing the challenge of scaling up teacher learning. The turn of the century launched scaling-up efforts of all three intermediaries, growing from intimate groups, where founding teachers and staff were key supports for teacher learning, to large multistate…

  12. Multisite Studies and Scaling up in Educational Research

    ERIC Educational Resources Information Center

    Harwell, Michael

    2012-01-01

    A scale-up study in education typically expands the sample of students, schools, districts, and/or practices or materials used in smaller studies in ways that build in heterogeneity. Yet surprisingly little is known about the factors that promote successful scaling up efforts in education, in large part due to the absence of empirically supported…

  13. Shake Test Results and Dynamic Calibration Efforts for the Large Rotor Test Apparatus

    NASA Technical Reports Server (NTRS)

    Russell, Carl R.

    2014-01-01

    A shake test of the Large Rotor Test Apparatus (LRTA) was performed in an effort to enhance NASA's capability to measure dynamic hub loads for full-scale rotor tests. This paper documents the results of the shake test as well as efforts to calibrate the LRTA balance system to measure dynamic loads. Dynamic rotor loads are the primary source of vibration in helicopters and other rotorcraft, leading to passenger discomfort and damage due to fatigue of aircraft components. There are novel methods being developed to reduce rotor vibrations, but measuring the actual vibration reductions on full-scale rotors remains a challenge. In order to measure rotor forces on the LRTA, a balance system in the non-rotating frame is used. The forces at the balance can then be translated to the hub reference frame to measure the rotor loads. Because the LRTA has its own dynamic response, the balance system must be calibrated to include the natural frequencies of the test rig.

  14. Large Scale Structure Studies: Final Results from a Rich Cluster Redshift Survey

    NASA Astrophysics Data System (ADS)

    Slinglend, K.; Batuski, D.; Haase, S.; Hill, J.

    1995-12-01

    The results from the COBE satellite show the existence of structure on scales on the order of 10% or more of the horizon scale of the universe. Rich clusters of galaxies from the Abell-ACO catalogs show evidence of structure on scales of 100 Mpc and hold the promise of confirming structure on the scale of the COBE result. Unfortunately, until now, redshift information has been unavailable for a large percentage of these clusters, so present knowledge of their three dimensional distribution has quite large uncertainties. Our approach in this effort has been to use the MX multifiber spectrometer on the Steward 2.3m to measure redshifts of at least ten galaxies in each of 88 Abell cluster fields with richness class R>= 1 and mag10 <= 16.8 (estimated z<= 0.12) and zero or one measured redshifts. This work has resulted in a deeper, 95% complete and more reliable sample of 3-D positions of rich clusters. The primary intent of this survey has been to constrain theoretical models for the formation of the structure we see in the universe today through 2-pt. spatial correlation function and other analyses of the large scale structures traced by these clusters. In addition, we have obtained enough redshifts per cluster to greatly improve the quality and size of the sample of reliable cluster velocity dispersions available for use in other studies of cluster properties. This new data has also allowed the construction of an updated and more reliable supercluster candidate catalog. Our efforts have resulted in effectively doubling the volume traced by these clusters. Presented here is the resulting 2-pt. spatial correlation function, as well as density plots and several other figures quantifying the large scale structure from this much deeper and complete sample. 
Also, with 10 or more redshifts in most of our cluster fields, we have investigated the extent of projection effects within the Abell catalog in an effort to quantify and understand how this may affect the Abell sample.
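The 2-pt. spatial correlation function referred to above is estimated from pair counts in the cluster sample against a random comparison catalog. A minimal sketch using the natural estimator DD/RR - 1 (survey analyses typically use more robust estimators, such as Landy-Szalay, plus edge corrections):

```python
import itertools
import math

def pair_counts(points, bins):
    """Count point pairs whose 3-D separation falls in each distance bin."""
    counts = [0] * (len(bins) - 1)
    for p, q in itertools.combinations(points, 2):
        r = math.dist(p, q)
        for i in range(len(bins) - 1):
            if bins[i] <= r < bins[i + 1]:
                counts[i] += 1
                break
    return counts

def xi_natural(data, randoms, bins):
    """Natural estimator xi(r) = (DD/RR) * (N_r/N_d)^2 - 1, per bin."""
    dd = pair_counts(data, bins)
    rr = pair_counts(randoms, bins)
    norm = (len(randoms) / len(data)) ** 2
    return [d / r * norm - 1 if r else float("nan") for d, r in zip(dd, rr)]
```

A clustered sample gives xi(r) > 0 at small separations; an unclustered one gives xi(r) near zero in every bin.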

  15. Genome sequencing in microfabricated high-density picolitre reactors.

    PubMed

    Margulies, Marcel; Egholm, Michael; Altman, William E; Attiya, Said; Bader, Joel S; Bemben, Lisa A; Berka, Jan; Braverman, Michael S; Chen, Yi-Ju; Chen, Zhoutao; Dewell, Scott B; Du, Lei; Fierro, Joseph M; Gomes, Xavier V; Godwin, Brian C; He, Wen; Helgesen, Scott; Ho, Chun Heen; Ho, Chun He; Irzyk, Gerard P; Jando, Szilveszter C; Alenquer, Maria L I; Jarvie, Thomas P; Jirage, Kshama B; Kim, Jong-Bum; Knight, James R; Lanza, Janna R; Leamon, John H; Lefkowitz, Steven M; Lei, Ming; Li, Jing; Lohman, Kenton L; Lu, Hong; Makhijani, Vinod B; McDade, Keith E; McKenna, Michael P; Myers, Eugene W; Nickerson, Elizabeth; Nobile, John R; Plant, Ramona; Puc, Bernard P; Ronan, Michael T; Roth, George T; Sarkis, Gary J; Simons, Jan Fredrik; Simpson, John W; Srinivasan, Maithreyan; Tartaro, Karrie R; Tomasz, Alexander; Vogt, Kari A; Volkmer, Greg A; Wang, Shally H; Wang, Yong; Weiner, Michael P; Yu, Pengguang; Begley, Richard F; Rothberg, Jonathan M

    2005-09-15

    The proliferation of large-scale DNA-sequencing projects in recent years has driven a search for alternative methods to reduce time and cost. Here we describe a scalable, highly parallel sequencing system with raw throughput significantly greater than that of state-of-the-art capillary electrophoresis instruments. The apparatus uses a novel fibre-optic slide of individual wells and is able to sequence 25 million bases, at 99% or better accuracy, in one four-hour run. To achieve an approximately 100-fold increase in throughput over current Sanger sequencing technology, we have developed an emulsion method for DNA amplification and an instrument for sequencing by synthesis using a pyrosequencing protocol optimized for solid support and picolitre-scale volumes. Here we show the utility, throughput, accuracy and robustness of this system by shotgun sequencing and de novo assembly of the Mycoplasma genitalium genome with 96% coverage at 99.96% accuracy in one run of the machine.
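Coverage figures like the 96% reported above can be sanity-checked with Lander-Waterman statistics, in which the covered fraction of a genome grows as 1 - e^(-c) with mean depth c. A minimal sketch (the read counts below are illustrative assumptions, not the paper's numbers):

```python
import math

def expected_coverage_fraction(read_count, read_length, genome_length):
    """Lander-Waterman expectation: fraction of genome covered is
    1 - e^(-c), where c is the mean per-base coverage depth."""
    c = read_count * read_length / genome_length
    return 1 - math.exp(-c)
```

For a ~580 kbp genome such as Mycoplasma genitalium, even modest depth predicts near-complete coverage; real shortfalls typically come from repeats and sampling bias rather than raw throughput.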

  16. Large-scale photospheric motions determined from granule tracking and helioseismology from SDO/HMI data

    NASA Astrophysics Data System (ADS)

    Roudier, Th.; Švanda, M.; Ballot, J.; Malherbe, J. M.; Rieutord, M.

    2018-04-01

    Context: Large-scale flows in the Sun play an important role in the dynamo process linked to the solar cycle. The most important large-scale flows are the differential rotation and the meridional circulation, with amplitudes of the order of km s-1 and of a few m s-1, respectively. These flows also have cycle-related components, namely the torsional oscillations. Aims: We aim to determine large-scale plasma flows on the solar surface by deriving horizontal flow velocities using solar granule tracking, dopplergrams, and time-distance helioseismology. Methods: Coherent structure tracking (CST) and time-distance helioseismology were used to investigate the solar differential rotation and meridional circulation at the solar surface on a 30-day HMI/SDO sequence. The influence of a large sunspot on these large-scale flows was also studied using a specific 7-day HMI/SDO sequence. Results: The large-scale flows measured by the CST at the solar surface and the flows determined from the same data by helioseismology in the first 1 Mm below the surface are in good agreement in amplitude and direction. The torsional waves are also located at the same latitudes, with amplitudes of the same order. We are able to measure the meridional circulation correctly with the CST method using only 3 days of data, after averaging between ±15° in longitude. Conclusions: We conclude that the combination of CST and Doppler velocities allows us to properly detect the solar differential rotation as well as smaller-amplitude flows such as the meridional circulation and torsional waves. The results of our methods are in good agreement with helioseismic measurements.

  17. Streaming fragment assignment for real-time analysis of sequencing experiments

    PubMed Central

    Roberts, Adam; Pachter, Lior

    2013-01-01

    We present eXpress, a software package for highly efficient probabilistic assignment of ambiguously mapping sequenced fragments. eXpress uses a streaming algorithm with linear run time and constant memory use. It can determine abundances of sequenced molecules in real time, and can be applied to ChIP-seq, metagenomics and other large-scale sequencing data. We demonstrate its use on RNA-seq data, showing greater efficiency than other quantification methods. PMID:23160280
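The streaming idea described above (a single pass in constant memory, with each ambiguous fragment fractionally allocated among the transcripts it maps to in proportion to current abundance estimates) can be sketched as a toy online loop. This illustrates the principle only and is not eXpress's actual algorithm:

```python
def streaming_abundances(fragments, transcript_ids):
    """Toy online assignment: each fragment is a tuple of the transcript
    IDs it maps to; one unit of mass per fragment is split among them in
    proportion to current weights, which update after every fragment
    (single pass, memory independent of the number of fragments)."""
    # Uniform pseudocount so the earliest fragments are split evenly
    weights = {t: 1.0 for t in transcript_ids}
    for compatible in fragments:
        total = sum(weights[t] for t in compatible)
        for t in compatible:
            weights[t] += weights[t] / total  # fractional assignment
    norm = sum(weights.values())
    return {t: w / norm for t, w in weights.items()}
```

Transcripts that attract many unambiguous fragments accumulate weight and therefore claim larger shares of subsequent ambiguous fragments, mirroring how online EM sharpens assignments as data streams in.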

  18. Draft genome sequence of Staphylococcus aureus KT/312045, an ST1-MSSA PVL positive isolated from pus sample in East Coast Malaysia.

    PubMed

    Suhaili, Zarizal; Lean, Soo-Sum; Mohamad, Noor Muzamil; Rachman, Abdul R Abdul; Desa, Mohd Nasir Mohd; Yeo, Chew Chieng

    2016-09-01

    Most efforts to elucidate the molecular relatedness and epidemiology of Staphylococcus aureus in Malaysia have focused largely on methicillin-resistant S. aureus (MRSA). Here, therefore, we report the draft genome sequence of a methicillin-susceptible S. aureus (MSSA) strain of sequence type 1 (ST1) and spa type t127, carrying the Panton-Valentine leukocidin (pvl) pathogenic determinant, isolated from a pus sample and designated strain KT/312045. The draft genome is 2.86 Mbp in size with a G + C content of 32.7% and comprises 2673 coding sequences. The draft genome sequence has been deposited in DDBJ/EMBL/GenBank under the accession number AOCP00000000.

  19. Developing eThread pipeline using SAGA-pilot abstraction for large-scale structural bioinformatics.

    PubMed

    Ragothaman, Anjani; Boddu, Sairam Chowdary; Kim, Nayong; Feinstein, Wei; Brylinski, Michal; Jha, Shantenu; Kim, Joohyun

    2014-01-01

    While most computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because the structural information they predict can uncover underlying function. However, threading tools are generally compute-intensive, and even small genomes such as those of prokaryotes typically contain many thousands of protein sequences, prohibiting their application as a genome-wide structural systems biology tool. To improve its utility, we have developed a pipeline for eThread, a meta-threading protein structure modeling tool, that uses computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data- and task-level parallelism and manages large variation in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based upon task requirements. We present a runtime analysis characterizing the computational complexity of eThread and the EC2 infrastructure. Based on these results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, particularly for small genomes such as those of prokaryotes. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure.
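The task-level parallelism described above (many independent threading tasks farmed out to a managed pool of workers) can be sketched with a generic worker pool. The SAGA-pilot machinery itself is not shown, and `thread_sequence` is a hypothetical stand-in for one compute-intensive eThread task:

```python
from concurrent.futures import ThreadPoolExecutor

def thread_sequence(seq):
    """Hypothetical stand-in for one threading task; a real pipeline
    would invoke eThread on this sequence and return a model."""
    return (seq, len(seq))

def run_pipeline(sequences, max_workers=4):
    """Task-level parallelism: independent sequences are dispatched to a
    worker pool, loosely analogous to tasks scheduled onto a pilot job."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(thread_sequence, sequences))
```

Because the tasks are independent, throughput scales with the number of workers; a pilot-job system adds elastic acquisition of those workers across distributed resources such as EC2.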

  20. Developing eThread Pipeline Using SAGA-Pilot Abstraction for Large-Scale Structural Bioinformatics

    PubMed Central

    Ragothaman, Anjani; Feinstein, Wei; Jha, Shantenu; Kim, Joohyun

    2014-01-01

    While most computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because the structural information they predict can uncover underlying function. However, threading tools are generally compute-intensive, and even small genomes such as those of prokaryotes typically contain many thousands of protein sequences, prohibiting their application as a genome-wide structural systems biology tool. To improve its utility, we have developed a pipeline for eThread, a meta-threading protein structure modeling tool, that uses computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data- and task-level parallelism and manages large variation in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based upon task requirements. We present a runtime analysis characterizing the computational complexity of eThread and the EC2 infrastructure. Based on these results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, particularly for small genomes such as those of prokaryotes. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure. PMID:24995285

  1. Analysis of genetic diversity using SNP markers in oat

    USDA-ARS?s Scientific Manuscript database

    A large-scale single nucleotide polymorphism (SNP) discovery was carried out in cultivated oat using Roche 454 sequencing methods. DNA sequences were generated from cDNAs originating from a panel of 20 diverse oat cultivars, and from Diversity Array Technology (DArT) genomic complexity reductions fr...

  2. Use of low-coverage, large-insert, short-read data for rapid and accurate generation of enhanced-quality draft Pseudomonas genome sequences.

    PubMed

    O'Brien, Heath E; Gong, Yunchen; Fung, Pauline; Wang, Pauline W; Guttman, David S

    2011-01-01

    Next-generation genomic technology has both greatly accelerated the pace of genome research and increased our reliance on draft genome sequences. While groups such as the Genomics Standards Consortium have made strong efforts to promote genome standards, there is still a general lack of uniformity among published draft genomes, leading to challenges for downstream comparative analyses. This lack of uniformity is a particular problem when using standard draft genomes that frequently have large numbers of low-quality sequencing tracts. Here we present a proposal for an "enhanced-quality draft" genome that identifies at least 95% of the coding sequences, thereby effectively providing a full accounting of the genic component of the genome. Enhanced-quality draft genomes are easily attainable through a combination of small- and large-insert next-generation, paired-end sequencing. We illustrate the generation of an enhanced-quality draft genome by re-sequencing the plant pathogenic bacterium Pseudomonas syringae pv. phaseolicola 1448A (Pph 1448A), which has a published, closed genome sequence of 5.93 Mbp. We use a combination of Illumina paired-end and mate-pair sequencing, and surprisingly find that de novo assemblies combining 100x paired-end coverage with mate-pair coverage as low as 2-5x are substantially better than assemblies based on higher coverage. The rapid and low-cost generation of large numbers of enhanced-quality draft genome sequences will be of particular value for microbial diagnostics and biosecurity, which rely on precise discrimination of potentially dangerous clones from closely related benign strains.

  3. Outlook and Challenges of Perovskite Solar Cells toward Terawatt-Scale Photovoltaic Module Technology

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhu, Kai; Kim, Donghoe; Whitaker, James B

    Rapid development of perovskite solar cells (PSCs) during the past several years has made this photovoltaic (PV) technology a serious contender for potential large-scale deployment on the terawatt scale in the PV market. To successfully transition PSC technology from the laboratory to industry scale, substantial efforts need to focus on scalable fabrication of high-performance perovskite modules with minimum negative environmental impact. Here, we provide an overview of the current research and our perspective regarding PSC technology toward future large-scale manufacturing and deployment. Several key challenges discussed are (1) a scalable process for large-area perovskite module fabrication; (2) less hazardous chemical routes for PSC fabrication; and (3) suitable perovskite module designs for different applications.

  4. Prediction of Ras-effector interactions using position energy matrices.

    PubMed

    Kiel, Christina; Serrano, Luis

    2007-09-01

    One of the more challenging problems in biology is to determine the cellular protein interaction network. Progress has been made in predicting protein-protein interactions from structural information, assuming that structurally similar proteins interact in a similar way. In a previous publication, we determined a genome-wide Ras-effector interaction network based on homology models, with high accuracy in predicting binding and non-binding domains. However, for prediction on a genome-wide scale, homology modelling is a time-consuming process. We therefore developed a faster method using position energy matrices: based on different Ras-effector X-ray template structures, every amino acid in the effector binding domain is sequentially mutated to every other amino acid residue and the effect on binding energy is calculated. These pre-calculated matrices can then be used to score any Ras or effector sequence for binding. Using position energy matrices, the sequences of putative Ras-binding domains can be scanned quickly to calculate an energy sum value. By calibrating energy sum values against quantitative experimental binding data, thresholds can be defined, and thus non-binding domains can be excluded quickly. Sequences with energy sum values above this threshold are considered potential binding domains and can be further analysed using homology modelling. This prediction method could be applied to other protein families sharing conserved interaction types in order to rapidly determine large-scale cellular protein interaction networks. It could therefore have an important impact on future in silico structural genomics approaches, particularly with regard to increasing structural proteomics efforts aiming to determine all possible domain folds and interaction types. All matrices are deposited in the ADAN database (http://adan-embl.ibmc.umh.es/). Supplementary data are available at Bioinformatics online.
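Scoring a candidate domain with a position energy matrix reduces to summing one pre-calculated energy per position and comparing the total against a calibrated threshold. A toy sketch with a hypothetical 3-position matrix (real matrices cover all 20 residues at every position of the binding domain):

```python
def energy_sum(sequence, matrix):
    """Score a candidate binding domain against a position energy matrix:
    sum the pre-calculated energy for the residue at each position."""
    return sum(matrix[i][res] for i, res in enumerate(sequence))

def is_potential_binder(sequence, matrix, threshold):
    """Domains scoring at or above a calibrated threshold are retained as
    potential binders; the rest are excluded without homology modelling."""
    return energy_sum(sequence, matrix) >= threshold
```

In this scheme the expensive step (homology modelling) runs only on the sequences that survive the cheap matrix scan.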

  5. Enhancing ecosystem restoration efficiency through spatial and temporal coordination.

    PubMed

    Neeson, Thomas M; Ferris, Michael C; Diebel, Matthew W; Doran, Patrick J; O'Hanley, Jesse R; McIntyre, Peter B

    2015-05-12

    In many large ecosystems, conservation projects are selected by a diverse set of actors operating independently at spatial scales ranging from local to international. Although small-scale decision making can leverage local expert knowledge, it also may be an inefficient means of achieving large-scale objectives if piecemeal efforts are poorly coordinated. Here, we assess the value of coordinating efforts in both space and time to maximize the restoration of aquatic ecosystem connectivity. Habitat fragmentation is a leading driver of declining biodiversity and ecosystem services in rivers worldwide, and we simultaneously evaluate optimal barrier removal strategies for 661 tributary rivers of the Laurentian Great Lakes, which are fragmented by at least 6,692 dams and 232,068 road crossings. We find that coordinating barrier removals across the entire basin is nine times more efficient at reconnecting fish to headwater breeding grounds than optimizing independently for each watershed. Similarly, a one-time pulse of restoration investment is up to 10 times more efficient than annual allocations totaling the same amount. Despite widespread emphasis on dams as key barriers in river networks, improving road culvert passability is also essential for efficiently restoring connectivity to the Great Lakes. Our results highlight the dramatic economic and ecological advantages of coordinating efforts in both space and time during restoration of large ecosystems.
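The value of basin-wide coordination can be illustrated by ranking candidate barrier removals by habitat gained per dollar across all watersheds under one shared budget. A greedy toy sketch (the study itself used exact optimization, and real connectivity gains are not simply additive per barrier):

```python
def plan_removals(barriers, budget):
    """Greedy sketch of basin-wide coordination: choose barrier removals
    by benefit/cost ratio across ALL watersheds until the budget is spent.
    Each barrier is a dict with 'name', 'cost', and 'habitat' (gain)."""
    ranked = sorted(barriers, key=lambda b: b["habitat"] / b["cost"], reverse=True)
    chosen, spent = [], 0.0
    for b in ranked:
        if spent + b["cost"] <= budget:
            chosen.append(b["name"])
            spent += b["cost"]
    return chosen
```

Optimizing each watershed separately would instead split the budget before ranking, which is why cheap, high-gain projects (often road culverts rather than dams) can be missed without coordination.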

  6. Enhancing ecosystem restoration efficiency through spatial and temporal coordination

    PubMed Central

    Neeson, Thomas M.; Ferris, Michael C.; Diebel, Matthew W.; Doran, Patrick J.; O’Hanley, Jesse R.; McIntyre, Peter B.

    2015-01-01

    In many large ecosystems, conservation projects are selected by a diverse set of actors operating independently at spatial scales ranging from local to international. Although small-scale decision making can leverage local expert knowledge, it also may be an inefficient means of achieving large-scale objectives if piecemeal efforts are poorly coordinated. Here, we assess the value of coordinating efforts in both space and time to maximize the restoration of aquatic ecosystem connectivity. Habitat fragmentation is a leading driver of declining biodiversity and ecosystem services in rivers worldwide, and we simultaneously evaluate optimal barrier removal strategies for 661 tributary rivers of the Laurentian Great Lakes, which are fragmented by at least 6,692 dams and 232,068 road crossings. We find that coordinating barrier removals across the entire basin is nine times more efficient at reconnecting fish to headwater breeding grounds than optimizing independently for each watershed. Similarly, a one-time pulse of restoration investment is up to 10 times more efficient than annual allocations totaling the same amount. Despite widespread emphasis on dams as key barriers in river networks, improving road culvert passability is also essential for efficiently restoring connectivity to the Great Lakes. Our results highlight the dramatic economic and ecological advantages of coordinating efforts in both space and time during restoration of large ecosystems. PMID:25918378

  7. Webinar July 28: H2@Scale - A Potential Opportunity | News | NREL

    Science.gov Websites

    role of hydrogen at the grid scale and the efforts of a large, national lab team assembled to evaluate the potential of hydrogen to play a critical role in our energy future. Presenters will share facts

  8. DeepText2GO: Improving large-scale protein function prediction with deep semantic text representation.

    PubMed

    You, Ronghui; Huang, Xiaodi; Zhu, Shanfeng

    2018-06-06

    As of April 2018, UniProtKB had collected more than 115 million protein sequences, yet fewer than 0.15% of these proteins have been associated with experimental GO annotations. As such, the use of automatic protein function prediction (AFP) to reduce this huge gap becomes increasingly important. Previous studies have concluded that sequence-homology-based methods are highly effective for AFP. In addition, mining motif, domain, and functional information from protein sequences has been found very helpful for AFP. Beyond sequences, however, alternative information sources such as text may be useful for AFP as well. Instead of using the BOW (bag-of-words) representation of traditional text-based AFP, we propose a new method called DeepText2GO that relies on deep semantic text representation, together with different kinds of available protein information such as sequence homology, families, domains, and motifs, to improve large-scale AFP. Furthermore, DeepText2GO integrates text-based methods with sequence-based ones by means of a consensus approach. Extensive experiments on a benchmark dataset extracted from UniProt/SwissProt demonstrate that DeepText2GO significantly outperforms both text-based and sequence-based methods, validating its superiority. Copyright © 2018 Elsevier Inc. All rights reserved.
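The consensus integration of text-based and sequence-based predictions can be sketched as a per-GO-term combination of scores. The weighted average below is an assumed illustrative scheme, not DeepText2GO's published one:

```python
def consensus_score(text_scores, seq_scores, alpha=0.5):
    """Combine a text-based and a sequence-based predictor per GO term.
    alpha weights the text-based score; missing terms count as 0.0."""
    terms = set(text_scores) | set(seq_scores)
    return {t: alpha * text_scores.get(t, 0.0)
               + (1 - alpha) * seq_scores.get(t, 0.0)
            for t in terms}
```

A consensus like this lets a term supported by only one evidence source still receive partial credit, while terms supported by both rank highest.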

  9. Exploring the role of curriculum materials to support teachers in science education reform

    NASA Astrophysics Data System (ADS)

    Schneider, Rebecca M.

    2001-07-01

    For curriculum materials to succeed in promoting large-scale science education reform, teacher learning must be supported. Materials were designed to reflect desired reforms and to be educative by including detailed lesson descriptions that addressed the content, pedagogy, and pedagogical content knowledge teachers would need. The goal of this research was to describe how such materials contributed to classroom practices. As part of an urban systemic reform effort, four middle school teachers' initial enactments of an inquiry-based science unit on force and motion were videotaped. Enactments focused on five lesson sequences containing experiences with phenomena, investigation, technology use, or artifact development. Each sequence spanned three to five days across the 10-week unit. For each lesson sequence, intended and actual enactment were compared using ratings of (1) accuracy and completeness of the science ideas presented; (2) amount of student learning opportunities, similarity of those opportunities to the ones intended, and quality of adaptations; and (3) amount of instructional supports offered, appropriateness of those supports, and source of ideas for them. Ratings indicated that two teachers' enactments were consistent with intentions and two teachers' enactments were not. The first two were in school contexts supportive of the reform. They purposefully used the materials to guide enactment, which tended to be consistent with standards-based reform. They provided students opportunities to use technology tools, design investigations, and discuss ideas. However, enactment ratings were less reflective of curriculum intent when challenges were greatest, such as when teachers attempted to present challenging science ideas, respond to students' ideas, structure investigations, guide small-group discussions, or make adaptations.
Moreover, enactment ratings were less consistent in parts of lessons where materials did not include lesson specific educative supports for teachers. Overall, findings indicate curriculum materials that include detailed descriptions of lessons accompanied by educative features can help teachers with enactment. Therefore, design principles to improve materials to support teachers in reform are suggested. However, results also demonstrate materials alone are not sufficient to create intended enactments; reform efforts must include professional development in content and pedagogy and efforts to create systemic change in context and policy to support teacher learning and classroom enactment.

  10. Transcriptome sequencing and annotation of the halophytic microalga Dunaliella salina

    PubMed Central

    Hong, Ling; Liu, Jun-li; Midoun, Samira Z.; Miller, Philip C.

    2017-01-01

    The unicellular green alga Dunaliella salina is well adapted to salt stress and contains compounds (including β-carotene and vitamins) with potential commercial value. A large transcriptome database of D. salina during the adjustment, exponential and stationary growth phases was generated using a high throughput sequencing platform. We characterized the metabolic processes in D. salina with a focus on valuable metabolites, with the aim of manipulating D. salina to achieve greater economic value in large-scale production through a bioengineering strategy. Gene expression profiles under salt stress verified using quantitative polymerase chain reaction (qPCR) implied that salt can regulate the expression of key genes. This study generated a substantial fraction of D. salina transcriptional sequences for the entire growth cycle, providing a basis for the discovery of novel genes. This first full-scale transcriptome study of D. salina establishes a foundation for further comparative genomic studies. PMID:28990374

  11. The autism sequencing consortium: large-scale, high-throughput sequencing in autism spectrum disorders.

    PubMed

    Buxbaum, Joseph D; Daly, Mark J; Devlin, Bernie; Lehner, Thomas; Roeder, Kathryn; State, Matthew W

    2012-12-20

    Research during the past decade has seen significant progress in the understanding of the genetic architecture of autism spectrum disorders (ASDs), with gene discovery accelerating as the characterization of genomic variation has become increasingly comprehensive. At the same time, this research has highlighted ongoing challenges. Here we address the enormous impact of high-throughput sequencing (HTS) on ASD gene discovery, outline a consensus view for leveraging this technology, and describe a large multisite collaboration developed to accomplish these goals. Similar approaches could prove effective for severe neurodevelopmental disorders more broadly. Copyright © 2012 Elsevier Inc. All rights reserved.

  12. (New hosts and vectors for genome cloning)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    The main goal of our project remains the development of new bacterial hosts and vectors for the stable propagation of human DNA clones in E. coli. During the past six months of our current budget period, we have (1) continued to develop new hosts that permit the stable maintenance of unstable features of human DNA, and (2) developed a series of vectors for (a) cloning large DNA inserts, (b) assessing the frequency of human sequences that are lethal to the growth of E. coli, and (c) assessing the stability of human sequences cloned in M13 for large-scale sequencing projects.

  14. Interbreeding among deeply divergent mitochondrial lineages in the American cockroach (Periplaneta americana)

    NASA Astrophysics Data System (ADS)

    von Beeren, Christoph; Stoeckle, Mark Y.; Xia, Joyce; Burke, Griffin; Kronauer, Daniel J. C.

    2015-02-01

    DNA barcoding promises to be a useful tool to identify pest species assuming adequate representation of genetic variants in a reference library. Here we examined mitochondrial DNA barcodes in a global urban pest, the American cockroach (Periplaneta americana). Our sampling effort generated 284 cockroach specimens, most from New York City, plus 15 additional U.S. states and six other countries, enabling the first large-scale survey of P. americana barcode variation. Periplaneta americana barcode sequences (n = 247, including 24 GenBank records) formed a monophyletic lineage separate from other Periplaneta species. We found three distinct P. americana haplogroups with relatively small differences within (≤0.6%) and larger differences among groups (2.4%-4.7%). This could be interpreted as indicative of multiple cryptic species. However, nuclear DNA sequences (n = 77 specimens) revealed extensive gene flow among mitochondrial haplogroups, confirming a single species. This unusual genetic pattern likely reflects multiple introductions from genetically divergent source populations, followed by interbreeding in the invasive range. Our findings highlight the need for comprehensive reference databases in DNA barcoding studies, especially when dealing with invasive populations that might be derived from multiple genetically distinct source populations.
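The within-group (≤0.6%) and among-group (2.4%-4.7%) figures come from pairwise sequence comparisons. As an illustration only, the sketch below computes uncorrected p-distances within and among haplogroups for toy 10-bp "barcodes" (hypothetical sequences; the study's actual distance model is not specified here).

```python
from itertools import combinations

def p_distance(a, b):
    """Uncorrected p-distance: fraction of differing aligned sites."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def within_among(groups):
    """Mean pairwise p-distance within and among haplogroups.

    `groups` maps a haplogroup label to its aligned barcode sequences.
    """
    labeled = [(g, s) for g, seqs in groups.items() for s in seqs]
    within, among = [], []
    for (g1, s1), (g2, s2) in combinations(labeled, 2):
        (within if g1 == g2 else among).append(p_distance(s1, s2))
    return sum(within) / len(within), sum(among) / len(among)

# Toy 10-bp "barcodes": little variation inside each haplogroup,
# larger differences between groups (hypothetical sequences).
toy = {"A": ["ACGTACGTAC", "ACGTACGTAT"],
       "B": ["ACGTTCGAAC", "ACGTTCGAAT"]}
within_mean, among_mean = within_among(toy)
```

Real barcode analyses typically use full ~650-bp COI alignments and may apply corrected distances (e.g. Kimura 2-parameter) rather than raw p-distances.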

  15. A genome-wide interactome of DNA-associated proteins in the human liver.

    PubMed

    Ramaker, Ryne C; Savic, Daniel; Hardigan, Andrew A; Newberry, Kimberly; Cooper, Gregory M; Myers, Richard M; Cooper, Sara J

    2017-11-01

    Large-scale efforts like the ENCODE Project have made tremendous progress in cataloging the genomic binding patterns of DNA-associated proteins (DAPs), such as transcription factors (TFs). However, most chromatin immunoprecipitation-sequencing (ChIP-seq) analyses have focused on a few immortalized cell lines whose activities and physiology differ in important ways from endogenous cells and tissues. Consequently, binding data from primary human tissue are essential to improving our understanding of in vivo gene regulation. Here, we identify and analyze more than 440,000 binding sites using ChIP-seq data for 20 DAPs in two human liver tissue samples. We integrated binding data with transcriptome and phased WGS data to investigate allelic DAP interactions and the impact of heterozygous sequence variation on the expression of neighboring genes. Our tissue-based data set exhibits binding patterns more consistent with liver biology than cell lines, and we describe uses of these data to better prioritize impactful noncoding variation. Collectively, our rich data set offers novel insights into genome function in human liver tissue and provides a valuable resource for assessing disease-related disruptions. © 2017 Ramaker et al.; Published by Cold Spring Harbor Laboratory Press.
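The integration of phased genomes with ChIP-seq reads enables tests for allelic imbalance in DAP binding. A standard approach (not necessarily the authors' exact pipeline) is an exact binomial test of reference- versus alternate-allele read counts at heterozygous sites; a stdlib-only sketch with hypothetical counts:

```python
from math import comb

def binom_two_sided(k, n, p=0.5):
    """Exact two-sided binomial test: sum the probabilities of every
    outcome no more likely than the observed count k out of n trials."""
    def pmf(i):
        return comb(n, i) * p ** i * (1 - p) ** (n - i)
    threshold = pmf(k) * (1 + 1e-9)   # tolerance for floating-point ties
    return min(1.0, sum(pmf(i) for i in range(n + 1) if pmf(i) <= threshold))

# Hypothetical heterozygous site: 18 of 20 ChIP-seq reads carry the
# reference allele, versus a balanced 10 of 20 at another site.
p_skewed = binom_two_sided(18, 20)
p_balanced = binom_two_sided(10, 20)
```

In practice, counts would come from allele-aware read counting at phased heterozygous sites, with multiple-testing correction applied across sites.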

  16. MAPU: Max-Planck Unified database of organellar, cellular, tissue and body fluid proteomes

    PubMed Central

    Zhang, Yanling; Zhang, Yong; Adachi, Jun; Olsen, Jesper V.; Shi, Rong; de Souza, Gustavo; Pasini, Erica; Foster, Leonard J.; Macek, Boris; Zougman, Alexandre; Kumar, Chanchal; Wiśniewski, Jacek R.; Jun, Wang; Mann, Matthias

    2007-01-01

    Mass spectrometry (MS)-based proteomics has become a powerful technology to map the protein composition of organelles, cell types and tissues. In our department, a large-scale effort to map these proteomes is complemented by the Max-Planck Unified (MAPU) proteome database. MAPU contains several body fluid proteomes, including plasma, urine, and cerebrospinal fluid. Cell lines have been mapped to a depth of several thousand proteins and the red blood cell proteome has also been analyzed in depth. The liver proteome is represented with 3200 proteins. By employing high resolution MS and stringent validation criteria, false positive identification rates in MAPU are lower than 1:1000. Thus MAPU datasets can serve as reference proteomes in biomarker discovery. MAPU contains the peptides identifying each protein, measured masses, scores and intensities and is freely available online via a clickable interface of cell or body parts. Proteome data can be queried across proteomes by protein name, accession number, sequence similarity, peptide sequence and annotation information. More than 4500 mouse and 2500 human proteins have already been identified in at least one proteome. Basic annotation information and links to other public databases are provided in MAPU and we plan to add further analysis tools. PMID:17090601

  17. DNA barcodes for 1/1000 of the animal kingdom.

    PubMed

    Hebert, Paul D N; Dewaard, Jeremy R; Landry, Jean-François

    2010-06-23

    This study reports DNA barcodes for more than 1300 Lepidoptera species from the eastern half of North America, establishing that 99.3 per cent of these species possess diagnostic barcode sequences. Intraspecific divergences averaged just 0.43 per cent among this assemblage, but most values were lower. The mean was elevated by deep barcode divergences (greater than 2%) in 5.1 per cent of the species, often involving the sympatric occurrence of two barcode clusters. A few of these cases have been analysed in detail, revealing species overlooked by the current taxonomic system. This study also provided a large-scale test of the extent of regional divergence in barcode sequences, indicating that geographical differentiation in the Lepidoptera of eastern North America is small, even when comparisons involve populations as much as 2800 km apart. The present results affirm that a highly effective system for the identification of Lepidoptera in this region can be built with few records per species because of the limited intra-specific variation. As most terrestrial and marine taxa are likely to possess a similar pattern of population structure, an effective DNA-based identification system can be developed with modest effort.

  18. Major soybean maturity gene haplotypes revealed by SNPViz analysis of 72 sequenced soybean genomes

    USDA-ARS?s Scientific Manuscript database

    In this Genomics Era, vast amounts of next generation sequencing data have become publicly-available for multiple genomes across hundreds of species. Analysis of these large-scale datasets can become cumbersome, especially when comparing nucleotide polymorphisms across many samples within a dataset...

  19. Large-scale gene function analysis with the PANTHER classification system.

    PubMed

    Mi, Huaiyu; Muruganujan, Anushya; Casagrande, John T; Thomas, Paul D

    2013-08-01

    The PANTHER (protein annotation through evolutionary relationship) classification system (http://www.pantherdb.org/) is a comprehensive system that combines gene function, ontology, pathways and statistical analysis tools that enable biologists to analyze large-scale, genome-wide data from sequencing, proteomics or gene expression experiments. The system is built with 82 complete genomes organized into gene families and subfamilies, and their evolutionary relationships are captured in phylogenetic trees, multiple sequence alignments and statistical models (hidden Markov models or HMMs). Genes are classified according to their function in several different ways: families and subfamilies are annotated with ontology terms (Gene Ontology (GO) and PANTHER protein class), and sequences are assigned to PANTHER pathways. The PANTHER website includes a suite of tools that enable users to browse and query gene functions, and to analyze large-scale experimental data with a number of statistical tests. It is widely used by bench scientists, bioinformaticians, computer scientists and systems biologists. In the 2013 release of PANTHER (v.8.0), in addition to an update of the data content, we redesigned the website interface to improve both user experience and the system's analytical capability. This protocol provides a detailed description of how to analyze genome-wide experimental data with the PANTHER classification system.
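Among PANTHER's statistical tests is overrepresentation analysis of a user gene list against a reference set. The core statistic behind such tests is typically a hypergeometric (Fisher's exact) tail probability; the sketch below illustrates that calculation with hypothetical counts, not PANTHER's actual implementation.

```python
from math import comb

def enrichment_p(k, n, K, N):
    """Upper-tail hypergeometric probability P(X >= k): the chance of
    seeing at least k genes from a category of size K in a random list
    of n genes drawn from a genome of N genes."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1)) / comb(N, n)

# Hypothetical example: 5 of 20 uploaded genes fall in a pathway that
# covers 50 of 1000 genes genome-wide (expected count: 1), so the list
# is strongly enriched for that pathway.
p_enriched = enrichment_p(5, 20, 50, 1000)
```

A genome-wide analysis would repeat this per category and correct for multiple testing (e.g. Bonferroni or FDR), as PANTHER's tools do.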

  20. Book Review: Large-Scale Ecosystem Restoration: Five Case Studies from the United States

    EPA Science Inventory

    Broad-scale ecosystem restoration efforts involve a very complex set of ecological and societal components, and the success of any ecosystem restoration project rests on an integrated approach to implementation. Editors Mary Doyle and Cynthia Drew have successfully synthesized ma...

  1. Spatially resolved Spectroscopy of Europa’s Large-scale Compositional Units at 3-4 μm with Keck NIRSPEC

    NASA Astrophysics Data System (ADS)

    Fischer, P. D.; Brown, M. E.; Trumbo, S. K.; Hand, K. P.

    2017-01-01

    We present spatially resolved spectroscopic observations of Europa’s surface at 3-4 μm obtained with the near-infrared spectrograph and adaptive optics system on the Keck II telescope. These are the highest quality spatially resolved reflectance spectra of Europa’s surface at 3-4 μm. The observations spatially resolve Europa’s large-scale compositional units at a resolution of several hundred kilometers. The spectra show distinct features and geographic variations associated with known compositional units; in particular, large-scale leading hemisphere chaos shows a characteristic longward shift in peak reflectance near 3.7 μm compared to icy regions. These observations complement previous spectra of large-scale chaos, and can aid efforts to identify the endogenous non-ice species.

  2. A leap forward in geographic scale for forest ectomycorrhizal fungi

    Treesearch

    Filipa Cox; Nadia Barsoum; Martin I. Bidartondo; Isabella Børja; Erik Lilleskov; Lars O. Nilsson; Pasi Rautio; Kath Tubby; Lars Vesterdal

    2010-01-01

    In this letter we propose a first large-scale assessment of mycorrhizas with a European-wide network of intensively monitored forest plots as a research platform. This effort would create a qualitative and quantitative shift in mycorrhizal research by delivering the first continental-scale map of mycorrhizal fungi. Readers may note that several excellent detailed...

  3. PTGBase: an integrated database to study tandem duplicated genes in plants.

    PubMed

    Yu, Jingyin; Ke, Tao; Tehrim, Sadia; Sun, Fengming; Liao, Boshou; Hua, Wei

    2015-01-01

    Tandem duplication is a widespread phenomenon in plant genomes and plays significant roles in evolution and adaptation to changing environments. Tandem duplicated genes related to certain functions will lead to the expansion of gene families and increase gene dosage in the form of gene cluster arrays. Many tandem duplication events have been studied in plant genomes; yet, there has been surprisingly little effort to systematically integrate the large amount of publicly deposited tandem duplicated gene data across the plant kingdom. To address this shortcoming, we developed the first plant tandem duplicated genes database, PTGBase. It delivers the most comprehensive resource available to date, spanning 39 plant genomes, including model species and newly sequenced species alike. Across these genomes, 54 130 tandem duplicated gene clusters (129 652 genes) are presented in the database. Each tandem array, as well as its member genes, is characterized in complete detail. Tandem duplicated genes in PTGBase can be explored through browsing or searching by identifiers or keywords of functional annotation and sequence similarity. Users can download tandem duplicated gene arrays easily to any scale, up to the complete annotation data set for an entire plant genome. PTGBase will be updated regularly with newly sequenced plant species as they become available. © The Author(s) 2015. Published by Oxford University Press.
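A tandem duplicated gene array is a run of same-family genes adjacent on a chromosome, possibly separated by a few spacer genes. The sketch below groups genes into arrays under that definition; the positions, family labels, and one-spacer threshold are illustrative, not PTGBase's exact criteria.

```python
def tandem_arrays(genes, max_spacers=1):
    """Group genes into tandem duplicated arrays.

    `genes` holds (chromosome, position_index, family_id) tuples; genes
    of one family on one chromosome separated by at most `max_spacers`
    unrelated genes join the same array. Only arrays with two or more
    members are returned.
    """
    arrays = []
    open_arrays = {}  # (chromosome, family) -> index into `arrays`
    for chrom, pos, fam in sorted(genes):
        key = (chrom, fam)
        if key in open_arrays:
            idx = open_arrays[key]
            if pos - arrays[idx][-1][1] <= max_spacers + 1:
                arrays[idx].append((chrom, pos, fam))
                continue
        arrays.append([(chrom, pos, fam)])
        open_arrays[key] = len(arrays) - 1
    return [a for a in arrays if len(a) > 1]

# Toy gene order on one chromosome: family A at positions 1, 2 and 4
# (one spacer gene of family B at 3) forms a three-member tandem array;
# the family-A gene at position 10 is too far away to join it.
toy = [("chr1", 1, "A"), ("chr1", 2, "A"), ("chr1", 3, "B"),
       ("chr1", 4, "A"), ("chr1", 10, "A")]
arrays = tandem_arrays(toy)
```

Real pipelines first assign family membership by sequence similarity (e.g. BLAST clustering) before scanning gene order.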

  4. Bacterial natural product biosynthetic domain composition in soil correlates with changes in latitude on a continent-wide scale.

    PubMed

    Lemetre, Christophe; Maniko, Jeffrey; Charlop-Powers, Zachary; Sparrow, Ben; Lowe, Andrew J; Brady, Sean F

    2017-10-31

    Although bacterial bioactive metabolites have been one of the most prolific sources of lead structures for the development of small-molecule therapeutics, very little is known about the environmental factors associated with changes in secondary metabolism across natural environments. Large-scale sequencing of environmental microbiomes has the potential to shed light on the richness of bacterial biosynthetic diversity hidden in the environment, how it varies from one environment to the next, and what environmental factors correlate with changes in biosynthetic diversity. In this study, the sequencing of PCR amplicons generated using primers targeting either ketosynthase domains from polyketide biosynthesis or adenylation domains from nonribosomal peptide biosynthesis was used to assess biosynthetic domain composition and richness in soils collected across the Australian continent. Using environmental variables collected at each soil site, we looked for environmental factors that correlated with either high overall domain richness or changes in the domain composition. Among the environmental variables we measured, changes in biosynthetic domain composition correlate most closely with changes in latitude and to a lesser extent changes in pH. Although the exact mix of factors driving the relationship between biosynthetic domain composition and latitude remains unclear, from a practical perspective the identification of a latitudinal basis for differences in soil metagenome biosynthetic domain compositions should help guide future natural product discovery efforts. Published under the PNAS license.
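Correlating composition with an environmental gradient can be sketched in two steps: build pairwise community dissimilarities between sites (here Bray-Curtis on domain-abundance profiles) and compare them with pairwise latitude differences. The site profiles below are hypothetical, and a real analysis would assess significance with a permutation-based Mantel test rather than a raw Pearson r.

```python
from itertools import combinations

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (sum(a) + sum(b))

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical sites: (latitude, ketosynthase-domain abundance profile).
sites = [(-12.0, [30, 5, 1]), (-25.0, [18, 10, 4]), (-38.0, [6, 15, 9])]
lat_gaps, comp_gaps = [], []
for (lat_a, prof_a), (lat_b, prof_b) in combinations(sites, 2):
    lat_gaps.append(abs(lat_a - lat_b))
    comp_gaps.append(bray_curtis(prof_a, prof_b))
r = pearson(lat_gaps, comp_gaps)  # composition diverges with latitude
```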

  5. An Open, Large-Scale, Collaborative Effort to Estimate the Reproducibility of Psychological Science.

    PubMed

    2012-11-01

    Reproducibility is a defining feature of science. However, because of strong incentives for innovation and weak incentives for confirmation, direct replication is rarely practiced or published. The Reproducibility Project is an open, large-scale, collaborative effort to systematically examine the rate and predictors of reproducibility in psychological science. So far, 72 volunteer researchers from 41 institutions have organized to openly and transparently replicate studies published in three prominent psychological journals in 2008. Multiple methods will be used to evaluate the findings, calculate an empirical rate of replication, and investigate factors that predict reproducibility. Whatever the result, a better understanding of reproducibility will ultimately improve confidence in scientific methodology and findings. © The Author(s) 2012.

  6. Understanding middle managers' influence in implementing patient safety culture.

    PubMed

    Gutberg, Jennifer; Berta, Whitney

    2017-08-22

    The past fifteen years have been marked by large-scale change efforts undertaken by healthcare organizations to improve patient safety and patient-centered care. Despite substantial investment of effort and resources, many of these large-scale or "radical change" initiatives, like those in other industries, have enjoyed limited success - with practice and behavioural changes neither fully adopted nor ultimately sustained - which has in large part been ascribed to inadequate implementation efforts. Culture change to "patient safety culture" (PSC) is among these radical change initiatives, where results to date have been mixed at best. This paper responds to calls for research that focus on explicating factors that affect efforts to implement radical change in healthcare contexts, and focuses on PSC as the radical change implementation. Specifically, this paper offers a novel conceptual model based on Organizational Learning Theory to explain the ability of middle managers in healthcare organizations to influence patient safety culture change. We propose that middle managers can capitalize on their unique position between upper and lower levels in the organization and engage in 'ambidextrous' learning that is critical to implementing and sustaining radical change. This organizational learning perspective offers an innovative way of framing the mid-level managers' role, through both explorative and exploitative activities, which further considers the necessary organizational context in which they operate.

  7. Large-Scale Comparative Phenotypic and Genomic Analyses Reveal Ecological Preferences of Shewanella Species and Identify Metabolic Pathways Conserved at the Genus Level

    PubMed Central

    Rodrigues, Jorge L. M.; Serres, Margrethe H.; Tiedje, James M.

    2011-01-01

    The use of comparative genomics for the study of different microbiological species has increased substantially as sequence technologies become more affordable. However, efforts to fully link a genotype to its phenotype remain limited to the development of one mutant at a time. In this study, we provided a high-throughput alternative to this limiting step by coupling comparative genomics to the use of phenotype arrays for five sequenced Shewanella strains. Positive phenotypes were obtained for 441 nutrients (C, N, P, and S sources), with N-based compounds being the most utilized for all strains. Many genes and pathways predicted by genome analyses were confirmed with the comparative phenotype assay, and three degradation pathways believed to be missing in Shewanella were confirmed as missing. A number of previously unknown gene products were predicted to be parts of pathways or to have a function, expanding the number of gene targets for future genetic analyses. Ecologically, the comparative high-throughput phenotype analysis provided insights into niche specialization among the five different strains. For example, Shewanella amazonensis strain SB2B, isolated from the Amazon River delta, was capable of utilizing 60 C compounds, whereas Shewanella sp. strain W3-18-1, isolated from deep marine sediment, utilized only 25 of them. In spite of the large number of nutrient sources yielding positive results, our study indicated that except for the N sources, they were not sufficiently informative to predict growth phenotypes from increasing evolutionary distances. Our results indicate the importance of phenotypic evaluation for confirming genome predictions. This strategy will accelerate the functional discovery of genes and provide an ecological framework for microbial genome sequencing projects. PMID:21642407

  8. Swept-Wing Ice Accretion Characterization and Aerodynamics

    NASA Technical Reports Server (NTRS)

    Broeren, Andy P.; Potapczuk, Mark G.; Riley, James T.; Villedieu, Philippe; Moens, Frederic; Bragg, Michael B.

    2013-01-01

    NASA, FAA, ONERA, the University of Illinois and Boeing have embarked on a significant, collaborative research effort to address the technical challenges associated with icing on large-scale, three-dimensional swept wings. The overall goal is to improve the fidelity of experimental and computational simulation methods for swept-wing ice accretion formation and resulting aerodynamic effect. A seven-phase research effort has been designed that incorporates ice-accretion and aerodynamic experiments and computational simulations. As the baseline, full-scale, swept-wing-reference geometry, this research will utilize the 65% scale Common Research Model configuration. Ice-accretion testing will be conducted in the NASA Icing Research Tunnel for three hybrid swept-wing models representing the 20%, 64% and 83% semispan stations of the baseline-reference wing. Three-dimensional measurement techniques are being developed and validated to document the experimental ice-accretion geometries. Artificial ice shapes of varying geometric fidelity will be developed for aerodynamic testing over a large Reynolds number range in the ONERA F1 pressurized wind tunnel and in a smaller-scale atmospheric wind tunnel. Concurrent research will be conducted to explore and further develop the use of computational simulation tools for ice accretion and aerodynamics on swept wings. The combined results of this research effort will result in an improved understanding of the ice formation and aerodynamic effects on swept wings. The purpose of this paper is to describe this research effort in more detail and report on the current results and status to date.

  10. Final Report Scalable Analysis Methods and In Situ Infrastructure for Extreme Scale Knowledge Discovery

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    O'Leary, Patrick

    The primary challenge motivating this project is the widening gap between the ability to compute information and to store it for subsequent analysis. This gap adversely impacts science code teams, who can perform analysis only on a small fraction of the data they calculate, resulting in the substantial likelihood of lost or missed science, when results are computed but not analyzed. Our approach is to perform as much analysis or visualization processing on data while it is still resident in memory, which is known as in situ processing. The idea of in situ processing was not new at the time of the start of this effort in 2014, but efforts in that space were largely ad hoc, and there was no concerted effort within the research community that aimed to foster production-quality software tools suitable for use by Department of Energy (DOE) science projects. Our objective was to produce and enable the use of production-quality in situ methods and infrastructure, at scale, on DOE high-performance computing (HPC) facilities, though we expected to have an impact beyond DOE due to the widespread nature of the challenges, which affect virtually all large-scale computational science efforts. To achieve this objective, we engaged in software technology research and development (R&D), in close partnerships with DOE science code teams, to produce software technologies that were shown to run efficiently at scale on DOE HPC platforms.
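The core idea of in situ processing is to run analysis callbacks on each timestep's data while it is still resident in memory, writing out only reduced results instead of full fields. A minimal sketch with a hypothetical hook API (not any particular DOE tool's interface):

```python
def run_simulation(steps, in_situ_hooks):
    """Minimal sketch of in situ analysis: hooks run on each timestep's
    data while it is still in memory, so only reduced results (not the
    full field) ever need to be written out. The hook API is hypothetical."""
    summaries = []
    for step in range(steps):
        field = [(step + i) % 7 for i in range(100)]  # stand-in solver output
        summaries.append((step, {name: hook(field)
                                 for name, hook in in_situ_hooks.items()}))
    return summaries

# Each hook reduces a full field to a scalar kept for later analysis.
hooks = {"mean": lambda f: sum(f) / len(f), "max": max}
summaries = run_simulation(3, hooks)
```

Production in situ frameworks (e.g. ParaView Catalyst, VisIt Libsim, SENSEI) follow the same callback pattern at scale.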

  11. Sequence Determines Degree of Knottedness in a Coarse-Grained Protein Model

    NASA Astrophysics Data System (ADS)

    Wüst, Thomas; Reith, Daniel; Virnau, Peter

    2015-01-01

    Knots are abundant in globular homopolymers but rare in globular proteins. To shed new light on this long-standing conundrum, we study the influence of sequence on the formation of knots in proteins under native conditions within the framework of the hydrophobic-polar lattice protein model. By employing large-scale Wang-Landau simulations combined with suitable Monte Carlo trial moves we show that even though knots are still abundant on average, sequence introduces large variability in the degree of self-entanglements. Moreover, we are able to design sequences which are either almost always or almost never knotted. Our findings serve as proof of concept that the introduction of just one additional degree of freedom per monomer (in our case sequence) facilitates evolution towards a protein universe in which knots are rare.
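Wang-Landau sampling estimates the density of states g(E) with a random walk that accepts moves with probability min(1, g(E_old)/g(E_new)) and multiplies g at each visited energy by a modification factor that shrinks as the visit histogram flattens. The sketch below applies the method to a deliberately tiny toy system (two dice, E = d1 + d2, with known g) rather than the paper's HP lattice proteins:

```python
import math
import random

def wang_landau_dice(seed=1, ln_f_final=1e-3, flatness=0.8):
    """Wang-Landau estimate of the density of states g(E) for a toy
    system: two dice with E = d1 + d2 (true g(E): 1,2,3,4,5,6,5,...,1).

    ln g is updated at every visited energy; the modification factor
    ln_f is halved whenever the visit histogram is "flat" (every bin at
    least `flatness` times the mean), then the histogram is reset.
    """
    rng = random.Random(seed)
    energies = list(range(2, 13))
    ln_g = {e: 0.0 for e in energies}
    hist = {e: 0 for e in energies}
    dice = [rng.randint(1, 6), rng.randint(1, 6)]
    ln_f = 1.0
    for _ in range(300):                     # safety cap on flatness checks
        if ln_f <= ln_f_final:
            break
        for _ in range(2000):                # Monte Carlo moves per check
            i = rng.randrange(2)
            old_val, e_old = dice[i], sum(dice)
            dice[i] = rng.randint(1, 6)      # propose re-rolling one die
            e_new = sum(dice)
            # accept with probability min(1, g(E_old) / g(E_new))
            if rng.random() >= math.exp(min(0.0, ln_g[e_old] - ln_g[e_new])):
                dice[i] = old_val            # reject: revert the move
            e = sum(dice)
            ln_g[e] += ln_f
            hist[e] += 1
        mean_h = sum(hist.values()) / len(hist)
        if min(hist.values()) >= flatness * mean_h:
            ln_f /= 2.0
            hist = {e: 0 for e in energies}
    return ln_g

ln_g = wang_landau_dice()
# g(7)/g(2) should approach 6, i.e. a ln-difference near ln 6 ~ 1.79.
```

The paper combines this flat-histogram scheme with lattice-protein trial moves; the dice system is only a minimal stand-in with an exactly known answer.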

  12. Techniques for Large-Scale Bacterial Genome Manipulation and Characterization of the Mutants with Respect to In Silico Metabolic Reconstructions.

    PubMed

    diCenzo, George C; Finan, Turlough M

    2018-01-01

    The rate at which all genes within a bacterial genome can be identified far exceeds the ability to characterize these genes. To assist in associating genes with cellular functions, a large-scale bacterial genome deletion approach can be employed to rapidly screen tens to thousands of genes for desired phenotypes. Here, we provide a detailed protocol for the generation of deletions of large segments of bacterial genomes that relies on the activity of a site-specific recombinase. In this procedure, two recombinase recognition target sequences are introduced into known positions of a bacterial genome through single cross-over plasmid integration. Subsequent expression of the site-specific recombinase mediates recombination between the two target sequences, resulting in the excision of the intervening region and its loss from the genome. We further illustrate how this deletion system can be readily adapted to function as a large-scale in vivo cloning procedure, in which the region excised from the genome is captured as a replicative plasmid. We next provide a procedure for the metabolic analysis of bacterial large-scale genome deletion mutants using the Biolog Phenotype MicroArray™ system. Finally, a pipeline is described, and a sample Matlab script is provided, for the integration of the obtained data with a draft metabolic reconstruction for the refinement of the reactions and gene-protein-reaction relationships in a metabolic reconstruction.
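The excision step can be pictured as string surgery: recombination between two directly repeated target sites removes the intervening segment, which can also be captured as a circle (the basis of the in vivo cloning variant described above). A toy sketch with a hypothetical recognition sequence:

```python
def excise_between_targets(genome, target):
    """Simulate site-specific recombinase excision between two directly
    repeated recognition target sites (Cre/lox- or FLP/FRT-style).

    Recombination between the first two occurrences of `target` removes
    the intervening segment; one target copy stays in the chromosome and
    one ends up on the excised circle.
    """
    i = genome.find(target)
    j = genome.find(target, i + len(target))
    if i < 0 or j < 0:
        raise ValueError("genome must carry two recognition target sites")
    deleted = genome[:i] + genome[j:]             # chromosome after excision
    circle = target + genome[i + len(target):j]   # excised segment as a circle
    return deleted, circle

# Hypothetical mini-genome: "LOX" marks the two integrated target sites.
deleted, circle = excise_between_targets("AAALOXGGGGLOXTTT", "LOX")
```

Every base is conserved: the chromosome keeps one target copy, and the excised circle carries the other plus the deleted region, mirroring how the excised segment can be recovered as a replicative plasmid.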

  13. Diversity analysis in Cannabis sativa based on large-scale development of expressed sequence tag-derived simple sequence repeat markers.

    PubMed

    Gao, Chunsheng; Xin, Pengfei; Cheng, Chaohua; Tang, Qing; Chen, Ping; Wang, Changbiao; Zang, Gonggu; Zhao, Lining

    2014-01-01

    Cannabis sativa L. is an important economic plant for the production of food, fiber, oils, and intoxicants. However, lack of sufficient simple sequence repeat (SSR) markers has limited the development of cannabis genetic research. Here, large-scale development of expressed sequence tag simple sequence repeat (EST-SSR) markers was performed to obtain more informative genetic markers, and to assess genetic diversity in cannabis (Cannabis sativa L.). Based on the cannabis transcriptome, 4,577 SSRs were identified from 3,624 ESTs. From there, a total of 3,442 complementary primer pairs were designed as SSR markers. Among these markers, trinucleotide repeat motifs (50.99%) were the most abundant, followed by hexanucleotide (25.13%), dinucleotide (16.34%), tetranucleotide (3.8%), and pentanucleotide (3.74%) repeat motifs, respectively. The AAG/CTT trinucleotide repeat (17.96%) was the most abundant motif detected in the SSRs. One hundred and seventeen EST-SSR markers were randomly selected to evaluate primer quality in 24 cannabis varieties. Among these 117 markers, 108 (92.31%) were successfully amplified and 87 (74.36%) were polymorphic. Forty-five polymorphic primer pairs were selected to evaluate genetic diversity and relatedness among the 115 cannabis genotypes. The results showed that 115 varieties could be divided into 4 groups primarily based on geography: Northern China, Europe, Central China, and Southern China. Moreover, the coefficient of similarity when comparing cannabis from Northern China with the European group cannabis was higher than that when comparing with cannabis from the other two groups, owing to a similar climate. This study outlines the first large-scale development of SSR markers for cannabis. These data may serve as a foundation for the development of genetic linkage, quantitative trait loci mapping, and marker-assisted breeding of cannabis.

  14. Diversity Analysis in Cannabis sativa Based on Large-Scale Development of Expressed Sequence Tag-Derived Simple Sequence Repeat Markers

    PubMed Central

    Cheng, Chaohua; Tang, Qing; Chen, Ping; Wang, Changbiao; Zang, Gonggu; Zhao, Lining

    2014-01-01

    Cannabis sativa L. is an important economic plant for the production of food, fiber, oils, and intoxicants. However, lack of sufficient simple sequence repeat (SSR) markers has limited the development of cannabis genetic research. Here, large-scale development of expressed sequence tag simple sequence repeat (EST-SSR) markers was performed to obtain more informative genetic markers, and to assess genetic diversity in cannabis (Cannabis sativa L.). Based on the cannabis transcriptome, 4,577 SSRs were identified from 3,624 ESTs. From there, a total of 3,442 complementary primer pairs were designed as SSR markers. Among these markers, trinucleotide repeat motifs (50.99%) were the most abundant, followed by hexanucleotide (25.13%), dinucleotide (16.34%), tetranucleotide (3.8%), and pentanucleotide (3.74%) repeat motifs, respectively. The AAG/CTT trinucleotide repeat (17.96%) was the most abundant motif detected in the SSRs. One hundred and seventeen EST-SSR markers were randomly selected to evaluate primer quality in 24 cannabis varieties. Among these 117 markers, 108 (92.31%) were successfully amplified and 87 (74.36%) were polymorphic. Forty-five polymorphic primer pairs were selected to evaluate genetic diversity and relatedness among the 115 cannabis genotypes. The results showed that 115 varieties could be divided into 4 groups primarily based on geography: Northern China, Europe, Central China, and Southern China. Moreover, the coefficient of similarity when comparing cannabis from Northern China with the European group cannabis was higher than that when comparing with cannabis from the other two groups, owing to a similar climate. This study outlines the first large-scale development of SSR markers for cannabis. These data may serve as a foundation for the development of genetic linkage, quantitative trait loci mapping, and marker-assisted breeding of cannabis. PMID:25329551

  15. Fabrication of the HIAD Large-Scale Demonstration Assembly and Upcoming Mission Applications

    NASA Technical Reports Server (NTRS)

    Swanson, G. T.; Johnson, R. K.; Hughes, S. J.; DiNonno, J. M.; Cheatwood, F. M.

    2017-01-01

    Over a decade of work has been conducted in the development of NASA's Hypersonic Inflatable Aerodynamic Decelerator (HIAD) technology. This effort has included multiple ground test campaigns and flight tests culminating in the HIAD project's second-generation (Gen-2) deployable aeroshell system and associated analytical tools. NASA's HIAD project team has developed, fabricated, and tested inflatable structures (IS) integrated with a flexible thermal protection system (F-TPS), ranging in diameter from 3-6m, with cone angles of 60 and 70 deg. In 2015, United Launch Alliance (ULA) announced that it will use a HIAD (10-12m) as part of its Sensible, Modular, Autonomous Return Technology (SMART) for the upcoming Vulcan rocket. ULA expects that SMART reusability, coupled with other advancements for Vulcan, will substantially reduce the cost of access to space. The first booster engine recovery via HIAD is scheduled for 2024. To meet this near-term need, as well as future NASA applications, the HIAD team is investigating taking the technology to the 10-15m diameter scale. In the last year, many significant development and fabrication efforts have been accomplished, culminating in the construction of a large-scale inflatable structure demonstration assembly. This assembly incorporated the first three tori for a 12m Mars Human-Scale Pathfinder HIAD conceptual design and was constructed with the current state-of-the-art material set. Numerous design trades and torus fabrication demonstrations preceded this effort. In 2016, three large-scale tori (0.61m cross-section) and six subscale tori (0.25m cross-section) were manufactured to demonstrate fabrication techniques using the newest candidate material sets. These tori were tested to evaluate durability and load capacity. This work led to the selection of the inflatable structure's third-generation (Gen-3) structural liner. 
In late 2016, the three tori required for the large-scale demonstration assembly were fabricated and then integrated in early 2017. The design includes provisions to add the remaining four tori necessary to complete the assembly of the 12m Human-Scale Pathfinder HIAD in the event that future project funding becomes available. This presentation will discuss the HIAD large-scale demonstration assembly design and fabrication performed in the last year, including the precursor tori development and the partial-stack fabrication. Potential near-term and future 10-15m HIAD applications will also be discussed.

  16. Fabrication of the HIAD Large-Scale Demonstration Assembly

    NASA Technical Reports Server (NTRS)

    Swanson, G. T.; Johnson, R. K.; Hughes, S. J.; DiNonno, J. M.; Cheatwood, F. M.

    2017-01-01

    Over a decade of work has been conducted in the development of NASA's Hypersonic Inflatable Aerodynamic Decelerator (HIAD) technology. This effort has included multiple ground test campaigns and flight tests culminating in the HIAD project's second-generation (Gen-2) deployable aeroshell system and associated analytical tools. NASA's HIAD project team has developed, fabricated, and tested inflatable structures (IS) integrated with a flexible thermal protection system (F-TPS), ranging in diameter from 3-6m, with cone angles of 60 and 70 deg. In 2015, United Launch Alliance (ULA) announced that it will use a HIAD (10-12m) as part of its Sensible, Modular, Autonomous Return Technology (SMART) for the upcoming Vulcan rocket. ULA expects that SMART reusability, coupled with other advancements for Vulcan, will substantially reduce the cost of access to space. The first booster engine recovery via HIAD is scheduled for 2024. To meet this near-term need, as well as future NASA applications, the HIAD team is investigating taking the technology to the 10-15m diameter scale. In the last year, many significant development and fabrication efforts have been accomplished, culminating in the construction of a large-scale inflatable structure demonstration assembly. This assembly incorporated the first three tori for a 12m Mars Human-Scale Pathfinder HIAD conceptual design and was constructed with the current state-of-the-art material set. Numerous design trades and torus fabrication demonstrations preceded this effort. In 2016, three large-scale tori (0.61m cross-section) and six subscale tori (0.25m cross-section) were manufactured to demonstrate fabrication techniques using the newest candidate material sets. These tori were tested to evaluate durability and load capacity. This work led to the selection of the inflatable structure's third-generation (Gen-3) structural liner. 
In late 2016, the three tori required for the large-scale demonstration assembly were fabricated and then integrated in early 2017. The design includes provisions to add the remaining four tori necessary to complete the assembly of the 12m Human-Scale Pathfinder HIAD in the event that future project funding becomes available. This presentation will discuss the HIAD large-scale demonstration assembly design and fabrication performed in the last year, including the precursor tori development and the partial-stack fabrication. Potential near-term and future 10-15m HIAD applications will also be discussed.

  17. Search for 5'-leader regulatory RNA structures based on gene annotation aided by the RiboGap database.

    PubMed

    Naghdi, Mohammad Reza; Smail, Katia; Wang, Joy X; Wade, Fallou; Breaker, Ronald R; Perreault, Jonathan

    2017-03-15

    The discovery of noncoding RNAs (ncRNAs) and their importance for gene regulation led us to develop bioinformatics tools to pursue the discovery of novel ncRNAs. Finding ncRNAs de novo is challenging, first because of the difficulty of retrieving large numbers of sequences for given gene activities, and second because of the heavy computational demands of large-scale comparative genomics. Recently, several tools for the prediction of conserved RNA secondary structure were developed, but many of them are not designed to uncover new ncRNAs, or are too slow for conducting analyses on a large scale. Here we present various approaches using the database RiboGap as a primary tool for finding known ncRNAs and for uncovering simple sequence motifs with regulatory roles. This database can also be used to easily extract intergenic sequences of eubacteria and archaea to find conserved RNA structures upstream of given genes. We also show how to extend the analysis further to choose the best candidate ncRNAs for experimental validation.

  18. RNA sequencing demonstrates large-scale temporal dysregulation of gene expression in stimulated macrophages derived from MHC-defined chicken haplotypes.

    PubMed

    Irizarry, Kristopher J L; Downs, Eileen; Bryden, Randall; Clark, Jory; Griggs, Lisa; Kopulos, Renee; Boettger, Cynthia M; Carr, Thomas J; Keeler, Calvin L; Collisson, Ellen; Drechsler, Yvonne

    2017-01-01

    Discovering genetic biomarkers associated with disease resistance and enhanced immunity is critical to developing advanced strategies for controlling viral and bacterial infections in different species. Macrophages, important cells of innate immunity, are directly involved in cellular interactions with pathogens, the release of cytokines activating other immune cells, and antigen presentation to cells of the adaptive immune response. IFNγ is a potent activator of macrophages, and increased production has been associated with disease resistance in several species. This study characterizes the molecular basis for the dramatically different nitric oxide production and immune function between B2 and B19 haplotype chicken macrophages. A large-scale RNA sequencing approach was employed to sequence the RNA of purified macrophages from each haplotype group (B2 vs. B19) during differentiation and after stimulation. Our results demonstrate that a large number of genes exhibit divergent expression between B2 and B19 haplotype cells both prior to and after stimulation. These differences in gene expression appear to be regulated by complex epigenetic mechanisms that need further investigation.

  19. Next-Generation Sequencing: The Translational Medicine Approach from “Bench to Bedside to Population”

    PubMed Central

    Beigh, Mohammad Muzafar

    2016-01-01

    Humans have long suspected a relationship between heredity and disease. Only at the beginning of the last century did scientists begin to discover the connections between different genes and disease phenotypes. Recent advances in next-generation sequencing (NGS) technologies have brought great momentum to biomedical research, which in turn has remarkably augmented our basic understanding of human biology and its associated diseases. State-of-the-art next-generation biotechnologies have started making huge strides in our current understanding of the mechanisms of various chronic illnesses such as cancers, metabolic disorders, and neurodegenerative anomalies. We are experiencing a renaissance in biomedical research primarily driven by next-generation biotechnologies such as genomics, transcriptomics, proteomics, metabolomics, and lipidomics. Although genomic discoveries are at the forefront of next-generation omics technologies, their implementation in the clinical arena has been painstakingly slow, mainly because of high reaction costs and the unavailability of the computational tools required for large-scale data analysis. However, rapid innovations and the steadily decreasing cost of sequence-based chemistries, along with the development of advanced bioinformatics tools, have lately prompted the launch and implementation of large-scale massively parallel genome sequencing programs in fields ranging from medical genetics to infectious disease biology and the agricultural sciences. Recent advances in large-scale omics technologies are taking healthcare research beyond the traditional “bench to bedside” approach toward a continuum that will include improvements in public healthcare and will be based primarily on a predictive, preventive, personalized, and participatory (P4) medicine approach. 
Recent large-scale research projects in genetic and infectious disease biology have indicated that massively parallel whole-genome/whole-exome sequencing, transcriptome analysis, and other functional genomic tools can reveal a large number of unique functional elements and/or markers that would otherwise go undetected by traditional sequencing methodologies. The latest trends in biomedical research are therefore giving rise to a new branch of medicine commonly referred to as personalized and/or precision medicine. Developments in the post-genomic era are expected to completely restructure the present clinical pattern of disease prevention and treatment, as well as methods of diagnosis and prognosis. The next important step for the precision/personalized medicine approach should be its early adoption in clinics for future medical interventions. Consequently, in coming years next-generation biotechnologies will reorient medical practice toward disease prediction and prevention rather than curing diseases at later stages of their development and progression, even at the wider population level for the general public healthcare system. PMID:28930123

  20. Environmental DNA sequencing primers for eutardigrades and bdelloid rotifers

    PubMed Central

    2009-01-01

    Background The time it takes to isolate individuals from environmental samples and then extract DNA from each individual is one of the problems with generating molecular data from meiofauna such as eutardigrades and bdelloid rotifers. The lack of consistent morphological information and the extreme abundance of these classes make morphological identification of rare, or even common, cryptic taxa a large and unwieldy task. This limits the ability to perform large-scale surveys of the diversity of these organisms. Here we demonstrate a culture-independent molecular survey approach that enables the generation of large amounts of eutardigrade and bdelloid rotifer sequence data directly from soil. Our PCR primers, specific to the 18S small-subunit rRNA gene, were developed for both eutardigrades and bdelloid rotifers. Results The developed primers successfully amplified DNA of their target organisms from various soil DNA extracts. This was confirmed by both BLAST similarity searches and phylogenetic analyses. Tardigrades showed much better phylogenetic resolution than bdelloids. Both groups of organisms exhibited varying levels of endemism. Conclusion The development of clade-specific primers for characterizing eutardigrades and bdelloid rotifers from environmental samples should greatly increase our ability to characterize the composition of these taxa in environmental samples. Environmental sequencing as shown here differs from other molecular survey methods in that there is no need to pre-isolate the organisms of interest from soil in order to amplify their DNA. The DNA sequences obtained from methods that do not require culturing can be identified post hoc and placed phylogenetically as additional closely related sequences are obtained from morphologically identified conspecifics. 
Our non-cultured environmental sequence based approach will be able to provide a rapid and large-scale screening of the presence, absence and diversity of Bdelloidea and Eutardigrada in a variety of soils. PMID:20003362

  1. Test and Analysis Correlation of a Large-Scale, Orthogrid-Stiffened Metallic Cylinder without Weld Lands

    NASA Technical Reports Server (NTRS)

    Rudd, Michelle T.; Hilburger, Mark W.; Lovejoy, Andrew E.; Lindell, Michael C.; Gardner, Nathaniel W.; Schultz, Marc R.

    2018-01-01

    The NASA Engineering and Safety Center (NESC) Shell Buckling Knockdown Factor Project (SBKF) was established in 2007 by the NESC with the primary objective of developing analysis-based buckling design factors and guidelines for metallic and composite launch-vehicle structures. A secondary objective of the project is to advance technologies that have the potential to increase the structural efficiency of launch vehicles. The SBKF Project has determined that weld-land stiffness discontinuities can significantly reduce the buckling load of a cylinder. In addition, the welding process can introduce localized geometric imperfections that can further exacerbate the inherent buckling imperfection sensitivity of the cylinder. Therefore, single-piece barrel fabrication technologies can improve structural efficiency by eliminating these weld-land issues. As part of this effort, SBKF partnered with the Advanced Materials and Processing Branch (AMPB) at NASA Langley Research Center (LaRC), the Mechanical and Fabrication Branch at NASA Marshall Space Flight Center (MSFC), and ATI Forged Products to design and fabricate an 8-ft-diameter orthogrid-stiffened seamless metallic cylinder. The cylinder was subjected to seven subcritical load sequences (load levels that are not intended to induce test-article buckling or material failure) and one load sequence to failure. The purpose of this test effort was to demonstrate the potential benefits of building cylindrical structures with no weld lands using the flow-forming manufacturing process. This seamless barrel is the ninth 8-ft-diameter metallic barrel and the first single-piece metallic structure to be tested under this program.

  2. Revealing Less Derived Nature of Cartilaginous Fish Genomes with Their Evolutionary Time Scale Inferred with Nuclear Genes

    PubMed Central

    Renz, Adina J.; Meyer, Axel; Kuraku, Shigehiro

    2013-01-01

    Cartilaginous fishes, divided into Holocephali (chimaeras) and Elasmobranchii (sharks, rays and skates), occupy a key phylogenetic position among extant vertebrates for reconstructing vertebrate evolutionary processes. An accurate evolutionary time scale is indispensable for a better understanding of the relationship between phenotypic and molecular evolution of cartilaginous fishes. However, our current knowledge of the time scale of cartilaginous fish evolution relies largely on estimates using mitochondrial DNA sequences. In this study, making the best use of the still partial, but large-scale, sequencing data of cartilaginous fish species, we estimate the divergence times between the major cartilaginous fish lineages employing nuclear genes. By rigorous orthology assessment based on available genomic and transcriptomic sequence resources for cartilaginous fishes, we selected 20 protein-coding genes in the nuclear genome, spanning 2973 amino acid residues. Our analysis based on Bayesian inference resulted in a mean divergence time of 421 Ma, the late Silurian, for the Holocephali-Elasmobranchii split, and 306 Ma, the late Carboniferous, for the split between sharks and rays/skates. By applying these results and other documented divergence times, we measured the relative evolutionary rate of the Hox A cluster sequences in the cartilaginous fish lineages, which revealed a substitution rate lower by a factor of at least 2.4 than that in tetrapod lineages. The obtained time scale enables mapping phenotypic and molecular changes in a quantitative framework. It is of great interest to corroborate the less derived nature of cartilaginous fishes at the molecular level as a genome-wide phenomenon. PMID:23825540

  3. Revealing less derived nature of cartilaginous fish genomes with their evolutionary time scale inferred with nuclear genes.

    PubMed

    Renz, Adina J; Meyer, Axel; Kuraku, Shigehiro

    2013-01-01

    Cartilaginous fishes, divided into Holocephali (chimaeras) and Elasmobranchii (sharks, rays and skates), occupy a key phylogenetic position among extant vertebrates for reconstructing vertebrate evolutionary processes. An accurate evolutionary time scale is indispensable for a better understanding of the relationship between phenotypic and molecular evolution of cartilaginous fishes. However, our current knowledge of the time scale of cartilaginous fish evolution relies largely on estimates using mitochondrial DNA sequences. In this study, making the best use of the still partial, but large-scale, sequencing data of cartilaginous fish species, we estimate the divergence times between the major cartilaginous fish lineages employing nuclear genes. By rigorous orthology assessment based on available genomic and transcriptomic sequence resources for cartilaginous fishes, we selected 20 protein-coding genes in the nuclear genome, spanning 2973 amino acid residues. Our analysis based on Bayesian inference resulted in a mean divergence time of 421 Ma, the late Silurian, for the Holocephali-Elasmobranchii split, and 306 Ma, the late Carboniferous, for the split between sharks and rays/skates. By applying these results and other documented divergence times, we measured the relative evolutionary rate of the Hox A cluster sequences in the cartilaginous fish lineages, which revealed a substitution rate lower by a factor of at least 2.4 than that in tetrapod lineages. The obtained time scale enables mapping phenotypic and molecular changes in a quantitative framework. It is of great interest to corroborate the less derived nature of cartilaginous fishes at the molecular level as a genome-wide phenomenon.

  4. The Nobel Prize as a Reward Mechanism in the Genomics Era: Anonymous Researchers, Visible Managers and the Ethics of Excellence

    PubMed Central

    2010-01-01

    The Human Genome Project (HGP) is regarded by many as one of the major scientific achievements in recent science history, a large-scale endeavour that is changing the way in which biomedical research is done and is expected, moreover, to yield considerable benefit for society. Thus, since the completion of the human genome sequencing effort, a debate has emerged over the question whether this effort merits a Nobel Prize and, if so, who should be the one(s) to receive it, as (according to current procedures) no more than three individuals can be selected. In this article, the HGP is taken as a case study to consider the ethical question to what extent it is still possible, in an era of big science, of large-scale consortia and global team work, to acknowledge and reward individual contributions to important breakthroughs in biomedical fields. Is it still viable to single out individuals for their decisive contributions in order to reward them in a fair and convincing way? Whereas the concept of the Nobel Prize as such seems to reflect an archetypical view of scientists as solitary researchers who, at a certain point in their careers, make their one decisive discovery, this vision has proven to be problematic from the very outset. Already during the first decade of the Nobel era, Ivan Pavlov was denied the Prize several times before finally receiving it, on the basis of the argument that he had been active as a research manager (a designer and supervisor of research projects) rather than as a researcher himself. The question, then, is whether, in the case of the HGP, a research effort that involved the contributions of hundreds or even thousands of researchers worldwide, it is still possible to “individualise” the Prize. The “HGP Nobel Prize problem” is regarded as an exemplary issue in current research ethics, highlighting a number of quandaries and trends involved in contemporary life science research practices more broadly. PMID:20730106

  5. Characterization of the Kenaf (Hibiscus cannabinus) Global Transcriptome Using Illumina Paired-End Sequencing and Development of EST-SSR Markers

    PubMed Central

    Li, Hui; Li, Defang; Chen, Anguo; Tang, Huijuan; Li, Jianjun; Huang, Siqi

    2016-01-01

    Kenaf (Hibiscus cannabinus L.) is an economically important natural fiber crop grown worldwide. However, only 20 expressed sequence tags (ESTs) for kenaf are available in public databases. The aim of this study was to develop large-scale simple sequence repeat (SSR) markers to lay a solid foundation for the construction of genetic linkage maps and marker-assisted breeding in kenaf. We used Illumina paired-end sequencing technology to generate new EST sequences and MISA software to mine SSR markers. We identified 71,318 unigenes with an average length of 1143 nt and annotated these unigenes using four different protein databases. Overall, 9324 complementary primer pairs were designated as EST-SSR markers, and their quality was validated using 100 randomly selected SSR markers. In total, 72 primer pairs reproducibly amplified target amplicons, and 61 of these primer pairs detected significant polymorphism among 28 kenaf accessions. Thus, in this study, we have developed large-scale SSR markers for kenaf, and this new resource will facilitate the construction of genetic linkage maps and the investigation of fiber growth and development in kenaf, and will also be of value to novel gene discovery and functional genomic studies. PMID:26960153

  6. Consolidation of glycosyl hydrolase family 30 : a dual domain 4/7 hydrolase family consisting of two structurally distinct groups

    Treesearch

    Franz J. St John; Javier M. Gonzalez; Edwin Pozharski

    2010-01-01

    In this work glycosyl hydrolase (GH) family 30 (GH30) is analyzed and shown to consist of its currently classified member sequences as well as several homologous sequence groups currently assigned within family GH5. A large scale amino acid sequence alignment and a phylogenetic tree were generated and GH30 groups and subgroups were designated. A partial rearrangement...

  7. Draft Genome Sequences of 510 Listeria monocytogenes Strains from Food Isolates and Human Listeriosis Cases from Northern Italy.

    PubMed

    Lomonaco, Sara; Gallina, Silvia; Filipello, Virginia; Sanchez Leon, Maria; Kastanis, George John; Allard, Marc; Brown, Eric; Amato, Ettore; Pontello, Mirella; Decastelli, Lucia

    2018-01-18

    Listeriosis outbreaks are frequently multistate/multicountry outbreaks, underlining the importance of molecular typing data for several diverse and well-characterized isolates. Large-scale whole-genome sequencing studies on Listeria monocytogenes isolates from non-U.S. locations have been limited. Herein, we describe the draft genome sequences of 510 L. monocytogenes isolates from northern Italy from different sources.

  8. XPAT: a toolkit to conduct cross-platform association studies with heterogeneous sequencing datasets.

    PubMed

    Yu, Yao; Hu, Hao; Bohlender, Ryan J; Hu, Fulan; Chen, Jiun-Sheng; Holt, Carson; Fowler, Jerry; Guthery, Stephen L; Scheet, Paul; Hildebrandt, Michelle A T; Yandell, Mark; Huff, Chad D

    2018-04-06

    High-throughput sequencing data are increasingly being made available to the research community for secondary analyses, providing new opportunities for large-scale association studies. However, heterogeneity in target capture and sequencing technologies often introduces strong technological stratification biases that overwhelm subtle signals of association in studies of complex traits. Here, we introduce the Cross-Platform Association Toolkit, XPAT, which provides a suite of tools designed to support and conduct large-scale association studies with heterogeneous sequencing datasets. XPAT includes tools to support cross-platform aware variant calling, quality control filtering, gene-based association testing and rare variant effect size estimation. To evaluate the performance of XPAT, we conducted case-control association studies for three diseases, including 783 breast cancer cases, 272 ovarian cancer cases, 205 Crohn disease cases and 3507 shared controls (including 1722 females) using sequencing data from multiple sources. XPAT greatly reduced Type I error inflation in the case-control analyses, while replicating many previously identified disease-gene associations. We also show that association tests conducted with XPAT using cross-platform data have comparable performance to tests using matched platform data. XPAT enables new association studies that combine existing sequencing datasets to identify genetic loci associated with common diseases and other complex traits.
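As a toy illustration of the gene-based case-control testing such a toolkit performs per gene, the sketch below runs a two-sided Fisher's exact test on carrier counts using only the standard library. The counts are invented, and this is not XPAT's actual statistical model.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """P-value for the 2x2 table [[a, b], [c, d]], e.g. rows = cases/controls,
    columns = carriers/non-carriers of qualifying variants. Sums the
    probabilities of all tables with the same margins that are no more
    likely than the observed one (the minimum-likelihood method)."""
    n, row1, col1 = a + b + c + d, a + b, a + c

    def p_table(x):  # hypergeometric probability of the table with cell a = x
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    p_obs = p_table(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    return sum(p for p in (p_table(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))

# Hypothetical carrier / non-carrier counts for one gene.
cases = (18, 782)       # 18 of 800 cases carry a qualifying variant
controls = (20, 3487)   # 20 of 3507 controls do
print(f"p = {fisher_exact_two_sided(*cases, *controls):.2g}")
```

A cross-platform-aware pipeline would additionally harmonize variant calls and filter platform-specific artifacts before any such per-gene test, which is where most of the Type I error control described above comes from.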

  9. Lessons from SMD experience with approaches to the evaluation of fare changes

    DOT National Transportation Integrated Search

    1980-01-01

    Over the past several years UMTA's Service and Methods Demonstration Program (SMD) has undertaken a large number of studies of the effects of fare changes, both increases and decreases. Some of these studies have been large-scale efforts directed at ...

  10. The global catalogue of microorganisms 10K type strain sequencing project: closing the genomic gaps for the validly published prokaryotic and fungi species.

    PubMed

    Wu, Linhuan; McCluskey, Kevin; Desmeth, Philippe; Liu, Shuangjiang; Hideaki, Sugawara; Yin, Ye; Moriya, Ohkuma; Itoh, Takashi; Kim, Cha Young; Lee, Jung-Sook; Zhou, Yuguang; Kawasaki, Hiroko; Hazbón, Manzour Hernando; Robert, Vincent; Boekhout, Teun; Lima, Nelson; Evtushenko, Lyudmila; Boundy-Mills, Kyria; Bunk, Boyke; Moore, Edward R B; Eurwilaichitr, Lily; Ingsriswang, Supawadee; Shah, Heena; Yao, Su; Jin, Tao; Huang, Jinqun; Shi, Wenyu; Sun, Qinglan; Fan, Guomei; Li, Wei; Li, Xian; Kurtböke, Ipek; Ma, Juncai

    2018-05-01

    Genomic information is essential for taxonomic, phylogenetic, and functional studies to comprehensively decipher the characteristics of microorganisms, to explore microbiomes through metagenomics, and to answer fundamental questions of nature and human life. However, large gaps remain in the available genomic sequencing information published for bacterial and archaeal species, and the gaps are even larger for fungal type strains. The Global Catalogue of Microorganisms (GCM) leads an internationally coordinated effort to sequence type strains and close gaps in the genomic maps of microorganisms. Hence, the GCM aims to promote research by deep-mining genomic data.

  11. Targeted Capture and High-Throughput Sequencing Using Molecular Inversion Probes (MIPs).

    PubMed

    Cantsilieris, Stuart; Stessman, Holly A; Shendure, Jay; Eichler, Evan E

    2017-01-01

    Molecular inversion probes (MIPs) in combination with massively parallel DNA sequencing represent a versatile, yet economical tool for targeted sequencing of genomic DNA. Several thousand genomic targets can be selectively captured using long oligonucleotides containing unique targeting arms and universal linkers. The ability to append sequencing adaptors and sample-specific barcodes allows large-scale pooling and subsequent high-throughput sequencing at relatively low cost per sample. Here, we describe a "wet bench" protocol detailing the capture and subsequent sequencing of >2000 genomic targets from 192 samples, representative of a single lane on the Illumina HiSeq 2000 platform.

  12. Draft De Novo Transcriptome of the Rat Kangaroo Potorous tridactylus as a Tool for Cell Biology

    PubMed Central

    Udy, Dylan B.; Voorhies, Mark; Chan, Patricia P.; Lowe, Todd M.; Dumont, Sophie

    2015-01-01

    The rat kangaroo (long-nosed potoroo, Potorous tridactylus) is a marsupial native to Australia. Cultured rat kangaroo kidney epithelial cells (PtK) are commonly used to study cell biological processes. These mammalian cells are large, adherent, and flat, and contain few, large chromosomes, and are thus ideal for imaging intracellular dynamics such as those of mitosis. Despite this, neither the rat kangaroo genome nor transcriptome has been sequenced, creating a challenge for probing the molecular basis of these cellular dynamics. Here, we present the sequencing, assembly, and annotation of the draft rat kangaroo de novo transcriptome. We sequenced 679 million reads that mapped to 347,323 Trinity transcripts and 20,079 Unigenes. We present statistics emerging from transcriptome-wide analyses, and analyses suggesting that the transcriptome covers full-length sequences of most genes, many with multiple isoforms. We also validate our findings with a proof-of-concept gene knockdown experiment. We expect that this high-quality transcriptome will make rat kangaroo cells a more tractable system for linking molecular-scale function and cellular-scale dynamics. PMID:26252667

  13. Draft De Novo Transcriptome of the Rat Kangaroo Potorous tridactylus as a Tool for Cell Biology.

    PubMed

    Udy, Dylan B; Voorhies, Mark; Chan, Patricia P; Lowe, Todd M; Dumont, Sophie

    2015-01-01

    The rat kangaroo (long-nosed potoroo, Potorous tridactylus) is a marsupial native to Australia. Cultured rat kangaroo kidney epithelial cells (PtK) are commonly used to study cell biological processes. These mammalian cells are large, adherent, and flat, and contain few, large chromosomes, and are thus ideal for imaging intracellular dynamics such as those of mitosis. Despite this, neither the rat kangaroo genome nor transcriptome has been sequenced, creating a challenge for probing the molecular basis of these cellular dynamics. Here, we present the sequencing, assembly, and annotation of the draft rat kangaroo de novo transcriptome. We sequenced 679 million reads that mapped to 347,323 Trinity transcripts and 20,079 Unigenes. We present statistics emerging from transcriptome-wide analyses, and analyses suggesting that the transcriptome covers full-length sequences of most genes, many with multiple isoforms. We also validate our findings with a proof-of-concept gene knockdown experiment. We expect that this high-quality transcriptome will make rat kangaroo cells a more tractable system for linking molecular-scale function and cellular-scale dynamics.

  14. Eastern Equine Encephalitis Virus

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Borucki, M.

    2010-08-05

    Eastern equine encephalitis virus (EEEV) is a mosquito-borne virus capable of causing large outbreaks of encephalitis in humans and horses. In North America, EEEV infection has a very high mortality rate in humans, and survivors often suffer severe neurological sequelae. Interestingly, EEEV infections from South American isolates are generally subclinical. Although EEEV is divided into two antigenic varieties and four lineages, only eleven isolates have been sequenced and eight of these are from the North American variety (Lineage I). Most sequenced strains were collected from mosquitoes and only one human isolate has been sequenced. EEEV isolates exist from a variety of hosts, vectors, years, and geographical locations and efforts should focus on sequencing strains that represent this diversity.

  15. [New hosts and vectors for genome cloning]. Progress report, 1990--1991

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    The main goal of our project remains the development of new bacterial hosts and vectors for the stable propagation of human DNA clones in E. coli. During the past six months of our current budget period, we have (1) continued to develop new hosts that permit the stable maintenance of unstable features of human DNA, and (2) developed a series of vectors for (a) cloning large DNA inserts, (b) assessing the frequency of human sequences that are lethal to the growth of E. coli, and (c) assessing the stability of human sequences cloned in M13 for large-scale sequencing projects.

  16. The sequence of sequencers: The history of sequencing DNA

    PubMed Central

    Heather, James M.; Chain, Benjamin

    2016-01-01

    Determining the order of nucleic acid residues in biological samples is an integral component of a wide variety of research applications. Over the last fifty years large numbers of researchers have applied themselves to the production of techniques and technologies to facilitate this feat, sequencing DNA and RNA molecules. This time-scale has witnessed tremendous changes, moving from sequencing short oligonucleotides to millions of bases, from struggling towards the deduction of the coding sequence of a single gene to rapid and widely available whole genome sequencing. This article traverses those years, iterating through the different generations of sequencing technology, highlighting some of the key discoveries, researchers, and sequences along the way. PMID:26554401

  17. Incentives and Test-Based Accountability in Education

    ERIC Educational Resources Information Center

    Hout, Michael, Ed.; Elliott, Stuart W., Ed.

    2011-01-01

    In recent years there have been increasing efforts to use accountability systems based on large-scale tests of students as a mechanism for improving student achievement. The federal No Child Left Behind Act (NCLB) is a prominent example of such an effort, but it is only the continuation of a steady trend toward greater test-based accountability in…

  18. Consequences of Normalizing Transcriptomic and Genomic Libraries of Plant Genomes Using a Duplex-Specific Nuclease and Tetramethylammonium Chloride

    PubMed Central

    Froenicke, Lutz; Lavelle, Dean; Martineau, Belinda; Perroud, Bertrand; Michelmore, Richard

    2013-01-01

    Several applications of high throughput genome and transcriptome sequencing would benefit from a reduction of the high-copy-number sequences in the libraries being sequenced and analyzed, particularly when applied to species with large genomes. We adapted and analyzed the consequences of a method that utilizes a thermostable duplex-specific nuclease for reducing the high-copy components in transcriptomic and genomic libraries prior to sequencing. This reduces the time, cost, and computational effort of obtaining informative transcriptomic and genomic sequence data for both fully sequenced and non-sequenced genomes. It also reduces contamination from organellar DNA in preparations of nuclear DNA. Hybridization in the presence of 3 M tetramethylammonium chloride (TMAC), which equalizes the rates of hybridization of GC and AT nucleotide pairs, reduced the bias against sequences with high GC content. Consequences of this method on the reduction of high-copy and enrichment of low-copy sequences are reported for Arabidopsis and lettuce. PMID:23409088

  19. Consequences of normalizing transcriptomic and genomic libraries of plant genomes using a duplex-specific nuclease and tetramethylammonium chloride.

    PubMed

    Matvienko, Marta; Kozik, Alexander; Froenicke, Lutz; Lavelle, Dean; Martineau, Belinda; Perroud, Bertrand; Michelmore, Richard

    2013-01-01

    Several applications of high throughput genome and transcriptome sequencing would benefit from a reduction of the high-copy-number sequences in the libraries being sequenced and analyzed, particularly when applied to species with large genomes. We adapted and analyzed the consequences of a method that utilizes a thermostable duplex-specific nuclease for reducing the high-copy components in transcriptomic and genomic libraries prior to sequencing. This reduces the time, cost, and computational effort of obtaining informative transcriptomic and genomic sequence data for both fully sequenced and non-sequenced genomes. It also reduces contamination from organellar DNA in preparations of nuclear DNA. Hybridization in the presence of 3 M tetramethylammonium chloride (TMAC), which equalizes the rates of hybridization of GC and AT nucleotide pairs, reduced the bias against sequences with high GC content. Consequences of this method on the reduction of high-copy and enrichment of low-copy sequences are reported for Arabidopsis and lettuce.

  20. PAT: predictor for structured units and its application for the optimization of target molecules for the generation of synthetic antibodies.

    PubMed

    Jeon, Jouhyun; Arnold, Roland; Singh, Fateh; Teyra, Joan; Braun, Tatjana; Kim, Philip M

    2016-04-01

    The identification of structured units in a protein sequence is an important first step for most biochemical studies. Importantly for this study, the identification of stable structured regions is a crucial first step to generate novel synthetic antibodies. While many approaches to find domains or predict structured regions exist, important limitations remain, such as the optimization of domain boundaries and the lack of identification of non-domain structured units. Moreover, no integrated tool exists to find and optimize structural domains within protein sequences. Here, we describe a new tool, PAT ( http://www.kimlab.org/software/pat ), that can efficiently identify both domains (with optimized boundaries) and non-domain putative structured units. PAT automatically analyzes various structural properties, evaluates the folding stability, and reports possible structural domains in a given protein sequence. To evaluate the reliability of PAT, we applied it to identify antibody target molecules, based on the notion that soluble and well-defined protein secondary and tertiary structures are appropriate target molecules for synthetic antibodies. PAT is an efficient and sensitive tool to identify structured units. A performance analysis shows that PAT can characterize structurally well-defined regions in a given sequence and outperforms other efforts to define reliable boundaries of domains. Notably, PAT successfully identifies experimentally confirmed target molecules for antibody generation. PAT also offers pre-calculated results for 20,210 human proteins to accelerate common queries. PAT can therefore help to investigate large-scale structured domains and improve the success rate of synthetic antibody generation.

  1. RNA sequencing: current and prospective uses in metabolic research.

    PubMed

    Vikman, Petter; Fadista, Joao; Oskolkov, Nikolay

    2014-10-01

    Previous global RNA analysis was restricted to known transcripts in species with a defined transcriptome. Next generation sequencing has transformed transcriptomics by making it possible to analyse expressed genes with exon-level resolution from any tissue in any species, without any a priori knowledge of which genes are being expressed, their splice patterns, or their nucleotide sequences. In addition, RNA sequencing is a more sensitive technique than microarrays, with a larger dynamic range, and it also allows for investigation of imprinting and allele-specific expression. This can be done at a cost competitive with that of a microarray, making RNA sequencing a technique available to most researchers. RNA sequencing has therefore recently become the state of the art for large-scale RNA investigations and has to a large extent replaced microarrays. The only drawback is the large amount of data produced, which, together with the complexity of the data, can make a researcher spend far more time on analysis than on performing the actual experiment. © 2014 Society for Endocrinology.

  2. Large-Scale Modeling of Wordform Learning and Representation

    ERIC Educational Resources Information Center

    Sibley, Daragh E.; Kello, Christopher T.; Plaut, David C.; Elman, Jeffrey L.

    2008-01-01

    The forms of words as they appear in text and speech are central to theories and models of lexical processing. Nonetheless, current methods for simulating their learning and representation fail to approach the scale and heterogeneity of real wordform lexicons. A connectionist architecture termed the "sequence encoder" is used to learn…

  3. An Overview of NASA Efforts on Zero Boiloff Storage of Cryogenic Propellants

    NASA Technical Reports Server (NTRS)

    Hastings, Leon J.; Plachta, D. W.; Salerno, L.; Kittel, P.; Haynes, Davy (Technical Monitor)

    2001-01-01

    Future mission planning within NASA has increasingly motivated consideration of cryogenic propellant storage durations on the order of years as opposed to a few weeks or months. Furthermore, the advancement of cryocooler and passive insulation technologies in recent years has substantially improved the prospects for zero boiloff storage of cryogenics. Accordingly, a cooperative effort by NASA's Ames Research Center (ARC), Glenn Research Center (GRC), and Marshall Space Flight Center (MSFC) has been implemented to develop and demonstrate "zero boiloff" concepts for in-space storage of cryogenic propellants, particularly liquid hydrogen and oxygen. ARC is leading the development of flight-type cryocoolers, GRC the subsystem development and small scale testing, and MSFC the large scale and integrated system level testing. Thermal and fluid modeling involves a combined effort by the three Centers. Recent accomplishments include: 1) development of "zero boiloff" analytical modeling techniques for sizing the storage tankage, passive insulation, cryocooler, power source mass, and radiators; 2) an early subscale demonstration with liquid hydrogen; 3) procurement of a flight-type 10 watt, 95 K pulse tube cryocooler for liquid oxygen storage; and 4) assembly of a large-scale test article for an early demonstration of the integrated operation of passive insulation, destratification/pressure control, and cryocooler (commercial unit) subsystems to achieve zero boiloff storage of liquid hydrogen. Near term plans include the large-scale integrated system demonstration testing this summer, subsystem testing of the flight-type pulse-tube cryocooler with liquid nitrogen (oxygen simulant), and continued development of a flight-type liquid hydrogen pulse tube cryocooler.

  4. Detection of large-scale concentric gravity waves from a Chinese airglow imager network

    NASA Astrophysics Data System (ADS)

    Lai, Chang; Yue, Jia; Xu, Jiyao; Yuan, Wei; Li, Qinzeng; Liu, Xiao

    2018-06-01

    Concentric gravity waves (CGWs) contain a broad spectrum of horizontal wavelengths and periods due to their instantaneous localized sources (e.g., deep convection, volcanic eruptions, or earthquakes). However, it is difficult to observe large-scale gravity waves of >100 km wavelength from the ground, owing to the limited field of view of a single camera and local bad weather. Previously, complete large-scale CGW imagery could only be captured by satellite observations. In the present study, we developed a novel method that assembles separate images and applies low-pass filtering to obtain temporal and spatial information about complete large-scale CGWs from a network of all-sky airglow imagers. Coordinated observations from five all-sky airglow imagers in Northern China were assembled and processed to study large-scale CGWs over a wide area (1800 km × 1400 km), focusing on the same two CGW events as Xu et al. (2015). Our algorithms yielded images of large-scale CGWs by filtering out the small-scale CGWs. The wavelengths, wave speeds, and periods of the CGWs were measured from a sequence of consecutive assembled images. Overall, the assembling and low-pass filtering algorithms can expand the airglow imager network to its full capacity regarding the detection of large-scale gravity waves.
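
    The core step described above, assembling frames and low-pass filtering so that only horizontal wavelengths above the ~100 km scale survive, can be sketched on synthetic data. This is an illustrative sketch, not the authors' algorithm; the grid spacing, cutoff, and wave parameters below are invented for the example:

```python
import numpy as np

def lowpass_2d(field, dx_km, cutoff_km):
    """Zero out Fourier components whose horizontal wavelength is below cutoff_km."""
    ny, nx = field.shape
    kx = np.fft.fftfreq(nx, d=dx_km)            # cycles per km along x
    ky = np.fft.fftfreq(ny, d=dx_km)            # cycles per km along y
    KX, KY = np.meshgrid(kx, ky)
    keep = np.hypot(KX, KY) < 1.0 / cutoff_km   # wavelength = 1/|k| > cutoff
    return np.real(np.fft.ifft2(np.fft.fft2(field) * keep))

# Synthetic "airglow" frame over a 1800 km x 1400 km area: one 450 km wave
# (large-scale) plus 25 km ripples (small-scale), both with exact periods.
dx = 5.0                                        # km per pixel (assumed)
X, Y = np.meshgrid(np.arange(0, 1800, dx), np.arange(0, 1400, dx))
large = np.sin(2 * np.pi * X / 450.0)
small = 0.5 * np.sin(2 * np.pi * (X + Y) / 25.0)

filtered = lowpass_2d(large + small, dx, cutoff_km=100.0)
rms_err = np.sqrt(np.mean((filtered - large) ** 2))
print(round(float(rms_err), 6))                 # -> 0.0 (ripples removed)
```

    On exact-period synthetic input the FFT mask separates the two components cleanly; real airglow mosaics would additionally need windowing, star removal, and projection onto a common grid before filtering.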

  5. Discovery, genotyping and characterization of structural variation and novel sequence at single nucleotide resolution from de novo genome assemblies on a population scale.

    PubMed

    Liu, Siyang; Huang, Shujia; Rao, Junhua; Ye, Weijian; Krogh, Anders; Wang, Jun

    2015-01-01

    Comprehensive recognition of genomic variation in one individual is important for understanding disease and developing personalized medication and treatment. Many tools based on DNA re-sequencing exist for identification of single nucleotide polymorphisms, small insertions and deletions (indels) as well as large deletions. However, these approaches consistently display a substantial bias against the recovery of complex structural variants and novel sequence in individual genomes and do not provide interpretation information such as the annotation of ancestral state and formation mechanism. We present a novel approach implemented in a single software package, AsmVar, to discover, genotype and characterize different forms of structural variation and novel sequence from population-scale de novo genome assemblies up to nucleotide resolution. Application of AsmVar to several human de novo genome assemblies captures a wide spectrum of structural variants and novel sequences present in the human population with high sensitivity and specificity. Our method provides a direct solution for investigating structural variants and novel sequences from de novo genome assemblies, facilitating the construction of population-scale pan-genomes. Our study also highlights the usefulness of the de novo assembly strategy for definition of genome structure.

  6. Genome-wide association study based on multiple imputation with low-depth sequencing data: application to biofuel traits in reed canarygrass

    USDA-ARS?s Scientific Manuscript database

    Genotyping by sequencing allows for large-scale genetic analyses in plant species with no reference genome, but sets the challenge of sound inference in presence of uncertain genotypes. We report an imputation-based genome-wide association study (GWAS) in reed canarygrass (Phalaris arundinacea L., P...

  7. Computerization of Library and Information Services in Mainland China.

    ERIC Educational Resources Information Center

    Lin, Sharon Chien

    1994-01-01

    Describes two phases of the automation of library and information services in mainland China. From 1974-86, much effort was concentrated on developing computer systems, databases, online retrieval, and networking. From 1986 to the present, practical progress became possible largely because of CD-ROM technology; and large scale networking for…

  8. Satellite-based peatland mapping: potential of the MODIS sensor.

    Treesearch

    D. Pflugmacher; O.N. Krankina; W.B. Cohen

    2006-01-01

    Peatlands play a major role in the global carbon cycle but are largely overlooked in current large-scale vegetation mapping efforts. In this study, we investigated the potential of the Moderate Resolution Imaging Spectroradiometer (MODIS) sensor to capture extent and distribution of peatlands in the St. Petersburg region of Russia.

  9. Deployment Pulmonary Health

    DTIC Science & Technology

    2015-02-11

    A similar risk-based approach may be appropriate for deploying military personnel. e) If DoD were to consider implementing a large-scale pre...quality of existing spirometry programs prior to considering a larger scale pre-deployment effort. Identifying an accelerated decrease in spirometry...baseline spirometry on a wider scale. e) Conduct pre-deployment baseline spirometry if there is a significant risk of exposure to a pulmonary hazard based

  10. Validity of Scores for a Developmental Writing Scale Based on Automated Scoring

    ERIC Educational Resources Information Center

    Attali, Yigal; Powers, Donald

    2009-01-01

    A developmental writing scale for timed essay-writing performance was created on the basis of automatically computed indicators of writing fluency, word choice, and conventions of standard written English. In a large-scale data collection effort that involved a national sample of more than 12,000 students from 4th, 6th, 8th, 10th, and 12th grade,…

  11. Global investigation of protein-protein interactions in yeast Saccharomyces cerevisiae using re-occurring short polypeptide sequences.

    PubMed

    Pitre, S; North, C; Alamgir, M; Jessulat, M; Chan, A; Luo, X; Green, J R; Dumontier, M; Dehne, F; Golshani, A

    2008-08-01

    Protein-protein interaction (PPI) maps provide insight into cellular biology and have received considerable attention in the post-genomic era. While large-scale experimental approaches have generated large collections of experimentally determined PPIs, technical limitations preclude certain PPIs from detection. Recently, we demonstrated that yeast PPIs can be computationally predicted using re-occurring short polypeptide sequences between known interacting protein pairs. However, the computational requirements and low specificity made this method unsuitable for large-scale investigations. Here, we report an improved approach, which exhibits a specificity of approximately 99.95% and executes 16,000 times faster. Importantly, we report the first all-to-all sequence-based computational screen of PPIs in yeast, Saccharomyces cerevisiae, in which we identify 29,589 high confidence interactions out of approximately 2 × 10^7 possible pairs. Of these, 14,438 PPIs have not been previously reported and may represent novel interactions. In particular, these results reveal a richer set of membrane protein interactions, not readily amenable to experimental investigations. From the novel PPIs, a novel putative protein complex comprised largely of membrane proteins was revealed. In addition, two novel gene functions were predicted and experimentally confirmed to affect the efficiency of non-homologous end-joining, providing further support for the usefulness of the identified PPIs in biological investigations.
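
    The underlying idea, that short subsequence pairs re-occurring across known interacting proteins carry predictive signal for new pairs, can be illustrated with a toy scorer. This is a simplified sketch of the general approach (exact 3-mer co-occurrence counting), not the authors' algorithm or parameters; all sequences are hypothetical:

```python
from itertools import product

def kmers(seq, k=3):
    """Set of all overlapping k-length windows in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_pair_index(known_interactions, k=3):
    """Count how often each (kmer_a, kmer_b) co-occurs across known interacting pairs."""
    counts = {}
    for a, b in known_interactions:
        for pa, pb in product(kmers(a, k), kmers(b, k)):
            counts[(pa, pb)] = counts.get((pa, pb), 0) + 1
    return counts

def score_pair(a, b, counts, k=3):
    """Average co-occurrence support over all window pairs of a candidate pair."""
    pairs = list(product(kmers(a, k), kmers(b, k)))
    return sum(counts.get(p, 0) for p in pairs) / len(pairs)

# Toy "known" interactions (hypothetical sequences).
known = [("MKVLAA", "GHSTRR"), ("MKVLSS", "GHSTKK")]
index = build_pair_index(known)

# A pair resembling the training data scores higher than an unrelated pair.
print(score_pair("MKVLAA", "GHSTRR", index) > score_pair("AAAAAA", "CCCCCC", index))  # -> True
```

    The published method's speed and specificity gains come from how this counting and lookup are engineered at scale; the sketch only shows the shape of the signal being exploited.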

  12. Possible roles for fronto-striatal circuits in reading disorder

    PubMed Central

    Hancock, Roeland; Richlan, Fabio; Hoeft, Fumiko

    2016-01-01

    Several studies have reported hyperactivation in frontal and striatal regions in individuals with reading disorder (RD) during reading-related tasks. Hyperactivation in these regions is typically interpreted as a form of neural compensation and related to articulatory processing. Fronto-striatal hyperactivation in RD can however, also arise from fundamental impairment in reading related processes, such as phonological processing and implicit sequence learning relevant to early language acquisition. We review current evidence for the compensation hypothesis in RD and apply large-scale reverse inference to investigate anatomical overlap between hyperactivation regions and neural systems for articulation, phonological processing, implicit sequence learning. We found anatomical convergence between hyperactivation regions and regions supporting articulation, consistent with the proposed compensatory role of these regions, and low convergence with phonological and implicit sequence learning regions. Although the application of large-scale reverse inference to decode function in a clinical population should be interpreted cautiously, our findings suggest future lines of research that may clarify the functional significance of hyperactivation in RD. PMID:27826071

  13. Collaborative approaches to the evolution of migration and the development of science-based conservation in shorebirds

    USGS Publications Warehouse

    Harrington, Brian A.; Brown, S.; Corven, James; Bart, Jonathan

    2002-01-01

    Shorebirds are among the most highly migratory creatures on earth. Both the study of their ecology and ongoing efforts to conserve their populations must reflect this central aspect of their biology. Many species of shorebirds use migration and staging sites scattered throughout the hemisphere to complete their annual migrations between breeding areas and nonbreeding habitats (Morrison 1984). The vast distances between habitats they use pose significant challenges for studying their migration ecology. At the same time, the large number of political boundaries shorebirds cross during their epic migrations create parallel challenges for organizations working on their management and conservation.Nebel et al. (2002) represent a collaborative effort to understand the conservation implications of Western Sandpiper (Calidris mauri) migration ecology on a scale worthy of this highly migratory species. The data sets involved in the analysis come from four U.S. states, two Canadian provinces, and a total of five nations. Only by collaborating on this historic scale were the authors able to assemble the information necessary to understand important aspects of the migration ecology of this species, and the implications for conservation of the patterns they discovered.Collaborative approaches to shorebird migration ecology developed slowly over several decades. The same period also saw the creation of large-scale efforts to monitor and conserve shorebirds. This overview first traces the history of the study of migration ecology of shorebirds during that fertile period, and then describes the monitoring and protection efforts that have been developed in an attempt to address the enormous issues of scale posed by shorebird migration ecology and conservation.

  14. Precise excision and self-integration of a composite transposon as a model for spontaneous large-scale chromosome inversion/deletion of the Staphylococcus haemolyticus clinical strain JCSC1435.

    PubMed

    Watanabe, Shinya; Ito, Teruyo; Morimoto, Yuh; Takeuchi, Fumihiko; Hiramatsu, Keiichi

    2007-04-01

    Large-scale chromosomal inversions (455 to 535 kbp) or deletions (266 to 320 kbp) were found to accompany spontaneous loss of beta-lactam resistance during drug-free passage of the multiresistant Staphylococcus haemolyticus clinical strain JCSC1435. Identification and sequencing of the rearranged chromosomal loci revealed that ISSha1 of S. haemolyticus is responsible for the chromosome rearrangements.

  15. Development of 5123 Intron-Length Polymorphic Markers for Large-Scale Genotyping Applications in Foxtail Millet

    PubMed Central

    Muthamilarasan, Mehanathan; Venkata Suresh, B.; Pandey, Garima; Kumari, Kajal; Parida, Swarup Kumar; Prasad, Manoj

    2014-01-01

    Generating genomic resources in terms of molecular markers is imperative in molecular breeding for crop improvement. Although large-scale development and application of microsatellite markers have been reported in the model crop foxtail millet, no such large-scale study has been conducted for intron-length polymorphic (ILP) markers. Considering this, we developed 5123 ILP markers, of which 4049 were physically mapped onto the 9 chromosomes of foxtail millet. BLAST analysis of 5123 expressed sequence tags (ESTs) suggested a function for ∼71.5% of the ESTs and grouped them into 5 different functional categories. About 440 selected primer pairs, representing the foxtail millet genome and the different functional groups, showed a high level of cross-genera amplification, at an average of ∼85%, in eight millets and five non-millet species. The efficacy of the ILP markers for distinguishing foxtail millet is demonstrated by observed heterozygosity (0.20) and Nei's average gene diversity (0.22). In silico comparative mapping of the physically mapped ILP markers demonstrated a substantial percentage of sequence-based orthology and syntenic relationship between foxtail millet chromosomes and the chromosomes of sorghum (∼50%), maize (∼46%), rice (∼21%) and Brachypodium (∼21%). Hence, for the first time, we developed large-scale ILP markers in foxtail millet and demonstrated their utility in germplasm characterization, transferability, phylogenetics and comparative mapping studies in millets and bioenergy grass species. PMID:24086082
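
    The two diversity statistics quoted above follow standard population-genetics formulas: observed heterozygosity is the fraction of heterozygous individuals at a locus, and Nei's gene diversity is H = 1 − Σ p_i², where p_i are allele frequencies. A minimal sketch with hypothetical genotype data (the band sizes and counts are invented for illustration):

```python
from collections import Counter

def nei_gene_diversity(allele_counts):
    """Nei's gene diversity for one locus: H = 1 - sum(p_i^2)."""
    total = sum(allele_counts.values())
    return 1.0 - sum((n / total) ** 2 for n in allele_counts.values())

def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles at a locus."""
    return sum(a != b for a, b in genotypes) / len(genotypes)

# Hypothetical diploid genotypes at one ILP locus (alleles = band sizes in bp).
genotypes = [(180, 180), (180, 210), (210, 210), (180, 210), (180, 180)]
alleles = Counter(a for g in genotypes for a in g)   # 180: 6, 210: 4

print(round(observed_heterozygosity(genotypes), 2))  # 2 of 5 heterozygous -> 0.4
print(round(nei_gene_diversity(alleles), 2))         # 1 - (0.6^2 + 0.4^2) -> 0.48
```

    The study's reported values (0.20 and 0.22) would be these same quantities averaged over all assayed loci.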

  16. MONITORING COASTAL RESOURCES AT MULTIPLE SPATIAL AND TEMPORAL SCALES: LESSONS FROM EMAP 2001 EMAP SYMPOSIUM, APRIL 24-27, PENSACOLA BEACH, FL

    EPA Science Inventory

    In 1990, EMAP's Coastal Monitoring Program conducted its first regional sampling program in the Virginian Province. This first effort focused only at large spatial scales (regional) with some stratification to examine estuarine types. In the ensuing decade, EMAP-Coastal has condu...

  17. The Role of Teacher Leaders in Scaling Up Standards-Based Reform.

    ERIC Educational Resources Information Center

    Swanson, Judy; Snell, Jean; Koency, Gina; Berns, Barbara

    This study examined 10 urban middle school teacher leaders who played significant roles in their districts' and states' large-scale standards reform efforts. Interviews, observations, and shadowing were conducted during the first year to examine the teachers' scope of work. Observations focused on teachers working with a range of students and with…

  18. Prediction of protein tertiary structure from sequences using a very large back-propagation neural network

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, X.; Wilcox, G.L.

    1993-12-31

    We have implemented large-scale back-propagation neural networks on a 544-node Connection Machine, CM-5, using the C language in MIMD mode. The program running on 512 processors performs backpropagation learning at 0.53 Gflops, which provides 76 million connection updates per second. We have applied the network to the prediction of protein tertiary structure from sequence information alone. A neural network with one hidden layer and 40 million connections is trained to learn the relationship between sequence and tertiary structure. The trained network yields predicted structures of some proteins on which it has not been trained, given only their sequences. Presentation of the Fourier transform of the sequences accentuates periodicity in the sequence and yields good generalization with greatly increased training efficiency. Training simulations with a large, heterologous set of protein structures (111 proteins) converge to solutions with under 2% RMS residual error within the training set (random responses give an RMS error of about 20%). Presentation of 15 sequences of related proteins in a testing set of 24 proteins yields predicted structures with less than 8% RMS residual error, indicating good apparent generalization.
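
    The claim that a Fourier transform of the input "accentuates periodicity in the sequence" can be illustrated by encoding residues numerically and inspecting the power spectrum. The encoding below is a toy hydrophobicity-like scale and the sequence is synthetic; this sketches the general idea, not the paper's actual input representation:

```python
import numpy as np

# Hypothetical per-residue numeric encoding (toy hydrophobicity-like values).
scale = {"A": 1.8, "L": 3.8, "K": -3.9, "E": -3.5}

# A toy sequence with period-4 repeats: two hydrophobic then two charged residues.
seq = "ALKE" * 16
signal = np.array([scale[aa] for aa in seq])
signal -= signal.mean()                       # remove the DC component

power = np.abs(np.fft.rfft(signal)) ** 2
freqs = np.fft.rfftfreq(len(signal))          # cycles per residue

peak = freqs[np.argmax(power)]                # dominant spectral frequency
print(round(1.0 / peak, 1))                   # dominant period in residues -> 4.0
```

    A sharp spectral peak like this is exactly the kind of regularity that is diffuse in the raw sequence but explicit after the transform, which is plausibly why the Fourier representation trained faster.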

  19. An integrated semiconductor device enabling non-optical genome sequencing.

    PubMed

    Rothberg, Jonathan M; Hinz, Wolfgang; Rearick, Todd M; Schultz, Jonathan; Mileski, William; Davey, Mel; Leamon, John H; Johnson, Kim; Milgrew, Mark J; Edwards, Matthew; Hoon, Jeremy; Simons, Jan F; Marran, David; Myers, Jason W; Davidson, John F; Branting, Annika; Nobile, John R; Puc, Bernard P; Light, David; Clark, Travis A; Huber, Martin; Branciforte, Jeffrey T; Stoner, Isaac B; Cawley, Simon E; Lyons, Michael; Fu, Yutao; Homer, Nils; Sedova, Marina; Miao, Xin; Reed, Brian; Sabina, Jeffrey; Feierstein, Erika; Schorn, Michelle; Alanjary, Mohammad; Dimalanta, Eileen; Dressman, Devin; Kasinskas, Rachel; Sokolsky, Tanya; Fidanza, Jacqueline A; Namsaraev, Eugeni; McKernan, Kevin J; Williams, Alan; Roth, G Thomas; Bustillo, James

    2011-07-20

    The seminal importance of DNA sequencing to the life sciences, biotechnology and medicine has driven the search for more scalable and lower-cost solutions. Here we describe a DNA sequencing technology in which scalable, low-cost semiconductor manufacturing techniques are used to make an integrated circuit able to directly perform non-optical DNA sequencing of genomes. Sequence data are obtained by directly sensing the ions produced by template-directed DNA polymerase synthesis using all-natural nucleotides on this massively parallel semiconductor-sensing device or ion chip. The ion chip contains ion-sensitive, field-effect transistor-based sensors in perfect register with 1.2 million wells, which provide confinement and allow parallel, simultaneous detection of independent sequencing reactions. Use of the most widely used technology for constructing integrated circuits, the complementary metal-oxide semiconductor (CMOS) process, allows for low-cost, large-scale production and scaling of the device to higher densities and larger array sizes. We show the performance of the system by sequencing three bacterial genomes, its robustness and scalability by producing ion chips with up to 10 times as many sensors and sequencing a human genome.

  20. Scaling exponents for ordered maxima

    DOE PAGES

    Ben-Naim, E.; Krapivsky, P. L.; Lemons, N. W.

    2015-12-22

    We study extreme value statistics of multiple sequences of random variables. For each sequence with N variables, independently drawn from the same distribution, the running maximum is defined as the largest variable to date. We compare the running maxima of m independent sequences and investigate the probability S_N that the maxima are perfectly ordered, that is, the running maximum of the first sequence is always larger than that of the second sequence, which is always larger than the running maximum of the third sequence, and so on. The probability S_N is universal: it does not depend on the distribution from which the random variables are drawn. For two sequences, S_N ~ N^(-1/2), and in general, the decay is algebraic, S_N ~ N^(-σ_m), for large N. We analytically obtain the exponent σ_3 ≅ 1.302931 as the root of a transcendental equation. Moreover, the exponents σ_m grow with m, and we show that σ_m ~ m for large m.
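
    The advertised scaling is straightforward to probe numerically. The Monte Carlo sketch below (trial counts and seed are arbitrary choices, not from the paper) estimates S_N for m = 2 at two values of N; under S_N ~ N^(-1/2), quadrupling N should roughly halve the estimate:

```python
import numpy as np

def ordered_maxima_prob(n, m, trials, rng):
    """Estimate S_N: the probability that the running maxima of m i.i.d.
    sequences of length n stay strictly ordered at every step."""
    x = rng.random((trials, m, n))
    run = np.maximum.accumulate(x, axis=2)             # running maximum per sequence
    ordered = np.all(run[:, :-1, :] > run[:, 1:, :], axis=(1, 2))
    return float(ordered.mean())

rng = np.random.default_rng(0)
s100 = ordered_maxima_prob(100, 2, 5000, rng)
s400 = ordered_maxima_prob(400, 2, 5000, rng)

# S_N decays with N; the ratio s100/s400 should be near 2 for the -1/2 exponent.
print(s400 < s100)
```

    The universality claim can be checked the same way by swapping rng.random for, e.g., rng.exponential: the estimates should be statistically indistinguishable, since ordering of maxima depends only on ranks.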

  1. Current Barriers to Large-scale Interoperability of Traceability Technology in the Seafood Sector.

    PubMed

    Hardt, Marah J; Flett, Keith; Howell, Colleen J

    2017-08-01

    Interoperability is a critical component of full-chain digital traceability, but is almost nonexistent in the seafood industry. Using both quantitative and qualitative methodology, this study explores the barriers impeding progress toward large-scale interoperability among digital traceability systems in the seafood sector from the perspectives of seafood companies, technology vendors, and supply chains as a whole. We highlight lessons from recent research and field work focused on implementing traceability across full supply chains and make some recommendations for next steps in terms of overcoming challenges and scaling current efforts. © 2017 Institute of Food Technologists®.

  2. Fisher research in the US Rocky Mountains: A critical overview

    Treesearch

    Michael Schwartz; J. Sauder

    2013-01-01

    In this talk, we review the fisher research and monitoring efforts that have occurred throughout Idaho and Montana over the past two decades. We begin with a summary of the habitat-relationship work that has examined fisher habitat use at multiple scales. These studies have largely been conducted using radio and satellite telemetry, although a new, joint effort to use...

  3. Efforts to Address the Aging Academic Workforce: Assessing Progress through a Three-Stage Model of Institutional Change

    ERIC Educational Resources Information Center

    Kaskie, Brian; Walker, Mark; Andersson, Matthew

    2017-01-01

    The aging of the academic workforce is becoming more relevant to policy discussions in higher education. Yet there has been no formal, large-scale analysis of institutional efforts to develop policies and programs for aging employees. We fielded a representative survey of human resource specialists at 187 colleges and universities across the…

  4. Improvisation in evolution of genes and genomes: whose structure is it anyway?

    PubMed

    Shakhnovich, Boris E; Shakhnovich, Eugene I

    2008-06-01

    Significant progress has been made in recent years in a variety of seemingly unrelated fields such as sequencing, protein structure prediction, and high-throughput transcriptomics and metabolomics. At the same time, new microscopic models have been developed that made it possible to analyze the evolution of genes and genomes from first principles. The results from these efforts enable, for the first time, a comprehensive insight into the evolution of complex systems and organisms on all scales, from sequences to organisms and populations. Every newly sequenced genome uncovers new genes, families, and folds. Where do these new genes come from? How do gene duplication and subsequent divergence of sequence and structure affect the fitness of the organism? What role does regulation play in the evolution of proteins and folds? Emerging synergism between data and modeling provides the first robust answers to these questions.

  5. The sequence of sequencers: The history of sequencing DNA.

    PubMed

    Heather, James M; Chain, Benjamin

    2016-01-01

    Determining the order of nucleic acid residues in biological samples is an integral component of a wide variety of research applications. Over the last fifty years, many researchers have applied themselves to producing techniques and technologies to facilitate this feat: sequencing DNA and RNA molecules. This period has witnessed tremendous changes, moving from sequencing short oligonucleotides to millions of bases, and from struggling toward the deduction of the coding sequence of a single gene to rapid and widely available whole-genome sequencing. This article traverses those years, iterating through the different generations of sequencing technology and highlighting some of the key discoveries, researchers, and sequences along the way. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  6. Scalable Methods for Uncertainty Quantification, Data Assimilation and Target Accuracy Assessment for Multi-Physics Advanced Simulation of Light Water Reactors

    NASA Astrophysics Data System (ADS)

    Khuwaileh, Bassam

    High fidelity simulation of nuclear reactors entails large-scale applications characterized by high dimensionality and tremendous complexity, where various physics models are integrated in the form of coupled models (e.g. neutronics with thermal-hydraulic feedback). Each of the coupled modules represents a high fidelity formulation of the first principles governing the physics of interest. Therefore, new developments in high fidelity multi-physics simulation and the corresponding sensitivity/uncertainty quantification analysis are paramount to the development and competitiveness of reactors, achieved through enhanced understanding of the design and safety margins. Accordingly, this dissertation introduces efficient and scalable algorithms for performing Uncertainty Quantification (UQ), Data Assimilation (DA), and Target Accuracy Assessment (TAA) for large-scale, multi-physics reactor design and safety problems. This dissertation builds upon previous efforts on adaptive core simulation and reduced order modeling algorithms and extends them toward coupled multi-physics models with feedback. The core idea is to recast the reactor physics analysis in terms of reduced order models. This can be achieved by identifying the important/influential degrees of freedom (DoF) via subspace analysis, such that the required analysis can be recast in terms of the important DoF only. In this dissertation, efficient algorithms for lower dimensional subspace construction have been developed for single-physics and multi-physics applications with feedback. The reduced subspace is then used to solve realistic, large-scale forward (UQ) and inverse problems (DA and TAA). Once the elite set of DoF is determined, the uncertainty/sensitivity/target accuracy assessment and data assimilation analysis can be performed accurately and efficiently for large-scale, high-dimensional multi-physics nuclear engineering applications. 
Hence, in this work a Karhunen-Loeve (KL) based algorithm previously developed to quantify the uncertainty of single-physics models is extended to large-scale, multi-physics coupled problems with feedback effects. Moreover, a nonlinear surrogate-based UQ approach is developed and its performance compared with the KL approach and a brute-force Monte Carlo (MC) approach. In addition, an efficient Data Assimilation (DA) algorithm is developed to assess information about the model's parameters: nuclear data cross-sections and thermal-hydraulic parameters. Two improvements are introduced in order to perform DA on high-dimensional problems. First, a goal-oriented surrogate model can be used to replace the original models in the depletion sequence (MPACT-COBRA-TF-ORIGEN). Second, approximating the complex, high-dimensional solution space with a lower-dimensional subspace makes the sampling necessary for DA tractable for high-dimensional problems. Moreover, safety analysis and design optimization depend on the accurate prediction of various reactor attributes. Predictions can be enhanced by reducing the uncertainty associated with the attributes of interest. Accordingly, an inverse problem can be defined and solved to assess the contributions from sources of uncertainty, and experimental effort can subsequently be directed to further reduce the uncertainty associated with these sources. In this dissertation a subspace-based, gradient-free, nonlinear algorithm for inverse uncertainty quantification, namely Target Accuracy Assessment (TAA), has been developed and tested. The ideas proposed in this dissertation were first validated using lattice physics applications simulated with the SCALE6.1 package (Pressurized Water Reactor (PWR) and Boiling Water Reactor (BWR) lattice models). 
Ultimately, the algorithms proposed here were applied to perform UQ and DA for assembly-level problems (CASL progression problem 6) and core-wide problems representing Watts Bar Nuclear 1 (WBN1) for cycle 1 of depletion (CASL progression problem 9), modeled and simulated using VERA-CS, which consists of several coupled multi-physics models. The analysis and algorithms developed in this dissertation were implemented in a newly developed toolkit, the Reduced Order Modeling based Uncertainty/Sensitivity Estimator (ROMUSE).
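
The KL machinery the abstract refers to takes the following standard form (generic notation, not the dissertation's own):

```latex
% Karhunen-Loeve expansion of an uncertain input field \theta(x, \omega):
% \lambda_i, \phi_i are eigenpairs of the covariance kernel C(x, x'),
% and \xi_i are uncorrelated zero-mean, unit-variance random variables.
\theta(x,\omega) \approx \bar{\theta}(x)
  + \sum_{i=1}^{r} \sqrt{\lambda_i}\,\xi_i(\omega)\,\phi_i(x),
\qquad
\int C(x,x')\,\phi_i(x')\,dx' = \lambda_i\,\phi_i(x)
```

Truncating the sum at the r largest eigenvalues is exactly the kind of "important DoF" reduction the abstract describes: the retained modes span the lower-dimensional subspace on which UQ and DA are performed.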

  7. ExprAlign - the identification of ESTs in non-model species by alignment of cDNA microarray expression profiles

    PubMed Central

    2009-01-01

    Background Sequence identification of ESTs from non-model species offers distinct challenges, particularly when these species have duplicated genomes and when they are phylogenetically distant from sequenced model organisms. For the common carp, an environmental model of aquacultural interest, large numbers of ESTs remained unidentified using BLAST sequence alignment. We have used the expression profiles from large-scale microarray experiments to suggest gene identities. Results Expression profiles from ~700 cDNA microarrays describing responses of 7 major tissues to multiple environmental stressors were used to define a co-expression landscape. This was based on the Pearson correlation coefficient relating each gene to all other genes, from which a network description provided clusters of highly correlated genes as 'mountains'. We show that these contain genes with known identities and genes with unknown identities, and that the correlation constitutes evidence of identity in the latter. This procedure has suggested identities for 522 of 2701 unknown carp EST sequences. We also discriminate several common carp genes and gene isoforms that were not discriminated by BLAST sequence alignment alone. Precision in identification was substantially improved by the use of data from multiple tissues and treatments. Conclusion The detailed analysis of co-expression landscapes is a sensitive technique for suggesting an identity for the large number of BLAST-unidentified cDNAs generated in EST projects. It is capable of detecting even subtle changes in expression profiles, and thereby of distinguishing genes with a common BLAST identity into different identities. It benefits from the use of multiple treatments or contrasts, and from the large-scale microarray data. PMID:19939286
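
The core of such a co-expression lookup can be sketched in a few lines. The gene names and the correlation threshold below are illustrative, not taken from ExprAlign itself, and profiles are assumed non-constant:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def suggest_identity(unknown_profile, known_profiles, threshold=0.9):
    """Return the known gene whose expression profile correlates best
    with the unknown EST, provided the correlation clears the threshold;
    otherwise (None, threshold)."""
    best_name, best_r = None, threshold
    for name, profile in known_profiles.items():
        r = pearson(unknown_profile, profile)
        if r > best_r:
            best_name, best_r = name, r
    return best_name, best_r
```

In the paper's landscape approach, these pairwise correlations over many tissues and treatments feed a network clustering step; the snippet only illustrates the evidence-by-correlation idea for a single unknown EST.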

  8. Large-scale transcriptome sequencing and gene analyses in the crab-eating macaque (Macaca fascicularis) for biomedical research

    PubMed Central

    2012-01-01

    Background As a surrogate for humans, the crab-eating macaque (Macaca fascicularis) is an invaluable non-human primate model for biomedical research, but the lack of genetic information on this primate has represented a significant obstacle to its broader use. Results Here, we sequenced the transcriptome of 16 tissues originating from two individuals of crab-eating macaque (male and female), and identified genes to resolve the main obstacles to understanding the biological responses of the crab-eating macaque. From 4 million reads with 1.4 billion base sequences, 31,786 isotigs containing genes similar to those of humans, 12,672 novel isotigs, and 348,160 singletons were identified using the GS FLX sequencing method. Approximately 86% of human genes were represented among the genes sequenced in this study. Additionally, 175 tissue-specific transcripts were identified, 81 of which were experimentally validated. In total, 4,314 alternative splicing (AS) events were identified and analyzed. Intriguingly, 10.4% of AS events were associated with transposable element (TE) insertions. Finally, investigation of TE exonization events and evolutionary analysis were conducted, revealing interesting human-specific amplification trends in TE exonization events. Conclusions This report represents the first large-scale transcriptome sequencing and genetic analysis of M. fascicularis and could contribute to its utility for biomedical research and basic biology. PMID:22554259

  9. Training set extension for SVM ensemble in P300-speller with familiar face paradigm.

    PubMed

    Li, Qi; Shi, Kaiyang; Gao, Ning; Li, Jian; Bai, Ou

    2018-03-27

    P300-spellers are brain-computer interface (BCI)-based character input systems. Support vector machine (SVM) ensembles are trained with large-scale training sets and used as classifiers in these systems. However, the required large-scale training data necessitate a prolonged collection time for each subject, which results in data collected toward the end of the period being contaminated by the subject's fatigue. This study aimed to develop a method for acquiring more training data based on a collected small training set. A new method was developed in which two corresponding training datasets in two sequences are superposed and averaged to extend the training set. The proposed method was tested offline on a P300-speller with the familiar face paradigm. The SVM ensemble with extended training set achieved 85% classification accuracy for the averaged results of four sequences, and 100% for 11 sequences in the P300-speller. In contrast, the conventional SVM ensemble with non-extended training set achieved only 65% accuracy for four sequences, and 92% for 11 sequences. The SVM ensemble with extended training set achieves higher classification accuracies than the conventional SVM ensemble, which verifies that the proposed method effectively improves the classification performance of BCI P300-spellers, thus enhancing their practicality.
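
The superpose-and-average idea can be sketched as follows. The function and variable names are illustrative, and the real pipeline operates on multi-channel EEG epochs rather than these toy lists; averaging works because the P300 component is phase-locked to the stimulus while much of the noise is not:

```python
def extend_by_pairwise_average(seq1_epochs, seq2_epochs):
    """Average corresponding epochs (lists of samples) from two stimulus
    sequences to synthesize additional training trials, then return the
    originals plus the synthesized epochs as the extended training set."""
    synthesized = [
        [(a + b) / 2.0 for a, b in zip(e1, e2)]
        for e1, e2 in zip(seq1_epochs, seq2_epochs)
    ]
    return seq1_epochs + seq2_epochs + synthesized
```

Each pair of corresponding epochs yields one extra example, so two sequences of k epochs produce a training set of 3k epochs before SVM ensemble training.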

  10. Intercomparison Project on Parameterizations of Large-Scale Dynamics for Simulations of Tropical Convection

    NASA Astrophysics Data System (ADS)

    Sobel, A. H.; Wang, S.; Bellon, G.; Sessions, S. L.; Woolnough, S.

    2013-12-01

    Parameterizations of large-scale dynamics have been developed in the past decade for studying the interaction between tropical convection and large-scale dynamics, based on our physical understanding of the tropical atmosphere. A principal advantage of these methods is that they offer a pathway to attack the key question of what controls large-scale variations of tropical deep convection. These methods have been used with both single column models (SCMs) and cloud-resolving models (CRMs) to study the interaction of deep convection with several kinds of environmental forcings. While much has been learned from these efforts, different groups' efforts are somewhat hard to compare. Different models, different versions of the large-scale parameterization methods, and experimental designs that differ in other ways are used. It is not obvious which choices are consequential to the scientific conclusions drawn and which are not. The methods have matured to the point that there is value in an intercomparison project. In this context, the Global Atmospheric Systems Study - Weak Temperature Gradient (GASS-WTG) project was proposed at the Pan-GASS meeting in September 2012. The weak temperature gradient approximation is one method to parameterize large-scale dynamics, and is used in the project name for historical reasons and simplicity, but another method, the damped gravity wave (DGW) method, will also be used in the project. The goal of the GASS-WTG project is to develop community understanding of the parameterization methods currently in use. Their strengths, weaknesses, and functionality in models with different physics and numerics will be explored in detail, and their utility to improve our understanding of tropical weather and climate phenomena will be further evaluated. This presentation will introduce the intercomparison project, including background, goals, and overview of the proposed experimental design. 
Interested groups will be invited to join (it will not be too late), and preliminary results will be presented.

  11. Demonstrating the feasibility of large-scale development of standardized assays to quantify human proteins

    PubMed Central

    Kennedy, Jacob J.; Abbatiello, Susan E.; Kim, Kyunggon; Yan, Ping; Whiteaker, Jeffrey R.; Lin, Chenwei; Kim, Jun Seok; Zhang, Yuzheng; Wang, Xianlong; Ivey, Richard G.; Zhao, Lei; Min, Hophil; Lee, Youngju; Yu, Myeong-Hee; Yang, Eun Gyeong; Lee, Cheolju; Wang, Pei; Rodriguez, Henry; Kim, Youngsoo; Carr, Steven A.; Paulovich, Amanda G.

    2014-01-01

    The successful application of MRM in biological specimens raises the exciting possibility that assays can be configured to measure all human proteins, resulting in an assay resource that would promote advances in biomedical research. We report the results of a pilot study designed to test the feasibility of a large-scale, international effort in MRM assay generation. We have configured, validated across three laboratories, and made publicly available as a resource to the community 645 novel MRM assays representing 319 proteins expressed in human breast cancer. Assays were multiplexed in groups of >150 peptides and deployed to quantify endogenous analyte in a panel of breast cancer-related cell lines. Median assay precision was 5.4%, with high inter-laboratory correlation (R2 >0.96). Peptide measurements in breast cancer cell lines were able to discriminate amongst molecular subtypes and identify genome-driven changes in the cancer proteome. These results establish the feasibility of a scaled, international effort. PMID:24317253

  12. Transposon facilitated DNA sequencing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berg, D.E.; Berg, C.M.; Huang, H.V.

    1990-01-01

    The purpose of this research is to investigate and develop methods that exploit the power of bacterial transposable elements for large-scale DNA sequencing. Our premise is that using transposons to place primer binding sites randomly in target DNAs should provide access to all portions of large DNA fragments, without the inefficiencies of methods involving random subcloning and attendant repetitive sequencing, or of sequential synthesis of the many oligonucleotide primers needed to march systematically along a DNA molecule. Two unrelated bacterial transposons, Tn5 and {gamma}{delta}, are being used because they have both proven useful for molecular analyses, and because they differ sufficiently in mechanism and specificity of transposition to merit parallel development.

  13. Role of substrate quality on IC performance and yields

    NASA Technical Reports Server (NTRS)

    Thomas, R. N.

    1981-01-01

    The development of silicon and gallium arsenide crystal growth for the production of large-diameter substrates is discussed. Large-area substrates of significantly improved compositional purity, dopant distribution, and structural perfection on a microscopic as well as macroscopic scale are important requirements. The exploratory use of magnetic fields to suppress convection effects in Czochralski crystal growth is addressed. The growth of large crystals in space appears impractical at present; however, efforts to improve substrate quality could benefit from the experience gained in smaller-scale growth experiments conducted in the zero-gravity environment of space.

  14. New Tools For Understanding Microbial Diversity Using High-throughput Sequence Data

    NASA Astrophysics Data System (ADS)

    Knight, R.; Hamady, M.; Liu, Z.; Lozupone, C.

    2007-12-01

    High-throughput sequencing techniques such as 454 are straining the limits of tools traditionally used to build trees, choose OTUs, and perform other essential sequencing tasks. We have developed a workflow for phylogenetic analysis of large-scale sequence data sets that combines existing tools, such as the Arb phylogeny package and the NAST multiple sequence alignment tool, with new methods for choosing and clustering OTUs and for performing phylogenetic community analysis with UniFrac. This talk discusses the cyberinfrastructure we are developing to support the human microbiome project, and the application of these workflows to analyze very large data sets that contrast the gut microbiota with a range of physical environments. These tools will ultimately help to define core and peripheral microbiomes in a range of environments, and will allow us to understand the physical and biotic factors that contribute most to differences in microbial diversity.

  15. The Human Variome Project.

    PubMed

    Burn, John; Watson, Michael

    2016-06-01

    The practical realization of genomics has brought a growing awareness that variant interpretation is a major barrier to the practical use of DNA sequence data. The late Professor Dick Cotton devoted his life to innovation in molecular genetics and was a prime mover in the international response to the need to understand the "variome." His leadership resulted in the launch first of the Human Genetic Variation Society and then, in 2006, an international agreement to launch the Human Variome Project (HVP), aimed at integrating, through shared standards and infrastructure, the databases of variants being identified in families with a range of inherited disorders. The project attracted a network of affiliates across 81 countries and earned formal recognition by UNESCO, which now hosts its biennial meetings. It has also signed a Memorandum of Understanding with the World Health Organization. Future progress will depend on longer-term secure funding and integration with the efforts of the genomics community, where rapid advances in sequencing technology have enabled variant capture on a previously unimaginable scale. Efforts are underway to integrate the efforts of HVP with those of the Global Alliance for Genomics and Health to provide a lasting legacy of Dick Cotton's vision. © 2016 WILEY PERIODICALS, INC.

  16. Development of Experimental Icing Simulation Capability for Full-Scale Swept Wings: Hybrid Design Process, Years 1 and 2

    NASA Technical Reports Server (NTRS)

    Fujiwara, Gustavo; Bragg, Mike; Triphahn, Chris; Wiberg, Brock; Woodard, Brian; Loth, Eric; Malone, Adam; Paul, Bernard; Pitera, David; Wilcox, Pete; hide

    2017-01-01

    This report presents the key results from the first two years of a program to develop experimental icing simulation capabilities for full-scale swept wings. This investigation was undertaken as part of a larger collaborative research effort on ice accretion and aerodynamics for large-scale swept wings. Ice accretion and the resulting aerodynamic effects on large-scale swept wings present a significant airplane design and certification challenge to airframe manufacturers, certification authorities, and research organizations alike. While the effect of ice accretion on straight wings has been studied in detail for many years, the available data on swept-wing icing are much more limited, especially at larger scales.

  17. Dendrites, deep learning, and sequences in the hippocampus.

    PubMed

    Bhalla, Upinder S

    2017-10-12

    The hippocampus places us both in time and space. It does so over remarkably large spans: milliseconds to years, and centimeters to kilometers. This works for sensory representations, for memory, and for behavioral context. How does it fit in such wide ranges of time and space scales, and keep order among the many dimensions of stimulus context? A key organizing principle for a wide sweep of scales and stimulus dimensions is that of order in time, or sequences. Sequences of neuronal activity are ubiquitous in sensory processing, in motor control, in planning actions, and in memory. Against this strong evidence for the phenomenon, there are currently more models than definite experiments about how the brain generates ordered activity. The flip side of sequence generation is discrimination. Discrimination of sequences has been extensively studied at the behavioral, systems, and modeling levels, but again physiological mechanisms are fewer. It is against this backdrop that I discuss two recent developments in neural sequence computation that at face value share little beyond the label "neural." These are dendritic sequence discrimination, and deep learning. One derives from channel physiology and molecular signaling, the other from applied neural network theory, apparently extreme ends of the spectrum of neural circuit detail. I suggest that each of these topics has deep lessons about the possible mechanisms, scales, and capabilities of hippocampal sequence computation. © 2017 Wiley Periodicals, Inc.

  18. WormBase ParaSite - a comprehensive resource for helminth genomics.

    PubMed

    Howe, Kevin L; Bolt, Bruce J; Shafie, Myriam; Kersey, Paul; Berriman, Matthew

    2017-07-01

    The number of publicly available parasitic worm genome sequences has increased dramatically in the past three years, and research interest in helminth functional genomics is now quickly gathering pace in response to the foundation that has been laid by these collective efforts. A systematic approach to the organisation, curation, analysis and presentation of these data is clearly vital for maximising the utility of these data to researchers. We have developed a portal called WormBase ParaSite (http://parasite.wormbase.org) for interrogating helminth genomes on a large scale. Data from over 100 nematode and platyhelminth species are integrated, adding value by way of systematic and consistent functional annotation (e.g. protein domains and Gene Ontology terms), gene expression analysis (e.g. alignment of life-stage specific transcriptome data sets), and comparative analysis (e.g. orthologues and paralogues). We provide several ways of exploring the data, including genome browsers, genome and gene summary pages, text search, sequence search, a query wizard, bulk downloads, and programmatic interfaces. In this review, we provide an overview of the back-end infrastructure and analysis behind WormBase ParaSite, and the displays and tools available to users for interrogating helminth genomic data. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  19. Global Transmission Dynamics of Measles in the Measles Elimination Era.

    PubMed

    Furuse, Yuki; Oshitani, Hitoshi

    2017-04-16

    Although there have been many epidemiological reports of the inter-country transmission of measles, systematic analysis of the global transmission dynamics of the measles virus (MV) is limited. In this study, we applied phylogeographic analysis to characterize the global transmission dynamics of the MV using large-scale genetic sequence data (obtained for 7456 sequences) from 115 countries between 1954 and 2015. These analyses reveal the spatial and temporal characteristics of global transmission of the virus, especially in Australia, China, India, Japan, the UK, and the USA in the period since 1990. The transmission is frequently observed, not only within the same region but also among distant and frequently visited areas. Frequencies of export from measles-endemic countries, such as China, India, and Japan are high but decreasing, while the frequencies from countries where measles is no longer endemic, such as Australia, the UK, and the USA, are low but slightly increasing. The world is heading toward measles eradication, but the disease is still transmitted regionally and globally. Our analysis reveals that countries wherein measles is endemic and those having eliminated the disease (apart from occasional outbreaks) both remain a source of global transmission in this measles elimination era. It is therefore crucial to maintain vigilance in efforts to monitor and eradicate measles globally.

  20. Genome structure and primitive sex chromosome revealed in Populus

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tuskan, Gerald A; Yin, Tongming; Gunter, Lee E

    We constructed a comprehensive genetic map for Populus and ordered 332 Mb of sequence scaffolds along the 19 haploid chromosomes in order to compare chromosomal regions among diverse members of the genus. These efforts led us to conclude that chromosome XIX in Populus is evolving into a sex chromosome. Consistent segregation distortion in favor of the sub-genus Tacamahaca alleles provided evidence of divergent selection among species, particularly at the proximal end of chromosome XIX. A large microsatellite marker (SSR) cluster was detected in the distorted region even though the genome-wide distribution of SSR sites was uniform across the physical map. The differences between the genetic map and the physical sequence data suggested that recombination suppression was occurring in the distorted region. A gender-determination locus and an overabundance of NBS-LRR genes were also co-located in the distorted region and were put forth as the cause for divergent selection and recombination suppression. This hypothesis was verified by fine-scale mapping of an integrated scaffold in the vicinity of the gender-determination locus. As such, it appears that chromosome XIX in Populus is in the process of evolving from an autosome into a sex chromosome, and that NBS-LRR genes may play an important role in the chromosomal diversification process in Populus.

  1. MAPU: Max-Planck Unified database of organellar, cellular, tissue and body fluid proteomes.

    PubMed

    Zhang, Yanling; Zhang, Yong; Adachi, Jun; Olsen, Jesper V; Shi, Rong; de Souza, Gustavo; Pasini, Erica; Foster, Leonard J; Macek, Boris; Zougman, Alexandre; Kumar, Chanchal; Wisniewski, Jacek R; Jun, Wang; Mann, Matthias

    2007-01-01

    Mass spectrometry (MS)-based proteomics has become a powerful technology to map the protein composition of organelles, cell types and tissues. In our department, a large-scale effort to map these proteomes is complemented by the Max-Planck Unified (MAPU) proteome database. MAPU contains several body fluid proteomes, including plasma, urine, and cerebrospinal fluid. Cell lines have been mapped to a depth of several thousand proteins and the red blood cell proteome has also been analyzed in depth. The liver proteome is represented with 3200 proteins. By employing high-resolution MS and stringent validation criteria, false positive identification rates in MAPU are lower than 1:1000. Thus MAPU datasets can serve as reference proteomes in biomarker discovery. MAPU contains the peptides identifying each protein, measured masses, scores and intensities and is freely available at http://www.mapuproteome.com using a clickable interface of cell or body parts. Proteome data can be queried across proteomes by protein name, accession number, sequence similarity, peptide sequence and annotation information. More than 4500 mouse and 2500 human proteins have already been identified in at least one proteome. Basic annotation information and links to other public databases are provided in MAPU, and we plan to add further analysis tools.

  2. Localized structural frustration for evaluating the impact of sequence variants

    PubMed Central

    Kumar, Sushant; Clarke, Declan; Gerstein, Mark

    2016-01-01

    Population-scale sequencing is increasingly uncovering large numbers of rare single-nucleotide variants (SNVs) in coding regions of the genome. The rarity of these variants makes it challenging to evaluate their deleteriousness with conventional phenotype–genotype associations. Protein structures provide a way of addressing this challenge. Previous efforts have focused on globally quantifying the impact of SNVs on protein stability. However, local perturbations may severely impact protein functionality without strongly disrupting global stability (e.g. in relation to catalysis or allostery). Here, we describe a workflow in which localized frustration, quantifying unfavorable local interactions, is employed as a metric to investigate such effects. Using this workflow on the Protein Data Bank, we find that frustration produces many immediately intuitive results: for instance, disease-related SNVs create stronger changes in localized frustration than non-disease-related variants, and rare SNVs tend to disrupt local interactions to a larger extent than common variants. Less obviously, we observe that somatic SNVs associated with oncogenes and tumor suppressor genes (TSGs) induce very different changes in frustration. In particular, those associated with TSGs change the frustration more in the core than the surface (by introducing loss-of-function events), whereas those associated with oncogenes manifest the opposite pattern, creating gain-of-function events. PMID:27915290

  3. A gravitational puzzle.

    PubMed

    Caldwell, Robert R

    2011-12-28

    The challenge to understand the physical origin of the cosmic acceleration is framed as a problem of gravitation. Specifically, does the relationship between stress-energy and space-time curvature differ on large scales from the predictions of general relativity? In this article, we describe efforts to model and test a generalized relationship between the matter and the metric using cosmological observations. Late-time tracers of large-scale structure, including the cosmic microwave background, weak gravitational lensing, and clustering, are shown to provide good tests of the proposed solution. Current data come very close to providing a critical test, leaving only a small window in parameter space in the case that the generalized relationship is scale-free above galactic scales.

  4. Spatially Resolved Spectroscopy of Narrow-line Seyfert 1 Host Galaxies

    NASA Astrophysics Data System (ADS)

    Scharwächter, J.; Husemann, B.; Busch, G.; Komossa, S.; Dopita, M. A.

    2017-10-01

    We present optical integral field spectroscopy for five z < 0.062 narrow-line Seyfert 1 (NLS1) galaxies, probing their host galaxies at ≳2–3 kpc scales. Emission lines from the active galactic nucleus (AGN) and the large-scale host galaxy are analyzed separately, based on an AGN-host decomposition technique. The host galaxy gas kinematics indicates large-scale gas rotation in all five sources. At the probed scales of ≳2–3 kpc, the host galaxy gas is found to be predominantly ionized by star formation without any evidence of a strong AGN contribution. None of the five objects shows specific star formation rates (SFRs) exceeding the main sequence of low-redshift star-forming galaxies. The specific SFRs for MCG-05-01-013 and WPVS 007 are roughly consistent with the main sequence, while ESO 399-IG20, MS 22549-3712, and TON S180 show lower specific SFRs, intermediate between the main sequence and the red quiescent galaxies. The host galaxy metallicities, derived for the two sources with sufficient data quality (ESO 399-IG20 and MCG-05-01-013), indicate central oxygen abundances just below the low-redshift mass-metallicity relation. Based on this initial case study, we outline a comparison of AGN and host galaxy parameters as a starting point for future extended NLS1 studies with similar methods.

  5. Seismic Parameters of Mining-Induced Aftershock Sequences for Re-entry Protocol Development

    NASA Astrophysics Data System (ADS)

    Vallejos, Javier A.; Estay, Rodrigo A.

    2018-03-01

    A common characteristic of deep mines in hard rock is induced seismicity. This results from stress changes and rock failure around mining excavations. Following large seismic events, there is an increase in the levels of seismicity, which gradually decays with time. Restricting access to areas of a mine for enough time to allow this decay of seismic events is the main approach in re-entry strategies. The statistical properties of aftershock sequences can be studied with three scaling relations: (1) the Gutenberg-Richter frequency-magnitude relation, (2) the modified Omori law (MOL) for the temporal decay, and (3) Båth's law for the magnitude of the largest aftershock. In this paper, these three scaling relations, in addition to the stochastic Reasenberg-Jones model, are applied to study the characteristic parameters of 11 large-magnitude mining-induced aftershock sequences in four mines in Ontario, Canada. To provide guidelines for re-entry protocol development, the dependence of the scaling relation parameters on the magnitude of the main event is studied. Some relations between the parameters and the magnitude of the main event are found. Using these relationships and the scaling relations, a space-time-magnitude re-entry protocol is developed. These findings provide a first approximation to concise and well-justified guidelines for re-entry protocol development applicable to the range of mining conditions found in Ontario, Canada.
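The three scaling relations named in this abstract have simple closed forms. A minimal sketch follows; the parameter values (a, b, K, c, p and the Båth offset) are illustrative placeholders, not fitted values from the study:

```python
def gutenberg_richter_count(a, b, magnitude):
    """Gutenberg-Richter: number of events with magnitude >= M, log10 N = a - b*M."""
    return 10 ** (a - b * magnitude)

def modified_omori_rate(t, K, c, p):
    """Modified Omori law (MOL): aftershock rate n(t) = K / (c + t)**p after the main event."""
    return K / (c + t) ** p

def bath_largest_aftershock(main_magnitude, delta=1.2):
    """Bath's law: the largest aftershock is roughly delta magnitude units below the main event."""
    return main_magnitude - delta

# Illustrative decay of the aftershock rate with time (t in days after the main event)
rates = [modified_omori_rate(t, K=100.0, c=0.1, p=1.1) for t in (0.0, 1.0, 10.0)]
```

A re-entry rule in this spirit would reopen an area once the MOL rate falls below a chosen background level; the actual space-time-magnitude protocol in the paper is fitted to the Ontario mine data.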

  6. Genotyping-by-sequencing for Populus population genomics: An assessment of genome sampling patterns and filtering approaches

    Treesearch

    Martin P. Schilling; Paul G. Wolf; Aaron M. Duffy; Hardeep S. Rai; Carol A. Rowe; Bryce A. Richardson; Karen E. Mock

    2014-01-01

    Continuing advances in nucleotide sequencing technology are inspiring a suite of genomic approaches in studies of natural populations. Researchers are faced with data management and analytical scales that are increasing by orders of magnitude. With such dramatic advances comes a need to understand biases and error rates, which can be propagated and magnified in large-...

  7. Large-scale deletions of the ABCA1 gene in patients with hypoalphalipoproteinemia.

    PubMed

    Dron, Jacqueline S; Wang, Jian; Berberich, Amanda J; Iacocca, Michael A; Cao, Henian; Yang, Ping; Knoll, Joan; Tremblay, Karine; Brisson, Diane; Netzer, Christian; Gouni-Berthold, Ioanna; Gaudet, Daniel; Hegele, Robert A

    2018-06-04

    Copy-number variations (CNVs) have been studied in the context of familial hypercholesterolemia but have not yet been evaluated in patients with extremes of high-density lipoprotein (HDL) cholesterol levels. We evaluated targeted next-generation sequencing data from patients with very low HDL cholesterol (i.e. hypoalphalipoproteinemia) using the VarSeq-CNV caller algorithm to screen for CNVs disrupting the ABCA1, LCAT or APOA1 genes. In four individuals, we found three unique deletions in ABCA1: a heterozygous deletion of exon 4, a heterozygous deletion spanning exons 8 to 31, and a heterozygous deletion of the entire ABCA1 gene. Breakpoints were identified using Sanger sequencing, and the full-gene deletion was also confirmed using exome sequencing and the Affymetrix CytoScan™ HD Array. Before now, large-scale deletions in candidate HDL genes have not been associated with hypoalphalipoproteinemia; our findings indicate that CNVs in ABCA1 may be a previously unappreciated genetic determinant of low HDL cholesterol levels. By coupling bioinformatic analyses with next-generation sequencing data, we can successfully assess the spectrum of genetic determinants of many dyslipidemias, now including hypoalphalipoproteinemia. Published under license by The American Society for Biochemistry and Molecular Biology, Inc.

  8. Scaling-up NLP Pipelines to Process Large Corpora of Clinical Notes.

    PubMed

    Divita, G; Carter, M; Redd, A; Zeng, Q; Gupta, K; Trautner, B; Samore, M; Gundlapalli, A

    2015-01-01

    This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". This paper describes the scale-up efforts at the VA Salt Lake City Health Care System to address processing large corpora of clinical notes through a natural language processing (NLP) pipeline. The use case described is a current project focused on detecting the presence of an indwelling urinary catheter in hospitalized patients and subsequent catheter-associated urinary tract infections. An NLP algorithm using v3NLP was developed to detect the presence of an indwelling urinary catheter in hospitalized patients. The algorithm was tested on a small corpus of notes on patients for whom the presence or absence of a catheter was already known (reference standard). In planning for a scale-up, we estimated that the original algorithm would have taken 2.4 days to run on a larger corpus of notes for this project (550,000 notes), and 27 days for a corpus of 6 million records representative of a national sample of notes. We approached scaling up NLP pipelines through three techniques: pipeline replication via multi-threading, intra-annotator threading for tasks that can be further decomposed, and remote annotator services that enable annotator scale-out. The scale-up reduced the average time to process a record from 206 milliseconds to 17 milliseconds, a 12-fold increase in performance, when applied to a corpus of 550,000 notes. Purposely simplistic in nature, these scale-up efforts are a straightforward evolution from small-scale NLP processing to larger-scale extraction without incurring the complexities introduced by the underlying UIMA framework. These efforts represent generalizable and widely applicable techniques that will aid other computationally complex NLP pipelines that need to be scaled out for processing and analyzing big data.
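The first scale-up technique described, pipeline replication via multi-threading, can be sketched in a few lines. The `annotate` function below is a hypothetical stand-in for a real annotator (v3NLP itself is a Java/UIMA system); the idea is only that each worker runs an independent copy of the annotator over its share of the notes:

```python
from concurrent.futures import ThreadPoolExecutor

def annotate(note):
    # Hypothetical stand-in annotator: flag notes mentioning a urinary catheter.
    return {"note": note, "catheter": "catheter" in note.lower()}

def process_corpus(notes, workers=4):
    # Pipeline replication: run independent annotator copies in parallel threads.
    # This helps most when the annotator is I/O-bound (e.g. a remote service).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(annotate, notes))

results = process_corpus([
    "Foley catheter placed on admission.",
    "No urinary complaints today.",
])
```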

  9. Urban Greening Bay Area

    EPA Pesticide Factsheets

    Information about the San Francisco Bay Water Quality Project (SFBWQP) Urban Greening Bay Area, a large-scale effort to re-envision urban landscapes to include green infrastructure (GI), making communities more livable and reducing stormwater runoff.

  10. SaDA: From Sampling to Data Analysis—An Extensible Open Source Infrastructure for Rapid, Robust and Automated Management and Analysis of Modern Ecological High-Throughput Microarray Data

    PubMed Central

    Singh, Kumar Saurabh; Thual, Dominique; Spurio, Roberto; Cannata, Nicola

    2015-01-01

    One of the most crucial aspects of day-to-day laboratory information management is the collection, storage and retrieval of information about research subjects and environmental or biomedical samples. An efficient link between sample data and experimental results is essential to the successful outcome of a collaborative project. Currently available software solutions are largely limited to large-scale, expensive commercial Laboratory Information Management Systems (LIMS). Acquiring such a LIMS can indeed bring laboratory information management to a higher level, but it usually requires a significant investment of money, time and technical effort. There is a clear need for a lightweight open-source system that can easily be run on local servers and managed by individual researchers. Here we present SaDA, a software package for storing, retrieving and analyzing data originating from microorganism monitoring experiments. SaDA fully integrates the management of environmental samples, oligonucleotide sequences, microarray data and the subsequent downstream analysis procedures. It is simple, generic software, and can be extended and customized for various environmental and biomedical studies. PMID:26047146

  11. From sexless to sexy: Why it is time for human genetics to consider and report analyses of sex.

    PubMed

    Powers, Matthew S; Smith, Phillip H; McKee, Sherry A; Ehringer, Marissa A

    2017-01-01

    Science has come a long way with regard to the consideration of sex differences in clinical and preclinical research, but one field remains behind the curve: human statistical genetics. The goal of this commentary is to raise awareness and discussion about how to best consider and evaluate possible sex effects in the context of large-scale human genetic studies. Over the course of this commentary, we reinforce the importance of interpreting genetic results in the context of biological sex, establish evidence that sex differences are not being considered in human statistical genetics, and discuss how best to conduct and report such analyses. Our recommendation is to run stratified analyses by sex no matter the sample size or the result and report the findings. Summary statistics from stratified analyses are helpful for meta-analyses, and patterns of sex-dependent associations may be hidden in a combined dataset. In the age of declining sequencing costs, large consortia efforts, and a number of useful control samples, it is now time for the field of human genetics to appropriately include sex in the design, analysis, and reporting of results.

  12. SaDA: From Sampling to Data Analysis-An Extensible Open Source Infrastructure for Rapid, Robust and Automated Management and Analysis of Modern Ecological High-Throughput Microarray Data.

    PubMed

    Singh, Kumar Saurabh; Thual, Dominique; Spurio, Roberto; Cannata, Nicola

    2015-06-03

    One of the most crucial aspects of day-to-day laboratory information management is the collection, storage and retrieval of information about research subjects and environmental or biomedical samples. An efficient link between sample data and experimental results is essential to the successful outcome of a collaborative project. Currently available software solutions are largely limited to large-scale, expensive commercial Laboratory Information Management Systems (LIMS). Acquiring such a LIMS can indeed bring laboratory information management to a higher level, but it usually requires a significant investment of money, time and technical effort. There is a clear need for a lightweight open-source system that can easily be run on local servers and managed by individual researchers. Here we present SaDA, a software package for storing, retrieving and analyzing data originating from microorganism monitoring experiments. SaDA fully integrates the management of environmental samples, oligonucleotide sequences, microarray data and the subsequent downstream analysis procedures. It is simple, generic software, and can be extended and customized for various environmental and biomedical studies.

  13. Insertion Sequence-Caused Large Scale-Rearrangements in the Genome of Escherichia coli

    DTIC Science & Technology

    2016-07-18

    rearrangements in the genome of Escherichia coli. Heewook Lee, Thomas G. Doak, Ellen Popodi, Patricia L. Foster and Haixu Tang. School of...and excisions of IS elements and recombination between homologous IS elements identified in a large collection of Escherichia coli mutation accu...scale rearrangements arose in the Escherichia coli genome during a long-term evolution experiment in a recent study (8). Combining WGSS with

  14. Recent developments in microfluidic large scale integration.

    PubMed

    Araci, Ismail Emre; Brisk, Philip

    2014-02-01

    In 2002, Thorsen et al. integrated thousands of micromechanical valves on a single microfluidic chip and demonstrated that the control of fluidic networks can be simplified through multiplexors [1]. This enabled the realization of highly parallel and automated fluidic processes with a substantial sample-economy advantage. Moreover, the fabrication of these devices by multilayer soft lithography was easy and reliable, which contributed to the power of the technology: microfluidic large-scale integration (mLSI). Since then, mLSI has found use in a wide variety of applications in biology and chemistry. In the meantime, efforts to improve the technology have been ongoing, mostly focused on novel materials, components, micromechanical valve actuation methods, and chip architectures for mLSI. In this review, these technological advances are discussed and recent examples of mLSI applications are summarized. Copyright © 2013 Elsevier Ltd. All rights reserved.

  15. A large-scale examination of the nature and efficacy of teachers' practices to engage parents: assessment, parental contact, and student-level impact.

    PubMed

    Seitsinger, Anne M; Felner, Robert D; Brand, Stephen; Burns, Amy

    2008-08-01

    As schools move forward with comprehensive school reform, parents' roles have shifted and been redefined. Parent-teacher communication is critical to student success, yet how schools and teachers contact parents is the subject of few studies. Evaluations of school-change efforts require reliable and useful measures of teachers' practices in communicating with parents. The structure of teacher-parent-contact practices was examined using data from multiple, longitudinal cohorts of schools and teachers from a large-scale project and found to be a reliable and stable measure of parent contact across building levels and localities. Teacher/school practices in contacting parents were found to be significantly related to parent reports of school contact performance and student academic adjustment and achievement. Implications for school improvement efforts are discussed.

  16. Integrating market processes into utility resource planning

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kahn, E.P.

    1992-11-01

    Integrated resource planning has resulted in an abundance of alternatives for meeting existing and new demand for electricity services: (1) utility demand-side management (DSM) programs, (2) DSM bidding, (3) competitive bidding for private power supplies, (4) utility re-powering, and (5) new utility construction. Each alternative relies on a different degree of planning for implementation and, therefore, each alternative relies on markets to a greater or lesser degree. This paper shows how the interaction of planning processes and market forces results in resource allocations among the alternatives. The discussion focuses on three phenomena that are driving forces behind the unanticipated consequences of contemporary integrated resource planning efforts. These forces are: (1) large-scale DSM efforts, (2) customer bypass, and (3) large-scale independent power projects. 22 refs., 3 figs., 2 tabs.

  17. GWASeq: targeted re-sequencing follow up to GWAS.

    PubMed

    Salomon, Matthew P; Li, Wai Lok Sibon; Edlund, Christopher K; Morrison, John; Fortini, Barbara K; Win, Aung Ko; Conti, David V; Thomas, Duncan C; Duggan, David; Buchanan, Daniel D; Jenkins, Mark A; Hopper, John L; Gallinger, Steven; Le Marchand, Loïc; Newcomb, Polly A; Casey, Graham; Marjoram, Paul

    2016-03-03

    For the last decade the conceptual framework of the Genome-Wide Association Study (GWAS) has dominated the investigation of human disease and other complex traits. While GWAS have been successful in identifying a large number of variants associated with various phenotypes, the overall amount of heritability explained by these variants remains small. This raises the question of how best to follow up on a GWAS, localize causal variants accounting for GWAS hits, and as a consequence explain more of the so-called "missing" heritability. Advances in high-throughput sequencing technologies now allow for the efficient and cost-effective collection of vast amounts of fine-scale genomic data to complement GWAS. We investigate these issues using a colon cancer dataset. After QC, our data consisted of 1993 cases and 899 controls. Using marginal tests of association, we identify 10 variants distributed among six targeted regions that are significantly associated with colorectal cancer, with eight of the variants being novel to this study. Additionally, we perform so-called 'SNP-set' tests of association and identify two sets of variants that implicate both common and rare variants in the etiology of colorectal cancer. Here we present a large-scale targeted re-sequencing resource focusing on genomic regions implicated in colorectal cancer susceptibility previously identified in several GWAS, which aims to 1) provide fine-scale targeted sequencing data for fine-mapping and 2) provide data resources to address methodological questions regarding the design of sequencing-based follow-up studies to GWAS. Additionally, we show that this strategy successfully identifies novel variants associated with colorectal cancer susceptibility and can implicate both common and rare variants.
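A marginal test of association of the kind referred to above is, in its simplest allelic form, a Pearson chi-square test on a 2x2 table of allele counts in cases versus controls. A minimal sketch (the counts in the example are made up, not data from the study):

```python
def chi2_allelic(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square statistic on a 2x2 allele-count table (1 degree of freedom)."""
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_case, row_ctrl = case_alt + case_ref, ctrl_alt + ctrl_ref
    col_alt, col_ref = case_alt + ctrl_alt, case_ref + ctrl_ref
    stat = 0.0
    for obs, row, col in [(case_alt, row_case, col_alt), (case_ref, row_case, col_ref),
                          (ctrl_alt, row_ctrl, col_alt), (ctrl_ref, row_ctrl, col_ref)]:
        expected = row * col / n          # expected count under independence
        stat += (obs - expected) ** 2 / expected
    return stat

# Identical allele frequencies in cases and controls -> statistic of 0
null_stat = chi2_allelic(50, 50, 50, 50)
# Enriched alternate allele in cases -> large statistic (compare to chi2 critical value 3.84)
assoc_stat = chi2_allelic(60, 40, 40, 60)
```

In practice one would apply a multiple-testing correction across all tested variants; SNP-set tests such as those in the abstract aggregate evidence across a region instead.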

  18. Evaluation Findings from High School Reform Efforts in Baltimore

    ERIC Educational Resources Information Center

    Smerdon, Becky; Cohen, Jennifer

    2009-01-01

    The Baltimore City Public School System (BCPSS) is one of the first urban districts in the country to undertake large-scale high school reform, phasing in small learning communities by opening new high schools and transforming large, comprehensive high schools into small high schools. With support from the Bill & Melinda Gates Foundation, a…

  19. A Navy Shore Activity Manpower Planning System for Civilians. Technical Report No. 24.

    ERIC Educational Resources Information Center

    Niehaus, R. J.; Sholtz, D.

    This report describes the U.S. Navy Shore Activity Manpower Planning System (SAMPS) advanced development research project. This effort is aimed at large-scale feasibility tests of manpower models for large Naval installations. These local planning systems are integrated with Navy-wide information systems on a data-communications network accessible…

  20. Analysis of Newton's Third Law Questions on the Force Concepts Inventory at Georgia State University

    NASA Astrophysics Data System (ADS)

    Oakley, Christopher; Thoms, Brian

    2012-03-01

    A major emphasis of the Physics Education Research program at Georgia State University is an effort to assess and improve students' understanding of Newton's Laws concepts. As part of these efforts the Force Concepts Inventory (FCI) has been given to students in both the algebra-based and calculus-based introductory physics sequences. In addition, the algebra-based introductory physics sequence is taught in both a SCALE-UP and a traditional lecture format. The results of the FCI have been analyzed by individual question and also as categorized by content. The analysis indicates that students in both algebra and calculus-based courses are successful at overcoming Aristotelian misconceptions regarding Newton's Third Law (N3) in the context of a stationary system. However, students are less successful on N3 questions involving objects in constant motion or accelerating. Interference between understanding of Newton's Second and Third Laws as well as other possible explanations for lower student performance on N3 questions involving non-stationary objects will be discussed.

  1. Brief Self-Report Scales Assessing Life History Dimensions of Mating and Parenting Effort.

    PubMed

    Kruger, Daniel J

    2017-01-01

    Life history theory (LHT) is a powerful evolutionary framework for understanding physiological, psychological, and behavioral variation both between and within species. Researchers and theorists are increasingly integrating LHT into evolutionary psychology, as it provides a strong foundation for research across many topical areas. Human life history variation has been represented in psychological and behavioral research in several ways, including indicators of conditions in the developmental environment, indicators of conditions in the current environment, and indicators of maturation and life milestones (e.g., menarche, initial sexual activity, first pregnancy), and in self-report survey scale measures. Survey scale measures have included constructs such as time perspective and future discounting, although the most widely used index is a constellation of indicators assessing the K-factor, thought to index general life history speed (from fast to slow). The current project examined the utility of two brief self-report survey measures assessing the life history dimensions of mating effort and parenting effort with a large undergraduate sample in the United States. Consistent with the theory, items reflected two inversely related dimensions. In regressions including the K-factor, the Mating Effort Scale proved to be a powerful predictor of other constructs and indicators related to life history variation. The Parenting Effort Scale had less predictive power overall, although it explained unique variance across several constructs and was the only unique predictor of the number of long-term (serious and committed) relationships. These scales may be valuable additions to self-report survey research projects examining life history variation.

  2. Wireless in-situ Sensor Network for Agriculture and Water Monitoring on a River Basin Scale in Southern Finland: Evaluation from a Data User’s Perspective

    PubMed Central

    Kotamäki, Niina; Thessler, Sirpa; Koskiaho, Jari; Hannukkala, Asko O.; Huitu, Hanna; Huttula, Timo; Havento, Jukka; Järvenpää, Markku

    2009-01-01

    Sensor networks are increasingly being implemented for environmental monitoring and agriculture to provide spatially accurate and continuous environmental information and (near) real-time applications. These networks provide a large amount of data, which poses challenges for ensuring data quality and extracting relevant information. In the present paper we describe a river basin scale wireless sensor network for agriculture and water monitoring. The network, called SoilWeather, is unique and the first of this type in Finland. The performance of the network is assessed from the user and maintainer perspectives, concentrating on data quality, network maintenance and applications. The results showed that the SoilWeather network has been functioning in a relatively reliable way, but also that maintenance and data quality assurance by automatic algorithms and calibration samples require a lot of effort, especially in continuous water monitoring over large areas. We see great benefit in sensor networks enabling continuous, real-time monitoring, while the data quality control and maintenance effort highlights the need for tight collaboration between sensor and sensor network owners to decrease costs and increase the quality of the sensor data in large-scale applications. PMID:22574050

  3. Analyzing large scale genomic data on the cloud with Sparkhit

    PubMed Central

    Huang, Liren; Krüger, Jan

    2018-01-01

    Motivation: The increasing amount of next-generation sequencing data poses a fundamental challenge for large-scale genomic analytics. Existing tools use different distributed computational platforms to scale out bioinformatics workloads. However, the scalability of these tools is not efficient. Moreover, they have heavy run-time overheads when pre-processing large amounts of data. To address these limitations, we have developed Sparkhit: a distributed bioinformatics framework built on top of the Apache Spark platform. Results: Sparkhit integrates a variety of analytical methods. It is implemented in the Spark extended MapReduce model. It runs 92–157 times faster than MetaSpark on metagenomic fragment recruitment and 18–32 times faster than Crossbow on data pre-processing. We analyzed 100 terabytes of data across four genomic projects in the cloud in 21 h, including the run times of cluster deployment and data downloading. Furthermore, our application on the entire Human Microbiome Project shotgun sequencing data was completed in 2 h, presenting an approach to easily associate large amounts of public datasets with reference data. Availability and implementation: Sparkhit is freely available at: https://rhinempi.github.io/sparkhit/. Contact: asczyrba@cebitec.uni-bielefeld.de. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:29253074

  4. RAID-2: Design and implementation of a large scale disk array controller

    NASA Technical Reports Server (NTRS)

    Katz, R. H.; Chen, P. M.; Drapeau, A. L.; Lee, E. K.; Lutz, K.; Miller, E. L.; Seshan, S.; Patterson, D. A.

    1992-01-01

    We describe the implementation of a large scale disk array controller and subsystem incorporating over 100 high performance 3.5 inch disk drives. It is designed to provide 40 MB/s sustained performance and 40 GB capacity in three 19 inch racks. The array controller forms an integral part of a file server that attaches to a Gb/s local area network. The controller implements a high bandwidth interconnect between an interleaved memory, an XOR calculation engine, the network interface (HIPPI), and the disk interfaces (SCSI). The system is now functionally operational, and we are tuning its performance. We review the design decisions, history, and lessons learned from this three year university implementation effort to construct a truly large scale system assembly.

  5. Progress in ion torrent semiconductor chip based sequencing.

    PubMed

    Merriman, Barry; Rothberg, Jonathan M

    2012-12-01

    In order for next-generation sequencing to become widely used as a diagnostic in the healthcare industry, sequencing instrumentation will need to be mass produced with a high degree of quality and economy. One way to achieve this is to recast DNA sequencing in a format that fully leverages the manufacturing base created for computer chips, complementary metal-oxide semiconductor (CMOS) chip fabrication, which is the current pinnacle of large-scale, high-quality, low-cost manufacturing of high technology. To achieve this, ideally the entire sensory apparatus of the sequencer would be embodied in a standard semiconductor chip, manufactured in the same fab facilities used for logic and memory chips. Recently, such a sequencing chip, and the associated sequencing platform, has been developed and commercialized by Ion Torrent, a division of Life Technologies, Inc. Here we provide an overview of this semiconductor chip based sequencing technology, and summarize the progress made since its commercial introduction. We describe in detail the progress in chip scaling, sequencing throughput, read length, and accuracy. We also summarize the enhancements in the associated platform, including sample preparation, data processing, and engagement of the broader development community through open source and crowdsourcing initiatives. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  6. Assembly of 500,000 inter-specific catfish expressed sequence tags and large scale gene-associated marker development for whole genome association studies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Catfish Genome Consortium; Wang, Shaolin; Peatman, Eric

    2010-03-23

    Background: Through the Community Sequencing Program, a catfish EST sequencing project was carried out in collaboration between the catfish research community and the Department of Energy's Joint Genome Institute. Prior to this project, only a limited EST resource from catfish was available for the purpose of SNP identification. Results: A total of 438,321 quality ESTs were generated from 8 channel catfish (Ictalurus punctatus) and 4 blue catfish (Ictalurus furcatus) libraries, bringing the number of catfish ESTs to nearly 500,000. Assembly of all catfish ESTs resulted in 45,306 contigs and 66,272 singletons. Over 35% of the unique sequences had significant similarities to known genes, allowing the identification of 14,776 unique genes in catfish. Over 300,000 putative SNPs have been identified, of which approximately 48,000 are high-quality SNPs identified from contigs with at least four sequences and the minor allele present in at least two sequences in the contig. The EST resource should be valuable for identification of microsatellites, genome annotation, large-scale expression analysis, and comparative genome analysis. Conclusions: This project generated a large EST resource for catfish that captured the majority of the catfish transcriptome. The parallel analysis of ESTs from two closely related Ictalurid catfishes should also provide powerful means for the evaluation of ancient and recent gene duplications, and for the development of high-density microarrays in catfish. The inter- and intra-specific SNPs identified from the full catfish EST assembly will greatly benefit the catfish introgression breeding program and whole-genome association studies.
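The high-quality SNP criterion quoted in this record (contigs with at least four sequences, and the minor allele present in at least two of them) is straightforward to express as a filter. A minimal sketch; the record layout (`id`, `allele_counts`) is a hypothetical simplification of real EST-assembly output:

```python
def high_quality_snps(snps, min_depth=4, min_minor=2):
    """Keep putative SNPs whose contig has >= min_depth sequences and whose
    minor allele appears in >= min_minor of them."""
    kept = []
    for snp in snps:
        counts = snp["allele_counts"]            # e.g. {"A": 5, "G": 2}
        depth = sum(counts.values())             # sequences covering the site
        minor = min(counts.values())             # minor-allele observations
        if depth >= min_depth and len(counts) > 1 and minor >= min_minor:
            kept.append(snp)
    return kept

candidates = [
    {"id": "snp1", "allele_counts": {"A": 5, "G": 2}},  # depth 7, minor 2: passes
    {"id": "snp2", "allele_counts": {"C": 3, "T": 1}},  # minor allele seen once: fails
]
kept = high_quality_snps(candidates)
```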

  7. Evaluation of Hydrogel Technologies for the Decontamination ...

    EPA Pesticide Factsheets

    This research effort was developed to evaluate intermediate-level (between bench-scale and large-scale or wide-area implementation) decontamination procedures, materials, technologies, and techniques used to remove radioactive material from different surfaces. In the event of a radiological incident, application of this technology would primarily be intended for decontamination of high-value buildings, important infrastructure, and landmarks.

  8. Native fish conservation areas: a vision for large-scale conservation of native fish communities

    Treesearch

    Jack E. Williams; Richard N. Williams; Russell F. Thurow; Leah Elwell; David P. Philipp; Fred A. Harris; Jeffrey L. Kershner; Patrick J. Martinez; Dirk Miller; Gordon H. Reeves; Christopher A. Frissell; James R. Sedell

    2011-01-01

    The status of freshwater fishes continues to decline despite substantial conservation efforts to reverse this trend and recover threatened and endangered aquatic species. Lack of success is partially due to working at smaller spatial scales and focusing on habitats and species that are already degraded. Protecting entire watersheds and aquatic communities, which we...

  9. Reduced representation approaches to interrogate genome diversity in large repetitive plant genomes.

    PubMed

    Hirsch, Cory D; Evans, Joseph; Buell, C Robin; Hirsch, Candice N

    2014-07-01

    Technology and software improvements in the last decade now provide methodologies to access the genome sequence of not only a single accession, but also multiple accessions of plant species. This provides a means to interrogate species diversity at the genome level. Ample diversity among accessions in a collection of species can be found, including single-nucleotide polymorphisms, insertions and deletions, copy number variation and presence/absence variation. For species with small, repeat-poor genomes, re-sequencing of query accessions is robust, highly informative, and economically feasible. However, for species with moderate- to large-sized, repeat-rich genomes, technical and economic barriers prevent en masse genome re-sequencing of accessions. Multiple approaches to access a focused subset of loci in species with larger genomes have been developed, including reduced representation sequencing, exome capture and transcriptome sequencing. Collectively, these approaches have enabled interrogation of diversity on a genome scale for large plant genomes, including crop species important to worldwide food security. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.
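
    Reduced representation approaches such as RAD-seq-like protocols sample the genome at restriction sites and size-select the resulting fragments, so that only a reproducible subset of loci is sequenced. A toy in-silico sketch of that idea (hypothetical sequences and a simplified cut model, not any specific published protocol):

```python
# Toy in-silico sketch of reduced-representation sequencing: digest a
# sequence at an enzyme recognition site and keep only fragments in a
# size window. The site and window are illustrative.
def digest(seq, site="GAATTC", cut_offset=1):
    """Cut seq at every occurrence of `site`, cutting after cut_offset bases
    (EcoRI-style G^AATTC when cut_offset=1)."""
    frags, start, i = [], 0, seq.find(site)
    while i != -1:
        frags.append(seq[start:i + cut_offset])
        start = i + cut_offset
        i = seq.find(site, i + 1)
    frags.append(seq[start:])
    return frags

def size_select(frags, lo=6, hi=20):
    """Keep fragments whose length falls inside the selected window."""
    return [f for f in frags if lo <= len(f) <= hi]

seq = "AAAA" + "GAATTC" + "C" * 10 + "GAATTC" + "TT"
frags = digest(seq)
print([len(f) for f in frags])  # → [5, 16, 7]
print([len(f) for f in size_select(frags)])  # → [16, 7]
```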

  10. A Toolkit for bulk PCR-based marker design from next-generation sequence data: application for development of a framework linkage map in bulb onion (Allium cepa L.)

    PubMed Central

    2012-01-01

    Background Although modern sequencing technologies permit the ready detection of numerous DNA sequence variants in any organism, converting such information to PCR-based genetic markers is hampered by a lack of simple, scalable tools. Onion is an example of an under-researched crop with a complex, heterozygous genome where genome-based research has previously been hindered by limited sequence resources and genetic markers. Results We report the development of generic tools for large-scale web-based PCR-based marker design in the Galaxy bioinformatics framework, and their application for development of next-generation genetics resources in a wide cross of bulb onion (Allium cepa L.). Transcriptome sequence resources were developed for the homozygous doubled-haploid bulb onion line ‘CUDH2150’ and the genetically distant Indian landrace ‘Nasik Red’, using 454™ sequencing of normalised cDNA libraries of leaf and shoot. Read mapping of ‘Nasik Red’ reads onto ‘CUDH2150’ assemblies revealed 16836 indel and SNP polymorphisms that were mined for portable PCR-based marker development. Tools for detection of restriction polymorphisms and primer set design were developed in BioPython and adapted for use in the Galaxy workflow environment, enabling large-scale and targeted assay design. Using PCR-based markers designed with these tools, a framework genetic linkage map of over 800 cM spanning all chromosomes was developed in a subset of 93 F2 progeny from a very large F2 family developed from the ‘Nasik Red’ x ‘CUDH2150’ inter-cross. The utility of tools and genetic resources developed was tested by designing markers to transcription factor-like polymorphic sequences. Bin mapping these markers using a subset of 10 progeny confirmed the ability to place markers within 10 cM bins, enabling increased efficiency in marker assignment and targeted map refinement.
The major genetic loci conditioning red bulb colour (R) and fructan content (Frc) were located on this map by QTL analysis. Conclusions The generic tools developed for the Galaxy environment enable rapid development of sets of PCR assays targeting sequence variants identified from Illumina and 454 sequence data. They enable non-specialist users to validate and exploit large volumes of next-generation sequence data using basic equipment. PMID:23157543
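
    One of the marker classes the toolkit targets is restriction polymorphisms (CAPS markers), where a sequence variant creates or destroys an enzyme recognition site, so the two alleles can be distinguished by digesting a PCR product. A minimal sketch of that idea with a toy enzyme table and hypothetical allele sequences (not the toolkit's actual BioPython code):

```python
# Minimal sketch of restriction-polymorphism (CAPS) detection: a variant
# is scorable with an enzyme when the recognition site count differs
# between the two allele sequences. Enzyme table is a toy subset.
ENZYMES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT"}

def caps_candidates(allele_a, allele_b, enzymes=ENZYMES):
    """Return enzymes whose recognition-site count differs between alleles."""
    hits = []
    for name, site in enzymes.items():
        if allele_a.count(site) != allele_b.count(site):
            hits.append(name)
    return hits

# A SNP (A -> T) creates an EcoRI site in allele B only.
allele_a = "CCGGAATACTTGGCC"
allele_b = "CCGGAATTCTTGGCC"
print(caps_candidates(allele_a, allele_b))  # → ['EcoRI']
```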

  11. A toolkit for bulk PCR-based marker design from next-generation sequence data: application for development of a framework linkage map in bulb onion (Allium cepa L.).

    PubMed

    Baldwin, Samantha; Revanna, Roopashree; Thomson, Susan; Pither-Joyce, Meeghan; Wright, Kathryn; Crowhurst, Ross; Fiers, Mark; Chen, Leshi; Macknight, Richard; McCallum, John A

    2012-11-19

    Although modern sequencing technologies permit the ready detection of numerous DNA sequence variants in any organism, converting such information to PCR-based genetic markers is hampered by a lack of simple, scalable tools. Onion is an example of an under-researched crop with a complex, heterozygous genome where genome-based research has previously been hindered by limited sequence resources and genetic markers. We report the development of generic tools for large-scale web-based PCR-based marker design in the Galaxy bioinformatics framework, and their application for development of next-generation genetics resources in a wide cross of bulb onion (Allium cepa L.). Transcriptome sequence resources were developed for the homozygous doubled-haploid bulb onion line 'CUDH2150' and the genetically distant Indian landrace 'Nasik Red', using 454™ sequencing of normalised cDNA libraries of leaf and shoot. Read mapping of 'Nasik Red' reads onto 'CUDH2150' assemblies revealed 16836 indel and SNP polymorphisms that were mined for portable PCR-based marker development. Tools for detection of restriction polymorphisms and primer set design were developed in BioPython and adapted for use in the Galaxy workflow environment, enabling large-scale and targeted assay design. Using PCR-based markers designed with these tools, a framework genetic linkage map of over 800 cM spanning all chromosomes was developed in a subset of 93 F2 progeny from a very large F2 family developed from the 'Nasik Red' x 'CUDH2150' inter-cross. The utility of tools and genetic resources developed was tested by designing markers to transcription factor-like polymorphic sequences. Bin mapping these markers using a subset of 10 progeny confirmed the ability to place markers within 10 cM bins, enabling increased efficiency in marker assignment and targeted map refinement. The major genetic loci conditioning red bulb colour (R) and fructan content (Frc) were located on this map by QTL analysis. 
The generic tools developed for the Galaxy environment enable rapid development of sets of PCR assays targeting sequence variants identified from Illumina and 454 sequence data. They enable non-specialist users to validate and exploit large volumes of next-generation sequence data using basic equipment.

  12. Establishing a Framework for Community Modeling in Hydrologic Science: Recommendations from the CUAHSI CHyMP Initiative

    NASA Astrophysics Data System (ADS)

    Arrigo, J. S.; Famiglietti, J. S.; Murdoch, L. C.; Lakshmi, V.; Hooper, R. P.

    2012-12-01

    The Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) continues a major effort towards supporting Community Hydrologic Modeling. From 2009 - 2011, the Community Hydrologic Modeling Platform (CHyMP) initiative held three workshops, the ultimate goal of which was to produce recommendations and an implementation plan to establish a community modeling program that enables comprehensive simulation of water anywhere on the North American continent. Such an effort would include connections to and advances in global climate models, biogeochemistry, and efforts of other disciplines that require an understanding of water patterns and processes in the environment. Achieving such a vision will require substantial investment in human and cyber-infrastructure and significant advances in the science of hydrologic modeling and spatial scaling. CHyMP concluded with a final workshop, held March 2011, and produced several recommendations. CUAHSI and the university community continue to advance community modeling and implement these recommendations through several related and follow-on efforts. Key results from the final 2011 workshop included agreement among participants that the community is ready to move forward with implementation. It is recognized that initial implementation of this larger effort can begin with simulation capabilities that currently exist, or that can be easily developed. CHyMP identified four key activities in support of community modeling: benchmarking, dataset evaluation and development, platform evaluation, and developing a national water model framework. Key findings included: 1) The community supported the idea of a National Water Model framework; a community effort is needed to explore what the ultimate implementation of a National Water Model would be. A true community modeling effort would support the modeling of "water anywhere" and would include all relevant scales and processes. 
    2) Implementation of a community modeling program could initially focus on continental-scale modeling of water quantity (rather than quality). The goal of this initial model is the comprehensive description of water stores and fluxes in such a way as to permit linkage to GCMs and biogeochemical, ecological, and geomorphic models. This continental-scale focus allows systematic evaluation of our current state of knowledge and data, leverages existing efforts by large-scale modelers, contributes to scientific discovery that informs globally and societally relevant questions, and provides an initial framework to evaluate hydrologic information relevant to other disciplines and a structure into which to incorporate other classes of hydrologic models. 3) Dataset development will be a key aspect of any successful national water model implementation. Our current knowledge of the subsurface is limiting our ability to truly integrate soil and groundwater into large-scale models, and to answer critical science questions with societal relevance (e.g. groundwater's influence on climate). 4) The CHyMP workshops and efforts to date have achieved collaboration between university scientists, government agencies and the private sector that must be maintained. Follow-on efforts in community modeling should aim at leveraging and maintaining this collaboration for maximum scientific and societal benefit.

  13. Using Sunlight and Cell Networks to Bring Fleet Tracking to Small Scale Fisheries

    NASA Astrophysics Data System (ADS)

    Garren, M.; Selbie, H.; Suchomel, D.; McDonald, W.; Solomon, D.

    2016-12-01

    Traditionally, the efforts of small-scale fisheries have not been easily incorporated into the global picture of fishing effort and activity. That means that the activities of the vast majority (~90%) of fishing vessels in the world have remained unquantified and largely opaque. With newly developed technology that harnesses solar power and cost-effective cellular networks to transmit data, it is becoming possible to provide vessel tracking systems on a large scale for vessels of all sizes. Furthermore, capitalizing on the relatively inexpensive cellular networks to transfer the data enables data of much higher granularity to be captured. By recording a vessel's position every few seconds, instead of minutes to hours as is typical of most satellite-based systems, we are able to resolve a diverse array of behaviors happening at sea, including when and where fishing occurred and what type of fishing gear was used. This high-granularity data is both incredibly useful and a challenge to manage and mine. New approaches for handling and processing this continuous data stream of vessel positions are being developed to extract the most informative and actionable pieces of information for a variety of audiences, including governing agencies, industry supply chains seeking transparency, non-profit organizations supporting conservation efforts, academic researchers, and the fishers and boat owners.
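
    A common first step in mining such high-frequency position streams is to estimate per-segment speed and flag slow, loitering segments as candidate fishing activity. The sketch below illustrates that idea with hypothetical names and a hypothetical speed threshold; it is not the authors' algorithm.

```python
# Illustrative sketch (not the authors' method): estimate speed between
# consecutive GPS fixes and flag slow segments as candidate fishing.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))

def flag_fishing(track, max_knots=4.0):
    """track: list of (t_seconds, lat, lon); one flag per segment."""
    flags = []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(track, track[1:]):
        km_per_h = haversine_km(la0, lo0, la1, lo1) / ((t1 - t0) / 3600.0)
        knots = km_per_h / 1.852
        flags.append(knots <= max_knots)  # slow segment -> candidate fishing
    return flags

track = [(0, 10.0, 100.0), (10, 10.0, 100.0), (20, 10.1, 100.0)]
print(flag_fishing(track))  # → [True, False]
```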

  14. Influence of large-scale climate modes on dynamical complexity patterns of Indian Summer Monsoon rainfall

    NASA Astrophysics Data System (ADS)

    Papadimitriou, Constantinos; Donner, Reik V.; Stolbova, Veronika; Balasis, Georgios; Kurths, Jürgen

    2015-04-01

    The Indian Summer Monsoon is one of the most anticipated and important weather events, with vast environmental, economic and social effects. Predictability of the Indian Summer Monsoon strength is a crucial question for the life and prosperity of the Indian population. In this study, we attempt to uncover the relationship between the spatial complexity of Indian Summer Monsoon rainfall patterns and the monsoon strength, in an effort to qualitatively determine how the spatial organization of the rainfall patterns differs between strong and weak instances of the Indian Summer Monsoon. Here, we use observational satellite data from 1998 to 2012 from the Tropical Rainfall Measuring Mission (TRMM 3B42V7) and reanalysis gridded daily rainfall data for a time period of 57 years (1951-2007) (Asian Precipitation Highly Resolved Observational Data Integration Towards the Evaluation of Water Resources, APHRODITE). In order to capture different aspects of the system's dynamics, first, we convert rainfall time series to binary symbolic sequences, exploring various thresholding criteria. Second, we apply the Shannon entropy formulation (in a block-entropy sense) using different measures of normalization of the resulting entropy values. Finally, we examine the effect of various large-scale climate modes such as the El Niño-Southern Oscillation, North Atlantic Oscillation, and Indian Ocean Dipole on the emerging complexity patterns, and discuss the possibility of utilizing such pattern maps in forecasting the spatial variability and strength of the Indian Summer Monsoon.
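
    The thresholding and block-entropy steps described above can be sketched as follows; the threshold, block length, and normalisation choice are illustrative, not the authors' exact settings.

```python
# Sketch of binarizing a rainfall series and computing normalised
# Shannon block entropy. Threshold and block length are illustrative.
from collections import Counter
from math import log2

def binarize(series, threshold):
    """Daily rainfall -> binary symbols: 1 = wet day above threshold."""
    return [1 if x > threshold else 0 for x in series]

def block_entropy(symbols, block=3):
    """Shannon entropy (bits) of overlapping length-`block` words,
    normalised by the maximum log2(2**block) = block, so result is in [0, 1]."""
    words = [tuple(symbols[i:i + block]) for i in range(len(symbols) - block + 1)]
    counts = Counter(words)
    n = len(words)
    h = -sum((c / n) * log2(c / n) for c in counts.values())
    return h / block

rain = [0.0, 12.5, 0.1, 30.2, 0.0, 8.4, 0.0, 25.0]
sym = binarize(rain, threshold=1.0)
print(sym)  # → [0, 1, 0, 1, 0, 1, 0, 1]
print(block_entropy(sym, block=2))  # low entropy: the sequence is nearly periodic
```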

  15. Validation of the RAGE Hydrocode for Impacts into Volatile-Rich Targets

    NASA Astrophysics Data System (ADS)

    Plesko, C. S.; Asphaug, E.; Coker, R. F.; Wohletz, K. H.; Korycansky, D. G.; Gisler, G. R.

    2007-12-01

    In preparation for a detailed study of large-scale impacts into the Martian surface, we have validated the RAGE hydrocode (Gittings et al., in press, CSD) against a suite of experiments and statistical models. We present comparisons of hydrocode models to centimeter-scale gas gun impacts (Nakazawa et al. 2002), an underground nuclear test (Perret, 1971), and crater scaling laws (Holsapple 1993, O'Keefe and Ahrens 1993). We have also conducted model convergence and uncertainty analyses, which will be presented. Results to date are encouraging for our current model goals, and indicate areas where the hydrocode may be extended in the future. This validation work is focused on questions related to the specific problem of large impacts into volatile-rich targets. The overall goal of this effort is to be able to realistically model large-scale Noachian, and possibly post-Noachian, impacts on Mars, not so much to model the crater morphology as to understand the evolution of target volatiles in the post-impact regime, and to explore how large craters might set the stage for post-impact hydrogeologic evolution both locally (in the crater subsurface) and globally, due to the redistribution of volatiles from the surface and subsurface into the atmosphere. This work is performed under the auspices of IGPP and the DOE at LANL under contracts W-7405-ENG-36 and DE-AC52-06NA25396. Effort by DK and EA is sponsored by NASA's Mars Fundamental Research Program.

  16. Myeloid malignancies: mutations, models and management

    PubMed Central

    2012-01-01

    Myeloid malignant diseases comprise chronic (including myelodysplastic syndromes, myeloproliferative neoplasms and chronic myelomonocytic leukemia) and acute (acute myeloid leukemia) stages. They are clonal diseases arising in hematopoietic stem or progenitor cells. Mutations responsible for these diseases occur in several genes whose encoded proteins belong principally to five classes: signaling pathways proteins (e.g. CBL, FLT3, JAK2, RAS), transcription factors (e.g. CEBPA, ETV6, RUNX1), epigenetic regulators (e.g. ASXL1, DNMT3A, EZH2, IDH1, IDH2, SUZ12, TET2, UTX), tumor suppressors (e.g. TP53), and components of the spliceosome (e.g. SF3B1, SRSF2). Large-scale sequencing efforts will soon lead to the establishment of a comprehensive repertoire of these mutations, allowing for a better definition and classification of myeloid malignancies, the identification of new prognostic markers and therapeutic targets, and the development of novel therapies. Given the importance of epigenetic deregulation in myeloid diseases, the use of drugs targeting epigenetic regulators appears as a most promising therapeutic approach. PMID:22823977

  17. A molecular marker of artemisinin-resistant Plasmodium falciparum malaria

    NASA Astrophysics Data System (ADS)

    Ariey, Frédéric; Witkowski, Benoit; Amaratunga, Chanaki; Beghain, Johann; Langlois, Anne-Claire; Khim, Nimol; Kim, Saorin; Duru, Valentine; Bouchier, Christiane; Ma, Laurence; Lim, Pharath; Leang, Rithea; Duong, Socheat; Sreng, Sokunthea; Suon, Seila; Chuor, Char Meng; Bout, Denis Mey; Ménard, Sandie; Rogers, William O.; Genton, Blaise; Fandeur, Thierry; Miotto, Olivo; Ringwald, Pascal; Le Bras, Jacques; Berry, Antoine; Barale, Jean-Christophe; Fairhurst, Rick M.; Benoit-Vical, Françoise; Mercereau-Puijalon, Odile; Ménard, Didier

    2014-01-01

    Plasmodium falciparum resistance to artemisinin derivatives in southeast Asia threatens malaria control and elimination activities worldwide. To monitor the spread of artemisinin resistance, a molecular marker is urgently needed. Here, using whole-genome sequencing of an artemisinin-resistant parasite line from Africa and clinical parasite isolates from Cambodia, we associate mutations in the PF3D7_1343700 kelch propeller domain (`K13-propeller') with artemisinin resistance in vitro and in vivo. Mutant K13-propeller alleles cluster in Cambodian provinces where resistance is prevalent, and the increasing frequency of a dominant mutant K13-propeller allele correlates with the recent spread of resistance in western Cambodia. Strong correlations between the presence of a mutant allele, in vitro parasite survival rates and in vivo parasite clearance rates indicate that K13-propeller mutations are important determinants of artemisinin resistance. K13-propeller polymorphism constitutes a useful molecular marker for large-scale surveillance efforts to contain artemisinin resistance in the Greater Mekong Subregion and prevent its global spread.

  18. Mycotoxins: A Fungal Genomics Perspective

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brown, Daren W.; Baker, Scott E.

    The chemical and enzymatic diversity in the fungal kingdom is staggering. Large-scale fungal genome sequencing projects are generating a massive catalog of secondary metabolite biosynthetic genes and pathways. Fungal natural products are a boon and bane to man as valuable pharmaceuticals and harmful toxins. Understanding how these chemicals are synthesized will aid the development of new strategies to limit mycotoxin contamination of food and feeds as well as expand drug discovery programs. A survey of work focused on the fumonisin family of mycotoxins highlights technological advances and provides a blueprint for future studies of other fungal natural products. Expressed sequence tags led to the discovery of new fumonisin genes (FUM) and hinted at a role for alternatively spliced transcripts in regulation. Phylogenetic studies of FUM genes uncovered a complex evolutionary history of the FUM cluster, as well as fungi with the potential to synthesize fumonisin or fumonisin-like chemicals. The application of new technologies (e.g., CRISPR) could substantially impact future efforts to harness fungal resources.

  19. Pretest aerosol code comparisons for LWR aerosol containment tests LA1 and LA2

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wright, A.L.; Wilson, J.H.; Arwood, P.C.

    The Light-Water-Reactor (LWR) Aerosol Containment Experiments (LACE) are being performed in Richland, Washington, at the Hanford Engineering Development Laboratory (HEDL) under the leadership of an international project board and the Electric Power Research Institute. These tests have two objectives: (1) to investigate, at large scale, the inherent aerosol retention behavior in LWR containments under simulated severe accident conditions, and (2) to provide an experimental data base for validating aerosol behavior and thermal-hydraulic computer codes. Aerosol computer-code comparison activities are being coordinated at the Oak Ridge National Laboratory. For each of the six LACE tests, "pretest" calculations (for code-to-code comparisons) and "posttest" calculations (for code-to-test data comparisons) are being performed. The overall goals of the comparison effort are (1) to provide code users with experience in applying their codes to LWR accident-sequence conditions and (2) to evaluate and improve the code models.

  20. A molecular marker of artemisinin-resistant Plasmodium falciparum malaria

    PubMed Central

    Ariey, Frédéric; Witkowski, Benoit; Amaratunga, Chanaki; Beghain, Johann; Langlois, Anne-Claire; Khim, Nimol; Kim, Saorin; Duru, Valentine; Bouchier, Christiane; Ma, Laurence; Lim, Pharath; Leang, Rithea; Duong, Socheat; Sreng, Sokunthea; Suon, Seila; Chuor, Char Meng; Bout, Denis Mey; Ménard, Sandie; Rogers, William O.; Genton, Blaise; Fandeur, Thierry; Miotto, Olivo; Ringwald, Pascal; Le Bras, Jacques; Berry, Antoine; Barale, Jean-Christophe; Fairhurst, Rick M.; Benoit-Vical, Françoise; Mercereau-Puijalon, Odile; Ménard, Didier

    2016-01-01

    Plasmodium falciparum resistance to artemisinin derivatives in southeast Asia threatens malaria control and elimination activities worldwide. To monitor the spread of artemisinin resistance, a molecular marker is urgently needed. Here, using whole-genome sequencing of an artemisinin-resistant parasite line from Africa and clinical parasite isolates from Cambodia, we associate mutations in the PF3D7_1343700 kelch propeller domain (‘K13-propeller’) with artemisinin resistance in vitro and in vivo. Mutant K13-propeller alleles cluster in Cambodian provinces where resistance is prevalent, and the increasing frequency of a dominant mutant K13-propeller allele correlates with the recent spread of resistance in western Cambodia. Strong correlations between the presence of a mutant allele, in vitro parasite survival rates and in vivo parasite clearance rates indicate that K13-propeller mutations are important determinants of artemisinin resistance. K13-propeller polymorphism constitutes a useful molecular marker for large-scale surveillance efforts to contain artemisinin resistance in the Greater Mekong Subregion and prevent its global spread. PMID:24352242

  1. Modeling microbial community structure and functional diversity across time and space.

    PubMed

    Larsen, Peter E; Gibbons, Sean M; Gilbert, Jack A

    2012-07-01

    Microbial communities exhibit exquisitely complex structure. Many aspects of this complexity, from the number of species to the total number of interactions, are currently very difficult to examine directly. However, extraordinary efforts are being made to make these systems accessible to scientific investigation. While recent advances in high-throughput sequencing technologies have improved accessibility to the taxonomic and functional diversity of complex communities, monitoring the dynamics of these systems over time and space - using appropriate experimental design - is still expensive. Fortunately, modeling can be used as a lens to focus low-resolution observations of community dynamics to enable mathematical abstractions of functional and taxonomic dynamics across space and time. Here, we review the approaches for modeling bacterial diversity at both the very large and the very small scales at which microbial systems interact with their environments. We show that modeling can help to connect biogeochemical processes to specific microbial metabolic pathways. © 2012 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.
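
    A common mathematical abstraction for such community dynamics is the generalised Lotka-Volterra model, in which each taxon's growth rate depends linearly on the abundances of the others. A minimal Euler-integration sketch with hypothetical growth rates and interaction coefficients (not a model from the review):

```python
# Minimal generalised Lotka-Volterra sketch: dx_i/dt = x_i * (r_i + sum_j A[i][j] * x_j).
# Growth rates r and interaction matrix A are hypothetical toy values.
def glv_step(x, r, A, dt=0.01):
    """One forward-Euler step, clipped at zero to keep abundances non-negative."""
    n = len(x)
    return [
        max(0.0, x[i] + dt * x[i] * (r[i] + sum(A[i][j] * x[j] for j in range(n))))
        for i in range(n)
    ]

# Two taxa with self-limitation plus weak cross-inhibition.
r = [1.0, 0.8]
A = [[-1.0, -0.2],
     [-0.3, -1.0]]
x = [0.1, 0.1]
for _ in range(5000):  # integrate to t = 50
    x = glv_step(x, r, A)
print([round(v, 3) for v in x])  # abundances settle at a coexistence equilibrium
```

Here the equilibrium solves r + A·x = 0, which is the kind of steady-state reasoning such models make tractable even when the interactions themselves cannot be observed directly.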

  2. Genomic paradigms for food-borne enteric pathogen analysis at the USFDA: case studies highlighting method utility, integration and resolution.

    PubMed

    Elkins, C A; Kotewicz, M L; Jackson, S A; Lacher, D W; Abu-Ali, G S; Patel, I R

    2013-01-01

    Modern risk control and food safety practices involving food-borne bacterial pathogens are benefiting from new genomic technologies for rapid, yet highly specific, strain characterisations. Within the United States Food and Drug Administration (USFDA) Center for Food Safety and Applied Nutrition (CFSAN), optical genome mapping and DNA microarray genotyping have been used for several years to quickly assess genomic architecture and gene content, respectively, for outbreak strain subtyping and to enhance retrospective trace-back analyses. The application and relative utility of each method varies with outbreak scenario and the suspect pathogen, with comparative analytical power enhanced by database scale and depth. Integration of these two technologies allows high-resolution scrutiny of the genomic landscapes of enteric food-borne pathogens with notable examples including Shiga toxin-producing Escherichia coli (STEC) and Salmonella enterica serovars from a variety of food commodities. Moreover, the recent application of whole genome sequencing technologies to food-borne pathogen outbreaks and surveillance has enhanced resolution to the single nucleotide scale. This new wealth of sequence data will support more refined next-generation custom microarray designs, targeted re-sequencing and "genomic signature recognition" approaches involving a combination of genes and single nucleotide polymorphism detection to distil strain-specific fingerprinting to a minimised scale. This paper examines the utility of microarrays and optical mapping in analysing outbreaks, reviews best practices and the limits of these technologies for pathogen differentiation, and it considers future integration with whole genome sequencing efforts.

  3. Large-Scale Constraint-Based Pattern Mining

    ERIC Educational Resources Information Center

    Zhu, Feida

    2009-01-01

    We studied the problem of constraint-based pattern mining for three different data formats, item-set, sequence and graph, and focused on mining patterns of large sizes. Colossal patterns in each data formats are studied to discover pruning properties that are useful for direct mining of these patterns. For item-set data, we observed robustness of…

  4. MGIS: Managing banana (Musa spp.) genetic resources information and high-throughput genotyping data

    USDA-ARS?s Scientific Manuscript database

    Unraveling genetic diversity held in genebanks on a large scale is underway, due to advances in next-generation sequence-based technologies that produce high-density genetic markers for a large number of samples at low cost. Genebank users should be in a position to identify and select germplasm...

  5. Small scale sequence automation pays big dividends

    NASA Technical Reports Server (NTRS)

    Nelson, Bill

    1994-01-01

    Galileo sequence design and integration are supported by a suite of formal software tools. Sequence review, however, is largely a manual process with reviewers scanning hundreds of pages of cryptic computer printouts to verify sequence correctness. Beginning in 1990, a series of small, PC-based sequence review tools evolved. Each tool performs a specific task but all have a common 'look and feel'. The narrow focus of each tool means simpler operation, and easier creation, testing, and maintenance. Benefits from these tools are (1) decreased review time by factors of 5 to 20 or more with a concomitant reduction in staffing, (2) increased review accuracy, and (3) excellent returns on time invested.

  6. A streamlined collecting and preparation protocol for DNA barcoding of Lepidoptera as part of large-scale rapid biodiversity assessment projects, exemplified by the Indonesian Biodiversity Discovery and Information System (IndoBioSys).

    PubMed

    Schmidt, Olga; Hausmann, Axel; Cancian de Araujo, Bruno; Sutrisno, Hari; Peggie, Djunijanti; Schmidt, Stefan

    2017-01-01

    Here we present a general collecting and preparation protocol for DNA barcoding of Lepidoptera as part of large-scale rapid biodiversity assessment projects, and a comparison with alternative preserving and vouchering methods. About 98% of the sequenced specimens processed using the present collecting and preparation protocol yielded sequences with more than 500 base pairs. The study is based on the first outcomes of the Indonesian Biodiversity Discovery and Information System (IndoBioSys). IndoBioSys is a German-Indonesian research project that is conducted by the Museum für Naturkunde in Berlin and the Zoologische Staatssammlung München, in close cooperation with the Research Center for Biology - Indonesian Institute of Sciences (RCB-LIPI, Bogor).

  7. StructRNAfinder: an automated pipeline and web server for RNA families prediction.

    PubMed

    Arias-Carrasco, Raúl; Vásquez-Morán, Yessenia; Nakaya, Helder I; Maracaja-Coutinho, Vinicius

    2018-02-17

    The function of many noncoding RNAs (ncRNAs) depends upon their secondary structures. Over the last decades, several methodologies have been developed to predict such structures or to use them to functionally annotate RNAs into RNA families. However, to fully perform this analysis, researchers must use multiple tools, which requires the constant parsing and processing of several intermediate files. This makes the large-scale prediction and annotation of RNAs a daunting task even for researchers with good computational or bioinformatics skills. We present an automated pipeline named StructRNAfinder that predicts and annotates RNA families in transcript or genome sequences. This single tool not only displays the sequence/structural consensus alignments for each RNA family according to the Rfam database, but also provides a taxonomic overview for each assigned functional RNA. Moreover, we implemented a user-friendly web service that allows researchers to upload their own nucleotide sequences in order to perform the whole analysis. Finally, we provide a stand-alone version of StructRNAfinder to be used in large-scale projects. The tool was developed under the GNU General Public License (GPLv3) and is freely available at http://structrnafinder.integrativebioinformatics.me . The main advantage of StructRNAfinder lies in its large-scale processing and integration of the data obtained from each tool and database employed along the workflow; the many files generated are displayed in user-friendly reports, useful for downstream analyses and data exploration.

  8. High-throughput, pooled sequencing identifies mutations in NUBPL and FOXRED1 in human complex I deficiency

    PubMed Central

    Calvo, Sarah E; Tucker, Elena J; Compton, Alison G; Kirby, Denise M; Crawford, Gabriel; Burtt, Noel P; Rivas, Manuel A; Guiducci, Candace; Bruno, Damien L; Goldberger, Olga A; Redman, Michelle C; Wiltshire, Esko; Wilson, Callum J; Altshuler, David; Gabriel, Stacey B; Daly, Mark J; Thorburn, David R; Mootha, Vamsi K

    2010-01-01

    Discovering the molecular basis of mitochondrial respiratory chain disease is challenging given the large number of both mitochondrial and nuclear genes involved. We report a strategy of focused candidate gene prediction, high-throughput sequencing, and experimental validation to uncover the molecular basis of mitochondrial complex I (CI) disorders. We created five pools of DNA from a cohort of 103 patients and then performed deep sequencing of 103 candidate genes to spotlight 151 rare variants predicted to impact protein function. We used confirmatory experiments to establish genetic diagnoses in 22% of previously unsolved cases, and discovered that defects in NUBPL and FOXRED1 can cause CI deficiency. Our study illustrates how large-scale sequencing, coupled with functional prediction and experimental validation, can reveal novel disease-causing mutations in individual patients. PMID:20818383

  9. SPATIALLY RESOLVED SPECTROSCOPY OF EUROPA’S LARGE-SCALE COMPOSITIONAL UNITS AT 3–4 μm WITH KECK NIRSPEC

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fischer, P. D.; Brown, M. E.; Trumbo, S. K.

    2017-01-01

    We present spatially resolved spectroscopic observations of Europa’s surface at 3–4 μm obtained with the near-infrared spectrograph and adaptive optics system on the Keck II telescope. These are the highest quality spatially resolved reflectance spectra of Europa’s surface at 3–4 μm. The observations spatially resolve Europa’s large-scale compositional units at a resolution of several hundred kilometers. The spectra show distinct features and geographic variations associated with known compositional units; in particular, large-scale leading hemisphere chaos shows a characteristic longward shift in peak reflectance near 3.7 μm compared to icy regions. These observations complement previous spectra of large-scale chaos, and can aid efforts to identify the endogenous non-ice species.

  10. preAssemble: a tool for automatic sequencer trace data processing.

    PubMed

    Adzhubei, Alexei A; Laerdahl, Jon K; Vlasova, Anna V

    2006-01-17

    Trace or chromatogram files (raw data) are produced by automatic nucleic acid sequencing equipment, or sequencers. Each file contains information which can be interpreted by specialised software to reveal the sequence (base calling). This is done by the sequencer's proprietary software or by publicly available programs. Depending on the size of a sequencing project, the number of trace files can vary from just a few to thousands of files. Sequence quality assessment against various criteria is important at the stage preceding clustering and contig assembly. Two major publicly available packages, Phred and Staden, are used by preAssemble to perform sequence quality processing. The preAssemble pre-assembly sequence processing pipeline has been developed for small- to large-scale automatic processing of DNA sequencer chromatogram (trace) data. The Staden Package Pregap4 module and the base-calling program Phred are utilized in the pipeline, which produces detailed and self-explanatory output that can be displayed with a web browser. preAssemble can be used successfully with very little previous experience; however, options for parameter tuning are provided for advanced users. preAssemble runs under UNIX and Linux operating systems. It is available for download and will run as stand-alone software. It can also be accessed on the Norwegian Salmon Genome Project web site, where preAssemble jobs can be run on the project server. preAssemble is a tool that allows users to assess the quality of sequences generated by automatic sequencing equipment. It is flexible, since both interactive jobs on the preAssemble server and the stand-alone downloadable version are available. Virtually no previous experience is necessary to run a default preAssemble job; on the other hand, options for parameter tuning are provided. Consequently, preAssemble can be used as efficiently for just a few trace files as for large-scale sequence processing.

  11. Systems Biology-Based Investigation of Host-Plasmodium Interactions.

    PubMed

    Smith, Maren L; Styczynski, Mark P

    2018-05-18

    Malaria is a serious, complex disease caused by parasites of the genus Plasmodium. Plasmodium parasites affect multiple tissues as they evade immune responses, replicate, sexually reproduce, and transmit between vertebrate and invertebrate hosts. The explosion of omics technologies has enabled large-scale collection of Plasmodium infection data, revealing systems-scale patterns, mechanisms of pathogenesis, and the ways that host and pathogen affect each other. Here, we provide an overview of recent efforts using systems biology approaches to study host-Plasmodium interactions and the biological themes that have emerged from these efforts. We discuss some of the challenges in using systems biology for this goal, key research efforts needed to address those issues, and promising future malaria applications of systems biology. Copyright © 2018 Elsevier Ltd. All rights reserved.

  12. Toward an Integrated BAC Library Resource for Genome Sequencing and Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Simon, M. I.; Kim, U.-J.

    We developed a great deal of expertise in building large BAC libraries from a variety of DNA sources including humans, mice, corn, microorganisms, worms, and Arabidopsis. We greatly improved the technology for screening these libraries rapidly and for selecting appropriate BACs and mapping BACs to develop large overlapping contigs. We became involved in supplying BACs and BAC contigs to a variety of sequencing and mapping projects and we began to collaborate with Drs. Adams and Venter at TIGR and with Dr. Leroy Hood and his group at University of Washington to provide BACs for end sequencing and for mapping and sequencing of large fragments of chromosome 16. Together with Dr. Ian Dunham and his co-workers at the Sanger Center we completed the mapping and they completed the sequencing of the first human chromosome, chromosome 22. This was published in Nature in 1999 and our BAC contigs made a major contribution to this sequencing effort. Drs. Shizuya and Ding invented an automated highly accurate BAC mapping technique. We also developed long-term collaborations with Dr. Uli Weier at UCSF in the design of BAC probes for characterization of human tumors and specific chromosome deletions and breakpoints. Finally, the contribution of our work to the human genome project has been recognized in the publication, both by the international consortium and the NIH, of a draft sequence of the human genome in Nature last year. Dr. Shizuya was acknowledged in the authorship of that landmark paper. Dr. Simon was also an author on the Venter/Adams Celera project sequencing of the human genome that was published in Science last year.

  13. Discrepancy between species borders at morphological and molecular levels in the genus Cochliopodium (Amoebozoa, Himatismenida), with the description of Cochliopodium plurinucleolum n. sp.

    PubMed

    Geisen, Stefan; Kudryavtsev, Alexander; Bonkowski, Michael; Smirnov, Alexey

    2014-05-01

    Amoebae of the genus Cochliopodium are characterized by a tectum, a layer of scales covering the dorsal surface of the cell. A combination of scale structure, morphological features and, nowadays, molecular information allows species discrimination. Here we describe a soil species, Cochliopodium plurinucleolum n. sp., that besides strong genetic divergence from all currently described species of Cochliopodium differs morphologically in the presence of several peripheral nucleoli in the nucleus. Further, we unambiguously show that the Golgi attachment associated with a dictyosome in Cochliopodium is a cytoplasmic microtubule organizing center (MTOC). Last, we provide detailed morphological and molecular information on the sister clade of C. plurinucleolum, containing C. minus, C. minutoidum, C. pentatrifurcatum and C. megatetrastylus. These species share nearly identical sequences of both the small subunit ribosomal RNA and partial Cox1 genes, and nearly identical scale structure. The scales of C. pentatrifurcatum, however, differ strongly from those of the other species, even though the sequences of C. pentatrifurcatum and C. minus are nearly identical. These discrepancies call for future sampling efforts to disentangle species characteristics within Cochliopodium and to investigate the morphological and molecular patterns that allow reliable species differentiation. Copyright © 2014 Elsevier GmbH. All rights reserved.

  14. An improved model for whole genome phylogenetic analysis by Fourier transform.

    PubMed

    Yin, Changchuan; Yau, Stephen S-T

    2015-10-07

    DNA sequence similarity comparison is one of the major steps in computational phylogenetic studies. The sequence comparison of closely related DNA sequences and genomes is usually performed by multiple sequence alignment (MSA). While the MSA method is accurate for some types of sequences, it may produce incorrect results when DNA sequences have undergone rearrangements, as in many bacterial and viral genomes. It is also limited by its computational complexity when comparing large volumes of data. Previously, we proposed an alignment-free method that exploits the full information content of DNA sequences by the Discrete Fourier Transform (DFT), but that method still had some limitations. Here, we present a significantly improved method for the similarity comparison of DNA sequences by DFT. In this method, we map DNA sequences into 2-dimensional (2D) numerical sequences and then apply the DFT to transform the 2D numerical sequences into the frequency domain. In the 2D mapping, the nucleotide composition of a DNA sequence is a determinant factor, and the mapping reduces nucleotide composition bias in the distance measure, thus improving the similarity measure of DNA sequences. To compare the DFT power spectra of DNA sequences of different lengths, we propose an improved even scaling algorithm that extends shorter DFT power spectra to the length of the longest underlying sequence. After the DFT power spectra are evenly scaled, they share the same dimensionality in Fourier frequency space, and the Euclidean distances between the full Fourier power spectra of the DNA sequences are used as the dissimilarity metrics. The improved DFT method, with computational performance increased by the 2D numerical representation, is applicable to DNA sequences of any length range. We assess the accuracy of the improved DFT similarity measure in hierarchical clustering of different DNA sequences, including simulated and real datasets.
The method yields accurate and reliable phylogenetic trees and demonstrates that the improved DFT dissimilarity measure is an efficient and effective similarity measure of DNA sequences. Due to its high efficiency and accuracy, the proposed DFT similarity measure is successfully applied on phylogenetic analysis for individual genes and large whole bacterial genomes. Copyright © 2015 Elsevier Ltd. All rights reserved.
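The pipeline this abstract describes (2D numerical mapping, DFT power spectrum, even scaling, Euclidean distance) can be sketched in a few lines. The particular 2D base mapping below is an assumed stand-in for illustration, not the mapping used in the paper:

```python
import numpy as np

# Assumed 2D mapping of bases to points in the plane; the paper's exact
# mapping is not reproduced here.
MAP_2D = {"A": (1, 1), "C": (-1, 1), "G": (-1, -1), "T": (1, -1)}

def power_spectrum(seq):
    """DFT power spectrum of a DNA sequence under the 2D numerical mapping."""
    xy = np.array([MAP_2D[b] for b in seq], dtype=float)   # shape (n, 2)
    signal = xy[:, 0] + 1j * xy[:, 1]                      # combine x and y
    return np.abs(np.fft.fft(signal)) ** 2

def even_scale(spectrum, m):
    """Stretch a length-n spectrum to length m by linear interpolation."""
    n = len(spectrum)
    if n == m:
        return spectrum
    return np.interp(np.linspace(0, 1, m), np.linspace(0, 1, n), spectrum)

def dft_distance(seq_a, seq_b):
    """Euclidean distance between the evenly scaled power spectra."""
    sa, sb = power_spectrum(seq_a), power_spectrum(seq_b)
    m = max(len(sa), len(sb))
    return float(np.linalg.norm(even_scale(sa, m) - even_scale(sb, m)))

d_same = dft_distance("ACGTACGT", "ACGTACGT")
d_diff = dft_distance("ACGTACGT", "TTTTTTGA")
```

Identical sequences yield identical spectra and hence zero distance; the even-scaling step is what makes sequences of unequal length comparable in the same frequency space.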

  15. Transcriptome characterization and SSR discovery in large-scale loach Paramisgurnus dabryanus (Cobitidae, Cypriniformes).

    PubMed

    Li, Caijuan; Ling, Qufei; Ge, Chen; Ye, Zhuqing; Han, Xiaofei

    2015-02-25

    The large-scale loach (Paramisgurnus dabryanus, Cypriniformes) is a bottom-dwelling freshwater fish found mainly in eastern Asia. The natural germplasm resources of this important aquaculture species have recently been threatened by overfishing and artificial propagation. The objective of this study was to obtain the first functional genomic resource and candidate molecular markers for future conservation and breeding research. Illumina paired-end sequencing generated over one hundred million reads that resulted in 71,887 assembled transcripts, with an average length of 1465 bp. 42,093 (58.56%) protein-coding sequences were predicted, and 43,837 transcripts had significant matches in the NCBI nonredundant protein (Nr) database. 29,389 and 14,419 transcripts were assigned into gene ontology (GO) categories and Eukaryotic Orthologous Groups (KOG), respectively, and 22,102 (31.14%) transcripts were mapped to 302 KEGG pathways. In addition, 15,106 candidate SSR markers were identified, with 11,037 pairs of PCR primers designed. 400 randomly selected SSR primer pairs were validated, of which 364 (91%) produced PCR products. A further test with 41 loci and 20 large-scale loach specimens collected from the four largest lakes in China showed that 36 (87.8%) loci were polymorphic. The transcriptomic profile and SSR repertoire obtained in this study will facilitate population genetic studies and selective breeding of the large-scale loach in the future. Copyright © 2015. Published by Elsevier B.V.
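As an illustration of what SSR discovery involves, here is a minimal regex-based microsatellite scan. The motif lengths and repeat threshold are illustrative assumptions, not the criteria used in the study, which relied on dedicated tools and transcriptome-scale data:

```python
import re

def find_ssrs(seq, min_repeats=5, motif_lengths=(2, 3, 4, 5, 6)):
    """Naive scan for simple sequence repeats; returns (start, motif, count).

    Thresholds here are illustrative, not the study's criteria.
    """
    ssrs = []
    for k in motif_lengths:
        # A k-mer captured in group 1, repeated at least min_repeats times.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if len(set(motif)) > 1:          # skip homopolymer-like motifs
                ssrs.append((m.start(), motif, len(m.group(0)) // k))
    return ssrs

# A dinucleotide (CA)8 repeat and a trinucleotide (ACG)6 repeat.
hits = find_ssrs("GGG" + "CA" * 8 + "TTT" + "ACG" * 6 + "A")
```

Real SSR pipelines add constraints on compound repeats and flanking regions for primer design; the backreference trick above is only the core detection idea.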

  16. Integrating High-Resolution Datasets to Target Mitigation Efforts for Improving Air Quality and Public Health in Urban Neighborhoods

    PubMed Central

    Shandas, Vivek; Voelkel, Jackson; Rao, Meenakshi; George, Linda

    2016-01-01

    Reducing exposure to degraded air quality is essential for building healthy cities. Although air quality and population vary at fine spatial scales, current regulatory and public health frameworks assess human exposures using county- or city-scales. We build on a spatial analysis technique, dasymetric mapping, for allocating urban populations that, together with emerging fine-scale measurements of air pollution, addresses three objectives: (1) evaluate the role of spatial scale in estimating exposure; (2) identify urban communities that are disproportionately burdened by poor air quality; and (3) estimate reduction in mobile sources of pollutants due to local tree-planting efforts using nitrogen dioxide. Our results show a maximum value of 197% difference between cadastrally-informed dasymetric system (CIDS) and standard estimations of population exposure to degraded air quality for small spatial extent analyses, and a lack of substantial difference for large spatial extent analyses. These results provide the foundation for improving policies for managing air quality, and targeting mitigation efforts to address challenges of environmental justice. PMID:27527205

  17. BLOOM: BLoom filter based oblivious outsourced matchings.

    PubMed

    Ziegeldorf, Jan Henrik; Pennekamp, Jan; Hellmanns, David; Schwinger, Felix; Kunze, Ike; Henze, Martin; Hiller, Jens; Matzutt, Roman; Wehrle, Klaus

    2017-07-26

    Whole genome sequencing has become fast, accurate, and cheap, paving the way towards the large-scale collection and processing of human genome data. Unfortunately, this dawning genome era does not only promise tremendous advances in biomedical research but also causes unprecedented privacy risks for many individuals. Handling storage and processing of large genome datasets through cloud services greatly aggravates these concerns. Current research efforts thus investigate the use of strong cryptographic methods and protocols to implement privacy-preserving genomic computations. We propose FHE-BLOOM and PHE-BLOOM, two efficient approaches for genetic disease testing using homomorphically encrypted Bloom filters. Both approaches allow the data owner to securely outsource storage and computation to an untrusted cloud. FHE-BLOOM is fully secure in the semi-honest model, while PHE-BLOOM slightly relaxes security guarantees in a trade-off for greatly improved performance. We implement and evaluate both approaches on a large dataset of up to 50 patient genomes, each with up to 1,000,000 variations (single nucleotide polymorphisms). For both implementations, overheads scale linearly in the number of patients and variations, while PHE-BLOOM is faster by at least three orders of magnitude. For example, testing the disease susceptibility of 50 patients with 100,000 variations requires only a total of 308.31 s (σ=8.73 s) with our first approach and a mere 0.07 s (σ=0.00 s) with the second. We additionally discuss the security guarantees of both approaches and their limitations, as well as possible extensions towards more complex query types, e.g., fuzzy or range queries. Both approaches handle practical problem sizes efficiently and are easily parallelized to scale with the elastic resources available in the cloud.
The fully homomorphic scheme, FHE-BLOOM, realizes a comprehensive outsourcing to the cloud, while the partially homomorphic scheme, PHE-BLOOM, trades a slight relaxation of security guarantees against performance improvements by at least three orders of magnitude.
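The data structure underlying both schemes is an ordinary Bloom filter over a patient's variants. The sketch below shows only that plaintext layer; the homomorphic encryption of the bit array, which is the actual contribution of FHE-BLOOM and PHE-BLOOM, is omitted, and the SNP labels are made up for illustration:

```python
import hashlib

class BloomFilter:
    """Minimal plaintext Bloom filter; the paper's schemes homomorphically
    encrypt the bit array so the cloud never sees its contents."""

    def __init__(self, size=8192, n_hashes=4):
        self.size, self.n_hashes = size, n_hashes
        self.bits = bytearray(size)

    def _positions(self, item):
        # Derive n_hashes positions from salted SHA-256 digests.
        for salt in range(self.n_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        # May report false positives, never false negatives.
        return all(self.bits[p] for p in self._positions(item))

# Encode a patient's SNPs; a disease test is then a set of membership queries.
patient = BloomFilter()
for snp in ["rs123:A", "rs456:G", "rs789:T"]:
    patient.add(snp)
```

Because membership tests reduce to inspecting a fixed set of bit positions, they map naturally onto homomorphic operations over an encrypted bit array.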

  18. PoPLAR: Portal for Petascale Lifescience Applications and Research

    PubMed Central

    2013-01-01

    Background We are focusing specifically on fast data analysis and retrieval in bioinformatics that will have a direct impact on the quality of human health and the environment. The exponential growth of data generated in biology research, from small atoms to big ecosystems, necessitates an increasingly large computational component to perform analyses. Novel DNA sequencing technologies and complementary high-throughput approaches, such as proteomics, genomics, metabolomics, and metagenomics, drive data-intensive bioinformatics. While individual research centers or universities could once provide for these applications, this is no longer the case. Today, only specialized national centers can deliver the level of computing resources required to meet the challenges posed by rapid data growth and the resulting computational demand. Consequently, we are developing massively parallel applications to analyze the growing flood of biological data and contribute to the rapid discovery of novel knowledge. Methods The efforts of previous National Science Foundation (NSF) projects provided for the generation of parallel modules for widely used bioinformatics applications on the Kraken supercomputer. We have profiled and optimized the code of some of the scientific community's most widely used desktop and small-cluster-based applications, including BLAST from the National Center for Biotechnology Information (NCBI), HMMER, and MUSCLE; scaled them to tens of thousands of cores on high-performance computing (HPC) architectures; made them robust and portable to next-generation architectures; and incorporated these parallel applications in science gateways with a web-based portal. Results This paper will discuss the various developmental stages, challenges, and solutions involved in taking bioinformatics applications from the desktop to petascale with a front-end portal for very-large-scale data analysis in the life sciences. 
Conclusions This research will help to bridge the gap between the rate of data generation and the speed at which scientists can study this data. The ability to rapidly analyze data at such a large scale is having a significant, direct impact on science achieved by collaborators who are currently using these tools on supercomputers. PMID:23902523

  19. DNA Barcoding Reveals Hidden Diversity of Sand Flies (Diptera: Psychodidae) at Fine and Broad Spatial Scales in Brazilian Endemic Regions for Leishmaniasis.

    PubMed

    Rodrigues, Bruno Leite; Carvalho-Costa, Luís Fernando; Pinto, Israel de Souza; Rebêlo, José Manuel Macário

    2018-03-17

    Sand fly (Diptera: Psychodidae) taxonomy is complex and time-consuming, which hampers epidemiological efforts directed toward controlling leishmaniasis in endemic regions such as northeastern Brazil. Here, we used a fragment of the mitochondrial cytochrome c oxidase I (COI) gene to identify sand fly species in Maranhão State (northeastern Brazil) and to assess cryptic diversity occurring at different spatial scales. For this, we obtained 148 COI sequences of 15 sand fly species (10 genera) from Maranhão (fine spatial scale), and combined them with COI sequences available in GenBank from other Brazilian localities about 2,000 km distant from Maranhão (broad spatial scale). We revealed cases of cryptic diversity in sand flies both at fine (Lutzomyia longipalpis (Lutz and Neiva) and Evandromyia termitophila (Martins, Falcão and Silva)) and broad spatial scales (Migonemyia migonei (França), Pressatia choti (Floch and Abonnenc), Psychodopygus davisi (Root), Sciopemyia sordellii (Shannon and Del Ponte), and Bichromomyia flaviscutellata (Mangabeira)). We argue that in the case of Bi. flaviscutellata, the cryptic diversity is associated with a putative new species. Cases in which DNA taxonomy was not as effective as morphological identification possibly involved recent speciation and/or introgressive hybridization, highlighting the need for integrative approaches to identify some sand fly species. Finally, we provide the first barcode sequences for four species (Brumptomyia avellari (Costa Lima), Evandromyia infraspinosa (Mangabeira), Evandromyia evandroi (Costa Lima and Antunes), and Psychodopygus complexus (Mangabeira)), which will be useful for further molecular identification of neotropical species.

  20. Periodic, chaotic, and doubled earthquake recurrence intervals on the deep San Andreas Fault

    USGS Publications Warehouse

    Shelly, David R.

    2010-01-01

    Earthquake recurrence histories may provide clues to the timing of future events, but long intervals between large events obscure full recurrence variability. In contrast, small earthquakes occur frequently, and recurrence intervals are quantifiable on a much shorter time scale. In this work, I examine an 8.5-year sequence of more than 900 recurring low-frequency earthquake bursts composing tremor beneath the San Andreas fault near Parkfield, California. These events exhibit tightly clustered recurrence intervals that, at times, oscillate between ~3 and ~6 days, but the patterns sometimes change abruptly. Although the environments of large and low-frequency earthquakes are different, these observations suggest that similar complexity might underlie sequences of large earthquakes.

  1. Automated Scheduling of Science Activities for Titan Encounters by Cassini

    NASA Technical Reports Server (NTRS)

    Ray, Trina L.; Knight, Russel L.; Mohr, Dave

    2014-01-01

    In an effort to demonstrate the efficacy of automated planning and scheduling techniques for large missions, we have adapted ASPEN (Activity Scheduling and Planning Environment) [1] and CLASP (Compressed Large-scale Activity Scheduling and Planning) [2] to the domain of scheduling high-level science goals into conflict-free operations plans for Titan encounters by the Cassini spacecraft.

  2. Designing a Large-Scale Multilevel Improvement Initiative: The Improving Performance in Practice Program

    ERIC Educational Resources Information Center

    Margolis, Peter A.; DeWalt, Darren A.; Simon, Janet E.; Horowitz, Sheldon; Scoville, Richard; Kahn, Norman; Perelman, Robert; Bagley, Bruce; Miles, Paul

    2010-01-01

    Improving Performance in Practice (IPIP) is a large system intervention designed to align efforts and motivate the creation of a tiered system of improvement at the national, state, practice, and patient levels, assisting primary-care physicians and their practice teams to assess and measurably improve the quality of care for chronic illness and…

  3. New convergence results for the scaled gradient projection method

    NASA Astrophysics Data System (ADS)

    Bonettini, S.; Prato, M.

    2015-09-01

    The aim of this paper is to deepen the convergence analysis of the scaled gradient projection (SGP) method, proposed by Bonettini et al. in a recent paper for constrained smooth optimization. The main feature of SGP is the presence of a variable scaling matrix multiplying the gradient, which may change at each iteration. In the last few years, extensive numerical experimentation has shown that SGP, equipped with a suitable choice of the scaling matrix, is a very effective tool for solving large-scale variational problems arising in image and signal processing. In spite of the very reliable numerical results observed, only a weak convergence theorem had been provided, establishing that any limit point of the sequence generated by SGP is stationary. Here, under the sole assumptions that the objective function is convex and that a solution exists, we prove that the sequence generated by SGP converges to a minimum point, provided the sequence of scaling matrices satisfies a simple and implementable condition. Moreover, assuming that the gradient of the objective function is Lipschitz continuous, we are also able to prove an O(1/k) convergence rate with respect to the objective function values. Finally, we present the results of numerical experiments on some relevant image restoration problems, showing that the proposed scaling matrix selection rule performs well also from the computational point of view.
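The core SGP iteration, x ← P(x − α D ∇f(x)), is easy to sketch on a toy problem. The fixed step size and identity scaling below are simplifying assumptions, not the step-length and scaling-matrix selection rules analysed in the paper:

```python
import numpy as np

def sgp(grad, project, x0, alpha, scaling=None, iters=1000):
    """Scaled gradient projection sketch: x <- P(x - alpha * d * grad(x)),
    where d is the diagonal of the scaling matrix D_k (identity if None)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        d = scaling(x) if scaling else np.ones_like(x)
        x = project(x - alpha * d * grad(x))
    return x

# Toy constrained problem: nonnegative least squares, min ||Ax - b||^2, x >= 0.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 3))
x_true = np.array([1.0, 0.0, 2.0])
b = A @ x_true

grad = lambda x: 2.0 * A.T @ (A @ x - b)          # gradient of the objective
project = lambda x: np.maximum(x, 0.0)            # projection onto x >= 0
L = 2.0 * np.linalg.norm(A.T @ A, 2)              # gradient Lipschitz constant
x_hat = sgp(grad, project, np.zeros(3), alpha=1.0 / L)
```

With the step fixed at 1/L this reduces to plain projected gradient descent; SGP's contribution is letting D_k vary per iteration while still guaranteeing convergence of the iterate sequence.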

  4. Principles for Predicting RNA Secondary Structure Design Difficulty.

    PubMed

    Anderson-Lee, Jeff; Fisker, Eli; Kosaraju, Vineet; Wu, Michelle; Kong, Justin; Lee, Jeehyung; Lee, Minjae; Zada, Mathew; Treuille, Adrien; Das, Rhiju

    2016-02-27

    Designing RNAs that form specific secondary structures is enabling better understanding and control of living systems through RNA-guided silencing, genome editing and protein organization. Little is known, however, about which RNA secondary structures might be tractable for downstream sequence design, increasing the time and expense of design efforts due to inefficient secondary structure choices. Here, we present insights into specific structural features that increase the difficulty of finding sequences that fold into a target RNA secondary structure, summarizing the design efforts of tens of thousands of human participants and three automated algorithms (RNAInverse, INFO-RNA and RNA-SSD) in the Eterna massive open laboratory. Subsequent tests through three independent RNA design algorithms (NUPACK, DSS-Opt and MODENA) confirmed the hypothesized importance of several features in determining design difficulty, including sequence length, mean stem length, symmetry and specific difficult-to-design motifs such as zigzags. Based on these results, we have compiled an Eterna100 benchmark of 100 secondary structure design challenges that span a large range in design difficulty to help test future efforts. Our in silico results suggest new routes for improving computational RNA design methods and for extending these insights to assess "designability" of single RNA structures, as well as of switches for in vitro and in vivo applications. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.

  5. DIALIGN P: fast pair-wise and multiple sequence alignment using parallel processors.

    PubMed

    Schmollinger, Martin; Nieselt, Kay; Kaufmann, Michael; Morgenstern, Burkhard

    2004-09-09

    Parallel computing is frequently used to speed up computationally expensive tasks in bioinformatics. Herein, a parallel version of the multiple-alignment program DIALIGN is introduced. We propose two ways of dividing the program into independent sub-routines that can be run on different processors: (a) pair-wise sequence alignments that are used as a first step to multiple alignment account for most of the CPU time in DIALIGN. Since alignments of different sequence pairs are completely independent of each other, they can be distributed to multiple processors without any effect on the resulting output alignments. (b) For alignments of large genomic sequences, we use a heuristic that splits sequences into sub-sequences based on a previously introduced anchored alignment procedure. For our test sequences, this combined approach reduces the running time of DIALIGN by up to 97%. By distributing sub-routines to multiple processors, the running time of DIALIGN can be crucially improved. With these improvements, it is possible to apply the program in large-scale genomics and proteomics projects that were previously beyond its scope.
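Point (a), that pairwise alignments are mutually independent and therefore embarrassingly parallel, can be sketched as follows. `SequenceMatcher` is a stand-in placeholder for DIALIGN's actual pairwise alignment routine, and the sequences are made up:

```python
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher
from itertools import combinations

def pair_score(pair):
    """Score one sequence pair; SequenceMatcher is only a stand-in for a
    real pairwise alignment routine."""
    (i, a), (j, b) = pair
    return (i, j), SequenceMatcher(None, a, b).ratio()

seqs = ["ACGTACGT", "ACGTTCGT", "TTTTCCCC"]
pairs = list(combinations(enumerate(seqs), 2))

# Every pair is independent, so all pairwise jobs can run concurrently
# and the combined result is identical to a serial run.
with ThreadPoolExecutor(max_workers=4) as pool:
    scores = dict(pool.map(pair_score, pairs))
```

The same pattern extends to process pools or MPI ranks; because no pair depends on another, the output alignments are unaffected by how the work is distributed.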

  6. Large-scale horizontal flows from SOUP observations of solar granulation

    NASA Technical Reports Server (NTRS)

    November, L. J.; Simon, G. W.; Tarbell, T. D.; Title, A. M.; Ferguson, S. H.

    1987-01-01

    Using high-resolution time-sequence photographs of solar granulation from the SOUP experiment on Spacelab 2, large-scale horizontal flows were observed on the solar surface. The measurement method is based upon a local spatial cross-correlation analysis. The horizontal motions have amplitudes in the range 300 to 1000 m/s. Radial outflow of granulation from a sunspot penumbra into the surrounding photosphere is a striking new discovery. Both the supergranulation pattern and cellular structures having the scale of mesogranulation are seen. The vertical flows that are inferred by continuity of mass from these observed horizontal flows have larger upflow amplitudes in cell centers than downflow amplitudes at cell boundaries.

  7. First Pass Annotation of Promoters on Human Chromosome 22

    PubMed Central

    Scherf, Matthias; Klingenhoff, Andreas; Frech, Kornelie; Quandt, Kerstin; Schneider, Ralf; Grote, Korbinian; Frisch, Matthias; Gailus-Durner, Valérie; Seidel, Alexander; Brack-Werner, Ruth; Werner, Thomas

    2001-01-01

    The publication of the first almost complete sequence of a human chromosome (chromosome 22) is a major milestone in human genomics. Together with the sequence, an excellent annotation of genes was published which certainly will serve as an information resource for numerous future projects. We noted that the annotation did not cover regulatory regions; in particular, no promoter annotation has been provided. Here we present an analysis of the complete published chromosome 22 sequence for promoters. A recent breakthrough in specific in silico prediction of promoter regions enabled us to attempt large-scale prediction of promoter regions on chromosome 22. Scanning of sequence databases revealed only 20 experimentally verified promoters, of which 10 were correctly predicted by our approach. Nearly 40% of our 465 predicted promoter regions are supported by the currently available gene annotation. Promoter finding also provides a biologically meaningful method for “chromosomal scaffolding”, by which long genomic sequences can be divided into segments starting with a gene. As one example, the combination of promoter region prediction with exon/intron structure predictions greatly enhances the specificity of de novo gene finding. The present study demonstrates that it is possible to identify promoters in silico on the chromosomal level with sufficient reliability for experimental planning and indicates that a wealth of information about regulatory regions can be extracted from current large-scale (megabase) sequencing projects. Results are available on-line at http://genomatix.gsf.de/chr22/. PMID:11230158

  8. Quantifying understorey vegetation in the US Lake States: a proposed framework to inform regional forest carbon stocks

    Treesearch

    Matthew B. Russell; Anthony W. D' Amato; Bethany K. Schulz; Christopher W. Woodall; Grant M. Domke; John B. Bradford

    2014-01-01

    The contribution of understorey vegetation (UVEG) to forest ecosystem biomass and carbon (C) across diverse forest types has, to date, eluded quantification at regional and national scales. Efforts to quantify UVEG C have been limited to field-intensive studies or broad-scale modelling approaches lacking field measurements. Although large-scale inventories of UVEG C are...

  9. Large-Scale Genomic Analysis of Codon Usage in Dengue Virus and Evaluation of Its Phylogenetic Dependence

    PubMed Central

    Lara-Ramírez, Edgar E.; Salazar, Ma Isabel; López-López, María de Jesús; Salas-Benito, Juan Santiago; Sánchez-Varela, Alejandro

    2014-01-01

    The increasing number of dengue virus (DENV) genome sequences available allows identification of the factors contributing to DENV evolution. In the present study, the codon usage in serotypes 1–4 (DENV1–4) has been explored for 3047 sequenced genomes using different statistical methods. The correlation analysis of total GC content (GC) with GC content at the three nucleotide positions of codons (GC1, GC2, and GC3), as well as plots of the effective number of codons (ENC, ENCp) versus GC3, revealed mutational bias and purifying selection pressures as the major forces influencing codon usage, but with distinct pressures on specific nucleotide positions in the codon. Correspondence analysis (CA) and clustering analysis on relative synonymous codon usage (RSCU) within each serotype showed clustering patterns similar to the phylogenetic analysis of nucleotide sequences for DENV1–4. These clustering patterns are strongly related to the geographic origin of the virus. The phylogenetic dependence analysis also suggests that stabilizing selection acts on the codon usage bias. Our large-scale analysis reveals new features of DENV genomic evolution. PMID:25136631
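RSCU, one of the statistics used above, is simply a codon's observed count divided by the count expected if all synonymous codons in its family were used equally. A minimal sketch, with the codon table abbreviated to two amino-acid families for brevity:

```python
from collections import Counter

# Synonymous-codon families (standard genetic code), abbreviated to two
# amino acids for brevity.
FAMILIES = {
    "Leu": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "Lys": ["AAA", "AAG"],
}

def rscu(seq):
    """RSCU: a codon's observed count divided by the count expected if all
    synonymous codons in its family were used equally often."""
    codons = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    values = {}
    for family in FAMILIES.values():
        total = sum(codons[c] for c in family)
        for c in family:
            if total:
                values[c] = codons[c] * len(family) / total
    return values

# Three AAA codons versus one AAG: AAA is used 1.5x its uniform expectation.
usage = rscu("AAA" * 3 + "AAG")
```

An RSCU of 1 means a codon is used exactly as often as expected under uniform synonymous usage; values above or below 1 indicate preference or avoidance, the bias the correspondence analysis operates on.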

  10. Progress and limitations on quantifying nutrient and carbon loading to coastal waters

    NASA Astrophysics Data System (ADS)

    Stets, E.; Oelsner, G. P.; Stackpoole, S. M.

    2017-12-01

    Riverine export of nutrients and carbon to estuarine and coastal waters are important determinants of coastal ecosystem health and provide necessary insight into global biogeochemical cycles. Quantification of coastal solute loads typically relies upon modeling based on observations of concentration and discharge from selected rivers draining to the coast. Most large-scale river export models require unidirectional flow and thus are referenced to monitoring locations at the head of tide, which can be located far inland. As a result, the contributions of the coastal plain, tidal wetlands, and concentrated coastal development are often poorly represented in regional and continental-scale estimates of solute delivery to coastal waters. However, site-specific studies have found that these areas are disproportionately active in terms of nutrient and carbon export. Modeling efforts to upscale fluxes from these areas, while not common, also suggest an outsized importance to coastal flux estimates. This presentation will focus on illustrating how the problem of under-representation of near-shore environments impacts large-scale coastal flux estimates in the context of recent regional and continental-scale assessments. Alternate approaches to capturing the influence of the near-coastal terrestrial inputs including recent data aggregation efforts and modeling approaches will be discussed.

  11. CPTAC | Office of Cancer Clinical Proteomics Research

    Cancer.gov

    The National Cancer Institute’s Clinical Proteomic Tumor Analysis Consortium (CPTAC) is a national effort to accelerate the understanding of the molecular basis of cancer through the application of large-scale proteome and genome analysis, or proteogenomics.

  12. Leveraging Resources to Address Transportation Needs: Transportation Pooled Fund Program

    DOT National Transportation Integrated Search

    2004-05-28

    This brochure describes the Transportation Pooled Fund (TPF) Program. The objectives of the TPF Program are to leverage resources, avoid duplication of effort, undertake large-scale projects, obtain greater input on project definition, achieve broade...

  13. Decadal opportunities for space architects

    NASA Astrophysics Data System (ADS)

    Sherwood, Brent

    2012-12-01

    A significant challenge for the new field of space architecture is the dearth of project opportunities. Yet every year more young professionals express interest in entering the field. This paper derives projections that bound the number, type, and range of global development opportunities that may reasonably be expected over the next few decades for human space flight (HSF) systems, so that those interested in the field can benchmark their goals. Four categories of HSF activity are described: human Exploration of solar system bodies; human Servicing of space-based assets; large-scale development of space Resources; and Breakout of self-sustaining human societies into the solar system. A progressive sequence of capabilities for each category starts with its earliest feasible missions and leads toward its full expression. The four sequences are compared in scale, distance from Earth, and readiness. Scenarios hybridize the most synergistic features from the four sequences for comparison to status quo, government-funded HSF program plans. Finally, qualitative, decadal, order-of-magnitude estimates are derived for system development needs, and hence opportunities for space architects. Government investment towards human planetary exploration is the weakest generator of space architecture work. Conversely, the strongest generator is a combination of three market drivers: (1) commercial passenger travel in low Earth orbit; (2) in parallel, government extension of HSF capability to GEO; both followed by (3) scale-up demonstration of end-to-end solar power satellites in GEO. The rich end of this scale affords space architecture opportunities that are more diverse, complex, large-scale, and sociologically challenging than traditional exploration vehicle cabins and habitats.

  14. BioPig: Developing Cloud Computing Applications for Next-Generation Sequence Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bhatia, Karan; Wang, Zhong

    Next-generation sequencing is producing ever larger data sets, with a growth rate outpacing Moore's Law. The data deluge has made many current sequence analysis tools obsolete because they do not scale with the data. Here we present BioPig, a collection of cloud computing tools for scaling data analysis and management. Pig is a flexible data scripting language that uses Apache's Hadoop data structure and MapReduce framework to process very large data files in parallel and combine the results. BioPig extends Pig with sequence analysis capability. We show the performance of BioPig on a variety of bioinformatics tasks, including screening sequence contaminants, Illumina QA/QC, and gene discovery from metagenome data sets, using the Rumen metagenome as an example.
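
    BioPig itself is built on Pig Latin scripts running over Hadoop; as a language-neutral illustration of the map/reduce pattern it relies on, here is a minimal Python sketch of map-reduce style k-mer counting. The function names and the tiny read set are hypothetical, and the in-process "shuffle" merely imitates what Hadoop does across machines:

```python
from collections import defaultdict
from itertools import chain

def map_kmers(read, k=3):
    # Map step: emit (kmer, 1) pairs from a single read.
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]

def reduce_counts(pairs):
    # Reduce step: sum counts per key, as Hadoop would after shuffling
    # all pairs with the same key to the same reducer.
    totals = defaultdict(int)
    for kmer, n in pairs:
        totals[kmer] += n
    return dict(totals)

reads = ["ACGTAC", "GTACGT"]
counts = reduce_counts(chain.from_iterable(map_kmers(r) for r in reads))
```

    The appeal of the pattern is that the map step is embarrassingly parallel over reads, so throughput scales with the number of worker nodes rather than with single-machine memory.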

  15. A 2-step penalized regression method for family-based next-generation sequencing association studies.

    PubMed

    Ding, Xiuhua; Su, Shaoyong; Nandakumar, Kannabiran; Wang, Xiaoling; Fardo, David W

    2014-01-01

    Large-scale genetic studies are often composed of related participants, and utilizing familial relationships can be cumbersome and computationally challenging. We present an approach to efficiently handle sequencing data from complex pedigrees that incorporates information from rare variants as well as common variants. Our method employs a 2-step procedure that sequentially regresses out correlation from familial relatedness and then uses the resulting phenotypic residuals in a penalized regression framework to test for associations with variants within genetic units. The operating characteristics of this approach are detailed using simulation data based on a large, multigenerational cohort.
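
    The 2-step procedure described above can be caricatured in a few lines. In this numpy-only sketch, group-mean centering stands in for the paper's familial-relatedness adjustment, and closed-form ridge regression stands in for its penalized framework; all data and parameter choices are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 60, 5
family = np.repeat(np.arange(6), 10)                  # 6 families of 10 relatives
G = rng.integers(0, 3, size=(n, p)).astype(float)     # genotype dosages 0/1/2
G -= G.mean(axis=0)                                   # center genotypes
y = 2.0 * G[:, 0] + 0.5 * family + rng.normal(0.0, 1.0, n)  # family effect + signal

# Step 1: regress out familial relatedness. Subtracting family means is a
# crude stand-in for a kinship/mixed-model adjustment.
family_mean = np.array([y[family == f].mean() for f in family])
resid = y - family_mean

# Step 2: penalized regression of the phenotypic residuals on the variants
# (ridge, via its closed form, in place of the paper's penalty).
lam = 1.0
beta = np.linalg.solve(G.T @ G + lam * np.eye(p), G.T @ resid)
```

    The point of the two stages is computational: the expensive relatedness correction is done once, and the penalized association scan then operates on uncorrelated residuals.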

  16. Scaling and self-organized criticality in proteins I

    PubMed Central

    Phillips, J. C.

    2009-01-01

    The complexity of proteins is substantially simplified by regarding them as archetypical examples of self-organized criticality (SOC). To test this idea and elaborate on it, this article applies the Moret–Zebende SOC hydrophobicity scale to the large-scale scaffold repeat protein of the HEAT superfamily, PR65/A. Hydrophobic plasticity is defined and used to identify docking platforms and hinges from repeat sequences alone. The difference between the MZ scale and conventional hydrophobicity scales reflects long-range conformational forces that are central to protein functionality. PMID:19218446

  17. Open source database of images DEIMOS: extension for large-scale subjective image quality assessment

    NASA Astrophysics Data System (ADS)

    Vítek, Stanislav

    2014-09-01

    DEIMOS (Database of Images: Open Source) is an open-source database of images and video sequences for testing, verification and comparison of various image and/or video processing techniques such as compression, reconstruction and enhancement. This paper deals with extension of the database allowing performing large-scale web-based subjective image quality assessment. Extension implements both administrative and client interface. The proposed system is aimed mainly at mobile communication devices, taking into account advantages of HTML5 technology; it means that participants don't need to install any application and assessment could be performed using web browser. The assessment campaign administrator can select images from the large database and then apply rules defined by various test procedure recommendations. The standard test procedures may be fully customized and saved as a template. Alternatively the administrator can define a custom test, using images from the pool and other components, such as evaluating forms and ongoing questionnaires. Image sequence is delivered to the online client, e.g. smartphone or tablet, as a fully automated assessment sequence or viewer can decide on timing of the assessment if required. Environmental data and viewing conditions (e.g. illumination, vibrations, GPS coordinates, etc.), may be collected and subsequently analyzed.

  18. Enabling large-scale viscoelastic calculations via neural network acceleration

    NASA Astrophysics Data System (ADS)

    Robinson DeVries, P.; Thompson, T. B.; Meade, B. J.

    2017-12-01

    One of the most significant challenges involved in efforts to understand the effects of repeated earthquake cycle activity are the computational costs of large-scale viscoelastic earthquake cycle models. Deep artificial neural networks (ANNs) can be used to discover new, compact, and accurate computational representations of viscoelastic physics. Once found, these efficient ANN representations may replace computationally intensive viscoelastic codes and accelerate large-scale viscoelastic calculations by more than 50,000%. This magnitude of acceleration enables the modeling of geometrically complex faults over thousands of earthquake cycles across wider ranges of model parameters and at larger spatial and temporal scales than have been previously possible. Perhaps most interestingly from a scientific perspective, ANN representations of viscoelastic physics may lead to basic advances in the understanding of the underlying model phenomenology. We demonstrate the potential of artificial neural networks to illuminate fundamental physical insights with specific examples.

  19. Toward exascale production of recombinant adeno-associated virus for gene transfer applications.

    PubMed

    Cecchini, S; Negrete, A; Kotin, R M

    2008-06-01

    To gain acceptance as a medical treatment, adeno-associated virus (AAV) vectors require a scalable and economical production method. Recent developments indicate that recombinant AAV (rAAV) production in insect cells is compatible with current good manufacturing practice production on an industrial scale. This platform can fully support development of rAAV therapeutics from tissue culture to small animal models, to large animal models, to toxicology studies, to Phase I clinical trials and beyond. Efforts to characterize, optimize and develop insect cell-based rAAV production have culminated in successful bioreactor-scale production of rAAV, with total yields potentially capable of approaching the exa (10^18) scale. These advances in large-scale AAV production will allow us to address specific catastrophic, intractable human diseases such as Duchenne muscular dystrophy, for which large amounts of recombinant vector are essential for a successful outcome.

  20. Remote Imaging Applied to Schistosomiasis Control: The Anning River Project

    NASA Technical Reports Server (NTRS)

    Seto, Edmund Y. W.; Maszle, Don R.; Spear, Robert C.; Gong, Peng

    1997-01-01

    The use of satellite imaging to remotely detect areas of high risk for transmission of infectious disease is an appealing prospect for large-scale monitoring of these diseases. The detection of large-scale environmental determinants of disease risk, often called landscape epidemiology, has been motivated by several authors (Pavlovsky 1966; Meade et al. 1988). The basic notion is that large-scale factors such as population density, air temperature, hydrological conditions, soil type, and vegetation can determine in a coarse fashion the local conditions contributing to disease vector abundance and human contact with disease agents. These large-scale factors can often be remotely detected by sensors or cameras mounted on satellite or aircraft platforms and can thus be used in a predictive model to mark high risk areas of transmission and to target control or monitoring efforts. A review of satellite technologies for this purpose was recently presented by Washino and Wood (1994) and Hay (1997) and Hay et al. (1997).

  1. Application and research of block caving in Pulang copper mine

    NASA Astrophysics Data System (ADS)

    Ge, Qifa; Fan, Wenlu; Zhu, Weigen; Chen, Xiaowei

    2018-01-01

    The application of block caving in mines shows significant advantages in large scale, low cost, and high efficiency; block caving is therefore worth promoting in mines that meet the requirements of natural caving. Owing to the large production scale and low ore grade of the Pulang copper mine in China, comprehensive analysis and research were conducted on rock mechanics, mining sequence, undercutting, and the stability of the bottom structure, with the aims of raising mine benefit and maximizing the recovery of mineral resources. The study concludes that block caving is well suited to the Pulang copper mine.

  2. Academic-industrial partnerships in drug discovery in the age of genomics.

    PubMed

    Harris, Tim; Papadopoulos, Stelios; Goldstein, David B

    2015-06-01

    Many US FDA-approved drugs have been developed through productive interactions between the biotechnology industry and academia. Technological breakthroughs in genomics, in particular large-scale sequencing of human genomes, is creating new opportunities to understand the biology of disease and to identify high-value targets relevant to a broad range of disorders. However, the scale of the work required to appropriately analyze large genomic and clinical data sets is challenging industry to develop a broader view of what areas of work constitute precompetitive research. Copyright © 2015 Elsevier Ltd. All rights reserved.

  3. Large-scale gene discovery in the pea aphid Acyrthosiphon pisum (Hemiptera)

    PubMed Central

    Sabater-Muñoz, Beatriz; Legeai, Fabrice; Rispe, Claude; Bonhomme, Joël; Dearden, Peter; Dossat, Carole; Duclert, Aymeric; Gauthier, Jean-Pierre; Ducray, Danièle Giblot; Hunter, Wayne; Dang, Phat; Kambhampati, Srini; Martinez-Torres, David; Cortes, Teresa; Moya, Andrès; Nakabachi, Atsushi; Philippe, Cathy; Prunier-Leterme, Nathalie; Rahbé, Yvan; Simon, Jean-Christophe; Stern, David L; Wincker, Patrick; Tagu, Denis

    2006-01-01

    Aphids are among the leading pests of agricultural crops. A large-scale sequencing of 40,904 ESTs from the pea aphid Acyrthosiphon pisum was carried out to define a catalog of 12,082 unique transcripts. A strong AT bias was found, indicating a compositional shift between Drosophila melanogaster and A. pisum. An in silico profiling analysis characterized 135 transcripts specific to pea-aphid tissues (relating to bacteriocytes and parthenogenetic embryos). This project is the first to address the genetics of the Hemiptera and of a hemimetabolous insect. PMID:16542494

  4. Intrinsic flexibility of B-DNA: the experimental TRX scale.

    PubMed

    Heddi, Brahim; Oguey, Christophe; Lavelle, Christophe; Foloppe, Nicolas; Hartmann, Brigitte

    2010-01-01

    B-DNA flexibility, crucial for DNA-protein recognition, is sequence dependent. Free DNA in solution would in principle be the best reference state to uncover the relation between base sequences and their intrinsic flexibility; however, this has long been hampered by a lack of suitable experimental data. We investigated this relationship by compiling and analyzing a large dataset of NMR ³¹P chemical shifts in solution. These measurements reflect the BI ↔ BII equilibrium in DNA, intimately correlated to helicoidal descriptors of the curvature, winding and groove dimensions. Comparing the ten complementary DNA dinucleotide steps indicates that some steps are much more flexible than others. This malleability is primarily controlled at the dinucleotide level, modulated by the tetranucleotide environment. Our analyses provide an experimental scale called TRX that quantifies the intrinsic flexibility of the ten dinucleotide steps in terms of Twist, Roll, and X-disp (base pair displacement). Applying the TRX scale to DNA sequences optimized for nucleosome formation reveals a 10 base-pair periodic alternation of stiff and flexible regions. Thus, DNA flexibility captured by the TRX scale is relevant to nucleosome formation, suggesting that this scale may be of general interest to better understand protein-DNA recognition.

  5. Organizational Influences on Interdisciplinary Interactions during Research and Design of Large-Scale Complex Engineered Systems

    NASA Technical Reports Server (NTRS)

    McGowan, Anna-Maria R.; Seifert, Colleen M.; Papalambros, Panos Y.

    2012-01-01

    The design of large-scale complex engineered systems (LaCES) such as an aircraft is inherently interdisciplinary. Multiple engineering disciplines, drawing from a team of hundreds to thousands of engineers and scientists, are woven together throughout the research, development, and systems engineering processes to realize one system. Though research and development (R&D) is typically focused in single disciplines, the interdependencies involved in LaCES require interdisciplinary R&D efforts. This study investigates the interdisciplinary interactions that take place during the R&D and early conceptual design phases in the design of LaCES. Our theoretical framework is informed by both engineering practices and social science research on complex organizations. This paper provides a preliminary perspective on some of the organizational influences on interdisciplinary interactions based on organization theory (specifically sensemaking), data from a survey of LaCES experts, and the authors' experience in research and design. The analysis reveals couplings between the engineered system and the organization that creates it. Survey respondents noted the importance of interdisciplinary interactions and their significant benefit to the engineered system, such as innovation and problem mitigation. Substantial obstacles to interdisciplinarity beyond engineering are uncovered, including communication and organizational challenges. Addressing these challenges may ultimately foster greater efficiencies in the design and development of LaCES and improved system performance by assisting with the collective integration of interdependent knowledge bases early in the R&D effort. This research suggests that organizational and human dynamics heavily influence and even constrain the engineering effort for large-scale complex systems.

  6. Real-time fast physical random number generator with a photonic integrated circuit.

    PubMed

    Ugajin, Kazusa; Terashima, Yuta; Iwakawa, Kento; Uchida, Atsushi; Harayama, Takahisa; Yoshimura, Kazuyuki; Inubushi, Masanobu

    2017-03-20

    Random number generators are essential for applications in information security and numerical simulations. Most optical-chaos-based random number generators produce random bit sequences by offline post-processing with large optical components. We demonstrate a real-time hardware implementation of a fast physical random number generator with a photonic integrated circuit and a field programmable gate array (FPGA) electronic board. We generate 1-Tbit random bit sequences and evaluate their statistical randomness using NIST Special Publication 800-22 and TestU01. All of the BigCrush tests in TestU01 are passed using 410-Gbit random bit sequences. A maximum real-time generation rate of 21.1 Gb/s is achieved for random bit sequences in binary format stored in a computer, which can be directly used for applications involving secret keys in cryptography and random seeds in large-scale numerical simulations.

  7. Construction of a large collection of small genome variations in French dairy and beef breeds using whole-genome sequences.

    PubMed

    Boussaha, Mekki; Michot, Pauline; Letaief, Rabia; Hozé, Chris; Fritz, Sébastien; Grohs, Cécile; Esquerré, Diane; Duchesne, Amandine; Philippe, Romain; Blanquet, Véronique; Phocas, Florence; Floriot, Sandrine; Rocha, Dominique; Klopp, Christophe; Capitan, Aurélien; Boichard, Didier

    2016-11-15

    In recent years, several bovine genome sequencing projects were carried out with the aim of developing genomic tools to improve dairy and beef production efficiency and sustainability. In this study, we describe the first French cattle genome variation dataset obtained by sequencing 274 whole genomes representing several major dairy and beef breeds. This dataset contains over 28 million single nucleotide polymorphisms (SNPs) and small insertions and deletions. Comparisons between sequencing results and SNP array genotypes revealed a very high genotype concordance rate, which indicates the good quality of our data. To our knowledge, this is the first large-scale catalog of small genomic variations in French dairy and beef cattle. This resource will contribute to the study of gene functions and population structure and also help to improve traits through genotype-guided selection.

  8. An Overview of the Launch Vehicle Blast Environments Development Efforts

    NASA Technical Reports Server (NTRS)

    Richardson, Erin; Bangham, Mike; Blackwood, James; Skinner, Troy; Hays, Michael; Jackson, Austin; Richman, Ben

    2014-01-01

    NASA has been funding an ongoing development program to characterize the explosive environments produced during a catastrophic launch vehicle accident. These studies and small-scale tests are focused on the near-field environments that threaten the crew. The results indicate that these environments are unlikely to result in immediate destruction of the crew modules. The effort began as an independent assessment by NASA safety organizations, followed by the Ares program and the NASA Engineering and Safety Center, and now continues as a Space Launch System (SLS) focused effort. The development effort is using the test and accident data available from public or NASA sources as well as focused scaled tests that examine the fundamental aspects of uncontained hydrogen-air and hydrogen-oxygen explosions. The primary risk to the crew appears to be the high-energy fragments, and these are being characterized for the SLS. The development efforts will also characterize the thermal environment of the explosions, to ensure that the risk is well understood and to document the overall energy balance of an explosion. The effort is multi-path in that analytical, computational, and focused testing approaches are being used to develop the knowledge needed to understand potential SLS explosions. This is an ongoing program with plans that expand the development from fundamental testing at small-scale levels to large-scale tests that can be used to validate models for commercial programs. The ultimate goal is to develop a knowledge base that can be used by vehicle designers to maximize crew survival in an explosion.

  9. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors.

    PubMed

    Haghverdi, Laleh; Lun, Aaron T L; Morgan, Michael D; Marioni, John C

    2018-06-01

    Large-scale single-cell RNA sequencing (scRNA-seq) data sets that are produced in different laboratories and at different times contain batch effects that may compromise the integration and interpretation of the data. Existing scRNA-seq analysis methods incorrectly assume that the composition of cell populations is either known or identical across batches. We present a strategy for batch correction based on the detection of mutual nearest neighbors (MNNs) in the high-dimensional expression space. Our approach does not rely on predefined or equal population compositions across batches; instead, it requires only that a subset of the population be shared between batches. We demonstrate the superiority of our approach compared with existing methods by using both simulated and real scRNA-seq data sets. Using multiple droplet-based scRNA-seq data sets, we demonstrate that our MNN batch-effect-correction method can be scaled to large numbers of cells.
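
    The core of the method above is the detection of mutual nearest neighbor (MNN) pairs between batches. As a hedged, brute-force sketch of that detection step only (the published method additionally operates on cosine-normalized expression values and uses the pairs to compute correction vectors, which is omitted here), with hypothetical toy coordinates:

```python
import numpy as np

def mutual_nearest_neighbors(A, B, k=2):
    """Return (i, j) pairs where cell i of batch A and cell j of batch B
    are each among the other's k nearest cross-batch neighbors."""
    # Pairwise Euclidean distances between all A cells and all B cells.
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    nn_ab = np.argsort(d, axis=1)[:, :k]    # each A cell's k nearest B cells
    nn_ba = np.argsort(d, axis=0)[:k, :].T  # each B cell's k nearest A cells
    pairs = []
    for i in range(len(A)):
        for j in nn_ab[i]:
            if i in nn_ba[j]:               # mutual: both directions agree
                pairs.append((i, j))
    return pairs

# Two toy "batches": matched cell states plus one B-only population.
A = np.array([[0.0, 0.0], [10.0, 10.0]])
B = np.array([[0.1, 0.0], [10.0, 10.1], [50.0, 50.0]])
pairs = mutual_nearest_neighbors(A, B, k=1)
```

    Note that the B-only cell at (50, 50) acquires no MNN pair, which is exactly why the method tolerates batches with non-identical population composition.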

  10. A streamlined collecting and preparation protocol for DNA barcoding of Lepidoptera as part of large-scale rapid biodiversity assessment projects, exemplified by the Indonesian Biodiversity Discovery and Information System (IndoBioSys)

    PubMed Central

    Hausmann, Axel; Cancian de Araujo, Bruno; Sutrisno, Hari; Peggie, Djunijanti; Schmidt, Stefan

    2017-01-01

    Here we present a general collecting and preparation protocol for DNA barcoding of Lepidoptera as part of large-scale rapid biodiversity assessment projects, and a comparison with alternative preserving and vouchering methods. About 98% of the sequenced specimens processed using the present collecting and preparation protocol yielded sequences with more than 500 base pairs. The study is based on the first outcomes of the Indonesian Biodiversity Discovery and Information System (IndoBioSys). IndoBioSys is a German-Indonesian research project that is conducted by the Museum für Naturkunde in Berlin and the Zoologische Staatssammlung München, in close cooperation with the Research Center for Biology – Indonesian Institute of Sciences (RCB-LIPI, Bogor). PMID:29134041

  11. Fast Algorithms for Mining Co-evolving Time Series

    DTIC Science & Technology

    2011-09-01

    Keogh et al., 2001, 2004] and (b) forecasting, like an autoregressive integrated moving average model (ARIMA) and related methods [Box et al., 1994...computing hardware? We develop models to mine time series with missing values, to extract compact representations from time sequences, to segment the...sequences, and to do forecasting. For large-scale data, we propose algorithms for learning time series models, in particular including Linear Dynamical

  12. Genome assembly reborn: recent computational challenges

    PubMed Central

    2009-01-01

    Research into genome assembly algorithms has experienced a resurgence due to new challenges created by the development of next generation sequencing technologies. Several genome assemblers have been published in recent years specifically targeted at the new sequence data; however, the ever-changing technological landscape leads to the need for continued research. In addition, the low cost of next generation sequencing data has led to an increased use of sequencing in new settings. For example, the new field of metagenomics relies on large-scale sequencing of entire microbial communities instead of isolate genomes, leading to new computational challenges. In this article, we outline the major algorithmic approaches for genome assembly and describe recent developments in this domain. PMID:19482960

  13. Detection of DNA Methylation by Whole-Genome Bisulfite Sequencing.

    PubMed

    Li, Qing; Hermanson, Peter J; Springer, Nathan M

    2018-01-01

    DNA methylation plays an important role in the regulation of the expression of transposons and genes. Various methods have been developed to assay DNA methylation levels. Bisulfite sequencing is considered to be the "gold standard" for single-base resolution measurement of DNA methylation levels. Coupled with next-generation sequencing, whole-genome bisulfite sequencing (WGBS) allows DNA methylation to be evaluated at a genome-wide scale. Here, we describe a protocol for WGBS in plant species with large genomes. This protocol has been successfully applied to assay genome-wide DNA methylation levels in maize and barley. It has also been successfully coupled with sequence capture technology to assay DNA methylation levels in a targeted set of genomic regions.
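
    The chemistry that makes bisulfite sequencing the "gold standard" can be sketched in a few lines: bisulfite deaminates unmethylated cytosine to uracil, which is read as T, while methylated cytosine is protected and still reads as C. The following toy simulation (sequence and positions hypothetical) shows why comparing converted reads to the reference reveals methylation:

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite conversion: unmethylated C deaminates and is
    sequenced as T; methylated C is protected and remains C."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )

# A C that survives conversion (still C in the read) was methylated;
# a reference C that reads as T was unmethylated.
read = bisulfite_convert("ACGTCG", methylated_positions={4})
```

    In real WGBS analysis, methylation level at each cytosine is estimated from the C/(C+T) ratio across many such reads aligned to a bisulfite-converted reference.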

  14. Plotting Confucian and Disability Rights Paradigms on the Advocacy-Activism Continuum: Experiences of Chinese Parents of Children with Dyslexia in Hong Kong

    ERIC Educational Resources Information Center

    Poon-McBrayer, Kim Fong; McBrayer, Philip Allen

    2014-01-01

    This study represents an initial effort to understand the advocacy journey of Chinese parents of children with dyslexia in Hong Kong. Qualitative methods involving individual and group interviews were used to solicit detailed accounts and perceptions of the experiences of 25 parents. Findings revealed a largely sequenced three-stage journey of…

  15. Introduction to bioinformatics.

    PubMed

    Can, Tolga

    2014-01-01

    Bioinformatics is an interdisciplinary field mainly involving molecular biology and genetics, computer science, mathematics, and statistics. Data-intensive, large-scale biological problems are addressed from a computational point of view. The most common problems are modeling biological processes at the molecular level and making inferences from collected data. A bioinformatics solution usually involves the following steps: collect statistics from biological data; build a computational model; solve a computational modeling problem; test and evaluate a computational algorithm. This chapter gives a brief introduction to bioinformatics by first providing an introduction to biological terminology and then discussing some classical bioinformatics problems organized by the types of data sources. Sequence analysis is the analysis of DNA and protein sequences for clues regarding function and includes subproblems such as identification of homologs, multiple sequence alignment, searching for sequence patterns, and evolutionary analyses. Protein structures are three-dimensional data, and the associated problems are structure prediction (secondary and tertiary), analysis of protein structures for clues regarding function, and structural alignment. Gene expression data is usually represented as matrices, and analysis of microarray data mostly involves statistical analysis, classification, and clustering approaches. Biological networks such as gene regulatory networks, metabolic pathways, and protein-protein interaction networks are usually modeled as graphs, and graph-theoretic approaches are used to solve associated problems such as the construction and analysis of large-scale networks.

  16. bpRNA: large-scale automated annotation and analysis of RNA secondary structure.

    PubMed

    Danaee, Padideh; Rouches, Mason; Wiley, Michelle; Deng, Dezhong; Huang, Liang; Hendrix, David

    2018-05-09

    While RNA secondary structure prediction from sequence data has made remarkable progress, there is a need for improved strategies for annotating the features of RNA secondary structures. Here, we present bpRNA, a novel annotation tool capable of parsing RNA structures, including complex pseudoknot-containing RNAs, to yield an objective, precise, compact, unambiguous, easily-interpretable description of all loops, stems, and pseudoknots, along with the positions, sequence, and flanking base pairs of each such structural feature. We also introduce several new informative representations of RNA structure types to improve structure visualization and interpretation. We have further used bpRNA to generate a web-accessible meta-database, 'bpRNA-1m', of over 100 000 single-molecule, known secondary structures; this is both more fully and accurately annotated and over 20 times larger than existing databases. We use a subset of the database with highly similar (≥90% identical) sequences filtered out to report on statistical trends in sequence, flanking base pairs, and length. Both the bpRNA method and the bpRNA-1m database will be valuable resources, both for the analysis of individual RNA molecules and for large-scale analyses such as updating RNA energy parameters for computational thermodynamic predictions, improving machine learning models for structure prediction, and benchmarking structure-prediction algorithms.
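
    The parsing task bpRNA performs starts from pairing information such as dot-bracket strings. As a minimal sketch of that first step only (pseudoknots, which bpRNA also handles, are deliberately omitted, and the example structure is hypothetical):

```python
def pair_map(dot_bracket):
    """Map each paired position to its partner in a dot-bracket string.
    '(' opens a base pair, ')' closes the most recent open one, '.' is
    unpaired; a stack matches the nested pairs."""
    stack, pairs = [], {}
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    return pairs

# A 2-bp stem enclosing a 2-nt hairpin loop.
pairs = pair_map("((..)")
pairs = pair_map("((..))")
```

    From such a pair map, stems are maximal runs of stacked pairs and loops are the unpaired stretches they enclose, which is roughly the vocabulary bpRNA's annotations are built from.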

  17. MONITORING DECLINING METAPOPULATIONS: INSIGHTS FROM A MODEL SIMULATION

    EPA Science Inventory

    Pond-breeding amphibians, host-specialist butterflies, and a variety of other organisms have been shown to exhibit population structures and dynamics consistent with metapopulation theory. In recent years large-scale biodiversity monitoring efforts have been initiated in many reg...

  18. Lessons Learned from the Everglades Collaborative Adaptive Management Program

    EPA Science Inventory

    Recent technical papers explore whether adaptive management (AM) is useful for environmental management and restoration efforts and discuss the many challenges to overcome for successful implementation, especially for large-scale restoration programs (McLain and Lee 1996; Levine ...

  19. Large Scale Evaluation of Nickel Aluminide Rolls

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    None

    2005-09-01

    This completed project was a joint effort between Oak Ridge National Laboratory and Bethlehem Steel (now Mittal Steel) to demonstrate the effectiveness of using nickel aluminide intermetallic alloy rolls as part of an updated, energy-efficient, commercial annealing furnace system.

  20. Simple and efficient identification of rare recessive pathologically important sequence variants from next generation exome sequence data.

    PubMed

    Carr, Ian M; Morgan, Joanne; Watson, Christopher; Melnik, Svitlana; Diggle, Christine P; Logan, Clare V; Harrison, Sally M; Taylor, Graham R; Pena, Sergio D J; Markham, Alexander F; Alkuraya, Fowzan S; Black, Graeme C M; Ali, Manir; Bonthron, David T

    2013-07-01

    Massively parallel ("next generation") DNA sequencing (NGS) has quickly become the method of choice for seeking pathogenic mutations in rare uncharacterized monogenic diseases. Typically, before DNA sequencing, protein-coding regions are enriched from patient genomic DNA, representing either the entire genome ("exome sequencing") or selected mapped candidate loci. Sequence variants, identified as differences between the patient's and the human genome reference sequences, are then filtered according to various quality parameters. Changes are screened against datasets of known polymorphisms, such as dbSNP and the 1000 Genomes Project, in the effort to narrow the list of candidate causative variants. An increasing number of commercial services now offer to both generate and align NGS data to a reference genome. This potentially allows small groups with limited computing infrastructure and informatics skills to utilize this technology. However, the capability to effectively filter and assess sequence variants is still an important bottleneck in the identification of deleterious sequence variants in both research and diagnostic settings. We have developed an approach to this problem comprising a user-friendly suite of programs that can interactively analyze, filter and screen data from enrichment-capture NGS data. These programs ("Agile Suite") are particularly suitable for small-scale gene discovery or for diagnostic analysis. © 2013 WILEY PERIODICALS, INC.
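    The filtering cascade described in this abstract (quality screen, then exclusion of known polymorphisms, then a recessive-model genotype filter) can be sketched in a few lines. This is a hypothetical illustration, not the Agile Suite programs themselves; the record fields and values are invented for the example:

    ```python
    # Hypothetical sketch of the variant-filtering logic described above.
    # Field names and thresholds are illustrative, not a real file format.
    known_polymorphisms = {("chr1", 12345, "A", "G"), ("chr2", 67890, "C", "T")}

    patient_variants = [
        {"chrom": "chr1", "pos": 12345, "ref": "A", "alt": "G", "genotype": "het", "quality": 60},
        {"chrom": "chr7", "pos": 5500, "ref": "G", "alt": "T", "genotype": "hom", "quality": 80},
        {"chrom": "chr7", "pos": 9900, "ref": "C", "alt": "A", "genotype": "hom", "quality": 10},
    ]

    def candidate_recessive(variants, known, min_quality=30):
        """Keep high-quality homozygous variants absent from known-polymorphism sets."""
        return [
            v for v in variants
            if v["quality"] >= min_quality
            and v["genotype"] == "hom"           # recessive model: homozygous only
            and (v["chrom"], v["pos"], v["ref"], v["alt"]) not in known
        ]

    print(candidate_recessive(patient_variants, known_polymorphisms))
    # leaves only the chr7:5500 G>T homozygous change
    ```

    In practice the "known" set would come from dbSNP or 1000 Genomes releases, and further filters (predicted protein effect, shared haplotypes between affected siblings) narrow the list further.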

  1. ProteinInferencer: Confident protein identification and multiple experiment comparison for large scale proteomics projects.

    PubMed

    Zhang, Yaoyang; Xu, Tao; Shan, Bing; Hart, Jonathan; Aslanian, Aaron; Han, Xuemei; Zong, Nobel; Li, Haomin; Choi, Howard; Wang, Dong; Acharya, Lipi; Du, Lisa; Vogt, Peter K; Ping, Peipei; Yates, John R

    2015-11-03

    Shotgun proteomics generates valuable information from large-scale and target protein characterizations, including protein expression, protein quantification, protein post-translational modifications (PTMs), protein localization, and protein-protein interactions. Typically, peptides derived from proteolytic digestion, rather than intact proteins, are analyzed by mass spectrometers because peptides are more readily separated, ionized and fragmented. The amino acid sequences of peptides can be interpreted by matching the observed tandem mass spectra to theoretical spectra derived from a protein sequence database. Identified peptides serve as surrogates for their proteins and are often used to establish what proteins were present in the original mixture and to quantify protein abundance. Two major issues exist for assigning peptides to their originating protein. The first issue is maintaining a desired false discovery rate (FDR) when comparing or combining multiple large datasets generated by shotgun analysis and the second issue is properly assigning peptides to proteins when homologous proteins are present in the database. Herein we demonstrate a new computational tool, ProteinInferencer, which can be used for protein inference with both small- or large-scale data sets to produce a well-controlled protein FDR. In addition, ProteinInferencer introduces confidence scoring for individual proteins, which makes protein identifications evaluable. This article is part of a Special Issue entitled: Computational Proteomics. Copyright © 2015. Published by Elsevier B.V.
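    The FDR control discussed above is commonly implemented with target-decoy competition: hits are ranked by score and the cutoff is placed where the decoy/target ratio stays below the desired FDR. The toy computation below only illustrates that idea; ProteinInferencer's actual scoring model is more elaborate, and the scores here are made up:

    ```python
    # Toy target-decoy FDR cutoff; illustrative only, not ProteinInferencer.
    def fdr_threshold(proteins, max_fdr=0.01):
        """proteins: list of (score, is_decoy). Return the lowest score at
        which the running decoy/target ratio is still within max_fdr."""
        decoys = targets = 0
        cutoff = None
        for score, is_decoy in sorted(proteins, key=lambda p: -p[0]):
            if is_decoy:
                decoys += 1
            else:
                targets += 1
            if targets and decoys / targets <= max_fdr:
                cutoff = score  # FDR still acceptable down to this score
        return cutoff

    hits = [(9.1, False), (8.7, False), (7.9, False), (7.5, True), (6.2, False)]
    print(fdr_threshold(hits, max_fdr=0.2))  # 7.9
    ```

    Decoy entries are typically reversed or shuffled database sequences, so the decoy count estimates how many target hits at a given score are likely false.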

  2. Low rank approximation methods for MR fingerprinting with large scale dictionaries.

    PubMed

    Yang, Mingrui; Ma, Dan; Jiang, Yun; Hamilton, Jesse; Seiberlich, Nicole; Griswold, Mark A; McGivney, Debra

    2018-04-01

    This work proposes new low rank approximation approaches with significant memory savings for large scale MR fingerprinting (MRF) problems. We introduce a compressed MRF with randomized singular value decomposition method to significantly reduce the memory requirement for calculating a low rank approximation of large sized MRF dictionaries. We further relax this requirement by exploiting the structures of MRF dictionaries in the randomized singular value decomposition space and fitting them to low-degree polynomials to generate high resolution MRF parameter maps. In vivo 1.5T and 3T brain scan data are used to validate the approaches. T1, T2, and off-resonance maps are in good agreement with those of the standard MRF approach. Moreover, the memory savings are up to 1000 times for the MRF-fast imaging with steady-state precession sequence and more than 15 times for the MRF-balanced, steady-state free precession sequence. The proposed compressed MRF with randomized singular value decomposition and dictionary fitting methods are memory-efficient low rank approximation methods, which can benefit the usage of MRF in clinical settings. They also have great potential in large scale MRF problems, such as problems considering multi-component MRF parameters or high resolution in the parameter space. Magn Reson Med 79:2392-2400, 2018. © 2017 International Society for Magnetic Resonance in Medicine.
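    The core randomized-SVD step can be sketched as follows: project the dictionary onto a small random subspace, orthonormalize, and take an exact SVD of the much smaller projected matrix. The sizes, rank, and noise level below are invented for the example (real MRF dictionaries are far larger), and this is a generic randomized SVD, not the authors' exact pipeline:

    ```python
    import numpy as np

    # Build a toy near-low-rank "dictionary" D (entries x timepoints).
    rng = np.random.default_rng(0)
    A = rng.standard_normal((2000, 30))
    B = rng.standard_normal((30, 500))
    D = A @ B + 0.01 * rng.standard_normal((2000, 500))

    def randomized_low_rank(D, rank, oversample=10):
        # Random projection captures the dominant column space of D;
        # the SVD is then computed on the small (rank+oversample) x n matrix.
        omega = rng.standard_normal((D.shape[1], rank + oversample))
        Q, _ = np.linalg.qr(D @ omega)
        U_small, s, Vt = np.linalg.svd(Q.T @ D, full_matrices=False)
        return (Q @ U_small)[:, :rank], s[:rank], Vt[:rank]

    U, s, Vt = randomized_low_rank(D, rank=30)
    D_approx = (U * s) @ Vt
    rel_err = np.linalg.norm(D - D_approx) / np.linalg.norm(D)
    print(D_approx.shape, f"relative error {rel_err:.4f}")
    ```

    The memory saving comes from never forming a full SVD of the large dictionary; only the projected matrix and the rank-`r` factors are kept.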

  3. ATLAS and LHC computing on CRAY

    NASA Astrophysics Data System (ADS)

    Sciacca, F. G.; Haug, S.; ATLAS Collaboration

    2017-10-01

    Access and exploitation of large scale computing resources, such as those offered by general purpose HPC centres, is one important measure for ATLAS and the other Large Hadron Collider experiments in order to meet the challenge posed by the full exploitation of the future data within the constraints of flat budgets. We report on the effort of moving the Swiss WLCG T2 computing, serving ATLAS, CMS and LHCb, from a dedicated cluster to the large Cray systems at the Swiss National Supercomputing Centre CSCS. These systems not only offer very efficient hardware, cooling and highly competent operators, but also have large backfill potential due to their size and multidisciplinary usage, as well as potential gains from economies of scale. Technical solutions, performance, expected return and future plans are discussed.

  4. CoCoNUT: an efficient system for the comparison and analysis of genomes

    PubMed Central

    2008-01-01

    Background Comparative genomics is the analysis and comparison of genomes from different species. This area of research is driven by the large number of sequenced genomes and heavily relies on efficient algorithms and software to perform pairwise and multiple genome comparisons. Results Most of the software tools available are tailored for one specific task. In contrast, we have developed a novel system CoCoNUT (Computational Comparative geNomics Utility Toolkit) that allows solving several different tasks in a unified framework: (1) finding regions of high similarity among multiple genomic sequences and aligning them, (2) comparing two draft or multi-chromosomal genomes, (3) locating large segmental duplications in large genomic sequences, and (4) mapping cDNA/EST to genomic sequences. Conclusion CoCoNUT is competitive with other software tools w.r.t. the quality of the results. The use of state of the art algorithms and data structures allows CoCoNUT to solve comparative genomics tasks more efficiently than previous tools. With the improved user interface (including an interactive visualization component), CoCoNUT provides a unified, versatile, and easy-to-use software tool for large scale studies in comparative genomics. PMID:19014477

  5. Vulnerability of China's nearshore ecosystems under intensive mariculture development.

    PubMed

    Liu, Hui; Su, Jilan

    2017-04-01

    Rapid economic development and an increasing population in China have exerted tremendous pressures on the coastal ecosystems. In addition to land-based pollutants and reclamation, the fast expansion of large-scale intensive mariculture activities has also brought about additional effects. So far, the ecological impact of rapid mariculture development and its large-scale operations has not drawn enough attention. In this paper, the rapid development of mariculture in China is reviewed, China's effort in the application of ecological mariculture is examined, and the vulnerability of the marine ecosystem to mariculture impacts is evaluated through a number of examples. Removal or reduction of large and forage fish, due both to habitat loss to reclamation/mariculture and to overfishing for food or fishmeal, may have far-reaching effects on the coastal and shelf ecosystems in the long run. Large-scale intensive mariculture operations carry with them undesirable biological and biochemical characteristics, which may have consequences for natural ecosystems beyond normally perceived spatial and temporal boundaries. As our understanding of the possible impacts of large-scale intensive mariculture lags far behind its development, much research is urgently needed.

  6. Ensembl Genomes 2016: more genomes, more complexity.

    PubMed

    Kersey, Paul Julian; Allen, James E; Armean, Irina; Boddu, Sanjay; Bolt, Bruce J; Carvalho-Silva, Denise; Christensen, Mikkel; Davis, Paul; Falin, Lee J; Grabmueller, Christoph; Humphrey, Jay; Kerhornou, Arnaud; Khobova, Julia; Aranganathan, Naveen K; Langridge, Nicholas; Lowy, Ernesto; McDowall, Mark D; Maheswari, Uma; Nuhn, Michael; Ong, Chuang Kee; Overduin, Bert; Paulini, Michael; Pedro, Helder; Perry, Emily; Spudich, Giulietta; Tapanari, Electra; Walts, Brandon; Williams, Gareth; Tello-Ruiz, Marcela; Stein, Joshua; Wei, Sharon; Ware, Doreen; Bolser, Daniel M; Howe, Kevin L; Kulesha, Eugene; Lawson, Daniel; Maslen, Gareth; Staines, Daniel M

    2016-01-04

    Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including reference sequence, gene models, transcriptional data, genetic variation and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments. These include the development of new analyses and views to represent polyploid genomes (of which bread wheat is the primary exemplar); and the continued up-scaling of the resource, which now includes over 23 000 bacterial genomes, 400 fungal genomes and 100 protist genomes, in addition to 55 genomes from invertebrate metazoa and 39 genomes from plants. This dramatic increase in the number of included genomes is one part of a broader effort to automate the integration of archival data (genome sequence, but also associated RNA sequence data and variant calls) within the context of reference genomes and make it available through the Ensembl user interfaces. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  7. Ensembl Genomes 2016: more genomes, more complexity

    PubMed Central

    Kersey, Paul Julian; Allen, James E.; Armean, Irina; Boddu, Sanjay; Bolt, Bruce J.; Carvalho-Silva, Denise; Christensen, Mikkel; Davis, Paul; Falin, Lee J.; Grabmueller, Christoph; Humphrey, Jay; Kerhornou, Arnaud; Khobova, Julia; Aranganathan, Naveen K.; Langridge, Nicholas; Lowy, Ernesto; McDowall, Mark D.; Maheswari, Uma; Nuhn, Michael; Ong, Chuang Kee; Overduin, Bert; Paulini, Michael; Pedro, Helder; Perry, Emily; Spudich, Giulietta; Tapanari, Electra; Walts, Brandon; Williams, Gareth; Tello–Ruiz, Marcela; Stein, Joshua; Wei, Sharon; Ware, Doreen; Bolser, Daniel M.; Howe, Kevin L.; Kulesha, Eugene; Lawson, Daniel; Maslen, Gareth; Staines, Daniel M.

    2016-01-01

    Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including reference sequence, gene models, transcriptional data, genetic variation and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments. These include the development of new analyses and views to represent polyploid genomes (of which bread wheat is the primary exemplar); and the continued up-scaling of the resource, which now includes over 23 000 bacterial genomes, 400 fungal genomes and 100 protist genomes, in addition to 55 genomes from invertebrate metazoa and 39 genomes from plants. This dramatic increase in the number of included genomes is one part of a broader effort to automate the integration of archival data (genome sequence, but also associated RNA sequence data and variant calls) within the context of reference genomes and make it available through the Ensembl user interfaces. PMID:26578574

  8. Evolution of the Alpine Tethys (Sava) suture zone in Fruška Gora Mountains (N Serbia): from orogenic building to tectonic omissions

    NASA Astrophysics Data System (ADS)

    Toljić, Marinko; Matenco, Liviu; Đerić, Nevenka; Milivojević, Jelena; Gerzina, Nataša; Stojadinović, Uros

    2010-05-01

    The Fru\\vska Gora Mountains in northern Serbia offers an unique opportunity to study the Cretaceous-Eocene evolution of the NE part of the Dinarides, which is largely covered elsewhere beneath the thick Miocene sediments of the Pannonian basin, deposited during the back-arc collapse associated with the subduction and roll-back recorded in the external Carpathians. The structural grain of the Fru\\vska Gora Mountains is the one of a large scale antiform, exposing a complex puzzle of highly deformed metamorphic rocks in its centre and Triassic-Miocene sequence of non-metamorphosed sediments, ophiolites and volcanics along its flanks. The metamorphic rocks were the target of structural investigations coupled with paleontological dating (conodonts, palynomorphs and radiolarians) in an effort to unravel the geodynamic evolution of an area thought to be located near the suture zone between the Tisza upper plate and the Adriatic lower plate, i.e. the Sava subduction zone of the Dinarides (e.g., Pamic, 2002; Schmid et al., 2008). The existence of this subduction zone was previously inferred here by local observations, such as metamorphosed Mesozoic sediments containing Middle Triassic conodonts (Đurđanović, 1971) or Early Cretaceous blue schists metamorphism (123±5 Ma, Milovanović et al., 1995). The metamorphic sequence is characterized by a Paleozoic age meta-sedimentary basement which contains palynomorphs of Upper Paleozoic - Carboniferous age and a meta-sedimentary and meta-volcanic sequence which contain a succession of contrasting metamorphosed lithologies such sandstones, black limestones, shallow water white limestones, basic volcanic sequences, deep nodular limestiones, radiolarites, meta-ophiolites and turbiditic sequences. The lower part of the sequence is contrastingly similar with the Triassic cover of the Drina-Ivanijca thrust sheet and its metamorphosed equivalent observed in the Kopaonik and Studenica series (Schefer et al., in press). 
This observation is supported by newly found micro-fauna of Upper Triassic age in the meta-sandstones associated with meta-volcanics on the SW slopes of the mountain. The upper part of the sequence displays metamorphosed "flysch"-type sequences and meta-basalts. In these deposits, slightly metamorphosed siliciclastics (lithic sandstones with volcanic-derived clasts) previously interpreted as Upper Jurassic mélange have proved to contain Upper Cretaceous palynomorphs. Among the rocks exposed in the metamorphic core of the mountains, the SW slope of Fruška Gora offers the optimal locality for the study of the kinematic evolution. Here, four phases of folding have been mapped, associated mainly with large-scale regional contraction. The first phase is characterized by isoclinal folding, with a reconstructed SW vergence. The second generation of E-W oriented and coaxial folds is asymmetric, up to metres in size, displays a south vergence and has largely refolded the previous generation. The third event was responsible for the formation of upright folds, again E-W oriented, re-folding the earlier structures. The first two phases of folding are associated with metamorphic conditions, while the third apparently occurred near the transition to the brittle domain. The relationship with a fourth folding event, observed also in the non-metamorphosed clastic-carbonate rocks, is rather uncertain, but it is apparently associated with the present-day antiformal structure of the Fruška Gora Mountains. Interestingly, the metamorphosed Triassic and Upper Cretaceous carbonatic-clastic sequence in the core of the antiform is in structural contact along the antiformal flanks with Lower-Middle Triassic and Upper Cretaceous-Paleogene sediments which display the same facies but are not metamorphosed. 
This demonstrates a large scale tectonic omission along the flanks of the Fruška Gora antiform, 9-10 km of rocks having been removed by what we speculatively define as an extensional detachment exhuming the metamorphic core. This detachment has subsequently been folded into the present-day antiformal geometry of the Fruška Gora Mountains. These findings demonstrate that the metamorphic and non-metamorphic Upper Cretaceous - Paleogene clastic-carbonate sediments belong to the main Alpine Tethys (Sava) subduction zone of the Dinarides. The Paleozoic-Triassic metamorphic and non-metamorphic rocks belong to the distal Adriatic lower plate, or more precisely to the Jadar-Kopaonik composite thrust sheet (Schmid et al., 2008), while the layer of serpentinized peridotite found at their contact most probably belongs to the Western Vardar ophiolites obducted over the Adriatic plate during the Late Jurassic - earliest Cretaceous. The distal Jadar-Kopaonik composite unit was partly affected by strong contractional deformation and a Late Eocene greenschist-facies metamorphism during the main phase of subduction and collision, similar to what has been observed elsewhere in the Dinarides (Pamić, 2002; Schefer et al., in press). A Miocene phase of core-complex formation was responsible for the large tectonic omission observed, probably followed by the formation of a wide open antiformal structure during the Pliocene-Quaternary inversion of the Pannonian basin.

  9. Development of manufacturing methods for the production of superconductive devices. Final technical documentary report, 28 Jun 1965--27 Jun 1969. [Plasma arc process]

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Haid, D.A.; Fietz, W.A.

    1969-06-01

    The effort to scale up the plasma-arc process to produce large solenoids and saddle coils is described. Large coils (up to 16-3/4 in. diameter and 41 in. length) of three different configurations (helical, "pancake" and "saddle") were fabricated using the plasma arc process.

  10. Site-level habitat models for the endemic, threatened Cheat Mountain salamander (Plethodon nettingi): the importance of geophysical and biotic attributes for predicting occurrence

    Treesearch

    Lester O. Dillard; Kevin R. Russell; W. Mark Ford

    2008-01-01

    The federally threatened Cheat Mountain salamander (Plethodon nettingi; hereafter CMS) is known to occur in approximately 70 small, scattered populations in the Allegheny Mountains of eastern West Virginia, USA. Current conservation and management efforts on federal, state, and private lands involving CMS largely rely on small scale, largely...

  11. Integrating SMOS brightness temperatures with a new conceptual spatially distributed hydrological model for improving flood and drought predictions at large scale.

    NASA Astrophysics Data System (ADS)

    Hostache, Renaud; Rains, Dominik; Chini, Marco; Lievens, Hans; Verhoest, Niko E. C.; Matgen, Patrick

    2017-04-01

    Motivated by climate change and its impact on the scarcity or excess of water in many parts of the world, several agencies and research institutions have taken initiatives in monitoring and predicting the hydrologic cycle at a global scale. Such a monitoring/prediction effort is important for understanding the vulnerability to extreme hydrological events and for providing early warnings. This can be based on an optimal combination of hydro-meteorological models and remote sensing, in which satellite measurements can be used as forcing or calibration data or for regularly updating the model states or parameters. Many advances have been made in these domains, and the near future will bring new opportunities with respect to remote sensing as a result of the increasing number of spaceborne sensors enabling the large scale monitoring of water resources. Besides these advances, there is currently a tendency to refine and further complicate physically-based hydrologic models to better capture the hydrologic processes at hand. However, this may not necessarily be beneficial for large-scale hydrology, as computational demands increase significantly as a result. As a matter of fact, a novel thematic science question to be investigated is whether a flexible conceptual model can match the performance of a complex physically-based model for hydrologic simulations at large scale. In this context, the main objective of this study is to investigate how innovative techniques that allow for the estimation of soil moisture from satellite data can help in reducing errors and uncertainties in large scale conceptual hydro-meteorological modelling. A spatially distributed conceptual hydrologic model has been set up based on recent developments of the SUPERFLEX modelling framework. As it requires limited computational effort, this model enables early warnings for large areas. 
Using the ERA-Interim public dataset as forcings and coupled with the CMEM radiative transfer model, SUPERFLEX is capable of predicting runoff, soil moisture, and SMOS-like brightness temperature time series. Such a model is traditionally calibrated using only discharge measurements. In this study we designed a multi-objective calibration procedure based on both discharge measurements and SMOS-derived brightness temperature observations in order to evaluate the added value of remotely sensed soil moisture data in the calibration process. As a test case we set up the SUPERFLEX model for the large scale Murray-Darling catchment in Australia (ca. 1 million km2). When compared to in situ soil moisture time series, model predictions show good agreement, resulting in correlation coefficients exceeding 70% and root mean squared errors below 1%. When benchmarked against the physically based land surface model CLM, SUPERFLEX exhibits similar performance levels. By adapting the runoff routing function within the SUPERFLEX model, the predicted discharge achieves a Nash-Sutcliffe efficiency exceeding 0.7 over both the calibration and the validation periods.
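    The Nash-Sutcliffe efficiency used above to score discharge predictions has a simple closed form: NSE = 1 − Σ(obs − sim)² / Σ(obs − mean(obs))², so 1 is a perfect fit and 0 means the model is no better than the observed mean. A minimal computation, with made-up discharge values:

    ```python
    # Nash-Sutcliffe efficiency: 1 - SSE / variance-about-the-mean.
    # The observed/simulated series below are invented for illustration.
    def nse(observed, simulated):
        mean_obs = sum(observed) / len(observed)
        sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
        var = sum((o - mean_obs) ** 2 for o in observed)
        return 1 - sse / var

    obs = [10.0, 12.0, 15.0, 11.0, 9.0]   # observed discharge
    sim = [9.5, 12.5, 14.0, 11.5, 9.5]    # simulated discharge
    print(round(nse(obs, sim), 3))  # 0.906
    ```

    A multi-objective calibration like the one described would combine such a discharge score with a separate misfit term on the brightness temperature observations.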

  12. De novo sequencing and characterization of floral transcriptome in two species of buckwheat (Fagopyrum)

    PubMed Central

    2011-01-01

    Background Transcriptome sequencing data has become an integral component of modern genetics, genomics and evolutionary biology. However, despite advances in the technologies of DNA sequencing, such data are lacking for many groups of living organisms, in particular, many plant taxa. We present here the results of transcriptome sequencing for two closely related plant species. These species, Fagopyrum esculentum and F. tataricum, belong to the order Caryophyllales - a large group of flowering plants with uncertain evolutionary relationships. F. esculentum (common buckwheat) is also an important food crop. Despite these practical and evolutionary considerations, Fagopyrum species have not been the subject of large-scale sequencing projects. Results Normalized cDNA corresponding to genes expressed in flowers and inflorescences of F. esculentum and F. tataricum was sequenced using the 454 pyrosequencing technology. This resulted in 267 thousand reads (for F. esculentum) and 229 thousand reads (for F. tataricum), with an average length of 341-349 nucleotides. De novo assembly of the reads produced about 25 thousand contigs for each species, with 7.5-8.2× coverage. Comparative analysis of the two transcriptomes demonstrated their overall similarity but also revealed genes that are presumably differentially expressed. Among them are retrotransposon genes and genes involved in sugar biosynthesis and metabolism. Thirteen single-copy genes were used for phylogenetic analysis; the resulting trees are largely consistent with those inferred from multigenic plastid datasets. The sister relationship of the Caryophyllales and asterids has now gained high support from nuclear gene sequences. Conclusions 454 transcriptome sequencing and de novo assembly were performed for two congeneric flowering plant species, F. esculentum and F. tataricum. As a result, a large set of cDNA sequences representing orthologs of known plant genes as well as potential new genes was generated. PMID:21232141

  13. Scalable Parallel Methods for Analyzing Metagenomics Data at Extreme Scale

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Daily, Jeffrey A.

    2015-05-01

    The field of bioinformatics and computational biology is currently experiencing a data revolution. The exciting prospect of making fundamental biological discoveries is fueling the rapid development and deployment of numerous cost-effective, high-throughput next-generation sequencing technologies. The result is that the DNA and protein sequence repositories are being bombarded with new sequence information. Databases are continuing to report a Moore's law-like growth trajectory in their database sizes, roughly doubling every 18 months. In what seems to be a paradigm shift, individual projects are now capable of generating billions of raw sequence data that need to be analyzed in the presence of already annotated sequence information. While it is clear that data-driven methods, such as sequence homology detection, are becoming the mainstay in the field of computational life sciences, the algorithmic advancements essential for implementing complex data analytics at scale have mostly lagged behind. Sequence homology detection is central to a number of bioinformatics applications including genome sequencing and protein family characterization. Given millions of sequences, the goal is to identify all pairs of sequences that are highly similar (or "homologous") on the basis of alignment criteria. While there are optimal alignment algorithms to compute pairwise homology, their deployment at large scale is currently not feasible; instead, heuristic methods are used at the expense of quality. In this dissertation, we present the design and evaluation of a parallel implementation for conducting optimal homology detection on distributed memory supercomputers. Our approach uses a combination of techniques from asynchronous load balancing (viz. work stealing, dynamic task counters), data replication, and exact-matching filters to achieve homology detection at scale. 
Results for a collection of 2.56M sequences show parallel efficiencies of ~75-100% on up to 8K cores, representing a time-to-solution of 33 seconds. We extend this work with a detailed analysis of single-node sequence alignment performance using the latest CPU vector instruction set extensions. Preliminary results reveal that current sequence alignment algorithms are unable to fully utilize widening vector registers.
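    One of the techniques named above, exact-matching filters, can be sketched as a k-mer pre-filter: only sequence pairs sharing at least one exact k-mer are passed on to the expensive optimal alignment step. This is a generic illustration under invented toy sequences, not the dissertation's distributed implementation:

    ```python
    # Illustrative k-mer exact-match pre-filter before pairwise alignment.
    from itertools import combinations

    def kmers(seq, k=4):
        return {seq[i:i + k] for i in range(len(seq) - k + 1)}

    def candidate_pairs(sequences, k=4):
        # Index each sequence's k-mer set, then keep only pairs with
        # at least one shared exact k-mer.
        index = {name: kmers(s, k) for name, s in sequences.items()}
        return [
            (a, b) for a, b in combinations(sequences, 2)
            if index[a] & index[b]
        ]

    seqs = {
        "s1": "ACGTACGTGG",
        "s2": "TTACGTACCA",
        "s3": "GGGGCCCCAA",
    }
    print(candidate_pairs(seqs))  # [('s1', 's2')]
    ```

    At scale the same idea is usually implemented with an inverted index from k-mer to sequence IDs, so candidate pairs are enumerated without the quadratic loop over all pairs.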

  14. An Analysis of Rich Cluster Redshift Survey Data for Large Scale Structure Studies

    NASA Astrophysics Data System (ADS)

    Slinglend, K.; Batuski, D.; Haase, S.; Hill, J.

    1994-12-01

    The results from the COBE satellite show the existence of structure on scales on the order of 10% or more of the horizon scale of the universe. Rich clusters of galaxies from Abell's catalog show evidence of structure on scales of 100 Mpc and may hold the promise of confirming structure on the scale of the COBE result. However, many Abell clusters have zero or only one measured redshift, so present knowledge of their three dimensional distribution has quite large uncertainties. The shortage of measured redshifts for these clusters may also mask a problem of projection effects corrupting the membership counts for the clusters. Our approach in this effort has been to use the MX multifiber spectrometer on the Steward 2.3m to measure redshifts of at least ten galaxies in each of 80 Abell cluster fields with richness class R>= 1 and mag10 <= 16.8 (estimated z<= 0.12) and zero or one measured redshifts. This work will result in a deeper, more complete (and reliable) sample of positions of rich clusters. Our primary intent for the sample is for two-point correlation and other studies of the large scale structure traced by these clusters in an effort to constrain theoretical models for structure formation. We are also obtaining enough redshifts per cluster so that a much better sample of reliable cluster velocity dispersions will be available for other studies of cluster properties. To date, we have collected such data for 64 clusters, and for most of them, we have seven or more cluster members with redshifts, allowing for reliable velocity dispersion calculations. Velocity histograms and stripe density plots for several interesting cluster fields are presented, along with summary tables of cluster redshift results. 
Also, with 10 or more redshifts in most of our cluster fields (30' square, just about an `Abell diameter' at z ~ 0.1), we have investigated the extent of projection effects within the Abell catalog in an effort to quantify and understand how these may affect the Abell sample.

  15. Heavy hydrocarbon main injector technology

    NASA Technical Reports Server (NTRS)

    Fisher, S. C.; Arbit, H. A.

    1988-01-01

    One of the key components of the Advanced Launch System (ALS) is a large liquid rocket booster engine. To keep the overall vehicle size and cost down, this engine will probably use liquid oxygen (LOX) and a heavy hydrocarbon, such as RP-1, as propellants and operate at relatively high chamber pressures to increase overall performance. A technology program (Heavy Hydrocarbon Main Injector Technology) is under way. The main objective of this effort is to develop a logic plan and supporting experimental data base to reduce the risk of developing a large scale (approximately 750,000 lb thrust), high performance main injector system. The overall approach and program plan, from initial analyses to large scale, two-dimensional combustor design and test, and the current status of the program are discussed. Progress includes performance and stability analyses, cold flow tests of injector models, design and fabrication of subscale injectors and calorimeter combustors for performance, heat transfer, and dynamic stability tests, and preparation of hot fire test plans. Related, current, high pressure LOX/RP-1 injector technology efforts are also briefly discussed.

  16. Guiding principles for peptide nanotechnology through directed discovery.

    PubMed

    Lampel, A; Ulijn, R V; Tuttle, T

    2018-05-21

    Life's diverse molecular functions are largely based on only a small number of highly conserved building blocks - the twenty canonical amino acids. These building blocks are chemically simple, but when they are organized in three-dimensional structures of tremendous complexity, new properties emerge. This review explores recent efforts in the directed discovery of functional nanoscale systems and materials based on these same amino acids, but that are not guided by copying or editing biological systems. The review summarises insights obtained using three complementary approaches of searching the sequence space to explore sequence-structure relationships for assembly, reactivity and complexation, namely: (i) strategic editing of short peptide sequences; (ii) computational approaches to predicting and comparing assembly behaviours; (iii) dynamic peptide libraries that explore the free energy landscape. These approaches give rise to guiding principles on controlling order/disorder, complexation and reactivity by peptide sequence design.

  17. Towards decoding the conifer giga-genome.

    PubMed

    Mackay, John; Dean, Jeffrey F D; Plomion, Christophe; Peterson, Daniel G; Cánovas, Francisco M; Pavy, Nathalie; Ingvarsson, Pär K; Savolainen, Outi; Guevara, M Ángeles; Fluch, Silvia; Vinceti, Barbara; Abarca, Dolores; Díaz-Sala, Carmen; Cervera, María-Teresa

    2012-12-01

    Several new initiatives have been launched recently to sequence conifer genomes including pines, spruces and Douglas-fir. Owing to the very large genome sizes ranging from 18 to 35 gigabases, sequencing even a single conifer genome had been considered unattainable until the recent throughput increases and cost reductions afforded by next generation sequencers. The purpose of this review is to describe the context for these new initiatives. A knowledge foundation has been acquired in several conifers of commercial and ecological interest through large-scale cDNA analyses, construction of genetic maps and gene mapping studies aiming to link phenotype and genotype. Exploratory sequencing in pines and spruces has pointed out some of the unique properties of these giga-genomes and suggested strategies that may be needed to extract value from their sequencing. The hope is that recent and pending developments in sequencing technology will contribute to rapidly filling the knowledge vacuum surrounding their structure, contents and evolution. Researchers are also making plans to use comparative analyses that will help to turn the data into a valuable resource for enhancing and protecting the world's conifer forests.

  18. Shear-driven dynamo waves at high magnetic Reynolds number.

    PubMed

    Tobias, S M; Cattaneo, F

    2013-05-23

    Astrophysical magnetic fields often display remarkable organization, despite being generated by dynamo action driven by turbulent flows at high conductivity. An example is the eleven-year solar cycle, which shows spatial coherence over the entire solar surface. The difficulty in understanding the emergence of this large-scale organization is that whereas at low conductivity (measured by the magnetic Reynolds number, Rm) dynamo fields are well organized, at high Rm their structure is dominated by rapidly varying small-scale fluctuations. This arises because the smallest scales have the highest rate of strain, and can amplify magnetic field most efficiently. Therefore most of the effort to find flows whose large-scale dynamo properties persist at high Rm has been frustrated. Here we report high-resolution simulations of a dynamo that can generate organized fields at high Rm; indeed, the generation mechanism, which involves the interaction between helical flows and shear, only becomes effective at large Rm. The shear does not enhance generation at large scales, as is commonly thought; instead it reduces generation at small scales. The solution consists of propagating dynamo waves, whose existence was postulated more than 60 years ago and which have since been used to model the solar cycle.

  19. TOXICOGENOMICS AND HUMAN DISEASE RISK ASSESSMENT

    EPA Science Inventory


    Toxicogenomics and Human Disease Risk Assessment.

    Complete sequencing of human and other genomes, availability of large-scale gene
    expression arrays with ever-increasing numbers of genes displayed, and steady
    improvements in protein expression technology can hav...

  20. Leveraging Large-Scale Cancer Genomics Datasets for Germline Discovery - TCGA

    Cancer.gov

    The session will review how data types have changed over time, focusing on how next-generation sequencing is being employed to yield more precise information about the underlying genomic variation that influences tumor etiology and biology.

  1. Human Y chromosome copy number variation in the next generation sequencing era and beyond.

    PubMed

    Massaia, Andrea; Xue, Yali

    2017-05-01

    The human Y chromosome provides a fertile ground for structural rearrangements owing to its haploidy and high content of repeated sequences. The methodologies used for copy number variation (CNV) studies have developed over the years. Low-throughput techniques based on direct observation of rearrangements were developed early on, and are still used, often to complement array-based or sequencing approaches which have limited power in regions with high repeat content and specifically in the presence of long, identical repeats, such as those found in human sex chromosomes. Some specific rearrangements have been investigated for decades; because of their effects on fertility, or their outstanding evolutionary features, the interest in these has not diminished. However, following the flourishing of large-scale genomics, several studies have investigated CNVs across the whole chromosome. These studies sometimes employ data generated within large genomic projects such as the DDD study or the 1000 Genomes Project, and often survey large samples of healthy individuals without any prior selection. Novel technologies based on sequencing long molecules, and combinations of technologies, promise to stimulate the study of Y-CNVs in the immediate future.

  2. Identifying molecular drivers of gastric cancer through next-generation sequencing.

    PubMed

    Liang, Han; Kim, Yon Hui

    2013-11-01

    Gastric cancer is the second most common cause of cancer-related death in the world, representing a major global health issue. The high mortality rate is largely due to the lack of effective medical treatment for advanced stages of this disease. Recently, next-generation sequencing (NGS) technology has become a revolutionary tool for cancer research, and several NGS studies in gastric cancer have been published. Here we review the insights gained from these studies regarding how to use NGS to elucidate the molecular basis of gastric cancer and identify potential therapeutic targets. We also discuss the challenges and future directions of such efforts. Published by Elsevier Ireland Ltd.

  3. SNP-VISTA: An Interactive SNPs Visualization Tool

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shah, Nameeta; Teplitsky, Michael V.; Pennacchio, Len A.

    2005-07-05

    Recent advances in sequencing technologies promise better diagnostics for many diseases as well as better understanding of evolution of microbial populations. Single Nucleotide Polymorphisms (SNPs) are established genetic markers that aid in the identification of loci affecting quantitative traits and/or disease in a wide variety of eukaryotic species. With today's technological capabilities, it is possible to re-sequence a large set of appropriate candidate genes in individuals with a given disease and then screen for causative mutations. In addition, SNPs have been used extensively in efforts to study the evolution of microbial populations, and the recent application of random shotgun sequencing to environmental samples makes possible more extensive SNP analysis of co-occurring and co-evolving microbial populations. The program is available at http://genome.lbl.gov/vista/snpvista.
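The SNP-screening step described above can be sketched with a minimal, illustrative routine (not SNP-VISTA's actual code; the function name is invented) that flags variant columns in a set of aligned, equal-length sequences:

```python
def snp_columns(seqs):
    """Return 0-based alignment columns where more than one base occurs.

    Gap characters ('-') are ignored, so indels alone do not count as SNPs.
    """
    cols = []
    for i in range(len(seqs[0])):
        bases = {s[i] for s in seqs if s[i] != '-'}
        if len(bases) > 1:
            cols.append(i)
    return cols
```

For example, `snp_columns(["ACGT", "ACGA", "AC-T"])` reports column 3, where both T and A occur, while the gap in column 2 is not treated as variation.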

  4. MODFLOW-LGR: Practical application to a large regional dataset

    NASA Astrophysics Data System (ADS)

    Barnes, D.; Coulibaly, K. M.

    2011-12-01

    In many areas of the US, including southwest Florida, large regional-scale groundwater models have been developed to aid in decision making and water resources management. These models are subsequently used as a basis for site-specific investigations. Because the large scale of these regional models is not appropriate for local application, refinement is necessary to analyze the local effects of pumping wells and groundwater related projects at specific sites. The most commonly used approach to date is Telescopic Mesh Refinement or TMR. It allows the extraction of a subset of the large regional model with boundary conditions derived from the regional model results. The extracted model is then updated and refined for local use using a variable sized grid focused on the area of interest. MODFLOW-LGR, local grid refinement, is an alternative approach which allows model discretization at a finer resolution in areas of interest and provides coupling between the larger "parent" model and the locally refined "child." In the present work, these two approaches are tested on a mining impact assessment case in southwest Florida using a large regional dataset (The Lower West Coast Surficial Aquifer System Model). Various metrics for performance are considered. They include: computation time, water balance (as compared to the variable sized grid), calibration, implementation effort, and application advantages and limitations. The results indicate that MODFLOW-LGR is a useful tool to improve local resolution of regional scale models. While performance metrics, such as computation time, are case-dependent (model size, refinement level, stresses involved), implementation effort, particularly when regional models of suitable scale are available, can be minimized. The creation of multiple child models within a larger scale parent model makes it possible to reuse the same calibrated regional dataset with minimal modification. 
In cases similar to the Lower West Coast model, where a model is larger than optimal for direct application as a parent grid, a combination of TMR and LGR approaches should be used to develop a suitable parent grid.

  5. Effects of Ensemble Configuration on Estimates of Regional Climate Uncertainties

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Goldenson, N.; Mauger, G.; Leung, L. R.

    Internal variability in the climate system can contribute substantial uncertainty in climate projections, particularly at regional scales. Internal variability can be quantified using large ensembles of simulations that are identical but for perturbed initial conditions. Here we compare methods for quantifying internal variability. Our study region spans the west coast of North America, which is strongly influenced by El Niño and other large-scale dynamics through their contribution to large-scale internal variability. Using a statistical framework to simultaneously account for multiple sources of uncertainty, we find that internal variability can be quantified consistently using a large ensemble or an ensemble of opportunity that includes small ensembles from multiple models and climate scenarios. The latter also produce estimates of uncertainty due to model differences. We conclude that projection uncertainties are best assessed using small single-model ensembles from as many model-scenario pairings as computationally feasible, which has implications for ensemble design in large modeling efforts.
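As a toy illustration of the initial-condition-ensemble idea (not the paper's statistical framework), internal variability can be estimated as the spread across members that share a model and scenario and differ only in their perturbed initial conditions:

```python
import numpy as np

def internal_variability(members):
    """Sample standard deviation across ensemble members.

    `members` has shape (n_members, ...); axis 0 indexes the members, and the
    remaining axes can be time steps or grid points.
    """
    members = np.asarray(members, dtype=float)
    return members.std(axis=0, ddof=1)
```

Differences between the ensemble means of distinct model-scenario pairings would then estimate the model and scenario contributions to total uncertainty.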

  6. Lessons learnt on the analysis of large sequence data in animal genomics.

    PubMed

    Biscarini, F; Cozzi, P; Orozco-Ter Wengel, P

    2018-04-06

    The 'omics revolution has made a large amount of sequence data available to researchers and the industry. This has had a profound impact in the field of bioinformatics, stimulating unprecedented advancements in this discipline. Mostly, this is looked at from the perspective of human 'omics, in particular human genomics. Plant and animal genomics, however, have also been deeply influenced by next-generation sequencing technologies, with several genomics applications now popular among researchers and the breeding industry. Genomics tends to generate huge amounts of data, and genomic sequence data account for an increasing proportion of big data in biological sciences, due largely to decreasing sequencing and genotyping costs and to large-scale sequencing and resequencing projects. The analysis of big data poses a challenge to scientists, as data gathering currently takes place at a faster pace than does data processing and analysis, and the associated computational burden is increasingly taxing, making even simple manipulation, visualization and transferring of data a cumbersome operation. The time consumed by the processing and analysing of huge data sets may be at the expense of data quality assessment and critical interpretation. Additionally, when analysing lots of data, something is likely to go awry (the software may crash or stop), and it can be very frustrating to track down the error. We herein review the most relevant issues related to tackling these challenges and problems, from the perspective of animal genomics, and provide researchers that lack extensive computing experience with guidelines that will help when processing large genomic data sets. © 2018 Stichting International Foundation for Animal Genetics.

  7. xHMMER3x2: Utilizing HMMER3's speed and HMMER2's sensitivity and specificity in the glocal alignment mode for improved large-scale protein domain annotation.

    PubMed

    Yap, Choon-Kong; Eisenhaber, Birgit; Eisenhaber, Frank; Wong, Wing-Cheong

    2016-11-29

    While the local-mode HMMER3 is notable for its massive speed improvement, the slower glocal-mode HMMER2 is more exact for domain annotation by enforcing full domain-to-sequence alignments. Since a unit of domain necessarily implies a unit of function, local-mode HMMER3 alone remains insufficient for precise function annotation tasks. In addition, the incomparable E-values for the same domain model by different HMMER builds create difficulty when checking for domain annotation consistency on a large-scale basis. In this work, both the speed of HMMER3 and glocal-mode alignment of HMMER2 are combined within the xHMMER3x2 framework for tackling the large-scale domain annotation task. Briefly, HMMER3 is utilized for initial domain detection so that HMMER2 can subsequently perform the glocal-mode, sequence-to-full-domain alignments for the detected HMMER3 hits. An E-value calibration procedure is required to ensure that the search space by HMMER2 is sufficiently replicated by HMMER3. We find that the latter is straightforwardly possible for ~80% of the models in the Pfam domain library (release 29). However in the case of the remaining ~20% of HMMER3 domain models, the respective HMMER2 counterparts are more sensitive. Thus, HMMER3 searches alone are insufficient to ensure sensitivity and a HMMER2-based search needs to be initiated. When tested on the set of UniProt human sequences, xHMMER3x2 can be configured to be between 7× and 201× faster than HMMER2, but with descending domain detection sensitivity from 99.8 to 95.7% with respect to HMMER2 alone; HMMER3's sensitivity was 95.7%. At extremes, xHMMER3x2 is either the slow glocal-mode HMMER2 or the fast HMMER3 with glocal-mode. Finally, the E-values to false-positive rates (FPR) mapping by xHMMER3x2 allows E-values of different model builds to be compared, so that any annotation discrepancies in a large-scale annotation exercise can be flagged for further examination by dissectHMMER. 
The xHMMER3x2 workflow allows large-scale domain annotation speed to be drastically improved over HMMER2 without compromising domain-detection sensitivity or sequence-to-domain alignment completeness. The xHMMER3x2 code and its webserver (for Pfam releases 27, 28 and 29) are freely available at http://xhmmer3x2.bii.a-star.edu.sg/. Reviewed by Thomas Dandekar, L. Aravind, Oliviero Carugo and Shamil Sunyaev. For the full reviews, please go to the Reviewers' comments section.
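The E-value-to-FPR mapping that makes different model builds comparable can be sketched as an empirical calibration: given E-values of hits known to be false (e.g. decoy hits), the FPR at threshold E is the fraction of those decoys scoring at least as significantly. This is an illustrative stand-in, not xHMMER3x2's actual procedure, and the names are invented:

```python
import bisect

def fpr_at(evalue, decoy_evalues_sorted):
    """Empirical false-positive rate at an E-value threshold.

    `decoy_evalues_sorted` must be sorted ascending (smaller E-value means a
    more significant hit). Returns the fraction of decoy hits with an
    E-value <= `evalue`.
    """
    k = bisect.bisect_right(decoy_evalues_sorted, evalue)
    return k / len(decoy_evalues_sorted)
```

Mapping each build's E-values through its own calibration curve puts hits from different HMMER builds on a common FPR scale, so annotation discrepancies can be flagged consistently.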

  8. Large-scale horizontal flows from SOUP observations of solar granulation

    NASA Astrophysics Data System (ADS)

    November, L. J.; Simon, G. W.; Tarbell, T. D.; Title, A. M.; Ferguson, S. H.

    1987-09-01

    Using high-resolution time-sequence photographs of solar granulation from the SOUP experiment on Spacelab 2, the authors observed large-scale horizontal flows at the solar surface. The measurement method is based upon a local spatial cross correlation analysis. The horizontal motions have amplitudes in the range 300 to 1000 m/s. Radial outflow of granulation from a sunspot penumbra into the surrounding photosphere is a striking new discovery. Both the supergranulation pattern and cellular structures having the scale of mesogranulation are seen. The vertical flows that are inferred by continuity of mass from these observed horizontal flows have larger upflow amplitudes in cell centers than downflow amplitudes at cell boundaries.
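The cross-correlation measurement can be illustrated with a minimal FFT-based sketch that recovers the displacement between two frames from the peak of their circular cross-correlation. This is a global, integer-pixel simplification of the local analysis described above, for illustration only:

```python
import numpy as np

def shift_between(a, b):
    """Return (dy, dx) such that b is approximately a shifted by (dy, dx)."""
    # Circular cross-correlation via the Fourier correlation theorem.
    cc = np.fft.ifft2(np.fft.fft2(b) * np.conj(np.fft.fft2(a))).real
    dy, dx = np.unravel_index(np.argmax(cc), cc.shape)
    # Map wrap-around indices to signed shifts.
    if dy > a.shape[0] // 2:
        dy -= a.shape[0]
    if dx > a.shape[1] // 2:
        dx -= a.shape[1]
    return dy, dx
```

Applying this within small windows across a pair of granulation frames, and dividing the recovered displacements by the frame interval, yields a map of horizontal velocities.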

  9. Data integration in the era of omics: current and future challenges

    PubMed Central

    2014-01-01

    Integrating heterogeneous and large omics data constitutes not only a conceptual challenge but also a practical hurdle in the daily analysis of omics data. With the rise of novel omics technologies and through large-scale consortium projects, biological systems are being investigated at an unprecedented scale, generating heterogeneous and often large data sets. These data sets encourage researchers to develop novel data integration methodologies. In this introduction we review the definition of, and characterize current efforts on, data integration in the life sciences. We used a web survey to assess current research projects on data integration and to tap into the views, needs and challenges as currently perceived by parts of the research community. PMID:25032990

  10. A multilocus population genetic survey of the greater sage-grouse across their range.

    PubMed

    Oyler-McCance, S J; Taylor, S E; Quinn, T W

    2005-04-01

    The distribution and abundance of the greater sage-grouse (Centrocercus urophasianus) have declined dramatically, and as a result the species has become the focus of conservation efforts. We conducted a range-wide genetic survey of the species which included 46 populations and over 1000 individuals using both mitochondrial sequence data and data from seven nuclear microsatellites. Nested clade and structure analyses revealed that, in general, the greater sage-grouse populations follow an isolation-by-distance model of restricted gene flow. This suggests that movements of the greater sage-grouse are typically among neighbouring populations and not across the species' range. This may have important implications if management is considering translocations as they should involve neighbouring rather than distant populations to preserve any effects of local adaptation. We identified two populations in Washington with low levels of genetic variation that reflect severe habitat loss and dramatic population decline. Managers of these populations may consider augmentation from geographically close populations. One population (Lyon/Mono) on the southwestern edge of the species' range appears to have been isolated from all other greater sage-grouse populations. This population is sufficiently genetically distinct that it warrants protection and management as a separate unit. The genetic data presented here, in conjunction with large-scale demographic and habitat data, will provide an integrated approach to conservation efforts for the greater sage-grouse.

  11. Cancer Genomics: Diversity and Disparity Across Ethnicity and Geography.

    PubMed

    Tan, Daniel S W; Mok, Tony S K; Rebbeck, Timothy R

    2016-01-01

    Ethnic and geographic differences in cancer incidence, prognosis, and treatment outcomes can be attributed to diversity in the inherited (germline) and somatic genome. Although international large-scale sequencing efforts are beginning to unravel the genomic underpinnings of cancer traits, much remains to be known about the underlying mechanisms and determinants of genomic diversity. Carcinogenesis is a dynamic, complex phenomenon representing the interplay between genetic and environmental factors that results in divergent phenotypes across ethnicities and geography. For example, compared with whites, there is a higher incidence of prostate cancer among Africans and African Americans, and the disease is generally more aggressive and fatal. Genome-wide association studies have identified germline susceptibility loci that may account for differences between the African and non-African patients, but the lack of availability of appropriate cohorts for replication studies and the incomplete understanding of genomic architecture across populations pose major limitations. We further discuss the transformative potential of routine diagnostic evaluation for actionable somatic alterations, using lung cancer as an example, highlighting implications of population disparities, current hurdles in implementation, and the far-reaching potential of clinical genomics in enhancing cancer prevention, diagnosis, and treatment. As we enter the era of precision cancer medicine, a concerted multinational effort is key to addressing population and genomic diversity as well as overcoming barriers and geographical disparities in research and health care delivery. © 2015 by American Society of Clinical Oncology.

  12. Next-Generation Sequencing of the Chrysanthemum nankingense (Asteraceae) Transcriptome Permits Large-Scale Unigene Assembly and SSR Marker Discovery

    PubMed Central

    Wang, Haibin; Jiang, Jiafu; Chen, Sumei; Qi, Xiangyu; Peng, Hui; Li, Pirui; Song, Aiping; Guan, Zhiyong; Fang, Weimin; Liao, Yuan; Chen, Fadi

    2013-01-01

    Background Simple sequence repeats (SSRs) are ubiquitous in eukaryotic genomes. Chrysanthemum is one of the largest genera in the Asteraceae family. Only a few Chrysanthemum expressed sequence tag (EST) sequences have been acquired to date, so the number of available EST-SSR markers is very low. Methodology/Principal Findings Illumina paired-end sequencing technology produced over 53 million sequencing reads from C. nankingense mRNA. The subsequent de novo assembly yielded 70,895 unigenes, of which 45,789 (64.59%) unigenes showed similarity to sequences in the NCBI database. Out of 45,789 sequences, 107 have hits to the Chrysanthemum Nr protein database; 679 and 277 sequences have hits to the databases of Helianthus and Lactuca species, respectively. MISA software identified a large number of putative EST-SSRs, allowing 1,788 primer pairs to be designed from the de novo transcriptome sequence and a further 363 from archival EST sequence. Among 100 primer pairs randomly chosen, 81 markers gave amplicons and 20 were polymorphic for genotype analysis in Chrysanthemum. The results showed that most (but not all) of the assays were transferable across species and that they exposed a significant amount of allelic diversity. Conclusions/Significance SSR markers acquired by transcriptome sequencing are potentially useful for marker-assisted breeding and genetic analysis in the genus Chrysanthemum and its related genera. PMID:23626799
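The SSR-detection step can be sketched with a minimal regex-based routine in the spirit of MISA (illustrative only; MISA's real criteria and defaults differ, and the function name and thresholds here are invented):

```python
import re

def find_ssrs(seq, motif_len=2, min_repeats=5):
    """Yield (start, motif, repeat_count) for perfect tandem repeats.

    Scans for a motif of `motif_len` bases repeated at least `min_repeats`
    times in a row, using a regex backreference to enforce the repetition.
    """
    pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (motif_len, min_repeats - 1))
    for m in pattern.finditer(seq):
        yield m.start(), m.group(2), len(m.group(1)) // motif_len
```

For example, the dinucleotide repeat in `"TTACACACACACGG"` is reported as motif `AC` repeated five times starting at position 2; flanking sequence around each hit is what a primer-design step would then use.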

  13. NCBI BLAST+ integrated into Galaxy.

    PubMed

    Cock, Peter J A; Chilton, John M; Grüning, Björn; Johnson, James E; Soranzo, Nicola

    2015-01-01

    The NCBI BLAST suite has become ubiquitous in modern molecular biology and is used for tasks ranging from checking capillary sequencing results of single PCR products to genome annotation and even larger-scale pan-genome analyses. For early adopters of the Galaxy web-based biomedical data analysis platform, integrating BLAST into Galaxy was a natural step for sequence comparison workflows. The command line NCBI BLAST+ tool suite was wrapped for use within Galaxy. Appropriate datatypes were defined as needed. The integration of the BLAST+ tool suite into Galaxy has the goal of making common BLAST tasks easy and advanced tasks possible. This project is an informal international collaborative effort, and is deployed and used on Galaxy servers worldwide. Several examples of applications are described here.
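As a small illustration of downstream handling of BLAST+ results (not part of the Galaxy wrappers themselves; the function name and cutoff are invented), the standard 12-column tabular output (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) can be filtered like this:

```python
def parse_blast_tab(lines, min_identity=90.0):
    """Return (query, subject, pct_identity, evalue) tuples from outfmt 6 rows,
    keeping only hits at or above the identity cutoff."""
    hits = []
    for line in lines:
        f = line.rstrip("\n").split("\t")
        qid, sid, pident, evalue = f[0], f[1], float(f[2]), float(f[10])
        if pident >= min_identity:
            hits.append((qid, sid, pident, evalue))
    return hits
```

The same tab-separated format is what the Galaxy BLAST+ datatypes carry between workflow steps, which is why simple line-oriented filters like this compose well with them.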

  14. Applications of large-scale density functional theory in biology

    NASA Astrophysics Data System (ADS)

    Cole, Daniel J.; Hine, Nicholas D. M.

    2016-10-01

    Density functional theory (DFT) has become a routine tool for the computation of electronic structure in the physics, materials and chemistry fields. Yet the application of traditional DFT to problems in the biological sciences is hindered, to a large extent, by the unfavourable scaling of the computational effort with system size. Here, we review some of the major software and functionality advances that enable insightful electronic structure calculations to be performed on systems comprising many thousands of atoms. We describe some of the early applications of large-scale DFT to the computation of the electronic properties and structure of biomolecules, as well as to paradigmatic problems in enzymology, metalloproteins, photosynthesis and computer-aided drug design. With this review, we hope to demonstrate that first-principles modelling of biological structure-function relationships is approaching reality.

  15. Chromosome-level assembly of Arabidopsis thaliana Ler reveals the extent of translocation and inversion polymorphisms.

    PubMed

    Zapata, Luis; Ding, Jia; Willing, Eva-Maria; Hartwig, Benjamin; Bezdan, Daniela; Jiao, Wen-Biao; Patel, Vipul; Velikkakam James, Geo; Koornneef, Maarten; Ossowski, Stephan; Schneeberger, Korbinian

    2016-07-12

    Resequencing or reference-based assemblies reveal large parts of the small-scale sequence variation. However, they typically fail to separate such local variation into colinear and rearranged variation, because they usually do not recover the complement of large-scale rearrangements, including transpositions and inversions. Besides the availability of hundreds of genomes of diverse Arabidopsis thaliana accessions, there is so far only one full-length assembled genome: the reference sequence. We have assembled 117 Mb of the A. thaliana Landsberg erecta (Ler) genome into five chromosome-equivalent sequences using a combination of short Illumina reads, long PacBio reads, and linkage information. Whole-genome comparison against the reference sequence revealed 564 transpositions and 47 inversions comprising ∼3.6 Mb, in addition to 4.1 Mb of nonreference sequence, mostly originating from duplications. Although rearranged regions are not different in local divergence from colinear regions, they are drastically depleted for meiotic recombination in heterozygotes. Using a 1.2-Mb inversion as an example, we show that such rearrangement-mediated reduction of meiotic recombination can lead to genetically isolated haplotypes in the worldwide population of A. thaliana. Moreover, we found 105 single-copy genes, which were only present in the reference sequence or the Ler assembly, and 334 single-copy orthologs, which showed an additional copy in only one of the genomes. To our knowledge, this work gives first insights into the degree and type of variation that will be revealed once complete assemblies replace resequencing or other reference-dependent methods.

  16. A comparison of selected MMPI-2 and MMPI-2-RF validity scales in assessing effort on cognitive tests in a military sample.

    PubMed

    Jones, Alvin; Ingram, M Victoria

    2011-10-01

    Using a relatively new statistical paradigm, Optimal Data Analysis (ODA; Yarnold & Soltysik, 2005), this research demonstrated that newly developed scales for the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) and MMPI-2 Restructured Form (MMPI-2-RF) specifically designed to assess over-reporting of cognitive and/or somatic symptoms were more effective than the MMPI-2 F-family of scales in predicting effort status on tests of cognitive functioning in a sample of 288 military members. ODA demonstrated that when all scales were performing at their theoretical maximum possible level of classification accuracy, the Henry Heilbronner Index (HHI), Response Bias Scale (RBS), Fake Bad Scale (FBS), and the Symptom Validity Scale (FBS-r) outperformed the F-family of scales on a variety of ODA indexes of classification accuracy, including an omnibus measure (effect strength total, EST) of the descriptive and prognostic utility of ODA models developed for each scale. Based on the guidelines suggested by Yarnold and Soltysik for evaluating effect strengths for ODA models, the newly developed scales had effect strengths that were moderate (37.66 to 45.68), whereas the F-family scales had effect strengths that ranged from weak to moderate (15.42 to 32.80). In addition, traditional analysis demonstrated that HHI, RBS, FBS, and FBS-r had large effect sizes (0.98 to 1.16) based on Cohen's (1988) suggested categorization of effect size when comparing mean scores for adequate versus inadequate effort groups, whereas the F-family of scales had small to medium effect sizes (0.25 to 0.76). The MMPI-2-RF Infrequent Somatic Responses Scale (F(S)) tended to perform in a fashion similar to F, the best performing F-family scale.
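The Cohen's d comparison reported above (mean difference between adequate- and inadequate-effort groups, scaled by the pooled standard deviation) can be computed with a short sketch; the implementation is illustrative, not the study's own code:

```python
import math

def cohens_d(x, y):
    """Cohen's d for two independent groups, using the pooled sample SD."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    pooled = math.sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))
    return (mx - my) / pooled
```

Under Cohen's (1988) conventions, |d| around 0.2 is small, 0.5 medium, and 0.8 or more large, which is the categorization the abstract applies to the 0.98 to 1.16 range.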

  17. Large-Scale Low-Cost NGS Library Preparation Using a Robust Tn5 Purification and Tagmentation Protocol

    PubMed Central

    Hennig, Bianca P.; Velten, Lars; Racke, Ines; Tu, Chelsea Szu; Thoms, Matthias; Rybin, Vladimir; Besir, Hüseyin; Remans, Kim; Steinmetz, Lars M.

    2017-01-01

    Efficient preparation of high-quality sequencing libraries that well represent the biological sample is a key step for using next-generation sequencing in research. Tn5 enables fast, robust, and highly efficient processing of limited input material while scaling to the parallel processing of hundreds of samples. Here, we present a robust Tn5 transposase purification strategy based on an N-terminal His6-Sumo3 tag. We demonstrate that libraries prepared with our in-house Tn5 are of the same quality as those processed with a commercially available kit (Nextera XT), while they dramatically reduce the cost of large-scale experiments. We introduce improved purification strategies for two versions of the Tn5 enzyme. The first version carries the previously reported point mutations E54K and L372P, and stably produces libraries of constant fragment size distribution, even if the Tn5-to-input molecule ratio varies. The second Tn5 construct carries an additional point mutation (R27S) in the DNA-binding domain. This construct allows for adjustment of the fragment size distribution based on enzyme concentration during tagmentation, a feature that opens new opportunities for use of Tn5 in customized experimental designs. We demonstrate the versatility of our Tn5 enzymes in different experimental settings, including a novel single-cell polyadenylation site mapping protocol as well as ultralow input DNA sequencing. PMID:29118030

  18. Genome Partitioner: A web tool for multi-level partitioning of large-scale DNA constructs for synthetic biology applications.

    PubMed

    Christen, Matthias; Del Medico, Luca; Christen, Heinz; Christen, Beat

    2017-01-01

    Recent advances in lower-cost DNA synthesis techniques have enabled new innovations in the field of synthetic biology. Still, efficient design and higher-order assembly of genome-scale DNA constructs remains a labor-intensive process. Given the complexity, computer assisted design tools that fragment large DNA sequences into fabricable DNA blocks are needed to pave the way towards streamlined assembly of biological systems. Here, we present the Genome Partitioner software implemented as a web-based interface that permits multi-level partitioning of genome-scale DNA designs. Without the need for specialized computing skills, biologists can submit their DNA designs to a fully automated pipeline that generates the optimal retrosynthetic route for higher-order DNA assembly. To test the algorithm, we partitioned a 783 kb Caulobacter crescentus genome design. We validated the partitioning strategy by assembling a 20 kb test segment encompassing a difficult to synthesize DNA sequence. Successful assembly from 1 kb subblocks into the 20 kb segment highlights the effectiveness of the Genome Partitioner for reducing synthesis costs and timelines for higher-order DNA assembly. The Genome Partitioner is broadly applicable to translate DNA designs into ready to order sequences that can be assembled with standardized protocols, thus offering new opportunities to harness the diversity of microbial genomes for synthetic biology applications. The Genome Partitioner web tool can be accessed at https://christenlab.ethz.ch/GenomePartitioner.
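The partitioning idea can be sketched as splitting a large design into fixed-size subblocks whose ends overlap, so neighboring blocks share homology for higher-order assembly. The block and overlap sizes below are arbitrary illustration values, not the Genome Partitioner's actual parameters, and the real tool additionally optimizes cut sites for synthesizability:

```python
def partition(seq, block=1000, overlap=40):
    """Split `seq` into subblocks of length `block`, adjacent blocks
    sharing `overlap` bases at their junctions."""
    step = block - overlap
    blocks = []
    start = 0
    while start < len(seq):
        blocks.append(seq[start:start + block])
        if start + block >= len(seq):
            break
        start += step
    return blocks
```

Because each junction carries the same `overlap` bases on both sides, concatenating the blocks while dropping each successor's leading overlap reconstructs the original design exactly.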

  19. History and current status of wheat miRNAs using next-generation sequencing and their roles in development and stress.

    PubMed

    Budak, Hikmet; Khan, Zaeema; Kantar, Melda

    2015-05-01

    The discovery and characterization of microRNAs (miRNAs), small molecules that aid in posttranscriptional silencing, have vastly benefited from the recent development and widespread application of next-generation sequencing (NGS) technologies. Several miRNAs were identified through sequencing of constructed small RNA libraries, whereas others were predicted by in silico methods using the recently accumulating sequence data. NGS was a major breakthrough in efforts to sequence and dissect the genomes of plants, including bread wheat and its progenitors, which have large, repetitive and complex genomes. The availability of survey sequences of the wheat whole genome and its individual chromosomes enabled researchers to predict and assess wheat miRNAs at both the subgenomic and whole-genome levels. Moreover, small RNA construction and sequencing-based studies identified several putative development- and stress-related wheat miRNAs, revealing their differential expression patterns in specific developmental stages and/or in response to stress conditions. With the vast number of wheat miRNAs identified in recent years, we are approaching an overall picture of the wheat miRNA repertoire. In the coming years, more comprehensive research on miRNA conservation and divergence across wheat and its close relatives and progenitors should be performed. The results may prove valuable for understanding the roles of species-specific miRNAs and for clarifying the interplay between miRNAs and evolution in wheat. Furthermore, the putative development- or stress-related miRNAs identified should be subjected to further functional analysis, which may aid efforts to develop wheat with better resistance and/or yield. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  20. Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments

    DOE PAGES

    Yim, Won Cheol; Cushman, John C.

    2017-07-22

    Bioinformatics is currently faced with very large-scale data sets that lead to computational jobs, especially sequence similarity searches, that can take absurdly long times to run. For example, the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST and BLAST+) suite, which is by far the most widely used tool for rapid similarity searching among nucleic acid or amino acid sequences, is highly central processing unit (CPU) intensive. While the BLAST suite of programs performs searches very rapidly, it has the potential to be accelerated. In recent years, distributed computing environments have become more widely accessible and used due to the increasing availability of high-performance computing (HPC) systems. Therefore, simple solutions for data parallelization are needed to expedite BLAST and other sequence analysis tools. However, existing software for parallel sequence similarity searches often requires extensive computational experience and skill on the part of the user. In order to accelerate BLAST and other sequence analysis tools, Divide and Conquer BLAST (DCBLAST) was developed to perform NCBI BLAST searches within a cluster, grid, or HPC environment by using a query sequence distribution approach. Scaling from one (1) to 256 CPU cores resulted in significant improvements in processing speed. Thus, DCBLAST dramatically accelerates the execution of BLAST searches using a simple, accessible, robust, and parallel approach. DCBLAST works across multiple nodes automatically and it overcomes the speed limitation of single-node BLAST programs. DCBLAST can be used on any HPC system, can take advantage of hundreds of nodes, and has no output limitations. Thus, this freely available tool simplifies distributed computation pipelines to facilitate the rapid discovery of sequence similarities between very large data sets.
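
    The query-distribution approach amounts to splitting the input FASTA into chunks that independent BLAST jobs can search in parallel. A hedged sketch of the splitting step (the real DCBLAST also handles job-scheduler submission, which is only indicated in a comment here):

```python
def split_fasta(fasta_text, n_chunks):
    """Distribute FASTA query records round-robin into n_chunks sub-queries,
    so each chunk can be searched by an independent BLAST process."""
    records = [">" + r.strip() for r in fasta_text.strip().split(">") if r.strip()]
    chunks = [[] for _ in range(n_chunks)]
    for i, rec in enumerate(records):
        chunks[i % n_chunks].append(rec)
    return ["\n".join(c) for c in chunks if c]

# Each chunk would then be searched independently, e.g. one job per core/node:
#   blastn -query chunk_i.fasta -db nt -out chunk_i.out
# and the per-chunk outputs concatenated afterwards.
```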

  2. GOLabeler: Improving Sequence-based Large-scale Protein Function Prediction by Learning to Rank.

    PubMed

    You, Ronghui; Zhang, Zihan; Xiong, Yi; Sun, Fengzhu; Mamitsuka, Hiroshi; Zhu, Shanfeng

    2018-03-07

    Gene Ontology (GO) has been widely used to annotate functions of proteins and understand their biological roles. Currently only <1% of more than 70 million proteins in UniProtKB have experimental GO annotations, implying a strong need for automated function prediction (AFP) of proteins. AFP is a hard multilabel classification problem because a single protein can be annotated with a variable number of GO terms. Most of these proteins have only sequences as input information, underscoring the importance of sequence-based AFP (SAFP: sequences are the only input). Furthermore, homology-based SAFP tools are competitive in AFP competitions, but they do not necessarily work well for so-called difficult proteins, which have <60% sequence identity to already-annotated proteins. Thus, the vital and challenging problem is how to develop a method for SAFP, particularly for difficult proteins. The key is to extract not only homology information but also diverse, deep-rooted evidence from sequence inputs, and to integrate it into a predictor in an effective and efficient manner. We propose GOLabeler, which integrates five component classifiers, trained from different features including GO term frequency, sequence alignment, amino acid trigrams, domains and motifs, and biophysical properties, within the framework of learning to rank (LTR), a machine learning paradigm especially powerful for multilabel classification. Empirical results from extensive examination of GOLabeler on large-scale datasets revealed numerous favorable aspects, including a significant performance advantage over state-of-the-art AFP methods. http://datamining-iip.fudan.edu.cn/golabeler. zhusf@fudan.edu.cn. Supplementary data are available at Bioinformatics online.
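
    At its simplest, the integration step scores each candidate GO term by combining the five component-classifier outputs and then ranks terms per protein. The sketch below uses a plain weighted sum in place of GOLabeler's actual learning-to-rank model, so the weights and scores are purely illustrative:

```python
def rank_go_terms(component_scores, weights):
    """component_scores: {go_term: (s_freq, s_align, s_trigram, s_domain, s_biophys)},
    one score per component classifier. Returns GO terms sorted by a combined
    relevance score, highest first (a stand-in for the learned ranking model)."""
    combined = {
        term: sum(w * s for w, s in zip(weights, scores))
        for term, scores in component_scores.items()
    }
    return sorted(combined, key=combined.get, reverse=True)
```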

  3. Use of Whole Genome Sequencing for Diagnosis and Discovery in the Cancer Genetics Clinic

    PubMed Central

    Foley, Samantha B.; Rios, Jonathan J.; Mgbemena, Victoria E.; Robinson, Linda S.; Hampel, Heather L.; Toland, Amanda E.; Durham, Leslie; Ross, Theodora S.

    2014-01-01

    Despite the potential of whole-genome sequencing (WGS) to improve patient diagnosis and care, the empirical value of WGS in the cancer genetics clinic is unknown. We performed WGS on members of two cohorts of cancer genetics patients: those with BRCA1/2 mutations (n = 176) and those without (n = 82). Initial analysis of potentially pathogenic variants (PPVs, defined as nonsynonymous variants with allele frequency < 1% in ESP6500) in 163 clinically-relevant genes suggested that WGS will provide useful clinical results. This is despite the fact that a majority of PPVs were novel missense variants likely to be classified as variants of unknown significance (VUS). Furthermore, previously reported pathogenic missense variants did not always associate with their predicted diseases in our patients. This suggests that the clinical use of WGS will require large-scale efforts to consolidate WGS and patient data to improve accuracy of interpretation of rare variants. While loss-of-function (LoF) variants represented only a small fraction of PPVs, WGS identified additional cancer risk LoF PPVs in patients with known BRCA1/2 mutations and led to cancer risk diagnoses in 21% of non-BRCA cancer genetics patients after expanding our analysis to 3209 ClinVar genes. These data illustrate how WGS can be used to improve our ability to discover patients' cancer genetic risks. PMID:26023681

  4. Localized structural frustration for evaluating the impact of sequence variants.

    PubMed

    Kumar, Sushant; Clarke, Declan; Gerstein, Mark

    2016-12-01

    Population-scale sequencing is increasingly uncovering large numbers of rare single-nucleotide variants (SNVs) in coding regions of the genome. The rarity of these variants makes it challenging to evaluate their deleteriousness with conventional phenotype-genotype associations. Protein structures provide a way of addressing this challenge. Previous efforts have focused on globally quantifying the impact of SNVs on protein stability. However, local perturbations may severely impact protein functionality without strongly disrupting global stability (e.g. in relation to catalysis or allostery). Here, we describe a workflow in which localized frustration, quantifying unfavorable local interactions, is employed as a metric to investigate such effects. Using this workflow on the Protein Data Bank, we find that frustration produces many immediately intuitive results: for instance, disease-related SNVs create stronger changes in localized frustration than non-disease-related variants, and rare SNVs tend to disrupt local interactions to a larger extent than common variants. Less obviously, we observe that somatic SNVs associated with oncogenes and tumor suppressor genes (TSGs) induce very different changes in frustration. In particular, those associated with TSGs change the frustration more in the core than the surface (by introducing loss-of-function events), whereas those associated with oncogenes manifest the opposite pattern, creating gain-of-function events. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  5. Ecological Consistency of SSU rRNA-Based Operational Taxonomic Units at a Global Scale

    PubMed Central

    Schmidt, Thomas S. B.; Matias Rodrigues, João F.; von Mering, Christian

    2014-01-01

    Operational Taxonomic Units (OTUs), usually defined as clusters of similar 16S/18S rRNA sequences, are the most widely used basic diversity units in large-scale characterizations of microbial communities. However, it remains unclear how well the various proposed OTU clustering algorithms approximate ‘true’ microbial taxa. Here, we explore the ecological consistency of OTUs – based on the assumption that, like true microbial taxa, they should show measurable habitat preferences (niche conservatism). In a global and comprehensive survey of available microbial sequence data, we systematically parse sequence annotations to obtain broad ecological descriptions of sampling sites. Based on these, we observe that sequence-based microbial OTUs generally show high levels of ecological consistency. However, different OTU clustering methods result in marked differences in the strength of this signal. Assuming that ecological consistency can serve as an objective external benchmark for cluster quality, we conclude that hierarchical complete linkage clustering, which provided the most ecologically consistent partitions, should be the default choice for OTU clustering. To our knowledge, this is the first approach to assess cluster quality using an external, biologically meaningful parameter as a benchmark, on a global scale. PMID:24763141
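
    Complete-linkage clustering merges two clusters only if the *maximum* pairwise dissimilarity between their members stays below the OTU threshold (e.g. 0.03 for a 97%-identity OTU definition). A naive small-scale sketch of that criterion (production pipelines use optimized implementations):

```python
def complete_linkage_otus(dist, threshold):
    """dist: symmetric pairwise dissimilarity matrix (list of lists).
    Repeatedly merge the closest pair of clusters whose complete-linkage
    distance (max over all member pairs) is below `threshold`."""
    clusters = [[i] for i in range(len(dist))]
    while True:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if d < threshold and (best is None or d < best[0]):
                    best = (d, a, b)
        if best is None:
            return [sorted(c) for c in clusters]
        _, a, b = best
        clusters[a] += clusters.pop(b)
```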

  6. Cost-Driven Design of a Large Scale X-Plane

    NASA Technical Reports Server (NTRS)

    Welstead, Jason R.; Frederic, Peter C.; Frederick, Michael A.; Jacobson, Steven R.; Berton, Jeffrey J.

    2017-01-01

    A conceptual design process focused on the development of a low-cost, large-scale X-plane was developed as part of an internal research and development effort. One of the concepts considered for this process was the double-bubble configuration recently developed as an advanced single-aisle-class commercial transport similar in size to a Boeing 737-800 or Airbus A320. The study objectives were to reduce the contractor cost from contract award to first test flight to less than $100 million, and to achieve first flight within three years of contract award. Methods and strategies for reduced cost are discussed.

  7. Large-Scale Demonstration of Liquid Hydrogen Storage with Zero Boiloff for In-Space Applications

    NASA Technical Reports Server (NTRS)

    Hastings, L. J.; Bryant, C. B.; Flachbart, R. H.; Holt, K. A.; Johnson, E.; Hedayat, A.; Hipp, B.; Plachta, D. W.

    2010-01-01

    Cryocooler and passive insulation technology advances have substantially improved prospects for zero-boiloff cryogenic storage. Therefore, a cooperative effort by NASA's Ames Research Center, Glenn Research Center, and Marshall Space Flight Center (MSFC) was implemented to develop zero-boiloff concepts for in-space cryogenic storage. Described herein is one program element: a large-scale, zero-boiloff demonstration using the MSFC multipurpose hydrogen test bed (MHTB). A commercial cryocooler was interfaced with an existing MHTB spray bar mixer and insulation system in a manner that enabled a balance between incoming and extracted thermal energy.

  8. Computational psychiatry

    PubMed Central

    Montague, P. Read; Dolan, Raymond J.; Friston, Karl J.; Dayan, Peter

    2013-01-01

    Computational ideas pervade many areas of science and have an integrative explanatory role in neuroscience and cognitive science. However, computational depictions of cognitive function have had surprisingly little impact on the way we assess mental illness because diseases of the mind have not been systematically conceptualized in computational terms. Here, we outline goals and nascent efforts in the new field of computational psychiatry, which seeks to characterize mental dysfunction in terms of aberrant computations over multiple scales. We highlight early efforts in this area that employ reinforcement learning and game theoretic frameworks to elucidate decision-making in health and disease. Looking forwards, we emphasize a need for theory development and large-scale computational phenotyping in human subjects. PMID:22177032

  9. Effect of small versus large clusters of fish school on the yield of a purse-seine small pelagic fishery including a marine protected area.

    PubMed

    Hieu, Nguyen Trong; Brochier, Timothée; Tri, Nguyen-Huu; Auger, Pierre; Brehmer, Patrice

    2014-09-01

    We consider a fishery model with two sites: (1) a marine protected area (MPA) where fishing is prohibited and (2) an area where the fish population is harvested. We assume that fish can migrate from the MPA to the fishing area at a very fast time scale, and that the fish spatial organisation can change from small to large clusters of school at a fast time scale. The growth of the fish population and the catch are assumed to occur at a slow time scale. The complete model is a system of five ordinary differential equations with three time scales. We take advantage of the time scales, using aggregation-of-variables methods, to derive a reduced model governing the total fish density and fishing effort at the slow time scale. We analyze this aggregated model and show that, under some conditions, there exists an equilibrium corresponding to a sustainable fishery. Our results suggest that in small pelagic fisheries the yield is maximum when the fish population is distributed among both small and large clusters of school.
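
    The aggregated model is a two-variable slow system in total fish density and fishing effort. The paper's exact reduced equations are not given in the abstract, so the sketch below integrates a generic Gordon-Schaefer-type stock-effort model of the same flavor; all parameter values are illustrative assumptions:

```python
def simulate(n=5.0, E=1.0, r=1.0, K=10.0, q=0.5, p=1.0, c=2.0,
             dt=0.01, steps=20000):
    """Euler integration of a generic slow stock-effort system:
        dn/dt = r n (1 - n/K) - q n E    (logistic growth minus catch)
        dE/dt = E (p q n - c)            (effort follows profitability)
    With these parameters the trajectory spirals into the
    sustainable-fishery equilibrium n* = c/(p q), E* = r(1 - n*/K)/q."""
    for _ in range(steps):
        dn = r * n * (1 - n / K) - q * n * E
        dE = E * (p * q * n - c)
        n, E = n + dt * dn, E + dt * dE
    return n, E
```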

  10. CORALINA: a universal method for the generation of gRNA libraries for CRISPR-based screening.

    PubMed

    Köferle, Anna; Worf, Karolina; Breunig, Christopher; Baumann, Valentin; Herrero, Javier; Wiesbeck, Maximilian; Hutter, Lukas H; Götz, Magdalena; Fuchs, Christiane; Beck, Stephan; Stricker, Stefan H

    2016-11-14

    The bacterial CRISPR system is fast becoming the most popular genetic and epigenetic engineering tool due to its universal applicability and adaptability. The desire to deploy CRISPR-based methods in a large variety of species and contexts has created an urgent need for the development of easy, time- and cost-effective methods enabling large-scale screening approaches. Here we describe CORALINA (comprehensive gRNA library generation through controlled nuclease activity), a method for the generation of comprehensive gRNA libraries for CRISPR-based screens. CORALINA gRNA libraries can be derived from any source of DNA without the need for complex oligonucleotide synthesis. We show the utility of CORALINA for human and mouse genomic DNA, demonstrate its reproducibility in covering the most relevant genomic features, including regulatory, coding and non-coding sequences, and confirm the functionality of CORALINA-generated gRNAs. Its simplicity and cost-effectiveness make CORALINA suitable for any experimental system. The unprecedented sequence complexities obtainable with CORALINA libraries are a prerequisite for less-biased, large-scale genomic and epigenomic screens.
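
    One routine check on the functionality of any gRNA library is whether a candidate sequence sits next to a PAM in the target genome; for SpCas9 that means a 20-nt protospacer immediately followed by NGG. A small forward-strand scanner, shown only to illustrate the check and not as part of CORALINA itself:

```python
import re

def find_protospacers(seq, length=20):
    """Return (position, protospacer) pairs for every site where a
    `length`-nt candidate gRNA target is immediately followed by an
    NGG PAM (forward strand only, for brevity). The lookahead lets
    overlapping sites all be reported."""
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % length)
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]
```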

  11. Large-scale particle acceleration by magnetic reconnection during solar flares

    NASA Astrophysics Data System (ADS)

    Li, X.; Guo, F.; Li, H.; Li, G.; Li, S.

    2017-12-01

    Magnetic reconnection that triggers explosive magnetic energy release has been widely invoked to explain the large-scale particle acceleration during solar flares. While great efforts have been spent in studying the acceleration mechanism in small-scale kinetic simulations, there have been rare studies that make predictions to acceleration in the large scale comparable to the flare reconnection region. Here we present a new arrangement to study this problem. We solve the large-scale energetic-particle transport equation in the fluid velocity and magnetic fields from high-Lundquist-number MHD simulations of reconnection layers. This approach is based on examining the dominant acceleration mechanism and pitch-angle scattering in kinetic simulations. Due to the fluid compression in reconnection outflows and merging magnetic islands, particles are accelerated to high energies and develop power-law energy distributions. We find that the acceleration efficiency and power-law index depend critically on upstream plasma beta and the magnitude of guide field (the magnetic field component perpendicular to the reconnecting component) as they influence the compressibility of the reconnection layer. We also find that the accelerated high-energy particles are mostly concentrated in large magnetic islands, making the islands a source of energetic particles and high-energy emissions. These findings may provide explanations for acceleration process in large-scale magnetic reconnection during solar flares and the temporal and spatial emission properties observed in different flare events.

  12. DNA-encoded chemistry: enabling the deeper sampling of chemical space.

    PubMed

    Goodnow, Robert A; Dumelin, Christoph E; Keefe, Anthony D

    2017-02-01

    DNA-encoded chemical library technologies are increasingly being adopted in drug discovery for hit and lead generation. DNA-encoded chemistry enables the exploration of chemical spaces four to five orders of magnitude more deeply than is achievable by traditional high-throughput screening methods. Operation of this technology requires developing a range of capabilities including aqueous synthetic chemistry, building block acquisition, oligonucleotide conjugation, large-scale molecular biological transformations, selection methodologies, PCR, sequencing, sequence data analysis and the analysis of large chemistry spaces. This Review provides an overview of the development and applications of DNA-encoded chemistry, highlighting the challenges and future directions for the use of this technology.

  13. UFO: a web server for ultra-fast functional profiling of whole genome protein sequences.

    PubMed

    Meinicke, Peter

    2009-09-02

    Functional profiling is a key technique to characterize and compare the functional potential of entire genomes. The estimation of profiles according to an assignment of sequences to functional categories is a computationally expensive task because it requires the comparison of all protein sequences from a genome with a usually large database of annotated sequences or sequence families. Based on machine learning techniques for Pfam domain detection, the UFO web server for ultra-fast functional profiling allows researchers to process large protein sequence collections instantaneously. Besides the frequencies of Pfam and GO categories, the user also obtains the sequence-specific assignments to Pfam domain families. In addition, a comparison with existing genomes provides dissimilarity scores with respect to 821 reference proteomes. Considering the underlying UFO domain detection, the results on 206 test genomes indicate a high sensitivity of the approach. In comparison with current state-of-the-art HMMs, the runtime measurements show a considerable speed-up in the range of four orders of magnitude. For an average-size prokaryotic genome, the computation of a functional profile together with its comparison typically requires about 10 seconds of processing time. For the first time, the UFO web server makes it possible to get a quick overview of the functional inventory of newly sequenced organisms. The genome-scale comparison with a large number of precomputed profiles allows a first guess about functionally related organisms. The service is freely available and does not require user registration or specification of a valid email address.
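
    A functional profile in this sense is just the normalized frequency vector of category assignments, and comparing two genomes reduces to a vector dissimilarity. A sketch of both steps; the total-variation distance used here is an assumption for illustration, since the abstract does not specify UFO's scoring:

```python
from collections import Counter

def functional_profile(assignments):
    """Turn per-sequence category assignments (e.g. Pfam IDs) into a
    normalized frequency profile for a genome."""
    counts = Counter(assignments)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}

def dissimilarity(p, q):
    """Total-variation distance between two profiles:
    0 = identical category frequencies, 1 = completely disjoint."""
    cats = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in cats)
```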

  14. Genomic catastrophes frequently arise in esophageal adenocarcinoma and drive tumorigenesis

    PubMed Central

    Patch, Ann-Marie; Bailey, Peter; Newell, Felicity; Holmes, Oliver; Fink, J. Lynn; Quinn, Michael C.J.; Tang, Yue Hang; Lampe, Guy; Quek, Kelly; Loffler, Kelly A.; Manning, Suzanne; Idrisoglu, Senel; Miller, David; Xu, Qinying; Waddell, Nick; Wilson, Peter J.; Bruxner, Timothy J.C.; Christ, Angelika N.; Harliwong, Ivon; Nourse, Craig; Nourbakhsh, Ehsan; Anderson, Matthew; Kazakoff, Stephen; Leonard, Conrad; Wood, Scott; Simpson, Peter T.; Reid, Lynne E.; Krause, Lutz; Hussey, Damian J.; Watson, David I.; Lord, Reginald V.; Nancarrow, Derek; Phillips, Wayne A.; Gotley, David; Smithers, B. Mark; Whiteman, David C.; Hayward, Nicholas K.; Campbell, Peter J.; Pearson, John V.; Grimmond, Sean M.; Barbour, Andrew P.

    2015-01-01

    Oesophageal adenocarcinoma (EAC) incidence is rapidly increasing in Western countries. A better understanding of EAC underpins efforts to improve early detection and treatment outcomes. While large EAC exome sequencing efforts to date have found recurrent loss-of-function mutations, oncogenic driving events have been underrepresented. Here we use a combination of whole-genome sequencing (WGS) and single-nucleotide polymorphism-array profiling to show that genomic catastrophes are frequent in EAC, with almost a third (32%, n = 40/123) undergoing chromothriptic events. WGS of 22 EAC cases show that catastrophes may lead to oncogene amplification through chromothripsis-derived double-minute chromosome formation (MYC and MDM2) or breakage-fusion-bridge (KRAS, MDM2 and RFC3). Telomere shortening is more prominent in EACs bearing localized complex rearrangements. Mutational signature analysis also confirms that extreme genomic instability in EAC can be driven by somatic BRCA2 mutations. These findings suggest that genomic catastrophes have a significant role in the malignant transformation of EAC. PMID:25351503

  15. APRICOT: an integrated computational pipeline for the sequence-based identification and characterization of RNA-binding proteins.

    PubMed

    Sharan, Malvika; Förstner, Konrad U; Eulalio, Ana; Vogel, Jörg

    2017-06-20

    RNA-binding proteins (RBPs) have been established as core components of several post-transcriptional gene regulation mechanisms. Experimental techniques such as cross-linking and co-immunoprecipitation have enabled the large-scale identification of RBPs, RNA-binding domains (RBDs) and their regulatory roles in eukaryotic species such as human and yeast. In contrast, our knowledge of the number and potential diversity of RBPs in bacteria is more limited, owing to the technical challenges associated with existing global screening approaches. We introduce APRICOT, a computational pipeline for the sequence-based identification and characterization of proteins using RBDs known from experimental studies. The pipeline identifies functional motifs in protein sequences using position-specific scoring matrices and Hidden Markov Models of the functional domains, and statistically scores them based on a series of sequence-based features. Subsequently, APRICOT identifies putative RBPs and characterizes them by several biological properties. Here we demonstrate the application and adaptability of the pipeline on large-scale protein sets, including the bacterial proteome of Escherichia coli. APRICOT showed better performance on various datasets than other existing tools for the sequence-based prediction of RBPs, achieving an average sensitivity and specificity of 0.90 and 0.91, respectively. The command-line tool and its documentation are available at https://pypi.python.org/pypi/bio-apricot. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. Large-scale sequence and structural comparisons of human naive and antigen-experienced antibody repertoires.

    PubMed

    DeKosky, Brandon J; Lungu, Oana I; Park, Daechan; Johnson, Erik L; Charab, Wissam; Chrysostomou, Constantine; Kuroda, Daisuke; Ellington, Andrew D; Ippolito, Gregory C; Gray, Jeffrey J; Georgiou, George

    2016-05-10

    Elucidating how antigen exposure and selection shape the human antibody repertoire is fundamental to our understanding of B-cell immunity. We sequenced the paired heavy- and light-chain variable regions (VH and VL, respectively) from large populations of single B cells combined with computational modeling of antibody structures to evaluate sequence and structural features of human antibody repertoires at unprecedented depth. Analysis of a dataset comprising 55,000 antibody clusters from CD19(+)CD20(+)CD27(-) IgM-naive B cells, >120,000 antibody clusters from CD19(+)CD20(+)CD27(+) antigen-experienced B cells, and >2,000 RosettaAntibody-predicted structural models across three healthy donors led to a number of key findings: (i) VH and VL gene sequences pair in a combinatorial fashion without detectable pairing restrictions at the population level; (ii) certain VH:VL gene pairs were significantly enriched or depleted in the antigen-experienced repertoire relative to the naive repertoire; (iii) antigen selection increased antibody paratope net charge and solvent-accessible surface area; and (iv) public heavy-chain third complementarity-determining region (CDR-H3) antibodies in the antigen-experienced repertoire showed signs of convergent paired light-chain genetic signatures, including shared light-chain third complementarity-determining region (CDR-L3) amino acid sequences and/or Vκ,λ-Jκ,λ genes. The data reported here address several longstanding questions regarding antibody repertoire selection and development and provide a benchmark for future repertoire-scale analyses of antibody responses to vaccination and disease.
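
    One of the reported shifts, the increase in paratope net charge under antigen selection, can be approximated directly from sequence with a crude side-chain count (Lys/Arg positive, Asp/Glu negative; His is left out as only partially protonated at physiological pH). A minimal sketch of that proxy, not the structure-based calculation used in the study:

```python
POSITIVE = set("KR")   # Lys, Arg: protonated at pH ~7.4
NEGATIVE = set("DE")   # Asp, Glu: deprotonated at pH ~7.4

def net_charge(cdr_seq):
    """Crude formal net charge of a CDR/paratope sequence at neutral pH
    (ignores His, termini, and structural context)."""
    return (sum(aa in POSITIVE for aa in cdr_seq)
            - sum(aa in NEGATIVE for aa in cdr_seq))
```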

  17. De novo assembly and next-generation sequencing to analyse full-length gene variants from codon-barcoded libraries.

    PubMed

    Cho, Namjin; Hwang, Byungjin; Yoon, Jung-ki; Park, Sangun; Lee, Joongoo; Seo, Han Na; Lee, Jeewon; Huh, Sunghoon; Chung, Jinsoo; Bang, Duhee

    2015-09-21

    Interpreting epistatic interactions is crucial for understanding the evolutionary dynamics of complex genetic systems and unveiling the structure and function of genetic pathways. Although high-resolution mapping of en masse variant libraries enables molecular biologists to address genotype-phenotype relationships, long-read sequencing technology remains indispensable for assessing functional relationships between mutations that lie far apart. Here, we introduce JigsawSeq for multiplexed sequence identification of pooled gene variant libraries by combining a codon-based molecular barcoding strategy and de novo assembly of short-read data. We first validate JigsawSeq on small sub-pools and observe high precision and recall at various experimental settings. With extensive simulations, we then apply JigsawSeq to large-scale gene variant libraries to show that our method can be reliably scaled using next-generation sequencing. JigsawSeq may serve as a rapid screening tool for functional genomics and offer the opportunity to explore evolutionary trajectories of protein variants.
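
    The codon-based barcoding idea exploits synonymous codons: choosing one of two codons per residue writes a bit of barcode into the DNA without changing the encoded protein. A toy illustration (the codon pairs and bit assignment below are invented for this sketch, not JigsawSeq's actual scheme):

```python
# Hypothetical two-codon alphabet per residue (illustration only)
SYN = {"L": ("CTT", "CTG"), "S": ("TCT", "TCA"), "R": ("CGT", "CGC")}

def encode(protein, bits):
    """Emit a coding sequence for `protein` whose synonymous codon
    choices carry the barcode `bits` (one bit per residue)."""
    return "".join(SYN[aa][b] for aa, b in zip(protein, bits))

def decode(dna):
    """Recover the barcode bits from the codon choices."""
    lookup = {codon: b for pair in SYN.values() for b, codon in enumerate(pair)}
    return [lookup[dna[i:i + 3]] for i in range(0, len(dna), 3)]
```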

  18. Comparative sequence analysis of Sordaria macrospora and Neurospora crassa as a means to improve genome annotation.

    PubMed

    Nowrousian, Minou; Würtz, Christian; Pöggeler, Stefanie; Kück, Ulrich

    2004-03-01

    One of the most challenging parts of large-scale sequencing projects is the identification of functional elements encoded in a genome. Recently, studies of genomes of up to six different Saccharomyces species have demonstrated that a comparative analysis of genome sequences from closely related species is a powerful approach for identifying open reading frames and other functional regions within genomes [Science 301 (2003) 71, Nature 423 (2003) 241]. Here, we present a comparison of selected sequences from Sordaria macrospora to their corresponding Neurospora crassa orthologous regions. Our analysis indicates that, due to the high degree of sequence similarity and the conservation of overall genomic organization, S. macrospora sequence information can be used to simplify the annotation of the N. crassa genome.

  19. Arrays of probes for positional sequencing by hybridization

    DOEpatents

    Cantor, Charles R [Boston, MA]; Prezetakiewiczr, Marek [East Boston, MA]; Smith, Cassandra L [Boston, MA]; Sano, Takeshi [Waltham, MA]

    2008-01-15

    This invention is directed to methods and reagents useful for sequencing nucleic acid targets utilizing sequencing by hybridization technology comprising probes, arrays of probes and methods whereby sequence information is obtained rapidly and efficiently in discrete packages. That information can be used for the detection, identification, purification and complete or partial sequencing of a particular target nucleic acid. When coupled with a ligation step, these methods can be performed under a single set of hybridization conditions. The invention also relates to the replication of probe arrays and methods for making and replicating arrays of probes which are useful for the large scale manufacture of diagnostic aids used to screen biological samples for specific target sequences. Arrays created using PCR technology may comprise probes with 5'- and/or 3'-overhangs.
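
    In positional sequencing by hybridization, each hybridized probe reports both its sequence and an approximate position on the target, which turns reconstruction into sorting plus overlap-merging. A schematic sketch under the simplifying assumptions of exact positions and error-free probes:

```python
def assemble(positional_probes):
    """positional_probes: (position, probe_sequence) pairs that tile the
    target with overlaps. Sort by position and chain the probes together,
    checking that every claimed overlap is consistent."""
    probes = sorted(positional_probes)
    seq = probes[0][1]
    for pos, p in probes[1:]:
        overlap = len(seq) - pos
        assert 0 < overlap <= len(p) and seq[pos:] == p[:overlap], \
            "inconsistent probe"
        seq += p[overlap:]
    return seq
```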

  20. Mycotoxins: A fungal genomics perspective

    USDA-ARS's Scientific Manuscript database

    The chemical and enzymatic diversity in the fungal kingdom is staggering. Large-scale fungal genome sequencing projects are generating a massive catalog of secondary metabolite biosynthetic genes and pathways. Fungal natural products are a boon and bane to man as valuable pharmaceuticals and harmful...

  1. Ocean Sciences meets Big Data Analytics

    NASA Astrophysics Data System (ADS)

    Hurwitz, B. L.; Choi, I.; Hartman, J.

    2016-02-01

    Hundreds of researchers worldwide have joined forces in the Tara Oceans Expedition to create an unprecedented planetary-scale dataset composed of state-of-the-art next generation sequencing, microscopy, and physical/chemical metadata to explore ocean biodiversity. This summer the complete collection of data from the 2009-2013 Tara voyage was released. Yet, despite herculean efforts by the Tara Oceans Consortium to make raw data and computationally derived assemblies and gene catalogs available, most researchers are stymied by the sheer volume of the data. Specifically, the most tantalizing research questions lie in understanding the unifying principles that guide the distribution of organisms across the sea and affect climate and ecosystem function. To use the data in this capacity, researchers must download, integrate, and analyze more than 7.2 trillion bases of metagenomic data and associated metadata from viruses, bacteria, archaea and small eukaryotes at their own data centers (~9 TB of raw data). Accessing large-scale data sets in this way impedes scientists from replicating and building on prior work. To this end, we are developing a data platform called the Ocean Cloud Commons (OCC) as part of the iMicrobe project. The OCC is built using an algorithm we developed to pre-compute massive comparative metagenomic analyses in a Hadoop big data framework. By maintaining data in a cloud commons, researchers have access to scalable computation and real-time analytics to promote the integrated and broad use of planetary-scale datasets, such as Tara.

  2. Automated Interpretation of Subcellular Patterns in Fluorescence Microscope Images for Location Proteomics

    PubMed Central

    Chen, Xiang; Velliste, Meel; Murphy, Robert F.

    2010-01-01

    Proteomics, the large scale identification and characterization of many or all proteins expressed in a given cell type, has become a major area of biological research. In addition to information on protein sequence, structure and expression levels, knowledge of a protein’s subcellular location is essential to a complete understanding of its functions. Currently subcellular location patterns are routinely determined by visual inspection of fluorescence microscope images. We review here research aimed at creating systems for automated, systematic determination of location. These employ numerical feature extraction from images, feature reduction to identify the most useful features, and various supervised learning (classification) and unsupervised learning (clustering) methods. These methods have been shown to perform significantly better than human interpretation of the same images. When coupled with technologies for tagging large numbers of proteins and high-throughput microscope systems, the computational methods reviewed here enable the new subfield of location proteomics. This subfield will make critical contributions in two related areas. First, it will provide structured, high-resolution information on location to enable Systems Biology efforts to simulate cell behavior from the gene level on up. Second, it will provide tools for Cytomics projects aimed at characterizing the behaviors of all cell types before, during and after the onset of various diseases. PMID:16752421
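
    The pipeline reviewed above, numerical feature extraction followed by supervised classification, can be sketched with toy data. The features and classifier below (mean intensity, variance, bright-pixel fraction; nearest centroid) are deliberately simple stand-ins for the texture and morphology feature sets and learning methods used in real location proteomics:

```python
import math

def features(img):
    """Simple numerical features from a grayscale image (list of rows):
    mean intensity, intensity variance, and fraction of bright pixels."""
    px = [v for row in img for v in row]
    mean = sum(px) / len(px)
    var = sum((v - mean) ** 2 for v in px) / len(px)
    bright = sum(v > 0.5 for v in px) / len(px)
    return (mean, var, bright)

def nearest_centroid(train, sample):
    """Classify by Euclidean distance to per-class mean feature vectors."""
    centroids = {}
    for label, imgs in train.items():
        vecs = [features(i) for i in imgs]
        centroids[label] = tuple(sum(c) / len(c) for c in zip(*vecs))
    f = features(sample)
    return min(centroids, key=lambda lb: math.dist(f, centroids[lb]))

# Toy "location patterns": diffuse low intensity vs. punctate bright spots.
diffuse = [[[0.2, 0.3], [0.25, 0.2]], [[0.3, 0.2], [0.2, 0.25]]]
punctate = [[[0.9, 0.0], [0.0, 0.9]], [[0.0, 0.95], [0.9, 0.0]]]
train = {"diffuse": diffuse, "punctate": punctate}
print(nearest_centroid(train, [[0.85, 0.05], [0.0, 0.9]]))  # punctate
```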

  3. Law Enforcement Efforts to Control Domestically Grown Marijuana.

    DTIC Science & Technology

    1984-05-25

    marijuana grown indoors, the involvement of large criminal organizations, and the patterns of domestic marijuana distribution. In response to a GAO...information is particularly important if the amount of marijuana grown indoors and the number of large-scale cultivation and distribution organizations... marijuana indoors is becoming increasingly popular. A 1982 narcotics assessment by the Western States Information Network (WSIN) of marijuana

  4. Testing the modeled effectiveness of an operational fuel reduction treatment in a small Western Montana interface landscape using two spatial scales

    Treesearch

    Michael G. Harrington; Erin Noonan-Wright; Mitchell Doherty

    2007-01-01

    Many of the coniferous zones in the Western United States where fires were historically frequent have seen large increases in stand densities and associated forest fuels due to 20th century anthropogenic influences. This condition is partially responsible for contemporary large, uncharacteristically severe wildfires. Therefore, considerable effort is under way to...

  5. SAR STUDY OF NASAL TOXICITY: LESSONS FOR MODELING SMALL TOXICITY DATASETS

    EPA Science Inventory

    Most toxicity data, particularly from whole animal bioassays, are generated without the needs or capabilities of structure-activity relationship (SAR) modeling in mind. Some toxicity endpoints have been of sufficient regulatory concern to warrant large scale testing efforts (e.g....

  6. TREATMENT OF MUNICIPAL WASTEWATERS BY THE FLUIDIZED BED BIOREACTOR PROCESS

    EPA Science Inventory

    A 2-year, large-scale pilot investigation was conducted at the City of Newburgh Water Pollution Control Plant, Newburgh, NY, to demonstrate the application of the fluidized bed bioreactor process to the treatment of municipal wastewaters. The experimental effort investigated the ...

  7. A MANAGEMENT SUPPORT SYSTEM FOR GREAT LAKES COASTAL WETLANDS

    EPA Science Inventory

    The Great Lakes National Program Office in conjunction with the Great Lakes Commission and other researchers is leading a large scale collaborative effort that will yield, in unprecedented detail, a management support system for Great Lakes coastal wetlands. This entails the dev...

  8. Evaluating Green/Gray Infrastructure for CSO/Stormwater Control

    EPA Science Inventory

    The NRMRL is conducting this project to evaluate the water quality and quantity benefits of a large-scale application of green infrastructure (low-impact development/best management practices) retrofits in an entire subcatchment. It will document ORD's effort to demonstrate the e...

  9. Engineering youth service system infrastructure: Hawaii's continued efforts at large-scale implementation through knowledge management strategies.

    PubMed

    Nakamura, Brad J; Mueller, Charles W; Higa-McMillan, Charmaine; Okamura, Kelsie H; Chang, Jaime P; Slavin, Lesley; Shimabukuro, Scott

    2014-01-01

    Hawaii's Child and Adolescent Mental Health Division provides a unique illustration of a youth public mental health system with a long and successful history of large-scale quality improvement initiatives. Many advances are linked to flexibly organizing and applying knowledge gained from the scientific literature and move beyond installing a limited number of brand-named treatment approaches that might be directly relevant only to a small handful of system youth. This article takes a knowledge-to-action perspective and outlines five knowledge management strategies currently under way in Hawaii. Each strategy represents one component of a larger coordinated effort at engineering a service system focused on delivering both brand-named treatment approaches and complementary strategies informed by the evidence base. The five knowledge management examples are (a) a set of modular-based professional training activities for currently practicing therapists, (b) an outreach initiative for supporting youth evidence-based practices training at Hawaii's mental health-related professional programs, (c) an effort to increase consumer knowledge of and demand for youth evidence-based practices, (d) a practice and progress agency performance feedback system, and (e) a sampling of system-level research studies focused on understanding treatment as usual. We end by outlining a small set of lessons learned and a longer term vision for embedding these efforts into the system's infrastructure.

  10. DNA fingerprinting, DNA barcoding, and next generation sequencing technology in plants.

    PubMed

    Sucher, Nikolaus J; Hennell, James R; Carles, Maria C

    2012-01-01

    DNA fingerprinting of plants has become an invaluable tool in forensic, scientific, and industrial laboratories all over the world. PCR has become part of virtually every variation of the plethora of approaches used for DNA fingerprinting today. DNA sequencing is increasingly used either in combination with or as a replacement for traditional DNA fingerprinting techniques. A prime example is the use of short, standardized regions of the genome as taxon barcodes for biological identification of plants. Rapid advances in "next generation sequencing" (NGS) technology are driving down the cost of sequencing and bringing large-scale sequencing projects into the reach of individual investigators. We present an overview of recent publications that demonstrate the use of "NGS" technology for DNA fingerprinting and DNA barcoding applications.

  11. Land use and climate affect Black Tern, Northern Harrier, and Marsh Wren abundance in the Prairie Pothole Region of the United States

    USGS Publications Warehouse

    Forcey, Greg M.; Thogmartin, Wayne E.; Linz, George M.; McKann, Patrick C.

    2014-01-01

    Bird populations are influenced by many environmental factors at both large and small scales. Our study evaluated the influences of regional climate and land-use variables on the Northern Harrier (Circus cyaneus), Black Tern (Chlidonias niger), and Marsh Wren (Cistothorus palustris) in the prairie potholes of the upper Midwest of the United States. These species were chosen because their diverse habitat preferences represent the spectrum of habitat conditions present in the Prairie Potholes, ranging from open prairies to dense cattail marshes. We evaluated land-use covariates at three logarithmic spatial scales (1,000 ha, 10,000 ha, and 100,000 ha) and constructed models a priori using information from published habitat associations and climatic influences. The strongest influences on the abundance of each of the three species were the percentage of wetland area across all three spatial scales and precipitation in the year preceding that when bird surveys were conducted. Even among scales ranging over three orders of magnitude, the influence of spatial scale was small, as models with the same variables expressed at different scales were often in the best model subset. Examination of the effects of large-scale environmental variables on wetland birds elucidated relationships overlooked in many smaller-scale studies, such as the influences of climate and habitat variables at landscape scales. Given the spatial variation in the abundance of our focal species within the prairie potholes, our model predictions are especially useful for targeting locations, such as northeastern South Dakota and central North Dakota, where management and conservation efforts would be optimally beneficial. This modeling approach can also be applied to other species and geographic areas to focus landscape conservation efforts and subsequent small-scale studies, especially in constrained economic climates.

  12. SIMAP--a comprehensive database of pre-calculated protein sequence similarities, domains, annotations and clusters.

    PubMed

    Rattei, Thomas; Tischler, Patrick; Götz, Stefan; Jehl, Marc-André; Hoser, Jonathan; Arnold, Roland; Conesa, Ana; Mewes, Hans-Werner

    2010-01-01

    The prediction of protein function as well as the reconstruction of evolutionary genesis employing large-scale sequence comparison is still the most powerful tool in sequence analysis. Due to the exponential growth of the number of known protein sequences and the consequent quadratic growth of the similarity matrix, the computation of the Similarity Matrix of Proteins (SIMAP) becomes a computationally intensive task. The SIMAP database provides a comprehensive and up-to-date pre-calculation of the protein sequence similarity matrix, sequence-based features and sequence clusters. As of September 2009, SIMAP covers 48 million proteins and more than 23 million non-redundant sequences. Novel features of SIMAP include the expansion of the sequence space by including databases such as ENSEMBL as well as the integration of metagenomes based on their consistent processing and annotation. Furthermore, protein function predictions by Blast2GO are pre-calculated for all sequences in SIMAP and the data access and query functions have been improved. SIMAP assists biologists in querying the up-to-date sequence space systematically and facilitates large-scale downstream projects in computational biology. Access to SIMAP is freely provided through the web portal for individuals (http://mips.gsf.de/simap/) and for programmatic access through DAS (http://webclu.bio.wzw.tum.de/das/) and Web-Service (http://mips.gsf.de/webservices/services/SimapService2.0?wsdl).
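
    The quadratic growth described above is easy to quantify: an all-against-all matrix over n sequences has n(n-1)/2 cells, which is why pre-calculation pays off; a maintained matrix only needs each new sequence compared against the existing set rather than a full recomputation. A small sketch (the batch size is hypothetical):

```python
def pairwise(n):
    """Cells in an all-against-all similarity matrix (unordered pairs)."""
    return n * (n - 1) // 2

def incremental(n_existing, m_new):
    """Comparisons needed to extend a pre-calculated matrix: each new
    sequence against everything already present, plus the new-vs-new pairs."""
    return m_new * n_existing + pairwise(m_new)

n = 23_000_000                  # non-redundant sequences quoted above
m = 1_000_000                   # hypothetical batch of newly added sequences
# Extending the matrix covers exactly the cells a full recomputation would add.
print(pairwise(n + m) - pairwise(n) == incremental(n, m))  # True
```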

  13. Value-focused framework for defining landscape-scale conservation targets

    USGS Publications Warehouse

    Romañach, Stephanie; Benscoter, Allison M.; Brandt, Laura A.

    2016-01-01

    Conservation of natural resources can be challenging in a rapidly changing world and requires collaborative efforts for success. Conservation planning is the process of deciding how to protect, conserve, and enhance or minimize loss of natural and cultural resources. Establishing conservation targets (also called indicators or endpoints), the measurable expressions of desired resource conditions, can help with conservation planning from site-specific to landscape scales. Using conservation targets and tracking them through time can deliver benefits such as insight into ecosystem health and early warnings about undesirable trends. We describe an approach using value-focused thinking to develop statewide conservation targets for Florida. Using such an approach allowed us to first identify stakeholder objectives and then define conservation targets to meet those objectives. Stakeholders were able to see how their shared efforts fit into the broader conservation context, and also anticipate the benefits of multi-agency and -organization collaboration. We developed an iterative process for large-scale conservation planning that included defining a shared framework for the process, defining the conservation targets themselves, as well as developing management and monitoring strategies for evaluation of their effectiveness. The process we describe is applicable to other geographies where multiple parties are seeking to implement collaborative, large-scale biological planning.

  14. Organization and evolution of highly repeated satellite DNA sequences in plant chromosomes.

    PubMed

    Sharma, S; Raina, S N

    2005-01-01

    A major component of the plant nuclear genome is constituted by different classes of repetitive DNA sequences. The structural, functional and evolutionary aspects of the satellite repetitive DNA families, and their organization in the chromosomes, are reviewed. The tandem satellite DNA sequences exhibit characteristic chromosomal locations, usually at subtelomeric and centromeric regions. The repetitive DNA family(ies) may be widely distributed in a taxonomic family or a genus, or may be specific for a species, genome or even a chromosome. They may acquire large-scale variations in their sequence and copy number over an evolutionary time-scale. These features have formed the basis of extensive utilization of repetitive sequences for taxonomic and phylogenetic studies. Hybrid polyploids have especially proven to be excellent models for studying the evolution of repetitive DNA sequences. Recent studies explicitly show that some repetitive DNA families localized at the telomeres and centromeres have acquired important structural and functional significance. The repetitive elements are under different evolutionary constraints as compared to the genes. Satellite DNA families are thought to arise de novo as a consequence of molecular mechanisms such as unequal crossing over, rolling circle amplification, replication slippage and mutation that constitute "molecular drive". Copyright 2005 S. Karger AG, Basel.

  15. Comparison of the genomic sequence of the microminipig, a novel breed of swine, with the genomic database for conventional pig.

    PubMed

    Miura, Naoki; Kucho, Ken-Ichi; Noguchi, Michiko; Miyoshi, Noriaki; Uchiumi, Toshiki; Kawaguchi, Hiroaki; Tanimoto, Akihide

    2014-01-01

    The microminipig, which weighs less than 10 kg at an early stage of maturity, has been reported as a potential experimental model animal. Its extremely small size and other distinct characteristics suggest the possibility of a number of differences between the genome of the microminipig and that of conventional pigs. In this study, we analyzed the genomes of two healthy microminipigs using the SOLiD™ next-generation sequencing system. We then compared the obtained genomic sequences with a genomic database for the domestic pig (Sus scrofa). The mapping coverage of sequenced tags from the microminipig to conventional pig genomic sequences was greater than 96%, and we detected no clear, substantial genomic variance from these data. The results may indicate that the distinct characteristics of the microminipig derive from small-scale alterations in the genome, such as single nucleotide polymorphisms or translational modifications, rather than large-scale deletion or insertion polymorphisms. Further investigation of the entire genomic sequence of the microminipig with methods enabling deeper coverage is required to elucidate the genetic basis of its distinct phenotypic traits. Copyright © 2014 International Institute of Anticancer Research (Dr. John G. Delinassios), All rights reserved.

  16. Imaging different components of a tectonic tremor sequence in southwestern Japan using an automatic statistical detection and location method

    NASA Astrophysics Data System (ADS)

    Poiata, Natalia; Vilotte, Jean-Pierre; Bernard, Pascal; Satriano, Claudio; Obara, Kazushige

    2018-02-01

    In this study, we demonstrate the capability of an automatic network-based detection and location method to extract and analyse different components of tectonic tremor activity by analysing a 9-day energetic tectonic tremor sequence occurring at the down-dip extension of the subducting slab in southwestern Japan. The applied method exploits the coherency of multi-scale, frequency-selective characteristics of non-stationary signals recorded across the seismic network. Using different characteristic functions in the signal-processing step of the method makes it possible to extract and locate the sources of short-duration impulsive signal transients associated with low-frequency earthquakes and of longer-duration energy transients during the tectonic tremor sequence. Frequency-dependent characteristic functions, based on higher-order statistical properties of the seismic signals, are used for the detection and location of low-frequency earthquakes. This yields a more complete (~6.5 times more events) and time-resolved catalogue of low-frequency earthquakes than the routine catalogue provided by the Japan Meteorological Agency. As such, this catalogue resolves the space-time evolution of low-frequency earthquake activity in great detail, unravelling spatial and temporal clustering, modulation in response to tides, and different scales of space-time migration patterns. In the second part of the study, the detection and source location of longer-duration signal energy transients within the tectonic tremor sequence is performed using characteristic functions built from smoothed frequency-dependent energy envelopes. This leads to a catalogue of longer-duration energy sources during the tectonic tremor sequence, characterized by their durations and 3-D spatial likelihood maps of the energy-release source regions.
The summary 3-D likelihood map for the 9-day tectonic tremor sequence, built from this catalogue, exhibits an along-strike spatial segmentation of the long-duration energy-release regions, matching the large-scale clustering features evidenced by the low-frequency earthquake activity analysis. Further examination of the two catalogues showed that the extracted short-duration low-frequency earthquake activity coincides in space, within about 10-15 km, with the longer-duration energy sources during the tectonic tremor sequence. This observation provides a potential constraint on the size of the longer-duration energy-radiating source region in relation to the clustering of low-frequency earthquake activity during the analysed tremor sequence. We show that advanced statistical network-based methods offer new capabilities for automatic high-resolution detection, location and monitoring of different scale-components of tectonic tremor activity, enriching existing slow-earthquake catalogues. Systematic application of such methods to large continuous data sets will allow imaging of slow transient seismic energy-release activity at higher resolution and, therefore, provide new insights into the underlying multi-scale mechanisms of slow-earthquake generation.
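
    A characteristic function built on higher-order statistics can be illustrated with sliding-window kurtosis: Gaussian noise has kurtosis near 3, while a window containing an impulsive arrival scores far higher. A toy sketch on synthetic data, not the authors' multi-scale, frequency-selective implementation:

```python
import random

def kurtosis(w):
    """Sample kurtosis: near 3 for Gaussian noise, large for windows
    containing impulsive (heavy-tailed) transients."""
    m = sum(w) / len(w)
    var = sum((x - m) ** 2 for x in w) / len(w)
    if var == 0:
        return 0.0
    return sum((x - m) ** 4 for x in w) / len(w) / var ** 2

def characteristic_function(signal, width):
    """Sliding-window kurtosis: a simple higher-order-statistics CF whose
    peaks mark impulsive arrivals such as low-frequency earthquakes."""
    return [kurtosis(signal[i:i + width])
            for i in range(len(signal) - width + 1)]

random.seed(1)
signal = [random.gauss(0, 1) for _ in range(400)]
signal[200] += 12.0          # one impulsive transient buried in the noise
cf = characteristic_function(signal, 50)
print(cf.index(max(cf)))     # index of a window containing sample 200
```

    In practice the CF would be computed per frequency band and its peaks associated across stations to locate the source; the sketch only shows the single-trace detection idea.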

  17. Pooled-DNA Sequencing for Elucidating New Genomic Risk Factors, Rare Variants Underlying Alzheimer's Disease.

    PubMed

    Jin, Sheng Chih; Benitez, Bruno A; Deming, Yuetiva; Cruchaga, Carlos

    2016-01-01

    Analyses of genome-wide association studies (GWAS) for complex disorders usually identify common variants with a relatively small effect size that only explain a small proportion of phenotypic heritability. Several studies have suggested that a significant fraction of heritability may be explained by low-frequency (minor allele frequency (MAF) of 1-5%) and rare variants that are not contained in the commercial GWAS genotyping arrays (Schork et al., Curr Opin Genet Dev 19:212, 2009). Rare variants can also have relatively large effects on risk for developing human diseases or disease phenotype (Cruchaga et al., PLoS One 7:e31039, 2012). However, it is necessary to perform next-generation sequencing (NGS) studies in a large population (>4,000 samples) to detect a significant rare-variant association. Several NGS methods, such as custom capture sequencing and amplicon-based sequencing, are designed to screen a small proportion of the genome, but most of these methods are limited in the number of samples that can be multiplexed (i.e. most sequencing kits only provide 96 distinct indexes). Additionally, the sequencing library preparation for 4,000 samples remains expensive, and thus conducting NGS studies with the aforementioned methods is not feasible for most research laboratories. The need for low-cost, large-scale rare-variant detection makes pooled-DNA sequencing an efficient and cost-effective technique to identify rare variants in target regions by sequencing hundreds to thousands of samples. Our recent work has demonstrated that pooled-DNA sequencing can accurately detect rare variants in targeted regions in multiple DNA samples with high sensitivity and specificity (Jin et al., Alzheimers Res Ther 4:34, 2012).
In these studies we used a well-established pooled-DNA sequencing approach and a computational package, SPLINTER (short indel prediction by large deviation inference and nonlinear true frequency estimation by recursion) (Vallania et al., Genome Res 20:1711, 2010), for accurate identification of rare variants in large DNA pools. Given an average sequencing coverage of 30× per haploid genome, SPLINTER can detect rare variants and short indels up to 4 base pairs (bp) with high sensitivity and specificity (up to 1 haploid allele in a pool as large as 500 individuals). Step-by-step instructions on how to conduct pooled-DNA sequencing experiments and data analyses are described in this chapter.
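
    The detection limit quoted above, one haploid allele in a pool of 500 individuals at 30x coverage per haploid genome, follows from simple pool arithmetic, sketched here:

```python
def min_allele_frequency(n_individuals, ploidy=2):
    """Rarest detectable variant: a single haploid allele among all
    chromosomes in the pool."""
    return 1.0 / (ploidy * n_individuals)

def site_coverage(cov_per_haploid, n_individuals, ploidy=2):
    """Total read depth at a site when every haploid genome in the pool
    is sequenced to cov_per_haploid."""
    return cov_per_haploid * ploidy * n_individuals

n, cov = 500, 30
print(min_allele_frequency(n))   # 0.001: 1 allele among 1000 chromosomes
print(site_coverage(cov, n))     # 30000 reads; ~30 carry the variant
```

    The roughly 30 variant-carrying reads expected at a site are what let SPLINTER distinguish a true singleton allele from sequencing error.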

  18. Status of the Microbial Census

    PubMed Central

    Schloss, Patrick D.; Handelsman, Jo

    2004-01-01

    Over the past 20 years, more than 78,000 16S rRNA gene sequences have been deposited in GenBank and the Ribosomal Database Project, making the 16S rRNA gene the most widely studied gene for reconstructing bacterial phylogeny. While there is a general appreciation that these sequences are largely unique and derived from diverse species of bacteria, there has not been a quantitative attempt to describe the extent of sequencing efforts to date. We constructed rarefaction curves for each bacterial phylum and for the entire bacterial domain to assess the current state of sampling and the relative taxonomic richness of each phylum. This analysis quantifies the general sense among microbiologists that we are a long way from a complete census of the bacteria on Earth. Moreover, the analysis indicates that current sampling strategies might not be the most effective ones to describe novel diversity because there remain numerous phyla that are globally distributed yet poorly sampled. Based on the current level of sampling, it is not possible to estimate the total number of bacterial species on Earth, but the minimum species richness is 35,498. Considering previous global species richness estimates of 10^7 to 10^9, we are certain that this estimate will increase with additional sequencing efforts. The data support previous calls for extensive surveys of multiple chemically disparate environments and of specific phylogenetic groups to advance the census most rapidly. PMID:15590780
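
    Rarefaction curves like those used for this census can be computed analytically: the expected number of taxa observed in a subsample of n sequences equals the total richness minus the probability that each taxon is missed entirely (Hurlbert's formula). A sketch with an invented abundance distribution:

```python
from math import comb

def rarefaction(counts, n):
    """Expected number of distinct taxa in a random subsample of n
    sequences, given per-taxon sequence counts (Hurlbert's formula)."""
    total = sum(counts)
    return sum(1 - comb(total - c, n) / comb(total, n) for c in counts)

# A sample dominated by a few taxa plus many singletons: the curve keeps
# climbing, i.e. the census of this community is far from complete.
counts = [40, 25, 10] + [1] * 25
for n in (10, 50, 100):
    print(n, round(rarefaction(counts, n), 1))
```

    A curve that is still rising steeply at the full sample size is exactly the signal the authors use to argue that additional sequencing will keep uncovering new taxa.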

  19. Large-scale database searching using tandem mass spectra: looking up the answer in the back of the book.

    PubMed

    Sadygov, Rovshan G; Cociorva, Daniel; Yates, John R

    2004-12-01

    Database searching is an essential element of large-scale proteomics. Because these methods are widely used, it is important to understand the rationale of the algorithms. Most algorithms are based on concepts first developed in SEQUEST and PeptideSearch. Four basic approaches are used to determine a match between a spectrum and sequence: descriptive, interpretative, stochastic and probability-based matching. We review the basic concepts used by most search algorithms, the computational modeling of peptide identification and current challenges and limitations of this approach for protein identification.
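
    The descriptive matching approach mentioned above can be illustrated with the simplest such score, a shared peak count between the observed spectrum and a candidate's theoretical fragment masses. The peptide names and masses below are invented for illustration; real search engines derive theoretical spectra by in-silico fragmentation and use more sophisticated scores such as cross-correlation:

```python
def shared_peak_count(observed, theoretical, tol=0.5):
    """Descriptive match score: number of theoretical fragment masses that
    find an observed peak within the mass tolerance (in Da)."""
    return sum(any(abs(o - t) <= tol for o in observed) for t in theoretical)

def best_match(spectrum, candidates):
    """Rank candidate peptides by shared peaks with the observed spectrum."""
    return max(candidates,
               key=lambda pep: shared_peak_count(spectrum, candidates[pep]))

observed = [147.1, 276.2, 389.3, 502.4]
candidates = {
    "PEPA": [147.1, 276.1, 389.4, 502.3],  # hypothetical peptide, 4 hits
    "PEPB": [120.0, 250.0, 380.0, 500.0],  # no peaks within 0.5 Da
}
print(best_match(observed, candidates))    # PEPA
```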

  20. SEED Servers: High-Performance Access to the SEED Genomes, Annotations, and Metabolic Models

    PubMed Central

    Aziz, Ramy K.; Devoid, Scott; Disz, Terrence; Edwards, Robert A.; Henry, Christopher S.; Olsen, Gary J.; Olson, Robert; Overbeek, Ross; Parrello, Bruce; Pusch, Gordon D.; Stevens, Rick L.; Vonstein, Veronika; Xia, Fangfang

    2012-01-01

    The remarkable advance in sequencing technology and the rising interest in medical and environmental microbiology, biotechnology, and synthetic biology resulted in a deluge of published microbial genomes. Yet, genome annotation, comparison, and modeling remain a major bottleneck to the translation of sequence information into biological knowledge, hence computational analysis tools are continuously being developed for rapid genome annotation and interpretation. Among the earliest, most comprehensive resources for prokaryotic genome analysis, the SEED project, initiated in 2003 as an integration of genomic data and analysis tools, now contains >5,000 complete genomes, a constantly updated set of curated annotations embodied in a large and growing collection of encoded subsystems, a derived set of protein families, and hundreds of genome-scale metabolic models. Until recently, however, maintaining current copies of the SEED code and data at remote locations has been a pressing issue. To allow high-performance remote access to the SEED database, we developed the SEED Servers (http://www.theseed.org/servers): four network-based servers intended to expose the data in the underlying relational database, support basic annotation services, offer programmatic access to the capabilities of the RAST annotation server, and provide access to a growing collection of metabolic models that support flux balance analysis. The SEED servers offer open access to regularly updated data, the ability to annotate prokaryotic genomes, the ability to create metabolic reconstructions and detailed models of metabolism, and access to hundreds of existing metabolic models. This work offers and supports a framework upon which other groups can build independent research efforts. 
Large integrations of genomic data represent one of the major intellectual resources driving research in biology, and programmatic access to the SEED data will provide significant utility to a broad collection of potential users. PMID:23110173
