sequencing projects alpheus: Topics by Science.gov

Sample records for sequencing projects alpheus

Re-examination of the eastern Pacific and Atlantic material of Alpheus malleator Dana, 1852, with the description of Alpheus wonkimi sp. nov. (Crustacea, Decapoda, Alpheidae).

PubMed

Anker, Arthur; Pachelle, Paulo P G

2013-01-01

The bumpy-clawed snapping shrimp, Alpheus malleator Dana, 1852 (Alpheidae), is revised based on the recently collected and older museum material from the eastern Pacific (Panama, Ecuador), Caribbean (Panama, Puerto Rico, Trinidad & Tobago), Brazil (São Paulo), and West Africa (Cape Verde, Senegal, Guinea, Equatorial Guinea, Congo). The eastern Pacific material is assigned to A. wonkimi sp. nov., based on one morphological difference in the colour and thickness of the uropodal spiniform seta, as well as previously published molecular data. The Caribbean, Brazilian and West African material is considered to represent a single, widespread, morphologically variable, amphi-Atlantic taxon, A. malleator. Alpheus pugilator A. Milne-Edwards, 1878 is retained as a junior synonym of A. malleator, whereas A. tuberculosus Osorio, 1892, A. malleator var. edentatus Zimmer, 1913 and A. belli Coutière, 1898, the latter two based on juvenile specimens, are tentatively placed in the synonymy of A. malleator. Illustrations, including colour photographs, are provided for A. wonkimi sp. nov. and A. malleator and their morphological variability is discussed and illustrated.
On some interesting marine decapod crustaceans (Alpheidae, Laomediidae, Strahlaxiidae) from Lombok, Indonesia.

PubMed

Anker, Arthur; Pratama, Idham Sumarto; Firdaus, Muhammad; Rahayu, Dwi Listyo

2015-01-20

Several rare or uncommon, mostly infaunal decapod crustaceans are reported from intertidal and shallow subtidal habitats of Lombok, Indonesia. The alpheid shrimps Alpheus angustilineatus Nomura & Anker, 2005, Athanas shawnsmithi Anker, 2011, Jengalpheops rufus Anker & Dworschak, 2007, Salmoneus alpheophilus Anker & Marin, 2006, Salmoneus colinorum De Grave, 2004, and the laomediid mud-shrimp Naushonia carinata Dworschak, Marin & Anker, 2006, are reported for the first time since their original descriptions and represent new records for the marine fauna of Indonesia. The alpheid shrimps Alpheus macellarius Chace, 1988, Alpheus platyunguiculatus (Banner, 1953), Athanas japonicus Kubo, 1936, Athanas polymorphus Kemp, 1915, Leptalpheus denticulatus Anker & Marin, 2009, Richalpheus palmeri Anker & Jeng, 2006, Salmoneus gracilipes Miya, 1972, Salmoneus tricristatus Banner, 1959 and the laomediid mud-shrimps Laomedia astacina De Haan, 1841 and Naushonia lactoalbida Berggren, 1992 are new records for Indonesian waters. The remaining alpheid shrimps, namely Alpheopsis yaldwyni Banner & Banner, 1973, Alpheus savuensis De Man, 1908, Automate anacanthopus De Man, 1910, Automate dolichognatha De Man, 1888, Salmoneus serratidigitus (Coutière, 1896), and the strahlaxiid mud-shrimp Neaxius glyptocercus (von Martens, 1869), all previously known from Indonesia, are recorded for the first time from Lombok. Colour photographs are provided for all species reported, some shown in colour for the first time.
New records and description of two new species of carideans shrimps from Bahía Santa María-La Reforma lagoon, Gulf of California, Mexico (Crustacea, Caridea, Alpheidae and Processidae)

PubMed Central

Salgado-Barragán, José; Ayón-Parente, Manuel; Zamora-Tavares, Pilar

2017-01-01

Two new species of the family Alpheidae, Alpheus margaritae sp. n. and Leptalpheus melendezensis sp. n., are described from the Santa María-La Reforma coastal lagoon, SE Gulf of California. Alpheus margaritae sp. n. is closely related to A. antepaenultimus and A. mazatlanicus from the Eastern Pacific and to A. chacei from the Western Atlantic, but can be differentiated from these by a combination of characters, especially the morphology of the scaphocerite and the first pereopods. Leptalpheus melendezensis sp. n. resembles L. mexicanus but can be easily differentiated because L. melendezensis sp. n. has the anterior margin of the carapace broadly rounded and has only one spine on the mesial margin of the ischium in the major cheliped, versus an acute rostrum and an unarmed major cheliped in L. mexicanus. Additionally, a phylogenetic analysis was used to explore the relationships of these two new taxa. These results show that Alpheus margaritae sp. n. and Leptalpheus melendezensis sp. n. are indeed related to the species against which we are comparing them, and demonstrate that they can be considered as different species. Additional specimens of Leptalpheus cf. mexicanus, Ambidexter panamensis and A. swifti are recorded for the first time in the Santa María-La Reforma coastal lagoon. PMID:28769664
Extending the southern range of four shrimps (Crustacea: Decapoda: Stenopodidae, Hippolytidae and Alpheidae) in southwestern Atlantic (27°S) and confirming the presence of Mediterranean Stenopus spinosus Risso, 1827 in Brazil.

PubMed

Giraldes, Bruno Welter; Freire, Andrea Santarosa

2015-06-12

In subtidal zones, certain shrimp species with cryptic behaviour represent a gap in the biodiversity description in many places in the world. This study extends the southern limit of Stenopus hispidus (Olivier, 1811), Alpheus formosus Gibbes, 1850, Alpheus cf. packardii Kingsley, 1880 and Lysmata ankeri Rhyne & Lin, 2006 to Santa Catarina State, Brazil, 27°S. The results also confirm the new occurrence of Stenopus spinosus Risso, 1827 in Brazilian waters. All specimens were collected by scuba diving from rocky islands between 3 and 25 meters depth. We present for each species certain taxonomic features in colour images that will help to identify these decapods in situ in further monitoring programs.
Shrimp burrow in tropical seagrass meadows: An important sink for litter

NASA Astrophysics Data System (ADS)

Vonk, Jan Arie; Kneer, Dominik; Stapel, Johan; Asmus, Harald

2008-08-01

The abundance, burrow characteristics, and in situ behaviour of the burrowing shrimps Neaxius acanthus (Decapoda: Strahlaxiidae) and Alpheus macellarius (Decapoda: Alpheidae) were studied to quantify the collection of seagrass material, to identify the fate of this collected material, and to determine the importance of these burrowing crustaceans in the nutrient (nitrogen and phosphorus) cycling of two tropical seagrass meadows on Bone Batang, South Sulawesi, Indonesia. Alpheus macellarius harvested 0.70 g dry weight (DW) burrow⁻¹ d⁻¹ seagrass material, dominantly by active cutting of fresh seagrass leaves. Neaxius acanthus collected 1.66 g DW burrow⁻¹ d⁻¹, mainly detached leaves which floated past the burrow opening. The A. macellarius and N. acanthus communities together collected in their burrows an amount of seagrass leaf material corresponding to more than 50% of the leaf production in the meadows studied. The crustacean species studied might therefore fulfil an important function in the nutrient cycling of tropical meadows. In the burrow most of the collected material is shredded into pieces. The burrows of both species had special chambers which serve as a storage for seagrass leaf material. Neaxius acanthus incorporated most of the material into the burrow wall lining, which is made of small sediment particles and macerated seagrass leaves. Phosphate concentrations measured in N. acanthus burrows compared with pore-water and water-column concentrations suggest that a substantial amount of the seagrass material undergoes decomposition in the burrows. Oxygen levels measured in these water bodies are indicative of a possible exchange of water between the burrow and its surroundings, most likely supported by the shrimps irrigating their burrows. By collecting leaf material in their burrows, nutrients that are otherwise lost from the seagrass meadow associated with detached leaves and leaf fragments carried away in the water column are maintained in the meadow and may form an important source of recycled nutrients.
The recreation of a unique shrimp's mechanically induced cavitation bubble

NASA Astrophysics Data System (ADS)

Miller, Ryan; Dougherty, Christopher; Eliasson, Veronica; Khanolkar, Gauri

2014-11-01

Alpheus heterochaelis, appropriately nicknamed the "pistol shrimp," possesses an oversized claw that creates a cavitation bubble upon rapid closure. The implosion of this bubble results in a shock wave that can stun or even kill the shrimp's prey (Versluis et al., 2000). Additionally, the implosion is so violent that sonoluminescence may occur. This light implies extreme temperatures, which have been recorded to reach as high as 10,000 K (Roach, 2001). By developing an analogous mechanism to the oversized claw, the goal of this experiment is to verify that cavitation can be produced similar to that of the pistol shrimp in nature as well as to analyze the resulting shock wave and sonoluminescence. High-speed schlieren imaging was used to observe the shock dynamics. Furthermore, results on cavitation collapse and light emission will be presented. USC Provost Undergraduate Research Fellowship/Rose Hills Undergraduate Research Fellowship.
High-Throughput Mapping of Single-Neuron Projections by Sequencing of Barcoded RNA.

PubMed

Kebschull, Justus M; Garcia da Silva, Pedro; Reid, Ashlan P; Peikon, Ian D; Albeanu, Dinu F; Zador, Anthony M

2016-09-07

Neurons transmit information to distant brain regions via long-range axonal projections. In the mouse, area-to-area connections have only been systematically mapped using bulk labeling techniques, which obscure the diverse projections of intermingled single neurons. Here we describe MAPseq (Multiplexed Analysis of Projections by Sequencing), a technique that can map the projections of thousands or even millions of single neurons by labeling large sets of neurons with random RNA sequences ("barcodes"). Axons are filled with barcode mRNA, each putative projection area is dissected, and the barcode mRNA is extracted and sequenced. Applying MAPseq to the locus coeruleus (LC), we find that individual LC neurons have preferred cortical targets. By recasting neuroanatomy, which is traditionally viewed as a problem of microscopy, as a problem of sequencing, MAPseq harnesses advances in sequencing technology to permit high-throughput interrogation of brain circuits. Copyright © 2016 Elsevier Inc. All rights reserved.
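The barcode-counting idea in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name, barcodes, and area names below are hypothetical, and the sketch assumes each sequenced barcode mRNA has already been reduced to a (barcode, dissected area) pair. Counting those pairs gives a neuron-by-area projection matrix.

```python
from collections import Counter, defaultdict

def projection_matrix(reads, areas):
    """Count barcode reads per dissected area.

    reads: iterable of (barcode, area) pairs, one per sequenced barcode mRNA.
    areas: ordered list of area names that defines the matrix columns.
    Returns {barcode: [count in areas[0], count in areas[1], ...]}.
    """
    counts = defaultdict(Counter)
    for barcode, area in reads:
        counts[barcode][area] += 1
    # A Counter returns 0 for areas in which a barcode was never seen.
    return {bc: [per_area[a] for a in areas] for bc, per_area in counts.items()}

# Hypothetical toy data: two barcoded neurons, three dissected target areas.
reads = [
    ("ACGTAC", "cortex_A"), ("ACGTAC", "cortex_A"), ("ACGTAC", "cortex_B"),
    ("TTGACA", "cortex_C"), ("TTGACA", "cortex_C"),
]
areas = ["cortex_A", "cortex_B", "cortex_C"]
matrix = projection_matrix(reads, areas)
```

Each row of `matrix` is one barcoded neuron. A row dominated by a single column would correspond to a neuron with a preferred target, which is the kind of pattern the abstract reports for locus coeruleus neurons.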
Shallow-water stenopodidean and caridean shrimps from Abrolhos Archipelago, Brazil: new records and updated checklist.

PubMed

Soledade, Guidomar O; Fonseca, Mytalle S; Almeida, Alexandre O

2015-01-09

This study deals with a recent collection of stenopodidean and caridean shrimps made in the Abrolhos Archipelago, Bahia, Brazil, in July and August 2013. Sampling was carried out in the vicinity of Ilha de Santa Bárbara (17°57'49"S 38°41'53"W). Specimens were obtained by hand or using small hand nets in tide pools or under rocks in the intertidal zone. Part of the material was collected by scuba diving in the shallow subtidal, to a maximum depth of 11 m. We obtained a total of 18 species, 12 of which are reported for the first time for the Abrolhos and 4 as new records for the state of Bahia. The distributions of Microprosthema semilaeve (von Martens, 1872), Typton gnathophylloides Holthuis, 1951, Alpheus verrilli (Schmitt, 1924) and Alpheopsis cf. trigona (Rathbun, 1901) are extended from their previously known ranges. The occurrence of Automate cf. rectifrons Chace, 1972 on the Brazilian coast is confirmed. We thus provide an updated checklist of stenopodidean (2 species) and caridean (29 species) shrimps from the Abrolhos Archipelago, incorporating and critically evaluating previous records.
From cooperation to combat: adverse effect of thermal stress in a symbiotic coral-crustacean community.

PubMed

Stella, J S; Munday, P L; Walker, S P W; Pratchett, M S; Jones, G P

2014-04-01

Although mutualisms are ubiquitous in nature, our understanding of the potential impacts of climate change on these important ecological interactions is deficient. Here, we report on a thermal stress-related shift from cooperation to antagonism between members of a mutualistic coral-dwelling community. Increased mortality of coral-defending crustacean symbionts Trapezia cymodoce (coral crab) and Alpheus lottini (snapping shrimp) was observed in response to experimentally elevated temperatures and reduced coral-host (Pocillopora damicornis) condition. However, strong differential numerical effects occurred among crustaceans as a function of species and sex, with shrimp (75%), and female crabs (55%), exhibiting the fastest and greatest declines in numbers. These declines were due to forceful eviction from the coral-host by male crabs. Furthermore, surviving female crabs were impacted by a dramatic decline (85%) in egg production, which could have deleterious consequences for population sustainability. Our results suggest that elevated temperature switches the fundamental nature of this interaction from cooperation to competition, leading to asymmetrical effects on species and/or sexes. Our study illustrates the importance of evaluating not only individual responses to climate change, but also potentially fragile interactions within and among susceptible species.
FOUNTAIN: A JAVA open-source package to assist large sequencing projects

PubMed Central

Buerstedde, Jean-Marie; Prill, Florian

2001-01-01

Background: Better automation, lower cost per reaction and a heightened interest in comparative genomics have led to a dramatic increase in DNA sequencing activities. Although the large sequencing projects of specialized centers are supported by in-house bioinformatics groups, many smaller laboratories face difficulties managing the appropriate processing and storage of their sequencing output. The challenges include documentation of clones, templates and sequencing reactions, and the storage, annotation and analysis of the large number of generated sequences. Results: We describe here a new program, named FOUNTAIN, for the management of large sequencing projects. FOUNTAIN uses the JAVA computer language and data storage in a relational database. Starting with a collection of sequencing objects (clones), the program generates and stores information related to the different stages of the sequencing project using a web browser interface for user input. The generated sequences are subsequently imported and annotated based on BLAST searches against the public databases. In addition, simple algorithms to cluster sequences and determine putative polymorphic positions are implemented. Conclusions: A simple, but flexible and scalable software package is presented to facilitate data generation and storage for large sequencing projects. Open source and largely platform and database independent, we wish FOUNTAIN to be improved and extended in a community effort. PMID:11591214
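The abstract does not describe how FOUNTAIN determines putative polymorphic positions, so the following is only a generic sketch of the idea, written in Python rather than FOUNTAIN's Java. It assumes the clustered sequences have already been aligned to equal length; the function name and the toy reads are hypothetical.

```python
def polymorphic_positions(aligned, ignore="-N"):
    """Return 0-based alignment columns where more than one base is observed.

    aligned: list of equal-length aligned sequences.
    ignore:  characters (gaps, ambiguous calls) that never count as a variant.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must be aligned to the same length")
    positions = []
    for i in range(length):
        # Distinct bases seen in this column, with gaps and Ns removed.
        bases = {seq[i].upper() for seq in aligned} - set(ignore)
        if len(bases) > 1:
            positions.append(i)
    return positions

# Hypothetical aligned reads from one sequence cluster.
aligned = ["ACGTAC", "ACGTTC", "AC-TAC", "ACGTAC"]
sites = polymorphic_positions(aligned)
```

In this toy alignment only column 4 carries two different bases; the gap in column 2 is ignored. A real implementation would also weigh base quality before calling a position polymorphic.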
Meeting the challenges of non-referenced genome assembly from short-read sequence data

Treesearch

M. Parks; A. Liston; R. Cronn

2010-01-01

Massively parallel sequencing technologies (MPST) offer unprecedented opportunities for novel sequencing projects. MPST, while offering tremendous sequencing capacity, are typically most effective in resequencing projects (as opposed to the sequencing of novel genomes) due to the fact that sequence is returned in relatively short reads. Nonetheless, there is great...
Genomes OnLine Database (GOLD) v.6: data updates and feature enhancements

PubMed Central

Mukherjee, Supratim; Stamatis, Dimitri; Bertsch, Jon; Ovchinnikova, Galina; Verezemska, Olena; Isbandi, Michelle; Thomas, Alex D.; Ali, Rida; Sharma, Kaushal; Kyrpides, Nikos C.; Reddy, T. B. K.

2017-01-01

The Genomes Online Database (GOLD) (https://gold.jgi.doe.gov) is a manually curated data management system that catalogs sequencing projects with associated metadata from around the world. In the current version of GOLD (v.6), all projects are organized based on a four-level classification system in the form of a Study, Organism (for isolates) or Biosample (for environmental samples), Sequencing Project and Analysis Project. Currently, GOLD provides information for 26 117 Studies, 239 100 Organisms, 15 887 Biosamples, 97 212 Sequencing Projects and 78 579 Analysis Projects. These are integrated with over 312 metadata fields, of which 58 are controlled vocabularies with 2067 terms. The web interface facilitates submission of a diverse range of Sequencing Projects (such as isolate genome, single-cell genome, metagenome, metatranscriptome) and complex Analysis Projects (such as genome from metagenome, or combined assembly from multiple Sequencing Projects). GOLD provides a seamless interface with the Integrated Microbial Genomes (IMG) system and supports and promotes the Genomic Standards Consortium (GSC) Minimum Information standards. This paper describes the data updates and additional features added during the last two years. PMID:27794040
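The four-level classification described above can be pictured as nested records. The sketch below is an illustration only: the class and field names are hypothetical and do not reflect GOLD's actual schema or API. It also simplifies the hierarchy to a strict tree, whereas the abstract notes that an Analysis Project may combine several Sequencing Projects.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AnalysisProject:          # level 4
    name: str

@dataclass
class SequencingProject:        # level 3
    name: str
    project_type: str           # e.g. "isolate genome", "metagenome"
    analysis_projects: List[AnalysisProject] = field(default_factory=list)

@dataclass
class Sample:                   # level 2
    name: str
    level: str                  # "Organism" (isolate) or "Biosample" (environmental)
    sequencing_projects: List[SequencingProject] = field(default_factory=list)

@dataclass
class Study:                    # level 1
    name: str
    samples: List[Sample] = field(default_factory=list)

def count_levels(study):
    """Count the records held at each level below one Study."""
    projects = [p for s in study.samples for p in s.sequencing_projects]
    return {
        "samples": len(study.samples),
        "sequencing_projects": len(projects),
        "analysis_projects": sum(len(p.analysis_projects) for p in projects),
    }

# Hypothetical example: one study with an isolate and an environmental sample.
study = Study("Hypothetical soil survey", [
    Sample("Isolate X", "Organism", [
        SequencingProject("Isolate X genome", "isolate genome",
                          [AnalysisProject("Isolate X assembly")]),
    ]),
    Sample("Plot 7 topsoil", "Biosample", [
        SequencingProject("Plot 7 metagenome", "metagenome",
                          [AnalysisProject("Plot 7 assembly"),
                           AnalysisProject("Genome from metagenome bin 1")]),
    ]),
])
counts = count_levels(study)
```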
Identification of genes in anonymous DNA sequences. Annual performance report, February 1, 1991--January 31, 1992

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fields, C.A.

1996-06-01

The objective of this project is the development of practical software to automate the identification of genes in anonymous DNA sequences from the human and other higher eukaryotic genomes. A software system for automated sequence analysis, gm (gene modeler), has been designed, implemented, tested, and distributed to several dozen laboratories worldwide. A significantly faster, more robust, and more flexible version of this software, gm 2.0, has now been completed, and is being tested by operational use to analyze human cosmid sequence data. A range of efforts to further understand the features of eukaryotic gene sequences are also underway. This progress report also contains papers coming out of the project, including the following: gm: a Tool for Exploratory Analysis of DNA Sequence Data; The Human THE-LTR(O) and MstII Interspersed Repeats are subfamilies of a single widely distributed highly variable repeat family; Information contents and dinucleotide compositions of plant intron sequences vary with evolutionary origin; Splicing signals in Drosophila: intron size, information content, and consensus sequences; Integration of automated sequence analysis into mapping and sequencing projects; Software for the C. elegans genome project.
Genomic Encyclopedia of Type Strains, Phase I: The one thousand microbial genomes (KMG-I) project

DOE PAGES

Kyrpides, Nikos C.; Woyke, Tanja; Eisen, Jonathan A.; ...

2014-06-15

The Genomic Encyclopedia of Bacteria and Archaea (GEBA) project was launched by the JGI in 2007 as a pilot project with the objective of sequencing 250 bacterial and archaeal genomes. The two major goals of that project were (a) to test the hypothesis that there are many benefits to the use of the phylogenetic diversity of organisms in the tree of life as a primary criterion for generating their genome sequence and (b) to develop the necessary framework, technology and organization for large-scale sequencing of microbial isolate genomes. While the GEBA pilot project has not yet been entirely completed, both of the original goals have already been successfully accomplished, leading the way for the next phase of the project. Here we propose taking the GEBA project to the next level, by generating high quality draft genomes for 1,000 bacterial and archaeal strains. This represents a combined 16-fold increase in both scale and speed as compared to the GEBA pilot project (250 isolate genomes in 4+ years). We will follow a similar approach for organism selection and sequencing prioritization as was done for the GEBA pilot project (i.e. phylogenetic novelty, availability and growth of cultures of type strains and DNA extraction capability), focusing on type strains as this ensures reproducibility of our results and provides the strongest linkage between genome sequences and other knowledge about each strain. In turn, this project will constitute a pilot phase of a larger effort that will target the genome sequences of all available type strains of the Bacteria and Archaea.
Animal selection for whole genome sequencing by quantifying the unique contribution of homozygous haplotypes sequenced

USDA-ARS's Scientific Manuscript database

Major whole genome sequencing projects promise to identify rare and causal variants within livestock species; however, the efficient selection of animals for sequencing remains a major problem within these surveys. The goal of this project was to develop a library of high accuracy genetic variants f...
An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data.

PubMed

Jun, Goo; Wing, Mary Kate; Abecasis, Gonçalo R; Kang, Hyun Min

2015-06-01

The analysis of next-generation sequencing data is computationally and statistically challenging because of the massive volume of data and imperfect data quality. We present GotCloud, a pipeline for efficiently detecting and genotyping high-quality variants from large-scale sequencing data. GotCloud automates sequence alignment, sample-level quality control, variant calling, filtering of likely artifacts using machine-learning techniques, and genotype refinement using haplotype information. The pipeline can process thousands of samples in parallel and requires fewer computational resources than current alternatives. Experiments with whole-genome and exome-targeted sequence data generated by the 1000 Genomes Project show that the pipeline provides effective filtering against false positive variants and high power to detect true variants. Our pipeline has already contributed to variant detection and genotyping in several large-scale sequencing projects, including the 1000 Genomes Project and the NHLBI Exome Sequencing Project. We hope it will now prove useful to many medical sequencing studies. © 2015 Jun et al.; Published by Cold Spring Harbor Laboratory Press.
Vortex formation with a snapping shrimp claw.

PubMed

Hess, David; Brücker, Christoph; Hegner, Franziska; Balmert, Alexander; Bleckmann, Horst

2013-01-01

Snapping shrimp use one oversized claw to generate a cavitating high-speed water jet for hunting, defence and communication. This work is an experimental investigation of the jet generation. Snapping shrimp (Alpheus bellulus) were investigated by using an enlarged transparent model reproducing the closure of the snapper claw. Flow inside the model was studied using both High-Speed Particle Image Velocimetry (HS-PIV) and flow visualization. During claw closure a channel-like cavity was formed between the plunger and the socket featuring a nozzle-type contour at the orifice. Closing the mechanism led to the formation of a leading vortex ring with a dimensionless formation number of ΔT* ≈ 4. This indicates that the claw might work at maximum efficiency, i.e. maximum vortex strength was achieved by a minimum of fluid volume ejected. The subsequent vortex cavitation with the formation of an axial reentrant jet is a reasonable explanation for the large penetration depth of the water jet that snapping shrimp can reach with their claw-induced flow. Within such a cavitation process, an axial reentrant jet is generated in the hollow cylindrical core of the cavitated vortex that pushes the front further downstream and whose length can exceed the initial jet penetration depth by several times.
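For orientation, the formation number is conventionally defined for a piston-cylinder vortex generator as shown below. The symbols come from that standard definition and are assumptions here: the abstract does not state how ΔT* was evaluated for the claw model.

```latex
T^{*} = \frac{\bar{U}_p \, t}{D} = \frac{L}{D}
```

Here \bar{U}_p is the mean piston (plunger) velocity over the ejection time t, D is the orifice diameter, and L is the length of the ejected fluid column. A value near 4 is the regime in which the leading vortex ring takes up close to its maximum circulation before a trailing jet forms, which is consistent with the abstract's reading of ΔT* ≈ 4 as operation near maximum efficiency.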
A rapid and cost-effective method for sequencing pooled cDNA clones by using a combination of transposon insertion and Gateway technology.

PubMed

Morozumi, Takeya; Toki, Daisuke; Eguchi-Ogawa, Tomoko; Uenishi, Hirohide

2011-09-01

Large-scale cDNA-sequencing projects require an efficient strategy for mass sequencing. Here we describe a method for sequencing pooled cDNA clones using a combination of transposon insertion and Gateway technology. Our method reduces the number of shotgun clones that are unsuitable for reconstruction of cDNA sequences, and has the advantage of reducing the total costs of the sequencing project.
Deep whole-genome sequencing of 90 Han Chinese genomes.

PubMed

Lan, Tianming; Lin, Haoxiang; Zhu, Wenjuan; Laurent, Tellier Christian Asker Melchior; Yang, Mengcheng; Liu, Xin; Wang, Jun; Wang, Jian; Yang, Huanming; Xu, Xun; Guo, Xiaosen

2017-09-01

Next-generation sequencing provides a high-resolution insight into human genetic information. However, the focus of previous studies has primarily been on low-coverage data due to the high cost of sequencing. Although the 1000 Genomes Project and the Haplotype Reference Consortium have both provided powerful reference panels for imputation, low-frequency and novel variants remain difficult to discover and call with accuracy on the basis of low-coverage data. Deep sequencing provides an optimal solution for the problem of these low-frequency and novel variants. Although whole-exome sequencing is also a viable choice for exome regions, it cannot account for noncoding regions, sometimes resulting in the absence of important, causal variants. For Han Chinese populations, the majority of variants have been discovered based upon low-coverage data from the 1000 Genomes Project. However, high-coverage, whole-genome sequencing data are limited for any population, and a large number of low-frequency, population-specific variants remain uncharacterized. We have performed whole-genome sequencing at a high depth (∼×80) of 90 unrelated individuals of Chinese ancestry, collected from the 1000 Genomes Project samples, including 45 Northern Han Chinese and 45 Southern Han Chinese samples. Eighty-three of these 90 have been sequenced by the 1000 Genomes Project. We have identified 12 568 804 single nucleotide polymorphisms, 2 074 210 short InDels, and 26 142 structural variations from these 90 samples. Compared to the Han Chinese data from the 1000 Genomes Project, we have found 7 000 629 novel variants with low frequency (defined as minor allele frequency < 5%), including 5 813 503 single nucleotide polymorphisms, 1 169 199 InDels, and 17 927 structural variants. Using deep sequencing data, we have built a greatly expanded spectrum of genetic variation for the Han Chinese genome. Compared to the 1000 Genomes Project, these Han Chinese deep sequencing data enhance the characterization of a large number of low-frequency, novel variants. This will be a valuable resource for promoting Chinese genetics research and medical development. Additionally, it will provide a valuable supplement to the 1000 Genomes Project, as well as to other human genome projects. © The Authors 2017. Published by Oxford University Press.
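The novel-variant counts quoted in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the numbers given in the abstract; the variable names are illustrative.

```python
# Novel low-frequency variants reported in the abstract, by class.
novel = {
    "single nucleotide polymorphisms": 5_813_503,
    "InDels": 1_169_199,
    "structural variants": 17_927,
}

total_novel = sum(novel.values())

# Share of each class among the novel variants, in percent.
shares = {kind: round(100 * n / total_novel, 2) for kind, n in novel.items()}

# The three classes add up to the reported total of 7 000 629.
print(total_novel)
print(shares)
```

The three classes sum exactly to the 7 000 629 novel variants stated in the abstract, with single nucleotide polymorphisms making up the large majority.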

in silico Whole Genome Sequencer & Analyzer (iWGS): A Computational Pipeline to Guide the Design and Analysis of de novo Genome Sequencing Studies

DOE PAGES

Zhou, Xiaofan; Peris, David; Kominek, Jacek; ...

2016-09-16

The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding its potential for understanding the biology and evolution of the full spectrum of biodiversity. The increasing diversity of sequencing technologies, assays, and de novo assembly algorithms has augmented the complexity of de novo genome sequencing projects in nonmodel organisms. To reduce the costs and challenges in de novo genome sequencing projects and streamline their experimental design and analysis, we developed iWGS (in silico Whole Genome Sequencer and Analyzer), an automated pipeline for guiding the choice of appropriate sequencing strategy and assembly protocols. iWGS seamlessly integrates the four key steps of a de novo genome sequencing project: data generation (through simulation), data quality control, de novo assembly, and assembly evaluation and validation. The last three steps can also be applied to the analysis of real data. iWGS is designed to enable the user to have great flexibility in testing the range of experimental designs available for genome sequencing projects, and supports all major sequencing technologies and popular assembly tools. Three case studies illustrate how iWGS can guide the design of de novo genome sequencing projects, and evaluate the performance of a wide variety of user-specified sequencing strategies and assembly protocols on genomes of differing architectures. iWGS, along with detailed documentation, is freely available at https://github.com/zhouxiaofan1983/iWGS.
Optical mapping and its potential for large-scale sequencing projects.

PubMed

Aston, C; Mishra, B; Schwartz, D C

1999-07-01

Physical mapping has been rediscovered as an important component of large-scale sequencing projects. Restriction maps provide landmark sequences at defined intervals, and high-resolution restriction maps can be assembled from ensembles of single molecules by optical means. Such optical maps can be constructed from both large-insert clones and genomic DNA, and are used as a scaffold for accurately aligning sequence contigs generated by shotgun sequencing.
Construction of random sheared fosmid library from Chinese cabbage and its use for Brassica rapa genome sequencing project.

PubMed

Park, Tae-Ho; Park, Beom-Seok; Kim, Jin-A; Hong, Joon Ki; Jin, Mina; Seol, Young-Joo; Mun, Jeong-Hwan

2011-01-01

As a part of the Multinational Genome Sequencing Project of Brassica rapa, linkage group R9 and R3 were sequenced using a bacterial artificial chromosome (BAC) by BAC strategy. The current physical contigs are expected to cover approximately 90% euchromatins of both chromosomes. As the project progresses, BAC selection for sequence extension becomes more limited because BAC libraries are restriction enzyme-specific. To support the project, a random sheared fosmid library was constructed. The library consists of 97536 clones with average insert size of approximately 40 kb corresponding to seven genome equivalents, assuming a Chinese cabbage genome size of 550 Mb. The library was screened with primers designed at the end of sequences of nine points of scaffold gaps where BAC clones cannot be selected to extend the physical contigs. The selected positive clones were end-sequenced to check the overlap between the fosmid clones and the adjacent BAC clones. Nine fosmid clones were selected and fully sequenced. The sequences revealed two completed gap filling and seven sequence extensions, which can be used for further selection of BAC clones confirming that the fosmid library will facilitate the sequence completion of B. rapa. Copyright © 2011. Published by Elsevier Ltd.
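The "seven genome equivalents" figure in the abstract above can be checked with a short calculation: library coverage is roughly the number of clones times the mean insert size, divided by the genome size. The sketch below uses only the numbers quoted in the abstract; the function name `genome_equivalents` is illustrative and does not come from the paper.

```python
def genome_equivalents(n_clones: int, mean_insert_bp: float, genome_size_bp: float) -> float:
    """Estimate library coverage as total cloned bases divided by genome size."""
    return n_clones * mean_insert_bp / genome_size_bp


# Figures quoted in the abstract: 97,536 clones, ~40 kb inserts, 550 Mb genome.
coverage = genome_equivalents(97_536, 40_000, 550_000_000)
print(f"{coverage:.2f} genome equivalents")  # prints "7.09 genome equivalents"
```

The result, about 7.1, is consistent with the abstract's rounded value of seven, given that the 40 kb insert size is itself an approximate average.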
Sequence verification of synthetic DNA by assembly of sequencing reads

PubMed Central

Wilson, Mandy L.; Cai, Yizhi; Hanlon, Regina; Taylor, Samantha; Chevreux, Bastien; Setubal, João C.; Tyler, Brett M.; Peccoud, Jean

2013-01-01

Gene synthesis attempts to assemble user-defined DNA sequences with base-level precision. Verifying the sequences of construction intermediates and the final product of a gene synthesis project is a critical part of the workflow, yet one that has received the least attention. Sequence validation is equally important for other kinds of curated clone collections. Ensuring that the physical sequence of a clone matches its published sequence is a common quality control step performed at least once over the course of a research project. GenoREAD is a web-based application that breaks the sequence verification process into two steps: the assembly of sequencing reads and the alignment of the resulting contig with a reference sequence. GenoREAD can determine if a clone matches its reference sequence. Its sophisticated reporting features help identify and troubleshoot problems that arise during the sequence verification process. GenoREAD has been experimentally validated on thousands of gene-sized constructs from an ORFeome project, and on longer sequences including whole plasmids and synthetic chromosomes. Comparing GenoREAD results with those from manual analysis of the sequencing data demonstrates that GenoREAD tends to be conservative in its diagnostic. GenoREAD is available at www.genoread.org. PMID:23042248
Personal Genome Sequencing in Ostensibly Healthy Individuals and the PeopleSeq Consortium

PubMed Central

Linderman, Michael D.; Nielsen, Daiva E.; Green, Robert C.

2016-01-01

Thousands of ostensibly healthy individuals have had their exome or genome sequenced, but a much smaller number of these individuals have received any personal genomic results from that sequencing. We term those projects in which ostensibly healthy participants can receive sequencing-derived genetic findings and may also have access to their genomic data as participatory predispositional personal genome sequencing (PPGS). Here we are focused on genome sequencing applied in a pre-symptomatic context and so define PPGS to exclude diagnostic genome sequencing intended to identify the molecular cause of suspected or diagnosed genetic disease. In this report we describe the design of completed and underway PPGS projects, briefly summarize the results reported to date and introduce the PeopleSeq Consortium, a newly formed collaboration of PPGS projects designed to collect much-needed longitudinal outcome data. PMID:27023617
JUICE: a data management system that facilitates the analysis of large volumes of information in an EST project workflow.

PubMed

Latorre, Mariano; Silva, Herman; Saba, Juan; Guziolowski, Carito; Vizoso, Paula; Martinez, Veronica; Maldonado, Jonathan; Morales, Andrea; Caroca, Rodrigo; Cambiazo, Veronica; Campos-Vargas, Reinaldo; Gonzalez, Mauricio; Orellana, Ariel; Retamales, Julio; Meisel, Lee A

2006-11-23

Expressed sequence tag (EST) analyses provide a rapid and economical means to identify candidate genes that may be involved in a particular biological process. These ESTs are useful in many Functional Genomics studies. However, the large quantity and complexity of the data generated during an EST sequencing project can make the analysis of this information a daunting task. In an attempt to make this task friendlier, we have developed JUICE, an open source data management system (Apache + PHP + MySQL on Linux), which enables the user to easily upload, organize, visualize and search the different types of data generated in an EST project pipeline. In contrast to other systems, the JUICE data management system allows a branched pipeline to be established, modified and expanded, during the course of an EST project. The web interfaces and tools in JUICE enable the users to visualize the information in a graphical, user-friendly manner. The user may browse or search for sequences and/or sequence information within all the branches of the pipeline. The user can search using terms associated with the sequence name, annotation or other characteristics stored in JUICE and associated with sequences or sequence groups. Groups of sequences can be created by the user, stored in a clipboard and/or downloaded for further analyses. Different user profiles restrict the access of each user depending upon their role in the project. The user may have access exclusively to visualize sequence information, access to annotate sequences and sequence information, or administrative access. JUICE is an open source data management system that has been developed to aid users in organizing and analyzing the large amount of data generated in an EST Project workflow. JUICE has been used in one of the first functional genomics projects in Chile, entitled "Functional Genomics in nectarines: Platform to potentiate the competitiveness of Chile in fruit exportation". 
However, due to its ability to organize and visualize data from external pipelines, JUICE is a flexible data management system that should be useful for other EST/Genome projects. The JUICE data management system is released under the Open Source GNU Lesser General Public License (LGPL). JUICE may be downloaded from http://genoma.unab.cl/juice_system/ or http://www.genomavegetal.cl/juice_system/.
JUICE: a data management system that facilitates the analysis of large volumes of information in an EST project workflow

PubMed Central

Latorre, Mariano; Silva, Herman; Saba, Juan; Guziolowski, Carito; Vizoso, Paula; Martinez, Veronica; Maldonado, Jonathan; Morales, Andrea; Caroca, Rodrigo; Cambiazo, Veronica; Campos-Vargas, Reinaldo; Gonzalez, Mauricio; Orellana, Ariel; Retamales, Julio; Meisel, Lee A

2006-01-01

Background Expressed sequence tag (EST) analyses provide a rapid and economical means to identify candidate genes that may be involved in a particular biological process. These ESTs are useful in many Functional Genomics studies. However, the large quantity and complexity of the data generated during an EST sequencing project can make the analysis of this information a daunting task. Results In an attempt to make this task friendlier, we have developed JUICE, an open source data management system (Apache + PHP + MySQL on Linux), which enables the user to easily upload, organize, visualize and search the different types of data generated in an EST project pipeline. In contrast to other systems, the JUICE data management system allows a branched pipeline to be established, modified and expanded, during the course of an EST project. The web interfaces and tools in JUICE enable the users to visualize the information in a graphical, user-friendly manner. The user may browse or search for sequences and/or sequence information within all the branches of the pipeline. The user can search using terms associated with the sequence name, annotation or other characteristics stored in JUICE and associated with sequences or sequence groups. Groups of sequences can be created by the user, stored in a clipboard and/or downloaded for further analyses. Different user profiles restrict the access of each user depending upon their role in the project. The user may have access exclusively to visualize sequence information, access to annotate sequences and sequence information, or administrative access. Conclusion JUICE is an open source data management system that has been developed to aid users in organizing and analyzing the large amount of data generated in an EST Project workflow. JUICE has been used in one of the first functional genomics projects in Chile, entitled "Functional Genomics in nectarines: Platform to potentiate the competitiveness of Chile in fruit exportation". 
However, due to its ability to organize and visualize data from external pipelines, JUICE is a flexible data management system that should be useful for other EST/Genome projects. The JUICE data management system is released under the Open Source GNU Lesser General Public License (LGPL). JUICE may be downloaded from http://genoma.unab.cl/juice_system/ or http://www.genomavegetal.cl/juice_system/. PMID:17123449
A computational genomics pipeline for prokaryotic sequencing projects.

PubMed

Kislyuk, Andrey O; Katz, Lee S; Agrawal, Sonia; Hagen, Matthew S; Conley, Andrew B; Jayaraman, Pushkala; Nelakuditi, Viswateja; Humphrey, Jay C; Sammons, Scott A; Govil, Dhwani; Mair, Raydel D; Tatti, Kathleen M; Tondella, Maria L; Harcourt, Brian H; Mayer, Leonard W; Jordan, I King

2010-08-01

New sequencing technologies have accelerated research on prokaryotic genomes and have made genome sequencing operations outside major genome sequencing centers routine. However, no off-the-shelf solution exists for the combined assembly, gene prediction, genome annotation and data presentation necessary to interpret sequencing data. The resulting requirement to invest significant resources into custom informatics support for genome sequencing projects remains a major impediment to the accessibility of high-throughput sequence data. We present a self-contained, automated high-throughput open source genome sequencing and computational genomics pipeline suitable for prokaryotic sequencing projects. The pipeline has been used at the Georgia Institute of Technology and the Centers for Disease Control and Prevention for the analysis of Neisseria meningitidis and Bordetella bronchiseptica genomes. The pipeline is capable of enhanced or manually assisted reference-based assembly using multiple assemblers and modes; gene predictor combining; and functional annotation of genes and gene products. Because every component of the pipeline is executed on a local machine with no need to access resources over the Internet, the pipeline is suitable for projects of a sensitive nature. Annotation of virulence-related features makes the pipeline particularly useful for projects working with pathogenic prokaryotes. The pipeline is licensed under the open-source GNU General Public License and available at the Georgia Tech Neisseria Base (http://nbase.biology.gatech.edu/). The pipeline is implemented with a combination of Perl, Bourne Shell and MySQL and is compatible with Linux and other Unix systems.
EPSE Project 2: Designing and Evaluating Short Teaching Sequences, Informed by Research Evidence.

ERIC Educational Resources Information Center

Leach, John; Hind, Andy; Lewis, Jenny; Scott, Phil

2002-01-01

Reports on Project 2 from the Evidence-based Practice in Science Education (EPSE) Research Network. In this project, teachers and researchers worked collaboratively on the design of three short teaching sequences on electric circuits. (DDR)
Genome Improvement at JGI-HAGSC

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grimwood, Jane; Schmutz, Jeremy J.; Myers, Richard M.

Since the completion of the sequencing of the human genome, the Joint Genome Institute (JGI) has rapidly expanded its scientific goals in several DOE mission-relevant areas. At the JGI-HAGSC, we have kept pace with this rapid expansion of projects with our focus on assessing, assembling, improving and finishing eukaryotic whole genome shotgun (WGS) projects for which the shotgun sequence is generated at the Production Genomic Facility (JGI-PGF). We follow this by combining the draft WGS with genomic resources generated at JGI-HAGSC or in collaborator laboratories (including BAC end sequences, genetic maps and FLcDNA sequences) to produce an improved draft sequence. For eukaryotic genomes important to the DOE mission, we then add further information from directed experiments to produce reference genomic sequences that are publicly available for any scientific researcher. Also, we have continued our program for producing BAC-based finished sequence, both for adding information to JGI genome projects and for small BAC-based sequencing projects proposed through any of the JGI sequencing programs. We have now built our computational expertise in WGS assembly and analysis and have moved eukaryotic genome assembly from the JGI-PGF to JGI-HAGSC. We have concentrated our assembly development work on large plant genomes and complex fungal and algal genomes.
Automated Array Assembly, Phase 2. Low-cost Solar Array Project, Task 4

NASA Technical Reports Server (NTRS)

Lopez, M.

1978-01-01

Work was done to verify the technological readiness of a select process sequence with respect to satisfying the Low Cost Solar Array Project objectives of meeting the designated goals of $0.50 per peak watt in 1986 (1975 dollars). The sequence examined consisted of: (1) 3-inch-diameter as-sawn Czochralski grown 1:0:0 silicon, (2) texture etching, (3) ion implanting, (4) laser annealing, (5) screen printing of ohmic contacts and (6) sprayed anti-reflective coatings. High volume production projections were made on the selected process sequence. Automated processing and movement of hardware at high rates were conceptualized to satisfy the Project's 500 MW/yr capability. A production plan was formulated with flow diagrams integrating the various processes in the cell fabrication sequence.
An Efficient Method for Electroporation of Small Interfering RNAs into ENCODE Project Tier 1 GM12878 and K562 Cell Lines.

PubMed

Muller, Ryan Y; Hammond, Ming C; Rio, Donald C; Lee, Yeon J

2015-12-01

The Encyclopedia of DNA Elements (ENCODE) Project aims to identify all functional sequence elements in the human genome sequence by use of high-throughput DNA/cDNA sequencing approaches. To aid the standardization, comparison, and integration of data sets produced from different technologies and platforms, the ENCODE Consortium selected several standard human cell lines to be used by the ENCODE Projects. The Tier 1 ENCODE cell lines include GM12878, K562, and H1 human embryonic stem cell lines. GM12878 is a lymphoblastoid cell line, transformed with the Epstein-Barr virus, that was selected by the International HapMap Project for whole genome and transcriptome sequencing by use of the Illumina platform. K562 is an immortalized myelogenous leukemia cell line. The GM12878 cell line is attractive for the ENCODE Projects, as it offers potential synergy with the International HapMap Project. Despite the vast amount of sequencing data available on the GM12878 cell line through the ENCODE Project, including transcriptome, chromatin immunoprecipitation-sequencing for histone marks, and transcription factors, no small interfering RNA (siRNA)-mediated knockdown studies have been performed in the GM12878 cell line, as cationic lipid-mediated transfection methods are inefficient for lymphoid cell lines. Here, we present an efficient and reproducible method for transfection of a variety of siRNAs into the GM12878 and K562 cell lines, which subsequently results in targeted protein depletion.
Noncoding sequence classification based on wavelet transform analysis: part I

NASA Astrophysics Data System (ADS)

Paredes, O.; Strojnik, M.; Romo-Vázquez, R.; Vélez Pérez, H.; Ranta, R.; Garcia-Torales, G.; Scholl, M. K.; Morales, J. A.

2017-09-01

DNA sequences in the human genome can be divided into coding and noncoding ones. Coding sequences are those that are read during transcription. The identification of coding sequences has been widely reported in the literature due to their much-studied periodicity. Noncoding sequences represent the majority of the human genome. They play an important role in gene regulation and differentiation among the cells. However, noncoding sequences do not exhibit periodicities that correlate to their functions. The ENCODE (Encyclopedia of DNA Elements) and Epigenomic Roadmap projects have cataloged the human noncoding sequences into specific functions. We study characteristics of noncoding sequences with wavelet analysis of genomic signals.
A computational genomics pipeline for prokaryotic sequencing projects

PubMed Central

Kislyuk, Andrey O.; Katz, Lee S.; Agrawal, Sonia; Hagen, Matthew S.; Conley, Andrew B.; Jayaraman, Pushkala; Nelakuditi, Viswateja; Humphrey, Jay C.; Sammons, Scott A.; Govil, Dhwani; Mair, Raydel D.; Tatti, Kathleen M.; Tondella, Maria L.; Harcourt, Brian H.; Mayer, Leonard W.; Jordan, I. King

2010-01-01

Motivation: New sequencing technologies have accelerated research on prokaryotic genomes and have made genome sequencing operations outside major genome sequencing centers routine. However, no off-the-shelf solution exists for the combined assembly, gene prediction, genome annotation and data presentation necessary to interpret sequencing data. The resulting requirement to invest significant resources into custom informatics support for genome sequencing projects remains a major impediment to the accessibility of high-throughput sequence data. Results: We present a self-contained, automated high-throughput open source genome sequencing and computational genomics pipeline suitable for prokaryotic sequencing projects. The pipeline has been used at the Georgia Institute of Technology and the Centers for Disease Control and Prevention for the analysis of Neisseria meningitidis and Bordetella bronchiseptica genomes. The pipeline is capable of enhanced or manually assisted reference-based assembly using multiple assemblers and modes; gene predictor combining; and functional annotation of genes and gene products. Because every component of the pipeline is executed on a local machine with no need to access resources over the Internet, the pipeline is suitable for projects of a sensitive nature. Annotation of virulence-related features makes the pipeline particularly useful for projects working with pathogenic prokaryotes. Availability and implementation: The pipeline is licensed under the open-source GNU General Public License and available at the Georgia Tech Neisseria Base (http://nbase.biology.gatech.edu/). The pipeline is implemented with a combination of Perl, Bourne Shell and MySQL and is compatible with Linux and other Unix systems. Contact: king.jordan@biology.gatech.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20519285
The Status, Quality, and Expansion of the NIH Full-Length cDNA Project: The Mammalian Gene Collection (MGC)

PubMed Central

2004-01-01

The National Institutes of Health's Mammalian Gene Collection (MGC) project was designed to generate and sequence a publicly accessible cDNA resource containing a complete open reading frame (ORF) for every human and mouse gene. The project initially used a random strategy to select clones from a large number of cDNA libraries from diverse tissues. Candidate clones were chosen based on 5′-EST sequences, and then fully sequenced to high accuracy and analyzed by algorithms developed for this project. Currently, more than 11,000 human and 10,000 mouse genes are represented in MGC by at least one clone with a full ORF. The random selection approach is now reaching a saturation point, and a transition to protocols targeted at the missing transcripts is now required to complete the mouse and human collections. Comparison of the sequence of the MGC clones to reference genome sequences reveals that most cDNA clones are of very high sequence quality, although it is likely that some cDNAs may carry missense variants as a consequence of experimental artifact, such as PCR, cloning, or reverse transcriptase errors. Recently, a rat cDNA component was added to the project, and ongoing frog (Xenopus) and zebrafish (Danio) cDNA projects were expanded to take advantage of the high-throughput MGC pipeline. PMID:15489334
The Human Genome Project: big science transforms biology and medicine.

PubMed

Hood, Leroy; Rowen, Lee

2013-01-01

The Human Genome Project has transformed biology through its integrated big science approach to deciphering a reference human genome sequence along with the complete sequences of key model organisms. The project exemplifies the power, necessity and success of large, integrated, cross-disciplinary efforts - so-called 'big science' - directed towards complex major objectives. In this article, we discuss the ways in which this ambitious endeavor led to the development of novel technologies and analytical tools, and how it brought the expertise of engineers, computer scientists and mathematicians together with biologists. It established an open approach to data sharing and open-source software, thereby making the data resulting from the project accessible to all. The genome sequences of microbes, plants and animals have revolutionized many fields of science, including microbiology, virology, infectious disease and plant biology. Moreover, deeper knowledge of human sequence variation has begun to alter the practice of medicine. The Human Genome Project has inspired subsequent large-scale data acquisition initiatives such as the International HapMap Project, 1000 Genomes, and The Cancer Genome Atlas, as well as the recently announced Human Brain Project and the emerging Human Proteome Project.
The Human Genome Project: big science transforms biology and medicine

PubMed Central

2013-01-01

The Human Genome Project has transformed biology through its integrated big science approach to deciphering a reference human genome sequence along with the complete sequences of key model organisms. The project exemplifies the power, necessity and success of large, integrated, cross-disciplinary efforts - so-called ‘big science’ - directed towards complex major objectives. In this article, we discuss the ways in which this ambitious endeavor led to the development of novel technologies and analytical tools, and how it brought the expertise of engineers, computer scientists and mathematicians together with biologists. It established an open approach to data sharing and open-source software, thereby making the data resulting from the project accessible to all. The genome sequences of microbes, plants and animals have revolutionized many fields of science, including microbiology, virology, infectious disease and plant biology. Moreover, deeper knowledge of human sequence variation has begun to alter the practice of medicine. The Human Genome Project has inspired subsequent large-scale data acquisition initiatives such as the International HapMap Project, 1000 Genomes, and The Cancer Genome Atlas, as well as the recently announced Human Brain Project and the emerging Human Proteome Project. PMID:24040834
The need for an assembly pilot project

USDA-ARS's Scientific Manuscript database

Progress has been rapid since the June 2008 start of the cacao genome sequencing project with the completion of the physical map and the accumulation of approximately 10x coverage of the genome with Titanium 454 sequence data of Matina1-6, the highly homozygous Amelonado tree chosen for the project....
EARLY TRAINING PROJECT. INTERIM REPORT.

ERIC Educational Resources Information Center

GRAY, SUSAN W.; KLAUS, RUPERT A.

THE EARLY TRAINING PROJECT ATTEMPTED TO IMPROVE THE INTELLECTUAL FUNCTIONING AND PERSONAL ADJUSTMENT OF CULTURALLY DISADVANTAGED CHILDREN THROUGH SPECIAL EXPERIENCES IN THE 15- OR 24-MONTHS PRECEDING FIRST GRADE AND IN THE FIRST YEAR OF SCHOOL. THE PROCEDURES OF THE PROJECT CONSISTED OF TWO TRAINING SEQUENCES. THE FIRST SEQUENCE INVOLVED TWO…

Project 1: Microbial Genomes: A Genomic Approach to Understanding the Evolution of Virulence. Project 2: From Genomes to Life: Drosophila Development in Space and Time

DOE Office of Scientific and Technical Information (OSTI.GOV)

Robert DeSalle

2004-09-10

This project seeks to use the genomes of two close relatives, A. actinomycetemcomitans and H. aphrophilus, to understand the evolutionary changes that take place in a genome to make it more or less virulent. Our primary specific aim of this project was to sequence, annotate, and analyze the genomes of Actinobacillus actinomycetemcomitans (CU1000, serotype f) and Haemophilus aphrophilus. With these genome sequences we have then compared the whole genome sequences to each other and to the current Aa (HK1651 www.genome.ou.edu) genome project sequence along with other fully sequenced Pasteurellaceae to determine inter and intra species differences that may account for the differences and similarities in disease. We also propose to create and curate a comprehensive database where sequence information and analysis for the Pasteurellaceae (family that includes the genera Actinobacillus and Haemophilus) are readily accessible. And finally we have proposed to develop phylogenetic techniques that can be used to efficiently and accurately examine the evolution of genomes. Below we report on progress we have made on these major specific aims. Progress on the specific aims is reported below under two major headings--experimental approaches and bioinformatics and systematic biology approaches.
Enrichment of target sequences for next-generation sequencing applications in research and diagnostics.

PubMed

Altmüller, Janine; Budde, Birgit S; Nürnberg, Peter

2014-02-01

Targeted re-sequencing such as gene panel sequencing (GPS) has become very popular in medical genetics, both for research projects and in diagnostic settings. The technical principles of the different enrichment methods have been reviewed several times before; however, new enrichment products are constantly entering the market, and researchers are often puzzled about the requirement to take decisions about long-term commitments, both for the enrichment product and the sequencing technology. This review summarizes important considerations for the experimental design and provides helpful recommendations in choosing the best sequencing strategy for various research projects and diagnostic applications.
The Ensembl genome database project.

PubMed

Hubbard, T; Barker, D; Birney, E; Cameron, G; Chen, Y; Clark, L; Cox, T; Cuff, J; Curwen, V; Down, T; Durbin, R; Eyras, E; Gilbert, J; Hammond, M; Huminiecki, L; Kasprzyk, A; Lehvaslaiho, H; Lijnzaad, P; Melsopp, C; Mongin, E; Pettett, R; Pocock, M; Potter, S; Rust, A; Schmidt, E; Searle, S; Slater, G; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Stupka, E; Ureta-Vidal, A; Vastrik, I; Clamp, M

2002-01-01

The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of the human genome sequence, with confirmed gene predictions that have been integrated with external data sources, and is available as either an interactive web site or as flat files. It is also an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements from sequence analysis to data storage and visualisation. The Ensembl site is one of the leading sources of human genome sequence annotation and provided much of the analysis for publication by the international human genome project of the draft genome. The Ensembl system is being installed around the world in both companies and academic sites on machines ranging from supercomputers to laptops.
Seward Park High School Project CABES 1983-1984.

ERIC Educational Resources Information Center

New York City Board of Education, Brooklyn. Office of Educational Assessment.

Project CABES (Career Advancement through Bilingual Education) was established in 1983 at Seward Park High School in New York, New York. Its major goal is to serve a population of 250 Hispanic students of limited English proficiency (LEP) interested in pursuing a career advancement sequence rather than a regular academic sequence. Project CABES…
Fast and low-cost structured light pattern sequence projection.

PubMed

Wissmann, Patrick; Forster, Frank; Schmitt, Robert

2011-11-21

We present a high-speed and low-cost approach for structured light pattern sequence projection. Using a fast rotating binary spatial light modulator, our method is potentially capable of projection frequencies in the kHz domain, while enabling pattern rasterization as low as 2 μm pixel size and inherently linear grayscale reproduction quantized at 12 bits/pixel or better. Due to the circular arrangement of the projected fringe patterns, we extend the widely used ray-plane triangulation method to ray-cone triangulation and provide a detailed description of the optical calibration procedure. Using the proposed projection concept in conjunction with the recently published coded phase shift (CPS) pattern sequence, we demonstrate high accuracy 3-D measurement at 200 Hz projection frequency and 20 Hz 3-D reconstruction rate. © 2011 Optical Society of America
The Citizen Cyberscience Lectures - 1) Mobile phones and Africa: a success story 2) Citizen Problem Solving

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ibrahim, Mo

2009-10-28

Mobile phones and Africa: a success story. Dr. Mo Ibrahim, Mo Ibrahim Foundation. Citizen Problem Solving. Dr. Alpheus Bingham, InnoCentive. The Citizen Cyberscience Lectures are hosted by the partners of the Citizen Cyberscience Centre, CERN, The UN Institute of Training and Research and the University of Geneva. The goal of the Lectures is to provide an inspirational forum for participants from the various international organizations and academic institutions in Geneva to explore how information technology is enabling greater citizen participation in tackling global development challenges as well as global scientific research. The first Citizen Cyberscience Lectures will welcome two speakers who have both made major innovative contributions in this area. Dr. Mo Ibrahim, founder of Celtel International, one of Africa’s most successful mobile network operators, will talk about “Mobile phones and Africa: a success story”. Dr. Alpheus Bingham, founder of InnoCentive, a Web-based community that solves industrial R&D challenges, will discuss “Citizen Problem Solving”. The Citizen Cyberscience Lectures are open and free of charge. Participants from outside CERN must register by sending an email to Yasemin.Hauser@cern.ch BEFORE the 23rd October to be able to access CERN. THE LECTURES. Mobile phones and Africa: a success story. Dr. Mo Ibrahim, Mo Ibrahim Foundation. Abstract: The introduction of mobile phones into Africa changed the continent, enabling business and the commercial sector, creating directly and indirectly, millions of jobs. It enriched the social lives of many people. Surprisingly, it supported the emerging civil society and advanced the course of democracy. Bio: Dr Mo Ibrahim is a global expert in mobile communications with a distinguished academic and business career. In 1998, Dr Ibrahim founded Celtel International to build and operate mobile networks in Africa.
Celtel became one of Africa’s most successful companies with operations in 15 countries, covering more than a third of the continent’s population and investing more than US$750 million in Africa. The company was sold to MTC Kuwait in 2005 for $3.4 billion. In 2006 Dr Ibrahim established the Mo Ibrahim Foundation to support great African leadership. The Foundation focuses on two major initiatives to stimulate debate around, and improve the quality of, governance in Africa. The Ibrahim Prize for Achievement in African Leadership recognises and celebrates excellence; and the Ibrahim Index of African Governance provides civil society with a comprehensive and quantifiable tool to promote government accountability. Dr Ibrahim is also Founding Chairman of Satya Capital Ltd, an investment company focused on opportunities in Africa. Dr Ibrahim has been awarded an Honorary Doctorate by the University of London’s School of Oriental and African Studies, the University of Birmingham and De Montfort University, Leicester as well as an Honorary Fellowship Award from the London Business School. He has also received the Chairman’s Award for Lifetime Achievement from the GSM Association in 2007 and the Economists Innovation Award 2007 for Social & Economic Innovation. In 2008 Dr Ibrahim was presented with the BNP Paribas Prize for Philanthropy, and also listed by TIME magazine as one of the 100 most influential people in the world. Citizen Problem Solving Dr. Alpheus Bingham, InnoCentive Abstract American playwright Damon Runyon (Guys and Dolls) once remarked, "the race is not always to the swift, nor the victory to the strong -- but that IS how you bet." Not only does a system of race handicapping follow from this logic, but the whole notion of expertise and technical qualifications. Such 'credentials' allow one to 'bet' on who might most likely solve a difficult challenge, whether as consultant, contractor or employee. 
Of course, the approach would differ if one were allowed to bet AFTER the race. When such systems came into broad use, i.e., chat rooms, usenets, InnoCentive, etc., and were subsequently studied, it was often found that the greatest probability of solution lies in the "long tail" of the function rather than in the head representing formally vetted 'experts.' Insight into a problem is often the intersection of training, experience, metaphor and provocation (think Archimedes). Examples of "citizens" outside a targeted field of expertise providing unique solutions will illustrate the principles involved. Bio Dr. Alph Bingham is a pioneer in the field of open innovation and an advocate of collaborative approaches to research and development. He is co-founder, and former president and chief executive officer of InnoCentive Inc., a Web-based community that matches companies facing R&D challenges with scientists who propose solutions. Through InnoCentive, a platform that leverages the ability to connect to a whole planet of people through the Internet, organizations can access individuals – problem solvers – who might never have been found. Alph spent more than 25 years with Eli Lilly and Company, and offers deep experience in pharmaceutical research and development, research acquisitions and collaborations, and R&D strategic planning. During his career he was instrumental in creating and developing Eli Lilly's portfolio management process as well as establishing the divisions of Research Acquisitions, the Office of Alliance Management and e.Lilly, a business innovation unit, from which various other ventures were spun out that create the advantages of open and networked organizational structures, including: InnoCentive, YourEncore, Inc., Coalesix, Inc., Maaguzi, Inc., Indigo Biosystems, Seriosity, Chorus and Collaborative Drug Discovery, Inc. 
He currently serves on the Board of Directors of InnoCentive, Inc., and Collaborative Drug Discovery, Inc.; the advisory boards of the Center for Collective Intelligence (MIT), and the Business Innovation Factory, as well as a member of the board of trustees of the Bankinter Foundation for Innovation in Madrid. He has lectured extensively at both national and international events and serves as a Visiting Scholar at the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign. He is also the former chairman of the Board of Editors of the Research Technology Management Journal. Dr. Bingham was the recipient of the Economist's Fourth Annual Innovation Summit "Business Process Award" for InnoCentive. He was also named as one of Project Management Institute's "Power 50" leaders in October 2005. Dr. Bingham received a Ph.D. in organic chemistry from Stanford University.
Use of intertidal areas by shrimps (Decapoda) in a Brazilian Amazon estuary.

PubMed

Sampaio, Hebert A; Martinelli-Lemos, Jussara M

2014-03-01

The present work investigated shrimp occupation of different habitats (mangroves, salt marshes and rocky outcrops) in an Amazon estuary and the correlation of shrimp abundance with environmental variables. The collections were made in August and November 2009, at low syzygy (spring) tide on Areuá Beach, situated in the Extractive Reserve of Mãe Grande de Curuçá, Pará, Brazil, totaling 20 pools. In each environment, we recorded the physical-chemical factors (pH, salinity, and temperature) and measured the area (m²) and volume (m³) of every pool through bathymetry. The average pH, salinity, temperature, area and volume of tide pools were 8.75 (± 0.8 standard deviation), 35.45 (± 3), 29.49 °C (± 2.32), 27.41 m² (± 41.18), and 5.19 m³ (± 8.01), respectively. We caught a total of 4,871 shrimps, distributed in three families and four species: Farfantepenaeus subtilis (98.36%) (marine), followed by Alpheus pontederiae (0.76%) (estuarine), and Macrobrachium surinamicum (0.45%) and Macrobrachium amazonicum (0.43%), both predominantly freshwater. The species F. subtilis and A. pontederiae occurred in the three habitats, whereas M. surinamicum occurred in salt marsh and rocky outcrop and M. amazonicum only in salt marsh. Temperature and pH were the most important environmental descriptors that significantly affected the density and biomass of shrimps.
Automated sample-preparation technologies in genome sequencing projects.

PubMed

Hilbert, H; Lauber, J; Lubenow, H; Düsterhöft, A

2000-01-01

A robotic workstation system (BioRobot 9600, QIAGEN) and a 96-well UV spectrophotometer (Spectramax 250, Molecular Devices) were integrated into the process of high-throughput automated sequencing of double-stranded plasmid DNA templates. An automated 96-well miniprep kit protocol (QIAprep Turbo, QIAGEN) provided high-quality plasmid DNA from shotgun clones. The DNA prepared by this procedure was used to generate more than two megabases of final sequence data for two genomic projects (Arabidopsis thaliana and Schizosaccharomyces pombe), three thousand expressed sequence tags (ESTs) plus half a megabase of human full-length cDNA clones, and approximately 53,000 single reads for a whole genome shotgun project (Pseudomonas putida).
Identification of genes in anonymous DNA sequences. Final report: Report period, 15 April 1993--15 April 1994

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fields, C.A.

1994-09-01

This Report concludes the DOE Human Genome Program project, "Identification of Genes in Anonymous DNA Sequence." The central goals of this project have been (1) understanding the problem of identifying genes in anonymous sequences, and (2) development of tools, primarily the automated identification system gm, for identifying genes. The activities supported under the previous award are summarized here to provide a single complete report on the activities supported as part of the project from its inception to its completion.
Self-Organizing Hidden Markov Model Map (SOHMMM): Biological Sequence Clustering and Cluster Visualization.

PubMed

Ferles, Christos; Beaufort, William-Scott; Ferle, Vanessa

2017-01-01

The present study devises mapping methodologies and projection techniques that visualize and demonstrate biological sequence data clustering results. The Sequence Data Density Display (SDDD) and Sequence Likelihood Projection (SLP) visualizations represent the input symbolical sequences in a lower-dimensional space in such a way that the clusters and relations of data elements are depicted graphically. Both operate in combination/synergy with the Self-Organizing Hidden Markov Model Map (SOHMMM). The resulting unified framework is in position to analyze automatically and directly raw sequence data. This analysis is carried out with little, or even complete absence of, prior information/domain knowledge.
Human genetics and genomics a decade after the release of the draft sequence of the human genome.

PubMed

Naidoo, Nasheen; Pawitan, Yudi; Soong, Richie; Cooper, David N; Ku, Chee-Seng

2011-10-01

Substantial progress has been made in human genetics and genomics research over the past ten years since the publication of the draft sequence of the human genome in 2001. Findings emanating directly from the Human Genome Project, together with those from follow-on studies, have had an enormous impact on our understanding of the architecture and function of the human genome. Major developments have been made in cataloguing genetic variation, the International HapMap Project, and with respect to advances in genotyping technologies. These developments are vital for the emergence of genome-wide association studies in the investigation of complex diseases and traits. In parallel, the advent of high-throughput sequencing technologies has ushered in the 'personal genome sequencing' era for both normal and cancer genomes, and made possible large-scale genome sequencing studies such as the 1000 Genomes Project and the International Cancer Genome Consortium. The high-throughput sequencing and sequence-capture technologies are also providing new opportunities to study Mendelian disorders through exome sequencing and whole-genome sequencing. This paper reviews these major developments in human genetics and genomics over the past decade.
Human genetics and genomics a decade after the release of the draft sequence of the human genome

PubMed Central

2011-01-01

Substantial progress has been made in human genetics and genomics research over the past ten years since the publication of the draft sequence of the human genome in 2001. Findings emanating directly from the Human Genome Project, together with those from follow-on studies, have had an enormous impact on our understanding of the architecture and function of the human genome. Major developments have been made in cataloguing genetic variation, the International HapMap Project, and with respect to advances in genotyping technologies. These developments are vital for the emergence of genome-wide association studies in the investigation of complex diseases and traits. In parallel, the advent of high-throughput sequencing technologies has ushered in the 'personal genome sequencing' era for both normal and cancer genomes, and made possible large-scale genome sequencing studies such as the 1000 Genomes Project and the International Cancer Genome Consortium. The high-throughput sequencing and sequence-capture technologies are also providing new opportunities to study Mendelian disorders through exome sequencing and whole-genome sequencing. This paper reviews these major developments in human genetics and genomics over the past decade. PMID:22155605
[The ENCODE project and functional genomics studies].

PubMed

Ding, Nan; Qu, Hongzhu; Fang, Xiangdong

2014-03-01

Upon the completion of the Human Genome Project, scientists have been trying to interpret the underlying genomic code for human biology. Since 2003, the National Human Genome Research Institute (NHGRI) has invested nearly $0.3 billion and gathered over 440 scientists from more than 32 institutions in the United States, China, United Kingdom, Japan, Spain and Singapore to initiate the Encyclopedia of DNA Elements (ENCODE) project, aiming to identify and analyze all regulatory elements in the human genome. Taking advantage of the development of next-generation sequencing technologies and continuous improvement of experimental methods, ENCODE has made remarkable achievements: it has identified methylation and histone modification of DNA sequences and their regulatory effects on gene expression through altering chromatin structures, categorized binding sites of various transcription factors and constructed their regulatory networks, further revised and updated databases for pseudogenes and non-coding RNA, and identified SNPs in regulatory sequences associated with diseases. These findings help to comprehensively understand information embedded in gene and genome sequences, the function of regulatory elements as well as the molecular mechanism underlying the transcriptional regulation by noncoding regions, and provide an extensive data resource for life sciences, particularly for translational medicine. We reviewed the contributions of high-throughput sequencing platform development and bioinformatics technology improvement to the ENCODE project, the association between epigenetics studies and the ENCODE project, and the major achievements of the ENCODE project. We also provided our perspective on the role of the ENCODE project in promoting the development of basic and clinical medicine.
The Genomes OnLine Database (GOLD) v.4: status of genomic and metagenomic projects and their associated metadata

PubMed Central

Pagani, Ioanna; Liolios, Konstantinos; Jansson, Jakob; Chen, I-Min A.; Smirnova, Tatyana; Nosrat, Bahador; Markowitz, Victor M.; Kyrpides, Nikos C.

2012-01-01

The Genomes OnLine Database (GOLD, http://www.genomesonline.org/) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2011, GOLD, now on version 4.0, contains information for 11 472 sequencing projects, of which 2907 have been completed and their sequence data has been deposited in a public repository. Out of these complete projects, 1918 are finished and 989 are permanent drafts. Moreover, GOLD contains information for 340 metagenome studies associated with 1927 metagenome samples. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about any (x) Sequence specification and beyond. PMID:22135293
The Genomes OnLine Database (GOLD) v.4: status of genomic and metagenomic projects and their associated metadata.

PubMed

Pagani, Ioanna; Liolios, Konstantinos; Jansson, Jakob; Chen, I-Min A; Smirnova, Tatyana; Nosrat, Bahador; Markowitz, Victor M; Kyrpides, Nikos C

2012-01-01

The Genomes OnLine Database (GOLD, http://www.genomesonline.org/) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2011, GOLD, now on version 4.0, contains information for 11,472 sequencing projects, of which 2907 have been completed and their sequence data has been deposited in a public repository. Out of these complete projects, 1918 are finished and 989 are permanent drafts. Moreover, GOLD contains information for 340 metagenome studies associated with 1927 metagenome samples. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about any (x) Sequence specification and beyond.
Extra projection data identification method for fast-continuous-rotation industrial cone-beam CT.

PubMed

Yang, Min; Duan, Shengling; Duan, Jinghui; Wang, Xiaolong; Li, Xingdong; Meng, Fanyong; Zhang, Jianhai

2013-01-01

Fast-continuous-rotation is an effective measure to improve the scanning speed and decrease the radiation dose for cone-beam CT. However, because of acceleration and deceleration of the motor, as well as the response lag of the scanning control terminals to the host PC, uneven-distributed and redundant projections are inevitably created, which seriously decrease the quality of the reconstruction images. In this paper, we first analyzed the aspects of the theoretical sequence chart of the fast-continuous-rotation mode. Then, an optimized sequence chart was proposed by extending the rotation angle span to ensure the effective 2π-span projections were situated in the stable rotation stage. In order to match the rotation angle with the projection image accurately, structure similarity (SSIM) index was used as a control parameter for extraction of the effective projection sequence which was exactly the complete projection data for image reconstruction. The experimental results showed that SSIM based method had a high accuracy of projection view locating and was easy to realize.
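As a rough illustration of how an SSIM score could be used to locate a matching projection, the sketch below computes a single-window (global) SSIM between two flattened images and picks the later projection most similar to the first one. This is not the authors' implementation: the function names, the `min_index` parameter, the 8-bit `data_range`, and the idea of searching for the projection that best matches the first view are all assumptions made for the example. Only the standard SSIM formula with the usual constants (0.01 and 0.03) is taken as given.

```python
def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM between two equal-length pixel sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    c1 = (0.01 * data_range) ** 2  # stabilises the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilises the contrast/structure term
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )


def most_similar_to_first(projections, min_index=1):
    """Index (>= min_index) of the projection with the highest SSIM
    against projections[0]. Hypothetical helper for illustration only."""
    candidates = range(min_index, len(projections))
    return max(candidates, key=lambda k: global_ssim(projections[0], projections[k]))


# Toy data: four-pixel "projections"; index 3 nearly repeats index 0.
toy = [
    [0, 10, 20, 30],
    [30, 20, 10, 0],
    [5, 5, 5, 5],
    [0, 10, 20, 31],
    [10, 0, 30, 20],
]
best = most_similar_to_first(toy)
```

In a real scan the images would be two-dimensional and SSIM is normally averaged over local windows; the global form is used here only to keep the example short.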
SIMBA: a web tool for managing bacterial genome assembly generated by Ion PGM sequencing technology.

PubMed

Mariano, Diego C B; Pereira, Felipe L; Aguiar, Edgar L; Oliveira, Letícia C; Benevides, Leandro; Guimarães, Luís C; Folador, Edson L; Sousa, Thiago J; Ghosh, Preetam; Barh, Debmalya; Figueiredo, Henrique C P; Silva, Artur; Ramos, Rommel T J; Azevedo, Vasco A C

2016-12-15

The evolution of Next-Generation Sequencing (NGS) has considerably reduced the cost per sequenced base, allowing a significant rise in sequencing projects, mainly in prokaryotes. However, the range of available NGS platforms requires different strategies and software to correctly assemble genomes. Different strategies are necessary to properly complete an assembly project, in addition to the installation or modification of various software. This requires users to have significant expertise in these software packages and command-line scripting experience on Unix platforms, besides possessing basic expertise in methodologies and techniques for genome assembly. These difficulties often delay complete genome assembly projects. In order to overcome this, we developed SIMBA (SImple Manager for Bacterial Assemblies), a freely available web tool that integrates several component tools for assembling and finishing bacterial genomes. SIMBA provides a friendly and intuitive user interface so bioinformaticians, even with low computational expertise, can work under a centralized administrative control system of assemblies managed by the assembly center head. SIMBA guides users through the assembly process with simple and interactive pages. The SIMBA workflow is divided into three modules: (i) projects: provides an overview of genome sequencing projects, in addition to data quality analysis and data format conversions; (ii) assemblies: allows de novo assemblies with the software Mira, Minia, Newbler and SPAdes, as well as assembly quality validation using QUAST software; and (iii) curation: presents methods for finishing assemblies through tools for scaffolding contigs and closing gaps. We also present a case study that validated the efficacy of SIMBA in managing bacterial assembly projects sequenced using Ion Torrent PGM. 
Besides being a web tool for genome assembly, SIMBA is a complete genome assembly project management system, which can be useful for managing several projects in laboratories. SIMBA source code is available to download and install on local web servers at http://ufmg-simba.sourceforge.net .
The Mouse Genomes Project: a repository of inbred laboratory mouse strain genomes.

PubMed

Adams, David J; Doran, Anthony G; Lilue, Jingtao; Keane, Thomas M

2015-10-01

The Mouse Genomes Project was initiated in 2009 with the goal of using next-generation sequencing technologies to catalogue molecular variation in the common laboratory mouse strains, and a selected set of wild-derived inbred strains. The initial sequencing and survey of sequence variation in 17 inbred strains was completed in 2011 and included a comprehensive catalogue of single nucleotide polymorphisms, short insertion/deletions, larger structural variants including their fine scale architecture and landscape of transposable element variation, and genomic sites subject to post-transcriptional alteration of RNA. From this beginning, the resource has expanded significantly to include 36 fully sequenced inbred laboratory mouse strains, a refined and updated data processing pipeline, and new variation querying and data visualisation tools which are available on the project's website ( http://www.sanger.ac.uk/resources/mouse/genomes/ ). The focus of the project is now the completion of de novo assembled chromosome sequences and strain-specific gene structures for the core strains. We discuss how the assembled chromosomes will power comparative analysis, data access tools and future directions of mouse genetics.
Site-Directed Mutagenesis Study of an Antibiotic-Sensing Noncoding RNA Integrated into a One-Semester Project-Based Biochemistry Lab Course

ERIC Educational Resources Information Center

Gerczei, Timea

2017-01-01

A laboratory sequence is described that is suitable for upper-level biochemistry or molecular biology laboratories that combines project-based and traditional laboratory experiments. In the project-based sequence, the individual laboratory experiments are thematically linked and aim to show how a bacterial antibiotic sensing noncoding RNA (the…
The Giardia genome project database.

PubMed

McArthur, A G; Morrison, H G; Nixon, J E; Passamaneck, N Q; Kim, U; Hinkle, G; Crocker, M K; Holder, M E; Farr, R; Reich, C I; Olsen, G E; Aley, S B; Adam, R D; Gillin, F D; Sogin, M L

2000-08-15

The Giardia genome project database provides an online resource for Giardia lamblia (WB strain, clone C6) genome sequence information. The database includes edited single-pass reads, the results of BLASTX searches, and details of progress towards sequencing the entire 12 million-bp Giardia genome. Pre-sorted BLASTX results can be retrieved based on keyword searches and BLAST searches of the high throughput Giardia data can be initiated from the web site or through NCBI. Descriptions of the genomic DNA libraries, project protocols and summary statistics are also available. Although the Giardia genome project is ongoing, new sequences are made available on a bi-monthly basis to ensure that researchers have access to information that may assist them in the search for genes and their biological function. The current URL of the Giardia genome project database is www.mbl.edu/Giardia.

The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata.

PubMed

Liolios, Konstantinos; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Kyrpides, Nikos C

2008-01-01

The Genomes On Line Database (GOLD) is a comprehensive resource that provides information on genome and metagenome projects worldwide. Complete and ongoing projects and their associated metadata can be accessed in GOLD through pre-computed lists and a search page. As of September 2007, GOLD contains information on more than 2900 sequencing projects, out of which 639 have been completed and their sequence data deposited in the public databases. GOLD continues to expand with the goal of providing metadata information related to the projects and the organisms/environments towards the 'Minimum Information about a Genome Sequence' (MIGS) guideline. GOLD is available at http://www.genomesonline.org and has a mirror site at the Institute of Molecular Biology and Biotechnology, Crete, Greece at http://gold.imbb.forth.gr/
Diuretic-enhanced gadolinium excretory MR urography: comparison of conventional gradient-echo sequences and echo-planar imaging.

PubMed

Nolte-Ernsting, C C; Tacke, J; Adam, G B; Haage, P; Jung, P; Jakse, G; Günther, R W

2001-01-01

The aim of this study was to investigate the utility of different gadolinium-enhanced T1-weighted gradient-echo techniques in excretory MR urography. In 74 urologic patients, excretory MR urography was performed using various T1-weighted gradient-echo (GRE) sequences after injection of gadolinium-DTPA and low-dose furosemide. The examinations included conventional GRE sequences and echo-planar imaging (GRE EPI), both obtained with 3D data sets and 2D projection images. Breath-hold acquisition was used primarily. In 20 of 74 examinations, we compared breath-hold imaging with respiratory gating. Breath-hold imaging was significantly superior to respiratory gating for the visualization of pelvicaliceal systems, but not for the ureters. Complete MR urograms were obtained within 14-20 s using 3D GRE EPI sequences and in 20-30 s with conventional 3D GRE sequences. Ghost artefacts caused by ureteral peristalsis often occurred with conventional 3D GRE imaging and were almost completely suppressed in EPI sequences (p < 0.0001). Susceptibility effects were more pronounced on GRE EPI MR urograms and calculi measured 0.8-21.7% greater in diameter compared with conventional GRE sequences. Increased spatial resolution degraded the image quality only in GRE-EPI urograms. In projection MR urography, the entire pelvicaliceal system was imaged by acquisition of a fast single-slice sequence and the conventional 2D GRE technique provided superior morphological accuracy than 2D GRE EPI projection images (p < 0.0003). Fast 3D GRE EPI sequences improve the clinical practicability of excretory MR urography especially in old or critically ill patients unable to suspend breathing for more than 20 s. Conventional GRE sequences are superior to EPI in high-resolution detail MR urograms and in projection imaging.
A Molecular Genetics Laboratory Course Applying Bioinformatics and Cell Biology in the Context of Original Research

PubMed Central

Pruitt, Wendy M.; Robinson, Lucy C.

2008-01-01

Research based laboratory courses have been shown to stimulate student interest in science and to improve scientific skills. We describe here a project developed for a semester-long research-based laboratory course that accompanies a genetics lecture course. The project was designed to allow students to become familiar with the use of bioinformatics tools and molecular biology and genetic approaches while carrying out original research. Students were required to present their hypotheses, experiments, and results in a comprehensive lab report. The lab project concerned the yeast casein kinase 1 (CK1) protein kinase Yck2. CK1 protein kinases are present in all organisms and are well conserved in primary structure. These enzymes display sequence features that differ from other protein kinase subfamilies. Students identified such sequences within the CK1 subfamily, chose a sequence to analyze, used available structural data to determine possible functions for their sequences, and designed mutations within the sequences. After generating the mutant alleles, these were expressed in yeast and tested for function by using two growth assays. The student response to the project was positive, both in terms of knowledge and skills increases and interest in research, and several students are continuing the analysis of mutant alleles as summer projects. PMID:19047427
The DNA Data Bank of Japan launches a new resource, the DDBJ Omics Archive of functional genomics experiments.

PubMed

Kodama, Yuichi; Mashima, Jun; Kaminuma, Eli; Gojobori, Takashi; Ogasawara, Osamu; Takagi, Toshihisa; Okubo, Kousaku; Nakamura, Yasukazu

2012-01-01

The DNA Data Bank of Japan (DDBJ; http://www.ddbj.nig.ac.jp) maintains and provides archival, retrieval and analytical resources for biological information. The central DDBJ resource consists of public, open-access nucleotide sequence databases including raw sequence reads, assembly information and functional annotation. Database content is exchanged with EBI and NCBI within the framework of the International Nucleotide Sequence Database Collaboration (INSDC). In 2011, DDBJ launched two new resources: the 'DDBJ Omics Archive' (DOR; http://trace.ddbj.nig.ac.jp/dor) and BioProject (http://trace.ddbj.nig.ac.jp/bioproject). DOR is an archival database of functional genomics data generated by microarray and highly parallel new generation sequencers. Data are exchanged between the ArrayExpress at EBI and DOR in the common MAGE-TAB format. BioProject provides an organizational framework to access metadata about research projects and the data from the projects that are deposited into different databases. In this article, we describe major changes and improvements introduced to the DDBJ services, and the launch of two new resources: DOR and BioProject.
Ebbie: automated analysis and storage of small RNA cloning data using a dynamic web server

PubMed Central

Ebhardt, H Alexander; Wiese, Kay C; Unrau, Peter J

2006-01-01

Background DNA sequencing is used ubiquitously: from deciphering genomes [1] to determining the primary sequence of small RNAs (smRNAs) [2-5]. The cloning of smRNAs is currently the most conventional method to determine the actual sequence of these important regulators of gene expression. Typical smRNA cloning projects involve the sequencing of hundreds to thousands of smRNA clones that are delimited at their 5' and 3' ends by fixed sequence regions. These primers result from the biochemical protocol used to isolate and convert the smRNA into clonable PCR products. Recently we completed a smRNA cloning project involving tobacco plants, where analysis was required for ~700 smRNA sequences [6]. Finding no easily accessible research tool to enter and analyze smRNA sequences we developed Ebbie to assist us with our study. Results Ebbie is a semi-automated smRNA cloning data processing algorithm, which initially searches for any substring within a DNA sequencing text file, which is flanked by two constant strings. The substring, also termed smRNA or insert, is stored in a MySQL and BlastN database. These inserts are then compared using BlastN to locally installed databases allowing the rapid comparison of the insert to both the growing smRNA database and to other static sequence databases. Our laboratory used Ebbie to analyze scores of DNA sequencing data originating from an smRNA cloning project [6]. Through its built-in instant analysis of all inserts using BlastN, we were able to quickly identify 33 groups of smRNAs from ~700 database entries. This clustering allowed the easy identification of novel and highly expressed clusters of smRNAs. Ebbie is available under GNU GPL and currently implemented on Conclusion Ebbie was designed for medium sized smRNA cloning projects with about 1,000 database entries [6-8]. Ebbie can be used for any type of sequence analysis where two constant primer regions flank a sequence of interest. 
The reliable storage of inserts and their annotation in a MySQL database, together with BlastN [9] comparison of new inserts to dynamic and static databases, make it a powerful new tool in any laboratory using DNA sequencing. Ebbie also prevents manual mistakes during the excision process and speeds up annotation and data-entry. Once the server is installed locally, its access can be restricted to protect sensitive new DNA sequencing data. Ebbie was primarily designed for smRNA cloning projects, but can be applied to a variety of RNA and DNA cloning projects [2,3,10,11]. PMID:16584563
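The insert-excision step described in this abstract (finding any substring flanked by two constant strings) can be sketched in a few lines of Python. This is a minimal illustration, not Ebbie's actual code: the function names, the example primer sequences and the toy read are hypothetical, and exact-match counting is used here as a simple stand-in for the BlastN-based comparison the abstract describes.

```python
import re
from collections import Counter


def extract_inserts(read, left_primer, right_primer):
    """Return every substring of `read` that lies between the two
    constant flanking strings (shortest match, case-insensitive)."""
    pattern = re.compile(
        re.escape(left_primer.upper()) + r"(.+?)" + re.escape(right_primer.upper())
    )
    return pattern.findall(read.upper())


def count_identical_inserts(reads, left_primer, right_primer):
    """Tally identical inserts across reads. This exact-match grouping is
    an assumption standing in for BlastN clustering."""
    counts = Counter()
    for read in reads:
        counts.update(extract_inserts(read, left_primer, right_primer))
    return counts


# Hypothetical flanking sequences and reads, for illustration only.
LEFT, RIGHT = "GGATCC", "AAGCTT"
reads = [
    "NNNNGGATCCACGTACGTACGTAAGCTTNNNNGGATCCTTTTGGGGAAGCTTNN",
    "nnggatccACGTACGTACGTaagcttnn",
]
inserts_first_read = extract_inserts(reads[0], LEFT, RIGHT)
insert_counts = count_identical_inserts(reads, LEFT, RIGHT)
```

The non-greedy `(.+?)` keeps each insert as short as possible, so a read carrying several concatenated inserts yields one entry per primer pair rather than one long span.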
The Pediatric Cancer Genome Project

PubMed Central

Downing, James R; Wilson, Richard K; Zhang, Jinghui; Mardis, Elaine R; Pui, Ching-Hon; Ding, Li; Ley, Timothy J; Evans, William E

2013-01-01

The St. Jude Children’s Research Hospital–Washington University Pediatric Cancer Genome Project (PCGP) is participating in the international effort to identify somatic mutations that drive cancer. These cancer genome sequencing efforts will not only yield an unparalleled view of the altered signaling pathways in cancer but should also identify new targets against which novel therapeutics can be developed. Although these projects are still deep in the phase of generating primary DNA sequence data, important results are emerging and valuable community resources are being generated that should catalyze future cancer research. We describe here the rationale for conducting the PCGP, present some of the early results of this project and discuss the major lessons learned and how these will affect the application of genomic sequencing in the clinic. PMID:22641210
MIPS: a database for genomes and protein sequences

PubMed Central

Mewes, H. W.; Frishman, D.; Güldener, U.; Mannhaupt, G.; Mayer, K.; Mokrejs, M.; Morgenstern, B.; Münsterkötter, M.; Rudd, S.; Weil, B.

2002-01-01

The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz–Netzwerk Bioinformatik) networks. The Arabidopsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91–93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155–158; Barker et al. (2001) Nucleic Acids Res., 29, 29–32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de). PMID:11752246
MIPS: a database for genomes and protein sequences.

PubMed

Mewes, H W; Frishman, D; Güldener, U; Mannhaupt, G; Mayer, K; Mokrejs, M; Morgenstern, B; Münsterkötter, M; Rudd, S; Weil, B

2002-01-01

The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz-Netzwerk Bioinformatik) networks. The Arabidopsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91-93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155-158; Barker et al. (2001) Nucleic Acids Res., 29, 29-32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de).
The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata.

PubMed

Liolios, Konstantinos; Chen, I-Min A; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Hugenholtz, Philip; Markowitz, Victor M; Kyrpides, Nikos C

2010-01-01

The Genomes On Line Database (GOLD) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2009, GOLD contains information for more than 5800 sequencing projects, of which 1100 have been completed and their sequence data deposited in a public repository. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification. GOLD is available at: http://www.genomesonline.org and has a mirror site at the Institute of Molecular Biology and Biotechnology, Crete, Greece, at: http://gold.imbb.forth.gr/
The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata

PubMed Central

Liolios, Konstantinos; Chen, I-Min A.; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Hugenholtz, Philip; Markowitz, Victor M.; Kyrpides, Nikos C.

2010-01-01

The Genomes On Line Database (GOLD) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2009, GOLD contains information for more than 5800 sequencing projects, of which 1100 have been completed and their sequence data deposited in a public repository. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification. GOLD is available at: http://www.genomesonline.org and has a mirror site at the Institute of Molecular Biology and Biotechnology, Crete, Greece, at: http://gold.imbb.forth.gr/ PMID:19914934
GI-POP: a combinational annotation and genomic island prediction pipeline for ongoing microbial genome projects.

PubMed

Lee, Chi-Ching; Chen, Yi-Ping Phoebe; Yao, Tzu-Jung; Ma, Cheng-Yu; Lo, Wei-Cheng; Lyu, Ping-Chiang; Tang, Chuan Yi

2013-04-10

Sequencing of microbial genomes is important because of the antibiotic and pathogenic activities that microbes carry. However, even with the help of new assembly software, finishing a whole genome is a time-consuming task. In most bacteria, pathogenic or antibiotic genes are carried in genomic islands. Therefore, a quick genomic island (GI) prediction method is useful for genomes whose sequencing is still ongoing. In this work, we built a Web server called GI-POP (http://gipop.life.nthu.edu.tw) which integrates a sequence assembly tool, a functional annotation pipeline, and a high-performance GI prediction module based on a support vector machine (SVM) method called genomic island genomic profile scanning (GI-GPS). The draft genomes of ongoing genome projects, in contigs or scaffolds, can be submitted to our Web server, which provides functional annotation and highly probable GI predictions. GI-POP is a comprehensive annotation Web server designed for ongoing genome project analysis. Researchers can perform annotation and obtain pre-analytic information, including possible GIs, coding/non-coding sequences and functional analysis, from their draft genomes. This pre-analytic system can provide useful information for finishing a genome sequencing project. Copyright © 2012 Elsevier B.V. All rights reserved.
33 CFR 385.30 - Master Implementation Sequencing Plan.

Code of Federal Regulations, 2011 CFR

2011-07-01

... Incorporating New Information Into the Plan § 385.30 Master Implementation Sequencing Plan. (a) Not later than... projects of the Plan, including pilot projects and operational elements, based on the best scientific, technical, funding, contracting, and other information available. The Corps of Engineers and the South...
Automated Finishing with Autofinish

PubMed Central

Gordon, David; Desmarais, Cindy; Green, Phil

2001-01-01

Currently, the genome sequencing community is producing shotgun sequence data at a very high rate, but finishing (collecting additional directed sequence data to close gaps and improve the quality of the data) is not matching that rate. One reason for the difference is that shotgun sequencing is highly automated but finishing is not: Most finishing decisions, such as which directed reads to obtain and which specialized sequencing techniques to use, are made by people. If finishing rates are to increase to match shotgun sequencing rates, most finishing decisions also must be automated. The Autofinish computer program (which is part of the Consed computer software package) does this by automatically choosing finishing reads. Autofinish is able to suggest most finishing reads required for completion of each sequencing project, greatly reducing the amount of human attention needed. Autofinish sometimes completely finishes the project, with no human decisions required. It cannot solve the most complex problems, so we recommend that Autofinish be allowed to suggest reads for the first three rounds of finishing, and if the project still is not finished completely, a human finisher complete the work. We compared this Autofinish-Hybrid method of finishing against a human finisher in five different projects with a variety of shotgun depths by finishing each project twice—once with each method. This comparison shows that the Autofinish-Hybrid method saves many hours over a human finisher alone, while using roughly the same number and type of reads and closing gaps at roughly the same rate. Autofinish currently is in production use at several large sequencing centers. It is designed to be adaptable to the finishing strategy of the lab—it can finish using some or all of the following: resequencing reads, reverses, custom primer walks on either subclone templates or whole clone templates, PCR, or minilibraries. 
Autofinish has been used for finishing cDNA, genomic clones, and whole bacterial genomes (see http://www.phrap.org). PMID:11282977
Design methodology and projects for space engineering

NASA Technical Reports Server (NTRS)

Nichols, S.; Kleespies, H.; Wood, K.; Crawford, R.

1993-01-01

NASA/USRA is an ongoing sponsor of space design projects in the senior design course of the Mechanical Engineering Department at The University of Texas at Austin. This paper describes the UT senior design sequence, consisting of a design methodology course and a capstone design course. The philosophical basis of this sequence is briefly summarized. A history of the Department's activities in the Advanced Design Program is then presented. The paper concludes with a description of the projects completed during the 1991-92 academic year and the ongoing projects for the Fall 1992 semester.
Filling Gaps in Biodiversity Knowledge for Macrofungi: Contributions and Assessment of an Herbarium Collection DNA Barcode Sequencing Project

PubMed Central

Osmundson, Todd W.; Robert, Vincent A.; Schoch, Conrad L.; Baker, Lydia J.; Smith, Amy; Robich, Giovanni; Mizzan, Luca; Garbelotto, Matteo M.

2013-01-01

Despite recent advances spearheaded by molecular approaches and novel technologies, species description and DNA sequence information are significantly lagging for fungi compared to many other groups of organisms. Large scale sequencing of vouchered herbarium material can aid in closing this gap. Here, we describe an effort to obtain broad ITS sequence coverage of the approximately 6000 macrofungal-species-rich herbarium of the Museum of Natural History in Venice, Italy. Our goals were to investigate issues related to large sequencing projects, develop heuristic methods for assessing the overall performance of such a project, and evaluate the prospects of such efforts to reduce the current gap in fungal biodiversity knowledge. The effort generated 1107 sequences submitted to GenBank, including 416 previously unrepresented taxa and 398 sequences exhibiting a best BLAST match to an unidentified environmental sequence. Specimen age and taxon affected sequencing success, and subsequent work on failed specimens showed that an ITS1 mini-barcode greatly increased sequencing success without greatly reducing the discriminating power of the barcode. Similarity comparisons and nonmetric multidimensional scaling ordinations based on pairwise distance matrices proved to be useful heuristic tools for validating the overall accuracy of specimen identifications, flagging potential misidentifications, and identifying taxa in need of additional species-level revision. Comparison of within- and among-species nucleotide variation showed a strong increase in species discriminating power at 1–2% dissimilarity, and identified potential barcoding issues (same sequence for different species and vice-versa). All sequences are linked to a vouchered specimen, and results from this study have already prompted revisions of species-sequence assignments in several taxa. PMID:23638077
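The within- versus among-species comparison in the abstract above can be illustrated with an uncorrected pairwise dissimilarity (p-distance) between aligned sequences. This is only a sketch under stated assumptions: the function names, the toy records, and the 2% cutoff (taken from the upper end of the 1-2% range mentioned in the abstract) are illustrative choices, not the authors' pipeline, and the sequences are assumed to be already aligned to equal length.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def flag_pairs(records, threshold=0.02):
    """Flag pairs whose species labels disagree with their sequence similarity.

    records: list of (species_name, aligned_sequence) tuples.
    threshold: assumed dissimilarity cutoff (0.02 = 2%).
    """
    flagged = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            (sp_i, seq_i), (sp_j, seq_j) = records[i], records[j]
            d = p_distance(seq_i, seq_j)
            if sp_i != sp_j and d <= threshold:
                flagged.append((sp_i, sp_j, d, "different names, near-identical sequences"))
            elif sp_i == sp_j and d > threshold:
                flagged.append((sp_i, sp_j, d, "same name, divergent sequences"))
    return flagged


# Toy aligned records with made-up names and sequences.
toy = [
    ("Species one", "ACGTACGTAC"),
    ("Species two", "ACGTACGTAC"),
    ("Species one", "TTTTACGTAC"),
]
for hit in flag_pairs(toy):
    print(hit)
```

Pairs flagged this way are candidates for re-examination of the voucher specimen, not automatic corrections; the cutoff would need tuning per taxonomic group.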
Filling gaps in biodiversity knowledge for macrofungi: contributions and assessment of an herbarium collection DNA barcode sequencing project.

PubMed

Osmundson, Todd W; Robert, Vincent A; Schoch, Conrad L; Baker, Lydia J; Smith, Amy; Robich, Giovanni; Mizzan, Luca; Garbelotto, Matteo M

2013-01-01

Despite recent advances spearheaded by molecular approaches and novel technologies, species description and DNA sequence information are significantly lagging for fungi compared to many other groups of organisms. Large scale sequencing of vouchered herbarium material can aid in closing this gap. Here, we describe an effort to obtain broad ITS sequence coverage of the approximately 6000 macrofungal-species-rich herbarium of the Museum of Natural History in Venice, Italy. Our goals were to investigate issues related to large sequencing projects, develop heuristic methods for assessing the overall performance of such a project, and evaluate the prospects of such efforts to reduce the current gap in fungal biodiversity knowledge. The effort generated 1107 sequences submitted to GenBank, including 416 previously unrepresented taxa and 398 sequences exhibiting a best BLAST match to an unidentified environmental sequence. Specimen age and taxon affected sequencing success, and subsequent work on failed specimens showed that an ITS1 mini-barcode greatly increased sequencing success without greatly reducing the discriminating power of the barcode. Similarity comparisons and nonmetric multidimensional scaling ordinations based on pairwise distance matrices proved to be useful heuristic tools for validating the overall accuracy of specimen identifications, flagging potential misidentifications, and identifying taxa in need of additional species-level revision. Comparison of within- and among-species nucleotide variation showed a strong increase in species discriminating power at 1-2% dissimilarity, and identified potential barcoding issues (same sequence for different species and vice-versa). All sequences are linked to a vouchered specimen, and results from this study have already prompted revisions of species-sequence assignments in several taxa.
An efficient approach to BAC based assembly of complex genomes.

PubMed

Visendi, Paul; Berkman, Paul J; Hayashi, Satomi; Golicz, Agnieszka A; Bayer, Philipp E; Ruperao, Pradeep; Hurgobin, Bhavna; Montenegro, Juan; Chan, Chon-Kit Kenneth; Staňková, Helena; Batley, Jacqueline; Šimková, Hana; Doležel, Jaroslav; Edwards, David

2016-01-01

There has been an exponential growth in the number of genome sequencing projects since the introduction of next generation DNA sequencing technologies. Genome projects have increasingly involved assembly of whole genome data which produces inferior assemblies compared to traditional Sanger sequencing of genomic fragments cloned into bacterial artificial chromosomes (BACs). While whole genome shotgun sequencing using next generation sequencing (NGS) is relatively fast and inexpensive, this method is extremely challenging for highly complex genomes, where polyploidy or high repeat content confounds accurate assembly, or where a highly accurate 'gold' reference is required. Several attempts have been made to improve genome sequencing approaches by incorporating NGS methods, to variable success. We present the application of a novel BAC sequencing approach which combines indexed pools of BACs, Illumina paired read sequencing, a sequence assembler specifically designed for complex BAC assembly, and a custom bioinformatics pipeline. We demonstrate this method by sequencing and assembling BAC cloned fragments from bread wheat and sugarcane genomes. We demonstrate that our assembly approach is accurate, robust, cost effective and scalable, with applications for complete genome sequencing in large and complex genomes.
i-rDNA: alignment-free algorithm for rapid in silico detection of ribosomal gene fragments from metagenomic sequence data sets.

PubMed

Mohammed, Monzoorul Haque; Ghosh, Tarini Shankar; Chadaram, Sudha; Mande, Sharmila S

2011-11-30

Obtaining accurate estimates of microbial diversity using rDNA profiling is the first step in most metagenomics projects. Consequently, most metagenomic projects spend considerable amounts of time, money and manpower for experimentally cloning, amplifying and sequencing the rDNA content in a metagenomic sample. In the second step, the entire genomic content of the metagenome is extracted, sequenced and analyzed. Since DNA sequences obtained in this second step also contain rDNA fragments, rapid in silico identification of these rDNA fragments would drastically reduce the cost, time and effort of current metagenomic projects by entirely bypassing the experimental steps of primer based rDNA amplification, cloning and sequencing. In this study, we present an algorithm called i-rDNA that can facilitate the rapid detection of 16S rDNA fragments from amongst millions of sequences in metagenomic data sets with high detection sensitivity. Performance evaluation with data sets/database variants simulating typical metagenomic scenarios indicates the significantly high detection sensitivity of i-rDNA. Moreover, i-rDNA can process a million sequences in less than an hour on a simple desktop with modest hardware specifications. In addition to the speed of execution, high sensitivity and low false positive rate, the utility of the algorithmic approach discussed in this paper is immense given that it would help in bypassing the entire experimental step of primer-based rDNA amplification, cloning and sequencing. Application of this algorithmic approach would thus drastically reduce the cost, time and human efforts invested in all metagenomic projects. A web-server for the i-rDNA algorithm is available at http://metagenomics.atc.tcs.com/i-rDNA/
Illumina GA IIx & HiSeq 2000 Production Sequencing and QC Analysis Pipelines at the DOE Joint Genome Institute

DOE Office of Scientific and Technical Information (OSTI.GOV)

Daum, Christopher; Zane, Matthew; Han, James

2011-01-31

The U.S. Department of Energy (DOE) Joint Genome Institute's (JGI) Production Sequencing group is committed to the generation of high-quality genomic DNA sequence to support the mission areas of renewable energy generation, global carbon management, and environmental characterization and clean-up. Within the JGI's Production Sequencing group, a robust Illumina Genome Analyzer and HiSeq pipeline has been established. Optimization of the sequencer pipelines has been ongoing with the aim of continual process improvement of the laboratory workflow, reducing operational costs and project cycle times to increase sample throughput, and improving the overall quality of the sequence generated. A sequence QC analysis pipeline has been implemented to automatically generate read and assembly level quality metrics. The foremost of these optimization projects, along with sequencing and operational strategies, throughput numbers, and sequencing quality results will be presented.
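As an illustration of what read-level quality metrics can look like, the sketch below computes the mean Phred score and the fraction of bases at or above Q30 for each read in a FASTQ text. It is not the JGI pipeline: the function name and metric choices are assumptions, each record is assumed to span exactly four lines, and the quality encoding is assumed to be Phred+33 (older Illumina pipelines used an offset of 64, hence the `offset` parameter).

```python
def read_quality_metrics(fastq_text, offset=33):
    """Compute simple per-read quality metrics from 4-line FASTQ records."""
    lines = fastq_text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected 4 lines per FASTQ record")
    metrics = []
    for i in range(0, len(lines), 4):
        name = lines[i][1:]          # drop the leading '@'
        sequence = lines[i + 1]
        quality = lines[i + 3]
        scores = [ord(ch) - offset for ch in quality]
        metrics.append({
            "read": name,
            "length": len(sequence),
            "mean_q": sum(scores) / len(scores),
            "q30_fraction": sum(s >= 30 for s in scores) / len(scores),
        })
    return metrics


# Two made-up reads: 'I' encodes Q40, '5' encodes Q20, '!' encodes Q0.
example = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nII5!\n"
for row in read_quality_metrics(example):
    print(row)
```

Assembly-level metrics (for example contig counts or N50) would be computed in a separate step and are not shown here.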
ITEMS Project: An online sequence for teaching mathematics and astronomy

NASA Astrophysics Data System (ADS)

Martínez, Bernat; Pérez, Josep

2010-10-01

This work describes an elearning sequence for teaching geometry and astronomy in lower secondary school created inside the ITEMS (Improving Teacher Education in Mathematics and Science) project. It is based on results from the astronomy education research about students' difficulties in understanding elementary astronomical observations and models. The sequence consists of a set of computer animations embedded in an elearning environment aimed at supporting students in learning about astronomy ideas that require the use of geometrical concepts and visual-spatial reasoning.

A Team Taught Interdisciplinary Approach To Physics and Calculus Education.

ERIC Educational Resources Information Center

Johnson, David B.

The Special Intensive Program for Scientists and Engineers (SIPSE) at Diablo Valley College in California replaces the traditional engineering calculus and physics sequences with a single sequence that combines the two subjects into an integrated whole. The project report provides an overview of SIPSE, a section that traces the project from…
Teaching Research Methodology Using a Project-Based Three Course Sequence Critical Reflections on Practice

ERIC Educational Resources Information Center

Braguglia, Kay H.; Jackson, Kanata A.

2012-01-01

This article presents a reflective analysis of teaching research methodology through a three course sequence using a project-based approach. The authors reflect critically on their experiences in teaching research methods courses in an undergraduate business management program. The introduction of a range of specific techniques including student…
Human genome project: revolutionizing biology through leveraging technology

NASA Astrophysics Data System (ADS)

Dahl, Carol A.; Strausberg, Robert L.

1996-04-01

The Human Genome Project (HGP) is an international project to develop genetic, physical, and sequence-based maps of the human genome. Since the inception of the HGP it has been clear that substantially improved technology would be required to meet the scientific goals, particularly in order to acquire the complete sequence of the human genome, and that these technologies coupled with the information forthcoming from the project would have a dramatic effect on the way biomedical research is performed in the future. In this paper, we discuss the state-of-the-art for genomic DNA sequencing, technological challenges that remain, and the potential technological paths that could yield substantially improved genomic sequencing technology. The impact of the technology developed from the HGP is broad-reaching and a discussion of other research and medical applications that are leveraging HGP-derived DNA analysis technologies is included. The multidisciplinary approach to the development of new technologies that has been successful for the HGP provides a paradigm for facilitating new genomic approaches toward understanding the biological role of functional elements and systems within the cell, including those encoded within genomic DNA and their molecular products.
SalmonDB: a bioinformatics resource for Salmo salar and Oncorhynchus mykiss

PubMed Central

Di Génova, Alex; Aravena, Andrés; Zapata, Luis; González, Mauricio; Maass, Alejandro; Iturra, Patricia

2011-01-01

SalmonDB is a new multiorganism database containing EST sequences from Salmo salar, Oncorhynchus mykiss and the whole genome sequence of Danio rerio, Gasterosteus aculeatus, Tetraodon nigroviridis, Oryzias latipes and Takifugu rubripes, built with core components from GMOD project, GOPArc system and the BioMart project. The information provided by this resource includes Gene Ontology terms, metabolic pathways, SNP prediction, CDS prediction, orthologs prediction, several precalculated BLAST searches and domains. It also provides a BLAST server for matching user-provided sequences to any of the databases and an advanced query tool (BioMart) that allows easy browsing of EST databases with user-defined criteria. These tools make the SalmonDB database a valuable resource for researchers searching for transcripts and genomic information regarding S. salar and other salmonid species. The database is expected to grow in the near future, particularly with the S. salar genome sequencing project. Database URL: http://genomicasalmones.dim.uchile.cl/ PMID:22120661
SalmonDB: a bioinformatics resource for Salmo salar and Oncorhynchus mykiss.

PubMed

Di Génova, Alex; Aravena, Andrés; Zapata, Luis; González, Mauricio; Maass, Alejandro; Iturra, Patricia

2011-01-01

SalmonDB is a new multiorganism database containing EST sequences from Salmo salar, Oncorhynchus mykiss and the whole genome sequence of Danio rerio, Gasterosteus aculeatus, Tetraodon nigroviridis, Oryzias latipes and Takifugu rubripes, built with core components from GMOD project, GOPArc system and the BioMart project. The information provided by this resource includes Gene Ontology terms, metabolic pathways, SNP prediction, CDS prediction, orthologs prediction, several precalculated BLAST searches and domains. It also provides a BLAST server for matching user-provided sequences to any of the databases and an advanced query tool (BioMart) that allows easy browsing of EST databases with user-defined criteria. These tools make the SalmonDB database a valuable resource for researchers searching for transcripts and genomic information regarding S. salar and other salmonid species. The database is expected to grow in the near future, particularly with the S. salar genome sequencing project. Database URL: http://genomicasalmones.dim.uchile.cl/
Exploiting long read sequencing technologies to establish high quality highly contiguous pig reference genome assemblies

USDA-ARS's Scientific Manuscript database

The current pig reference genome sequence (Sscrofa10.2) was established using Sanger sequencing and following the clone-by-clone hierarchical shotgun sequencing approach used in the public human genome project. However, as sequence coverage was low (4-6x) the resulting assembly was only of draft qua...
The Citizen Cyberscience Lectures - 1) Mobile phones and Africa: a success story 2) Citizen Problem Solving

ScienceCinema

Ibrahim, Mo

2018-05-25

Mobile phones and Africa: a success story Dr. Mo Ibrahim, Mo Ibrahim Foundation Citizen Problem Solving Dr. Alpheus Bingham, InnoCentive The Citizen Cyberscience Lectures are hosted by the partners of the Citizen Cyberscience Centre, CERN, The UN Institute of Training and Research and the University of Geneva. The goal of the Lectures is to provide an inspirational forum for participants from the various international organizations and academic institutions in Geneva to explore how information technology is enabling greater citizen participation in tackling global development challenges as well as global scientific research. The first Citizen Cyberscience Lectures will welcome two speakers who have both made major innovative contributions in this area. Dr. Mo Ibrahim, founder of Celtel International, one of Africa's most successful mobile network operators, will talk about "Mobile phones and Africa: a success story". Dr. Alpheus Bingham, founder of InnoCentive, a Web-based community that solves industrial R&D challenges, will discuss "Citizen Problem Solving". The Citizen Cyberscience Lectures are open and free of charge. Participants from outside CERN must register by sending an email to Yasemin.Hauser@cern.ch BEFORE the 23rd October to be able to access CERN. THE LECTURES Mobile phones and Africa: a success story Dr. Mo Ibrahim, Mo Ibrahim Foundation Abstract: The introduction of mobile phones into Africa changed the continent, enabling business and the commercial sector, creating directly and indirectly, millions of jobs. It enriched the social lives of many people. Surprisingly, it supported the emerging civil society and advanced the course of democracy. Bio: Dr Mo Ibrahim is a global expert in mobile communications with a distinguished academic and business career. In 1998, Dr Ibrahim founded Celtel International to build and operate mobile networks in Africa. 
Celtel became one of Africa's most successful companies with operations in 15 countries, covering more than a third of the continent's population and investing more than US$750 million in Africa. The company was sold to MTC Kuwait in 2005 for $3.4 billion. In 2006 Dr Ibrahim established the Mo Ibrahim Foundation to support great African leadership. The Foundation focuses on two major initiatives to stimulate debate around, and improve the quality of, governance in Africa. The Ibrahim Prize for Achievement in African Leadership recognises and celebrates excellence; and the Ibrahim Index of African Governance provides civil society with a comprehensive and quantifiable tool to promote government accountability. Dr Ibrahim is also Founding Chairman of Satya Capital Ltd, an investment company focused on opportunities in Africa. Dr Ibrahim has been awarded an Honorary Doctorate by the University of London's School of Oriental and African Studies, the University of Birmingham and De Montfort University, Leicester as well as an Honorary Fellowship Award from the London Business School. He has also received the Chairman's Award for Lifetime Achievement from the GSM Association in 2007 and the Economists Innovation Award 2007 for Social & Economic Innovation. In 2008 Dr Ibrahim was presented with the BNP Paribas Prize for Philanthropy, and also listed by TIME magazine as one of the 100 most influential people in the world. Citizen Problem Solving Dr. Alpheus Bingham, InnoCentive Abstract: American playwright Damon Runyon (Guys and Dolls) once remarked, "the race is not always to the swift, nor the victory to the strong -- but that IS how you bet." Not only does a system of race handicapping follow from this logic, but the whole notion of expertise and technical qualifications. Such 'credentials' allow one to 'bet' on who might most likely solve a difficult challenge, whether as consultant, contractor or employee. 
Of course, the approach would differ if one were allowed to bet AFTER the race. When such systems came into broad use, i.e., chat rooms, usenets, InnoCentive, etc., and were subsequently studied, it was often found that the greatest probability of solution lies in the "long tail" of the function rather than in the head representing formally vetted 'experts.' Insight into a problem is often the intersection of training, experience, metaphor and provocation (think Archimedes). Examples of "citizens" outside a targeted field of expertise providing unique solutions will illustrate the principles involved. Bio: Dr. Alph Bingham is a pioneer in the field of open innovation and an advocate of collaborative approaches to research and development. He is co-founder, and former president and chief executive officer of InnoCentive Inc., a Web-based community that matches companies facing R&D challenges with scientists who propose solutions. Through InnoCentive, a platform that leverages the ability to connect to a whole planet of people through the Internet, organizations can access individuals (problem solvers) who might never have been found. Alph spent more than 25 years with Eli Lilly and Company, and offers deep experience in pharmaceutical research and development, research acquisitions and collaborations, and R&D strategic planning. During his career he was instrumental in creating and developing Eli Lilly's portfolio management process as well as establishing the divisions of Research Acquisitions, the Office of Alliance Management and e.Lilly, a business innovation unit, from which various other ventures were spun out that create the advantages of open and networked organizational structures, including: InnoCentive, YourEncore, Inc., Coalesix, Inc., Maaguzi, Inc., Indigo Biosystems, Seriosity, Chorus and Collaborative Drug Discovery, Inc. 
He currently serves on the Board of Directors of InnoCentive, Inc., and Collaborative Drug Discovery, Inc.; the advisory boards of the Center for Collective Intelligence (MIT) and the Business Innovation Factory; and the board of trustees of the Bankinter Foundation for Innovation in Madrid. He has lectured extensively at both national and international events and serves as a Visiting Scholar at the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign. He is also the former chairman of the Board of Editors of the Research Technology Management Journal. Dr. Bingham was the recipient of the Economist's Fourth Annual Innovation Summit "Business Process Award" for InnoCentive. He was also named as one of Project Management Institute's "Power 50" leaders in October 2005. Dr. Bingham received a Ph.D. in organic chemistry from Stanford University.
Registration methods for nonblind watermark detection in digital cinema applications

NASA Astrophysics Data System (ADS)

Nguyen, Philippe; Balter, Raphaele; Montfort, Nicolas; Baudry, Severine

2003-06-01

Digital watermarking may be used to enforce copyright protection of digital cinema, by embedding in each projected movie a unique identifier (fingerprint). By identifying the source of illegal copies, watermarking will thus encourage movie theatre managers to enforce copyright protection, in particular by preventing people from coming in with a handy cam. We propose here a non-blind watermark method to improve watermark detection on very impaired sequences. We first present a study of the picture impairments caused by projection on a screen followed by acquisition with a handy cam. We show that images undergo geometric deformations, which are fully described by a projective geometry model. The sequence also undergoes spatial and temporal luminance variation. Based on this study and on the impairment models which follow, we propose a method to match the retrieved sequence to the original one. First, temporal registration is performed by comparing the average luminance variation on both sequences. To compensate for geometric transformations, we used paired points from both sequences, obtained by applying a feature points detector. Matching the feature points then allows the geometric transform parameters to be retrieved. Tests show that the watermark retrieval on rectified sequences is greatly improved.
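The temporal registration step described in this abstract can be illustrated with a small sketch. It assumes each sequence has already been reduced to one average luminance value per frame, and it uses normalized correlation of frame-to-frame luminance changes as the matching score. The function names and the choice of score are assumptions made for illustration, not the authors' published method.

```python
import math


def luminance_variation(frame_means):
    """Frame-to-frame change in average luminance."""
    return [b - a for a, b in zip(frame_means, frame_means[1:])]


def normalized_correlation(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    norm = math.sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    if norm == 0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / norm


def best_temporal_offset(original_means, captured_means):
    """Offset k such that captured frame i best matches original frame i + k.

    Assumes the captured clip is no longer than the original sequence.
    """
    ref = luminance_variation(original_means)
    cap = luminance_variation(captured_means)
    best_offset, best_score = 0, float("-inf")
    for k in range(len(ref) - len(cap) + 1):
        score = normalized_correlation(ref[k:k + len(cap)], cap)
        if score > best_score:
            best_offset, best_score = k, score
    return best_offset


# Hypothetical data: the captured clip is frames 3..7 of the original,
# dimmed by a factor of 0.5 and shifted by a constant of 3.
original = [10, 12, 15, 11, 9, 14, 20, 18, 13, 10]
captured = [0.5 * v + 3 for v in original[3:8]]
print(best_temporal_offset(original, captured))
```

Correlating luminance changes rather than raw luminance makes the score insensitive to a constant gain or brightness shift, which is one plausible reason to compare variation instead of absolute values.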
Selecting sequence variants to improve genomic predictions for dairy cattle

USDA-ARS's Scientific Manuscript database

Millions of genetic variants have been identified by population-scale sequencing projects, but subsets are needed for routine genomic predictions or to include on genotyping arrays. Methods of selecting sequence variants were compared using both simulated sequence genotypes and actual data from run ...
The Genomes On Line Database (GOLD) v.2: a monitor of genome projects worldwide

PubMed Central

Liolios, Konstantinos; Tavernarakis, Nektarios; Hugenholtz, Philip; Kyrpides, Nikos C.

2006-01-01

The Genomes On Line Database (GOLD) is a web resource for comprehensive access to information regarding complete and ongoing genome sequencing projects worldwide. The database currently incorporates information on over 1500 sequencing projects, of which 294 have been completed and the data deposited in the public databases. GOLD v.2 has been expanded to provide information related to organism properties such as phenotype, ecotype and disease. Furthermore, project relevance and availability information is now included. GOLD is available at . It is also mirrored at the Institute of Molecular Biology and Biotechnology, Crete, Greece at PMID:16381880
Serendipitous discovery of Wolbachia genomes in multiple Drosophila species.

PubMed

Salzberg, Steven L; Dunning Hotopp, Julie C; Delcher, Arthur L; Pop, Mihai; Smith, Douglas R; Eisen, Michael B; Nelson, William C

2005-01-01

The Trace Archive is a repository for the raw, unanalyzed data generated by large-scale genome sequencing projects. The existence of this data offers scientists the possibility of discovering additional genomic sequences beyond those originally sequenced. In particular, if the source DNA for a sequencing project came from a species that was colonized by another organism, then the project may yield substantial amounts of genomic DNA, including near-complete genomes, from the symbiotic or parasitic organism. By searching the publicly available repository of DNA sequencing trace data, we discovered three new species of the bacterial endosymbiont Wolbachia pipientis in three different species of fruit fly: Drosophila ananassae, D. simulans, and D. mojavensis. We extracted all sequences with partial matches to a previously sequenced Wolbachia strain and assembled those sequences using customized software. For one of the three new species, the data recovered were sufficient to produce an assembly that covers more than 95% of the genome; for a second species the data produce the equivalent of a 'light shotgun' sampling of the genome, covering an estimated 75-80% of the genome; and for the third species the data cover approximately 6-7% of the genome. The results of this study reveal an unexpected benefit of depositing raw data in a central genome sequence repository: new species can be discovered within this data. The differences between these three new Wolbachia genomes and the previously sequenced strain revealed numerous rearrangements and insertions within each lineage and hundreds of novel genes. The three new genomes, with annotation, have been deposited in GenBank.
Standardized Metadata for Human Pathogen/Vector Genomic Sequences

PubMed Central

Dugan, Vivien G.; Emrich, Scott J.; Giraldo-Calderón, Gloria I.; Harb, Omar S.; Newman, Ruchi M.; Pickett, Brett E.; Schriml, Lynn M.; Stockwell, Timothy B.; Stoeckert, Christian J.; Sullivan, Dan E.; Singh, Indresh; Ward, Doyle V.; Yao, Alison; Zheng, Jie; Barrett, Tanya; Birren, Bruce; Brinkac, Lauren; Bruno, Vincent M.; Caler, Elizabet; Chapman, Sinéad; Collins, Frank H.; Cuomo, Christina A.; Di Francesco, Valentina; Durkin, Scott; Eppinger, Mark; Feldgarden, Michael; Fraser, Claire; Fricke, W. Florian; Giovanni, Maria; Henn, Matthew R.; Hine, Erin; Hotopp, Julie Dunning; Karsch-Mizrachi, Ilene; Kissinger, Jessica C.; Lee, Eun Mi; Mathur, Punam; Mongodin, Emmanuel F.; Murphy, Cheryl I.; Myers, Garry; Neafsey, Daniel E.; Nelson, Karen E.; Nierman, William C.; Puzak, Julia; Rasko, David; Roos, David S.; Sadzewicz, Lisa; Silva, Joana C.; Sobral, Bruno; Squires, R. Burke; Stevens, Rick L.; Tallon, Luke; Tettelin, Herve; Wentworth, David; White, Owen; Will, Rebecca; Wortman, Jennifer; Zhang, Yun; Scheuermann, Richard H.

2014-01-01

High throughput sequencing has accelerated the determination of genome sequences for thousands of human infectious disease pathogens and dozens of their vectors. The scale and scope of these data are enabling genotype-phenotype association studies to identify genetic determinants of pathogen virulence and drug/insecticide resistance, and phylogenetic studies to track the origin and spread of disease outbreaks. To maximize the utility of genomic sequences for these purposes, it is essential that metadata about the pathogen/vector isolate characteristics be collected and made available in organized, clear, and consistent formats. Here we report the development of the GSCID/BRC Project and Sample Application Standard, developed by representatives of the Genome Sequencing Centers for Infectious Diseases (GSCIDs), the Bioinformatics Resource Centers (BRCs) for Infectious Diseases, and the U.S. National Institute of Allergy and Infectious Diseases (NIAID), part of the National Institutes of Health (NIH), informed by interactions with numerous collaborating scientists. It includes mapping to terms from other data standards initiatives, including the Genomic Standards Consortium’s minimal information (MIxS) and NCBI’s BioSample/BioProjects checklists and the Ontology for Biomedical Investigations (OBI). The standard includes data fields about characteristics of the organism or environmental source of the specimen, spatial-temporal information about the specimen isolation event, phenotypic characteristics of the pathogen/vector isolated, and project leadership and support. By modeling metadata fields into an ontology-based semantic framework and reusing existing ontologies and minimum information checklists, the application standard can be extended to support additional project-specific data fields and integrated with other data represented with comparable standards. 
The use of this metadata standard by all ongoing and future GSCID sequencing projects will provide a consistent representation of these data in the BRC resources and other repositories that leverage these data, allowing investigators to identify relevant genomic sequences and perform comparative genomics analyses that are both statistically meaningful and biologically relevant. PMID:24936976
Standardized metadata for human pathogen/vector genomic sequences.

PubMed

Dugan, Vivien G; Emrich, Scott J; Giraldo-Calderón, Gloria I; Harb, Omar S; Newman, Ruchi M; Pickett, Brett E; Schriml, Lynn M; Stockwell, Timothy B; Stoeckert, Christian J; Sullivan, Dan E; Singh, Indresh; Ward, Doyle V; Yao, Alison; Zheng, Jie; Barrett, Tanya; Birren, Bruce; Brinkac, Lauren; Bruno, Vincent M; Caler, Elizabet; Chapman, Sinéad; Collins, Frank H; Cuomo, Christina A; Di Francesco, Valentina; Durkin, Scott; Eppinger, Mark; Feldgarden, Michael; Fraser, Claire; Fricke, W Florian; Giovanni, Maria; Henn, Matthew R; Hine, Erin; Hotopp, Julie Dunning; Karsch-Mizrachi, Ilene; Kissinger, Jessica C; Lee, Eun Mi; Mathur, Punam; Mongodin, Emmanuel F; Murphy, Cheryl I; Myers, Garry; Neafsey, Daniel E; Nelson, Karen E; Nierman, William C; Puzak, Julia; Rasko, David; Roos, David S; Sadzewicz, Lisa; Silva, Joana C; Sobral, Bruno; Squires, R Burke; Stevens, Rick L; Tallon, Luke; Tettelin, Herve; Wentworth, David; White, Owen; Will, Rebecca; Wortman, Jennifer; Zhang, Yun; Scheuermann, Richard H

2014-01-01

High throughput sequencing has accelerated the determination of genome sequences for thousands of human infectious disease pathogens and dozens of their vectors. The scale and scope of these data are enabling genotype-phenotype association studies to identify genetic determinants of pathogen virulence and drug/insecticide resistance, and phylogenetic studies to track the origin and spread of disease outbreaks. To maximize the utility of genomic sequences for these purposes, it is essential that metadata about the pathogen/vector isolate characteristics be collected and made available in organized, clear, and consistent formats. Here we report the development of the GSCID/BRC Project and Sample Application Standard, developed by representatives of the Genome Sequencing Centers for Infectious Diseases (GSCIDs), the Bioinformatics Resource Centers (BRCs) for Infectious Diseases, and the U.S. National Institute of Allergy and Infectious Diseases (NIAID), part of the National Institutes of Health (NIH), informed by interactions with numerous collaborating scientists. It includes mapping to terms from other data standards initiatives, including the Genomic Standards Consortium's minimal information (MIxS) and NCBI's BioSample/BioProjects checklists and the Ontology for Biomedical Investigations (OBI). The standard includes data fields about characteristics of the organism or environmental source of the specimen, spatial-temporal information about the specimen isolation event, phenotypic characteristics of the pathogen/vector isolated, and project leadership and support. By modeling metadata fields into an ontology-based semantic framework and reusing existing ontologies and minimum information checklists, the application standard can be extended to support additional project-specific data fields and integrated with other data represented with comparable standards. 
The use of this metadata standard by all ongoing and future GSCID sequencing projects will provide a consistent representation of these data in the BRC resources and other repositories that leverage these data, allowing investigators to identify relevant genomic sequences and perform comparative genomics analyses that are both statistically meaningful and biologically relevant.
Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation.

PubMed

O'Leary, Nuala A; Wright, Mathew W; Brister, J Rodney; Ciufo, Stacy; Haddad, Diana; McVeigh, Rich; Rajput, Bhanu; Robbertse, Barbara; Smith-White, Brian; Ako-Adjei, Danso; Astashyn, Alexander; Badretdin, Azat; Bao, Yiming; Blinkova, Olga; Brover, Vyacheslav; Chetvernin, Vyacheslav; Choi, Jinna; Cox, Eric; Ermolaeva, Olga; Farrell, Catherine M; Goldfarb, Tamara; Gupta, Tripti; Haft, Daniel; Hatcher, Eneida; Hlavina, Wratko; Joardar, Vinita S; Kodali, Vamsi K; Li, Wenjun; Maglott, Donna; Masterson, Patrick; McGarvey, Kelly M; Murphy, Michael R; O'Neill, Kathleen; Pujar, Shashikant; Rangwala, Sanjida H; Rausch, Daniel; Riddick, Lillian D; Schoch, Conrad; Shkeda, Andrei; Storz, Susan S; Sun, Hanzhen; Thibaud-Nissen, Francoise; Tolstoy, Igor; Tully, Raymond E; Vatsan, Anjana R; Wallin, Craig; Webb, David; Wu, Wendy; Landrum, Melissa J; Kimchi, Avi; Tatusova, Tatiana; DiCuccio, Michael; Kitts, Paul; Murphy, Terence D; Pruitt, Kim D

2016-01-04

The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55,000 organisms (>4800 viruses, >40,000 prokaryotes and >10,000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management. Published by Oxford University Press on behalf of Nucleic Acids Research 2015. This work is written by (a) US Government employee(s) and is in the public domain in the US.
Use of sequence-independent-single-primer-amplification (SISPA) for whole genome sequencing using illumina MiSeq platform for avian influenza virus, Newcastle disease virus, and infectious bronchitis virus

USDA-ARS's Scientific Manuscript database

Over the past decade, Next Generation Sequencing (NGS) technologies, also called deep sequencing, have continued to evolve, increasing capacity and lowering the cost necessary for large genome sequencing projects. One of the advantages of NGS platforms is the possibility to sequence the samples with...
Genome sequencing of the redbanded stink bug (Piezodorus guildinii)

USDA-ARS's Scientific Manuscript database

We assembled a partial genome sequence from the redbanded stink bug, Piezodorus guildinii from Illumina MiSeq sequencing runs. The sequence has been submitted and published under NCBI GenBank Accession Number JTEQ01000000. The BioProject and BioSample Accession numbers are PRJNA263369 and SAMN030997...
A remark on copy number variation detection methods.

PubMed

Li, Shuo; Dou, Xialiang; Gao, Ruiqi; Ge, Xinzhou; Qian, Minping; Wan, Lin

2018-01-01

Copy number variations (CNVs) are gains and losses of DNA sequence in a genome. High-throughput platforms such as microarrays and next-generation sequencing (NGS) technologies have been applied to detect genome-wide copy number losses. Although progress has been made in both approaches, the accuracy and consistency of CNV calling from the two platforms remain in dispute. In this study, we perform a deep analysis of copy number losses in 254 human DNA samples, which have both SNP microarray data and NGS data publicly available from the HapMap Project and the 1000 Genomes Project, respectively. We show that the copy number losses reported by the HapMap Project and the 1000 Genomes Project have < 30% overlap, even though these reports are required by their corresponding projects to have cross-platform (e.g. PCR, microarray and high-throughput sequencing) experimental support and state-of-the-art calling methods were employed. On the other hand, when copy number losses are called directly from HapMap microarray data by an accurate algorithm, CNVhac, almost all of them have lower read mapping depth in the NGS data; furthermore, 88% of them are supported by sequences with breakpoints in the NGS data. Our results suggest that microarrays are capable of calling CNVs and that the unessential requirement of additional cross-platform support may introduce false negatives. The inconsistency of CNV reports from the HapMap Project and the 1000 Genomes Project might result from the inadequate information contained in microarray data, inconsistent detection criteria, or the filtration effect of cross-platform support. Statistical tests on the CNVs called by CNVhac show that microarray data can offer reliable CNV reports, and that the majority of CNV candidates can be confirmed by raw sequences. 
Therefore, the CNV candidates given by a good caller could be highly reliable without cross-platform support, so additional experimental information should be applied when needed rather than as a requirement.
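To make the idea of "overlap" between two CNV call sets concrete, here is a minimal sketch that counts how many calls in one set are matched by a call in the other. It assumes calls are half-open intervals `(chrom, start, end)` and uses a 50% reciprocal-overlap threshold, a common convention that the abstract does not specify. The function names and the example coordinates are hypothetical.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions for calls a and b (0.0 if disjoint)."""
    chrom_a, start_a, end_a = a
    chrom_b, start_b, end_b = b
    if chrom_a != chrom_b:
        return 0.0
    shared = min(end_a, end_b) - max(start_a, start_b)
    if shared <= 0:
        return 0.0
    return min(shared / (end_a - start_a), shared / (end_b - start_b))


def fraction_supported(calls, other_calls, threshold=0.5):
    """Fraction of `calls` matched by at least one call in `other_calls`."""
    if not calls:
        return 0.0
    supported = sum(
        1
        for call in calls
        if any(reciprocal_overlap(call, other) >= threshold for other in other_calls)
    )
    return supported / len(calls)


# Hypothetical deletion calls from a microarray caller and an NGS caller.
array_calls = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
ngs_calls = [("chr1", 120, 210), ("chr1", 590, 900), ("chr3", 100, 200)]
print(fraction_supported(array_calls, ngs_calls))
```

The reported fraction depends on the threshold and on which set is treated as the reference, so the two directions (`array_calls` against `ngs_calls` and the reverse) are worth computing separately.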
Species Choice for Comparative Genomics: Being Greedy Works

PubMed Central

Pardi, Fabio; Goldman, Nick

2005-01-01

Several projects investigating genetic function and evolution through sequencing and comparison of multiple genomes are now underway. These projects consume many resources, and appropriate planning should be devoted to choosing which species to sequence, potentially involving cooperation among different sequencing centres. A widely discussed criterion for species choice is the maximisation of evolutionary divergence. Our mathematical formalization of this problem surprisingly shows that the best long-term cooperative strategy coincides with the seemingly short-term “greedy” strategy of always choosing the next best single species. Other criteria influencing species choice, such as medical relevance or sequencing costs, can also be accommodated in our approach, suggesting our results' broad relevance in scientific policy decisions. PMID:16327885
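The greedy strategy mentioned in this abstract can be sketched in a few lines. The sketch assumes a rooted tree stored as a `child -> (parent, branch_length)` mapping and measures divergence as the total branch length on the paths from the chosen species to the root. This rooted measure, the data layout, and the toy tree are simplifying assumptions for illustration and are not taken from the paper.

```python
def path_to_root(tree, node):
    """Set of edges (each identified by its child node) from node up to the root."""
    edges = set()
    while node in tree:
        edges.add(node)
        node = tree[node][0]
    return edges


def greedy_species_choice(tree, species, k):
    """Pick up to k species, each time adding the one that contributes the
    most branch length not already covered by earlier picks."""
    covered = set()
    chosen = []
    candidates = list(species)
    for _ in range(min(k, len(candidates))):
        def gain(candidate):
            new_edges = path_to_root(tree, candidate) - covered
            return sum(tree[edge][1] for edge in new_edges)

        best = max(candidates, key=gain)
        chosen.append(best)
        covered |= path_to_root(tree, best)
        candidates.remove(best)
    return chosen


# Hypothetical tree: child -> (parent, branch length).
toy_tree = {
    "primates": ("root", 1.0),
    "human": ("primates", 1.0),
    "chimp": ("primates", 1.2),
    "mouse": ("root", 5.0),
    "fish": ("root", 9.0),
}
print(greedy_species_choice(toy_tree, ["human", "chimp", "mouse", "fish"], 3))
```

Other criteria mentioned in the abstract, such as sequencing cost, could be folded into `gain` (for example by dividing the added branch length by a per-species cost), though that variant is not shown here.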
preAssemble: a tool for automatic sequencer trace data processing.

PubMed

Adzhubei, Alexei A; Laerdahl, Jon K; Vlasova, Anna V

2006-01-17

Trace or chromatogram files (raw data) are produced by automatic nucleic acid sequencing equipment, or sequencers. Each file contains information which can be interpreted by specialised software to reveal the sequence (base calling). This is done by the sequencer's proprietary software or by publicly available programs. Depending on the size of a sequencing project, the number of trace files can vary from just a few to thousands of files. Sequencing quality assessment on various criteria is important at the stage preceding clustering and contig assembly. Two major publicly available packages, Phred and Staden, are used by preAssemble to perform sequence quality processing. The preAssemble pre-assembly sequence processing pipeline has been developed for small to large scale automatic processing of DNA sequencer chromatogram (trace) data. The Staden Package Pregap4 module and base-calling program Phred are utilized in the pipeline, which produces detailed and self-explanatory output that can be displayed with a web browser. preAssemble can be used successfully with very little previous experience; however, options for parameter tuning are provided for advanced users. preAssemble runs under UNIX and LINUX operating systems. It is available for downloading and will run as stand-alone software. It can also be accessed on the Norwegian Salmon Genome Project web site, where preAssemble jobs can be run on the project server. preAssemble is a tool for quality assessment of sequences generated by automatic sequencing equipment. preAssemble is flexible since both interactive jobs on the preAssemble server and the stand-alone downloadable version are available. Virtually no previous experience is necessary to run a default preAssemble job; on the other hand, options for parameter tuning are provided. Consequently, preAssemble can be used as efficiently for just a few trace files as for large scale sequence processing.
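As a rough illustration of what sequence quality processing involves, the sketch below uses the standard Phred relation Q = -10 log10(P) between a quality score and the base-calling error probability, and trims low-quality bases from both ends of a read. The trimming rule, the threshold of 20, and the function names are assumptions chosen for the example. This is not the procedure implemented by preAssemble, Phred, or Pregap4.

```python
def error_probability(phred_quality):
    """Error probability implied by a Phred quality score: P = 10^(-Q/10)."""
    return 10 ** (-phred_quality / 10)


def trim_low_quality_ends(bases, qualities, min_quality=20):
    """Drop leading and trailing bases whose quality is below min_quality."""
    start = 0
    end = len(bases)
    while start < end and qualities[start] < min_quality:
        start += 1
    while end > start and qualities[end - 1] < min_quality:
        end -= 1
    return bases[start:end], qualities[start:end]


# Hypothetical read with poor quality at both ends.
read = "ACGTACGT"
quals = [5, 10, 30, 35, 40, 25, 12, 8]
trimmed_read, trimmed_quals = trim_low_quality_ends(read, quals)
print(trimmed_read, trimmed_quals)
```

A quality of 20 corresponds to a 1 in 100 chance that the base call is wrong, which is why it is often used as a cutoff. Production pipelines typically use windowed or cumulative-error trimming rather than this simple end scan.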
The ENCODE Project at UC Santa Cruz.

PubMed

Thomas, Daryl J; Rosenbloom, Kate R; Clawson, Hiram; Hinrichs, Angie S; Trumbower, Heather; Raney, Brian J; Karolchik, Donna; Barber, Galt P; Harte, Rachel A; Hillman-Jackson, Jennifer; Kuhn, Robert M; Rhead, Brooke L; Smith, Kayla E; Thakkapallayil, Archana; Zweig, Ann S; Haussler, David; Kent, W James

2007-01-01

The goal of the Encyclopedia Of DNA Elements (ENCODE) Project is to identify all functional elements in the human genome. The pilot phase is for comparison of existing methods and for the development of new methods to rigorously analyze a defined 1% of the human genome sequence. Experimental datasets are focused on the origin of replication, DNase I hypersensitivity, chromatin immunoprecipitation, promoter function, gene structure, pseudogenes, non-protein-coding RNAs, transcribed RNAs, multiple sequence alignment and evolutionarily constrained elements. The ENCODE project at UCSC website (http://genome.ucsc.edu/ENCODE) is the primary portal for the sequence-based data produced as part of the ENCODE project. In the pilot phase of the project, over 30 labs provided experimental results for a total of 56 browser tracks supported by 385 database tables. The site provides researchers with a number of tools that allow them to visualize and analyze the data as well as download data for local analyses. This paper describes the portal to the data, highlights the data that has been made available, and presents the tools that have been developed within the ENCODE project. Access to the data and types of interactive analysis that are possible are illustrated through supplemental examples.

A Sequenced Instructional Program in Physical Education for the Handicapped, Phase III. Producing and Disseminating Demonstration Packages. Final Report.

ERIC Educational Resources Information Center

Carr, Dorothy B.; Avance, Lyonel D.

Presented is a sequenced instructional program in physical education which constitutes the third of a three-phase, 4-year project, funded by Title III, for handicapped children, preschool through high school levels, in the Los Angeles Unified School District. Described are the project setting and the following accomplishments: a curriculum guide…
Waves and Particles, The Orbital Atom, Parts One and Two of an Integrated Science Sequence, Teacher's Guide, 1973 Edition.

ERIC Educational Resources Information Center

Portland Project Committee, OR.

This teacher's guide includes parts one and two of the four-part third year Portland Project, a three-year integrated secondary science curriculum sequence. The Harvard Project Physics textbook is used for reading assignments for part one. Assignments relate to waves, light, electricity, magnetic fields, Faraday and the electrical age,…
Cassini Mission Sequence Subsystem (MSS)

NASA Technical Reports Server (NTRS)

Alland, Robert

2011-01-01

This paper describes my work with the Cassini Mission Sequence Subsystem (MSS) team during the summer of 2011. It gives some background on the motivation for this project and describes the expected benefit to the Cassini program. It then introduces the two tasks that I worked on - an automatic system auditing tool and a series of corrections to the Cassini Sequence Generator (SEQ_GEN) - and the specific objectives these tasks were to accomplish. Next, it details the approach I took to meet these objectives and the results of this approach, followed by a discussion of how the outcome of the project compares with my initial expectations. The paper concludes with a summary of my experience working on this project, lists what the next steps are, and acknowledges the help of my Cassini colleagues.
GWAS and fine-mapping of 35 production, reproduction, and conformation traits with imputed sequences of 27K Holstein bulls

USDA-ARS's Scientific Manuscript database

Imputation has been routinely applied to ascertain sequence variants in large genotyped populations based on reference populations of sequenced animals. With the implementation of the 1000 Bull Genomes Project and increasing numbers of animals sequenced, fine-mapping of causal variants is becoming f...
CanvasDB: a local database infrastructure for analysis of targeted- and whole genome re-sequencing projects

PubMed Central

Ameur, Adam; Bunikis, Ignas; Enroth, Stefan; Gyllensten, Ulf

2014-01-01

CanvasDB is an infrastructure for management and analysis of genetic variants from massively parallel sequencing (MPS) projects. The system stores SNP and indel calls in a local database, designed to handle very large datasets, to allow for rapid analysis using simple commands in R. Functional annotations are included in the system, making it suitable for direct identification of disease-causing mutations in human exome- (WES) or whole-genome sequencing (WGS) projects. The system has a built-in filtering function implemented to simultaneously take into account variant calls from all individual samples. This enables advanced comparative analysis of variant distribution between groups of samples, including detection of candidate causative mutations within family structures and genome-wide association by sequencing. In most cases, these analyses are executed within just a matter of seconds, even when there are several hundreds of samples and millions of variants in the database. We demonstrate the scalability of canvasDB by importing the individual variant calls from all 1092 individuals present in the 1000 Genomes Project into the system, over 4.4 billion SNPs and indels in total. Our results show that canvasDB makes it possible to perform advanced analyses of large-scale WGS projects on a local server. Database URL: https://github.com/UppsalaGenomeCenter/CanvasDB PMID:25281234
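The kind of cross-sample filtering described in this abstract can be sketched against a small local database. The example below uses Python's built-in `sqlite3` module with a hypothetical one-table schema (`calls`), and it finds variants carried by every affected family member and by no unaffected member. The schema, sample names, and function names are invented for illustration. CanvasDB itself is driven from R and its actual tables and commands are not shown here.

```python
import sqlite3


def build_demo_db():
    """In-memory database with one row per (sample, variant) call."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE calls (sample TEXT, chrom TEXT, pos INTEGER, ref TEXT, alt TEXT)"
    )
    rows = [
        ("child", "chr1", 100, "A", "G"),
        ("child", "chr2", 200, "C", "T"),
        ("child", "chr3", 300, "G", "A"),
        ("sibling", "chr1", 100, "A", "G"),
        ("sibling", "chr2", 200, "C", "T"),
        ("mother", "chr2", 200, "C", "T"),
    ]
    conn.executemany("INSERT INTO calls VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def shared_candidate_variants(conn, affected, unaffected):
    """Variants present in all affected samples and in no unaffected sample."""
    if not affected:
        return []
    marks = ",".join("?" * len(affected))
    shared = conn.execute(
        f"SELECT chrom, pos, ref, alt FROM calls WHERE sample IN ({marks}) "
        "GROUP BY chrom, pos, ref, alt "
        "HAVING COUNT(DISTINCT sample) = ? "
        "ORDER BY chrom, pos",
        [*affected, len(affected)],
    ).fetchall()
    if not unaffected:
        return shared
    marks = ",".join("?" * len(unaffected))
    excluded = set(
        conn.execute(
            f"SELECT DISTINCT chrom, pos, ref, alt FROM calls WHERE sample IN ({marks})",
            list(unaffected),
        ).fetchall()
    )
    return [variant for variant in shared if variant not in excluded]


conn = build_demo_db()
print(shared_candidate_variants(conn, ["child", "sibling"], ["mother"]))
```

For databases holding millions of variants, an index on `(chrom, pos, ref, alt)` would likely be needed for this query to stay fast; the demo omits it for brevity.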
CanvasDB: a local database infrastructure for analysis of targeted- and whole genome re-sequencing projects.

PubMed

Ameur, Adam; Bunikis, Ignas; Enroth, Stefan; Gyllensten, Ulf

2014-01-01

CanvasDB is an infrastructure for management and analysis of genetic variants from massively parallel sequencing (MPS) projects. The system stores SNP and indel calls in a local database, designed to handle very large datasets, to allow for rapid analysis using simple commands in R. Functional annotations are included in the system, making it suitable for direct identification of disease-causing mutations in human exome- (WES) or whole-genome sequencing (WGS) projects. The system has a built-in filtering function implemented to simultaneously take into account variant calls from all individual samples. This enables advanced comparative analysis of variant distribution between groups of samples, including detection of candidate causative mutations within family structures and genome-wide association by sequencing. In most cases, these analyses are executed within just a matter of seconds, even when there are several hundreds of samples and millions of variants in the database. We demonstrate the scalability of canvasDB by importing the individual variant calls from all 1092 individuals present in the 1000 Genomes Project into the system, over 4.4 billion SNPs and indels in total. Our results show that canvasDB makes it possible to perform advanced analyses of large-scale WGS projects on a local server. Database URL: https://github.com/UppsalaGenomeCenter/CanvasDB. © The Author(s) 2014. Published by Oxford University Press.
Optimizing and evaluating the reconstruction of Metagenome-assembled microbial genomes.

PubMed

Papudeshi, Bhavya; Haggerty, J Matthew; Doane, Michael; Morris, Megan M; Walsh, Kevin; Beattie, Douglas T; Pande, Dnyanada; Zaeri, Parisa; Silva, Genivaldo G Z; Thompson, Fabiano; Edwards, Robert A; Dinsdale, Elizabeth A

2017-11-28

Microbiome/host interactions describe characteristics that affect the host's health. Shotgun metagenomics includes sequencing a random subset of the microbiome to analyze its taxonomic and metabolic potential. Reconstruction of DNA fragments into genomes from metagenomes (called metagenome-assembled genomes) assigns unknown fragments to taxa/function and facilitates discovery of novel organisms. Genome reconstruction incorporates sequence assembly and sorting of assembled sequences into bins, characteristic of a genome. However, the microbial community composition, including taxonomic and phylogenetic diversity, may influence genome reconstruction. We determine the optimal reconstruction method for four microbiome projects that had variable sequencing platforms (IonTorrent and Illumina), diversity (high or low), and environment (coral reefs and kelp forests), using a set of parameters to select for optimal assembly and binning tools. We tested the effects of the assembly and binning processes on population genome reconstruction using 105 marine metagenomes from 4 projects. Reconstructed genomes were obtained from each project using 3 assemblers (IDBA, MetaVelvet, and SPAdes) and 2 binning tools (GroopM and MetaBat). We assessed the efficiency of assemblers using statistics including contig continuity and contig chimerism, and the effectiveness of binning tools using genome completeness and taxonomic identification. We concluded that SPAdes assembled more contigs (143,718 ± 124 contigs) of longer length (N50 = 1632 ± 108 bp), and incorporated the most sequences (sequences-assembled = 19.65%). The microbial richness and evenness were maintained across the assembly, suggesting low contig chimeras. SPAdes assembly was responsive to the biological and technological variations within the project, compared with other assemblers. 
Among binning tools, we conclude that MetaBat produced bins with less variation in GC content (average standard deviation: 1.49), low species richness (4.91 ± 0.66), and higher genome completeness (40.92 ± 1.75) across all projects. MetaBat extracted 115 bins from the 4 projects, of which 66 bins were identified as reconstructed metagenome-assembled genomes with sequences belonging to a specific genus. We identified 13 novel genomes, some of which were 100% complete but show low similarity to genomes within databases. In conclusion, we present a set of biologically relevant parameters for evaluation to select for optimal assembly and binning tools. For the tools we tested, the SPAdes assembler and MetaBat binning tool reconstructed quality metagenome-assembled genomes for the four projects. We also conclude that metagenomes from microbial communities that have high coverage of phylogenetically distinct taxa and low taxonomic diversity result in the highest quality metagenome-assembled genomes.
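Two of the statistics quoted in this abstract, N50 and GC content, are simple to compute, and a short sketch may help readers interpret the reported values. N50 is taken here in its usual sense: the contig length at which contigs of that length or longer contain at least half of the assembled bases. The function names and example values are illustrative and do not reproduce the study's data or tooling.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


def gc_content(sequence):
    """Fraction of bases in the sequence that are G or C (case-insensitive)."""
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


# Hypothetical contig lengths in base pairs.
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(example_lengths))
print(gc_content("GGCCAT"))
```

Because N50 depends only on the length distribution, it says nothing about whether contigs are correctly assembled, which is why the abstract pairs it with a chimerism measure.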
Development of Audio and Visual Media to Accompany Sequenced Instructional Programs in Physical Education for the Handicapped. Final Report. July 31, 1972.

ERIC Educational Resources Information Center

Avance, Lyonel D.; Carr, Dorothy B.

Presented is the final report of a project to develop and field test audio and visual media to accompany developmentally sequenced activities appropriate for a physical education program for handicapped children from preschool through high school. Brief sections cover the following: the purposes and accomplishments of the project; the population…
Ensembl 2004.

PubMed

Birney, E; Andrews, D; Bevan, P; Caccamo, M; Cameron, G; Chen, Y; Clarke, L; Coates, G; Cox, T; Cuff, J; Curwen, V; Cutts, T; Down, T; Durbin, R; Eyras, E; Fernandez-Suarez, X M; Gane, P; Gibbins, B; Gilbert, J; Hammond, M; Hotz, H; Iyer, V; Kahari, A; Jekosch, K; Kasprzyk, A; Keefe, D; Keenan, S; Lehvaslaiho, H; McVicker, G; Melsopp, C; Meidl, P; Mongin, E; Pettett, R; Potter, S; Proctor, G; Rae, M; Searle, S; Slater, G; Smedley, D; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Storey, R; Ureta-Vidal, A; Woodwark, C; Clamp, M; Hubbard, T

2004-01-01

The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organize biology around the sequences of large genomes. It is a comprehensive and integrated source of annotation of large genome sequences, available via interactive website, web services or flat files. As well as being one of the leading sources of genome annotation, Ensembl is an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements. The facilities of the system range from sequence analysis to data storage and visualization and installations exist around the world both in companies and at academic sites. With a total of nine genome sequences available from Ensembl and more genomes to follow, recent developments have focused mainly on closer integration between genomes and external data.
(New hosts and vectors for genome cloning)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Not Available

The main goal of our project remains the development of new bacterial hosts and vectors for the stable propagation of human DNA clones in E. coli. During the past six months of our current budget period, we have (1) continued to develop new hosts that permit the stable maintenance of unstable features of human DNA, and (2) developed a series of vectors for (a) cloning large DNA inserts, (b) assessing the frequency of human sequences that are lethal to the growth of E. coli, and (c) assessing the stability of human sequences cloned in M13 for large-scale sequencing projects.
[New hosts and vectors for genome cloning]. Progress report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Not Available

The main goal of our project remains the development of new bacterial hosts and vectors for the stable propagation of human DNA clones in E. coli. During the past six months of our current budget period, we have (1) continued to develop new hosts that permit the stable maintenance of unstable features of human DNA, and (2) developed a series of vectors for (a) cloning large DNA inserts, (b) assessing the frequency of human sequences that are lethal to the growth of E. coli, and (c) assessing the stability of human sequences cloned in M13 for large-scale sequencing projects.
Short reads from honey bee (Apis sp.) sequencing projects reflect microbial associate diversity

PubMed Central

Hurst, Gregory D.D.

2017-01-01

High throughput (or ‘next generation’) sequencing has transformed most areas of biological research and is now a standard method that underpins empirical study of organismal biology and (through comparison of genomes) reveals patterns of evolution. For projects focused on animals, these sequencing methods do not discriminate between the primary target of sequencing (the animal genome) and ‘contaminating’ material, such as associated microbes. A common first step is to filter out these contaminants to allow better assembly of the animal genome or transcriptome. Here, we aimed to assess if these ‘contaminations’ provide information with regard to biologically important microorganisms associated with the individual. To achieve this, we examined whether the short read data from Apis retrieved elements of its well established microbiome. To this end, we screened almost 1,000 short read libraries from honey bee (Apis sp.) DNA sequencing projects for the presence of microbial sequences, and found sequences from known honey bee microbial associates in at least 11% of them. Further to this, we screened ∼500 Apis RNA sequencing libraries for evidence of viral infections, which were found to be present in about half of them. We then used the data to reconstruct draft genomes of three Apis associated bacteria, as well as several viral strains de novo. We conclude that ‘contamination’ in short read sequencing libraries can provide useful genomic information on microbial taxa known to be associated with the target organisms, and may even lead to the discovery of novel associations. Finally, we demonstrate that RNAseq samples from experiments commonly carry uneven viral loads across libraries. We note variation in viral presence and load may be a confounding feature of differential gene expression analyses, and as such it should be incorporated as a random factor in analyses. PMID:28717593
Short reads from honey bee (Apis sp.) sequencing projects reflect microbial associate diversity.

PubMed

Gerth, Michael; Hurst, Gregory D D

2017-01-01

High throughput (or 'next generation') sequencing has transformed most areas of biological research and is now a standard method that underpins empirical study of organismal biology and (through comparison of genomes) reveals patterns of evolution. For projects focused on animals, these sequencing methods do not discriminate between the primary target of sequencing (the animal genome) and 'contaminating' material, such as associated microbes. A common first step is to filter out these contaminants to allow better assembly of the animal genome or transcriptome. Here, we aimed to assess if these 'contaminations' provide information with regard to biologically important microorganisms associated with the individual. To achieve this, we examined whether the short read data from Apis retrieved elements of its well established microbiome. To this end, we screened almost 1,000 short read libraries from honey bee (Apis sp.) DNA sequencing projects for the presence of microbial sequences, and found sequences from known honey bee microbial associates in at least 11% of them. Further to this, we screened ∼500 Apis RNA sequencing libraries for evidence of viral infections, which were found to be present in about half of them. We then used the data to reconstruct draft genomes of three Apis associated bacteria, as well as several viral strains de novo. We conclude that 'contamination' in short read sequencing libraries can provide useful genomic information on microbial taxa known to be associated with the target organisms, and may even lead to the discovery of novel associations. Finally, we demonstrate that RNAseq samples from experiments commonly carry uneven viral loads across libraries. We note variation in viral presence and load may be a confounding feature of differential gene expression analyses, and as such it should be incorporated as a random factor in analyses.
Alignment of 1000 Genomes Project reads to reference assembly GRCh38.

PubMed

Zheng-Bradley, Xiangqun; Streeter, Ian; Fairley, Susan; Richardson, David; Clarke, Laura; Flicek, Paul

2017-07-01

The 1000 Genomes Project produced more than 100 trillion basepairs of short read sequence from more than 2600 samples in 26 populations over a period of five years. In its final phase, the project released over 85 million genotyped and phased variants on human reference genome assembly GRCh37. An updated reference assembly, GRCh38, was released in late 2013, but there was insufficient time for the final phase of the project analysis to change to the new assembly. Although it is possible to lift the coordinates of the 1000 Genomes Project variants to the new assembly, this is a potentially error-prone process as coordinate remapping is most appropriate only for non-repetitive regions of the genome and those that did not see significant change between the two assemblies. It will also miss variants in any region that was newly added to GRCh38. Thus, to produce the highest quality variants and genotypes on GRCh38, the best strategy is to realign the reads and recall the variants based on the new alignment. As the first step of variant calling for the 1000 Genomes Project data, we have finished remapping all of the 1000 Genomes sequence reads to GRCh38 with alternative scaffold-aware BWA-MEM. The resulting alignments are available as CRAM, a reference-based sequence compression format. The data have been released on our FTP site and are also available from European Nucleotide Archive to facilitate researchers discovering variants on the primary sequences and alternative contigs of GRCh38. © The Authors 2017. Published by Oxford University Press.
Sequencing artifacts in the type A influenza databases and attempts to correct them.

PubMed

Suarez, David L; Chester, Nikki; Hatfield, Jason

2014-07-01

There are over 276 000 influenza gene sequences in public databases, with the quality of the sequences determined by the contributor. As part of a high school class project, influenza sequences with possible errors were identified in the public databases based on the size of the gene being longer than expected, with the hypothesis that these sequences would have an error. Students contacted sequence submitters, alerting them to the possible sequence issue(s), and requested that the suspect sequence(s) be corrected as appropriate. Type A influenza viruses were screened, and gene segments longer than the accepted size were identified for further analysis. Attention was placed on sequences with additional nucleotides upstream or downstream of the highly conserved non-coding ends of the viral segments. A total of 1081 sequences were identified that met this criterion. Three types of errors were commonly observed: non-influenza primer sequence wasn't removed from the sequence; PCR product was cloned and plasmid sequence was included in the sequence; and Taq polymerase added an adenine at the end of the PCR product. Internal insertions of nucleotide sequence were also commonly observed, but in many cases it was unclear if the sequence was correct or actually contained an error. A total of 215 sequences, or 22.8% of the suspect sequences, were corrected in the public databases in the first year of the student project. Unfortunately 138 additional sequences with possible errors were added to the databases in the second year. Additional awareness of the need for data integrity of sequences submitted to public databases is needed to fully reap the benefits of these large data sets. © 2014 The Authors. Influenza and Other Respiratory Viruses Published by John Wiley & Sons Ltd.
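The length-based screen described above can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout, the accession strings, and the per-segment length limits are placeholders chosen for the example and are not taken from the study.

```python
# Assumed maximum accepted lengths (nt) per gene segment; placeholder
# values for illustration, not the thresholds used in the study.
ASSUMED_MAX_LENGTH = {"M": 1027, "NS": 890}


def flag_overlong(records, max_length=ASSUMED_MAX_LENGTH):
    """Return (accession, segment, excess_nt) for sequences that are
    longer than the accepted size for their segment.

    `records` is an iterable of (accession, segment, sequence) tuples.
    Segments without a known limit are skipped rather than guessed.
    """
    flagged = []
    for accession, segment, sequence in records:
        limit = max_length.get(segment)
        if limit is not None and len(sequence) > limit:
            flagged.append((accession, segment, len(sequence) - limit))
    return flagged


# Hypothetical records: one at the accepted size, two with extra bases.
records = [
    ("HYPO0001", "NS", "A" * 890),
    ("HYPO0002", "NS", "A" * 905),
    ("HYPO0003", "M", "A" * 1028),
]
for accession, segment, excess in flag_overlong(records):
    print(accession, segment, excess)
```

A flagged sequence is only a candidate for review. As the abstract notes, the extra nucleotides may be primer, plasmid, or a polymerase-added base, and some insertions may be genuine.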
Action recognition using multi-scale histograms of oriented gradients based depth motion trail Images

NASA Astrophysics Data System (ADS)

Wang, Guanxi; Tie, Yun; Qi, Lin

2017-07-01

In this paper, we propose a novel approach based on Depth Maps and compute Multi-Scale Histograms of Oriented Gradients (MSHOG) from sequences of depth maps to recognize actions. Each depth frame in a depth video sequence is projected onto three orthogonal Cartesian planes. Under each projection view, the absolute difference between two consecutive projected maps is accumulated through a depth video sequence to form a Depth Map, which is called a Depth Motion Trail Image (DMTI). The MSHOG is then computed from the Depth Maps for the representation of an action. In addition, we apply L2-Regularized Collaborative Representation (L2-CRC) to classify actions. We evaluate the proposed approach on the MSR Action3D dataset and the MSRGesture3D dataset. Promising experimental results demonstrate the effectiveness of our proposed method.
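The accumulation step described above (summing absolute differences between consecutive projected maps) can be sketched as follows. This is a simplified illustration for a single projection view using plain nested lists. The function name and the toy frames are assumptions, and the projection onto three planes and the MSHOG descriptor are not shown.

```python
def accumulate_motion(frames):
    """Accumulate absolute differences between consecutive 2-D maps.

    `frames` is a list of equally sized 2-D lists (one projected depth
    map per video frame). The result has the same shape and holds, for
    each pixel, the summed absolute change over the whole sequence.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    rows, cols = len(frames[0]), len(frames[0][0])
    accumulated = [[0] * cols for _ in range(rows)]
    for previous, current in zip(frames, frames[1:]):
        for r in range(rows):
            for c in range(cols):
                accumulated[r][c] += abs(current[r][c] - previous[r][c])
    return accumulated


# Toy 2x2 "depth maps" for three frames (illustrative values only).
frames = [
    [[0, 0], [0, 0]],
    [[1, 0], [0, 2]],
    [[1, 3], [0, 0]],
]
print(accumulate_motion(frames))  # [[1, 3], [0, 4]]
```

Pixels that never change stay at zero, so the accumulated map highlights where motion occurred across the sequence.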
Ensembl 2002: accommodating comparative genomics.

PubMed

Clamp, M; Andrews, D; Barker, D; Bevan, P; Cameron, G; Chen, Y; Clark, L; Cox, T; Cuff, J; Curwen, V; Down, T; Durbin, R; Eyras, E; Gilbert, J; Hammond, M; Hubbard, T; Kasprzyk, A; Keefe, D; Lehvaslaiho, H; Iyer, V; Melsopp, C; Mongin, E; Pettett, R; Potter, S; Rust, A; Schmidt, E; Searle, S; Slater, G; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Stupka, E; Ureta-Vidal, A; Vastrik, I; Birney, E

2003-01-01

The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of human, mouse and other genome sequences, available as either an interactive web site or as flat files. Ensembl also integrates manually annotated gene structures from external sources where available. As well as being one of the leading sources of genome annotation, Ensembl is an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements. These range from sequence analysis to data storage and visualisation, and installations exist around the world both in companies and at academic sites. With both human and mouse genome sequences available and more vertebrate sequences to follow, many of the recent developments in Ensembl have focused on developing automatic comparative genome analysis and visualisation.
Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel.

PubMed

Delaneau, Olivier; Marchini, Jonathan

2014-06-13

A major use of the 1000 Genomes Project (1000 GP) data is genotype imputation in genome-wide association studies (GWAS). Here we develop a method to estimate haplotypes from low-coverage sequencing data that can take advantage of single-nucleotide polymorphism (SNP) microarray genotypes on the same samples. First the SNP array data are phased to build a backbone (or 'scaffold') of haplotypes across each chromosome. We then phase the sequence data 'onto' this haplotype scaffold. This approach can take advantage of relatedness between sequenced and non-sequenced samples to improve accuracy. We use this method to create a new 1000 GP haplotype reference set for use by the human genetic community. Using a set of validation genotypes at SNP and bi-allelic indels we show that these haplotypes have lower genotype discordance and improved imputation performance into downstream GWAS samples, especially at low-frequency variants.
Modeling read counts for CNV detection in exome sequencing data.

PubMed

Love, Michael I; Myšičková, Alena; Sun, Ruping; Kalscheuer, Vera; Vingron, Martin; Haas, Stefan A

2011-11-08

Varying depth of high-throughput sequencing reads along a chromosome makes it possible to observe copy number variants (CNVs) in a sample relative to a reference. In exome and other targeted sequencing projects, technical factors increase variation in read depth while reducing the number of observed locations, adding difficulty to the problem of identifying CNVs. We present a hidden Markov model for detecting CNVs from raw read count data, using background read depth from a control set as well as other positional covariates such as GC-content. The model, exomeCopy, is applied to a large chromosome X exome sequencing project identifying a list of large unique CNVs. CNVs predicted by the model and experimentally validated are then recovered using a cross-platform control set from publicly available exome sequencing data. Simulations show high sensitivity for detecting heterozygous and homozygous CNVs, outperforming normalization and state-of-the-art segmentation methods.
The current status and portability of our sequence handling software.

PubMed Central

Staden, R

1986-01-01

I describe the current status of our sequence analysis software. The package contains a comprehensive suite of programs for managing large shotgun sequencing projects, a program containing 61 functions for analysing single sequences and a program for comparing pairs of sequences for similarity. The programs that have been described before have been improved by the addition of new functions and by being made very much easier to use. The major interactive programs have 125 pages of online help available from within them. Several new programs are described including screen editing of aligned gel readings for shotgun sequencing projects; a method to highlight errors in aligned gel readings, new methods for searching for putative signals in sequences. We use the programs on a VAX computer but the whole package has been rewritten to make it easy to transport it to other machines. I believe the programs will now run on any machine with a FORTRAN77 compiler and sufficient memory. We are currently putting the programs onto an IBM PC XT/AT and another micro running under UNIX. PMID:3511446

Evaluation of nearest-neighbor methods for detection of chimeric small-subunit rRNA sequences

NASA Technical Reports Server (NTRS)

Robison-Cox, J. F.; Bateson, M. M.; Ward, D. M.

1995-01-01

Detection of chimeric artifacts formed when PCR is used to retrieve naturally occurring small-subunit (SSU) rRNA sequences may rely on demonstrating that different sequence domains have different phylogenetic affiliations. We evaluated the CHECK_CHIMERA method of the Ribosomal Database Project and another method which we developed, both based on determining nearest neighbors of different sequence domains, for their ability to discern artificially generated SSU rRNA chimeras from authentic Ribosomal Database Project sequences. The reliability of both methods decreases when the parental sequences which contribute to chimera formation are more than 82 to 84% similar. Detection is also complicated by the occurrence of authentic SSU rRNA sequences that behave like chimeras. We developed a naive statistical test based on CHECK_CHIMERA output and used it to evaluate previously reported SSU rRNA chimeras. Application of this test also suggests that chimeras might be formed by retrieving SSU rRNAs as cDNA. The amount of uncertainty associated with nearest-neighbor analyses indicates that such tests alone are insufficient and that better methods are needed.
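The idea behind nearest-neighbor chimera checks, that different domains of one sequence may have different closest relatives, can be illustrated with a toy example. This sketch is not the CHECK_CHIMERA program or the authors' method. It assumes pre-aligned, equal-length sequences, a single fixed breakpoint, and simple percent identity, and all names and sequences are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length strings."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and equal in length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def nearest_neighbor(fragment, references, start):
    """Name of the reference most similar to `fragment` over the same
    aligned window, which begins at column `start`."""
    end = start + len(fragment)
    return max(references,
               key=lambda name: identity(fragment, references[name][start:end]))


def looks_chimeric(query, references, breakpoint):
    """Compare the nearest neighbors of the two query domains.

    Returns (flag, left_neighbor, right_neighbor); flag is True when the
    two domains are closest to different reference sequences.
    """
    if not 0 < breakpoint < len(query):
        raise ValueError("breakpoint must fall inside the query")
    left = nearest_neighbor(query[:breakpoint], references, 0)
    right = nearest_neighbor(query[breakpoint:], references, breakpoint)
    return left != right, left, right


# Hypothetical aligned reference sequences and queries.
references = {"refA": "AAAAAAAAAA", "refB": "CCCCCCCCCC"}
print(looks_chimeric("AAAAACCCCC", references, 5))  # (True, 'refA', 'refB')
print(looks_chimeric("AAAAAAAAAC", references, 5))  # (False, 'refA', 'refA')
```

With closely related parents, both domains tend to share the same nearest neighbor, which is consistent with the abstract's observation that reliability drops as parental similarity rises.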
Empowering Students to Actively Learn Systems Analysis and Design: The Success of an Entrepreneurial-Inspired Project in a Hybrid Learning Environment

ERIC Educational Resources Information Center

Wong, Wang-chan

2017-01-01

Systems Analysis and Design (SA&D) is the cornerstone course of a traditional information system curriculum. Conventionally, it is a sequence of two courses with the second course dedicated to the completion of a project. However, it has recently become more common to reduce the two-course sequence into one, especially for IS departments that…
Applications of a Sequence of Points in Teaching Linear Algebra, Numerical Methods and Discrete Mathematics

ERIC Educational Resources Information Center

Shi, Yixun

2009-01-01

Based on a sequence of points and a particular linear transformation generalized from this sequence, two recent papers (E. Mauch and Y. Shi, "Using a sequence of number pairs as an example in teaching mathematics". Math. Comput. Educ., 39 (2005), pp. 198-205; Y. Shi, "Case study projects for college mathematics courses based on a particular…
MPS Editor

NASA Technical Reports Server (NTRS)

Mathews, William S.; Liu, Ning; Francis, Laurie K.; OReilly, Taifun L.; Schrock, Mitchell; Page, Dennis N.; Morris, John R.; Joswig, Joseph C.; Crockett, Thomas M.; Shams, Khawaja S.

2011-01-01

Previously, it was time-consuming to hand-edit data and then set up simulation runs to find the effect and impact of the input data on a spacecraft. MPS Editor provides the user the capability to create/edit/update models and sequences, and immediately try them out using what appears to the user as one piece of software. MPS Editor provides an integrated sequencing environment for users. It provides them with software that can be utilized during development as well as actual operations. In addition, it provides them with a single, consistent, user friendly interface. MPS Editor uses the Eclipse Rich Client Platform to provide an environment that can be tailored to specific missions. It provides the capability to create and edit, and includes an Activity Dictionary to build the simulation spacecraft models, build and edit sequences of commands, and model the effects of those commands on the spacecraft. MPS Editor is written in Java using the Eclipse Rich Client Platform. It is currently built with four perspectives: the Activity Dictionary Perspective, the Project Adaptation Perspective, the Sequence Building Perspective, and the Sequence Modeling Perspective. Each perspective performs a given task. If a mission doesn't require that task, the unneeded perspective is not added to that project's delivery. In the Activity Dictionary Perspective, the user builds the project-specific activities, observations, calibrations, etc. Typically, this is used during the development phases of the mission, although it can be used later to make changes and updates to the Project Activity Dictionary. In the Adaptation Perspective, the user creates the spacecraft models such as power, data store, etc. Again, this is typically used during development, but will be used to update or add models of the spacecraft. The Sequence Building Perspective allows the user to create a sequence of activities or commands that go to the spacecraft. 
It provides a simulation of the activities and commands that have been created.
Large-scale sequencing trials begin

DOE Office of Scientific and Technical Information (OSTI.GOV)

Roberts, L.

1990-12-07

As genome sequencing gets under way, investigators are grappling not just with new techniques but also with questions about what is acceptable accuracy and when data should be released. Four groups are embarking on projects that could make or break the human genome project. They are setting out to sequence the longest stretches of DNA ever tackled (several million bases each) and to do it faster and cheaper than anyone has before. If these groups can't pull it off, then prospects for knocking off the entire human genome, all 3 billion bases, in 15 years and for $3 billion will look increasingly unlikely. Harvard's Walter Gilbert is first tackling the genome of Mycoplasma capricolum. At Stanford, David Botstein and Ron Davis are sequencing Saccharomyces cerevisiae. In a collaborative effort, Robert Waterston at Washington University and John Sulston at the Medical Research Council lab in Cambridge, England, have already started on the nematode Caenorhabditis elegans. And in the only longstanding project of the bunch, University of Wisconsin geneticist Fred Blattner is already several hundred kilobases into the Escherichia coli genome.
[New hosts and vectors for genome cloning]. Progress report, 1990--1991

DOE Office of Scientific and Technical Information (OSTI.GOV)

Not Available

The main goal of our project remains the development of new bacterial hosts and vectors for the stable propagation of human DNA clones in E. coli. During the past six months of our current budget period, we have (1) continued to develop new hosts that permit the stable maintenance of unstable features of human DNA, and (2) developed a series of vectors for (a) cloning large DNA inserts, (b) assessing the frequency of human sequences that are lethal to the growth of E. coli, and (c) assessing the stability of human sequences cloned in M13 for large-scale sequencing projects.
Using Markov chains of nucleotide sequences as a possible precursor to predict functional roles of human genome: a case study on inactive chromatin regions.

PubMed

Lee, K-E; Lee, E-J; Park, H-S

2016-08-30

Recent advances in computational epigenetics have provided new opportunities to evaluate n-gram probabilistic language models. In this paper, we describe a systematic genome-wide approach for predicting functional roles in inactive chromatin regions by using a sequence-based Markovian chromatin map of the human genome. We demonstrate that Markov chains of sequences can be used as a precursor to predict functional roles in heterochromatin regions and provide an example comparing two publicly available chromatin annotations of large-scale epigenomics projects: ENCODE project consortium and Roadmap Epigenomics consortium.
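A Markov chain over nucleotides, as mentioned above, is defined by its transition probabilities. The sketch below estimates first-order transitions from a single sequence by counting adjacent pairs. It is a generic illustration only: the function name and the toy sequence are assumptions, and the chromatin-map construction used in the paper is not reproduced.

```python
def transition_probabilities(sequence, alphabet="ACGT"):
    """Estimate first-order Markov transition probabilities.

    Counts each adjacent pair (previous base, current base) and
    normalises every row so that it sums to 1. Rows for bases that are
    never followed by another base are left as all zeros. Pairs that
    contain characters outside `alphabet` (such as N) are ignored.
    """
    counts = {a: {b: 0 for b in alphabet} for a in alphabet}
    for previous, current in zip(sequence, sequence[1:]):
        if previous in counts and current in counts[previous]:
            counts[previous][current] += 1
    probabilities = {}
    for base, row in counts.items():
        total = sum(row.values())
        probabilities[base] = {
            nxt: (n / total if total else 0.0) for nxt, n in row.items()
        }
    return probabilities


# Toy sequence for illustration.
probs = transition_probabilities("ACGTACGA")
print(probs["G"])  # {'A': 0.5, 'C': 0.0, 'G': 0.0, 'T': 0.5}
```

Higher-order (n-gram) models follow the same pattern, with the preceding n-1 bases taking the place of the single previous base.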
The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata

PubMed Central

Liolios, Konstantinos; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Kyrpides, Nikos C.

2008-01-01

The Genomes On Line Database (GOLD) is a comprehensive resource that provides information on genome and metagenome projects worldwide. Complete and ongoing projects and their associated metadata can be accessed in GOLD through pre-computed lists and a search page. As of September 2007, GOLD contains information on more than 2900 sequencing projects, out of which 639 have been completed and their sequence data deposited in the public databases. GOLD continues to expand with the goal of providing metadata information related to the projects and the organisms/environments towards the 'Minimum Information about a Genome Sequence' (MIGS) guideline. GOLD is available at http://www.genomesonline.org and has a mirror site at the Institute of Molecular Biology and Biotechnology, Crete, Greece at http://gold.imbb.forth.gr/ PMID:17981842
An ultra-sparse code underlies the generation of neural sequences in a songbird

NASA Astrophysics Data System (ADS)

Hahnloser, Richard H. R.; Kozhevnikov, Alexay A.; Fee, Michale S.

2002-09-01

Sequences of motor activity are encoded in many vertebrate brains by complex spatio-temporal patterns of neural activity; however, the neural circuit mechanisms underlying the generation of these pre-motor patterns are poorly understood. In songbirds, one prominent site of pre-motor activity is the forebrain robust nucleus of the archistriatum (RA), which generates stereotyped sequences of spike bursts during song and recapitulates these sequences during sleep. We show that the stereotyped sequences in RA are driven from nucleus HVC (high vocal centre), the principal pre-motor input to RA. Recordings of identified HVC neurons in sleeping and singing birds show that individual HVC neurons projecting onto RA neurons produce bursts sparsely, at a single, precise time during the RA sequence. These HVC neurons burst sequentially with respect to one another. We suggest that at each time in the RA sequence, the ensemble of active RA neurons is driven by a subpopulation of RA-projecting HVC neurons that is active only at that time. As a population, these HVC neurons may form an explicit representation of time in the sequence. Such a sparse representation, a temporal analogue of the 'grandmother cell' concept for object recognition, eliminates the problem of temporal interference during sequence generation and learning attributed to more distributed representations.
The Porcelain Crab Transcriptome and PCAD, the Porcelain Crab Microarray and Sequence Database

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tagmount, Abderrahmane; Wang, Mei; Lindquist, Erika

2010-01-27

Background: With the emergence of a completed genome sequence of the freshwater crustacean Daphnia pulex, construction of genomic-scale sequence databases for additional crustacean sequences is important for comparative genomics and annotation. Porcelain crabs, genus Petrolisthes, have been powerful crustacean models for environmental and evolutionary physiology with respect to thermal adaptation and understanding responses of marine organisms to climate change. Here, we present a large-scale EST sequencing and cDNA microarray database project for the porcelain crab Petrolisthes cinctipes. Methodology/Principal Findings: A set of ~30K unique sequences (UniSeqs) representing ~19K clusters were generated from ~98K high quality ESTs from a set of tissue specific non-normalized and mixed-tissue normalized cDNA libraries from the porcelain crab Petrolisthes cinctipes. Homology for each UniSeq was assessed using BLAST, InterProScan, GO and KEGG database searches. Approximately 66 percent of the UniSeqs had homology in at least one of the databases. All EST and UniSeq sequences along with annotation results and coordinated cDNA microarray datasets have been made publicly accessible at the Porcelain Crab Array Database (PCAD), a feature-enriched version of the Stanford and Longhorn Array Databases. Conclusions/Significance: The EST project presented here represents the third largest sequencing effort for any crustacean, and the largest effort for any crab species. Our assembly and clustering results suggest that our porcelain crab EST data set is as diverse as the much larger EST set generated in the Daphnia pulex genome sequencing project, and thus will be an important resource to the Daphnia research community. Our homology results support the Pancrustacea hypothesis and suggest that Malacostraca may be ancestral to Branchiopoda and Hexapoda. Our results also suggest that our cDNA microarrays cover as much of the transcriptome as can reasonably be captured in EST library sequencing approaches, and thus represent a rich resource for studies of environmental genomics.
Toward an Integrated BAC Library Resource for Genome Sequencing and Analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Simon, M. I.; Kim, U.-J.

We developed a great deal of expertise in building large BAC libraries from a variety of DNA sources including humans, mice, corn, microorganisms, worms, and Arabidopsis. We greatly improved the technology for screening these libraries rapidly and for selecting appropriate BACs and mapping BACs to develop large overlapping contigs. We became involved in supplying BACs and BAC contigs to a variety of sequencing and mapping projects and we began to collaborate with Drs. Adams and Venter at TIGR and with Dr. Leroy Hood and his group at University of Washington to provide BACs for end sequencing and for mapping and sequencing of large fragments of chromosome 16. Together with Dr. Ian Dunham and his co-workers at the Sanger Center we completed the mapping and they completed the sequencing of the first human chromosome, chromosome 22. This was published in Nature in 1999 and our BAC contigs made a major contribution to this sequencing effort. Drs. Shizuya and Ding invented an automated highly accurate BAC mapping technique. We also developed long-term collaborations with Dr. Uli Weier at UCSF in the design of BAC probes for characterization of human tumors and specific chromosome deletions and breakpoints. Finally the contribution of our work to the human genome project has been recognized in the publication both by the international consortium and the NIH of a draft sequence of the human genome in Nature last year. Dr. Shizuya was acknowledged in the authorship of that landmark paper. Dr. Simon was also an author on the Venter/Adams Celera project sequencing the human genome that was published in Science last year.
A Statistical Framework for the Functional Analysis of Metagenomes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sharon, Itai; Pati, Amrita; Markowitz, Victor

2008-10-01

Metagenomic studies consider the genetic makeup of microbial communities as a whole, rather than their individual member organisms. The functional and metabolic potential of microbial communities can be analyzed by comparing the relative abundance of gene families in their collective genomic sequences (metagenome) under different conditions. Such comparisons require accurate estimation of gene family frequencies. They present a statistical framework for assessing these frequencies based on the Lander-Waterman theory developed originally for Whole Genome Shotgun (WGS) sequencing projects. They also provide a novel method for assessing the reliability of the estimations which can be used for removing seemingly unreliable measurements. They tested their method on a wide range of datasets, including simulated genomes and real WGS data from sequencing projects of whole genomes. Results suggest that their framework corrects inherent biases in accepted methods and provides a good approximation to the true statistics of gene families in WGS projects.
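For background on the Lander-Waterman theory mentioned above, its classical form relates sequencing depth to the expected fraction of a genome covered by at least one read. The sketch below shows only that textbook relationship, with reads placed uniformly at random and end effects ignored. It does not reproduce the gene-family framework of the abstract, and the function names and numbers are illustrative assumptions.

```python
import math


def coverage(n_reads, read_length, genome_length):
    """Average depth c = N * L / G for N reads of length L on a genome
    of length G."""
    return n_reads * read_length / genome_length


def expected_fraction_covered(c):
    """Expected fraction of bases covered by at least one read when
    reads land uniformly at random: 1 - exp(-c)."""
    return 1.0 - math.exp(-c)


# Illustrative numbers: 10,000 reads of 100 bp on a 1 Mb genome.
c = coverage(10_000, 100, 1_000_000)
print(c)                                       # 1.0
print(round(expected_fraction_covered(c), 4))  # 0.6321
```

At 1x average depth roughly 63% of bases are expected to be covered. In a metagenome each member genome is sampled at its own depth, so the same relationship implies that low-abundance organisms are covered less completely.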
A standard MIGS/MIMS compliant XML Schema: toward the development of the Genomic Contextual Data Markup Language (GCDML).

PubMed

Kottmann, Renzo; Gray, Tanya; Murphy, Sean; Kagan, Leonid; Kravitz, Saul; Lombardot, Thierry; Field, Dawn; Glöckner, Frank Oliver

2008-06-01

The Genomic Contextual Data Markup Language (GCDML) is a core project of the Genomic Standards Consortium (GSC) that implements the "Minimum Information about a Genome Sequence" (MIGS) specification and its extension, the "Minimum Information about a Metagenome Sequence" (MIMS). GCDML is an XML Schema for generating MIGS/MIMS compliant reports for data entry, exchange, and storage. When mature, this sample-centric, strongly-typed schema will provide a diverse set of descriptors for describing the exact origin and processing of a biological sample, from sampling to sequencing, and subsequent analysis. Here we describe the need for such a project, outline design principles required to support the project, and make an open call for participation in defining the future content of GCDML. GCDML is freely available, and can be downloaded, along with documentation, from the GSC Web site (http://gensc.org).
Detecting atypical examples of known domain types by sequence similarity searching: the SBASE domain library approach.

PubMed

Dhir, Somdutta; Pacurar, Mircea; Franklin, Dino; Gáspári, Zoltán; Kertész-Farkas, Attila; Kocsor, András; Eisenhaber, Frank; Pongor, Sándor

2010-11-01

SBASE is a project initiated to detect known domain types and predict domain architectures using sequence similarity searching (Simon et al., Protein Seq Data Anal, 5: 39-42, 1992, Pongor et al, Nucl. Acids. Res. 21:3111-3115, 1992). The current approach uses a curated collection of domain sequences - the SBASE domain library - and standard similarity search algorithms, followed by postprocessing based on simple statistics of the domain similarity network (http://hydra.icgeb.trieste.it/sbase/). It is especially useful in detecting rare, atypical examples of known domain types which are sometimes missed even by more sophisticated methodologies. This approach does not require multiple alignment or machine learning techniques, and can be a useful complement to other domain detection methodologies. This article gives an overview of the project history as well as of the concepts and principles developed within the project.
33 CFR 385.30 - Master Implementation Sequencing Plan.

Code of Federal Regulations, 2010 CFR

2010-07-01

... projects of the Plan, including pilot projects and operational elements, based on the best scientific... Florida Water Management District shall also consult with the South Florida Ecosystem Restoration Task...; (ii) Information obtained from pilot projects; (iii) Updated funding information; (iv) Approved...
All about the Human Genome Project (HGP)

MedlinePlus

... CSER), and Genome Sequencing Informatics Tools (GS-IT) Comparative Genomics Background information prepared for the media on ... other species to the human sequence. Background on Comparative Genomic Analysis New Process to Prioritize Animal Genomes ...
Defining Genome Project Standards in a New Era of Sequencing

ScienceCinema

Chain, Patrick

2018-01-16

Patrick Chain of the DOE Joint Genome Institute gives a talk on behalf of the International Genome Sequencing Standards Consortium on the need for intermediate genome classifications between "draft" and "finished".
From parasite genomes to one healthy world; are we having fun yet?

USDA-ARS's Scientific Manuscript database

In 1990, the Human Genome Sequencing Project was established. This laid the ground work for an explosion of sequence data that has since followed. As a result of this effort, the first complete genome of an animal, Caenorhabditis elegans was published in 1998. The sequence of Drosophila melanogaster...
Targeted parallel sequencing of the Musa species: searching for an alternative model system for polyploidy studies

USDA-ARS's Scientific Manuscript database

Modern day genomics holds the promise of solving the complexities of basic plant sciences, and of catalyzing practical advances in plant breeding. While contiguous, "base perfect" deep sequencing is a key module of any genome project, recent advances in parallel next generation sequencing technologi...
Rhipicephalus (Boophilus) microplus strain Deutsch, whole genome shotgun sequencing project first submission of genome sequence

USDA-ARS's Scientific Manuscript database

The size and repetitive nature of the Rhipicephalus microplus genome makes obtaining a full genome sequence difficult. Cot filtration/selection techniques were used to reduce the repetitive fraction of the tick genome and enrich for the fraction of DNA with gene-containing regions. The Cot-selected ...

Tools to exploit sequence data to find new markers and disease loci in dairy cattle

USDA-ARS's Scientific Manuscript database

The decrease in cost of Next-Generation Sequencing has brought the technology into the realm of practical applications in livestock genomics. Recently, the 1000 Bulls Project has heralded the possibility of using full sequence data to improve imputation and detect disease loci within select founder ...
What's in your next-generation sequence data? An exploration of unmapped DNA and RNA sequence reads from the bovine reference individual

USDA-ARS's Scientific Manuscript database

BACKGROUND: Next-generation sequencing projects commonly commence by aligning reads to a reference genome assembly. While improvements in alignment algorithms and computational hardware have greatly enhanced the efficiency and accuracy of alignments, a significant percentage of reads often remain u...
The landscape of transposable elements in the finished genome of the fungal wheat pathogen Mycosphaerella graminicola

USDA-ARS's Scientific Manuscript database

Repetitive sequence analysis has become an integral part of genome sequencing projects in addition to gene identification and annotation. Identification of repeats is important not only because it improves gene prediction, but also because of the role that repetitive sequences play in determining th...
Fungal Genomics for Energy and Environment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grigoriev, Igor V.

2013-03-11

Genomes of fungi relevant to energy and environment are the focus of the Fungal Genomic Program at the US Department of Energy Joint Genome Institute (JGI). One of its projects, the Genomics Encyclopedia of Fungi, targets fungi related to plant health (symbionts, pathogens, and biocontrol agents) and biorefinery processes (cellulose degradation, sugar fermentation, industrial hosts) by means of genome sequencing and analysis. New chapters of the Encyclopedia can be opened with user proposals to the JGI Community Sequencing Program (CSP). Another JGI project, the 1000 fungal genomes, explores fungal diversity at the genome level at scale and is open for users to nominate new species for sequencing. Over 200 fungal genomes have been sequenced by JGI to date and released through MycoCosm (www.jgi.doe.gov/fungi), a fungal web portal that integrates sequence and functional data with genome analysis tools for the user community. Sequence analysis supported by functional genomics leads to the development of parts lists for complex systems ranging from ecosystems of biofuel crops to biorefineries. Recent examples of such parts suggested by comparative genomics and functional analysis in these areas are presented here.
GenBank.

PubMed

Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Sayers, Eric W

2010-01-01

GenBank is a comprehensive database that contains publicly available nucleotide sequences for more than 300,000 organisms named at the genus level or lower, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the European Molecular Biology Laboratory Nucleotide Sequence Database in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bi-monthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI homepage: www.ncbi.nlm.nih.gov.
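Besides the web and FTP routes described above, records can also be retrieved programmatically. The sketch below only builds a request URL for NCBI's E-utilities `efetch` endpoint and performs no network call. The choice of E-utilities is an assumption on our part (the abstract itself mentions only Entrez, BLAST and FTP), and the accession is simply one that appears elsewhere in this listing.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities efetch service.
EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str, rettype: str = "fasta") -> str:
    """Build an efetch URL for one nucleotide record.

    rettype "fasta" requests FASTA text; "gb" requests the GenBank flat file.
    """
    params = {
        "db": "nuccore",      # nucleotide database
        "id": accession,
        "rettype": rettype,
        "retmode": "text",
    }
    return f"{EUTILS_EFETCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Example accession only; any valid nucleotide accession works the same way.
    print(efetch_url("BT043497"))
```

The resulting URL can be passed to any HTTP client; NCBI's usage guidelines on request rates apply to scripted access.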
High-throughput sequence alignment using Graphics Processing Units

PubMed Central

Schatz, Michael C; Trapnell, Cole; Delcher, Arthur L; Varshney, Amitabh

2007-01-01

Background The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU. PMID:18070356
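To make the idea of matching many queries against one indexed reference concrete, here is a toy CPU sketch. It deliberately swaps the suffix tree for a naively built suffix array and finds only the longest exact match of a query prefix, so it is not MUMmerGPU's algorithm or code; all names are illustrative. What it does show is that each query is handled independently of the others, which is the property that lets such work be spread across GPU threads.

```python
def build_suffix_array(ref: str) -> list:
    """Sorted start positions of all suffixes (naive; toy references only)."""
    return sorted(range(len(ref)), key=lambda i: ref[i:])


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def longest_prefix_match(ref: str, sa: list, query: str) -> tuple:
    """Return (ref_pos, length) of the longest query prefix found in ref."""
    # Binary search for the first suffix that is >= query.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if ref[sa[mid]:] < query:
            lo = mid + 1
        else:
            hi = mid
    # The best match is one of the two suffixes adjacent to that point.
    best = (-1, 0)
    for idx in (lo - 1, lo):
        if 0 <= idx < len(sa):
            length = common_prefix_len(ref[sa[idx]:], query)
            if length > best[1]:
                best = (sa[idx], length)
    return best


if __name__ == "__main__":
    reference = "GATTACAGATTTCA"
    index = build_suffix_array(reference)
    for read in ("TTACAG", "ATTTG"):   # each read is independent of the others
        print(read, longest_prefix_match(reference, index, read))
```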
The Genome of the Netherlands: design, and project goals.

PubMed

Boomsma, Dorret I; Wijmenga, Cisca; Slagboom, Eline P; Swertz, Morris A; Karssen, Lennart C; Abdellaoui, Abdel; Ye, Kai; Guryev, Victor; Vermaat, Martijn; van Dijk, Freerk; Francioli, Laurent C; Hottenga, Jouke Jan; Laros, Jeroen F J; Li, Qibin; Li, Yingrui; Cao, Hongzhi; Chen, Ruoyan; Du, Yuanping; Li, Ning; Cao, Sujie; van Setten, Jessica; Menelaou, Androniki; Pulit, Sara L; Hehir-Kwa, Jayne Y; Beekman, Marian; Elbers, Clara C; Byelas, Heorhiy; de Craen, Anton J M; Deelen, Patrick; Dijkstra, Martijn; den Dunnen, Johan T; de Knijff, Peter; Houwing-Duistermaat, Jeanine; Koval, Vyacheslav; Estrada, Karol; Hofman, Albert; Kanterakis, Alexandros; Enckevort, David van; Mai, Hailiang; Kattenberg, Mathijs; van Leeuwen, Elisabeth M; Neerincx, Pieter B T; Oostra, Ben; Rivadeneira, Fernanodo; Suchiman, Eka H D; Uitterlinden, Andre G; Willemsen, Gonneke; Wolffenbuttel, Bruce H; Wang, Jun; de Bakker, Paul I W; van Ommen, Gert-Jan; van Duijn, Cornelia M

2014-02-01

Within the Netherlands a national network of biobanks has been established (Biobanking and Biomolecular Research Infrastructure-Netherlands (BBMRI-NL)) as a national node of the European BBMRI. One of the aims of BBMRI-NL is to enrich biobanks with different types of molecular and phenotype data. Here, we describe the Genome of the Netherlands (GoNL), one of the projects within BBMRI-NL. GoNL is a whole-genome-sequencing project in a representative sample consisting of 250 trio-families from all provinces in the Netherlands, which aims to characterize DNA sequence variation in the Dutch population. The parent-offspring trios include adult individuals ranging in age from 19 to 87 years (mean=53 years; SD=16 years) from birth cohorts 1910-1994. Sequencing was done on blood-derived DNA from uncultured cells and accomplished coverage was 14-15x. The family-based design represents a unique resource to assess the frequency of regional variants, accurately reconstruct haplotypes by family-based phasing, characterize short indels and complex structural variants, and establish the rate of de novo mutational events. GoNL will also serve as a reference panel for imputation in the available genome-wide association studies in Dutch and other cohorts to refine association signals and uncover population-specific variants. GoNL will create a catalog of human genetic variation in this sample that is uniquely characterized with respect to micro-geographic location and a wide range of phenotypes. The resource will be made available to the research and medical community to guide the interpretation of sequencing projects. The present paper summarizes the global characteristics of the project.
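The trio design described above makes de novo mutation screening a matter of checking Mendelian consistency. The following is a simplified sketch under stated assumptions: genotypes are given as hard calls (pairs of alleles) at autosomal sites, and the site names and data are invented for illustration. A real analysis would also weigh genotype uncertainty, which matters at moderate coverage.

```python
from typing import Dict, List, Tuple

Genotype = Tuple[str, str]  # two alleles, order irrelevant


def is_mendelian_consistent(child: Genotype, mother: Genotype,
                            father: Genotype) -> bool:
    """True if one child allele can come from each parent."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def de_novo_candidates(
    sites: Dict[str, Tuple[Genotype, Genotype, Genotype]]
) -> List[str]:
    """Return site IDs where the child's genotype cannot be inherited.

    `sites` maps a site ID to (child, mother, father) genotypes.
    """
    return [
        site
        for site, (child, mother, father) in sites.items()
        if not is_mendelian_consistent(child, mother, father)
    ]


if __name__ == "__main__":
    example = {
        "chr1:1000": (("A", "G"), ("A", "A"), ("A", "A")),  # G absent in parents
        "chr1:2000": (("A", "G"), ("A", "A"), ("G", "G")),  # inherited
        "chr2:500": (("C", "C"), ("C", "T"), ("C", "T")),   # inherited
    }
    print(de_novo_candidates(example))
```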
The UK’s 100,000 Genomes Project: manifesting policymakers’ expectations

PubMed Central

Samuel, Gabrielle Natalie; Farsides, Bobbie

2017-01-01

The UK’s 100,000 Genomes Project has the aim of sequencing 100,000 genomes from UK National Health Service (NHS) patients while concomitantly transforming clinical care such that whole genome sequencing becomes routine clinical practice in the UK. Policymakers claim that the project will revolutionize NHS care. We wished to explore the 100,000 Genomes Project, and in particular, the extent to which policymaker claims have helped or hindered the work of those associated with Genomics England – the company established by the Department of Health to deliver the project. We interviewed 20 individuals linked to, or working for Genomics England. Interviewees had double-edged views about the context within which they were working. On the one hand, policymakers’ expectations attached to the venture were considered vacuous “genohype”; on the other hand, they were considered the impetus needed for those trying to advance genomic research into clinical practice. Findings should be considered for future genomes projects. PMID:29238265
The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

PubMed Central

Reddy, T.B.K.; Thomas, Alex D.; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A.; Kyrpides, Nikos C.

2015-01-01

The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface support several new features, including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards. PMID:25348402
"First generation" automated DNA sequencing technology.

PubMed

Slatko, Barton E; Kieleczawa, Jan; Ju, Jingyue; Gardner, Andrew F; Hendrickson, Cynthia L; Ausubel, Frederick M

2011-10-01

Beginning in the 1980s, automation of DNA sequencing has greatly increased throughput, reduced costs, and enabled large projects to be completed more easily. The development of automation technology paralleled the development of other aspects of DNA sequencing: better enzymes and chemistry, separation and imaging technology, sequencing protocols, robotics, and computational advancements (including base-calling algorithms with quality scores, database developments, and sequence analysis programs). Despite the emergence of high-throughput sequencing platforms, automated Sanger sequencing technology remains useful for many applications. This unit provides background and a description of the "First-Generation" automated DNA sequencing technology. It also includes protocols for using the current Applied Biosystems (ABI) automated DNA sequencing machines. © 2011 by John Wiley & Sons, Inc.
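The quality scores attached to base calls are commonly expressed on the Phred scale, Q = -10 log10(P), where P is the estimated probability that the call is wrong. Assuming that convention is the one meant here (the abstract does not name it), the conversion in both directions is a one-liner; the function names are ours.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability for a Phred-scaled quality score."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred-scaled quality for an error probability (0 < p <= 1)."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        print(f"Q{q}: 1 error expected in {round(1 / phred_to_error_prob(q))} calls")
```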
GDC 2: Compression of large collections of genomes

PubMed Central

Deorowicz, Sebastian; Danek, Agnieszka; Niemiec, Marcin

2015-01-01

The falling price of high-throughput genome sequencing is changing the landscape of modern genomics. A number of large-scale projects aimed at sequencing many human genomes are in progress. Genome sequencing is also becoming an important aid in personalized medicine. One significant side effect of this change is the need to store and transfer huge amounts of genomic data. In this paper we deal with the problem of compressing large collections of complete genomic sequences. We propose an algorithm that is able to compress a collection of 1092 human diploid genomes about 9,500 times. This result is about 4 times better than what is offered by other existing compressors. Moreover, our algorithm is very fast, processing the data at 200 MB/s on a modern workstation. As a consequence, the proposed algorithm allows complete genomic collections to be stored at low cost; e.g., the examined collection of 1092 human genomes needs only about 700 MB when compressed, compared with about 6.7 TB of uncompressed FASTA files. The source code is available at http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&project=gdc&subpage=about. PMID:26108279
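The figures quoted in this abstract are mutually consistent, which a few lines of arithmetic confirm. The only assumption is that TB and MB are decimal units (10^12 and 10^6 bytes); the variable names are ours.

```python
TB = 10 ** 12   # decimal terabyte (assumed)
MB = 10 ** 6    # decimal megabyte (assumed)

uncompressed_bytes = 6.7 * TB   # FASTA files for the whole collection
compressed_bytes = 700 * MB     # size reported after compression
num_genomes = 1092

# Overall compression ratio: matches the "about 9,500 times" in the text.
ratio = uncompressed_bytes / compressed_bytes

# Average compressed footprint of one genome within the collection.
per_genome_mb = compressed_bytes / num_genomes / MB

if __name__ == "__main__":
    print(f"compression ratio ~ {ratio:,.0f}x")
    print(f"compressed size per genome ~ {per_genome_mb:.2f} MB")
```

Well under 1 MB per genome is only possible because the genomes in a collection are nearly identical to one another, so most of each genome can be stored as differences from the others.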
A Project Course Sequence in Innovation and Commercialization of Medical Devices.

PubMed

Eberhardt, Alan W; Tillman, Shea; Kirkland, Brandon; Sherrod, Brandon

2017-07-01

There exists a need for educational processes in which students gain experience with design and commercialization of medical devices. This manuscript describes the implementation of, and assessment results from, the first year offering of a project course sequence in Master of Engineering (MEng) in Design and Commercialization at our institution. The three-semester course sequence focused on developing and applying hands-on skills that contribute to product development to address medical device needs found within our university hospital and local community. The first semester integrated computer-aided drawing (CAD) as preparation for manufacturing of device-related components (hand machining, computer numeric control (CNC), three-dimensional (3D) printing, and plastics molding), followed by an introduction to microcontrollers (MCUs) and printed circuit boards (PCBs) for associated electronics and control systems. In the second semester, the students applied these skills on a unified project, working together to construct and test multiple weighing scales for wheelchair users. In the final semester, the students applied industrial design concepts to four distinct device designs, including user and context reassessment, human factors (functional and aesthetic) design refinement, and advanced visualization for commercialization. The assessment results are described, along with lessons learned and plans for enhancement of the course sequence.
DHS-STEM Internship at Lawrence Livermore National Laboratory

DOE Office of Scientific and Technical Information (OSTI.GOV)

Feldman, B

2008-08-18

This summer I had the fortunate opportunity through the DHS-STEM program to attend Lawrence Livermore National Laboratory (LLNL) to work with Tom Slezak on the bioinformatics team. The bioinformatics team, among other things, helps to develop TaqMan and microarray probes for the identification of pathogens. My main project at the laboratory was to test such probe identification capabilities against metagenomic (unsequenced) data from around the world. Using various sequence analysis tools (Vmatch and Blastall) and several we developed ourselves, we compared about 120 metagenomic sequencing projects against a collection of all completely sequenced genomes and LLNL's current probe database. For the probes, the Blastall algorithms compared each individual metagenomic project using various parameters allowing for the natural ambiguities of in vitro hybridization (mismatches, deletions, insertions, hairpinning, etc.). A low-level cutoff was used to eliminate poor sequence matches and to leave a large variety of higher-quality matches for future research into the hybridization of sequences with mutations and variations: any hit with at least 80% base-pair conservation over 80% of the length of the match was kept. Because of the size of our whole genome database, we utilized the exact match algorithm of Vmatch to quickly search and compare genomes for exact matches with varying lower level limits on sequence length. I also provided preliminary feasibility analyses to support a potential industry-funded project to develop a multiplex assay on several genera and species. Each genus and species was evaluated based on the number of sequenced genomes, the number of near-neighbor sequenced genomes, the presence of identifying genes (metabolic or antibiotic-resistance genes), and the availability of research on the identification of the specific genus or species.
Utilizing the bioinformatics team's software, I was able to develop and/or update several TaqMan probes for these and develop a plan of identification for the more difficult ones. One suggestion for a genus with low conservation was to separate species into several groups, look for probes within these, and then use a combination of probes to identify the genus. This has the added benefit of also providing subgenus identification in larger genera. During both projects I developed a set of computer programs to simplify or consolidate several processes. These programs were constructed with the intent of being reused to repeat these results, further this research, or start a similar project. A big problem in the bioinformatics/sequencing field is the variability of data storage formats, which makes using data from various sources extremely difficult. Excluding for the moment the many errors present in online database genome sequences, there are still many difficulties in converting one data type into another successfully every time. Dealing with hundreds of files, each hundreds of megabytes, requires automation, which in turn requires good data mining software. The programs I developed will help ease this issue and make more genomic sources available for use. With these programs it is extremely easy to gather the data, cleanse it, convert it, run it through analysis software, and even analyze the output of this software. When dealing with vast amounts of data it is vital for the researcher to optimize the process, which became clear to me with only ten weeks to work with. Due to the time constraint of the internship, I was unable to finish my metagenomic project; I did, however, successfully finish my second project, discovering TaqMan identification for genera and species. Although I did not complete my first project, I made significant findings along the way that suggest the need for further research on the subject.
I found several instances of false positives in the metagenomic data from our microarrays, which indicates the need to sequence more metagenomic samples. My initial research shows the importance of expanding our known metagenomic world; at this point there is always the likelihood of developing probes with unknown interactions because there is not enough sequencing. On the other hand, my research did point out the sensitivity and quality of LLNL's microarrays when it identified a Parvoviridae infection in a mosquito metagenomic sample from southern California. It also uniquely identified the presence of several species of adenovirus, which could mean either that some archaic strain of adenovirus was present in the metagenomic sample or that the sample was contaminated; further investigation is required to clarify.
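The 80%/80% hit cutoff described in this report can be sketched as a small filter over BLAST tabular output, whose first four columns are query ID, subject ID, percent identity and alignment length. This is not the intern's program: the class and function names are hypothetical, and reading "80% of the length of the match" as alignment length relative to probe length is our assumption.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    probe_id: str
    subject_id: str
    percent_identity: float
    alignment_length: int


def parse_tabular_line(line: str) -> Hit:
    """Parse the first four columns of a BLAST tabular-format line."""
    fields = line.rstrip("\n").split("\t")
    return Hit(fields[0], fields[1], float(fields[2]), int(fields[3]))


def passes_cutoff(hit: Hit, probe_length: int,
                  min_identity: float = 80.0,
                  min_fraction: float = 0.80) -> bool:
    """Keep hits with >= 80% identity spanning >= 80% of the probe.

    Assumption: the length criterion is measured against probe length.
    """
    long_enough = hit.alignment_length >= min_fraction * probe_length
    return hit.percent_identity >= min_identity and long_enough


if __name__ == "__main__":
    # Made-up example lines for a hypothetical 25-base probe.
    lines = [
        "probe1\tcontig7\t90.91\t22\t2\t0\t1\t22\t101\t122\t1e-4\t30.1",
        "probe1\tcontig9\t100.00\t18\t0\t0\t1\t18\t55\t72\t0.02\t28.3",
    ]
    kept = [h for h in map(parse_tabular_line, lines) if passes_cutoff(h, 25)]
    print([h.subject_id for h in kept])
```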
Characterization of full-length sequenced cDNA inserts (FLIcs) from Atlantic salmon (Salmo salar)

PubMed Central

Andreassen, Rune; Lunner, Sigbjørn; Høyheim, Bjørn

2009-01-01

Background Sequencing of the Atlantic salmon genome is now being planned by an international research consortium. Full-length sequenced inserts from cDNAs (FLIcs) are an important tool for correct annotation and clustering of the genomic sequence in any species. The large amount of highly similar duplicate sequences caused by the relatively recent genome duplication in the salmonid ancestor represents a particular challenge for the genome project. FLIcs will therefore be an extremely useful resource for the Atlantic salmon sequencing project. In addition to helping to distinguish between duplicate genome regions and to determine correct gene structures, FLIcs are an important resource for functional genomic studies and for investigation of regulatory elements controlling gene expression. In contrast to the large number of ESTs available, including the ESTs from 23 developmental and tissue specific cDNA libraries contributed by the Salmon Genome Project (SGP), the number of sequences where the full length of the cDNA insert has been determined has been small. Results High quality full-length insert sequences from 560 pre-smolt white muscle tissue specific cDNAs were generated, accession numbers [GenBank: BT043497 - BT044056]. Five hundred and ten (91%) of the transcripts were annotated using Gene Ontology (GO) terms and 440 of the FLIcs are likely to contain a complete coding sequence (cCDS). The sequence information was used to identify putative paralogs, to characterize salmon Kozak motifs and polyadenylation signal variation, and to identify motifs likely to be involved in the regulation of particular genes. Finally, conserved 7-mers in the 3'UTRs were identified, of which some were identical to miRNA target sequences. Conclusion This paper describes the first Atlantic salmon FLIcs from a tissue and developmental stage specific cDNA library. We have demonstrated that many FLIcs contained a complete coding sequence (cCDS). 
This suggests that the remaining cDNA libraries generated by SGP represent a valuable cCDS FLIc source. The conservation of 7-mers in 3'UTRs indicates that these motifs are functionally important. Identity between some of these 7-mers and miRNA target sequences suggests that they are miRNA targets in Salmo salar transcripts as well. PMID:19878547
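One simple way to look for recurring 7-mers across a set of 3'UTRs is to count, for each 7-mer, how many distinct sequences contain it. The sketch below does exactly that; it is an illustration rather than the authors' procedure (the abstract does not say how conservation was assessed), and the sequences and threshold are made up.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def kmers(seq: str, k: int = 7) -> Set[str]:
    """All distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(utrs: Iterable[str], k: int = 7,
                 min_sequences: int = 2) -> Dict[str, int]:
    """Map each k-mer to the number of sequences containing it,
    keeping only k-mers present in at least `min_sequences` sequences."""
    counts: Counter = Counter()
    for seq in utrs:
        # A set is used so that a k-mer counts once per sequence.
        counts.update(kmers(seq.upper(), k))
    return {kmer: n for kmer, n in counts.items() if n >= min_sequences}


if __name__ == "__main__":
    toy_utrs = ["AAATAAAGCACTTTA", "CCGCACTTTAGG", "TTTTTTTT"]
    print(shared_kmers(toy_utrs))
```

Candidate motifs found this way could then be compared against known miRNA seed-complementary sequences, which is the kind of identity the abstract reports.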
The Transcriptome Analysis and Comparison Explorer--T-ACE: a platform-independent, graphical tool to process large RNAseq datasets of non-model organisms.

PubMed

Philipp, E E R; Kraemer, L; Mountfort, D; Schilhabel, M; Schreiber, S; Rosenstiel, P

2012-03-15

Next generation sequencing (NGS) technologies allow a rapid and cost-effective compilation of large RNA sequence datasets in model and non-model organisms. However, the storage and analysis of transcriptome information from different NGS platforms is still a significant bottleneck, leading to a delay in data dissemination and subsequent biological understanding. In particular, database interfaces with transcriptome analysis modules that go beyond mere read counts are missing. Here, we present the Transcriptome Analysis and Comparison Explorer (T-ACE), a tool designed for the organization and analysis of large sequence datasets, and especially suited for transcriptome projects of non-model organisms with little or no a priori sequence information. T-ACE offers a TCL-based interface, which accesses a PostgreSQL database via a php-script. Within T-ACE, information belonging to single sequences or contigs, such as annotation or read coverage, is linked to the respective sequence and immediately accessible. Sequences and assigned information can be searched via keyword- or BLAST-search. Additionally, T-ACE provides within- and between-transcriptome analysis modules on the level of expression, GO terms, KEGG pathways and protein domains. Results are visualized and can be easily exported for external analysis. We developed T-ACE for laboratory environments that have only a limited amount of bioinformatics support, and for collaborative projects in which different partners work on the same dataset from different locations or platforms (Windows/Linux/MacOS). For laboratories with some experience in bioinformatics and programming, the low complexity of the database structure and open-source code provides a framework that can be customized according to the different needs of the user and transcriptome project.
Human Genome Program

DOE Office of Scientific and Technical Information (OSTI.GOV)

Not Available

1993-01-01

The DOE Human Genome program has grown tremendously, as shown by the marked increase in the number of genome-funded projects since the last workshop held in 1991. The abstracts in this book describe the genome research of DOE-funded grantees and contractors and invited guests, and all projects are represented at the workshop by posters. The 3-day meeting includes plenary sessions on ethical, legal, and social issues pertaining to the availability of genetic data; sequencing techniques, informatics support; and chromosome and cDNA mapping and sequencing.
Haematobia irritans dataset of raw sequence reads from Illumina and Pac Bio sequencing of genomic DNA

USDA-ARS's Scientific Manuscript database

The genome of the horn fly, Haematobia irritans, was sequenced using Illumina- and Pac Bio-based protocols. Following quality filtering, the raw reads have been deposited at NCBI under the BioProject and BioSample accession numbers PRJNA30967 and SAMN07830356, respectively. The Illumina reads are un...
DOE Research and Development Accomplishments

Science.gov Websites

sector to explore the possibility of sequencing the human genome. This Workshop was sponsored by DOE and approach to sequence the human genome. The Human Genome Project (HGP) was formalized in mid-February 1990
A Foray into Fungal Ecology: Understanding Fungi and Their Functions Across Ecosystems

NASA Astrophysics Data System (ADS)

Francis, N.; Dunkirk, N. C.; Peay, K.

2015-12-01

Despite their incredible diversity and importance to terrestrial ecosystems, fungi are not included in a standard high school science curriculum. This past summer, however, my work for the Stanford EARTH High School Internship program introduced me to fungal ecology through experiments involving culturing, genomics and root dissections. The two fungal experiments I worked on had very different foci, both searching for answers to broad ecological questions of fungal function and physiology. The first, a symbiosis experiment, sought to determine if the partners of the nutrient exchange between pine trees and their fungal symbionts could choose one another. The second experiment, a dung fungal succession project, compared the genetic sequencing results of fungal extractions from dung versus fungal cultures from dung. My part in the symbiosis experiment involved dissection, weighing and encapsulation of root tissue samples characterized based on the root thickness and presence of ectomycorrhizal fungi. The dung fungi succession project required that I not only learn how to culture various genera of dung fungi but also learn how to extract DNA and RNA for sequencing from the fungal tissue. Although I primarily worked with dung fungi cultures and thereby learned about their unique physiologies, I also learned about the different types of genetic sequencing since the project compared sequences of cultured fungi versus Next Generation sequencing of all fungi present within a dung pellet. Through working on distinct fungal projects that reassess how information about fungi is known within the field of fungal ecology, I learned not only about the two experiments I worked on but also many past related experiments and inquiries through reading scientific papers. Thanks to my foray into fungal research, I now know not only the broader significance of fungi in ecological research but also how to design and conduct ecological experiments.

Importing statistical measures into Artemis enhances gene identification in the Leishmania genome project.

PubMed

Aggarwal, Gautam; Worthey, E A; McDonagh, Paul D; Myler, Peter J

2003-06-07

Seattle Biomedical Research Institute (SBRI) as part of the Leishmania Genome Network (LGN) is sequencing chromosomes of the trypanosomatid protozoan species Leishmania major. At SBRI, chromosomal sequence is annotated using a combination of trained and untrained non-consensus gene-prediction algorithms with ARTEMIS, an annotation platform with rich and user-friendly interfaces. Here we describe a methodology used to import results from three different protein-coding gene-prediction algorithms (GLIMMER, TESTCODE and GENESCAN) into the ARTEMIS sequence viewer and annotation tool. Comparison of these methods, along with the CODONUSAGE algorithm built into ARTEMIS, shows the importance of combining methods to more accurately annotate the L. major genomic sequence. An improvised and powerful tool for gene prediction has been developed by importing data from widely-used algorithms into an existing annotation platform. This approach is especially fruitful in the Leishmania genome project, where there is a large proportion of novel genes requiring manual annotation.
BACCardI--a tool for the validation of genomic assemblies, assisting genome finishing and intergenome comparison.

PubMed

Bartels, Daniela; Kespohl, Sebastian; Albaum, Stefan; Drüke, Tanja; Goesmann, Alexander; Herold, Julia; Kaiser, Olaf; Pühler, Alfred; Pfeiffer, Friedhelm; Raddatz, Günter; Stoye, Jens; Meyer, Folker; Schuster, Stephan C

2005-04-01

We provide the graphical tool BACCardI for the construction of virtual clone maps from standard assembler output files or BLAST-based sequence comparisons. This new tool has been applied to numerous genome projects to solve various problems including (a) validation of whole genome shotgun assemblies, (b) support for contig ordering in the finishing phase of a genome project, and (c) intergenome comparison between related strains when only one of the strains has been sequenced and a large insert library is available for the other. The BACCardI software can seamlessly interact with various sequence assembly packages. Genomic assemblies generated from sequence information need to be validated by independent methods such as physical maps. The time-consuming task of building physical maps can be circumvented by virtual clone maps derived from read pair information of large insert libraries.
SNiPlay: a web-based tool for detection, management and analysis of SNPs. Application to grapevine diversity projects.

PubMed

Dereeper, Alexis; Nicolas, Stéphane; Le Cunff, Loïc; Bacilieri, Roberto; Doligez, Agnès; Peros, Jean-Pierre; Ruiz, Manuel; This, Patrice

2011-05-05

High-throughput re-sequencing, new genotyping technologies and the availability of reference genomes allow the extensive characterization of Single Nucleotide Polymorphisms (SNPs) and insertion/deletion events (indels) in many plant species. The rapidly increasing amount of re-sequencing and genotyping data generated by large-scale genetic diversity projects requires the development of integrated bioinformatics tools able to efficiently manage, analyze, and combine these genetic data with genome structure and external data. In this context, we developed SNiPlay, a flexible, user-friendly and integrative web-based tool dedicated to polymorphism discovery and analysis. It integrates: (1) a pipeline, freely accessible through the internet, combining existing software with new tools to detect SNPs and to compute different types of statistical indices and graphical layouts for SNP data. From standard sequence alignments, genotyping data or Sanger sequencing traces given as input, SNiPlay detects SNP and indel events and outputs submission files for the design of Illumina's SNP chips. Subsequently, it sends sequences and genotyping data into a series of modules in charge of various processes: physical mapping to a reference genome, annotation (genomic position, intron/exon location, synonymous/non-synonymous substitutions), SNP frequency determination in user-defined groups, haplotype reconstruction and network, linkage disequilibrium evaluation, and diversity analysis (Pi, Watterson's Theta, Tajima's D). Furthermore, the pipeline allows the use of external data (such as phenotype, geographic origin, taxa, stratification) to define groups and compare statistical indices. (2) a database storing polymorphisms, genotyping data and grapevine sequences released by public and private projects. It allows the user to retrieve SNPs using various filters (such as genomic position, missing data, polymorphism type, allele frequency), to compare SNP patterns between populations, and to export genotyping data or sequences in various formats. Our experiments on grapevine genetic projects showed that SNiPlay allows geneticists to rapidly obtain advanced results in several key research areas of plant genetic diversity. Both the management and treatment of large amounts of SNP data are rendered considerably easier for end-users through automation and integration. Current developments are taking into account new advances in high-throughput technologies. SNiPlay is available at: http://sniplay.cirad.fr/.
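The diversity indices named in the SNiPlay abstract (Pi, Watterson's Theta, Tajima's D) can be illustrated with a minimal sketch. This is not SNiPlay's code: the function name `diversity_stats` and the toy haplotypes are hypothetical, the input is assumed to be equal-length aligned sequences without missing data, and the values are reported per sequence rather than per site. The constants follow the standard definition of Tajima's D.

```python
from itertools import combinations
from math import sqrt


def diversity_stats(haplotypes):
    """Return (S, pi, theta_w, tajimas_d) for equal-length aligned sequences.

    Hypothetical helper for illustration only; assumes no gaps or missing data.
    """
    n = len(haplotypes)
    length = len(haplotypes[0])

    # S: number of segregating (polymorphic) sites in the alignment.
    S = sum(1 for i in range(length) if len({h[i] for h in haplotypes}) > 1)

    # pi: mean number of pairwise differences between sequences.
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Watterson's theta: S scaled by the harmonic number a1.
    a1 = sum(1 / i for i in range(1, n))
    theta_w = S / a1

    # Variance constants for Tajima's D.
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * S + e2 * S * (S - 1)

    tajimas_d = (pi - theta_w) / sqrt(variance) if variance > 0 else float("nan")
    return S, pi, theta_w, tajimas_d


# Toy alignment (made-up data): two singleton variants among four sequences.
toy = ["ACGTA", "ACGTT", "ACCTA", "ACGTA"]
S, pi, theta_w, tajimas_d = diversity_stats(toy)
print(S, pi, theta_w, tajimas_d)
```

In this toy alignment both variants are singletons, so Pi falls below Watterson's Theta and Tajima's D is negative, the pattern usually read as an excess of rare alleles.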
An alternative extragradient projection method for quasi-equilibrium problems.

PubMed

Chen, Haibin; Wang, Yiju; Xu, Yi

2018-01-01

For the quasi-equilibrium problem where the players' costs and their strategies both depend on the rival's decisions, an alternative extragradient projection method for solving it is designed. Different from the classical extragradient projection method whose generated sequence has the contraction property with respect to the solution set, the newly designed method possesses an expansion property with respect to a given initial point. The global convergence of the method is established under the assumptions of pseudomonotonicity of the equilibrium function and of continuity of the underlying multi-valued mapping. Furthermore, we show that the generated sequence converges to the nearest point in the solution set to the initial point. Numerical experiments show the efficiency of the method.
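For orientation, the classical extragradient projection method that the abstract contrasts with can be sketched as follows. This is the textbook two-projection scheme applied to a simple variational inequality on a box, not the authors' alternative method for quasi-equilibrium problems; the operator `rotation`, the box bounds, the step size and the iteration count are all assumptions chosen for illustration.

```python
def project_box(x, lo, hi):
    """Euclidean projection of x onto the box [lo, hi]^n."""
    return [min(max(v, lo), hi) for v in x]


def extragradient(F, x0, step, lo, hi, iters=200):
    """Classical extragradient iteration for a monotone, Lipschitz operator F.

    Each iteration takes a trial (prediction) step, then a correction step
    that re-uses the operator value at the trial point.
    """
    x = list(x0)
    for _ in range(iters):
        fx = F(x)
        # Prediction: projected step using F evaluated at the current point.
        y = project_box([xi - step * gi for xi, gi in zip(x, fx)], lo, hi)
        fy = F(y)
        # Correction: projected step from x using F evaluated at the trial point.
        x = project_box([xi - step * gi for xi, gi in zip(x, fy)], lo, hi)
    return x


def rotation(x):
    """Skew (rotation) operator: monotone, Lipschitz constant 1, zero at the origin."""
    return [x[1], -x[0]]


# Assumed setup: box [-2, 2]^2, step below 1/L with L = 1; the solution is the origin.
solution = extragradient(rotation, [1.0, 1.0], step=0.5, lo=-2.0, hi=2.0)
print(solution)
```

The rotation operator is a common test case because a plain projected step `x - step * F(x)` moves away from the origin on it, whereas the extra evaluation at the trial point makes each extragradient iteration shrink the distance to the solution.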
The Genome 10K Project: a way forward.

PubMed

Koepfli, Klaus-Peter; Paten, Benedict; O'Brien, Stephen J

2015-01-01

The Genome 10K Project was established in 2009 by a consortium of biologists and genome scientists determined to facilitate the sequencing and analysis of the complete genomes of 10,000 vertebrate species. Since then the number of selected and initiated species has risen from ∼26 to 277 sequenced or ongoing with funding, an approximately tenfold increase in five years. Here we summarize the advances and commitments that have occurred by mid-2014 and outline the achievements and present challenges of reaching the 10,000-species goal. We summarize the status of known vertebrate genome projects, recommend standards for pronouncing a genome as sequenced or completed, and provide our present and future vision of the landscape of Genome 10K. The endeavor is ambitious, bold, expensive, and uncertain, but together the Genome 10K Consortium of Scientists and the worldwide genomics community are moving toward their goal of delivering to the coming generation the gift of genome empowerment for many vertebrate species.
The Genome 10K Project: A Way Forward

PubMed Central

Koepfli, Klaus-Peter; Paten, Benedict; O’Brien, Stephen J.

2017-01-01

The Genome 10K Project was established in 2009 by a consortium of biologists and genome scientists determined to facilitate the sequencing and analysis of the complete genomes of 10,000 vertebrate species. Since then the number of selected and initiated species has risen from ~26 to 277 sequenced or ongoing with funding, an approximately tenfold increase in five years. Here we summarize the advances and commitments that have occurred by mid-2014 and outline the achievements and present challenges of reaching the 10,000-species goal. We summarize the status of known vertebrate genome projects, recommend standards for pronouncing a genome as sequenced or completed, and provide our present and future vision of the landscape of Genome 10K. The endeavor is ambitious, bold, expensive, and uncertain, but together the Genome 10K Consortium of Scientists and the worldwide genomics community are moving toward their goal of delivering to the coming generation the gift of genome empowerment for many vertebrate species. PMID:25689317
Logic system aids in evaluation of project readiness

NASA Technical Reports Server (NTRS)

Maris, S. J.; Obrien, T. J.

1966-01-01

Measurement Operational Readiness Requirements (MORR) assignments logic is used for determining the readiness of a complex project to go forward as planned. The system uses a logic network that assigns qualities to all important criteria in a project and establishes a logical sequence of measurements to determine what the conditions are.
The whole genome sequences and experimentally phased haplotypes of over 100 personal genomes.

PubMed

Mao, Qing; Ciotlos, Serban; Zhang, Rebecca Yu; Ball, Madeleine P; Chin, Robert; Carnevali, Paolo; Barua, Nina; Nguyen, Staci; Agarwal, Misha R; Clegg, Tom; Connelly, Abram; Vandewege, Ward; Zaranek, Alexander Wait; Estep, Preston W; Church, George M; Drmanac, Radoje; Peters, Brock A

2016-10-11

Since the completion of the Human Genome Project in 2003, it is estimated that more than 200,000 individual whole human genomes have been sequenced, a stunning accomplishment in such a short period of time. However, most of these were sequenced without experimental haplotype data and are therefore missing an important aspect of genome biology. In addition, much of the genomic data is not available to the public and lacks phenotypic information. As part of the Personal Genome Project, blood samples from 184 participants were collected and processed using Complete Genomics' Long Fragment Read technology. Here, we present the experimental whole genome haplotyping and sequencing of these samples to an average read coverage depth of 100X. This is approximately three-fold higher than the read coverage applied to most whole human genome assemblies and ensures the highest quality results. Currently, 114 genomes from this dataset are freely available in the GigaDB repository and are associated with rich phenotypic data; the remaining 70 should be added in the near future as they are approved through the PGP data release process. For reproducibility analyses, 20 genomes were sequenced at least twice using independent LFR barcoded libraries. Seven genomes were also sequenced using Complete Genomics' standard non-barcoded library process. In addition, we report 2.6 million high-quality, rare variants not previously identified in the Single Nucleotide Polymorphisms database or the 1000 Genomes Project Phase 3 data. These genomes represent a unique source of haplotype and phenotype data for the scientific community and should help to expand our understanding of human genome evolution and function.
A map of human genome variation from population-scale sequencing.

PubMed

Abecasis, Gonçalo R; Altshuler, David; Auton, Adam; Brooks, Lisa D; Durbin, Richard M; Gibbs, Richard A; Hurles, Matt E; McVean, Gil A

2010-10-28

The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10^-8 per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.
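To put the estimated mutation rate in the abstract above into perspective, a back-of-the-envelope calculation is sketched here. The haploid genome size of about 3.1 billion base pairs is an assumption that does not come from the abstract, and the function name `expected_de_novo` is hypothetical; the result should be read as an order-of-magnitude figure only.

```python
MUTATION_RATE = 1e-8       # per base pair per generation (approximate value from the abstract)
HAPLOID_GENOME_BP = 3.1e9  # assumption: rough size of the human haploid genome


def expected_de_novo(rate, haploid_bp, ploidy=2):
    """Expected number of new base substitutions per genome per generation."""
    return rate * haploid_bp * ploidy


per_child = expected_de_novo(MUTATION_RATE, HAPLOID_GENOME_BP)
print(f"Roughly {per_child:.0f} de novo substitutions per diploid genome per generation")
```

Under these assumptions the rate corresponds to a few dozen new substitutions per child, most of which would be expected to fall outside coding sequence.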
Review of road user costs and methods.

DOT National Transportation Integrated Search

2013-07-01

The South Dakota Department of Transportation (SDDOT) uses road user costs (RUC) to calculate incentive or disincentive compensation for contractors, quantify project-specific liquidated damages, select the ideal sequencing of a project, and forecast...
A Primer on Infectious Disease Bacterial Genomics

PubMed Central

Petkau, Aaron; Knox, Natalie; Graham, Morag; Van Domselaar, Gary

2016-01-01

SUMMARY The number of large-scale genomics projects is increasing due to the availability of affordable high-throughput sequencing (HTS) technologies. The use of HTS for bacterial infectious disease research is attractive because one whole-genome sequencing (WGS) run can replace multiple assays for bacterial typing, molecular epidemiology investigations, and more in-depth pathogenomic studies. The computational resources and bioinformatics expertise required to accommodate and analyze the large amounts of data pose new challenges for researchers embarking on genomics projects for the first time. Here, we present a comprehensive overview of a bacterial genomics project from beginning to end, with a particular focus on the planning and computational requirements for HTS data, and provide a general understanding of the analytical concepts to develop a workflow that will meet the objectives and goals of HTS projects. PMID:28590251
The Genome of the Netherlands: design, and project goals

PubMed Central

Boomsma, Dorret I; Wijmenga, Cisca; Slagboom, Eline P; Swertz, Morris A; Karssen, Lennart C; Abdellaoui, Abdel; Ye, Kai; Guryev, Victor; Vermaat, Martijn; van Dijk, Freerk; Francioli, Laurent C; Hottenga, Jouke Jan; Laros, Jeroen F J; Li, Qibin; Li, Yingrui; Cao, Hongzhi; Chen, Ruoyan; Du, Yuanping; Li, Ning; Cao, Sujie; van Setten, Jessica; Menelaou, Androniki; Pulit, Sara L; Hehir-Kwa, Jayne Y; Beekman, Marian; Elbers, Clara C; Byelas, Heorhiy; de Craen, Anton J M; Deelen, Patrick; Dijkstra, Martijn; den Dunnen, Johan T; de Knijff, Peter; Houwing-Duistermaat, Jeanine; Koval, Vyacheslav; Estrada, Karol; Hofman, Albert; Kanterakis, Alexandros; Enckevort, David van; Mai, Hailiang; Kattenberg, Mathijs; van Leeuwen, Elisabeth M; Neerincx, Pieter B T; Oostra, Ben; Rivadeneira, Fernando; Suchiman, Eka H D; Uitterlinden, Andre G; Willemsen, Gonneke; Wolffenbuttel, Bruce H; Wang, Jun; de Bakker, Paul I W; van Ommen, Gert-Jan; van Duijn, Cornelia M

2014-01-01

Within the Netherlands a national network of biobanks has been established (Biobanking and Biomolecular Research Infrastructure-Netherlands (BBMRI-NL)) as a national node of the European BBMRI. One of the aims of BBMRI-NL is to enrich biobanks with different types of molecular and phenotype data. Here, we describe the Genome of the Netherlands (GoNL), one of the projects within BBMRI-NL. GoNL is a whole-genome-sequencing project in a representative sample consisting of 250 trio-families from all provinces in the Netherlands, which aims to characterize DNA sequence variation in the Dutch population. The parent–offspring trios include adult individuals ranging in age from 19 to 87 years (mean=53 years; SD=16 years) from birth cohorts 1910–1994. Sequencing was done on blood-derived DNA from uncultured cells and accomplished coverage was 14–15x. The family-based design represents a unique resource to assess the frequency of regional variants, accurately reconstruct haplotypes by family-based phasing, characterize short indels and complex structural variants, and establish the rate of de novo mutational events. GoNL will also serve as a reference panel for imputation in the available genome-wide association studies in Dutch and other cohorts to refine association signals and uncover population-specific variants. GoNL will create a catalog of human genetic variation in this sample that is uniquely characterized with respect to micro-geographic location and a wide range of phenotypes. The resource will be made available to the research and medical community to guide the interpretation of sequencing projects. The present paper summarizes the global characteristics of the project. PMID:23714750
Project Apollo Flight Sequence

NASA Image and Video Library

1966-08-01

Lunar Orbiter's "Typical Flight sequence of Events" turned out to be quite typical indeed, as all five spacecraft performed exactly as planned. -- Published in James R. Hansen, Spaceflight Revolution: NASA Langley Research Center From Sputnik to Apollo, (Washington: NASA, 1995), p. 340.
Paul Spellman, Ph.D., Talks about TCGA at AACR 2011 - TCGA

Cancer.gov

Dr. Paul Spellman talks about The Cancer Genome Atlas (TCGA) and how this could help further the treatment of cancer. TCGA is a project working to catalog genetic mutations responsible for cancer. Clinicians are sequencing the genomes of patients with any of 20 different cancers and hope that this could target clinical trials at the specific patient sub-groups that would benefit most. Dr. Spellman explains how an increasing number of laboratories are becoming able to conduct genome sequencing and contribute to the TCGA project, discusses how clinicians could apply the findings in practice to decide on treatment and affect patient outlook, and suggests that in the future patients may start to request that their genome be sequenced in order to aid their treatment.
Community annotation and bioinformatics workforce development in concert--Little Skate Genome Annotation Workshops and Jamborees.

PubMed

Wang, Qinghua; Arighi, Cecilia N; King, Benjamin L; Polson, Shawn W; Vincent, James; Chen, Chuming; Huang, Hongzhan; Kingham, Brewster F; Page, Shallee T; Rendino, Marc Farnum; Thomas, William Kelley; Udwary, Daniel W; Wu, Cathy H

2012-01-01

Recent advances in high-throughput DNA sequencing technologies have equipped biologists with a powerful new set of tools for advancing research goals. The resulting flood of sequence data has made it critically important to train the next generation of scientists to handle the inherent bioinformatic challenges. The North East Bioinformatics Collaborative (NEBC) is undertaking the genome sequencing and annotation of the little skate (Leucoraja erinacea) to promote advancement of bioinformatics infrastructure in our region, with an emphasis on practical education to create a critical mass of informatically savvy life scientists. In support of the Little Skate Genome Project, the NEBC members have developed several annotation workshops and jamborees to provide training in genome sequencing, annotation and analysis. Acting as a nexus for both curation activities and dissemination of project data, a project web portal, SkateBase (http://skatebase.org) has been developed. As a case study to illustrate effective coupling of community annotation with workforce development, we report the results of the Mitochondrial Genome Annotation Jamborees organized to annotate the first completely assembled element of the Little Skate Genome Project, as a culminating experience for participants from our three prior annotation workshops. We are applying the physical/virtual infrastructure and lessons learned from these activities to enhance and streamline the genome annotation workflow, as we look toward our continuing efforts for larger-scale functional and structural community annotation of the L. erinacea genome.
Community annotation and bioinformatics workforce development in concert—Little Skate Genome Annotation Workshops and Jamborees

PubMed Central

Wang, Qinghua; Arighi, Cecilia N.; King, Benjamin L.; Polson, Shawn W.; Vincent, James; Chen, Chuming; Huang, Hongzhan; Kingham, Brewster F.; Page, Shallee T.; Farnum Rendino, Marc; Thomas, William Kelley; Udwary, Daniel W.; Wu, Cathy H.

2012-01-01

Recent advances in high-throughput DNA sequencing technologies have equipped biologists with a powerful new set of tools for advancing research goals. The resulting flood of sequence data has made it critically important to train the next generation of scientists to handle the inherent bioinformatic challenges. The North East Bioinformatics Collaborative (NEBC) is undertaking the genome sequencing and annotation of the little skate (Leucoraja erinacea) to promote advancement of bioinformatics infrastructure in our region, with an emphasis on practical education to create a critical mass of informatically savvy life scientists. In support of the Little Skate Genome Project, the NEBC members have developed several annotation workshops and jamborees to provide training in genome sequencing, annotation and analysis. Acting as a nexus for both curation activities and dissemination of project data, a project web portal, SkateBase (http://skatebase.org) has been developed. As a case study to illustrate effective coupling of community annotation with workforce development, we report the results of the Mitochondrial Genome Annotation Jamborees organized to annotate the first completely assembled element of the Little Skate Genome Project, as a culminating experience for participants from our three prior annotation workshops. We are applying the physical/virtual infrastructure and lessons learned from these activities to enhance and streamline the genome annotation workflow, as we look toward our continuing efforts for larger-scale functional and structural community annotation of the L. erinacea genome. PMID:22434832
Sequence variation between 462 human individuals fine-tunes functional sites of RNA processing

NASA Astrophysics Data System (ADS)

Ferreira, Pedro G.; Oti, Martin; Barann, Matthias; Wieland, Thomas; Ezquina, Suzana; Friedländer, Marc R.; Rivas, Manuel A.; Esteve-Codina, Anna; Estivill, Xavier; Guigó, Roderic; Dermitzakis, Emmanouil; Antonarakis, Stylianos; Meitinger, Thomas; Strom, Tim M.; Palotie, Aarno; François Deleuze, Jean; Sudbrak, Ralf; Lerach, Hans; Gut, Ivo; Syvänen, Ann-Christine; Gyllensten, Ulf; Schreiber, Stefan; Rosenstiel, Philip; Brunner, Han; Veltman, Joris; Hoen, Peter A. C. T.; Jan van Ommen, Gert; Carracedo, Angel; Brazma, Alvis; Flicek, Paul; Cambon-Thomsen, Anne; Mangion, Jonathan; Bentley, David; Hamosh, Ada; Rosenstiel, Philip; Strom, Tim M.; Lappalainen, Tuuli; Guigó, Roderic; Sammeth, Michael

2016-09-01

Recent advances in the cost-efficiency of sequencing technologies enabled the combined DNA- and RNA-sequencing of human individuals at the population-scale, making genome-wide investigations of the inter-individual genetic impact on gene expression viable. Employing mRNA-sequencing data from the Geuvadis Project and genome sequencing data from the 1000 Genomes Project we show that the computational analysis of DNA sequences around splice sites and poly-A signals is able to explain several observations in the phenotype data. In contrast to widespread assessments of statistically significant associations between DNA polymorphisms and quantitative traits, we developed a computational tool to pinpoint the molecular mechanisms by which genetic markers drive variation in RNA-processing, cataloguing and classifying alleles that change the affinity of core RNA elements to their recognizing factors. The in silico models we employ further suggest RNA editing can moonlight as a splicing-modulator, albeit less frequently than genomic sequence diversity. Beyond existing annotations, we demonstrate that the ultra-high resolution of RNA-Seq combined from 462 individuals also provides evidence for thousands of bona fide novel elements of RNA processing—alternative splice sites, introns, and cleavage sites—which are often rare and lowly expressed but in other characteristics similar to their annotated counterparts.
SMRT sequencing data for Garcinia mangostana L. variety Mesta.

PubMed

Midin, Mohd Razik; Loke, Kok-Keong; Madon, Maria; Nordin, Mohd Shukor; Goh, Hoe-Han; Mohd Noor, Normah

2017-06-01

The "Queen of Fruits" mangosteen (Garcinia mangostana L.) produces commercially important fruits with desirable taste of flesh and pericarp rich in xanthones with medicinal properties. To date, only limited knowledge is available on the cytogenetics and genome sequences of a common variety of mangosteen (Abu Bakar et al., 2016 [1]). Here, we report the first single-molecule real-time (SMRT) sequencing data from whole genome sequencing of mangosteen of the Mesta variety. Raw reads of the SMRT sequencing project can be obtained from the SRA database under the accession numbers SRX2718652 to SRX2718659.
Complete genome sequence of the Antarctic Halorubrum lacusprofundi type strain ACAM 34

DOE PAGES

Anderson, Iain J.; DasSarma, Priya; Lucas, Susan; ...

2016-09-10

Halorubrum lacusprofundi is an extreme halophile within the archaeal phylum Euryarchaeota. The type strain ACAM 34 was isolated from Deep Lake, Antarctica. H. lacusprofundi is of phylogenetic interest because it is distantly related to the haloarchaea that have previously been sequenced. It is also of interest because of its psychrotolerance. We report here the complete genome sequence of H. lacusprofundi type strain ACAM 34 and its annotation. In conclusion, this genome is part of a 2006 Joint Genome Institute Community Sequencing Program project to sequence genomes of diverse Archaea.
Complete genome sequence of the Antarctic Halorubrum lacusprofundi type strain ACAM 34

DOE Office of Scientific and Technical Information (OSTI.GOV)

Anderson, Iain J.; DasSarma, Priya; Lucas, Susan

Halorubrum lacusprofundi is an extreme halophile within the archaeal phylum Euryarchaeota. The type strain ACAM 34 was isolated from Deep Lake, Antarctica. H. lacusprofundi is of phylogenetic interest because it is distantly related to the haloarchaea that have previously been sequenced. It is also of interest because of its psychrotolerance. We report here the complete genome sequence of H. lacusprofundi type strain ACAM 34 and its annotation. In conclusion, this genome is part of a 2006 Joint Genome Institute Community Sequencing Program project to sequence genomes of diverse Archaea.

Draft Genome Sequence of Microbacterium sp. Strain UCD-TDU (Phylum Actinobacteria)

PubMed Central

Bendiks, Zachary A.; Lang, Jenna M.; Darling, Aaron E.; Coil, David A.

2013-01-01

Here, we present the draft genome sequence of Microbacterium sp. strain UCD-TDU, a member of the phylum Actinobacteria. The assembly contains 3,746,321 bp (in 8 scaffolds). This strain was isolated from a residential toilet as part of an undergraduate student research project to sequence reference genomes of microbes from the built environment. PMID:23516225
Sequencing Conservation Actions Through Threat Assessments in the Southeastern United States

Treesearch

Robert D. Sutter; Christopher C. Szell

2006-01-01

The identification of conservation priorities is one of the leading issues in conservation biology. We present a project of The Nature Conservancy, called Sequencing Conservation Actions, which prioritizes conservation areas and identifies foci for crosscutting strategies at various geographic scales. We use the term "Sequencing" to mean an ordering of actions over...
Sequences of Normative Evaluation in Two Telecollaboration Projects: A Comparative Study of Multimodal Feedback through Desktop Videoconference

ERIC Educational Resources Information Center

Cappellini, Marco; Azaoui, Brahim

2017-01-01

In our study we analyse how the same interactional dynamic is produced in two different pedagogical settings exploiting a desktop videoconference system. We propose to focus our attention on a specific type of conversational side sequence, known in the Francophone literature as sequences of normative evaluation. More particularly, we analyse data…
Next Generation Sequencing of Actinobacteria for the Discovery of Novel Natural Products

PubMed Central

Gomez-Escribano, Juan Pablo; Alt, Silke; Bibb, Mervyn J.

2016-01-01

Like many fields of the biosciences, actinomycete natural products research has been revolutionised by next-generation DNA sequencing (NGS). Hundreds of new genome sequences from actinobacteria are made public every year, many of them as a result of projects aimed at identifying new natural products and their biosynthetic pathways through genome mining. Advances in these technologies in the last five years have meant not only a reduction in the cost of whole genome sequencing, but also a substantial increase in the quality of the data, having moved from obtaining a draft genome sequence comprised of several hundred short contigs, sometimes of doubtful reliability, to the possibility of obtaining an almost complete and accurate chromosome sequence in a single contig, allowing a detailed study of gene clusters and the design of strategies for refactoring and full gene cluster synthesis. The impact that these technologies are having in the discovery and study of natural products from actinobacteria, including those from the marine environment, is only starting to be realised. In this review we provide a historical perspective of the field, analyse the strengths and limitations of the most relevant technologies, and share the insights acquired during our genome mining projects. PMID:27089350
Origins of the Human Genome Project.

PubMed

Watson, J D; Cook-Deegan, R M

1991-01-01

The Human Genome Project has become a reality. Building on a debate that dates back to 1985, several genome projects are now in full stride around the world, and more are likely to form in the next several years. Italy began its genome program in 1987, and the United Kingdom and U.S.S.R. in 1988. The European communities mounted several genome projects on yeast, bacteria, Drosophila, and Arabidopsis thaliana (a rapidly growing plant with a small genome) in 1988, and in 1990 commenced a new 2-year program on the human genome. In the United States, we have completed the first year of operation of the National Center for Human Genome Research at the National Institutes of Health (NIH), now the largest single funding source for genome research in the world. There have been dedicated budgets focused on genome-scale research at NIH, the U.S. Department of Energy, and the Howard Hughes Medical Institute for several years, and results are beginning to accumulate. There were three annual meetings on genome mapping and sequencing at Cold Spring Harbor, New York, in the spring of 1988, 1989, and 1990; the talks have shifted from a discussion about how to approach problems to presenting results from experiments already performed. We have finally begun to work rather than merely talk. The purpose of genome projects is to assemble data on the structure of DNA in human chromosomes and those of other organisms. A second goal is to develop new technologies to perform mapping and sequencing. There have been impressive technical advances in the past 5 years since the debate about the human genome project began. We are on the verge of beginning pilot projects to test several approaches to sequencing long stretches of DNA, using both automation and manual methods.
Ordered sets of yeast artificial chromosome and cosmid clones have been assembled to span more than 2 million base pairs of several human chromosomes, and a region of 10 million base pairs has been assembled for Caenorhabditis elegans by a collaboration between Washington University and the Medical Research Council laboratory in Cambridge, U.K. This project is now turning to sequencing C. elegans DNA as a logical extension of this work. These are but the first fruits of the genome project. There is much more to come.
Fueling the Future with Fungal Genomes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grigoriev, Igor V.

2014-10-27

Genomes of fungi relevant to energy and environment are the focus of the JGI Fungal Genomic Program. One of its projects, the Genomics Encyclopedia of Fungi, targets fungi related to plant health (symbionts and pathogens) and biorefinery processes (cellulose degradation and sugar fermentation) by means of genome sequencing and analysis. New chapters of the Encyclopedia can be opened with user proposals to the JGI Community Science Program (CSP). Another JGI project, the 1000 fungal genomes, explores fungal diversity at the genome level at scale and is open for users to nominate new species for sequencing. Over 400 fungal genomes have been sequenced by JGI to date and released through MycoCosm (www.jgi.doe.gov/fungi), a fungal web-portal, which integrates sequence and functional data with genome analysis tools for the user community. Sequence analysis supported by functional genomics will lead to developing a parts list for complex systems ranging from ecosystems of biofuel crops to biorefineries. Recent examples of such ‘parts’ suggested by comparative genomics and functional analysis in these areas are presented here.
GenBank.

PubMed

Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Sayers, Eric W

2011-01-01

GenBank® is a comprehensive database that contains publicly available nucleotide sequences for more than 380,000 organisms named at the genus level or lower, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system that integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage: www.ncbi.nlm.nih.gov.
A trace display and editing program for data from fluorescence based sequencing machines.

PubMed

Gleeson, T; Hillier, L

1991-12-11

'Ted' (Trace editor) is a graphical editor for sequence and trace data from automated fluorescence sequencing machines. It provides facilities for viewing sequence and trace data (in top or bottom strand orientation), for editing the base sequence, for automated or manual trimming of the head (vector) and tail (uncertain data) from the sequence, for vertical and horizontal trace scaling, for keeping a history of sequence editing, and for output of the edited sequence. Ted has been used extensively in the C. elegans genome sequencing project, both as a stand-alone program and integrated into the Staden sequence assembly package, and has greatly improved the efficiency and accuracy of sequence editing. It runs in the X windows environment on Sun workstations and is available from the authors. Ted currently supports sequence and trace data from the ABI 373A and Pharmacia A.L.F. sequencers.
Version Control in Project-Based Learning

ERIC Educational Resources Information Center

Milentijevic, Ivan; Ciric, Vladimir; Vojinovic, Oliver

2008-01-01

This paper deals with the development of a generalized model for version control systems application as a support in a range of project-based learning methods. The model is given as UML sequence diagram and described in detail. The proposed model encompasses a wide range of different project-based learning approaches by assigning a supervisory…
Pathogen Research Databases

Science.gov Websites

Hepatitis C Virus (HCV) database project is funded by the Division of Microbiology and Infectious Diseases of the National Institute of Allergies and Infectious Diseases (NIAID). The HCV database project started as a spin-off from the HIV database project. There are two databases for HCV, a sequence database
NHEXAS PHASE I ARIZONA STUDY--STANDARD OPERATING PROCEDURE FOR GENERAL LABORATORY TRAINING PLAN--BATTELLE (BCO-T-1.0)

EPA Science Inventory

This SOP describes the training sequence followed by each member of the technical staff at Battelle who participates in the NHEXAS project. The procedure is designed to provide them with an overview of the project in terms of project goals, structure, and laboratory requirements...
Opportunities and challenges for the integration of massively parallel genomic sequencing into clinical practice: lessons from the ClinSeq project.

PubMed

Biesecker, Leslie G

2012-04-01

The debate surrounding the return of results from high-throughput genomic interrogation encompasses many important issues including ethics, law, economics, and social policy. As well, the debate is also informed by the molecular, genetic, and clinical foundations of the emerging field of clinical genomics, which is based on this new technology. This article outlines the main biomedical considerations of sequencing technologies and demonstrates some of the early clinical experiences with the technology to enable the debate to stay focused on real-world practicalities. These experiences are based on early data from the ClinSeq project, which is a project to pilot the use of massively parallel sequencing in a clinical research context with a major aim to develop modes of returning results to individual subjects. The study has enrolled >900 subjects and generated exome sequence data on 572 subjects. These data are beginning to be interpreted and returned to the subjects, which provides examples of the potential usefulness and pitfalls of clinical genomics. There are numerous genetic results that can be readily derived from a genome including rare, high-penetrance traits, and carrier states. However, much work needs to be done to develop the tools and resources for genomic interpretation. The main lesson learned is that a genome sequence may be better considered as a health-care resource, rather than a test, one that can be interpreted and used over the lifetime of the patient.
Subsurface temperatures and geothermal gradients on the North Slope, Alaska

USGS Publications Warehouse

Collett, Timothy S.; Bird, Kenneth J.; Magoon, Leslie B.

1989-01-01

Geothermal gradients as interpreted from a series of high-resolution stabilized well-bore-temperature surveys from 46 North Slope, Alaska, wells vary laterally and vertically throughout the near-surface sediment (0-2,000 m). The data from these surveys have been used in conjunction with depths of ice-bearing permafrost, as interpreted from 102 well logs, to project geothermal gradients within and below the ice-bearing permafrost sequence. The geothermal gradients calculated from the projected temperature profiles are similar to the geothermal gradients measured in the temperature surveys. Measured and projected geothermal gradients in the ice-bearing permafrost sequence range from 1.5°C/100 m in the Prudhoe Bay area to 5.1°C/100 m in the National Petroleum Reserve in Alaska (NPRA).
MSeq-CNV: accurate detection of Copy Number Variation from Sequencing of Multiple samples.

PubMed

Malekpour, Seyed Amir; Pezeshk, Hamid; Sadeghi, Mehdi

2018-03-05

Currently a few tools are capable of detecting genome-wide Copy Number Variations (CNVs) based on sequencing of multiple samples. Although aberrations in mate pair insertion sizes provide additional hints for CNV detection based on multiple samples, the majority of the current tools rely only on the depth of coverage. Here, we propose a new algorithm (MSeq-CNV) which allows detecting common CNVs across multiple samples. MSeq-CNV applies a mixture density for modeling aberrations in depth of coverage and abnormalities in the mate pair insertion sizes. Each component in this mixture density applies a Binomial distribution for modeling the number of mate pairs with aberration in the insertion size and also a Poisson distribution for emitting the read counts, in each genomic position. MSeq-CNV is applied on simulated data and also on real data of six HapMap individuals with high-coverage sequencing in the 1000 Genomes Project. These individuals include a CEU trio of European ancestry and a YRI trio of Nigerian ethnicity. Ancestry of these individuals is studied by clustering the identified CNVs. MSeq-CNV is also applied for detecting CNVs in two samples with low-coverage sequencing in the 1000 Genomes Project and six samples from the Simons Genome Diversity Project.
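The per-position mixture described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the component weights, the Poisson rates and the Binomial probabilities below are all hypothetical, and the sketch covers only a single genomic position in a single sample. Each component multiplies a Poisson term for the read count by a Binomial term for the number of mate pairs with aberrant insertion size.

```python
import math


def poisson_logpmf(k, lam):
    """Log-probability of observing k reads under a Poisson with rate lam > 0."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def binomial_logpmf(k, n, p):
    """Log-probability of k aberrant mate pairs out of n, with 0 < p < 1."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1 - p)


def component_logterms(read_count, n_pairs, n_aberrant, components):
    """Log of weight * Poisson * Binomial for each (weight, lam, p) component."""
    return [
        math.log(w) + poisson_logpmf(read_count, lam)
        + binomial_logpmf(n_aberrant, n_pairs, p)
        for w, lam, p in components
    ]


def mixture_loglik(read_count, n_pairs, n_aberrant, components):
    """Log-likelihood of one position under the mixture (log-sum-exp for stability)."""
    terms = component_logterms(read_count, n_pairs, n_aberrant, components)
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def responsibilities(read_count, n_pairs, n_aberrant, components):
    """Posterior probability of each component given the observed counts."""
    terms = component_logterms(read_count, n_pairs, n_aberrant, components)
    total = mixture_loglik(read_count, n_pairs, n_aberrant, components)
    return [math.exp(t - total) for t in terms]


# Hypothetical two-state example: "normal" versus "duplicated" copy number.
components = [(0.5, 30.0, 0.05), (0.5, 60.0, 0.4)]
posterior = responsibilities(55, 20, 9, components)
```

With a high read count and many aberrant mate pairs, the second (hypothetical "duplicated") component receives most of the posterior weight, which is the kind of evidence combination the abstract describes.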
DHS Internship Final Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

House, Samantha

2014-09-01

This summer I worked on projects that involved RNA sequencing of pathogens after an infection of host cells. The goal of these projects was to continue developing pathogen enrichment strategies for transcriptomic analysis, and also to perform host-pathogen interaction studies.
The Sphagnome Project: enabling ecological and evolutionary insights through a genus-level sequencing project

DOE PAGES

Weston, David J.; Turetsky, Merritt R.; Johnson, Matthew G.; ...

2017-10-27

Considerable progress has been made in ecological and evolutionary genetics with studies demonstrating how genes underlying plant and microbial traits can influence adaptation and even ‘extend’ to influence community structure and ecosystem level processes. The progress in this area is limited to model systems with deep genetic and genomic resources that often have negligible ecological impact or interest. Therefore, important linkages between genetic adaptations and their consequences at organismal and ecological scales are often lacking. We introduce the Sphagnome Project, which incorporates genomics into a long-running history of Sphagnum research that has documented unparalleled contributions to peatland ecology, carbon sequestration, biogeochemistry, microbiome research, niche construction, and ecosystem engineering. The Sphagnome Project encompasses a genus-level sequencing effort that represents a new type of model system driven not only by genetic tractability, but by ecologically relevant questions and hypotheses.
The Sphagnome Project: enabling ecological and evolutionary insights through a genus-level sequencing project

DOE Office of Scientific and Technical Information (OSTI.GOV)

Weston, David J.; Turetsky, Merritt R.; Johnson, Matthew G.

Considerable progress has been made in ecological and evolutionary genetics with studies demonstrating how genes underlying plant and microbial traits can influence adaptation and even ‘extend’ to influence community structure and ecosystem level processes. The progress in this area is limited to model systems with deep genetic and genomic resources that often have negligible ecological impact or interest. Therefore, important linkages between genetic adaptations and their consequences at organismal and ecological scales are often lacking. We introduce the Sphagnome Project, which incorporates genomics into a long-running history of Sphagnum research that has documented unparalleled contributions to peatland ecology, carbon sequestration, biogeochemistry, microbiome research, niche construction, and ecosystem engineering. The Sphagnome Project encompasses a genus-level sequencing effort that represents a new type of model system driven not only by genetic tractability, but by ecologically relevant questions and hypotheses.
The Sphagnome Project: enabling ecological and evolutionary insights through a genus-level sequencing project.

PubMed

Weston, David J; Turetsky, Merritt R; Johnson, Matthew G; Granath, Gustaf; Lindo, Zoë; Belyea, Lisa R; Rice, Steven K; Hanson, David T; Engelhardt, Katharina A M; Schmutz, Jeremy; Dorrepaal, Ellen; Euskirchen, Eugénie S; Stenøien, Hans K; Szövényi, Péter; Jackson, Michelle; Piatkowski, Bryan T; Muchero, Wellington; Norby, Richard J; Kostka, Joel E; Glass, Jennifer B; Rydin, Håkan; Limpens, Juul; Tuittila, Eeva-Stiina; Ullrich, Kristian K; Carrell, Alyssa; Benscoter, Brian W; Chen, Jin-Gui; Oke, Tobi A; Nilsson, Mats B; Ranjan, Priya; Jacobson, Daniel; Lilleskov, Erik A; Clymo, R S; Shaw, A Jonathan

2018-01-01

Considerable progress has been made in ecological and evolutionary genetics with studies demonstrating how genes underlying plant and microbial traits can influence adaptation and even 'extend' to influence community structure and ecosystem level processes. Progress in this area is limited to model systems with deep genetic and genomic resources that often have negligible ecological impact or interest. Thus, important linkages between genetic adaptations and their consequences at organismal and ecological scales are often lacking. Here we introduce the Sphagnome Project, which incorporates genomics into a long-running history of Sphagnum research that has documented unparalleled contributions to peatland ecology, carbon sequestration, biogeochemistry, microbiome research, niche construction, and ecosystem engineering. The Sphagnome Project encompasses a genus-level sequencing effort that represents a new type of model system driven not only by genetic tractability, but by ecologically relevant questions and hypotheses. © 2017 UT-Battelle New Phytologist © 2017 New Phytologist Trust.
Datasets for evolutionary comparative genomics

PubMed Central

Liberles, David A

2005-01-01

Many decisions about genome sequencing projects are directed by perceived gaps in the tree of life, or towards model organisms. With the goal of a better understanding of biology through the lens of evolution, however, there are additional genomes that are worth sequencing. One such rationale for whole-genome sequencing is discussed here, along with other important strategies for understanding the phenotypic divergence of species. PMID:16086856
Sequencing three crocodilian genomes to illuminate the evolution of archosaurs and amniotes

PubMed Central

2012-01-01

The International Crocodilian Genomes Working Group (ICGWG) will sequence and assemble the American alligator (Alligator mississippiensis), saltwater crocodile (Crocodylus porosus) and Indian gharial (Gavialis gangeticus) genomes. The status of these projects and our planned analyses are described. PMID:22293439

A whole-genome, radiation hybrid map of wheat

USDA-ARS's Scientific Manuscript database

Generating a reference sequence of bread wheat (Triticum aestivum L.) is a challenging task because of its large, highly repetitive and allopolyploid genome. Ordering of BAC- and NGS-based contigs in ongoing wheat genome-sequencing projects primarily uses recombination and comparative genomics-base...
The human genome: Some assembly required. Final report

DOE Office of Scientific and Technical Information (OSTI.GOV)

NONE

1994-12-31

The Human Genome Project promises to be one of the most rewarding endeavors in modern biology. The cost and the ethical and social implications, however, have made this project the source of considerable debate both in the scientific community and in the public at large. The 1994 Graduate Student Symposium addresses the scientific merits of the project, the technical issues involved in accomplishing the task, as well as the medical and social issues which stem from the wealth of knowledge which the Human Genome Project will help create. To this end, speakers were brought together who represent the diverse areas of expertise characteristic of this multidisciplinary project. The keynote speaker addresses the project's motivations and goals in the larger context of biological and medical sciences. The first two sessions address relevant technical issues: data collection, with a focus on high-throughput sequencing methods, and data analysis, with an emphasis on identification of coding sequences. The third session explores recent advances in the understanding of genetic diseases and possible routes to treatment. Finally, the last session addresses some of the ethical, social and legal issues which will undoubtedly arise from having a detailed knowledge of the human genome.
Risk-based Prioritization of Facility Decommissioning and Environmental Restoration Projects in the National Nuclear Legacy Liabilities Program at the Chalk River Laboratory - 13564

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nelson, Jerel G.; Kruzic, Michael; Castillo, Carlos

2013-07-01

Chalk River Laboratory (CRL), located in Ontario, Canada, has a large number of remediation projects currently in the Nuclear Legacy Liabilities Program (NLLP), including hundreds of facility decommissioning projects and over one hundred environmental remediation projects, all to be executed over the next 70 years. Atomic Energy of Canada Limited (AECL) utilized WorleyParsons to prioritize the NLLP projects at the CRL through a risk-based prioritization and ranking process, using the WorleyParsons Sequencing Unit Prioritization and Estimating Risk Model (SUPERmodel). The prioritization project made use of the SUPERmodel, which has been previously used for other large-scale site prioritization and sequencing of facilities at nuclear laboratories in the United States. The process included development and vetting of risk parameter matrices as well as confirmation/validation of project risks. Detailed sensitivity studies were also conducted to understand the impacts that risk parameter weighting and scoring had on prioritization. The repeatable prioritization process yielded an objective, risk-based and technically defendable process for prioritization that gained concurrence from all stakeholders, including Natural Resources Canada (NRCan), which is responsible for the oversight of the NLLP. (authors)
On the constrained minimization of smooth Kurdyka—Łojasiewicz functions with the scaled gradient projection method

NASA Astrophysics Data System (ADS)

Prato, Marco; Bonettini, Silvia; Loris, Ignace; Porta, Federica; Rebegoldi, Simone

2016-10-01

The scaled gradient projection (SGP) method is a first-order optimization method applicable to the constrained minimization of smooth functions and exploiting a scaling matrix multiplying the gradient and a variable steplength parameter to improve the convergence of the scheme. For a general nonconvex function, the limit points of the sequence generated by SGP have been proved to be stationary, while in the convex case and with some restrictions on the choice of the scaling matrix the sequence itself converges to a constrained minimum point. In this paper we extend these convergence results by showing that the SGP sequence converges to a limit point provided that the objective function satisfies the Kurdyka-Łojasiewicz property at each point of its domain and its gradient is Lipschitz continuous.
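A minimal sketch of one common form of the SGP iteration may help make the abstract concrete. It assumes box constraints and a diagonal scaling matrix, in which case the scaled projection reduces to componentwise clipping, and it uses a fixed steplength with Armijo backtracking along the feasible direction. The function name `sgp`, its parameters, and the test problem are hypothetical and are not taken from the paper.

```python
def sgp(f, grad, x0, lower, upper, scale, alpha=1.0,
        beta=1e-4, shrink=0.5, max_iter=200, tol=1e-12):
    """Scaled gradient projection sketch for box-constrained minimization.

    scale(x) returns the diagonal of a positive scaling matrix D (assumption:
    diagonal D, so projecting onto the box is plain clipping).
    """
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        d_diag = scale(x)
        # Projected scaled-gradient point y = P(x - alpha * D * grad f(x)).
        y = [min(max(xi - alpha * di * gi, lo), hi)
             for xi, di, gi, lo, hi in zip(x, d_diag, g, lower, upper)]
        direction = [yi - xi for yi, xi in zip(y, x)]
        slope = sum(gi * di for gi, di in zip(g, direction))
        if slope > -tol:  # no further descent possible: stationary point
            break
        # Armijo backtracking on the segment between x and y.
        lam, fx = 1.0, f(x)
        while f([xi + lam * di for xi, di in zip(x, direction)]) > fx + beta * lam * slope:
            lam *= shrink
        x = [xi + lam * di for xi, di in zip(x, direction)]
    return x


# Hypothetical test problem: a separable quadratic whose unconstrained
# minimizer (1, -2) lies outside the box [0, 5] x [0, 5].
def f(x):
    return 0.5 * ((x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2)


def grad(x):
    return [x[0] - 1.0, 10.0 * (x[1] + 2.0)]


solution = sgp(f, grad, [3.0, 3.0], [0.0, 0.0], [5.0, 5.0],
               scale=lambda x: [1.0, 0.1])
```

Here the scaling is the inverse of the Hessian diagonal, which is one illustrative choice; the convergence results in the abstract concern more general scaling matrices and variable steplengths.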
Genomic Identification and Analysis of Shared Cis-regulator Elements in a Developmentally Critical homeobox Cluster

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chris Amemiya

2003-04-01

The goals of this project were to isolate, characterize, and sequence the Dlx3/Dlx7 bigene cluster from twelve different species of mammals. The Dlx3 and Dlx7 genes are known to encode homeobox transcription factors involved in patterning of structures in the vertebrate jaw as well as vertebrate limbs. Genomic sequences from the respective taxa will subsequently be compared in order to identify conserved non-coding sequences that are potential cis-regulatory elements. Based on the comparisons they will fashion transgenic mouse experiments to functionally test the strength of the potential cis-regulatory elements. A goal of the project is to attempt to identify those elements that may function in coordinately regulating both Dlx3 and Dlx7 functions.
Assembly of 500,000 inter-specific catfish expressed sequence tags and large scale gene-associated marker development for whole genome association studies

DOE Office of Scientific and Technical Information (OSTI.GOV)

Catfish Genome Consortium; Wang, Shaolin; Peatman, Eric

2010-03-23

Background: Through the Community Sequencing Program, a catfish EST sequencing project was carried out through a collaboration between the catfish research community and the Department of Energy's Joint Genome Institute. Prior to this project, only a limited EST resource from catfish was available for the purpose of SNP identification. Results: A total of 438,321 quality ESTs were generated from 8 channel catfish (Ictalurus punctatus) and 4 blue catfish (Ictalurus furcatus) libraries, bringing the number of catfish ESTs to nearly 500,000. Assembly of all catfish ESTs resulted in 45,306 contigs and 66,272 singletons. Over 35 percent of the unique sequences had significant similarities to known genes, allowing the identification of 14,776 unique genes in catfish. Over 300,000 putative SNPs have been identified, of which approximately 48,000 are high-quality SNPs identified from contigs with at least four sequences and the minor allele presence of at least two sequences in the contig. The EST resource should be valuable for identification of microsatellites, genome annotation, large-scale expression analysis, and comparative genome analysis. Conclusions: This project generated a large EST resource for catfish that captured the majority of the catfish transcriptome. The parallel analysis of ESTs from two closely related Ictalurid catfishes should also provide powerful means for the evaluation of ancient and recent gene duplications, and for the development of high-density microarrays in catfish. The inter- and intra-specific SNPs identified from the assembly of the full catfish EST dataset will greatly benefit the catfish introgression breeding program and whole genome association studies.
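The high-quality SNP criterion quoted in this abstract (at least four sequences, with the minor allele present in at least two) can be sketched as a simple filter over one alignment column. This is an illustration only: the function name and the representation of a contig column as a string of bases are assumptions, and depth is measured at the column rather than over the whole contig.

```python
from collections import Counter


def is_high_quality_snp(column, min_depth=4, min_minor=2):
    """Return True if an alignment column passes the depth/minor-allele filter.

    column: string with one base per aligned EST at this position
            (hypothetical representation; gaps and Ns are ignored).
    """
    bases = [b for b in column.upper() if b in "ACGT"]
    if len(bases) < min_depth:
        return False
    ranked = Counter(bases).most_common()
    if len(ranked) < 2:  # only one allele observed: not a SNP
        return False
    minor_count = ranked[1][1]  # count of the second most frequent allele
    return minor_count >= min_minor


examples = {
    "AAGG": is_high_quality_snp("AAGG"),    # depth 4, minor allele seen twice
    "AAAG": is_high_quality_snp("AAAG"),    # minor allele seen only once
    "AGG": is_high_quality_snp("AGG"),      # depth below four
    "AAAA": is_high_quality_snp("AAAA"),    # no second allele
}
```

Requiring the minor allele in at least two sequences is a guard against single-read sequencing errors being called as polymorphisms.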
Skate Genome Project: Cyber-Enabled Bioinformatics Collaboration

PubMed Central

Vincent, J.

2011-01-01

The Skate Genome Project, a pilot project of the North East Cyberinfrastructure Consortium (NECC), aims to produce a draft genome sequence of Leucoraja erinacea, the Little Skate. The pilot project was also designed to develop expertise in large-scale collaborations across the NECC region. An overview of the bioinformatics and infrastructure challenges faced during the first year of the project will be presented. Results to date and lessons learned from the perspective of a bioinformatics core will be highlighted.
The 1000 Genomes Project: data management and community access.

PubMed

Clarke, Laura; Zheng-Bradley, Xiangqun; Smith, Richard; Kulesha, Eugene; Xiao, Chunlin; Toneva, Iliana; Vaughan, Brendan; Preuss, Don; Leinonen, Rasko; Shumway, Martin; Sherry, Stephen; Flicek, Paul

2012-04-27

The 1000 Genomes Project was launched as one of the largest distributed data collection and analysis projects ever undertaken in biology. In addition to the primary scientific goals of creating both a deep catalog of human genetic variation and extensive methods to accurately discover and characterize variation using new sequencing technologies, the project makes all of its data publicly available. Members of the project data coordination center have developed and deployed several tools to enable widespread data access.
Comparison of next generation sequencing technologies for transcriptome characterization

PubMed Central

2009-01-01

Background We have developed a simulation approach to help determine the optimal mixture of sequencing methods for most complete and cost effective transcriptome sequencing. We compared simulation results for traditional capillary sequencing with "Next Generation" (NG) ultra high-throughput technologies. The simulation model was parameterized using mappings of 130,000 cDNA sequence reads to the Arabidopsis genome (NCBI Accession SRA008180.19). We also generated 454-GS20 sequences and de novo assemblies for the basal eudicot California poppy (Eschscholzia californica) and the magnoliid avocado (Persea americana) using a variety of methods for cDNA synthesis. Results The Arabidopsis reads tagged more than 15,000 genes, including new splice variants and extended UTR regions. Of the total 134,791 reads (13.8 MB), 119,518 (88.7%) mapped exactly to known exons, while 1,117 (0.8%) mapped to introns, 11,524 (8.6%) spanned annotated intron/exon boundaries, and 3,066 (2.3%) extended beyond the end of annotated UTRs. Sequence-based inference of relative gene expression levels correlated significantly with microarray data. As expected, NG sequencing of normalized libraries tagged more genes than non-normalized libraries, although non-normalized libraries yielded more full-length cDNA sequences. The Arabidopsis data were used to simulate additional rounds of NG and traditional EST sequencing, and various combinations of each. Our simulations suggest a combination of FLX and Solexa sequencing for optimal transcriptome coverage at modest cost. We have also developed ESTcalc http://fgp.huck.psu.edu/NG_Sims/ngsim.pl, an online webtool, which allows users to explore the results of this study by specifying individualized costs and sequencing characteristics. Conclusion NG sequencing technologies are a highly flexible set of platforms that can be scaled to suit different project goals. 
In terms of sequence coverage alone, the NG sequencing is a dramatic advance over capillary-based sequencing, but NG sequencing also presents significant challenges in assembly and sequence accuracy due to short read lengths, method-specific sequencing errors, and the absence of physical clones. These problems may be overcome by hybrid sequencing strategies using a mixture of sequencing methodologies, by new assemblers, and by sequencing more deeply. Sequencing and microarray outcomes from multiple experiments suggest that our simulator will be useful for guiding NG transcriptome sequencing projects in a wide range of organisms. PMID:19646272
HTSeq--a Python framework to work with high-throughput sequencing data.

PubMed

Anders, Simon; Pyl, Paul Theodor; Huber, Wolfgang

2015-01-15

A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed. We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes. HTSeq is released as an open-source software under the GNU General Public Licence and available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq. © The Author 2014. Published by Oxford University Press.
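The counting step that the abstract attributes to htseq-count, tallying reads by their overlap with genes, can be sketched in plain Python without the HTSeq library. The rule used here is a simplification chosen for illustration: a read is counted for a gene only if it overlaps exactly one gene, and is otherwise recorded as having no feature or as ambiguous. The function name, the data layout, and the category labels are assumptions, not the HTSeq API.

```python
from collections import Counter


def count_reads(genes, reads):
    """Count reads per gene by interval overlap.

    genes: dict mapping gene name -> (chrom, start, end), half-open coordinates
    reads: iterable of (chrom, start, end) alignments, half-open coordinates
    """
    counts = Counter()
    for chrom, start, end in reads:
        # Two half-open intervals overlap when each starts before the other ends.
        hits = {
            name
            for name, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and start < g_end and g_start < end
        }
        if len(hits) == 1:
            counts[hits.pop()] += 1
        elif not hits:
            counts["no_feature"] += 1
        else:
            counts["ambiguous"] += 1
    return counts


# Toy annotation and alignments (made-up coordinates).
genes = {
    "geneA": ("chr1", 100, 200),
    "geneB": ("chr1", 180, 300),
    "geneC": ("chr2", 50, 150),
}
reads = [
    ("chr1", 110, 140),  # inside geneA only
    ("chr1", 185, 195),  # overlaps geneA and geneB
    ("chr1", 250, 290),  # inside geneB only
    ("chr2", 10, 40),    # overlaps nothing
    ("chr2", 140, 170),  # overlaps the end of geneC
]
counts = count_reads(genes, reads)
```

A linear scan over all genes per read is adequate for a toy example; genome-scale data would need an indexed structure for coordinate queries, which is the kind of data structure the abstract says HTSeq provides.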
In vitro propagation of the microsporidian pathogen Brachiola algerae and studies of its chromosome and ribosomal DNA organization in the context of the complete genome sequencing project.

PubMed

Belkorchia, Abdel; Biderre, Corinne; Militon, Cécile; Polonais, Valérie; Wincker, Patrick; Jubin, Claire; Delbac, Frédéric; Peyretaillade, Eric; Peyret, Pierre

2008-03-01

Brachiola algerae has a broad host spectrum from human to mosquitoes. The successful infection of two mosquito cell lines (Mos55: embryonic cells and Sua 4.0: hemocyte-like cells) and a human cell line (HFF) highlights the efficient adaptive capacity of this microsporidian pathogen. The molecular karyotype of this microsporidian species was determined in the context of the B. algerae genome sequencing project, showing that its haploid genome consists of 30 chromosomal-sized DNAs ranging from 160 to 2240 kbp giving an estimated genome size of 23 Mbp. A contig of 12,269 bp including the DNA sequence of the B. algerae ribosomal transcription unit has been built from initial genomic sequences and the secondary structure of the large subunit rRNA constructed. The data obtained indicate that B. algerae should be an excellent parasitic model to understand genome evolution in relation to infectious capacity.
Mapping and Sequencing the Human Genome

DOE R&D Accomplishments Database

1988-01-01

Numerous meetings have been held and a debate has developed in the biological community over the merits of mapping and sequencing the human genome. In response, a committee to examine the desirability and feasibility of mapping and sequencing the human genome was formed to suggest options for implementing the project. The committee asked many questions. Should the analysis of the human genome be left entirely to the traditionally uncoordinated, but highly successful, support systems that fund the vast majority of biomedical research? Or should a more focused and coordinated additional support system be developed that is limited to encouraging and facilitating the mapping and eventual sequencing of the human genome? If so, how can this be done without distorting the broader goals of biological research that are crucial for any understanding of the data generated in such a human genome project? As the committee became better informed on the many relevant issues, the opinions of its members coalesced, producing a shared consensus of what should be done. This report reflects that consensus.
Pre-earthquake multiparameter analysis of the 2016 Amatrice-Norcia (Central Italy) seismic sequence: a case study for the application of the SAFE project concepts

NASA Astrophysics Data System (ADS)

De Santis, A.

2017-12-01

The SAFE (Swarm for Earthquake study) project (funded by the European Space Agency in the framework "STSE Swarm+Innovation", 2014-2016) aimed at applying the new approach of geosystemics to the analysis of Swarm satellite (ESA) electromagnetic data for investigating the preparatory phase of earthquakes. We present in this talk the case study of the most recent seismic sequence in Italy. First an M6 earthquake on 24 August 2016 and then an M6.5 earthquake on 30 October 2016 struck almost the same region of Central Italy, causing about 300 deaths in total (mostly on 24 August), with a revival of other significant seismicity in January 2017. Analysing both geophysical and climatological satellite and ground data preceding the major earthquakes of the sequence, we present results that confirm a complex solid earth-atmosphere coupling in the preparation phase of the whole sequence.
GAMES identifies and annotates mutations in next-generation sequencing projects.

PubMed

Sana, Maria Elena; Iascone, Maria; Marchetti, Daniela; Palatini, Jeff; Galasso, Marco; Volinia, Stefano

2011-01-01

Next-generation sequencing (NGS) methods have the potential to change the landscape of biomedical science, but at the same time pose several problems in analysis and interpretation. Currently, there are many commercial and public software packages that analyze NGS data. However, the limitations of these applications include output that is insufficiently annotated and difficult for end users to interpret functionally. We developed GAMES (Genomic Analysis of Mutations Extracted by Sequencing), a pipeline aiming to serve as an efficient middleman between data deluge and investigators. GAMES attains multiple levels of filtering and annotation, such as aligning the reads to a reference genome, performing quality control and mutational analysis, integrating results with genome annotations and sorting each mismatch/deletion according to a range of parameters. Variations are matched to known polymorphisms. The prediction of functional mutations is achieved by using different approaches. Overall, GAMES enables an effective complexity reduction in large-scale DNA-sequencing projects. GAMES is available free of charge to academic users and may be obtained from http://aqua.unife.it/GAMES.
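The filtering, sorting and polymorphism-matching steps named in this abstract can be illustrated with a minimal Python sketch. It is not the GAMES implementation: the `Variant` record, the `annotate` function, the depth and quality thresholds and the example data are all hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical record for one called mismatch or deletion."""
    chrom: str
    pos: int      # 1-based position on the reference
    ref: str
    alt: str      # "-" stands for a deletion in this sketch
    depth: int    # number of reads covering the site
    qual: float   # assumed Phred-like quality score


def annotate(variants, known, min_depth=10, min_qual=30.0):
    """Keep well-supported calls, sort them by quality (best first),
    and flag each one that matches a known polymorphism."""
    kept = [v for v in variants if v.depth >= min_depth and v.qual >= min_qual]
    kept.sort(key=lambda v: v.qual, reverse=True)
    return [(v, (v.chrom, v.pos, v.ref, v.alt) in known) for v in kept]


# Illustrative data only, not taken from the paper.
calls = [
    Variant("chr1", 1000, "A", "G", 25, 50.0),
    Variant("chr1", 2000, "C", "T", 5, 60.0),   # dropped: depth too low
    Variant("chr2", 300, "G", "-", 40, 90.0),
]
known_sites = {("chr1", 1000, "A", "G")}
result = annotate(calls, known_sites)
for variant, is_known in result:
    print(variant.chrom, variant.pos, variant.ref, variant.alt, is_known)
```

A set of tuples is used for the known sites so that each lookup is a constant-time membership test; a real pipeline would more likely read both inputs from standard variant files.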
BAC sequencing using pooled methods.

PubMed

Saski, Christopher A; Feltus, F Alex; Parida, Laxmi; Haiminen, Niina

2015-01-01

Shotgun sequencing and assembly of a large, complex genome can be expensive, and accurately reconstructing the true genome sequence is challenging. Repetitive DNA arrays, paralogous sequences, polyploidy, and heterozygosity are the main factors that plague de novo genome sequencing projects, which typically result in highly fragmented assemblies from which it is difficult to extract biological meaning. Targeted, sub-genomic sequencing offers complexity reduction by removing distal segments of the genome and a systematic mechanism for exploring prioritized genomic content through BAC sequencing. If one isolates and sequences the genome fraction that encodes the relevant biological information, then it is possible to reduce overall sequencing costs and efforts that target a genomic segment. This chapter describes the sub-genome assembly protocol for an organism based upon a BAC tiling path derived from a genome-scale physical map or from fine mapping using BACs to target sub-genomic regions. Methods that are described include BAC isolation and mapping, DNA sequencing, and sequence assembly.
An Undergraduate Two-Course Sequence in Biomedical Engineering Design: A Simulation of an Industrial Environment with Group and Individual Project Participation.

ERIC Educational Resources Information Center

Jendrucko, Richard J.

The first half of a Biomedical Engineering course at Texas A&M University is devoted to group projects that require design planning and a search of the literature. The second half requires each student to individually prepare a research proposal and conduct a research project. (MLH)
Textile Science Leader's Guide. 4-H Textile Science.

ERIC Educational Resources Information Center

Scholl, Jan

This instructor's guide provides an overview of 4-H student project modules in the textile sciences area. The guide includes short notes explaining how to use the project modules, a flowchart showing how the project areas are sequenced, a synopsis of the design and content of the modules, and some program planning tips. For each of the…
U.S.-MEXICO BORDER PROGRAM ARIZONA BORDER STUDY--STANDARD OPERATING PROCEDURE FOR GENERAL LABORATORY TRAINING PLAN (BCO-T-1.0)

EPA Science Inventory

This SOP describes the training sequence followed by each member of the technical staff at Battelle who participates in the project. The procedure is designed to provide them with an overview of the project in terms of project goals, structure, and laboratory requirements. This...
Synthesis of Two Local Anesthetics from Toluene: An Organic Multistep Synthesis in a Project-Oriented Laboratory Course

ERIC Educational Resources Information Center

Demare, Patricia; Regla, Ignacio

2012-01-01

This article describes one of the projects in the advanced undergraduate organic chemistry laboratory course concerning the synthesis of two local anesthetic drugs, prilocaine and benzocaine, with a common three-step sequence starting from toluene. Students undertake, in a several-week independent project, the multistep synthesis of a…
Metasecretome-selective phage display approach for mining the functional potential of a rumen microbial community.

PubMed

Ciric, Milica; Moon, Christina D; Leahy, Sinead C; Creevey, Christopher J; Altermann, Eric; Attwood, Graeme T; Rakonjac, Jasna; Gagic, Dragana

2014-05-12

In silico, secretome proteins can be predicted from completely sequenced genomes using various available algorithms that identify membrane-targeting sequences. For the metasecretome (the collection of surface, secreted and transmembrane proteins from environmental microbial communities) this approach is impractical, considering that the metasecretome open reading frames (ORFs) comprise only 10% to 30% of the total metagenome, and are poorly represented in the dataset due to the overall low coverage of the metagenomic gene pool, even in large-scale projects. By combining secretome-selective phage display and next-generation sequencing, we focused the sequence analysis of a complex rumen microbial community on the metasecretome component of the metagenome. This approach achieved high enrichment (29-fold) of secreted fibrolytic enzymes from the plant-adherent microbial community of the bovine rumen. In particular, we identified hundreds of heretofore rare modules belonging to cellulosomes, cell-surface complexes specialised for recognition and degradation of plant fibre. As a method, metasecretome phage display combined with next-generation sequencing has the power to sample the diversity of low-abundance surface and secreted proteins that would otherwise require exceptionally large metagenomic sequencing projects. As a resource, the metasecretome display library backed by the dataset obtained by next-generation sequencing is ready for i) affinity selection by standard phage display methodology and ii) easy purification of displayed proteins as part of the virion for individual functional analysis.

The contribution of the DNA microarray technology to gene expression profiling in Leishmania spp.: a retrospective.

PubMed

Alonso, Ana; Larraga, Vicente; Alcolea, Pedro J

2018-05-07

The first genome project of any living organism excluding viruses, that of the gammaproteobacterium Haemophilus influenzae, was completed in 1995. Until the last decade, genome sequencing was very tedious because genome survey sequences (GSS) and/or expressed sequence tags (ESTs) belonging to plasmid, cosmid and artificial chromosome genome libraries had to be sequenced and assembled in silico. Even now, no genome is truly completely assembled, because gaps and unassembled contigs always remain. However, most assemblies represent the whole genome of the organism of origin from a practical point of view. The first genome sequencing projects of trypanosomatid parasites, those of Leishmania major, Trypanosoma cruzi and T. brucei, were completed in 2005 following those strategies. The functional genomics era rapidly developed on the basis of microarray technology and has been evolving since. In the case of the genus Leishmania, substantial biological information about differentiation in the digenetic life cycle of the parasite has been obtained. Later on, next generation sequencing revolutionized genome sequencing and functional genomics, leading to more sensitive, accurate results while using far fewer resources. This new technology is more advantageous, but does not invalidate microarray results. In fact, promising vaccine candidates and drug targets have been found on the basis of microarray-based screening and preliminary proof-of-concept tests. Copyright © 2018. Published by Elsevier B.V.
The African Genome Variation Project shapes medical genetics in Africa

NASA Astrophysics Data System (ADS)

Gurdasani, Deepti; Carstensen, Tommy; Tekola-Ayele, Fasil; Pagani, Luca; Tachmazidou, Ioanna; Hatzikotoulas, Konstantinos; Karthikeyan, Savita; Iles, Louise; Pollard, Martin O.; Choudhury, Ananyo; Ritchie, Graham R. S.; Xue, Yali; Asimit, Jennifer; Nsubuga, Rebecca N.; Young, Elizabeth H.; Pomilla, Cristina; Kivinen, Katja; Rockett, Kirk; Kamali, Anatoli; Doumatey, Ayo P.; Asiki, Gershim; Seeley, Janet; Sisay-Joof, Fatoumatta; Jallow, Muminatou; Tollman, Stephen; Mekonnen, Ephrem; Ekong, Rosemary; Oljira, Tamiru; Bradman, Neil; Bojang, Kalifa; Ramsay, Michele; Adeyemo, Adebowale; Bekele, Endashaw; Motala, Ayesha; Norris, Shane A.; Pirie, Fraser; Kaleebu, Pontiano; Kwiatkowski, Dominic; Tyler-Smith, Chris; Rotimi, Charles; Zeggini, Eleftheria; Sandhu, Manjinder S.

2015-01-01

Given the importance of Africa to studies of human origins and disease susceptibility, detailed characterization of African genetic diversity is needed. The African Genome Variation Project provides a resource with which to design, implement and interpret genomic studies in sub-Saharan Africa and worldwide. The African Genome Variation Project represents dense genotypes from 1,481 individuals and whole-genome sequences from 320 individuals across sub-Saharan Africa. Using this resource, we find novel evidence of complex, regionally distinct hunter-gatherer and Eurasian admixture across sub-Saharan Africa. We identify new loci under selection, including loci related to malaria susceptibility and hypertension. We show that modern imputation panels (sets of reference genotypes from which unobserved or missing genotypes in study sets can be inferred) can identify association signals at highly differentiated loci across populations in sub-Saharan Africa. Using whole-genome sequencing, we demonstrate further improvements in imputation accuracy, strengthening the case for large-scale sequencing efforts of diverse African haplotypes. Finally, we present an efficient genotype array design capturing common genetic variation in Africa.
The African Genome Variation Project shapes medical genetics in Africa.

PubMed

Gurdasani, Deepti; Carstensen, Tommy; Tekola-Ayele, Fasil; Pagani, Luca; Tachmazidou, Ioanna; Hatzikotoulas, Konstantinos; Karthikeyan, Savita; Iles, Louise; Pollard, Martin O; Choudhury, Ananyo; Ritchie, Graham R S; Xue, Yali; Asimit, Jennifer; Nsubuga, Rebecca N; Young, Elizabeth H; Pomilla, Cristina; Kivinen, Katja; Rockett, Kirk; Kamali, Anatoli; Doumatey, Ayo P; Asiki, Gershim; Seeley, Janet; Sisay-Joof, Fatoumatta; Jallow, Muminatou; Tollman, Stephen; Mekonnen, Ephrem; Ekong, Rosemary; Oljira, Tamiru; Bradman, Neil; Bojang, Kalifa; Ramsay, Michele; Adeyemo, Adebowale; Bekele, Endashaw; Motala, Ayesha; Norris, Shane A; Pirie, Fraser; Kaleebu, Pontiano; Kwiatkowski, Dominic; Tyler-Smith, Chris; Rotimi, Charles; Zeggini, Eleftheria; Sandhu, Manjinder S

2015-01-15

Given the importance of Africa to studies of human origins and disease susceptibility, detailed characterization of African genetic diversity is needed. The African Genome Variation Project provides a resource with which to design, implement and interpret genomic studies in sub-Saharan Africa and worldwide. The African Genome Variation Project represents dense genotypes from 1,481 individuals and whole-genome sequences from 320 individuals across sub-Saharan Africa. Using this resource, we find novel evidence of complex, regionally distinct hunter-gatherer and Eurasian admixture across sub-Saharan Africa. We identify new loci under selection, including loci related to malaria susceptibility and hypertension. We show that modern imputation panels (sets of reference genotypes from which unobserved or missing genotypes in study sets can be inferred) can identify association signals at highly differentiated loci across populations in sub-Saharan Africa. Using whole-genome sequencing, we demonstrate further improvements in imputation accuracy, strengthening the case for large-scale sequencing efforts of diverse African haplotypes. Finally, we present an efficient genotype array design capturing common genetic variation in Africa.
Information on a Major New Initiative: Mapping and Sequencing the Human Genome (1986 DOE Memorandum)

DOE R&D Accomplishments Database

DeLisi, Charles (Associate Director, Health and Environmental Research, DOE Office of Energy Research)

1986-05-06

In the history of the Human Genome Program, Dr. Charles DeLisi and Dr. Alvin Trivelpiece of the Department of Energy (DOE) were instrumental in moving the seeds of the program forward. This May 1986 memo from DeLisi to Trivelpiece, Director of DOE's Office of Energy Research, documents this fact. Following the March 1986 Santa Fe workshop on the subject of mapping and sequencing the human genome, DeLisi's memo outlines workshop conclusions, explains the relevance of this project to DOE and the importance of the Department's laboratories and capabilities, notes the critical experience of DOE in managing projects of this scale and potential magnitude, and recognizes the fact that the project will impact biomedical science in ways which could not be fully anticipated at the time. Subsequently, program guidance was further sought from the DOE Health Effects Research Advisory Committee (HERAC) and the April 1987 HERAC report recommended that DOE and the nation commit to a large, multidisciplinary, scientific and technological undertaking to map and sequence the human genome.
MSLICE Sequencing

NASA Technical Reports Server (NTRS)

Crockett, Thomas M.; Joswig, Joseph C.; Shams, Khawaja S.; Norris, Jeffrey S.; Morris, John R.

2011-01-01

MSLICE Sequencing is a graphical tool for writing sequences and integrating them into RML files, as well as for producing SCMF files for uplink. When operated in a testbed environment, it also supports uplinking these SCMF files to the testbed via Chill. The software features a free-form textual sequence editor with syntax coloring and automatic content assistance (including command and argument completion proposals), complete with types, value ranges, units, and descriptions from the command dictionary that appear as they are typed. The sequence editor also has a "field mode" that allows tabbing between arguments and displays type/range/units/description for each argument as it is edited. Color-coded error and warning annotations on problematic tokens are included, as well as indications of problems that are not visible in the current scroll range. "Quick Fix" suggestions are made for resolving problems, and all the features afforded by modern source editors are also included, such as copy/cut/paste, undo/redo, and a sophisticated find-and-replace system optionally using regular expressions. The software offers a full XML editor for RML files, which features syntax coloring, content assistance and problem annotations as above. There is a form-based "detail view" that allows structured editing of command arguments and sequence parameters when preferred. The "project view" shows the user's "workspace" as a tree of "resources" (projects, folders, and files) that can subsequently be opened in editors by double-clicking. Files can be added, deleted, dragged-dropped/copied-pasted between folders or projects, and these operations are undoable and redoable. A "problems view" contains a tabular list of all problems in the current workspace. Double-clicking on any row in the table opens an editor for the appropriate sequence, scrolling to the specific line with the problem, and highlighting the problematic characters. From there, one can invoke "quick fix" as described above to resolve the issue. Once resolved, saving the file causes the problem to be removed from the problems view.
Identification and mapping of conserved ortholog set (COS) II sequences of cacao and their conversion to SNP markers for marker-assisted selection in Theobroma cacao and comparative genomics studies

USDA-ARS's Scientific Manuscript database

Theobroma cacao is a tree cultivated in the tropics around the world for its seeds that are the source of both chocolate and cocoa butter. The cacao genome sequencing project initiated as a collaboration between USDA, Mars, Inc. and IBM has generated a great deal of transcriptome and genome sequenc...
Early Detection of NSCLC Using Stromal Markers in Peripheral Blood

DTIC Science & Technology

2016-09-01

circulating myeloid cells, flow cytometry, RNA-sequencing, expression profiling. 3. ACCOMPLISHMENTS: What were the major goals of the project...Subtask 2: Flow cytometry sorting of circulating myeloid cells. Subtask 3: RNA-Sequencing Subtask 4: RNA-seq data analysis Subtask 5: Feasible RT-PCR...accomplished the patient recruitment, flow cytometry sorting of circulating myeloid cells, RNA-sequencing of the samples. During the RNA-seq data analysis, we
A new version of the RDP (Ribosomal Database Project)

NASA Technical Reports Server (NTRS)

Maidak, B. L.; Cole, J. R.; Parker, C. T. Jr; Garrity, G. M.; Larsen, N.; Li, B.; Lilburn, T. G.; McCaughey, M. J.; Olsen, G. J.; Overbeek, R.;

1999-01-01

The Ribosomal Database Project (RDP-II), previously described by Maidak et al. [Nucleic Acids Res. (1997), 25, 109-111], is now hosted by the Center for Microbial Ecology at Michigan State University. RDP-II is a curated database that offers ribosomal RNA (rRNA) nucleotide sequence data in aligned and unaligned forms, analysis services, and associated computer programs. During the past two years, data alignments have been updated and now include >9700 small subunit rRNA sequences. The recent development of an ObjectStore database will provide more rapid updating of data, better data accuracy and increased user access. RDP-II includes phylogenetically ordered alignments of rRNA sequences, derived phylogenetic trees, rRNA secondary structure diagrams, and various software programs for handling, analyzing and displaying alignments and trees. The data are available via anonymous ftp (ftp.cme.msu.edu) and WWW (http://www.cme.msu.edu/RDP). The WWW server provides ribosomal probe checking, approximate phylogenetic placement of user-submitted sequences, screening for possible chimeric rRNA sequences, automated alignment, and a suggested placement of an unknown sequence on an existing phylogenetic tree. Additional utilities also exist at RDP-II, including distance matrix, T-RFLP, and a Java-based viewer of the phylogenetic trees that can be used to create subtrees.

Genome Sequencing of Steroid Producing Bacteria Using Ion Torrent Technology and a Reference Genome.

PubMed

Sola-Landa, Alberto; Rodríguez-García, Antonio; Barreiro, Carlos; Pérez-Redondo, Rosario

2017-01-01

Next-Generation Sequencing technology has greatly eased bacterial genome sequencing, and several tens of thousands of genomes have been sequenced during the last 10 years. Most genome projects are published as draft versions; however, for certain applications the complete genome sequence is required. In this chapter, we describe the strategy that allowed the complete genome sequencing of Mycobacterium neoaurum NRRL B-3805, an industrial strain exploited for steroid production, using Ion Torrent sequencing reads and the genome of a close strain as the reference. This protocol can be applied to analyze the genetic variations between closely related strains; for example, to elucidate the point mutations between a parental strain and a random mutagenesis-derived mutant.
Brassica ASTRA: an integrated database for Brassica genomic research.

PubMed

Love, Christopher G; Robinson, Andrew J; Lim, Geraldine A C; Hopkins, Clare J; Batley, Jacqueline; Barker, Gary; Spangenberg, German C; Edwards, David

2005-01-01

Brassica ASTRA is a public database for genomic information on Brassica species. The database incorporates expressed sequences with Swiss-Prot and GenBank comparative sequence annotation as well as secondary Gene Ontology (GO) annotation derived from the comparison with Arabidopsis TAIR GO annotations. Simple sequence repeat molecular markers are identified within resident sequences and mapped onto the closely related Arabidopsis genome sequence. Bacterial artificial chromosome (BAC) end sequences derived from the Multinational Brassica Genome Project are also mapped onto the Arabidopsis genome sequence enabling users to identify candidate Brassica BACs corresponding to syntenic regions of Arabidopsis. This information is maintained in a MySQL database with a web interface providing the primary means of interrogation. The database is accessible at http://hornbill.cspp.latrobe.edu.au.
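To make the idea of identifying simple sequence repeats (SSRs) within sequences concrete, here is a minimal Python sketch based on regular-expression backreferences. It is an assumption-laden illustration, not the method used by Brassica ASTRA: the function name `find_ssrs`, the thresholds `min_repeats` and `max_unit`, and the example sequence are all hypothetical.

```python
import re


def find_ssrs(seq, min_repeats=4, max_unit=3):
    """Return (start, unit, repeat_count) for perfect tandem repeats.

    start is a 0-based offset into seq. Units that are themselves
    repeats of a shorter motif (for example "AA") are skipped so a
    mononucleotide run is not also reported as a dinucleotide repeat.
    """
    hits = []
    for k in range(1, max_unit + 1):
        # A k-base unit followed by at least (min_repeats - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if any(unit == unit[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), unit, len(m.group(0)) // k))
    return sorted(hits)


# Illustrative sequence only.
ssrs = find_ssrs("GGCATATATATCCGAAAAAT")
print(ssrs)
```

Only perfect repeats are detected here; dedicated SSR tools usually also handle imperfect and compound repeats and use motif-specific length thresholds.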
Mammalian genome projects reveal new growth hormone (GH) sequences. Characterization of the GH-encoding genes of armadillo (Dasypus novemcinctus), hedgehog (Erinaceus europaeus), bat (Myotis lucifugus), hyrax (Procavia capensis), shrew (Sorex araneus), ground squirrel (Spermophilus tridecemlineatus), elephant (Loxodonta africana), cat (Felis catus) and opossum (Monodelphis domestica).

PubMed

Wallis, Michael

2008-01-15

Mammalian growth hormone (GH) sequences have been shown previously to display episodic evolution: the sequence is generally strongly conserved but on at least two occasions during mammalian evolution (on lineages leading to higher primates and ruminants) bursts of rapid evolution occurred. However, the number of mammalian orders studied previously has been relatively limited, and the availability of sequence data via mammalian genome projects provides the potential for extending the range of GH gene sequences examined. Complete or nearly complete GH gene sequences for six mammalian species for which no data were previously available have been extracted from the genome databases: Dasypus novemcinctus (nine-banded armadillo), Erinaceus europaeus (western European hedgehog), Myotis lucifugus (little brown bat), Procavia capensis (cape rock hyrax), Sorex araneus (European shrew), Spermophilus tridecemlineatus (13-lined ground squirrel). In addition, incomplete data for several other species have been extended. Examination of the data in detail and comparison with previously available sequences has allowed assessment of the reliability of deduced sequences. Several of the new sequences differ substantially from the consensus sequence previously determined for eutherian GHs, indicating greater variability than previously recognised, and confirming the episodic pattern of evolution. The episodic pattern is not seen for signal sequences, 5' upstream sequence or synonymous substitutions; it is specific to the mature protein sequence, suggesting that it relates to the hormonal function. The substitutions accumulated during the course of GH evolution have occurred mainly on the side of the hormone facing away from the receptor, in a non-random fashion, and it is suggested that this may reflect interaction of the receptor-bound hormone with other proteins or small ligands.
Middle Level SS&C Energy Series.

ERIC Educational Resources Information Center

Crow, Linda W.; Aldridge, Bill G.

The project on Scope Sequence and Coordination of Secondary School Science (SS&C) was initiated by the National Science Teachers Association (NSTA) and recommends that all students study science every year and advocates carefully sequenced, well-coordinated instruction in biology, chemistry, earth/space science, and physics. This document…
Evaluation of ribosomal RNA removal protocols for Salmonella RNA-Seq projects

USDA-ARS's Scientific Manuscript database

Next generation sequencing is a powerful technology and its application to sequencing entire RNA populations of food-borne pathogens will provide valuable insights. A problem unique to prokaryotic RNA-Seq is the massive abundance of ribosomal RNA. Unlike eukaryotic messenger RNA (mRNA), bacterial ...
Gene sequences present in Citrullus sp. having been lost during domestication of watermelon

USDA-ARS's Scientific Manuscript database

A wide genetic diversity exists among Citrullus species, while watermelon cultivars (Citrullus lanatus var. lanatus) share a narrow genetic base as a result of many years of domestication and selection for desirable fruit qualities. The recent international watermelon genome sequencing project reve...
Sustainable Design of EPA's Campus in Research Triangle Park, NC—Environmental Performance Specifications in Construction Contracts—Section 01450 Sequence of Finishes Installation

EPA Pesticide Factsheets

Learn more about the special construction scheduling/sequencing requirements and procedures necessary to assure achievement of designed Indoor Air Quality (IAQ) levels for the completed project required by the EPA IAQ Program.
Deep whole-genome sequencing of 100 southeast Asian Malays.

PubMed

Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

2013-01-10

Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with larger number of individuals for common SNPs. For lower-frequency (<5%) markers, a larger number of individuals might have to be whole-genome sequenced so that the accuracy currently afforded by the 1KGP can be achieved. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
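As a rough guide to what a figure such as 30× implies, mean sequencing depth is commonly approximated as (number of reads × read length) / genome size. The Python sketch below applies that relation. The read length of 100 bases and the round genome size of 3 Gb are illustrative assumptions, not values reported by the study, and the abstract's "minimum of 30×" is a stricter criterion than the mean depth computed here.

```python
import math


def mean_coverage(n_reads, read_len, genome_size):
    """Average depth: total sequenced bases divided by genome length."""
    return n_reads * read_len / genome_size


def reads_needed(target_cov, read_len, genome_size):
    """Smallest number of reads whose mean depth reaches target_cov."""
    return math.ceil(target_cov * genome_size / read_len)


# Assumed round numbers for illustration only.
GENOME_SIZE = 3_000_000_000   # about 3 Gb, roughly human-sized
READ_LEN = 100                # bases per read

n = reads_needed(30, READ_LEN, GENOME_SIZE)
print(n, mean_coverage(n, READ_LEN, GENOME_SIZE))
```

Because depth varies along the genome, reaching a given minimum depth at most sites generally requires a mean depth above that minimum.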
Deep Whole-Genome Sequencing of 100 Southeast Asian Malays

PubMed Central

Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

2013-01-01

Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with larger number of individuals for common SNPs. For lower-frequency (<5%) markers, a larger number of individuals might have to be whole-genome sequenced so that the accuracy currently afforded by the 1KGP can be achieved. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies. PMID:23290073
Social network analysis of a project-based introductory physics course

NASA Astrophysics Data System (ADS)

Oakley, Christopher

2016-03-01

Research suggests that students benefit from peer interaction and active engagement in the classroom. The quality, nature, and effect of these interactions are currently being explored by Physics Education Researchers. Spelman College offers an introductory physics sequence that addresses content and research skills by engaging students in open-ended research projects, a form of Project-Based Learning. Students have been surveyed at regular intervals during the second semester of the trigonometry-based course to determine the frequency of interactions in and out of class. These interactions can be with current or past students, tutors, and instructors. This line of inquiry focuses on metrics of Social Network analysis, such as centrality of participants as well as segmentation of groups. Further research will refine and highlight deeper questions regarding student performance in this pedagogy and course sequence.
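One of the simplest centrality metrics of the kind mentioned here is degree centrality: the fraction of the other participants with whom a given participant interacts. The Python sketch below computes it from a list of reported interactions. The abstract does not say which centrality measure the study used, so this choice, the function name `degree_centrality` and the example data are assumptions made for illustration.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Map each participant to degree / (n - 1) for an undirected network.

    edges is an iterable of (a, b) pairs, one per reported interaction;
    repeated pairs and self-interactions do not add to the degree.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


# Hypothetical survey responses: who reported working with whom.
interactions = [("s1", "s2"), ("s1", "s3"), ("s1", "tutor"), ("s2", "s3")]
centrality = degree_centrality(interactions)
for person, score in sorted(centrality.items()):
    print(person, round(score, 2))
```

In this toy network the student labelled s1 interacts with everyone else and so scores 1.0, while the tutor, connected only to s1, scores 1/3.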
The Qatar genome project: translation of whole-genome sequencing into clinical practice.

PubMed

Zayed, Hatem

2016-10-01

The Qatar Genome Project was launched in 2013 with the intent to sequence the genome of each Qatari citizen in an effort to protect Qataris from the high rate of indigenous genetic diseases by allowing the mapping of disease-causing variants/rare variants and establishing a Qatari reference genome. Indeed, this project is expected to have numerous global benefits because of the elevated homogeneity of the Qatari population, which will make Qatar an excellent genetic laboratory that will generate a wealth of data. These data will allow us to make sense of the genotype-phenotype correlations of many diseases, especially the complex multifactorial diseases, and will pave the way for changing the traditional medical practice of looking first at the phenotype rather than the genotype. © 2016 John Wiley & Sons Ltd.
DOE Joint Genome Institute 2008 Progress Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gilbert, David

2009-03-12

While initially a virtual institute, the driving force behind the creation of the DOE Joint Genome Institute in Walnut Creek, California in the Fall of 1999 was the Department of Energy's commitment to sequencing the human genome. With the publication in 2004 of a trio of manuscripts describing the finished 'DOE Human Chromosomes', the Institute successfully completed its human genome mission. In the time between the creation of the Department of Energy Joint Genome Institute (DOE JGI) and completion of the Human Genome Project, sequencing and its role in biology spread to fields extending far beyond what could be imagined when the Human Genome Project first began. Accordingly, the targets of the DOE JGI's sequencing activities changed, moving from a single human genome to the genomes of large numbers of microbes, plants, and other organisms, and the community of users of DOE JGI data similarly expanded and diversified. Transitioning into operating as a user facility, the DOE JGI modeled itself after other DOE user facilities, such as synchrotron light sources and supercomputer facilities, empowering the science of large numbers of investigators working in areas of relevance to energy and the environment. The JGI's approach to being a user facility is based on the concept that by focusing state-of-the-art sequencing and analysis capabilities on the best peer-reviewed ideas drawn from a broad community of scientists, the DOE JGI will effectively encourage creative approaches to DOE mission areas and produce important science. This clearly has occurred, only partially reflected in the fact that the DOE JGI has played a major role in more than 45 papers published in just the past three years alone in Nature and Science. The involvement of a large and engaged community of users working on important problems has helped maximize the impact of JGI science. A seismic technological change is presently underway at the JGI. 
The Sanger capillary-based sequencing process that dominated how sequencing was done in the last decade is being replaced by a variety of new processes and sequencing instruments. The JGI, with an increasing number of next-generation sequencers, whose throughput is 100- to 1,000-fold greater than the Sanger capillary-based sequencers, is increasingly focused in new directions on projects of scale and complexity not previously attempted. These new directions for the JGI come, in part, from the 2008 National Research Council report on the goals of the National Plant Genome Initiative as well as the 2007 National Research Council report on the New Science of Metagenomics. Both reports outline a crucial need for systematic large-scale surveys of the plant and microbial components of the biosphere as well as an increasing need for large-scale analysis capabilities to meet the challenge of converting sequence data into knowledge. The JGI is extensively discussed in both reports as vital to progress in these fields of major national interest. JGI's future plan for plants and microbes includes a systematic approach for investigation of these organisms at a scale requiring the special capabilities of the JGI to generate, manage, and analyze the datasets. JGI will generate and provide not only community access to these plant and microbial datasets, but also the tools for analyzing them. These activities will produce essential knowledge that will be needed if we are to be able to respond to the world's energy and environmental challenges. As the JGI Plant and Microbial programs advance, the JGI as a user facility is also evolving. The Institute has been highly successful in bending its technical and analytical skills to help users solve large complex problems of major importance, and that effort will continue unabated. 
The JGI will increasingly move from a central focus on 'one-off' user projects coming from small user communities to much larger scale projects driven by systematic and problem-focused approaches to selection of sequencing targets. Entire communities of scientists working in a particular field, such as feedstock improvement or biomass degradation, will be users of this information. Despite this new emphasis, an investigator-initiated user program will remain. This program in the future will replace small projects that increasingly can be accomplished without the involvement of JGI, with imaginative large-scale 'Grand Challenge' projects of foundational relevance to energy and the environment that require a new scale of sequencing and analysis capabilities. Close interactions with the DOE Bioenergy Research Centers, and with other DOE institutions that may follow, will also play a major role in shaping aspects of how the JGI operates as a user facility. Based on increased availability of high-throughput sequencing, the JGI will increasingly provide to users, in addition to DNA sequencing, an array of both pre- and post-sequencing value-added capabilities to accelerate their science.

Sequence-of-events-driven automation of the deep space network

NASA Technical Reports Server (NTRS)

Hill, R., Jr.; Fayyad, K.; Smyth, C.; Santos, T.; Chen, R.; Chien, S.; Bevan, R.

1996-01-01

In February 1995, sequence-of-events (SOE)-driven automation technology was demonstrated for a Voyager telemetry downlink track at DSS 13. This demonstration entailed automated generation of an operations procedure (in the form of a temporal dependency network) from project SOE information using artificial intelligence planning technology and automated execution of the temporal dependency network using the link monitor and control operator assistant system. This article describes the overall approach to SOE-driven automation that was demonstrated, identifies gaps in SOE definitions and project profiles that hamper automation, and provides detailed measurements of the knowledge engineering effort required for automation.
The FDA's Experience with Emerging Genomics Technologies-Past, Present, and Future.

PubMed

Xu, Joshua; Thakkar, Shraddha; Gong, Binsheng; Tong, Weida

2016-07-01

The rapid advancement of emerging genomics technologies and their application for assessing safety and efficacy of FDA-regulated products require a high standard of reliability and robustness supporting regulatory decision-making in the FDA. To facilitate the regulatory application, the FDA implemented a novel data submission program, Voluntary Genomics Data Submission (VGDS), and also to engage the stakeholders. As part of the endeavor, for the past 10 years, the FDA has led an international consortium of regulatory agencies, academia, pharmaceutical companies, and genomics platform providers, which was named MicroArray Quality Control Consortium (MAQC), to address issues such as reproducibility, precision, specificity/sensitivity, and data interpretation. Three projects have been completed so far assessing these genomics technologies: gene expression microarrays, whole genome genotyping arrays, and whole transcriptome sequencing (i.e., RNA-seq). The resultant studies provide the basic parameters for fit-for-purpose application of these new data streams in regulatory environments, and the solutions have been made available to the public through peer-reviewed publications. The latest MAQC project is also called the SEquencing Quality Control (SEQC) project focused on next-generation sequencing. Using reference samples with built-in controls, SEQC studies have demonstrated that relative gene expression can be measured accurately and reliably across laboratories and RNA-seq platforms. Besides prediction performance comparable to microarrays in clinical settings and safety assessments, RNA-seq is shown to have better sensitivity for low expression and reveal novel transcriptomic features. Future effort of MAQC will be focused on quality control of whole genome sequencing and targeted sequencing.
The FDA’s Experience with Emerging Genomics Technologies—Past, Present, and Future

PubMed Central

Xu, Joshua; Thakkar, Shraddha; Gong, Binsheng; Tong, Weida

2016-01-01

The rapid advancement of emerging genomics technologies and their application for assessing safety and efficacy of FDA-regulated products require a high standard of reliability and robustness supporting regulatory decision-making in the FDA. To facilitate the regulatory application, the FDA implemented a novel data submission program, Voluntary Genomics Data Submission (VGDS), and also to engage the stakeholders. As part of the endeavor, for the past 10 years, the FDA has led an international consortium of regulatory agencies, academia, pharmaceutical companies, and genomics platform providers, which was named MicroArray Quality Control Consortium (MAQC), to address issues such as reproducibility, precision, specificity/sensitivity, and data interpretation. Three projects have been completed so far assessing these genomics technologies: gene expression microarrays, whole genome genotyping arrays, and whole transcriptome sequencing (i.e., RNA-seq). The resultant studies provide the basic parameters for fit-for-purpose application of these new data streams in regulatory environments, and the solutions have been made available to the public through peer-reviewed publications. The latest MAQC project is also called the SEquencing Quality Control (SEQC) project focused on next-generation sequencing. Using reference samples with built-in controls, SEQC studies have demonstrated that relative gene expression can be measured accurately and reliably across laboratories and RNA-seq platforms. Besides prediction performance comparable to microarrays in clinical settings and safety assessments, RNA-seq is shown to have better sensitivity for low expression and reveal novel transcriptomic features. Future effort of MAQC will be focused on quality control of whole genome sequencing and targeted sequencing. PMID:27116022
Insights into the genetic structure and diversity of 38 South Asian Indians from deep whole-genome sequencing.

PubMed

Wong, Lai-Ping; Lai, Jason Kuan-Han; Saw, Woei-Yuh; Ong, Rick Twee-Hee; Cheng, Anthony Youzhi; Pillai, Nisha Esakimuthu; Liu, Xuanyao; Xu, Wenting; Chen, Peng; Foo, Jia-Nee; Tan, Linda Wei-Lin; Koo, Seok-Hwee; Soong, Richie; Wenk, Markus Rene; Lim, Wei-Yen; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

2014-05-01

South Asia possesses a significant amount of genetic diversity due to considerable intergroup differences in culture and language. There have been numerous reports on the genetic structure of Asian Indians, although these have mostly relied on genotyping microarrays or targeted sequencing of the mitochondria and Y chromosomes. Asian Indians in Singapore are primarily descendants of immigrants from Dravidian-language-speaking states in south India, and 38 individuals from the general population underwent deep whole-genome sequencing with a target coverage of 30X as part of the Singapore Sequencing Indian Project (SSIP). The genetic structure and diversity of these samples were compared against samples from the Singapore Sequencing Malay Project and populations in Phase 1 of the 1,000 Genomes Project (1 KGP). SSIP samples exhibited greater intra-population genetic diversity and possessed higher heterozygous-to-homozygous genotype ratio than other Asian populations. When compared against a panel of well-defined Asian Indians, the genetic makeup of the SSIP samples was closely related to South Indians. However, even though the SSIP samples clustered distinctly from the Europeans in the global population structure analysis with autosomal SNPs, eight samples were assigned to mitochondrial haplogroups that were predominantly present in Europeans and possessed higher European admixture than the remaining samples. An analysis of the relative relatedness between SSIP with two archaic hominins (Denisovan, Neanderthal) identified higher ancient admixture in East Asian populations than in SSIP. The data resource for these samples is publicly available and is expected to serve as a valuable complement to the South Asian samples in Phase 3 of 1 KGP.
A global reference for human genetic variation

PubMed Central

2016-01-01

The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies. PMID:26432245
PHYSICO2: an UNIX based standalone procedure for computation of physicochemical, window-dependent and substitution based evolutionary properties of protein sequences along with automated block preparation tool, version 2.

PubMed

Banerjee, Shyamashree; Gupta, Parth Sarthi Sen; Nayek, Arnab; Das, Sunit; Sur, Vishma Pratap; Seth, Pratyay; Islam, Rifat Nawaz Ul; Bandyopadhyay, Amal K

2015-01-01

Automated genome sequencing is enriching the sequence database very fast. To achieve a balance between the entry of sequences in the database and their analyses, efficient software is required. To this end, PHYSICO2, compared to the earlier PHYSICO and other public domain tools, is most efficient in that it (i) extracts physicochemical, window-dependent and homologous-position-based-substitution (PWS) properties including positional and BLOCK-specific diversity and conservation, (ii) provides users with optional flexibility in setting relevant input parameters, (iii) helps users to prepare a BLOCK-FASTA file by the use of the Automated Block Preparation Tool of the program, (iv) performs fast, accurate and user-friendly analyses and (v) redirects itemized outputs in Excel format along with detailed methodology. The program package contains documentation describing application of methods. Overall the program acts as an efficient PWS-analyzer and finds application in sequence-bioinformatics. PHYSICO2 is freely available at http://sourceforge.net/projects/physico2/ along with its documentation at https://sourceforge.net/projects/physico2/files/Documentation.pdf/download for all users.
PHYSICO2: an UNIX based standalone procedure for computation of physicochemical, window-dependent and substitution based evolutionary properties of protein sequences along with automated block preparation tool, version 2

PubMed Central

Banerjee, Shyamashree; Gupta, Parth Sarthi Sen; Nayek, Arnab; Das, Sunit; Sur, Vishma Pratap; Seth, Pratyay; Islam, Rifat Nawaz Ul; Bandyopadhyay, Amal K

2015-01-01

Automated genome sequencing is enriching the sequence database very fast. To achieve a balance between the entry of sequences in the database and their analyses, efficient software is required. To this end, PHYSICO2, compared to the earlier PHYSICO and other public domain tools, is most efficient in that it (i) extracts physicochemical, window-dependent and homologous-position-based-substitution (PWS) properties including positional and BLOCK-specific diversity and conservation, (ii) provides users with optional flexibility in setting relevant input parameters, (iii) helps users to prepare a BLOCK-FASTA file by the use of the Automated Block Preparation Tool of the program, (iv) performs fast, accurate and user-friendly analyses and (v) redirects itemized outputs in Excel format along with detailed methodology. The program package contains documentation describing application of methods. Overall the program acts as an efficient PWS-analyzer and finds application in sequence-bioinformatics. Availability: PHYSICO2 is freely available at http://sourceforge.net/projects/physico2/ along with its documentation at https://sourceforge.net/projects/physico2/files/Documentation.pdf/download for all users. PMID:26339154
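To illustrate what a "window-dependent" property of a protein sequence can look like, the following is a minimal sketch that averages a per-residue scale over a sliding window. It is not PHYSICO2's implementation: the record does not give the program's property set or window definition, so the Kyte-Doolittle hydropathy scale, the function name `window_average`, and the default window of 5 are assumptions chosen only for illustration.

```python
# Kyte-Doolittle hydropathy values per amino acid (illustrative scale;
# the PHYSICO2 record does not say which scales the program uses).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_average(sequence, scale, window=5):
    """Return the mean scale value for every window of `window` residues.

    Hypothetical helper: one value per window start position, so a
    sequence of length n yields n - window + 1 values.
    """
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    values = [scale[residue] for residue in sequence.upper()]
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    # Print the windowed hydropathy profile of a short made-up peptide.
    for position, value in enumerate(window_average("MKTLLVAGG", KYTE_DOOLITTLE, 3)):
        print(position, round(value, 2))
```

Any other per-residue scale can be passed in place of `KYTE_DOOLITTLE`; a residue missing from the scale raises a `KeyError` rather than being silently skipped.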
Human action classification using procrustes shape theory

NASA Astrophysics Data System (ADS)

Cho, Wanhyun; Kim, Sangkyoon; Park, Soonyoung; Lee, Myungeun

2015-02-01

In this paper, we propose a new method that can classify a human action using Procrustes shape theory. First, we extract a pre-shape configuration vector of landmarks from each frame of an image sequence representing an arbitrary human action, and then we derive the Procrustes fit vector for the pre-shape configuration vector. Second, we extract a set of pre-shape vectors from training samples stored in a database, and we compute a Procrustes mean shape vector for these pre-shape vectors. Third, we extract a sequence of the pre-shape vectors from the input video, and we project this sequence of pre-shape vectors onto the tangent space, taking as the pole a sequence of mean shape vectors corresponding to a target video. We then calculate the Procrustes distance between the sequence of projected pre-shape vectors on the tangent space and the sequence of mean shape vectors. Finally, we classify the input video into the human action class with the minimum Procrustes distance. We assess the performance of the proposed method using one public dataset, namely the Weizmann human action dataset. Experimental results reveal that the proposed method performs very well on this dataset.
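The core quantities in this abstract, a pre-shape and a Procrustes distance, can be sketched for a single frame as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: it treats 2-D landmarks as complex numbers, uses the standard full Procrustes distance between two pre-shapes, and omits the tangent-space projection and the per-sequence comparison. The names `pre_shape`, `procrustes_distance` and `classify` are hypothetical.

```python
import math


def pre_shape(landmarks):
    """Center 2-D landmarks and scale them to unit size.

    Landmarks are (x, y) pairs; the result is a list of complex numbers
    with zero mean and unit Euclidean norm (translation and scale removed).
    """
    points = [complex(x, y) for x, y in landmarks]
    centroid = sum(points) / len(points)
    centered = [p - centroid for p in points]
    size = math.sqrt(sum(abs(p) ** 2 for p in centered))
    if size == 0:
        raise ValueError("all landmarks coincide")
    return [p / size for p in centered]


def procrustes_distance(shape_a, shape_b):
    """Full Procrustes distance between two landmark configurations.

    For unit-size complex pre-shapes z and w this is
    sqrt(1 - |<z, w>|^2), which is invariant to rotation as well.
    """
    if len(shape_a) != len(shape_b):
        raise ValueError("configurations need the same number of landmarks")
    z, w = pre_shape(shape_a), pre_shape(shape_b)
    inner = sum(p.conjugate() * q for p, q in zip(z, w))
    return math.sqrt(max(0.0, 1.0 - abs(inner) ** 2))


def classify(shape, class_means):
    """Return the label of the mean shape closest to `shape`."""
    return min(
        class_means,
        key=lambda label: procrustes_distance(shape, class_means[label]),
    )


if __name__ == "__main__":
    # Made-up mean shapes for two made-up classes (three landmarks each).
    means = {
        "triangle": [(0, 0), (1, 0), (0, 1)],
        "flat": [(0, 0), (1, 0), (2, 0.1)],
    }
    # A rotated, scaled and shifted copy of the triangle.
    query = [(3, 5), (3, 7), (1, 5)]
    print(classify(query, means))
```

Because the distance ignores translation, scale and rotation, the query above is matched to the class whose mean has the same shape, whatever its position or size in the frame.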
MIPS: analysis and annotation of proteins from whole genomes

PubMed Central

Mewes, H. W.; Amid, C.; Arnold, R.; Frishman, D.; Güldener, U.; Mannhaupt, G.; Münsterkötter, M.; Pagel, P.; Strack, N.; Stümpflen, V.; Warfsmann, J.; Ruepp, A.

2004-01-01

The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein–protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de). PMID:14681354
MIPS: analysis and annotation of proteins from whole genomes.

PubMed

Mewes, H W; Amid, C; Arnold, R; Frishman, D; Güldener, U; Mannhaupt, G; Münsterkötter, M; Pagel, P; Strack, N; Stümpflen, V; Warfsmann, J; Ruepp, A

2004-01-01

The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein-protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).
Mapping and Sequencing the Human Genome: Science, Ethics, and Public Policy.

ERIC Educational Resources Information Center

Cutter, Mary Ann G.; Drexler, Edward; McCullough, Laurence B.; McInerney, Joseph D.; Murray, Jeffrey C.; Rossiter, Belinda; Zola, John

The human genome project started in 1989 with the collaboration of the National Institutes of Health (NIH) and the U.S. Department of Energy (DOE). This document aims to develop an understanding among students of the human genome project and relevant issues. Topics include the science and technology of the human genome project, and the ethical and…
Ethical considerations of research policy for personal genome analysis: the approach of the Genome Science Project in Japan.

PubMed

Minari, Jusaku; Shirai, Tetsuya; Kato, Kazuto

2014-12-01

As evidenced by high-throughput sequencers, genomic technologies have recently undergone radical advances. These technologies enable comprehensive sequencing of personal genomes considerably more efficiently and less expensively than heretofore. These developments present a challenge to the conventional framework of biomedical ethics; under these changing circumstances, each research project has to develop a pragmatic research policy. Based on the experience with a new large-scale project, the Genome Science Project, this article presents a novel approach to conducting a specific policy for personal genome research in the Japanese context. In creating an original informed-consent form template for the project, we present a two-tiered process: making the draft of the template following an analysis of national and international policies; refining the draft template in conjunction with genome project researchers for practical application. Through practical use of the template, we have gained valuable experience in addressing challenges in the ethical review process, such as the importance of sharing details of the latest developments in genomics with members of research ethics committees. We discuss certain limitations of the conventional concept of informed consent and its governance system and suggest the potential of an alternative process using information technology.
Interchangeable Positions in Interaction Sequences in Science Classrooms

ERIC Educational Resources Information Center

Rees, Carol; Roth, Wolff-Michael

2017-01-01

Triadic dialogue, the Initiation, Response, Evaluation sequence typical of teacher/student interactions in classrooms, has long been identified as a barrier to students' access to learning, including science learning. A large body of research on the subject has over the years led to projects and policies aimed at increasing opportunities for…
CIDR

Science.gov Websites

NIH CIDR Program Studies For whole exome sequencing projects, we pretest all samples using a high-density SNP array (>200,000 markers). For custom targeted sequencing, we pretest all samples using a 96 SNP GoldenGate assay. This extensive pretesting allows us to unambiguously tie
The testes transcriptome derived from the New World Screwworm, Cochliomyia hominivorax SRA

USDA-ARS's Scientific Manuscript database

In a collaboration with National Center for Genome Resources researchers, we sequenced and assembled the testes transcriptome derived from the Pacora, Panama, production plant strain J06 of the New World Screwworm, Cochliomyia hominivorax. This sequencing project produced 72,750,822 raw reads and th...
California mild CTV strains that break resistance in Trifoliate Orange

USDA-ARS's Scientific Manuscript database

This is the final report of a project to characterize California isolates of Citrus tristeza virus (CTV) that replicate in Poncirus trifoliata (trifoliate orange). Next Generation Sequencing (NGS) of viral small interfering RNAs (siRNAs) and assembly of full-length sequences of mild California CTV i...
18 CFR 401.37 - Sequence of approval.

Code of Federal Regulations, 2011 CFR

2011-04-01

... 18 Conservation of Power and Water Resources 2 2011-04-01 2011-04-01 false Sequence of approval. 401.37 Section 401.37 Conservation of Power and Water Resources DELAWARE RIVER BASIN COMMISSION ADMINISTRATIVE MANUAL RULES OF PRACTICE AND PROCEDURE Project Review Under Section 3.8 of the Compact § 401.37...
Opinion: Clarifying Two Controversies about Information Mapping's Method.

ERIC Educational Resources Information Center

Horn, Robert E.

1992-01-01

Describes Information Mapping, a methodology for the analysis, organization, sequencing, and presentation of information and explains three major parts of the method: (1) content analysis, (2) project life-cycle synthesis and integration of the content analysis, and (3) sequencing and formatting. Major criticisms of the methodology are addressed.…
Scientific Goals of the Human Genome Project.

ERIC Educational Resources Information Center

Wills, Christopher

1993-01-01

The Human Genome Project, an effort to sequence all the DNA of a human cell, is needed to better understand the behavior of chromosomes during cell division, with the ultimate goal of understanding the specific genes contributing to specific diseases and disabilities. (MSE)

Omics Metadata Management Software (OMMS).

PubMed

Perez-Arriaga, Martha O; Wilson, Susan; Williams, Kelly P; Schoeniger, Joseph; Waymire, Russel L; Powell, Amy Jo

2015-01-01

Next-generation sequencing projects have underappreciated information management tasks requiring detailed attention to specimen curation, nucleic acid sample preparation and sequence production methods required for downstream data processing, comparison, interpretation, sharing and reuse. The few existing metadata management tools for genome-based studies provide weak curatorial frameworks for experimentalists to store and manage idiosyncratic, project-specific information, typically offering no automation supporting unified naming and numbering conventions for sequencing production environments that routinely deal with hundreds, if not thousands, of samples at a time. Moreover, existing tools are not readily interfaced with bioinformatics executables (e.g., BLAST, Bowtie2, custom pipelines). Our application, the Omics Metadata Management Software (OMMS), answers both needs, empowering experimentalists to generate intuitive, consistent metadata, and perform analyses and information management tasks via an intuitive web-based interface. Several use cases with short-read sequence datasets are provided to validate installation and integrated function, and suggest possible methodological road maps for prospective users. Provided examples highlight possible OMMS workflows for metadata curation, multistep analyses, and results management and downloading. The OMMS can be implemented as a stand-alone package for individual laboratories, or can be configured for web-based deployment supporting geographically dispersed projects. The OMMS was developed using an open-source software base, is flexible, extensible and easily installed and executed. The OMMS can be obtained at http://omms.sandia.gov.
Omics Metadata Management Software (OMMS)

PubMed Central

Perez-Arriaga, Martha O; Wilson, Susan; Williams, Kelly P; Schoeniger, Joseph; Waymire, Russel L; Powell, Amy Jo

2015-01-01

Next-generation sequencing projects have underappreciated information management tasks requiring detailed attention to specimen curation, nucleic acid sample preparation and sequence production methods required for downstream data processing, comparison, interpretation, sharing and reuse. The few existing metadata management tools for genome-based studies provide weak curatorial frameworks for experimentalists to store and manage idiosyncratic, project-specific information, typically offering no automation supporting unified naming and numbering conventions for sequencing production environments that routinely deal with hundreds, if not thousands, of samples at a time. Moreover, existing tools are not readily interfaced with bioinformatics executables (e.g., BLAST, Bowtie2, custom pipelines). Our application, the Omics Metadata Management Software (OMMS), answers both needs, empowering experimentalists to generate intuitive, consistent metadata, and perform analyses and information management tasks via an intuitive web-based interface. Several use cases with short-read sequence datasets are provided to validate installation and integrated function, and suggest possible methodological road maps for prospective users. Provided examples highlight possible OMMS workflows for metadata curation, multistep analyses, and results management and downloading. The OMMS can be implemented as a stand-alone package for individual laboratories, or can be configured for web-based deployment supporting geographically dispersed projects. The OMMS was developed using an open-source software base, is flexible, extensible and easily installed and executed. Availability: The OMMS can be obtained at http://omms.sandia.gov. PMID:26124554
Numerical Solution of Optimal Control Problem under SPDE Constraints

DTIC Science & Technology

2011-10-14

Faure and Sobol sequences are used to evaluate high dimensional integrals, and the errors in the numerical results for over 30 dimensions become quite...sequence; right: 1000 points of dimension 26 and 27 projection for optimal Kronecker sequence. benchmark Faure and Sobol methods. 2.2 High order...J. Goodman and J. O’Rourke, Handbook of discrete and computational geometry, CRC Press, Inc., (2004). [5] S. Joe and F. Kuo, Constructing Sobol
Using Sequence Diagrams to Detect Communication Problems Between Systems

NASA Technical Reports Server (NTRS)

Lindvall, Mikael; Ackermann, Chris; Stratton, William C.; Sibol, Deane E.; Ray, Arnab; Yonkwa, Lyly; Kresser, Jan; Godfrey, Sally H.; Knodel, Jens

2008-01-01

Many software systems are evolving into complex systems of systems (SoS) for which inter-system communication is both mission-critical and error-prone. Such communication problems ideally would be detected before deployment. In a NASA-supported Software Assurance Research Program (SARP) project, we are researching a new approach addressing such problems. In this paper, we show that problems in the communication between two systems can be detected by using sequence diagrams to model the planned communication and by comparing the planned sequence to the actual sequence. We identify different kinds of problems that can be addressed by modeling the planned sequence using different levels of abstraction.
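The comparison of a planned message sequence with an observed one can be illustrated with a minimal sketch. It is not the authors' tool: the message tuples, the system names "A" and "B", and the helper `find_deviations` are hypothetical, and Python's standard `difflib.SequenceMatcher` stands in for whatever comparison the paper actually uses.

```python
from difflib import SequenceMatcher


def find_deviations(planned, actual):
    """Return (tag, planned_slice, actual_slice) wherever the two sequences differ."""
    matcher = SequenceMatcher(a=planned, b=actual, autojunk=False)
    return [
        (tag, planned[i1:i2], actual[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical messages, each written as (sender, receiver, message name).
planned = [
    ("A", "B", "connect"),
    ("B", "A", "ack"),
    ("A", "B", "send_data"),
    ("B", "A", "ack"),
    ("A", "B", "disconnect"),
]
# Observed trace in which the second acknowledgement never arrives.
actual = [
    ("A", "B", "connect"),
    ("B", "A", "ack"),
    ("A", "B", "send_data"),
    ("A", "B", "disconnect"),
]

deviations = find_deviations(planned, actual)
for tag, expected, observed in deviations:
    print(tag, "expected:", expected, "observed:", observed)
```

A "delete" entry marks a planned message missing from the trace, an "insert" marks an unplanned message, and a "replace" marks a substituted one.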
Model-based quality assessment and base-calling for second-generation sequencing data.

PubMed

Bravo, Héctor Corrada; Irizarry, Rafael A

2010-09-01

Second-generation sequencing (sec-gen) technology can sequence millions of short fragments of DNA in parallel, making it capable of assembling complex genomes for a small fraction of the price and time of previous technologies. In fact, a recently formed international consortium, the 1000 Genomes Project, plans to fully sequence the genomes of approximately 1200 people. The prospect of comparative analysis at the sequence level of a large number of samples across multiple populations may be achieved within the next five years. These data present unprecedented challenges in statistical analysis. For instance, analysis operates on millions of short nucleotide sequences, or reads (strings of A, C, G, or T's, between 30 and 100 characters long), which are the result of complex processing of noisy continuous fluorescence intensity measurements known as base-calling. The complexity of the base-calling discretization process results in reads of widely varying quality within and across sequence samples. This variation in processing quality results in infrequent but systematic errors that we have found to mislead downstream analysis of the discretized sequence read data. For instance, a central goal of the 1000 Genomes Project is to quantify across-sample variation at the single nucleotide level. At this resolution, small error rates in sequencing prove significant, especially for rare variants. Sec-gen sequencing is a relatively new technology for which potential biases and sources of obscuring variation are not yet fully understood. Therefore, modeling and quantifying the uncertainty inherent in the generation of sequence reads is of utmost importance. In this article, we present a simple model to capture uncertainty arising in the base-calling procedure of the Illumina/Solexa GA platform.
Model parameters have a straightforward interpretation in terms of the chemistry of base-calling allowing for informative and easily interpretable metrics that capture the variability in sequencing quality. Our model provides these informative estimates readily usable in quality assessment tools while significantly improving base-calling performance. © 2009, The International Biometric Society.
Experimental Design-Based Functional Mining and Characterization of High-Throughput Sequencing Data in the Sequence Read Archive

PubMed Central

Nakazato, Takeru; Ohta, Tazro; Bono, Hidemasa

2013-01-01

High-throughput sequencing technology, also called next-generation sequencing (NGS), has the potential to revolutionize the whole process of genome sequencing, transcriptomics, and epigenetics. Sequencing data is captured in a public primary data archive, the Sequence Read Archive (SRA). As of January 2013, data from more than 14,000 projects have been submitted to SRA, which is double that of the previous year. Researchers can download raw sequence data from the SRA website to perform further analyses and to compare with their own data. However, it is extremely difficult to search entries and download raw sequences of interest in SRA because the data structure is complicated, and experimental conditions along with raw sequences are partly described in natural language. Additionally, some sequences are of inconsistent quality because anyone can submit sequencing data to SRA with no quality check. Therefore, as a criterion of data quality, we focused on SRA entries that were cited in journal articles. We extracted SRA IDs and PubMed IDs (PMIDs) from SRA and full-text versions of journal articles and retrieved 2748 SRA ID-PMID pairs. We constructed a publication list referring to SRA entries. Since one of the main themes of -omics analyses is clarification of disease mechanisms, we also characterized SRA entries by disease keywords, according to the Medical Subject Headings (MeSH) extracted from articles assigned to each SRA entry. We obtained 989 SRA ID-MeSH disease term pairs, and constructed a disease list referring to SRA data. We previously developed feature profiles of diseases in a system called “Gendoo”. We generated hyperlinks between the diseases extracted from SRA and their feature profiles. The developed project, publication and disease lists resulting from this study are available at our web service, called “DBCLS SRA” (http://sra.dbcls.jp/). This service will improve accessibility to high-quality data from SRA. PMID:24167589
Y and W Chromosome Assemblies: Approaches and Discoveries.

PubMed

Tomaszkiewicz, Marta; Medvedev, Paul; Makova, Kateryna D

2017-04-01

Hundreds of vertebrate genomes have been sequenced and assembled to date. However, most sequencing projects have ignored the sex chromosomes unique to the heterogametic sex - Y and W - that are known as sex-limited chromosomes (SLCs). Indeed, haploid and repetitive Y chromosomes in species with male heterogamety (XY), and W chromosomes in species with female heterogamety (ZW), are difficult to sequence and assemble. Nevertheless, obtaining their sequences is important for understanding the intricacies of vertebrate genome function and evolution. Recent progress has been made towards the adaptation of next-generation sequencing (NGS) techniques to deciphering SLC sequences. We review here currently available methodology and results with regard to SLC sequencing and assembly. We focus on vertebrates, but bring in some examples from other taxa. Copyright © 2017 Elsevier Ltd. All rights reserved.
Human genomics projects and precision medicine.

PubMed

Carrasco-Ramiro, F; Peiró-Pastor, R; Aguado, B

2017-09-01

The completion of the Human Genome Project (HGP) in 2001 opened the floodgates to a deeper understanding of medicine. Dozens of HGP-like projects, involving from a few tens to several million genomes, are currently in progress; they range from having specialized goals to taking a more general approach. However, data generation, storage, management and analysis in public and private cloud computing platforms have raised concerns about privacy and security. The knowledge gained from further research has changed the field of genomics and is now slowly permeating into clinical medicine. The new precision (personalized) medicine, where genome sequencing and data analysis are essential components, allows tailored diagnosis and treatment according to the information from the patient's own genome and specific environmental factors. P4 (predictive, preventive, personalized and participatory) medicine is introducing new concepts, challenges and opportunities. This review summarizes current sequencing technologies, concentrates on ongoing human genomics projects, and provides some examples in which precision medicine has already demonstrated clinical impact in diagnosis and/or treatment.
Genomics England's implementation of its public engagement strategy: Blurred boundaries between engagement for the United Kingdom's 100,000 Genomes project and the need for public support.

PubMed

Samuel, Gabrielle Natalie; Farsides, Bobbie

2018-04-01

The United Kingdom's 100,000 Genomes Project has the aim of sequencing 100,000 genomes from National Health Service patients such that whole genome sequencing becomes routine clinical practice. It also has a research-focused goal to provide data for scientific discovery. Genomics England is the limited company established by the Department of Health to deliver the project. As an innovative scientific/clinical venture, it is interesting to consider how Genomics England positions itself in relation to public engagement activities. We set out to explore how individuals working at, or associated with, Genomics England enacted public engagement in practice. Our findings show that individuals offered a narrative in which public engagement performed more than one function. On one side, public engagement was seen as 'good practice'. On the other, public engagement was presented as core to the project's success - needed to encourage involvement and ultimately recruitment. We discuss the implications of this in this article.
Improving de novo sequence assembly using machine learning and comparative genomics for overlap correction.

PubMed

Palmer, Lance E; Dejori, Mathaeus; Bolanos, Randall; Fasulo, Daniel

2010-01-15

With the rapid expansion of DNA sequencing databases, it is now feasible to identify relevant information from prior sequencing projects and completed genomes and apply it to de novo sequencing of new organisms. As an example, this paper demonstrates how such extra information can be used to improve de novo assemblies by augmenting the overlapping step. Finding all pairs of overlapping reads is a key task in many genome assemblers, and to this end, highly efficient algorithms have been developed to find alignments in large collections of sequences. It is well known that due to repeated sequences, many aligned pairs of reads nevertheless do not overlap. But no overlapping algorithm to date takes a rigorous approach to separating aligned but non-overlapping read pairs from true overlaps. We present an approach that extends the Minimus assembler by a data driven step to classify overlaps as true or false prior to contig construction. We trained several different classification models within the Weka framework using various statistics derived from overlaps of reads available from prior sequencing projects. These statistics included percent mismatch and k-mer frequencies within the overlaps as well as a comparative genomics score derived from mapping reads to multiple reference genomes. We show that in real whole-genome sequencing data from the E. coli and S. aureus genomes, by providing a curated set of overlaps to the contigging phase of the assembler, we nearly doubled the median contig length (N50) without sacrificing coverage of the genome or increasing the number of mis-assemblies. Machine learning methods that use comparative and non-comparative features to classify overlaps as true or false can be used to improve the quality of a sequence assembly.
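The abstract glosses N50 as a median contig length. As the statistic is commonly defined, it is a length-weighted median: the contig length at which contigs of that size or longer account for at least half of the assembled bases. The sketch below illustrates that common definition; the contig lengths are invented for illustration and are not data from the paper.

```python
from statistics import median


def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half of all bases."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Hypothetical assembly before and after two contigs (80 and 70) are joined.
before = [80, 70, 50, 40, 30, 20, 10]
after = [150, 50, 40, 30, 20, 10]

print("N50 before:", n50(before), "plain median:", median(before))
print("N50 after:", n50(after))
```

Joining contigs raises N50 while the total assembled length stays the same, which is why the statistic is usually reported alongside coverage and mis-assembly counts, as the abstract does.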
DraGnET: Software for storing, managing and analyzing annotated draft genome sequence data

PubMed Central

2010-01-01

Background: New "next generation" DNA sequencing technologies offer individual researchers the ability to rapidly generate large amounts of genome sequence data at dramatically reduced costs. As a result, a need has arisen for new software tools for storage, management and analysis of genome sequence data. Although bioinformatic tools are available for the analysis and management of genome sequences, limitations still remain. For example, restrictions on the submission of data and use of these tools may be imposed, thereby making them unsuitable for sequencing projects that need to remain in-house or proprietary during their initial stages. Furthermore, the availability and use of next generation sequencing in industrial, governmental and academic environments requires biologists to have access to computational support for the curation and analysis of the data generated; however, this type of support is not always immediately available. Results: To address these limitations, we have developed DraGnET (Draft Genome Evaluation Tool). DraGnET is an open source web application which allows researchers with no experience in programming and database management to set up their own in-house projects for storing, retrieving, organizing and managing annotated draft and complete genome sequence data. The software provides a web interface for the use of BLAST, allowing users to perform preliminary comparative analysis among multiple genomes. We demonstrate the utility of DraGnET for performing comparative genomics on closely related bacterial strains. Furthermore, DraGnET can be further developed to incorporate additional tools for more sophisticated analyses. Conclusions: DraGnET is designed for use either by individual researchers or as a collaborative tool available through Internet (or Intranet) deployment. 
For genome projects that require genome sequencing data to initially remain proprietary, DraGnET provides the means for researchers to keep their data in-house for analysis using local programs or until it is made publicly available, at which point it may be uploaded to additional analysis software applications. The DraGnET home page is available at http://www.dragnet.cvm.iastate.edu and includes example files for examining the functionalities, a link for downloading the DraGnET setup package and a link to the DraGnET source code hosted with full documentation on SourceForge. PMID:20175920
Perspectives from the Avian Phylogenomics Project: Questions that Can Be Answered with Sequencing All Genomes of a Vertebrate Class.

PubMed

Jarvis, Erich D

2016-01-01

The rapid pace of advances in genome technology, with concomitant reductions in cost, makes it feasible that one day in our lifetime we will have available extant genomes of entire classes of species, including vertebrates. I recently helped co-coordinate the large-scale Avian Phylogenomics Project, which collected and sequenced genomes of 48 bird species representing most currently classified orders to address a range of questions in phylogenomics and comparative genomics. The consortium was able to answer questions not previously possible with just a few genomes. This success spurred on the creation of a project to sequence the genomes of at least one individual of all extant ∼10,500 bird species. The initiation of this project has led us to consider what questions now impossible to answer could be answered with all genomes, and could drive new questions now unimaginable. These include the generation of a highly resolved family tree of extant species, genome-wide association studies across species to identify genetic substrates of many complex traits, redefinition of species and the species concept, reconstruction of the genomes of common ancestors, and generation of new computational tools to address these questions. Here I present visions for the future by posing and answering questions regarding what scientists could potentially do with available genomes of an entire vertebrate class.
The 1000 Genomes Project: new opportunities for research and social challenges

PubMed Central

2010-01-01

The 1000 Genomes Project, an international collaboration, is sequencing the whole genome of approximately 2,000 individuals from different worldwide populations. The central goal of this project is to describe most of the genetic variation that occurs at a population frequency greater than 1%. The results of this project will allow scientists to identify genetic variation at an unprecedented degree of resolution and will also help improve the imputation methods for determining unobserved genetic variants that are not represented on current genotyping arrays. By identifying novel or rare functional genetic variants, researchers will be able to pinpoint disease-causing genes in genomic regions initially identified by association studies. This level of detailed sequence information will also improve our knowledge of the evolutionary processes and the genomic patterns that have shaped the human species as we know it today. The new data will also lay the foundation for future clinical applications, such as prediction of disease susceptibility and drug response. However, the forthcoming availability of whole genome sequences at affordable prices will raise ethical concerns and pose potential threats to individual privacy. Nevertheless, we believe that these potential risks are outweighed by the benefits in terms of diagnosis and research, so long as rigorous safeguards are kept in place through legislation that prevents discrimination on the basis of the results of genetic testing. PMID:20193048
Snake Genome Sequencing: Results and Future Prospects

PubMed Central

Kerkkamp, Harald M. I.; Kini, R. Manjunatha; Pospelov, Alexey S.; Vonk, Freek J.; Henkel, Christiaan V.; Richardson, Michael K.

2016-01-01

Snake genome sequencing is in its infancy—very much behind the progress made in sequencing the genomes of humans, model organisms and pathogens relevant to biomedical research, and agricultural species. We provide here an overview of some of the snake genome projects in progress, and discuss the biological findings, with special emphasis on toxinology, from the small number of draft snake genomes already published. We discuss the future of snake genomics, pointing out that new sequencing technologies will help overcome the problem of repetitive sequences in assembling snake genomes. Genome sequences are also likely to be valuable in examining the clustering of toxin genes on the chromosomes, in designing recombinant antivenoms and in studying the epigenetic regulation of toxin gene expression. PMID:27916957
Snake Genome Sequencing: Results and Future Prospects.

PubMed

Kerkkamp, Harald M I; Kini, R Manjunatha; Pospelov, Alexey S; Vonk, Freek J; Henkel, Christiaan V; Richardson, Michael K

2016-12-01

Snake genome sequencing is in its infancy, very much behind the progress made in sequencing the genomes of humans, model organisms and pathogens relevant to biomedical research, and agricultural species. We provide here an overview of some of the snake genome projects in progress, and discuss the biological findings, with special emphasis on toxinology, from the small number of draft snake genomes already published. We discuss the future of snake genomics, pointing out that new sequencing technologies will help overcome the problem of repetitive sequences in assembling snake genomes. Genome sequences are also likely to be valuable in examining the clustering of toxin genes on the chromosomes, in designing recombinant antivenoms and in studying the epigenetic regulation of toxin gene expression.
A streamlined collecting and preparation protocol for DNA barcoding of Lepidoptera as part of large-scale rapid biodiversity assessment projects, exemplified by the Indonesian Biodiversity Discovery and Information System (IndoBioSys).

PubMed

Schmidt, Olga; Hausmann, Axel; Cancian de Araujo, Bruno; Sutrisno, Hari; Peggie, Djunijanti; Schmidt, Stefan

2017-01-01

Here we present a general collecting and preparation protocol for DNA barcoding of Lepidoptera as part of large-scale rapid biodiversity assessment projects, and a comparison with alternative preserving and vouchering methods. About 98% of the sequenced specimens processed using the present collecting and preparation protocol yielded sequences with more than 500 base pairs. The study is based on the first outcomes of the Indonesian Biodiversity Discovery and Information System (IndoBioSys). IndoBioSys is a German-Indonesian research project that is conducted by the Museum für Naturkunde in Berlin and the Zoologische Staatssammlung München, in close cooperation with the Research Center for Biology - Indonesian Institute of Sciences (RCB-LIPI, Bogor).
The Human Genome Project: how do we protect Australians?

PubMed

Stott Despoja, N

It is the moon landing of the nineties: the ambitious Human Genome Project--identifying the up to 100,000 genes that make up human DNA and the sequences of the three billion base-pairs that comprise the human genome. However, unlike the moon landing, the effects of the genome project will have a fundamental impact on the way we see ourselves and each other.
Development of a Prognostic Marker for Lung Cancer Using Analysis of Tumor Evolution

DTIC Science & Technology

2017-08-01

The goal of this project is to sequence the exomes of single tumor cells from tumors in order to construct evolutionary trees...dissociation, tumor cell isolation, whole genome amplification, and exome sequencing. We have begun to sequence the exomes of single cells and to...of populations, the evolution of tumor cells within a tumor can be diagrammed on a phylogenetic tree. The more diverse a tumor’s phylogenetic tree
Applications of Gene Targeting Technology to Mental Retardation and Developmental Disability Research

ERIC Educational Resources Information Center

Pimenta, Aurea F.; Levitt, Pat

2005-01-01

The human and mouse genome projects elucidated the sequence and position map of innumerous genes expressed in the central nervous system (CNS), advancing our ability to manipulate these sequences and create models to investigate regulation of gene expression and function. In this article, we reviewed gene targeting methodologies with emphasis on…
Fibonacci and Nature. Mathematics Investigations for Schools.

ERIC Educational Resources Information Center

Newton, Lynn D.

1987-01-01

Sets forth the history of the Fibonacci Sequence and details its occurrence in nature and its potential for project work in schools. Ideas and activities include the rabbit problem, investigations of the sequence itself, its relationship to plants, music, snail shells, and the golden section. Computer generation of spirals is also discussed. (PK)

Rhipicephalus microplus strain Deutsch, whole genome shotgun sequencing project Version 2

USDA-ARS's Scientific Manuscript database

The cattle tick, Rhipicephalus (Boophilus) microplus, has a genome over 2.4 times the size of the human genome, and with over 70% of repetitive DNA, this genome would prove very costly to sequence at today's prices and difficult to assemble and analyze. Cot filtration/selection techniques were used ...
Genome Sequence of Fusarium oxysporum f. sp. melonis, a fungus causing wilt disease on melon

USDA-ARS's Scientific Manuscript database

This manuscript reports the genome sequence of F. oxysporum f. sp. melonis, a fungal pathogen that causes Fusarium wilt disease on melon (Cucumis melo). The project is part of a large comparative study designed to explore the genetic composition and evolutionary origin of this group of horizontally ...
Genome sequence of Fusarium oxysporum f. sp. melonis, a fungus causing wilt disease on melon

USDA-ARS's Scientific Manuscript database

This manuscript reports the genome sequence of F. oxysporum f. sp. melonis, a fungal pathogen that causes Fusarium wilt disease on melon (Cucumis melo). The project is part of a large comparative study designed to explore the genetic composition and evolutionary origin of this group of horizontally ...
Cracking the Genetic Code | NIH MedlinePlus the Magazine

MedlinePlus

... how do you approach that? Now, with sequencing technologies that allow you to sequence an entire genome for $10,000 in less than a week, you can really begin to see what's there. JEFFREY BROWN: But you've said that the Human Genome Project has not yet directly affected the health care ...
Technology-Enhanced Research in the Science Classroom.

ERIC Educational Resources Information Center

Francis, Joseph W.

1997-01-01

Describes a project where students use the Internet as a research tool. Discusses using e-mail to access molecular biology databases and identify proteins using amino acid sequences, obtaining complete amino acid sequences using the world wide web, using telnet to access library resources on the Internet, and various stages of protein analysis…
Note on a Family of Monotone Quantum Relative Entropies

NASA Astrophysics Data System (ADS)

Deuchert, Andreas; Hainzl, Christian; Seiringer, Robert

2015-10-01

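Given a convex function $\varphi$ and two hermitian matrices A and B, Lewin and Sabin study in (Lett Math Phys 104:691-705, 2014) the relative entropy defined by $\mathcal{H}(A,B) = \mathrm{Tr}[\varphi(A) - \varphi(B) - \varphi'(B)(A-B)]$. Among other things, they prove that the so-defined quantity is monotone if and only if $\varphi'$ is operator monotone. The monotonicity is then used to properly define $\mathcal{H}(A,B)$ for bounded self-adjoint operators acting on an infinite-dimensional Hilbert space by a limiting procedure. More precisely, for an increasing sequence of finite-dimensional projections $\{P_n\}_{n=1}^{\infty}$ with $P_n \to 1$ strongly, the limit $\lim_{n\to\infty} \mathcal{H}(P_n A P_n, P_n B P_n)$ is shown to exist and to be independent of the sequence of projections $\{P_n\}_{n=1}^{\infty}$. The question whether this sequence converges to its "obvious" limit, namely $\mathrm{Tr}[\varphi(A) - \varphi(B) - \varphi'(B)(A-B)]$, has been left open. We answer this question in principle affirmatively and show that $\lim_{n\to\infty} \mathcal{H}(P_n A P_n, P_n B P_n) = \mathrm{Tr}[\varphi(A) - \varphi(B) - \frac{d}{d\alpha}\varphi(\alpha A + (1-\alpha)B)|_{\alpha=0}]$. If the operators A and B are regular enough, that is $(A-B)$, $\varphi(A)-\varphi(B)$ and $\varphi'(B)(A-B)$ are trace-class, the identity $\mathrm{Tr}[\varphi(A) - \varphi(B) - \frac{d}{d\alpha}\varphi(\alpha A + (1-\alpha)B)|_{\alpha=0}] = \mathrm{Tr}[\varphi(A) - \varphi(B) - \varphi'(B)(A-B)]$ holds.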
Haemonchus contortus: Genome Structure, Organization and Comparative Genomics.

PubMed

Laing, R; Martinelli, A; Tracey, A; Holroyd, N; Gilleard, J S; Cotton, J A

2016-01-01

One of the first genome sequencing projects for a parasitic nematode was that for Haemonchus contortus. The open access data from the Wellcome Trust Sanger Institute provided a valuable early resource for the research community, particularly for the identification of specific genes and genetic markers. Later, a second sequencing project was initiated by the University of Melbourne, and the two draft genome sequences for H. contortus were published back-to-back in 2013. There is a pressing need for long-range genomic information for genetic mapping, population genetics and functional genomic studies, so we are continuing to improve the Wellcome Trust Sanger Institute assembly to provide a finished reference genome for H. contortus. This review describes this process, compares the H. contortus genome assemblies with draft genomes from other members of the strongylid group and discusses future directions for parasite genomics using the H. contortus model. Copyright © 2016 Elsevier Ltd. All rights reserved.
Human Y chromosome copy number variation in the next generation sequencing era and beyond.

PubMed

Massaia, Andrea; Xue, Yali

2017-05-01

The human Y chromosome provides a fertile ground for structural rearrangements owing to its haploidy and high content of repeated sequences. The methodologies used for copy number variation (CNV) studies have developed over the years. Low-throughput techniques based on direct observation of rearrangements were developed early on, and are still used, often to complement array-based or sequencing approaches which have limited power in regions with high repeat content and specifically in the presence of long, identical repeats, such as those found in human sex chromosomes. Some specific rearrangements have been investigated for decades; because of their effects on fertility, or their outstanding evolutionary features, the interest in these has not diminished. However, following the flourishing of large-scale genomics, several studies have investigated CNVs across the whole chromosome. These studies sometimes employ data generated within large genomic projects such as the DDD study or the 1000 Genomes Project, and often survey large samples of healthy individuals without any prior selection. Novel technologies based on sequencing long molecules and combinations of technologies, promise to stimulate the study of Y-CNVs in the immediate future.
Automated Gene Ontology annotation for anonymous sequence data.

PubMed

Hennig, Steffen; Groth, Detlef; Lehrach, Hans

2003-07-01

Gene Ontology (GO) is the most widely accepted attempt to construct a unified and structured vocabulary for the description of genes and their products in any organism. Annotation by GO terms is performed in most of the current genome projects, which besides generality has the advantage of being very convenient for computer based classification methods. However, direct use of GO in small sequencing projects is not easy, especially for species not commonly represented in public databases. We present a software package (GOblet), which performs annotation based on GO terms for anonymous cDNA or protein sequences. It uses the species independent GO structure and vocabulary together with a series of protein databases collected from various sites, to perform a detailed GO annotation by sequence similarity searches. The sensitivity and the reference protein sets can be selected by the user. GOblet runs automatically and is available as a public service on our web server. The paper also addresses the reliability of automated GO annotations by using a reference set of more than 6000 human proteins. The GOblet server is accessible at http://goblet.molgen.mpg.de.
Evaluating information content of SNPs for sample-tagging in re-sequencing projects.

PubMed

Hu, Hao; Liu, Xiang; Jin, Wenfei; Hilger Ropers, H; Wienker, Thomas F

2015-05-15

Sample-tagging is designed for identification of accidental sample mix-up, which is a major issue in re-sequencing studies. In this work, we develop a model to measure the information content of SNPs, so that we can optimize a panel of SNPs that approaches the maximal information for discrimination. The analysis shows that as few as 60 optimized SNPs can differentiate the individuals in a population as large as the present world, and only 30 optimized SNPs are in practice sufficient for labeling up to 100 thousand individuals. In the simulated populations of 100 thousand individuals, the average Hamming distances generated by the optimized set of 30 SNPs are larger than 18, and the duality frequency is lower than 1 in 10 thousand. This strategy of sample discrimination proves robust for large sample sizes and across different datasets. The optimized sets of SNPs are designed for Whole Exome Sequencing, and a program is provided for SNP selection, allowing for customized SNP numbers and genes of interest. The sample-tagging plan based on this framework will improve re-sequencing projects in terms of reliability and cost-effectiveness.
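The Hamming distance mentioned in the abstract counts the SNP positions at which two samples' genotype tags differ. The sketch below is a minimal illustration, not the authors' program: the six-SNP tags, the sample names, and the 0/1/2 genotype coding (copies of the alternate allele) are assumptions made for the example.

```python
from itertools import combinations


def hamming(tag_a, tag_b):
    """Number of SNP positions at which two genotype tags differ."""
    if len(tag_a) != len(tag_b):
        raise ValueError("tags must cover the same SNP panel")
    return sum(a != b for a, b in zip(tag_a, tag_b))


def min_pairwise_distance(tags):
    """Smallest Hamming distance between any two samples in the panel."""
    return min(hamming(tags[x], tags[y]) for x, y in combinations(sorted(tags), 2))


# Hypothetical tags: one genotype (0, 1 or 2) per SNP in a six-SNP panel.
tags = {
    "sample_1": (0, 1, 2, 0, 1, 1),
    "sample_2": (1, 1, 0, 0, 2, 1),
    "sample_3": (0, 2, 2, 1, 1, 0),
}

print("sample_1 vs sample_2:", hamming(tags["sample_1"], tags["sample_2"]))
print("closest pair distance:", min_pairwise_distance(tags))
```

A distance of zero between two supposedly different samples would mean the panel cannot tell them apart, so a larger minimum distance leaves more room for genotyping errors before a mix-up goes undetected.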
Jmol-Enhanced Biochemistry Research Projects

ERIC Educational Resources Information Center

Saderholm, Matthew; Reynolds, Anthony

2011-01-01

We developed a protein research project for a one-semester biochemistry lecture class to enhance learning and more effectively train students to understand protein structure and function. During this semester-long process, students select a protein with known structure and then research its structure, sequence, and function. This project…
Program Improvement Project for Industrial Education. Annual Report.

ERIC Educational Resources Information Center

Shaeffer, Bruce W.

Designed to improve industrial education programs through the development of minimum uniform quality standards, a project developed a task list, educationally sequenced the identified tasks, and developed a recommended shop layout and equipment list for four occupational areas: diesel repair, appliance repair, office machine repair, and small…
Insights about Psychotherapy Training and Curricular Sequencing: Portal of Discovery

ERIC Educational Resources Information Center

McGowen, K. Ramsey; Miller, Merry Noel; Floyd, Michael; Miller, Barney; Coyle, Brent

2009-01-01

Objective: The authors discuss the curricular implications of a research project originally designed to evaluate the instructional strategy of using standardized patients in a psychotherapy training seminar. Methods: The original project included second-year residents enrolled in an introductory psychotherapy seminar that employed sequential…
The 1000 bull genome project

USDA-ARS's Scientific Manuscript database

To meet growing global demands for high value protein from milk and meat, rates of genetic gain in domestic cattle must be accelerated. At the same time, animal health and welfare must be considered. The 1000 bull genomes project supports these goals by providing annotated sequence variants and ge...
A Terminal Pharmaceutics Course in Clinical Pharmacokinetics.

ERIC Educational Resources Information Center

Reuning, Richard H.; Krautheim, Daniel

1978-01-01

At Ohio State University, an undergraduate course extends the course sequence in biopharmaceutics and pharmacokinetics to application to problems in optimizing drug therapy. Course content, structure, instructional methods, and student term projects are described, and a course outline, typical projects, and some behavioral objectives are appended.…
FIRST ORDER KINETIC GAS GENERATION MODEL PARAMETERS FOR WET LANDFILLS

EPA Science Inventory

Landfill gas is produced as a result of a sequence of physical, chemical, and biological processes occurring within an anaerobic landfill. Landfill operators, energy recovery project owners, regulators, and energy users need to be able to project the volume of gas produced and re...
Basic Math Facts: Guidelines for Teaching and Learning.

ERIC Educational Resources Information Center

Thornton, Carol A.; Toohey, Margaret A.

1985-01-01

Research and curriculum development projects have investigated ways to make teaching and learning basic facts easier. Research results and implications from four major projects are presented. Ten specific guidelines are then given and illustrated by examples from addition. Modifying instructional sequence and matching learning tasks with learning…
The international Genome sample resource (IGSR): A worldwide collection of genome variation incorporating the 1000 Genomes Project data

PubMed Central

Clarke, Laura; Fairley, Susan; Zheng-Bradley, Xiangqun; Streeter, Ian; Perry, Emily; Lowy, Ernesto; Tassé, Anne-Marie; Flicek, Paul

2017-01-01

The International Genome Sample Resource (IGSR; http://www.internationalgenome.org) expands in data type and population diversity the resources from the 1000 Genomes Project. IGSR represents the largest open collection of human variation data and provides easy access to these resources. IGSR was established in 2015 to maintain and extend the 1000 Genomes Project data, which has been widely used as a reference set of human variation and by researchers developing analysis methods. IGSR has mapped all of the 1000 Genomes sequence to the newest human reference (GRCh38), and will release updated variant calls to ensure maximal usefulness of the existing data. IGSR is collecting new structural variation data on the 1000 Genomes samples from long read sequencing and other technologies, and will collect relevant functional data into a single comprehensive resource. IGSR is extending coverage with new populations sequenced by collaborating groups. Here, we present the new data and analysis that IGSR has made available. We have also introduced a new data portal that increases discoverability of our data—previously only browseable through our FTP site—by focusing on particular samples, populations or data sets of interest. PMID:27638885
Complete genome sequence of Staphylothermus hellenicus P8T

DOE Office of Scientific and Technical Information (OSTI.GOV)

Anderson, Iain; Wirth, Reinhard; Lucas, Susan

2011-01-01

Staphylothermus hellenicus belongs to the order Desulfurococcales within the archaeal phylum Crenarchaeota. Strain P8T is the type strain of the species and was isolated from a shallow hydrothermal vent system at Palaeochori Bay, Milos, Greece. It is a hyperthermophilic, anaerobic heterotroph. Here we describe the features of this organism together with the complete genome sequence and annotation. The 1,580,347 bp genome with its 1,668 protein-coding and 48 RNA genes was sequenced as part of a DOE Joint Genome Institute (JGI) Laboratory Sequencing Program (LSP) project.
Identification of Bacterial Species in Kuwaiti Waters Through DNA Sequencing

NASA Astrophysics Data System (ADS)

Chen, K.

2017-01-01

With the objective of identifying the bacterial diversity associated with the ecosystems of various Kuwaiti seas, bacteria were cultured and isolated from 3 water samples. Because fecal coliforms were difficult to culture and isolate on the selective agar plates, bacterial isolates from marine agar plates were selected for molecular identification. 16S rRNA genes were successfully amplified from the genomes of the selected isolates using universal eubacterial 16S rRNA primers. The resulting amplification products were subjected to automated DNA sequencing. The partial 16S rDNA sequences obtained were compared directly with sequences in the NCBI database using BLAST, as well as with the sequences available from the Ribosomal Database Project (RDP).

Sequencing intractable DNA to close microbial genomes.

PubMed

Hurt, Richard A; Brown, Steven D; Podar, Mircea; Palumbo, Anthony V; Elias, Dwayne A

2012-01-01

Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled "intractable" resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.
Pacific Elementary Science: A Case Study of Educational Planning for Small Developing Nations.

ERIC Educational Resources Information Center

Taylor, Neil; Vlaardingerbroek, Barand

2000-01-01

Evaluates Science Education in Pacific Schools (SEPS), a project addressing science-education deficiencies in 12 small Pacific Island countries. The assessment revealed inadequate, outdated, and unattractive science teaching resources in some countries; badly sequenced and duplicative curriculum projects across the region; and lack of teacher…
Biology in 'silico': The Bioinformatics Revolution.

ERIC Educational Resources Information Center

Bloom, Mark

2001-01-01

Explains the Human Genome Project (HGP) and efforts to sequence the human genome. Describes the role of bioinformatics in the project and considers it the genetics Swiss Army Knife, which has many different uses, for use in forensic science, medicine, agriculture, and environmental sciences. Discusses the use of bioinformatics in the high school…
Tech Prep Model for Marketing Education.

ERIC Educational Resources Information Center

Ruhland, Sheila K.; King, Binky M.

A project was conducted to develop two tech prep models for marketing education (ME) in Missouri to provide a sequence of courses for skill-enhanced and time-shortened programs. First, labor market trends, employment growth projections, and business and industry labor needs in Missouri were researched and analyzed. The analysis results were used…
GrameneMart: the biomart data portal for the gramene project

USDA-ARS's Scientific Manuscript database

The Gramene project was an early adopter of the BioMart software, which remains an integral and well-used component of the Gramene web site. BioMart accessible data sets include plant gene annotations, plant variation catalogues, genetic markers, physical mapping entities, public DNA/mRNA sequences ...
Project UPSTART. Final Report, October 1, 1983-September 30, 1984.

ERIC Educational Resources Information Center

Frain, Joan

Project UPSTART, during this fourth year of outreach, offered assistance in replicating its developed Sequenced Neuro-Sensorimotor Program (SNSP) for severely multihandicapped infants, pre-schoolers, young adults and their families. Future replication sites were identified. Programs received outreach assistance in the areas of staff training,…
Comparative genomic data of the Avian Phylogenomics Project.

PubMed

Zhang, Guojie; Li, Bo; Li, Cai; Gilbert, M Thomas P; Jarvis, Erich D; Wang, Jun

2014-01-01

The evolutionary relationships of modern birds are among the most challenging to understand in systematic biology and have been debated for centuries. To address this challenge, we assembled or collected the genomes of 48 avian species spanning most orders of birds, including all Neognathae and two of the five Palaeognathae orders, and used the genomes to construct a genome-scale avian phylogenetic tree and perform comparative genomics analyses (Jarvis et al. in press; Zhang et al. in press). Here we release assemblies and datasets associated with the comparative genome analyses, which include 38 newly sequenced avian genomes plus previously released or simultaneously released genomes of Chicken, Zebra finch, Turkey, Pigeon, Peregrine falcon, Duck, Budgerigar, Adelie penguin, Emperor penguin and the Medium Ground Finch. We hope that this resource will serve future efforts in phylogenomics and comparative genomics. The 38 bird genomes were sequenced using the Illumina HiSeq 2000 platform and assembled using a whole genome shotgun strategy. The 48 genomes were categorized into two groups according to the N50 scaffold size of the assemblies: a high depth group comprising 23 species sequenced at high coverage (>50X) with multiple insert size libraries resulting in N50 scaffold sizes greater than 1 Mb (except the White-throated Tinamou and Bald Eagle); and a low depth group comprising 25 species sequenced at a low coverage (~30X) with two insert size libraries resulting in an average N50 scaffold size of about 50 kb. Repetitive elements comprised 4%-22% of the bird genomes. The assembled scaffolds allowed the homology-based annotation of 13,000 to 17,000 protein-coding genes in each avian genome relative to chicken, zebra finch and human, as well as comparative and sequence conservation analyses.
Here we release full genome assemblies of 38 newly sequenced avian species, link genome assembly downloads for 7 of the remaining 10 species, and provide a guideline of genomic data that has been generated and used in our Avian Phylogenomics Project. To the best of our knowledge, the Avian Phylogenomics Project is the biggest vertebrate comparative genomics project to date. The genomic data presented here is expected to accelerate further analyses in many fields, including phylogenetics, comparative genomics, evolution, neurobiology, developmental biology, and other related areas.
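The grouping above relies on the N50 scaffold size. As a minimal sketch of the standard definition (the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly), the function below takes a plain list of scaffold lengths; the function name and example lengths are hypothetical and are not taken from the project's data.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no scaffold lengths given")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that pushes the running sum to half the
        # assembly size defines N50.
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for positive lengths; kept for safety


# Hypothetical scaffold lengths in bp.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # 8
print(n50([10, 20, 30, 40]))          # 30
```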
Sequence of the tomato chloroplast DNA and evolutionary comparison of solanaceous plastid genomes.

PubMed

Kahlau, Sabine; Aspinall, Sue; Gray, John C; Bock, Ralph

2006-08-01

Tomato, Solanum lycopersicum (formerly Lycopersicon esculentum), has long been one of the classical model species of plant genetics. More recently, solanaceous species have become a model of evolutionary genomics, with several EST projects and a tomato genome project having been initiated. As a first contribution toward deciphering the genetic information of tomato, we present here the complete sequence of the tomato chloroplast genome (plastome). The size of this circular genome is 155,461 base pairs (bp), with an average AT content of 62.14%. It contains 114 genes and conserved open reading frames (ycfs). Comparison with the previously sequenced plastid DNAs of Nicotiana tabacum and Atropa belladonna reveals patterns of plastid genome evolution in the Solanaceae family and identifies varying degrees of conservation of individual plastid genes. In addition, we discovered several new sites of RNA editing by cytidine-to-uridine conversion. A detailed comparison of editing patterns in the three solanaceous species highlights the dynamics of RNA editing site evolution in chloroplasts. To assess the level of intraspecific plastome variation in tomato, the plastome of a second tomato cultivar was sequenced. Comparison of the two genotypes (IPA-6, bred in South America, and Ailsa Craig, bred in Europe) revealed no nucleotide differences, suggesting that the plastomes of modern tomato cultivars display very little, if any, sequence variation.
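The AT content quoted above is a simple base-composition statistic. A minimal sketch of how such a percentage can be computed from a sequence string follows; the function name and the example sequence are hypothetical, and ignoring ambiguous bases such as N is an assumption of this sketch rather than a detail from the abstract.

```python
def at_content(sequence: str) -> float:
    """Percentage of A and T among the unambiguous bases of a DNA sequence."""
    seq = sequence.upper()
    # Count only A, C, G and T; ambiguity codes such as N are ignored.
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (seq.count("A") + seq.count("T")) / counted


# Hypothetical toy sequence, not tomato plastome data.
print(at_content("ATGC"))       # 50.0
print(at_content("aattnnGC"))   # 4 of 6 unambiguous bases are A or T
```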
Captured metagenomics: large-scale targeting of genes based on ‘sequence capture’ reveals functional diversity in soils

PubMed Central

Manoharan, Lokeshwaran; Kushwaha, Sandeep K.; Hedlund, Katarina; Ahrén, Dag

2015-01-01

Microbial enzyme diversity is key to understanding many ecosystem processes. Whole metagenome sequencing (WMG) obtains information on functional genes, but it is costly and inefficient due to the large amount of sequencing that is required. In this study, we have applied a captured metagenomics technique for functional genes in soil microorganisms, as an alternative to WMG. Large-scale targeting of functional genes, coding for enzymes related to organic matter degradation, was applied to two agricultural soil communities through captured metagenomics. Captured metagenomics uses custom-designed, hybridization-based oligonucleotide probes that enrich functional genes of interest in metagenomic libraries where only probe-bound DNA fragments are sequenced. The captured metagenomes were highly enriched with targeted genes while maintaining their target diversity, and their taxonomic distribution correlated well with the traditional ribosomal sequencing. The captured metagenomes were highly enriched with genes related to organic matter degradation; at least five times more than similar, publicly available soil WMG projects. This target enrichment technique also preserves the functional representation of the soils, thereby facilitating comparative metagenomics projects. Here, we present the first study that applies the captured metagenomics approach at large scale, and this novel method allows deep investigations of central ecosystem processes by studying functional gene abundances. PMID:26490729
Sma3s: A universal tool for easy functional annotation of proteomes and transcriptomes.

PubMed

Casimiro-Soriguer, Carlos S; Muñoz-Mérida, Antonio; Pérez-Pulido, Antonio J

2017-06-01

The falling cost of next-generation sequencing has led to an enormous growth in the number of sequenced genomes and transcriptomes, allowing wet labs to get the sequences from their organisms of study. To make the most of these data, one of the first things that should be done is the functional annotation of the protein-coding genes. This has traditionally been a slow and tedious step that can involve the characterization of thousands of sequences. Sma3s is an accurate computational tool for annotating proteins in an unattended way. Now, we have developed a completely new version, which includes functionalities that will be of utility for fundamental and applied science. Currently, the results provide functional categories such as biological processes, which become useful for both characterizing particular sequence datasets and comparing results from different projects. But one of the most important implemented innovations is that it now has low computational requirements, and the complete annotation of a simple proteome or transcriptome usually takes around 24 hours on a personal computer. Sma3s has been tested with a large number of complete proteomes and transcriptomes, and it has demonstrated its potential in health science and other specific projects. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Workflow and web application for annotating NCBI BioProject transcriptome data

PubMed Central

Vera Alvarez, Roberto; Medeiros Vidal, Newton; Garzón-Martínez, Gina A.; Barrero, Luz S.; Landsman, David

2017-01-01

The volume of transcriptome data is growing exponentially due to rapid improvement of experimental technologies. In response, large central resources such as those of the National Center for Biotechnology Information (NCBI) are continually adapting their computational infrastructure to accommodate this large influx of data. New and specialized databases, such as Transcriptome Shotgun Assembly Sequence Database (TSA) and Sequence Read Archive (SRA), have been created to aid the development and expansion of centralized repositories. Although the central resource databases are under continual development, they do not include automatic pipelines to increase annotation of newly deposited data. Therefore, third-party applications are required to achieve that aim. Here, we present an automatic workflow and web application for the annotation of transcriptome data. The workflow creates secondary data such as sequencing reads and BLAST alignments, which are available through the web application. They are based on freely available bioinformatics tools and scripts developed in-house. The interactive web application provides a search engine and several browser utilities. Graphical views of transcript alignments are available through SeqViewer, an embedded tool developed by NCBI for viewing biological sequence data. The web application is tightly integrated with other NCBI web applications and tools to extend the functionality of data processing and interconnectivity. We present a case study for the species Physalis peruviana with data generated from BioProject ID 67621. Database URL: http://www.ncbi.nlm.nih.gov/projects/physalis/ PMID:28605765
Phylogenomic analyses data of the avian phylogenomics project.

PubMed

Jarvis, Erich D; Mirarab, Siavash; Aberer, Andre J; Li, Bo; Houde, Peter; Li, Cai; Ho, Simon Y W; Faircloth, Brant C; Nabholz, Benoit; Howard, Jason T; Suh, Alexander; Weber, Claudia C; da Fonseca, Rute R; Alfaro-Núñez, Alonzo; Narula, Nitish; Liu, Liang; Burt, Dave; Ellegren, Hans; Edwards, Scott V; Stamatakis, Alexandros; Mindell, David P; Cracraft, Joel; Braun, Edward L; Warnow, Tandy; Jun, Wang; Gilbert, M Thomas Pius; Zhang, Guojie

2015-01-01

Determining the evolutionary relationships among the major lineages of extant birds has been one of the biggest challenges in systematic biology. To address this challenge, we assembled or collected the genomes of 48 avian species spanning most orders of birds, including all Neognathae and two of the five Palaeognathae orders. We used these genomes to construct a genome-scale avian phylogenetic tree and perform comparative genomic analyses. Here we present the datasets associated with the phylogenomic analyses, which include sequence alignment files consisting of nucleotides, amino acids, indels, and transposable elements, as well as tree files containing gene trees and species trees. Inferring an accurate phylogeny required generating: 1) A well annotated data set across species based on genome synteny; 2) Alignments with unaligned or incorrectly overaligned sequences filtered out; and 3) Diverse data sets, including genes and their inferred trees, indels, and transposable elements. Our total evidence nucleotide tree (TENT) data set (consisting of exons, introns, and UCEs) gave what we consider our most reliable species tree when using the concatenation-based ExaML algorithm or when using statistical binning with the coalescence-based MP-EST algorithm (which we refer to as MP-EST*). Other data sets, such as the coding sequence of some exons, revealed other properties of genome evolution, namely convergence. The Avian Phylogenomics Project is the largest vertebrate phylogenomics project to date that we are aware of. The sequence, alignment, and tree data are expected to accelerate analyses in phylogenomics and other related areas.
PAZAR: a framework for collection and dissemination of cis-regulatory sequence annotation

PubMed Central

Portales-Casamar, Elodie; Kirov, Stefan; Lim, Jonathan; Lithwick, Stuart; Swanson, Magdalena I; Ticoll, Amy; Snoddy, Jay; Wasserman, Wyeth W

2007-01-01

PAZAR is an open-access and open-source database of transcription factor and regulatory sequence annotation with associated web interface and programming tools for data submission and extraction. Curated boutique data collections can be maintained and disseminated through the unified schema of the mall-like PAZAR repository. The Pleiades Promoter Project collection of brain-linked regulatory sequences is introduced to demonstrate the depth of annotation possible within PAZAR. PAZAR, located at http://www.pazar.info, is open for business. PMID:17916232
PAZAR: a framework for collection and dissemination of cis-regulatory sequence annotation.

PubMed

Portales-Casamar, Elodie; Kirov, Stefan; Lim, Jonathan; Lithwick, Stuart; Swanson, Magdalena I; Ticoll, Amy; Snoddy, Jay; Wasserman, Wyeth W

2007-01-01

PAZAR is an open-access and open-source database of transcription factor and regulatory sequence annotation with associated web interface and programming tools for data submission and extraction. Curated boutique data collections can be maintained and disseminated through the unified schema of the mall-like PAZAR repository. The Pleiades Promoter Project collection of brain-linked regulatory sequences is introduced to demonstrate the depth of annotation possible within PAZAR. PAZAR, located at http://www.pazar.info, is open for business.
What can we learn about lyssavirus genomes using 454 sequencing?

PubMed

Höper, Dirk; Finke, Stefan; Freuling, Conrad M; Hoffmann, Bernd; Beer, Martin

2012-01-01

The main task of individual project number four, "Whole genome sequencing, virus-host adaptation, and molecular epidemiological analyses of lyssaviruses", within the network "Lyssaviruses--a potential re-emerging public health threat", is to provide high quality complete genome sequences from lyssaviruses. These sequences are analysed in depth with regard to the diversity of the viral populations as to both quasi-species and so-called defective interfering RNAs. Moreover, the sequence data will facilitate further epidemiological analyses, will provide insight into the evolution of lyssaviruses and will be the basis for the design of novel nucleic acid based diagnostics. The first results presented here indicate not only that high quality full-length lyssavirus genome sequences can be generated, but also that efficient analysis of the viral population becomes feasible.
Comparative sequence analysis of Sordaria macrospora and Neurospora crassa as a means to improve genome annotation.

PubMed

Nowrousian, Minou; Würtz, Christian; Pöggeler, Stefanie; Kück, Ulrich

2004-03-01

One of the most challenging parts of large scale sequencing projects is the identification of functional elements encoded in a genome. Recently, studies of genomes of up to six different Saccharomyces species have demonstrated that a comparative analysis of genome sequences from closely related species is a powerful approach to identify open reading frames and other functional regions within genomes [Science 301 (2003) 71, Nature 423 (2003) 241]. Here, we present a comparison of selected sequences from Sordaria macrospora to their corresponding Neurospora crassa orthologous regions. Our analysis indicates that due to the high degree of sequence similarity and conservation of overall genomic organization, S. macrospora sequence information can be used to simplify the annotation of the N. crassa genome.
Sequencing and characterizing the genome of Estrella lausannensis as an undergraduate project: training students and biological insights.

PubMed

Bertelli, Claire; Aeby, Sébastien; Chassot, Bérénice; Clulow, James; Hilfiker, Olivier; Rappo, Samuel; Ritzmann, Sébastien; Schumacher, Paolo; Terrettaz, Céline; Benaglio, Paola; Falquet, Laurent; Farinelli, Laurent; Gharib, Walid H; Goesmann, Alexander; Harshman, Keith; Linke, Burkhard; Miyazaki, Ryo; Rivolta, Carlo; Robinson-Rechavi, Marc; van der Meer, Jan Roelof; Greub, Gilbert

2015-01-01

With the widespread availability of high-throughput sequencing technologies, sequencing projects have become pervasive in the molecular life sciences. The huge bulk of data generated daily must be analyzed further by biologists with skills in bioinformatics and by "embedded bioinformaticians," i.e., bioinformaticians integrated in wet lab research groups. Thus, students interested in molecular life sciences must be trained in the main steps of genomics: sequencing, assembly, annotation and analysis. To reach that goal, a practical course has been set up for master students at the University of Lausanne: the "Sequence a genome" class. At the beginning of the academic year, a few bacterial species whose genome is unknown are provided to the students, who sequence and assemble the genome(s) and perform manual annotation. Here, we report the progress of the first class from September 2010 to June 2011 and the results obtained by seven master students who specifically assembled and annotated the genome of Estrella lausannensis, an obligate intracellular bacterium related to Chlamydia. The draft genome of Estrella is composed of 29 scaffolds encompassing 2,819,825 bp that encode for 2233 putative proteins. Estrella also possesses a 9136 bp plasmid that encodes for 14 genes, among which we found an integrase and a toxin/antitoxin module. Like all other members of the Chlamydiales order, Estrella possesses a highly conserved type III secretion system, considered as a key virulence factor. The annotation of the Estrella genome also allowed the characterization of the metabolic abilities of this strictly intracellular bacterium. Altogether, the students provided the scientific community with the Estrella genome sequence and a preliminary understanding of the biology of this recently-discovered bacterial genus, while learning to use cutting-edge technologies for sequencing and to perform bioinformatics analyses.
Wheat EST resources for functional genomics of abiotic stress

PubMed Central

Houde, Mario; Belcaid, Mahdi; Ouellet, François; Danyluk, Jean; Monroy, Antonio F; Dryanova, Ani; Gulick, Patrick; Bergeron, Anne; Laroche, André; Links, Matthew G; MacCarthy, Luke; Crosby, William L; Sarhan, Fathey

2006-01-01

Background: Wheat is an excellent species to study freezing tolerance and other abiotic stresses. However, the sequence of the wheat genome has not been completely characterized due to its complexity and large size. To circumvent this obstacle and identify genes involved in cold acclimation and associated stresses, a large scale EST sequencing approach was undertaken by the Functional Genomics of Abiotic Stress (FGAS) project. Results: We generated 73,521 quality-filtered ESTs from eleven cDNA libraries constructed from wheat plants exposed to various abiotic stresses and at different developmental stages. In addition, 196,041 ESTs for which tracefiles were available from the National Science Foundation wheat EST sequencing program and DuPont were also quality-filtered and used in the analysis. Clustering of the combined ESTs with d2_cluster and TGICL yielded a few large clusters containing several thousand ESTs that were refractory to routine clustering techniques. To resolve this problem, the sequence proximity and "bridges" were identified by an e-value distance graph to manually break clusters into smaller groups. Assembly of the resolved ESTs generated a 75,488 unique sequence set (31,580 contigs and 43,908 singletons/singlets). Digital expression analyses indicated that the FGAS dataset is enriched in stress-regulated genes compared to the other public datasets. Over 43% of the unique sequence set was annotated and classified into functional categories according to Gene Ontology. Conclusion: We have annotated 29,556 different sequences, an almost 5-fold increase in annotated sequences compared to the available wheat public databases. Digital expression analysis combined with gene annotation helped in the identification of several pathways associated with abiotic stress. The genomic resources and knowledge developed by this project will contribute to a better understanding of the different mechanisms that govern stress tolerance in wheat and other cereals. PMID:16772040
Earth BioGenome Project: Sequencing life for the future of life.

PubMed

Lewin, Harris A; Robinson, Gene E; Kress, W John; Baker, William J; Coddington, Jonathan; Crandall, Keith A; Durbin, Richard; Edwards, Scott V; Forest, Félix; Gilbert, M Thomas P; Goldstein, Melissa M; Grigoriev, Igor V; Hackett, Kevin J; Haussler, David; Jarvis, Erich D; Johnson, Warren E; Patrinos, Aristides; Richards, Stephen; Castilla-Rubio, Juan Carlos; van Sluys, Marie-Anne; Soltis, Pamela S; Xu, Xun; Yang, Huanming; Zhang, Guojie

2018-04-24

Increasing our understanding of Earth's biodiversity and responsibly stewarding its resources are among the most crucial scientific and social challenges of the new millennium. These challenges require fundamental new knowledge of the organization, evolution, functions, and interactions among millions of the planet's organisms. Herein, we present a perspective on the Earth BioGenome Project (EBP), a moonshot for biology that aims to sequence, catalog, and characterize the genomes of all of Earth's eukaryotic biodiversity over a period of 10 years. The outcomes of the EBP will inform a broad range of major issues facing humanity, such as the impact of climate change on biodiversity, the conservation of endangered species and ecosystems, and the preservation and enhancement of ecosystem services. We describe hurdles that the project faces, including data-sharing policies that ensure a permanent, freely available resource for future scientific discovery while respecting access and benefit sharing guidelines of the Nagoya Protocol. We also describe scientific and organizational challenges in executing such an ambitious project, and the structure proposed to achieve the project's goals. The far-reaching potential benefits of creating an open digital repository of genomic information for life on Earth can be realized only by a coordinated international effort.
Videogrammetry Using Projected Circular Targets: Proof-of-Concept Test

NASA Technical Reports Server (NTRS)

Pappa, Richard S.; Black, Jonathan T.

2003-01-01

Videogrammetry is the science of calculating 3D object coordinates as a function of time from image sequences. It expands the method of photogrammetry to multiple time steps enabling the object to be characterized dynamically. Photogrammetry achieves the greatest accuracy with high contrast, solid-colored, circular targets. The high contrast is most often effected using retro-reflective targets attached to the measurement article. Knowledge of the location of each target allows those points to be tracked in a sequence of images, thus yielding dynamic characterization of the overall object. For ultra-lightweight and inflatable gossamer structures (e.g. solar sails, inflatable antennae, sun shields, etc.) where it may be desirable to avoid physically attaching retro-targets, a high-density grid of projected circular targets - called dot projection - is a viable alternative. Over time the object changes shape or position independently of the dots. Dynamic behavior, such as deployment or vibration, can be characterized by tracking the overall 3D shape of the object instead of tracking specific object points. To develop this method, an oscillating rigid object was measured using both retroreflective targets and dot projection. This paper details these tests, compares the results, and discusses the overall accuracy of dot projection videogrammetry.

A post-assembly genome-improvement toolkit (PAGIT) to obtain annotated genomes from contigs.

PubMed

Swain, Martin T; Tsai, Isheng J; Assefa, Samual A; Newbold, Chris; Berriman, Matthew; Otto, Thomas D

2012-06-07

Genome projects now produce draft assemblies within weeks owing to advanced high-throughput sequencing technologies. For milestone projects such as Escherichia coli or Homo sapiens, teams of scientists were employed to manually curate and finish these genomes to a high standard. Nowadays, this is not feasible for most projects, and the quality of genomes is generally of a much lower standard. This protocol describes software (PAGIT) that is used to improve the quality of draft genomes. It offers flexible functionality to close gaps in scaffolds, correct base errors in the consensus sequence and exploit reference genomes (if available) to improve scaffolding and generate annotations. The protocol is most accessible for bacterial and small eukaryotic genomes (up to 300 Mb), such as pathogenic bacteria, malaria and parasitic worms. Applying PAGIT to an E. coli assembly takes ∼24 h: it doubles the average contig size and annotates over 4,300 gene models.
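As a rough illustration of how an improvement such as the doubling of average contig size might be quantified, the sketch below computes the mean contig length and the N50 of an assembly. It is not part of PAGIT; the function name `contig_stats` and the contig lengths are made up for the example.

```python
def contig_stats(lengths):
    """Return (mean length, N50) for a list of contig lengths in bases."""
    if not lengths:
        raise ValueError("need at least one contig")
    total = sum(lengths)
    mean = total / len(lengths)
    running = 0
    # N50: length of the contig at which the cumulative sum of lengths,
    # taken from longest to shortest, first reaches half the assembly size.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return mean, length


# Hypothetical contig lengths before and after gap closing (not real data).
before = [100, 200, 300, 400]
after = [300, 700]

mean_before, n50_before = contig_stats(before)
mean_after, n50_after = contig_stats(after)
print(mean_before, n50_before)
print(mean_after, n50_after)
```

Merging contigs leaves the total assembly size unchanged here, so both the mean and the N50 rise only because there are fewer, longer pieces.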
A Web-Based Monitoring System for Multidisciplinary Design Projects

NASA Technical Reports Server (NTRS)

Rogers, James L.; Salas, Andrea O.; Weston, Robert P.

1998-01-01

In today's competitive environment, both industry and government agencies are under pressure to reduce the time and cost of multidisciplinary design projects. New tools have been introduced to assist in this process by facilitating the integration of and communication among diverse disciplinary codes. One such tool, a framework for multidisciplinary computational environments, is defined as a hardware and software architecture that enables integration, execution, and communication among diverse disciplinary processes. An examination of current frameworks reveals weaknesses in various areas, such as sequencing, displaying, monitoring, and controlling the design process. The objective of this research is to explore how Web technology, integrated with an existing framework, can improve these areas of weakness. This paper describes a Web-based system that optimizes and controls the execution sequence of design processes; and monitors the project status and results. The three-stage evolution of the system with increasingly complex problems demonstrates the feasibility of this approach.
IPD—the Immuno Polymorphism Database

PubMed Central

Robinson, James; Halliwell, Jason A.; McWilliam, Hamish; Lopez, Rodrigo; Marsh, Steven G. E.

2013-01-01

The Immuno Polymorphism Database (IPD), http://www.ebi.ac.uk/ipd/, is a set of specialist databases related to the study of polymorphic genes in the immune system. The IPD project works with specialist groups or nomenclature committees who provide and curate individual sections before they are submitted to IPD for online publication. The IPD project stores all the data in a set of related databases. IPD currently consists of four databases: IPD-KIR, which contains the allelic sequences of killer-cell immunoglobulin-like receptors; IPD-MHC, a database of sequences of the major histocompatibility complex of different species; IPD-HPA, which covers alloantigens expressed only on platelets; and IPD-ESTDAB, which provides access to the European Searchable Tumour Cell-Line Database, a cell bank of immunologically characterized melanoma cell lines. The data are currently available online from the website and FTP directory. This article describes the latest updates and additional tools added to the IPD project. PMID:23180793
A streamlined collecting and preparation protocol for DNA barcoding of Lepidoptera as part of large-scale rapid biodiversity assessment projects, exemplified by the Indonesian Biodiversity Discovery and Information System (IndoBioSys)

PubMed Central

Hausmann, Axel; Cancian de Araujo, Bruno; Sutrisno, Hari; Peggie, Djunijanti; Schmidt, Stefan

2017-01-01

Here we present a general collecting and preparation protocol for DNA barcoding of Lepidoptera as part of large-scale rapid biodiversity assessment projects, and a comparison with alternative preserving and vouchering methods. About 98% of the sequenced specimens processed using the present collecting and preparation protocol yielded sequences with more than 500 base pairs. The study is based on the first outcomes of the Indonesian Biodiversity Discovery and Information System (IndoBioSys). IndoBioSys is a German-Indonesian research project that is conducted by the Museum für Naturkunde in Berlin and the Zoologische Staatssammlung München, in close cooperation with the Research Center for Biology – Indonesian Institute of Sciences (RCB-LIPI, Bogor). PMID:29134041
Enhancing the Breadth and Efficacy of Therapeutic Vaccines for Breast Cancer

DTIC Science & Technology

2015-10-01

and get the top shared TCR sequences of CD8 T cells from the tumor, TDLN, and peripheral blood. These sequences will be used to make avatars and these... avatars will be screened against HLA-A2+ BC cell lines, Oregon’s eluted peptides, and Denver’s Baculovirus library. 9 Outline of the project
Len Gen: The international lentil genome sequencing project

USDA-ARS's Scientific Manuscript database

We have been sequencing CDC Redberry using NGS of paired-end and mate-pair libraries over a wide range of sizes and technologies. The most recent draft (v0.7) of approximately 150x coverage produced scaffolds covering over half the genome (2.7 Gb of the expected 4.3 Gb). Long reads from PacBio sequ...
The Augusta College Humanities Program: Strengthening an Introductory Three-Course Sequence.

ERIC Educational Resources Information Center

American Association of State Colleges and Universities, Washington, DC.

Presented is a compilation of materials concerning the Augusta College Humanities Program in Augusta, Georgia, beginning with a brief description of the program and its background. In 1984, the college began a 2.5-year project to revitalize and strengthen its required sophomore level three course humanities sequence (Greece and Rome, the Middle…
Advantages of genome sequencing by long-read sequencer using SMRT technology in medical area.

PubMed

Nakano, Kazuma; Shiroma, Akino; Shimoji, Makiko; Tamotsu, Hinako; Ashimine, Noriko; Ohki, Shun; Shinzato, Misuzu; Minami, Maiko; Nakanishi, Tetsuhiro; Teruya, Kuniko; Satou, Kazuhito; Hirano, Takashi

2017-07-01

PacBio RS II is the first commercialized third-generation DNA sequencer able to sequence a single molecule DNA in real-time without amplification. PacBio RS II's sequencing technology is novel and unique, enabling the direct observation of DNA synthesis by DNA polymerase. PacBio RS II confers four major advantages compared to other sequencing technologies: long read lengths, high consensus accuracy, a low degree of bias, and simultaneous capability of epigenetic characterization. These advantages surmount the obstacle of sequencing genomic regions such as high/low G+C, tandem repeat, and interspersed repeat regions. Moreover, PacBio RS II is ideal for whole genome sequencing, targeted sequencing, complex population analysis, RNA sequencing, and epigenetics characterization. With PacBio RS II, we have sequenced and analyzed the genomes of many species, from viruses to humans. Herein, we summarize and review some of our key genome sequencing projects, including full-length viral sequencing, complete bacterial genome and almost-complete plant genome assemblies, and long amplicon sequencing of a disease-associated gene region. We believe that PacBio RS II is not only an effective tool for use in the basic biological sciences but also in the medical/clinical setting.
The sequence measurement system of the IR camera

NASA Astrophysics Data System (ADS)

Geng, Ai-hui; Han, Hong-xia; Zhang, Hai-bo

2011-08-01

IR cameras are now widely used in optoelectronic tracking, optoelectronic measurement, fire control, and optoelectronic countermeasures, but the output timing sequence of most IR cameras used in projects is complex, and the timing documents supplied by the manufacturer are not detailed. Because downstream image transmission and image processing systems need the detailed sequence of the IR camera, a sequence measurement system for the IR camera was designed and a detailed measurement procedure for the camera in use was carried out. The system combines FPGA programming with SignalTap online observation to obtain the precise sequence of the IR camera's output signal, and the resulting detailed document has been supplied to the downstream image transmission system, image processing system, and so on. The sequence measurement system includes a CameraLink input interface, an LVDS input interface, an FPGA, a CameraLink output interface, and other parts, of which the FPGA is the key component. The system accepts both CameraLink and LVDS video signals, and because image processing cards and image memory cards usually use CameraLink as their input interface, the output of the system is designed as a CameraLink interface. The system therefore measures the IR camera's sequence and, for some cameras, also performs interface conversion. Inside the FPGA, the sequence measurement program, pixel clock modification, SignalTap file configuration, and SignalTap online observation are integrated to achieve precise measurement of the IR camera. The sequence measurement program, written in Verilog and used together with SignalTap online observation, counts the number of lines in one frame and the number of pixels in one line, and also measures the line offset and row offset of the image. In response to the complex sequence of the IR camera's output signal, the system accurately measures the sequence of the camera used in the project, supplies the detailed sequence document to downstream systems such as the image processing and image transmission systems, and reports the concrete parameters fval, lval, pixclk, line offset, and row offset. Experiments show that the system obtains precise sequence measurements and works stably, laying the foundation for the downstream systems.
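The counting step described above can be modelled in software. The following is a minimal Python sketch, not the authors' Verilog program: it assumes that `fval` (frame valid) and `lval` (line valid) have been sampled once per pixel clock into lists of 0/1 values, and the function name `measure_timing` and the sample data are hypothetical.

```python
def measure_timing(fval, lval):
    """Count lines per frame and pixels per line for the first frame.

    fval and lval are equal-length sequences of 0/1 samples, one per pixel
    clock. A pixel is counted when both signals are high; a line ends when
    lval drops; the frame ends on the falling edge of fval.
    """
    if len(fval) != len(lval):
        raise ValueError("fval and lval must have the same length")
    line_lengths = []  # pixel count of each completed line
    run = 0            # pixels counted so far in the current line
    prev_f = 0
    for f, l in zip(fval, lval):
        if prev_f and not f:
            break                     # falling edge of fval: frame finished
        if f and l:
            run += 1                  # valid pixel inside a line
        elif run:
            line_lengths.append(run)  # lval dropped: line finished
            run = 0
        prev_f = f
    if run:
        line_lengths.append(run)      # line that ended together with the frame
    return len(line_lengths), line_lengths


# Synthetic frame: 2 lines of 3 pixels each, with blanking in between.
fval = [0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
lval = [0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0]
lines, pixels = measure_timing(fval, lval)
print(lines, pixels)
```

In hardware the same counters would be clocked by pixclk and inspected with a logic analyzer; the list-based model is only meant to make the counting rule explicit.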
Recent patents of nanopore DNA sequencing technology: progress and challenges.

PubMed

Zhou, Jianfeng; Xu, Bingqian

2010-11-01

DNA sequencing techniques have developed rapidly in recent decades, driven primarily by the Human Genome Project. Among the proposed new techniques, the nanopore has been considered a suitable candidate for single-molecule DNA sequencing at ultrahigh speed and very low cost. Several fabrication and modification techniques have been developed to produce robust and well-defined nanopore devices. Much effort has also gone into applying nanopores to analyze the properties of DNA molecules. Compared with traditional sequencing techniques, the nanopore has demonstrated distinct advantages in the main practical issues, such as sample preparation, sequencing speed, cost-effectiveness, and read length. Although challenges remain, recent research on improving the capabilities of nanopores has shed light on how to achieve the ultimate goal: sequencing an individual DNA strand at single-nucleotide resolution. This patent review briefly highlights recent developments and technological achievements in DNA analysis and sequencing at the single-molecule level, focusing on nanopore-based methods.
Genome sequencing in microfabricated high-density picolitre reactors.

PubMed

Margulies, Marcel; Egholm, Michael; Altman, William E; Attiya, Said; Bader, Joel S; Bemben, Lisa A; Berka, Jan; Braverman, Michael S; Chen, Yi-Ju; Chen, Zhoutao; Dewell, Scott B; Du, Lei; Fierro, Joseph M; Gomes, Xavier V; Godwin, Brian C; He, Wen; Helgesen, Scott; Ho, Chun Heen; Ho, Chun He; Irzyk, Gerard P; Jando, Szilveszter C; Alenquer, Maria L I; Jarvie, Thomas P; Jirage, Kshama B; Kim, Jong-Bum; Knight, James R; Lanza, Janna R; Leamon, John H; Lefkowitz, Steven M; Lei, Ming; Li, Jing; Lohman, Kenton L; Lu, Hong; Makhijani, Vinod B; McDade, Keith E; McKenna, Michael P; Myers, Eugene W; Nickerson, Elizabeth; Nobile, John R; Plant, Ramona; Puc, Bernard P; Ronan, Michael T; Roth, George T; Sarkis, Gary J; Simons, Jan Fredrik; Simpson, John W; Srinivasan, Maithreyan; Tartaro, Karrie R; Tomasz, Alexander; Vogt, Kari A; Volkmer, Greg A; Wang, Shally H; Wang, Yong; Weiner, Michael P; Yu, Pengguang; Begley, Richard F; Rothberg, Jonathan M

2005-09-15

The proliferation of large-scale DNA-sequencing projects in recent years has driven a search for alternative methods to reduce time and cost. Here we describe a scalable, highly parallel sequencing system with raw throughput significantly greater than that of state-of-the-art capillary electrophoresis instruments. The apparatus uses a novel fibre-optic slide of individual wells and is able to sequence 25 million bases, at 99% or better accuracy, in one four-hour run. To achieve an approximately 100-fold increase in throughput over current Sanger sequencing technology, we have developed an emulsion method for DNA amplification and an instrument for sequencing by synthesis using a pyrosequencing protocol optimized for solid support and picolitre-scale volumes. Here we show the utility, throughput, accuracy and robustness of this system by shotgun sequencing and de novo assembly of the Mycoplasma genitalium genome with 96% coverage at 99.96% accuracy in one run of the machine.
Identification of Delta5-fatty acid desaturase from the cellular slime mold Dictyostelium discoideum.

PubMed

Saito, T; Ochiai, H

1999-10-01

cDNA fragments putatively encoding amino acid sequences characteristic of the fatty acid desaturase were obtained using expressed sequence tag (EST) information of the Dictyostelium cDNA project. Using this sequence, we have determined the cDNA sequence and genomic sequence of a desaturase. The cloned cDNA is 1489 nucleotides long and the deduced amino acid sequence comprises 464 amino acid residues containing an N-terminal cytochrome b5 domain. The whole sequence was 38.6% identical to the initially identified Delta5-desaturase of Mortierella alpina. We have confirmed its function as a Delta5-desaturase by an overexpression mutation in D. discoideum and by a gain-of-function mutation in the yeast Saccharomyces cerevisiae. Analysis of the lipids from transformed D. discoideum and yeast demonstrated the accumulation of Delta5-desaturated products. This is the first report concerning fatty acid desaturase in cellular slime molds.
Genome Sequencing and Assembly by Long Reads in Plants

PubMed Central

Li, Changsheng; Lin, Feng; An, Dong; Huang, Ruidong

2017-01-01

Plant genomes generated by Sanger and Next Generation Sequencing (NGS) have provided insight into species diversity and evolution. However, Sanger sequencing is limited in its applications due to high cost, labor intensity, and low throughput, while NGS reads are too short to resolve abundant repeats and polyploidy, leading to incomplete or ambiguous assemblies. The advent and improvement of long-read sequencing by Third Generation Sequencing (TGS) methods such as PacBio and Nanopore have shown promise in producing high-quality assemblies for complex genomes. Here, we review the development of sequencing, introducing the application as well as considerations of experimental design in TGS of plant genomes. We also introduce recent revolutionary scaffolding technologies including BioNano, Hi-C, and 10× Genomics. We expect that the informative guidance for genome sequencing and assembly by long reads will benefit the initiation of scientists’ projects. PMID:29283420
Sequencing of a new target genome: the Pediculus humanus humanus (Phthiraptera: Pediculidae) genome project.

PubMed

Pittendrigh, B R; Clark, J M; Johnston, J S; Lee, S H; Romero-Severson, J; Dasch, G A

2006-11-01

The human body louse, Pediculus humanus humanus (L.), and the human head louse, Pediculus humanus capitis, belong to the hemimetabolous order Phthiraptera. The body louse is the primary vector that transmits the bacterial agents of louse-borne relapsing fever, trench fever, and epidemic typhus. The genomes of the bacterial causative agents of several of these aforementioned diseases have been sequenced. Thus, determining the body louse genome will enhance studies of host-vector-pathogen interactions. Although not important as a major disease vector, head lice are of major social concern. Resistance to traditional pesticides used to control head and body lice has developed. It is imperative that new molecular targets be discovered for the development of novel compounds to control these insects. No complete genome sequence exists for a hemimetabolous insect species primarily because hemimetabolous insects often have large (2000 Mb) to very large (up to 16,300 Mb) genomes. Fortuitously, we determined that the human body louse has one of the smallest genome sizes known in insects, suggesting it may be a suitable choice as a minimal hemimetabolous genome in which many genes have been eliminated during its adaptation to human parasitism. Because many louse species infest birds and mammals, the body louse genome-sequencing project will facilitate studies of their comparative genomics. A 6-8X coverage of the body louse genome, plus sequenced expressed sequence tags, should provide the entomological, evolutionary biology, medical, and public health communities with useful genetic information.
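For orientation, the relation between coverage, read length, and genome size can be sketched as follows, using the standard Lander-Waterman approximation that a fraction e^(-c) of the genome is left uncovered at coverage c. The genome size and read length below are placeholders chosen for the example, not values from the louse project, and the function names are illustrative.

```python
import math


def reads_needed(genome_size, read_length, coverage):
    """Number of reads so that reads * read_length / genome_size >= coverage."""
    return math.ceil(coverage * genome_size / read_length)


def fraction_uncovered(coverage):
    """Lander-Waterman estimate of the genome fraction hit by no read."""
    return math.exp(-coverage)


GENOME_SIZE = 100_000_000  # assumed genome size in bases (placeholder)
READ_LENGTH = 700          # assumed mean read length in bases (placeholder)

for c in (6, 8):
    print(c, reads_needed(GENOME_SIZE, READ_LENGTH, c), fraction_uncovered(c))
```

Under these assumptions, moving from 6X to 8X needs about a third more reads and shrinks the expected uncovered fraction by a factor of e^2.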
First Pass Annotation of Promoters on Human Chromosome 22

PubMed Central

Scherf, Matthias; Klingenhoff, Andreas; Frech, Kornelie; Quandt, Kerstin; Schneider, Ralf; Grote, Korbinian; Frisch, Matthias; Gailus-Durner, Valérie; Seidel, Alexander; Brack-Werner, Ruth; Werner, Thomas

2001-01-01

The publication of the first almost complete sequence of a human chromosome (chromosome 22) is a major milestone in human genomics. Together with the sequence, an excellent annotation of genes was published which certainly will serve as an information resource for numerous future projects. We noted that the annotation did not cover regulatory regions; in particular, no promoter annotation has been provided. Here we present an analysis of the complete published chromosome 22 sequence for promoters. A recent breakthrough in specific in silico prediction of promoter regions enabled us to attempt large-scale prediction of promoter regions on chromosome 22. Scanning of sequence databases revealed only 20 experimentally verified promoters, of which 10 were correctly predicted by our approach. Nearly 40% of our 465 predicted promoter regions are supported by the currently available gene annotation. Promoter finding also provides a biologically meaningful method for “chromosomal scaffolding”, by which long genomic sequences can be divided into segments starting with a gene. As one example, the combination of promoter region prediction with exon/intron structure predictions greatly enhances the specificity of de novo gene finding. The present study demonstrates that it is possible to identify promoters in silico on the chromosomal level with sufficient reliability for experimental planning and indicates that a wealth of information about regulatory regions can be extracted from current large-scale (megabase) sequencing projects. Results are available on-line at http://genomatix.gsf.de/chr22/. PMID:11230158
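The reported figures translate into simple evaluation arithmetic. The sketch below only restates numbers from the abstract; the function name `fraction` is illustrative, and "nearly 40%" is treated as exactly 40% for the estimate.

```python
def fraction(part, whole):
    """Return part / whole, rejecting a non-positive denominator."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


# 10 of the 20 experimentally verified promoters were correctly predicted.
sensitivity = fraction(10, 20)

# Nearly 40% of the 465 predicted regions are supported by gene annotation;
# using exactly 40% gives an approximate count.
supported_estimate = round(0.40 * 465)

print(sensitivity, supported_estimate)
```

The 20 verified promoters are a small sample, so the resulting sensitivity figure is a coarse estimate.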
Implementing genomic medicine in pathology.

PubMed

Williams, Eli S; Hegde, Madhuri

2013-07-01

The finished sequence of the Human Genome Project, published 50 years after Watson and Crick's seminal paper on the structure of DNA, pushed human genetics into the public eye and ushered in the genomic era. A significant, if overlooked, aspect of the race to complete the genome was the technology that propelled scientists to the finish line. DNA sequencing technologies have become more standardized, automated, and capable of higher throughput. This technology has continued to grow at an astounding rate in the decade since the Human Genome Project was completed. Today, massively parallel sequencing, or next-generation sequencing (NGS), allows the detection of genetic variants across the entire genome. This ability has led to the identification of new causes of disease and is changing the way we categorize, treat, and manage disease. NGS approaches such as whole-exome sequencing and whole-genome sequencing are rapidly becoming an affordable genetic testing strategy for the clinical laboratory. One test can now provide vast amounts of health information pertaining not only to the disease of interest, but information that may also predict adult-onset disease, reveal carrier status for a rare disease and predict drug responsiveness. The issue of what to do with these incidental findings, along with questions pertaining to NGS testing strategies, data interpretation and storage, and applying genetic testing results into patient care, remains without a clear answer. This review will explore these issues and others relevant to the implementation of NGS in the clinical laboratory.
DNA fingerprinting, DNA barcoding, and next generation sequencing technology in plants.

PubMed

Sucher, Nikolaus J; Hennell, James R; Carles, Maria C

2012-01-01

DNA fingerprinting of plants has become an invaluable tool in forensic, scientific, and industrial laboratories all over the world. PCR has become part of virtually every variation of the plethora of approaches used for DNA fingerprinting today. DNA sequencing is increasingly used either in combination with or as a replacement for traditional DNA fingerprinting techniques. A prime example is the use of short, standardized regions of the genome as taxon barcodes for biological identification of plants. Rapid advances in "next generation sequencing" (NGS) technology are driving down the cost of sequencing and bringing large-scale sequencing projects into the reach of individual investigators. We present an overview of recent publications that demonstrate the use of "NGS" technology for DNA fingerprinting and DNA barcoding applications.
Optimal digital dynamical decoupling for general decoherence via Walsh modulation

NASA Astrophysics Data System (ADS)

Qi, Haoyu; Dowling, Jonathan P.; Viola, Lorenza

2017-11-01

We provide a general framework for constructing digital dynamical decoupling sequences based on Walsh modulation—applicable to arbitrary qubit decoherence scenarios. By establishing equivalence between decoupling design based on Walsh functions and on concatenated projections, we identify a family of optimal Walsh sequences, which can be exponentially more efficient, in terms of the required total pulse number, for fixed cancellation order, than known digital sequences based on concatenated design. Optimal sequences for a given cancellation order are highly non-unique—their performance depending sensitively on the control path. We provide an analytic upper bound to the achievable decoupling error and show how sequences within the optimal Walsh family can substantially outperform concatenated decoupling in principle, while respecting realistic timing constraints.
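To make the idea of a Walsh-modulated ("digital") pulse sequence concrete, the sketch below builds a Walsh function on 2^m equal time bins and places a pulse wherever its sign flips. It is a minimal illustration and not the authors' construction: the Paley ordering (products of Rademacher functions selected by the binary digits of n) is assumed here, and the function names are hypothetical.

```python
def walsh_paley(n, m):
    """Signs (+1/-1) of the Paley-ordered Walsh function W_n on 2**m bins."""
    if not 0 <= n < 2 ** m:
        raise ValueError("n must satisfy 0 <= n < 2**m")
    signs = []
    for k in range(2 ** m):
        parity = 0
        for j in range(1, m + 1):
            # Bit j-1 of n selects the Rademacher function R_j, whose sign
            # in bin k is given by bit m-j of k.
            if (n >> (j - 1)) & 1 and (k >> (m - j)) & 1:
                parity ^= 1
        signs.append(-1 if parity else 1)
    return signs


def pulse_bins(signs):
    """Bin indices k where the sign flips, i.e. a pulse at time k / len(signs)."""
    return [k for k in range(1, len(signs)) if signs[k] != signs[k - 1]]


print(pulse_bins(walsh_paley(1, 2)))  # one pulse at the midpoint
print(pulse_bins(walsh_paley(3, 2)))  # pulses at 1/4 and 3/4 of the interval
```

With this ordering, W_1 gives a single mid-interval pulse (the spin-echo pattern) and W_3 gives pulses at one quarter and three quarters of the interval (the CPMG pattern); all pulse times fall on multiples of the bin width, which is what makes the sequences digital.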
Evolution of Pre-Main Sequence Accretion Disks

NASA Technical Reports Server (NTRS)

Hartmann, Lee W.

2000-01-01

The aim of this project was to develop a comprehensive global picture of the physical conditions in, and evolutionary timescales of, pre-main sequence accretion disks. The results of this work will help constrain the initial conditions for planet formation. To this end we: (1) Developed detailed calculations of disk structure to study physical conditions and investigate the observational effects of grain growth in T Tauri disks; (2) Studied the dusty emission and accretion rates in older disk systems, with ages closer to the expected epoch of (giant) planet formation at 3-10 Myr, and (3) Began a project to develop much larger samples of 3-10 Myr-old stars to provide better empirical constraints on protoplanetary disk evolution.

Analysis and evaluation in the production process and equipment area of the low-cost solar array project

NASA Technical Reports Server (NTRS)

Goldman, H.; Wolf, M.

1979-01-01

The energy consumed in manufacturing silicon solar cell modules was calculated for the current process, as well as for 1982 and 1986 projected processes. In addition, energy payback times for the above three sequences are shown. The module manufacturing energy was partitioned two ways. In one way, the silicon reduction, silicon purification, sheet formation, cell fabrication, and encapsulation energies were found. In addition, the facility, equipment, processing material and direct material lost-in-process energies were appropriated in junction formation processes and full module manufacturing sequences. A brief methodology accounting for the energy of silicon wafers lost-in-processing during cell manufacturing is described.
B and F Projection Methods for Nearly Incompressible Linear and Nonlinear Elasticity and Plasticity using Higher-order NURBS Elements

DTIC Science & Technology

2007-08-01

Infinite plate with a hole: sequence of meshes produced by h-refinement. The geometry of the coarsest mesh...recalled with an emphasis on k-refinement. In Section 3, the use of high-order NURBS within a projection technique is studied in the geometrically linear...case with a B̄ method to investigate the choice of approximation and projection spaces with NURBS.
Representing Practice: Practice Models, Patterns, Bundles

ERIC Educational Resources Information Center

Falconer, Isobel; Finlay, Janet; Fincher, Sally

2011-01-01

This article critiques learning design as a representation for sharing and developing practice, based on synthesis of three projects. Starting with the findings of the Mod4L Models of Practice project, it argues that the technical origins of learning design, and the consequent focus on structure and sequence, limit its usefulness for sharing…
Reading, Writing, and Conducting Inquiry about Science in Kindergarten

ERIC Educational Resources Information Center

Patrick, Helen; Mantzicopoulos, Panayota; Samarapungavan, Ala

2009-01-01

Over the past three years, the authors have worked with kindergarten teachers to develop study units with sequences of integrated science inquiry and literacy activities appropriate for kindergartners. Their work, which is part of the Scientific Literacy Project, has been very successful. The success of the Scientific Literacy Project (SLP) is in…
A Project-based Spiral Curriculum for Introductory Courses in ChE: Part 2. Implementation.

ERIC Educational Resources Information Center

Dixon, Anthony G.; Clark, William M.; DiBiasio, David

2000-01-01

Reports the development, delivery, and assessment of a project-based spiral curriculum for the first sequence chemical engineering courses. Technical proficiency of students under the spiral curriculum was equal to or better than that of students under a traditional curriculum. Attitudes toward chemical engineering and teamwork were better, and…
Fostering Coherence in a University-Wide Humanities Program through a Comprehensive Faculty Development Project.

ERIC Educational Resources Information Center

Moseley, Merritt; Obergfell, Sandra

The goal of this year-long project was to foster coherence throughout the humanities program, an interdisciplinary, team-taught sequence of four required undergraduate courses. The humanities program has no faculty of its own, but draws instructors from existing departments throughout the university. Growth of the program has brought…
Videotaping EST/ESP Student Projects: "Real World" Research Projects for Professional and Academic Preparation.

ERIC Educational Resources Information Center

Gallowich, Kay

Descriptive information and supporting documents for courses taught in the language center of a school of mines are presented here. The first is a four-semester engineering practices introductory course sequence that incorporates professional-level technical problem-solving, cooperative learning, and the preparation of written and oral…
NIH Health Disparities Strategic Plan, Fiscal Years 2004-2008

ERIC Educational Resources Information Center

National Human Genome Research Institute, 2008

2008-01-01

The National Human Genome Research Institute (NHGRI) led the National Institutes of Health's (NIH) contribution to the International Human Genome Project, whose primary goal was the sequencing of the human genome. This project was successfully completed in April 2003. Now, the NHGRI's mission is focused on a broad range of studies aimed at…
Navajo Area Language Arts Project (NALAP). Book 1.

ERIC Educational Resources Information Center

Eby, J. Wesley; And Others

Ten units containing 86 structural objectives make up this volume of instructional materials for the first year to year and a half of teaching English as a second language to Navajo children. The Navajo Area Language Arts Project (NALAP) materials, intended to present a sequence of English grammatical structures based on specific language and…
Short Term Objectives. (SCAT Project, Title VI-G).

ERIC Educational Resources Information Center

Archer, Anita

Developed by the staff of the SCAT (Support, Competency-Assistance and Training) Project, the document deals with the third step of the systematic instructional model--sequencing short term objectives for exceptional students. The manual focuses on reviewing long term goals established by the child study team, converting these goals into long term…
NHEXAS PHASE I ARIZONA STUDY--STANDARD OPERATING PROCEDURE FOR LABORATORY ASSISTANT TRAINING PLAN--GENERAL (UA-T-6.0)

EPA Science Inventory

The purpose of this SOP is to describe the training sequence of incoming student laboratory assistants. The procedure is designed to provide them with an overview of the project in terms of project goals, structure, and laboratory needs. This overview familiarizes the student l...
Implementing and Assessing the Converging-Diverging Model of Design in a Sequence of Sophomore Projects

ERIC Educational Resources Information Center

Dahm, Kevin; Riddell, William; Constans, Eric; Courtney, Jennifer; Harvey, Roberta; Von Lockette, Paris

2009-01-01

This paper discusses a sophomore-level course that teaches engineering design and technical writing. Historically, the course was taught using semester-long design projects. Most students' overall approach to design problems left considerable room for improvement. Many teams chose a design without investigating alternatives, and important…
From Organelle to Protein Gel: A 6-Wk Laboratory Project on Flagellar Proteins

ERIC Educational Resources Information Center

Mitchell, Beth Ferro; Graziano, Mary R.

2006-01-01

Research suggests that undergraduate students learn more from lab experiences that involve longer-term projects. We have developed a one-semester laboratory sequence aimed at sophomore-level undergraduates. In designing this curriculum, we focused on several educational objectives: 1) giving students a feel for the scientific research process, 2)…
A systematic approach to the application of Automation, Robotics, and Machine Intelligence Systems /ARAMIS/ to future space projects

NASA Technical Reports Server (NTRS)

Smith, D. B. S.

1982-01-01

The potential applications of Automation, Robotics, and Machine Intelligence Systems (ARAMIS) to space projects are investigated, through a systematic method. In this method selected space projects are broken down into space project tasks, and 69 of these tasks are selected for study. Candidate ARAMIS options are defined for each task. The relative merits of these options are evaluated according to seven indices of performance. Logical sequences of ARAMIS development are also defined. Based on this data, promising applications of ARAMIS are
Mice and Men Environmental Balance, Parts Three and Four of an Integrated Science Sequence, Teacher's Guide, 1970 Edition.

ERIC Educational Resources Information Center

Portland Project Committee, OR.

This teacher's guide contains parts three and four of the four-part first year Portland Project, a three-year secondary integrated science curriculum sequence. Part three of the guide deals with topics such as the cell, reproduction, embryology, genetics, genetic diseases, genetics and change, populations, effects of density on populations,…
CENTRAL PLATEAU REMEDIATION OPTIMIZATION STUDY

DOE Office of Scientific and Technical Information (OSTI.GOV)

BERGMAN, T. B.; STEFANSKI, L. D.; SEELEY, P. N.

2012-09-19

THE CENTRAL PLATEAU REMEDIATION OPTIMIZATION STUDY WAS CONDUCTED TO DEVELOP AN OPTIMAL SEQUENCE OF REMEDIATION ACTIVITIES IMPLEMENTING THE CERCLA DECISION ON THE CENTRAL PLATEAU. THE STUDY DEFINES A SEQUENCE OF ACTIVITIES THAT RESULT IN AN EFFECTIVE USE OF RESOURCES FROM A STRATEGIC PERSPECTIVE WHEN CONSIDERING EQUIPMENT PROCUREMENT AND STAGING, WORKFORCE MOBILIZATION/DEMOBILIZATION, WORKFORCE LEVELING, WORKFORCE SKILL-MIX, AND OTHER REMEDIATION/DISPOSITION PROJECT EXECUTION PARAMETERS.
Androgen Receptor Splice Variants and Resistance to Taxane Chemotherapy

DTIC Science & Technology

2016-10-01

sequence (MTAS) on AR. Milestone: Identify the sequence of AR that is involved in microtubule-binding. Publish 1 peer-reviewed paper. Major Task 4...joined the project and worked on the validation of the PAXgene assay. 6. Products Publications, conference papers, and presentations...Journal publications. The following paper was published: Xichun Liu, Elisa Ledet, Dongying Li, Ary Dotiwala, Allie Steinberger, Jianzhuo
Heat, Energy, and Order, Part Two of an Integrated Science Sequence, Teacher's Guide, 1970 Edition.

ERIC Educational Resources Information Center

Portland Project Committee, OR.

This teacher's guide contains part two of the four-part first year Portland Project, a three-year secondary integrated science curriculum sequence. This part involves the student with unifying principles essential for deeper understanding of the concept of energy. Confidence in the atomic nature of matter is built by relating heat in terms of…
Integrating a DNA barcoding project with an ecological survey: a case study on temperate intertidal polychaete communities in Qingdao, China

NASA Astrophysics Data System (ADS)

Zhou, Hong; Zhang, Zhinan; Chen, Haiyan; Sun, Renhua; Wang, Hui; Guo, Lei; Pan, Haijian

2010-07-01

In this study, we integrated a DNA barcoding project with an ecological survey on intertidal polychaete communities and investigated the utility of the CO1 gene sequence as a DNA barcode for the classification of the intertidal polychaetes. Using 16S rDNA as a complementary marker and combining morphological and ecological characterization, some of the dominant and common polychaete species from Chinese coasts were assessed for their taxonomic status. We obtained 22 haplotype gene sequences of 13 taxa, including 10 CO1 sequences and 12 16S rDNA sequences. Based on intra- and inter-specific distances, we built phylogenetic trees using the neighbor-joining method. Our study suggested that the mitochondrial CO1 gene was a valid DNA barcoding marker for species identification in polychaetes, but other genes, such as 16S rDNA, could be used as a complementary genetic marker. For more accurate species identification and effective testing of species hypotheses, DNA barcoding should be incorporated with morphological, ecological, biogeographical, and phylogenetic information. The application of DNA barcoding and molecular identification in the ecological survey on the intertidal polychaete communities demonstrated the feasibility of integrating DNA taxonomy and ecology.
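The intra- and inter-specific distances mentioned in this abstract can be illustrated with a minimal Python sketch. It computes the uncorrected p-distance (the proportion of differing sites) between two aligned sequences. The abstract does not say which distance model the authors used, so the p-distance, the function name `p_distance`, and the toy sequences are all assumptions made for illustration only.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance between two aligned sequences.

    Sites where either sequence has a gap ('-') or an ambiguous base ('N')
    are skipped. This is an illustrative choice, not the method of the study.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # ignore sites that cannot be compared
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Toy aligned fragments (hypothetical, not real CO1 data).
print(p_distance("ACGTACGT", "ACGTACGA"))  # 0.125
```

A matrix of such pairwise distances is the usual input to a neighbor-joining tree builder; in barcoding work, distances within a species are then compared with distances between species.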
Using populations of human and microbial genomes for organism detection in metagenomes

DOE PAGES

Ames, Sasha K.; Gardner, Shea N.; Marti, Jose Manuel; ...

2015-04-29

Identifying causative disease agents in human patients from shotgun metagenomic sequencing (SMS) presents a powerful tool to apply when other targeted diagnostics fail. Numerous technical challenges remain, however, before SMS can move beyond the role of research tool. Accurately separating the known and unknown organism content remains difficult, particularly when SMS is applied as a last resort. The true amount of human DNA that remains in a sample after screening against the human reference genome and filtering nonbiological components left from library preparation has previously been underreported. In this study, we create the most comprehensive collection of microbial and reference-free human genetic variation available in a database optimized for efficient metagenomic search by extracting sequences from GenBank and the 1000 Genomes Project. The results reveal new human sequences found in individual Human Microbiome Project (HMP) samples. Individual samples contain up to 95% human sequence, and 4% of the individual HMP samples contain 10% or more human reads. In conclusion, left unidentified, human reads can complicate and slow down further analysis and lead to inaccurately labeled microbial taxa and ultimately lead to privacy concerns as more human genome data is collected.

Using populations of human and microbial genomes for organism detection in metagenomes.

PubMed

Ames, Sasha K; Gardner, Shea N; Marti, Jose Manuel; Slezak, Tom R; Gokhale, Maya B; Allen, Jonathan E

2015-07-01

Identifying causative disease agents in human patients from shotgun metagenomic sequencing (SMS) presents a powerful tool to apply when other targeted diagnostics fail. Numerous technical challenges remain, however, before SMS can move beyond the role of research tool. Accurately separating the known and unknown organism content remains difficult, particularly when SMS is applied as a last resort. The true amount of human DNA that remains in a sample after screening against the human reference genome and filtering nonbiological components left from library preparation has previously been underreported. In this study, we create the most comprehensive collection of microbial and reference-free human genetic variation available in a database optimized for efficient metagenomic search by extracting sequences from GenBank and the 1000 Genomes Project. The results reveal new human sequences found in individual Human Microbiome Project (HMP) samples. Individual samples contain up to 95% human sequence, and 4% of the individual HMP samples contain 10% or more human reads. Left unidentified, human reads can complicate and slow down further analysis and lead to inaccurately labeled microbial taxa and ultimately lead to privacy concerns as more human genome data is collected. © 2015 Ames et al.; Published by Cold Spring Harbor Laboratory Press.
Using populations of human and microbial genomes for organism detection in metagenomes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ames, Sasha K.; Gardner, Shea N.; Marti, Jose Manuel

Identifying causative disease agents in human patients from shotgun metagenomic sequencing (SMS) presents a powerful tool to apply when other targeted diagnostics fail. Numerous technical challenges remain, however, before SMS can move beyond the role of research tool. Accurately separating the known and unknown organism content remains difficult, particularly when SMS is applied as a last resort. The true amount of human DNA that remains in a sample after screening against the human reference genome and filtering nonbiological components left from library preparation has previously been underreported. In this study, we create the most comprehensive collection of microbial and reference-free human genetic variation available in a database optimized for efficient metagenomic search by extracting sequences from GenBank and the 1000 Genomes Project. The results reveal new human sequences found in individual Human Microbiome Project (HMP) samples. Individual samples contain up to 95% human sequence, and 4% of the individual HMP samples contain 10% or more human reads. In conclusion, left unidentified, human reads can complicate and slow down further analysis and lead to inaccurately labeled microbial taxa and ultimately lead to privacy concerns as more human genome data is collected.
The (in)famous GWAS P-value threshold revisited and updated for low-frequency variants.

PubMed

Fadista, João; Manning, Alisa K; Florez, Jose C; Groop, Leif

2016-08-01

Genome-wide association studies (GWAS) have long relied on proposed statistical significance thresholds to be able to differentiate true positives from false positives. Although the genome-wide significance P-value threshold of 5 × 10⁻⁸ has become a standard for common-variant GWAS, it has not been updated to cope with the lower allele frequency spectrum used in many recent array-based GWAS studies and sequencing studies. Using a whole-genome- and -exome-sequencing data set of 2875 individuals of European ancestry from the Genetics of Type 2 Diabetes (GoT2D) project and a whole-exome-sequencing data set of 13 000 individuals from five ancestries from the GoT2D and T2D-GENES (Type 2 Diabetes Genetic Exploration by Next-generation sequencing in multi-Ethnic Samples) projects, we describe guidelines for genome- and exome-wide association P-value thresholds needed to correct for multiple testing, explaining the impact of linkage disequilibrium thresholds for distinguishing independent variants, minor allele frequency and ancestry characteristics. We emphasize the advantage of studying recent genetic isolate populations when performing rare and low-frequency genetic association analyses, as the multiple testing burden is diminished due to higher genetic homogeneity.
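As a rough illustration of the multiple-testing logic behind this abstract, the sketch below applies a Bonferroni correction: the family-wise error rate is divided by the number of independent tests. The conventional 5 × 10⁻⁸ threshold corresponds to an error rate of 0.05 spread over about one million independent common variants. The function name and the larger test count are hypothetical; the thresholds actually proposed by the authors are not given in the abstract and are not reproduced here.

```python
def bonferroni_threshold(alpha: float, n_independent_tests: int) -> float:
    """Per-test P-value threshold that keeps the family-wise error rate at alpha."""
    if n_independent_tests <= 0:
        raise ValueError("n_independent_tests must be positive")
    return alpha / n_independent_tests


# Conventional common-variant setting: about one million independent tests.
print(f"{bonferroni_threshold(0.05, 1_000_000):.1e}")  # 5.0e-08

# Hypothetical count: including rarer variants raises the number of
# independent tests, so the per-test threshold becomes stricter.
print(f"{bonferroni_threshold(0.05, 5_000_000):.1e}")  # 1.0e-08
```

The same arithmetic shows why a more genetically homogeneous population, with fewer independent variants, carries a smaller multiple-testing burden.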
Virtual PCR

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gardner, S N; Clague, D S; Vandersall, J A

2006-02-23

The polymerase chain reaction (PCR) stands among the keystone technologies for analysis of biological sequence data. PCR is used to amplify DNA, to generate many copies from as little as a single template. This is essential, for example, in processing forensic DNA samples, pathogen detection in clinical or biothreat surveillance applications, and medical genotyping for diagnosis and treatment of disease. It is used in virtually every laboratory doing molecular, cellular, genetic, ecologic, forensic, or medical research. Despite its ubiquity, we lack the precise predictive capability that would enable detailed optimization of PCR reaction dynamics. In this LDRD, we proposed to develop Virtual PCR (VPCR) software, a computational method to model the kinetic, thermodynamic, and biological processes of PCR reactions. Given a successful completion, these tools will allow us to predict both the sequences and concentrations of all species that are amplified during PCR. The ability to answer the following questions will allow us both to optimize the PCR process and interpret the PCR results: What products are amplified when sequence mixtures are present, containing multiple, closely related targets and multiplexed primers, which may hybridize with sequence mismatches? What are the effects of time, temperature, and DNA concentrations on the concentrations of products? A better understanding of these issues will improve the design and interpretation of PCR reactions. The status of the VPCR project after 1.5 years of funding is consistent with the goals of the overall project which was scoped for 3 years of funding. At half way through the projected timeline of the project we have an early beta version of the VPCR code. We have begun investigating means to improve the robustness of the code, performed preliminary experiments to test the code and begun drafting manuscripts for publication.
Although an experimental protocol for testing the code was developed, the preliminary experiments were tainted by contaminated products received from the manufacturer. Much knowledge has been gained in the development of the code thus far, but without final debugging, increasing its robustness and verifying it against experimental results, the papers which we have drafted to share our findings still require the final data necessary for publication. The following sections summarize our final progress on VPCR as it stands after 1.5 years of effort on an ambitious project scoped for a 3-year period. We have more details of the methods than are provided here, but would like to have legal protection in place before releasing them. The result of this project, a suite of programs that predict PCR products as a function of reaction conditions and sequences, will be used to address outstanding questions in pathogen detection and forensics at LLNL. VPCR should enable scientists to optimize PCR protocols in terms of time, temperature, ion concentration, and primer sequences and concentrations, and to estimate products and error rates in advance of performing experiments. Our proposed capabilities are well ahead of all currently available technologies, which do not model non-equilibrium kinetics, polymerase extension, or predict multiple or undesired PCR products. We are currently seeking DHS funding to complete the project, at which time licensing opportunities will be explored, an updated patent application will be prepared, and a publication will be submitted. A provisional and a full patent application have already been filed (1).
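For orientation only, the textbook idealization of PCR amplification can be written in a few lines of Python: each cycle multiplies the template count by (1 + efficiency), so perfect efficiency doubles the copies every cycle. This is not the VPCR model, whose methods are withheld in the report and which aims to capture kinetics, thermodynamics and mismatched hybridization. The function name, parameters and values below are assumptions for illustration.

```python
def ideal_pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Copies after `cycles` rounds under the idealized model N0 * (1 + E)**n.

    efficiency is the per-cycle amplification efficiency E, between 0 and 1.
    Real reactions plateau as reagents run out; this sketch ignores that.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# A single template, 30 cycles, perfect doubling each cycle.
print(ideal_pcr_copies(1, 30))  # 1073741824.0

# A lower, hypothetical efficiency gives markedly fewer copies.
print(ideal_pcr_copies(1, 30, efficiency=0.9))
```

The gap between this simple exponential and observed yields is one reason a detailed model of reaction conditions, as proposed in the report, would be useful.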
The Genomics Education Partnership: Successful Integration of Research into Laboratory Classes at a Diverse Group of Undergraduate Institutions

PubMed Central

Shaffer, Christopher D.; Alvarez, Consuelo; Bailey, Cheryl; Barnard, Daron; Bhalla, Satish; Chandrasekaran, Chitra; Chandrasekaran, Vidya; Chung, Hui-Min; Dorer, Douglas R.; Du, Chunguang; Eckdahl, Todd T.; Poet, Jeff L.; Frohlich, Donald; Goodman, Anya L.; Gosser, Yuying; Hauser, Charles; Hoopes, Laura L.M.; Johnson, Diana; Jones, Christopher J.; Kaehler, Marian; Kokan, Nighat; Kopp, Olga R.; Kuleck, Gary A.; McNeil, Gerard; Moss, Robert; Myka, Jennifer L.; Nagengast, Alexis; Morris, Robert; Overvoorde, Paul J.; Shoop, Elizabeth; Parrish, Susan; Reed, Kelynne; Regisford, E. Gloria; Revie, Dennis; Rosenwald, Anne G.; Saville, Ken; Schroeder, Stephanie; Shaw, Mary; Skuse, Gary; Smith, Christopher; Smith, Mary; Spana, Eric P.; Spratt, Mary; Stamm, Joyce; Thompson, Jeff S.; Wawersik, Matthew; Wilson, Barbara A.; Youngblom, Jim; Leung, Wilson; Buhler, Jeremy; Mardis, Elaine R.; Lopatto, David

2010-01-01

Genomics is not only essential for students to understand biology but also provides unprecedented opportunities for undergraduate research. The goal of the Genomics Education Partnership (GEP), a collaboration between a growing number of colleges and universities around the country and the Department of Biology and Genome Center of Washington University in St. Louis, is to provide such research opportunities. Using a versatile curriculum that has been adapted to many different class settings, GEP undergraduates undertake projects to bring draft-quality genomic sequence up to high quality and/or participate in the annotation of these sequences. GEP undergraduates have improved more than 2 million bases of draft genomic sequence from several species of Drosophila and have produced hundreds of gene models using evidence-based manual annotation. Students appreciate their ability to make a contribution to ongoing research, and report increased independence and a more active learning approach after participation in GEP projects. They show knowledge gains on pre- and postcourse quizzes about genes and genomes and in bioinformatic analysis. Participating faculty also report professional gains, increased access to genomics-related technology, and an overall positive experience. We have found that using a genomics research project as the core of a laboratory course is rewarding for both faculty and students. PMID:20194808
The international Genome sample resource (IGSR): A worldwide collection of genome variation incorporating the 1000 Genomes Project data.

PubMed

Clarke, Laura; Fairley, Susan; Zheng-Bradley, Xiangqun; Streeter, Ian; Perry, Emily; Lowy, Ernesto; Tassé, Anne-Marie; Flicek, Paul

2017-01-04

The International Genome Sample Resource (IGSR; http://www.internationalgenome.org) expands in data type and population diversity the resources from the 1000 Genomes Project. IGSR represents the largest open collection of human variation data and provides easy access to these resources. IGSR was established in 2015 to maintain and extend the 1000 Genomes Project data, which has been widely used as a reference set of human variation and by researchers developing analysis methods. IGSR has mapped all of the 1000 Genomes sequence to the newest human reference (GRCh38), and will release updated variant calls to ensure maximal usefulness of the existing data. IGSR is collecting new structural variation data on the 1000 Genomes samples from long read sequencing and other technologies, and will collect relevant functional data into a single comprehensive resource. IGSR is extending coverage with new populations sequenced by collaborating groups. Here, we present the new data and analysis that IGSR has made available. We have also introduced a new data portal that increases discoverability of our data (previously only browseable through our FTP site) by focusing on particular samples, populations or data sets of interest. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Genome3D: a UK collaborative project to annotate genomic sequences with predicted 3D structures based on SCOP and CATH domains.

PubMed

Lewis, Tony E; Sillitoe, Ian; Andreeva, Antonina; Blundell, Tom L; Buchan, Daniel W A; Chothia, Cyrus; Cuff, Alison; Dana, Jose M; Filippis, Ioannis; Gough, Julian; Hunter, Sarah; Jones, David T; Kelley, Lawrence A; Kleywegt, Gerard J; Minneci, Federico; Mitchell, Alex; Murzin, Alexey G; Ochoa-Montaño, Bernardo; Rackham, Owen J L; Smith, James; Sternberg, Michael J E; Velankar, Sameer; Yeats, Corin; Orengo, Christine

2013-01-01

Genome3D, available at http://www.genome3d.eu, is a new collaborative project that integrates UK-based structural resources to provide a unique perspective on sequence-structure-function relationships. Leading structure prediction resources (DomSerf, FUGUE, Gene3D, pDomTHREADER, Phyre and SUPERFAMILY) provide annotations for UniProt sequences to indicate the locations of structural domains (structural annotations) and their 3D structures (structural models). Structural annotations and 3D model predictions are currently available for three model genomes (Homo sapiens, E. coli and baker's yeast), and the project will extend to other genomes in the near future. As these resources exploit different strategies for predicting structures, the main aim of Genome3D is to enable comparisons between all the resources so that biologists can see where predictions agree and are therefore more trusted. Furthermore, as these methods differ in whether they build their predictions using CATH or SCOP, Genome3D also contains the first official mapping between these two databases. This has identified pairs of similar superfamilies from the two resources at various degrees of consensus (532 bronze pairs, 527 silver pairs and 370 gold pairs).
MIPS: a database for genomes and protein sequences.

PubMed Central

Mewes, H W; Heumann, K; Kaps, A; Mayer, K; Pfeiffer, F; Stocker, S; Frishman, D

1999-01-01

The Munich Information Center for Protein Sequences (MIPS-GSF), Martinsried near Munich, Germany, develops and maintains genome oriented databases. It is commonplace that the amount of sequence data available increases rapidly, but not the capacity of qualified manual annotation at the sequence databases. Therefore, our strategy aims to cope with the data stream by the comprehensive application of analysis tools to sequences of complete genomes, the systematic classification of protein sequences and the active support of sequence analysis and functional genomics projects. This report describes the systematic and up-to-date analysis of genomes (PEDANT), a comprehensive database of the yeast genome (MYGD), a database reflecting the progress in sequencing the Arabidopsis thaliana genome (MATD), the database of assembled, annotated human EST clusters (MEST), and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). MIPS provides access through its WWW server (http://www.mips.biochem.mpg.de) to a spectrum of generic databases, including the above mentioned as well as a database of protein families (PROTFAM), the MITOP database, and the all-against-all FASTA database. PMID:9847138
PipeOnline 2.0: automated EST processing and functional data sorting.

PubMed

Ayoubi, Patricia; Jin, Xiaojing; Leite, Saul; Liu, Xianghui; Martajaja, Jeson; Abduraham, Abdurashid; Wan, Qiaolan; Yan, Wei; Misawa, Eduardo; Prade, Rolf A

2002-11-01

Expressed sequence tags (ESTs) are generated and deposited in the public domain, as redundant, unannotated, single-pass reactions, with virtually no biological content. PipeOnline automatically analyses and transforms large collections of raw DNA-sequence data from chromatograms or FASTA files by calling the quality of bases, screening and removing vector sequences, assembling and rewriting consensus sequences of redundant input files into a unigene EST data set and finally through translation, amino acid sequence similarity searches, annotation of public databases and functional data. PipeOnline generates an annotated database, retaining the processed unigene sequence, clone/file history, alignments with similar sequences, and proposed functional classification, if available. Functional annotation is automatic and based on a novel method that relies on homology of amino acid sequence multiplicity within GenBank records. Records are examined through a function ordered browser or keyword queries with automated export of results. PipeOnline offers customization for individual projects (MyPipeOnline), automated updating and alert service. PipeOnline is available at http://stress-genomics.org.
galaxie--CGI scripts for sequence identification through automated phylogenetic analysis.

PubMed

Nilsson, R Henrik; Larsson, Karl-Henrik; Ursing, Björn M

2004-06-12

The prevalent use of similarity searches like BLAST to identify sequences and species implicitly assumes the reference database to be of extensive sequence sampling. This is often not the case, restraining the correctness of the outcome as a basis for sequence identification. Phylogenetic inference outperforms similarity searches in retrieving correct phylogenies and consequently sequence identities, and a project was initiated to design a freely available script package for sequence identification through automated Web-based phylogenetic analysis. Three CGI scripts were designed to facilitate qualified sequence identification from a Web interface. Query sequences are aligned to pre-made alignments or to alignments made by ClustalW with entries retrieved from a BLAST search. The subsequent phylogenetic analysis is based on the PHYLIP package for inferring neighbor-joining and parsimony trees. The scripts are highly configurable. A service installation and a version for local use are found at http://andromeda.botany.gu.se/galaxiewelcome.html and http://galaxie.cgb.ki.se
Methicillin-resistant Staphylococcus argenteus misidentified as methicillin-resistant Staphylococcus aureus emerging in western Sweden.

PubMed

Tång Hallbäck, Erika; Karami, Nahid; Adlerberth, Ingegerd; Cardew, Sofia; Ohlén, Maria; Engström Jakobsson, Hedvig; Svensson Stadler, Liselott

2018-05-17

Two strains included in a whole-genome sequencing project for methicillin-resistant Staphylococcus aureus (MRSA) were identified as non-Staphylococcus aureus when the sequences were analysed using the bioinformatics software ALEX (www.1928diagnostics.com, Gothenburg, Sweden). Sequencing of the sodA gene of these strains identified them as Staphylococcus argenteus. The collection of MRSA in western Sweden was checked for additional strains of this species. A total of 18 strains of S. argenteus isolated between 2011 and December 2017 were identified.
Remote consulting based on ultrasonic digital images and dynamic ultrasonic sequences

NASA Astrophysics Data System (ADS)

Margan, Anamarija; Rustemović, Nadan

2006-03-01

Telematic ultrasonic diagnostics is a relatively new tool in providing health care to patients in remote, isolated communities. Our project facility, "The Virtual Polyclinic - A Specialists' Consulting Network for the Islands", is located on the island of Cres in the Adriatic Sea in Croatia and has been extending telemedical services to the archipelago population since 2000. Telemedicine applications include consulting services by specialists at the University Clinical Hospital Center Rebro in Zagreb and at "Magdalena", a leading cardiology clinic in Croatia. After several years of experience with static high resolution ultrasonic digital images for referral consulting diagnostics purposes, we now also use dynamic ultrasonic sequences in a project with the Department of Emergency Gastroenterology at Rebro in Zagreb. The aim of the ongoing project is to compare the advantages and shortcomings in transmitting static ultrasonic digital images and live sequences of ultrasonic examination in telematic diagnostics. Ultrasonic examination is a dynamic process in which the diagnostic accuracy is highly dependent on the dynamic moment of an ultrasound probe and signal. Our first results indicate that in diffuse parenchymal organ pathology the progression and follow-up of a disease are better presented to a remote consulting specialist by dynamic ultrasound sequences. However, the changes that involve only one part of a parenchymal organ can be suitably presented by static ultrasonic digital images alone. Furthermore, we need less time for digital imaging and such tele-consultations overall are more economical. Our previous telemedicine research and practice proved that we can greatly improve the level of medical care in remote healthcare facilities and cut healthcare costs considerably.
The experience in the ongoing project points to the conclusion that we can further optimize the benefits of remote diagnostics by the right choice of telematic application, thus reaching a correct diagnosis and starting an applicable therapy even faster. Nevertheless, a successful implementation of such diagnostic methods may require further improvements in telemedical systems.
PRADA: pipeline for RNA sequencing data analysis.

PubMed

Torres-García, Wandaliz; Zheng, Siyuan; Sivachenko, Andrey; Vegesna, Rahulsimham; Wang, Qianghu; Yao, Rong; Berger, Michael F; Weinstein, John N; Getz, Gad; Verhaak, Roel G W

2014-08-01

Technological advances in high-throughput sequencing necessitate improved computational tools for processing and analyzing large-scale datasets in a systematic automated manner. For that purpose, we have developed PRADA (Pipeline for RNA-Sequencing Data Analysis), a flexible, modular and highly scalable software platform that provides many different types of information available by multifaceted analysis starting from raw paired-end RNA-seq data: gene expression levels, quality metrics, detection of unsupervised and supervised fusion transcripts, detection of intragenic fusion variants, homology scores and fusion frame classification. PRADA uses a dual-mapping strategy that increases sensitivity and refines the analytical endpoints. PRADA has been used extensively and successfully in the glioblastoma and renal clear cell projects of The Cancer Genome Atlas program. Availability: http://sourceforge.net/projects/prada/. Contact: gadgetz@broadinstitute.org or rverhaak@mdanderson.org. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
11,670 whole-genome sequences representative of the Han Chinese population from the CONVERGE project.

PubMed

Cai, Na; Bigdeli, Tim B; Kretzschmar, Warren W; Li, Yihan; Liang, Jieqin; Hu, Jingchu; Peterson, Roseann E; Bacanu, Silviu; Webb, Bradley Todd; Riley, Brien; Li, Qibin; Marchini, Jonathan; Mott, Richard; Kendler, Kenneth S; Flint, Jonathan

2017-02-14

The China, Oxford and Virginia Commonwealth University Experimental Research on Genetic Epidemiology (CONVERGE) project on Major Depressive Disorder (MDD) sequenced 11,670 female Han Chinese at low-coverage (1.7X), providing the first large-scale whole genome sequencing resource representative of the largest ethnic group in the world. Samples are collected from 58 hospitals from 23 provinces around China. We are able to call 22 million high quality single nucleotide polymorphisms (SNP) from the nuclear genome, representing the largest SNP call set from an East Asian population to date. We use these variants for imputation of genotypes across all samples, and this has allowed us to perform a successful genome wide association study (GWAS) on MDD. The utility of these data can be extended to studies of genetic ancestry in the Han Chinese and evolutionary genetics when integrated with data from other populations. Molecular phenotypes, such as copy number variations and structural variations can be detected, quantified and analysed in similar ways.
Rare coding variation in paraoxonase-1 is associated with ischemic stroke in the NHLBI Exome Sequencing Project

PubMed Central

Kim, Daniel Seung; Crosslin, David R.; Auer, Paul L.; Suzuki, Stephanie M.; Marsillach, Judit; Burt, Amber A.; Gordon, Adam S.; Meschia, James F.; Nalls, Mike A.; Worrall, Bradford B.; Longstreth, W. T.; Gottesman, Rebecca F.; Furlong, Clement E.; Peters, Ulrike; Rich, Stephen S.; Nickerson, Deborah A.; Jarvik, Gail P.

2014-01-01

HDL-associated paraoxonase-1 (PON1) is an enzyme whose activity is associated with cerebrovascular disease. Common PON1 genetic variants have not been consistently associated with cerebrovascular disease. Rare coding variation that likely alters PON1 enzyme function may be more strongly associated with stroke. The National Heart, Lung, and Blood Institute Exome Sequencing Project sequenced the coding regions (exomes) of the genome for heart, lung, and blood-related phenotypes (including ischemic stroke). In this sample of 4,204 unrelated participants, 496 had verified, noncardioembolic ischemic stroke. After filtering, 28 nonsynonymous PON1 variants were identified. Analysis with the sequence kernel association test, adjusted for covariates, identified significant associations between PON1 variants and ischemic stroke (P = 3.01 × 10^-3). Stratified analyses demonstrated a stronger association of PON1 variants with ischemic stroke in African ancestry (AA) participants (P = 5.03 × 10^-3). Ethnic differences in the association of PON1 variants with stroke could be due to the effects of PON1 Val109Ile (overall P = 7.88 × 10^-3; AA P = 6.52 × 10^-4), found at higher frequency in AA participants (1.16% vs. 0.02%) and whose protein is less stable than the common allele. In summary, rare genetic variation in PON1 was associated with ischemic stroke, with stronger associations identified in participants of African ancestry. Increased focus on PON1 enzyme function and its role in cerebrovascular disease is warranted. PMID:24711634
Workflow and web application for annotating NCBI BioProject transcriptome data.

PubMed

Vera Alvarez, Roberto; Medeiros Vidal, Newton; Garzón-Martínez, Gina A; Barrero, Luz S; Landsman, David; Mariño-Ramírez, Leonardo

2017-01-01

The volume of transcriptome data is growing exponentially due to rapid improvement of experimental technologies. In response, large central resources such as those of the National Center for Biotechnology Information (NCBI) are continually adapting their computational infrastructure to accommodate this large influx of data. New and specialized databases, such as Transcriptome Shotgun Assembly Sequence Database (TSA) and Sequence Read Archive (SRA), have been created to aid the development and expansion of centralized repositories. Although the central resource databases are under continual development, they do not include automatic pipelines to increase annotation of newly deposited data. Therefore, third-party applications are required to achieve that aim. Here, we present an automatic workflow and web application for the annotation of transcriptome data. The workflow creates secondary data such as sequencing reads and BLAST alignments, which are available through the web application. They are based on freely available bioinformatics tools and scripts developed in-house. The interactive web application provides a search engine and several browser utilities. Graphical views of transcript alignments are available through SeqViewer, an embedded tool developed by NCBI for viewing biological sequence data. The web application is tightly integrated with other NCBI web applications and tools to extend the functionality of data processing and interconnectivity. We present a case study for the species Physalis peruviana with data generated from BioProject ID 67621. URL: http://www.ncbi.nlm.nih.gov/projects/physalis/. Published by Oxford University Press 2017. This work is written by US Government employees and is in the public domain in the US.
CMD: a Cotton Microsatellite Database resource for Gossypium genomics

PubMed Central

Blenda, Anna; Scheffler, Jodi; Scheffler, Brian; Palmer, Michael; Lacape, Jean-Marc; Yu, John Z; Jesudurai, Christopher; Jung, Sook; Muthukumar, Sriram; Yellambalase, Preetham; Ficklin, Stephen; Staton, Margaret; Eshelman, Robert; Ulloa, Mauricio; Saha, Sukumar; Burr, Ben; Liu, Shaolin; Zhang, Tianzhen; Fang, Deqiu; Pepper, Alan; Kumpatla, Siva; Jacobs, John; Tomkins, Jeff; Cantrell, Roy; Main, Dorrie

2006-01-01

Background: The Cotton Microsatellite Database (CMD) is a curated and integrated web-based relational database providing centralized access to publicly available cotton microsatellites, an invaluable resource for basic and applied research in cotton breeding. Description: At present CMD contains publication, sequence, primer, mapping and homology data for nine major cotton microsatellite projects, collectively representing 5,484 microsatellites. In addition, CMD displays data for three of the microsatellite projects that have been screened against a panel of core germplasm. The standardized panel consists of 12 diverse genotypes including genetic standards, mapping parents, BAC donors, subgenome representatives, unique breeding lines, exotic introgression sources, and contemporary Upland cottons with significant acreage. A suite of online microsatellite data mining tools is accessible at CMD. These include an SSR server which identifies microsatellites, primers, open reading frames, and GC-content of uploaded sequences; BLAST and FASTA servers providing sequence similarity searches against the existing cotton SSR sequences and primers; a CAP3 server to assemble EST sequences into longer transcripts prior to mining for SSRs; and CMap, a viewer for comparing cotton SSR maps. Conclusion: The collection of publicly available cotton SSR markers in a centralized, readily accessible and curated web-enabled database provides a more efficient utilization of microsatellite resources and will help accelerate basic and applied research in molecular breeding and genetic mapping in Gossypium spp. PMID:16737546
Structural analysis of a set of proteins resulting from a bacterial genomics project.

PubMed

Badger, J; Sauder, J M; Adams, J M; Antonysamy, S; Bain, K; Bergseid, M G; Buchanan, S G; Buchanan, M D; Batiyenko, Y; Christopher, J A; Emtage, S; Eroshkina, A; Feil, I; Furlong, E B; Gajiwala, K S; Gao, X; He, D; Hendle, J; Huber, A; Hoda, K; Kearins, P; Kissinger, C; Laubert, B; Lewis, H A; Lin, J; Loomis, K; Lorimer, D; Louie, G; Maletic, M; Marsh, C D; Miller, I; Molinari, J; Muller-Dieckmann, H J; Newman, J M; Noland, B W; Pagarigan, B; Park, F; Peat, T S; Post, K W; Radojicic, S; Ramos, A; Romero, R; Rutter, M E; Sanderson, W E; Schwinn, K D; Tresser, J; Winhoven, J; Wright, T A; Wu, L; Xu, J; Harris, T J R

2005-09-01

The targets of the Structural GenomiX (SGX) bacterial genomics project were proteins conserved in multiple prokaryotic organisms with no obvious sequence homolog in the Protein Data Bank of known structures. The outcome of this work was 80 structures, covering 60 unique sequences and 49 different genes. Experimental phase determination from proteins incorporating Se-Met was carried out for 45 structures with most of the remainder solved by molecular replacement using members of the experimentally phased set as search models. An automated tool was developed to deposit these structures in the Protein Data Bank, along with the associated X-ray diffraction data (including refined experimental phases) and experimentally confirmed sequences. BLAST comparisons of the SGX structures with structures that had appeared in the Protein Data Bank over the intervening 3.5 years since the SGX target list had been compiled identified homologs for 49 of the 60 unique sequences represented by the SGX structures. This result indicates that, for bacterial structures that are relatively easy to express, purify, and crystallize, the structural coverage of gene space is proceeding rapidly. More distant sequence-structure relationships between the SGX and PDB structures were investigated using PDB-BLAST and Combinatorial Extension (CE). Only one structure, SufD, has a truly unique topology compared to all folds in the PDB. Copyright 2005 Wiley-Liss, Inc.
Construction and analysis of a high-density genetic linkage map in cabbage (Brassica oleracea L. var. capitata)

PubMed Central

2012-01-01

Background: Brassica oleracea encompasses a family of vegetables, including cabbage, that are among the most widely cultivated crops. In 2009, the B. oleracea Genome Sequencing Project was launched using next generation sequencing technology. None of the available maps were detailed enough to anchor the sequence scaffolds for the Genome Sequencing Project. This report describes the development of a large number of SSR and SNP markers from the whole genome shotgun sequence data of B. oleracea, and the construction of a high-density genetic linkage map using a double haploid mapping population. Results: The B. oleracea high-density genetic linkage map that was constructed includes 1,227 markers in nine linkage groups spanning a total of 1197.9 cM with an average of 0.98 cM between adjacent loci. There were 602 SSR markers and 625 SNP markers on the map. The chromosome with the highest number of markers (186) was C03, and the chromosome with the smallest number of markers (99) was C09. Conclusions: This first high-density map allowed the assembled scaffolds to be anchored to pseudochromosomes. The map also provides useful information for positional cloning, molecular breeding, and integration of information of genes and traits in B. oleracea. All the markers on the map will be transferable and could be used for the construction of other genetic maps. PMID:23033896
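The summary figures in this abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes every marker sits at a distinct locus, so that a linkage group with n markers has n - 1 intervals; the abstract does not say how the average was computed, and the variable names are illustrative.

```python
# Figures taken from the abstract.
n_ssr = 602
n_snp = 625
n_linkage_groups = 9
total_length_cm = 1197.9

# Marker total should match the reported 1,227.
n_markers = n_ssr + n_snp

# Assumption: each marker is a distinct locus, so each linkage group
# contributes (markers - 1) intervals between adjacent loci.
n_intervals = n_markers - n_linkage_groups

mean_spacing_cm = total_length_cm / n_intervals

print(f"markers: {n_markers}, intervals: {n_intervals}")
print(f"mean spacing: {mean_spacing_cm:.2f} cM")
```

Dividing by the number of markers instead of the number of intervals gives a value that also rounds to 0.98 cM, so the reported average is consistent under either convention.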
Synthesis of Joint Volumes, Visualization of Paths, and Revision of Viewing Sequences in a Multi-dimensional Seismic Data Viewer

NASA Astrophysics Data System (ADS)

Chen, D. M.; Clapp, R. G.; Biondi, B.

2006-12-01

Ricksep is a freely-available interactive viewer for multi-dimensional data sets. The viewer is very useful for simultaneous display of multiple data sets from different viewing angles, animation of movement along a path through the data space, and selection of local regions for data processing and information extraction. Several new viewing features are added to enhance the program's functionality in the following three aspects. First, two new data synthesis algorithms are created to adaptively combine information from a data set with mostly high-frequency content, such as seismic data, and another data set with mainly low-frequency content, such as velocity data. Using the algorithms, these two data sets can be synthesized into a single data set which resembles the high-frequency data set on a local scale and at the same time resembles the low-frequency data set on a larger scale. As a result, the originally separated high and low-frequency details can now be more accurately and conveniently studied together. Second, a projection algorithm is developed to display paths through the data space. Paths are geophysically important because they represent wells into the ground. Two difficulties often associated with tracking paths are that they normally cannot be seen clearly inside multi-dimensional spaces and depth information is lost along the direction of projection when ordinary projection techniques are used. The new algorithm projects samples along the path in three orthogonal directions and effectively restores important depth information by using variable projection parameters which are functions of the distance away from the path. Multiple paths in the data space can be generated using different character symbols as positional markers, and users can easily create, modify, and view paths in real time. Third, a viewing history list is implemented which enables Ricksep's users to create, edit and save a recipe for the sequence of viewing states. Then, the recipe can be loaded into an active Ricksep session, after which the user can navigate to any state in the sequence and modify the sequence from that state. Typical uses of this feature are undoing and redoing viewing commands and animating a sequence of viewing states. The theoretical discussion is presented, and several examples using real seismic data are provided to show how these new Ricksep features provide more convenient, accurate ways to manipulate multi-dimensional data sets.
Large Scale Analyses and Visualization of Adaptive Amino Acid Changes Projects.

PubMed

Vázquez, Noé; Vieira, Cristina P; Amorim, Bárbara S R; Torres, André; López-Fernández, Hugo; Fdez-Riverola, Florentino; Sousa, José L R; Reboiro-Jato, Miguel; Vieira, Jorge

2018-03-01

When changes at few amino acid sites are the target of selection, adaptive amino acid changes in protein sequences can be identified using maximum-likelihood methods based on models of codon substitution (such as codeml). Although such methods have been employed numerous times using a variety of different organisms, the time needed to collect the data and prepare the input files means that tens or hundreds of coding regions are usually analyzed. Nevertheless, the recent availability of flexible and easy to use computer applications that collect relevant data (such as BDBM) and infer positively selected amino acid sites (such as ADOPS), means that the entire process is easier and quicker than before. However, the lack of a batch option in ADOPS, here reported, still precludes the analysis of hundreds or thousands of sequence files. Given the interest and possibility of running such large-scale projects, we have also developed a database where ADOPS projects can be stored. Therefore, this study also presents the B+ database, which is both a data repository and a convenient interface that looks at the information contained in ADOPS projects without the need to download and unzip the corresponding ADOPS project file. The ADOPS projects available at B+ can also be downloaded, unzipped, and opened using the ADOPS graphical interface. The availability of such a database ensures results repeatability, promotes data reuse with significant savings on the time needed for preparing datasets, and effortlessly allows further exploration of the data contained in ADOPS projects.
SeqHound: biological sequence and structure database as a platform for bioinformatics research

PubMed Central

2002-01-01

Background: SeqHound has been developed as an integrated biological sequence, taxonomy, annotation and 3-D structure database system. It provides a high-performance server platform for bioinformatics research in a locally-hosted environment. Results: SeqHound is based on the National Center for Biotechnology Information data model and programming tools. It offers daily updated contents of all Entrez sequence databases in addition to 3-D structural data and information about sequence redundancies, sequence neighbours, taxonomy, complete genomes, functional annotation including Gene Ontology terms and literature links to PubMed. SeqHound is accessible via a web server through a Perl, C or C++ remote API or an optimized local API. It provides functionality necessary to retrieve specialized subsets of sequences, structures and structural domains. Sequences may be retrieved in FASTA, GenBank, ASN.1 and XML formats. Structures are available in ASN.1, XML and PDB formats. Emphasis has been placed on complete genomes, taxonomy, domain and functional annotation as well as 3-D structural functionality in the API, while fielded text indexing functionality remains under development. SeqHound also offers a streamlined WWW interface for simple web-user queries. Conclusions: The system has proven useful in several published bioinformatics projects such as the BIND database and offers a cost-effective infrastructure for research. SeqHound will continue to develop and be provided as a service of the Blueprint Initiative at the Samuel Lunenfeld Research Institute. The source code and examples are available under the terms of the GNU public license at the Sourceforge site http://sourceforge.net/projects/slritools/ in the SLRI Toolkit. PMID:12401134
Characterization of Black and Dichothrix Cyanobacteria Based on the 16S Ribosomal RNA Gene Sequence

NASA Technical Reports Server (NTRS)

Ortega, Maya

2010-01-01

My project focuses on characterizing different cyanobacteria in thrombolitic mats found on the island of Highborn Cay, Bahamas. Thrombolites are interesting ecosystems because of the ability of bacteria in these mats to remove carbon dioxide from the atmosphere and mineralize it as calcium carbonate. In the future they may be used as models to develop carbon sequestration technologies, which could be used as part of regenerative life systems in space. These thrombolitic communities are also significant because of their similarities to early communities of life on Earth. I targeted two cyanobacteria in my research, Dichothrix spp. and an as yet unidentified black cyanobacterium, since they are believed to be important to carbon sequestration in these thrombolitic mats. The goal of my summer research project was to molecularly identify these two cyanobacteria. DNA was isolated from each organism through mat dissections and DNA extractions. I ran Polymerase Chain Reactions (PCR) to amplify the 16S ribosomal RNA (rRNA) gene in each cyanobacterium. This specific gene is found in almost all bacteria and is highly conserved, meaning any changes in the sequence are most likely due to evolution. As a result, the sequence of the 16S rRNA gene can be used to identify different bacterial species. Since the exact sequence of the Dichothrix gene was unknown, I designed different primers that flanked the gene based on the known sequences from other taxonomically similar cyanobacteria. Once the 16S rRNA gene was amplified, I cloned the gene into specialized Escherichia coli cells and sent the gene products for sequencing. Once the sequence is obtained, it will be added to a genetic database for future reference and for classification of other Dichothrix sp.
Plasmid Flux in Escherichia coli ST131 Sublineages, Analyzed by Plasmid Constellation Network (PLACNET), a New Method for Plasmid Reconstruction from Whole Genome Sequences

PubMed Central

Garcillán-Barcia, M. Pilar; Mora, Azucena; Blanco, Jorge; Coque, Teresa M.; de la Cruz, Fernando

2014-01-01

Bacterial whole genome sequence (WGS) methods are rapidly overtaking classical sequence analysis. Many bacterial sequencing projects focus on mobilome changes, since macroevolutionary events, such as the acquisition or loss of mobile genetic elements, mainly plasmids, play essential roles in adaptive evolution. Existing WGS analysis protocols do not assort contigs between plasmids and the main chromosome, thus hampering full analysis of plasmid sequences. We developed a method (called plasmid constellation networks or PLACNET) that identifies, visualizes and analyzes plasmids in WGS projects by creating a network of contig interactions, thus allowing comprehensive plasmid analysis within WGS datasets. The workflow of the method is based on three types of data: assembly information (including scaffold links and coverage), comparison to reference sequences and plasmid-diagnostic sequence features. The resulting network is pruned by expert analysis, to eliminate confounding data, and implemented in a Cytoscape-based graphic representation. To demonstrate PLACNET sensitivity and efficacy, the plasmidome of the Escherichia coli lineage ST131 was analyzed. ST131 is a globally spread clonal group of extraintestinal pathogenic E. coli (ExPEC), comprising different sublineages with ability to acquire and spread antibiotic resistance and virulence genes via plasmids. Results show that plasmids flux in the evolution of this lineage, which is wide open for plasmid exchange. MOBF12/IncF plasmids were pervasive, adding just by themselves more than 350 protein families to the ST131 pangenome. Nearly 50% of the most frequent γ–proteobacterial plasmid groups were found to be present in our limited sample of ten analyzed ST131 genomes, which represent the main ST131 sublineages. PMID:25522143
Plasmid flux in Escherichia coli ST131 sublineages, analyzed by plasmid constellation network (PLACNET), a new method for plasmid reconstruction from whole genome sequences.

PubMed

Lanza, Val F; de Toro, María; Garcillán-Barcia, M Pilar; Mora, Azucena; Blanco, Jorge; Coque, Teresa M; de la Cruz, Fernando

2014-12-01

Bacterial whole genome sequence (WGS) methods are rapidly overtaking classical sequence analysis. Many bacterial sequencing projects focus on mobilome changes, since macroevolutionary events, such as the acquisition or loss of mobile genetic elements, mainly plasmids, play essential roles in adaptive evolution. Existing WGS analysis protocols do not assort contigs between plasmids and the main chromosome, thus hampering full analysis of plasmid sequences. We developed a method (called plasmid constellation networks or PLACNET) that identifies, visualizes and analyzes plasmids in WGS projects by creating a network of contig interactions, thus allowing comprehensive plasmid analysis within WGS datasets. The workflow of the method is based on three types of data: assembly information (including scaffold links and coverage), comparison to reference sequences and plasmid-diagnostic sequence features. The resulting network is pruned by expert analysis, to eliminate confounding data, and implemented in a Cytoscape-based graphic representation. To demonstrate PLACNET sensitivity and efficacy, the plasmidome of the Escherichia coli lineage ST131 was analyzed. ST131 is a globally spread clonal group of extraintestinal pathogenic E. coli (ExPEC), comprising different sublineages with ability to acquire and spread antibiotic resistance and virulence genes via plasmids. Results show that plasmids flux in the evolution of this lineage, which is wide open for plasmid exchange. MOBF12/IncF plasmids were pervasive, adding just by themselves more than 350 protein families to the ST131 pangenome. Nearly 50% of the most frequent γ-proteobacterial plasmid groups were found to be present in our limited sample of ten analyzed ST131 genomes, which represent the main ST131 sublineages.
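The core idea described above, linking contigs to one another and to reference sequences and then reading plasmids off the resulting network, can be illustrated with a toy graph. This is not the PLACNET implementation (which also uses coverage, plasmid-diagnostic features, expert pruning and Cytoscape); it is a minimal sketch in which all contig names, reference names and edges are invented, and connected components stand in for the pruned network.

```python
from collections import defaultdict

# Hypothetical edges: scaffold links between contigs, and similarity hits
# from contigs to reference sequences. All names are made up.
edges = [
    ("contig_1", "contig_2"),          # scaffold link
    ("contig_2", "ref_chromosome"),    # hit to a reference chromosome
    ("contig_3", "contig_4"),          # scaffold link
    ("contig_4", "ref_plasmid_IncF"),  # hit to a reference plasmid
    ("contig_5", "ref_plasmid_IncF"),  # hit to the same reference plasmid
]


def connected_components(edge_list):
    """Group nodes of an undirected graph into connected components."""
    graph = defaultdict(set)
    for a, b in edge_list:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    components = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first traversal from an unvisited node.
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        components.append(component)
    return components


components = connected_components(edges)
for comp in components:
    refs = sorted(n for n in comp if n.startswith("ref_"))
    contigs = sorted(n for n in comp if n.startswith("contig_"))
    print(refs, "<-", contigs)
```

In this toy input, contigs 3 to 5 fall into one component with the plasmid reference and are thereby separated from the chromosomal contigs, which is the kind of assortment the abstract says standard WGS protocols do not perform.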
Complex geomorphologic assemblage of terrains in association with the banded terrain in Hellas basin, Mars

NASA Astrophysics Data System (ADS)

Diot, X.; El-Maarry, M. R.; Schlunegger, F.; Norton, K. P.; Thomas, N.; Grindrod, P. M.; Chojnacki, M.

2016-02-01

Hellas basin acts as a major sink for the southern highlands of Mars and is likely to have recorded several episodes of sedimentation and erosion. The north-western part of the basin displays a potentially unique Amazonian landscape domain in the deepest part of Hellas, called "banded terrain", which is a deposit characterized by an alternation of narrow band shapes and inter-bands displaying a sinuous and relatively smooth surface texture suggesting a viscous flow origin. Here we use high-resolution (HiRISE and CTX) images to assess the geomorphological interaction of the banded terrain with the surrounding geomorphologic domains in the NW interior of Hellas to gain a better understanding of the geological evolution of the region as a whole. Our analysis reveals that the banded terrain is associated with six geomorphologic domains: a central plateau named Alpheus Colles, plain deposits (P1 and P2), reticulate (RT1 and RT2) and honeycomb terrains. Based on the analysis of the geomorphology of these domains and their cross-cutting relationships, we show that no widespread deposition post-dates the formation of the banded terrain, which implies that this domain is the youngest and latest deposit of the interior of Hellas. Therefore, the level of geologic activity in the NW Hellas during the Amazonian appears to have been relatively low and restricted to modification of the landscape through mechanical weathering, aeolian and periglacial processes. Thermophysical data and cross-cutting relationships support hypotheses of modification of the honeycomb terrain via vertical rise of diapirs such as ice diapirism, and the formation of the plain deposits through deposition and remobilization of an ice-rich mantle deposit. Finally, the observed gradual transition between honeycomb and banded terrain suggests that the banded terrain may have covered a larger area of the NW interior of Hellas in the past than previously thought. This has implications for the understanding of the evolution of the deepest part of Hellas.
Enabling a Community to Dissect an Organism: Overview of the Neurospora Functional Genomics Project

PubMed Central

Dunlap, Jay C.; Borkovich, Katherine A.; Henn, Matthew R.; Turner, Gloria E.; Sachs, Matthew S.; Glass, N. Louise; McCluskey, Kevin; Plamann, Michael; Galagan, James E.; Birren, Bruce W.; Weiss, Richard L.; Townsend, Jeffrey P.; Loros, Jennifer J.; Nelson, Mary Anne; Lambreghts, Randy; Colot, Hildur V.; Park, Gyungsoon; Collopy, Patrick; Ringelberg, Carol; Crew, Christopher; Litvinkova, Liubov; DeCaprio, Dave; Hood, Heather M.; Curilla, Susan; Shi, Mi; Crawford, Matthew; Koerhsen, Michael; Montgomery, Phil; Larson, Lisa; Pearson, Matthew; Kasuga, Takao; Tian, Chaoguang; Baştürkmen, Meray; Altamirano, Lorena; Xu, Junhuan

2013-01-01

A consortium of investigators is engaged in a functional genomics project centered on the filamentous fungus Neurospora, with an eye to opening up the functional genomic analysis of all the filamentous fungi. The overall goal of the four interdependent projects in this effort is to accomplish functional genomics, annotation, and expression analyses of Neurospora crassa, a filamentous fungus that is an established model for the assemblage of over 250,000 species of nonyeast fungi. Building from the completely sequenced 43-Mb Neurospora genome, Project 1 is pursuing the systematic disruption of genes through targeted gene replacements, phenotypic analysis of mutant strains, and their distribution to the scientific community at large. Project 2, through a primary focus in Annotation and Bioinformatics, has developed a platform for electronically capturing community feedback and data about the existing annotation, while building and maintaining a database to capture and display information about phenotypes. Oligonucleotide-based microarrays created in Project 3 are being used to collect baseline expression data for the nearly 11,000 distinguishable transcripts in Neurospora under various conditions of growth and development, and eventually to begin to analyze the global effects of loss of novel genes in strains created by Project 1. cDNA libraries generated in Project 4 document the overall complexity of expressed sequences in Neurospora, including alternative splicing, alternative promoters, and antisense transcripts. In addition, these studies have driven the assembly of an SNP map presently populated by nearly 300 markers that will greatly accelerate the positional cloning of genes. PMID:17352902
Genomics - the new rock and roll?

PubMed

Dunham, I

2000-10-01

The end of the beginning of the Human Genome Project came on 26 June, when the working draft, or first assembly, was announced. Here, Ian Dunham, who led the group at the Sanger Centre that produced the first complete sequence of a human chromosome, reflects on how it felt to be with the genome project from the beginning.
U.S.-MEXICO BORDER PROGRAM ARIZONA BORDER STUDY--STANDARD OPERATING PROCEDURE FOR LABORATORY ASSISTANT TRAINING PLAN--GENERAL (UA-T-6.0)

EPA Science Inventory

The purpose of this SOP is to describe the training sequence of incoming student laboratory assistants. The procedure is designed to provide them with an overview of the project in terms of project goals, structure, and laboratory needs. This overview familiarizes the student l...
Description of historical crop calendar data bases developed to support foreign commodity production forecasting project experiments

NASA Technical Reports Server (NTRS)

West, W. L., III (Principal Investigator)

1981-01-01

The content, format, and storage of data bases developed for the Foreign Commodity Production Forecasting project and used to produce normal crop calendars are described. In addition, the data bases may be used for agricultural meteorology, modeling of stage sequences and planting dates, and as indicators of possible drought and famine.
Large-scale analysis of full-length cDNAs from the tomato (Solanum lycopersicum) cultivar Micro-Tom, a reference system for the Solanaceae genomics.

PubMed

Aoki, Koh; Yano, Kentaro; Suzuki, Ayako; Kawamura, Shingo; Sakurai, Nozomu; Suda, Kunihiro; Kurabayashi, Atsushi; Suzuki, Tatsuya; Tsugane, Taneaki; Watanabe, Manabu; Ooga, Kazuhide; Torii, Maiko; Narita, Takanori; Shin-I, Tadasu; Kohara, Yuji; Yamamoto, Naoki; Takahashi, Hideki; Watanabe, Yuichiro; Egusa, Mayumi; Kodama, Motoichiro; Ichinose, Yuki; Kikuchi, Mari; Fukushima, Sumire; Okabe, Akiko; Arie, Tsutomu; Sato, Yuko; Yazawa, Katsumi; Satoh, Shinobu; Omura, Toshikazu; Ezura, Hiroshi; Shibata, Daisuke

2010-03-30

The Solanaceae family includes several economically important vegetable crops. The tomato (Solanum lycopersicum) is regarded as a model plant of the Solanaceae family. Recently, a number of tomato resources have been developed in parallel with the ongoing tomato genome sequencing project. In particular, a miniature cultivar, Micro-Tom, is regarded as a model system in tomato genomics, and a number of genomics resources in the Micro-Tom background, such as ESTs and mutagenized lines, have been established by an international alliance. To accelerate the progress in tomato genomics, we developed a collection of 13,227 fully sequenced Micro-Tom full-length cDNAs. By checking redundant sequences, coding sequences, and chimeric sequences, a set of 11,502 non-redundant full-length cDNAs (nrFLcDNAs) was generated. Analysis of untranslated regions demonstrated that tomato has longer 5'- and 3'-untranslated regions than most other plants except rice. Classification of functions of proteins predicted from the coding sequences demonstrated that nrFLcDNAs covered a broad range of functions. A comparison of nrFLcDNAs with genes of sixteen plants facilitated the identification of tomato genes that are not found in other plants, most of which did not have known protein domains. Mapping of the nrFLcDNAs onto currently available tomato genome sequences facilitated prediction of exon-intron structure. Introns of tomato genes were longer than those of Arabidopsis and rice. According to a comparison of exon sequences between the nrFLcDNAs and the tomato genome sequences, the frequency of nucleotide mismatch in exons between Micro-Tom and the genome-sequencing cultivar (Heinz 1706) was estimated to be 0.061%. The collection of Micro-Tom nrFLcDNAs generated in this study will serve as a valuable genomic tool for plant biologists to bridge the gap between basic and applied studies. The nrFLcDNA sequences will help annotation of the tomato whole-genome sequence and aid in tomato functional genomics and molecular breeding. Full-length cDNA sequences and their annotations are provided in the database KaFTom http://www.pgb.kazusa.or.jp/kaftom/ via the website of the National Bioresource Project Tomato http://tomato.nbrp.jp.
Pyvolve: A Flexible Python Module for Simulating Sequences along Phylogenies.

PubMed

Spielman, Stephanie J; Wilke, Claus O

2015-01-01

We introduce Pyvolve, a flexible Python module for simulating genetic data along a phylogeny using continuous-time Markov models of sequence evolution. Easily incorporated into Python bioinformatics pipelines, Pyvolve can simulate sequences according to most standard models of nucleotide, amino-acid, and codon sequence evolution. All model parameters are fully customizable. Users can additionally specify custom evolutionary models, with custom rate matrices and/or states to evolve. This flexibility makes Pyvolve a convenient framework not only for simulating sequences under a wide variety of conditions, but also for developing and testing new evolutionary models. Pyvolve is an open-source project under a FreeBSD license, and it is available for download, along with a detailed user-manual and example scripts, from http://github.com/sjspielman/pyvolve.
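To make the idea of simulating sequences with a continuous-time Markov model concrete, the following is a minimal standard-library sketch of evolution along single branches under the Jukes-Cantor (JC69) model. It does not use or reproduce Pyvolve's API; every function name here is hypothetical, and the two-taxon tree is an assumed toy example.

```python
import math
import random

NUCLEOTIDES = "ACGT"


def jc69_same_probability(branch_length):
    """Probability that a site keeps its state after `branch_length`
    expected substitutions per site under the JC69 model."""
    return 0.25 + 0.75 * math.exp(-4.0 * branch_length / 3.0)


def evolve_along_branch(sequence, branch_length, rng):
    """Return a descendant of `sequence` evolved along one branch (JC69)."""
    p_same = jc69_same_probability(branch_length)
    descendant = []
    for base in sequence:
        if rng.random() < p_same:
            descendant.append(base)
        else:
            # Under JC69 the three other bases are equally likely.
            others = [b for b in NUCLEOTIDES if b != base]
            descendant.append(rng.choice(others))
    return "".join(descendant)


def simulate_two_taxon_tree(root, t_left, t_right, seed=0):
    """Evolve one root sequence independently down two branches
    (a hypothetical toy tree with two leaves)."""
    rng = random.Random(seed)
    left = evolve_along_branch(root, t_left, rng)
    right = evolve_along_branch(root, t_right, rng)
    return left, right
```

A call such as `simulate_two_taxon_tree("ACGTACGTAC", 0.1, 0.5)` returns two leaf sequences of the same length as the root. Richer models (unequal base frequencies, codon models, rate heterogeneity) would replace the single closed-form probability with a full rate matrix.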
New tool to assemble repetitive regions using next-generation sequencing data

NASA Astrophysics Data System (ADS)

Kuśmirek, Wiktor; Nowak, Robert M.; Neumann, Łukasz

2017-08-01

Next-generation sequencing techniques produce large amounts of sequencing data. Some parts of the genome are composed of repetitive DNA sequences, which are very problematic for existing genome assemblers. We propose a modification of the DNA assembly algorithm that uses the relative frequency of reads to properly reconstruct repetitive sequences. The new approach was implemented and tested; as a demonstration of the capability of our software, we present some results for model organisms. The implementation uses a three-layer software architecture in which the presentation layer, data processing layer, and data storage layer are kept separate. Source code, a demo application with a web interface, and additional data are available at the project web page: http://dnaasm.sourceforge.net.
The Past, Present, and Future of Human Centromere Genomics

PubMed Central

Aldrup-MacDonald, Megan E.; Sullivan, Beth A.

2014-01-01

The centromere is the chromosomal locus essential for chromosome inheritance and genome stability. Human centromeres are located at repetitive alpha satellite DNA arrays that compose approximately 5% of the genome. Contiguous alpha satellite DNA sequence is absent from the assembled reference genome, limiting current understanding of centromere organization and function. Here, we review the progress in centromere genomics spanning the discovery of the sequence to its molecular characterization and the work done during the Human Genome Project era to elucidate alpha satellite structure and sequence variation. We discuss exciting recent advances in alpha satellite sequence assembly that have provided important insight into the abundance and complex organization of this sequence on human chromosomes. In light of these new findings, we offer perspectives for future studies of human centromere assembly and function. PMID:24683489
Genome sequencing elucidates Sardinian genetic architecture and augments association analyses for lipid and blood inflammatory markers

PubMed Central

Zoledziewska, Magdalena; Mulas, Antonella; Pistis, Giorgio; Steri, Maristella; Danjou, Fabrice; Kwong, Alan; Ortega del Vecchyo, Vicente Diego; Chiang, Charleston W. K.; Bragg-Gresham, Jennifer; Pitzalis, Maristella; Nagaraja, Ramaiah; Tarrier, Brendan; Brennan, Christine; Uzzau, Sergio; Fuchsberger, Christian; Atzeni, Rossano; Reinier, Frederic; Berutti, Riccardo; Huang, Jie; Timpson, Nicholas J; Toniolo, Daniela; Gasparini, Paolo; Malerba, Giovanni; Dedoussis, George; Zeggini, Eleftheria; Soranzo, Nicole; Jones, Chris; Lyons, Robert; Angius, Andrea; Kang, Hyun M.; Novembre, John; Sanna, Serena; Schlessinger, David; Cucca, Francesco; Abecasis, Gonçalo R

2015-01-01

We report ~17.6M genetic variants from whole-genome sequencing of 2,120 Sardinians; 22% are absent from prior sequencing-based compilations and enriched for predicted functional consequence. Furthermore, ~76K variants common in our sample (frequency >5%) are rare elsewhere (<0.5% in the 1000 Genomes Project). We assessed the impact of these variants on circulating lipid levels and five inflammatory biomarkers. Fourteen signals, including two major new loci, were observed for lipid levels, and 19, including two novel loci, for inflammatory markers. New associations would be missed in analyses based on 1000 Genomes data, underlining the advantages of large-scale sequencing in this founder population. PMID:26366554
Complete genome sequence of Streptosporangium roseum type strain (NI 9100T)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nolan, Matt; Sikorski, Johannes; Jando, Marlen

2010-01-01

Streptosporangium roseum Crauch 1955 is the type strain of the species which is the type species of the genus Streptosporangium. The pinkish coiled Streptomyces-like organism with a spore case was isolated from vegetable garden soil in 1955. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a member of the family Streptosporangiaceae, and the second largest microbial genome sequence ever deciphered. The 10,369,518 bp long genome with its 9421 protein-coding and 80 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

Leg 67: the Deep Sea Drilling Project Mid-America Trench transect off Guatemala.

USGS Publications Warehouse

von Huene, Roland E.

1980-01-01

Drilling on the Cocos plate recovered a basal chalk sequence deposited during early and mid-Miocene time, a short interval of abyssal red clay, and an upper sequence of late Miocene and younger sediment deposited within an area influenced by a terrigenous source. In the trench, a mud and sand fill less than 400,000 yr old overlies the oceanic sequence. The entire section shows no evidence of compressive deformation. In contrast, the section cored on the trench's landward slope 3 km from the trench axis is affected by tectonism. The section contains a Cretaceous to Pliocene claystone sequence capped by Pliocene to Quaternary hemipelagic slope deposits.- from Authors
Draft Genome Sequence of Pedobacter agri PB92T, Which Belongs to the Family Sphingobacteriaceae

PubMed Central

Lee, Myunglip; Roh, Seong Woon; Lee, Hae-Won; Yim, Kyung June; Kim, Kil-Nam; Bae, Jin-Woo; Choi, Kwang-Sik; Jeon, You-Jin; Jung, Won-Kyo; Kang, Heewan

2012-01-01

Strain PB92T of Pedobacter agri, which belongs to the family Sphingobacteriaceae, was isolated from soil in the Republic of Korea. The draft genome of strain PB92T contains 5,141,552 bp, with a G+C content of 38.0%. This is the third genome sequencing project of the type strains among the Pedobacter species. PMID:22740666
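The G+C content quoted in the record above is the percentage of bases that are G or C. A minimal sketch of that calculation follows; the function name is hypothetical, and ignoring ambiguous bases such as N is an assumption of this sketch, not something the abstract states.

```python
def gc_content_percent(sequence):
    """Return the G+C content of a nucleotide sequence as a percentage.

    Only unambiguous bases (A, C, G, T) are counted in the denominator;
    this convention is an assumption made for illustration.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total
```

For a sense of scale, a G+C content of 38.0% over 5,141,552 bp corresponds to roughly 1.95 million G or C bases.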
Motion and Energy Chemical Reactions, Parts One and Two of an Integrated Science Sequence, Teacher's Guide, 1973 Edition.

ERIC Educational Resources Information Center

Portland Project Committee, OR.

This teacher's guide is for the second year of the Portland Project, a three-year integrated secondary science curriculum sequence. The first of two parts in this volume, "Motion and Energy," begins with the study of motion, going from the quantitative description to a consideration of what causes motion and a discussion of Newton's…
Chemistry of Living Matter, Energy Capture & Growth, Parts Three & Four of an Integrated Science Sequence, Teacher's Guide, 1973 Edition.

ERIC Educational Resources Information Center

Portland Project Committee, OR.

This teacher's guide includes parts three and four of the four-part third year Portland Project, a three-year integrated secondary science curriculum sequence. The underlying intention of the third year is to study energy and its importance to life. Energy-related concepts considered in year one and two, and the concepts related to atomic…
Draft Genome Sequence of Tatumella sp. Strain UCD-D_suzukii (Phylum Proteobacteria) Isolated from Drosophila suzukii Larvae

PubMed Central

Dunitz, Madison I.; James, Pamela M.; Jospin, Guillaume; Coil, David A.; Chandler, James Angus

2014-01-01

Here we present the draft genome of Tatumella sp. strain UCD-D_suzukii, the first member of this genus to be sequenced. The genome contains 3,602,931 bp in 72 scaffolds. This strain was isolated from Drosophila suzukii larvae as part of a larger project to study the microbiota of D. suzukii. PMID:24762940
Case Study Projects for College Mathematics Courses Based on a Particular Function of Two Variables

ERIC Educational Resources Information Center

Shi, Y.

2007-01-01

Based on a sequence of number pairs, a recent paper (Mauch, E. and Shi, Y., 2005, Using a sequence of number pairs as an example in teaching mathematics, "Mathematics and Computer Education," 39(3), 198-205) presented some interesting examples that can be used in teaching high school and college mathematics classes such as algebra, geometry,…
Draft Genome Sequence of Exiguobacterium sp. Strain BMC-KP, an Environmental Isolate from Bryn Mawr, Pennsylvania.

PubMed

Hyson, Peter; Shapiro, Joshua A; Wien, Michelle W

2015-10-08

Exiguobacterium sp. strain BMC-KP was isolated as part of a student environmental sampling project at Bryn Mawr College, PA. Sequencing of bacterial DNA assembled a 3.32-Mb draft genome. Analysis suggests the presence of genes for tolerance to cold and toxic metals, broad carbohydrate metabolism, and genes derived from phage. Copyright © 2015 Hyson et al.
Scope and Sequence. Life Sciences, Physical Sciences, Earth and Space Sciences. A Summer Curriculum Development Project.

ERIC Educational Resources Information Center

Cortland-Madison Board of Cooperative Educational Services, Cortland, NY.

Presented is a booklet containing scope and sequence charts for kindergarten and grades 1 to 6 science units. Overviews and lists of major concepts for units in the life, physical, and earth/space sciences are provided in tables for each grade level. Also presented are seven complete units, one for each grade level. Following a table of contents,…
Genetic Mapping

MedlinePlus

... Sheets A Brief Guide to Genomics About NHGRI Research About the International HapMap Project Biological Pathways Chromosome Abnormalities Chromosomes Cloning Comparative Genomics DNA Microarray Technology DNA Sequencing Deoxyribonucleic Acid ( ...
Biological Pathways

MedlinePlus

... Sheets A Brief Guide to Genomics About NHGRI Research About the International HapMap Project Biological Pathways Chromosome Abnormalities Chromosomes Cloning Comparative Genomics DNA Microarray Technology DNA Sequencing Deoxyribonucleic Acid ( ...
Simultaneous mutation and copy number variation (CNV) detection by multiplex PCR-based GS-FLX sequencing.

PubMed

Goossens, Dirk; Moens, Lotte N; Nelis, Eva; Lenaerts, An-Sofie; Glassee, Wim; Kalbe, Andreas; Frey, Bruno; Kopal, Guido; De Jonghe, Peter; De Rijk, Peter; Del-Favero, Jurgen

2009-03-01

We evaluated multiplex PCR amplification as a front-end for high-throughput sequencing, to widen the applicability of massive parallel sequencers for the detailed analysis of complex genomes. Using multiplex PCR reactions, we sequenced the complete coding regions of seven genes implicated in peripheral neuropathies in 40 individuals on a GS-FLX genome sequencer (Roche). The resulting dataset showed highly specific and uniform amplification. Comparison of the GS-FLX sequencing data with the dataset generated by Sanger sequencing confirmed the detection of all variants present and proved the sensitivity of the method for mutation detection. In addition, we showed that we could exploit the multiplexed PCR amplicons to determine individual copy number variation (CNV), increasing the spectrum of detected variations to both genetic and genomic variants. We conclude that our straightforward procedure substantially expands the applicability of the massive parallel sequencers for sequencing projects of a moderate number of amplicons (50-500) with typical applications in resequencing exons in positional or functional candidate regions and molecular genetic diagnostics. 2008 Wiley-Liss, Inc.
Protein sequence annotation in the genome era: the annotation concept of SWISS-PROT+TREMBL.

PubMed

Apweiler, R; Gateau, A; Contrino, S; Martin, M J; Junker, V; O'Donovan, C; Lang, F; Mitaritonna, N; Kappus, S; Bairoch, A

1997-01-01

SWISS-PROT is a curated protein sequence database which strives to provide a high level of annotation, a minimal level of redundancy and a high level of integration with other databases. Ongoing genome sequencing projects have dramatically increased the number of protein sequences to be incorporated into SWISS-PROT. Since we do not want to dilute the quality standards of SWISS-PROT by incorporating sequences without proper sequence analysis and annotation, we cannot speed up the incorporation of new incoming data indefinitely. However, as we also want to make the sequences available as fast as possible, we introduced TREMBL (TRanslation of EMBL nucleotide sequence database), a supplement to SWISS-PROT. TREMBL consists of computer-annotated entries in SWISS-PROT format derived from the translation of all coding sequences (CDS) in the EMBL nucleotide sequence database, except for CDS already included in SWISS-PROT. While TREMBL is already of immense value, its computer-generated annotation does not match the quality of SWISS-PROT's. The main difference is in the protein functional information attached to sequences. With this in mind, we are dedicating substantial effort to develop and apply computer methods to enhance the functional information attached to TREMBL entries.
The FlyBase database of the Drosophila genome projects and community literature

PubMed Central

2003-01-01

FlyBase (http://flybase.bio.indiana.edu/) provides an integrated view of the fundamental genomic and genetic data on the major genetic model Drosophila melanogaster and related species. FlyBase has primary responsibility for the continual reannotation of the D. melanogaster genome. The ultimate goal of the reannotation effort is to decorate the euchromatic sequence of the genome with as much biological information as is available from the community and from the major genome project centers. A complete revision of the annotations of the now-finished euchromatic genomic sequence has been completed. There are many points of entry to the genome within FlyBase, most notably through maps, gene products and ontologies, structured phenotypic and gene expression data, and anatomy. PMID:12519974
Ultrafast Pulse Sequencing for Fast Projective Measurements of Atomic Hyperfine Qubits

NASA Astrophysics Data System (ADS)

Ip, Michael; Ransford, Anthony; Campbell, Wesley

2015-05-01

Projective readout of quantum information stored in atomic hyperfine structure typically uses state-dependent CW laser-induced fluorescence. This method requires an often sophisticated imaging system to spatially filter out the background CW laser light. We present an alternative approach that instead uses simple pulse sequences from a mode-locked laser to effect the same state-dependent excitations in less than 1 ns. The resulting atomic fluorescence occurs in the dark, allowing the placement of non-imaging detectors right next to the atom to improve the qubit state detection efficiency and speed. We also discuss methods of Doppler cooling with mode-locked lasers for trapped ions, where the creation of the necessary UV light is often difficult with CW lasers.
SOBA: sequence ontology bioinformatics analysis.

PubMed

Moore, Barry; Fan, Guozhen; Eilbeck, Karen

2010-07-01

The advent of cheaper, faster sequencing technologies has pushed the task of sequence annotation from the exclusive domain of large-scale multi-national sequencing projects to that of research laboratories and small consortia. The bioinformatics burden placed on these laboratories, some with very little programming experience, can be daunting. Fortunately, there exist software libraries and pipelines designed with these groups in mind, to ease the transition from an assembled genome to an annotated and accessible genome resource. We have developed the Sequence Ontology Bioinformatics Analysis (SOBA) tool to provide a simple statistical and graphical summary of an annotated genome. We envisage its use during annotation jamborees and genome comparison, and by developers for rapid feedback during annotation software development and testing. SOBA also provides annotation consistency feedback to ensure correct use of terminology within annotations, and guides users to add new terms to the Sequence Ontology when required. SOBA is available at http://www.sequenceontology.org/cgi-bin/soba.cgi.
Lawrence Livermore National Laboratory- Completing the Human Genome Project and Triggering Nearly $1 Trillion in U.S. Economic Activity

DOE Office of Scientific and Technical Information (OSTI.GOV)

Stewart, Jeffrey S.

The Human Genome Project has already triggered nearly $1 trillion in U.S. economic activity. Lawrence Livermore National Laboratory (LLNL) was a co-leader in one of the biggest biological research efforts in history, the Human Genome Project. This ambitious research effort set out to sequence the approximately 3 billion nucleotides in the human genome, an effort many thought was nearly impossible. Deoxyribonucleic acid (DNA) was discovered in 1869, and by 1943 came the discovery that DNA was a molecule that encodes the genetic instructions used in the development and functioning of living organisms and many viruses. To make full use of the information, scientists needed to first sequence the billions of nucleotides to begin linking them to genetic traits and illnesses, and eventually more effective treatments. New medical discoveries and improved agricultural productivity were some of the expected benefits. While the potential benefits were vast, the timeline (over a decade) and cost ($3.8 billion) exceeded what the private sector would normally attempt, especially when this would only be the first phase on the path to new discoveries and market opportunities. The Department of Energy believed its best research laboratories could meet this Grand Challenge and soon convinced the National Institutes of Health to formally propose the Human Genome Project to the federal government. The U.S. government accepted the risk and challenge to potentially create new healthcare and food discoveries that could benefit the world and U.S. industry.
An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea.

PubMed

McDonald, Daniel; Price, Morgan N; Goodrich, Julia; Nawrocki, Eric P; DeSantis, Todd Z; Probst, Alexander; Andersen, Gary L; Knight, Rob; Hugenholtz, Philip

2012-03-01

Reference phylogenies are crucial for providing a taxonomic framework for interpretation of marker gene and metagenomic surveys, which continue to reveal novel species at a remarkable rate. Greengenes is a dedicated full-length 16S rRNA gene database that provides users with a curated taxonomy based on de novo tree inference. We developed a 'taxonomy to tree' approach for transferring group names from an existing taxonomy to a tree topology, and used it to apply the Greengenes, National Center for Biotechnology Information (NCBI) and cyanoDB (Cyanobacteria only) taxonomies to a de novo tree comprising 408,315 sequences. We also incorporated explicit rank information provided by the NCBI taxonomy to group names (by prefixing rank designations) for better user orientation and classification consistency. The resulting merged taxonomy improved the classification of 75% of the sequences by one or more ranks relative to the original NCBI taxonomy with the most pronounced improvements occurring in under-classified environmental sequences. We also assessed candidate phyla (divisions) currently defined by NCBI and present recommendations for consolidation of 34 redundantly named groups. All intermediate results from the pipeline, which includes tree inference, jackknifing and transfer of a donor taxonomy to a recipient tree (tax2tree) are available for download. The improved Greengenes taxonomy should provide important infrastructure for a wide range of megasequencing projects studying ecosystems on scales ranging from our own bodies (the Human Microbiome Project) to the entire planet (the Earth Microbiome Project). The implementation of the software can be obtained from http://sourceforge.net/projects/tax2tree/.
An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea

PubMed Central

McDonald, Daniel; Price, Morgan N; Goodrich, Julia; Nawrocki, Eric P; DeSantis, Todd Z; Probst, Alexander; Andersen, Gary L; Knight, Rob; Hugenholtz, Philip

2012-01-01

Reference phylogenies are crucial for providing a taxonomic framework for interpretation of marker gene and metagenomic surveys, which continue to reveal novel species at a remarkable rate. Greengenes is a dedicated full-length 16S rRNA gene database that provides users with a curated taxonomy based on de novo tree inference. We developed a ‘taxonomy to tree' approach for transferring group names from an existing taxonomy to a tree topology, and used it to apply the Greengenes, National Center for Biotechnology Information (NCBI) and cyanoDB (Cyanobacteria only) taxonomies to a de novo tree comprising 408 315 sequences. We also incorporated explicit rank information provided by the NCBI taxonomy to group names (by prefixing rank designations) for better user orientation and classification consistency. The resulting merged taxonomy improved the classification of 75% of the sequences by one or more ranks relative to the original NCBI taxonomy with the most pronounced improvements occurring in under-classified environmental sequences. We also assessed candidate phyla (divisions) currently defined by NCBI and present recommendations for consolidation of 34 redundantly named groups. All intermediate results from the pipeline, which includes tree inference, jackknifing and transfer of a donor taxonomy to a recipient tree (tax2tree) are available for download. The improved Greengenes taxonomy should provide important infrastructure for a wide range of megasequencing projects studying ecosystems on scales ranging from our own bodies (the Human Microbiome Project) to the entire planet (the Earth Microbiome Project). The implementation of the software can be obtained from http://sourceforge.net/projects/tax2tree/. PMID:22134646
Neural mechanisms of sequence generation in songbirds

NASA Astrophysics Data System (ADS)

Langford, Bruce

Animal models in research are useful for studying more complex behavior. For example, motor sequence generation of actions requiring good muscle coordination, such as writing with a pen, playing an instrument, or speaking, may involve the interaction of many areas in the brain, each a complex system in itself; thus it can be difficult to determine causal relationships between neural behavior and the behavior being studied. Birdsong, however, provides an excellent model behavior for motor sequence learning, memory, and generation. The song consists of learned sequences of notes that are spectrographically stereotyped over multiple renditions of the song, similar to syllables in human speech. The main areas of the songbird brain involved in singing are known; however, the mechanisms by which these systems store and produce song are not well understood. We used a custom-built, head-mounted, miniature motorized microdrive to chronically record the neural firing patterns of identified neurons in HVC, a pre-motor cortical nucleus which has been shown to be important in song timing. These recordings were made in Bengalese finches, which generate a song made up of stereotyped notes but variable note sequences. We observed song-related bursting in neurons projecting to Area X, a homologue to basal ganglia, and tonic firing in HVC interneurons. Interneurons had firing rate patterns that were consistent over multiple renditions of the same note sequence. We also designed and built a lightweight, low-power wireless programmable neural stimulator using the Bluetooth Low Energy protocol. It was able to generate perturbations in the song when current pulses were administered to RA, which projects to the brainstem nucleus responsible for syringeal muscle control.
Reference-guided assembly of four diverse Arabidopsis thaliana genomes

PubMed Central

Schneeberger, Korbinian; Ossowski, Stephan; Ott, Felix; Klein, Juliane D.; Wang, Xi; Lanz, Christa; Smith, Lisa M.; Cao, Jun; Fitz, Joffrey; Warthmann, Norman; Henz, Stefan R.; Huson, Daniel H.; Weigel, Detlef

2011-01-01

We present whole-genome assemblies of four divergent Arabidopsis thaliana strains that complement the 125-Mb reference genome sequence released a decade ago. Using a newly developed reference-guided approach, we assembled large contigs from 9 to 42 Gb of Illumina short-read data from the Landsberg erecta (Ler-1), C24, Bur-0, and Kro-0 strains, which have been sequenced as part of the 1,001 Genomes Project for this species. Using alignments against the reference sequence, we first reduced the complexity of the de novo assembly and later integrated reads without similarity to the reference sequence. As an example, half of the noncentromeric C24 genome was covered by scaffolds that are longer than 260 kb, with a maximum of 2.2 Mb. Moreover, over 96% of the reference genome was covered by the reference-guided assembly, compared with only 87% with a complete de novo assembly. Comparisons with 2 Mb of dideoxy sequence reveal that the per-base error rate of the reference-guided assemblies was below 1 in 10,000. Our assemblies provide a detailed, genomewide picture of large-scale differences between A. thaliana individuals, most of which are difficult to access with alignment-consensus methods only. We demonstrate their practical relevance in studying the expression differences of polymorphic genes and show how the analysis of sRNA sequencing data can lead to erroneous conclusions if aligned against the reference genome alone. Genome assemblies, raw reads, and further information are accessible through http://1001genomes.org/projects/assemblies.html. PMID:21646520
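The statement that half of the genome is covered by scaffolds longer than a given length is the kind of summary usually computed as an NG50-style statistic. The sketch below shows one common way to compute it; the abstract does not say how its figure was derived, so the function name, the inputs, and the assumption that scaffolds do not overlap are all illustrative.

```python
def length_covering_fraction(scaffold_lengths, genome_size, fraction=0.5):
    """Return the scaffold length L such that scaffolds of length >= L
    together cover at least `fraction` of `genome_size`.

    Returns None when the scaffolds cannot reach that fraction.
    Assumes scaffolds do not overlap one another.
    """
    target = fraction * genome_size
    covered = 0
    for length in sorted(scaffold_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return None
```

With hypothetical scaffold lengths of 100, 80, 50, 30 and 20 kb and a 300 kb genome, the two longest scaffolds already cover 180 kb, so the function returns 80.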

Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST): A web tool for generating protein sequence similarity networks.

PubMed

Gerlt, John A; Bouvier, Jason T; Davidson, Daniel B; Imker, Heidi J; Sadkhin, Boris; Slater, David R; Whalen, Katie L

2015-08-01

The Enzyme Function Initiative, an NIH/NIGMS-supported Large-Scale Collaborative Project (EFI; U54GM093342; http://enzymefunction.org/), is focused on devising and disseminating bioinformatics and computational tools as well as experimental strategies for the prediction and assignment of functions (in vitro activities and in vivo physiological/metabolic roles) to uncharacterized enzymes discovered in genome projects. Protein sequence similarity networks (SSNs) are visually powerful tools for analyzing sequence relationships in protein families (H.J. Atkinson, J.H. Morris, T.E. Ferrin, and P.C. Babbitt, PLoS One 2009, 4, e4345). However, the members of the biological/biomedical community have not had access to the capability to generate SSNs for their "favorite" protein families. In this article we announce the EFI-EST (Enzyme Function Initiative-Enzyme Similarity Tool) web tool (http://efi.igb.illinois.edu/efi-est/) that is available without cost for the automated generation of SSNs by the community. The tool can create SSNs for the "closest neighbors" of a user-supplied protein sequence from the UniProt database (Option A) or of members of any user-supplied Pfam and/or InterPro family (Option B). We provide an introduction to SSNs, a description of EFI-EST, and a demonstration of the use of EFI-EST to explore sequence-function space in the OMP decarboxylase superfamily (PF00215). This article is designed as a tutorial that will allow members of the community to use the EFI-EST web tool for exploring sequence/function space in protein families. Copyright © 2015 Elsevier B.V. All rights reserved.
Genotype calling from next-generation sequencing data using haplotype information of reads

PubMed Central

Zhi, Degui; Wu, Jihua; Liu, Nianjun; Zhang, Kui

2012-01-01

Motivation: Low coverage sequencing provides an economic strategy for whole genome sequencing. When sequencing a set of individuals, genotype calling can be challenging due to low sequencing coverage. Linkage disequilibrium (LD) based refinement of genotype calling is essential to improve the accuracy. Current LD-based methods use read counts or genotype likelihoods at individual potential polymorphic sites (PPSs). Reads that span multiple PPSs (jumping reads) can provide additional haplotype information overlooked by current methods. Results: In this article, we introduce a new Hidden Markov Model (HMM)-based method that can take into account jumping reads information across adjacent PPSs and implement it in the HapSeq program. Our method extends the HMM in Thunder and explicitly models jumping reads information as emission probabilities conditional on the states of adjacent PPSs. Our simulation results show that, compared to Thunder, HapSeq reduces the genotyping error rate by 30%, from 0.86% to 0.60%. The results from the 1000 Genomes Project show that HapSeq reduces the genotyping error rate by 12 and 9%, from 2.24% and 2.76% to 1.97% and 2.50% for individuals with European and African ancestry, respectively. We expect our program can improve genotyping qualities of the large number of ongoing and planned whole genome sequencing projects. Contact: dzhi@ms.soph.uab.edu; kzhang@ms.soph.uab.edu Availability: The software package HapSeq and its manual can be found and downloaded at www.ssg.uab.edu/hapseq/. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22285565
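The percentage reductions quoted in this abstract are relative reductions in error rate, which can be checked directly from the before and after figures. The helper below is a small arithmetic sketch with a hypothetical name, not part of HapSeq.

```python
def relative_reduction_percent(before, after):
    """Relative reduction from `before` to `after`, as a percentage."""
    return 100.0 * (before - after) / before


# Error rates (in percent) taken from the abstract above.
simulation = relative_reduction_percent(0.86, 0.60)  # about 30.2
european = relative_reduction_percent(2.24, 1.97)    # about 12.1
african = relative_reduction_percent(2.76, 2.50)     # about 9.4
```

Rounded to whole numbers, these give the 30%, 12% and 9% reductions stated in the text.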
Clinical Sequencing Exploratory Research Consortium: Accelerating Evidence-Based Practice of Genomic Medicine.

PubMed

Green, Robert C; Goddard, Katrina A B; Jarvik, Gail P; Amendola, Laura M; Appelbaum, Paul S; Berg, Jonathan S; Bernhardt, Barbara A; Biesecker, Leslie G; Biswas, Sawona; Blout, Carrie L; Bowling, Kevin M; Brothers, Kyle B; Burke, Wylie; Caga-Anan, Charlisse F; Chinnaiyan, Arul M; Chung, Wendy K; Clayton, Ellen W; Cooper, Gregory M; East, Kelly; Evans, James P; Fullerton, Stephanie M; Garraway, Levi A; Garrett, Jeremy R; Gray, Stacy W; Henderson, Gail E; Hindorff, Lucia A; Holm, Ingrid A; Lewis, Michelle Huckaby; Hutter, Carolyn M; Janne, Pasi A; Joffe, Steven; Kaufman, David; Knoppers, Bartha M; Koenig, Barbara A; Krantz, Ian D; Manolio, Teri A; McCullough, Laurence; McEwen, Jean; McGuire, Amy; Muzny, Donna; Myers, Richard M; Nickerson, Deborah A; Ou, Jeffrey; Parsons, Donald W; Petersen, Gloria M; Plon, Sharon E; Rehm, Heidi L; Roberts, J Scott; Robinson, Dan; Salama, Joseph S; Scollon, Sarah; Sharp, Richard R; Shirts, Brian; Spinner, Nancy B; Tabor, Holly K; Tarczy-Hornoch, Peter; Veenstra, David L; Wagle, Nikhil; Weck, Karen; Wilfond, Benjamin S; Wilhelmsen, Kirk; Wolf, Susan M; Wynn, Julia; Yu, Joon-Ho

2016-06-02

Despite rapid technical progress and demonstrable effectiveness for some types of diagnosis and therapy, much remains to be learned about clinical genome and exome sequencing (CGES) and its role within the practice of medicine. The Clinical Sequencing Exploratory Research (CSER) consortium includes 18 extramural research projects, one National Human Genome Research Institute (NHGRI) intramural project, and a coordinating center funded by the NHGRI and National Cancer Institute. The consortium is exploring analytic and clinical validity and utility, as well as the ethical, legal, and social implications of sequencing via multidisciplinary approaches; it has thus far recruited 5,577 participants across a spectrum of symptomatic and healthy children and adults by utilizing both germline and cancer sequencing. The CSER consortium is analyzing data and creating publicly available procedures and tools related to participant preferences and consent, variant classification, disclosure and management of primary and secondary findings, health outcomes, and integration with electronic health records. Future research directions will refine measures of clinical utility of CGES in both germline and somatic testing, evaluate the use of CGES for screening in healthy individuals, explore the penetrance of pathogenic variants through extensive phenotyping, reduce discordances in public databases of genes and variants, examine social and ethnic disparities in the provision of genomics services, explore regulatory issues, and estimate the value and downstream costs of sequencing. The CSER consortium has established a shared community of research sites by using diverse approaches to pursue the evidence-based development of best practices in genomic medicine. Copyright © 2016 American Society of Human Genetics. All rights reserved.
Two-dimensional PCA-based human gait identification

NASA Astrophysics Data System (ADS)

Chen, Jinyan; Wu, Rongteng

2012-11-01

Automatic recognition of people through visual surveillance is necessary for public security. Gait-based identification aims to recognize a person automatically from video of his or her walking, using computer vision and image processing approaches. As a potential biometric measure, human gait identification has attracted more and more researchers. Current human gait identification methods can be divided into two categories: model-based methods and motion-based methods. In this paper, a human gait identification method based on Two-Dimensional Principal Component Analysis (2DPCA) and temporal-space analysis is proposed. Using background estimation and image subtraction, we obtain a binary image sequence from the surveillance video. By taking the difference of each pair of adjacent images in the gait image sequence, we obtain a sequence of binary difference images. Every binary difference image indicates the body's mode of movement while the person walks. We use the following steps to extract temporal-space features from the difference image sequence. Projecting one difference image onto the Y axis and the X axis gives two vectors. Projecting every difference image in the sequence onto the Y axis and the X axis gives two matrices. These two matrices characterize the style of one walk. 2DPCA is then used to transform these two matrices into two vectors while keeping maximum separability. Finally, the similarity of two human gait image sequences is calculated as the Euclidean distance between the two vectors. The performance of our method is illustrated using the CASIA Gait Database.
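The feature-extraction steps in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the difference of adjacent binary frames is assumed to be a pixel-wise XOR, and the 2DPCA reduction step is omitted, so the sketch stops at the two projection matrices and a plain Euclidean distance.

```python
import math


def difference_images(frames):
    """Pixel-wise XOR of each pair of adjacent binary frames (assumed difference operator)."""
    return [
        [[a ^ b for a, b in zip(row0, row1)] for row0, row1 in zip(f0, f1)]
        for f0, f1 in zip(frames, frames[1:])
    ]


def project(image):
    """Return (row_sums, column_sums): projections onto the Y axis and the X axis."""
    row_sums = [sum(row) for row in image]
    col_sums = [sum(col) for col in zip(*image)]
    return row_sums, col_sums


def projection_matrices(frames):
    """Stack the per-frame projections into one Y-axis matrix and one X-axis matrix."""
    diffs = difference_images(frames)
    y_matrix = [project(d)[0] for d in diffs]
    x_matrix = [project(d)[1] for d in diffs]
    return y_matrix, x_matrix


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy 2x3 binary silhouettes (made-up data, not from the CASIA Gait Database).
frames = [
    [[0, 1, 0], [0, 1, 0]],
    [[0, 0, 1], [0, 1, 0]],
    [[0, 0, 1], [0, 0, 1]],
]
y_matrix, x_matrix = projection_matrices(frames)
```

In the method described above, each of the two matrices would then be reduced to a vector by 2DPCA before the distance is computed.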
DHS Student Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wynne, E K

Throughout this project I have been involved in every step of the protocol. After proper training, I was introduced to the necessary lab techniques for the project. From then on it has been my responsibility to perform the necessary tasks to identify and isolate the mutants. This includes carrying out a detailed protocol of mixing reagents, streaking and incubating plates, inoculating cultures and evaluating any results in order to guide my actions for the next antibiotic concentration level. Simultaneously, I have been running PCR and sequencing reactions on all mutants in order to obtain the genetic sequence of the genes of interest for comparison. Once I have the gene sequences of interest I am able, with the aid of a sequencing program (Sequencher 4.2.2), to analyze the sequences of the mutants against that of a wild-type strain. This entails aligning the DNA sequences of a given gene for each of the mutants and locating any base changes from the wild-type bacterium's genes. These polymorphisms allow me to identify the QRDR for that particular gene. Depending on whether the polymorphism occurred at a low antibiotic concentration level or high concentration level, we can evaluate whether that change is necessary for low or high-level quinolone resistance. Finally, I will compare the polymorphisms of each mutant at a given antibiotic selection level and evaluate whether B. anthracis consistently acquires resistance through the same polymorphisms or whether the resistance mechanism varies with each new mutant strain. Currently, I am analyzing the sequence data for stage one mutants, while simultaneously continuing the lab work necessary to select for stage two mutants. After I have left, the personnel at the lab that I've been working with at LLNL will continue this project. By the end of this experiment, we hope to corroborate the suggested mechanisms of resistance typically employed by B. anthracis Sterne at different resistance levels. 
Furthermore, if the mechanism is determined by one of the following genes: gyrA, gyrB, parC, parE, we will be able to pinpoint which base pair changes are necessary for acquiring a given resistance level. Hopefully, from these data researchers will be better able to determine an appropriate action should quinolone-resistant strains of B. anthracis arise either by natural evolution or by selection in a laboratory.
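The comparison step described in this report, locating base changes between an aligned mutant sequence and the wild type, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the report used Sequencher 4.2.2, the function name `base_changes` is hypothetical, the two sequences are assumed to be already aligned to equal length, and the example sequences are made up rather than real gyrA data.

```python
def base_changes(wild_type, mutant):
    """List (position, wild-type base, mutant base) for every mismatch.

    Positions are 1-based. Both sequences must already be aligned
    to the same length (an assumption of this sketch).
    """
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must be aligned to the same length")
    pairs = zip(wild_type.upper(), mutant.upper())
    return [
        (pos, wt, mt)
        for pos, (wt, mt) in enumerate(pairs, start=1)
        if wt != mt
    ]


# Made-up aligned fragments differing at one position.
wild = "ATGGCTTCA"
mut = "ATGGTTTCA"
changes = base_changes(wild, mut)
```

Running the same comparison for each mutant at a given selection level would show whether the same positions change consistently across mutants.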
The Development of a Course Sequence in Real-Time Systems Design

DTIC Science & Technology

1993-08-01

project was implemented in C. A group of students used the material learned in this course in their...homework assignments are used to assess the students' learning process. The term project is to be done in teams of 2 to 4 students and it starts very...assignments are used to assess the students' learning process. The term project is to be done in teams of 2 to 4 students and it starts very early in the
Applications of the 1000 Genomes Project resources

PubMed Central

Zheng-Bradley, Xiangqun

2017-01-01

Abstract The 1000 Genomes Project created a valuable, worldwide reference for human genetic variation. Common uses of the 1000 Genomes dataset include genotype imputation supporting Genome-wide Association Studies, mapping expression Quantitative Trait Loci, filtering non-pathogenic variants from exome, whole genome and cancer genome sequencing projects, and genetic analysis of population structure and molecular evolution. In this article, we will highlight some of the multiple ways that the 1000 Genomes data can be and has been utilized for genetic studies. PMID:27436001
Using the rear projection of the Socibot Desktop robot for creation of applications with facial expressions

NASA Astrophysics Data System (ADS)

Gîlcă, G.; Bîzdoacă, N. G.; Diaconu, I.

2016-08-01

This article aims to implement some practical applications using the Socibot Desktop social robot. We set out to realize three applications: creating a speech sequence using the Kiosk menu of the browser interface, creating a program in the Virtual Robot browser interface, and making a new guise to be loaded into the robot's memory in order to be projected onto its face. The first application is created in the Compose submenu, which contains 5 file categories (audio, eyes, face, head, mood) that help in creating the projected sequence. The second application is more complex; the completed program contains audio files, speeches (which can be created in over 20 languages), head movements, the robot's facial parameters as a function of each action unit (AU) of the facial muscles, its expressions and its line of sight. The last application aims to change the robot's appearance with the guise created by us. The guise was created in Adobe Photoshop and then loaded into the robot's memory.
PLAN: a web platform for automating high-throughput BLAST searches and for managing and mining results.

PubMed

He, Ji; Dai, Xinbin; Zhao, Xuechun

2007-02-09

BLAST searches are widely used for sequence alignment. The search results are commonly adopted for various functional and comparative genomics tasks such as annotating unknown sequences, investigating gene models and comparing two sequence sets. Advances in sequencing technologies pose challenges for high-throughput analysis of large-scale sequence data. A number of programs and hardware solutions exist for efficient BLAST searching, but there is a lack of generic software solutions for mining and personalized management of the results. Systematically reviewing the results and identifying information of interest remains tedious and time-consuming. Personal BLAST Navigator (PLAN) is a versatile web platform that helps users to carry out various personalized pre- and post-BLAST tasks, including: (1) query and target sequence database management, (2) automated high-throughput BLAST searching, (3) indexing and searching of results, (4) filtering results online, (5) managing results of personal interest in favorite categories, (6) automated sequence annotation (such as NCBI NR and ontology-based annotation). PLAN integrates, by default, the Decypher hardware-based BLAST solution provided by Active Motif Inc. with a greatly improved efficiency over conventional BLAST software. BLAST results are visualized by spreadsheets and graphs and are full-text searchable. BLAST results and sequence annotations can be exported, in part or in full, in various formats including Microsoft Excel and FASTA. Sequences and BLAST results are organized in projects, the data publication levels of which are controlled by the registered project owners. In addition, all analytical functions are provided to public users without registration. PLAN has proved a valuable addition to the community for automated high-throughput BLAST searches, and, more importantly, for knowledge discovery, management and sharing based on sequence alignment results. 
The PLAN web interface is platform-independent, easily configurable and capable of comprehensive expansion, and user-intuitive. PLAN is freely available to academic users at http://bioinfo.noble.org/plan/. The source code for local deployment is provided under free license. Full support on system utilization, installation, configuration and customization are provided to academic users.
PLAN: a web platform for automating high-throughput BLAST searches and for managing and mining results

PubMed Central

He, Ji; Dai, Xinbin; Zhao, Xuechun

2007-01-01

Background BLAST searches are widely used for sequence alignment. The search results are commonly adopted for various functional and comparative genomics tasks such as annotating unknown sequences, investigating gene models and comparing two sequence sets. Advances in sequencing technologies pose challenges for high-throughput analysis of large-scale sequence data. A number of programs and hardware solutions exist for efficient BLAST searching, but there is a lack of generic software solutions for mining and personalized management of the results. Systematically reviewing the results and identifying information of interest remains tedious and time-consuming. Results Personal BLAST Navigator (PLAN) is a versatile web platform that helps users to carry out various personalized pre- and post-BLAST tasks, including: (1) query and target sequence database management, (2) automated high-throughput BLAST searching, (3) indexing and searching of results, (4) filtering results online, (5) managing results of personal interest in favorite categories, (6) automated sequence annotation (such as NCBI NR and ontology-based annotation). PLAN integrates, by default, the Decypher hardware-based BLAST solution provided by Active Motif Inc. with a greatly improved efficiency over conventional BLAST software. BLAST results are visualized by spreadsheets and graphs and are full-text searchable. BLAST results and sequence annotations can be exported, in part or in full, in various formats including Microsoft Excel and FASTA. Sequences and BLAST results are organized in projects, the data publication levels of which are controlled by the registered project owners. In addition, all analytical functions are provided to public users without registration. Conclusion PLAN has proved a valuable addition to the community for automated high-throughput BLAST searches, and, more importantly, for knowledge discovery, management and sharing based on sequence alignment results. 
The PLAN web interface is platform-independent, easily configurable and capable of comprehensive expansion, and user-intuitive. PLAN is freely available to academic users at . The source code for local deployment is provided under free license. Full support on system utilization, installation, configuration and customization are provided to academic users. PMID:17291345
Space Engineering Projects in Design Methodology

NASA Technical Reports Server (NTRS)

Crawford, R.; Wood, K.; Nichols, S.; Hearn, C.; Corrier, S.; DeKunder, G.; George, S.; Hysinger, C.; Johnson, C.; Kubasta, K.

1993-01-01

NASA/USRA is an ongoing sponsor of space design projects in the senior design courses of the Mechanical Engineering Department at The University of Texas at Austin. This paper describes the UT senior design sequence, focusing on the first-semester design methodology course. The philosophical basis and pedagogical structure of this course is summarized. A history of the Department's activities in the Advanced Design Program is then presented. The paper includes a summary of the projects completed during the 1992-93 Academic Year in the methodology course, and concludes with an example of two projects completed by student design teams.
Organizing, exploring, and analyzing antibody sequence data: the case for relational-database managers.

PubMed

Owens, John

2009-01-01

Technological advances in the acquisition of DNA and protein sequence information and the resulting onrush of data can quickly overwhelm the scientist unprepared for the volume of information that must be evaluated and carefully dissected to discover its significance. Few laboratories have the luxury of dedicated personnel to organize, analyze, or consistently record a mix of arriving sequence data. A methodology based on a modern relational-database manager is presented that is both a natural storage vessel for antibody sequence information and a conduit for organizing and exploring sequence data and accompanying annotation text. The expertise necessary to implement such a plan is equal to that required by electronic word processors or spreadsheet applications. Antibody sequence projects maintained as independent databases are selectively unified by the relational-database manager into larger database families that contribute to local analyses, reports, interactive HTML pages, or exported to facilities dedicated to sophisticated sequence analysis techniques. Database files are transposable among current versions of Microsoft, Macintosh, and UNIX operating systems.
Audio-Tutorial Project: An Audio-Tutorial Approach to Human Anatomy and Physiology.

ERIC Educational Resources Information Center

Muzio, Joseph N.; And Others

A two course sequence on human anatomy and physiology using the audiotutorial method of instruction was developed for use by nursing students and other students in the health or medical fields at the Kingsborough Community College in New York. The project was motivated by the problems of often underprepared students coming to learn a new field and…
Airfoil Design in Multivariable Calculus: Tying It All Together

ERIC Educational Resources Information Center

Laverty, Rich; Povich, Timothy; Williams, Tasha

2005-01-01

Near the conclusion of their final term in the calculus sequence at The United States Military Academy, cadets are given a week long group project. At the end of the week, the project is briefed to their instructors, classmates, and superior officers. From a teaching perspective, the goal is to encapsulate as much of the course as possible in one…
Relationships Between Selected Family Variables and Maternal and Infant Behavior in a Disadvantaged Population. A Supplementary Report.

ERIC Educational Resources Information Center

Gordon, Ira J.; And Others

This pamphlet contains a series of studies that grew out of the parent education project of the Institute for Development of Human Resources. The objectives and general design of the project consisted of instruction of 200 environmentally disadvantaged mothers by parent educators using a sequence of infant stimulation exercises conducted in the…
Unpacking the Black Box of the Chicago School Readiness Project Intervention: The Mediating Roles of Teacher-Child Relationship Quality and Self-Regulation

ERIC Educational Resources Information Center

Jones, Stephanie M.; Bub, Kristen L.; Raver, C. Cybele

2013-01-01

Research Findings: This study examines the theory of change of the Chicago School Readiness Project (CSRP), testing a sequence of theory-derived mediating mechanisms that include the quality of teacher-child relationships and children's self-regulation. The CSRP is a multicomponent teacher and classroom-focused intervention, and its…
For the Love of Statistics: Appreciating and Learning to Apply Experimental Analysis and Statistics through Computer Programming Activities

ERIC Educational Resources Information Center

Mascaró, Maite; Sacristán, Ana Isabel; Rufino, Marta M.

2016-01-01

For the past 4 years, we have been involved in a project that aims to enhance the teaching and learning of experimental analysis and statistics, of environmental and biological sciences students, through computational programming activities (using R code). In this project, through an iterative design, we have developed sequences of R-code-based…
Population-Sequencing as a Biomarker of Burkholderia mallei and Burkholderia pseudomallei Evolution through Microbial Forensic Analysis.

PubMed

Jakupciak, John P; Wells, Jeffrey M; Karalus, Richard J; Pawlowski, David R; Lin, Jeffrey S; Feldman, Andrew B

2013-01-01

Large-scale genomics projects are identifying biomarkers to detect human disease. B. pseudomallei and B. mallei are two closely related select agents that cause melioidosis and glanders. Accurate characterization of metagenomic samples is dependent on accurate measurements of genetic variation between isolates with resolution down to strain level. Often single biomarker sensitivity is augmented by use of multiple or panels of biomarkers. In parallel with single biomarker validation, advances in DNA sequencing enable analysis of entire genomes in a single run: population-sequencing. Potentially, direct sequencing could be used to analyze an entire genome to serve as the biomarker for genome identification. However, genome variation and population diversity complicate use of direct sequencing, as well as differences caused by sample preparation protocols including sequencing artifacts and mistakes. As part of a Department of Homeland Security program in bacterial forensics, we examined how to implement whole genome sequencing (WGS) analysis as a judicially defensible forensic method for attributing microbial sample relatedness; and also to determine the strengths and limitations of whole genome sequence analysis in a forensics context. Herein, we demonstrate use of sequencing to provide genetic characterization of populations: direct sequencing of populations.
Population-Sequencing as a Biomarker of Burkholderia mallei and Burkholderia pseudomallei Evolution through Microbial Forensic Analysis

PubMed Central

Jakupciak, John P.; Wells, Jeffrey M.; Karalus, Richard J.; Pawlowski, David R.; Lin, Jeffrey S.; Feldman, Andrew B.

2013-01-01

Large-scale genomics projects are identifying biomarkers to detect human disease. B. pseudomallei and B. mallei are two closely related select agents that cause melioidosis and glanders. Accurate characterization of metagenomic samples is dependent on accurate measurements of genetic variation between isolates with resolution down to strain level. Often single biomarker sensitivity is augmented by use of multiple or panels of biomarkers. In parallel with single biomarker validation, advances in DNA sequencing enable analysis of entire genomes in a single run: population-sequencing. Potentially, direct sequencing could be used to analyze an entire genome to serve as the biomarker for genome identification. However, genome variation and population diversity complicate use of direct sequencing, as well as differences caused by sample preparation protocols including sequencing artifacts and mistakes. As part of a Department of Homeland Security program in bacterial forensics, we examined how to implement whole genome sequencing (WGS) analysis as a judicially defensible forensic method for attributing microbial sample relatedness; and also to determine the strengths and limitations of whole genome sequence analysis in a forensics context. Herein, we demonstrate use of sequencing to provide genetic characterization of populations: direct sequencing of populations. PMID:24455204
Magnetic Resonance Arterial Spin Tagging for Non-Invasive Pharmacokinetic Analysis of Breast Cancer

DTIC Science & Technology

2000-10-01

sequence software that we had developed for this project. In addition, we revised the pulse sequences to utilize the high performance gradients (40 mT/m ...peak, 150 mT/m/ms rise) of the system. We believe these revised sequences will provide better arterial spin tagged data for perfusion measurement. All...

Complete genome sequence of Serratia plymuthica strain AS12

DOE Office of Scientific and Technical Information (OSTI.GOV)

Neupane, Saraswoti; Finlay, Roger D.; Alstrom, Sadhna

2012-01-01

A plant associated member of the family Enterobacteriaceae, Serratia plymuthica strain AS12 was isolated from rapeseed roots. It is of scientific interest due to its plant growth promoting and plant pathogen inhibiting ability. The genome of S. plymuthica AS12 comprises a 5,443,009 bp long circular chromosome, which consists of 4,952 protein-coding genes, 87 tRNA genes and 7 rRNA operons. This genome was sequenced within the 2010 DOE-JGI Community Sequencing Program (CSP2010) as part of the project entitled 'Genomics of four rapeseed plant growth promoting bacteria with antagonistic effect on plant pathogens'.
Malaria Genome Sequencing Project.

DTIC Science & Technology

2000-01-01

and the genomes of organisms that cause diseases such as syphilis (Treponema pallidum), ulcers (Helicobacter pylori), Lyme disease (Borrelia ...Parasitol Today 11: 1-4. Fraser CM, Casjens S, et al. (1997). Genomic sequence of a Lyme disease spirochaete, Borrelia burgdorferi. Nature 390: 580...of false-positives. It has been used as the gene finder for Borrelia burgdorferi (Fraser et al., 1997), Treponema pallidum (Fraser et al., 1998
Unified Engineering Software System

NASA Technical Reports Server (NTRS)

Purves, L. R.; Gordon, S.; Peltzman, A.; Dube, M.

1989-01-01

Collection of computer programs performs diverse functions in prototype engineering. NEXUS, NASA Engineering Extendible Unified Software system, is research set of computer programs designed to support full sequence of activities encountered in NASA engineering projects. Sequence spans preliminary design, design analysis, detailed design, manufacturing, assembly, and testing. Primarily addresses process of prototype engineering, task of getting single or small number of copies of product to work. Written in FORTRAN 77 and PROLOG.
The proteome: structure, function and evolution

PubMed Central

Fleming, Keiran; Kelley, Lawrence A; Islam, Suhail A; MacCallum, Robert M; Muller, Arne; Pazos, Florencio; Sternberg, Michael J.E

2006-01-01

This paper reports two studies to model the inter-relationships between protein sequence, structure and function. First, an automated pipeline to provide a structural annotation of proteomes in the major genomes is described. The results are stored in a database at Imperial College, London (3D-GENOMICS) that can be accessed at www.sbg.bio.ic.ac.uk. Analysis of the assignments to structural superfamilies provides evolutionary insights. 3D-GENOMICS is being integrated with related proteome annotation data at University College London and the European Bioinformatics Institute in a project known as e-protein (http://www.e-protein.org/). The second topic is motivated by the developments in structural genomics projects in which the structure of a protein is determined prior to knowledge of its function. We have developed a new approach PHUNCTIONER that uses the gene ontology (GO) classification to supervise the extraction of the sequence signal responsible for protein function from a structure-based sequence alignment. Using GO we can obtain profiles for a range of specificities described in the ontology. In the region of low sequence similarity (around 15%), our method is more accurate than assignment from the closest structural homologue. The method is also able to identify the specific residues associated with the function of the protein family. PMID:16524832
HUBBLE TARANTULA TREASURY PROJECT. V. THE STAR CLUSTER HODGE 301: THE OLD FACE OF 30 DORADUS

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cignoni, M.; Sabbi, E.; Marel, R. P. van der

Based on color–magnitude diagrams (CMDs) from the Hubble Space Telescope Hubble Tarantula Treasury Project (HTTP) survey, we present the star formation history of Hodge 301, the oldest star cluster in the Tarantula Nebula. The HTTP photometry extends faint enough to reach, for the first time, the cluster pre-main sequence (PMS) turn-on, where the PMS joins the main sequence. Using the location of this feature, along with synthetic CMDs generated with the latest PARSEC models, we find that Hodge 301 is older than previously thought, with an age between 26.5 and 31.5 Myr. From this age, we also estimate that between 38 and 61 Type II supernovae exploded in the region. The same age is derived from the main sequence turn-off, whereas the age derived from the post-main sequence stars is younger and between 20 and 25 Myr. Other relevant parameters are a total stellar mass of ≈8800 ± 800 M⊙ and average reddening E(B − V) ≈ 0.22–0.24 mag, with a differential reddening δE(B − V) ≈ 0.04 mag.
Hubble Tarantula Treasury Project V. The Star Cluster Hodge 301: The Old Face of 30 Doradus

NASA Astrophysics Data System (ADS)

Cignoni, M.; Sabbi, E.; van der Marel, R. P.; Lennon, D. J.; Tosi, M.; Grebel, E. K.; Gallagher, J. S., III; Aloisi, A.; de Marchi, G.; Gouliermis, D. A.; Larsen, S.; Panagia, N.; Smith, L. J.

2016-12-01

Based on color-magnitude diagrams (CMDs) from the Hubble Space Telescope Hubble Tarantula Treasury Project (HTTP) survey, we present the star formation history of Hodge 301, the oldest star cluster in the Tarantula Nebula. The HTTP photometry extends faint enough to reach, for the first time, the cluster pre-main sequence (PMS) turn-on, where the PMS joins the main sequence. Using the location of this feature, along with synthetic CMDs generated with the latest PARSEC models, we find that Hodge 301 is older than previously thought, with an age between 26.5 and 31.5 Myr. From this age, we also estimate that between 38 and 61 Type II supernovae exploded in the region. The same age is derived from the main sequence turn-off, whereas the age derived from the post-main sequence stars is younger and between 20 and 25 Myr. Other relevant parameters are a total stellar mass of ≈8800 ± 800 M⊙ and average reddening E(B - V) ≈ 0.22-0.24 mag, with a differential reddening δE(B - V) ≈ 0.04 mag. Based on observations with the NASA/ESA Hubble Space Telescope, obtained at the Space Telescope Science Institute, which is operated by AURA Inc., under NASA contract NAS 5-26555.
EST Express: PHP/MySQL based automated annotation of ESTs from expression libraries

PubMed Central

Smith, Robin P; Buchser, William J; Lemmon, Marcus B; Pardinas, Jose R; Bixby, John L; Lemmon, Vance P

2008-01-01

Background Several biological techniques result in the acquisition of functional sets of cDNAs that must be sequenced and analyzed. The emergence of redundant databases such as UniGene and centralized annotation engines such as Entrez Gene has allowed the development of software that can analyze a great number of sequences in a matter of seconds. Results We have developed "EST Express", a suite of analytical tools that identify and annotate ESTs originating from specific mRNA populations. The software consists of a user-friendly GUI powered by PHP and MySQL that allows for online collaboration between researchers and continuity with UniGene, Entrez Gene and RefSeq. Two key features of the software include a novel, simplified Entrez Gene parser and tools to manage cDNA library sequencing projects. We have tested the software on a large data set (2,016 samples) produced by subtractive hybridization. Conclusion EST Express is an open-source, cross-platform web server application that imports sequences from cDNA libraries, such as those generated through subtractive hybridization or yeast two-hybrid screens. It then provides several layers of annotation based on Entrez Gene and RefSeq to allow the user to highlight useful genes and manage cDNA library projects. PMID:18402700
EST Express: PHP/MySQL based automated annotation of ESTs from expression libraries.

PubMed

Smith, Robin P; Buchser, William J; Lemmon, Marcus B; Pardinas, Jose R; Bixby, John L; Lemmon, Vance P

2008-04-10

Several biological techniques result in the acquisition of functional sets of cDNAs that must be sequenced and analyzed. The emergence of redundant databases such as UniGene and centralized annotation engines such as Entrez Gene has allowed the development of software that can analyze a great number of sequences in a matter of seconds. We have developed "EST Express", a suite of analytical tools that identify and annotate ESTs originating from specific mRNA populations. The software consists of a user-friendly GUI powered by PHP and MySQL that allows for online collaboration between researchers and continuity with UniGene, Entrez Gene and RefSeq. Two key features of the software include a novel, simplified Entrez Gene parser and tools to manage cDNA library sequencing projects. We have tested the software on a large data set (2,016 samples) produced by subtractive hybridization. EST Express is an open-source, cross-platform web server application that imports sequences from cDNA libraries, such as those generated through subtractive hybridization or yeast two-hybrid screens. It then provides several layers of annotation based on Entrez Gene and RefSeq to allow the user to highlight useful genes and manage cDNA library projects.
MALINA: a web service for visual analytics of human gut microbiota whole-genome metagenomic reads.

PubMed

Tyakht, Alexander V; Popenko, Anna S; Belenikin, Maxim S; Altukhov, Ilya A; Pavlenko, Alexander V; Kostryukova, Elena S; Selezneva, Oksana V; Larin, Andrei K; Karpova, Irina Y; Alexeev, Dmitry G

2012-12-07

MALINA is a web service for bioinformatic analysis of whole-genome metagenomic data obtained from human gut microbiota sequencing. As input data, it accepts metagenomic reads of various sequencing technologies, including long reads (such as Sanger and 454 sequencing) and next-generation reads (including SOLiD and Illumina). To the authors' knowledge, it is the first metagenomic web service that is capable of processing SOLiD color-space reads. The web service allows phylogenetic and functional profiling of metagenomic samples using coverage depth resulting from the alignment of the reads to the catalogue of reference sequences, which are built into the pipeline and contain prevalent microbial genomes and genes of human gut microbiota. The obtained metagenomic composition vectors are processed by the statistical analysis and visualization module containing methods for clustering, dimension reduction and group comparison. Additionally, the MALINA database includes vectors of bacterial and functional composition for human gut microbiota samples from a large number of existing studies, allowing their comparative analysis together with user samples, namely datasets from the Russian Metagenome project, MetaHIT and the Human Microbiome Project (downloaded from http://hmpdacc.org). MALINA is made freely available on the web at http://malina.metagenome.ru. The website is implemented in JavaScript (using Ext JS), Microsoft .NET Framework, MS SQL, Python, with all major browsers supported.
Evaluating the efficacy of a structure-derived amino acid substitution matrix in detecting protein homologs by BLAST and PSI-BLAST.

PubMed

Goonesekere, Nalin Cw

2009-01-01

The large numbers of protein sequences generated by whole genome sequencing projects require rapid and accurate methods of annotation. The detection of homology through computational sequence analysis is a powerful tool in determining the complex evolutionary and functional relationships that exist between proteins. Homology search algorithms employ amino acid substitution matrices to detect similarity between protein sequences. The substitution matrices in common use today are constructed using sequences aligned without reference to protein structure. Here we present amino acid substitution matrices constructed from the alignment of a large number of protein domain structures from the structural classification of proteins (SCOP) database. We show that when incorporated into the homology search algorithms BLAST and PSI-BLAST, the structure-based substitution matrices enhance the efficacy of detecting remote homologs.
Technical Report on Modeling for Quasispecies Abundance Inference with Confidence Intervals from Metagenomic Sequence Data

DOE Office of Scientific and Technical Information (OSTI.GOV)

McLoughlin, K.

2016-01-11

The overall aim of this project is to develop a software package, called MetaQuant, that can determine the constituents of a complex microbial sample and estimate their relative abundances by analysis of metagenomic sequencing data. The goal for Task 1 is to create a generative model describing the stochastic process underlying the creation of sequence read pairs in the data set. The stages in this generative process include the selection of a source genome sequence for each read pair, with probability dependent on its abundance in the sample. The other stages describe the evolution of the source genome from its nearest common ancestor with a reference genome, breakage of the source DNA into short fragments, and the errors in sequencing the ends of the fragments to produce read pairs.
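The stages of such a generative process can be sketched in a few lines. This is a simplified illustration and not MetaQuant's model: the function name, the uniform fragment position, the substitution-only error model, and the fixed fragment length are all assumptions, the evolution stage is omitted, and both reads are reported in forward orientation for brevity.

```python
import random


def simulate_read_pair(genomes, abundances, frag_len, read_len, error_rate, rng):
    """Draw one read pair: pick a genome, cut a fragment, read both ends with errors."""
    # Stage 1: choose the source genome with probability proportional to abundance.
    names = list(genomes)
    name = rng.choices(names, weights=[abundances[g] for g in names], k=1)[0]
    sequence = genomes[name]

    # Stage 2: break out a fragment at a uniformly random position (assumption).
    start = rng.randrange(len(sequence) - frag_len + 1)
    fragment = sequence[start:start + frag_len]

    # Stage 3: read both fragment ends, substituting bases at the given error rate.
    def noisy(read):
        return "".join(
            rng.choice([b for b in "ACGT" if b != base])
            if rng.random() < error_rate else base
            for base in read
        )

    return name, noisy(fragment[:read_len]), noisy(fragment[-read_len:])


# Hypothetical toy genomes and abundances, used only to exercise the function.
toy_genomes = {"g1": "ACGT" * 10, "g2": "TTGCA" * 8}
toy_abundances = {"g1": 0.9, "g2": 0.1}
print(simulate_read_pair(toy_genomes, toy_abundances, 12, 5, 0.01, random.Random(0)))
```

Abundance inference then runs this process in reverse: given observed read pairs, it asks which abundance values make them most probable.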
Sequencing of GJB2 in Cameroonians and Black South Africans and comparison to 1000 Genomes Project Data Support Need to Revise Strategy for Discovery of Nonsyndromic Deafness Genes in Africans.

PubMed

Bosch, Jason; Noubiap, Jean Jacques N; Dandara, Collet; Makubalo, Nomlindo; Wright, Galen; Entfellner, Jean-Baka Domelevo; Tiffin, Nicki; Wonkam, Ambroise

2014-11-01

Mutations in the GJB2 gene, encoding connexin 26, could account for 50% of congenital, nonsyndromic, recessive deafness cases in some Caucasian/Asian populations. There is a scarcity of published data in sub-Saharan Africans. We Sanger sequenced the coding region of the GJB2 gene in 205 Cameroonian and Xhosa South Africans with congenital, nonsyndromic deafness; and performed bioinformatic analysis of variations in the GJB2 gene, incorporating data from the 1000 Genomes Project. Amongst Cameroonian patients, 26.1% were familial. The majority of patients (70%) suffered from sensorineural hearing loss. Ten GJB2 genetic variants were detected by sequencing. A previously reported pathogenic mutation, g.3741_3743delTTC (p.F142del), and a putative pathogenic mutation, g.3816G>A (p.V167M), were identified in single heterozygous samples. Amongst the remaining eight variants, two novel variants, g.3318-41G>A and g.3332G>A, were reported. There were no statistically significant differences in allele frequencies between cases and controls. Principal Components Analyses differentiated between Africans, Asians, and Europeans, but only explained 40% of the variation. The present study is the first to compare African GJB2 sequences with the data from the 1000 Genomes Project and has revealed the low variation between population groups. This finding has emphasized the hypothesis that the prevalence of mutations in GJB2 in nonsyndromic deafness amongst European and Asian populations is due to founder effects arising after these individuals migrated out of Africa, and not to a putative "protective" variant in the genomic structure of GJB2 in Africans. Our results confirm that mutations in GJB2 are not associated with nonsyndromic deafness in Africans.
HLA Diversity in the 1000 Genomes Dataset

PubMed Central

Gourraud, Pierre-Antoine; Khankhanian, Pouya; Cereb, Nezih; Yang, Soo Young; Feolo, Michael; Maiers, Martin; Rioux, John D.; Hauser, Stephen; Oksenberg, Jorge

2014-01-01

The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation by sequencing at a level that should allow the genome-wide detection of most variants with frequencies as low as 1%. However, in the major histocompatibility complex (MHC), only the top 10 most frequent haplotypes are in the 1% frequency range whereas thousands of haplotypes are present at lower frequencies. Given the limitation of both the coverage and the read length of the sequences generated by the 1000 Genomes Project, the highly variable positions that define HLA alleles may be difficult to identify. We used classical Sanger sequencing techniques to type the HLA-A, HLA-B, HLA-C, HLA-DRB1 and HLA-DQB1 genes in the available 1000 Genomes samples and combined the results with the 103,310 variants in the MHC region genotyped by the 1000 Genomes Project. Using pairwise identity-by-descent distances between individuals and principal component analysis, we established the relationship between ancestry and genetic diversity in the MHC region. As expected, both the MHC variants and the HLA phenotype can identify the major ancestry lineage, informed mainly by the most frequent HLA haplotypes. To some extent, regions of the genome with similar genetic or similar recombination rate have similar properties. An MHC-centric analysis underlines departures between the ancestral background of the MHC and the genome-wide picture. Our analysis of linkage disequilibrium (LD) decay in these samples suggests that overestimation of pairwise LD occurs due to a limited sampling of the MHC diversity. This collection of HLA-specific MHC variants, available on the dbMHC portal, is a valuable resource for future analyses of the role of MHC in population and disease studies. PMID:24988075
HLA diversity in the 1000 genomes dataset.

PubMed

Gourraud, Pierre-Antoine; Khankhanian, Pouya; Cereb, Nezih; Yang, Soo Young; Feolo, Michael; Maiers, Martin; Rioux, John D; Hauser, Stephen; Oksenberg, Jorge

2014-01-01

The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation by sequencing at a level that should allow the genome-wide detection of most variants with frequencies as low as 1%. However, in the major histocompatibility complex (MHC), only the top 10 most frequent haplotypes are in the 1% frequency range whereas thousands of haplotypes are present at lower frequencies. Given the limitation of both the coverage and the read length of the sequences generated by the 1000 Genomes Project, the highly variable positions that define HLA alleles may be difficult to identify. We used classical Sanger sequencing techniques to type the HLA-A, HLA-B, HLA-C, HLA-DRB1 and HLA-DQB1 genes in the available 1000 Genomes samples and combined the results with the 103,310 variants in the MHC region genotyped by the 1000 Genomes Project. Using pairwise identity-by-descent distances between individuals and principal component analysis, we established the relationship between ancestry and genetic diversity in the MHC region. As expected, both the MHC variants and the HLA phenotype can identify the major ancestry lineage, informed mainly by the most frequent HLA haplotypes. To some extent, regions of the genome with similar genetic or similar recombination rate have similar properties. An MHC-centric analysis underlines departures between the ancestral background of the MHC and the genome-wide picture. Our analysis of linkage disequilibrium (LD) decay in these samples suggests that overestimation of pairwise LD occurs due to a limited sampling of the MHC diversity. This collection of HLA-specific MHC variants, available on the dbMHC portal, is a valuable resource for future analyses of the role of MHC in population and disease studies.
Next Generation Sequencing Technologies: The Doorway to the Unexplored Genomics of Non-Model Plants

PubMed Central

Unamba, Chibuikem I. N.; Nag, Akshay; Sharma, Ram K.

2015-01-01

Non-model plants, i.e., species that have one or all of the characters such as a long life cycle, difficulty to grow in the laboratory or poor fecundity, were previously left out of sequencing projects due to the high running cost of Sanger sequencing. Consequently, the information about their genomics and key biological processes is inadequate. However, the advent of fast and cost effective next generation sequencing (NGS) platforms in the recent past has enabled the unearthing of certain characteristic gene structures unique to these species. It has also aided in gaining insight about mechanisms underlying processes of gene expression and secondary metabolism as well as facilitated development of genomic resources for diversity characterization, evolutionary analysis and marker assisted breeding even without prior availability of genomic sequence information. In this review we explore how different Next Gen Sequencing platforms, as well as recent advances in NGS based high throughput genotyping technologies are rewarding efforts on de-novo whole genome/transcriptome sequencing, development of genome wide sequence based markers resources for improvement of non-model crops that are less costly than phenotyping. PMID:26734016
Mutation detection using automated fluorescence-based sequencing.

PubMed

Montgomery, Kate T; Iartchouck, Oleg; Li, Li; Perera, Anoja; Yassin, Yosuf; Tamburino, Alex; Loomis, Stephanie; Kucherlapati, Raju

2008-04-01

The development of high-throughput DNA sequencing techniques has made direct DNA sequencing of PCR-amplified genomic DNA a rapid and economical approach to the identification of polymorphisms that may play a role in disease. Point mutations as well as small insertions or deletions are readily identified by DNA sequencing. The mutations may be heterozygous (occurring in one allele while the other allele retains the normal sequence) or homozygous (occurring in both alleles). Sequencing alone cannot discriminate between true homozygosity and apparent homozygosity due to the loss of one allele due to a large deletion. In this unit, strategies are presented for using PCR amplification and automated fluorescence-based sequencing to identify sequence variation. The size of the project and laboratory preference and experience will dictate how the data is managed and which software tools are used for analysis. A high-throughput protocol is given that has been used to search for mutations in over 200 different genes at the Harvard Medical School - Partners Center for Genetics and Genomics (HPCGG, http://www.hpcgg.org/). Copyright 2008 by John Wiley & Sons, Inc.
Genomic sequencing: assessing the health care system, policy, and big-data implications.

PubMed

Phillips, Kathryn A; Trosman, Julia R; Kelley, Robin K; Pletcher, Mark J; Douglas, Michael P; Weldon, Christine B

2014-07-01

New genomic sequencing technologies enable the high-speed analysis of multiple genes simultaneously, including all of those in a person's genome. Sequencing is a prominent example of a "big data" technology because of the massive amount of information it produces and its complexity, diversity, and timeliness. Our objective in this article is to provide a policy primer on sequencing and illustrate how it can affect health care system and policy issues. Toward this end, we developed an easily applied classification of sequencing based on inputs, methods, and outputs. We used it to examine the implications of sequencing for three health care system and policy issues: making care more patient-centered, developing coverage and reimbursement policies, and assessing economic value. We conclude that sequencing has great promise but that policy challenges include how to optimize patient engagement as well as privacy, develop coverage policies that distinguish research from clinical uses and account for bioinformatics costs, and determine the economic value of sequencing through complex economic models that take into account multiple findings and downstream costs. Project HOPE—The People-to-People Health Foundation, Inc.
The EMBL nucleotide sequence database

PubMed Central

Stoesser, Guenter; Baker, Wendy; van den Broek, Alexandra; Camon, Evelyn; Garcia-Pastor, Maria; Kanz, Carola; Kulikova, Tamara; Lombard, Vincent; Lopez, Rodrigo; Parkinson, Helen; Redaschi, Nicole; Sterk, Peter; Stoehr, Peter; Tuli, Mary Ann

2001-01-01

The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank at the NCBI (USA). Data is exchanged amongst the collaborating databases on a daily basis. The major contributors to the EMBL database are individual authors and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via ftp, email and World Wide Web interfaces. EBI’s Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many specialized databases. For sequence similarity searching a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT. PMID:11125039
Constructing DNA Barcode Sets Based on Particle Swarm Optimization.

PubMed

Wang, Bin; Zheng, Xuedong; Zhou, Shihua; Zhou, Changjun; Wei, Xiaopeng; Zhang, Qiang; Wei, Ziqi

2018-01-01

Following the completion of the human genome project, a large amount of high-throughput bio-data was generated. To analyze these data, massively parallel sequencing, namely next-generation sequencing, was rapidly developed. DNA barcodes are used to identify the ownership between sequences and samples when they are attached at the beginning or end of sequencing reads. Constructing DNA barcode sets provides the candidate DNA barcodes for this application. To increase the accuracy of DNA barcode sets, a particle swarm optimization (PSO) algorithm has been modified and used to construct the DNA barcode sets in this paper. Compared with the extant results, some lower bounds of DNA barcode sets are improved. The results show that the proposed algorithm is effective in constructing DNA barcode sets.
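The constraint that a DNA barcode set must satisfy can be shown with a short sketch. The abstract's method is a modified particle swarm optimization; the code below deliberately swaps in a much simpler greedy lexicographic search, and it assumes a minimum pairwise Hamming distance as the only constraint (real barcode designs often add GC-content or homopolymer limits). The function names are illustrative.

```python
from itertools import product


def hamming(a, b):
    """Count positions at which two equal-length barcodes differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_barcode_set(length, min_distance):
    """Greedily collect barcodes whose pairwise Hamming distance is >= min_distance.

    Candidates are visited in lexicographic order over A, C, G, T; this is a
    simple baseline, not the particle swarm optimization from the abstract.
    """
    chosen = []
    for candidate in map("".join, product("ACGT", repeat=length)):
        if all(hamming(candidate, c) >= min_distance for c in chosen):
            chosen.append(candidate)
    return chosen


# With length 3 and distance 3, every position must differ between barcodes.
print(greedy_barcode_set(3, 3))
```

The size of the set returned for a given length and distance is a lower bound on the largest achievable set, which is the quantity a heuristic such as PSO tries to push upward.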
A Workshop Report on Wheat Genome Sequencing

PubMed Central

Gill, Bikram S.; Appels, Rudi; Botha-Oberholster, Anna-Maria; Buell, C. Robin; Bennetzen, Jeffrey L.; Chalhoub, Boulos; Chumley, Forrest; Dvořák, Jan; Iwanaga, Masaru; Keller, Beat; Li, Wanlong; McCombie, W. Richard; Ogihara, Yasunari; Quetier, Francis; Sasaki, Takuji

2004-01-01

Sponsored by the National Science Foundation and the U.S. Department of Agriculture, a wheat genome sequencing workshop was held November 10–11, 2003, in Washington, DC. It brought together 63 scientists of diverse research interests and institutions, including 45 from the United States and 18 from a dozen foreign countries (see list of participants at http://www.ksu.edu/igrow). The objectives of the workshop were to discuss the status of wheat genomics, obtain feedback from ongoing genome sequencing projects, and develop strategies for sequencing the wheat genome. The purpose of this report is to convey the information discussed at the workshop and provide the basis for an ongoing dialogue, bringing forth comments and suggestions from the genetics community. PMID:15514080

Complete genome sequence of Aminobacterium colombiense type strain (ALA-1T)

PubMed Central

Chertkov, Olga; Sikorski, Johannes; Brambilla, Evelyne; Lapidus, Alla; Copeland, Alex; Glavina Del Rio, Tijana; Nolan, Matt; Lucas, Susan; Tice, Hope; Cheng, Jan-Fang; Han, Cliff; Detter, John C.; Bruce, David; Tapia, Roxanne; Goodwin, Lynne; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Spring, Stefan; Rohde, Manfred; Göker, Markus; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter

2010-01-01

Aminobacterium colombiense Baena et al. 1999 is the type species of the genus Aminobacterium. This genus is of large interest because of its isolated phylogenetic location in the family Synergistaceae, its strictly anaerobic lifestyle, and its ability to grow by fermentation of a limited range of amino acids but not carbohydrates. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the second completed genome sequence of a member of the family Synergistaceae and the first genome sequence of a member of the genus Aminobacterium. The 1,980,592 bp long genome with its 1,914 protein-coding and 56 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304712
Report for the NGFA-5 project.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jaing, C; Jackson, P; Thissen, J

The objective of this project is to provide DHS a comprehensive evaluation of the current genomic technologies including genotyping, TaqMan PCR, multiple locus variable tandem repeat analysis (MLVA), microarray and high-throughput DNA sequencing in the analysis of biothreat agents from complex environmental samples. To effectively compare the sensitivity and specificity of the different genomic technologies, we used SNP TaqMan PCR, MLVA, microarray and high-throughput Illumina and 454 sequencing to test various strains from B. anthracis, B. thuringiensis, BioWatch aerosol filter extracts or soil samples that were spiked with B. anthracis, and samples that were previously collected during DHS and EPA environmental release exercises that were known to contain B. thuringiensis spores. The results of all the samples against the various assays are discussed in this report.
From sequencing to annotating: extending the metaphor of the book of life from genetics to genomics.

PubMed

Hellsten, Iina

2005-12-01

The article discusses how the metaphor of the Book of Life was extended over time to cover the life cycle of the Human Genome Project from genetics to genomics. In particular, the focus is on the role of extendable metaphors in the debate on the Human Genome Project in three European newspapers, popular scientific journals and scientific and scholarly articles from 1990 to 2002. In these different domains of use, various parts of the metaphor were highlighted. The metaphor of Book of Life was mainly used to justify the continuation of the gene research from gene sequencing to comparative genomics. Readily extendable metaphors, such as the Book of Life, function as useful communicative tools both over time and across domains of use.
Draft genome of the medaka fish: a comprehensive resource for medaka developmental genetics and vertebrate evolutionary biology.

PubMed

Takeda, Hiroyuki

2008-06-01

The medaka Oryzias latipes is a small egg-laying freshwater teleost, and has become an excellent model system for developmental genetics and evolutionary biology. The medaka genome is relatively small in size, approximately 800 Mb, and the genome sequencing project was recently completed by Japanese research groups, providing a high-quality draft genome sequence of the inbred Hd-rR strain of medaka. In this review, I present an overview of the medaka genome project including genome resources, followed by specific findings obtained with the medaka draft genome. In particular, I focus on the analysis that was done by taking advantage of the medaka system, such as the sex chromosome differentiation and the regional history of medaka species using single nucleotide polymorphisms as genomic markers.
A descriptive investigation of the impact of student research projects arising from elective research courses.

PubMed

Harirforoosh, Sam; Stewart, David W

2016-01-27

Pharmacy academicians have noted the need to develop research skills in student pharmacists. At the Gatton College of Pharmacy, significant focus has been placed on the development of research skills through offering elective research courses. In order to evaluate the impact of participation in the research elective(s), we analyzed college records and surveyed faculty members with regard to the number of poster/podium presentations, published peer-reviewed manuscripts, and funded projects. Student enrollment in the research elective sequence has increased over time and has resulted in 81 poster presentations, 14 podium presentations, and 15 peer-reviewed publications. Implementation of a research elective sequence and fostering of a research culture amongst the faculty and students has resulted in increased student engagement in research and related scholarly activities.
Gold nanoparticles for high-throughput genotyping of long-range haplotypes

NASA Astrophysics Data System (ADS)

Chen, Peng; Pan, Dun; Fan, Chunhai; Chen, Jianhua; Huang, Ke; Wang, Dongfang; Zhang, Honglu; Li, You; Feng, Guoyin; Liang, Peiji; He, Lin; Shi, Yongyong

2011-10-01

Completion of the Human Genome Project and the HapMap Project has led to increasing demands for mapping complex traits in humans to understand the aetiology of diseases. Identifying variations in the DNA sequence, which affect how we develop disease and respond to pathogens and drugs, is important for this purpose, but it is difficult to identify these variations in large sample sets. Here we show that through a combination of capillary sequencing and polymerase chain reaction assisted by gold nanoparticles, it is possible to identify several DNA variations that are associated with age-related macular degeneration and psoriasis on significant regions of human genomic DNA. Our method is accurate and promising for large-scale and high-throughput genetic analysis of susceptibility towards disease and drug resistance.
A comprehensive crop genome research project: the Superhybrid Rice Genome Project in China.

PubMed

Yu, Jun; Wong, Gane Ka-Shu; Liu, Siqi; Wang, Jian; Yang, Huanming

2007-06-29

In May 2000, the Beijing Institute of Genomics formally announced the launch of a comprehensive crop genome research project on rice genomics, the Chinese Superhybrid Rice Genome Project. SRGP is not simply a sequencing project targeted to a single rice (Oryza sativa L.) genome, but a full-swing research effort with an ultimate goal of providing inclusive basic genomic information and molecular tools not only to understand biology of the rice, both as an important crop species and a model organism of cereals, but also to focus on a popular superhybrid rice landrace, LYP9. We have completed the first phase of SRGP and provide the rice research community with a finished genome sequence of an indica variety, 93-11 (the paternal cultivar of LYP9), together with ample data on subspecific (between subspecies) polymorphisms, transcriptomes and proteomes, useful for within-species comparative studies. In the second phase, we have acquired the genome sequence of the maternal cultivar, PA64S, together with the detailed catalogues of genes uniquely expressed in the parental cultivars and the hybrid as well as allele-specific markers that distinguish parental alleles. Although SRGP in China is not an open-ended research programme, it has been designed to pave a way for future plant genomics research and application, such as to interrogate fundamentals of plant biology, including genome duplication, polyploidy and hybrid vigour, as well as to provide genetic tools for crop breeding and to carry along a social burden-leading a fight against the world's hunger. It began with genomics, the newly developed and industry-scale research field, and from the world's most populous country. In this review, we summarize our scientific goals and noteworthy discoveries that exploit new territories of systematic investigations on basic and applied biology of rice and other major cereal crops.
Discovery of common sequences absent in the human reference genome using pooled samples from next generation sequencing.

PubMed

Liu, Yu; Koyutürk, Mehmet; Maxwell, Sean; Xiang, Min; Veigl, Martina; Cooper, Richard S; Tayo, Bamidele O; Li, Li; LaFramboise, Thomas; Wang, Zhenghe; Zhu, Xiaofeng; Chance, Mark R

2014-08-16

Sequences up to several megabases in length have been found to be present in individual genomes but absent in the human reference genome. These sequences may be common in populations, and their absence in the reference genome may indicate rare variants in the genomes of individuals who served as donors for the human genome project. As the reference genome is used in probe design for microarray technology and mapping short reads in next generation sequencing (NGS), this missing sequence could be a source of bias in functional genomic studies and variant analysis. One End Anchor (OEA) and/or orphan reads from paired-end sequencing have been used to identify novel sequences that are absent in the reference genome. However, no study has investigated the distribution, evolution and functionality of those sequences in human populations. To systematically identify and study the missing common sequences (micSeqs), we extended the previous method by pooling OEA reads from a large number of individuals and applying strict filtering methods to remove false sequences. The pipeline was applied to data from phase 1 of the 1000 Genomes Project. We identified 309 micSeqs that are present in at least 1% of the human population, but absent in the reference genome. We confirmed 76% of these 309 micSeqs by comparison to other primate genomes, individual human genomes, and gene expression data. Furthermore, we randomly selected fifteen micSeqs and confirmed their presence using PCR validation in 38 additional individuals. Functional analysis using published RNA-seq and ChIP-seq data showed that eleven micSeqs are highly expressed in human brain and three micSeqs contain transcription factor (TF) binding regions, suggesting they are functional elements.
In addition, the identified micSeqs are absent in non-primates and show dynamic acquisition during primate evolution culminating with most micSeqs being present in Africans, suggesting some micSeqs may be important sources of human diversity. 76% of micSeqs were confirmed by a comparative genomics approach. Fourteen micSeqs are expressed in human brain or contain TF binding regions. Some micSeqs are primate-specific, conserved and may play a role in the evolution of primates.
Facies analysis and sequence stratigraphic framework of upper Campanian strata (Neslen and Mount Garfield formations, Bluecastle Tongue of the Castlegate sandstone, and Mancos shale), Eastern Book cliffs, Colorado and Utah

USGS Publications Warehouse

Kirschbaum, Mark A.; Hettinger, Robert D.

2004-01-01

Facies and sequence-stratigraphic analysis identifies six high-resolution sequences within upper Campanian strata across about 120 miles of the Book Cliffs in western Colorado and eastern Utah. The six sequences are named after prominent sandstone units and include, in ascending order, upper Sego sequence, Neslen sequence, Corcoran sequence, Buck Canyon/lower Cozzette sequence, upper Cozzette sequence, and Cozzette/Rollins sequence. A seventh sequence, the Bluecastle sequence, is present in the extreme western part of the study area. Facies analysis documents deepening- and shallowing-upward successions, parasequence stacking patterns, downlap in subsurface cross sections, facies dislocations, basinward shifts in facies, and truncation of strata. All six sequences display major incision into shoreface deposits of the Sego Sandstone and sandstones of the Corcoran and Cozzette Members of the Mount Garfield Formation. The incised surfaces represent sequence-boundary unconformities that allowed bypass of sediment to lowstand shorelines that are either attached to the older highstand shorelines or are detached from the older highstand shorelines and located southeast of the main study area. The sequence boundary unconformities represent valley incisions that were cut during successive lowstands of relative sea level. The overlying valley-fill deposits generally consist of tidally influenced strata deposited during an overall base level rise. Transgressive surfaces can be traced or projected over, or locally into, estuarine deposits above and landward of their associated shoreface deposits. Maximum flooding surfaces can be traced or projected landward from offshore strata into, or above, coastal-plain deposits. With the exception of the Cozzette/Rollins sequence, the majority of coal-bearing coastal-plain strata was deposited before maximum flooding and is therefore within the transgressive systems tracts.
Maximum flooding was followed by strong progradation of parasequences and low preservation potential of coastal-plain strata within the highstand systems tract. The large incised valleys, lack of transgressive retrogradational parasequences, strong progradational nature of highstand parasequences, and low preservation of coastal-plain strata in the highstand systems tracts argue for relatively low accommodation space during deposition of the Sego, Corcoran, and Cozzette sequences. The Buck Canyon/Cozzette and Cozzette/Rollins sequences contrast with other sequences in that the preservation of retrogradational parasequences and the development of large estuaries coincident with maximum flooding indicate a relative increase in accommodation space during deposition of these strata. Following maximum flooding, the Buck Canyon/Cozzette sequence follows the pattern of the other sequences, but the Cozzette/Rollins sequence exhibits a contrasting offlapping pattern with development of offshore clinoforms that downlap and eventually parallel its maximum flooding surface. This highstand systems tract preserves a thick coal-bearing section where the Rollins Sandstone Member of the Mount Garfield Formation parasequences prograde out of the study area, stepping up as much as 800 ft stratigraphically over a distance of about 90 miles. This progradational stacking pattern indicates a higher accommodation space and increased sedimentation rate compared to the previous sequences.
Control method of Three-phase Four-leg converter based on repetitive control

NASA Astrophysics Data System (ADS)

Hui, Wang

2018-03-01

This research takes a magnetic levitation wind power generation system as its object. To address the power quality problem caused by unbalanced loads in the power supply system, we combined the characteristics of the magnetic levitation wind power generation system with the repetitive control principle and proposed an independent control strategy for a three-phase four-leg converter. Based on the symmetric component method, a second order generalized integrator was used to generate the positive and negative sequence signals, and decoupling control was carried out in the synchronous rotating reference frame, in which the positive and negative sequence voltages are regulated by PI double closed loops, and a PI regulator with repetitive control was introduced to eliminate the static error arising from the fundamental frequency fluctuation characteristic of the zero sequence component. The simulation results based on Matlab/Simulink show that the proposed control scheme can effectively suppress the disturbance caused by unbalanced loads and maintain the load voltage balance. The scheme is easy to implement and remarkably improves the quality of the independent power supply system.
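The symmetric component method mentioned above can be illustrated with the standard phasor (Fortescue) transform, which splits three phase phasors into zero, positive and negative sequence components. This is a steady-state sketch only: the abstract's controller extracts these components in the time domain with a second order generalized integrator, which is not reproduced here, and the function name is an assumption.

```python
import cmath

# Rotation operator a = exp(j*2*pi/3), a 120 degree phase advance.
A = cmath.exp(2j * cmath.pi / 3)


def symmetric_components(va, vb, vc):
    """Return (zero, positive, negative) sequence phasors of phases a, b, c."""
    zero = (va + vb + vc) / 3
    positive = (va + A * vb + A**2 * vc) / 3
    negative = (va + A**2 * vb + A * vc) / 3
    return zero, positive, negative


# A balanced set (b lags a by 120 degrees, c leads a by 120 degrees)
# contains only a positive sequence component.
balanced = symmetric_components(1 + 0j, A**2, A)
print([round(abs(component), 6) for component in balanced])
```

An unbalanced load shows up as non-zero negative and zero sequence terms, which are the quantities the separate control loops described in the abstract are meant to suppress.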
Leptospiral Pathogenomics

PubMed Central

Lehmann, Jason S.; Matthias, Michael A.; Vinetz, Joseph M.; Fouts, Derrick E.

2014-01-01

Leptospirosis, caused by pathogenic spirochetes belonging to the genus Leptospira, is a zoonosis with important impacts on human and animal health worldwide. Research on the mechanisms of Leptospira pathogenesis has been hindered due to slow growth of infectious strains, poor transformability, and a paucity of genetic tools. As a result of second generation sequencing technologies, there has been an acceleration of leptospiral genome sequencing efforts in the past decade, which has enabled a concomitant increase in functional genomics analyses of Leptospira pathogenesis. A pathogenomics approach, by coupling of pan-genomic analysis of multiple isolates with sequencing of experimentally attenuated highly pathogenic Leptospira, has resulted in the functional inference of virulence factors. The global Leptospira Genome Project supported by the U.S. National Institute of Allergy and Infectious Diseases to which key scientific contributions have been made from the international leptospirosis research community has provided a new roadmap for comprehensive studies of Leptospira and leptospirosis well into the future. This review describes functional genomics approaches to apply the data generated by the Leptospira Genome Project towards deepening our knowledge of virulence factors of Leptospira using the emerging discipline of pathogenomics. PMID:25437801
Preliminary report for analysis of genome wide mutations from four ciprofloxacin resistant B. anthracis Sterne isolates generated by Illumina, 454 sequencing and microarrays for DHS

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jaing, Crystal; Vergez, Lisa; Hinckley, Aubree

2011-06-21

The objective of this project is to provide DHS a comprehensive evaluation of the current genomic technologies including genotyping, Taqman PCR, multiple locus variable tandem repeat analysis (MLVA), microarray and high-throughput DNA sequencing in the analysis of biothreat agents from complex environmental samples. As the result of a different DHS project, we have selected for and isolated a large number of ciprofloxacin resistant B. anthracis Sterne isolates. These isolates vary in the concentrations of ciprofloxacin that they can tolerate, suggesting multiple mutations in the samples. In collaboration with University of Houston, Eureka Genomics and Oak Ridge National Laboratory, we analyzed the ciprofloxacin resistant B. anthracis Sterne isolates by microarray hybridization, Illumina and Roche 454 sequencing to understand the error rates and sensitivity of the different methods. The report provides an assessment of the results and a complete set of all protocols used and all data generated along with information to interpret the protocols and data sets.
Connecting the Human Variome Project to nutrigenomics.

PubMed

Kaput, Jim; Evelo, Chris T; Perozzi, Giuditta; van Ommen, Ben; Cotton, Richard

2010-12-01

Nutrigenomics is the science of analyzing and understanding gene-nutrient interactions, which because of the genetic heterogeneity, varying degrees of interaction among gene products, and the environmental diversity is a complex science. Although much knowledge of human diversity has been accumulated, estimates suggest that ~90% of genetic variation has not yet been characterized. Identification of the DNA sequence variants that contribute to nutrition-related disease risk is essential for developing a better understanding of the complex causes of disease in humans, including nutrition-related disease. The Human Variome Project (HVP; http://www.humanvariomeproject.org/) is an international effort to systematically identify genes, their mutations, and their variants associated with phenotypic variability and indications of human disease or phenotype. Since nutrigenomic research uses genetic information in the design and analysis of experiments, the HVP is an essential collaborator for ongoing studies of gene-nutrient interactions. With the advent of next generation sequencing methodologies and the understanding of the undiscovered variation in human genomes, the nutrigenomic community will be generating novel sequence data and results. The guidelines and practices of the HVP can guide and harmonize these efforts.
Connecting the Human Variome Project to nutrigenomics

PubMed Central

Evelo, Chris T.; Perozzi, Giuditta; van Ommen, Ben; Cotton, Richard

2010-01-01

Nutrigenomics is the science of analyzing and understanding gene–nutrient interactions, which because of the genetic heterogeneity, varying degrees of interaction among gene products, and the environmental diversity is a complex science. Although much knowledge of human diversity has been accumulated, estimates suggest that ~90% of genetic variation has not yet been characterized. Identification of the DNA sequence variants that contribute to nutrition-related disease risk is essential for developing a better understanding of the complex causes of disease in humans, including nutrition-related disease. The Human Variome Project (HVP; http://www.humanvariomeproject.org/) is an international effort to systematically identify genes, their mutations, and their variants associated with phenotypic variability and indications of human disease or phenotype. Since nutrigenomic research uses genetic information in the design and analysis of experiments, the HVP is an essential collaborator for ongoing studies of gene–nutrient interactions. With the advent of next generation sequencing methodologies and the understanding of the undiscovered variation in human genomes, the nutrigenomic community will be generating novel sequence data and results. The guidelines and practices of the HVP can guide and harmonize these efforts. PMID:28300226
MACARON: A python framework to identify and re-annotate multi-base affected codons in whole genome/exome sequence data.

PubMed

Khan, Waqasuddin; Saripella, Ganapathi Varma-; Ludwig, Thomas; Cuppens, Tania; Thibord, Florian; Génin, Emmanuelle; Deleuze, Jean-Francois; Trégouët, David-Alexandre

2018-05-03

Predicted deleteriousness of coding variants is a frequently used criterion to filter out variants detected in next-generation sequencing projects and to select candidates impacting on the risk of human diseases. Most available dedicated tools implement a base-to-base annotation approach that could be biased in the presence of several variants in the same genetic codon. We here propose the MACARON program that, from a standard VCF file, identifies, re-annotates and predicts the amino acid change resulting from multiple single nucleotide variants (SNVs) within the same genetic codon. Applied to the whole exome dataset of 573 individuals, MACARON identifies 114 situations where multiple SNVs within a genetic codon induce an amino acid change that is different from those predicted by standard single-SNV annotation tools. Such events are not uncommon and deserve to be studied in sequencing projects with inconclusive findings. MACARON is written in Python with code available on the GENMED website (www.genmed.fr). Contact: david-alexandre.tregouet@inserm.fr. Supplementary data are available at Bioinformatics online.
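The bias described in this abstract can be illustrated with a minimal Python sketch. This is not MACARON's code: the function names, the four-entry codon table and the example codon are hypothetical, chosen only to show how annotating two SNVs in one codon separately can differ from annotating them jointly.

```python
# Toy subset of the standard genetic code (only the codons used below).
CODON_TABLE = {
    "AAA": "K",  # lysine
    "GAA": "E",  # glutamate
    "AGA": "R",  # arginine
    "GGA": "G",  # glycine
}

def apply_snvs(codon, snvs):
    """Return the codon after applying (position, alt_base) substitutions."""
    bases = list(codon)
    for pos, alt in snvs:
        bases[pos] = alt
    return "".join(bases)

def annotate(codon, snvs):
    """Return (per-SNV amino acids, amino acid with all SNVs applied together)."""
    single = [CODON_TABLE[apply_snvs(codon, [snv])] for snv in snvs]
    joint = CODON_TABLE[apply_snvs(codon, snvs)]
    return single, joint

# Hypothetical example: two SNVs at positions 0 and 1 of the codon AAA.
single, joint = annotate("AAA", [(0, "G"), (1, "G")])
print(single, joint)  # ['E', 'R'] G
```

Annotated one at a time, the two variants predict Lys>Glu and Lys>Arg; applied together they yield Lys>Gly, which neither single-SNV prediction reports.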
The neural dynamics of song syntax in songbirds

NASA Astrophysics Data System (ADS)

Jin, Dezhe

2010-03-01

Songbird is "the hydrogen atom" of the neuroscience of complex, learned vocalizations such as human speech. Songs of Bengalese finch consist of sequences of syllables. While syllables are temporally stereotypical, syllable sequences can vary and follow complex, probabilistic syntactic rules, which are rudimentarily similar to grammars in human language. Songbird brain is accessible to experimental probes, and is understood well enough to construct biologically constrained, predictive computational models. In this talk, I will discuss the structure and dynamics of neural networks underlying the stereotypy of the birdsong syllables and the flexibility of syllable sequences. Recent experiments and computational models suggest that a syllable is encoded in a chain network of projection neurons in premotor nucleus HVC (proper name). Precisely timed spikes propagate along the chain, driving vocalization of the syllable through downstream nuclei. Through a computational model, I show that variable syllable sequences can be generated through spike propagations in a network in HVC in which the syllable-encoding chain networks are connected into a branching chain pattern. The neurons mutually inhibit each other through the inhibitory HVC interneurons, and are driven by external inputs from nuclei upstream of HVC. At a branching point that connects the final group of a chain to the first groups of several chains, the spike activity selects one branch to continue the propagation. The selection is probabilistic, and is due to the winner-take-all mechanism mediated by the inhibition and noise. The model predicts that the syllable sequences statistically follow partially observable Markov models. Experimental results supporting this and other predictions of the model will be presented. We suggest that the syntax of birdsong syllable sequences is embedded in the connection patterns of HVC projection neurons.
Mining biological databases for candidate disease genes

NASA Astrophysics Data System (ADS)

Braun, Terry A.; Scheetz, Todd; Webster, Gregg L.; Casavant, Thomas L.

2001-07-01

The publicly-funded effort to determine the complete nucleotide sequence of the human genome, the Human Genome Project (HGP), has currently produced more than 93% of the 3 billion nucleotides of the human genome in a preliminary 'draft' format. In addition, several valuable sources of information have been developed as direct and indirect results of the HGP. These include the sequencing of model organisms (rat, mouse, fly, and others), gene discovery projects (ESTs and full-length), and new technologies such as expression analysis and resources (micro-arrays or gene chips). These resources are invaluable for the researchers identifying the functional genes of the genome that transcribe and translate into the transcriptome and proteome, both of which potentially contain orders of magnitude more complexity than the genome itself. Preliminary analyses of this data identified approximately 30,000 - 40,000 human 'genes.' However, the bulk of the effort still remains -- to identify the functional and structural elements contained within the transcriptome and proteome, and to associate function in the transcriptome and proteome to genes. A fortuitous consequence of the HGP is the existence of hundreds of databases containing biological information that may contain relevant data pertaining to the identification of disease-causing genes. The task of mining these databases for information on candidate genes is a commercial application of enormous potential. We are developing a system to acquire and mine data from specific databases to aid our efforts to identify disease genes. A high speed cluster of Linux workstations is used to analyze sequence and perform distributed sequence alignments as part of our data mining and processing. This system has been used to mine GeneMap99 sequences within specific genomic intervals to identify potential candidate disease genes associated with Bardet-Biedl Syndrome (BBS).
The Personal Genome Project Canada: findings from whole genome sequences of the inaugural 56 participants

PubMed Central

Reuter, Miriam S.; Walker, Susan; Thiruvahindrapuram, Bhooma; Whitney, Joe; Cohn, Iris; Sondheimer, Neal; Yuen, Ryan K.C.; Trost, Brett; Paton, Tara A.; Pereira, Sergio L.; Herbrick, Jo-Anne; Wintle, Richard F.; Merico, Daniele; Howe, Jennifer; MacDonald, Jeffrey R.; Lu, Chao; Nalpathamkalam, Thomas; Sung, Wilson W.L.; Wang, Zhuozhi; Patel, Rohan V.; Pellecchia, Giovanna; Wei, John; Strug, Lisa J.; Bell, Sherilyn; Kellam, Barbara; Mahtani, Melanie M.; Bassett, Anne S.; Bombard, Yvonne; Weksberg, Rosanna; Shuman, Cheryl; Cohn, Ronald D.; Stavropoulos, Dimitri J.; Bowdin, Sarah; Hildebrandt, Matthew R.; Wei, Wei; Romm, Asli; Pasceri, Peter; Ellis, James; Ray, Peter; Meyn, M. Stephen; Monfared, Nasim; Hosseini, S. Mohsen; Joseph-George, Ann M.; Keeley, Fred W.; Cook, Ryan A.; Fiume, Marc; Lee, Hin C.; Marshall, Christian R.; Davies, Jill; Hazell, Allison; Buchanan, Janet A.; Szego, Michael J.; Scherer, Stephen W.

2018-01-01

BACKGROUND: The Personal Genome Project Canada is a comprehensive public data resource that integrates whole genome sequencing data and health information. We describe genomic variation identified in the initial recruitment cohort of 56 volunteers. METHODS: Volunteers were screened for eligibility and provided informed consent for open data sharing. Using blood DNA, we performed whole genome sequencing and identified all possible classes of DNA variants. A genetic counsellor explained the implication of the results to each participant. RESULTS: Whole genome sequencing of the first 56 participants identified 207 662 805 sequence variants and 27 494 copy number variations. We analyzed a prioritized disease-associated data set (n = 1606 variants) according to standardized guidelines, and interpreted 19 variants in 14 participants (25%) as having obvious health implications. Six of these variants (e.g., in BRCA1 or mosaic loss of an X chromosome) were pathogenic or likely pathogenic. Seven were risk factors for cancer, cardiovascular or neurobehavioural conditions. Four other variants — associated with cancer, cardiac or neurodegenerative phenotypes — remained of uncertain significance because of discrepancies among databases. We also identified a large structural chromosome aberration and a likely pathogenic mitochondrial variant. There were 172 recessive disease alleles (e.g., 5 individuals carried mutations for cystic fibrosis). Pharmacogenomics analyses revealed another 3.9 potentially relevant genotypes per individual. INTERPRETATION: Our analyses identified a spectrum of genetic variants with potential health impact in 25% of participants. When also considering recessive alleles and variants with potential pharmacologic relevance, all 56 participants had medically relevant findings. Although access is mostly limited to research, whole genome sequencing can provide specific and novel information with the potential of major impact for health care. PMID:29431110
GTRAC: fast retrieval from compressed collections of genomic variants

PubMed Central

Tatwawadi, Kedar; Hernaez, Mikel; Ochoa, Idoia; Weissman, Tsachy

2016-01-01

Motivation: The dramatic decrease in the cost of sequencing has resulted in the generation of huge amounts of genomic data, as evidenced by projects such as the UK10K and the Million Veteran Project, with the number of sequenced genomes ranging in the order of 10 K to 1 M. Due to the large redundancies among genomic sequences of individuals from the same species, most of the medical research deals with the variants in the sequences as compared with a reference sequence, rather than with the complete genomic sequences. Consequently, millions of genomes represented as variants are stored in databases. These databases are constantly updated and queried to extract information such as the common variants among individuals or groups of individuals. Previous algorithms for compression of this type of databases lack efficient random access capabilities, rendering querying the database for particular variants and/or individuals extremely inefficient, to the point where compression is often relinquished altogether. Results: We present a new algorithm for this task, called GTRAC, that achieves significant compression ratios while allowing fast random access over the compressed database. For example, GTRAC is able to compress a Homo sapiens dataset containing 1092 samples in 1.1 GB (compression ratio of 160), while allowing for decompression of specific samples in less than a second and decompression of specific variants in 17 ms. GTRAC uses and adapts techniques from information theory, such as a specialized Lempel-Ziv compressor, and tailored succinct data structures. Availability and Implementation: The GTRAC algorithm is available for download at: https://github.com/kedartatwawadi/GTRAC Contact: kedart@stanford.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27587665
GTRAC: fast retrieval from compressed collections of genomic variants.

PubMed

Tatwawadi, Kedar; Hernaez, Mikel; Ochoa, Idoia; Weissman, Tsachy

2016-09-01

The dramatic decrease in the cost of sequencing has resulted in the generation of huge amounts of genomic data, as evidenced by projects such as the UK10K and the Million Veteran Project, with the number of sequenced genomes ranging in the order of 10 K to 1 M. Due to the large redundancies among genomic sequences of individuals from the same species, most of the medical research deals with the variants in the sequences as compared with a reference sequence, rather than with the complete genomic sequences. Consequently, millions of genomes represented as variants are stored in databases. These databases are constantly updated and queried to extract information such as the common variants among individuals or groups of individuals. Previous algorithms for compression of this type of databases lack efficient random access capabilities, rendering querying the database for particular variants and/or individuals extremely inefficient, to the point where compression is often relinquished altogether. We present a new algorithm for this task, called GTRAC, that achieves significant compression ratios while allowing fast random access over the compressed database. For example, GTRAC is able to compress a Homo sapiens dataset containing 1092 samples in 1.1 GB (compression ratio of 160), while allowing for decompression of specific samples in less than a second and decompression of specific variants in 17 ms. GTRAC uses and adapts techniques from information theory, such as a specialized Lempel-Ziv compressor, and tailored succinct data structures. The GTRAC algorithm is available for download at: https://github.com/kedartatwawadi/GTRAC Contact: kedart@stanford.edu Supplementary data are available at Bioinformatics online.

A computational method for estimating the PCR duplication rate in DNA and RNA-seq experiments.

PubMed

Bansal, Vikas

2017-03-14

PCR amplification is an important step in the preparation of DNA sequencing libraries prior to high-throughput sequencing. PCR amplification introduces redundant reads in the sequence data and estimating the PCR duplication rate is important to assess the frequency of such reads. Existing computational methods do not distinguish PCR duplicates from "natural" read duplicates that represent independent DNA fragments and therefore, over-estimate the PCR duplication rate for DNA-seq and RNA-seq experiments. In this paper, we present a computational method to estimate the average PCR duplication rate of high-throughput sequence datasets that accounts for natural read duplicates by leveraging heterozygous variants in an individual genome. Analysis of simulated data and exome sequence data from the 1000 Genomes project demonstrated that our method can accurately estimate the PCR duplication rate on paired-end as well as single-end read datasets which contain a high proportion of natural read duplicates. Further, analysis of exome datasets prepared using the Nextera library preparation method indicated that 45-50% of read duplicates correspond to natural read duplicates likely due to fragmentation bias. Finally, analysis of RNA-seq datasets from individuals in the 1000 Genomes project demonstrated that 70-95% of read duplicates observed in such datasets correspond to natural duplicates sampled from genes with high expression and identified outlier samples with a 2-fold greater PCR duplication rate than other samples. The method described here is a useful tool for estimating the PCR duplication rate of high-throughput sequence datasets and for assessing the fraction of read duplicates that correspond to natural read duplicates. An implementation of the method is available at https://github.com/vibansal/PCRduplicates .
The Personal Genome Project Canada: findings from whole genome sequences of the inaugural 56 participants.

PubMed

Reuter, Miriam S; Walker, Susan; Thiruvahindrapuram, Bhooma; Whitney, Joe; Cohn, Iris; Sondheimer, Neal; Yuen, Ryan K C; Trost, Brett; Paton, Tara A; Pereira, Sergio L; Herbrick, Jo-Anne; Wintle, Richard F; Merico, Daniele; Howe, Jennifer; MacDonald, Jeffrey R; Lu, Chao; Nalpathamkalam, Thomas; Sung, Wilson W L; Wang, Zhuozhi; Patel, Rohan V; Pellecchia, Giovanna; Wei, John; Strug, Lisa J; Bell, Sherilyn; Kellam, Barbara; Mahtani, Melanie M; Bassett, Anne S; Bombard, Yvonne; Weksberg, Rosanna; Shuman, Cheryl; Cohn, Ronald D; Stavropoulos, Dimitri J; Bowdin, Sarah; Hildebrandt, Matthew R; Wei, Wei; Romm, Asli; Pasceri, Peter; Ellis, James; Ray, Peter; Meyn, M Stephen; Monfared, Nasim; Hosseini, S Mohsen; Joseph-George, Ann M; Keeley, Fred W; Cook, Ryan A; Fiume, Marc; Lee, Hin C; Marshall, Christian R; Davies, Jill; Hazell, Allison; Buchanan, Janet A; Szego, Michael J; Scherer, Stephen W

2018-02-05

The Personal Genome Project Canada is a comprehensive public data resource that integrates whole genome sequencing data and health information. We describe genomic variation identified in the initial recruitment cohort of 56 volunteers. Volunteers were screened for eligibility and provided informed consent for open data sharing. Using blood DNA, we performed whole genome sequencing and identified all possible classes of DNA variants. A genetic counsellor explained the implication of the results to each participant. Whole genome sequencing of the first 56 participants identified 207 662 805 sequence variants and 27 494 copy number variations. We analyzed a prioritized disease-associated data set (n = 1606 variants) according to standardized guidelines, and interpreted 19 variants in 14 participants (25%) as having obvious health implications. Six of these variants (e.g., in BRCA1 or mosaic loss of an X chromosome) were pathogenic or likely pathogenic. Seven were risk factors for cancer, cardiovascular or neurobehavioural conditions. Four other variants - associated with cancer, cardiac or neurodegenerative phenotypes - remained of uncertain significance because of discrepancies among databases. We also identified a large structural chromosome aberration and a likely pathogenic mitochondrial variant. There were 172 recessive disease alleles (e.g., 5 individuals carried mutations for cystic fibrosis). Pharmacogenomics analyses revealed another 3.9 potentially relevant genotypes per individual. Our analyses identified a spectrum of genetic variants with potential health impact in 25% of participants. When also considering recessive alleles and variants with potential pharmacologic relevance, all 56 participants had medically relevant findings. Although access is mostly limited to research, whole genome sequencing can provide specific and novel information with the potential of major impact for health care.
Exome sequencing and genome-wide linkage analysis in 17 families illustrate the complex contribution of TTN truncating variants to dilated cardiomyopathy.

PubMed

Norton, Nadine; Li, Duanxiang; Rampersaud, Evadnie; Morales, Ana; Martin, Eden R; Zuchner, Stephan; Guo, Shengru; Gonzalez, Michael; Hedges, Dale J; Robertson, Peggy D; Krumm, Niklas; Nickerson, Deborah A; Hershberger, Ray E

2013-04-01

BACKGROUND- Familial dilated cardiomyopathy (DCM) is a genetically heterogeneous disease with >30 known genes. TTN truncating variants were recently implicated in a candidate gene study to cause 25% of familial and 18% of sporadic DCM cases. METHODS AND RESULTS- We used an unbiased genome-wide approach using both linkage analysis and variant filtering across the exome sequences of 48 individuals affected with DCM from 17 families to identify genetic cause. Linkage analysis ranked the TTN region as falling under the second highest genome-wide multipoint linkage peak, multipoint logarithm of odds, 1.59. We identified 6 TTN truncating variants carried by individuals affected with DCM in 7 of 17 DCM families (logarithm of odds, 2.99); 2 of these 7 families also had novel missense variants that segregated with disease. Two additional novel truncating TTN variants did not segregate with DCM. Nucleotide diversity at the TTN locus, including missense variants, was comparable with 5 other known DCM genes. The average number of missense variants in the exome sequences from the DCM cases or the ≈5400 cases from the Exome Sequencing Project was ≈23 per individual. The average number of TTN truncating variants in the Exome Sequencing Project was 0.014 per individual. We also identified a region (chr9q21.11-q22.31) with no known DCM genes with a maximum heterogeneity logarithm of odds score of 1.74. CONCLUSIONS- These data suggest that TTN truncating variants contribute to DCM cause. However, the lack of segregation of all identified TTN truncating variants illustrates the challenge of determining variant pathogenicity even with full exome sequencing.
Sensitivity of the North Atlantic Basin to cyclic climatic forcing during the early Cretaceous

USGS Publications Warehouse

Dean, W.E.; Arthur, M.A.

1999-01-01

Striking cyclic interbeds of laminated dark-olive to black marlstone and bioturbated white to light-gray limestone of Neocomian (Early Cretaceous) age have been recovered at Deep Sea Drilling Project (DSDP) and Ocean Drilling Project (ODP) sites in the North Atlantic. These Neocomian sequences are equivalent to the Maiolica Formation that outcrops in the Tethyan regions of the Mediterranean and to thick limestone sequences of the Vocontian Trough of France. This lithologic unit marks the widespread deposition of biogenic carbonate over much of the North Atlantic and Tethyan seafloor during a time of overall low sealevel and a deep carbonate compensation depth. The dark clay-rich interbeds typically are rich in organic carbon (OC) with up to 5.5% OC in sequences in the eastern North Atlantic. These eastern North Atlantic sequences off northwest Africa, contain more abundant and better preserved hydrogen-rich, algal organic matter (type II kerogen) relative to the western North Atlantic, probably in response to coastal upwelling induced by an eastern boundary current in the young North Atlantic Ocean. The more abundant algal organic matter in sequences in the eastern North Atlantic is also expressed in the isotopic composition of the carbon in that organic matter. In contrast, organic matter in Neocomian sequences in the western North Atlantic along the continental margin of North America has geochemical and optical characteristics of herbaceous, woody, hydrogen-poor, humic, type III kerogen. The inorganic geochemical characteristics of the dark clay-rich (80% CaCO3) interbeds in both the eastern and western basins of the North Atlantic suggest that they contain minor amounts of relatively unweathered eolian dust derived from northwest Africa during dry intervals.
Sequencing, Analysis, and Annotation of Expressed Sequence Tags for Camelus dromedarius

PubMed Central

Al-Swailem, Abdulaziz M.; Shehata, Maher M.; Abu-Duhier, Faisel M.; Al-Yamani, Essam J.; Al-Busadah, Khalid A.; Al-Arawi, Mohammed S.; Al-Khider, Ali Y.; Al-Muhaimeed, Abdullah N.; Al-Qahtani, Fahad H.; Manee, Manee M.; Al-Shomrani, Badr M.; Al-Qhtani, Saad M.; Al-Harthi, Amer S.; Akdemir, Kadir C.; Otu, Hasan H.

2010-01-01

Despite its economical, cultural, and biological importance, there has not been a large scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, cluster and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and ∼40% hits extending to the start codons of full length cDNAs suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. Accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools with possibility to add sequences from public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies enabling a starting point for whole genome sequencing of the organism. PMID:20502665
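The ORF length criterion mentioned in this abstract (ORF > 300 bp) can be sketched in Python as follows. This is a simplified illustration, not the authors' pipeline: it scans only the three forward-strand reading frames, assumes an uppercase ACGT sequence, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf(seq):
    """Return the longest ATG-to-stop ORF on the forward strand ('' if none)."""
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first in-frame ATG
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > len(best):
                    best = seq[start:i + 3]  # keep the stop codon in the ORF
                start = None
    return best

def passes_length_filter(seq, min_len=300):
    """True if the longest forward-strand ORF is longer than min_len bp."""
    return len(longest_orf(seq)) > min_len

print(longest_orf("CCATGAAATAGGG"))  # ATGAAATAG
```

A fuller analysis would also scan the reverse complement and handle ORFs that run off the end of a partial contig.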
Building information models for astronomy projects

NASA Astrophysics Data System (ADS)

Ariño, Javier; Murga, Gaizka; Campo, Ramón; Eletxigerra, Iñigo; Ampuero, Pedro

2012-09-01

A Building Information Model is a digital representation of physical and functional characteristics of a building. BIMs represent the geometrical characteristics of the building, but also properties like bills of quantities, definition of COTS components, status of material in the different stages of the project, project economic data, etc. The BIM methodology, which is well established in the Architecture, Engineering and Construction (AEC) domain for conventional buildings, has been brought one step forward in its application to astronomical/scientific facilities. In these facilities steel/concrete structures have high dynamic and seismic requirements, M&E installations are complex and there is a large amount of special equipment and mechanisms involved as a fundamental part of the facility. The detail design definition is typically implemented by different design teams in specialized design software packages. In order to allow the coordinated work of different engineering teams, the overall model, and its associated engineering database, is progressively integrated using a coordination and roaming software which can be used before starting the construction phase for checking interferences, planning the construction sequence, studying maintenance operation, reporting to the project office, etc. This integrated design and construction approach allows the construction sequence to be planned efficiently (4D). This is a powerful tool to study and analyze in detail alternative construction sequences and ideally coordinate the work of different construction teams. In addition, engineering, construction and operational databases can be linked to the virtual model (6D), which gives the end users an invaluable tool for lifecycle management, as all the facility information can be easily accessed, added or replaced. This paper presents the BIM methodology as implemented by IDOM with the E-ELT and ATST Enclosures as application examples.
A National Survey on the Taxonomy of Community Living Skills. Working Paper 87-4. COMPETE: Community-Based Model for Public-School Exit and Transition to Employment.

ERIC Educational Resources Information Center

Dever, Richard B.

This paper is a product of Project COMPETE, a service demonstration project undertaken for the purpose of developing and validating a model and training sequence to improve transition services for moderately, severely, and profoundly retarded youth. The paper describes the Taxonomy of Community Living Skills, an organized statement of…
A Hybrid Integrated Laboratory and Inquiry-Based Research Experience: Replacing Traditional Laboratory Instruction with a Sustainable Student-Led Research Project

ERIC Educational Resources Information Center

Hartings, Matthew R.; Fox, Douglas M.; Miller, Abigail E.; Muratore, Kathryn E.

2015-01-01

The Department of Chemistry at American University has replaced its junior- and senior-level laboratory curriculum with two, two-semester long, student-led research projects as part of the department's American Chemical Society-accredited program. In the first semester of each sequence, a faculty instructor leads the students through a set of…
An Assessment for Learning System Called ACED: Designing for Learning Effectiveness and Accessibility. Research Report. ETS RR-07-26

ERIC Educational Resources Information Center

Shute, Valerie J.; Hansen, Eric G.; Almond, Russell G.

2007-01-01

This paper reports on a 3-year, NSF-funded research and development project called ACED: Adaptive Content with Evidence-based Diagnosis. The purpose of the project was to design, develop, and evaluate an assessment for learning (AfL) system for diverse students, using Algebra I content related to geometric sequences (i.e., successive numbers…
Mining and gene ontology based annotation of SSR markers from expressed sequence tags of Humulus lupulus

PubMed Central

Singh, Swati; Gupta, Sanchita; Mani, Ashutosh; Chaturvedi, Anoop

2012-01-01

Humulus lupulus, commonly known as hops, is a member of the family Moraceae. Many projects are currently underway, leading to the accumulation of voluminous genomic and expressed sequence tag (EST) sequences in public databases. The genetically characterized domains in these databases are limited due to the non-availability of reliable molecular markers. A large set of EST sequences is available for hops. In the present study, simple sequence repeat (SSR) markers extracted from EST data are used as molecular markers for genetic characterization. 25,495 EST sequences were examined and assembled to get full-length sequences. Mononucleotide SSR motifs showed the highest frequency, i.e. 60.44% in contigs and 62.16% in singletons, whereas the lowest frequencies were observed for hexanucleotide SSRs in contigs (0.09%) and pentanucleotide SSRs in singletons (0.12%). Most trinucleotide motifs coded for glutamic acid (GAA), while AT/TA was the most frequent dinucleotide repeat. Flanking primer pairs were designed in silico for the SSR-containing sequences. Functional categorization of the SSR-containing sequences was done through gene ontology terms such as biological process, cellular component and molecular function. PMID:22368382
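As an illustration of the SSR mining step described in this record, the following is a minimal Python sketch that scans a nucleotide sequence for perfect repeats of motifs one to six bases long. It is not the tool or the thresholds used in the study: the function name `find_ssrs` and the minimum repeat counts in `MIN_REPEATS` are assumptions chosen only for the example.

```python
import re

# Hypothetical minimum repeat counts per motif length (assumed thresholds,
# not taken from the study).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A motif of length k followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AAA", "ATAT"),
            # so each repeat is classified by its shortest motif only.
            if any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    example = "GG" + "AT" * 7 + "CC" + "GAA" * 5 + "T"
    for start, motif, count in find_ssrs(example):
        print(start, motif, count)
```

Grouping the returned tuples by motif length would give frequency classes (mono- to hexanucleotide) comparable in kind to those reported in the abstract, though the actual percentages depend on the thresholds and software used.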
K2 and K2*: efficient alignment-free sequence similarity measurement based on Kendall statistics.

PubMed

Lin, Jie; Adjeroh, Donald A; Jiang, Bing-Hua; Jiang, Yue

2018-05-15

Alignment-free sequence comparison methods can compute the pairwise similarity between a huge number of sequences much faster than sequence-alignment based methods. We propose a new non-parametric alignment-free sequence comparison method, called K2, based on the Kendall statistics. Compared with the other state-of-the-art alignment-free comparison methods, K2 demonstrates competitive performance in generating the phylogenetic tree, in evaluating functionally related regulatory sequences, and in computing the edit distance (similarity/dissimilarity) between sequences. Furthermore, the K2 approach is much faster than the other methods. An improved method, K2*, is also proposed, which is able to determine the appropriate algorithmic parameter (length) automatically, without first considering different values. Comparative analysis with the state-of-the-art alignment-free sequence similarity methods demonstrates the superiority of the proposed approaches, especially with increasing sequence length or increasing dataset sizes. The K2 and K2* approaches are implemented in the R language as a package that is freely available for open access (http://community.wvu.edu/daadjeroh/projects/K2/K2_1.0.tar.gz). Contact: yueljiang@163.com. Supplementary data are available at Bioinformatics online.
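To give a feel for what a Kendall-statistic comparison of two sequences can look like, here is a minimal Python sketch. It is not the K2 or K2* algorithm from this record, whose details are not given in the abstract. It simply counts k-mers in each sequence and computes the Kendall tau-a rank correlation between the two count vectors. The function names `kmer_counts` and `kendall_tau_a`, the DNA alphabet, and the choice of k are all assumptions made for the example.

```python
from itertools import combinations, product


def kmer_counts(seq, k):
    """Count every DNA k-mer in seq, returned in a fixed lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # ignore windows containing non-ACGT symbols
            counts[word] += 1
    return [counts[w] for w in kmers]


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all n(n-1)/2 pairs."""
    n = len(x)
    score = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        score += (prod > 0) - (prod < 0)  # tied pairs contribute 0
    return score / (n * (n - 1) / 2)


if __name__ == "__main__":
    a = kmer_counts("ACGTACGTTTGACA", 2)
    b = kmer_counts("ACGTTCGTTAGACA", 2)
    print(kendall_tau_a(a, b))
```

The quadratic pair loop is acceptable for small k (16 k-mers for k = 2) but grows quickly with k. A production method would need a faster rank-statistic computation, which is presumably part of what the published approach addresses.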
Genome Science: A Video Tour of the Washington University Genome Sequencing Center for High School and Undergraduate Students

PubMed Central

2005-01-01

Sequencing of the human genome has ushered in a new era of biology. The technologies developed to facilitate the sequencing of the human genome are now being applied to the sequencing of other genomes. In 2004, a partnership was formed between Washington University School of Medicine Genome Sequencing Center's Outreach Program and Washington University Department of Biology Science Outreach to create a video tour depicting the processes involved in large-scale sequencing. “Sequencing a Genome: Inside the Washington University Genome Sequencing Center” is a tour of the laboratory that follows the steps in the sequencing pipeline, interspersed with animated explanations of the scientific procedures used at the facility. Accompanying interviews with the staff illustrate different entry levels for a career in genome science. This video project serves as an example of how research and academic institutions can provide teachers and students with access and exposure to innovative technologies at the forefront of biomedical research. Initial feedback on the video from undergraduate students, high school teachers, and high school students provides suggestions for use of this video in a classroom setting to supplement present curricula. PMID:16341256
Novel application of the MSSCP method in biodiversity studies.

PubMed

Tomczyk-Żak, Karolina; Kaczanowski, Szymon; Górecka, Magdalena; Zielenkiewicz, Urszula

2012-02-01

Analysis of 16S rRNA sequence diversity is widely performed for characterizing the biodiversity of microbial samples. The number of determined sequences has a considerable impact on the completeness of the results. Although the cost of mass sequencing is decreasing, it is often still too high for individual projects. We applied the multi-temperature single-strand conformational polymorphism (MSSCP) method to decrease the number of analysed sequences. This was a novel application of this method. As a control, the same sample was analysed using random sequencing. In this paper, we adapted the MSSCP technique for screening of unique sequences of the 16S rRNA gene library and bacterial strains isolated from biofilms growing on the walls of an ancient gold mine in Poland and determined whether the results obtained by both methods differed and whether random sequencing could be replaced by MSSCP. Although it was biased towards the detection of rare sequences in the samples, the qualitative results of MSSCP did not differ from those of random sequencing. Unambiguous discrimination of unique clones and strains creates an opportunity to effectively estimate the biodiversity of natural communities, especially in populations which are numerous but species poor. Copyright © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Array automated assembly task low cost silicon solar array project, phase 2

NASA Technical Reports Server (NTRS)

Olson, C.

1980-01-01

Analyses of solar cell and module process steps for throughput rate, cost effectiveness, and reproducibility are reported. In addition to the concentration on cell and module processing sequences, an investigation was made into the capability of using microwave energy in the diffusion, sintering, and thick film firing steps of cell processing. Although the entire process sequence was integrated, the steps are treated individually with test and experimental data, conclusions, and recommendations.
Navy LPD-17 Amphibious Ship Procurement: Background, Issues, and Options for Congress

DTIC Science & Technology

2010-07-01

...performed out of sequence and significant rework has been required, disrupting the optimal construction sequence and application of lessons learned...deeply concerned about Northrop Grumman Ship Systems’ (NGSS) ability to recover in the aftermath of Hurricane Katrina, particularly in regard to
Navy LPD-17 Amphibious Ship Procurement: Background, Issues, and Options for Congress

DTIC Science & Technology

2010-06-10

...out of sequence and significant rework has been required, disrupting the optimal construction sequence and application of lessons learned for...concerned about Northrop Grumman Ship Systems’ (NGSS) ability to recover in the aftermath of Hurricane Katrina, particularly in regard to construction
Navy LPD-17 Amphibious Ship Procurement: Background, Issues, and Options for Congress

DTIC Science & Technology

2010-03-29

...performed out of sequence and significant rework has been required, disrupting the optimal construction sequence and application of lessons learned...deeply concerned about Northrop Grumman Ship Systems’ (NGSS) ability to recover in the aftermath of Hurricane Katrina, particularly in regard to
LLNL's Big Science Capabilities Help Spur Over $796 Billion in U.S. Economic Activity Sequencing the Human Genome

DOE Office of Scientific and Technical Information (OSTI.GOV)

Stewart, Jeffrey S.

LLNL’s successful history of taking on big science projects spans beyond national security and has helped create billions of dollars per year in new economic activity. One example is LLNL’s role in helping sequence the human genome. Over $796 billion in new economic activity in over half a dozen fields has been documented since LLNL successfully completed this Grand Challenge.
Applications of the 1000 Genomes Project resources.

PubMed

Zheng-Bradley, Xiangqun; Flicek, Paul

2017-05-01

The 1000 Genomes Project created a valuable, worldwide reference for human genetic variation. Common uses of the 1000 Genomes dataset include genotype imputation supporting Genome-wide Association Studies, mapping expression Quantitative Trait Loci, filtering non-pathogenic variants from exome, whole genome and cancer genome sequencing projects, and genetic analysis of population structure and molecular evolution. In this article, we will highlight some of the multiple ways that the 1000 Genomes data can be and has been utilized for genetic studies. © The Author 2016. Published by Oxford University Press.
Genome-wide comparative analysis of four Indian Drosophila species.

PubMed

Mohanty, Sujata; Khanna, Radhika

2017-12-01

Comparative analysis of multiple genomes of closely or distantly related Drosophila species creates excitement among evolutionary biologists exploring genomic changes from an ecological and evolutionary perspective. We present herewith the de novo assembled whole genome sequences of four Drosophila species of Indian origin, D. bipectinata, D. takahashii, D. biarmipes and D. nasuta, obtained using Next Generation Sequencing technology on an Illumina platform, along with their detailed assembly statistics. Comparative genomics analyses, e.g. gene predictions and annotations, functional and orthogroup analysis of coding sequences, and genome-wide SNP distribution, were performed. The whole genome of Zaprionus indianus of Indian origin, published earlier by us, and the genome sequences of 12 previously sequenced Drosophila species available in the NCBI database were included in the analysis. The present work is a part of our ongoing genomics project on Indian Drosophila species.
