Taber, Jennifer M; Klein, William M P; Ferrer, Rebecca A; Lewis, Katie L; Harris, Peter R; Shepperd, James A; Biesecker, Leslie G
2015-08-01
Information avoidance is a defensive strategy that undermines receipt of potentially beneficial but threatening health information and may especially occur when threat management resources are unavailable. We examined whether individual differences in information avoidance predicted intentions to receive genetic sequencing results for preventable and unpreventable (i.e., more threatening) disease and, secondarily, whether threat management resources of self-affirmation or optimism mitigated any effects. Participants (N = 493) in an NIH study (ClinSeq®) piloting the use of genome sequencing reported intentions to receive (optional) sequencing results and completed individual difference measures of information avoidance, self-affirmation, and optimism. Information avoidance tendencies corresponded with lower intentions to learn results, particularly for unpreventable diseases. The association was weaker among individuals higher in self-affirmation or optimism, but only for results regarding preventable diseases. Information avoidance tendencies may influence decisions to receive threatening health information; threat management resources hold promise for mitigating this association.
PIMS sequencing extension: a laboratory information management system for DNA sequencing facilities.
Troshin, Peter V; Postis, Vincent Lg; Ashworth, Denise; Baldwin, Stephen A; McPherson, Michael J; Barton, Geoffrey J
2011-03-07
Facilities that provide a service for DNA sequencing typically support large numbers of users and experiment types. The cost of services is often reduced by the use of liquid handling robots but the efficiency of such facilities is hampered because the software for such robots does not usually integrate well with the systems that run the sequencing machines. Accordingly, there is a need for software systems capable of integrating different robotic systems and managing sample information for DNA sequencing services. In this paper, we describe an extension to the Protein Information Management System (PIMS) that is designed for DNA sequencing facilities. The new version of PIMS has a user-friendly web interface and integrates all aspects of the sequencing process, including sample submission, handling and tracking, together with capture and management of the data. The PIMS sequencing extension has been in production since July 2009 at the University of Leeds DNA Sequencing Facility. It has completely replaced manual data handling and simplified the tasks of data management and user communication. Samples from 45 groups have been processed with an average throughput of 10000 samples per month. The current version of the PIMS sequencing extension works with Applied Biosystems 3130XL 96-well plate sequencer and MWG 4204 or Aviso Theonyx liquid handling robots, but is readily adaptable for use with other combinations of robots. PIMS has been extended to provide a user-friendly and integrated data management solution for DNA sequencing facilities that is accessed through a normal web browser and allows simultaneous access by multiple users as well as facility managers. The system integrates sequencing and liquid handling robots, manages the data flow, and provides remote access to the sequencing results. The software is freely available, for academic users, from http://www.pims-lims.org/.
PIMS sequencing extension: a laboratory information management system for DNA sequencing facilities
2011-01-01
Background Facilities that provide a service for DNA sequencing typically support large numbers of users and experiment types. The cost of services is often reduced by the use of liquid handling robots but the efficiency of such facilities is hampered because the software for such robots does not usually integrate well with the systems that run the sequencing machines. Accordingly, there is a need for software systems capable of integrating different robotic systems and managing sample information for DNA sequencing services. In this paper, we describe an extension to the Protein Information Management System (PIMS) that is designed for DNA sequencing facilities. The new version of PIMS has a user-friendly web interface and integrates all aspects of the sequencing process, including sample submission, handling and tracking, together with capture and management of the data. Results The PIMS sequencing extension has been in production since July 2009 at the University of Leeds DNA Sequencing Facility. It has completely replaced manual data handling and simplified the tasks of data management and user communication. Samples from 45 groups have been processed with an average throughput of 10000 samples per month. The current version of the PIMS sequencing extension works with Applied Biosystems 3130XL 96-well plate sequencer and MWG 4204 or Aviso Theonyx liquid handling robots, but is readily adaptable for use with other combinations of robots. Conclusions PIMS has been extended to provide a user-friendly and integrated data management solution for DNA sequencing facilities that is accessed through a normal web browser and allows simultaneous access by multiple users as well as facility managers. The system integrates sequencing and liquid handling robots, manages the data flow, and provides remote access to the sequencing results. The software is freely available, for academic users, from http://www.pims-lims.org/. PMID:21385349
Owens, John
2009-01-01
Technological advances in the acquisition of DNA and protein sequence information and the resulting onrush of data can quickly overwhelm the scientist unprepared for the volume of information that must be evaluated and carefully dissected to discover its significance. Few laboratories have the luxury of dedicated personnel to organize, analyze, or consistently record a mix of arriving sequence data. A methodology based on a modern relational-database manager is presented that is both a natural storage vessel for antibody sequence information and a conduit for organizing and exploring sequence data and accompanying annotation text. The expertise necessary to implement such a plan is comparable to that required to use electronic word processors or spreadsheet applications. Antibody sequence projects maintained as independent databases are selectively unified by the relational-database manager into larger database families that contribute to local analyses, reports, and interactive HTML pages, or are exported to facilities dedicated to sophisticated sequence analysis techniques. Database files are transposable among current versions of Microsoft Windows, Macintosh, and UNIX operating systems.
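The record above describes keeping antibody sequences and their annotation text in an off-the-shelf relational-database manager. Below is a minimal sketch of that idea using Python's built-in sqlite3 module; the table layout and column names are illustrative assumptions, not the schema from the original work.

```python
import sqlite3

# In-memory database for illustration; a file path would make the store persistent.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE clone (
    clone_id    INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    chain       TEXT CHECK (chain IN ('heavy', 'light')),
    dna_seq     TEXT,
    protein_seq TEXT
);
CREATE TABLE annotation (
    annotation_id INTEGER PRIMARY KEY,
    clone_id      INTEGER REFERENCES clone(clone_id),
    note          TEXT
);
""")

con.execute("INSERT INTO clone (name, chain, dna_seq) VALUES (?, ?, ?)",
            ("mAb-17_VH", "heavy", "GAGGTGCAGCTGGTGGAG"))
con.execute("INSERT INTO annotation (clone_id, note) VALUES (?, ?)",
            (1, "CDR3 confirmed in both sequencing directions"))

# A join pulls sequence records and free-text annotation back together for a report.
for row in con.execute("""
        SELECT c.name, c.chain, a.note
        FROM clone c JOIN annotation a ON a.clone_id = c.clone_id"""):
    print(row)
```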
MendeLIMS: a web-based laboratory information management system for clinical genome sequencing.
Grimes, Susan M; Ji, Hanlee P
2014-08-27
Large clinical genomics studies using next generation DNA sequencing require the ability to select and track samples from a large population of patients through many experimental steps. With the number of clinical genome sequencing studies increasing, it is critical to maintain adequate laboratory information management systems to manage the thousands of patient samples that are subject to this type of genetic analysis. To meet the needs of clinical population studies using genome sequencing, we developed a web-based laboratory information management system (LIMS) with a flexible configuration that is adaptable to continuously evolving experimental protocols of next generation DNA sequencing technologies. Our system is referred to as MendeLIMS, is easily implemented with open source tools and is also highly configurable and extensible. MendeLIMS has been invaluable in the management of our clinical genome sequencing studies. We maintain a publicly available demonstration version of the application for evaluation purposes at http://mendelims.stanford.edu. MendeLIMS is programmed in Ruby on Rails (RoR) and accesses data stored in SQL-compliant relational databases. Software is freely available for non-commercial use at http://dna-discovery.stanford.edu/software/mendelims/.
Latorre, Mariano; Silva, Herman; Saba, Juan; Guziolowski, Carito; Vizoso, Paula; Martinez, Veronica; Maldonado, Jonathan; Morales, Andrea; Caroca, Rodrigo; Cambiazo, Veronica; Campos-Vargas, Reinaldo; Gonzalez, Mauricio; Orellana, Ariel; Retamales, Julio; Meisel, Lee A
2006-11-23
Expressed sequence tag (EST) analyses provide a rapid and economical means to identify candidate genes that may be involved in a particular biological process. These ESTs are useful in many Functional Genomics studies. However, the large quantity and complexity of the data generated during an EST sequencing project can make the analysis of this information a daunting task. In an attempt to make this task friendlier, we have developed JUICE, an open source data management system (Apache + PHP + MySQL on Linux), which enables the user to easily upload, organize, visualize and search the different types of data generated in an EST project pipeline. In contrast to other systems, the JUICE data management system allows a branched pipeline to be established, modified and expanded, during the course of an EST project. The web interfaces and tools in JUICE enable the users to visualize the information in a graphical, user-friendly manner. The user may browse or search for sequences and/or sequence information within all the branches of the pipeline. The user can search using terms associated with the sequence name, annotation or other characteristics stored in JUICE and associated with sequences or sequence groups. Groups of sequences can be created by the user, stored in a clipboard and/or downloaded for further analyses. Different user profiles restrict the access of each user depending upon their role in the project. The user may have access exclusively to visualize sequence information, access to annotate sequences and sequence information, or administrative access. JUICE is an open source data management system that has been developed to aid users in organizing and analyzing the large amount of data generated in an EST Project workflow. JUICE has been used in one of the first functional genomics projects in Chile, entitled "Functional Genomics in nectarines: Platform to potentiate the competitiveness of Chile in fruit exportation". However, due to its ability to organize and visualize data from external pipelines, JUICE is a flexible data management system that should be useful for other EST/Genome projects. The JUICE data management system is released under the Open Source GNU Lesser General Public License (LGPL). JUICE may be downloaded from http://genoma.unab.cl/juice_system/ or http://www.genomavegetal.cl/juice_system/.
Latorre, Mariano; Silva, Herman; Saba, Juan; Guziolowski, Carito; Vizoso, Paula; Martinez, Veronica; Maldonado, Jonathan; Morales, Andrea; Caroca, Rodrigo; Cambiazo, Veronica; Campos-Vargas, Reinaldo; Gonzalez, Mauricio; Orellana, Ariel; Retamales, Julio; Meisel, Lee A
2006-01-01
Background Expressed sequence tag (EST) analyses provide a rapid and economical means to identify candidate genes that may be involved in a particular biological process. These ESTs are useful in many Functional Genomics studies. However, the large quantity and complexity of the data generated during an EST sequencing project can make the analysis of this information a daunting task. Results In an attempt to make this task friendlier, we have developed JUICE, an open source data management system (Apache + PHP + MySQL on Linux), which enables the user to easily upload, organize, visualize and search the different types of data generated in an EST project pipeline. In contrast to other systems, the JUICE data management system allows a branched pipeline to be established, modified and expanded, during the course of an EST project. The web interfaces and tools in JUICE enable the users to visualize the information in a graphical, user-friendly manner. The user may browse or search for sequences and/or sequence information within all the branches of the pipeline. The user can search using terms associated with the sequence name, annotation or other characteristics stored in JUICE and associated with sequences or sequence groups. Groups of sequences can be created by the user, stored in a clipboard and/or downloaded for further analyses. Different user profiles restrict the access of each user depending upon their role in the project. The user may have access exclusively to visualize sequence information, access to annotate sequences and sequence information, or administrative access. Conclusion JUICE is an open source data management system that has been developed to aid users in organizing and analyzing the large amount of data generated in an EST Project workflow. JUICE has been used in one of the first functional genomics projects in Chile, entitled "Functional Genomics in nectarines: Platform to potentiate the competitiveness of Chile in fruit exportation". However, due to its ability to organize and visualize data from external pipelines, JUICE is a flexible data management system that should be useful for other EST/Genome projects. The JUICE data management system is released under the Open Source GNU Lesser General Public License (LGPL). JUICE may be downloaded from http://genoma.unab.cl/juice_system/ or http://www.genomavegetal.cl/juice_system/. PMID:17123449
Yu, Joon-Ho; Jamal, Seema M; Tabor, Holly K; Bamshad, Michael J
2013-09-01
Researchers and clinicians face the practical and ethical challenge of if and how to offer for return the wide and varied scope of results available from individual exome sequencing and whole-genome sequencing. We argue that rather than viewing individual exome sequencing and whole-genome sequencing as a test for which results need to be "returned," that the technology should instead be framed as a dynamic resource of information from which results should be "managed" over the lifetime of an individual. We further suggest that individual exome sequencing and whole-genome sequencing results management is optimized using a self-guided approach that enables individuals to self-select among results offered for return in a convenient, confidential, personalized context that is responsive to their value system. This approach respects autonomy, allows individuals to maximize potential benefits of genomic information (beneficence) and minimize potential harms (nonmaleficence), and also preserves their right to an open future to the extent they desire or think is appropriate. We describe key challenges and advantages of such a self-guided management system and offer guidance on implementation using an information systems approach.
MACSIMS: multiple alignment of complete sequences information management system
Thompson, Julie D; Muller, Arnaud; Waterhouse, Andrew; Procter, Jim; Barton, Geoffrey J; Plewniak, Frédéric; Poch, Olivier
2006-01-01
Background In the post-genomic era, systems-level studies are being performed that seek to explain complex biological systems by integrating diverse resources from fields such as genomics, proteomics or transcriptomics. New information management systems are now needed for the collection, validation and analysis of the vast amount of heterogeneous data available. Multiple alignments of complete sequences provide an ideal environment for the integration of this information in the context of the protein family. Results MACSIMS is a multiple alignment-based information management program that combines the advantages of both knowledge-based and ab initio sequence analysis methods. Structural and functional information is retrieved automatically from the public databases. In the multiple alignment, homologous regions are identified and the retrieved data is evaluated and propagated from known to unknown sequences with these reliable regions. In a large-scale evaluation, the specificity of the propagated sequence features is estimated to be >99%, i.e. very few false positive predictions are made. MACSIMS is then used to characterise mutations in a test set of 100 proteins that are known to be involved in human genetic diseases. The number of sequence features associated with these proteins was increased by 60%, compared to the features available in the public databases. An XML format output file allows automatic parsing of the MACSIM results, while a graphical display using the JalView program allows manual analysis. Conclusion MACSIMS is a new information management system that incorporates detailed analyses of protein families at the structural, functional and evolutionary levels. MACSIMS thus provides a unique environment that facilitates knowledge extraction and the presentation of the most pertinent information to the biologist. A web server and the source code are available at . PMID:16792820
Pike, William A; Riensche, Roderick M; Best, Daniel M; Roberts, Ian E; Whyatt, Marie V; Hart, Michelle L; Carr, Norman J; Thomas, James J
2012-09-18
Systems and computer-implemented processes for storage and management of information artifacts collected by information analysts using a computing device. The processes and systems can capture a sequence of interactive operation elements that are performed by the information analyst, who is collecting an information artifact from at least one of the plurality of software applications. The information artifact can then be stored together with the interactive operation elements as a snippet on a memory device, which is operably connected to the processor. The snippet comprises a view from an analysis application, data contained in the view, and the sequence of interactive operation elements stored as a provenance representation comprising operation element class, timestamp, and data object attributes for each interactive operation element in the sequence.
A laboratory information management system for DNA barcoding workflows.
Vu, Thuy Duong; Eberhardt, Ursula; Szöke, Szániszló; Groenewald, Marizeth; Robert, Vincent
2012-07-01
This paper presents a laboratory information management system for DNA sequences (LIMS) created and based on the needs of a DNA barcoding project at the CBS-KNAW Fungal Biodiversity Centre (Utrecht, the Netherlands). DNA barcoding is a global initiative for species identification through simple DNA sequence markers. We aim at generating barcode data for all strains (or specimens) included in the collection (currently ca. 80 k). The LIMS has been developed to better manage large amounts of sequence data and to keep track of the whole experimental procedure. The system has allowed us to classify strains more efficiently as the quality of sequence data has improved, and as a result, up-to-date taxonomic names have been given to strains and more accurate correlation analyses have been carried out.
Omics Metadata Management Software (OMMS).
Perez-Arriaga, Martha O; Wilson, Susan; Williams, Kelly P; Schoeniger, Joseph; Waymire, Russel L; Powell, Amy Jo
2015-01-01
Next-generation sequencing projects have underappreciated information management tasks requiring detailed attention to specimen curation, nucleic acid sample preparation and sequence production methods required for downstream data processing, comparison, interpretation, sharing and reuse. The few existing metadata management tools for genome-based studies provide weak curatorial frameworks for experimentalists to store and manage idiosyncratic, project-specific information, typically offering no automation supporting unified naming and numbering conventions for sequencing production environments that routinely deal with hundreds, if not thousands, of samples at a time. Moreover, existing tools are not readily interfaced with bioinformatics executables (e.g., BLAST, Bowtie2, custom pipelines). Our application, the Omics Metadata Management Software (OMMS), answers both needs, empowering experimentalists to generate intuitive, consistent metadata, and perform analyses and information management tasks via an intuitive web-based interface. Several use cases with short-read sequence datasets are provided to validate installation and integrated function, and suggest possible methodological road maps for prospective users. Provided examples highlight possible OMMS workflows for metadata curation, multistep analyses, and results management and downloading. The OMMS can be implemented as a stand-alone package for individual laboratories, or can be configured for web-based deployment supporting geographically dispersed projects. The OMMS was developed using an open-source software base, is flexible, extensible and easily installed and executed. The OMMS can be obtained at http://omms.sandia.gov.
Omics Metadata Management Software (OMMS)
Perez-Arriaga, Martha O; Wilson, Susan; Williams, Kelly P; Schoeniger, Joseph; Waymire, Russel L; Powell, Amy Jo
2015-01-01
Next-generation sequencing projects have underappreciated information management tasks requiring detailed attention to specimen curation, nucleic acid sample preparation and sequence production methods required for downstream data processing, comparison, interpretation, sharing and reuse. The few existing metadata management tools for genome-based studies provide weak curatorial frameworks for experimentalists to store and manage idiosyncratic, project-specific information, typically offering no automation supporting unified naming and numbering conventions for sequencing production environments that routinely deal with hundreds, if not thousands, of samples at a time. Moreover, existing tools are not readily interfaced with bioinformatics executables (e.g., BLAST, Bowtie2, custom pipelines). Our application, the Omics Metadata Management Software (OMMS), answers both needs, empowering experimentalists to generate intuitive, consistent metadata, and perform analyses and information management tasks via an intuitive web-based interface. Several use cases with short-read sequence datasets are provided to validate installation and integrated function, and suggest possible methodological road maps for prospective users. Provided examples highlight possible OMMS workflows for metadata curation, multistep analyses, and results management and downloading. The OMMS can be implemented as a stand-alone package for individual laboratories, or can be configured for web-based deployment supporting geographically dispersed projects. The OMMS was developed using an open-source software base, is flexible, extensible and easily installed and executed. Availability: The OMMS can be obtained at http://omms.sandia.gov. PMID:26124554
ABM Drag_Pass Report Generator
NASA Technical Reports Server (NTRS)
Fisher, Forest; Gladden, Roy; Khanampornpan, Teerapat
2008-01-01
dragREPORT software was developed in parallel with abmREPORT, which is described in the preceding article. Both programs were built on the capabilities created during that process. This tool generates a drag_pass report that summarizes vital information from the MRO aerobraking drag_pass build process to facilitate sequence reviews and to provide a high-level summary of the sequence for mission management. The script extracts information from the ENV, SSF, FRF, SCMFmax, and OPTG files and presents it in a single, easy-to-check report providing the majority of parameters needed for cross-checking and verification as part of the sequence review process. Prior to dragREPORT, all the needed information was spread across a number of different files, each in a different format. This software is a Perl script that extracts vital summarization information and build-process details from a number of source files into a single, concise report format used to aid the MPST sequence review process and to provide a high-level summary of the sequence for mission management reference. The software could be adapted for future aerobraking missions to provide similar reporting, review, and summarization capabilities.
Sharp, Richard R
2011-03-01
As we look to a time when whole-genome sequencing is integrated into patient care, it is possible to anticipate a number of ethical challenges that will need to be addressed. The most intractable of these concern informed consent and the responsible management of very large amounts of genetic information. Given the range of possible findings, it remains unclear to what extent it will be possible to obtain meaningful patient consent to genomic testing. Equally unclear is how clinicians will disseminate the enormous volume of genetic information produced by whole-genome sequencing. Toward developing practical strategies for managing these ethical challenges, we propose a research agenda that approaches multiplexed forms of clinical genetic testing as natural laboratories in which to develop best practices for managing the ethical complexities of genomic medicine.
Highly Informative Simple Sequence Repeat (SSR) Markers for Fingerprinting Hazelnut
USDA-ARS?s Scientific Manuscript database
Simple sequence repeat (SSR) or microsatellite markers have many applications in breeding and genetic studies of plants, including fingerprinting of cultivars and investigations of genetic diversity, and therefore provide information for better management of germplasm collections. They are repeatab...
MetaLIMS, a simple open-source laboratory information management system for small metagenomic labs
Gaultier, Nicolas Paul Eugène; Miller, Dana; Purbojati, Rikky Wenang; Lauro, Federico M.
2017-01-01
Background: As the cost of sequencing continues to fall, smaller groups increasingly initiate and manage larger sequencing projects and take on the complexity of data storage for high volumes of samples. This has created a need for low-cost laboratory information management systems (LIMS) that contain flexible fields to accommodate the unique nature of individual labs. Many labs do not have a dedicated information technology position, so LIMS must also be easy to set up and maintain with minimal technical proficiency. Findings: MetaLIMS is a free and open-source web-based application available via GitHub. The focus of MetaLIMS is to store sample metadata prior to sequencing and analysis pipelines. Initially designed for environmental metagenomics labs, in addition to storing generic sample collection information and DNA/RNA processing information, the user can also add fields specific to the user's lab. MetaLIMS can also produce a basic sequencing submission form compatible with the proprietary Clarity LIMS system used by some sequencing facilities. To help ease the technical burden associated with web deployment, MetaLIMS offers the option of commercial web hosting combined with MetaLIMS bash scripts for ease of setup. Conclusions: MetaLIMS overcomes key challenges common in LIMS by giving labs access to a low-cost and open-source tool that also has the flexibility to meet individual lab needs and an option for easy deployment. By making the web application open source and hosting it on GitHub, we hope to encourage the community to build upon MetaLIMS, making it more robust and tailored to the needs of more researchers. PMID:28430964
MetaLIMS, a simple open-source laboratory information management system for small metagenomic labs.
Heinle, Cassie Elizabeth; Gaultier, Nicolas Paul Eugène; Miller, Dana; Purbojati, Rikky Wenang; Lauro, Federico M
2017-06-01
As the cost of sequencing continues to fall, smaller groups increasingly initiate and manage larger sequencing projects and take on the complexity of data storage for high volumes of samples. This has created a need for low-cost laboratory information management systems (LIMS) that contain flexible fields to accommodate the unique nature of individual labs. Many labs do not have a dedicated information technology position, so LIMS must also be easy to set up and maintain with minimal technical proficiency. MetaLIMS is a free and open-source web-based application available via GitHub. The focus of MetaLIMS is to store sample metadata prior to sequencing and analysis pipelines. Initially designed for environmental metagenomics labs, in addition to storing generic sample collection information and DNA/RNA processing information, the user can also add fields specific to the user's lab. MetaLIMS can also produce a basic sequencing submission form compatible with the proprietary Clarity LIMS system used by some sequencing facilities. To help ease the technical burden associated with web deployment, MetaLIMS offers the option of commercial web hosting combined with MetaLIMS bash scripts for ease of setup. MetaLIMS overcomes key challenges common in LIMS by giving labs access to a low-cost and open-source tool that also has the flexibility to meet individual lab needs and an option for easy deployment. By making the web application open source and hosting it on GitHub, we hope to encourage the community to build upon MetaLIMS, making it more robust and tailored to the needs of more researchers.
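The MetaLIMS records above stress two ideas: lab-defined metadata fields and export of a submission sheet for the sequencing facility. The sketch below shows one common way to get that flexibility, an entity-attribute-value table feeding a CSV export; this is not MetaLIMS code, and the field names and CSV columns are hypothetical.

```python
import csv
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, name TEXT, collected_on TEXT);
-- One row per user-defined field, so each lab can add its own metadata keys.
CREATE TABLE sample_field (sample_id INTEGER, field TEXT, value TEXT);
""")

con.execute("INSERT INTO sample VALUES (1, 'rooftop_dust_01', '2016-11-02')")
con.executemany("INSERT INTO sample_field VALUES (?, ?, ?)", [
    (1, "extraction_kit", "PowerSoil"),
    (1, "storage_temp_c", "-80"),
])

# Flatten the flexible fields into a simple submission sheet.
with open("submission.csv", "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(["sample_name", "field", "value"])
    for row in con.execute("""
            SELECT s.name, f.field, f.value
            FROM sample s JOIN sample_field f USING (sample_id)"""):
        writer.writerow(row)
```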
Muangkram, Yuttamol; Amano, Akira; Wajjwalku, Worawidh; Pinyopummintr, Tanu; Thongtip, Nikorn; Kaolim, Nongnid; Sukmak, Manakorn; Kamolnorranath, Sumate; Siriaroonrat, Boripat; Tipkantha, Wanlaya; Maikaew, Umaporn; Thomas, Warisara; Polsrila, Kanda; Dongsaard, Kwanreaun; Sanannu, Saowaphang; Wattananorrasate, Anuwat
2017-07-01
The Asian tapir (Tapirus indicus) has been classified as Endangered on the IUCN Red List of Threatened Species (2008). Genetic diversity data provide important information for the management of captive breeding and conservation of this species. We analyzed mitochondrial control region (CR) sequences from 37 captive Asian tapirs in Thailand. Multiple alignment of the full-length CR sequences (1268 bp) revealed three domains, as described in other mammal species. Analysis of 16 parsimony-informative variable sites revealed 11 haplotypes. Furthermore, phylogenetic analysis using a median-joining network clearly showed three clades, consistent with our earlier cytochrome b gene study in this endangered species. The repetitive motif is located between the first and second conserved sequence blocks, similar to the Brazilian tapir. The most polymorphic sites were located in the extended termination-associated sequences domain. These results could inform future genetic management of the species, both in captivity and in the wild, to help maintain stable populations.
Pilot Performance on New ATM Operations: Maintaining In-Trail Separation and Arrival Sequencing
NASA Technical Reports Server (NTRS)
Pritchett, Amy R.; Yankosky, L. J.; Johnson, Walter (Technical Monitor)
1999-01-01
Cockpit Display of Traffic Information (CDTI) may enable new Air Traffic Management (ATM) operations. However, CDTI is not the only source of traffic information in the cockpit; ATM procedures may provide information, implicitly and explicitly, about other aircraft. An experiment investigated pilot ability to perform two new ATM operations - maintaining in-trail separation from another aircraft and sequencing into an arrival stream. In the experiment, pilots were provided different amounts of information from displays and procedures. The results are described.
NG6: Integrated next generation sequencing storage and processing environment.
Mariette, Jérôme; Escudié, Frédéric; Allias, Nicolas; Salin, Gérald; Noirot, Céline; Thomas, Sylvain; Klopp, Christophe
2012-09-09
Next generation sequencing platforms are now well established in sequencing centres and some laboratories. Upcoming smaller scale machines such as the 454 junior from Roche or the MiSeq from Illumina will increase the number of laboratories hosting a sequencer. In such a context, it is important to provide these teams with an easily manageable environment to store and process the produced reads. We describe a user-friendly information system able to manage large sets of sequencing data. It includes, on one hand, a workflow environment already containing pipelines adapted to different input formats (sff, fasta, fastq and qseq), different sequencers (Roche 454, Illumina HiSeq) and various analyses (quality control, assembly, alignment, diversity studies,…) and, on the other hand, a secure web site giving access to the results. The connected user will be able to download raw and processed data and browse through the analysis result statistics. The provided workflows can easily be modified or extended and new ones can be added. Ergatis is used as a workflow building, running and monitoring system. The analyses can be run locally or in a cluster environment using Sun Grid Engine. NG6 is a complete information system designed to answer the needs of a sequencing platform. It provides a user-friendly interface to process, store and download high-throughput sequencing data.
Amadoz, Alicia; González-Candelas, Fernando
2007-04-20
Most research scientists working in the fields of molecular epidemiology, population and evolutionary genetics are confronted with the management of large volumes of data. Moreover, the data used in studies of infectious diseases are complex and usually derive from different institutions such as hospitals or laboratories. Since no public database scheme incorporating clinical and epidemiological information about patients and molecular information about pathogens is currently available, we have developed an information system, composed by a main database and a web-based interface, which integrates both types of data and satisfies requirements of good organization, simple accessibility, data security and multi-user support. From the moment a patient arrives to a hospital or health centre until the processing and analysis of molecular sequences obtained from infectious pathogens in the laboratory, lots of information is collected from different sources. We have divided the most relevant data into 12 conceptual modules around which we have organized the database schema. Our schema is very complete and it covers many aspects of sample sources, samples, laboratory processes, molecular sequences, phylogenetics results, clinical tests and results, clinical information, treatments, pathogens, transmissions, outbreaks and bibliographic information. Communication between end-users and the selected Relational Database Management System (RDMS) is carried out by default through a command-line window or through a user-friendly, web-based interface which provides access and management tools for the data. epiPATH is an information system for managing clinical and molecular information from infectious diseases. It facilitates daily work related to infectious pathogens and sequences obtained from them. This software is intended for local installation in order to safeguard private data and provides advanced SQL-users the flexibility to adapt it to their needs. The database schema, tool scripts and web-based interface are free software but data stored in our database server are not publicly available. epiPATH is distributed under the terms of GNU General Public License. More details about epiPATH can be found at http://genevo.uv.es/epipath.
Cloud-based adaptive exon prediction for DNA analysis.
Putluri, Srinivasareddy; Zia Ur Rahman, Md; Fathima, Shaik Yasmeen
2018-02-01
Cloud computing offers significant research and economic benefits to healthcare organisations. Cloud services provide a safe place for storing and managing large amounts of sensitive data. Under the conventional flow of gene information, gene sequence laboratories send out raw and inferred information via the Internet to several sequence libraries. DNA sequencing storage costs will be minimised by the use of cloud services. In this study, the authors put forward a novel genomic informatics system using Amazon Cloud Services, where genomic sequence information is stored and accessed for processing. True identification of exon regions in a DNA sequence is a key task in bioinformatics, which helps in disease identification and drug design. The three-base periodicity property of exons forms the basis of all exon identification techniques. Adaptive signal processing techniques have been found to be promising in comparison with several other methods. Several adaptive exon predictors (AEPs) are developed using variable normalised least mean square and its maximum normalised variants to reduce computational complexity. Finally, performance evaluation of the various AEPs is carried out based on measures such as sensitivity, specificity and precision, using various standard genomic datasets taken from the National Center for Biotechnology Information genomic sequence database.
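The two ingredients this abstract builds on, an adaptive filter with a normalized LMS update and the three-base periodicity of exons, can be sketched as follows. This is a simplified illustration, not the authors' AEP: the window length, the choice of desired signal, and the toy sequence are assumptions for demonstration only.

```python
import numpy as np

def binary_indicators(seq):
    """Map a DNA string to four 0/1 indicator signals, one per base."""
    seq = seq.upper()
    return {b: np.array([1.0 if c == b else 0.0 for c in seq]) for b in "ACGT"}

def nlms_error(x, d, taps=6, mu=0.5, eps=1e-6):
    """Normalized LMS: adapt weights so recent samples of x predict d[n].
    Returns the error signal; its energy drops where prediction works well."""
    w = np.zeros(taps)
    e = np.zeros(len(d))
    for n in range(taps, len(d)):
        u = x[n - taps:n][::-1]                 # most recent input samples first
        e[n] = d[n] - w @ u
        w += (mu / (eps + u @ u)) * e[n] * u    # normalized step size
    return e

def period3_power(seq, win=120):
    """Sliding-window spectral power at period 3, the classic exon signature."""
    ind = binary_indicators(seq)
    k = np.exp(-2j * np.pi * np.arange(win) / 3.0)   # DFT basis at frequency 2*pi/3
    power = np.zeros(len(seq))
    for start in range(len(seq) - win):
        p = sum(abs(ind[b][start:start + win] @ k) ** 2 for b in "ACGT")
        power[start + win // 2] = p
    return power

# Toy usage: in a periodic ('exon-like') stretch the A-indicator correlates with
# itself three positions back, so an NLMS predictor fed a 3-delayed copy has low error.
seq = "ATGGCC" * 60 + "TTATCGGAACT" * 30
a = binary_indicators(seq)["A"]
err = nlms_error(np.concatenate([np.zeros(3), a[:-3]]), a)
p3 = period3_power(seq)
```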
Chandler, Natalie; Best, Sunayna; Hayward, Jane; Faravelli, Francesca; Mansour, Sahar; Kivuva, Emma; Tapon, Dagmar; Male, Alison; DeVile, Catherine; Chitty, Lyn S
2018-03-29
Purpose: Unexpected fetal abnormalities occur in 2-5% of pregnancies. While traditional cytogenetic and microarray approaches achieve diagnosis in around 40% of cases, lack of diagnosis in others impedes parental counseling, informed decision making, and pregnancy management. Postnatally, exome sequencing yields high diagnostic rates, but relies on careful phenotyping to interpret genotype results. Here we used a multidisciplinary approach to explore the utility of rapid fetal exome sequencing for prenatal diagnosis using skeletal dysplasias as an exemplar. Methods: Parents in pregnancies undergoing invasive testing because of sonographic fetal abnormalities, where multidisciplinary review considered skeletal dysplasia a likely etiology, were consented for exome trio sequencing (both parents and fetus). Variant interpretation focused on a virtual panel of 240 genes known to cause skeletal dysplasias. Results: Definitive molecular diagnosis was made in 13/16 (81%) cases. In some cases, fetal ultrasound findings alone were of sufficient severity for parents to opt for termination. In others, molecular diagnosis informed accurate prediction of outcome, improved parental counseling, and enabled parents to terminate or continue the pregnancy with certainty. Conclusion: Trio sequencing with expert multidisciplinary review for case selection and data interpretation yields timely, high diagnostic rates in fetuses presenting with unexpected skeletal abnormalities. This improves parental counseling and pregnancy management. Genetics in Medicine advance online publication, 29 March 2018; doi:10.1038/gim.2018.30.
USDA-ARS?s Scientific Manuscript database
Little information is available about management practices effect on net global warming potential (GWP) and greenhouse gas intensity (GHGI) under dryland cropping systems. We evaluated the effects of cropping sequences (conventional till malt barley-fallow [CTB-F], no-till malt barley-pea [NTB-P], a...
Kang, Wenjun; Kadri, Sabah; Puranik, Rutika; Wurst, Michelle N; Patil, Sushant A; Mujacic, Ibro; Benhamed, Sonia; Niu, Nifang; Zhen, Chao Jie; Ameti, Bekim; Long, Bradley C; Galbo, Filipo; Montes, David; Iracheta, Crystal; Gamboa, Venessa L; Lopez, Daisy; Yourshaw, Michael; Lawrence, Carolyn A; Aisner, Dara L; Fitzpatrick, Carrie; McNerney, Megan E; Wang, Y Lynn; Andrade, Jorge; Volchenboum, Samuel L; Furtado, Larissa V; Ritterhouse, Lauren L; Segal, Jeremy P
2018-04-24
Next-generation sequencing (NGS) diagnostic assays increasingly are becoming the standard of care in oncology practice. As the scale of an NGS laboratory grows, management of these assays requires organizing large amounts of information, including patient data, laboratory processes, genomic data, as well as variant interpretation and reporting. Although several Laboratory Information Systems and/or Laboratory Information Management Systems are commercially available, they may not meet all of the needs of a given laboratory, in addition to being frequently cost-prohibitive. Herein, we present the System for Informatics in the Molecular Pathology Laboratory, a free and open-source Laboratory Information System/Laboratory Information Management System for academic and nonprofit molecular pathology NGS laboratories, developed at the Genomic and Molecular Pathology Division at the University of Chicago Medicine. The System for Informatics in the Molecular Pathology Laboratory was designed as a modular end-to-end information system to handle all stages of the NGS laboratory workload from test order to reporting. We describe the features of the system, its clinical validation at the Genomic and Molecular Pathology Division at the University of Chicago Medicine, and its installation and testing within a different academic center laboratory (University of Colorado), and we propose a platform for future community co-development and interlaboratory data sharing.
Zhang, Jianwei; Kudrna, Dave; Mu, Ting; Li, Weiming; Copetti, Dario; Yu, Yeisoo; Goicoechea, Jose Luis; Lei, Yang; Wing, Rod A
2016-10-15
Next generation sequencing technologies have revolutionized our ability to rapidly and affordably generate vast quantities of sequence data. Once generated, raw sequences are assembled into contigs or scaffolds. However, these assemblies are mostly fragmented and inaccurate at the whole genome scale, largely due to the inability to integrate additional informative datasets (e.g. physical, optical and genetic maps). To address this problem, we developed a semi-automated software tool, Genome Puzzle Master (GPM), that enables the integration of additional genomic signposts to edit and build 'new-gen-assemblies' that result in high-quality 'annotation-ready' pseudomolecules. With GPM, loaded datasets can be connected to each other via their logical relationships which accomplishes tasks to 'group,' 'merge,' 'order and orient' sequences in a draft assembly. Manual editing can also be performed with a user-friendly graphical interface. Final pseudomolecules reflect a user's total data package and are available for long-term project management. GPM is a web-based pipeline and an important part of a Laboratory Information Management System (LIMS) which can be easily deployed on local servers for any genome research laboratory. The GPM (with LIMS) package is available at https://github.com/Jianwei-Zhang/LIMS. Contacts: jzhang@mail.hzau.edu.cn or rwing@mail.arizona.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Drafting Lab Management Guide.
ERIC Educational Resources Information Center
Ohio State Univ., Columbus. Instructional Materials Lab.
This manual was developed to guide drafting instructors and vocational supervisors in sequencing laboratory instruction and controlling the flow of work for a 2-year drafting training program. The first part of the guide provides information on program management (program description, safety concerns, academic issues, implementation…
Mackey, Aaron J; Pearson, William R
2004-10-01
Relational databases are designed to integrate diverse types of information and manage large sets of search results, greatly simplifying genome-scale analyses. Relational databases are essential for management and analysis of large-scale sequence analyses, and can also be used to improve the statistical significance of similarity searches by focusing on subsets of sequence libraries most likely to contain homologs. This unit describes using relational databases to improve the efficiency of sequence similarity searching and to demonstrate various large-scale genomic analyses of homology-related data. This unit describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. These include basic use of the database to generate a novel sequence library subset, how to extend and use seqdb_demo for the storage of sequence similarity search results and making use of various kinds of stored search results to address aspects of comparative genomic analysis.
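A sketch of the unit's central idea, restricting a similarity search to a library subset chosen by annotation, appears below: sequences are stored with a taxonomy label in SQLite and the selected subset is written out as FASTA for the search program. The schema, accessions, and labels are illustrative assumptions, not the seqdb_demo schema described in the unit.

```python
import sqlite3
from textwrap import wrap

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE protein (acc TEXT PRIMARY KEY, taxon TEXT, seq TEXT)")
con.executemany("INSERT INTO protein VALUES (?, ?, ?)", [
    ("P00001", "Saccharomyces cerevisiae", "MSTNPKPQRKTKRNTNRRPQDVK"),
    ("P00002", "Homo sapiens",             "MEEPQSDPSVEPPLSQETFSDLWK"),
    ("P00003", "Saccharomyces cerevisiae", "MVLTIYPDELVQIVSDKIASNK"),
])

# Export only the fungal subset; searching fewer, more relevant sequences improves
# both run time and the statistical significance of the reported hits.
with open("yeast_subset.fasta", "w") as fh:
    for acc, seq in con.execute(
            "SELECT acc, seq FROM protein WHERE taxon = ?",
            ("Saccharomyces cerevisiae",)):
        fh.write(f">{acc}\n" + "\n".join(wrap(seq, 60)) + "\n")
```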
33 CFR 385.30 - Master Implementation Sequencing Plan.
Code of Federal Regulations, 2010 CFR
2010-07-01
... projects of the Plan, including pilot projects and operational elements, based on the best scientific... Florida Water Management District shall also consult with the South Florida Ecosystem Restoration Task...; (ii) Information obtained from pilot projects; (iii) Updated funding information; (iv) Approved...
Machine Trades Lab Management Guide.
ERIC Educational Resources Information Center
Ohio State Univ., Columbus. Instructional Materials Lab.
This manual was developed to guide machine trades instructors and vocational supervisors in sequencing laboratory instruction and controlling the flow of work for a 2-year machine trades training program. The first part of the guide provides information on program management (program description, safety concerns, academic issues, implementation…
MyLabStocks: a web-application to manage molecular biology materials
Chuffart, Florent; Yvert, Gaël
2014-01-01
Laboratory stocks are the hardware of research. They must be stored and managed with minimum loss of material and information. Plasmids, oligonucleotides and strains are regularly exchanged between collaborators within and between laboratories. Managing and sharing information about every item is crucial for retrieval of reagents, for planning experiments and for reproducing past experimental results. We have developed a web-based application to manage stocks commonly used in a molecular biology laboratory. Its functionalities include user-defined privileges, visualization of plasmid maps directly from their sequence and the capacity to search items from fields of annotation or directly from a query sequence using BLAST. It is designed to handle records of plasmids, oligonucleotides, yeast strains, antibodies, pipettes and notebooks. Based on PHP/MySQL, it can easily be extended to handle other types of stocks and it can be installed on any server architecture. MyLabStocks is freely available from: https://forge.cbp.ens-lyon.fr/redmine/projects/mylabstocks under an open source licence. PMID:24643870
Cloud-based adaptive exon prediction for DNA analysis
Putluri, Srinivasareddy; Fathima, Shaik Yasmeen
2018-01-01
Cloud computing offers significant research and economic benefits to healthcare organisations. Cloud services provide a safe place for storing and managing large amounts of sensitive data. Under the conventional flow of gene information, gene sequence laboratories send out raw and inferred information via the Internet to several sequence libraries. DNA sequencing storage costs will be minimised by the use of cloud services. In this study, the authors put forward a novel genomic informatics system using Amazon Cloud Services, where genomic sequence information is stored and accessed for processing. True identification of exon regions in a DNA sequence is a key task in bioinformatics, which helps in disease identification and drug design. The three-base periodicity property of exons forms the basis of all exon identification techniques. Adaptive signal processing techniques have been found to be promising in comparison with several other methods. Several adaptive exon predictors (AEPs) are developed using variable normalised least mean square and its maximum normalised variants to reduce computational complexity. Finally, performance evaluation of the various AEPs is carried out based on measures such as sensitivity, specificity and precision, using various standard genomic datasets taken from the National Center for Biotechnology Information genomic sequence database. PMID:29515813
Aerobraking Maneuver (ABM) Report Generator
NASA Technical Reports Server (NTRS)
Fisher, Forrest; Gladden, Roy; Khanampornpan, Teerapat
2008-01-01
abmREPORT Version 3.1 is a Perl script that extracts vital summarization information from the Mars Reconnaissance Orbiter (MRO) aerobraking ABM build process. This information facilitates sequence reviews, and provides a high-level summarization of the sequence for mission management. The script extracts information from the ENV, SSF, FRF, SCMFmax, and OPTG files and burn magnitude configuration files and presents them in a single, easy-to-check report that provides the majority of the parameters necessary for cross check and verification during the sequence review process. This means that needed information, formerly spread across a number of different files and each in a different format, is all available in this one application. This program is built on the capabilities developed in dragReport and then the scripts evolved as the two tools continued to be developed in parallel.
Update on Genomic Databases and Resources at the National Center for Biotechnology Information.
Tatusova, Tatiana
2016-01-01
The National Center for Biotechnology Information (NCBI), as a primary public repository of genomic sequence data, collects and maintains enormous amounts of heterogeneous data. Data for genomes, genes, gene expression, gene variation, gene families, proteins, and protein domains are integrated with the analytical, search, and retrieval resources through the NCBI website. Entrez, NCBI's text-based search and retrieval system, provides a fast and easy way to navigate across diverse biological databases. Comparative genome analysis tools lead to further understanding of evolutionary processes, quickening the pace of discovery. Recent technological innovations have ignited an explosion in genome sequencing that has fundamentally changed our understanding of the biology of living organisms. This huge increase in DNA sequence data presents new challenges for information management systems and visualization tools. New strategies have been designed to bring order to this genome sequence shockwave and improve the usability of associated data.
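For programmatic access to the NCBI resources this update describes, a small, hedged example using Biopython's Entrez and SeqIO wrappers around the public E-utilities follows; Biopython is one widely used client and is not named in the record, the e-mail address is a placeholder, and the accession is simply an example record.

```python
from Bio import Entrez, SeqIO

Entrez.email = "you@example.org"   # placeholder; NCBI asks for a real contact address

# Fetch one nucleotide record in GenBank format and parse it into a SeqRecord.
handle = Entrez.efetch(db="nucleotide", id="NM_000546", rettype="gb", retmode="text")
record = SeqIO.read(handle, "genbank")
handle.close()

print(record.id, record.description)
print(len(record.seq), "bp,", len(record.features), "annotated features")
```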
Irizarry, Kristopher J L; Bryant, Doug; Kalish, Jordan; Eng, Curtis; Schmidt, Peggy L; Barrett, Gini; Barr, Margaret C
2016-01-01
Many endangered captive populations exhibit reduced genetic diversity resulting in health issues that impact reproductive fitness and quality of life. Numerous cost effective genomic sequencing and genotyping technologies provide unparalleled opportunity for incorporating genomics knowledge in management of endangered species. Genomic data, such as sequence data, transcriptome data, and genotyping data, provide critical information about a captive population that, when leveraged correctly, can be utilized to maximize population genetic variation while simultaneously reducing unintended introduction or propagation of undesirable phenotypes. Current approaches aimed at managing endangered captive populations utilize species survival plans (SSPs) that rely upon mean kinship estimates to maximize genetic diversity while simultaneously avoiding artificial selection in the breeding program. However, as genomic resources increase for each endangered species, the potential knowledge available for management also increases. Unlike model organisms in which considerable scientific resources are used to experimentally validate genotype-phenotype relationships, endangered species typically lack the necessary sample sizes and economic resources required for such studies. Even so, in the absence of experimentally verified genetic discoveries, genomics data still provides value. In fact, bioinformatics and comparative genomics approaches offer mechanisms for translating these raw genomics data sets into integrated knowledge that enable an informed approach to endangered species management.
Irizarry, Kristopher J. L.; Bryant, Doug; Kalish, Jordan; Eng, Curtis; Schmidt, Peggy L.; Barrett, Gini; Barr, Margaret C.
2016-01-01
Many endangered captive populations exhibit reduced genetic diversity resulting in health issues that impact reproductive fitness and quality of life. Numerous cost effective genomic sequencing and genotyping technologies provide unparalleled opportunity for incorporating genomics knowledge in management of endangered species. Genomic data, such as sequence data, transcriptome data, and genotyping data, provide critical information about a captive population that, when leveraged correctly, can be utilized to maximize population genetic variation while simultaneously reducing unintended introduction or propagation of undesirable phenotypes. Current approaches aimed at managing endangered captive populations utilize species survival plans (SSPs) that rely upon mean kinship estimates to maximize genetic diversity while simultaneously avoiding artificial selection in the breeding program. However, as genomic resources increase for each endangered species, the potential knowledge available for management also increases. Unlike model organisms in which considerable scientific resources are used to experimentally validate genotype-phenotype relationships, endangered species typically lack the necessary sample sizes and economic resources required for such studies. Even so, in the absence of experimentally verified genetic discoveries, genomics data still provides value. In fact, bioinformatics and comparative genomics approaches offer mechanisms for translating these raw genomics data sets into integrated knowledge that enable an informed approach to endangered species management. PMID:27376076
Information Flow Analysis of Level 4 Payload Processing Operations
NASA Technical Reports Server (NTRS)
Danz, Mary E.
1991-01-01
The Level 4 Mission Sequence Test (MST) was studied to develop strategies and recommendations to facilitate information flow. Recommendations developed as a result of this study include revised format of the Test and Assembly Procedure (TAP) document and a conceptualized software based system to assist in the management of information flow during the MST.
A public HTLV-1 molecular epidemiology database for sequence management and data mining.
Araujo, Thessika Hialla Almeida; Souza-Brito, Leandro Inacio; Libin, Pieter; Deforche, Koen; Edwards, Dustin; de Albuquerque-Junior, Antonio Eduardo; Vandamme, Anne-Mieke; Galvao-Castro, Bernardo; Alcantara, Luiz Carlos Junior
2012-01-01
It is estimated that 15 to 20 million people are infected with the human T-cell lymphotropic virus type 1 (HTLV-1). At present, there are more than 2,000 unique HTLV-1 isolate sequences published. A central database to aggregate sequence information from a range of epidemiological aspects including HTLV-1 infections, pathogenesis, origins, and evolutionary dynamics would be useful to scientists and physicians worldwide. Described here, we have developed a database that collects and annotates sequence data and can be accessed through a user-friendly search interface. The HTLV-1 Molecular Epidemiology Database website is available at http://htlv1db.bahia.fiocruz.br/. All data was obtained from publications available at GenBank or through contact with the authors. The database was developed using Apache Webserver 2.1.6 and SGBD MySQL. The webpage interfaces were developed in HTML and sever-side scripting written in PHP. The HTLV-1 Molecular Epidemiology Database is hosted on the Gonçalo Moniz/FIOCRUZ Research Center server. There are currently 2,457 registered sequences with 2,024 (82.37%) of those sequences representing unique isolates. Of these sequences, 803 (39.67%) contain information about clinical status (TSP/HAM, 17.19%; ATL, 7.41%; asymptomatic, 12.89%; other diseases, 2.17%; and no information, 60.32%). Further, 7.26% of sequences contain information on patient gender while 5.23% of sequences provide the age of the patient. The HTLV-1 Molecular Epidemiology Database retrieves and stores annotated HTLV-1 proviral sequences from clinical, epidemiological, and geographical studies. The collected sequences and related information are now accessible on a publically available and user-friendly website. This open-access database will support clinical research and vaccine development related to viral genotype.
Analysis and design of hospital management information system based on UML
NASA Astrophysics Data System (ADS)
Ma, Lin; Zhao, Huifang; You, Shi Jun; Ge, Wenyong
2018-05-01
With the rapid development of computer technology, computer information management systems have been adopted in many industries. A Hospital Information System (HIS) provides data for directors, lightens the workload of medical workers, and improves their efficiency. Based on HIS requirements analysis and system design, this paper focuses on using Unified Modeling Language (UML) models to establish the use case diagram, class diagram, sequence diagram and collaboration diagram, satisfying the demands of daily patient visits, inpatient care, drug management and other relevant operations. Finally, the paper summarizes the remaining problems of the system and offers an outlook for the HIS.
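To make the UML-to-implementation step concrete, here is a hedged sketch of how two classes from such a class diagram and one interaction from a sequence diagram might translate into code; the class names, attributes, and operations are assumptions for illustration, not the paper's actual model.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

@dataclass
class Prescription:
    drug: str
    dose: str

@dataclass
class Visit:
    visit_date: date
    diagnosis: str = ""
    prescriptions: List[Prescription] = field(default_factory=list)

@dataclass
class Patient:
    patient_id: str
    name: str
    visits: List[Visit] = field(default_factory=list)

    def register_visit(self, visit: Visit) -> None:
        """One step of the 'daily patient visit' use case: attach a new visit."""
        self.visits.append(visit)

# Usage mirroring a sequence-diagram interaction: registration creates a visit,
# the physician records a diagnosis, and the pharmacy module reads prescriptions.
p = Patient("P-0001", "Zhang San")
v = Visit(date.today())
p.register_visit(v)
v.diagnosis = "common cold"
v.prescriptions.append(Prescription("ibuprofen", "200 mg"))
print(p.name, "has", len(p.visits), "visit(s)")
```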
Creating databases for biological information: an introduction.
Stein, Lincoln
2002-08-01
The essence of bioinformatics is dealing with large quantities of information. Whether it be sequencing data, microarray data files, mass spectrometric data (e.g., fingerprints), the catalog of strains arising from an insertional mutagenesis project, or even large numbers of PDF files, there inevitably comes a time when the information can simply no longer be managed with files and directories. This is where databases come into play. This unit briefly reviews the characteristics of several database management systems, including flat file, indexed file, and relational databases, as well as ACeDB. It compares their strengths and weaknesses and offers some general guidelines for selecting an appropriate database management system.
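To make the flat-file-versus-relational distinction discussed in this unit concrete, here is a small illustrative sketch: the same clone lookup done by scanning tab-delimited text (flat-file style) and by querying an indexed relational table. SQLite stands in for a full relational DBMS, and the clone names and sequences are invented.

```python
# Illustrative sketch of the trade-off discussed above: a flat tab-delimited
# file must be scanned line by line, while a relational database answers the
# same question through an index.
import sqlite3

records = [("clone_001", "ATGC..."), ("clone_002", "GGCA...")]

# Flat-file style: linear scan of tab-delimited lines.
flat = "\n".join(f"{name}\t{seq}" for name, seq in records)
hit = next(line for line in flat.splitlines() if line.startswith("clone_002"))

# Relational style: the same lookup via a primary-key index.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE clone (name TEXT PRIMARY KEY, sequence TEXT)")
db.executemany("INSERT INTO clone VALUES (?, ?)", records)
row = db.execute("SELECT sequence FROM clone WHERE name = ?", ("clone_002",)).fetchone()
print(hit, row)
```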
Dryland soil chemical properties and crop yields affected by long-term tillage and cropping sequence
USDA-ARS?s Scientific Manuscript database
Information on the effect of long-term management on soil nutrients and chemical properties is scanty. We examined the 30-yr effect of tillage frequency and cropping sequence combination on dryland soil Olsen-P, K, Ca, Mg, Na, SO4-S, and Zn concentrations, pH, electrical conductivity (EC), and catio...
Zhang, Jianwei; Kudrna, Dave; Mu, Ting; Li, Weiming; Copetti, Dario; Yu, Yeisoo; Goicoechea, Jose Luis; Lei, Yang; Wing, Rod A.
2016-01-01
Abstract Motivation: Next generation sequencing technologies have revolutionized our ability to rapidly and affordably generate vast quantities of sequence data. Once generated, raw sequences are assembled into contigs or scaffolds. However, these assemblies are mostly fragmented and inaccurate at the whole genome scale, largely due to the inability to integrate additional informative datasets (e.g. physical, optical and genetic maps). To address this problem, we developed a semi-automated software tool—Genome Puzzle Master (GPM)—that enables the integration of additional genomic signposts to edit and build ‘new-gen-assemblies’ that result in high-quality ‘annotation-ready’ pseudomolecules. Results: With GPM, loaded datasets can be connected to each other via their logical relationships which accomplishes tasks to ‘group,’ ‘merge,’ ‘order and orient’ sequences in a draft assembly. Manual editing can also be performed with a user-friendly graphical interface. Final pseudomolecules reflect a user’s total data package and are available for long-term project management. GPM is a web-based pipeline and an important part of a Laboratory Information Management System (LIMS) which can be easily deployed on local servers for any genome research laboratory. Availability and Implementation: The GPM (with LIMS) package is available at https://github.com/Jianwei-Zhang/LIMS Contacts: jzhang@mail.hzau.edu.cn or rwing@mail.arizona.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27318200
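The "order and orient" step that GPM supports can be illustrated with a toy sketch; this is not GPM's actual data model or code. Each contig carries the genetic-map positions of markers it contains, and sorting by mean map position orders the contigs along a pseudomolecule. All contig names, marker names and coordinates below are invented.

```python
# Toy sketch of ordering draft contigs by external map evidence (names and
# numbers invented; GPM itself is a web-based LIMS component).
contigs = {
    "ctg_12": {"markers": {"mA": 3.1, "mB": 3.4}, "seq_len": 48_000},
    "ctg_07": {"markers": {"mC": 1.2},            "seq_len": 102_000},
    "ctg_33": {"markers": {"mD": 7.8, "mE": 8.0}, "seq_len": 25_000},
}

def mean_map_position(info):
    positions = info["markers"].values()
    return sum(positions) / len(positions)

# "Order" step: arrange contigs along the chromosome by map position.
ordered = sorted(contigs, key=lambda c: mean_map_position(contigs[c]))
print(ordered)   # ['ctg_07', 'ctg_12', 'ctg_33']
```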
The coffee genome hub: a resource for coffee genomes
Dereeper, Alexis; Bocs, Stéphanie; Rouard, Mathieu; Guignon, Valentin; Ravel, Sébastien; Tranchant-Dubreuil, Christine; Poncet, Valérie; Garsmeur, Olivier; Lashermes, Philippe; Droc, Gaëtan
2015-01-01
The whole genome sequence of Coffea canephora, the perennial diploid species known as Robusta, has been recently released. In the context of the C. canephora genome sequencing project and to support post-genomics efforts, we developed the Coffee Genome Hub (http://coffee-genome.org/), an integrative genome information system that allows centralized access to genomics and genetics data and analysis tools to facilitate translational and applied research in coffee. We provide the complete genome sequence of C. canephora along with gene structure, gene product information, metabolism, gene families, transcriptomics, syntenic blocks, genetic markers and genetic maps. The hub relies on generic software (e.g. GMOD tools) for easy querying, visualizing and downloading research data. It includes a Genome Browser enhanced by a Community Annotation System, enabling the improvement of automatic gene annotation through an annotation editor. In addition, the hub aims at developing interoperability among other existing South Green tools managing coffee data (phylogenomics resources, SNPs) and/or supporting data analyses with the Galaxy workflow manager. PMID:25392413
SMITH: a LIMS for handling next-generation sequencing workflows
2014-01-01
Background Life-science laboratories make increasing use of Next Generation Sequencing (NGS) for studying bio-macromolecules and their interactions. Array-based methods for measuring gene expression or protein-DNA interactions are being replaced by RNA-Seq and ChIP-Seq. Sequencing is generally performed by specialized facilities that have to keep track of sequencing requests, trace samples, ensure quality and make data available according to predefined privileges. An integrated tool helps to troubleshoot problems, maintain a high quality standard, and reduce time and costs. Commercial and non-commercial tools called LIMS (Laboratory Information Management Systems) are available for this purpose. However, they often come at prohibitive cost and/or lack the flexibility and scalability needed to adjust seamlessly to the frequently changing protocols employed. In order to manage the flow of sequencing data produced at the Genomic Unit of the Italian Institute of Technology (IIT), we developed SMITH (Sequencing Machine Information Tracking and Handling). Methods SMITH is a web application with a MySQL server at the backend. SMITH was developed by wet-lab scientists of the Centre for Genomic Science and database experts from the Politecnico of Milan in the context of a Genomic Data Model Project. The database schema stores all the information of an NGS experiment, including the descriptions of all protocols and algorithms used in the process. Notably, an attribute-value table allows associating an unconstrained textual description with each sample and all the data produced afterwards. This method permits the creation of metadata that can be used to search the database for specific files as well as for statistical analyses. Results SMITH runs automatically and limits direct human interaction mainly to administrative tasks. SMITH data-delivery procedures were standardized, making it easier for biologists and analysts to navigate the data. Automation also helps save time. The workflows are available through an API provided by the workflow management system. The parameters and input data are passed to the workflow engine that performs de-multiplexing, quality control, alignments, etc. Conclusions SMITH standardizes, automates, and speeds up sequencing workflows. Annotation of data with key-value pairs facilitates meta-analysis. PMID:25471934
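The attribute-value idea described above can be sketched in a few lines: free-form metadata attached to samples as key-value rows, so new descriptors need no schema change. The sketch uses SQLite for brevity (SMITH itself is backed by MySQL), and the table and field names are invented for illustration.

```python
# Sketch of an attribute-value (key-value) metadata table for samples.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sample (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("""
    CREATE TABLE sample_attribute (
        sample_id INTEGER REFERENCES sample(id),
        key   TEXT,
        value TEXT
    )
""")
db.execute("INSERT INTO sample VALUES (1, 'ChIP-Seq_H3K4me3_rep1')")
db.executemany(
    "INSERT INTO sample_attribute VALUES (?, ?, ?)",
    [(1, "antibody", "H3K4me3"), (1, "genome", "hg38"), (1, "protocol", "v2.1")],
)

# Searching by metadata is then a simple join on key-value pairs.
rows = db.execute("""
    SELECT s.name FROM sample s
    JOIN sample_attribute a ON a.sample_id = s.id
    WHERE a.key = 'antibody' AND a.value = 'H3K4me3'
""").fetchall()
print(rows)
```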
SMITH: a LIMS for handling next-generation sequencing workflows.
Venco, Francesco; Vaskin, Yuriy; Ceol, Arnaud; Muller, Heiko
2014-01-01
Life-science laboratories make increasing use of Next Generation Sequencing (NGS) for studying bio-macromolecules and their interactions. Array-based methods for measuring gene expression or protein-DNA interactions are being replaced by RNA-Seq and ChIP-Seq. Sequencing is generally performed by specialized facilities that have to keep track of sequencing requests, trace samples, ensure quality and make data available according to predefined privileges. An integrated tool helps to troubleshoot problems, maintain a high quality standard, and reduce time and costs. Commercial and non-commercial tools called LIMS (Laboratory Information Management Systems) are available for this purpose. However, they often come at prohibitive cost and/or lack the flexibility and scalability needed to adjust seamlessly to the frequently changing protocols employed. In order to manage the flow of sequencing data produced at the Genomic Unit of the Italian Institute of Technology (IIT), we developed SMITH (Sequencing Machine Information Tracking and Handling). SMITH is a web application with a MySQL server at the backend. SMITH was developed by wet-lab scientists of the Centre for Genomic Science and database experts from the Politecnico of Milan in the context of a Genomic Data Model Project. The database schema stores all the information of an NGS experiment, including the descriptions of all protocols and algorithms used in the process. Notably, an attribute-value table allows associating an unconstrained textual description with each sample and all the data produced afterwards. This method permits the creation of metadata that can be used to search the database for specific files as well as for statistical analyses. SMITH runs automatically and limits direct human interaction mainly to administrative tasks. SMITH data-delivery procedures were standardized, making it easier for biologists and analysts to navigate the data. Automation also helps save time. The workflows are available through an API provided by the workflow management system. The parameters and input data are passed to the workflow engine that performs de-multiplexing, quality control, alignments, etc. SMITH standardizes, automates, and speeds up sequencing workflows. Annotation of data with key-value pairs facilitates meta-analysis.
FOUNTAIN: A JAVA open-source package to assist large sequencing projects
Buerstedde, Jean-Marie; Prill, Florian
2001-01-01
Background Better automation, lower cost per reaction and a heightened interest in comparative genomics have led to a dramatic increase in DNA sequencing activities. Although the large sequencing projects of specialized centers are supported by in-house bioinformatics groups, many smaller laboratories face difficulties managing the appropriate processing and storage of their sequencing output. The challenges include documentation of clones, templates and sequencing reactions, and the storage, annotation and analysis of the large number of generated sequences. Results We describe here a new program, named FOUNTAIN, for the management of large sequencing projects. FOUNTAIN uses the JAVA computer language and data storage in a relational database. Starting with a collection of sequencing objects (clones), the program generates and stores information related to the different stages of the sequencing project using a web browser interface for user input. The generated sequences are subsequently imported and annotated based on BLAST searches against the public databases. In addition, simple algorithms to cluster sequences and determine putative polymorphic positions are implemented. Conclusions A simple, but flexible and scalable software package is presented to facilitate data generation and storage for large sequencing projects. FOUNTAIN is open source and largely platform- and database-independent, and we hope it will be improved and extended in a community effort. PMID:11591214
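One of the simple analyses mentioned above, detecting putative polymorphic positions within a cluster of sequences, can be sketched in a few lines. This is an illustrative stand-in (FOUNTAIN itself is implemented in Java), and the reads are invented: a column is flagged as polymorphic when more than one base occurs in it.

```python
# Illustrative sketch: scan aligned reads from one cluster for columns
# containing more than one base (putative polymorphic positions).
reads = [
    "ATGCCGTA",
    "ATGCCGTA",
    "ATGTCGTA",   # position 3 differs
]

def polymorphic_positions(aligned):
    length = len(aligned[0])
    assert all(len(r) == length for r in aligned), "reads must be aligned"
    return [i for i in range(length) if len({r[i] for r in aligned}) > 1]

print(polymorphic_positions(reads))   # [3]
```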
ERIC Educational Resources Information Center
Walther, Joseph B.; Van Der Heide, Brandon; Tong, Stephanie Tom; Carr, Caleb T.; Atkin, Charles K.
2010-01-01
This research explores a sequence of effects pertaining to the influence of relational goals on online information seeking, the use of information and arguments as relational management strategies in computer-mediated chat, and the intrapersonal attitude change resulting from these processes. Affinity versus disaffinity goals affected…
Brian J. Knaus; Richard Cronn; Aaron Liston; Kristine Pilgrim; Michael K. Schwartz
2011-01-01
Science-based wildlife management relies on genetic information to infer population connectivity and identify conservation units. The most commonly used genetic marker for characterizing animal biodiversity and identifying maternal lineages is the mitochondrial genome. Mitochondrial genotyping figures prominently in conservation and management plans, with much of the...
MyLabStocks: a web-application to manage molecular biology materials.
Chuffart, Florent; Yvert, Gaël
2014-05-01
Laboratory stocks are the hardware of research. They must be stored and managed with minimum loss of material and information. Plasmids, oligonucleotides and strains are regularly exchanged between collaborators within and between laboratories. Managing and sharing information about every item is crucial for retrieval of reagents, for planning experiments and for reproducing past experimental results. We have developed a web-based application to manage stocks commonly used in a molecular biology laboratory. Its functionalities include user-defined privileges, visualization of plasmid maps directly from their sequence and the capacity to search items from fields of annotation or directly from a query sequence using BLAST. It is designed to handle records of plasmids, oligonucleotides, yeast strains, antibodies, pipettes and notebooks. Based on PHP/MySQL, it can easily be extended to handle other types of stocks and it can be installed on any server architecture. MyLabStocks is freely available from: https://forge.cbp.ens-lyon.fr/redmine/projects/mylabstocks under an open source licence. © 2014 Laboratoire de Biologie Moleculaire de la Cellule CNRS. Yeast published by John Wiley & Sons, Ltd.
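The sequence-based retrieval feature described above can be illustrated with a toy sketch: find stock items whose sequence contains a query fragment. MyLabStocks uses BLAST for this; a naive substring scan stands in here, and the plasmid names and sequences are invented.

```python
# Toy stand-in for sequence-based stock retrieval (real system uses BLAST).
stocks = {
    "pFC-101": "atggtgagcaagggcgaggag",   # e.g. a fluorescent-protein plasmid
    "pFC-102": "atgaccatgattacgccaagc",
}

def find_by_sequence(query, items):
    q = query.lower()
    return [name for name, seq in items.items() if q in seq.lower()]

print(find_by_sequence("AAGGGCGAGG", stocks))   # ['pFC-101']
```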
Creating databases for biological information: an introduction.
Stein, Lincoln
2013-06-01
The essence of bioinformatics is dealing with large quantities of information. Whether it be sequencing data, microarray data files, mass spectrometric data (e.g., fingerprints), the catalog of strains arising from an insertional mutagenesis project, or even large numbers of PDF files, there inevitably comes a time when the information can simply no longer be managed with files and directories. This is where databases come into play. This unit briefly reviews the characteristics of several database management systems, including flat file, indexed file, relational databases, and NoSQL databases. It compares their strengths and weaknesses and offers some general guidelines for selecting an appropriate database management system. Copyright 2013 by JohnWiley & Sons, Inc.
Xu, Weijia; Ozer, Stuart; Gutell, Robin R
2009-01-01
With an increasingly large number of properly aligned sequences, comparative sequence analysis can accurately identify not only common structures formed by standard base pairing but also new types of structural elements and constraints. However, traditional methods are too computationally expensive to perform well on large-scale alignments and are less effective with sequences from diverse phylogenetic classifications. We propose a new approach that utilizes coevolution rates among pairs of nucleotide positions using the phylogenetic and evolutionary relationships of the organisms of the aligned sequences. With a novel data schema to manage relevant information within a relational database, our method, implemented with a Microsoft SQL Server 2005, showed 90% sensitivity in identifying base pair interactions among 16S ribosomal RNA sequences from Bacteria, at a scale 40 times larger than, and with 50% better sensitivity than, a previous study. The results also indicated covariation signals for a few sets of cross-strand base stacking pairs in secondary structure helices, and other subtle constraints in the RNA structure.
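The covariation idea behind this approach can be illustrated with a simpler stand-in: the study scores coevolution between alignment columns while accounting for phylogeny, whereas the sketch below just computes mutual information between two columns of a toy alignment, a common covariation measure. The alignment rows are invented.

```python
# Simplified covariation score: mutual information between two alignment columns.
from collections import Counter
from math import log2

def mutual_information(col_i, col_j):
    n = len(col_i)
    pi, pj = Counter(col_i), Counter(col_j)
    pij = Counter(zip(col_i, col_j))
    return sum(
        (c / n) * log2((c / n) / ((pi[a] / n) * (pj[b] / n)))
        for (a, b), c in pij.items()
    )

alignment = ["GC", "GC", "AU", "AU", "GU"]   # toy 2-column RNA alignment
col1 = [row[0] for row in alignment]
col2 = [row[1] for row in alignment]
print(round(mutual_information(col1, col2), 3))
```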
Xu, Weijia; Ozer, Stuart; Gutell, Robin R.
2010-01-01
With an increasingly large number of properly aligned sequences, comparative sequence analysis can accurately identify not only common structures formed by standard base pairing but also new types of structural elements and constraints. However, traditional methods are too computationally expensive to perform well on large-scale alignments and are less effective with sequences from diverse phylogenetic classifications. We propose a new approach that utilizes coevolution rates among pairs of nucleotide positions using the phylogenetic and evolutionary relationships of the organisms of the aligned sequences. With a novel data schema to manage relevant information within a relational database, our method, implemented with a Microsoft SQL Server 2005, showed 90% sensitivity in identifying base pair interactions among 16S ribosomal RNA sequences from Bacteria, at a scale 40 times larger than, and with 50% better sensitivity than, a previous study. The results also indicated covariation signals for a few sets of cross-strand base stacking pairs in secondary structure helices, and other subtle constraints in the RNA structure. PMID:20502534
Welch, Brandon M; Loya, Salvador Rodriguez; Eilbeck, Karen; Kawamoto, Kensaku
2014-04-04
Whole genome sequence (WGS) information may soon be widely available to help clinicians personalize the care and treatment of patients. However, considerable barriers exist, which may hinder the effective utilization of WGS information in a routine clinical care setting. Clinical decision support (CDS) offers a potential solution to overcome such barriers and to facilitate the effective use of WGS information in the clinic. However, genomic information is complex and will require significant considerations when developing CDS capabilities. As such, this manuscript lays out a conceptual framework for a CDS architecture designed to deliver WGS-guided CDS within the clinical workflow. To handle the complexity and breadth of WGS information, the proposed CDS framework leverages service-oriented capabilities and orchestrates the interaction of several independently-managed components. These independently-managed components include the genome variant knowledge base, the genome database, the CDS knowledge base, a CDS controller and the electronic health record (EHR). A key design feature is that genome data can be stored separately from the EHR. This paper describes in detail: (1) each component of the architecture; (2) the interaction of the components; and (3) how the architecture attempts to overcome the challenges associated with WGS information. We believe that service-oriented CDS capabilities will be essential to using WGS information for personalized medicine.
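The service-oriented pattern described above, with a CDS controller orchestrating independently managed components, can be sketched conceptually as follows. This is not the authors' system: all component contents, the drug name, the variant and the rule are invented placeholders, and each dictionary merely stands in for a separately hosted service.

```python
# Conceptual sketch of a CDS controller orchestrating independent components.
GENOME_DB  = {"patient_42": {"CYP2C19*2": "hom"}}            # genome database
VARIANT_KB = {"CYP2C19*2": "loss_of_function"}               # variant knowledge base
CDS_KB     = {("clopidogrel", "loss_of_function"):
              "Consider alternative antiplatelet therapy."}  # CDS knowledge base

def cds_controller(patient_id, drug_order):
    """Orchestrate the components for one medication order coming from the EHR."""
    variants = GENOME_DB.get(patient_id, {})
    for variant in variants:
        effect = VARIANT_KB.get(variant)
        advice = CDS_KB.get((drug_order, effect))
        if advice:
            return advice          # returned to the EHR as an alert
    return "No genomic guidance."

print(cds_controller("patient_42", "clopidogrel"))
```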
Welch, Brandon M.; Rodriguez Loya, Salvador; Eilbeck, Karen; Kawamoto, Kensaku
2014-01-01
Whole genome sequence (WGS) information may soon be widely available to help clinicians personalize the care and treatment of patients. However, considerable barriers exist, which may hinder the effective utilization of WGS information in a routine clinical care setting. Clinical decision support (CDS) offers a potential solution to overcome such barriers and to facilitate the effective use of WGS information in the clinic. However, genomic information is complex and will require significant considerations when developing CDS capabilities. As such, this manuscript lays out a conceptual framework for a CDS architecture designed to deliver WGS-guided CDS within the clinical workflow. To handle the complexity and breadth of WGS information, the proposed CDS framework leverages service-oriented capabilities and orchestrates the interaction of several independently-managed components. These independently-managed components include the genome variant knowledge base, the genome database, the CDS knowledge base, a CDS controller and the electronic health record (EHR). A key design feature is that genome data can be stored separately from the EHR. This paper describes in detail: (1) each component of the architecture; (2) the interaction of the components; and (3) how the architecture attempts to overcome the challenges associated with WGS information. We believe that service-oriented CDS capabilities will be essential to using WGS information for personalized medicine. PMID:25411644
USDA-ARS?s Scientific Manuscript database
It is estimated that food-borne pathogens cause approximately 76 million cases of gastrointestinal illnesses, 325,000 hospitalizations, and 5,000 deaths in the United States annually. Genomic, proteomic, and metabolomic studies, particularly, genome sequencing projects are providing valuable inform...
Development of Genomic Simple Sequence Repeats (SSR) by Enrichment Libraries in Date Palm.
Al-Faifi, Sulieman A; Migdadi, Hussein M; Algamdi, Salem S; Khan, Mohammad Altaf; Al-Obeed, Rashid S; Ammar, Megahed H; Jakse, Jerenj
2017-01-01
Development of highly informative markers such as simple sequence repeats (SSR) for cultivar identification and germplasm characterization and management is essential for date palm genetic studies. The present study documents the development of SSR markers and assesses genetic relationships of commonly grown date palm (Phoenix dactylifera L.) cultivars in different geographical regions of Saudi Arabia. A total of 93 novel simple sequence repeat (SSR) markers were screened for their ability to detect polymorphism in date palm. Around 71% of the genomic SSRs are dinucleotide, 25% trinucleotide, 3% tetranucleotide, and 1% pentanucleotide motifs, and they show 100% polymorphism. Unweighted Pair Group Method with Arithmetic Mean (UPGMA) cluster analysis illustrates that cultivars tend to group according to their class of maturity, region of cultivation, and fruit color. Analysis of molecular variance (AMOVA) reveals genetic variation among and within cultivars of 27% and 73%, respectively, according to the geographical distribution of the cultivars. The developed microsatellite markers add value to date palm characterization and are tools which can be used by researchers in population genetics, cultivar identification, as well as genetic resource exploration and management. The cultivars tested exhibited a significant amount of genetic diversity and could be suitable for successful breeding programs. Genomic sequences generated from this study are available at the National Center for Biotechnology Information (NCBI) Sequence Read Archive (accession number LIBGSS_039019).
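The UPGMA clustering step mentioned above can be reproduced in a few lines once a pairwise genetic-distance matrix is available; SciPy's "average" linkage criterion is UPGMA. The cultivar names below are merely examples and the distance values are invented, not the study's data.

```python
# UPGMA clustering of cultivars from a pairwise distance matrix (values invented).
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform

cultivars = ["Khalas", "Sukkari", "Barhi", "Ajwa"]
dist = np.array([
    [0.00, 0.12, 0.35, 0.40],
    [0.12, 0.00, 0.30, 0.38],
    [0.35, 0.30, 0.00, 0.15],
    [0.40, 0.38, 0.15, 0.00],
])

tree = linkage(squareform(dist), method="average")   # 'average' linkage = UPGMA
print(dendrogram(tree, labels=cultivars, no_plot=True)["ivl"])  # leaf order
```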
Niemiec, Emilia; Borry, Pascal; Pinxten, Wim; Howard, Heidi Carmen
2016-12-01
Whole exome sequencing (WES) and whole genome sequencing (WGS) have become increasingly available in the research and clinical settings and are now also being offered by direct-to-consumer (DTC) genetic testing (GT) companies. This offer can be perceived as amplifying the already identified concerns regarding adequacy of informed consent (IC) for both WES/WGS and the DTC GT context. We performed a qualitative content analysis of Websites of four companies offering WES/WGS DTC regarding the following elements of IC: pre-test counseling, benefits and risks, and incidental findings (IFs). The analysis revealed concerns, including the potential lack of pre-test counseling in three of the companies studied, missing relevant information in the risks and benefits sections, and potentially misleading information for consumers. Regarding IFs, only one company, which provides opportunistic screening, provides basic information about their management. In conclusion, some of the information (and related practices) present on the companies' Web pages salient to the consent process are not adequate in reference to recommendations for IC for WGS or WES in the clinical context. Requisite resources should be allocated to ensure that commercial companies are offering high-throughput sequencing under responsible conditions, including an adequate consent process. © 2016 WILEY PERIODICALS, INC.
He, Ji; Dai, Xinbin; Zhao, Xuechun
2007-02-09
BLAST searches are widely used for sequence alignment. The search results are commonly adopted for various functional and comparative genomics tasks such as annotating unknown sequences, investigating gene models and comparing two sequence sets. Advances in sequencing technologies pose challenges for high-throughput analysis of large-scale sequence data. A number of programs and hardware solutions exist for efficient BLAST searching, but there is a lack of generic software solutions for mining and personalized management of the results. Systematically reviewing the results and identifying information of interest remains tedious and time-consuming. Personal BLAST Navigator (PLAN) is a versatile web platform that helps users to carry out various personalized pre- and post-BLAST tasks, including: (1) query and target sequence database management, (2) automated high-throughput BLAST searching, (3) indexing and searching of results, (4) filtering results online, (5) managing results of personal interest in favorite categories, (6) automated sequence annotation (such as NCBI NR and ontology-based annotation). PLAN integrates, by default, the Decypher hardware-based BLAST solution provided by Active Motif Inc., which offers greatly improved efficiency over conventional BLAST software. BLAST results are visualized by spreadsheets and graphs and are full-text searchable. BLAST results and sequence annotations can be exported, in part or in full, in various formats including Microsoft Excel and FASTA. Sequences and BLAST results are organized in projects, the data publication levels of which are controlled by the registered project owners. In addition, all analytical functions are provided to public users without registration. PLAN has proved a valuable addition to the community for automated high-throughput BLAST searches, and, more importantly, for knowledge discovery, management and sharing based on sequence alignment results. The PLAN web interface is platform-independent, easily configurable, readily extensible and user-intuitive. PLAN is freely available to academic users at http://bioinfo.noble.org/plan/. The source code for local deployment is provided under free license. Full support for system utilization, installation, configuration and customization is provided to academic users.
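Post-BLAST filtering of the kind PLAN automates can be illustrated with a short sketch that parses tabular (outfmt 6 style) hits and keeps those above an identity cutoff and below an e-value cutoff. The hit lines and cutoff values below are fabricated examples, not PLAN's code.

```python
# Illustrative post-BLAST filtering of tabular (outfmt 6 style) output.
blast_tab = """\
query1\tAT1G01010.1\t98.5\t250\t3\t0\t1\t250\t1\t250\t1e-120\t430
query1\tAT1G01020.1\t62.0\t240\t90\t2\t5\t244\t10\t248\t2e-30\t130
"""

def filter_hits(tab_text, min_identity=90.0, max_evalue=1e-50):
    hits = []
    for line in tab_text.strip().splitlines():
        fields = line.split("\t")
        identity, evalue = float(fields[2]), float(fields[10])
        if identity >= min_identity and evalue <= max_evalue:
            hits.append({"query": fields[0], "subject": fields[1],
                         "identity": identity, "evalue": evalue})
    return hits

print(filter_hits(blast_tab))   # keeps only the high-identity, low-e-value hit
```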
He, Ji; Dai, Xinbin; Zhao, Xuechun
2007-01-01
Background BLAST searches are widely used for sequence alignment. The search results are commonly adopted for various functional and comparative genomics tasks such as annotating unknown sequences, investigating gene models and comparing two sequence sets. Advances in sequencing technologies pose challenges for high-throughput analysis of large-scale sequence data. A number of programs and hardware solutions exist for efficient BLAST searching, but there is a lack of generic software solutions for mining and personalized management of the results. Systematically reviewing the results and identifying information of interest remains tedious and time-consuming. Results Personal BLAST Navigator (PLAN) is a versatile web platform that helps users to carry out various personalized pre- and post-BLAST tasks, including: (1) query and target sequence database management, (2) automated high-throughput BLAST searching, (3) indexing and searching of results, (4) filtering results online, (5) managing results of personal interest in favorite categories, (6) automated sequence annotation (such as NCBI NR and ontology-based annotation). PLAN integrates, by default, the Decypher hardware-based BLAST solution provided by Active Motif Inc., which offers greatly improved efficiency over conventional BLAST software. BLAST results are visualized by spreadsheets and graphs and are full-text searchable. BLAST results and sequence annotations can be exported, in part or in full, in various formats including Microsoft Excel and FASTA. Sequences and BLAST results are organized in projects, the data publication levels of which are controlled by the registered project owners. In addition, all analytical functions are provided to public users without registration. Conclusion PLAN has proved a valuable addition to the community for automated high-throughput BLAST searches, and, more importantly, for knowledge discovery, management and sharing based on sequence alignment results. The PLAN web interface is platform-independent, easily configurable, readily extensible and user-intuitive. PLAN is freely available to academic users at . The source code for local deployment is provided under free license. Full support for system utilization, installation, configuration and customization is provided to academic users. PMID:17291345
Heinen, Christopher D
2016-02-01
We have currently entered a genomic era of cancer research which may soon lead to a genomic era of cancer treatment. Patient DNA sequencing information may lead to a personalized approach to managing an individual's cancer as well as future cancer risk. The success of this approach, however, begins not necessarily in the clinician's office, but rather at the laboratory bench of the basic scientist. The basic scientist plays a critical role since the DNA sequencing information is of limited use unless one knows the function of the gene that is altered and the manner by which a sequence alteration affects that function. The role of basic science research in aiding the clinical management of a disease is perhaps best exemplified by considering the case of Lynch syndrome, a hereditary disease that predisposes patients to colorectal and other cancers. This review will examine how the diagnosis, treatment and even prevention of Lynch syndrome-associated cancers has benefitted from extensive basic science research on the DNA mismatch repair genes whose alteration underlies this condition. Copyright © 2015 Elsevier B.V. All rights reserved.
Rona, Z; Klebermass, K; Cardona, F; Czaba, C D; Brugger, P C; Weninger, M; Pollak, A; Prayer, D
2010-09-01
To assess the utility of an MRI-compatible incubator (INC) by comparing neonatal MRI examinations performed with and without it. In a retrospective study, the clinical and radiological aspects of 129 neonatal MRI examinations performed during a 3-year period were analyzed. Routine protocols including fast spin-echo T2-weighted (T2w) sequences, axial T1w, gradient-echo, diffusion, and 3D T1 gradient-echo sequences were performed; angiography and spectroscopy were added in some cases. Diffusion-tensor imaging was done in 50% of the babies examined in the INC and in 26% of those examined without it. Sequences adapted from fetal MR protocols were used in infants younger than 32 gestational weeks. The benefit of the MR information with respect to further management was evaluated. With the use of the MR-compatible incubator, the number of examinations increased (from 30 to 99), while the mean age (from 43 to 38.8 weeks of gestational age) and weight (from 3308 to 2766 g) decreased significantly. The mean imaging time decreased (from 34.43 to 30.29 min), with a mean of one additional sequence performed in the INC group. All infants received sedatives according to our anaesthetic protocol preceding imaging, but with the INC a repeated dose was never necessary (versus 10% without the INC). Regarding all cases, MR-based changes in clinical management were initiated in 58%, while in 57% of cases the initial ultrasound diagnosis was changed or further specified. The INC enables MR access for unstable infants with suspected CNS problems, whose management is improved by MR information in a significantly higher percentage of cases than without the INC. Copyright (c) 2010 European Paediatric Neurology Society. Published by Elsevier Ltd. All rights reserved.
The Use of Genomics in Conservation Management of the Endangered Visayan Warty Pig (Sus cebifrons).
Nuijten, Rascha J M; Bosse, Mirte; Crooijmans, Richard P M A; Madsen, Ole; Schaftenaar, Willem; Ryder, Oliver A; Groenen, Martien A M; Megens, Hendrik-Jan
2016-01-01
The list of threatened and endangered species is growing rapidly, due to various anthropogenic causes. Many endangered species are present in captivity and actively managed in breeding programs in which often little is known about the founder individuals. Recent developments in genetic research techniques have made it possible to sequence and study whole genomes. In this study we used the critically endangered Visayan warty pig (Sus cebifrons) as a case study to test the use of genomic information as a tool in conservation management. Two captive populations of S. cebifrons exist, which originated from two different Philippine islands. We found some evidence for a recent split between the two island populations; however all individuals that were sequenced show a similar demographic history. Evidence for both past and recent inbreeding indicated that the founders were at least to some extent related. Together with this, the low level of nucleotide diversity compared to other Sus species potentially poses a threat to the viability of the captive populations. In conclusion, genomic techniques answered some important questions about this critically endangered mammal and can be a valuable toolset to inform future conservation management in other species as well.
Büssow, Konrad; Hoffmann, Steve; Sievert, Volker
2002-12-19
Functional genomics involves the parallel experimentation with large sets of proteins. This requires management of large sets of open reading frames as a prerequisite of the cloning and recombinant expression of these proteins. A Java program was developed for retrieval of protein and nucleic acid sequences and annotations from NCBI GenBank, using the XML sequence format. Annotations retrieved by ORFer include sequence name, organism and the completeness of the sequence. The program has a graphical user interface, although it can be used in a non-interactive mode. For protein sequences, the program also extracts the open reading frame sequence, if available, and checks its correct translation. ORFer accepts user input in the form of single GenBank GI identifiers or accession numbers, or lists of them. It can be used to extract complete sets of open reading frames and protein sequences from any kind of GenBank sequence entry, including complete genomes or chromosomes. Sequences are either stored with their features in a relational database or can be exported as text files in FASTA or tab-delimited format. The ORFer program is freely available at http://www.proteinstrukturfabrik.de/orfer. The ORFer program allows for fast retrieval of DNA sequences, protein sequences and their open reading frames and sequence annotations from GenBank. Furthermore, storage of sequences and features in a relational database is supported. Such a database can supplement a laboratory information management system (LIMS) with appropriate sequence information.
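The core task ORFer performs, retrieving a GenBank record, extracting an open reading frame and checking its translation, can be sketched with Biopython as below. This is only a hedged stand-in (ORFer itself is a Java program that parses GenBank XML); it requires Biopython and network access, the contact e-mail is a placeholder, and the accession is just an example record.

```python
# Hedged sketch: fetch a GenBank record, pull out a CDS, and verify that the
# annotated protein matches the translated ORF.
from Bio import Entrez, SeqIO

Entrez.email = "you@example.org"          # NCBI requires a contact address
with Entrez.efetch(db="nucleotide", id="NM_000546", rettype="gb",
                   retmode="text") as handle:
    record = SeqIO.read(handle, "genbank")

for feature in record.features:
    if feature.type == "CDS" and "translation" in feature.qualifiers:
        annotated = feature.qualifiers["translation"][0]
        translated = str(feature.extract(record.seq).translate(to_stop=True))
        print(record.id, "translation OK:", annotated == translated)
        break
```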
Genomics and breeding in food crops
USDA-ARS?s Scientific Manuscript database
Plant biology is in the midst of a revolution. The generation of tremendous volumes of sequence information introduce new technical challenges into plant biology and agriculture. The relatively new field of bioinformatics addresses these challenges by utilizing efficient data management strategies;...
Nishio, Shin-Ya; Usami, Shin-Ichi
2017-03-01
Recent advances in next-generation sequencing (NGS) have given rise to new challenges due to the difficulties in variant pathogenicity interpretation and large dataset management, including many kinds of public population databases as well as public or commercial disease-specific databases. Here, we report a new database development tool, named the "Clinical NGS Database," for improving clinical NGS workflow through the unified management of variant information and clinical information. This database software offers a two-feature approach to variant pathogenicity classification. The first of these approaches is a phenotype similarity-based approach. This database allows the easy comparison of the detailed phenotype of each patient with the average phenotype of the same gene mutation at the variant or gene level. It is also possible to browse patients with the same gene mutation quickly. The other approach is a statistical approach to variant pathogenicity classification based on the use of the odds ratio for comparisons between the case and the control for each inheritance mode (families with apparently autosomal dominant inheritance vs. control, and families with apparently autosomal recessive inheritance vs. control). A number of case studies are also presented to illustrate the utility of this database. © 2016 The Authors. Human Mutation published by Wiley Periodicals, Inc.
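The statistical approach mentioned above reduces, at its core, to an odds ratio comparing how often a variant is observed in case families versus controls, computed per inheritance mode. The counts in the sketch below are invented and it is only a minimal illustration of the calculation, not the Clinical NGS Database's implementation.

```python
# Minimal odds-ratio calculation for a case-control variant comparison.
def odds_ratio(case_with, case_without, control_with, control_without):
    return (case_with / case_without) / (control_with / control_without)

# e.g. variant seen in 12 of 200 autosomal-dominant case families and in
# 3 of 800 controls (hypothetical counts)
print(round(odds_ratio(12, 188, 3, 797), 2))   # ~16.96
```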
The business value and cost-effectiveness of genomic medicine.
Crawford, James M; Aspinall, Mara G
2012-05-01
Genomic medicine offers the promise of more effective diagnosis and treatment of human diseases. Genome sequencing early in the course of disease may enable more timely and informed intervention, with reduced healthcare costs and improved long-term outcomes. However, genomic medicine strains current models for demonstrating value, challenging efforts to achieve fair payment for services delivered, both for laboratory diagnostics and for use of molecular information in clinical management. Current models of healthcare reform stipulate that care must be delivered at equal or lower cost, with better patient and population outcomes. To achieve demonstrated value, genomic medicine must overcome many uncertainties: the clinical relevance of genomic variation; potential variation in technical performance and/or computational analysis; management of massive information sets; and must have available clinical interventions that can be informed by genomic analysis, so as to attain more favorable cost management of healthcare delivery and demonstrate improvements in cost-effectiveness.
Alzu'bi, Amal; Zhou, Leming; Watzlaf, Valerie
2014-01-01
In recent years, the term personalized medicine has received more and more attention in the field of healthcare. The increasing use of this term is closely related to the astonishing advancement in DNA sequencing technologies and other high-throughput biotechnologies. A large amount of personal genomic data can be generated by these technologies in a short time. Consequently, the needs for managing, analyzing, and interpreting these personal genomic data to facilitate personalized care are escalated. In this article, we discuss the challenges for implementing genomics-based personalized medicine in healthcare, current solutions to these challenges, and the roles of health information management (HIM) professionals in genomics-based personalized medicine. PMID:24808804
Callen, Joanne; Li, Ling; Georgiou, Andrew; Paoloni, Richard; Gibson, Kathryn; Li, Julie; Stewart, Michael; Braithwaite, Jeffrey; Westbrook, Johanna I
2014-12-01
(1) to describe Emergency Department (ED) physicians' and nurses' perceptions about the sequence of work related to patient management with use of an integrated Emergency Department Information System (EDIS), and (2) to measure changes in the sequence of clinician access to patient information. A mixed method study was conducted in four metropolitan EDs. Each used the same EDIS which is a module of the hospitals' enterprise-wide clinical information system composed of many components of an electronic medical record. This enabled access to clinical and management information relating to patients attending all hospitals in the region. Phase one - data were collected from ED physicians and nurses (n=97) by 69 in-depth interviews, five focus groups (28 participants), and 26 h of observations. Phase two - physicians (n=34) in one ED were observed over 2 weeks. Data included whether and what type of information was accessed from the EDIS prior to first examination of the patient. Clinicians reported, and phase 2 observations confirmed, that the integrated EDIS led to changes to the order of information access, which held implications for when tests were ordered and results accessed. Most physicians accessed patient information using EDIS prior to taking the patients' first medical history (77/116; 66.4%, 95% CI: 57.8-75.0%). Previous discharge summaries (74%) and past test results (61%) were most frequently accessed and junior doctors were more likely to access electronic past history information than their senior colleagues (χ(2)=20.717, d.f.=1, p<0.001). The integrated EDIS created new ways of working for ED clinicians. Such changes could hold positive implications for: time taken to reach a diagnosis and deliver treatments; length of stay; patient outcomes and experiences. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
NASA Technical Reports Server (NTRS)
Khanampompan, Teerapat; Gladden, Roy; Fisher, Forest; DelGuercio, Chris
2008-01-01
The Sequence History Update Tool performs Web-based sequence statistics archiving for Mars Reconnaissance Orbiter (MRO). Using a single UNIX command, the software takes advantage of sequencing conventions to automatically extract the needed statistics from multiple files. This information is then used to populate a PHP database, which is then seamlessly formatted into a dynamic Web page. This tool replaces a previous tedious and error-prone process of manually editing HTML code to construct a Web-based table. Because the tool manages all of the statistics gathering and file delivery to and from multiple data sources spread across multiple servers, there is also a considerable savings in time and effort. With the Sequence History Update Tool, what previously took minutes is now done in less than 30 seconds, and the tool provides a more accurate archival record of the sequence commanding for MRO.
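The convention-driven extraction step described above can be illustrated generically: pull a statistic out of several product files whose names follow a convention, and collect the results for a summary table. The filenames, the regular expression and the statistic below are invented for illustration and do not reflect the actual MRO file formats.

```python
# Generic sketch: harvest one statistic per file by naming convention.
import re

files = {
    "mro_seq_0421a.log": "TOTAL_CMDS=1532\n...",
    "mro_seq_0422b.log": "TOTAL_CMDS=1488\n...",
}

stats = []
for name, text in sorted(files.items()):
    match = re.search(r"TOTAL_CMDS=(\d+)", text)
    if match:
        seq_id = name.split(".")[0]        # sequence id from the filename convention
        stats.append((seq_id, int(match.group(1))))

print(stats)   # rows ready to load into a summary table
```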
Hmrbase: a database of hormones and their receptors
Rashid, Mamoon; Singla, Deepak; Sharma, Arun; Kumar, Manish; Raghava, Gajendra PS
2009-01-01
Background Hormones are signaling molecules that play vital roles in various life processes, like growth and differentiation, physiology, and reproduction. These molecules are mostly secreted by endocrine glands, and transported to target organs through the bloodstream. Deficient, or excessive, levels of hormones are associated with several diseases such as cancer, osteoporosis, diabetes etc. Thus, it is important to collect and compile information about hormones and their receptors. Description This manuscript describes a database called Hmrbase which has been developed for managing information about hormones and their receptors. It is a highly curated database for which information has been collected from the literature and the public databases. The current version of Hmrbase contains comprehensive information about ~2000 hormones, e.g., about their function, source organism, receptors, mature sequences, structures etc. Hmrbase also contains information about ~3000 hormone receptors, in terms of amino acid sequences, subcellular localizations, ligands, and post-translational modifications etc. One of the major features of this database is that it provides data about ~4100 hormone-receptor pairs. A number of online tools have been integrated into the database to provide facilities such as keyword search, structure-based search, mapping of a given peptide(s) on the hormone/receptor sequence, and sequence similarity search. This database also provides a number of external links to other resources/databases to help retrieve further related information. Conclusion Owing to the high impact of endocrine research in the biomedical sciences, the Hmrbase could become a leading data portal for researchers. The salient features of Hmrbase are hormone-receptor pair-related information, mapping of peptide stretches on the protein sequences of hormones and receptors, Pfam domain annotations, categorical browsing options, online data submission, DrugPedia linkage etc. Hmrbase is available online to the public from . PMID:19589147
Bormann, Tobias; Seyboth, Margret; Umarova, Roza; Weiller, Cornelius
2015-06-01
Studies on verbal learning in patients with impaired verbal short-term memory (vSTM) have revealed dissociations among types of verbal information. Patients with impaired vSTM are able to learn lists of known words but fail to acquire new word forms. This suggests that vSTM is involved in new word learning. The present study assessed both new word learning and the learning of digit sequences in two patients with impaired vSTM. In two experiments, participants were required to learn people's names, ages and professions, or their four-digit 'phone numbers'. The STM patients were impaired at learning unknown family names and phone numbers, but managed to acquire other verbal information. In contrast, a patient with a severe verbal episodic memory impairment was impaired across information types. These results indicate verbal STM involvement in the learning of digit sequences. Copyright © 2015 Elsevier Ltd. All rights reserved.
Genomes OnLine Database (GOLD) v.6: data updates and feature enhancements
Mukherjee, Supratim; Stamatis, Dimitri; Bertsch, Jon; Ovchinnikova, Galina; Verezemska, Olena; Isbandi, Michelle; Thomas, Alex D.; Ali, Rida; Sharma, Kaushal; Kyrpides, Nikos C.; Reddy, T. B. K.
2017-01-01
The Genomes Online Database (GOLD) (https://gold.jgi.doe.gov) is a manually curated data management system that catalogs sequencing projects with associated metadata from around the world. In the current version of GOLD (v.6), all projects are organized based on a four level classification system in the form of a Study, Organism (for isolates) or Biosample (for environmental samples), Sequencing Project and Analysis Project. Currently, GOLD provides information for 26 117 Studies, 239 100 Organisms, 15 887 Biosamples, 97 212 Sequencing Projects and 78 579 Analysis Projects. These are integrated with over 312 metadata fields from which 58 are controlled vocabularies with 2067 terms. The web interface facilitates submission of a diverse range of Sequencing Projects (such as isolate genome, single-cell genome, metagenome, metatranscriptome) and complex Analysis Projects (such as genome from metagenome, or combined assembly from multiple Sequencing Projects). GOLD provides a seamless interface with the Integrated Microbial Genomes (IMG) system and supports and promotes the Genomic Standards Consortium (GSC) Minimum Information standards. This paper describes the data updates and additional features added during the last two years. PMID:27794040
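The four-level organization described above (Study, then Organism or Biosample, then Sequencing Project, then Analysis Project) can be sketched conceptually with plain data classes. This mirrors the classification only loosely; the class fields and the example project names are invented and are not GOLD's actual data model.

```python
# Conceptual sketch of a Study -> Sequencing Project -> Analysis Project hierarchy.
from dataclasses import dataclass, field

@dataclass
class AnalysisProject:
    name: str                               # e.g. "genome from metagenome"

@dataclass
class SequencingProject:
    name: str
    analyses: list = field(default_factory=list)

@dataclass
class Study:
    name: str
    organism_or_biosample: str              # isolate organism or environmental biosample
    sequencing_projects: list = field(default_factory=list)

study = Study("Hot-spring survey", "Biosample: spring sediment")
sp = SequencingProject("Metagenome shotgun run 1")
sp.analyses.append(AnalysisProject("Combined assembly"))
study.sequencing_projects.append(sp)
print(study.name, "->", study.sequencing_projects[0].analyses[0].name)
```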
Singh, Kumar Saurabh; Thual, Dominique; Spurio, Roberto; Cannata, Nicola
2015-01-01
One of the most crucial characteristics of day-to-day laboratory information management is the collection, storage and retrieval of information about research subjects and environmental or biomedical samples. An efficient link between sample data and experimental results is absolutely important for the successful outcome of a collaborative project. Currently available software solutions are largely limited to large-scale, expensive commercial Laboratory Information Management Systems (LIMS). Acquiring such a LIMS can indeed bring laboratory information management to a higher level, but most of the time this requires a substantial investment of money, time and technical effort. There is a clear need for a lightweight open-source system which can easily be managed on local servers and handled by individual researchers. Here we present a software named SaDA for storing, retrieving and analyzing data originating from microorganism monitoring experiments. SaDA is fully integrated in the management of environmental samples, oligonucleotide sequences, microarray data and the subsequent downstream analysis procedures. It is simple and generic software, and can be extended and customized for various environmental and biomedical studies. PMID:26047146
Singh, Kumar Saurabh; Thual, Dominique; Spurio, Roberto; Cannata, Nicola
2015-06-03
One of the most crucial characteristics of day-to-day laboratory information management is the collection, storage and retrieval of information about research subjects and environmental or biomedical samples. An efficient link between sample data and experimental results is absolutely important for the successful outcome of a collaborative project. Currently available software solutions are largely limited to large-scale, expensive commercial Laboratory Information Management Systems (LIMS). Acquiring such a LIMS can indeed bring laboratory information management to a higher level, but most of the time this requires a substantial investment of money, time and technical effort. There is a clear need for a lightweight open-source system which can easily be managed on local servers and handled by individual researchers. Here we present a software named SaDA for storing, retrieving and analyzing data originating from microorganism monitoring experiments. SaDA is fully integrated in the management of environmental samples, oligonucleotide sequences, microarray data and the subsequent downstream analysis procedures. It is simple and generic software, and can be extended and customized for various environmental and biomedical studies.
NASA Astrophysics Data System (ADS)
Wein, A. M.; Potter, S.; Becker, J.; Doyle, E. E.; Jones, J. L.
2015-12-01
While communication products are developed for monitoring and forecasting hazard events, less thought may have been given to crisis and risk communication plans. During larger (and rarer) events, responsible science agencies may find themselves facing new and intensified demands for information and unprepared for effectively resourcing communications. In a study of the communication of aftershock information during the 2010-12 Canterbury Earthquake Sequence (New Zealand), issues are identified and implications for communication strategy noted. Communication issues during the responses included reliability and timeliness of communication channels for immediate and short decision time frames; access to scientists by those who needed information; unfamiliar emergency management frameworks; information needs of multiple audiences; audience readiness to use the information; and how best to convey empathy during traumatic events and refer to other information sources about what to do and how to cope. Other science communication challenges included meeting an increased demand for earthquake education; drawing attention to aftershock forecasts; managing rumors; supporting the uptake of information by critical infrastructure and government and the application of scientific information in complex societal decisions; dealing with repetitive information requests; addressing the diverse needs of multiple audiences for scientific information; and coordinating communications within and outside the science domain. For a science agency, a communication strategy would consider training scientists in communication, establishing relationships with university scientists and other disaster communication roles, coordinating messages, prioritizing audiences, deliberating forecasts with community leaders, identifying user needs and familiarizing users with the products ahead of time, and practicing the delivery and use of information via scenario planning and exercises.
Denis, Jean-Baptiste; Vandenbogaert, Mathias; Caro, Valérie
2016-01-01
The detection and characterization of emerging infectious agents has been a continuing public health concern. High Throughput Sequencing (HTS) or Next-Generation Sequencing (NGS) technologies have proven to be promising approaches for efficient and unbiased detection of pathogens in complex biological samples, providing access to comprehensive analyses. As NGS approaches typically yield millions of putatively representative reads per sample, efficient data management and visualization resources have become mandatory. Most usually, those resources are implemented through a dedicated Laboratory Information Management System (LIMS), solely to provide perspective regarding the available information. We developed an easily deployable web interface facilitating the management and bioinformatics analysis of metagenomics data samples. It was engineered to run associated and dedicated Galaxy workflows for the detection and eventual classification of pathogens. The web application allows easy interaction with existing Galaxy metagenomic workflows and facilitates the organization, exploration and aggregation of the most relevant sample-specific sequences among millions of genomic sequences, allowing users to determine their relative abundance and to associate them with the most closely related organism or pathogen. The user-friendly Django-based interface associates the users' input data and its metadata with a set of bio-IT resources (a Galaxy instance, plus sufficient storage and grid computing power). Galaxy is used to handle and analyze the users' input data, from loading and indexing to mapping, assembly and database searches. Interaction between our application and Galaxy is ensured by the BioBlend library, which gives API-based access to Galaxy's main features. Metadata about samples, runs, as well as the workflow results are stored in the LIMS. For metagenomic classification and exploration purposes, we show, as a proof of concept, that integration of intuitive exploratory tools, like Krona for representation of taxonomic classification, can be achieved very easily. In the spirit of Galaxy, the interface enables the sharing of scientific results with fellow team members. PMID:28451381
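Driving a Galaxy instance through BioBlend, as the interface described above does, looks roughly like the hedged sketch below. This is not the authors' code: the URL, API key, file name and workflow name are placeholders, and it simply uploads a dataset to a new history and invokes a (hypothetical) named workflow.

```python
# Hedged sketch of BioBlend-based orchestration of a Galaxy workflow.
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://galaxy.example.org", key="YOUR_API_KEY")

history = gi.histories.create_history(name="metagenomic-run-001")
upload = gi.tools.upload_file("sample_reads.fastq", history["id"])

# Find a (hypothetical) pathogen-detection workflow by name and invoke it.
workflows = gi.workflows.get_workflows(name="pathogen-detection")
if workflows:
    gi.workflows.invoke_workflow(
        workflows[0]["id"],
        inputs={"0": {"src": "hda", "id": upload["outputs"][0]["id"]}},
        history_id=history["id"],
    )
```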
Correia, Damien; Doppelt-Azeroual, Olivia; Denis, Jean-Baptiste; Vandenbogaert, Mathias; Caro, Valérie
2015-01-01
The detection and characterization of emerging infectious agents has been a continuing public health concern. High Throughput Sequencing (HTS) or Next-Generation Sequencing (NGS) technologies have proven to be promising approaches for efficient and unbiased detection of pathogens in complex biological samples, providing access to comprehensive analyses. As NGS approaches typically yield millions of putatively representative reads per sample, efficient data management and visualization resources have become mandatory. Most usually, those resources are implemented through a dedicated Laboratory Information Management System (LIMS), solely to provide perspective regarding the available information. We developed an easily deployable web interface facilitating the management and bioinformatics analysis of metagenomics data samples. It was engineered to run associated and dedicated Galaxy workflows for the detection and eventual classification of pathogens. The web application allows easy interaction with existing Galaxy metagenomic workflows and facilitates the organization, exploration and aggregation of the most relevant sample-specific sequences among millions of genomic sequences, allowing users to determine their relative abundance and to associate them with the most closely related organism or pathogen. The user-friendly Django-based interface associates the users' input data and its metadata with a set of bio-IT resources (a Galaxy instance, plus sufficient storage and grid computing power). Galaxy is used to handle and analyze the users' input data, from loading and indexing to mapping, assembly and database searches. Interaction between our application and Galaxy is ensured by the BioBlend library, which gives API-based access to Galaxy's main features. Metadata about samples, runs, as well as the workflow results are stored in the LIMS. For metagenomic classification and exploration purposes, we show, as a proof of concept, that integration of intuitive exploratory tools, like Krona for representation of taxonomic classification, can be achieved very easily. In the spirit of Galaxy, the interface enables the sharing of scientific results with fellow team members.
MolabIS--an integrated information system for storing and managing molecular genetics data.
Truong, Cong V C; Groeneveld, Linn F; Morgenstern, Burkhard; Groeneveld, Eildert
2011-10-31
Long-term sample storage, tracing of data flow and data export for subsequent analyses are of great importance in genetics studies. Therefore, molecular labs need a proper information system to handle an increasing amount of data from different projects. We have developed a molecular labs information management system (MolabIS). It was implemented as a web-based system allowing the users to capture original data at each step of their workflow. MolabIS provides essential functionality for managing information on individuals, tracking samples and storage locations, capturing raw files, importing final data from external files, searching results, and accessing and modifying data. Further important features are options to generate ready-to-print reports and convert sequence and microsatellite data into various data formats, which can be used as input files in subsequent analyses. Moreover, MolabIS also provides a tool for data migration. MolabIS is designed for small-to-medium sized labs conducting Sanger sequencing and microsatellite genotyping to store and efficiently handle a relatively large amount of data. MolabIS not only helps to avoid time-consuming tasks but also ensures the availability of data for further analyses. The software is packaged as a virtual appliance which can run on different platforms (e.g. Linux, Windows). MolabIS can be distributed to a wide range of molecular genetics labs since it was developed according to a general data model. Released under the GPL, MolabIS is freely available at http://www.molabis.org.
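The format-conversion feature mentioned above, turning stored sequence data into input files for downstream analyses, can be illustrated with Biopython rather than MolabIS itself. The file names below are placeholders; the single call converts a GenBank export to FASTA and reports how many records were written.

```python
# Illustrative format conversion for downstream analyses (Biopython, not MolabIS).
from Bio import SeqIO

# Convert an exported GenBank file to FASTA; returns the number of records written.
count = SeqIO.convert("samples_export.gb", "genbank", "samples_export.fasta", "fasta")
print(f"wrote {count} sequences")
```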
MolabIS - An integrated information system for storing and managing molecular genetics data
2011-01-01
Background Long-term sample storage, tracing of data flow and data export for subsequent analyses are of great importance in genetics studies. Therefore, molecular labs need a proper information system to handle an increasing amount of data from different projects. Results We have developed a molecular labs information management system (MolabIS). It was implemented as a web-based system allowing the users to capture original data at each step of their workflow. MolabIS provides essential functionality for managing information on individuals, tracking samples and storage locations, capturing raw files, importing final data from external files, searching results, and accessing and modifying data. Further important features are options to generate ready-to-print reports and to convert sequence and microsatellite data into various data formats, which can be used as input files in subsequent analyses. Moreover, MolabIS also provides a tool for data migration. Conclusions MolabIS is designed for small- to medium-sized labs conducting Sanger sequencing and microsatellite genotyping to store and efficiently handle a relatively large amount of data. MolabIS not only helps to avoid time-consuming tasks but also ensures the availability of data for further analyses. The software is packaged as a virtual appliance which can run on different platforms (e.g. Linux, Windows). MolabIS can be distributed to a wide range of molecular genetics labs since it was developed according to a general data model. Released under the GPL, MolabIS is freely available at http://www.molabis.org. PMID:22040322
An Integrated Korean Biodiversity and Genetic Information Retrieval System
Lim, Jeongheui; Bhak, Jong; Oh, Hee-Mock; Kim, Chang-Bae; Park, Yong-Ha; Paek, Woon Kee
2008-01-01
Background On-line biodiversity information databases are growing quickly and being integrated into general bioinformatics systems, thanks to advances in fast gene-sequencing technologies and the Internet. These can reduce the cost and effort of performing biodiversity surveys and genetic searches, which allows scientists to spend more time researching and less time collecting and maintaining data. This will increase the rate of knowledge build-up and improve conservation. The biodiversity databases in Korea have been scattered among several institutes and local natural history museums with incompatible data types. Therefore, a comprehensive database and a nationwide web portal for biodiversity information are necessary in order to integrate diverse information resources, including molecular and genomic databases. Results The Korean Natural History Research Information System (NARIS) was built and serviced as the central biodiversity information system to collect and integrate the biodiversity data of various institutes and natural history museums in Korea. This database aims to be an integrated resource that contains additional biological information, such as genome sequences and molecular-level diversity. Currently, twelve institutes and museums in Korea are integrated via the DiGIR (Distributed Generic Information Retrieval) protocol, with the Darwin Core 2.0 format as its metadata standard for data exchange. Data quality control and statistical analysis functions have been implemented. In particular, the integration of molecular and genetic information from the National Center for Biotechnology Information (NCBI) databases with NARIS was recently accomplished. NARIS can also be extended to accommodate other institutes abroad, and the whole system can be exported to establish local biodiversity management servers. Conclusion A Korean data portal, NARIS, has been developed to efficiently manage and utilize biodiversity data, which includes genetic resources. NARIS aims to be integral in maximizing bio-resource utilization for conservation, management, research, education, industrial applications, and integration with other bioinformation data resources. It can be found at . PMID:19091024
USDA-ARS?s Scientific Manuscript database
Diptera Tephritidae are an enormous threat to fruit and vegetable production throughout the world, causing both quantitative and qualitative losses. Investigating mating behavioural sequences could help to unravel mate choice dynamics, adding useful information to build behaviour-based control strat...
Design of a final approach spacing tool for TRACON air traffic control
NASA Technical Reports Server (NTRS)
Davis, Thomas J.; Erzberger, Heinz; Bergeron, Hugh
1989-01-01
This paper describes an automation tool that assists air traffic controllers in the Terminal Radar Approach Control (TRACON) Facilities in providing safe and efficient sequencing and spacing of arrival traffic. The automation tool, referred to as the Final Approach Spacing Tool (FAST), allows the controller to interactively choose various levels of automation and advisory information ranging from predicted time errors to speed and heading advisories for controlling time error. FAST also uses a timeline to display current scheduling and sequencing information for all aircraft in the TRACON airspace. FAST combines accurate predictive algorithms and state-of-the-art mouse and graphical interface technology to present advisory information to the controller. Furthermore, FAST exchanges various types of traffic information and communicates with automation tools being developed for the Air Route Traffic Control Center. Thus it is part of an integrated traffic management system for arrival traffic at major terminal areas.
Improvement of the material and transport component of the system of construction waste management
NASA Astrophysics Data System (ADS)
Kostyshak, Mikhail; Lunyakov, Mikhail
2017-10-01
The relevance of the selected research topic stems from the growth of construction operations and the increasing rates of construction and demolition waste. This article considers modern approaches to managing the turnover of construction waste, the sequence of building reconstruction or demolition processes, the information flow of the complete cycle of construction and demolition waste turnover, and methods for improving the material and transport component of the construction waste management system. The analysis performed showed that the proposed mechanism for managing construction waste makes it possible to increase the efficiency and environmental safety of this branch and of the regions.
Stajdohar, Miha; Rosengarten, Rafael D; Kokosar, Janez; Jeran, Luka; Blenkus, Domen; Shaulsky, Gad; Zupan, Blaz
2017-06-02
Dictyostelium discoideum, a soil-dwelling social amoeba, is a model for the study of numerous biological processes. Research in the field has benefited mightily from the adoption of next-generation sequencing for genomics and transcriptomics. Dictyostelium biologists now face the widespread challenges of analyzing and exploring high dimensional data sets to generate hypotheses and discovering novel insights. We present dictyExpress (2.0), a web application designed for exploratory analysis of gene expression data, as well as data from related experiments such as Chromatin Immunoprecipitation sequencing (ChIP-Seq). The application features visualization modules that include time course expression profiles, clustering, gene ontology enrichment analysis, differential expression analysis and comparison of experiments. All visualizations are interactive and interconnected, such that the selection of genes in one module propagates instantly to visualizations in other modules. dictyExpress currently stores the data from over 800 Dictyostelium experiments and is embedded within a general-purpose software framework for management of next-generation sequencing data. dictyExpress allows users to explore their data in a broader context by reciprocal linking with dictyBase-a repository of Dictyostelium genomic data. In addition, we introduce a companion application called GenBoard, an intuitive graphic user interface for data management and bioinformatics analysis. dictyExpress and GenBoard enable broad adoption of next generation sequencing based inquiries by the Dictyostelium research community. Labs without the means to undertake deep sequencing projects can mine the data available to the public. The entire information flow, from raw sequence data to hypothesis testing, can be accomplished in an efficient workspace. The software framework is generalizable and represents a useful approach for any research community. To encourage more wide usage, the backend is open-source, available for extension and further development by bioinformaticians and data scientists.
Sequence modelling and an extensible data model for genomic database
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Peter Wei-Der
1992-01-01
The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of this information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanisms for modelling sequences, and existing DBMSs do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object-oriented data models into an extensible framework, which we called the "Extensible Object Model", to address the need for a modelling framework that can incorporate the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implemented the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.
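The abstract does not spell out the sequence operators themselves. Purely as an illustration of what abstract and biological sequence operators of this kind can look like (the names and semantics below are hypothetical and not taken from the dissertation), a small Python sketch:

```python
# Illustrative sketch of abstract sequence operators; operator names and
# semantics are hypothetical and not taken from the cited work.
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGT", "TGCA")

@dataclass(frozen=True)
class BioSequence:
    symbols: str

    def subsequence(self, start: int, end: int) -> "BioSequence":
        """Abstract operator: extract the region [start, end)."""
        return BioSequence(self.symbols[start:end])

    def concatenate(self, other: "BioSequence") -> "BioSequence":
        """Abstract operator: join two sequences end to end."""
        return BioSequence(self.symbols + other.symbols)

    def reverse_complement(self) -> "BioSequence":
        """Biological operator: reverse-complement a DNA sequence."""
        return BioSequence(self.symbols.translate(COMPLEMENT)[::-1])

# Example usage
s = BioSequence("ACGTTGCA")
print(s.subsequence(2, 6).concatenate(BioSequence("AA")).reverse_complement().symbols)
```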
HYDRA: A Middleware-Oriented Integrated Architecture for e-Procurement in Supply Chains
NASA Astrophysics Data System (ADS)
Alor-Hernandez, Giner; Aguilar-Lasserre, Alberto; Juarez-Martinez, Ulises; Posada-Gomez, Ruben; Cortes-Robles, Guillermo; Garcia-Martinez, Mario Alberto; Gomez-Berbis, Juan Miguel; Rodriguez-Gonzalez, Alejandro
The Service-Oriented Architecture (SOA) development paradigm has emerged to improve the critical issues of creating, modifying and extending solutions for business process integration, incorporating process automation and the automated exchange of information between organizations. Web services technology follows the SOA principles for developing and deploying applications. Moreover, Web services are considered the platform for SOA, for both intra- and inter-enterprise communication. However, an SOA does not incorporate information about occurring events into business processes, and such events and their delivery are the main features of supply chain management; they are addressed in an Event-Driven Architecture (EDA). Taking this into account, we propose a middleware-oriented integrated architecture that offers a brokering service for the procurement of products in a Supply Chain Management (SCM) scenario. As salient contributions, our system provides a hybrid architecture combining features of both SOA and EDA and a set of mechanisms for business process pattern management, monitoring based on UML sequence diagrams, Web services-based management, event publish/subscription and a reliable messaging service.
Kwong, Jason C; Lane, Courtney R; Romanes, Finn; Gonçalves da Silva, Anders; Easton, Marion; Cronin, Katie; Waters, Mary Jo; Tomita, Takehiro; Stevens, Kerrie; Schultz, Mark B; Baines, Sarah L; Sherry, Norelle L; Carter, Glen P; Mu, Andre; Sait, Michelle; Ballard, Susan A; Seemann, Torsten; Stinear, Timothy P; Howden, Benjamin P
2018-01-01
Until recently, Klebsiella pneumoniae carbapenemase (KPC)-producing Enterobacteriaceae were rarely identified in Australia. Following an increase in the number of incident cases across the state of Victoria, we undertook a real-time combined genomic and epidemiological investigation. The scope of this study included identifying risk factors and routes of transmission, and investigating the utility of genomics to enhance traditional field epidemiology for informing management of established widespread outbreaks. All KPC-producing Enterobacteriaceae isolates referred to the state reference laboratory from 2012 onwards were included. Whole-genome sequencing was performed in parallel with a detailed descriptive epidemiological investigation of each case, using Illumina sequencing on each isolate. This was complemented with PacBio long-read sequencing on selected isolates to establish high-quality reference sequences and interrogate characteristics of KPC-encoding plasmids. Initial investigations indicated that the outbreak was widespread, with 86 KPC-producing Enterobacteriaceae isolates ( K. pneumoniae 92%) identified from 35 different locations across metropolitan and rural Victoria between 2012 and 2015. Initial combined analyses of the epidemiological and genomic data resolved the outbreak into distinct nosocomial transmission networks, and identified healthcare facilities at the epicentre of KPC transmission. New cases were assigned to transmission networks in real-time, allowing focussed infection control efforts. PacBio sequencing confirmed a secondary transmission network arising from inter-species plasmid transmission. Insights from Bayesian transmission inference and analyses of within-host diversity informed the development of state-wide public health and infection control guidelines, including interventions such as an intensive approach to screening contacts following new case detection to minimise unrecognised colonisation. A real-time combined epidemiological and genomic investigation proved critical to identifying and defining multiple transmission networks of KPC Enterobacteriaceae, while data from either investigation alone were inconclusive. The investigation was fundamental to informing infection control measures in real-time and the development of state-wide public health guidelines on carbapenemase-producing Enterobacteriaceae surveillance and management.
Helm, Benjamin M; Langley, Katherine; Spangler, Brooke B; Schrier Vergano, Samantha A
2015-01-01
Whole-exome sequencing (WES) has increased our ability to analyze large parts of the human genome, bringing with it a plethora of ethical, legal, and social implications. A topic dominating discussion of WES is identification of "secondary findings" (SFs), defined as the identification of risk in an asymptomatic individual unrelated to the indication for the test. SFs can have considerable psychosocial impact on patients and families, and patients with an SF may have concerns regarding genomic privacy and genetic discrimination. The Genetic Information Nondiscrimination Act of 2008 (GINA) currently excludes protections for members of the military. This may cause concern in military members and families regarding genetic discrimination when considering genetic testing. In this report, we discuss a case involving a patient and family in which a secondary finding was discovered by WES. The family members have careers in the U.S. military, and a risk-predisposing condition could negatively affect employment. While beneficial medical management changes were made, the information placed exceptional stress on the family, who were forced to navigate career-sensitive "extra-medical" issues, to consider the impacts of uncovering risk-predisposition, and to manage the privacy of their genetic information. We highlight how information obtained from WES may collide with these issues and emphasize the importance of genetic counseling for anyone undergoing WES.
Time management displays for shuttle countdown
NASA Technical Reports Server (NTRS)
Beller, Arthur E.; Hadaller, H. Greg; Ricci, Mark J.
1992-01-01
The Intelligent Launch Decision Support System project is developing a Time Management System (TMS) for the NASA Test Director (NTD) to use for time management during Shuttle terminal countdown. TMS is being developed in three phases: an information phase; a tool phase; and an advisor phase. The information phase is an integrated display (TMID) of firing room clocks, of graphic timelines with Ground Launch Sequencer events, and of constraints. The tool phase is a what-if spreadsheet (TMWI) for devising plans for resuming from unplanned hold situations. It is tied to information in TMID, propagates constraints forward and backward to complete unspecified values, and checks the plan against constraints. The advisor phase is a situation advisor (TMSA), which proactively suggests tactics. A concept prototype for TMSA is under development. The TMID is currently undergoing field testing. Displays for TMID and TMWI are described. Descriptions include organization, rationale for organization, implementation choices and constraints, and use by NTD.
Evolving approaches to the ethical management of genomic data.
McEwen, Jean E; Boyer, Joy T; Sun, Kathie Y
2013-06-01
The ethical landscape in the field of genomics is rapidly shifting. Plummeting sequencing costs, along with ongoing advances in bioinformatics, now make it possible to generate an enormous volume of genomic data about vast numbers of people. The informational richness, complexity, and frequently uncertain meaning of these data, coupled with evolving norms surrounding the sharing of data and samples and persistent privacy concerns, have generated a range of approaches to the ethical management of genomic information. As calls increase for the expanded use of broad or even open consent, and as controversy grows about how best to handle incidental genomic findings, these approaches, informed by normative analysis and empirical data, will continue to evolve alongside the science. Published by Elsevier Ltd.
Evolving Approaches to the Ethical Management of Genomic Data
Boyer, Joy T.; Sun, Kathie Y.
2013-01-01
The ethical landscape in the field of genomics is rapidly shifting. Plummeting sequencing costs, along with ongoing advances in bioinformatics, now make it possible to generate an enormous volume of genomic data about vast numbers of people. The informational richness, complexity, and frequently uncertain meaning of these data, coupled with evolving norms surrounding the sharing of data and samples and persistent privacy concerns, have generated a range of approaches to the ethical management of genomic information. As calls increase for the expanded use of broad or even open consent, and as controversy grows about how best to handle incidental genomic findings, these approaches, informed by normative analysis and empirical data, will continue to evolve alongside the science. PMID:23453621
Nolan, Danielle; Carlson, Martha
2016-06-01
Genetic heterogeneity in neurologic disorders has been an obstacle to phenotype-based diagnostic testing. The authors hypothesized that information compiled via whole exome sequencing will improve clinical diagnosis and management of pediatric neurology patients. The authors performed a retrospective chart review of patients evaluated in the University of Michigan Pediatric Neurology clinic between 6/2011 and 6/2015. The authors recorded previous diagnostic testing, indications for whole exome sequencing, and whole exome sequencing results. Whole exome sequencing was recommended for 135 patients and obtained in 53 patients. Insurance barriers often precluded whole exome sequencing. The most common indication for whole exome sequencing was neurodevelopmental disorders. Whole exome sequencing improved the presumptive diagnostic rate in the patient cohort from 25% to 48%. Clinical implications included family planning, medication selection, and systemic investigation. Compared to current second tier testing, whole exome sequencing can result in lower long-term charges and more timely diagnosis. Overcoming barriers related to whole exome sequencing insurance authorization could allow for more efficient and fruitful diagnostic neurological evaluations. © The Author(s) 2016.
NASA Technical Reports Server (NTRS)
Equils, Douglas J.
2008-01-01
Launched on October 15, 1997, the Cassini-Huygens spacecraft began its ambitious journey to the Saturnian system with a complex suite of 12 scientific instruments, and another 6 instruments aboard the European Space Agency's Huygens probe. Over the next 6 1/2 years, Cassini would continue its relatively simple cruise-phase operations, flying past Venus, Earth, and Jupiter. However, following Saturn Orbit Insertion (SOI), Cassini would become involved in a complex series of tasks that required detailed resource management, distributed operations collaboration, and a database for capturing science objectives. Collectively, these needs were met through a web-based software tool designed to help with the Cassini uplink process and ultimately used to generate more robust sequences for spacecraft operations. In 2001, in conjunction with the Southwest Research Institute (SwRI) and later Venustar Software and Engineering Inc., the Cassini Information Management System (CIMS) was released, which enabled the Cassini spacecraft and science planning teams to perform complex information management and team collaboration between scientists and engineers in 17 countries. Originally tailored to help manage the science planning uplink process, CIMS has been actively evolving since its inception to meet the changing and growing needs of the Cassini uplink team and to effectively reduce mission risk through a series of resource management validation algorithms. These algorithms have been implemented in the web-based software tool to identify potential sequence conflicts early in the science planning process. CIMS mitigates these sequence conflicts through identification of timing incongruities, pointing inconsistencies, flight rule violations, and data volume issues, and by assisting in Deep Space Network (DSN) coverage analysis. In preparation for extended mission operations, CIMS has also evolved further to assist in the planning and coordination of the dual playback redundancy of high-value data from targets such as Titan and Enceladus. This paper will outline the critical role that CIMS has played for Cassini in the distributed operations paradigm throughout the mission. This paper will also examine the evolution that CIMS has undergone in the face of new science discoveries and fluctuating operational needs. Finally, this paper will conclude with the theoretical adaptation of CIMS for other projects and the potential savings in cost and risk reduction that could be tapped into by future missions.
USDA-ARS?s Scientific Manuscript database
It has been often stated that we have moved from an age of chemistry to an age of biology. The ease of sequencing genomes and obtaining related genotypic, transcriptomic, proteomic, and metabolomics information is leading to the development of new commercial technologies where problems are solved "...
USDA-ARS?s Scientific Manuscript database
The Genomic Standards Consortium (GSC) is an international working body with the mission of working towards richer descriptions of genomic and metagenomic data through the development of standards and tools for supporting the consistent documentation of contextual information about sequences. Becaus...
Professionally Responsible Disclosure of Genomic Sequencing Results in Pediatric Practice
Brothers, Kyle B.; Chung, Wendy K.; Joffe, Steven; Koenig, Barbara A.; Wilfond, Benjamin; Yu, Joon-Ho
2015-01-01
Genomic sequencing is being rapidly introduced into pediatric clinical practice. The results of sequencing are distinctive for their complexity and subsequent challenges of interpretation for generalist and specialist pediatricians, parents, and patients. Pediatricians therefore need to prepare for the professionally responsible disclosure of sequencing results to parents and patients and guidance of parents and patients in the interpretation and use of these results, including managing uncertain data. This article provides an ethical framework to guide and evaluate the professionally responsible disclosure of the results of genomic sequencing in pediatric practice. The ethical framework comprises 3 core concepts of pediatric ethics: the best interests of the child standard, parental surrogate decision-making, and pediatric assent. When recommending sequencing, pediatricians should explain the nature of the proposed test, its scope and complexity, the categories of results, and the concept of a secondary or incidental finding. Pediatricians should obtain the informed permission of parents and the assent of mature adolescents about the scope of sequencing to be performed and the return of results. PMID:26371191
Pathogen profiling for disease management and surveillance.
Sintchenko, Vitali; Iredell, Jonathan R; Gilbert, Gwendolyn L
2007-06-01
The usefulness of rapid pathogen genotyping is widely recognized, but its effective interpretation and application requires integration into clinical and public health decision-making. How can pathogen genotyping data best be translated to inform disease management and surveillance? Pathogen profiling integrates microbial genomics data into communicable disease control by consolidating phenotypic identity-based methods with DNA microarrays, proteomics, metabolomics and sequence-based typing. Sharing data on pathogen profiles should facilitate our understanding of transmission patterns and the dynamics of epidemics.
Roy, Somak; Durso, Mary Beth; Wald, Abigail; Nikiforov, Yuri E; Nikiforova, Marina N
2014-01-01
A wide repertoire of bioinformatics applications exists for next-generation sequencing data analysis; however, certain requirements of the clinical molecular laboratory limit their use: i) comprehensive report generation, ii) compatibility with existing laboratory information systems and computer operating systems, iii) knowledgebase development, iv) quality management, and v) data security. SeqReporter is a web-based application developed using ASP.NET framework version 4.0. The client side was designed using HTML5, CSS3, and JavaScript. The server-side processing (VB.NET) relied on interaction with a customized SQL Server 2008 R2 database. Overall, 104 cases (1062 variant calls) were analyzed by SeqReporter. Each variant call was classified into one of five report levels: i) known clinical significance, ii) uncertain clinical significance, iii) pending pathologists' review, iv) synonymous and deep intronic, and v) platform- and panel-specific sequence errors. SeqReporter correctly annotated and classified 99.9% (859 of 860) of sequence variants, including 68.7% synonymous single-nucleotide variants, 28.3% nonsynonymous single-nucleotide variants, 1.7% insertions, and 1.3% deletions. One variant of potential clinical significance was re-classified after pathologist review. Laboratory information system-compatible clinical reports were generated automatically. SeqReporter also facilitated quality management activities. SeqReporter is an example of a customized and well-designed informatics solution to optimize and automate the downstream analysis of clinical next-generation sequencing data. We propose it as a model that may guide the development of a comprehensive clinical informatics solution. Copyright © 2014 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
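SeqReporter itself is an ASP.NET/VB.NET application; purely to illustrate the kind of rule-based triage the abstract describes (the five report levels follow the abstract, but the fields, lookup tables, and ordering of rules below are hypothetical), a Python sketch:

```python
# Hypothetical rule-based triage of variant calls into the five report levels
# named in the abstract; field names and the lookup tables are illustrative only.
KNOWN_CLINICAL = {("BRAF", "p.V600E")}           # assumed knowledgebase of classified variants
PLATFORM_ERRORS = {("chr7", 6026775, "T", "G")}  # assumed panel-specific recurrent artifacts

def report_level(variant: dict) -> str:
    key = (variant["chrom"], variant["pos"], variant["ref"], variant["alt"])
    if key in PLATFORM_ERRORS:
        return "platform/panel-specific sequence error"
    if variant["effect"] in ("synonymous", "deep_intronic"):
        return "synonymous or deep intronic"
    if (variant["gene"], variant["protein_change"]) in KNOWN_CLINICAL:
        return "known clinical significance"
    if variant.get("reviewed_by_pathologist"):
        return "uncertain clinical significance"
    return "pending pathologists' review"

v = {"chrom": "chr7", "pos": 140453136, "ref": "A", "alt": "T",
     "gene": "BRAF", "protein_change": "p.V600E", "effect": "missense"}
print(report_level(v))  # -> known clinical significance
```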
Aniba, Mohamed Radhouene; Siguenza, Sophie; Friedrich, Anne; Plewniak, Frédéric; Poch, Olivier; Marchler-Bauer, Aron; Thompson, Julie Dawn
2009-01-01
The traditional approach to bioinformatics analyses relies on independent task-specific services and applications, using different input and output formats, often idiosyncratic, and frequently not designed to inter-operate. In general, such analyses were performed by experts who manually verified the results obtained at each step in the process. Today, the amount of bioinformatics information continuously being produced means that handling the various applications used to study this information presents a major data management and analysis challenge to researchers. It is now impossible to manually analyse all this information and new approaches are needed that are capable of processing the large-scale heterogeneous data in order to extract the pertinent information. We review the recent use of integrated expert systems aimed at providing more efficient knowledge extraction for bioinformatics research. A general methodology for building knowledge-based expert systems is described, focusing on the unstructured information management architecture, UIMA, which provides facilities for both data and process management. A case study involving a multiple alignment expert system prototype called AlexSys is also presented.
Aniba, Mohamed Radhouene; Siguenza, Sophie; Friedrich, Anne; Plewniak, Frédéric; Poch, Olivier; Marchler-Bauer, Aron
2009-01-01
The traditional approach to bioinformatics analyses relies on independent task-specific services and applications, using different input and output formats, often idiosyncratic, and frequently not designed to inter-operate. In general, such analyses were performed by experts who manually verified the results obtained at each step in the process. Today, the amount of bioinformatics information continuously being produced means that handling the various applications used to study this information presents a major data management and analysis challenge to researchers. It is now impossible to manually analyse all this information and new approaches are needed that are capable of processing the large-scale heterogeneous data in order to extract the pertinent information. We review the recent use of integrated expert systems aimed at providing more efficient knowledge extraction for bioinformatics research. A general methodology for building knowledge-based expert systems is described, focusing on the unstructured information management architecture, UIMA, which provides facilities for both data and process management. A case study involving a multiple alignment expert system prototype called AlexSys is also presented. PMID:18971242
Operation Odyssey Dawn and Lessons for the Future
2013-02-14
private meeting during the G8 Summit, Sarkozy informed them that French combat aircraft were en route to the Libyan coast. Soon thereafter, Rafale...force of any form on any part of Libyan territory, and requests the Member States concerned to inform the Secretary-General immediately of the...serve as a kind of forward air control air battle manager, sequencing and de-conflicting flights in addition to finding and identifying their own
Sharma, Parichit; Mantri, Shrikant S
2014-01-01
The function of a newly sequenced gene can be discovered by determining its sequence homology with known proteins. BLAST is the most extensively used sequence analysis program for sequence similarity search in large databases of sequences. With the advent of next generation sequencing technologies it has now become possible to study genes and their expression at a genome-wide scale through RNA-seq and metagenome sequencing experiments. Functional annotation of all the genes is done by sequence similarity search against multiple protein databases. This annotation task is computationally very intensive and can take days to obtain complete results. The program mpiBLAST, an open-source parallelization of BLAST that achieves superlinear speedup, can be used to accelerate large-scale annotation by using supercomputers and high performance computing (HPC) clusters. Although many parallel bioinformatics applications using the Message Passing Interface (MPI) are available in the public domain, researchers are reluctant to use them due to lack of expertise in the Linux command line and relevant programming experience. With these limitations, it becomes difficult for biologists to use mpiBLAST for accelerating annotation. No web interface is available in the open-source domain for mpiBLAST. We have developed WImpiBLAST, a user-friendly open-source web interface for parallel BLAST searches. It is implemented in Struts 1.3 using a Java backbone and runs atop the open-source Apache Tomcat Server. WImpiBLAST supports script creation and job submission features and also provides a robust job management interface for system administrators. It combines script creation and modification features with job monitoring and management through the Torque resource manager on a Linux-based HPC cluster. Use case information highlights the acceleration of annotation analysis achieved by using WImpiBLAST. Here, we describe the WImpiBLAST web interface features and architecture, explain design decisions, describe workflows and provide a detailed analysis.
Sharma, Parichit; Mantri, Shrikant S.
2014-01-01
The function of a newly sequenced gene can be discovered by determining its sequence homology with known proteins. BLAST is the most extensively used sequence analysis program for sequence similarity search in large databases of sequences. With the advent of next generation sequencing technologies it has now become possible to study genes and their expression at a genome-wide scale through RNA-seq and metagenome sequencing experiments. Functional annotation of all the genes is done by sequence similarity search against multiple protein databases. This annotation task is computationally very intensive and can take days to obtain complete results. The program mpiBLAST, an open-source parallelization of BLAST that achieves superlinear speedup, can be used to accelerate large-scale annotation by using supercomputers and high performance computing (HPC) clusters. Although many parallel bioinformatics applications using the Message Passing Interface (MPI) are available in the public domain, researchers are reluctant to use them due to lack of expertise in the Linux command line and relevant programming experience. With these limitations, it becomes difficult for biologists to use mpiBLAST for accelerating annotation. No web interface is available in the open-source domain for mpiBLAST. We have developed WImpiBLAST, a user-friendly open-source web interface for parallel BLAST searches. It is implemented in Struts 1.3 using a Java backbone and runs atop the open-source Apache Tomcat Server. WImpiBLAST supports script creation and job submission features and also provides a robust job management interface for system administrators. It combines script creation and modification features with job monitoring and management through the Torque resource manager on a Linux-based HPC cluster. Use case information highlights the acceleration of annotation analysis achieved by using WImpiBLAST. Here, we describe the WImpiBLAST web interface features and architecture, explain design decisions, describe workflows and provide a detailed analysis. PMID:24979410
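As a rough sketch of what a back end like WImpiBLAST's has to do (write a Torque/PBS job script that launches an mpiBLAST search and submit it with qsub), here is a Python illustration; the paths, resource requests, and the exact mpiblast command-line flags shown are assumptions and not taken from the WImpiBLAST code:

```python
# Hedged sketch: generate and submit a Torque/PBS job that runs mpiBLAST.
# Paths, resource requests, and the mpiblast flags are assumptions for illustration.
import subprocess
from pathlib import Path

def submit_mpiblast_job(query: str, database: str, out: str, nodes: int = 4, ppn: int = 8) -> str:
    script = f"""#!/bin/bash
#PBS -N mpiblast_annotation
#PBS -l nodes={nodes}:ppn={ppn}
#PBS -l walltime=12:00:00
cd $PBS_O_WORKDIR
mpirun -np {nodes * ppn} mpiblast -p blastx -d {database} -i {query} -o {out}
"""
    job_file = Path("mpiblast_job.pbs")
    job_file.write_text(script)
    # qsub prints the job identifier on success
    result = subprocess.run(["qsub", str(job_file)], capture_output=True, text=True, check=True)
    return result.stdout.strip()

# Example (requires a Torque cluster):
# job_id = submit_mpiblast_job("transcripts.fasta", "nr", "annotation.blastx")
```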
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harwood, Caroline S.
Rhodopseudomonas palustris is a common soil and water bacterium that makes its living by converting sunlight to cellular energy and by absorbing atmospheric carbon dioxide and converting it to biomass. This microbe can also degrade and recycle components of the woody tissues of plants, wood being the most abundant polymer on earth. Because of its intimate involvement in carbon management and recycling, R. palustris was selected by the DOE Carbon Management Program to have its genome sequenced by the Joint Genome Institute (JGI). This award provided funds for the preparation of R. palustris genomic DNA, which was then supplied to the JGI in sufficient amounts to enable the complete sequencing of the R. palustris genome. The PI also supplied the JGI with technical information about the molecular biology of R. palustris.
TriageTools: tools for partitioning and prioritizing analysis of high-throughput sequencing data.
Fimereli, Danai; Detours, Vincent; Konopka, Tomasz
2013-04-01
High-throughput sequencing is becoming a popular research tool but carries with it considerable costs in terms of computation time, data storage and bandwidth. Meanwhile, some research applications focusing on individual genes or pathways do not necessitate processing of a full sequencing dataset. Thus, it is desirable to partition a large dataset into smaller, manageable, but relevant pieces. We present a toolkit for partitioning raw sequencing data that includes a method for extracting reads that are likely to map onto pre-defined regions of interest. We show the method can be used to extract information about genes of interest from DNA or RNA sequencing samples in a fraction of the time and disk space required to process and store a full dataset. We report speedup factors between 2.6 and 96, depending on settings and samples used. The software is available at http://www.sourceforge.net/projects/triagetools/.
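TriageTools itself selects candidate reads from the raw data before alignment; as a simpler illustration of the underlying idea of pulling out only the reads relevant to a region of interest, here is a pysam sketch operating on an already aligned, indexed BAM file (the file names and coordinates are hypothetical):

```python
# Illustrative only: extract reads overlapping a region of interest from an aligned,
# indexed BAM file with pysam.  This is not the TriageTools method, which partitions
# raw reads before alignment.
import pysam

def extract_region(in_bam: str, out_bam: str, chrom: str, start: int, end: int) -> int:
    count = 0
    with pysam.AlignmentFile(in_bam, "rb") as src:
        with pysam.AlignmentFile(out_bam, "wb", template=src) as dst:
            for read in src.fetch(chrom, start, end):  # requires a .bai index
                dst.write(read)
                count += 1
    return count

# Example (hypothetical file and coordinates):
# n = extract_region("sample.bam", "TP53_reads.bam", "chr17", 7565097, 7590856)
```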
MGIS: Managing banana (Musa spp.) genetic resources information and high-throughput genotyping data
USDA-ARS?s Scientific Manuscript database
Unraveling genetic diversity held in genebanks on a large scale is underway, due to the advances in Next-generation sequence-based technologies that produce high-density genetic markers for a large number of samples at low cost. Genebank users should be in a position to identify and select germplasm...
Laboratory complex for simulation of navigation signals of pseudosatellites
NASA Astrophysics Data System (ADS)
Ratushniak, V. N.; Gladyshev, A. B.; Sokolovskiy, A. V.; Mikhov, E. D.
2018-05-01
The article considers the organization and structure of, and issues in forming, the navigation signals of pseudosatellites for a short-range navigation system built on National Instruments hardware and software. A software model is presented that generates and manages the pseudo-random sequence of the navigation signal and the format of the navigation information transmitted by the pseudosatellite. A variant of constructing the transmitting equipment of the pseudosatellite base stations is also provided.
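The abstract mentions a software model that forms and manages the pseudo-random sequence of the navigation signal. As a generic, purely illustrative sketch of such a spreading-code generator (the register length and feedback taps are example values, not those of the described system), in Python:

```python
# Generic illustration of generating a pseudo-random (PRN) ranging sequence with a
# linear-feedback shift register, the kind of spreading code a pseudosatellite
# transmits.  The register length and taps are example choices, not the codes of
# the system described in the article.
def lfsr_prn(taps=(10, 3), length=1023, seed=None):
    """Return `length` chips from a 10-stage Fibonacci LFSR (taps are 1-based stages)."""
    register = list(seed) if seed is not None else [1] * 10
    chips = []
    for _ in range(length):
        chips.append(register[-1])                       # output chip from stage 10
        feedback = register[taps[0] - 1] ^ register[taps[1] - 1]
        register = [feedback] + register[:-1]            # shift stages, insert feedback
    return chips

code = lfsr_prn()
print(len(code), sum(code))  # a maximal-length 10-stage LFSR gives 512 ones in 1023 chips
```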
Information management in DNA replication modeled by directional, stochastic chains with memory
NASA Astrophysics Data System (ADS)
Arias-Gonzalez, J. Ricardo
2016-11-01
Stochastic chains represent a key variety of phenomena in many branches of science within the context of information theory and thermodynamics. They are typically approached by a sequence of independent events or by a memoryless Markov process. Stochastic chains are of special significance to molecular biology, where genes are conveyed by linear polymers made up of molecular subunits and transferred from DNA to proteins by specialized molecular motors in the presence of errors. Here, we demonstrate that when memory is introduced, the statistics of the chain depend on the mechanism by which objects or symbols are assembled, even in the slow-dynamics limit wherein friction can be neglected. To analyze these systems, we introduce a sequence-dependent partition function, investigate its properties, and compare it to the standard normalization defined by the statistical physics of ensembles. We then apply this theory to characterize the enzyme-mediated information transfer involved in DNA replication under real, non-equilibrium conditions, reproducing measured error rates and explaining the typical 100-fold increase in fidelity that is experimentally found when proofreading and editing take place. Our model further predicts that approximately 1 kT has to be consumed to elevate fidelity by one order of magnitude. We anticipate that our results are necessary to interpret configurational order and information management in many molecular systems within biophysics, materials science, communication, and engineering.
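For context, the textbook equilibrium estimate of the thermodynamic cost of fidelity (not the paper's memory-dependent, non-equilibrium model) goes as follows:

```latex
% Back-of-the-envelope equilibrium estimate (not the cited paper's model):
% if discrimination between correct and incorrect nucleotides relied only on a
% free-energy gap \Delta\Delta G, the error fraction would be Boltzmann-weighted,
\varepsilon \;\approx\; \frac{e^{-\Delta\Delta G/k_{B}T}}{1 + e^{-\Delta\Delta G/k_{B}T}}
\;\approx\; e^{-\Delta\Delta G/k_{B}T},
% so lowering \varepsilon by a factor of 10 would cost
\Delta(\Delta\Delta G) \;=\; k_{B}T\,\ln 10 \;\approx\; 2.3\,k_{B}T .
% The abstract's non-equilibrium treatment with memory predicts a smaller cost,
% roughly 1 k_{B}T per order of magnitude of fidelity.
```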
Chaitankar, Vijender; Karakülah, Gökhan; Ratnapriya, Rinki; Giuste, Felipe O.; Brooks, Matthew J.; Swaroop, Anand
2016-01-01
The advent of high throughput next generation sequencing (NGS) has accelerated the pace of discovery of disease-associated genetic variants and genomewide profiling of expressed sequences and epigenetic marks, thereby permitting systems-based analyses of ocular development and disease. Rapid evolution of NGS and associated methodologies presents significant challenges in acquisition, management, and analysis of large data sets and for extracting biologically or clinically relevant information. Here we illustrate the basic design of commonly used NGS-based methods, specifically whole exome sequencing, transcriptome, and epigenome profiling, and provide recommendations for data analyses. We briefly discuss systems biology approaches for integrating multiple data sets to elucidate gene regulatory or disease networks. While we provide examples from the retina, the NGS guidelines reviewed here are applicable to other tissues/cell types as well. PMID:27297499
Gavrielides, Mike; Furney, Simon J; Yates, Tim; Miller, Crispin J; Marais, Richard
2014-01-01
Whole genomes, whole exomes and transcriptomes of tumour samples are sequenced routinely to identify the drivers of cancer. The systematic sequencing and analysis of tumour samples, as well other oncogenomic experiments, necessitates the tracking of relevant sample information throughout the investigative process. These meta-data of the sequencing and analysis procedures include information about the samples and projects as well as the sequencing centres, platforms, data locations, results locations, alignments, analysis specifications and further information relevant to the experiments. The current work presents a sample tracking system for oncogenomic studies (Onco-STS) to store these data and make them easily accessible to the researchers who work with the samples. The system is a web application, which includes a database and a front-end web page that allows the remote access, submission and updating of the sample data in the database. The web application development programming framework Grails was used for the development and implementation of the system. The resulting Onco-STS solution is efficient, secure and easy to use and is intended to replace the manual data handling of text records. Onco-STS allows simultaneous remote access to the system making collaboration among researchers more effective. The system stores both information on the samples in oncogenomic studies and details of the analyses conducted on the resulting data. Onco-STS is based on open-source software, is easy to develop and can be modified according to a research group's needs. Hence it is suitable for laboratories that do not require a commercial system.
Calabria, Andrea; Spinozzi, Giulio; Benedicenti, Fabrizio; Tenderini, Erika; Montini, Eugenio
2015-01-01
Many biological laboratories that deal with genomic samples are facing the problem of sample tracking, both for pure laboratory management and for efficiency. Our laboratory exploits PCR techniques and Next Generation Sequencing (NGS) methods to perform high-throughput integration site monitoring in different clinical trials and scientific projects. Because of the huge number of samples that we process every year, which result in hundreds of millions of sequencing reads, we need to standardize data management and tracking systems, building up a scalable and flexible structure with web-based interfaces, usually called a Laboratory Information Management System (LIMS). We started by collecting end-users' requirements, composed of the desired functionalities of the system and Graphical User Interfaces (GUIs), and then we evaluated available tools that could address our requirements, spanning from pure LIMS to Content Management Systems (CMS) up to enterprise information systems. Our analysis identified ADempiere ERP, an open-source Enterprise Resource Planning system written in Java J2EE, as the best software, which also natively implements some highly desirable technological advances, such as the high usability and modularity that grant high use-case flexibility and software scalability for custom solutions. We extended and customized ADempiere ERP to fulfil the LIMS requirements and developed adLIMS. It has been validated by our end-users, verifying functionalities and GUIs through test cases for PCR samples and pre-sequencing data, and it is currently in use in our laboratories. adLIMS implements authorization and authentication policies, allowing multiple-user management and the definition of roles that enable specific permissions, operations and data views for each user. For example, adLIMS allows creating sample sheets from stored data using the available export operations. This simplicity and process standardization may avoid manual errors and information backtracking, features that are not granted when tracking records in files or spreadsheets. adLIMS aims to combine sample tracking and data reporting features with higher accessibility and usability of GUIs, thus allowing time to be saved on repetitive laboratory tasks and reducing errors with respect to manual data collection methods. Moreover, adLIMS implements automated data entry, exploiting sample data multiplexing and parallel/transactional processing. adLIMS is natively extensible to cope with laboratory automation through platform-dependent API interfaces, and could be extended to genomic facilities due to its ERP functionalities.
PAVE: program for assembling and viewing ESTs.
Soderlund, Carol; Johnson, Eric; Bomhoff, Matthew; Descour, Anne
2009-08-26
New sequencing technologies are rapidly emerging. Many laboratories are simultaneously working with the traditional Sanger ESTs and experimenting with ESTs generated by the 454 Life Science sequencers. Though Sanger ESTs have been used to generate contigs for many years, no program takes full advantage of the 5' and 3' mate-pair information, hence, many tentative transcripts are assembled into two separate contigs. The new 454 technology has the benefit of high-throughput expression profiling, but introduces time and space problems for assembling large contigs. The PAVE (Program for Assembling and Viewing ESTs) assembler takes advantage of the 5' and 3' mate-pair information by requiring that the mate-pairs be assembled into the same contig and joined by n's if the two sub-contigs do not overlap. It handles the depth of 454 data sets by "burying" similar ESTs during assembly, which retains the expression level information while circumventing time and space problems. PAVE uses MegaBLAST for the clustering step and CAP3 for assembly, however it assembles incrementally to enforce the mate-pair constraint, bury ESTs, and reduce incorrect joins and splits. The PAVE data management system uses a MySQL database to store multiple libraries of ESTs along with their metadata; the management system allows multiple assemblies with variations on libraries and parameters. Analysis routines provide standard annotation for the contigs including a measure of differentially expressed genes across the libraries. A Java viewer program is provided for display and analysis of the results. Our results clearly show the benefit of using the PAVE assembler to explicitly use mate-pair information and bury ESTs for large contigs. The PAVE assembler provides a software package for assembling Sanger and/or 454 ESTs. The assembly software, data management software, Java viewer and user's guide are freely available.
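As a toy illustration of the mate-pair constraint described above (joining a clone's 5' and 3' sub-contigs with n's when they do not overlap), here is a Python sketch; the overlap test and gap length are deliberately simplistic placeholders, not PAVE's actual logic:

```python
# Toy illustration of PAVE's mate-pair constraint: if the contig holding a clone's
# 5' EST and the contig holding its 3' EST do not overlap, join them into a single
# contig separated by n's.  The overlap test and gap length are placeholders.
def join_mate_pair_contigs(contig_5p: str, contig_3p: str, min_overlap: int = 40, gap: int = 50) -> str:
    # Look for a suffix of the 5' contig that matches a prefix of the 3' contig.
    max_check = min(len(contig_5p), len(contig_3p))
    for olen in range(max_check, min_overlap - 1, -1):
        if contig_5p[-olen:] == contig_3p[:olen]:
            return contig_5p + contig_3p[olen:]        # true overlap: merge
    return contig_5p + "n" * gap + contig_3p           # no overlap: join with n's

print(join_mate_pair_contigs("ACGTACGTAAAA", "TTTTGGGGCCCC", min_overlap=4))
```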
Use of microsatellite markers in management of conifer forest species
Craig S. Echt
1999-01-01
Within the past ten years a new class of genetic marker has risen to prominence as the tool of choice for many geneticists. Microsatellite DNAs, or simple sequence repeats (SSRs), were first characterized as highly informative genetic markers in humans (Weber and May, 1990; Litt and Luty, 1990), and have since been found in practically all...
ERIC Educational Resources Information Center
Leisman, Gerry
2012-01-01
Little of 150 years of research in Cognitive Neurosciences, Human Factors, and the mathematics of Production Management has found its way into educational policy, and certainly not into the classroom or into the production of educational materials, in any meaningful or practical fashion. Whilst more mundane concepts of timing, sequencing, spatial…
ERIC Educational Resources Information Center
Downey-Franchuk, Andrea J.
Society has become increasingly aware of the harmful effects that the disposal of chemical waste products has on the environment and human health. Public information is central to the development of a responsible waste management plan. The activities contained in this guide are organized in sequence from kindergarten to grade 12, and provide…
de Brevern, Alexandre G; Meyniel, Jean-Philippe; Fairhead, Cécile; Neuvéglise, Cécile; Malpertuy, Alain
2015-01-01
Sequencing the human genome began in 1994, and 10 years of work were necessary in order to provide a nearly complete sequence. Nowadays, NGS technologies allow sequencing of a whole human genome in a few days. This deluge of data challenges scientists in many ways, as they are faced with data management issues and analysis and visualization drawbacks due to the limitations of current bioinformatics tools. In this paper, we describe how the NGS Big Data revolution changes the way of managing and analysing data. We present how biologists are confronted with an abundance of methods, tools, and data formats. To overcome these problems, we focus on Big Data information technology innovations from the web and business intelligence. We underline the interest of NoSQL databases, which are much more efficient than relational databases. Since Big Data leads to the loss of interactivity with data during analysis due to high processing times, we describe solutions from Business Intelligence that allow one to regain interactivity whatever the volume of data. We illustrate this point with a focus on the Amadea platform. Finally, we discuss the visualization challenges posed by Big Data and present the latest innovations in JavaScript graphic libraries.
de Brevern, Alexandre G.; Meyniel, Jean-Philippe; Fairhead, Cécile; Neuvéglise, Cécile; Malpertuy, Alain
2015-01-01
Sequencing the human genome began in 1994, and 10 years of work were necessary in order to provide a nearly complete sequence. Nowadays, NGS technologies allow sequencing of a whole human genome in a few days. This deluge of data challenges scientists in many ways, as they are faced with data management issues and analysis and visualization drawbacks due to the limitations of current bioinformatics tools. In this paper, we describe how the NGS Big Data revolution changes the way of managing and analysing data. We present how biologists are confronted with an abundance of methods, tools, and data formats. To overcome these problems, we focus on Big Data information technology innovations from the web and business intelligence. We underline the interest of NoSQL databases, which are much more efficient than relational databases. Since Big Data leads to the loss of interactivity with data during analysis due to high processing times, we describe solutions from Business Intelligence that allow one to regain interactivity whatever the volume of data. We illustrate this point with a focus on the Amadea platform. Finally, we discuss the visualization challenges posed by Big Data and present the latest innovations in JavaScript graphic libraries. PMID:26125026
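As a minimal illustration of the NoSQL approach the review advocates (the collection names, fields, and query below are hypothetical and unrelated to the Amadea platform), storing and querying NGS variant records with MongoDB via pymongo:

```python
# Minimal, hypothetical example of using a NoSQL document store for NGS results:
# each variant call is a schema-free document, so new annotation fields can be
# added without migrating a relational schema.  Assumes a local MongoDB server.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
variants = client["ngs_demo"]["variants"]

variants.insert_many([
    {"sample": "S1", "chrom": "chr17", "pos": 7577121, "ref": "G", "alt": "A",
     "gene": "TP53", "depth": 143, "annotations": {"effect": "missense"}},
    {"sample": "S2", "chrom": "chr7", "pos": 140453136, "ref": "A", "alt": "T",
     "gene": "BRAF", "depth": 98},
])

# Interactive-style query: all sufficiently covered calls in a gene of interest.
for doc in variants.find({"gene": "TP53", "depth": {"$gte": 30}}):
    print(doc["sample"], doc["chrom"], doc["pos"], doc["ref"], ">", doc["alt"])
```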
Application of next generation sequencing in clinical microbiology and infection prevention.
Deurenberg, Ruud H; Bathoorn, Erik; Chlebowicz, Monika A; Couto, Natacha; Ferdous, Mithila; García-Cobos, Silvia; Kooistra-Smid, Anna M D; Raangs, Erwin C; Rosema, Sigrid; Veloo, Alida C M; Zhou, Kai; Friedrich, Alexander W; Rossen, John W A
2017-02-10
Current molecular diagnostics of human pathogens provide limited information that is often not sufficient for outbreak and transmission investigation. Next generation sequencing (NGS) determines the DNA sequence of a complete bacterial genome in a single sequence run, and from these data, information on resistance and virulence, as well as information for typing is obtained, useful for outbreak investigation. The obtained genome data can be further used for the development of an outbreak-specific screening test. In this review, a general introduction to NGS is presented, including the library preparation and the major characteristics of the most common NGS platforms, such as the MiSeq (Illumina) and the Ion PGM™ (ThermoFisher). An overview of the software used for NGS data analyses used at the medical microbiology diagnostic laboratory in the University Medical Center Groningen in The Netherlands is given. Furthermore, applications of NGS in the clinical setting are described, such as outbreak management, molecular case finding, characterization and surveillance of pathogens, rapid identification of bacteria using the 16S-23S rRNA region, taxonomy, metagenomics approaches on clinical samples, and the determination of the transmission of zoonotic micro-organisms from animals to humans. Finally, we share our vision on the use of NGS in personalised microbiology in the near future, pointing out specific requirements. Copyright © 2016 The Author(s). Published by Elsevier B.V. All rights reserved.
Deurenberg, Ruud H; Bathoorn, Erik; Chlebowicz, Monika A; Couto, Natacha; Ferdous, Mithila; García-Cobos, Silvia; Kooistra-Smid, Anna M D; Raangs, Erwin C; Rosema, Sigrid; Veloo, Alida C M; Zhou, Kai; Friedrich, Alexander W; Rossen, John W A
2017-05-20
Current molecular diagnostics of human pathogens provide limited information that is often not sufficient for outbreak and transmission investigation. Next generation sequencing (NGS) determines the DNA sequence of a complete bacterial genome in a single sequence run, and from these data, information on resistance and virulence, as well as information for typing is obtained, useful for outbreak investigation. The obtained genome data can be further used for the development of an outbreak-specific screening test. In this review, a general introduction to NGS is presented, including the library preparation and the major characteristics of the most common NGS platforms, such as the MiSeq (Illumina) and the Ion PGM™ (ThermoFisher). An overview of the software used for NGS data analyses used at the medical microbiology diagnostic laboratory in the University Medical Center Groningen in The Netherlands is given. Furthermore, applications of NGS in the clinical setting are described, such as outbreak management, molecular case finding, characterization and surveillance of pathogens, rapid identification of bacteria using the 16S-23S rRNA region, taxonomy, metagenomics approaches on clinical samples, and the determination of the transmission of zoonotic micro-organisms from animals to humans. Finally, we share our vision on the use of NGS in personalised microbiology in the near future, pointing out specific requirements. Copyright © 2017. Published by Elsevier B.V.
Maximizing the potential of cropping systems for nematode management.
Noe, J P; Sasser, J N; Imbriani, J L
1991-07-01
Quantitative techniques were used to analyze and determine optimal potential profitability of 3-year rotations of cotton, Gossypium hirsutum cv. Coker 315, and soybean, Glycine max cv. Centennial, with increasing population densities of Hoplolaimus columbus. Data collected from naturally infested on-farm research plots were combined with economic information to construct a microcomputer spreadsheet analysis of the cropping system. Nonlinear mathematical functions were fitted to field data to represent damage functions and population dynamic curves. Maximum yield losses due to H. columbus were estimated to be 20% on cotton and 42% on soybean. Maximum at-harvest population densities were calculated to be 182/100 cm(3) soil for cotton and 149/100 cm(3) soil for soybean. Projected net incomes ranged from a $17.74/ha net loss for the soybean-cotton-soybean sequence to a net profit of $46.80/ha for the cotton-soybean-cotton sequence. The relative profitability of various rotations changed as nematode densities increased, indicating economic thresholds for recommending alternative crop sequences. The utility and power of quantitative optimization was demonstrated for comparisons of rotations under different economic assumptions and with other management alternatives.
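To make the spreadsheet-style comparison above concrete, the following minimal Python sketch evaluates hypothetical three-year rotations with a saturating damage function and a simple population-dynamics rule; the function forms, parameter values, and cost figures are illustrative assumptions, not values from this study.

```python
# Illustrative sketch only: a 3-year rotation evaluated with hypothetical
# damage and population-dynamics functions. None of the parameter values
# below are taken from the study; they simply show the structure of a
# spreadsheet-style rotation comparison.
import math

# Hypothetical per-crop economics and damage parameters.
CROPS = {
    # gross income ($/ha), max fractional yield loss, damage rate,
    # at-harvest equilibrium density (nematodes / 100 cm^3 soil)
    "cotton":  {"gross": 900.0, "max_loss": 0.20, "rate": 0.02, "eq_density": 182.0},
    "soybean": {"gross": 700.0, "max_loss": 0.42, "rate": 0.02, "eq_density": 149.0},
}
COSTS = 600.0  # hypothetical production cost per ha per season


def yield_loss(crop, density):
    """Fraction of yield lost as a saturating function of initial density."""
    p = CROPS[crop]
    return p["max_loss"] * (1.0 - math.exp(-p["rate"] * density))


def next_density(crop, density):
    """Very simple population dynamics: density moves toward the crop's
    at-harvest equilibrium."""
    p = CROPS[crop]
    return density + 0.5 * (p["eq_density"] - density)


def rotation_net_income(sequence, initial_density):
    density, total = initial_density, 0.0
    for crop in sequence:
        p = CROPS[crop]
        income = p["gross"] * (1.0 - yield_loss(crop, density)) - COSTS
        total += income
        density = next_density(crop, density)
    return total


for seq in [("cotton", "soybean", "cotton"), ("soybean", "cotton", "soybean")]:
    print(seq, round(rotation_net_income(seq, initial_density=50.0), 2))
```

Varying the initial nematode density in the last lines shows how the relative profitability of rotations can shift with infestation level, which is the threshold behaviour described in the abstract.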
Ten years of maintaining and expanding a microbial genome and metagenome analysis system.
Markowitz, Victor M; Chen, I-Min A; Chu, Ken; Pati, Amrita; Ivanova, Natalia N; Kyrpides, Nikos C
2015-11-01
Launched in March 2005, the Integrated Microbial Genomes (IMG) system is a comprehensive data management system that supports multidimensional comparative analysis of genomic data. At the core of the IMG system is a data warehouse that contains genome and metagenome datasets sequenced at the Joint Genome Institute or provided by scientific users, as well as public genome datasets available at the National Center for Biotechnology Information Genbank sequence data archive. Genomes and metagenome datasets are processed using IMG's microbial genome and metagenome sequence data processing pipelines and are integrated into the data warehouse using IMG's data integration toolkits. Microbial genome and metagenome application specific data marts and user interfaces provide access to different subsets of IMG's data and analysis toolkits. This review article revisits IMG's original aims, highlights key milestones reached by the system during the past 10 years, and discusses the main challenges faced by a rapidly expanding system, in particular the complexity of maintaining such a system in an academic setting with limited budgets and computing and data management infrastructure. Copyright © 2015 Elsevier Ltd. All rights reserved.
Automated plant, production management system
NASA Astrophysics Data System (ADS)
Aksenova, V. I.; Belov, V. I.
1984-12-01
The development of a complex of tasks for the operational management of production (OUP) within the framework of an automated system for production management (ASUP) shows that effective computations are impossible without reliable initial information. The influence of many factors involving the production and economic activity of the entire enterprise on the plan and course of production is considered. It is suggested that an adequate model should be available that covers all levels of the hierarchical system (workplace, section (brigade), shop, enterprise), that the model should be incorporated into the technological sequence of operations, and that provisions should be made for an adequate man-machine system.
De Groot, Anne S; Rappuoli, Rino
2004-02-01
Vaccine research entered a new era when the complete genome of a pathogenic bacterium was published in 1995. Since then, more than 97 bacterial pathogens have been sequenced and at least 110 additional projects are now in progress. Genome sequencing has also dramatically accelerated: high-throughput facilities can draft the sequence of an entire microbe (two to four megabases) in 1 to 2 days. Vaccine developers are using microarrays, immunoinformatics, proteomics and high-throughput immunology assays to reduce the truly unmanageable volume of information available in genome databases to a manageable size. Vaccines composed of novel antigens discovered by genome mining are already in clinical trials. Within 5 years we can expect to see a novel class of vaccines composed of genome-predicted, assembled and engineered T- and B-cell epitopes. This article addresses the convergence of three forces--microbial genome sequencing, computational immunology and new vaccine technologies--that are bringing genome mining for vaccines to the forefront of immunology research.
Semantic integration of information about orthologs and diseases: the OGO system.
Miñarro-Gimenez, Jose Antonio; Egaña Aranguren, Mikel; Martínez Béjar, Rodrigo; Fernández-Breis, Jesualdo Tomás; Madrid, Marisa
2011-12-01
Semantic Web technologies like RDF and OWL are currently applied in life sciences to improve knowledge management by integrating disparate information. Many of the systems that perform such task, however, only offer a SPARQL query interface, which is difficult to use for life scientists. We present the OGO system, which consists of a knowledge base that integrates information of orthologous sequences and genetic diseases, providing an easy to use ontology-constrain driven query interface. Such interface allows the users to define SPARQL queries through a graphical process, therefore not requiring SPARQL expertise. Copyright © 2011 Elsevier Inc. All rights reserved.
Corruption of genomic databases with anomalous sequence.
Lamperti, E D; Kittelberger, J M; Smith, T F; Villa-Komaroff, L
1992-06-11
We describe evidence that DNA sequences from vectors used for cloning and sequencing have been incorporated accidentally into eukaryotic entries in the GenBank database. These incorporations were not restricted to one type of vector or to a single mechanism. Many minor instances may have been the result of simple editing errors, but some entries contained large blocks of vector sequence that had been incorporated by contamination or other accidents during cloning. Some cases involved unusual rearrangements and areas of vector distant from the normal insertion sites. Matches to vector were found in 0.23% of 20,000 sequences analyzed in GenBank Release 63. Although the possibility of anomalous sequence incorporation has been recognized since the inception of GenBank and should be easy to avoid, recent evidence suggests that this problem is increasing more quickly than the database itself. The presence of anomalous sequence may have serious consequences for the interpretation and use of database entries, and will have an impact on issues of database management. The incorporated vector fragments described here may also be useful for a crude estimate of the fidelity of sequence information in the database. In alignments with well-defined ends, the matching sequences showed 96.8% identity to vector; when poorer matches with arbitrary limits were included, the aggregate identity to vector sequence was 94.8%.
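As a rough illustration of the kind of screening implied above (not the method used in the study), the sketch below flags database entries that share a long exact substring with a vector sequence; real screens such as NCBI VecScreen rely on local alignment, and the sequences here are placeholders.

```python
# Minimal sketch: flag database entries that share a long exact substring
# with a cloning-vector sequence. Real screens use local alignment; the
# vector and entry sequences below are placeholders.
def kmers(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def flag_vector_matches(entries, vector_seq, k=30):
    """Return accessions whose sequence contains any k-mer from the vector."""
    vector_kmers = kmers(vector_seq.upper(), k)
    flagged = []
    for accession, seq in entries.items():
        if kmers(seq.upper(), k) & vector_kmers:
            flagged.append(accession)
    return flagged


if __name__ == "__main__":
    vector = "GAATTC" * 10                       # placeholder vector fragment
    entries = {"X00001": "ATG" + "GAATTC" * 10,  # contains a vector-like run
               "X00002": "ATGCATGCATGC" * 10}    # does not
    hits = flag_vector_matches(entries, vector, k=30)
    print(f"{len(hits)}/{len(entries)} entries flagged:", hits)
```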
Using SQL Databases for Sequence Similarity Searching and Analysis.
Pearson, William R; Mackey, Aaron J
2017-09-13
Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. Copyright © 2017 John Wiley & Sons, Inc.
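A minimal sketch of the load-and-query idea, using Python's built-in sqlite3 rather than the unit's own seqdb_demo/search_demo schemas; the table layout, column names, and example rows are assumptions for illustration only.

```python
# Minimal sketch, not the actual seqdb_demo/search_demo schemas: an SQLite
# table of similarity-search hits, queried for the best hit per query within
# one taxonomic group. Table and column names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hits (
    query_acc   TEXT,
    subject_acc TEXT,
    taxon       TEXT,
    evalue      REAL,
    identity    REAL
);
""")
rows = [
    ("ECOLI_0001", "STM_0001", "Salmonella", 1e-80, 92.1),
    ("ECOLI_0001", "YPE_0007", "Yersinia",   1e-40, 61.3),
    ("ECOLI_0002", "STM_0450", "Salmonella", 1e-15, 38.0),
]
conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?, ?)", rows)

# Best (lowest e-value) Salmonella hit for each E. coli query.
query = """
SELECT query_acc, subject_acc, MIN(evalue) AS best_evalue
FROM hits
WHERE taxon = 'Salmonella'
GROUP BY query_acc
ORDER BY query_acc;
"""
for row in conn.execute(query):
    print(row)
```

Restricting the WHERE clause to a taxonomic subset mirrors the unit's point that smaller, less redundant libraries sharpen the statistics of the search.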
NGS Catalog: A Database of Next Generation Sequencing Studies in Humans
Xia, Junfeng; Wang, Qingguo; Jia, Peilin; Wang, Bing; Pao, William; Zhao, Zhongming
2015-01-01
Next generation sequencing (NGS) technologies have been rapidly applied in biomedical and biological research since its advent only a few years ago, and they are expected to advance at an unprecedented pace in the following years. To provide the research community with a comprehensive NGS resource, we have developed the database Next Generation Sequencing Catalog (NGS Catalog, http://bioinfo.mc.vanderbilt.edu/NGS/index.html), a continually updated database that collects, curates and manages available human NGS data obtained from published literature. NGS Catalog deposits publication information of NGS studies and their mutation characteristics (SNVs, small insertions/deletions, copy number variations, and structural variants), as well as mutated genes and gene fusions detected by NGS. Other functions include user data upload, NGS general analysis pipelines, and NGS software. NGS Catalog is particularly useful for investigators who are new to NGS but would like to take advantage of these powerful technologies for their own research. Finally, based on the data deposited in NGS Catalog, we summarized features and findings from whole exome sequencing, whole genome sequencing, and transcriptome sequencing studies for human diseases or traits. PMID:22517761
Whole genome sequencing in the prevention and control of Staphylococcus aureus infection.
Price, J R; Didelot, X; Crook, D W; Llewelyn, M J; Paul, J
2013-01-01
Staphylococcus aureus remains a leading cause of hospital-acquired infection but weaknesses inherent in currently available typing methods impede effective infection prevention and control. The high resolution offered by whole genome sequencing has the potential to revolutionise our understanding and management of S. aureus infection. To outline the practicalities of whole genome sequencing and discuss how it might shape future infection control practice. We review conventional typing methods and compare these with the potential offered by whole genome sequencing. In contrast with conventional methods, whole genome sequencing discriminates down to single nucleotide differences and allows accurate characterisation of transmission events and outbreaks and additionally provides information about the genetic basis of phenotypic characteristics, including antibiotic susceptibility and virulence. However, translating its potential into routine practice will depend on affordability, acceptable turnaround times and on creating a reliable standardised bioinformatic infrastructure. Whole genome sequencing has the potential to provide a universal test that facilitates outbreak investigation, enables the detection of emerging strains and predicts their clinical importance. Copyright © 2012 The Healthcare Infection Society. Published by Elsevier Ltd. All rights reserved.
Omics Metadata Management Software v. 1 (OMMS)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Our application, the Omics Metadata Management Software (OMMS), answers both needs, empowering experimentalists to generate intuitive, consistent metadata, and to perform bioinformatics analyses and information management tasks via a simple and intuitive web-based interface. Several use cases with short-read sequence datasets are provided to showcase the full functionality of the OMMS, from metadata curation tasks to bioinformatics analyses and results management and downloading. The OMMS can be implemented as a stand-alone package for individual laboratories, or can be configured for web-based deployment supporting geographically dispersed research teams. Our software was developed with open-source bundles, is flexible and extensible, and is easily installed and run by operators with general system administration and scripting language literacy.
Decentralized Resource Management in Distributed Computer Systems.
1982-02-01
... directly exchanging user state information. Eventcounts and sequencers correspond to semaphores in the sense that synchronization primitives are used to ... and techniques are required to achieve synchronization in distributed computers without reliance on any centralized entity such as a semaphore ... known solutions to the access synchronization problem was Dijkstra's semaphore [12]. The importance of the semaphore is that it correctly addresses the ...
O'Leary, Nuala A; Wright, Mathew W; Brister, J Rodney; Ciufo, Stacy; Haddad, Diana; McVeigh, Rich; Rajput, Bhanu; Robbertse, Barbara; Smith-White, Brian; Ako-Adjei, Danso; Astashyn, Alexander; Badretdin, Azat; Bao, Yiming; Blinkova, Olga; Brover, Vyacheslav; Chetvernin, Vyacheslav; Choi, Jinna; Cox, Eric; Ermolaeva, Olga; Farrell, Catherine M; Goldfarb, Tamara; Gupta, Tripti; Haft, Daniel; Hatcher, Eneida; Hlavina, Wratko; Joardar, Vinita S; Kodali, Vamsi K; Li, Wenjun; Maglott, Donna; Masterson, Patrick; McGarvey, Kelly M; Murphy, Michael R; O'Neill, Kathleen; Pujar, Shashikant; Rangwala, Sanjida H; Rausch, Daniel; Riddick, Lillian D; Schoch, Conrad; Shkeda, Andrei; Storz, Susan S; Sun, Hanzhen; Thibaud-Nissen, Francoise; Tolstoy, Igor; Tully, Raymond E; Vatsan, Anjana R; Wallin, Craig; Webb, David; Wu, Wendy; Landrum, Melissa J; Kimchi, Avi; Tatusova, Tatiana; DiCuccio, Michael; Kitts, Paul; Murphy, Terence D; Pruitt, Kim D
2016-01-04
The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55,000 organisms (>4800 viruses, >40,000 prokaryotes and >10,000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management. Published by Oxford University Press on behalf of Nucleic Acids Research 2015. This work is written by (a) US Government employee(s) and is in the public domain in the US.
Implementing genomic medicine in pathology.
Williams, Eli S; Hegde, Madhuri
2013-07-01
The finished sequence of the Human Genome Project, published 50 years after Watson and Crick's seminal paper on the structure of DNA, pushed human genetics into the public eye and ushered in the genomic era. A significant, if overlooked, aspect of the race to complete the genome was the technology that propelled scientists to the finish line. DNA sequencing technologies have become more standardized, automated, and capable of higher throughput. This technology has continued to grow at an astounding rate in the decade since the Human Genome Project was completed. Today, massively parallel sequencing, or next-generation sequencing (NGS), allows the detection of genetic variants across the entire genome. This ability has led to the identification of new causes of disease and is changing the way we categorize, treat, and manage disease. NGS approaches such as whole-exome sequencing and whole-genome sequencing are rapidly becoming an affordable genetic testing strategy for the clinical laboratory. One test can now provide vast amounts of health information pertaining not only to the disease of interest, but information that may also predict adult-onset disease, reveal carrier status for a rare disease and predict drug responsiveness. The issue of what to do with these incidental findings, along with questions pertaining to NGS testing strategies, data interpretation and storage, and applying genetic testing results into patient care, remains without a clear answer. This review will explore these issues and others relevant to the implementation of NGS in the clinical laboratory.
Becker, Julia; Wein, Anne; Potter, Sally; Doyle, Emma; Ratliff, Jamie L.
2015-01-01
On 4 September 2010, a Mw7.1 earthquake occurred in Canterbury, New Zealand. Following the initial earthquake, an aftershock sequence was initiated, with the most significant aftershock being a Mw6.3 earthquake occurring on 22 February 2011. This aftershock caused severe damage to the city of Christchurch and building failures that killed 185 people. During the aftershock sequence it became evident that effective communication of aftershock information (e.g., history and forecasts) was imperative to assist with decision making during the response and recovery phases of the disaster, as well as preparedness for future aftershock events. As a consequence, a joint JCDR-USGS research project was initiated to investigate: • How aftershock information was communicated to organisations and to the public; • How people interpreted that information; • What people did in response to receiving that information; • What information people did and did not need; and • What decision-making challenges were encountered relating to aftershocks. Research was conducted by undertaking focus group meetings and interviews with a range of information providers and users, including scientists and science advisors, emergency managers and responders, engineers, communication officers, businesses, critical infrastructure operators, elected officials, and the public. The interviews and focus group meetings were recorded and transcribed, and key themes were identified. This paper focuses on the aftershock information needs for decision-making about the built environment post-earthquake, including those involved in response (e.g., for building assessment and management), recovery/reduction (e.g., the development of new building standards), and readiness (e.g. between aftershocks). The research has found that the communication of aftershock information varies with time, is contextual, and is affected by interactions among roles, by other information, and by decision objectives. A number of general and specific insights into improving the communication of aftershock information are provided.
SGDB: a database of synthetic genes re-designed for optimizing protein over-expression.
Wu, Gang; Zheng, Yuanpu; Qureshi, Imran; Zin, Htar Thant; Beck, Tyler; Bulka, Blazej; Freeland, Stephen J
2007-01-01
Here we present the Synthetic Gene Database (SGDB): a relational database that houses sequences and associated experimental information on synthetic (artificially engineered) genes from all peer-reviewed studies published to date. At present, the database comprises information from more than 200 published experiments. This resource not only provides reference material to guide experimentalists in designing new genes that improve protein expression, but also offers a dataset for analysis by bioinformaticians who seek to test ideas regarding the underlying factors that influence gene expression. The SGDB was built under MySQL database management system. We also offer an XML schema for standardized data description of synthetic genes. Users can access the database at http://www.evolvingcode.net/codon/sgdb/index.php, or batch downloads all information through XML files. Moreover, users may visually compare the coding sequences of a synthetic gene and its natural counterpart with an integrated web tool at http://www.evolvingcode.net/codon/sgdb/aligner.php, and discuss questions, findings and related information on an associated e-forum at http://www.evolvingcode.net/forum/viewforum.php?f=27.
Improving molecular diagnosis in epilepsy by a dedicated high-throughput sequencing platform.
Della Mina, Erika; Ciccone, Roberto; Brustia, Francesca; Bayindir, Baran; Limongelli, Ivan; Vetro, Annalisa; Iascone, Maria; Pezzoli, Laura; Bellazzi, Riccardo; Perotti, Gianfranco; De Giorgis, Valentina; Lunghi, Simona; Coppola, Giangennaro; Orcesi, Simona; Merli, Pietro; Savasta, Salvatore; Veggiotti, Pierangelo; Zuffardi, Orsetta
2015-03-01
We analyzed by next-generation sequencing (NGS) 67 epilepsy genes in 19 patients with different types of either isolated or syndromic epileptic disorders and in 15 controls to investigate whether a quick and cheap molecular diagnosis could be provided. The average number of nonsynonymous and splice-site mutations per subject was similar in the two cohorts, indicating that, even with relatively small targeted platforms, finding the disease gene is not a straightforward process. Our diagnostic yield was 47%, with nine cases in which we identified a very likely causative mutation. In most of them no interpretation would have been possible in the absence of detailed phenotype and familial information. Seven of the 19 patients had a phenotype suggesting the involvement of a specific gene. Disease-causing mutations were found in six of these cases. Among the remaining patients, we could find a probably causative mutation in only three. None of the genes affected in the latter cases had been suspected a priori. Our protocol requires 8-10 weeks, including the investigation of the parents, with a cost per patient comparable to sequencing of 1-2 medium-to-large-sized genes by conventional techniques. The platform we used, although providing much less information than whole-exome or whole-genome sequencing, has the advantage that it can also be run on 'benchtop' sequencers, combining rapid turnaround times with greater manageability.
Viral genome analysis and knowledge management.
Kuiken, Carla; Yoon, Hyejin; Abfalterer, Werner; Gaschen, Brian; Lo, Chienchi; Korber, Bette
2013-01-01
One of the challenges of genetic data analysis is to combine information from sources that are distributed around the world and accessible through a wide array of different methods and interfaces. The HIV database and its footsteps, the hepatitis C virus (HCV) and hemorrhagic fever virus (HFV) databases, have made it their mission to make different data types easily available to their users. This involves a large amount of behind-the-scenes processing, including quality control and analysis of the sequences and their annotation. Gene and protein sequences are distilled from the sequences that are stored in GenBank; to this end, both submitter annotation and script-generated sequences are used. Alignments of both nucleotide and amino acid sequences are generated, manually curated, distilled into an alignment model, and regenerated in an iterative cycle that results in ever better new alignments. Annotation of epidemiological and clinical information is parsed, checked, and added to the database. User interfaces are updated, and new interfaces are added based upon user requests. Vital for its success, the database staff are heavy users of the system, which enables them to fix bugs and find opportunities for improvement. In this chapter we describe some of the infrastructure that keeps these heavily used analysis platforms alive and vital after nearly 25 years of use. The database/analysis platforms described in this chapter can be accessed at http://hiv.lanl.gov http://hcv.lanl.gov http://hfv.lanl.gov.
Watt, Stuart; Jiao, Wei; Brown, Andrew M K; Petrocelli, Teresa; Tran, Ben; Zhang, Tong; McPherson, John D; Kamel-Reid, Suzanne; Bedard, Philippe L; Onetto, Nicole; Hudson, Thomas J; Dancey, Janet; Siu, Lillian L; Stein, Lincoln; Ferretti, Vincent
2013-09-01
Using sequencing information to guide clinical decision-making requires coordination of a diverse set of people and activities. In clinical genomics, the process typically includes sample acquisition, template preparation, genome data generation, analysis to identify and confirm variant alleles, interpretation of clinical significance, and reporting to clinicians. We describe a software application developed within a clinical genomics study, to support this entire process. The software application tracks patients, samples, genomic results, decisions and reports across the cohort, monitors progress and sends reminders, and works alongside an electronic data capture system for the trial's clinical and genomic data. It incorporates systems to read, store, analyze and consolidate sequencing results from multiple technologies, and provides a curated knowledge base of tumor mutation frequency (from the COSMIC database) annotated with clinical significance and drug sensitivity to generate reports for clinicians. By supporting the entire process, the application provides deep support for clinical decision making, enabling the generation of relevant guidance in reports for verification by an expert panel prior to forwarding to the treating physician. Copyright © 2013 Elsevier Inc. All rights reserved.
Yoon, Jun-Hee; Kim, Thomas W; Mendez, Pedro; Jablons, David M; Kim, Il-Jin
2017-01-01
The development of next-generation sequencing (NGS) technology makes it possible to sequence whole exomes or genomes. However, data analysis is still the biggest bottleneck for its wide implementation. Most laboratories still depend on manual procedures for data handling and analyses, which translates into delays and decreased efficiency in the delivery of NGS results to doctors and patients. Thus, there is high demand for an automatic, easy-to-use NGS data analysis system. We developed a comprehensive, automatic genetic analysis controller named Mobile Genome Express (MGE) that works on smartphones and other mobile devices. MGE can handle all the steps for genetic analyses, such as sample information submission, sequencing run quality check from the sequencer, secured data transfer, and results review. We sequenced an Actrometrix control DNA containing multiple proven human mutations using a targeted sequencing panel, and the whole analysis was managed by MGE and its data review program called ELECTRO. All steps were processed automatically except for the final sequencing review procedure with ELECTRO to confirm mutations. The data analysis process was completed within several hours. We confirmed that the mutations we identified were consistent with our previous results obtained using multi-step, manual pipelines.
Microbial forensics: fiber optic microarray subtyping of Bacillus anthracis
NASA Astrophysics Data System (ADS)
Shepard, Jason R. E.
2009-05-01
The past decade has seen increased development and subsequent adoption of rapid molecular techniques involving DNA analysis for detection of pathogenic microorganisms, also termed microbial forensics. The continued accumulation of microbial sequence information in genomic databases now better positions the field of high-throughput DNA analysis to proceed in a more manageable fashion. The potential to build off of these databases exists as technology continues to develop, which will enable more rapid, cost effective analyses. This wealth of genetic information, along with new technologies, has the potential to better address some of the current problems and solve the key issues involved in DNA analysis of pathogenic microorganisms. To this end, a high density fiber optic microarray has been employed, housing numerous DNA sequences simultaneously for detection of various pathogenic microorganisms, including Bacillus anthracis, among others. Each organism is analyzed with multiple sequences and can be sub-typed against other closely related organisms. For public health labs, real-time PCR methods have been developed as an initial preliminary screen, but culture and growth are still considered the gold standard. Technologies employing higher throughput than these standard methods are better suited to capitalize on the limitless potential garnered from the sequence information. Microarray analyses are one such format positioned to exploit this potential, and our array platform is reusable, allowing repetitive tests on a single array, providing an increase in throughput and decrease in cost, along with a certainty of detection, down to the individual strain level.
Transforming microbial genotyping: a robotic pipeline for genotyping bacterial strains.
O'Farrell, Brian; Haase, Jana K; Velayudhan, Vimalkumar; Murphy, Ronan A; Achtman, Mark
2012-01-01
Microbial genotyping increasingly deals with large numbers of samples, and data are commonly evaluated by unstructured approaches, such as spreadsheets. The efficiency, reliability and throughput of genotyping would benefit from the automation of manual manipulations within the context of sophisticated data storage. We developed a medium-throughput genotyping pipeline for MultiLocus Sequence Typing (MLST) of bacterial pathogens. This pipeline was implemented through a combination of four automated liquid handling systems, a Laboratory Information Management System (LIMS) consisting of a variety of dedicated commercial operating systems and programs, including a Sample Management System, plus numerous Python scripts. All tubes and microwell racks were bar-coded and their locations and status were recorded in the LIMS. We also created a hierarchical set of items that could be used to represent bacterial species, their products and experiments. The LIMS allowed reliable, semi-automated, traceable bacterial genotyping from initial single-colony isolation and sub-cultivation through DNA extraction and normalization to PCRs, sequencing and MLST sequence trace evaluation. We also describe robotic sequencing to facilitate cherry-picking of sequence dropouts. This pipeline is user-friendly, with a throughput of 96 strains within 10 working days at a total cost of < €25 per strain. Since developing this pipeline, >200,000 items have been processed by two to three people. Our sophisticated automated pipeline can be implemented by a small microbiology group without extensive external support, and provides a general framework for semi-automated bacterial genotyping of large numbers of samples at low cost.
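The sketch below illustrates, in generic Python, the kind of bar-coded item tracking such a LIMS performs; it is not drawn from the pipeline's actual scripts, and the item fields, workflow steps, and barcodes are invented for illustration.

```python
# Minimal sketch, assuming nothing about the commercial LIMS used in the
# pipeline: a tiny in-memory registry that records bar-coded tubes/plates
# and the processing status of each strain through an MLST-style workflow.
from dataclasses import dataclass, field

STEPS = ["isolated", "subcultured", "dna_extracted", "normalized",
         "pcr_done", "sequenced", "mlst_assigned"]


@dataclass
class Item:
    barcode: str
    kind: str                      # e.g. "tube" or "plate"
    location: str
    status: str = STEPS[0]
    history: list = field(default_factory=list)

    def advance(self, new_status, operator):
        # Only allow moving to the next step, so status changes stay traceable.
        if STEPS.index(new_status) != STEPS.index(self.status) + 1:
            raise ValueError(f"cannot jump from {self.status} to {new_status}")
        self.history.append((self.status, operator))
        self.status = new_status


registry = {}
tube = Item(barcode="TB000123", kind="tube", location="rack A1/pos 3")
registry[tube.barcode] = tube
tube.advance("subcultured", operator="jdoe")
tube.advance("dna_extracted", operator="robot-1")
print(tube.barcode, tube.status, tube.history)
```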
Sturgeon conservation genomics: SNP discovery and validation using RAD sequencing.
Ogden, R; Gharbi, K; Mugue, N; Martinsohn, J; Senn, H; Davey, J W; Pourkazemi, M; McEwing, R; Eland, C; Vidotto, M; Sergeev, A; Congiu, L
2013-06-01
Caviar-producing sturgeons belonging to the genus Acipenser are considered to be one of the most endangered species groups in the world. Continued overfishing in spite of increasing legislation, zero catch quotas and extensive aquaculture production have led to the collapse of wild stocks across Europe and Asia. The evolutionary relationships among Adriatic, Russian, Persian and Siberian sturgeons are complex because of past introgression events and remain poorly understood. Conservation management, traceability and enforcement suffer a lack of appropriate DNA markers for the genetic identification of sturgeon at the species, population and individual level. This study employed RAD sequencing to discover and characterize single nucleotide polymorphism (SNP) DNA markers for use in sturgeon conservation in these four tetraploid species over three biological levels, using a single sequencing lane. Four population meta-samples and eight individual samples from one family were barcoded separately before sequencing. Analysis of 14.4 Gb of paired-end RAD data focused on the identification of SNPs in the paired-end contig, with subsequent in silico and empirical validation of candidate markers. Thousands of putatively informative markers were identified including, for the first time, SNPs that show population-wide differentiation between Russian and Persian sturgeons, representing an important advance in our ability to manage these cryptic species. The results highlight the challenges of genotyping-by-sequencing in polyploid taxa, while establishing the potential genetic resources for developing a new range of caviar traceability and enforcement tools. © 2013 John Wiley & Sons Ltd.
Molecular Diagnostic Experience of Whole-Exome Sequencing in Adult Patients
Posey, Jennifer E.; Rosenfeld, Jill A.; James, Regis A.; Bainbridge, Matthew; Niu, Zhiyv; Wang, Xia; Dhar, Shweta; Wiszniewski, Wojciech; Akdemir, Zeynep H.C.; Gambin, Tomasz; Xia, Fan; Person, Richard E.; Walkiewicz, Magdalena; Shaw, Chad A.; Sutton, V. Reid; Beaudet, Arthur L.; Muzny, Donna; Eng, Christine M.; Yang, Yaping; Gibbs, Richard A.; Lupski, James R.; Boerwinkle, Eric; Plon, Sharon E.
2015-01-01
Purpose: Whole exome sequencing (WES) is increasingly used as a diagnostic tool in medicine, but prior reports focus on predominantly pediatric cohorts with neurologic or developmental disorders. We describe the diagnostic yield and characteristics of whole exome sequencing in adults. Methods: We performed a retrospective analysis of consecutive WES reports for adults from a diagnostic laboratory. Phenotype composition was determined using Human Phenotype Ontology terms. Results: Molecular diagnoses were reported for 17.5% (85/486) of adults, lower than in a primarily pediatric population (25.2%; p=0.0003); the diagnostic rate was higher (23.9%) in those 18–30 years of age compared to patients over 30 years (10.4%; p=0.0001). Dual Mendelian diagnoses contributed to 7% of diagnoses, revealing blended phenotypes. Diagnoses were more frequent among individuals with abnormalities of the nervous system, skeletal system, head/neck, and growth. Diagnostic rate was independent of family history information, and de novo mutations contributed to 61.4% of autosomal dominant diagnoses. Conclusion: Early WES experience in adults demonstrates molecular diagnoses in a substantial proportion of patients, informing clinical management, recurrence risk and recommendations for relatives. A positive family history was not predictive, consistent with molecular diagnoses often revealed by de novo events, informing the Mendelian basis of genetic disease in adults. PMID:26633545
Molecular Approaches to Thyroid Cancer Diagnosis
Hsiao, Susan J.; Nikiforov, Yuri E.
2014-01-01
Thyroid nodules are common, and the accurate diagnosis of cancer or benign disease is important for the effective clinical management of these patients. Molecular markers are a helpful diagnostic tool, particularly for cytologically indeterminate thyroid nodules. In the past few years, significant progress has been made in developing molecular markers for clinical use in fine needle aspiration (FNA) specimens, including gene mutation panels and gene expression classifiers. With the availability of next generation sequencing technology, gene mutation panels can be expanded to interrogate multiple genes simultaneously and to provide yet more accurate diagnostic information. In addition, recently several new molecular markers in thyroid cancer have been identified that offer diagnostic, prognostic, and therapeutic information that could potentially be of value in guiding individualized management of patients with thyroid nodules. PMID:24829266
Replacement Sequence of Events Generator
NASA Technical Reports Server (NTRS)
Fisher, Forest; Gladden, Daniel Wenkert Roy; Khanampompan, Teerpat
2008-01-01
The soeWINDOW program automates the generation of an ITAR (International Traffic in Arms Regulations)-compliant sub-RSOE (Replacement Sequence of Events) by extracting a specified temporal window from an RSOE while maintaining page header information. RSOEs contain a significant amount of information that is not ITAR-compliant, yet foreign partners need to see the command details for their instrument, as well as the surrounding commands that provide context for validation. soeWINDOW can serve as an example of how command support products can be made ITAR-compliant for future missions. This software is a Perl script intended for use in the mission operations UNIX environment. It is designed to support the MRO (Mars Reconnaissance Orbiter) instrument team. The tool also provides automated DOM (Distributed Object Manager) storage into the special ITAR-okay DOM collection, and can be used for creating focused RSOEs for product review by any of the MRO teams.
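A minimal Python sketch of the windowing idea (the actual tool is a Perl script, and the real RSOE format is not reproduced here); the event-line layout, header convention, and timestamps below are hypothetical.

```python
# Minimal sketch of temporal-window extraction, not the soeWINDOW script
# itself: keep page-header lines, and keep only events whose timestamp falls
# inside a requested window. The event format (ISO timestamp, then command
# text) is a hypothetical stand-in for the real RSOE format.
from datetime import datetime


def extract_window(lines, start, end):
    kept = []
    for line in lines:
        if line.startswith("HEADER"):          # always keep page headers
            kept.append(line)
            continue
        stamp = line.split()[0]                # "YYYY-mm-ddTHH:MM:SS"
        t = datetime.fromisoformat(stamp)
        if start <= t <= end:
            kept.append(line)
    return kept


rsoe = [
    "HEADER MRO RSOE page 1",
    "2008-03-01T10:00:00 CMD_A instrument warm-up",
    "2008-03-01T11:30:00 CMD_B observation",
    "2008-03-01T14:00:00 CMD_C downlink",
]
window = extract_window(rsoe,
                        datetime(2008, 3, 1, 11, 0),
                        datetime(2008, 3, 1, 13, 0))
print("\n".join(window))
```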
JCoDA: a tool for detecting evolutionary selection.
Steinway, Steven N; Dannenfelser, Ruth; Laucius, Christopher D; Hayes, James E; Nayak, Sudhir
2010-05-27
The incorporation of annotated sequence information from multiple related species in commonly used databases (Ensembl, Flybase, Saccharomyces Genome Database, Wormbase, etc.) has increased dramatically over the last few years. This influx of information has provided a considerable amount of raw material for evaluation of evolutionary relationships. To aid in the process, we have developed JCoDA (Java Codon Delimited Alignment) as a simple-to-use visualization tool for the detection of site specific and regional positive/negative evolutionary selection amongst homologous coding sequences. JCoDA accepts user-inputted unaligned or pre-aligned coding sequences, performs a codon-delimited alignment using ClustalW, and determines the dN/dS calculations using PAML (Phylogenetic Analysis Using Maximum Likelihood, yn00 and codeml) in order to identify regions and sites under evolutionary selection. The JCoDA package includes a graphical interface for Phylip (Phylogeny Inference Package) to generate phylogenetic trees, manages formatting of all required file types, and streamlines passage of information between underlying programs. The raw data are output to user configurable graphs with sliding window options for straightforward visualization of pairwise or gene family comparisons. Additionally, codon-delimited alignments are output in a variety of common formats and all dN/dS calculations can be output in comma-separated value (CSV) format for downstream analysis. To illustrate the types of analyses that are facilitated by JCoDA, we have taken advantage of the well studied sex determination pathway in nematodes as well as the extensive sequence information available to identify genes under positive selection, examples of regional positive selection, and differences in selection based on the role of genes in the sex determination pathway. JCoDA is a configurable, open source, user-friendly visualization tool for performing evolutionary analysis on homologous coding sequences. JCoDA can be used to rapidly screen for genes and regions of genes under selection using PAML. It can be freely downloaded at http://www.tcnj.edu/~nayaklab/jcoda.
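A minimal sketch of the sliding-window visualization step, assuming per-codon dN/dS values have already been computed (for example by PAML) and exported to CSV; the column names, window settings, and toy values are assumptions, not JCoDA's actual file format.

```python
# Minimal sketch of a sliding-window summary of per-codon dN/dS values read
# from CSV with columns "codon" and "dnds"; column names, window size, and
# the toy data are illustrative only.
import csv
import io


def sliding_window(values, window=20, step=5):
    """Yield (start_codon, mean_dnds) over a sliding window."""
    for start in range(0, max(len(values) - window + 1, 1), step):
        chunk = values[start:start + window]
        yield start + 1, sum(chunk) / len(chunk)


csv_text = "codon,dnds\n" + "\n".join(
    f"{i + 1},{0.2 if i < 40 else 1.8}" for i in range(60))  # toy data
rows = list(csv.DictReader(io.StringIO(csv_text)))
dnds = [float(r["dnds"]) for r in rows]

for start, mean in sliding_window(dnds, window=20, step=10):
    flag = "  <- possible positive selection" if mean > 1.0 else ""
    print(f"codons {start}-{start + 19}: mean dN/dS = {mean:.2f}{flag}")
```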
Data standardization. The key to effective management
Wagner, C. Russell
1991-01-01
Effective management of the nation's water resources is dependent upon accurate and consistent hydrologic information. Before the emergence of environmental concerns in the 1960's, most hydrologic information was collected by the U.S. Geological Survey and other Federal agencies that used fairly consistent methods and equipment. In the past quarter century, however, increased environmental awareness has resulted in an expansion of hydrologic data collection not only by Federal agencies, but also by state and municipal governments, university investigators, and private consulting firms. The acceptance and use of standard methods of collecting and processing hydrologic data would contribute to cost savings and to greater credibility of flow information vital to responsible assessment and management of the nation's water resources. This paper traces the evolution of the requirements and uses of open-channel flow information in the U.S., and the sequence of efforts to standardize the methods used to obtain this information in the future. The variable nature of naturally flowing rivers results in continually changing hydraulic properties of their channels. Those persons responsible for measurement of water flowing in open channels (streamflow) must use a large amount of judgement in the selection of appropriate equipment and technique to obtain accurate flow information. Standardization of the methods used in the measurement of streamflow is essential to assure consistency of data, but must also allow considerable latitude for individual judgement to meet constantly changing field conditions.
Federal Register 2010, 2011, 2012, 2013, 2014
2010-07-19
... The Electronic Address Sequencing (EAS) service processes a customer's address file for walk sequence and/or ... POSTAL SERVICE, 39 CFR Part 111, Address Management Services--Elimination of the Manual Card Option for Address Sequencing Services. AGENCY: Postal Service. ACTION: Proposed rule. SUMMARY: The Postal ...
DNA Metabarcoding of Amazonian Ichthyoplankton Swarms.
Maggia, M E; Vigouroux, Y; Renno, J F; Duponchelle, F; Desmarais, E; Nunez, J; García-Dávila, C; Carvajal-Vallejos, F M; Paradis, E; Martin, J F; Mariac, C
2017-01-01
Tropical rainforests harbor extraordinary biodiversity. The Amazon basin is thought to hold 30% of all river fish species in the world. Information about the ecology, reproduction, and recruitment of most species is still lacking, thus hampering fisheries management and successful conservation strategies. One of the key understudied issues in the study of population dynamics is recruitment. Fish larval ecology in tropical biomes is still in its infancy owing to identification difficulties. Molecular techniques are very promising tools for the identification of larvae at the species level. However, one of their limits is obtaining individual sequences with large samples of larvae. To facilitate this task, we developed a new method based on the massive parallel sequencing capability of next generation sequencing (NGS) coupled with hybridization capture. We focused on the mitochondrial marker cytochrome oxidase I (COI). The results obtained using the new method were compared with individual larval sequencing. We validated the ability of the method to identify Amazonian catfish larvae at the species level and to estimate the relative abundance of species in batches of larvae. Finally, we applied the method and provided evidence for strong temporal variation in reproductive activity of catfish species in the Ucayalí River in the Peruvian Amazon. This new time and cost effective method enables the acquisition of large datasets, paving the way for a finer understanding of reproductive dynamics and recruitment patterns of tropical fish species, with major implications for fisheries management and conservation.
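As a rough sketch of the relative-abundance step (not the study's pipeline), the snippet below tallies reads that have already been assigned to species, for example by matching COI sequences against a reference barcode library; the assignments are made up.

```python
# Minimal sketch of relative-abundance estimation from reads that have
# already been assigned to species; the assignments below are invented.
from collections import Counter

read_assignments = [
    "Pseudoplatystoma punctifer", "Pseudoplatystoma punctifer",
    "Brachyplatystoma vaillantii", "Pseudoplatystoma punctifer",
    "Brachyplatystoma vaillantii", "Pimelodus blochii",
]

counts = Counter(read_assignments)
total = sum(counts.values())
for species, n in counts.most_common():
    print(f"{species}: {n} reads ({100.0 * n / total:.1f}%)")
```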
NASA Astrophysics Data System (ADS)
Waldhoff, Guido; Lussem, Ulrike; Bareth, Georg
2017-09-01
Spatial land use information is one of the key input parameters for regional agro-ecosystem modeling. Furthermore, to assess crop-specific management accurately in a spatio-temporal context, parcel-related crop rotation information is additionally needed. Such data are scarcely available at a regional scale, so only modeled crop rotations can usually be incorporated instead. However, the spectrum of multiannual land use patterns actually occurring on arable land remains unknown. Thus, this contribution focuses on mapping the crop rotations actually practiced in the Rur catchment, located in the western part of Germany. We addressed this by combining multitemporal multispectral remote sensing data, ancillary information and expert knowledge on crop phenology in a GIS-based Multi-Data Approach (MDA). First, a methodology for the enhanced differentiation of the major crop types on an annual basis was developed. Key aspects are (i) the use of physical block data to separate arable land from other land use types, (ii) the classification of remote sensing scenes from the time periods most favorable for differentiating certain crop types, and (iii) the combination of the multitemporal classification results in a sequential analysis strategy. Annual crop maps from eight consecutive years (2008-2015) were combined into a crop sequence dataset to provide a sound data basis for mapping crop rotations. In most years the remote sensing data basis was highly fragmented; nevertheless, our method produced satisfactory crop mapping results. As an example of the annual crop mapping workflow, the procedure and the result for 2015 are illustrated. To generate the crop sequence dataset, the eight annual crop maps were geometrically smoothed and integrated into a single vector data layer. The resulting dataset records the crop sequence for individual areas of arable land, from which crop rotation schemes can be derived. It reveals that the spectrum of practiced crop rotations is extremely heterogeneous and contains a large number of crop sequences that diverge strongly from model crop rotations. Consequently, the integration of remote sensing-based crop rotation data can considerably reduce uncertainties regarding management in regional agro-ecosystem modeling. Finally, the developed methods and the results are discussed in detail.
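A minimal sketch of the final combination step, assuming each parcel already carries one crop label per year from the annual classifications; the parcel identifiers, crop names, and years below are illustrative, and the real workflow operates on vector/raster GIS layers rather than Python dictionaries.

```python
# Minimal sketch: combine per-year parcel crop labels into crop-sequence
# strings and tally the observed rotations. All identifiers and labels
# are illustrative placeholders.
from collections import Counter

annual_maps = {                      # year -> {parcel_id: crop}
    2013: {"p1": "winter wheat", "p2": "sugar beet",   "p3": "maize"},
    2014: {"p1": "sugar beet",   "p2": "winter wheat", "p3": "maize"},
    2015: {"p1": "winter barley", "p2": "potato",      "p3": "maize"},
}

years = sorted(annual_maps)
parcels = set.intersection(*(set(annual_maps[y]) for y in years))

sequences = {p: tuple(annual_maps[y][p] for y in years) for p in sorted(parcels)}
for parcel, seq in sequences.items():
    print(parcel, " -> ".join(seq))

# Frequency of each observed crop sequence across parcels.
print(Counter(sequences.values()).most_common())
```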
The Cassini Solstice Mission: Streamlining Operations by Sequencing with PIEs
NASA Technical Reports Server (NTRS)
Vandermey, Nancy; Alonge, Eleanor K.; Magee, Kari; Heventhal, William
2014-01-01
The Cassini Solstice Mission (CSM) is the second extended mission phase of the highly successful Cassini/Huygens mission to Saturn. Conducted at a much-reduced funding level, operations for the CSM have been streamlined and simplified significantly. Integration of the science timeline, which involves allocating observation time in a balanced manner to each of the five different science disciplines (with representatives from the twelve different science instruments), has long been a labor-intensive endeavor. Lessons learned from the prime mission (2004-2008) and first extended mission (Equinox mission, 2008-2010) were utilized to design a new process involving PIEs (Pre-Integrated Events) to ensure the highest priority observations for each discipline could be accomplished despite reduced work force and overall simplification of processes. Discipline-level PIE lists were managed by the Science Planning team and graphically mapped to aid timeline deconfliction meetings prior to assigning discrete segments of time to the various disciplines. Periapse segments are generally discipline-focused, with the exception of a handful of PIEs. In addition to all PIEs being documented in a spreadsheet, allocated out-of-discipline PIEs were entered into the Cassini Information Management System (CIMS) well in advance of timeline integration. The disciplines were then free to work the rest of the timeline internally, without the need for frequent interaction, debate, and negotiation with representatives from other disciplines. As a result, the number of integration meetings has been cut back extensively, freeing up workforce. The sequence implementation process was streamlined as well, combining two previous processes (and teams) into one. The new Sequence Implementation Process (SIP) schedules 22 weeks to build each 10-week-long sequence, and only 3 sequence processes overlap. This differs significantly from prime mission during which 5-week-long sequences were built in 24 weeks, with 6 overlapping processes.
Role of magnetic resonance imaging in the management of perianal Crohn's disease.
Gallego, Jose C; Echarri, Ana
2018-02-01
Perianal fistulas are a major problem in many patients with Crohn's disease. These are usually complex fistulas that adversely affect patients' quality of life, and their clinical management is difficult. Medical treatment sometimes achieves cessation of discharge and closure of the external opening; however, it is difficult to assess the status of the rest of the fistula tract. Magnetic resonance imaging is the method of choice with which to evaluate the condition of perianal fistulas and allows for assessment of the status of inaccessible areas. Magnetic resonance imaging also allows the clinician to evaluate other perianal manifestations of Crohn's disease that differ from the fistulas. This imaging technique is therefore a fundamental means of patient monitoring. When used in conjunction with assessment of the patient's morphological findings, it provides information that allows for both quantification of disease severity and evaluation of the response to treatment. New types of magnetic resonance sequences are emerging, such as diffusion, perfusion, and magnetisation transfer. These sequences may serve as biomarkers because they provide information reflecting the changes taking place at the molecular level. This will help to shape a new scenario in the early assessment of the response to treatments such as anti-tumour necrosis factor drugs. • MRI is the method of choice with which to evaluate perianal fistulas. • In perianal Crohn's disease, MRI is a fundamental means of patient monitoring. • The usefulness of the Van Assche score for patient monitoring remains unclear. • New MRI sequences' diffusion, perfusion, and magnetisation transfer may serve as biomarkers.
Handler, Alfred M; Beeman, Richard W
2003-01-01
USDA-ARS scientists have made important contributions to the molecular genetic analysis of agriculturally important insects, and have been at the forefront of using this information for the development of new pest management strategies. Advances have been made in the identification and analysis of genetic systems involved in insect development, reproduction and behavior, which enable the identification of new targets for control, as well as the development of highly specific insecticidal products. Other studies have been on the leading edge of developing gene transfer technology to better elucidate these biological processes through functional genomics and to develop new transgenic strains for biological control. Important contributions have also been made to the development and use of molecular markers and methodologies to identify and track insect populations. The use of molecular genetic technology and strategies will become increasingly important to pest management as genomic sequencing information becomes available from important pest insects, their targets and other associated organisms.
GIS-based planning system for managing the flow of construction and demolition waste in Brazil.
Paz, Diogo Henrique Fernandes da; Lafayette, Kalinny Patrícia Vaz; Sobral, Maria do Carmo
2018-05-01
The objective of this article was to plan a network for municipal management of construction and demolition waste in Brazil with the assistance of a geographic information system, using the city of Recife as a case study. The methodology was carried out in three stages. The first was to map the illegal construction and demolition waste disposal points across Recife and classify the waste according to its recyclability. Next, a method for indicating suitable areas for the installation of voluntary delivery points, intended for small waste generators, is presented. Finally, a method for indicating suitable areas for the installation of trans-shipment and waste sorting areas, developed for large generators, is presented. The results show that a geographic information system is an essential tool in the planning of municipal construction and demolition waste management, in order to facilitate spatial analysis and control the generation, sorting, collection, transportation, and final destination of construction and demolition waste, increasing the rate of recovery and recycling of materials.
Notes from the field: the economic value chain in disease management organizations.
Fetterolf, Donald
2006-12-01
The disease management (DM) "value chain" is composed of a linear series of steps that include operational milestones in the development of knowledge, each stage evolving from the preceding one. As an adaptation of Michael Porter's "value chain" model, the process flow in DM moves along the following path: (1) data/information technology, (2) information generation, (3) analysis, (4) assessment/recommendations, (5) actionable customer plan, and (6) program assessment/reassessment. Each of these stages is managed as a major line of product operations within a DM company or health plan. Metrics around each of the key production variables create benchmark milestones, ongoing management insight into program effectiveness, and potential drivers for activity-based cost accounting pricing models. The value chain process must remain robust from early entry of data and information into the system, through the final presentation and recommendations for our clients if the program is to be effective. For individuals involved in the evaluation or review of DM programs, this framework is an excellent method to visualize the key components and sequence in the process. The value chain model is an excellent way to establish the value of a formal DM program and to create a consultancy relationship with a client involved in purchasing these complex services.
2014-12-01
chemical etching EDM electrical discharge machine EID enterprise identifier EOSS Engineering Operational Sequencing System F Fahrenheit...Center in Corona, California, released a DoN IUID Marking Guide, which made recommendations on how to mark legacy items. It provides technical...uploaded into the IUID registry managed by the Naval Surface Warfare Center (NSWC) in Corona, California. There is no set amount of information
NASA Technical Reports Server (NTRS)
1993-01-01
The Second International Symposium featured 135 oral presentations in these 12 categories: Future Missions and Operations; System-Level Architectures; Mission-Specific Systems; Mission and Science Planning and Sequencing; Mission Control; Operations Automation and Emerging Technologies; Data Acquisition; Navigation; Operations Support Services; Engineering Data Analysis of Space Vehicle and Ground Systems; Telemetry Processing, Mission Data Management, and Data Archiving; and Operations Management. Topics focused on improvements in the productivity, effectiveness, efficiency, and quality of mission operations, ground systems, and data acquisition. Also emphasized were accomplishments in management of human factors; use of information systems to improve data retrieval, reporting, and archiving; design and implementation of logistics support for mission operations; and the use of telescience and teleoperations.
FBIS: A regional DNA barcode archival & analysis system for Indian fishes.
Nagpure, Naresh Sahebrao; Rashid, Iliyas; Pathak, Ajey Kumar; Singh, Mahender; Singh, Shri Prakash; Sarkar, Uttam Kumar
2012-01-01
DNA barcoding is a new tool for taxon recognition and classification of biological organisms based on the sequence of a fragment of the mitochondrial gene cytochrome c oxidase I (COI). In view of the growing importance of fish DNA barcoding for species identification, molecular taxonomy and fish diversity conservation, we developed a Fish Barcode Information System (FBIS) for Indian fishes, which will serve as a regional DNA barcode archival and analysis system. The database presently contains 2334 COI sequence records for 472 aquatic species belonging to 39 orders and 136 families, collected from available published data sources. Additionally, it contains information on phenotype, distribution and IUCN Red List status of fishes. The web version of FBIS was designed using MySQL, Perl and PHP on a Linux platform to (a) store and manage the acquired data, (b) analyze and explore DNA barcode records, and (c) identify species and estimate genetic divergence. FBIS has also been integrated with appropriate tools for retrieving and viewing information about the database statistics and taxonomy. It is expected that FBIS would be useful as a potent information system in fish molecular taxonomy, phylogeny and genomics. The database is available for free at http://mail.nbfgr.res.in/fbis/
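The divergence estimate mentioned above can be illustrated with a minimal sketch of an uncorrected p-distance between two aligned COI barcode sequences. This is for illustration only and is not the FBIS implementation (which is built on MySQL, Perl and PHP); the accession numbers and sequences below are hypothetical.

```python
# Minimal sketch: uncorrected p-distance between aligned COI barcode sequences.
def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites between two aligned sequences (gaps/Ns ignored)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = mismatches = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                mismatches += 1
    return mismatches / compared if compared else float("nan")

# Hypothetical barcode records keyed by accession number.
barcodes = {
    "FBIS0001": "ATGGCATTTGGAAATTGATTAGT",
    "FBIS0002": "ATGGCCTTTGGTAATTGACTAGT",
}
print(round(p_distance(barcodes["FBIS0001"], barcodes["FBIS0002"]), 3))  # 0.13
```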
García-Sancho, Miguel
2011-01-01
This paper explores the introduction of professional systems engineers and information management practices into the first centralized DNA sequence database, developed at the European Molecular Biology Laboratory (EMBL) during the 1980s. In so doing, it complements the literature on the emergence of an information discourse after World War II and its subsequent influence in biological research. By analyzing the careers of the database creators and the computer algorithms they designed, I show that, from the mid-1960s onwards, information in biology gradually shifted from a pervasive metaphor to being embodied in practices and professionals such as those incorporated at the EMBL. I then investigate the reception of these database professionals by the EMBL biological staff, which evolved from initial disregard to necessary collaboration as the relationship between DNA, genes, and proteins turned out to be more complex than expected. The trajectories of the database professionals at the EMBL suggest that the initial subject matter of the historiography of genomics should be the long-standing practices that emerged after World War II and to a large extent originated outside biomedicine and academia. Only after addressing these practices may historians turn to their further disciplinary assemblage in fields such as bioinformatics or biotechnology.
Mody, Rajen J.; Wu, Yi-Mi; Lonigro, Robert J.; Cao, Xuhong; Roychowdhury, Sameek; Vats, Pankaj; Frank, Kevin M.; Prensner, John R.; Asangani, Irfan; Palanisamy, Nallasivam; Dillman, Jonathan R.; Rabah, Raja M.; Kunju, Laxmi Priya; Everett, Jessica; Raymond, Victoria M.; Ning, Yu; Su, Fengyun; Wang, Rui; Stoffel, Elena M.; Innis, Jeffrey W.; Roberts, J. Scott; Robertson, Patricia L.; Yanik, Gregory; Chamdin, Aghiad; Connelly, James A.; Choi, Sung; Harris, Andrew C.; Kitko, Carrie; Rao, Rama Jasty; Levine, John E.; Castle, Valerie P.; Hutchinson, Raymond J.; Talpaz, Moshe; Robinson, Dan R.; Chinnaiyan, Arul M.
2016-01-01
Importance Cancer is caused by a diverse array of somatic and germline genomic aberrations. Advances in genomic sequencing technologies have improved the ability to detect these molecular aberrations with greater sensitivity. However, integrating them into clinical management in an individualized manner has proven challenging. Objective To evaluate the use of integrative clinical sequencing and genetic counseling in the assessment and treatment of children and young adults with cancer. Design, Setting, and Participants An observational, consecutive case series (May 2012–October 2014) of 102 children and young adults (mean age, 10.6 years; median age, 11.5 years; range, 0–22 years) with relapsed, refractory, or rare cancer at a single major academic medical center. Exposures Each participant underwent integrative clinical exome (tumor and germline DNA) and transcriptome (tumor RNA) sequencing along with genetic counseling. Results were discussed in a multi-disciplinary Precision Medicine Tumor Board (PMTB) and recommendations were reported to treating physicians and families. Main Outcomes and Measures Proportion of patients with potentially actionable findings (PAF), results of clinical actions based on integrative clinical sequencing (ICS), and estimated proportion of patients or their families at risk for future cancer. A PAF was defined as any genomic finding discovered during sequencing analysis that could lead to (1) a change in patient management by providing a targetable molecular aberration, (2) a change in diagnosis or risk stratification, or (3) a cancer-related germline finding that informs patients/families about a potential future risk of various cancers. Results We screened 104 patients and enrolled 102, of whom 91 (89%) had adequate tumor tissue available to complete sequencing; only these patients, comprising 28 (31%) with hematological malignancies and 63 (69%) with solid tumors, were included in all subsequent calculations. Overall, 42 (46%) patients had PAFs that changed patient management, including 54% (15/28) of those with hematological malignancies and 43% (27/63) of those with solid tumors. Overall, individualized actions were taken in 23 of the 91 (25%) patients and families based on actionable ICS findings, including change in treatment in 14 (15%) and genetic counseling for future cancer risk in 9 (10%) patients. In 9 of 91 patients (10%), these personalized clinical interventions resulted in ongoing partial clinical remission of 8–16 months' duration or helped sustain complete clinical remission of 6–21 months' duration. All 9 (10%) patients and families with actionable incidental genetic findings agreed to formal genetic counseling and screening. Conclusions and Relevance In this single-center case series of children and young adults with relapsed or refractory cancer, incorporation of data from integrative clinical sequencing into clinical management was feasible, revealed potentially actionable findings in 46% of patients, and was associated with change in treatment and family genetic counseling in a small proportion of patients. The lack of a control group limited our ability to judge whether better clinical outcomes were achieved compared to standard care. PMID:26325560
Truck driver fatigue risk assessment and management: a multinational survey.
Adams-Guppy, Julie; Guppy, Andrew
2003-06-20
As part of an organizational review of safety, interviews and questionnaire surveys were performed on over 700 commercial goods drivers and their managers within a series of related companies operating across 17 countries. The results examine the reported incidence of fatigue-related problems in drivers and their associations with near-miss and accident experience as well as work and organizational factors. Experience of fatigue problems while driving was linked to time of day and rotation of shifts, though most associations were small. Significant associations were found between fatigue experiences and driver and management systems for break taking and route scheduling. The quantitative data, combined with qualitative information, suggested that, where feasible, more flexible approaches to managing the scheduling and sequencing of deliveries assisted drivers in managing their own fatigue problems through appropriate break-taking. The results are interpreted within the overarching principles of risk assessment and risk control.
Framework for Integrating Science Data Processing Algorithms Into Process Control Systems
NASA Technical Reports Server (NTRS)
Mattmann, Chris A.; Crichton, Daniel J.; Chang, Albert Y.; Foster, Brian M.; Freeborn, Dana J.; Woollard, David M.; Ramirez, Paul M.
2011-01-01
A software framework called PCS Task Wrapper is responsible for standardizing the setup, process initiation, execution, and file management tasks surrounding the execution of science data algorithms, which are referred to by NASA as Product Generation Executives (PGEs). PGEs codify a scientific algorithm, some step in the overall scientific process involved in a mission science workflow. The PCS Task Wrapper provides a stable operating environment to the underlying PGE during its execution lifecycle. If the PGE requires a file, or metadata regarding the file, the PCS Task Wrapper is responsible for delivering that information to the PGE in a manner that meets its requirements. If the PGE requires knowledge of upstream or downstream PGEs in a sequence of executions, that information is also made available. Finally, if information regarding disk space, or node information such as CPU availability, etc., is required, the PCS Task Wrapper provides this information to the underlying PGE. After this information is collected, the PGE is executed, and its output Product file and Metadata generation is managed via the PCS Task Wrapper framework. The innovation is responsible for marshalling output Products and Metadata back to a PCS File Management component for use in downstream data processing and pedigree. In support of this, the PCS Task Wrapper leverages the PCS Crawler Framework to ingest (during pipeline processing) the output Product files and Metadata produced by the PGE. The architectural components of the PCS Task Wrapper framework include PGE Task Instance, PGE Config File Builder, Config File Property Adder, Science PGE Config File Writer, and PCS Met file Writer. This innovative framework is really the unifying bridge between the execution of a step in the overall processing pipeline, and the available PCS component services as well as the information that they collectively manage.
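A minimal sketch of the wrapper pattern described above may help: prepare a runtime configuration for a science algorithm (PGE), run it, then gather its output products for registration with file management. All class, function, and file names here are hypothetical illustrations, not the actual PCS Task Wrapper API.

```python
# Minimal sketch of a task-wrapper lifecycle: build config, run the PGE, crawl outputs.
import subprocess
from pathlib import Path

class PgeTaskWrapper:
    def __init__(self, pge_command: list[str], workdir: Path):
        self.pge_command = pge_command
        self.workdir = workdir

    def build_config(self, upstream_products: dict[str, str]) -> Path:
        """Write the runtime config the PGE expects (input files, node info, etc.)."""
        config = self.workdir / "pge_config.txt"
        config.write_text("\n".join(f"{key}={value}" for key, value in upstream_products.items()))
        return config

    def run(self, upstream_products: dict[str, str]) -> list[Path]:
        self.workdir.mkdir(parents=True, exist_ok=True)
        config = self.build_config(upstream_products)
        subprocess.run(self.pge_command + [str(config)], check=True, cwd=self.workdir)
        # "Crawl" the working directory for output products to hand downstream.
        return sorted(self.workdir.glob("*.product"))

def register_products(products: list[Path]) -> None:
    """Stand-in for a file-management/catalog call recording product pedigree."""
    for product in products:
        print(f"cataloguing {product.name}")
```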
Gui, Linsheng; Jiang, Bijie; Zhang, Yaran; Zan, Linsen
2015-03-15
Silent information regulator 6 (SIRT6) belongs to the family of class III nicotinamide adenine dinucleotide (NAD)-dependent deacetylase and plays an essential role in DNA repair and metabolism. This study was conducted to detect potential polymorphisms of the bovine SIRT6 gene and explore their relationships with body measurement and carcass quality in Qinchuan cattle. Four sequence variants (SVs) were identified in intron 6, exon 7, exon 9, and 3' UTR, via sequencing technology conducted in 468 individual Qinchuan cattle. Eleven different haplotypes were identified, of which two major haplotypes had a frequency of 45.7% (-CACT-) and 14.8% (-CGTC-). Three SVs (SV2, SV3 and SV4) were significantly associated with some of the body measurements and carcass quality traits (P<0.05 or P<0.01), and the H2H7 (CC-GA-TT-TC) diplotype had better performance than other combinations. Our results suggest that some polymorphisms in SIRT6 are associated with production traits and may be used as candidates for marker-assisted selection (MAS) and management in beef cattle breeding programs. Copyright © 2015 Elsevier B.V. All rights reserved.
Chang, Yanping; Bu, Xiangpan; Niu, Weibo; Xiu, Yu; Wang, Huafang
2013-01-01
Relatively little information is available regarding the variability of microbial communities inhabiting deeper soil layers. We investigated the distribution of soil microbial communities down to 1.2 m in 5-year-old Robinia pseudoacacia 'Idaho' soil by 454 sequencing of the 16S rRNA gene. The average number of sequences per sample was 12,802. The Shannon and Chao 1 indices revealed various relative microbial abundances and even distribution of microbial diversity for all evaluated sample depths. The predicted diversity in the topsoil exceeded that of the corresponding subsoil. The changes in the relative abundance of the major soil bacterial phyla showed decreasing, increasing, or no consistent trends with respect to sampling depth. Despite their novelty, members of the new candidate phyla OD1 and TM7 were widespread. Environmental variables affecting the bacterial community within the environment appeared to differ from those reported previously, especially the lack of detectable effect from pH. Overall, we found that relative abundance fluctuated with the physical and chemical properties of the soil, root system, and sampling depth. Such information may facilitate forest soil management.
Bannasch, Detlev; Mehrle, Alexander; Glatting, Karl-Heinz; Pepperkok, Rainer; Poustka, Annemarie; Wiemann, Stefan
2004-01-01
We have implemented LIFEdb (http://www.dkfz.de/LIFEdb) to link information regarding novel human full-length cDNAs generated and sequenced by the German cDNA Consortium with functional information on the encoded proteins produced in functional genomics and proteomics approaches. The database also serves as a sample-tracking system to manage the process from cDNA to experimental read-out and data interpretation. A web interface enables the scientific community to explore and visualize features of the annotated cDNAs and ORFs combined with experimental results, and thus helps to unravel new features of proteins with as yet unknown functions. PMID:14681468
75 FR 21963 - Regulatory Flexibility Agenda
Federal Register 2010, 2011, 2012, 2013, 2014
2010-04-26
... Materials 3235-AK25 DIVISION OF INVESTMENT MANAGEMENT--Proposed Rule Stage Regulation Sequence Title... 3235-AI17 DIVISION OF INVESTMENT MANAGEMENT--Completed Actions Regulation Sequence Title Identifier... Management Investment Company 3235-AJ11 Shares, Unit Investment Trust Interests, and Municipal Fund...
The Hawaiian Algal Database: a laboratory LIMS and online resource for biodiversity data
Wang, Norman; Sherwood, Alison R; Kurihara, Akira; Conklin, Kimberly Y; Sauvage, Thomas; Presting, Gernot G
2009-01-01
Background Organization and presentation of biodiversity data is greatly facilitated by databases that are specially designed to allow easy data entry and organized data display. Such databases also have the capacity to serve as Laboratory Information Management Systems (LIMS). The Hawaiian Algal Database was designed to showcase specimens collected from the Hawaiian Archipelago, enabling users around the world to compare their specimens with our photographs and DNA sequence data, and to provide lab personnel with an organizational tool for storing various biodiversity data types. Description We describe the Hawaiian Algal Database, a comprehensive and searchable database containing photographs and micrographs, geo-referenced collecting information, taxonomic checklists and standardized DNA sequence data. All data for individual samples are linked through unique accession numbers. Users can search online for sample information by accession number, numerous levels of taxonomy, or collection site. At the present time the database contains data representing over 2,000 samples of marine, freshwater and terrestrial algae from the Hawaiian Archipelago. These samples are primarily red algae, although other taxa are being added. Conclusion The Hawaiian Algal Database is a digital repository for Hawaiian algal samples and acts as a LIMS for the laboratory. Users can make use of the online search tool to view and download specimen photographs and micrographs, DNA sequences and relevant habitat data, including georeferenced collecting locations. It is publicly available at . PMID:19728892
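A minimal sketch can illustrate the linkage the abstract describes: specimen collection data and DNA sequences joined through a shared accession number. The SQLite schema, table names, and records below are assumptions for illustration, not the actual Hawaiian Algal Database design.

```python
# Minimal sketch: accession-keyed lookup joining sample metadata with sequence records.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE samples (accession TEXT PRIMARY KEY, taxon TEXT, site TEXT);
CREATE TABLE sequences (accession TEXT REFERENCES samples(accession),
                        marker TEXT, sequence TEXT);
""")
conn.execute("INSERT INTO samples VALUES ('ARS00001', 'Gracilaria sp.', 'Oahu')")
conn.execute("INSERT INTO sequences VALUES ('ARS00001', 'rbcL', 'ATGTCACCACAAACAGAA')")

# Search by accession number and return the sample together with its sequences.
row = conn.execute("""
    SELECT s.accession, s.taxon, s.site, q.marker, q.sequence
    FROM samples s JOIN sequences q USING (accession)
    WHERE s.accession = ?
""", ("ARS00001",)).fetchone()
print(row)
```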
Molecular Diagnosis of Long-QT syndrome at 10 Days of Life by Rapid Whole Genome Sequencing
Priest, James R.; Ceresnak, Scott R.; Dewey, Frederick E.; Malloy-Walton, Lindsey E.; Dunn, Kyla; Grove, Megan E.; Perez, Marco V.; Maeda, Katsuhide; Dubin, Anne M.; Ashley, Euan A.
2014-01-01
Background The advent of clinical next generation sequencing is rapidly changing the landscape of rare disease medicine. Molecular diagnosis of long QT syndrome (LQTS) can impact clinical management, including risk stratification and selection of pharmacotherapy based on the type of ion channel affected, but results from current gene panel testing require 4 to 16 weeks before return to clinicians. Objective A term female infant presented with 2:1 atrioventricular block and ventricular arrhythmias consistent with perinatal LQTS, requiring aggressive treatment, including epicardial pacemaker and cardioverter-defibrillator implantation and sympathectomy, on day of life two. We sought to provide a rapid molecular diagnosis for optimization of treatment strategies. Methods We performed CLIA-certified rapid whole genome sequencing (WGS) with a speed-optimized bioinformatics platform to achieve molecular diagnosis at 10 days of life. Results We detected a known pathogenic variant in KCNH2 that was demonstrated to be paternally inherited by follow-up genotyping. The unbiased assessment of the entire catalog of human genes provided by whole genome sequencing revealed a maternally inherited variant of unknown significance in a novel gene. Conclusions Rapid clinical WGS provides faster and more comprehensive diagnostic information by 10 days of life than standard gene-panel testing. In selected clinical scenarios such as perinatal LQTS, rapid WGS may be able to provide more timely and clinically actionable information than a standard commercial test. PMID:24973560
Integration of Temporal and Ordinal Information During Serial Interception Sequence Learning
Gobel, Eric W.; Sanchez, Daniel J.; Reber, Paul J.
2011-01-01
The expression of expert motor skills typically involves learning to perform a precisely timed sequence of movements (e.g., language production, music performance, athletic skills). Research examining incidental sequence learning has previously relied on a perceptually-cued task that gives participants exposure to repeating motor sequences but does not require timing of responses for accuracy. Using a novel perceptual-motor sequence learning task, learning a precisely timed cued sequence of motor actions is shown to occur without explicit instruction. Participants learned a repeating sequence through practice and showed sequence-specific knowledge via a performance decrement when switched to an unfamiliar sequence. In a second experiment, the integration of representation of action order and timing sequence knowledge was examined. When either action order or timing sequence information was selectively disrupted, performance was reduced to levels similar to completely novel sequences. Unlike prior sequence-learning research that has found timing information to be secondary to learning action sequences, when the task demands require accurate action and timing information, an integrated representation of these types of information is acquired. These results provide the first evidence for incidental learning of fully integrated action and timing sequence information in the absence of an independent representation of action order, and suggest that this integrative mechanism may play a material role in the acquisition of complex motor skills. PMID:21417511
NASA Astrophysics Data System (ADS)
Wein, A. M.; Berryman, K. R.; Jolly, G. E.; Brackley, H. L.; Gledhill, K. R.
2015-12-01
The 2010-2011 Canterbury Earthquake Sequence began with the 4th September 2010 Darfield earthquake (Mw 7.1). Perhaps because there were no deaths, the mood of the city and the government was that high standards of earthquake engineering in New Zealand protected us, and there was a confident attitude to response and recovery. Science and engineering information was of interest but was not seen as crucial by policy makers, businesses, or the public. The 22nd February 2011 Christchurch earthquake (Mw 6.2) changed all that; there was a significant death toll and many injuries. There was widespread collapse of older unreinforced buildings and of two relatively modern multi-storey buildings, and major disruption to infrastructure. The contrast in the interest and relevance of the science could not have been greater compared to 5 months previously. Magnitude 5+ aftershocks over a 20-month period resulted in confusion, stress, an inability to define a recovery trajectory, major concerns about whether insurers and reinsurers would continue to provide cover, very high levels of media interest from New Zealand and around the world, and high levels of political risk. As the aftershocks continued there was widespread speculation as to what the future held. During the sequence, the science and engineering sector sought to coordinate and offer timely and integrated advice. However, other than GeoNet, the national geophysical monitoring network, there were few resources devoted to communication, with the result that it was almost always reactive. With hindsight we have identified the need to resource information gathering and synthesis, execute strategic assessments of stakeholder needs, undertake proactive communication, and develop specific information packages for the diversity of users. Overall this means substantially increased resources. Planning is now underway for the science sector to adopt the New Zealand standardised CIMS (Coordinated Incident Management System) structure for management and communication during a crisis, which should help structure and resource the science response needs in future major events.
75 FR 79937 - Regulatory Flexibility Agenda
Federal Register 2010, 2011, 2012, 2013, 2014
2010-12-20
... INVESTMENT MANAGEMENT--Final Rule Stage Regulation Sequence Title Identifier Number Number 617 Temporary Rule Regarding Principal Trades With Certain Advisory Clients 3235-AJ96 DIVISION OF INVESTMENT MANAGEMENT... Sequence Title Identifier Number Number 626 Confirmation of Transactions in Open-End Management Investment...
Fourteenth-Sixteenth Microbial Genomics Conference-2006-2008
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, Jeffrey H
2011-04-18
The concept of an annual meeting on the E. coli genome was formulated at the Banbury Center Conference on the Genome of E. coli in October, 1991. The first meeting was held on September 10-14, 1992 at the University of Wisconsin, and this was followed by a yearly series of meetings and by an expansion in scope to include other microbial genomes. The fourteenth meeting took place September 24-28, 2006 at Lake Arrowhead, CA, the fifteenth September 16-20, 2007 at the University of Maryland, College Park, MD, and the sixteenth September 14-18, 2008 at Lake Arrowhead. The full program for the 16th meeting is attached. There have been rapid and exciting advances in microbial genomics that now make possible comparing large data sets of sequences from a wide variety of microbial genomes, and from whole microbial communities. Examining the “microbiomes”, the living microbial communities in different host organisms, opens up many possibilities for understanding the landscape presented to pathogenic microorganisms. For quite some time there has been a shifting emphasis from pure sequence data to trying to understand how to use that information to solve biological problems. Towards this end new technologies are being developed and improved. Using genetics, functional genomics, and proteomics has been the recent focus of many different laboratories. A key element is the integration of different aspects of microbiology, sequencing technology, analysis techniques, and bioinformatics. The goal of these conferences is to provide a regular forum for these interactions to occur. While there have been a number of genome conferences, what distinguishes the Microbial Genomics Conference is its emphasis on bringing together biology and genetics with sequencing and bioinformatics. Also, this conference is the longest continuing meeting, now established as a major regular annual meeting. In addition to its coverage of microbial genomes and biodiversity, the meetings also highlight microbial communities and the use of genomic information to aid in the understanding of pathogens and biothreats. An additional focus covers bioenergetics. The meetings have a mix of invited and participant-initiated presentations and poster sessions during which investigators from different disciplines become familiar with available data bases and new tools facilitating coordination of information. The fields are moving very fast both in the acquisition of new knowledge of genome contents and also in the management and analysis of the information. The key is connecting bodies of knowledge on sequences, genetic organization and regulation to be able to relate the significance of this information to understanding cellular processes. To our knowledge, no other meeting synthesizes the biology of organisms, sequence information and database analysis, as well as the comparison with other completed genome sequences.
Apollo Operations Handbook Lunar Module (LM 11 and Subsequent) Vol. 2 Operational Procedures
NASA Technical Reports Server (NTRS)
1971-01-01
The Apollo Operations Handbook (AOH) is the primary means of documenting LM descriptions and procedures. The AOH is published in two separately bound volumes. This information is useful in support of program management, engineering, test, flight simulation, and real time flight support efforts. This volume contains crew operational procedures: normal, backup, abort, malfunction, and emergency. These procedures define the sequence of actions necessary for safe and efficient subsystem operation.
Meta-All: a system for managing metabolic pathway information.
Weise, Stephan; Grosse, Ivo; Klukas, Christian; Koschützki, Dirk; Scholz, Uwe; Schreiber, Falk; Junker, Björn H
2006-10-23
Many attempts are being made to understand biological subjects at a systems level. Biological databases are a major resource for these approaches, storing manifold information about DNA, RNA and protein sequences including their functional and structural motifs, molecular markers, mRNA expression levels, metabolite concentrations, protein-protein interactions, phenotypic traits or taxonomic relationships. The use of these databases is often hampered by the fact that they are designed for special application areas and thus lack universality. Databases on metabolic pathways, which provide an increasingly important foundation for many analyses of biochemical processes at a systems level, are no exception from the rule. Data stored in central databases such as KEGG, BRENDA or SABIO-RK is often limited to read-only access. If experimentalists want to store their own data, possibly still under investigation, there are two possibilities. They can either develop their own information system for managing that data, which is very time-consuming and costly, or they can try to store their data in existing systems, which is often restricted. Hence, an out-of-the-box information system for managing metabolic pathway data is needed. We have designed META-ALL, an information system that allows the management of metabolic pathways, including reaction kinetics, detailed locations, environmental factors and taxonomic information. Data can be stored together with quality tags and in different parallel versions. META-ALL uses Oracle DBMS and Oracle Application Express. We provide the META-ALL information system for download and use. In this paper, we describe the database structure and give information about the tools for submitting and accessing the data. As a first application of META-ALL, we show how the information contained in a detailed kinetic model can be stored and accessed. META-ALL is a system for managing information about metabolic pathways. It facilitates the handling of pathway-related data and is designed to help biochemists and molecular biologists in their daily research. It is available on the Web at http://bic-gh.de/meta-all and can be downloaded free of charge and installed locally.
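The two features highlighted above, quality tags and parallel versions of pathway data, can be sketched minimally as follows. This is plain Python for illustration only, not the Oracle DBMS / Application Express implementation of META-ALL; all class and field names are assumptions.

```python
# Minimal sketch: pathway entries carry a quality tag and can be kept in parallel versions.
from dataclasses import dataclass, field

@dataclass
class ReactionRecord:
    equation: str                 # e.g. "glucose + ATP -> G6P + ADP"
    km_mM: float | None           # optional kinetic parameter
    quality: str                  # e.g. "curated", "preliminary"

@dataclass
class Pathway:
    name: str
    versions: dict[int, list[ReactionRecord]] = field(default_factory=dict)

    def add_version(self, version: int, reactions: list[ReactionRecord]) -> None:
        self.versions[version] = reactions

glycolysis = Pathway("glycolysis")
glycolysis.add_version(1, [ReactionRecord("glucose + ATP -> G6P + ADP", None, "preliminary")])
glycolysis.add_version(2, [ReactionRecord("glucose + ATP -> G6P + ADP", 0.1, "curated")])
print(sorted(glycolysis.versions))   # parallel versions: [1, 2]
```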
Meta-All: a system for managing metabolic pathway information
Weise, Stephan; Grosse, Ivo; Klukas, Christian; Koschützki, Dirk; Scholz, Uwe; Schreiber, Falk; Junker, Björn H
2006-01-01
Background Many attempts are being made to understand biological subjects at a systems level. Biological databases are a major resource for these approaches, storing manifold information about DNA, RNA and protein sequences including their functional and structural motifs, molecular markers, mRNA expression levels, metabolite concentrations, protein-protein interactions, phenotypic traits or taxonomic relationships. The use of these databases is often hampered by the fact that they are designed for special application areas and thus lack universality. Databases on metabolic pathways, which provide an increasingly important foundation for many analyses of biochemical processes at a systems level, are no exception from the rule. Data stored in central databases such as KEGG, BRENDA or SABIO-RK is often limited to read-only access. If experimentalists want to store their own data, possibly still under investigation, there are two possibilities. They can either develop their own information system for managing that data, which is very time-consuming and costly, or they can try to store their data in existing systems, which is often restricted. Hence, an out-of-the-box information system for managing metabolic pathway data is needed. Results We have designed META-ALL, an information system that allows the management of metabolic pathways, including reaction kinetics, detailed locations, environmental factors and taxonomic information. Data can be stored together with quality tags and in different parallel versions. META-ALL uses Oracle DBMS and Oracle Application Express. We provide the META-ALL information system for download and use. In this paper, we describe the database structure and give information about the tools for submitting and accessing the data. As a first application of META-ALL, we show how the information contained in a detailed kinetic model can be stored and accessed. Conclusion META-ALL is a system for managing information about metabolic pathways. It facilitates the handling of pathway-related data and is designed to help biochemists and molecular biologists in their daily research. It is available on the Web at and can be downloaded free of charge and installed locally. PMID:17059592
DNA Metabarcoding of Amazonian Ichthyoplankton Swarms
Maggia, M. E.; Vigouroux, Y.; Renno, J. F.; Duponchelle, F.; Desmarais, E.; Nunez, J.; García-Dávila, C.; Carvajal-Vallejos, F. M.; Paradis, E.; Martin, J. F.; Mariac, C.
2017-01-01
Tropical rainforests harbor extraordinary biodiversity. The Amazon basin is thought to hold 30% of all river fish species in the world. Information about the ecology, reproduction, and recruitment of most species is still lacking, thus hampering fisheries management and successful conservation strategies. One of the key understudied issues in the study of population dynamics is recruitment. Fish larval ecology in tropical biomes is still in its infancy owing to identification difficulties. Molecular techniques are very promising tools for the identification of larvae at the species level. However, one of their limitations is the difficulty of obtaining individual sequences from large samples of larvae. To facilitate this task, we developed a new method based on the massively parallel sequencing capability of next-generation sequencing (NGS) coupled with hybridization capture. We focused on the mitochondrial marker cytochrome oxidase I (COI). The results obtained using the new method were compared with individual larval sequencing. We validated the ability of the method to identify Amazonian catfish larvae at the species level and to estimate the relative abundance of species in batches of larvae. Finally, we applied the method and provided evidence for strong temporal variation in reproductive activity of catfish species in the Ucayalí River in the Peruvian Amazon. This new time- and cost-effective method enables the acquisition of large datasets, paving the way for a finer understanding of reproductive dynamics and recruitment patterns of tropical fish species, with major implications for fisheries management and conservation. PMID:28095487
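The relative-abundance estimate mentioned above reduces to a normalised tally once each read from a batch of larvae has been assigned to a species via its COI sequence. The sketch below assumes that assignment step (the hard part) is already done; the read-to-species assignments listed are hypothetical.

```python
# Minimal sketch: relative species abundance from per-read species assignments.
from collections import Counter

assigned_reads = [
    "Pseudoplatystoma punctifer", "Pseudoplatystoma punctifer",
    "Brachyplatystoma rousseauxii", "Pseudoplatystoma punctifer",
]  # hypothetical assignments, one per sequencing read

counts = Counter(assigned_reads)
total = sum(counts.values())
for species, n in counts.most_common():
    print(f"{species}: {n}/{total} = {n / total:.2f}")
```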
Contamination of sequence databases with adaptor sequences
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoshikawa, Takeo; Sanders, A.R.; Detera-Wadleigh, S.D.
Because of the exponential increase in the amount of DNA sequences being added to the public databases on a daily basis, it has become imperative to identify sources of contamination rapidly. Previously, contaminations of sequence databases have been reported to alert the scientific community to the problem. These contaminations can be divided into two categories. The first category comprises host sequences that have been difficult for submitters to manage or control. Examples include anomalous sequences derived from Escherichia coli, which are inserted into the chromosomes (and plasmids) of the bacterial hosts. Insertion sequences are highly mobile and are capable of transposing themselves into plasmids during cloning manipulation. Another example of the first category is the contamination of some commercially available cDNA libraries from Clontech with yeast genomic DNA or bacterial DNA. The second category of database contamination is due to the inadvertent inclusion of nonhost sequences. This category includes incorporation of cloning-vector sequences and multicloning sites in the database submission. M13-derived artifacts have been common, since M13-based vectors have been widely used for subcloning DNA fragments. Recognizing this problem, the National Center for Biotechnology Information (NCBI) started to screen, in April 1994, all sequences directly submitted to GenBank, against a set of vector data retrieved from GenBank by use of key-word searches, such as "vector." In this report, we present evidence for another sequence artifact that is widespread but that, to our knowledge, has not yet been reported. 11 refs., 1 tab.
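A minimal sketch of the screening idea described above: flag submissions that contain a known vector or adaptor fragment. The adaptor string and dictionary entry below are placeholders for illustration, not the specific artifact identified in the report or NCBI's actual screening procedure.

```python
# Minimal sketch: flag submitted sequences containing known vector/adaptor fragments.
def screen_for_contamination(submission: str, adaptors: dict[str, str]) -> list[str]:
    """Return the names of adaptor/vector fragments found in a submitted sequence."""
    submission = submission.upper()
    return [name for name, fragment in adaptors.items() if fragment in submission]

known_adaptors = {"placeholder_adaptor": "GATCGGAAGAGC"}  # hypothetical entry
query = "ACGTACGTGATCGGAAGAGCTTTT"
print(screen_for_contamination(query, known_adaptors))  # ['placeholder_adaptor']
```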
Dougherty, Donald M; Hill-Kapturczak, Nathalie; Liang, Yuanyuan; Karns, Tara E; Cates, Sharon E; Lake, Sarah L; Mullen, Jillian; Roache, John D
2014-09-01
Research on contingency management to treat excessive alcohol use is limited due to feasibility issues with monitoring adherence. This study examined the effectiveness of using transdermal alcohol monitoring as a continuous measure of alcohol use to implement financial contingencies to reduce heavy drinking. Twenty-six male and female drinkers (from 21 to 39 years old) were recruited from the community. Participants were randomly assigned to one of the two treatment sequences. Sequence 1 received 4 weeks of no financial contingency (i.e., $0) drinking followed by 4 weeks each of $25 and then $50 contingency management; Sequence 2 received 4 weeks of $25 contingency management followed by 4 weeks each of no contingency (i.e., $0) and then $50 contingency management. During the $25 and $50 contingency management conditions, participants were paid each week when the Secure Continuous Remote Alcohol Monitor (SCRAM-II™) identified no heavy drinking days. Participants in both contingency management conditions had fewer drinking episodes and reduced frequencies of heavy drinking compared to the $0 condition. Participants randomized to Sequence 2 (receiving $25 contingency before the $0 condition) exhibited less frequent drinking and less heavy drinking in the $0 condition compared to participants from Sequence 1. Transdermal alcohol monitoring can be used to implement contingency management programs to reduce excessive alcohol consumption. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Dougherty, Donald M.; Hill-Kapturczak, Nathalie; Liang, Yuanyuan; Karns, Tara E.; Cates, Sharon E.; Lake, Sarah L.; Mullen, Jillian; Roache, John D.
2014-01-01
Background Research on contingency management to treat excessive alcohol use is limited due to feasibility issues with monitoring adherence. This study examined the effectiveness of using transdermal alcohol monitoring as a continuous measure of alcohol use to implement financial contingencies to reduce heavy drinking. Methods Twenty-six male and female drinkers (from 21–39 years old) were recruited from the community. Participants were randomly assigned to one of two treatment sequences. Sequence 1 received 4 weeks of no financial contingency (i.e., $0) drinking followed by 4 weeks each of $25 and then $50 contingency management; Sequence 2 received 4 weeks of $25 contingency management followed by 4 weeks each of no contingency (i.e., $0) and then $50 contingency management. During the $25 and $50 contingency management conditions, participants were paid each week when the Secure Continuous Remote Alcohol Monitor (SCRAM-II™) identified no heavy drinking days. Results Participants in both contingency management conditions had fewer drinking episodes and reduced frequencies of heavy drinking compared to the $0 condition. Participants randomized to Sequence 2 (receiving $25 contingency before the $0 condition) exhibited less frequent drinking and less heavy drinking in the $0 condition compared to participants from Sequence 1. Conclusions Transdermal alcohol monitoring can be used to implement contingency management programs to reduce excessive alcohol consumption. PMID:25064019
Practices and Policies of Clinical Exome Sequencing Providers: Analysis and Implications
Jamal, Seema M.; Yu, Joon-Ho; Chong, Jessica X.; Dent, Karin M.; Conta, Jessie H.; Tabor, Holly K.; Bamshad, Michael J.
2013-01-01
Exome and whole genome sequencing (ES/WGS) offer potential advantages over traditional approaches to diagnostic genetic testing. Consequently, use of ES/WGS in clinical settings is rapidly becoming commonplace. Yet there are myriad moral, ethical, and perhaps legal implications attached to the use of ES, and health care professionals and institutions will need to consider these implications in the context of the varied practices and policies of ES service providers. We developed “core elements” of content and procedures for informed consent, data sharing, and results management and a quantitative scale to assess the extent to which research protocols met the standards established by these core elements. We then used these tools to evaluate the practices and policies of each of the 6 U.S. CLIA-certified labs offering clinical ES. Approaches toward informed consent, data sharing, and results return vary widely among ES providers, as do the overall potential merits and disadvantages of each and, more importantly, the balance between the two. PMID:23610049
Reddy, T.B.K.; Thomas, Alex D.; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A.; Kyrpides, Nikos C.
2015-01-01
The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface supports several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards. PMID:25348402
Leaf LIMS: A Flexible Laboratory Information Management System with a Synthetic Biology Focus.
Craig, Thomas; Holland, Richard; D'Amore, Rosalinda; Johnson, James R; McCue, Hannah V; West, Anthony; Zulkower, Valentin; Tekotte, Hille; Cai, Yizhi; Swan, Daniel; Davey, Robert P; Hertz-Fowler, Christiane; Hall, Anthony; Caddick, Mark
2017-12-15
This paper presents Leaf LIMS, a flexible laboratory information management system (LIMS) designed to address the complexity of synthetic biology workflows. At the project's inception there was a lack of LIMS designed specifically to address synthetic biology processes, with most systems focused on either next-generation sequencing or biobanks and clinical sample handling. Leaf LIMS implements integrated project, item, and laboratory stock tracking, offering complete sample and construct genealogy, materials and lot tracking, and modular assay data capture. Hence, it enables highly configurable task-based workflows and supports data capture from project inception to completion. As such, in addition to supporting synthetic biology, it is ideal for many laboratory environments with multiple projects and users. The system is deployed as a web application through Docker and is provided under a permissive MIT license. It is freely available for download at https://leaflims.github.io .
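Construct genealogy, one of the features listed above, can be sketched minimally as items that point back to their parent items. The class names and example constructs below are illustrative assumptions only, not the Leaf LIMS data model.

```python
# Minimal sketch: reconstructing where a construct came from via parent links.
from dataclasses import dataclass, field

@dataclass
class Item:
    item_id: str
    lot: str
    parents: list["Item"] = field(default_factory=list)

    def genealogy(self) -> list[str]:
        """Walk back through parent items to list a construct's ancestry."""
        lineage, stack = [], list(self.parents)
        while stack:
            parent = stack.pop()
            lineage.append(parent.item_id)
            stack.extend(parent.parents)
        return lineage

backbone = Item("plasmid-backbone-01", lot="L001")
insert = Item("promoter-insert-07", lot="L014")
construct = Item("assembled-construct-23", lot="L020", parents=[backbone, insert])
print(construct.genealogy())  # ['promoter-insert-07', 'plasmid-backbone-01']
```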
Thakar, Sambhaji B; Ghorpade, Pradnya N; Kale, Manisha V; Sonawane, Kailas D
2015-01-01
Fern plants are known for their ethnomedicinal applications. A huge amount of information on medicinal fern plants is scattered through the literature in text form; hence, database development is an appropriate way to cope with the situation. Given the importance of medicinally useful fern plants, we developed a web-based database containing information about several groups of ferns, their medicinal uses and chemical constituents, as well as protein/enzyme sequences isolated from different fern plants. The Fern ethnomedicinal plant database is a comprehensive, web-based content management database system used to retrieve a collection of factual knowledge related to ethnomedicinal fern species. Most of the protein/enzyme sequences have been extracted from the NCBI Protein sequence database. The fern species, family name, identification, NCBI taxonomy ID, geographical occurrence, trial for, plant parts used, ethnomedicinal importance, and morphological characteristics were collected from various scientific journals and other literature available in text form. Links to NCBI BLAST, InterPro, phylogeny, and Clustal W web resources have also been provided for future comparative studies, so users can obtain information related to fern plants and their medicinal applications in one place. The database includes information on 100 medicinal fern species. This web-based database would be advantageous for deriving information for computational drug discovery and useful to botanists and those interested in botany, pharmacologists, researchers, biochemists, plant biotechnologists, ayurvedic practitioners, doctors/pharmacists, traditional medicine users, farmers, agricultural students and teachers from universities and colleges, and fern plant enthusiasts. This effort should provide users with essential knowledge relevant to drug discovery, conservation of fern species around the world, and the creation of social awareness.
Risk stratification for therapeutic management and prognosis.
Coelho-Filho, Otavio R; Nallamshetty, Leelakrishna; Kwong, Raymond Y
2009-07-01
In coronary artery disease (CAD), cardiac magnetic resonance (CMR) imaging can integrate several types of pulse-sequence examinations (eg, myocardial perfusion, cine wall motion, T2-weighted imaging for myocardial edema, late gadolinium enhancement, and CMR angiography) that can provide anatomic, functional, and physiologic information about the heart in a single imaging session. Because of this ability to interrogate myocardial physiology using different pulse sequence techniques within a single CMR session, this technique has been recognized increasingly in many centers as the test of choice for assessing patients who present with cardiomyopathy of undetermined cause. This article first reviews the current evidence supporting the prognosticating role of CMR in assessing CAD and then discusses CMR applications and prognostication in many non-coronary cardiac conditions.
Li, Xi; Sun, Long; Zhu, Yongze; Shen, Mengyuan; Tu, Yuexing
2018-04-14
The emergence of carbapenem-resistant Escherichia coli has become a serious challenge to manage in the clinic because of multidrug resistance. Here we report the draft genome sequence of NDM-3-producing E. coli strain NT1 isolated from a bloodstream infection in China. Whole genomic DNA of E. coli strain NT1 was extracted and was sequenced using an Illumina HiSeq™ X Ten platform. The generated sequence reads were assembled using CLC Genomics Workbench. The draft genome was annotated using Rapid Annotation using Subsystem Technology (RAST). Bioinformatics analysis was further performed. The genome size was calculated at 5,353,620 bp, with 5297 protein-coding sequences and the presence of genes conferring resistance to aminoglycosides, β-lactams, quinolones, macrolides, phenicols, sulphonamides, tetracycline and trimethoprim. In addition, genes encoding virulence factors were also identified. To our knowledge, this is the first report of an E. coli strain producing NDM-3 isolated from a human bloodstream infection. The genome sequence will provide valuable information to understand antibiotic resistance mechanisms and pathogenic mechanisms in this strain. Close surveillance is urgently needed to monitor the spread of NDM-3-producing isolates. Copyright © 2018 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.
Upadhyay, Atul Kumar; Sowdhamini, Ramanathan
2016-01-01
3D-domain swapping is one of the mechanisms of protein oligomerization, and proteins exhibiting this phenomenon have many biological functions. Proteins that undergo domain swapping have attracted much attention owing to their involvement in human diseases, such as conformational diseases, amyloidosis, serpinopathies, and proteinopathies. Early identification of proteins in the human genome that have a tendency to domain swap would support many aspects of disease management. Predictive models were developed using machine learning approaches to predict putative domain swapping in protein sequences, with an average accuracy of 78% (sensitivity of 85.6%, specificity of 87.5%, and an MCC of 0.72). These models were applied to many complete genomes, with special emphasis on the human genome. Nearly 44% of the protein sequences in the human genome were predicted positive for domain swapping. Enrichment analysis was performed on the positively predicted sequences from the human genome with respect to their domain distribution, disease association, and functional importance based on Gene Ontology (GO), giving a better understanding of the functional importance of these sequences. Finally, we developed hinge-region prediction for a given putative domain-swapped sequence using important physicochemical properties of amino acids.
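The performance metrics quoted above (sensitivity, specificity, MCC) all derive from a confusion matrix; the worked sketch below shows the arithmetic. The counts are invented purely for illustration and are not the study's data.

```python
# Worked sketch: sensitivity, specificity, accuracy and MCC from a confusion matrix.
import math

tp, fn, tn, fp = 86, 14, 88, 12   # hypothetical counts

sensitivity = tp / (tp + fn)                    # 0.86
specificity = tn / (tn + fp)                    # 0.88
accuracy    = (tp + tn) / (tp + tn + fp + fn)   # 0.87
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)                                               # about 0.74
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} "
      f"accuracy={accuracy:.2f} MCC={mcc:.2f}")
```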
The design and implementation of EPL: An event pattern language for active databases
NASA Technical Reports Server (NTRS)
Giuffrida, G.; Zaniolo, C.
1994-01-01
The growing demand for intelligent information systems requires closer coupling of rule-based reasoning engines, such as CLIPS, with advanced data base management systems (DBMS). For instance, several commercial DBMS now support the notion of triggers that monitor events and transactions occurring in the database and fire induced actions, which perform a variety of critical functions, including safeguarding the integrity of data, monitoring access, and recording volatile information needed by administrators, analysts, and expert systems to perform assorted tasks; examples of these tasks include security enforcement, market studies, knowledge discovery, and link analysis. At UCLA, we designed and implemented the event pattern language (EPL) which is capable of detecting and acting upon complex patterns of events which are temporally related to each other. For instance, a plant manager should be notified when a certain pattern of overheating repeats itself over time in a chemical process; likewise, proper notification is required when a suspicious sequence of bank transactions is executed within a certain time limit. The EPL prototype is built in CLIPS to operate on top of Sybase, a commercial relational DBMS, where actions can be triggered by events such as simple database updates, insertions, and deletions. The rule-based syntax of EPL allows the sequences of goals in rules to be interpreted as sequences of temporal events; each goal can correspond to either (1) a simple event, or (2) a (possibly negated) event/condition predicate, or (3) a complex event defined as the disjunction and repetition of other events. Various extensions have been added to CLIPS in order to tailor the interface with Sybase and its open client/server architecture.
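The overheating example in the abstract, an event that repeats a given number of times within a time window, can be sketched minimally as below. This is plain Python for illustration, not the CLIPS/Sybase implementation of EPL; the event name, window, and thresholds are assumptions.

```python
# Minimal sketch: fire when the same event type repeats N times within a time window.
from collections import deque

def make_pattern_detector(event_type: str, repeats: int, window_seconds: float):
    recent = deque()
    def on_event(kind: str, timestamp: float) -> bool:
        """Return True when N repeats of event_type fall within the window."""
        if kind != event_type:
            return False
        recent.append(timestamp)
        while recent and timestamp - recent[0] > window_seconds:
            recent.popleft()
        return len(recent) >= repeats
    return on_event

detect = make_pattern_detector("overheating", repeats=3, window_seconds=600)
for t in (0, 200, 450):          # three overheating events within 10 minutes
    fired = detect("overheating", t)
print(fired)                      # True on the third event
```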
A structured interface to the object-oriented genomics unified schema for XML-formatted data.
Clark, Terry; Jurek, Josef; Kettler, Gregory; Preuss, Daphne
2005-01-01
Data management systems are fast becoming required components in many biology laboratories as the role of computer-based information grows. Although the need for data management systems is on the rise, their inherent complexities can deter the full and routine use of their computational capabilities. The significant undertaking to implement a capable production system can be reduced in part by adapting an established data management system. In such a way, we are leveraging the Genomics Unified Schema (GUS) developed at the Computational Biology and Informatics Laboratory at the University of Pennsylvania as a foundation for managing and analysing DNA sequence data in centromere research projects around Arabidopsis thaliana and related species. Because GUS provides a core schema that includes support for genome sequences, mRNA and its expression, and annotated chromosomes, it is ideal for synthesising a variety of parameters to analyse these repetitive and highly dynamic portions of the genome. Despite this, production-strength data management frameworks are complex, requiring dedicated efforts to adapt and maintain. The work reported in this article addresses one component of such an effort, namely the pivotal task of marshalling data from various sources into GUS. In order to harness GUS for our project, and motivated by efficiency needs, we developed a structured framework for transferring data into GUS from outside sources. This technology is embodied in a GUS object-layer processor, XMLGUS. XMLGUS facilitates incorporating data into GUS by (i) formulating an XML interface that includes relational database key constraint definitions, (ii) regularising traversal through that XML, (iii) realising automatic processing of the XML with database key constraints and (iv) allowing for special processing of input data within the framework for automated processing. The application of XMLGUS to production pipeline processing for a sequencing project and inputting the Arabidopsis genome into GUS is discussed. XMLGUS is available from the Flora website (http://flora.ittc.ku.edu/).
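The general idea behind such a loader, traversing an XML submission, mapping elements to relational rows, and enforcing a key constraint along the way, can be sketched minimally as below. Element names and the row layout are invented for illustration and are not the actual XMLGUS interface or GUS schema.

```python
# Minimal sketch: walk an XML submission into rows while checking a key constraint.
import xml.etree.ElementTree as ET

submission = """
<sequences>
  <sequence accession="AT1G00010"><length>2269</length></sequence>
  <sequence accession="AT1G00020"><length>1871</length></sequence>
</sequences>
"""

rows, seen_keys = [], set()
for element in ET.fromstring(submission).iter("sequence"):
    accession = element.get("accession")
    if accession in seen_keys:                  # primary-key style constraint
        raise ValueError(f"duplicate accession {accession}")
    seen_keys.add(accession)
    rows.append({"accession": accession, "length": int(element.findtext("length"))})

print(rows)
```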
Phylogenetic Placement of Exact Amplicon Sequences Improves Associations with Clinical Information
McDonald, Daniel; Gonzalez, Antonio; Navas-Molina, Jose A.; Jiang, Lingjing; Xu, Zhenjiang Zech; Winker, Kevin; Kado, Deborah M.; Orwoll, Eric; Manary, Mark; Mirarab, Siavash
2018-01-01
ABSTRACT Recent algorithmic advances in amplicon-based microbiome studies enable the inference of exact amplicon sequence fragments. These new methods enable the investigation of sub-operational taxonomic units (sOTU) by removing erroneous sequences. However, short (e.g., 150-nucleotide [nt]) DNA sequence fragments do not contain sufficient phylogenetic signal to reproduce a reasonable tree, introducing a barrier to the utilization of critical phylogenetically aware metrics such as Faith’s PD or UniFrac. Although fragment insertion methods do exist, those methods have not been tested for inserting sOTUs from high-throughput amplicon studies into a broad reference phylogeny. We benchmarked the SATé-enabled phylogenetic placement (SEPP) technique explicitly against 16S V4 sequence fragments and showed that it outperforms the conceptually problematic but often-used practice of reconstructing de novo phylogenies. In addition, we provide a BSD-licensed QIIME2 plugin (https://github.com/biocore/q2-fragment-insertion) for SEPP and integration into the microbial study management platform QIITA. IMPORTANCE The move from OTU-based to sOTU-based analysis, while providing additional resolution, also introduces computational challenges. We demonstrate that one popular method of dealing with sOTUs (building a de novo tree from the short sequences) can provide incorrect results in human gut metagenomic studies and show that phylogenetic placement of the new sequences with SEPP resolves this problem while also yielding other benefits over existing methods. PMID:29719869
Tan, Qian-Qian; Zhu, Li; Li, Yi; Liu, Wen; Ma, Wei-Hua; Lei, Chao-Liang; Wang, Xiao-Ping
2015-01-01
The cabbage beetle Colaphellus bowringi Baly is a serious insect pest of crucifers and undergoes reproductive diapause in soil. An understanding of the molecular mechanisms of diapause regulation, insecticide resistance, and other physiological processes is helpful for developing new management strategies for this beetle. However, the lack of genomic information and valid reference genes limits knowledge on the molecular bases of these physiological processes in this species. Using Illumina sequencing, we obtained more than 57 million sequence reads derived from C. bowringi, which were assembled into 39,390 unique sequences. A Clusters of Orthologous Groups classification was obtained for 9,048 of these sequences, covering 25 categories, and 16,951 were assigned to 255 Kyoto Encyclopedia of Genes and Genomes pathways. Eleven candidate reference gene sequences from the transcriptome were then identified through reverse transcriptase polymerase chain reaction. Among these candidate genes, EF1α, ACT1, and RPL19 proved to be the most stable reference genes for different reverse transcriptase quantitative polymerase chain reaction experiments in C. bowringi. Conversely, aTUB and GAPDH were the least stable reference genes. The abundant putative C. bowringi transcript sequences reported enrich the genomic resources of this beetle. Importantly, the larger number of gene sequences and valid reference genes provide a valuable platform for future gene expression studies, especially with regard to exploring the molecular mechanisms of different physiological processes in this species.
Wu, Fengnian; Jiang, Hongyan; Beattie, G Andrew C; Holford, Paul; Chen, Jianchi; Wallis, Christopher M; Zheng, Zheng; Deng, Xiaoling; Cen, Yijing
2018-04-24
Diaphorina citri (Asian citrus psyllid; ACP) transmits 'Candidatus Liberibacter asiaticus', associated with citrus Huanglongbing (HLB). ACP has been reported in 11 provinces/regions in China, yet its population diversity remains unclear. In this study, we evaluated ACP population diversity in China using representative whole mitochondrial genome (mitogenome) sequences. Additional mitogenome sequences from outside China were also acquired and evaluated. The sizes of the 27 ACP mitogenome sequences ranged from 14 986 to 15 030 bp. Along with three previously published mitogenome sequences, the 30 sequences formed three major mitochondrial groups (MGs): MG1, present in southwestern China and occurring at elevations above 1000 m; MG2, present in southeastern China and Southeast Asia (Cambodia, Indonesia, Malaysia, and Vietnam) and occurring at elevations below 180 m; and MG3, present in the USA and Pakistan. Single nucleotide polymorphisms in five genes (cox2, atp8, nad3, nad1, and rrnL) contributed most to ACP diversity. Among these genes, rrnL had the most variation. Mitogenome sequence analyses revealed two major phylogenetic groups of ACP present in China, as well as a possibly unique group currently present in Pakistan and the USA. This information could have significant implications for current ACP control and HLB management. © 2018 Society of Chemical Industry.
Effects of informed consent for individual genome sequencing on relevant knowledge.
Kaphingst, K A; Facio, F M; Cheng, M-R; Brooks, S; Eidem, H; Linn, A; Biesecker, B B; Biesecker, L G
2012-11-01
Increasing availability of individual genomic information suggests that patients will need knowledge about genome sequencing to make informed decisions, but prior research is limited. In this study, we examined genome sequencing knowledge before and after informed consent among 311 participants enrolled in the ClinSeq™ sequencing study. An exploratory factor analysis of knowledge items yielded two factors (sequencing limitations knowledge; sequencing benefits knowledge). In multivariable analysis, high pre-consent sequencing limitations knowledge scores were significantly related to education [odds ratio (OR): 8.7, 95% confidence interval (CI): 2.45-31.10 for post-graduate education, and OR: 3.9; 95% CI: 1.05, 14.61 for college degree compared with less than college degree] and race/ethnicity (OR: 2.4, 95% CI: 1.09, 5.38 for non-Hispanic Whites compared with other racial/ethnic groups). Mean values increased significantly between pre- and post-consent for the sequencing limitations knowledge subscale (6.9-7.7, p < 0.0001) and sequencing benefits knowledge subscale (7.0-7.5, p < 0.0001); increase in knowledge did not differ by sociodemographic characteristics. This study highlights gaps in genome sequencing knowledge and underscores the need to target educational efforts toward participants with less education or from minority racial/ethnic groups. The informed consent process improved genome sequencing knowledge. Future studies could examine how genome sequencing knowledge influences informed decision making. © 2012 John Wiley & Sons A/S.
DIALOG: An executive computer program for linking independent programs
NASA Technical Reports Server (NTRS)
Glatt, C. R.; Hague, D. S.; Watson, D. A.
1973-01-01
A very large scale computer programming procedure called the DIALOG executive system was developed for the CDC 6000 series computers. The executive computer program, DIALOG, controls the sequence of execution and data management function for a library of independent computer programs. Communication of common information is accomplished by DIALOG through a dynamically constructed and maintained data base of common information. Each computer program maintains its individual identity and is unaware of its contribution to the large scale program. This feature makes any computer program a candidate for use with the DIALOG executive system. The installation and uses of the DIALOG executive system are described.
1993-09-01
[Fragmentary record: only flattened table and survey excerpts survive. Recoverable information: correlations of complexity with Development, Manufacturing/Production, and Management groups (r values significant at p ≤ 0.05); respondents were asked to sequence the steps (information sources) they used and mark those they did not use; and tabulated percentages of information sources such as workers inside the organization and colleagues outside the organization.]
Expanding the Delivery of Rapid Earthquake Information and Warnings for Response and Recovery
NASA Astrophysics Data System (ADS)
Blanpied, M. L.; McBride, S.; Hardebeck, J.; Michael, A. J.; van der Elst, N.
2017-12-01
Scientific organizations like the United States Geological Survey (USGS) release information to support effective responses during an earthquake crisis. Information is delivered to the White House, the National Command Center, and the Departments of Defense, Homeland Security (including FEMA), Transportation, Energy, and the Interior. Other crucial stakeholders include state officials and decision makers, emergency responders, numerous public and private infrastructure management centers (e.g., highways, railroads, and pipelines), the media, and the public. To meet the diverse information requirements of these users, rapid earthquake notifications have been developed for delivery by e-mail and text message, and a suite of earthquake information resources such as ShakeMaps, Did You Feel It?, and PAGER impact estimates is delivered via the web. The ShakeAlert earthquake early warning system being developed for the U.S. West Coast will identify and characterize an earthquake a few seconds after it begins, estimate the likely intensity of ground shaking, and deliver brief but critically important warnings to people and infrastructure in harm's way. Currently the USGS is also developing a capability to deliver Operational Earthquake Forecasts (OEF). These provide estimates of potential seismic behavior after large earthquakes and during evolving aftershock sequences. Similar work is underway in New Zealand, Japan, and Italy. In the development of OEF products, social science research conducted during these sequences indicates that aftershock forecasts are valued for a variety of reasons, from informing critical response and recovery decisions to psychologically preparing for more earthquakes. New tools will allow users to customize map-based, spatiotemporal forecasts to their specific needs. Hazard curves and other advanced information will also be available. For such authoritative information to be understood and used under the pressures of an earthquake response, it must reach users in an effective form. These new products are being developed and honed using best practices from communication research, experience with forecasts in the U.S., Nepal, and New Zealand, and consultation with emergency managers, government agencies, businesses, and social scientists.
Scoping Study Investigating PWR Instrumentation during a Severe Accident Scenario
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rempe, J. L.; Knudson, D. L.; Lutz, R. J.
The accidents at the Three Mile Island Unit 2 (TMI-2) and Fukushima Daiichi Units 1, 2, and 3 nuclear power plants demonstrate the critical importance of accurate, relevant, and timely information on the status of reactor systems during a severe accident. These events also highlight the critical importance of understanding and focusing on the key elements of system status information in an environment where operators may be overwhelmed with superfluous and sometimes conflicting data. While progress in these areas has been made since TMI-2, the events at Fukushima suggest that there may still be a need to ensure that critical plant information is available to plant operators. Recognizing the significant technical and economic challenges associated with plant modifications, it is important to focus on instrumentation that can address these critical information needs. As part of a program initiated by the Department of Energy, Office of Nuclear Energy (DOE-NE), a scoping effort was initiated to assess critical information needs identified for severe accident management and mitigation in commercial Light Water Reactors (LWRs), to quantify the environments that instruments monitoring these data would have to survive, and to identify gaps where predicted environments exceed instrumentation qualification envelope (QE) limits. Results from the Pressurized Water Reactor (PWR) scoping evaluations are documented in this report. The PWR evaluations were limited in this scoping evaluation to quantifying the environmental conditions for an unmitigated Short-Term Station BlackOut (STSBO) sequence in one unit at the Surry nuclear power station. Results were obtained using the MELCOR models developed for the US Nuclear Regulatory Commission (NRC)-sponsored State of the Art Consequence Assessment (SOARCA) project. Results from this scoping evaluation indicate that some instrumentation identified to provide critical information would be exposed to conditions that significantly exceeded QE limits for extended time periods for the low-frequency STSBO sequence evaluated in this study. It is recognized that the core damage frequency (CDF) of the sequence evaluated in this scoping effort would be considerably lower if evaluations considered the new FLEX equipment being installed by industry. Nevertheless, because of uncertainties in instrumentation response when exposed to conditions beyond QE limits and alternate challenges associated with different sequences that may impact sensor performance, it is recommended that additional evaluations of instrumentation performance be completed to provide confidence that operators have access to accurate, relevant, and timely information on the status of reactor systems for a broad range of challenges associated with risk-important severe accident sequences.
2016-10-01
prostate cancer through sequencing xenografts and tissue samples. Qualify novel drivers of AR- prostate cancer through in vitro models. Develop novel...ability of RNASEH2A to modulate radio-sensitivity in prostate cancer cell lines and xenograft models. 3: Investigate RNASEH2A as a marker of radio...lung cancer clinical management. List of the Specific Aims: Aim 1: To establish patient-derived xenografts (PDX) models of pre-neoplastic lesions
Record of Decision for the First Active Duty F-35A Operational Base
2013-12-02
trucks or sprinkler systems to keep all areas of vehicle movement damp enough to prevent dust from leaving the construction area. - Temporary wind...synergy between the operational and logistics communities in managing a new, highly complex weapon system . ACC’s existing F-16 squadrons at Hill AFB...Share information with local fire departments on F-35A crash response procedures. Soils and Water • Sequence construction activities to limit the soil
FBIS: A regional DNA barcode archival & analysis system for Indian fishes
Nagpure, Naresh Sahebrao; Rashid, Iliyas; Pathak, Ajey Kumar; Singh, Mahender; Singh, Shri Prakash; Sarkar, Uttam Kumar
2012-01-01
DNA barcoding is a new tool for taxon recognition and classification of biological organisms based on the sequence of a fragment of the mitochondrial gene cytochrome c oxidase I (COI). In view of the growing importance of fish DNA barcoding for species identification, molecular taxonomy, and fish diversity conservation, we developed a Fish Barcode Information System (FBIS) for Indian fishes, which will serve as a regional DNA barcode archival and analysis system. The database presently contains 2334 sequence records of the COI gene for 472 aquatic species belonging to 39 orders and 136 families, collected from available published data sources. Additionally, it contains information on the phenotype, distribution, and IUCN Red List status of fishes. The web version of FBIS was designed using MySQL, Perl, and PHP on a Linux platform to (a) store and manage acquired data, (b) analyze and explore DNA barcode records, and (c) identify species and estimate genetic divergence. FBIS has also been integrated with appropriate tools for retrieving and viewing information about database statistics and taxonomy. It is expected that FBIS will be useful as a potent information system in fish molecular taxonomy, phylogeny, and genomics. Availability: The database is available free at http://mail.nbfgr.res.in/fbis/ PMID:22715304
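FBIS reports estimates of genetic divergence between COI barcodes; a minimal sketch of one standard estimator, the Kimura two-parameter distance, is shown below for two already-aligned sequences. The sequences are hypothetical, and the code ignores alignment, gaps, and ambiguity handling that a production tool such as FBIS would need; it illustrates the kind of calculation involved, not the system's actual implementation.

```python
import math

PURINES = {"A", "G"}

def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    n = len(pairs)
    transitions = sum(1 for a, b in pairs
                      if a != b and (a in PURINES) == (b in PURINES))
    transversions = sum(1 for a, b in pairs
                        if a != b and (a in PURINES) != (b in PURINES))
    p, q = transitions / n, transversions / n
    # d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q))
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))

# Hypothetical aligned COI fragments from two specimens.
a = "ATGGCACTATTATACCTACTATTCGGCGCATGAGCTGG"
b = "ATGGCGCTCTTGTACCTACTATTTGGTGCATGAGCCGG"
print(f"K2P distance: {k2p_distance(a, b):.4f}")
```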
Avallin, Therese; Muntlin Athlin, Åsa; Sorensen, Erik Elgaard; Kitson, Alison; Björck, Martin; Jangland, Eva
2018-06-12
To explore and describe the impact of the organizational culture on, and the patient-practitioner patterns of actions that contribute to or detract from, successful pain management for the patient with acute abdominal pain across the acute care pathway. Although pain management is a recognised human right, unmanaged pain continues to cause suffering and prolong hospital care. Unanswered questions about how to successfully manage pain relate to both organizational culture and individual practitioners' performance. Focused ethnography, applying the Developmental Research Sequence and the Fundamentals of Care framework. Participant observation and informal interviews (92 hours) were performed at one emergency department and two surgical wards at a University Hospital during April-November 2015. Data include 261 interactions between patients aged ≥18 years seeking care for acute abdominal pain at the emergency department and admitted to a surgical ward (N = 31; aged 20-90 years; 14 men, 17 women; 9 with communicative disabilities) and healthcare practitioners (N = 198). The observations revealed an organizational culture with considerable impact on how well pain was managed. Well-managed pain presupposed that the patient and practitioners connect in holistic pain management, including a trustful relationship, communication to share knowledge, and individualized analgesics. Person-centred pain management requires an organization in which patients and practitioners share their knowledge of pain and pain management as true partners. Leaders and practitioners should make small behavioural changes to enable the crucial positive experience of pain management. This article is protected by copyright. All rights reserved.
Bankov, Katrin; Döring, Claudia; Schneider, Markus; Hartmann, Sylvia; Winkelmann, Ria; Albert, Joerg G; Bechstein, Wolf Otto; Zeuzem, Stefan; Hansmann, Martin Leo; Peveling-Oberhag, Jan; Walter, Dirk
2018-04-30
Definite diagnosis and therapeutic management of cholangiocarcinoma (CCA) remains a challenge. The aim of the current study was to investigate feasibility and potential impact on clinical management of targeted sequencing of intraductal biopsies. Intraductal biopsies with suspicious findings from 16 patients with CCA in later clinical course were analyzed with targeted sequencing including tumor and control benign tissue (n = 55 samples). A CCA-specific sequencing panel containing 41 genes was designed and a dual strand targeted enrichment was applied. Sequencing was successfully performed for all samples. In total, 79 mutations were identified and a mean of 1.7 mutations per tumor sample (range 0-4) as well as 2.3 per biopsy (0-6) were detected and potentially therapeutically relevant genes were identified in 6/16 cases. In 14/18 (78%) biopsies with dysplasia or inconclusive findings at least one mutation was detected. The majority of mutations were found in both surgical specimen and biopsy (68%), while 28% were only present in biopsies in contrast to 4% being only present in the surgical tumor specimen. Targeted sequencing from intraductal biopsies is feasible and potentially improves the diagnostic yield. A profound genetic heterogeneity in biliary dysplasia needs to be considered in clinical management and warrants further investigation. The current study is the first to demonstrate the feasibility of sequencing of intraductal biopsies which holds the potential to impact diagnostic and therapeutical management of patients with biliary dysplasia and neoplasia.
Blåhed, Ida-Maria; Königsson, Helena; Ericsson, Göran; Spong, Göran
2018-01-01
Monitoring of wild animal populations is challenging, yet reliable information about population processes is important for both management and conservation efforts. Access to molecular markers, such as SNPs, enables population monitoring through genotyping of various DNA sources. We have developed 96 high-quality SNP markers for individual identification of moose (Alces alces), an economically and ecologically important top herbivore in boreal regions. Reduced representation libraries constructed from 34 moose were high-throughput de novo sequenced, generating nearly 50 million read pairs. About 50,000 stacks of aligned reads containing one or more SNPs were discovered with the Stacks pipeline. Several quality criteria were applied to the candidate SNPs to find markers informative at the individual level and well representative of the population. Empirical validation by genotyping of the sequenced individuals and additional moose resulted in the selection of a final panel of 86 high-quality autosomal SNPs. Additionally, five sex-specific SNPs and five SNPs for sympatric species diagnostics are included in the panel. The genotyping error rate was 0.002 for the total panel, and probabilities of identity were low enough to distinguish individuals with high confidence. Moreover, the autosomal SNPs were highly informative for population-level analyses as well. The potential applications of this SNP panel are thus many, including investigations of population size, sex ratios, relatedness, reproductive success, and population structure. Ideally, SNP-based studies could improve today's population monitoring and increase our knowledge of moose population dynamics.
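One of the quality criteria mentioned above, the probability of identity (PI), can be computed directly from allele frequencies; the sketch below uses the standard PI formula for unrelated individuals and multiplies across loci. The allele frequencies are hypothetical, and this is the basic random-mating version of the statistic, not the sibling-corrected variant that is often reported alongside it.

```python
from functools import reduce

def pi_locus(allele_freqs):
    """Probability that two random individuals share a genotype at one locus:
    PI = sum(p_i^4) + sum_{i<j} (2 * p_i * p_j)^2."""
    homozygotes = sum(p ** 4 for p in allele_freqs)
    heterozygotes = sum((2 * allele_freqs[i] * allele_freqs[j]) ** 2
                        for i in range(len(allele_freqs))
                        for j in range(i + 1, len(allele_freqs)))
    return homozygotes + heterozygotes

# Hypothetical minor-allele frequencies for a handful of biallelic SNPs.
mafs = [0.45, 0.30, 0.50, 0.25, 0.40]
per_locus = [pi_locus([p, 1 - p]) for p in mafs]
overall = reduce(lambda a, b: a * b, per_locus, 1.0)
print(f"overall PI across {len(mafs)} loci: {overall:.3e}")
```

With 86 loci instead of five, the product becomes vanishingly small, which is why a panel of this size can separate individuals with high confidence.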
Joseph, Galen; Guiltinan, Jenna; Kianmahd, Jessica; Youngblom, Janey; Blanco, Amie
2015-01-01
Whole exome sequencing (WES) uses next generation sequencing technology to provide information on nearly all functional, protein-coding regions in an individual's genome. Due to the vast amount of information and incidental findings that can be generated from this technology, patient preferences must be investigated to help clinicians consent and return results to patients. Patients (n=19) who were previously clinically diagnosed with Lynch syndrome, but received uninformative negative Lynch syndrome genetic results through traditional molecular testing methods participated in semi-structured interviews after WES testing but before return of results to explore their views of WES and preferences for return of results. Analyses of interview results found that nearly all participants believed that the benefits of receiving all possible results generated from WES outweighed the undesirable effects. The majority of participants conveyed that relative to coping with a cancer diagnosis, information generated from WES would be manageable. Importantly, participants' experience with Lynch syndrome influenced their notions of genetic determinism, tolerance for uncertain results, and family communication plans. Participants would prefer to receive WES results in person from a genetic counselor or medical geneticist so that an expert could help explain the meaning and implications of the potentially large quantity and range of complicated results. These results underscore the need to study various populations with regard to the clinical use of WES in order to effectively and empathetically communicate the possible implications of this new technology and return results. PMID:24449059
Valenzuela-Muñoz, Valentina; Sturm, Armin; Gallardo-Escárate, Cristian
2015-04-09
The ATP-binding cassette (ABC) protein family encodes membrane proteins involved in the transport of various biomolecules across the cellular membrane. These proteins have been identified in all taxa and perform important physiological functions, including insecticide detoxification in arthropods. For that reason, the ectoparasite Caligus rogercresseyi represents a model species for understanding the molecular underpinnings of insecticide drug resistance. Illumina sequencing was performed on sea lice exposed to 2 and 3 ppb of deltamethrin and azamethiphos. Contigs obtained from de novo assembly were annotated by BLASTX. RNA-Seq analysis was performed and validated by qPCR analysis. From the transcriptome database of C. rogercresseyi, 57 putative ABC protein sequences were identified and phylogenetically classified into the eight subfamilies described for ABC transporters in arthropods. Transcriptomic profiles of the ABC protein subfamilies were evaluated throughout C. rogercresseyi development. Moreover, RNA-Seq analysis was performed for adult male and female salmon lice exposed to the delousing drugs azamethiphos and deltamethrin. High transcript levels of the ABCB and ABCC subfamilies were observed. Furthermore, SNP mining was carried out on the ABC protein sequences, revealing pivotal genomic information. The present study provides a comprehensive transcriptome analysis of ABC proteins from C. rogercresseyi, giving relevant information about transporter roles during ontogeny and in relation to delousing drug responses in salmon lice. This genomic information represents a valuable tool for pest management in the Chilean salmon aquaculture industry.
ReprDB and panDB: minimalist databases with maximal microbial representation.
Zhou, Wei; Gay, Nicole; Oh, Julia
2018-01-18
Profiling of shotgun metagenomic samples is hindered by the lack of unified microbial reference genome databases that (i) assemble genomic information from all open access microbial genomes, (ii) have relatively small sizes, and (iii) are compatible with various metagenomic read mapping tools. Moreover, computational tools to rapidly compile and update such databases to accommodate the rapid increase in new reference genomes do not exist. As a result, database-guided analyses often fail to profile a substantial fraction of metagenomic shotgun sequencing reads from complex microbiomes. We report pipelines that efficiently traverse all open access microbial genomes and assemble non-redundant genomic information. The pipelines result in two species-resolution microbial reference databases of relatively small size: reprDB, which assembles microbial representative or reference genomes, and panDB, for which we developed a novel iterative alignment algorithm to identify and assemble non-redundant genomic regions in multiple sequenced strains. With these databases, we managed to assign taxonomic labels and genome positions to the majority of metagenomic reads from human skin and gut microbiomes, demonstrating a significant improvement over a previous database-guided analysis of the same datasets. reprDB and panDB leverage the rapid increase in the number of open access microbial genomes to more fully profile metagenomic samples. Additionally, the databases exclude redundant sequence information to avoid inflated storage or memory requirements and indexing or analysis time. Finally, the novel iterative alignment algorithm significantly increases efficiency in pan-genome identification and can be useful in comparative genomic analyses.
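The panDB pipeline described above assembles non-redundant genomic regions across strains with an iterative alignment algorithm. As a conceptual stand-in only, the toy sketch below keeps, for each successive strain, the windows whose k-mer content has not already been seen; it uses exact k-mer containment rather than alignment, so it illustrates the idea of accumulating only novel sequence, not the published method. Strain names, genomes, and thresholds are hypothetical.

```python
def kmers(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def novel_windows(genome, seen, k=21, window=200):
    """Return windows of `genome` whose k-mer content is mostly unseen so far."""
    kept = []
    for start in range(0, max(len(genome) - window + 1, 1), window):
        chunk = genome[start:start + window]
        chunk_kmers = kmers(chunk, k)
        if not chunk_kmers:
            continue
        unseen = chunk_kmers - seen
        if len(unseen) / len(chunk_kmers) > 0.5:   # mostly novel, so keep it
            kept.append(chunk)
        seen |= chunk_kmers
    return kept

# Hypothetical strain genomes (tiny placeholders for illustration).
strains = {"strainA": "ACGT" * 300, "strainB": "ACGT" * 150 + "TTGACCA" * 100}
seen_kmers, pangenome = set(), []
for name, genome in strains.items():
    new = novel_windows(genome, seen_kmers)
    pangenome.extend(new)
    print(f"{name}: kept {len(new)} novel window(s)")
```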
Swetha, Rayapadi G; Kala Sekar, Dinesh Kumar; Ramaiah, Sudha; Anbarasu, Anand; Sekar, Kanagaraj
2014-12-01
Haemophilus influenzae (H. influenzae) is the causative agent of pneumonia, bacteraemia, and meningitis. The organism is responsible for a large number of deaths in both developed and developing countries. Even though the first bacterial genome to be sequenced was that of H. influenzae, there is no database dedicated exclusively to H. influenzae. This prompted us to develop the Haemophilus influenzae Genome Database (HIGDB). All HIGDB data are stored and managed in a MySQL database. The HIGDB is hosted on a Solaris server and developed using Perl modules; Ajax and JavaScript are used for the interface. The HIGDB contains detailed information on 42,741 proteins and 18,077 genes, including 10 whole genome sequences, as well as 284 three-dimensional structures of H. influenzae proteins. In addition, the database provides "Motif search" and "GBrowse". The HIGDB is freely accessible through the URL: http://bioserver1.physics.iisc.ernet.in/HIGDB/. The HIGDB will be a single point of access for bacteriological, clinical, genomic, and proteomic information on H. influenzae. The database can also be used to identify DNA motifs within H. influenzae genomes and to compare gene or protein sequences of a particular strain with those of other strains. Copyright © 2014 Elsevier Ltd. All rights reserved.
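HIGDB offers a "Motif search" over H. influenzae genomes; the sketch below shows the general idea behind such a search by expanding a motif written with IUPAC ambiguity codes into a regular expression and scanning both strands. The genomic fragment and the uptake-signal-like motif used here are hypothetical, and this illustrates the concept rather than the database's actual implementation.

```python
import re

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[GC]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
         "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def motif_to_regex(motif):
    return "".join(IUPAC[base] for base in motif.upper())

def find_motif(sequence, motif):
    """Yield (start, strand, match) for every motif hit on either strand."""
    seq = sequence.upper()
    pattern = re.compile(motif_to_regex(motif))
    for m in pattern.finditer(seq):
        yield m.start(), "+", m.group()
    rc = seq.translate(COMPLEMENT)[::-1]
    for m in pattern.finditer(rc):
        # report the hit's 0-based start on the forward strand
        yield len(seq) - m.end(), "-", m.group()

# Hypothetical genomic fragment and a degenerate query motif.
fragment = "TTGCAAAGTGCGGTCAATTTTAACCGCACTT"
for hit in find_motif(fragment, "AAGTGCGGT"):
    print(hit)
```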
Li, Zhao-Qun; Zhang, Shuai; Ma, Yan; Luo, Jun-Yu; Wang, Chun-Yi; Lv, Li-Min; Dong, Shuang-Lin; Cui, Jin-Jie
2013-01-01
Chrysopa pallens (Rambur) is one of the most important natural enemies and predators of various agricultural pests. Understanding the sophisticated olfactory system in insect antennae is crucial for studying the physiological bases of olfaction and could also lead to effective applications of C. pallens in integrated pest management. However, no transcriptome information is available for Neuroptera, and sequence data for C. pallens are scarce, so obtaining more sequence data is a priority for researchers working on this species. To facilitate the identification of gene sets involved in olfaction, a normalized transcriptome of C. pallens was sequenced. A total of 104,603 contigs were obtained and assembled into 10,662 clusters and 39,734 singletons; 20,524 were annotated based on BLASTX analyses. A large number of candidate chemosensory genes were identified, including 14 odorant-binding proteins (OBPs), 22 chemosensory proteins (CSPs), 16 ionotropic receptors, 14 odorant receptors, and genes potentially involved in olfactory modulation. To better understand the OBPs, CSPs, and cytochrome P450s, phylogenetic trees were constructed. In addition, 10 digital gene expression libraries from different tissues were constructed and gene expression profiles were compared among tissues in males and females. Our results provide a basis for exploring the mechanisms of chemoreception in C. pallens, as well as other insects. The evolutionary analyses in our study provide new insights into the differentiation and evolution of insect OBPs and CSPs. Our study provides large-scale sequence information for further studies in C. pallens.
Khansa, Ibrahim; Hall, Courtney; Madhoun, Lauren L; Splaingard, Mark; Baylis, Adriane; Kirschner, Richard E; Pearson, Gregory D
2017-04-01
Pierre Robin sequence is characterized by mandibular retrognathia and glossoptosis resulting in airway obstruction and feeding difficulties. When conservative management fails, mandibular distraction osteogenesis or tongue-lip adhesion may be required to avoid tracheostomy. The authors' goal was to prospectively evaluate the airway and feeding outcomes of their comprehensive approach to Pierre Robin sequence, which includes conservative management, mandibular distraction osteogenesis, and tongue-lip adhesion. A longitudinal study of newborns with Pierre Robin sequence treated at a pediatric academic medical center between 2010 and 2015 was performed. Baseline feeding and respiratory data were collected. Patients underwent conservative management if they demonstrated sustainable weight gain without tube feeds, and if their airway was stable with positioning alone. Patients who required surgery underwent tongue-lip adhesion or mandibular distraction osteogenesis based on family and surgeon preference. Postoperative airway and feeding data were collected. Twenty-eight patients with Pierre Robin sequence were followed prospectively. Thirty-two percent had a syndrome. Ten underwent mandibular distraction osteogenesis, eight underwent tongue-lip adhesion, and 10 were treated conservatively. There were no differences in days to extubation or discharge, change in weight percentile, requirement for gastrostomy tube, or residual obstructive sleep apnea between the three groups. No patients required tracheostomy. The greatest reduction in apnea-hypopnea index occurred with mandibular distraction osteogenesis, followed by tongue-lip adhesion and conservative management. Careful selection of which patients with Pierre Robin sequence need surgery, and of the most appropriate surgical procedure for each patient, can minimize the need for postprocedure tracheostomy. A comprehensive approach to Pierre Robin sequence that includes conservative management, mandibular distraction osteogenesis, and tongue-lip adhesion can result in excellent airway and feeding outcomes. Therapeutic, II.
Information science for the future: an innovative nursing informatics curriculum.
Travis, L; Flatley Brennan, P
1998-04-01
Health care is increasingly driven by information, and consequently, patient care will demand effective management of information. The report of the Priority Expert Panel E: Nursing Informatics and Enhancing Clinical Care Through Nursing Informatics challenges faculty to produce baccalaureate graduates who use information technologies to improve the patient care process and change health care. The challenge is to construct an evolving nursing informatics curriculum to provide nursing professionals with the foundation for affecting health care delivery. This article discusses the design, implementation, and evaluation of an innovative nursing informatics curriculum incorporated into a baccalaureate nursing program. The basic components of the curriculum framework are information, technology, and clinical care process. The presented integrated curriculum is effective in familiarizing students with informatics and encouraging them to think critically about using informatics in practice. The two groups of students who completed the four-course sequence will be discussed.
Leading the Game, Losing the Competition: Identifying Leaders and Followers in a Repeated Game
Seip, Knut Lehre; Grøn, Øyvind
2016-01-01
We explore a new method for identifying leaders and followers, LF, in repeated games by analyzing an experimental, repeated (50 rounds) game where the Row player shifts the payoff between small and large values (a type of "investor") and the Column player determines who gets the payoff (a type of "manager"). We found that i) the Investor (Row) is most often the leading player and the Manager (Column) a follower. The longer the Investor leads the game, the higher both players' payoffs. Surprisingly, however, it is always the Manager that achieves the largest payoff. ii) The game has an efficient cooperative strategy in which the players alternate in receiving a high payoff, but the players never identify, or accept, that strategy. iii) Under the assumption that the information used by the players is closely associated with the leader-follower sequence, and that the information is available before the players' decisions are made, the players switched LF strategy primarily as a function of information on the Investor's investment and moves and secondly as a function of the Manager's payoff. PMID:26968032
Deck, John; Gaither, Michelle R; Ewing, Rodney; Bird, Christopher E; Davies, Neil; Meyer, Christopher; Riginos, Cynthia; Toonen, Robert J; Crandall, Eric D
2017-08-01
The Genomic Observatories Metadatabase (GeOMe, http://www.geome-db.org/) is an open access repository for geographic and ecological metadata associated with biosamples and genetic data. Whereas public databases have served as vital repositories for nucleotide sequences, they do not accession all the metadata required for ecological or evolutionary analyses. GeOMe fills this need, providing a user-friendly, web-based interface for both data contributors and data recipients. The interface allows data contributors to create a customized yet standard-compliant spreadsheet that captures the temporal and geospatial context of each biosample. These metadata are then validated and permanently linked to archived genetic data stored in the National Center for Biotechnology Information's (NCBI's) Sequence Read Archive (SRA) via unique persistent identifiers. By linking ecologically and evolutionarily relevant metadata with publicly archived sequence data in a structured manner, GeOMe sets a gold standard for data management in biodiversity science.
Arkas: Rapid reproducible RNAseq analysis
Colombo, Anthony R.; Triche, Timothy J., Jr.; Ramsingh, Giridharan
2017-01-01
The recently introduced Kallisto pseudoaligner has radically simplified the quantification of transcripts in RNA-sequencing experiments. We offer two cloud-scale RNAseq pipelines, Arkas-Quantification and Arkas-Analysis, available within Illumina's BaseSpace cloud application platform, which expedite Kallisto preparatory routines, reliably calculate differential expression, and perform gene-set enrichment of REACTOME pathways. To address inherent inefficiencies of scale, Illumina's BaseSpace computing platform offers a massively parallel distributed environment improving data management services and data importing. Arkas-Quantification deploys Kallisto for parallel cloud computations and is conveniently integrated downstream from the BaseSpace Sequence Read Archive (SRA) import/conversion application titled SRA Import. Arkas-Analysis annotates the Kallisto results by extracting structured information directly from source FASTA files with per-contig metadata, and calculates differential expression and gene-set enrichment on both coding genes and transcripts. The Arkas cloud pipeline supports ENSEMBL transcriptomes and can be used downstream from SRA Import, facilitating raw sequence importing, SRA-to-FASTQ conversion, RNA quantification, and analysis steps. PMID:28868134
Amish, Stephen J.; Hohenlohe, Paul A.; Painter, Sally; Leary, Robb F.; Muhlfeld, Clint C.; Allendorf, Fred W.; Luikart, Gordon
2012-01-01
Hybridization with introduced rainbow trout threatens most native westslope cutthroat trout populations. Understanding the genetic effects of hybridization and introgression requires a large set of high-throughput, diagnostic genetic markers to inform conservation and management. Recently, we identified several thousand candidate single-nucleotide polymorphism (SNP) markers based on RAD sequencing of 11 westslope cutthroat trout and 13 rainbow trout individuals. Here, we used flanking sequence for 56 of these candidate SNP markers to design high-throughput genotyping assays. We validated the assays on a total of 92 individuals from 22 populations and seven hatchery strains. Forty-six assays (82%) amplified consistently and allowed easy identification of westslope cutthroat and rainbow trout alleles as well as heterozygote controls. The 46 SNPs will provide high power for early detection of population admixture and improved identification of hybrid and nonhybridized individuals. This technique shows promise as a very low-cost, reliable and relatively rapid method for developing and testing SNP markers for nonmodel organisms with limited genomic resources.
Sha, Yanwei; Sha, Yankun; Ji, Zhiyong; Ding, Lu; Zhang, Qing; Ouyang, Honggen; Lin, Shaobin; Wang, Xu; Shao, Lin; Shi, Chong; Li, Ping; Song, Yueqiang
2017-03-01
Robertsonian translocation (RT) is a common cause for male infertility, recurrent pregnancy loss, and birth defects. Studying meiotic recombination in RT-carrier patients helps decipher the mechanism and improve the clinical management of infertility and birth defects caused by RT. Here we present a new method to study spermatogenesis on a single-gamete basis from two RT carriers. By using a combined single-cell whole-genome amplification and sequencing protocol, we comprehensively profiled the chromosomal copy number of 88 single sperms from two RT-carrier patients. With the profiled information, chromosomal aberrations were identified on a whole-genome, per-sperm basis. We found that the previously reported interchromosomal effect might not exist with RT carriers. It is suggested that single-cell genome sequencing enables comprehensive chromosomal aneuploidy screening and provides a powerful tool for studying gamete generation from patients carrying chromosomal diseases. © 2017 John Wiley & Sons Ltd/University College London.
Droit, Arnaud; Hunter, Joanna M; Rouleau, Michèle; Ethier, Chantal; Picard-Cloutier, Aude; Bourgais, David; Poirier, Guy G
2007-01-01
Background: In the "post-genome" era, mass spectrometry (MS) has become an important method for the analysis of proteins, and the rapid advancement of this technique, in combination with other proteomics methods, results in an increasing amount of proteome data. This data must be archived and analysed using specialized bioinformatics tools. Description: We herein describe the "PARPs database," a data analysis and management pipeline for liquid chromatography tandem mass spectrometry (LC-MS/MS) proteomics. The PARPs database is a web-based tool whose features include experiment annotation, protein database searching, protein sequence management, as well as data-mining of the peptides and proteins identified. Conclusion: Using this pipeline, we have successfully identified several interactions of biological significance between PARP-1 and other proteins, namely RFC-1, 2, 3, 4 and 5. PMID:18093328
Post-disaster Risk Assessment for Hilly Terrain exposed to Seismic Loading
NASA Astrophysics Data System (ADS)
Yates, Katherine; Villeneuve, Marlene; Wilson, Thomas
2013-04-01
The 2010-present Canterbury earthquake sequence in the central South Island of New Zealand has highlighted the value of practical, standardised, and coordinated geotechnical risk assessment guidelines for inhabited structures in the aftermath of a geotechnical disaster. The lack of such guidelines, and of provisions to enforce risk assessments, was a major gap that hindered coordinated, timely, and transparent management of geotechnical risk. The earthquake sequence initiated a series of rockfall, cliff collapse, and landslide events around the Port Hills southeast of Christchurch. This was particularly the case with the 22 February 2011 earthquakes, which put thousands of people inhabiting the area at risk. Lives were lost and thousands of houses and critical infrastructure were damaged. Given the highly seismic environment in New Zealand and the significant number of active faults near population centres, it is prudent to develop such guidelines to ensure that response mechanisms and geotechnical risk assessments are effective following an earthquake rupture in a densely populated urban environment. For response and the associated risk assessments to be effective, the mechanisms of the geotechnical failure should be taken into consideration as part of the life safety assessment, to ensure that the hazard's potential risk is fully assessed and encompassed in decisions regarding life safety. This paper examines the event sequence, slope failure mechanisms, and the geotechnical risk management approach that developed immediately post-earthquake. It highlights experiences from key municipal, management, and operational stakeholders who were involved in geotechnical risk assessment during the Canterbury earthquake sequence, sheds light on how the information needed evolved through time during the emergency response, and identifies the hard-won lessons. It then discusses what is needed for life safety assessment post-earthquake and for creating awareness of potential geotechnical hazards. This is important not only to New Zealand but internationally, as many other regions of the world are also subject to high seismic risk.
Embedding strategies for effective use of information from multiple sequence alignments.
Henikoff, S.; Henikoff, J. G.
1997-01-01
We describe a new strategy for utilizing multiple sequence alignment information to detect distant relationships in searches of sequence databases. A single sequence representing a protein family is enriched by replacing conserved regions with position-specific scoring matrices (PSSMs) or consensus residues derived from multiple alignments of family members. In comprehensive tests of these and other family representations, PSSM-embedded queries produced the best results overall when used with a special version of the Smith-Waterman searching algorithm. Moreover, embedding consensus residues instead of PSSMs improved performance with readily available single sequence query searching programs, such as BLAST and FASTA. Embedding PSSMs or consensus residues into a representative sequence improves searching performance by extracting multiple alignment information from motif regions while retaining single sequence information where alignment is uncertain. PMID:9070452
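The embedding strategy above relies on position-specific scoring matrices derived from conserved regions of a multiple alignment; the sketch below builds a simple log-odds PSSM from an ungapped aligned block using pseudocounts and a uniform background, then scores a query segment against it. This is a generic construction for illustration, not the weighting or background scheme used by the authors, and the aligned block is hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(aligned_block, pseudocount=1.0):
    """Log-odds PSSM (bits) for an ungapped aligned block, uniform background."""
    length = len(aligned_block[0])
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in aligned_block)
        total = (sum(counts.get(aa, 0) for aa in AMINO_ACIDS)
                 + pseudocount * len(AMINO_ACIDS))
        column = {aa: math.log2(((counts.get(aa, 0) + pseudocount) / total)
                                / background)
                  for aa in AMINO_ACIDS}
        pssm.append(column)
    return pssm

# Hypothetical ungapped block from a family alignment.
block = ["GKSTL", "GKSTM", "GKTTL", "AKSTL"]
pssm = build_pssm(block)

query = "GKSTV"
score = sum(pssm[i][aa] for i, aa in enumerate(query))
print(f"score of {query}: {score:.2f} bits")
```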
Digital visual communications using a Perceptual Components Architecture
NASA Technical Reports Server (NTRS)
Watson, Andrew B.
1991-01-01
The next era of space exploration will generate extraordinary volumes of image data, and management of this image data is beyond current technical capabilities. We propose a strategy for coding visual information that exploits the known properties of early human vision. This Perceptual Components Architecture codes images and image sequences in terms of discrete samples from limited bands of color, spatial frequency, orientation, and temporal frequency. This spatiotemporal pyramid offers efficiency (low bit rate), variable resolution, device independence, error-tolerance, and extensibility.
GenColors: annotation and comparative genomics of prokaryotes made easy.
Romualdi, Alessandro; Felder, Marius; Rose, Dominic; Gausmann, Ulrike; Schilhabel, Markus; Glöckner, Gernot; Platzer, Matthias; Sühnel, Jürgen
2007-01-01
GenColors (gencolors.fli-leibniz.de) is a new web-based software/database system aimed at an improved and accelerated annotation of prokaryotic genomes, considering information on related genomes and making extensive use of genome comparison. It offers a seamless integration of data from ongoing sequencing projects and annotated genomic sequences obtained from GenBank. A variety of export/import filters manages an effective data flow from sequence assembly and manipulation programs (e.g., GAP4) to GenColors and back, as well as to standard GenBank file(s). The genome comparison tools include best bidirectional hits, gene conservation, syntenies, and gene core sets. Precomputed UniProt matches allow annotation and analysis in an effective manner. In addition to these analysis options, base-specific quality data (coverage and confidence) can also be handled if available. The GenColors system can be used both for annotation purposes in ongoing genome projects and as an analysis tool for finished genomes. GenColors comes in two types, as dedicated genome browsers and as the Jena Prokaryotic Genome Viewer (JPGV). Dedicated genome browsers contain genomic information on a set of related genomes and offer a large number of options for genome comparison. The system has been efficiently used in the genomic sequencing of Borrelia garinii and is currently applied to various ongoing genome projects on Borrelia, Legionella, Escherichia, and Pseudomonas genomes. One of these dedicated browsers, the Spirochetes Genome Browser (sgb.fli-leibniz.de) with Borrelia, Leptospira, and Treponema genomes, is freely accessible. The others will be released after finalization of the corresponding genome projects. JPGV (jpgv.fli-leibniz.de) offers information on almost all finished bacterial genomes, although with reduced genome comparison functionality compared to the dedicated browsers. As of January 2006, this viewer includes 632 genomic elements (e.g., chromosomes and plasmids) of 293 species. The system provides versatile quick and advanced search options for all currently known prokaryotic genomes and generates circular and linear genome plots. Gene information sheets contain basic gene information, database search options, and links to external databases. GenColors is also available on request for local installation.
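Among the comparison tools listed above, best bidirectional hits are the simplest to reproduce; the sketch below derives them from precomputed best-hit tables in each direction. Gene identifiers and bit scores are hypothetical placeholders, and a real run would take them from BLAST output rather than hard-coded dictionaries.

```python
def best_hits(hit_table):
    """Reduce a {query: [(subject, score), ...]} table to each query's single best hit."""
    return {q: max(hits, key=lambda h: h[1])[0]
            for q, hits in hit_table.items() if hits}

def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab, best_ba = best_hits(a_vs_b), best_hits(b_vs_a)
    return [(a, b) for a, b in best_ab.items() if best_ba.get(b) == a]

# Hypothetical per-gene hit lists (gene -> [(hit, bit score), ...]).
genomeA_vs_B = {"gA1": [("gB7", 410.0), ("gB2", 55.0)],
                "gA2": [("gB3", 120.0)],
                "gA3": [("gB9", 88.0)]}
genomeB_vs_A = {"gB7": [("gA1", 405.0)],
                "gB3": [("gA5", 130.0), ("gA2", 119.0)],
                "gB9": [("gA3", 90.0)]}

print(bidirectional_best_hits(genomeA_vs_B, genomeB_vs_A))
```

In this toy input only gA1/gB7 and gA3/gB9 are reciprocal, while gA2 is dropped because gB3's best hit points elsewhere, which is exactly the asymmetry that best bidirectional hits are designed to filter out.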
Bowers, Robert M.; Kyrpides, Nikos C.; Stepanauskas, Ramunas; ...
2017-08-08
Here, we present two standards developed by the Genomic Standards Consortium (GSC) for reporting bacterial and archaeal genome sequences. Both are extensions of the Minimum Information about Any (x) Sequence (MIxS). The standards are the Minimum Information about a Single Amplified Genome (MISAG) and the Minimum Information about a Metagenome-Assembled Genome (MIMAG), including, but not limited to, assembly quality and estimates of genome completeness and contamination. These standards can be used in combination with other GSC checklists, including the Minimum Information about a Genome Sequence (MIGS), Minimum Information about a Metagenomic Sequence (MIMS), and Minimum Information about a Marker Gene Sequence (MIMARKS). Community-wide adoption of MISAG and MIMAG will facilitate more robust comparative genomic analyses of bacterial and archaeal diversity.
REFGEN and TREENAMER: Automated Sequence Data Handling for Phylogenetic Analysis in the Genomic Era
Leonard, Guy; Stevens, Jamie R.; Richards, Thomas A.
2009-01-01
The phylogenetic analysis of nucleotide sequences and increasingly that of amino acid sequences is used to address a number of biological questions. Access to extensive datasets, including numerous genome projects, means that standard phylogenetic analyses can include many hundreds of sequences. Unfortunately, most phylogenetic analysis programs do not tolerate the sequence naming conventions of genome databases. Managing large numbers of sequences and standardizing sequence labels for use in phylogenetic analysis programs can be a time consuming and laborious task. Here we report the availability of an online resource for the management of gene sequences recovered from public access genome databases such as GenBank. These web utilities include the facility for renaming every sequence in a FASTA alignment file, with each sequence label derived from a user-defined combination of the species name and/or database accession number. This facility enables the user to keep track of the branching order of the sequences/taxa during multiple tree calculations and re-optimisations. Post phylogenetic analysis, these webpages can then be used to rename every label in the subsequent tree files (with a user-defined combination of species name and/or database accession number). Together these programs drastically reduce the time required for managing sequence alignments and labelling phylogenetic figures. Additional features of our platform include the automatic removal of identical accession numbers (recorded in the report file) and generation of species and accession number lists for use in supplementary materials or figure legends. PMID:19812722
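The renaming utility described above can be approximated in a few lines; the sketch below rewrites every FASTA header as a species-plus-accession label and records the mapping so the original identifiers can be restored in the resulting tree files. Header parsing here assumes a simple "accession description" layout, and the records and species lookup are hypothetical.

```python
def rename_fasta(records, species_lookup):
    """Relabel FASTA records as Species_Accession; return renamed records and a mapping."""
    renamed, mapping = [], {}
    for header, seq in records:
        accession = header.split()[0]
        species = species_lookup.get(accession, "unknown").replace(" ", "_")
        new_label = f"{species}_{accession}"
        mapping[new_label] = header          # keep the original header for later
        renamed.append((new_label, seq))
    return renamed, mapping

# Hypothetical records and accession-to-species lookup.
records = [("AB123456.1 cytochrome b, partial cds", "ATGACCAACATTCGAAAA"),
           ("XM987654.2 hypothetical protein", "ATGGCTTCTAAGGTTCCA")]
species = {"AB123456.1": "Homo sapiens", "XM987654.2": "Mus musculus"}

renamed, mapping = rename_fasta(records, species)
for label, seq in renamed:
    print(f">{label}\n{seq}")
```

The same mapping can then be applied in reverse to relabel tips in the tree files produced by downstream phylogenetic analysis, which is the round trip the web utilities automate.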
Possenti, Andrea; Vendruscolo, Michele; Camilloni, Carlo; Tiana, Guido
2018-05-23
Proteins employ the information stored in the genetic code and translated into their sequences to carry out well-defined functions in the cellular environment. The possibility to encode for such functions is controlled by the balance between the amount of information supplied by the sequence and that left after the protein has folded into its structure. We study the amount of information necessary to specify the protein structure, providing an estimate that takes into account the thermodynamic properties of protein folding. We thus show that the information remaining in the protein sequence after encoding for its structure (the 'information gap') is very close to what is needed to encode for its function and interactions. Then, by predicting the information gap directly from the protein sequence, we show that it may be possible to use these insights from information theory to discriminate between ordered and disordered proteins, to identify unknown functions, and to optimize artificially designed protein sequences. This article is protected by copyright. All rights reserved. © 2018 Wiley Periodicals, Inc.
Scott, Charity; Gerardi, Debra
2011-02-01
A well-designed conflict management process for hospital leaders should both retain the positive benefits of constructive conflict engagement and minimize the adverse consequences that unmanaged conflict can have on patient care. Dispute system design (DSD) experts recommend processes that emphasize the identification of the disputing parties' interests and that avoid reliance on exertions of power or resort to rights. In an emerging trend in designing conflict management systems, focus is placed on the relational dynamics among those involved in the conflict, in recognition of the reciprocal impact that each participant in a conflict has on the other. The aim is then to restore trust and heal damaged relationships as a component of resolution. The intent of Standard LD.02.04.01 is to prevent escalation to formal legal disputes and encourage leaders to overcome their conflict-avoidance tendencies through the use of well-designed approaches that support engagement with conflict. The sequence of collaborative options consists of individual coaching and counseling; informal face-to-face meetings; informal, internally facilitated meetings; informal, externally facilitated meetings; formal mediation; and postdispute analysis and feedback. Every hospital has unique needs, and every conflict management process must be tailored to individual circumstances. The recommendations in this two-part article can be adapted and incorporated in other, more comprehensive conflict management processes throughout the hospital. Expanding the conflict competence of leaders to enable them to effectively engage in and model constructive conflict-handling behaviors will further support the strategic goal of providing safe and effective patient care.
PWHATSHAP: efficient haplotyping for future generation sequencing.
Bracciali, Andrea; Aldinucci, Marco; Patterson, Murray; Marschall, Tobias; Pisanti, Nadia; Merelli, Ivan; Torquati, Massimo
2016-09-22
Haplotype phasing is an important problem in the analysis of genomics information. Given a set of DNA fragments of an individual, it consists of determining which one of the possible alleles (alternative forms of a gene) each fragment comes from. Haplotype information is relevant to gene regulation, epigenetics, genome-wide association studies, evolutionary and population studies, and the study of mutations. Haplotyping is currently addressed as an optimisation problem aiming at solutions that minimise, for instance, error correction costs, where costs are a measure of the confidence in the accuracy of the information acquired from DNA sequencing. Solutions typically have exponential computational complexity. WHATSHAP is a recent optimal approach which moves computational complexity from DNA fragment length to fragment overlap, i.e., coverage, and is hence of particular interest given sequencing technology's current trend towards producing longer fragments. Given the potential relevance of efficient haplotyping in several analysis pipelines, we have designed and engineered PWHATSHAP, a parallel, high-performance version of WHATSHAP. PWHATSHAP is embedded in a toolkit developed in Python and supports genomics datasets in standard file formats. Building on WHATSHAP, PWHATSHAP exhibits the same complexity, exploring a number of possible solutions that is exponential in the coverage of the dataset. The parallel implementation on multi-core architectures allows for a relevant reduction of the execution time for haplotyping, while the results enjoy the same high accuracy as those provided by WHATSHAP, which increases with coverage. Due to its structure and management of the large datasets, the parallelisation of WHATSHAP posed demanding technical challenges, which have been addressed by exploiting a high-level parallel programming framework. The result, PWHATSHAP, is a freely available toolkit that improves the efficiency of the analysis of genomics information.
[Study on ITS sequences of Aconitum vilmorinianum and its medicinal adulterant].
Zhang, Xiao-nan; Du, Chun-hua; Fu, De-huan; Gao, Li; Zhou, Pei-jun; Wang, Li
2012-09-01
To analyze and compare the ITS sequences of Aconitum vilmorinianum and its medicinal adulterant Aconitum austroyunnanense. Total genomic DNA was extracted from sample materials by an improved CTAB method; ITS sequences were amplified by PCR, sequenced directly, and analyzed with DNAStar, ClustalX 1.81, and MEGA 4.0. The ITS1 sequences contained 299 consistent sites, 19 variable sites, and 13 informative sites; the 5.8S sequences contained 162 consistent sites, 2 variable sites, and 1 informative site; and the ITS2 sequences contained 217 consistent sites, 3 variable sites, and 1 informative site. Comparing the ITS sequence data matrix, no base transitions or transversions were found in the 5.8S sequences, whereas 2 transitions and 1 transversion were found in the ITS1 sequences and only 1 transversion in the ITS2 sequences. By analyzing the ITS sequence data matrix from 2 populations of Aconitum vilmorinianum and 3 populations of Aconitum austroyunnanense, we found a stable informative site at the 596th base in the ITS2 sequences: in all samples of Aconitum vilmorinianum the base was C, and in all samples of Aconitum austroyunnanense it was A. Aconitum vilmorinianum and Aconitum austroyunnanense can therefore be distinguished by the characters of their ITS sequences, and the ITS1 sequences contain more variable sites than the ITS2 sequences.
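The counts of consistent, variable, and informative sites reported above can be reproduced for any alignment with a short column scan; the sketch below applies the usual definitions (a variable column has more than one base, a parsimony-informative column has at least two bases each present in at least two sequences). The toy alignment is hypothetical and much shorter than real ITS sequences.

```python
from collections import Counter

def classify_sites(alignment):
    """Count consistent, variable, and parsimony-informative alignment columns."""
    consistent = variable = informative = 0
    for column in zip(*alignment):
        counts = Counter(base for base in column if base in "ACGT")
        if len(counts) <= 1:
            consistent += 1
            continue
        variable += 1
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    return consistent, variable, informative

# Hypothetical aligned ITS fragments from five accessions.
alignment = ["ACGTACGTTA",
             "ACGTACGTCA",
             "ACCTACGTCA",
             "ACCTACGTTA",
             "ACGTACATTA"]

cons, var, inf = classify_sites(alignment)
print(f"consistent={cons}, variable={var}, informative={inf}")
```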
Information systems in food safety management.
McMeekin, T A; Baranyi, J; Bowman, J; Dalgaard, P; Kirk, M; Ross, T; Schmid, S; Zwietering, M H
2006-12-01
Information systems are concerned with data capture, storage, analysis and retrieval. In the context of food safety management they are vital to assist decision making in a short time frame, potentially allowing decisions to be made and practices to be actioned in real time. Databases with information on microorganisms pertinent to the identification of foodborne pathogens, response of microbial populations to the environment and characteristics of foods and processing conditions are the cornerstone of food safety management systems. Such databases find application in: Identifying pathogens in food at the genus or species level using applied systematics in automated ways. Identifying pathogens below the species level by molecular subtyping, an approach successfully applied in epidemiological investigations of foodborne disease and the basis for national surveillance programs. Predictive modelling software, such as the Pathogen Modeling Program and Growth Predictor (that took over the main functions of Food Micromodel) the raw data of which were combined as the genesis of an international web based searchable database (ComBase). Expert systems combining databases on microbial characteristics, food composition and processing information with the resulting "pattern match" indicating problems that may arise from changes in product formulation or processing conditions. Computer software packages to aid the practical application of HACCP and risk assessment and decision trees to bring logical sequences to establishing and modifying food safety management practices. In addition there are many other uses of information systems that benefit food safety more globally, including: Rapid dissemination of information on foodborne disease outbreaks via websites or list servers carrying commentary from many sources, including the press and interest groups, on the reasons for and consequences of foodborne disease incidents. Active surveillance networks allowing rapid dissemination of molecular subtyping information between public health agencies to detect foodborne outbreaks and limit the spread of human disease. Traceability of individual animals or crops from (or before) conception or germination to the consumer as an integral part of food supply chain management. Provision of high quality, online educational packages to food industry personnel otherwise precluded from access to such courses.
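Predictive modelling software of the kind mentioned above (the Pathogen Modeling Program, Growth Predictor, ComBase) is built on secondary models that relate microbial growth rate to environmental conditions. The sketch below implements one classic example of that class of model, a Ratkowsky square-root model for temperature, with hypothetical parameters; it illustrates the type of calculation such tools perform, not any specific tool's equations or fitted values.

```python
import math

def sqrt_model_growth_rate(temp_c, b=0.023, t_min=-2.0):
    """Ratkowsky square-root model: sqrt(mu) = b * (T - Tmin) for T > Tmin.
    Returns the specific growth rate mu in 1/h; parameters are hypothetical."""
    if temp_c <= t_min:
        return 0.0
    return (b * (temp_c - t_min)) ** 2

def time_to_reach(log10_increase, mu):
    """Hours for a population growing exponentially at rate mu to rise by
    `log10_increase` log10 units."""
    return float("inf") if mu == 0 else log10_increase * math.log(10) / mu

for temp in (4, 10, 25, 37):
    mu = sqrt_model_growth_rate(temp)
    print(f"{temp:>2} degC: mu = {mu:.3f} /h, "
          f"3-log growth in {time_to_reach(3, mu):.1f} h")
```

Embedding a model like this in a database of product formulations and storage temperatures is essentially what allows the expert systems described above to flag, in close to real time, which process changes could let a pathogen grow to unacceptable levels.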
Equine infectious anemia virus in naturally infected horses from the Brazilian Pantanal.
Cursino, Andreia Elisa; Vilela, Ana Paula Pessoa; Franco-Luiz, Ana Paula Moreira; de Oliveira, Jaquelline Germano; Nogueira, Márcia Furlan; Júnior, João Pessoa Araújo; de Aguiar, Daniel Moura; Kroon, Erna Geessien
2018-05-11
Equine infectious anemia (EIA) has a worldwide distribution, and is widespread in Brazil. The Brazilian Pantanal presents a high prevalence, compromising equine performance and, indirectly, the livestock industry, since the horses are used for cattle management. Although EIA is routinely diagnosed by the agar gel immunodiffusion test (AGID), this serological assay has some limitations, so PCR-based detection methods have the potential to overcome these limitations and act as complementary tests to those currently used. Considering the limited number of equine infectious anemia virus (EIAV) sequences available in public databases and the great genome variability, studies of molecular detection and characterization of EIAV remain important. In this study we detected EIAV proviral DNA in 23 peripheral blood mononuclear cell (PBMC) samples of naturally infected horses from the Brazilian Pantanal using a semi-nested PCR (sn-PCR). The serological profile of the animals was also evaluated by AGID and ELISA for gp90 and p26. Furthermore, the EIAV PCR-amplified DNA was sequenced and phylogenetically analyzed. Here we describe the first EIAV sequences of the 5' LTR of the tat gene in naturally infected horses from Brazil, which presented 91% similarity to EIAV reference sequences. The Brazilian EIAV sequences also presented variable nucleotide similarities among themselves, ranging from 93.5% to 100%. Phylogenetic analysis showed that the Brazilian EIAV sequences grouped in a separate clade relative to other reference sequences. Thus this molecular detection and characterization may provide information about EIAV circulation in Brazilian territories and improve phylogenetic inferences.
Naganeeswaran, Sudalaimuthu Asari; Subbian, Elain Apshara; Ramaswamy, Manimekalai
2012-01-01
Phytophthora megakarya, the causative agent of cacao black pod disease in West African countries, causes an extensive loss of yield. In this study we analyzed 4 libraries of ESTs derived from Phytophthora megakarya-infected cocoa leaf and pod tissues. A total of 6379 redundant sequences were retrieved from the ESTtik database, and EST processing was performed using the seqclean tool. Clustering and assembly using CAP3 generated 3333 non-redundant sequences (907 contigs and 2426 singletons). Primary sequence analysis of the 3333 non-redundant sequences showed that the GC percentage was 42.7 and that sequence lengths ranged from 101 to 2576 nucleotides. Further, functional analysis (Blast, InterProScan, Gene Ontology and KEGG search) was performed and 1230 orthologous genes were annotated. In total, 272 enzymes corresponding to 114 metabolic pathways were identified. Functional annotation revealed that most of the sequences are related to molecular function, stress response and biological processes. The annotated enzymes include aldehyde dehydrogenase (E.C. 1.2.1.3), catalase (E.C. 1.11.1.6), acetyl-CoA C-acetyltransferase (E.C. 2.3.1.9), threonine ammonia-lyase (E.C. 4.3.1.19), acetolactate synthase (E.C. 2.2.1.6) and O-methyltransferase (E.C. 2.1.1.68), which play an important role in amino acid biosynthesis and phenylpropanoid biosynthesis. All this information was stored in a MySQL database management system for future reconstruction of the biotic stress response pathway in cocoa.
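The primary sequence summary reported above (GC percentage, length range) is straightforward to reproduce for any set of assembled contigs and singletons. The sketch below is a minimal version assuming a FASTA file of the non-redundant sequences and the availability of Biopython; the file name is hypothetical.

```python
from Bio import SeqIO  # assumes Biopython is installed

def summarize_assembly(fasta_path: str) -> dict:
    """Report sequence count, length range and overall GC% for a FASTA file."""
    lengths, gc_bases, total_bases = [], 0, 0
    for rec in SeqIO.parse(fasta_path, "fasta"):
        seq = str(rec.seq).upper()
        lengths.append(len(seq))
        gc_bases += seq.count("G") + seq.count("C")
        total_bases += len(seq)
    gc_pct = 100.0 * gc_bases / total_bases if total_bases else 0.0
    return {"sequences": len(lengths),
            "min_len": min(lengths), "max_len": max(lengths),
            "gc_percent": round(gc_pct, 1)}

# print(summarize_assembly("nonredundant_ests.fasta"))  # hypothetical file name
```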
GBA manager: an online tool for querying low-complexity regions in proteins.
Bandyopadhyay, Nirmalya; Kahveci, Tamer
2010-01-01
We developed GBA Manager, online software that facilitates the Graph-Based Algorithm (GBA) we proposed in our earlier work. GBA identifies the low-complexity regions (LCRs) of protein sequences. GBA exploits a similarity matrix, such as BLOSUM62, to compute the complexity of the subsequences of the input protein sequence. It uses a graph-based algorithm to accurately compute the regions that have low complexities. GBA Manager is a user-friendly web service that enables online querying of protein sequences using GBA. In addition to the querying capabilities of the existing GBA algorithm, GBA Manager computes the p-values of the LCRs identified. The p-value gives an estimate of the probability that the region appears by chance. GBA Manager presents the output in three different understandable formats. GBA Manager is freely accessible at http://bioinformatics.cise.ufl.edu/GBA/GBA.htm.
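GBA's own similarity-matrix, graph-based scoring is not reproduced here. As a stand-in to make the notion of a low-complexity region concrete, the sketch below flags windows with low Shannon entropy, a simpler criterion than GBA's; window size and cutoff are arbitrary choices for illustration.

```python
import math
from collections import Counter

def low_complexity_windows(seq: str, window: int = 12, max_entropy: float = 2.0):
    """Return (start, end) 0-based windows whose Shannon entropy (bits) is low.

    Generic entropy filter for illustration only; not the graph-based GBA criterion.
    """
    hits = []
    for i in range(len(seq) - window + 1):
        counts = Counter(seq[i:i + window])
        entropy = -sum((c / window) * math.log2(c / window) for c in counts.values())
        if entropy <= max_entropy:
            hits.append((i, i + window))
    return hits

print(low_complexity_windows("MKKKKKKKKAQWESTPLLVNDERTAAAAAAAAGH"))
```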
Chiu, Kuo Ping; Wong, Chee-Hong; Chen, Qiongyu; Ariyaratne, Pramila; Ooi, Hong Sain; Wei, Chia-Lin; Sung, Wing-Kin Ken; Ruan, Yijun
2006-08-25
We recently developed the Paired End diTag (PET) strategy for efficient characterization of mammalian transcriptomes and genomes. The paired-end nature of short PET sequences derived from long DNA fragments raised a new set of bioinformatics challenges, including how to extract PETs from raw sequence reads, and how to map PETs correctly yet efficiently to reference genome sequences. To accommodate and streamline data analysis of the large volume of PET sequences generated from each PET experiment, an automated PET data processing pipeline is desirable. We designed an integrated computation program package, PET-Tool, to automatically process PET sequences and map them to the genome sequences. The Tool was implemented as a web-based application composed of four modules: the Extractor module for PET extraction; the Examiner module for analytic evaluation of PET sequence quality; the Mapper module for locating PET sequences in the genome sequences; and the Project Manager module for data organization. The performance of PET-Tool was evaluated through the analyses of 2.7 million PET sequences. It was demonstrated that PET-Tool is accurate and efficient in extracting PET sequences and removing artifacts from large-volume datasets. Using optimized mapping criteria, over 70% of quality PET sequences were mapped specifically to the genome sequences. With a 2.4 GHz LINUX machine, it takes approximately six hours to process one million PETs from extraction to mapping. The speed, accuracy, and comprehensiveness have proved that PET-Tool is an important and useful component in PET experiments, and it can be extended to accommodate other related analyses of paired-end sequences. The Tool also provides user-friendly functions for data quality checks and a system for multi-layer data management.
Kwarciak, Kamil; Radom, Marcin; Formanowicz, Piotr
2016-04-01
The classical sequencing by hybridization takes into account binary information about sequence composition: a given element from an oligonucleotide library either is or is not a part of the target sequence. However, DNA chip technology has developed to the point where it is possible to obtain partial information about the multiplicity of each oligonucleotide the analyzed sequence consists of. Currently, it is not possible to obtain exact data of this type, but even partial information should be very useful. Two realistic multiplicity information models are taken into consideration in this paper. The first one, called "one and many", assumes that it is possible to learn whether a given oligonucleotide occurs in a reconstructed sequence once or more than once. According to the second model, called "one, two and many", one is able to learn from the biochemical experiment whether a given oligonucleotide is present in an analyzed sequence once, twice or at least three times. An ant colony optimization algorithm has been implemented to verify the above models and to compare them with existing algorithms for sequencing by hybridization which utilize the additional information. The proposed algorithm solves the problem with any kind of hybridization errors. Computational experiment results confirm that using even the partial information about multiplicity leads to increased quality of reconstructed sequences. Moreover, they also show that the more precise model yields better solutions and that the ant colony optimization algorithm outperforms the existing ones. Test data sets and the proposed ant colony optimization algorithm are available at: http://bioserver.cs.put.poznan.pl/download/ACO4mSBH.zip. Copyright © 2016 Elsevier Ltd. All rights reserved.
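The two multiplicity models can be made concrete by building an l-mer spectrum and collapsing the exact counts into the categories "one/many" or "one/two/many". The sketch below covers only that spectrum step, not the ant colony reconstruction, and the oligonucleotide length is an arbitrary choice.

```python
from collections import Counter

def spectrum_with_multiplicity(seq: str, l: int = 4, model: str = "one_two_many"):
    """Collapse exact l-mer counts into the partial multiplicity categories
    of the two models described above ('one_many' or 'one_two_many')."""
    counts = Counter(seq[i:i + l] for i in range(len(seq) - l + 1))
    collapsed = {}
    for oligo, n in counts.items():
        if model == "one_many":
            collapsed[oligo] = "one" if n == 1 else "many"
        else:  # "one_two_many"
            collapsed[oligo] = {1: "one", 2: "two"}.get(n, "many")
    return collapsed

print(spectrum_with_multiplicity("ACGTACGTACGAAC", l=3))
```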
Electronic data generation and display system
NASA Technical Reports Server (NTRS)
Wetekamm, Jules
1988-01-01
The Electronic Data Generation and Display System (EDGADS) is a field-tested paperless technical manual system. The authoring system provides subject matter experts the option of developing procedureware from digital or hardcopy inputs of technical information from text, graphics, pictures, and recorded media (video, audio, etc.). The display system provides multi-window presentations of graphics, pictures, animations, and action sequences with text and audio overlays on high-resolution color CRT and monochrome portable displays. The database management system allows direct access via hierarchical menus, keyword name, ID number, voice command, or touch of a screen pictorial of the item (ICON). It contains operations and maintenance technical information at three levels of intelligence for a total system.
AnaBench: a Web/CORBA-based workbench for biomolecular sequence analysis
Badidi, Elarbi; De Sousa, Cristina; Lang, B Franz; Burger, Gertraud
2003-01-01
Background Sequence data analyses such as gene identification, structure modeling or phylogenetic tree inference involve a variety of bioinformatics software tools. Due to the heterogeneity of bioinformatics tools in usage and data requirements, scientists spend much effort on technical issues including data format, storage and management of input and output, and memorization of numerous parameters and multi-step analysis procedures. Results In this paper, we present the design and implementation of AnaBench, an interactive, Web-based bioinformatics Analysis workBench allowing streamlined data analysis. Our philosophy was to minimize the technical effort not only for the scientist who uses this environment to analyze data, but also for the administrator who manages and maintains the workbench. With new bioinformatics tools published daily, AnaBench permits easy incorporation of additional tools. This flexibility is achieved by employing a three-tier distributed architecture and recent technologies including CORBA middleware, Java, JDBC, and JSP. A CORBA server permits transparent access to a workbench management database, which stores information about the users, their data, as well as the description of all bioinformatics applications that can be launched from the workbench. Conclusion AnaBench is an efficient and intuitive interactive bioinformatics environment, which offers scientists application-driven, data-driven and protocol-driven analysis approaches. The prototype of AnaBench, managed by a team at the Université de Montréal, is accessible on-line at: . Please contact the authors for details about setting up a local-network AnaBench site elsewhere. PMID:14678565
Collaborative Effort for a Centralized Worldwide Tuberculosis Relational Sequencing Data Platform.
Starks, Angela M; Avilés, Enrique; Cirillo, Daniela M; Denkinger, Claudia M; Dolinger, David L; Emerson, Claudia; Gallarda, Jim; Hanna, Debra; Kim, Peter S; Liwski, Richard; Miotto, Paolo; Schito, Marco; Zignol, Matteo
2015-10-15
Continued progress in addressing challenges associated with detection and management of tuberculosis requires new diagnostic tools. These tools must be able to provide rapid and accurate information for detecting resistance to guide selection of the treatment regimen for each patient. To achieve this goal, globally representative genotypic, phenotypic, and clinical data are needed in a standardized and curated data platform. A global partnership of academic institutions, public health agencies, and nongovernmental organizations has been established to develop a tuberculosis relational sequencing data platform (ReSeqTB) that seeks to increase understanding of the genetic basis of resistance by correlating molecular data with results from drug susceptibility testing and, optimally, associated patient outcomes. These data will inform development of new diagnostics, facilitate clinical decision making, and improve surveillance for drug resistance. ReSeqTB offers an opportunity for collaboration to achieve improved patient outcomes and to advance efforts to prevent and control this devastating disease. Published by Oxford University Press on behalf of the Infectious Diseases Society of America 2015. This work is written by (a) US Government employee(s) and is in the public domain in the US.
Droc, Gaëtan; Larivière, Delphine; Guignon, Valentin; Yahiaoui, Nabila; This, Dominique; Garsmeur, Olivier; Dereeper, Alexis; Hamelin, Chantal; Argout, Xavier; Dufayard, Jean-François; Lengelle, Juliette; Baurens, Franc-Christophe; Cenci, Alberto; Pitollat, Bertrand; D’Hont, Angélique; Ruiz, Manuel; Rouard, Mathieu; Bocs, Stéphanie
2013-01-01
Banana is one of the world's favorite fruits and one of the most important crops for developing countries. The banana reference genome sequence (Musa acuminata) was recently released. Given the taxonomic position of Musa, the completed genomic sequence has particular comparative value to provide fresh insights about the evolution of the monocotyledons. The study of the banana genome has been enhanced by a number of tools and resources that allow harnessing its sequence. First, we set up essential tools such as a Community Annotation System, phylogenomics resources and metabolic pathways. Then, to support post-genomic efforts, we improved existing banana systems (e.g. web front end, query builder), we integrated available Musa data into generic systems (e.g. markers and genetic maps, synteny blocks), we made other existing systems containing Musa data (e.g. transcriptomics, rice reference genome, workflow manager) interoperable with the banana hub, and finally we generated new results from sequence analyses (e.g. SNP and polymorphism analysis). Several use cases illustrate how the Banana Genome Hub can be used to study gene families. Overall, with this collaborative effort, we discuss the importance of interoperability toward data integration between existing information systems. Database URL: http://banana-genome.cirad.fr/ PMID:23707967
The Impact of Normalization Methods on RNA-Seq Data Analysis
Zyprych-Walczak, J.; Szabelska, A.; Handschuh, L.; Górczak, K.; Klamecka, K.; Figlerowicz, M.; Siatkowski, I.
2015-01-01
High-throughput sequencing technologies, such as the Illumina Hi-seq, are powerful new tools for investigating a wide range of biological and medical problems. Massive and complex data sets produced by the sequencers create a need for development of statistical and computational methods that can tackle the analysis and management of data. The data normalization is one of the most crucial steps of data processing and this process must be carefully considered as it has a profound effect on the results of the analysis. In this work, we focus on a comprehensive comparison of five normalization methods related to sequencing depth, widely used for transcriptome sequencing (RNA-seq) data, and their impact on the results of gene expression analysis. Based on this study, we suggest a universal workflow that can be applied for the selection of the optimal normalization procedure for any particular data set. The described workflow includes calculation of the bias and variance values for the control genes, sensitivity and specificity of the methods, and classification errors as well as generation of the diagnostic plots. Combining the above information facilitates the selection of the most appropriate normalization method for the studied data sets and determines which methods can be used interchangeably. PMID:26176014
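As a worked illustration of depth-related scaling (not necessarily the five methods compared in the paper), the sketch below computes total-count, upper-quartile and DESeq-style median-of-ratios size factors for a toy count matrix; the numbers are invented.

```python
import numpy as np

def size_factors(counts: np.ndarray):
    """counts: genes x samples matrix of raw read counts.

    Returns three common depth-related scaling factors per sample:
    total-count, upper-quartile, and median-of-ratios (DESeq-style).
    """
    total = counts.sum(axis=0) / counts.sum(axis=0).mean()
    upper_q = np.percentile(counts, 75, axis=0)
    upper_q = upper_q / upper_q.mean()
    expressed = counts[(counts > 0).all(axis=1)]          # genes with no zero count
    geo_mean = np.exp(np.log(expressed).mean(axis=1))     # per-gene geometric mean
    median_of_ratios = np.median(expressed / geo_mean[:, None], axis=0)
    return total, upper_q, median_of_ratios

toy = np.array([[100, 200, 50], [30, 60, 15], [10, 25, 5], [500, 1000, 250]])
for name, f in zip(("total", "upper-quartile", "median-of-ratios"), size_factors(toy)):
    print(name, np.round(f, 2))
```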
Morphological and molecular characterization of fungal pathogen, Magnaphorthe oryzae
NASA Astrophysics Data System (ADS)
Hasan, Nor'Aishah; Rafii, Mohd Y.; Rahim, Harun A.; Ali, Nusaibah Syd; Mazlan, Norida; Abdullah, Shamsiah
2016-02-01
Rice is arguably among the most crucial food crops, supplying a quarter of calorie intake. The fungal pathogen Magnaphorthe oryzae causes blast disease on gramineous hosts including rice species. This disease has spurred outbreaks and poses a constant threat to cereal production, with global rice yield declining by almost 10-30%, including in Malaysia. As Magnaphorthe oryzae and its host are a model in plant disease studies, the rice blast pathosystem has been the subject of intense interest owing to the importance of the disease to world agriculture. Therefore, in this study, our prime objective was to isolate samples of Magnaphorthe oryzae from diseased leaves obtained from MARDI Seberang Perai, Penang, Malaysia. Molecular identification was performed by sequence analysis of the internal transcribed spacer (ITS) region of the nuclear ribosomal RNA genes. The phylogenetic affiliation of the isolated samples was analyzed by comparing the ITS sequences with those deposited in the GenBank database. The sequence of the isolate demonstrated at least 99% nucleotide identity with the corresponding sequence in GenBank for Magnaphorthe oryzae. Morphological observation under the microscope demonstrated that the structure of the conidia showed characteristics similar to M. oryzae. The findings of this study provide useful information for breeding programs, epidemiology studies and improved disease management.
Morphological and molecular characterization of fungal pathogen, Magnaphorthe oryzae
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hasan, Nor’Aishah, E-mail: aishahnh@ns.uitm.edu.my; Rafii, Mohd Y., E-mail: mrafii@upm.edu.my; Department of Crop Science, Universiti Putra Malaysia
2016-02-01
Rice is arguably among the most crucial food crops, supplying a quarter of calorie intake. The fungal pathogen Magnaphorthe oryzae causes blast disease on gramineous hosts including rice species. This disease has spurred outbreaks and poses a constant threat to cereal production, with global rice yield declining by almost 10-30%, including in Malaysia. As Magnaphorthe oryzae and its host are a model in plant disease studies, the rice blast pathosystem has been the subject of intense interest owing to the importance of the disease to world agriculture. Therefore, in this study, our prime objective was to isolate samples of Magnaphorthe oryzae from diseased leaves obtained from MARDI Seberang Perai, Penang, Malaysia. Molecular identification was performed by sequence analysis of the internal transcribed spacer (ITS) region of the nuclear ribosomal RNA genes. The phylogenetic affiliation of the isolated samples was analyzed by comparing the ITS sequences with those deposited in the GenBank database. The sequence of the isolate demonstrated at least 99% nucleotide identity with the corresponding sequence in GenBank for Magnaphorthe oryzae. Morphological observation under the microscope demonstrated that the structure of the conidia showed characteristics similar to M. oryzae. The findings of this study provide useful information for breeding programs, epidemiology studies and improved disease management.
Lapunzina, Pablo; López, Rocío Ortiz; Rodríguez-Laguna, Lara; García-Miguel, Purificación; Martínez, Augusto Rojas; Martínez-Glez, Víctor
2014-01-01
The increased speed and decreasing cost of sequencing, along with an understanding of the clinical relevance of emerging information for patient management, has led to an explosion of potential applications in healthcare. Currently, SNP arrays and Next-Generation Sequencing (NGS) technologies are relatively new techniques used to scan genomes for gains and losses, losses of heterozygosity (LOH), SNPs, and indel variants as well as to perform complete sequencing of a panel of candidate genes, the entire exome (whole exome sequencing) or even the whole genome. As a result, these new high-throughput technologies have facilitated progress in the understanding and diagnosis of genetic syndromes and cancers, two disorders traditionally considered to be separate diseases but that can share causal genetic alterations in a group of developmental disorders associated with congenital malformations and cancer risk. The purpose of this work is to review these syndromes as an example of a group of disorders that has been included in a panel of genes for NGS analysis. We also highlight the relationship between development and cancer and underline the connections between these syndromes. PMID:24764758
Jones, Darryl R; Thomas, Dallas; Alger, Nicholas; Ghavidel, Ata; Inglis, G Douglas; Abbott, D Wade
2018-01-01
Deposition of new genetic sequences in online databases is expanding at an unprecedented rate. As a result, sequence identification continues to outpace functional characterization of carbohydrate active enzymes (CAZymes). In this paradigm, the discovery of enzymes with novel functions is often hindered by high volumes of uncharacterized sequences, particularly when the enzyme sequence belongs to a family that exhibits diverse functional specificities (i.e., polyspecificity). Therefore, to direct sequence-based discovery and characterization of new enzyme activities we have developed an automated in silico pipeline entitled: Sequence Analysis and Clustering of CarboHydrate Active enzymes for Rapid Informed prediction of Specificity (SACCHARIS). This pipeline streamlines the selection of uncharacterized sequences for discovery of new CAZyme or CBM specificity from families currently maintained on the CAZy website or within user-defined datasets. SACCHARIS was used to generate a phylogenetic tree of GH43, a CAZyme family with defined subfamily designations. This analysis confirmed that large datasets can be organized into sequence clusters of manageable sizes that possess related functions. Seeding this tree with a GH43 sequence from Bacteroides dorei DSM 17855 (BdGH43b) revealed that it partitioned as a single sequence within the tree. This pattern was consistent with it possessing a unique enzyme activity for GH43, as BdGH43b is the first α-glucanase described for this family. The capacity of SACCHARIS to extract and cluster characterized carbohydrate binding module sequences was demonstrated using family 6 CBMs (i.e., CBM6s). This CBM family displays a polyspecific ligand binding profile and contains many structurally determined members. Using SACCHARIS to identify a cluster of divergent sequences, a CBM6 sequence from a unique clade was demonstrated to bind yeast mannan, which represents the first description of an α-mannan binding CBM. Additionally, we have performed a CAZome analysis of an in-house sequenced bacterial genome and a comparative analysis of B. thetaiotaomicron VPI-5482 and B. thetaiotaomicron 7330, to demonstrate that SACCHARIS can generate "CAZome fingerprints", which differentiate between the saccharolytic potential of two related strains in silico. Establishing sequence-function and sequence-structure relationships in polyspecific CAZyme families is a promising approach for streamlining enzyme discovery. SACCHARIS facilitates this process by embedding CAZyme and CBM family trees generated from biochemically or structurally characterized sequences with protein sequences that have unknown functions. In addition, these trees can be integrated with user-defined datasets (e.g., genomics, metagenomics, and transcriptomics) to inform experimental characterization of new CAZymes or CBMs not currently curated, and to let researchers compare differential sequence patterns between entire CAZomes. In this light, SACCHARIS provides an in silico tool that can be tailored for enzyme bioprospecting in datasets of increasing complexity and for diverse applications in glycobiotechnology.
Oncologist use and perception of large panel next-generation tumor sequencing.
Schram, A M; Reales, D; Galle, J; Cambria, R; Durany, R; Feldman, D; Sherman, E; Rosenberg, J; D'Andrea, G; Baxi, S; Janjigian, Y; Tap, W; Dickler, M; Baselga, J; Taylor, B S; Chakravarty, D; Gao, J; Schultz, N; Solit, D B; Berger, M F; Hyman, D M
2017-09-01
Genomic profiling is increasingly incorporated into oncology research and the clinical care of cancer patients. We sought to determine physician perception and use of enterprise-scale clinical sequencing at our center, including whether testing changed management and the reasoning behind this decision-making. All physicians who consented patients to MSK-IMPACT, a next-generation hybridization capture assay, in tumor types where molecular profiling is not routinely performed were asked to complete a questionnaire for each patient. Physician determination of genomic 'actionability' was compared to an expertly curated knowledgebase of somatic variants. Reported management decisions were compared to chart review. Responses were received from 146 physicians pertaining to 1932 patients diagnosed with 1 of 49 cancer types. Physicians indicated that sequencing altered management in 21% (331/1593) of patients in need of a treatment change. Among those in whom treatment was not altered, physicians indicated the presence of an actionable alteration in 55% (805/1474), however, only 45% (362/805) of these cases had a genomic variant annotated as actionable by expert curators. Further evaluation of these patients revealed that 66% (291/443) had a variant in a gene associated with biologic but not clinical evidence of actionability or a variant of unknown significance in a gene with at least one known actionable alteration. Of the cases annotated as actionable by experts, physicians identified an actionable alteration in 81% (362/445). In total, 13% (245/1932) of patients were enrolled to a genomically matched trial. Although physician and expert assessment differed, clinicians demonstrate substantial awareness of the genes associated with potential actionability and report using this knowledge to inform management in one in five patients. NCT01775072. © The Author 2017. Published by Oxford University Press on behalf of the European Society for Medical Oncology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
NASA Technical Reports Server (NTRS)
1979-01-01
The functions performed by the systems management (SM) application software are described along with the design employed to accomplish these functions. The operational sequences (OPS) control segments and the cyclic processes they control are defined. The SM specialist function control (SPEC) segments and the display controlled 'on-demand' processes that are invoked by either an OPS or SPEC control segment as a direct result of an item entry to a display are included. Each processing element in the SM application is described including an input/output table and a structured control flow diagram. The flow through the module and other information pertinent to that process and its interfaces to other processes are included.
Yilmaz, Pelin; Kottmann, Renzo; Field, Dawn; Knight, Rob; Cole, James R; Amaral-Zettler, Linda; Gilbert, Jack A; Karsch-Mizrachi, Ilene; Johnston, Anjanette; Cochrane, Guy; Vaughan, Robert; Hunter, Christopher; Park, Joonhong; Morrison, Norman; Rocca-Serra, Philippe; Sterk, Peter; Arumugam, Manimozhiyan; Bailey, Mark; Baumgartner, Laura; Birren, Bruce W; Blaser, Martin J; Bonazzi, Vivien; Booth, Tim; Bork, Peer; Bushman, Frederic D; Buttigieg, Pier Luigi; Chain, Patrick S G; Charlson, Emily; Costello, Elizabeth K; Huot-Creasy, Heather; Dawyndt, Peter; DeSantis, Todd; Fierer, Noah; Fuhrman, Jed A; Gallery, Rachel E; Gevers, Dirk; Gibbs, Richard A; Gil, Inigo San; Gonzalez, Antonio; Gordon, Jeffrey I; Guralnick, Robert; Hankeln, Wolfgang; Highlander, Sarah; Hugenholtz, Philip; Jansson, Janet; Kau, Andrew L; Kelley, Scott T; Kennedy, Jerry; Knights, Dan; Koren, Omry; Kuczynski, Justin; Kyrpides, Nikos; Larsen, Robert; Lauber, Christian L; Legg, Teresa; Ley, Ruth E; Lozupone, Catherine A; Ludwig, Wolfgang; Lyons, Donna; Maguire, Eamonn; Methé, Barbara A; Meyer, Folker; Muegge, Brian; Nakielny, Sara; Nelson, Karen E; Nemergut, Diana; Neufeld, Josh D; Newbold, Lindsay K; Oliver, Anna E; Pace, Norman R; Palanisamy, Giriprakash; Peplies, Jörg; Petrosino, Joseph; Proctor, Lita; Pruesse, Elmar; Quast, Christian; Raes, Jeroen; Ratnasingham, Sujeevan; Ravel, Jacques; Relman, David A; Assunta-Sansone, Susanna; Schloss, Patrick D; Schriml, Lynn; Sinha, Rohini; Smith, Michelle I; Sodergren, Erica; Spor, Aymé; Stombaugh, Jesse; Tiedje, James M; Ward, Doyle V; Weinstock, George M; Wendel, Doug; White, Owen; Whiteley, Andrew; Wilke, Andreas; Wortman, Jennifer R; Yatsunenko, Tanya; Glöckner, Frank Oliver
2012-01-01
Here we present a standard developed by the Genomic Standards Consortium (GSC) for reporting marker gene sequences—the minimum information about a marker gene sequence (MIMARKS). We also introduce a system for describing the environment from which a biological sample originates. The ‘environmental packages’ apply to any genome sequence of known origin and can be used in combination with MIMARKS and other GSC checklists. Finally, to establish a unified standard for describing sequence data and to provide a single point of entry for the scientific community to access and learn about GSC checklists, we present the minimum information about any (x) sequence (MIxS). Adoption of MIxS will enhance our ability to analyze natural genetic diversity documented by massive DNA sequencing efforts from myriad ecosystems in our ever-changing biosphere. PMID:21552244
SNPchiMp: a database to disentangle the SNPchip jungle in bovine livestock.
Nicolazzi, Ezequiel Luis; Picciolini, Matteo; Strozzi, Francesco; Schnabel, Robert David; Lawley, Cindy; Pirani, Ali; Brew, Fiona; Stella, Alessandra
2014-02-11
Currently, six commercial whole-genome SNP chips are available for cattle genotyping, produced by two different genotyping platforms. Technical issues need to be addressed to combine data that originate from the different platforms, or from different versions of the same array generated by the manufacturer. For example: i) genome coordinates for SNPs may refer to different genome assemblies; ii) reference genome sequences are updated over time, changing the positions, or even removing sequences which contain SNPs; iii) not all commercial SNP IDs are searchable within public databases; iv) SNPs can be coded using different formats and referencing different strands (e.g. A/B or A/C/T/G alleles, referencing forward/reverse, top/bottom or plus/minus strand); v) due to new information being discovered, higher density chips do not necessarily include all the SNPs present in the lower density chips; and vi) SNP IDs may not be consistent across chips and platforms. Most researchers and breed associations manage SNP data in real time and thus require tools to standardise data in a user-friendly manner. Here we present SNPchiMp, a MySQL database linked to an open access web-based interface. Features of this interface include, but are not limited to, the following functions: 1) referencing the SNP mapping information to the latest genome assembly, 2) extraction of information contained in dbSNP for SNPs present in all commercially available bovine chips, and 3) identification of SNPs in common between two or more bovine chips (e.g. for SNP imputation from lower to higher density). In addition, SNPchiMp can retrieve this information on subsets of SNPs, accessing such data either via physical position on a supported assembly, or by a list of SNP IDs, rs or ss identifiers. This tool combines many different sources of information that otherwise are time-consuming to obtain and difficult to integrate. SNPchiMp not only provides the information in a user-friendly format, but also enables researchers to perform a large number of operations with a few clicks of the mouse. This significantly reduces the time needed to execute the large number of operations required to manage SNP data.
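One of the listed operations, identifying SNPs in common between two chips (e.g. for imputation from lower to higher density), reduces to a keyed intersection once identifiers have been standardized, which is exactly the harmonization SNPchiMp performs. The sketch below assumes each chip manifest has already been mapped to rs identifiers on a common assembly; the toy data are invented.

```python
def shared_snps(chip_a: dict, chip_b: dict):
    """chip_a / chip_b: mappings of rs identifier -> (chromosome, position)
    on a common assembly. Returns rs IDs present on both chips, sorted by position."""
    common = set(chip_a) & set(chip_b)
    return sorted(common, key=lambda rs: chip_a[rs])

low_density = {"rs1": ("1", 10500), "rs7": ("1", 98000), "rs42": ("2", 3100)}
high_density = {"rs1": ("1", 10500), "rs9": ("1", 50000), "rs42": ("2", 3100)}
print(shared_snps(low_density, high_density))  # -> ['rs1', 'rs42']
```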
Ye, Zhan; Kadolph, Christopher; Strenn, Robert; Wall, Daniel; McPherson, Elizabeth; Lin, Simon
2015-01-01
Background Identification and evaluation of incidental findings in patients following whole exome sequencing (WES) or whole genome sequencing (WGS) is challenging for both practicing physicians and researchers. The American College of Medical Genetics and Genomics (ACMG) recently recommended a list of reportable incidental genetic findings. However, no informatics tools are currently available to support evaluation of incidental findings in next-generation sequencing data. Methods The Wisconsin Hierarchical Analysis Tool for Incidental Findings (WHATIF) was developed as a stand-alone Windows-based desktop executable to support the interactive analysis of incidental findings in the context of the ACMG recommendations. WHATIF integrates the European Bioinformatics Institute Variant Effect Predictor (VEP) tool for biological interpretation and the National Center for Biotechnology Information ClinVar tool for clinical interpretation. Results An open-source desktop program was created to annotate incidental findings and present the results with a user-friendly interface. Further, a meaningful index (WHATIF Index) was devised for each gene to facilitate ranking of the relative importance of the variants and to estimate the potential workload associated with further evaluation of the variants. Our WHATIF application is available at: http://tinyurl.com/WHATIF-SOFTWARE Conclusions The WHATIF application offers a user-friendly interface and allows users to investigate the extracted variant information efficiently and intuitively while always accessing up-to-date information on variants via application programming interface (API) connections. WHATIF's highly flexible design and straightforward implementation aid users in customizing the source code to meet their own special needs. PMID:25890833
Information capacity of nucleotide sequences and its applications.
Sadovsky, M G
2006-05-01
The information capacity of nucleotide sequences is defined through the specific entropy of the frequency dictionary of a sequence, determined with respect to another dictionary containing the most probable continuations of shorter strings. This measure distinguishes a sequence both from a random one and from a completely ordered one. A comparison of sequences based on their information capacity is studied. An order within the genetic entities is found at length scales ranging from 3 to 8. Some other applications of the developed methodology to genetics, bioinformatics, and molecular biology are discussed.
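The idea can be sketched as follows: count k-mer frequencies, reconstruct the "most probable continuation" frequencies from the (k-1)-mer dictionary, and take the relative-entropy sum over observed words. The reconstruction below uses the standard maximum-entropy continuation formula f~(w) = f(prefix)·f(suffix)/f(core); whether this matches the paper's exact definition is an assumption, so treat the sketch as an approximation of the concept.

```python
import math
from collections import Counter

def word_freqs(seq: str, k: int):
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def information_capacity(seq: str, k: int = 3) -> float:
    """Specific entropy of the k-mer dictionary relative to frequencies
    reconstructed from (k-1)-mers (maximum-entropy continuation, assumed form)."""
    f_k = word_freqs(seq, k)
    f_km1 = word_freqs(seq, k - 1)
    f_km2 = word_freqs(seq, k - 2) if k >= 3 else None
    s = 0.0
    for w, f in f_k.items():
        if f_km2:
            expected = f_km1[w[:-1]] * f_km1[w[1:]] / f_km2[w[1:-1]]
        else:
            expected = f_km1[w[:-1]] * f_km1[w[1:]]
        s += f * math.log(f / expected)
    return s

print(round(information_capacity("ACGTACGTTTACGTACGAACGT", k=3), 4))
```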
NASA Astrophysics Data System (ADS)
Han, Zhaofang; Xiao, Shijun; Liu, Xiande; Liu, Yang; Li, Jiakai; Xie, Yangjie; Wang, Zhiyong
2017-03-01
The large yellow croaker, Larimichthys crocea, is an important marine fish in China with a high economic value. In the last decade, the stock conservation and aquaculture industry of this species have been facing severe challenges because of wild population collapse and degeneration of important economic traits. However, genes contributing to growth and immunity in L. crocea have not been thoroughly analyzed, and available molecular markers are still not sufficient for genetic resource management and molecular selection. In this work, we sequenced the transcriptome of L. crocea liver tissue with a Roche 454 sequencing platform and assembled the transcriptome into 93 801 transcripts. Of them, 38 856 transcripts were successfully annotated in the nt, nr, Swiss-Prot, InterPro, COG, GO and KEGG databases. Based on the annotation information, 3 165 unigenes related to growth and immunity were identified. Additionally, a total of 6 391 simple sequence repeats (SSRs) were identified from the transcriptome, among which 4 498 SSRs had sufficient flanking regions to design primers for polymerase chain reactions (PCR). To assess the polymorphism of these markers, 30 primer pairs were randomly selected for PCR amplification and validation in 30 individuals, and 12 primer pairs (40.0%) exhibited obvious length polymorphisms. This work applied RNA-Seq to assemble and analyze a liver transcriptome in L. crocea. With the gene annotation and sequence information, genes related to growth and immunity were identified and numerous SSR markers were developed, providing valuable genetic resources for future gene functional analysis and selective breeding of L. crocea.
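SSR mining of this kind can be illustrated with a minimal regex-based scan for perfect tandem repeats. The repeat-unit length (2-6 bp) and the minimum of five copies below are arbitrary thresholds; the tool actually used in the study is not specified here.

```python
import re

SSR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{4,}")  # unit of 2-6 bp repeated >= 5 times

def find_ssrs(seq: str):
    """Return (motif, copy_number, start, end) for perfect tandem repeats."""
    ssrs = []
    for m in SSR_PATTERN.finditer(seq.upper()):
        unit = m.group(1)
        copies = (m.end() - m.start()) // len(unit)
        ssrs.append((unit, copies, m.start(), m.end()))
    return ssrs

print(find_ssrs("ttgacacacacacacagg" + "AT" * 3 + "gagagagagagaTTC"))
```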
Li, Zhao-Qun; Zhang, Shuai; Ma, Yan; Luo, Jun-Yu; Wang, Chun-Yi; Lv, Li-Min; Dong, Shuang-Lin; Cui, Jin-Jie
2013-01-01
Background Chrysopa pallens (Rambur) is one of the most important natural enemies and predators of various agricultural pests. Understanding the sophisticated olfactory system in insect antennae is crucial for studying the physiological bases of olfaction and could also lead to effective applications of C. pallens in integrated pest management. However, no transcriptome information is available for Neuroptera, and sequence data for C. pallens are scarce, so obtaining more sequence data is a priority for researchers on this species. Results To facilitate identifying sets of genes involved in olfaction, a normalized transcriptome of C. pallens was sequenced. A total of 104,603 contigs were obtained and assembled into 10,662 clusters and 39,734 singletons; 20,524 were annotated based on BLASTX analyses. A large number of candidate chemosensory genes were identified, including 14 odorant-binding proteins (OBPs), 22 chemosensory proteins (CSPs), 16 ionotropic receptors, 14 odorant receptors, and genes potentially involved in olfactory modulation. To better understand the OBPs, CSPs and cytochrome P450s, phylogenetic trees were constructed. In addition, 10 digital gene expression libraries of different tissues were constructed and gene expression profiles were compared among different tissues in males and females. Conclusions Our results provide a basis for exploring the mechanisms of chemoreception in C. pallens, as well as other insects. The evolutionary analyses in our study provide new insights into the differentiation and evolution of insect OBPs and CSPs. Our study provides large-scale sequence information for further studies in C. pallens. PMID:23826220
Brorsson, Anna; Öhman, Annika; Lundberg, Stefan; Nygård, Louise
2016-09-01
The aim of the study was to identify problematic situations in using zebra crossings. They were identified from photo documentation comprising film sequences and the perspectives of people with dementia. The aim was also to identify how they would understand, interpret and act in these problematic situations based on their previous experiences and linked to the film sequences. A qualitative grounded theory approach was used. Film sequences from five zebra crossings were analysed. The same film sequences were used as triggers in two focus group interviews with persons with dementia. Individual interviews with three informants were also performed. The core category, the hazard of meeting unfolding problematic traffic situations when only one layer at a time can be kept in focus, showed how a problematic situation as a whole consisted of different layers of problematic situations. The first category, adding layers of problematic traffic situations to each other, was characterized by the informants' creation of a problematic situation as a whole. The different layers were described in the subcategories of layout of streets and zebra crossings, weather conditions, vehicles and crowding of pedestrians. The second category, actions used to meet different layers of problematic traffic situations, was characterized by avoiding problematic situations, using traffic lights as reminders and security precautions, following the flow at the zebra crossing and being cautious pedestrians. In conclusion, as community-dwelling people with dementia commonly are pedestrians, it is important that health care professionals and caregivers take their experiences and management of problematic traffic situations into account when providing support. © The Author(s) 2014.
Cui, Xuefeng; Lu, Zhiwu; Wang, Sheng; Jing-Yan Wang, Jim; Gao, Xin
2016-06-15
Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades in sequence alignment, threading and alignment-free methods, protein homology detection remains a challenging open problem. Recently, network methods that try to find transitive paths in the protein structure space have demonstrated the importance of incorporating network information of the structure space. Yet, current methods merge the sequence space and the structure space into a single space, and thus introduce inconsistency in combining different sources of information. We present a novel network-based protein homology detection method, CMsearch, based on cross-modal learning. Instead of exploring a single network built from a mixture of sequence and structure space information, CMsearch builds two separate networks to represent the sequence space and the structure space. It then learns sequence-structure correlation by simultaneously taking sequence information, structure information, sequence space information and structure space information into consideration. We tested CMsearch on two challenging tasks, protein homology detection and protein structure prediction, by querying all 8332 PDB40 proteins. Our results demonstrate that CMsearch is insensitive to the similarity metrics used to define the sequence and the structure spaces. By using HMM-HMM alignment as the sequence similarity metric, CMsearch clearly outperforms state-of-the-art homology detection methods and the CASP-winning template-based protein structure prediction methods. Our program is freely available for download from http://sfb.kaust.edu.sa/Pages/Software.aspx. Contact: xin.gao@kaust.edu.sa. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Visual management of large scale data mining projects.
Shah, I; Hunter, L
2000-01-01
This paper describes a unified framework for visualizing the preparations for, and results of, hundreds of machine learning experiments. These experiments were designed to improve the accuracy of enzyme functional predictions from sequence, and in many cases were successful. Our system provides graphical user interfaces for defining and exploring training datasets and various representational alternatives, for inspecting the hypotheses induced by various types of learning algorithms, for visualizing the global results, and for inspecting in detail results for specific training sets (functions) and examples (proteins). The visualization tools serve as a navigational aid through a large amount of sequence data and induced knowledge. They provided significant help in understanding both the significance and the underlying biological explanations of our successes and failures. Using these visualizations it was possible to efficiently identify weaknesses of the modular sequence representations and induction algorithms which suggest better learning strategies. The context in which our data mining visualization toolkit was developed was the problem of accurately predicting enzyme function from protein sequence data. Previous work demonstrated that approximately 6% of enzyme protein sequences are likely to be assigned incorrect functions on the basis of sequence similarity alone. In order to test the hypothesis that more detailed sequence analysis using machine learning techniques and modular domain representations could address many of these failures, we designed a series of more than 250 experiments using information-theoretic decision tree induction and naive Bayesian learning on local sequence domain representations of problematic enzyme function classes. In more than half of these cases, our methods were able to perfectly discriminate among various possible functions of similar sequences. We developed and tested our visualization techniques on this application.
Sharma, Aseem; Chatterjee, Arindam; Goyal, Manu; Parsons, Matthew S; Bartel, Seth
2015-04-01
Targeting redundancy within MRI can improve its cost-effective utilization. We sought to quantify potential redundancy in our brain MRI protocols. In this retrospective review, we aggregated 207 consecutive adults who underwent brain MRI and reviewed their medical records to document clinical indication, core diagnostic information provided by MRI, and its clinical impact. Contributory imaging abnormalities constituted positive core diagnostic information whereas absence of imaging abnormalities constituted negative core diagnostic information. The senior author selected core sequences deemed sufficient for extraction of core diagnostic information. For validating core sequences selection, four readers assessed the relative ease of extracting core diagnostic information from the core sequences. Potential redundancy was calculated by comparing the average number of core sequences to the average number of sequences obtained. Scanning had been performed using 9.4±2.8 sequences over 37.3±12.3 minutes. Core diagnostic information was deemed extractable from 2.1±1.1 core sequences, with an assumed scanning time of 8.6±4.8 minutes, reflecting a potential redundancy of 74.5%±19.1%. Potential redundancy was least in scans obtained for treatment planning (14.9%±25.7%) and highest in scans obtained for follow-up of benign diseases (81.4%±12.6%). In 97.4% of cases, all four readers considered core diagnostic information to be either easily extractable from core sequences or the ease to be equivalent to that from the entire study. With only one MRI lacking clinical impact (0.48%), overutilization did not seem to contribute to potential redundancy. High potential redundancy that can be targeted for more efficient scanner utilization exists in brain MRI protocols.
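The redundancy figure above is an average of per-case values, 1 - (core sequences / sequences obtained), rather than a ratio of the two averages. The toy numbers below are invented purely to show the calculation and are not the study data.

```python
def per_case_redundancy(core: int, obtained: int) -> float:
    """Potential redundancy for one scan: fraction of acquired sequences
    beyond those deemed sufficient for the core diagnostic information."""
    return 1.0 - core / obtained

cases = [(2, 9), (1, 8), (4, 12), (2, 6)]   # (core, obtained) pairs - invented examples
values = [per_case_redundancy(c, o) for c, o in cases]
print(round(100 * sum(values) / len(values), 1), "% mean potential redundancy")
```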
SORTEZ: a relational translator for NCBI's ASN.1 database.
Hart, K W; Searls, D B; Overton, G C
1994-07-01
The National Center for Biotechnology Information (NCBI) has created a database collection that includes several protein and nucleic acid sequence databases, a biosequence-specific subset of MEDLINE, as well as value-added information such as links between similar sequences. Information in the NCBI database is modeled in Abstract Syntax Notation 1 (ASN.1), an Open Systems Interconnection protocol designed for the purpose of exchanging structured data between software applications rather than as a data model for database systems. While the NCBI database is distributed with an easy-to-use information retrieval system, ENTREZ, the ASN.1 data model currently lacks an ad hoc query language for general-purpose data access. For that reason, we have developed a software package, SORTEZ, that transforms the ASN.1 database (or other databases with nested data structures) to a relational data model and subsequently to a relational database management system (Sybase), where information can be accessed through the relational query language SQL. Because the need to transform data from one data model and schema to another arises naturally in several important contexts, including efficient execution of specific applications, access to multiple databases and adaptation to database evolution, this work also serves as a practical study of the issues involved in the various stages of database transformation. We show that transformation from the ASN.1 data model to a relational data model can be largely automated, but that schema transformation and data conversion require considerable domain expertise and would greatly benefit from additional support tools.
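The core of such a transformation, flattening a nested record into parent and child rows linked by foreign keys, can be pictured with a small sketch. The nested dictionary below is a toy stand-in for an ASN.1 entry and SQLite stands in for Sybase; the table and field names are invented, not SORTEZ's schema.

```python
import sqlite3

# Toy stand-in for a nested ASN.1-style sequence record (names are invented).
entry = {
    "accession": "U00001",
    "title": "example sequence",
    "references": [
        {"authors": "Smith J", "year": 1993},
        {"authors": "Doe A", "year": 1994},
    ],
}

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sequence (accession TEXT PRIMARY KEY, title TEXT)")
con.execute("""CREATE TABLE citation (accession TEXT, authors TEXT, year INTEGER,
               FOREIGN KEY (accession) REFERENCES sequence (accession))""")

# Parent row first, then one child row per nested reference (the flattening step).
con.execute("INSERT INTO sequence VALUES (?, ?)", (entry["accession"], entry["title"]))
con.executemany("INSERT INTO citation VALUES (?, ?, ?)",
                [(entry["accession"], r["authors"], r["year"]) for r in entry["references"]])

for row in con.execute("""SELECT s.accession, c.authors, c.year
                          FROM sequence s JOIN citation c USING (accession)"""):
    print(row)
```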
Genome sequence of Phytophthora ramorum: implications for management
Brett Tyler; Sucheta Tripathy; Nik Grunwald; Kurt Lamour; Kelly Ivors; Matteo Garbelotto; Daniel Rokhsar; Nik Putnam; Igor Grigoriev; Jeffrey Boore
2006-01-01
A draft genome sequence has been determined for Phytophthora ramorum, together with a draft sequence of the soybean pathogen Phytophthora sojae. The P. ramorum genome was sequenced to a depth of 7-fold coverage, while the P. sojae genome was sequenced to a depth of 9-fold coverage. The genome...
SynTrack: DNA Assembly Workflow Management (SynTrack) v2.0.1
DOE Office of Scientific and Technical Information (OSTI.GOV)
MENG, XIANWEI; SIMIRENKO, LISA
2016-12-01
SynTrack is a dynamic, workflow-driven data management system that tracks the DNA build process: management of the hierarchical relationships of the DNA fragments; monitoring of process tasks for the assembly of multiple DNA fragments into final constructs; creation of vendor order forms with selectable building blocks; organizing plate layout barcodes for vendor/pcr/fusion/chewback/bioassay/glycerol/master plate maps (default/condensed); creating or updating Pre-Assembly/Assembly process workflows with selected building blocks; generating Echo pooling instructions based on plate maps; tracking of building block orders, received and final assembled, for delivery; bulk updating of colony or PCR amplification information, fusion PCR and chewback results; updating of QA/QC outcomes with .csv & .xlsx template files; re-work assembly workflows enabled before and after sequencing validation; and tracking of plate/well data changes and status updates, with reporting of master plate status with QC outcomes.
NASA Astrophysics Data System (ADS)
Wang, S.; Huang, G. H.; Veawab, A.
2013-03-01
This study proposes a sequential factorial analysis (SFA) approach for supporting regional air quality management under uncertainty. SFA is capable not only of examining the interactive effects of input parameters, but also of analyzing the effects of constraints. When there are too many factors involved in practical applications, SFA has the advantage of conducting a sequence of factorial analyses for characterizing the effects of factors in a systematic manner. The factor-screening strategy employed in SFA is effective in greatly reducing the computational effort. The proposed SFA approach is applied to a regional air quality management problem for demonstrating its applicability. The results indicate that the effects of factors are evaluated quantitatively, which can help decision makers identify the key factors that have significant influence on system performance and explore the valuable information that may be veiled beneath their interrelationships.
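In its simplest two-level form, a factorial analysis of the kind described above estimates a factor's effect as the difference between the mean response at its high and low settings, with interaction effects obtained from product columns. The sketch below does this for a full 2^2 design with invented responses; it illustrates the arithmetic only and is not the SFA screening procedure itself.

```python
import itertools
import numpy as np

# Full 2^2 design: factors A and B coded -1/+1, with invented responses.
design = np.array(list(itertools.product((-1, 1), repeat=2)))   # rows: (A, B)
response = np.array([10.0, 14.0, 13.0, 25.0])                   # invented y values

def effect(column: np.ndarray, y: np.ndarray) -> float:
    """Main or interaction effect: mean(y at +1) - mean(y at -1)."""
    return y[column == 1].mean() - y[column == -1].mean()

a, b = design[:, 0], design[:, 1]
print("A effect:  ", effect(a, response))        # main effect of factor A
print("B effect:  ", effect(b, response))        # main effect of factor B
print("AB effect: ", effect(a * b, response))    # two-factor interaction
```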
Overview of the critical disaster management challenges faced during Van 2011 earthquakes.
Tolon, Mert; Yazgan, Ufuk; Ural, Derin N; Goss, Kay C
2014-01-01
On October 23, 2011, a M7.2 earthquake caused damage in a widespread area of the Van province in eastern Turkey. This strong earthquake was followed by a M5.7 earthquake on November 9, 2011. The sequence of damaging earthquakes led to 644 fatalities. Management during and after this earthquake disaster imposed many critical challenges. In this article, an overview of these challenges is presented based on observations by the authors in the aftermath of the disaster. The article presents the characteristics of the 2011 Van earthquakes and then the key information related to the four main phases (i.e., preparedness, mitigation, response, and recovery) of the disaster in Van. Potential strategies that can be taken to improve disaster management practice are identified, and a set of recommendations is proposed to improve the existing situation.
Automated quantitative assessment of proteins' biological function in protein knowledge bases.
Mayr, Gabriele; Lepperdinger, Günter; Lackner, Peter
2008-01-01
Primary protein sequence data are archived in databases together with information regarding the corresponding biological functions. In this respect, UniProt/Swiss-Prot is currently the most comprehensive collection and it is routinely cross-examined when trying to unravel the biological role of hypothetical proteins. Bioscientists frequently extract single entries and further evaluate those on a subjective basis. In lieu of a standardized procedure for scoring the existing knowledge regarding individual proteins, here we report on a computer-assisted method which we applied to score the present knowledge about any given Swiss-Prot entry. Applying this quantitative score allows the comparison of proteins with respect to their sequence while highlighting the comprehension of functional data. pfs analysis may also be applied for quality control of individual entries or for database management in order to rank entry listings.
NUREBASE: database of nuclear hormone receptors.
Duarte, Jorge; Perrière, Guy; Laudet, Vincent; Robinson-Rechavi, Marc
2002-01-01
Nuclear hormone receptors are an abundant class of ligand activated transcriptional regulators, found in varying numbers in all animals. Based on our experience of managing the official nomenclature of nuclear receptors, we have developed NUREBASE, a database containing protein and DNA sequences, reviewed protein alignments and phylogenies, taxonomy and annotations for all nuclear receptors. The reviewed NUREBASE is completed by NUREBASE_DAILY, automatically updated every 24 h. Both databases are organized under a client/server architecture, with a client written in Java which runs on any platform. This client, named FamFetch, integrates a graphical interface allowing selection of families, and manipulation of phylogenies and alignments. NUREBASE sequence data is also accessible through a World Wide Web server, allowing complex queries. All information on accessing and installing NUREBASE may be found at http://www.ens-lyon.fr/LBMC/laudet/nurebase.html.
Management of information in distributed biomedical collaboratories.
Keator, David B
2009-01-01
Organizing and annotating biomedical data in structured ways has gained much interest and focus in the last 30 years. Driven by decreases in digital storage costs and advances in genetics sequencing, imaging, electronic data collection, and microarray technologies, data is being collected at an alarming rate. The specialization of fields in biology and medicine demonstrates the need for somewhat different structures for storage and retrieval of data. For biologists, the need for structured information and integration across a number of domains drives development. For clinical researchers and hospitals, the need for a structured medical record accessible to, ideally, any medical practitioner who might require it during the course of research or patient treatment, patient confidentiality, and security are the driving developmental factors. Scientific data management systems generally consist of a few core services: a backend database system, a front-end graphical user interface, and an export/import mechanism or data interchange format to both get data into and out of the database and share data with collaborators. The chapter introduces some existing databases, distributed file systems, and interchange languages used within the biomedical research and clinical communities for scientific data management and exchange.
The Hawaiian Freshwater Algal Database (HfwADB): a laboratory LIMS and online biodiversity resource
2012-01-01
Background Biodiversity databases serve the important role of highlighting species-level diversity from defined geographical regions. Databases that are specially designed to accommodate the types of data gathered during regional surveys are valuable in allowing full data access and display to researchers not directly involved with the project, while serving as a Laboratory Information Management System (LIMS). The Hawaiian Freshwater Algal Database, or HfwADB, was modified from the Hawaiian Algal Database to showcase non-marine algal specimens collected from the Hawaiian Archipelago by accommodating the additional level of organization required for samples including multiple species. Description The Hawaiian Freshwater Algal Database is a comprehensive and searchable database containing photographs and micrographs of samples and collection sites, geo-referenced collecting information, taxonomic data and standardized DNA sequence data. All data for individual samples are linked through unique 10-digit accession numbers (“Isolate Accession”), the first five of which correspond to the collection site (“Environmental Accession”). Users can search online for sample information by accession number, various levels of taxonomy, habitat or collection site. HfwADB is hosted at the University of Hawaii, and was made publicly accessible in October 2011. At the present time the database houses data for over 2,825 samples of non-marine algae from 1,786 collection sites from the Hawaiian Archipelago. These samples include cyanobacteria, red and green algae and diatoms, as well as lesser representation from some other algal lineages. Conclusions HfwADB is a digital repository that acts as a Laboratory Information Management System for Hawaiian non-marine algal data. Users can interact with the repository through the web to view relevant habitat data (including geo-referenced collection locations) and download images of collection sites, specimen photographs and micrographs, and DNA sequences. It is publicly available at http://algae.manoa.hawaii.edu/hfwadb/. PMID:23095476
Pérez-Parra, Santiago; Chueca-Porcuna, Natalia; Álvarez-Estevez, Marta; Pasquau, Juan; Omar, Mohamed; Collado, Antonio; Vinuesa, David; Lozano, Ana Belen; García-García, Federico
2015-11-01
Protease and reverse transcriptase HIV-1 sequences provide useful information for patient clinical management, as well as information on resistance to antiretrovirals. The aim of this study is to evaluate transmission events, transmitted drug resistance, and to georeference subtypes among newly diagnosed patients referred to our center. A study was conducted on 693 patients diagnosed between 2005 and 2012 in Southern Spain. Protease and reverse transcriptase sequences were obtained for analysis of resistance to cART with the Trugene(®) HIV Genotyping Kit (Siemens, NAD). MEGA 5.2, Neighbor-Joining, ArcGIS and REGA were used for subsequent analysis. The results showed 298 patients clustered into 77 different transmission events. Most of the clusters were formed by pairs (n=49), by men who have sex with men (n=26), by Spanish patients (n=37), and by patients below 45 years of age (73.5%). Urban areas from Granada, and the coastal areas of Almeria and Granada, showed the greatest subtype heterogeneity. Five clusters were formed by more than 10 patients, and 15 clusters had transmitted drug resistance. The study data demonstrate how the phylogenetic characterization of transmission clusters is a powerful tool to monitor the spread of HIV, and may contribute to the design of appropriate preventive measures to minimize it. Copyright © 2015 Elsevier España, S.L.U. y Sociedad Española de Enfermedades Infecciosas y Microbiología Clínica. All rights reserved.
Velasco, Harvy Mauricio; Morales, Jessica L
2017-01-01
Osteogenesis imperfecta (OI) is a hereditary disease characterized by bone fragility caused by mutations in the proteins that support the formation of the extracellular matrix in the bone. The diagnosis of OI begins with clinical suspicion, from phenotypic findings at birth, low-impact fractures during childhood, or a family history that may lead to it. However, the variability in the semiology of the disease does not allow an early diagnosis to be established in all cases, and unfortunately the literature provides specific clinical data on only 28 patients with OI type XI. This information is limited and heterogeneous, and therefore detailed information on the natural history of this disease is not yet available. This paper reports the case of a male patient who, despite undergoing multidisciplinary management, did not have a diagnosis for a long period of time, and could only be given one with the use of whole-exome sequencing. The use of next-generation sequencing in patients with ultra-rare genetic diseases, including skeletal dysplasias, should be justified by clear clinical criteria and by the intention to improve the quality of life of the patients and their families while reducing economic and time costs. Thus, this case report corresponds to the 29th patient affected with OI type XI, and the 18th mutation in FKBP10 causative of this pathology. PMID:29158687
2012-01-01
Background In the scientific biodiversity community, the need to build a bridge between molecular and traditional biodiversity studies is increasingly perceived. We believe that information technology could have a preeminent role in integrating the information generated by these studies with the large amount of molecular data we can find in bioinformatics public databases. This work is primarily aimed at building a bioinformatic infrastructure for the integration of public and private biodiversity data through the development of GIDL, an Intelligent Data Loader coupled with the Molecular Biodiversity Database. The system presented here organizes in an ontological way and locally stores the sequence and annotation data contained in the GenBank primary database. Methods The GIDL architecture consists of a relational database and an intelligent data loader software. The relational database schema is designed to manage biodiversity information (Molecular Biodiversity Database) and it is organized in four areas: MolecularData, Experiment, Collection and Taxonomy. The MolecularData area is inspired by an established standard in Generic Model Organism Databases, the Chado relational schema. The peculiarity of Chado, and also its strength, is the adoption of an ontological schema which makes use of the Sequence Ontology. The Intelligent Data Loader (IDL) component of GIDL is Extract, Transform and Load (ETL) software able to parse data, to discover hidden information in the GenBank entries and to populate the Molecular Biodiversity Database. The IDL is composed of three main modules: the Parser, able to parse GenBank flat files; the Reasoner, which automatically builds CLIPS facts mapping the biological knowledge expressed by the Sequence Ontology; and the DBFiller, which translates the CLIPS facts into ordered SQL statements used to populate the database. In GIDL, Semantic Web technologies have been adopted due to their advantages in data representation, integration and processing. Results and conclusions Entries coming from the Virus (814,122), Plant (1,365,360) and Invertebrate (959,065) divisions of GenBank rel.180 have been loaded into the Molecular Biodiversity Database by GIDL. Our system, combining the Sequence Ontology and the Chado schema, allows more powerful query expressiveness than the most commonly used sequence retrieval systems like Entrez or SRS. PMID:22536971
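The Parser/Reasoner/DBFiller pipeline described above can be illustrated with a minimal Python sketch. It is not the GIDL code: Biopython stands in for the Parser, a toy typing function stands in for the CLIPS-based Reasoner, and sqlite3 with a single illustrative table stands in for the Chado-inspired Molecular Biodiversity Database; the file, table and column names are assumptions.

# Hedged sketch of an Extract-Transform-Load flow in the spirit of GIDL's
# Parser -> Reasoner -> DBFiller modules. Stand-ins only; the real system
# uses CLIPS facts and a Chado-inspired schema.
import sqlite3
from Bio import SeqIO  # assumes Biopython is installed

def parse(genbank_path):
    """Parser: yield simple dicts from GenBank flat-file entries."""
    for rec in SeqIO.parse(genbank_path, "genbank"):
        yield {"accession": rec.id,
               "organism": rec.annotations.get("organism", "unknown"),
               "length": len(rec.seq)}

def reason(entry):
    """Reasoner (toy): attach an ontology-like type to the record."""
    entry["so_type"] = "sequence_feature"  # placeholder Sequence Ontology term
    return entry

def fill(db, entries):
    """DBFiller: translate entries into ordered SQL statements."""
    db.execute("CREATE TABLE IF NOT EXISTS molecular_data "
               "(accession TEXT PRIMARY KEY, organism TEXT, length INTEGER, so_type TEXT)")
    db.executemany("INSERT OR REPLACE INTO molecular_data "
                   "VALUES (:accession, :organism, :length, :so_type)", entries)
    db.commit()

if __name__ == "__main__":
    con = sqlite3.connect("biodiversity.sqlite")
    fill(con, (reason(e) for e in parse("plant_division.gb")))  # hypothetical input file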
Molecular diagnostic experience of whole-exome sequencing in adult patients.
Posey, Jennifer E; Rosenfeld, Jill A; James, Regis A; Bainbridge, Matthew; Niu, Zhiyv; Wang, Xia; Dhar, Shweta; Wiszniewski, Wojciech; Akdemir, Zeynep H C; Gambin, Tomasz; Xia, Fan; Person, Richard E; Walkiewicz, Magdalena; Shaw, Chad A; Sutton, V Reid; Beaudet, Arthur L; Muzny, Donna; Eng, Christine M; Yang, Yaping; Gibbs, Richard A; Lupski, James R; Boerwinkle, Eric; Plon, Sharon E
2016-07-01
Whole-exome sequencing (WES) is increasingly used as a diagnostic tool in medicine, but prior reports focus on predominantly pediatric cohorts with neurologic or developmental disorders. We describe the diagnostic yield and characteristics of WES in adults. We performed a retrospective analysis of consecutive WES reports for adults from a diagnostic laboratory. Phenotype composition was determined using Human Phenotype Ontology terms. Molecular diagnoses were reported for 17.5% (85/486) of adults, which is lower than that for a primarily pediatric population (25.2%; P = 0.0003); the diagnostic rate was higher (23.9%) for those 18-30 years of age compared to patients older than 30 years (10.4%; P = 0.0001). Dual Mendelian diagnoses contributed to 7% of diagnoses, revealing blended phenotypes. Diagnoses were more frequent among individuals with abnormalities of the nervous system, skeletal system, head/neck, and growth. Diagnostic rate was independent of family history information, and de novo mutations contributed to 61.4% of autosomal dominant diagnoses. Early WES experience in adults demonstrates molecular diagnoses in a substantial proportion of patients, informing clinical management, recurrence risk, and recommendations for relatives. A positive family history was not predictive, consistent with molecular diagnoses often revealed by de novo events, informing the Mendelian basis of genetic disease in adults. Genet Med 18(7), 678-685.
Camerlengo, Terry; Ozer, Hatice Gulcin; Onti-Srinivasan, Raghuram; Yan, Pearlly; Huang, Tim; Parvin, Jeffrey; Huang, Kun
2012-01-01
Next-generation sequencing (NGS) is highly resource intensive. Tasks related to data processing, management and analysis require high-end computing servers or even clusters. Additionally, processing NGS experiments requires suitable storage space and significant manual interaction. At The Ohio State University's Biomedical Informatics Shared Resource, we designed and implemented a scalable architecture to address the challenges associated with the resource-intensive nature of NGS secondary analysis, built around Illumina Genome Analyzer II sequencers and Illumina's Gerald data processing pipeline. The software infrastructure includes a distributed computing platform consisting of a LIMS called QUEST (http://bisr.osumc.edu), an Automation Server, a computer cluster for processing NGS pipelines, and a network attached storage device expandable up to 40TB. The system has been architected to scale to multiple sequencers without requiring additional computing or labor resources. This platform demonstrates how to manage and automate NGS experiments in an institutional or core facility setting.
Tree crops: Advances in insects and disease management
USDA-ARS's Scientific Manuscript database
Advances in next-generation sequencing have enabled genome sequencing to be fast and affordable. Thus today researchers and industries can address new methods in pest and pathogen management. Biological control of insect pests that occur in large areas, such as forests and farming systems of fruit t...
Gärling, T
1996-09-01
How people choose between sequences of actions was investigated in an everyday errand-planning task. In this task subjects chose the preferred sequence of performing a number of errands in a fictitious environment. Two experiments were conducted with undergraduate students serving as subjects. One group searched information about each alternative. The same information was directly available to another group. In Experiment 1 the results showed that for two errands subjects took into account all attributes describing the errands, thus suggesting a tradeoff between priority, wait time, and travel distance with priority being the most important. Consistent with this finding predominantly intraalternative information search was observed. These results were replicated in Experiment 2 for three errands. In addition choice outcomes, information search, and sequence of responding suggested that for more than two actions sequence choices are made in stages.
Wirgin, I.; Waldman, J.; Stabile, J.; Lubinski, B.; King, T.
2002-01-01
Atlantic sturgeon Acipenser oxyrinchus is large, long-lived, and anadromous with subspecies distributed along the Atlantic (A. oxyrinchus oxyrinchus) and Gulf of Mexico (A. o. desotoi) coasts of North America. Although it is not certain if extirpation of some population units has occurred, because of anthropogenic influences abundances of all populations are low compared with historical levels. Informed management of A. oxyrinchus demands a detailed knowledge of its population structure, levels of genetic diversity, and likelihood to home to natal rivers. We compared the use of mitochondrial DNA (mtDNA) control region sequence and microsatellite nuclear DNA (nDNA) analyses in identifying the stock structure and homing fidelity of Atlantic and Gulf coast populations of A. oxyrinchus. The approaches were concordant in that they revealed moderate to high levels of genetic diversity and suggested that populations of Atlantic sturgeon are highly structured. At least six genetically distinct management units were detected using the two approaches among the rivers surveyed. Mitochondrial DNA sequences revealed a significant cline in haplotype diversity along the Atlantic coast with monomorphism observed in Canadian populations. High levels of nDNA diversity were also observed among populations along the Atlantic coast, including the two Canadian populations, probably resulting from the more rapid rate of mutational and evolutionary change at microsatellite loci. Estimates of gene flow among populations were similar between both approaches with the exception that because of mtDNA monomorphism in Canadian populations, gene flow estimates between them were unobtainable. Analyses of both genomes provided high resolution and confidence in characterizing the population structure of Atlantic sturgeon. Microsatellite analysis was particularly informative in delineating population structure in rivers that were recently glaciated and may prove diagnostic in rivers that are geographically proximal along the south Atlantic coast of the US.
Mungall, Christopher J; Emmert, David B
2007-07-01
A few years ago, FlyBase undertook to design a new database schema to store Drosophila data. It would fully integrate genomic sequence and annotation data with bibliographic, genetic, phenotypic and molecular data from the literature representing a distillation of the first 100 years of research on this major animal model system. In developing this new integrated schema, FlyBase also made a commitment to ensure that its design was generic, extensible and available as open source, so that it could be employed as the core schema of any model organism data repository, thereby avoiding redundant software development and potentially increasing interoperability. Our question was whether we could create a relational database schema that would be successfully reused. Chado is a relational database schema now being used to manage biological knowledge for a wide variety of organisms, from human to pathogens, especially the classes of information that directly or indirectly can be associated with genome sequences or the primary RNA and protein products encoded by a genome. Biological databases that conform to this schema can interoperate with one another, and with application software from the Generic Model Organism Database (GMOD) toolkit. Chado is distinctive because its design is driven by ontologies. The use of ontologies (or controlled vocabularies) is ubiquitous across the schema, as they are used as a means of typing entities. The Chado schema is partitioned into integrated subschemas (modules), each encapsulating a different biological domain, and each described using representations in appropriate ontologies. To illustrate this methodology, we describe here the Chado modules used for describing genomic sequences. GMOD is a collaboration of several model organism database groups, including FlyBase, to develop a set of open-source software for managing model organism data. The Chado schema is freely distributed under the terms of the Artistic License (http://www.opensource.org/licenses/artistic-license.php) from GMOD (www.gmod.org).
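The ontology-driven typing that distinguishes Chado can be illustrated with a much simplified sketch. The following Python/sqlite3 example is not the actual Chado DDL; it only shows the idea of a generic feature table whose rows are typed by a foreign key into a controlled-vocabulary (cvterm) table, with illustrative Sequence Ontology terms.

# Simplified illustration (not the actual Chado schema) of ontology-typed
# entities: features are typed by a foreign key into a cvterm table.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE cvterm  (cvterm_id INTEGER PRIMARY KEY, name TEXT);   -- e.g. Sequence Ontology terms
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY,
                      uniquename TEXT,
                      type_id INTEGER REFERENCES cvterm(cvterm_id));
INSERT INTO cvterm  VALUES (1, 'gene'), (2, 'mRNA');
INSERT INTO feature VALUES (10, 'dpp', 1), (11, 'dpp-RA', 2);
""")
for row in con.execute("""SELECT f.uniquename, c.name
                          FROM feature f JOIN cvterm c ON f.type_id = c.cvterm_id"""):
    print(row)   # ('dpp', 'gene'), ('dpp-RA', 'mRNA')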
Eaton, Deren A R; Spriggs, Elizabeth L; Park, Brian; Donoghue, Michael J
2017-05-01
Restriction-site associated DNA (RAD) sequencing and related methods rely on the conservation of enzyme recognition sites to isolate homologous DNA fragments for sequencing, with the consequence that mutations disrupting these sites lead to missing information. There is thus a clear expectation for how missing data should be distributed, with fewer loci recovered between more distantly related samples. This observation has led to a related expectation: that RAD-seq data are insufficiently informative for resolving deeper scale phylogenetic relationships. Here we investigate the relationship between missing information among samples at the tips of a tree and information at edges within it. We re-analyze and review the distribution of missing data across ten RAD-seq data sets and carry out simulations to determine expected patterns of missing information. We also present new empirical results for the angiosperm clade Viburnum (Adoxaceae, with a crown age >50 Ma) for which we examine phylogenetic information at different depths in the tree and with varied sequencing effort. The total number of loci, the proportion that are shared, and phylogenetic informativeness varied dramatically across the examined RAD-seq data sets. Insufficient or uneven sequencing coverage accounted for similar proportions of missing data as dropout from mutation-disruption. Simulations reveal that mutation-disruption, which results in phylogenetically distributed missing data, can be distinguished from the more stochastic patterns of missing data caused by low sequencing coverage. In Viburnum, doubling sequencing coverage nearly doubled the number of parsimony informative sites, and increased by >10X the number of loci with data shared across >40 taxa. Our analysis leads to a set of practical recommendations for maximizing phylogenetic information in RAD-seq studies. [hierarchical redundancy; phylogenetic informativeness; quartet informativeness; Restriction-site associated DNA (RAD) sequencing; sequencing coverage; Viburnum.]. © The authors 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For permissions, please e-mail: journals.permission@oup.com.
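The contrast drawn above between mutation-disruption and low-coverage dropout can be made concrete with a toy simulation. The following Python sketch is not the authors' analysis; the disruption rate, coverage dropout rate and divergence values are arbitrary illustrations of how shared loci decline with divergence under one source of missingness but not the other.

# Toy simulation contrasting the two sources of missing data discussed above:
# mutation-disruption (grows with divergence) versus low coverage (random).
import random

def shared_loci(n_loci, divergence, mu=0.02, coverage_dropout=0.1):
    shared = 0
    for _ in range(n_loci):
        # recognition site survives in both lineages (disruption scales with divergence)
        survives = random.random() < (1 - mu) ** (2 * divergence)
        # each sample independently loses the locus to low coverage
        covered = random.random() > coverage_dropout and random.random() > coverage_dropout
        shared += survives and covered
    return shared

for d in (1, 10, 50):       # "divergence" in arbitrary time units
    print(d, shared_loci(20000, d))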
Lee, Hwan Young; Song, Injee; Ha, Eunho; Cho, Sung-Bae; Yang, Woo Ick; Shin, Kyoung-Jin
2008-01-01
Background For the past few years, scientific controversy has surrounded the large number of errors in forensic and literature mitochondrial DNA (mtDNA) data. However, recent research has shown that using mtDNA phylogeny and referring to known mtDNA haplotypes can be useful for checking the quality of sequence data. Results We developed a Web-based bioinformatics resource "mtDNAmanager" that offers a convenient interface supporting the management and quality analysis of mtDNA sequence data. The mtDNAmanager performs computations on mtDNA control-region sequences to estimate the most-probable mtDNA haplogroups and retrieves similar sequences from a selected database. By the phased designation of the most-probable haplogroups (both expected and estimated haplogroups), mtDNAmanager enables users to systematically detect errors whilst allowing for confirmation of the presence of clear key diagnostic mutations and accompanying mutations. The query tools of mtDNAmanager also facilitate database screening with two options of "match" and "include the queried nucleotide polymorphism". In addition, mtDNAmanager provides Web interfaces for users to manage and analyse their own data in batch mode. Conclusion The mtDNAmanager will provide systematic routines for mtDNA sequence data management and analysis via easily accessible Web interfaces, and thus should be very useful for population, medical and forensic studies that employ mtDNA analysis. mtDNAmanager can be accessed at . PMID:19014619
Vassy, Jason L; Lautenbach, Denise M; McLaughlin, Heather M; Kong, Sek Won; Christensen, Kurt D; Krier, Joel; Kohane, Isaac S; Feuerman, Lindsay Z; Blumenthal-Barby, Jennifer; Roberts, J Scott; Lehmann, Lisa Soleymani; Ho, Carolyn Y; Ubel, Peter A; MacRae, Calum A; Seidman, Christine E; Murray, Michael F; McGuire, Amy L; Rehm, Heidi L; Green, Robert C
2014-03-20
Whole genome sequencing (WGS) is already being used in certain clinical and research settings, but its impact on patient well-being, health-care utilization, and clinical decision-making remains largely unstudied. It is also unknown how best to communicate sequencing results to physicians and patients to improve health. We describe the design of the MedSeq Project: the first randomized trials of WGS in clinical care. This pair of randomized controlled trials compares WGS to standard of care in two clinical contexts: (a) disease-specific genomic medicine in a cardiomyopathy clinic and (b) general genomic medicine in primary care. We are recruiting 8 to 12 cardiologists, 8 to 12 primary care physicians, and approximately 200 of their patients. Patient participants in both the cardiology and primary care trials are randomly assigned to receive a family history assessment with or without WGS. Our laboratory delivers a genome report to physician participants that balances the needs to enhance understandability of genomic information and to convey its complexity. We provide an educational curriculum for physician participants and offer them a hotline to genetics professionals for guidance in interpreting and managing their patients' genome reports. Using varied data sources, including surveys, semi-structured interviews, and review of clinical data, we measure the attitudes, behaviors and outcomes of physician and patient participants at multiple time points before and after the disclosure of these results. The impact of emerging sequencing technologies on patient care is unclear. We have designed a process of interpreting WGS results and delivering them to physicians in a way that anticipates how we envision genomic medicine will evolve in the near future. That is, our WGS report provides clinically relevant information while communicating the complexity and uncertainty of WGS results to physicians and, through physicians, to their patients. This project will not only illuminate the impact of integrating genomic medicine into the clinical care of patients but also inform the design of future studies. ClinicalTrials.gov identifier NCT01736566.
Elman RNN based classification of proteins sequences on account of their mutual information.
Mishra, Pooja; Nath Pandey, Paras
2012-10-21
In the present work we have employed the method of estimating residue correlation within protein sequences by using the mutual information (MI) of adjacent residues, based on structural and solvent accessibility properties of amino acids. The long-range correlation between nonadjacent residues is improved by constructing a mutual information vector (MIV) for a single protein sequence; in this way, each protein sequence is associated with its corresponding MIVs. These MIVs are given to an Elman RNN to obtain the classification of protein sequences. The modeling power of MIV was shown to be significantly better, giving a new approach towards alignment-free classification of protein sequences. We also conclude that MIVs based on structural and solvent-accessibility properties of the sequence are better predictors. Copyright © 2012 Elsevier Ltd. All rights reserved.
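The core quantity behind the MIVs described above, mutual information between properties of adjacent residues, can be sketched briefly. The following Python example is illustrative only: the three-class residue mapping is a hypothetical stand-in for the structural and solvent-accessibility properties used by the authors, and the example sequence is arbitrary.

# Hedged sketch: mutual information between property classes of adjacent
# residues in one protein sequence. The 3-class mapping is a toy stand-in.
from collections import Counter
from math import log2

CLASS = {aa: ("H" if aa in "AVLIMFWC" else "P" if aa in "STNQYG" else "C")
         for aa in "ACDEFGHIKLMNPQRSTVWY"}   # toy hydrophobic / polar / other classes

def adjacent_mi(seq):
    pairs = [(CLASS[a], CLASS[b]) for a, b in zip(seq, seq[1:]) if a in CLASS and b in CLASS]
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum((c / n) * log2((c / n) / ((left[a] / n) * (right[b] / n)))
               for (a, b), c in joint.items())

print(round(adjacent_mi("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"), 4))  # MI in bits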
Image encryption using random sequence generated from generalized information domain
NASA Astrophysics Data System (ADS)
Xia-Yan, Zhang; Guo-Ji, Zhang; Xuan, Li; Ya-Zhou, Ren; Jie-Hua, Wu
2016-05-01
A novel image encryption method based on the random sequence generated from the generalized information domain and permutation-diffusion architecture is proposed. The random sequence is generated by reconstruction from the generalized information file and discrete trajectory extraction from the data stream. The trajectory address sequence is used to generate a P-box to shuffle the plain image while random sequences are treated as keystreams. A new factor called drift factor is employed to accelerate and enhance the performance of the random sequence generator. An initial value is introduced to make the encryption method an approximately one-time pad. Experimental results show that the random sequences pass the NIST statistical test with a high ratio and extensive analysis demonstrates that the new encryption scheme has superior security.
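The permutation-diffusion architecture referred to above can be sketched in a few lines. In the following Python example a seeded NumPy generator stands in for the random sequence that the paper derives from the generalized information domain (reconstruction from a file and trajectory extraction), so it illustrates only the P-box shuffling and keystream diffusion stages, not the proposed generator or the drift factor.

# Minimal permutation-diffusion sketch; the PRNG is a stand-in for the
# paper's generalized-information-domain random sequence generator.
import numpy as np

def encrypt(img, key=20160501):
    rng = np.random.default_rng(key)
    flat = img.reshape(-1).astype(np.uint8)
    pbox = rng.permutation(flat.size)                       # permutation stage (P-box)
    keystream = rng.integers(0, 256, flat.size, dtype=np.uint8)
    cipher = flat[pbox] ^ keystream                         # diffusion stage (XOR keystream)
    return cipher.reshape(img.shape), pbox, keystream

def decrypt(cipher, pbox, keystream):
    flat = cipher.reshape(-1) ^ keystream
    plain = np.empty_like(flat)
    plain[pbox] = flat                                      # invert the permutation
    return plain.reshape(cipher.shape)

img = np.arange(64, dtype=np.uint8).reshape(8, 8)
enc, pbox, ks = encrypt(img)
assert np.array_equal(decrypt(enc, pbox, ks), img)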
Human genome. 1993 Program report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
1994-03-01
The purpose of this report is to update the Human Genome 1991-92 Program Report and provide new information on the DOE genome program to researchers, program managers, other government agencies, and the interested public. This FY 1993 supplement includes abstracts of 60 new or renewed projects and listings of 112 continuing and 28 completed projects. These two reports, taken together, present the most complete published view of the DOE Human Genome Program through FY 1993. Research is progressing rapidly toward 15-year goals of mapping and sequencing the DNA of each of the 24 different human chromosomes.
NASA Astrophysics Data System (ADS)
McBride, S.; Tilley, E. N.; Johnston, D. M.; Becker, J.; Orchiston, C.
2015-12-01
This research evaluates the public earthquake education material produced prior to the Canterbury earthquake sequence (2010-present), and examines communication lessons to create recommendations for improving the implementation of these types of campaigns in future. The research comes from the practitioner perspective of someone who worked on these campaigns in Canterbury prior to the earthquake sequence and who was also the Public Information Manager Second in Command during the earthquake response in February 2011. Documents created prior to the earthquake sequence, specifically those addressing seismic risk, were analyzed using a "best practice matrix" created by the researcher for how closely they aligned with best practice identified in academic research. Readability tests and word counts were also employed to assist with triangulation of the data, as was practitioner involvement. The findings showed these documents lacked many of the attributes of best practice. The overly long, jargon-filled text contained few positive outcome-expectancy messages and probably failed to persuade readers that earthquakes were a real threat in Canterbury. Paradoxically, it is likely these booklets may have created fatalism in the publics who read them. While the overall intention was positive, for scientists to explain earthquakes, tsunami, landslides and other risks in order to encourage the public to prepare for these events, the implementation could be greatly improved. The final component of the research highlights points of improvement for implementation of more successful campaigns in future. The value of preparedness and science information campaigns lies not only in preparing the population but also in the development of crisis communication plans. These plans are prepared in advance of a major emergency, and the symbiotic development of strategies, messages, themes and organizational structures in the preparedness stage can impact successful crisis communication plan implementation during an emergency.
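As one concrete example of the readability tests mentioned above, the standard Flesch Reading Ease formula can be computed as follows. This Python sketch is not the study's instrument; the syllable counter is a crude heuristic and the sample text is invented.

# Standard Flesch Reading Ease score; the syllable counter is a rough heuristic.
import re

def syllables(word):
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syll = sum(syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syll / len(words))

sample = ("Drop, cover and hold. Secure heavy furniture to the wall. "
          "Store water and food for at least three days.")
print(round(flesch_reading_ease(sample), 1))  # higher scores indicate easier reading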
A national survey to define a new core curriculum to prepare physicians for managed care practice.
Meyer, G S; Potter, A; Gary, N
1997-08-01
All levels of medical education will require modification to address the challenges in health care practice brought about by managed care. Because preparation for practice in a managed care environment has received insufficient attention, and because the need for change is so great, in 1995 the authors sought information from a variety of sources to serve as a basis for identifying the core curricular components and the staging of these components in the medical education process. This research effort consisted of a survey of 125 U.S. medical school curriculum deans (or equivalent school representatives); four focus groups of managed care practitioners, administrators, educators, and residents; and a survey of a national sample of physicians and medical directors. Findings indicate that almost all the 91 responding school representatives recognized the importance of revising their curricula to meet the managed care challenge and that the majority either had or were developing programs to train students for practice in managed care environments. The focus groups identified a core set of competencies for managed care practice, although opinions differed on whether the classroom or a managed care setting was the best place to teach the components of a new curriculum. Although medical directors and staff physicians differed with respect to the relative levels of importance of these competencies, the findings suggest that before medical school, training should focus on communication and interpersonal skills, information systems, and customer relations; during medical school, on clinical epidemiology, quality assurance, risk management, and decision analysis; during residency, on utilization management, managed care essentials, and multidisciplinary team building; and after residency, on a review of customer relations, communication skills, and utilization management. The authors conclude that a core curriculum and its sequencing can be identified, that the majority of curricular components exist but in some cases needed to be modified to more clearly relate to managed care practice, and that their findings may provide a useful starting point for making decisions about curricular reform.
Expect the unexpected: screening for secondary findings in clinical genomics research.
Mackley, Michael P; Capps, Benjamin
2017-06-01
Due to decreasing cost, and increasing speed and precision, genomic sequencing in research is resulting in the generation of vast amounts of genetic data. The question of how to manage that information has been an area of significant debate. In particular, there has been much discussion around the issue of 'secondary findings' (SF)-findings unrelated to the research that have diagnostic significance. The following includes ethical commentaries, guidelines and policies in respect to large-scale clinical genomics studies. Research participant autonomy and their informed consent are paramount-policies around SF must be made clear and participants must have the choice as to which results they wish to receive, if any. While many agree that clinically 'actionable' findings should be returned, some question whether they should be actively sought within a research protocol. SF present challenges to a growing field; diverse policies around their management have the potential to hinder collaboration and future research. The impact of returning SF and accurate estimates of their clinical utility are needed to inform future protocol design. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bowers, Robert M.; Kyrpides, Nikos C.; Stepanauskas, Ramunas
We present two standards developed by the Genomic Standards Consortium (GSC) for reporting bacterial and archaeal genome sequences. Both are extensions of the Minimum Information about Any (x) Sequence (MIxS). The standards are the Minimum Information about a Single Amplified Genome (MISAG) and the Minimum Information about a Metagenome-Assembled Genome (MIMAG), including, but not limited to, assembly quality, and estimates of genome completeness and contamination. These standards can be used in combination with other GSC checklists, including the Minimum Information about a Genome Sequence (MIGS), Minimum Information about a Metagenomic Sequence (MIMS), and Minimum Information about a Marker Gene Sequence (MIMARKS). Community-wide adoption of MISAG and MIMAG will facilitate more robust comparative genomic analyses of bacterial and archaeal diversity.
Effects of Sequences of Cognitions on Group Performance Over Time
Molenaar, Inge; Chiu, Ming Ming
2017-01-01
Extending past research showing that sequences of low cognitions (low-level processing of information) and high cognitions (high-level processing of information through questions and elaborations) influence the likelihoods of subsequent high and low cognitions, this study examines whether sequences of cognitions are related to group performance over time; 54 primary school students (18 triads) discussed and wrote an essay about living in another country (32,375 turns of talk). Content analysis and statistical discourse analysis showed that within each lesson, groups with more low cognitions or more sequences of low cognition followed by high cognition added more essay words. Groups with more high cognitions, sequences of low cognition followed by low cognition, or sequences of high cognition followed by an action followed by low cognition, showed different words and sequences, suggestive of new ideas. The links between cognition sequences and group performance over time can inform facilitation and assessment of student discussions. PMID:28490854
Clifford, Jacob; Adami, Christoph
2015-09-02
Transcription factor binding to the surface of DNA regulatory regions is one of the primary causes of regulating gene expression levels. A probabilistic approach to model protein-DNA interactions at the sequence level is through position weight matrices (PWMs) that estimate the joint probability of a DNA binding site sequence by assuming positional independence within the DNA sequence. Here we construct conditional PWMs that depend on the motif signatures in the flanking DNA sequence, by conditioning known binding site loci on the presence or absence of additional binding sites in the flanking sequence of each site's locus. Pooling known sites with similar flanking sequence patterns allows for the estimation of the conditional distribution function over the binding site sequences. We apply our model to the Dorsal transcription factor binding sites active in patterning the Dorsal-Ventral axis of Drosophila development. We find that those binding sites that cooperate with nearby Twist sites on average contain about 0.5 bits of information about the presence of Twist transcription factor binding sites in the flanking sequence. We also find that Dorsal binding site detectors conditioned on flanking sequence information make better predictions about what is a Dorsal site relative to background DNA than detection without information about flanking sequence features.
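As background to the conditional PWMs described above, the following Python sketch shows how an ordinary PWM is estimated from aligned binding sites and used to score a candidate sequence with summed log-odds against a uniform background. The conditional variant would be built the same way but from subsets of sites pooled by flanking-sequence pattern (for example, with or without a nearby Twist site); the toy site alignment below is illustrative, not Dorsal data.

# Basic PWM estimation and log-odds scoring; toy sites, uniform background.
from math import log2

def build_pwm(sites, pseudocount=0.5, background=0.25):
    pwm = []
    for i in range(len(sites[0])):
        counts = {b: pseudocount for b in "ACGT"}
        for s in sites:
            counts[s[i]] += 1
        total = sum(counts.values())
        pwm.append({b: log2((counts[b] / total) / background) for b in "ACGT"})
    return pwm

def score(pwm, seq):
    return sum(col[base] for col, base in zip(pwm, seq))

toy_sites = ["GGGAAAACCC", "GGGTTTTCCC", "GGGAAATCCC", "GGGATTTCCC"]
pwm = build_pwm(toy_sites)
print(round(score(pwm, "GGGAAAACCC"), 2), round(score(pwm, "ACGTACGTAC"), 2))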
Standardization and quality management in next-generation sequencing.
Endrullat, Christoph; Glökler, Jörn; Franke, Philipp; Frohme, Marcus
2016-09-01
DNA sequencing continues to evolve quickly even after > 30 years. Many new platforms have suddenly appeared and formerly established systems have vanished in almost the same manner. Since the establishment of next-generation sequencing devices, this progress has gained momentum due to the continually growing demand for higher throughput, lower costs and better quality of data. In consequence of this rapid development, standardized procedures and data formats as well as comprehensive quality management considerations are still scarce. Here, we list and summarize current standardization efforts and quality management initiatives from companies, organizations and societies in the form of published studies and ongoing projects. These comprise, on the one hand, quality documentation issues like technical notes, accreditation checklists and guidelines for validation of sequencing workflows. On the other hand, general standard proposals and quality metrics are developed and applied to the sequencing workflow steps with the main focus on upstream processes. Finally, certain standard developments for downstream pipeline data handling, processing and storage are discussed in brief. These standardization approaches represent a first basis for continuing work in order to prospectively implement next-generation sequencing in important areas such as clinical diagnostics, where reliable results and fast processing are crucial. Additionally, these efforts will exert a decisive influence on the traceability and reproducibility of sequence data.
A FASTQ compressor based on integer-mapped k-mer indexing for biologist.
Zhang, Yeting; Patel, Khyati; Endrawis, Tony; Bowers, Autumn; Sun, Yazhou
2016-03-15
Next generation sequencing (NGS) technologies have gained considerable popularity among biologists. For example, RNA-seq, which provides both genomic and functional information, has been widely used by recent functional and evolutionary studies, especially in non-model organisms. However, storing and transmitting these large data sets (primarily in FASTQ format) have become genuine challenges, especially for biologists with little informatics experience. Data compression is thus a necessity. KIC, a FASTQ compressor based on a new integer-mapped k-mer indexing method, was developed (available at http://www.ysunlab.org/kic.jsp). It offers high compression ratio on sequence data, outstanding user-friendliness with graphic user interfaces, and proven reliability. Evaluated on multiple large RNA-seq data sets from both human and plants, it was found that the compression ratio of KIC had exceeded all major generic compressors, and was comparable to those of the latest dedicated compressors. KIC enables researchers with minimal informatics training to take advantage of the latest sequence compression technologies, easily manage large FASTQ data sets, and reduce storage and transmission cost. Copyright © 2015 Elsevier B.V. All rights reserved.
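One common way to map k-mers to integers, in the spirit of the integer-mapped k-mer indexing mentioned above, is to pack each base into two bits. The following Python sketch illustrates that idea only; the actual KIC encoding and index layout are not described here and may differ.

# 2-bit encoding of DNA k-mers to integers and back (illustrative only).
ENC = {"A": 0, "C": 1, "G": 2, "T": 3}
DEC = "ACGT"

def kmer_to_int(kmer):
    value = 0
    for base in kmer:
        value = (value << 2) | ENC[base]   # append 2 bits per base
    return value

def int_to_kmer(value, k):
    bases = []
    for _ in range(k):
        bases.append(DEC[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))

code = kmer_to_int("GATTACA")
print(code, int_to_kmer(code, 7))   # a 7-mer fits in 14 bits instead of 7 bytes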
Integrating RNA sequencing into neuro-oncology practice.
Rogawski, David S; Vitanza, Nicholas A; Gauthier, Angela C; Ramaswamy, Vijay; Koschmann, Carl
2017-11-01
Malignant tumors of the central nervous system (CNS) cause substantial morbidity and mortality, yet efforts to optimize chemo- and radiotherapy have largely failed to improve dismal prognoses. Over the past decade, RNA sequencing (RNA-seq) has emerged as a powerful tool to comprehensively characterize the transcriptome of CNS tumor cells in one high-throughput step, leading to improved understanding of CNS tumor biology and suggesting new routes for targeted therapies. RNA-seq has been instrumental in improving the diagnostic classification of brain tumors, characterizing oncogenic fusion genes, and shedding light on intratumor heterogeneity. Currently, RNA-seq is beginning to be incorporated into regular neuro-oncology practice in the form of precision neuro-oncology programs, which use information from tumor sequencing to guide implementation of personalized targeted therapies. These programs show great promise in improving patient outcomes for tumors where single agent trials have been ineffective. As RNA-seq is a relatively new technique, many further applications yielding new advances in CNS tumor research and management are expected in the coming years. Copyright © 2017 Elsevier Inc. All rights reserved.
Information on a Major New Initiative: Mapping and Sequencing the Human Genome (1986 DOE Memorandum)
DOE R&D Accomplishments Database
DeLisi, Charles (Associate Director, Health and Environmental Research, DOE Office of Energy Research)
1986-05-06
In the history of the Human Genome Program, Dr. Charles DeLisi and Dr. Alvin Trivelpiece of the Department of Energy (DOE) were instrumental in moving the seeds of the program forward. This May 1986 memo from DeLisi to Trivelpiece, Director of DOE's Office of Energy Research, documents this fact. Following the March 1986 Santa Fe workshop on the subject of mapping and sequencing the human genome, DeLisi's memo outlines workshop conclusions, explains the relevance of this project to DOE and the importance of the Department's laboratories and capabilities, notes the critical experience of DOE in managing projects of this scale and potential magnitude, and recognizes the fact that the project will impact biomedical science in ways which could not be fully anticipated at the time. Subsequently, program guidance was further sought from the DOE Health Effects Research Advisory Committee (HERAC) and the April 1987 HERAC report recommended that DOE and the nation commit to a large, multidisciplinary, scientific and technological undertaking to map and sequence the human genome.
Nouri, Shahideh; Salem, Nidá; Nigg, Jared C; Falk, Bryce W
2015-12-16
The Asian citrus psyllid, Diaphorina citri, is the natural vector of the causal agent of Huanglongbing (HLB), or citrus greening disease. Together, HLB and D. citri represent a major threat to world citrus production. As there is no cure for HLB, insect vector management is considered one strategy to help control the disease, and D. citri viruses might be useful. In this study, we used a metagenomic approach to analyze viral sequences associated with the global population of D. citri. By sequencing small RNAs and the transcriptome coupled with bioinformatics analysis, we showed that the virus-like sequences of D. citri are diverse. We identified novel viral sequences belonging to the picornavirus superfamily, the Reoviridae, Parvoviridae, and Bunyaviridae families, and an unclassified positive-sense single-stranded RNA virus. Moreover, a Wolbachia prophage-related sequence was identified. This is the first comprehensive survey to assess the viral community from worldwide populations of an agricultural insect pest. Our results provide valuable information on new putative viruses, some of which may have the potential to be used as biocontrol agents. Insects have the most species of all animals, and are hosts to, and vectors of, a great variety of known and unknown viruses. Some of these most likely have the potential to be important fundamental and/or practical resources. In this study, we used high-throughput next-generation sequencing (NGS) technology and bioinformatics analysis to identify putative viruses associated with Diaphorina citri, the Asian citrus psyllid. D. citri is the vector of the bacterium causing Huanglongbing (HLB), currently the most serious threat to citrus worldwide. Here, we report several novel viral sequences associated with D. citri. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
NASA Technical Reports Server (NTRS)
Sherry, Lance; Feary, Michael; Polson, Peter; Fennell, Karl
2003-01-01
The Flight Management Computer (FMC) and its interface, the Multi-function Control and Display Unit (MCDU), have been identified by researchers and airlines as difficult to train and use. Specifically, airline pilots have described the "drinking from the fire-hose" effect during training. Previous research has identified memorized action sequences as a major factor in a user's ability to learn and operate complex devices. This paper discusses the use of a method to examine the quantity of memorized action sequences required to perform a sample of 102 tasks, using features of the Boeing 777 Flight Management Computer Interface. The analysis identified a large number of memorized action sequences that must be learned during training and then recalled during line operations. Seventy-five percent of the tasks examined require recall of at least one memorized action sequence. Forty-five percent of the tasks require recall of a memorized action sequence and occur infrequently. The large number of memorized action sequences may provide an explanation for the difficulties in training and usage of the automation. Based on these findings, implications for training and the design of new user-interfaces are discussed.
Bone marrow invasion in multiple myeloma and metastatic disease.
Vilanova, J C; Luna, A
2016-04-01
Magnetic resonance imaging (MRI) of the spine is the imaging study of choice for the management of bone marrow disease. MRI sequences enable us to integrate structural and functional information for detecting, staging, and monitoring the response to treatment of multiple myeloma and bone metastases in the spine. Whole-body MRI has been incorporated into different guidelines as the technique of choice for managing multiple myeloma and metastatic bone disease. Normal physiological changes in the yellow and red bone marrow represent a challenge in analyses to differentiate clinically significant findings from those that are not clinically significant. This article describes the findings for normal bone marrow, variants, and invasive processes in multiple myeloma and bone metastases. Copyright © 2015 SERAM. Published by Elsevier España, S.L.U. All rights reserved.
Savini, Lara; Tora, Susanna; Di Lorenzo, Alessio; Cioci, Daniela; Monaco, Federica; Polci, Andrea; Orsini, Massimiliano; Calistri, Paolo; Conte, Annamaria
2018-01-01
In recent decades an increasing number of West Nile disease cases has been observed in equines and humans in the Mediterranean basin, and surveillance systems have been set up in numerous countries to manage and control the disease. The collection, storage and distribution of information on the spread of the disease become important for a shared intervention and control strategy. To this end, a Web Geographic Information System has been developed, and disease data, climatic and environmental remotely sensed data, and full genome sequences of selected isolated strains are made available. This paper describes the Disease Monitoring Dashboard (DMD) web system application, the tools available for the preliminary analysis of climatic and environmental factors, and the other interactive tools for epidemiological analysis. WNV occurrence data are collected from multiple official and unofficial sources. Whole genome sequences and metadata of WNV strains are retrieved from public databases or generated in the framework of the Italian surveillance activities. Climatic and environmental data are provided by the NASA website. The Geographical Information System is composed of an Oracle 10g database and ESRI ArcGIS Server 10.03; the web mapping client application is developed with the ArcGIS API for JavaScript and the Phylocanvas library to facilitate and optimize the mash-up approach. ESRI ArcSDE 10.1 has been used to store spatial data. The DMD application is accessible through a generic web browser at https://netmed.izs.it/networkMediterraneo/. The system collects data through on-line forms and automated procedures and visualizes data as interactive graphs, maps and tables. The spatial and temporal dynamic visualization of disease events is managed by a time slider that returns results on both the map and the epidemiological curve. Climatic and environmental data can be associated with cases through Python procedures and downloaded as Excel files. The system compiles multiple datasets through user-friendly web tools; it integrates entomological, veterinary and human surveillance, molecular information on pathogens, and environmental and climatic data. The principal result of the DMD development is the transfer and dissemination of knowledge and technologies to develop strategies for integrated prevention and control measures of animal and human diseases.
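The Python procedures mentioned above, which associate climatic and environmental data with cases before export, can be illustrated with a minimal pandas sketch. This is not the DMD code; the column names, site identifiers and values are hypothetical.

# Illustrative join of case records with climatic records by site and date.
import pandas as pd

cases = pd.DataFrame({
    "case_id": [1, 2],
    "site_id": ["IT-APU-01", "IT-SAR-03"],
    "date": pd.to_datetime(["2017-08-02", "2017-08-15"]),
})
climate = pd.DataFrame({
    "site_id": ["IT-APU-01", "IT-SAR-03"],
    "date": pd.to_datetime(["2017-08-02", "2017-08-15"]),
    "lst_day_c": [31.2, 29.8],   # land surface temperature (example values)
    "ndvi": [0.41, 0.38],        # vegetation index (example values)
})
merged = cases.merge(climate, on=["site_id", "date"], how="left")
merged.to_excel("cases_with_climate.xlsx", index=False)  # requires openpyxl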
LookSeq: a browser-based viewer for deep sequencing data.
Manske, Heinrich Magnus; Kwiatkowski, Dominic P
2009-11-01
Sequencing a genome to great depth can be highly informative about heterogeneity within an individual or a population. Here we address the problem of how to visualize the multiple layers of information contained in deep sequencing data. We propose an interactive AJAX-based web viewer for browsing large data sets of aligned sequence reads. By enabling seamless browsing and fast zooming, the LookSeq program assists the user to assimilate information at different levels of resolution, from an overview of a genomic region to fine details such as heterogeneity within the sample. A specific problem, particularly if the sample is heterogeneous, is how to depict information about structural variation. LookSeq provides a simple graphical representation of paired sequence reads that is more revealing about potential insertions and deletions than are conventional methods.
Klinkenberg-Ramirez, Stephanie; Neri, Pamela M; Volk, Lynn A; Samaha, Sara J; Newmark, Lisa P; Pollard, Stephanie; Varugheese, Matthew; Baxter, Samantha; Aronson, Samuel J; Rehm, Heidi L; Bates, David W
2016-01-01
Partners HealthCare Personalized Medicine developed GeneInsight Clinic (GIC), a tool designed to communicate updated variant information from laboratory geneticists to treating clinicians through automated alerts, categorized by level of variant interpretation change. The study aimed to evaluate feedback from the initial users of the GIC, including the advantages and challenges to receiving this variant information and using this technology at the point of care. Healthcare professionals from two clinics that ordered genetic testing for cardiomyopathy and related disorders were invited to participate in one-hour semi-structured interviews and/or a one-hour focus group. Using a Grounded Theory approach, transcript concepts were coded and organized into themes. Two genetic counselors and two physicians from two treatment clinics participated in individual interviews. Focus group participants included one genetic counselor and four physicians. Analysis resulted in 8 major themes related to structuring and communicating variant knowledge, GIC's impact on the clinic, and suggestions for improvements. The interview analysis identified longitudinal patient care, family data, and growth in genetic testing content as potential challenges to optimization of the GIC infrastructure. Participants agreed that GIC implementation increased efficiency and effectiveness of the clinic through increased access to genetic variant information at the point of care. Development of information technology (IT) infrastructure to aid in the organization and management of genetic variant knowledge will be critical as the genetic field moves towards whole exome and whole genome sequencing. Findings from this study could be applied to future development of IT support for genetic variant knowledge management that would serve to improve clinicians' ability to manage and care for patients.
Disclosing medical mistakes: a communication management plan for physicians.
Petronio, Sandra; Torke, Alexia; Bosslet, Gabriel; Isenberg, Steven; Wocial, Lucia; Helft, Paul R
2013-01-01
There is a growing consensus that disclosure of medical mistakes is ethically and legally appropriate, but such disclosures are made difficult by medical traditions of concern about medical malpractice suits and by physicians' own emotional reactions. Because the physician may have compelling reasons both to keep the information private and to disclose it to the patient or family, these situations can be conceptualized as privacy dilemmas. These dilemmas may create barriers to effectively addressing the mistake and its consequences. Although a number of interventions exist to address privacy dilemmas that physicians face, current evidence suggests that physicians tend to be slow to adopt the practice of disclosing medical mistakes. This discussion proposes a theoretically based, streamlined, two-step plan that physicians can use as an initial guide for conversations with patients about medical mistakes. The mistake disclosure management plan uses the communication privacy management theory. The steps are 1) physician preparation, such as talking about the physician's emotions and seeking information about the mistake, and 2) use of mistake disclosure strategies that protect the physician-patient relationship. These include the optimal timing, context of disclosure delivery, content of mistake messages, sequencing, and apology. A case study highlighted the disclosure process. This Mistake Disclosure Management Plan may help physicians in the early stages after mistake discovery to prepare for the initial disclosure of a medical mistake. The next step is testing implementation of the procedures suggested.
Haplotype estimation using sequencing reads.
Delaneau, Olivier; Howie, Bryan; Cox, Anthony J; Zagury, Jean-François; Marchini, Jonathan
2013-10-03
High-throughput sequencing technologies produce short sequence reads that can contain phase information if they span two or more heterozygote genotypes. This information is not routinely used by current methods that infer haplotypes from genotype data. We have extended the SHAPEIT2 method to use phase-informative sequencing reads to improve phasing accuracy. Our model incorporates the read information in a probabilistic model through base quality scores within each read. The method is primarily designed for high-coverage sequence data or data sets that already have genotypes called. One important application is phasing of single samples sequenced at high coverage for use in medical sequencing and studies of rare diseases. Our method can also use existing panels of reference haplotypes. We tested the method by using a mother-father-child trio sequenced at high-coverage by Illumina together with the low-coverage sequence data from the 1000 Genomes Project (1000GP). We found that use of phase-informative reads increases the mean distance between switch errors by 22% from 274.4 kb to 328.6 kb. We also used male chromosome X haplotypes from the 1000GP samples to simulate sequencing reads with varying insert size, read length, and base error rate. When using short 100 bp paired-end reads, we found that using mixtures of insert sizes produced the best results. When using longer reads with high error rates (5-20 kb read with 4%-15% error per base), phasing performance was substantially improved. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
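The switch-error spacing reported above can be illustrated with a small Python sketch. This is a simplified version of the metric, not the SHAPEIT2 evaluation code: haplotypes are given as 0/1 alleles at heterozygous sites, a switch is counted wherever the inferred phase flips relative to the truth, and the mean distance between switches is approximated as the spanned length divided by the number of switches plus one.

# Simplified switch-error count and mean spacing between switch errors.
def switch_errors(positions, truth_hap, inferred_hap):
    """positions, truth_hap, inferred_hap: per-heterozygous-site lists (0/1 alleles)."""
    switches = []
    phase_flipped = truth_hap[0] != inferred_hap[0]
    for pos, t, i in zip(positions[1:], truth_hap[1:], inferred_hap[1:]):
        flipped_here = t != i
        if flipped_here != phase_flipped:     # phase changed relative to the truth
            switches.append(pos)
            phase_flipped = flipped_here
    return switches

def mean_switch_distance(positions, switches):
    span = positions[-1] - positions[0]
    return span / (len(switches) + 1)         # approximate mean distance between switches

pos = [1_000, 55_000, 120_000, 260_000, 410_000]
truth = [0, 1, 0, 0, 1]
inferred = [0, 1, 1, 1, 0]                    # phase flips after the second site
sw = switch_errors(pos, truth, inferred)
print(sw, round(mean_switch_distance(pos, sw), 1))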
Building information models for astronomy projects
NASA Astrophysics Data System (ADS)
Ariño, Javier; Murga, Gaizka; Campo, Ramón; Eletxigerra, Iñigo; Ampuero, Pedro
2012-09-01
A Building Information Model is a digital representation of physical and functional characteristics of a building. BIMs represent the geometrical characteristics of the building, but also properties like bills of quantities, definition of COTS components, status of material in the different stages of the project, project economic data, etc. The BIM methodology, which is well established in the Architecture Engineering and Construction (AEC) domain for conventional buildings, has been brought one step forward in its application for Astronomical/Scientific facilities. In these facilities, steel/concrete structures have high dynamic and seismic requirements, M&E installations are complex and there is a large amount of special equipment and mechanisms involved as a fundamental part of the facility. The detailed design definition is typically implemented by different design teams in specialized design software packages. In order to allow the coordinated work of different engineering teams, the overall model, and its associated engineering database, is progressively integrated using coordination and roaming software that can be used before the construction phase starts for checking interferences, planning the construction sequence, studying maintenance operations, reporting to the project office, etc. This integrated design and construction approach allows efficient planning of the construction sequence (4D). This is a powerful tool to study and analyze in detail alternative construction sequences and ideally coordinate the work of different construction teams. In addition, the engineering, construction and operational databases can be linked to the virtual model (6D), which gives end users an invaluable tool for lifecycle management, as all the facility information can be easily accessed, added or replaced. This paper presents the BIM methodology as implemented by IDOM with the E-ELT and ATST Enclosures as application examples.
2010-01-01
Background Primer and probe sequences are the main components of nucleic acid-based detection systems. Biologists use primers and probes for different tasks, some related to the diagnosis and prescription of infectious diseases. The biological literature is the main information source for empirically validated primer and probe sequences. Therefore, it is becoming increasingly important for researchers to navigate this important information. In this paper, we present a four-phase method for extracting and annotating primer/probe sequences from the literature. These phases are: (1) convert each document into a tree of paper sections, (2) detect the candidate sequences using a set of finite state machine-based recognizers, (3) refine problem sequences using a rule-based expert system, and (4) annotate the extracted sequences with their related organism/gene information. Results We tested our approach using a test set composed of 297 manuscripts. The extracted sequences and their organism/gene annotations were manually evaluated by a panel of molecular biologists. The results of the evaluation show that our approach is suitable for automatically extracting DNA sequences, achieving precision/recall rates of 97.98% and 95.77%, respectively. In addition, 76.66% of the detected sequences were correctly annotated with their organism name. The system also provided correct gene-related information for 46.18% of the sequences assigned a correct organism name. Conclusions We believe that the proposed method can facilitate routine tasks for biomedical researchers using molecular methods to diagnose and prescribe different infectious diseases. In addition, the proposed method can be expanded to detect and extract other biological sequences from the literature. The extracted information can also be used to readily update available primer/probe databases or to create new databases from scratch. PMID:20682041
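The candidate-detection phase (phase 2) can be illustrated with a much cruder recognizer than the paper's finite-state machines: a single regular expression that flags primer-length runs of nucleotide (and IUPAC ambiguity) characters in running text. The pattern, the length limits and the example sentence are assumptions introduced here for illustration only.

    import re

    # Crude stand-in for the paper's finite-state-machine recognizers: flag runs
    # of nucleotide characters (including IUPAC ambiguity codes) of primer-like
    # length found in running text. Case-insensitive matching makes it noisy by design.
    CANDIDATE = re.compile(r"\b[ACGTUMRWSYKVHDBN]{15,35}\b", re.IGNORECASE)

    def candidate_primers(text):
        """Return candidate primer/probe sequences detected in a text passage."""
        return [m.group(0).upper() for m in CANDIDATE.finditer(text)]

    sentence = ("The forward primer 5'-ACTGGTCAAGCTTGGATCCAT-3' was used together "
                "with a TaqMan probe.")
    print(candidate_primers(sentence))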
Speiser, Daniel I; Pankey, M Sabrina; Zaharoff, Alexander K; Battelle, Barbara A; Bracken-Grissom, Heather D; Breinholt, Jesse W; Bybee, Seth M; Cronin, Thomas W; Garm, Anders; Lindgren, Annie R; Patel, Nipam H; Porter, Megan L; Protas, Meredith E; Rivera, Ajna S; Serb, Jeanne M; Zigler, Kirk S; Crandall, Keith A; Oakley, Todd H
2014-11-19
Tools for high throughput sequencing and de novo assembly make the analysis of transcriptomes (i.e. the suite of genes expressed in a tissue) feasible for almost any organism. Yet a challenge for biologists is that it can be difficult to assign identities to gene sequences, especially from non-model organisms. Phylogenetic analyses are one useful method for assigning identities to these sequences, but such methods tend to be time-consuming because of the need to re-calculate trees for every gene of interest and each time a new data set is analyzed. In response, we employed existing tools for phylogenetic analysis to produce a computationally efficient, tree-based approach for annotating transcriptomes or new genomes that we term Phylogenetically-Informed Annotation (PIA), which places uncharacterized genes into pre-calculated phylogenies of gene families. We generated maximum likelihood trees for 109 genes from a Light Interaction Toolkit (LIT), a collection of genes that underlie the function or development of light-interacting structures in metazoans. To do so, we searched protein sequences predicted from 29 fully-sequenced genomes and built trees using tools for phylogenetic analysis in the Osiris package of Galaxy (an open-source workflow management system). Next, to rapidly annotate transcriptomes from organisms that lack sequenced genomes, we repurposed a maximum likelihood-based Evolutionary Placement Algorithm (implemented in RAxML) to place sequences of potential LIT genes on to our pre-calculated gene trees. Finally, we implemented PIA in Galaxy and used it to search for LIT genes in 28 newly-sequenced transcriptomes from the light-interacting tissues of a range of cephalopod mollusks, arthropods, and cubozoan cnidarians. Our new trees for LIT genes are available on the Bitbucket public repository ( http://bitbucket.org/osiris_phylogenetics/pia/ ) and we demonstrate PIA on a publicly-accessible web server ( http://galaxy-dev.cnsi.ucsb.edu/pia/ ). Our new trees for LIT genes will be a valuable resource for researchers studying the evolution of eyes or other light-interacting structures. We also introduce PIA, a high throughput method for using phylogenetic relationships to identify LIT genes in transcriptomes from non-model organisms. With simple modifications, our methods may be used to search for different sets of genes or to annotate data sets from taxa outside of Metazoa.
Bastien, Olivier; Maréchal, Eric
2008-08-07
Confidence in pairwise alignments of biological sequences, obtained by various methods such as Blast or Smith-Waterman, is critical for automatic analyses of genomic data. Two statistical models have been proposed. In the asymptotic limit of long sequences, the Karlin-Altschul model is based on the computation of a P-value, assuming that the number of high-scoring matching regions above a threshold is Poisson distributed. Alternatively, the Lipman-Pearson model is based on the computation of a Z-value from a random score distribution obtained by a Monte-Carlo simulation. Z-values allow the deduction of an upper bound of the P-value (1/Z-value²) following the TULIP theorem. Simulated Z-value distributions are known to fit a Gumbel law. This remarkable property had not been demonstrated and had no obvious biological support. We built a model of evolution of sequences based on aging, as meant in Reliability Theory, using the fact that the amount of information shared between an initial sequence and the sequences in its lineage (i.e., mutual information in Information Theory) is a decreasing function of time. This quantity is simply measured by a sequence alignment score. In systems aging, the failure rate is related to the system's longevity. The system can be a machine with structured components, or a living entity or population. "Reliability" refers to the ability to operate properly according to a standard. Here, the "reliability" of a sequence refers to the ability to conserve a sufficient functional level at the folded and maturated protein level (positive selection pressure). Homologous sequences were considered as systems 1) having a high redundancy of information reflected by the magnitude of their alignment scores, 2) whose components are the amino acids, which can independently be damaged by random DNA mutations. From these assumptions, we deduced that information shared at each amino acid position evolved with a constant rate, corresponding to the information hazard rate, and that pairwise sequence alignment scores should follow a Gumbel distribution, whose parameters could be given a theoretical rationale. In particular, one parameter corresponds to the information hazard rate. Extreme value distribution of alignment scores, assessed from high-scoring segment pairs following the Karlin-Altschul model, can also be deduced from the Reliability Theory applied to molecular sequences. It reflects the redundancy of information between homologous sequences, under functional conservative pressure. This model also provides a link between concepts of biological sequence analysis and of systems biology.
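A minimal sketch of the Lipman-Pearson Z-value and the TULIP upper bound P ≤ 1/Z² is given below. A toy identity score over equal-length sequences stands in for a real Smith-Waterman or Blast alignment score, and the example sequences, shuffle count and random seed are arbitrary.

    import random
    import statistics

    def identity_score(a, b):
        """Toy stand-in for an alignment score: number of identical positions
        between two equal-length sequences (a real Z-value computation would
        use Smith-Waterman or Blast scores)."""
        return sum(x == y for x, y in zip(a, b))

    def z_value(a, b, n_shuffles=500, seed=0):
        """Monte-Carlo Z-value of the score of (a, b) against scores obtained by
        shuffling b, in the spirit of the Lipman-Pearson model."""
        rng = random.Random(seed)
        observed = identity_score(a, b)
        null = []
        for _ in range(n_shuffles):
            shuffled = list(b)
            rng.shuffle(shuffled)
            null.append(identity_score(a, shuffled))
        return (observed - statistics.mean(null)) / statistics.stdev(null)

    a = "MKVLITGAGSGIGLEAARQLA"
    b = "MKVLVTGAGSGIGKEAALQLA"
    z = z_value(a, b)
    print(z, "P-value upper bound:", 1.0 / z**2)   # TULIP theorem: P <= 1/Z^2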
Chung, H Y; Choi, Y C; Park, H N
2015-05-18
We investigated the phylogenetic relationships between pig breeds, compared the genetic similarity between humans and pigs, and provided basic genetic information on Korean native pigs (KNPs), using genetic variants of the swine leukocyte antigen 3 (SLA-3) gene. Primers were based on sequences from GenBank (accession Nos. AF464010 and AF464009). Polymerase chain reaction analysis amplified approximately 1727 bp of segments, which contained 1086 bp of coding regions and 641 bp of the 3'- and 5'-untranslated regions. Bacterial artificial chromosome clones of miniature pigs were used for sequencing the SLA-3 genomic region, which was 3114 bp in total length, including the coding (1086 bp) and non-coding (2028 bp) regions. Sequence analysis detected 53 single nucleotide polymorphisms (SNPs), based on a minor allele frequency greater than 0.01, which is low compared with other pig breeds, and the results suggest that there is low genetic variability in KNPs. Comparative analysis revealed that humans possess approximately three times more genetic variation than do pigs. Approximately 71% of SNPs in exons 2 and 3 were detected in KNPs, and exon 5 in humans is a highly polymorphic region. Newly identified sequences of SLA-3 using KNPs were submitted to GenBank (accession No. DQ992512-18). Cluster analysis revealed that KNPs were grouped according to three major alleles: SLA-3*0502 (DQ992518), SLA-3*0302 (DQ992513 and DQ992516), and SLA-3*0303 (DQ992512, DQ992514, DQ992515, and DQ992517). Alignments revealed that humans have a relatively close genetic relationship with pigs and chimpanzees. The information provided by this study may be useful in KNP management.
Park, Bongsoo; Park, Jongsun; Cheong, Kyeong-Chae; Choi, Jaeyoung; Jung, Kyongyong; Kim, Donghan; Lee, Yong-Hwan; Ward, Todd J; O'Donnell, Kerry; Geiser, David M; Kang, Seogchan
2011-01-01
The fungal genus Fusarium includes many plant and/or animal pathogenic species and produces diverse toxins. Although accurate species identification is critical for managing such threats, it is difficult to identify Fusarium morphologically. Fortunately, extensive molecular phylogenetic studies, founded on well-preserved culture collections, have established a robust foundation for Fusarium classification. Genomes of four Fusarium species have been published with more being currently sequenced. The Cyber infrastructure for Fusarium (CiF; http://www.fusariumdb.org/) was built to support archiving and utilization of rapidly increasing data and knowledge and consists of Fusarium-ID, Fusarium Comparative Genomics Platform (FCGP) and Fusarium Community Platform (FCP). The Fusarium-ID archives phylogenetic marker sequences from most known species along with information associated with characterized isolates and supports strain identification and phylogenetic analyses. The FCGP currently archives five genomes from four species. Besides supporting genome browsing and analysis, the FCGP presents computed characteristics of multiple gene families and functional groups. The Cart/Favorite function allows users to collect sequences from Fusarium-ID and the FCGP and analyze them later using multiple tools without requiring repeated copying-and-pasting of sequences. The FCP is designed to serve as an online community forum for sharing and preserving accumulated experience and knowledge to support future research and education.
Promises, pitfalls and practicalities of prenatal whole exome sequencing.
Best, Sunayna; Wou, Karen; Vora, Neeta; Van der Veyver, Ignatia B; Wapner, Ronald; Chitty, Lyn S
2018-01-01
Prenatal genetic diagnosis provides information for pregnancy and perinatal decision-making and management. In several small series, prenatal whole exome sequencing (WES) approaches have identified genetic diagnoses when conventional tests (karyotype and microarray) were not diagnostic. Here, we review published prenatal WES studies and recent conference abstracts. Thirty-one studies were identified, with diagnostic rates in series of five or more fetuses varying between 6.2% and 80%. Differences in inclusion criteria and trio versus singleton approaches to sequencing largely account for the wide range of diagnostic rates. The data suggest that diagnostic yields will be greater in fetuses with multiple anomalies or in cases preselected following genetic review. Beyond its ability to improve diagnostic rates, we explore the potential of WES to improve understanding of prenatal presentations of genetic disorders and lethal fetal syndromes. We discuss prenatal phenotyping limitations, counselling challenges regarding variants of uncertain significance, incidental and secondary findings, and technical problems in WES. We review the practical, ethical, social and economic issues that must be considered before prenatal WES could become part of routine testing. Finally, we reflect upon the potential future of prenatal genetic diagnosis, including a move towards whole genome sequencing and non-invasive whole exome and whole genome testing. © 2017 John Wiley & Sons, Ltd.
High Throughput Sequencing for Detection of Foodborne Pathogens
Sekse, Camilla; Holst-Jensen, Arne; Dobrindt, Ulrich; Johannessen, Gro S.; Li, Weihua; Spilsberg, Bjørn; Shi, Jianxin
2017-01-01
High-throughput sequencing (HTS) is becoming the state-of-the-art technology for typing of microbial isolates, especially in clinical samples. Yet, its application is still in its infancy for monitoring and outbreak investigations of foods. Here we review the published literature, covering not only bacterial but also viral and Eukaryote food pathogens, to assess the status and potential of HTS implementation to inform stakeholders, improve food safety and reduce outbreak impacts. The developments in sequencing technology and bioinformatics have outpaced the capacity to analyze and interpret the sequence data. The influence of sample processing, nucleic acid extraction and purification, harmonized protocols for generation and interpretation of data, and properly annotated and curated reference databases including non-pathogenic “natural” strains are other major obstacles to the realization of the full potential of HTS in analytical food surveillance, epidemiological and outbreak investigations, and in complementing preventive approaches for the control and management of foodborne pathogens. Despite significant obstacles, the achieved progress in capacity and broadening of the application range over the last decade is impressive and unprecedented, as illustrated with the chosen examples from the literature. Large consortia, often with broad international participation, are making coordinated efforts to cope with many of the mentioned obstacles. Further rapid progress can therefore be prospected for the next decade. PMID:29104564
SIMBA: a web tool for managing bacterial genome assembly generated by Ion PGM sequencing technology.
Mariano, Diego C B; Pereira, Felipe L; Aguiar, Edgar L; Oliveira, Letícia C; Benevides, Leandro; Guimarães, Luís C; Folador, Edson L; Sousa, Thiago J; Ghosh, Preetam; Barh, Debmalya; Figueiredo, Henrique C P; Silva, Artur; Ramos, Rommel T J; Azevedo, Vasco A C
2016-12-15
The evolution of Next-Generation Sequencing (NGS) has considerably reduced the cost per sequenced base, allowing a significant rise in sequencing projects, mainly in prokaryotes. However, the range of available NGS platforms requires different strategies and software to correctly assemble genomes. Completing an assembly project properly also requires installing or modifying various software packages. This requires users to have significant expertise in this software and command-line scripting experience on Unix platforms, in addition to basic expertise in genome assembly methodologies and techniques. These difficulties often delay complete genome assembly projects. To overcome this, we developed SIMBA (SImple Manager for Bacterial Assemblies), a freely available web tool that integrates several component tools for assembling and finishing bacterial genomes. SIMBA provides a friendly and intuitive user interface so that bioinformaticians, even with little computational expertise, can work under a centralized administrative control system of assemblies managed by the assembly center head. SIMBA guides users through the assembly process via simple and interactive pages. The SIMBA workflow is divided into three modules: (i) projects, which gives a general view of genome sequencing projects, in addition to data quality analysis and data format conversions; (ii) assemblies, which allows de novo assemblies with the software Mira, Minia, Newbler and SPAdes, as well as assembly quality validation using the QUAST software; and (iii) curation, which provides methods for finishing assemblies through tools for scaffolding contigs and closing gaps. We also present a case study that validates the efficacy of SIMBA for managing bacterial assembly projects sequenced using the Ion Torrent PGM. Besides being a web tool for genome assembly, SIMBA is a complete genome assembly project management system, which can be useful for managing several projects in laboratories. SIMBA source code is available for download and installation on local web servers at http://ufmg-simba.sourceforge.net.
[Learning and Repetitive Reproduction of Memorized Sequences by the Right and the Left Hand].
Bobrova, E V; Lyakhovetskii, V A; Bogacheva, I N
2015-01-01
An important stage of learning a new skill is repetitive reproduction of one and the same sequence of movements, which plays a significant role in forming movement stereotypes. Two groups of right-handers repeatedly memorized (6-10 repetitions) sequences of hand positions (6 positions) set by the experimenter, first with the right hand (RH) and then with the left hand (LH), or vice versa. Random sequences previously unknown to the volunteers were reproduced in the 1st series. Modified sequences were tested in the 2nd and 3rd series, where the same element positions were presented in a different order. The processes of repetitive sequence reproduction were similar for the RH and LH. However, the learning of the modified sequences differed: information about element positions, disregarding the reproduction order, was used only when the LH initiated task performance. This information was not used when the LH followed the RH or when the RH performed the task. Consequently, the type of information coding activated by the LH helped learning of the positions of sequence elements, while the type of information coding activated by the RH prevented such learning. This is presumably connected with the predominant role of the right hemisphere in positional coding and motor learning.
Osmylated DNA, a novel concept for sequencing DNA using nanopores
NASA Astrophysics Data System (ADS)
Kanavarioti, Anastassia
2015-03-01
Sanger sequencing has led advances in molecular biology, while faster and cheaper next generation technologies are urgently needed. A newer approach exploits nanopores, natural or solid-state, set in an electrical field, and obtains base sequence information from current variations due to the passage of a ssDNA molecule through the pore. A hurdle in this approach is the fact that the four bases are chemically comparable to each other, which leads to small differences in current obstruction. 'Base calling' becomes even more challenging because most nanopores sense a short sequence and not individual bases. Perhaps sequencing DNA via nanopores would be more manageable if there were only two bases that were chemically very different from each other; a sequence of 1s and 0s comes to mind. Osmylated DNA comes close to such a sequence of 1s and 0s. Osmylation is the addition of osmium tetroxide bipyridine across the C5-C6 double bond of the pyrimidines. Osmylation adds almost 400% mass to the reactive base and creates a sterically and electronically notably different molecule, labeled 1, compared to the unreactive purines, labeled 0. If osmylated DNA were successfully sequenced, the result would be a sequence of osmylated pyrimidines (1) and purines (0), and not of the actual nucleobases. To solve this problem, we studied the osmylation reaction with short oligos and with M13mp18, a long ssDNA, developed a UV-vis assay to measure the extent of osmylation, and designed two protocols. Protocol A uses mild conditions and yields osmylated thymidines (1), while leaving the other three bases (0) practically intact. Protocol B uses harsher conditions and effectively osmylates both pyrimidines, but not the purines. Applying these two protocols also to the complementary strand of the target polynucleotide yields a total of four osmylated strands that collectively could define the actual base sequence of the target DNA.
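The mapping from a base sequence to the 1/0 pattern expected after each protocol can be sketched as follows; the function and example sequence are hypothetical, and the encoding simply follows the labeling convention described above (protocol A marks thymidines, protocol B marks both pyrimidines).

    def osmylation_pattern(seq, protocol="B"):
        """Map a DNA sequence onto the 1/0 pattern expected after osmylation.
        Protocol A labels only thymidines; protocol B labels both pyrimidines
        (T and C); purines stay unlabeled (0)."""
        labelled = {"A": "T", "B": "TC"}[protocol]
        return "".join("1" if base in labelled else "0" for base in seq.upper())

    target = "ATGCCATTGA"
    print(osmylation_pattern(target, "A"))   # 0100001100
    print(osmylation_pattern(target, "B"))   # 0101101100

    # Running both protocols on the complementary strand as well gives four
    # patterns that together constrain the underlying base sequence.
    complement = target.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]
    print(osmylation_pattern(complement, "A"), osmylation_pattern(complement, "B"))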
2011-01-01
Background One of the key goals of oak genomics research is to identify genes of adaptive significance. This information may help to improve the conservation of adaptive genetic variation and the management of forests to increase their health and productivity. Deep-coverage large-insert genomic libraries are a crucial tool for attaining this objective. We report herein the construction of a BAC library for Quercus robur, its characterization and an analysis of BAC end sequences. Results The EcoRI library generated consisted of 92,160 clones, 7% of which had no insert. Levels of chloroplast and mitochondrial contamination were below 3% and 1%, respectively. Mean clone insert size was estimated at 135 kb. The library represents 12 haploid genome equivalents and the likelihood of finding a particular oak sequence of interest is greater than 99%. Genome coverage was confirmed by PCR screening of the library with 60 unique genetic loci sampled from the genetic linkage map. In total, about 20,000 high-quality BAC end sequences (BESs) were generated by sequencing 15,000 clones. Roughly 5.88% of the combined BAC end sequence length corresponded to known retroelements, while ab initio repeat detection methods identified 41 additional repeats. Collectively, characterized and novel repeats account for roughly 8.94% of the genome. Further analysis of the BESs revealed 1,823 putative genes, suggesting at least 29,340 genes in the oak genome. BESs were aligned with the genome sequences of Arabidopsis thaliana, Vitis vinifera and Populus trichocarpa. One putative collinear microsyntenic region encoding an alcohol acyl transferase protein was observed between oak and chromosome 2 of V. vinifera. Conclusions This BAC library provides a new resource for genomic studies, including SSR marker development, physical mapping, comparative genomics and genome sequencing. BES analysis provided insight into the structure of the oak genome. These sequences will be used in the assembly of a future genome sequence for oak. PMID:21645357
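The >99% representation figure can be checked with the usual Clarke-Carbon style calculation, sketched below. The insert-bearing clone count is derived from the abstract (92,160 clones minus the 7% without inserts), while the ~900 Mb genome size is an assumption introduced here, so the resulting coverage differs slightly from the reported 12 genome equivalents.

    import math

    def prob_locus_in_library(n_clones, insert_kb, genome_mb):
        """Probability that a given single-copy locus is represented in a BAC
        library (Clarke-Carbon style: P = 1 - (1 - insert/genome)^N ~ 1 - e^-c,
        where c is the coverage in genome equivalents)."""
        coverage = n_clones * insert_kb * 1e3 / (genome_mb * 1e6)
        return coverage, 1.0 - math.exp(-coverage)

    # ~85,700 insert-bearing clones of ~135 kb against an assumed ~900 Mb genome
    cov, p = prob_locus_in_library(85_700, 135, 900)
    print(round(cov, 1), round(p, 6))   # ~12.9 genome equivalents, P ~ 0.999997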
Yu, Jeong-Nam; Won, Changman; Jun, Jumin; Lim, YoungWoon; Kwak, Myounghai
2011-01-01
Background Microsatellites, a special class of repetitive DNA sequence, have become one of the most popular genetic markers for population/conservation genetic studies. However, its application to endangered species has been impeded by high development costs, a lack of available sequences, and technical difficulties. The water deer Hydropotes inermis is the sole existing endangered species of the subfamily Capreolinae. Although population genetics studies are urgently required for conservation management, no species-specific microsatellite marker has been reported. Methods We adopted next-generation sequencing (NGS) to elucidate the microsatellite markers of Korean water deer and overcome these impediments on marker developments. We performed genotyping to determine the efficiency of this method as applied to population genetics. Results We obtained 98 Mbp of nucleotide information from 260,467 sequence reads. A total of 20,101 di-/tri-nucleotide repeat motifs were identified; di-repeats were 5.9-fold more common than tri-repeats. [CA]n and [AAC]n/[AAT]n repeats were the most frequent di- and tri-repeats, respectively. Of the 17,206 di-repeats, 12,471 microsatellite primer pairs were derived. PCR amplification of 400 primer pairs yielded 106 amplicons and 79 polymorphic markers from 20 individual Korean water deer. Polymorphic rates of the 79 new microsatellites varied from 2 to 11 alleles per locus (He: 0.050–0.880; Ho: 0.000–1.000), while those of known microsatellite markers transferred from cattle to Chinese water deer ranged from 4 to 6 alleles per locus (He: 0.279–0.714; Ho: 0.300–0.400). Conclusions Polymorphic microsatellite markers from Korean water deer were successfully identified using NGS without any prior sequence information and deposited into the public database. Thus, the methods described herein represent a rapid and low-cost way to investigate the population genetics of endangered/non-model species. PMID:22069476
From prenatal genomic diagnosis to fetal personalized medicine: progress and challenges
Bianchi, Diana W
2015-01-01
Thus far, the focus of personalized medicine has been the prevention and treatment of conditions that affect adults. Although advances in genetic technology have been applied more frequently to prenatal diagnosis than to fetal treatment, genetic and genomic information is beginning to influence pregnancy management. Recent developments in sequencing the fetal genome combined with progress in understanding fetal physiology using gene expression arrays indicate that we could have the technical capabilities to apply an individualized medicine approach to the fetus. Here I review recent advances in prenatal genetic diagnostics, the challenges associated with these new technologies and how the information derived from them can be used to advance fetal care. Historically, the goal of prenatal diagnosis has been to provide an informed choice to prospective parents. We are now at a point where that goal can and should be expanded to incorporate genetic, genomic and transcriptomic data to develop new approaches to fetal treatment. PMID:22772565
Modeling genome coverage in single-cell sequencing
Daley, Timothy; Smith, Andrew D.
2014-01-01
Motivation: Single-cell DNA sequencing is necessary for examining genetic variation at the cellular level, which remains hidden in bulk sequencing experiments. But because such experiments begin with such small amounts of starting material, the amount of information that is obtained from a single-cell sequencing experiment is highly sensitive to the choice of protocol employed and variability in library preparation. In particular, the fraction of the genome represented in single-cell sequencing libraries exhibits extreme variability due to quantitative biases in amplification and loss of genetic material. Results: We propose a method to predict the genome coverage of a deep sequencing experiment using information from an initial shallow sequencing experiment mapped to a reference genome. The observed coverage statistics are used in a non-parametric empirical Bayes Poisson model to estimate the gain in coverage from deeper sequencing. This approach allows researchers to know statistical features of deep sequencing experiments without actually sequencing deeply, providing a basis for optimizing and comparing single-cell sequencing protocols or screening libraries. Availability and implementation: The method is available as part of the preseq software package. Source code is available at http://smithlabresearch.org/preseq. Contact: andrewds@usc.edu Supplementary information: Supplementary material is available at Bioinformatics online. PMID:25107873
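A naive Poisson plug-in extrapolation illustrates the kind of prediction described above; it is not the preseq empirical Bayes estimator. Each position's observed count in a shallow run is treated as its Poisson rate, and the toy count histogram below is invented.

    import math
    from collections import Counter

    def predicted_coverage(per_base_counts, depth_factor):
        """Naive plug-in extrapolation (not the preseq estimator): treat each
        position's observed count in a shallow run as its Poisson rate and
        predict the fraction of positions covered after sequencing
        depth_factor times deeper."""
        hist = Counter(per_base_counts)
        total = sum(hist.values())
        covered = sum(n * (1.0 - math.exp(-depth_factor * count))
                      for count, n in hist.items())
        return covered / total

    # toy shallow run: many dropout positions (amplification loss), some covered
    shallow = [0] * 600 + [1] * 250 + [2] * 100 + [5] * 50
    for factor in (2, 5, 10):
        # predicted covered fraction after sequencing 2x, 5x and 10x deeper
        print(factor, round(predicted_coverage(shallow, factor), 3))

Because positions unseen in the shallow run are assigned a zero rate, this plug-in never lets them gain coverage; handling that zero class properly is exactly the sort of problem a non-parametric empirical Bayes treatment of the rate distribution is meant to address.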
HepSEQ: International Public Health Repository for Hepatitis B
Gnaneshan, Saravanamuttu; Ijaz, Samreen; Moran, Joanne; Ramsay, Mary; Green, Jonathan
2007-01-01
HepSEQ is a repository for an extensive library of public health and molecular data relating to hepatitis B virus (HBV) infection collected from international sources. It is hosted by the Centre for Infections, Health Protection Agency (HPA), England, United Kingdom. This repository has been developed as a web-enabled, quality-controlled database to act as a tool for surveillance, HBV case management and for research. The web front-end for the database system can be accessed from . The format of the database system allows for comprehensive molecular, clinical and epidemiological data to be deposited into a functional database, to search and manipulate the stored data and to extract and visualize the information on epidemiological, virological, clinical, nucleotide sequence and mutational aspects of HBV infection through web front-end. Specific tools, built into the database, can be utilized to analyse deposited data and provide information on HBV genotype, identify mutations with known clinical significance (e.g. vaccine escape, precore and antiviral-resistant mutations) and carry out sequence homology searches against other deposited strains. Further mechanisms are also in place to allow specific tailored searches of the database to be undertaken. PMID:17130143
Sequence information gain based motif analysis.
Maynou, Joan; Pairó, Erola; Marco, Santiago; Perera, Alexandre
2015-11-09
The detection of regulatory regions in candidate sequences is essential for the understanding of the regulation of a particular gene and the mechanisms involved. This paper proposes a novel methodology based on information theoretic metrics for finding regulatory sequences in promoter regions. This methodology (SIGMA) has been tested on genomic sequence data for Homo sapiens and Mus musculus. SIGMA has been compared with different publicly available alternatives for motif detection, such as MEME/MAST, Biostrings (Bioconductor package), MotifRegressor, and previous work such as Qresiduals projections or information-theoretic detectors. Comparative results, in the form of Receiver Operating Characteristic curves, show how, in 70% of the studied Transcription Factor Binding Sites, the SIGMA detector has a better performance and behaves more robustly than the methods compared, while having a similar computational time. The performance of SIGMA can be explained by its parametric simplicity in the modelling of the non-linear co-variability in the binding motif positions. Sequence Information Gain based Motif Analysis is a generalisation of a non-linear model of cis-regulatory sequence detection based on Information Theory. This generalisation allows us to detect transcription factor binding sites with maximum performance disregarding the covariability observed in the positions of the training set of sequences. SIGMA is freely available to the public at http://b2slab.upc.edu.
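The information-theoretic ingredient common to such detectors can be illustrated by the per-position information content of an aligned set of binding sites relative to a uniform background. This is a generic position-weight-matrix calculation, not the SIGMA algorithm itself, and the example sites are hypothetical.

    import math
    from collections import Counter

    def position_information(sites, background=0.25):
        """Per-position information content (bits) of an aligned set of binding
        sites relative to a uniform background:
        IC_j = sum_b f_bj * log2(f_bj / background)."""
        length = len(sites[0])
        ics = []
        for j in range(length):
            counts = Counter(site[j] for site in sites)
            total = sum(counts.values())
            ic = sum((c / total) * math.log2((c / total) / background)
                     for c in counts.values())
            ics.append(ic)
        return ics

    sites = ["TATAAT", "TATAAT", "TACAAT", "TATACT", "CATAAT"]
    print([round(ic, 2) for ic in position_information(sites)])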
USDA-ARS?s Scientific Manuscript database
The ARS Microbial Genome Sequence Database (http://199.133.98.43), a web-based database server, was established utilizing the BIGSdb (Bacterial Isolate Genomics Sequence Database) software package, developed at Oxford University, as a tool to manage multi-locus sequence data for the family Streptomy...
Robert G. Haight; J. Douglas Brodie; Darius M. Adams
1985-01-01
The determination of an optimal sequence of diameter distributions and selection harvests for uneven-aged stand management is formulated as a discrete-time optimal-control problem with bounded control variables and free-terminal point. An efficient programming technique utilizing gradients provides solutions that are stable and interpretable on the basis of economic...
ERIC Educational Resources Information Center
Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.
This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the state's marketing management technology program. Presented in the introduction are a program description and suggested course sequence. Section I lists baseline competencies for the…
Knowlton, Michelle N; Li, Tongbin; Ren, Yongliang; Bill, Brent R; Ellis, Lynda Bm; Ekker, Stephen C
2008-01-07
The zebrafish is a powerful model vertebrate amenable to high-throughput in vivo genetic analyses. Examples include reverse genetic screens using morpholino knockdown, expression-based screening using enhancer trapping and forward genetic screening using transposon insertional mutagenesis. We have created a database to facilitate web-based distribution of data from such genetic studies. The MOrpholino DataBase is a MySQL relational database with an online, PHP interface. Multiple quality control levels allow differential access to data in raw and finished formats. MODBv1 includes sequence information relating to almost 800 morpholinos and their targets and phenotypic data regarding the dose effect of each morpholino (mortality, toxicity and defects). To improve the searchability of this database, we have incorporated a fixed-vocabulary defect ontology that allows for the organization of morpholino effects based on anatomical structure affected and defect produced. This also allows comparison between species utilizing Phenotypic Attribute Trait Ontology (PATO) designated terminology. MODB is also cross-linked with ZFIN, allowing full searches between the two databases. MODB offers users the ability to retrieve morpholino data by sequence of morpholino or target, name of target, anatomical structure affected and defect produced. MODB data can be used for functional genomic analysis of morpholino design to maximize efficacy and minimize toxicity. MODB also serves as a template for future sequence-based functional genetic screen databases, and it is currently being used as a model for the creation of a mutagenic insertional transposon database.
DraGnET: Software for storing, managing and analyzing annotated draft genome sequence data
2010-01-01
Background New "next generation" DNA sequencing technologies offer individual researchers the ability to rapidly generate large amounts of genome sequence data at dramatically reduced costs. As a result, a need has arisen for new software tools for storage, management and analysis of genome sequence data. Although bioinformatic tools are available for the analysis and management of genome sequences, limitations still remain. For example, restrictions on the submission of data and use of these tools may be imposed, thereby making them unsuitable for sequencing projects that need to remain in-house or proprietary during their initial stages. Furthermore, the availability and use of next generation sequencing in industrial, governmental and academic environments requires biologist to have access to computational support for the curation and analysis of the data generated; however, this type of support is not always immediately available. Results To address these limitations, we have developed DraGnET (Draft Genome Evaluation Tool). DraGnET is an open source web application which allows researchers, with no experience in programming and database management, to setup their own in-house projects for storing, retrieving, organizing and managing annotated draft and complete genome sequence data. The software provides a web interface for the use of BLAST, allowing users to perform preliminary comparative analysis among multiple genomes. We demonstrate the utility of DraGnET for performing comparative genomics on closely related bacterial strains. Furthermore, DraGnET can be further developed to incorporate additional tools for more sophisticated analyses. Conclusions DraGnET is designed for use either by individual researchers or as a collaborative tool available through Internet (or Intranet) deployment. For genome projects that require genome sequencing data to initially remain proprietary, DraGnET provides the means for researchers to keep their data in-house for analysis using local programs or until it is made publicly available, at which point it may be uploaded to additional analysis software applications. The DraGnET home page is available at http://www.dragnet.cvm.iastate.edu and includes example files for examining the functionalities, a link for downloading the DraGnET setup package and a link to the DraGnET source code hosted with full documentation on SourceForge. PMID:20175920
Cao, Youfang; Wang, Lianjie; Xu, Kexue; Kou, Chunhai; Zhang, Yulei; Wei, Guifang; He, Junjian; Wang, Yunfang; Zhao, Liping
2005-07-26
A new algorithm for assessing similarity between primer and template has been developed based on the hypothesis that annealing of primer to template is an information transfer process. Primer sequence is converted to a vector of the full potential hydrogen numbers (3 for G or C, 2 for A or T), while template sequence is converted to a vector of the actual hydrogen bond numbers formed after primer annealing. The former is considered as source information and the latter destination information. An information coefficient is calculated as a measure for fidelity of this information transfer process and thus a measure of similarity between primer and potential annealing site on template. Successful prediction of PCR products from whole genomic sequences with a computer program based on the algorithm demonstrated the potential of this new algorithm in areas like in silico PCR and gene finding.
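The primer-to-vector conversion described above can be sketched as follows. The abstract does not give the formula for the information coefficient itself, so the similarity returned here (fraction of potential hydrogen bonds actually formed) is only a placeholder, the zero-bond treatment of mismatches is an assumption, and the base-by-base comparison ignores strand orientation for simplicity.

    H_BONDS = {"G": 3, "C": 3, "A": 2, "T": 2}

    def hydrogen_bond_vectors(primer, template_site):
        """Convert a primer and its annealing site into vectors of potential and
        actual hydrogen-bond numbers (3 for G/C, 2 for A/T). A mismatched
        position is assumed to form zero bonds."""
        source = [H_BONDS[b] for b in primer.upper()]
        complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
        destination = [H_BONDS[p] if complement[p] == t else 0
                       for p, t in zip(primer.upper(), template_site.upper())]
        return source, destination

    def bond_similarity(primer, template_site):
        """Placeholder similarity: fraction of potential hydrogen bonds actually
        formed (stands in for the paper's information coefficient)."""
        source, destination = hydrogen_bond_vectors(primer, template_site)
        return sum(destination) / sum(source)

    print(bond_similarity("GCATTCGA", "CGTAAGCA"))   # one terminal mismatch, 0.9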
Lindberg, D A; Humphreys, B L
1995-01-01
The High-Performance Computing and Communications (HPCC) program is a multiagency federal effort to advance the state of computing and communications and to provide the technologic platform on which the National Information Infrastructure (NII) can be built. The HPCC program supports the development of high-speed computers, high-speed telecommunications, related software and algorithms, education and training, and information infrastructure technology and applications. The vision of the NII is to extend access to high-performance computing and communications to virtually every U.S. citizen so that the technology can be used to improve the civil infrastructure, lifelong learning, energy management, health care, etc. Development of the NII will require resolution of complex economic and social issues, including information privacy. Health-related applications supported under the HPCC program and NII initiatives include connection of health care institutions to the Internet; enhanced access to gene sequence data; the "Visible Human" Project; and test-bed projects in telemedicine, electronic patient records, shared informatics tool development, and image systems. PMID:7614116
State-dependent resource harvesting with lagged information about system states
Johnson, Fred A.; Fackler, Paul L.; Boomer, G Scott; Zimmerman, Guthrie S.; Williams, Byron K.; Nichols, James D.; Dorazio, Robert
2016-01-01
Markov decision processes (MDPs), which involve a temporal sequence of actions conditioned on the state of the managed system, are increasingly being applied in natural resource management. This study focuses on the modification of a traditional MDP to account for those cases in which an action must be chosen after a significant time lag in observing system state, but just prior to a new observation. In order to calculate an optimal decision policy under these conditions, possible actions must be conditioned on the previous observed system state and action taken. We show how to solve these problems when the state transition structure is known and when it is uncertain. Our focus is on the latter case, and we show how actions must be conditioned not only on the previous system state and action, but on the probabilities associated with alternative models of system dynamics. To demonstrate this framework, we calculated and simulated optimal, adaptive policies for MDPs with lagged states for the problem of deciding annual harvest regulations for mallards (Anas platyrhynchos) in the United States. In this particular example, changes in harvest policy induced by the use of lagged information about system state were sufficient to maintain expected management performance (e.g. population size, harvest) even in the face of an uncertain system state at the time of a decision.
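The core modification, conditioning the decision on the previously observed state and the previous action, can be sketched as value iteration over that augmented state space. This is a minimal illustration with invented numbers, not the authors' adaptive framework for model uncertainty, and it assumes the new observation arrives immediately after each action is taken.

    # Minimal sketch: value iteration over the augmented state
    # (previous observed state, previous action). All numbers are hypothetical.
    def lagged_value_iteration(P, R, gamma=0.95, sweeps=500):
        """P[s][a][s2]: transition probabilities; R[s][a]: reward for action a when
        the true current state is s. Returns values and a policy indexed by the
        augmented state (previous state, previous action)."""
        states = range(len(P))
        actions = range(len(P[0]))
        V = {(s, a): 0.0 for s in states for a in actions}
        for _ in range(sweeps):
            V = {(sp, ap): max(
                    sum(P[sp][ap][s] * (R[s][a] + gamma * V[(s, a)]) for s in states)
                    for a in actions)
                 for sp in states for ap in actions}
        policy = {(sp, ap): max(
                      actions,
                      key=lambda a: sum(P[sp][ap][s] * (R[s][a] + gamma * V[(s, a)])
                                        for s in states))
                  for sp in states for ap in actions}
        return V, policy

    # two population states (0 = low, 1 = high), two actions (0 = restrictive,
    # 1 = liberal harvest); transition and reward values are invented
    P = [[[0.8, 0.2], [0.9, 0.1]],   # transitions when the true state is low
         [[0.3, 0.7], [0.6, 0.4]]]   # transitions when the true state is high
    R = [[0.0, -5.0],                # liberal harvest in a low state is costly
         [2.0, 10.0]]
    V, policy = lagged_value_iteration(P, R)
    print(policy)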
Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor; Essex, M
2015-05-01
To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice.
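The subdivision into sliding windows can be reproduced in a few lines; the alignment length and step size below are guesses chosen only so that a ~9.8 kb alignment yields the 99 windows of 1,000 bp mentioned above, and the coordinates actually used in the study are not given here.

    def sliding_windows(alignment_length, window_size, step):
        """Start/end coordinates (0-based, end-exclusive) of sliding windows over
        an alignment, as used to compare subgenomic regions."""
        return [(start, start + window_size)
                for start in range(0, alignment_length - window_size + 1, step)]

    # an HIV-1 genome alignment of ~9.8 kb cut into overlapping 1,000 bp windows
    windows = sliding_windows(9_800, 1_000, 89)
    print(len(windows), windows[:2], windows[-1])   # 99 windows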
Sundin, George W; Wang, Nian; Charkowski, Amy O; Castiblanco, Luisa F; Jia, Hongge; Zhao, Youfu
2016-10-01
The advent of genomics has advanced science into a new era, providing a plethora of "toys" for researchers in many related and disparate fields. Genomics has also spawned many new fields, including proteomics and metabolomics, furthering our ability to gain a more comprehensive view of individual organisms and of interacting organisms. Genomic information of both bacterial pathogens and their hosts has provided the critical starting point in understanding the molecular bases of how pathogens disrupt host cells to cause disease. In addition, knowledge of the complete genome sequence of the pathogen provides a potentially broad slate of targets for the development of novel virulence inhibitors that are desperately needed for disease management. Regarding plant bacterial pathogens and disease management, the potential for utilizing genomics resources in the development of durable resistance is enhanced because of developing technologies that enable targeted modification of the host. Here, we summarize the role of genomics studies in furthering efforts to manage bacterial plant diseases and highlight novel genomics-enabled strategies heading down this path.
Li, Yuanfang; Zhou, Zhiwei
2016-02-01
Precision medicine is a new medical concept and model based on personalized medicine, rapid progress in genome sequencing technology, and the combined application of bioinformatics and big data science. Precision medicine improves the diagnosis and treatment of gastric cancer through deeper analyses of its characteristics, pathogenesis and other core issues. Clinical cancer databases are important for promoting the development of precision medicine, so close attention must be paid to their construction and management. The clinical database of Sun Yat-sen University Cancer Center is composed of a medical record database, a blood specimen bank, a tissue bank and a medical imaging database. To ensure good data quality, the design and management of the database should follow a strict standard operating procedure (SOP) model. Data sharing is an important way to improve medical research in the era of medical big data, and the construction and management of clinical databases must also be strengthened and improved.
DIALOG: An executive computer program for linking independent programs
NASA Technical Reports Server (NTRS)
Glatt, C. R.; Hague, D. S.; Watson, D. A.
1973-01-01
A very large scale computer programming procedure called the DIALOG Executive System has been developed for the Univac 1100 series computers. The executive computer program, DIALOG, controls the sequence of execution and data management function for a library of independent computer programs. Communication of common information is accomplished by DIALOG through a dynamically constructed and maintained data base of common information. The unique feature of the DIALOG Executive System is the manner in which computer programs are linked. Each program maintains its individual identity and as such is unaware of its contribution to the large scale program. This feature makes any computer program a candidate for use with the DIALOG Executive System. The installation and use of the DIALOG Executive System are described at Johnson Space Center.
Ferraro Petrillo, Umberto; Roscigno, Gianluca; Cattaneo, Giuseppe; Giancarlo, Raffaele
2017-05-15
MapReduce Hadoop bioinformatics applications require the availability of special-purpose routines to manage the input of sequence files. Unfortunately, the Hadoop framework does not provide any built-in support for the most popular sequence file formats like FASTA or BAM. Moreover, the development of these routines is not easy, both because of the diversity of these formats and the need to efficiently manage sequence datasets that may contain up to billions of characters. We present FASTdoop, a generic Hadoop library for the management of FASTA and FASTQ files. We show that, with respect to analogous input management routines that have appeared in the literature, it offers versatility and efficiency. That is, it can handle collections of reads, with or without quality scores, as well as long genomic sequences, while the existing routines concentrate mainly on NGS sequence data. Moreover, in the domain where a comparison is possible, the routines proposed here are faster than the available ones. In conclusion, FASTdoop is a much-needed addition to Hadoop-BAM. The software and the datasets are available at http://www.di.unisa.it/FASTdoop/ . umberto.ferraro@uniroma1.it. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Multimedia Workstations: Electronic Assistants for Health-Care Professionals.
Degoulet, P; Jean, F-C; Safran, C
1996-01-01
The increasing costs of health care and the economic reality have produced an interesting paradox: the health professional must perform more clinical work with fewer support personnel. Moreover, an explosion of the knowledge base that underlies sound clinical care makes not only effective time management critical but also knowledge management compelling. A multimedia workstation is an electronic assistant for the busy health professional that can help with administrative tasks and give access to clinical information and knowledge networks. The multimedia nature of processed information reflects an evolution of medical technologies that involve more and more complex objects such as video sequences or digitized signals. Analysis of the 445 Medline-indexed publications for the January 1991 to December 1994 period that included the word "workstation" either in their title or in their abstract helps in refining objectives and challenges both for health professionals and decision makers. From an engineering perspective, development of a workstation requires the integration into the same environment of tools to localize, access, manipulate and communicate the required information. The long-term goal is to establish easy access in a collaborative working environment that gives the end-user the feeling of a single virtual health enterprise, driven by an integrated computer system even when the information system relies on a set of heterogeneous and geographically distributed components. Consequences in terms of migration from traditional client/server architectures to more client/network architectures are considered.
Googling DNA sequences on the World Wide Web.
Hajibabaei, Mehrdad; Singer, Gregory A C
2009-11-10
New web-based technologies provide an excellent opportunity for sharing and accessing information and for using the web as a platform for interaction and collaboration. Although several specialized tools are available for analyzing DNA sequence information, conventional web-based tools have not been utilized for bioinformatics applications. We have developed a novel algorithm and implemented it for searching species-specific genomic sequences, DNA barcodes, by using popular web-based methods such as Google. We developed an alignment-independent, character-based algorithm based on dividing a sequence library (DNA barcodes) and a query sequence into words. The actual search is conducted by conventional search tools such as the freely available Google Desktop Search. We implemented our algorithm in two exemplar packages. We developed pre- and post-processing software to provide customized input and output services, respectively. Our analysis of all publicly available DNA barcode sequences shows a high accuracy as well as rapid results. Our method makes use of conventional web-based technologies for specialized genetic data. It provides a robust and efficient solution for sequence search on the web. The integration of our search method for large-scale sequence libraries such as DNA barcodes provides an excellent web-based tool for accessing this information and linking it to other available categories of information on the web.
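One way to make barcode sequences searchable by a conventional keyword engine is to index each library sequence as non-overlapping fixed-length words and to query with overlapping words, so that any query spanning an indexed word is retrieved. This is a simplified sketch of the word-splitting idea, not the published implementation, and the word size and example sequences are arbitrary.

    def index_words(seq, word_size=10):
        """Non-overlapping fixed-length words used to index a barcode library
        with a conventional keyword search engine (word size is arbitrary here)."""
        seq = seq.upper()
        return {seq[i:i + word_size]
                for i in range(0, len(seq) - word_size + 1, word_size)}

    def query_words(seq, word_size=10):
        """Overlapping words of the query, so that a match is found whenever the
        query spans one of the library's indexed words."""
        seq = seq.upper()
        return {seq[i:i + word_size] for i in range(len(seq) - word_size + 1)}

    barcode = "CCTATACCTAATCTTCGGCGCATGAGCTGGCATAGTTGGAACCGCCCTCAGCCTCCTC"
    query = barcode[17:44]   # a fragment of the library sequence
    # the indexed words fully covered by the query are retrieved
    print(index_words(barcode) & query_words(query))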
Prilusky, Jaime; Oueillet, Eric; Ulryck, Nathalie; Pajon, Anne; Bernauer, Julie; Krimm, Isabelle; Quevillon-Cheruel, Sophie; Leulliot, Nicolas; Graille, Marc; Liger, Dominique; Trésaugues, Lionel; Sussman, Joel L; Janin, Joël; van Tilbeurgh, Herman; Poupon, Anne
2005-06-01
Structural genomics aims at the establishment of a universal protein-fold dictionary through systematic structure determination either by NMR or X-ray crystallography. In order to catch up with the explosive amount of protein sequence data, structural biology laboratories are spurred to increase the speed of the structure-determination process. To achieve this goal, high-throughput robotic approaches are increasingly used in all the steps leading from cloning to data collection, and even structure interpretation is becoming more and more automatic. The progress made in these areas has begun to have a significant impact on the more 'classical' structural biology laboratories, dramatically increasing the number of individual experiments. This automation creates the need for efficient data management. Here, a new piece of software, HalX, is presented; it is designed as an 'electronic lab book' that aims at (i) storage and (ii) easy access and use of all experimental data. This should lead to much improved management and tracking of structural genomics experimental data.
Laboratory Diagnosis of Infective Endocarditis
Liesman, Rachael M.; Pritt, Bobbi S.; Maleszewski, Joseph J.
2017-01-01
Infective endocarditis is life-threatening; identification of the underlying etiology informs optimized individual patient management. Changing epidemiology, advances in blood culture techniques, and new diagnostics guide the application of laboratory testing for diagnosis of endocarditis. Blood cultures remain the standard test for microbial diagnosis, with directed serological testing (i.e., Q fever serology, Bartonella serology) in culture-negative cases. Histopathology and molecular diagnostics (e.g., 16S rRNA gene PCR/sequencing, Tropheryma whipplei PCR) may be applied to resected valves to aid in diagnosis. Herein, we summarize recent knowledge in this area and propose a microbiologic and pathological algorithm for endocarditis diagnosis. PMID:28659319
Coordinating complex problem-solving among distributed intelligent agents
NASA Technical Reports Server (NTRS)
Adler, Richard M.
1992-01-01
A process-oriented control model is described for distributed problem solving. The model coordinates the transfer and manipulation of information across independent networked applications, both intelligent and conventional. The model was implemented using SOCIAL, a set of object-oriented tools for distributed computing. Complex sequences of distributed tasks are specified in terms of high-level scripts. Scripts are executed by SOCIAL objects called Manager Agents, which realize an intelligent coordination model that routes individual tasks to suitable server applications across the network. These tools are illustrated in a prototype distributed system for decision support of ground operations for NASA's Space Shuttle fleet.
Zhang, Pin; Liang, Yanmei; Chang, Shengjiang; Fan, Hailun
2013-08-01
Accurate segmentation of renal tissues in abdominal computed tomography (CT) image sequences is an indispensable step for computer-aided diagnosis and pathology detection in clinical applications. In this study, the goal is to develop a radiology tool to extract renal tissues in CT sequences for the management of renal diagnosis and treatments. In this paper, the authors propose a new graph-cuts-based active contours model with an adaptive width of narrow band for kidney extraction in CT image sequences. Based on graph cuts and contextual continuity, the segmentation is carried out slice-by-slice. In the first stage, the middle two adjacent slices in a CT sequence are segmented interactively based on the graph cuts approach. Subsequently, the deformable contour evolves toward the renal boundaries under the proposed model for the kidney extraction of the remaining slices. In this model, the energy function combining boundary with regional information is optimized in the constructed graph, and the adaptive search range is determined by contextual continuity and the object size. In addition, in order to reduce the complexity of the min-cut computation, the nodes in the graph only have n-links, giving fewer edges. A total of 30 CT image sequences with normal and pathological renal tissues were used to evaluate the accuracy and effectiveness of the method. The experimental results reveal that the average Dice similarity coefficient of these image sequences ranges from 92.37% to 95.71% and the corresponding standard deviation for each dataset ranges from 2.18% to 3.87%. In addition, the average automatic segmentation time for one kidney in each slice is about 0.36 s. Integrating the graph-cuts-based active contours model with contextual continuity, the algorithm takes advantage of energy minimization and the characteristics of image sequences. The proposed method achieves effective results for kidney segmentation in CT sequences.
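A simplified, illustrative sketch of the slice-by-slice propagation scheme follows: the segmentation of slice i initializes slice i+1, a narrow band is built around it with a width tied to the object size (the "contextual continuity" idea), and the contour is refined inside that band. The graph-cut energy minimization itself is replaced here by a simple intensity-similarity rule, purely as a stand-in; it is not the authors' optimization.

```python
import numpy as np
from scipy import ndimage as ndi

def propagate_slice(prev_mask, next_slice, band_frac=0.15):
    """prev_mask: bool mask from slice i; next_slice: 2-D CT array for slice i+1."""
    # Adaptive band width: a fraction of the equivalent object radius.
    radius = np.sqrt(prev_mask.sum() / np.pi)
    width = max(2, int(band_frac * radius))

    # Narrow band = dilation minus erosion of the previous mask.
    band = ndi.binary_dilation(prev_mask, iterations=width) & \
           ~ndi.binary_erosion(prev_mask, iterations=width)

    # Stand-in refinement: keep band pixels whose intensity resembles the kidney
    # intensity observed under the previous mask (a graph cut combining boundary
    # and region terms would be optimized here instead).
    mu, sigma = next_slice[prev_mask].mean(), next_slice[prev_mask].std() + 1e-6
    similar = np.abs(next_slice - mu) < 2.0 * sigma
    new_mask = (prev_mask & ~band) | (band & similar)

    # Keep the largest connected component to suppress leakage into neighbours.
    labels, n = ndi.label(new_mask)
    if n == 0:
        return prev_mask
    sizes = ndi.sum(new_mask, labels, index=np.arange(1, n + 1))
    return labels == (1 + int(np.argmax(sizes)))
```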
Implementing a genomic data management system using iRODS in the Wellcome Trust Sanger Institute
2011-01-01
Background Increasingly large amounts of DNA sequencing data are being generated within the Wellcome Trust Sanger Institute (WTSI). The traditional file system struggles to handle these increasing amounts of sequence data. A good data management system therefore needs to be implemented and integrated into the current WTSI infrastructure. Such a system enables good management of the IT infrastructure of the sequencing pipeline and allows biologists to track their data. Results We have chosen a data grid system, iRODS (integrated Rule-Oriented Data System), to act as the data management system for the WTSI. iRODS provides a rule-based system management approach which makes data replication much easier and provides extra data protection. Unlike the metadata provided by traditional file systems, the metadata system of iRODS is comprehensive and allows users to customize their own application-level metadata. Users and IT experts in the WTSI can then query the metadata to find and track data. The aim of this paper is to describe how we designed and used (from both system and user viewpoints) iRODS as a data management system. Details are given about the problems faced and the solutions found when iRODS was implemented. A simple use case describing how users within the WTSI use iRODS is also introduced. Conclusions iRODS has been implemented and works as the production system for the sequencing pipeline of the WTSI. Both biologists and IT experts can now track and manage data, which could not previously be achieved. This novel approach allows biologists to define their own metadata and query the genomic data using those metadata. PMID:21906284
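The workflow described above boils down to attaching user-defined attribute/value/unit (AVU) metadata to each data object and finding data by querying those attributes. The sketch below is a stand-in that uses a plain dictionary in place of the iRODS catalogue; the paths and attributes are invented, and in production the same queries would be issued through iRODS tools (e.g. the imeta/iquest icommands) or a client API rather than this toy function.

```python
# Hypothetical catalogue: path -> list of (attribute, value, unit) triples.
catalogue = {
    "/seq/run_1234/lane1.bam": [
        ("study", "human_wgs", ""), ("sample", "S001", ""), ("lane", "1", "")],
    "/seq/run_1234/lane2.bam": [
        ("study", "human_wgs", ""), ("sample", "S002", ""), ("lane", "2", "")],
    "/seq/run_9999/lane1.bam": [
        ("study", "pathogen", ""), ("sample", "P17", ""), ("lane", "1", "")],
}

def query(catalogue, **required):
    """Return paths whose metadata contains every attribute=value pair given."""
    out = []
    for path, avus in catalogue.items():
        attrs = {a: v for a, v, _unit in avus}
        if all(attrs.get(a) == v for a, v in required.items()):
            out.append(path)
    return out

print(query(catalogue, study="human_wgs", sample="S002"))
# -> ['/seq/run_1234/lane2.bam']
```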
Flanagan, S E; Vairo, F; Johnson, M B; Caswell, R; Laver, T W; Lango Allen, H; Hussain, K; Ellard, S
2017-06-01
Congenital hyperinsulinaemic hypoglycaemia (HH) can occur in isolation or it may present as part of a wider syndrome. For approximately 40%-50% of individuals with this condition, sequence analysis of the known HH genes identifies a causative mutation. Identifying the underlying genetic aetiology in the remaining cases is important as a genetic diagnosis will inform on recurrence risk, may guide medical management and will provide valuable insights into β-cell physiology. We sequenced the exome of a child with persistent diazoxide-responsive HH, mild aortic insufficiency, severe hypotonia, and developmental delay as well as the unaffected parents. This analysis identified a de novo mutation, p.G403D, in the proband's CACNA1D gene. CACNA1D encodes the main L-type voltage-gated calcium channel in the pancreatic β-cell, a key component of the insulin secretion pathway. The p.G403D mutation had been reported previously as an activating mutation in an individual with primary hyper-aldosteronism, neuromuscular abnormalities, and transient hypoglycaemia. Sequence analysis of the CACNA1D gene in 60 further cases with HH did not identify a pathogenic mutation. Identification of an activating CACNA1D mutation in a second patient with congenital HH confirms the aetiological role of CACNA1D mutations in this disorder. A genetic diagnosis is important as treatment with a calcium channel blocker may be an option for the medical management of this patient. © 2017 The Authors. Pediatric Diabetes published by John Wiley & Sons Ltd.
Rogan, P K; Schneider, T D
1995-01-01
Predicting the effects of nucleotide substitutions in human splice sites has been based on analysis of consensus sequences. We used a graphic representation of sequence conservation and base frequency, the sequence logo, to demonstrate that a change in a splice acceptor of hMSH2 (a gene associated with familial nonpolyposis colon cancer) probably does not reduce splicing efficiency. This confirms a population genetic study that suggested that this substitution is a genetic polymorphism. The information theory-based sequence logo is quantitative and more sensitive than the corresponding splice acceptor consensus sequence for detection of true mutations. Information analysis may potentially be used to distinguish polymorphisms from mutations in other types of transcriptional, translational, or protein-coding motifs.
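The per-position information that underlies a sequence logo can be written, for DNA, as R(i) = 2 - H(i) bits, where H(i) is the Shannon entropy of the base frequencies at aligned position i. The sketch below computes that quantity for a toy set of aligned sites; the small-sample correction used in practice, and the individual-information scoring of a specific substitution, are omitted.

```python
import math
from collections import Counter

def position_information(aligned_sites):
    """aligned_sites: list of equal-length strings over A, C, G, T."""
    length = len(aligned_sites[0])
    info = []
    for i in range(length):
        counts = Counter(seq[i] for seq in aligned_sites)
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        info.append(2.0 - entropy)      # 2 bits is the maximum for a 4-letter alphabet
    return info

sites = ["TTTCAG", "TTACAG", "CTTCAG", "TTGCAG"]   # toy acceptor-like sites
print([round(r, 2) for r in position_information(sites)])
```

Positions with near-zero information tolerate substitutions, which is how the analysis can separate a harmless polymorphism from a splicing mutation.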
76 FR 76622 - Federal Management Regulation; Motor Vehicle Management
Federal Register 2010, 2011, 2012, 2013, 2014
2011-12-08
...; Docket 2011-0011; Sequence 2] RIN 3090-AJ14 Federal Management Regulation; Motor Vehicle Management... Administration is amending the Federal Management Regulation (FMR) by revising current policy on the definitions... CONTACT: For clarification of content, contact Mr. James Vogelsinger, Director, Motor Vehicle Management...
Cárdenas, Leyla; Sánchez, Roland; Gomez, Daniela; Fuenzalida, Gonzalo; Gallardo-Escárate, Cristián; Tanguy, Arnaud
2011-09-01
The marine gastropod Concholepas concholepas, locally known as the "loco", is the main target species of the benthonic Chilean fisheries. Genetic and genomic tools are necessary to study the genome of this species in order to understand the molecular basis of its development, growth, and other key traits, to improve management strategies, and to identify local adaptation to prevent loss of biodiversity. Here, we use pyrosequencing technologies to generate the first transcriptomic database from adult specimens of the loco. After trimming, a total of 140,756 Expressed Sequence Tag sequences were obtained. Clustering and assembly analysis identified 19,219 contigs and 105,435 singleton sequences. BlastN analysis showed significant identity with Expressed Sequence Tags of different gastropod species available in public databases. Similarly, BlastX results showed that only 895 of the 124,654 sequences had significant hits; the remaining sequences may represent novel genes for marine gastropods. From this database, simple sequence repeat motifs were also identified, and a total of 38 primer pairs were designed and tested to assess their potential as informative markers and to investigate their cross-species amplification in different related gastropod species. This dataset represents the first publicly available 454 data for a marine gastropod endemic to the southeastern Pacific coast, providing a valuable transcriptomic resource for future efforts of gene discovery and development of functional markers in other marine gastropods. Copyright © 2011 Elsevier B.V. All rights reserved.
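A hedged sketch of the kind of simple-sequence-repeat (SSR) screen mentioned above: scan assembled contigs for short motifs (here di- to tetranucleotides) repeated a minimum number of times. The thresholds and the test sequence are illustrative; dedicated SSR pipelines apply more elaborate rules and also design the flanking primers.

```python
import re

def find_ssrs(seq, min_repeats=5, motif_lengths=(2, 3, 4)):
    """Return (start, motif, n_repeats) for perfect tandem repeats in seq."""
    hits = []
    for k in motif_lengths:
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq.upper()):
            motif = m.group(1)
            if len(set(motif)) > 1:                 # skip homopolymer runs
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return hits

print(find_ssrs("GGTACACACACACACAGGTCTGATGATGATGATGATGCC"))
# -> [(3, 'AC', 6), (21, 'GAT', 5)]
```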
Variation in faecal microbiota in a group of horses managed at pasture over a 12-month period.
Salem, Shebl E; Maddox, Thomas W; Berg, Adam; Antczak, Philipp; Ketley, Julian M; Williams, Nicola J; Archer, Debra C
2018-05-31
Colic (abdominal pain) is a common cause of mortality in horses. Change in management of horses is associated with increased colic risk, and seasonal patterns of increased risk have been identified. Shifts in gut microbiota composition in response to management change have been proposed as one potential underlying mechanism for colic. However, the intestinal microbiota of normal horses, and how it varies over different seasons, has not previously been investigated. In this study the faecal microbiota composition was studied over 12 months in a population of horses managed at pasture with minimal changes in management. We hypothesised that the gut microbiota would be stable in this population over time. Faecal samples were collected every 14 days from 7 horses for 52 weeks and the faecal microbiota was characterised by next-generation sequencing of 16S rRNA genes. The faecal microbiota was dominated by members of the phyla Firmicutes and Bacteroidetes throughout. Season, supplementary forage and ambient weather conditions were significantly associated with change in the faecal microbiota composition. These results provide important baseline information demonstrating physiological variation in the faecal microbiota of normal horses over a 12-month period without development of colic.
Management of familial cancer: sequencing, surveillance and society.
Samuel, Nardin; Villani, Anita; Fernandez, Conrad V; Malkin, David
2014-12-01
The clinical management of familial cancer begins with recognition of patterns of cancer occurrence suggestive of genetic susceptibility in a proband or pedigree, to enable subsequent investigation of the underlying DNA mutations. In this regard, next-generation sequencing of DNA continues to transform cancer diagnostics, by enabling screening for cancer-susceptibility genes in the context of known and emerging familial cancer syndromes. Increasingly, not only are candidate cancer genes sequenced, but also entire 'healthy' genomes are mapped in children with cancer and their family members. Although large-scale genomic analysis is considered intrinsic to the success of cancer research and discovery, a number of accompanying ethical and technical issues must be addressed before this approach can be adopted widely in personalized therapy. In this Perspectives article, we describe our views on how the emergence of new sequencing technologies and cancer surveillance strategies is altering the framework for the clinical management of hereditary cancer. Genetic counselling and disclosure issues are discussed, and strategies for approaching ethical dilemmas are proposed.
Lorenz, Marco; Fürst, Christine; Thiel, Enrico
2013-09-01
Given the increasing pressures of global societal and climate change, the assessment of the impact of land use and land management practices on land degradation, and on the related decrease in the sustainable provision of ecosystem services, is gaining interest. Existing approaches to assessing agricultural practices focus on single crops or statistical data, because spatially explicit information on the crop rotations actually applied in practice is mostly unavailable. This provokes considerable uncertainty in crop production models, as regional specifics have to be neglected or cannot be considered in an appropriate way. In a case study in Saxony, we developed an approach to (i) derive representative regional crop rotations by combining different data sources and expert knowledge. This includes the integration of innovative crop sequences related to bio-energy production or organic farming and different soil tillage, soil management and soil protection techniques. Furthermore, (ii) we developed a regionalization approach for transferring crop rotations and related soil management strategies on the basis of statistical data and spatially explicit data taken from so-called field blocks. These field blocks are the smallest spatial entity for which agricultural practices must be reported when applying for agricultural funding within the frame of the European Agricultural Fund for Rural Development (EAFRD) program. The information was finally integrated into the spatial decision support tool GISCAME to assess and visualize, in a spatially explicit manner, the impact of alternative agricultural land use strategies on soil erosion risk and ecosystem services provision. The objective of this paper is to present this approach for creating spatially explicit information on agricultural management practices for a study area around Dresden, the capital of the German Federal State of Saxony. Copyright © 2013 Elsevier Ltd. All rights reserved.
WebAlchemist: a Web transcoding system for mobile Web access in handheld devices
NASA Astrophysics Data System (ADS)
Whang, Yonghyun; Jung, Changwoo; Kim, Jihong; Chung, Sungkwon
2001-11-01
In this paper, we describe the design and implementation of WebAlchemist, a prototype web transcoding system, which automatically converts a given HTML page into a sequence of equivalent HTML pages that can be properly displayed on a hand-held device. The WebAlchemist system is based on a set of HTML transcoding heuristics managed by the Transcoding Manager (TM) module. In order to tackle difficult-to-transcode pages, such as ones with large or complex table structures, we have developed several new transcoding heuristics that extract partial semantics from syntactic information such as the table width, font size and cascading style sheets. Subjective evaluation results using popular HTML pages (such as the CNN home page) show that WebAlchemist generates readable, structure-preserving transcoded pages that can be properly displayed on hand-held devices.
Protein Information Resource: a community resource for expert annotation of protein data
Barker, Winona C.; Garavelli, John S.; Hou, Zhenglin; Huang, Hongzhan; Ledley, Robert S.; McGarvey, Peter B.; Mewes, Hans-Werner; Orcutt, Bruce C.; Pfeiffer, Friedhelm; Tsugita, Akira; Vinayaka, C. R.; Xiao, Chunlin; Yeh, Lai-Su L.; Wu, Cathy
2001-01-01
The Protein Information Resource, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the most comprehensive and expertly annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database. To provide timely and high quality annotation and promote database interoperability, the PIR-International employs rule-based and classification-driven procedures based on controlled vocabulary and standard nomenclature and includes status tags to distinguish experimentally determined from predicted protein features. The database contains about 200 000 non-redundant protein sequences, which are classified into families and superfamilies and their domains and motifs identified. Entries are extensively cross-referenced to other sequence, classification, genome, structure and activity databases. The PIR web site features search engines that use sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. The PIR-International databases and search tools are accessible on the PIR web site at http://pir.georgetown.edu/ and at the MIPS web site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP. PMID:11125041
75 FR 51392 - Federal Management Regulation; Transportation Management
Federal Register 2010, 2011, 2012, 2013, 2014
2010-08-20
...; Docket Number 2010-0011, sequence 1] RIN 3090-AJ03 Federal Management Regulation; Transportation Management AGENCY: Office of Governmentwide Policy, General Services Administration (GSA). ACTION: Final rule. SUMMARY: The General Services Administration (GSA) is amending the Federal Management Regulation (FMR) by...
Seo, Joann; Ivanovich, Jennifer; Goodman, Melody S; Biesecker, Barbara B; Kaphingst, Kimberly A
2017-06-01
We investigated what information women diagnosed with breast cancer at a young age would want to learn when genome sequencing results are returned. We conducted 60 semi-structured interviews with women diagnosed with breast cancer at age 40 or younger. We examined what specific information participants would want to learn across result types and for each type of result, as well as how much information they would want. Genome sequencing was not offered to participants as part of the study. Two coders independently coded interview transcripts; analysis was conducted using NVivo10. Across result types, participants wanted to learn about health implications, risk and prevalence in quantitative terms, causes of variants, and causes of diseases. Participants wanted to learn actionable information for variants affecting risk of preventable or treatable disease, medication response, and carrier status. The amount of desired information differed for variants affecting risk of unpreventable or untreatable disease, with uncertain significance, and not health-related. Women diagnosed with breast cancer at a young age recognize the value of genome sequencing results in identifying potential causes and effective treatments and expressed interest in using the information to help relatives and to further understand their other health risks. Our findings can inform the development of effective feedback strategies for genome sequencing that meet patients' information needs and preferences.
Umrigar, Ayesha; Musso, Amanda; Mercer, Danielle; Hurley, Annette; Glausier, Cassondra; Bakeer, Mona; Marble, Michael; Hicks, Chindo; Tsien, Fern
2017-01-01
Advances in sequencing technologies and increased understanding of the contribution of genetics to congenital sensorineural hearing loss have led to vastly improved outcomes for patients and their families. Next-generation sequencing and diagnostic panels have become increasingly reliable and less expensive for clinical use. Despite these developments, the diagnosis of genetic sensorineural hearing loss still presents challenges for healthcare providers. Inherited sensorineural hearing loss has high levels of genetic heterogeneity and variable expressivity. Additionally, syndromic hearing loss (hearing loss and additional clinical abnormalities) should be distinguished from non-syndromic (hearing loss is the only clinical symptom). Although the diagnosis of genetic sensorineural hearing loss can be challenging, the patient's family history and ethnicity may provide critical information, as certain genetic mutations are more common in specific ethnic populations. The early identification of the cause of deafness can benefit patients and their families by estimating recurrence risks for future family planning and offering the proper interventions to improve their quality of life. Collaboration among pediatricians, audiologists, otolaryngologists, geneticists, and other specialists is essential in the diagnosis and management of patients with hearing disorders. An early diagnosis is vital for proper management and care, as some clinical manifestations of syndromic sensorineural hearing loss are not apparent at birth and have a delayed age of onset. We present a case of Usher syndrome (congenital deafness and childhood-onset blindness) illustrating the challenges encountered in the diagnosis and management of children presenting with congenital genetic sensorineural hearing loss, along with helpful resources for clinicians and families.
Serial data correlator/code translator
NASA Technical Reports Server (NTRS)
Morgan, L. E. (Inventor)
1982-01-01
A system is described for analyzing asynchronous signals containing bits of information and for ensuring the validity of those signals by sampling each bit of information a plurality of times and feeding the sampled pieces of each bit into a sequence controller. The sequence controller has a plurality of maps or programs through which the sampled pieces of bits are stepped so as to identify the particular bit of information and determine the validity and phase of the bit. The step in which the sequence controller is clocked is controlled by a storage register. A data decoder decodes the information fed out of the storage register and feeds such information to shift registers for storage.
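An illustrative software analogue of the oversample-and-validate idea: each nominal bit of an asynchronous serial stream is sampled several times, and a bit value is accepted only if a clear majority of its samples agree; otherwise it is flagged as invalid. The per-bit sample count and the agreement threshold below are arbitrary choices, and the hardware described above additionally tracks phase through its programmed sequence controller, which this sketch does not attempt.

```python
def decode_oversampled(samples, per_bit=8, min_agree=6):
    """samples: sequence of 0/1 values taken at `per_bit` times the bit rate."""
    bits = []
    for i in range(0, len(samples) - per_bit + 1, per_bit):
        window = samples[i:i + per_bit]
        ones = sum(window)
        if ones >= min_agree:
            bits.append(1)
        elif per_bit - ones >= min_agree:
            bits.append(0)
        else:
            bits.append(None)          # ambiguous: noise or a phase slip
    return bits

raw = [1]*8 + [0]*8 + [1, 1, 0, 1, 1, 1, 1, 1] + [0, 0, 1, 0, 1, 0, 1, 0]
print(decode_oversampled(raw))        # -> [1, 0, 1, None]
```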
openBIS: a flexible framework for managing and analyzing complex data in biology research
2011-01-01
Background Modern data generation techniques used in distributed systems biology research projects often create datasets of enormous size and diversity. We argue that in order to overcome the challenge of managing those large quantitative datasets and maximise the biological information extracted from them, a sound information system is required. Ease of integration with data analysis pipelines and other computational tools is a key requirement for it. Results We have developed openBIS, an open source software framework for constructing user-friendly, scalable and powerful information systems for data and metadata acquired in biological experiments. openBIS enables users to collect, integrate, share, publish data and to connect to data processing pipelines. This framework can be extended and has been customized for different data types acquired by a range of technologies. Conclusions openBIS is currently being used by several SystemsX.ch and EU projects applying mass spectrometric measurements of metabolites and proteins, High Content Screening, or Next Generation Sequencing technologies. The attributes that make it interesting to a large research community involved in systems biology projects include versatility, simplicity in deployment, scalability to very large data, flexibility to handle any biological data type and extensibility to the needs of any research domain. PMID:22151573
Worley, K C; Wiese, B A; Smith, R F
1995-09-01
BEAUTY (BLAST enhanced alignment utility) is an enhanced version of the NCBI's BLAST data base search tool that facilitates identification of the functions of matched sequences. We have created new data bases of conserved regions and functional domains for protein sequences in NCBI's Entrez data base, and BEAUTY allows this information to be incorporated directly into BLAST search results. A Conserved Regions Data Base, containing the locations of conserved regions within Entrez protein sequences, was constructed by (1) clustering the entire data base into families, (2) aligning each family using our PIMA multiple sequence alignment program, and (3) scanning the multiple alignments to locate the conserved regions within each aligned sequence. A separate Annotated Domains Data Base was constructed by extracting the locations of all annotated domains and sites from sequences represented in the Entrez, PROSITE, BLOCKS, and PRINTS data bases. BEAUTY performs a BLAST search of those Entrez sequences with conserved regions and/or annotated domains. BEAUTY then uses the information from the Conserved Regions and Annotated Domains data bases to generate, for each matched sequence, a schematic display that allows one to directly compare the relative locations of (1) the conserved regions, (2) annotated domains and sites, and (3) the locally aligned regions matched in the BLAST search. In addition, BEAUTY search results include World-Wide Web hypertext links to a number of external data bases that provide a variety of additional types of information on the function of matched sequences. This convenient integration of protein families, conserved regions, annotated domains, alignment displays, and World-Wide Web resources greatly enhances the biological informativeness of sequence similarity searches. BEAUTY searches can be performed remotely on our system using the "BCM Search Launcher" World-Wide Web pages (URL is http://gc.bcm.tmc.edu:8088/search-launcher/launcher.html).
Valenzuela-González, Fabiola; Martínez-Porchas, Marcel; Villalpando-Canchola, Enrique; Vargas-Albores, Francisco
2016-03-01
Ultrafast metagenomic sequence classification using exact alignments (Kraken) is a novel approach to classifying 16S rDNA sequences. The classifier is based on mapping short sequences to the lowest common ancestor and performing alignments to form subtrees with specific weights at each taxon node. This study aimed to evaluate the classification performance of Kraken with long 16S rDNA random environmental sequences produced by cloning and then Sanger sequenced. A total of 480 clones were isolated and expanded, and 264 of these clones formed contigs (1352 ± 153 bp). The same sequences were analyzed using the Ribosomal Database Project (RDP) classifier. Deeper classification was achieved by Kraken than by the RDP: 73% of the contigs were classified to the species or variety level, whereas 67% of these contigs were classified no further than the genus level by the RDP. The results also demonstrated that unassembled sequences analyzed by Kraken provide similar or even deeper information. Moreover, sequences that did not form contigs, which are usually discarded by other programs, provided meaningful information when analyzed by Kraken. Finally, it appears that the assembly step for Sanger sequences can be eliminated when using Kraken. Kraken combines the information from both sequence senses, providing additional elements for the classification. In conclusion, the results demonstrate that Kraken is an excellent choice for the taxonomic assignment of sequences obtained by Sanger sequencing or by third-generation sequencing, whose main goal is to generate longer sequences. Copyright © 2016 Elsevier B.V. All rights reserved.
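A toy illustration of the k-mer / lowest-common-ancestor idea behind this kind of classifier: every k-mer in the reference is stored with the LCA of the genomes that contain it, a query's k-mers vote for taxa, and the read is assigned to the leaf whose root-to-leaf path collects the most votes. The taxa, k value and sequences below are invented for the example and bear no relation to the real Kraken database.

```python
K = 5
parent = {"root": None, "Bacteria": "root", "E.coli": "Bacteria",
          "Salmonella": "Bacteria"}

def path_to_root(taxon):
    out = []
    while taxon is not None:
        out.append(taxon)
        taxon = parent[taxon]
    return out

def lca(a, b):
    ancestors = set(path_to_root(a))
    return next(t for t in path_to_root(b) if t in ancestors)

def build_db(genomes):                       # genomes: {leaf_taxon: sequence}
    db = {}
    for taxon, seq in genomes.items():
        for i in range(len(seq) - K + 1):
            kmer = seq[i:i + K]
            db[kmer] = taxon if kmer not in db else lca(db[kmer], taxon)
    return db

def classify(read, db):
    votes = {}
    for i in range(len(read) - K + 1):
        taxon = db.get(read[i:i + K])
        if taxon is not None:
            votes[taxon] = votes.get(taxon, 0) + 1
    leaves = [t for t in parent if t not in parent.values()]
    scored = [(sum(votes.get(t, 0) for t in path_to_root(leaf)), leaf)
              for leaf in leaves]
    score, best = max(scored)
    return best if score else "unclassified"

db = build_db({"E.coli": "ATGCGTACGTTAGC", "Salmonella": "ATGCGTTTTTGGCA"})
print(classify("CGTACGTTAG", db))            # -> E.coli
```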
The String Stability of a Trajectory-Based Interval Management Algorithm in the Midterm Airspace
NASA Technical Reports Server (NTRS)
Swieringa, Kurt A.
2015-01-01
NASA's first Air Traffic Management (ATM) Technology Demonstration (ATD-1) was created to facilitate the transition of mature ATM technologies from the laboratory to operational use. The technologies selected for demonstration are the Traffic Management Advisor with Terminal Metering (TMA-TM), which provides precise time-based scheduling in the terminal airspace; Controller Managed Spacing (CMS), which provides terminal controllers with decision support tools enabling precise schedule conformance; and Interval Management (IM), which consists of flight deck automation that enables aircraft to achieve or maintain a precise spacing interval behind a target aircraft. As the percentage of IM equipped aircraft increases, controllers may provide IM clearances to sequences, or strings, of IM-equipped aircraft. It is important for these strings to maintain stable performance. This paper describes an analytic analysis of the string stability of the latest version of NASA's IM algorithm and a fast-time simulation designed to characterize the string performance of the IM algorithm. The analytic analysis showed that the spacing algorithm has stable poles, indicating that a spacing error perturbation will be reduced as a function of string position. The fast-time simulation investigated IM operations at two airports using constraints associated with the midterm airspace, including limited information of the target aircraft's intended speed profile and limited information of the wind forecast on the target aircraft's route. The results of the fast-time simulation demonstrated that the performance of the spacing algorithm is acceptable for strings of moderate length; however, there is some degradation in IM performance as a function of string position.
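String stability in this context means that a spacing error introduced at the front of a string of IM-equipped aircraft is attenuated, not amplified, as it propagates to each following aircraft. The toy model below illustrates that property with a first-order error recursion whose gain is below 1 (the discrete analogue of stable poles); the model and the gain value are illustrative only and are not NASA's IM control law.

```python
def propagate_spacing_errors(initial_error, n_aircraft, gain=0.7):
    """Return the peak spacing error (seconds) seen at each string position."""
    errors = [initial_error]
    for _ in range(1, n_aircraft):
        errors.append(gain * errors[-1])   # |gain| < 1: the error shrinks downstream
    return errors

for pos, err in enumerate(propagate_spacing_errors(20.0, 6), start=1):
    print(f"aircraft {pos}: peak spacing error {err:5.1f} s")
```

With a gain above 1 the same recursion would grow the error along the string, which is the degradation the paper's analysis is designed to rule out.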
Dexter, Franklin; Van Swol, Lyn M
2016-06-01
To make good decisions, operating room (OR) managers often act autocratically after obtaining expert advice. When such advice is provided by e-mail, attachments of research articles can be included. We performed a quasi-experimental study using an evaluation of 4 articles used in a 50-hour OR management course to assess how their content influences trust in the articles' content, including its quality, usefulness, and reliability. There were (a) 2 articles containing data with specific examples of application for health systems and 2 without, and (b) 2 articles containing appendices of formulas and 2 without. Some of the formulas in the readings were relatively complicated (e.g., stochastic optimization using the Lagrange method) and unlikely to be used by the subjects (i.e., they show what does not need to be done). Content complexity (±data, ±formulas) served both as a source of limitation in understanding the content and potentially as a peripheral cue influencing perception of the content. The 2-page evaluation forms were generated with random sequences of articles and response items. The N = 17 subjects each completed 9 items about each of the 4 articles (i.e., answered 36 questions). The 9-item assessment of trust provided a unidimensional construct (Cronbach α, 0.94). Formulas in the articles significantly increased trust in the information (P = 0.0019). Presence of data did not significantly influence trust (P = 0.15). Therefore, when an expert sends e-mail to a manager who has completed this basic OR management science course and asks a question, choosing a paper with formulas has no disadvantage.
76 FR 31545 - Federal Management Regulation; Motor Vehicle Management
Federal Register 2010, 2011, 2012, 2013, 2014
2011-06-01
...; Sequence 1] RIN 3090-AJ14 Federal Management Regulation; Motor Vehicle Management AGENCY: Office of... Services Administration is proposing to amend the Federal Management Regulation (FMR) by revising current....C. 553(a)(2) because it applies to agency management. However, this proposed rule is being published...
Identifying the Critical Time Period for Information Extraction when Recognizing Sequences of Play
ERIC Educational Resources Information Center
North, Jamie S.; Williams, A. Mark
2008-01-01
The authors attempted to determine the critical time period for information extraction when recognizing play sequences in soccer. Although efforts have been made to identify the perceptual information underpinning such decisions, no researchers have attempted to determine "when" this information may be extracted from the display. The authors…
NASA Astrophysics Data System (ADS)
Clark, M. R.; Gardner, J.; Holland, L.; Zeng, C.; Hamilton, J. S.; Rowden, A. A.
2016-02-01
In the New Zealand region vulnerable marine ecosystems (VMEs) are at risk from commercial fishing activity and future seabed mining. Understanding connectivity among VMEs is important for the design of effective spatial management strategies, i.e. a network of protected areas. To date, however, genetic connectivity in the New Zealand region has rarely been documented. As part of a project developing habitat suitability models and spatial management options for VMEs we used DNA sequence data and microsatellite genotyping to assess genetic connectivity for a range of VME indicator taxa, including the coral Desmophyllum dianthus, and the sponges Poecilastra laminaris and Penares palmatoclada. Overall, patterns of connectivity were inconsistent amongst taxa. Nonetheless, genetic data from each taxon were relevant to inform management at a variety of spatial scales. D. dianthus populations in the Kermadec volcanic arc and the Louisville Seamount Chain were indistinguishable, highlighting the importance of considering source-sink dynamics between populations beyond the EEZ in conservation planning. Poecilastra laminaris populations showed significant divergence across the Chatham Rise, in contrast to P. palmatoclada, which had a uniform haplotypic distribution. However, both sponge species exhibited the highest genetic diversity on the Chatham Rise, suggesting that this area is a genetic hotspot. The spatial heterogeneity of genetic patterns of structure suggests that inclusion of several taxa is necessary to facilitate understanding of regional connectivity patterns, variation in which may be attributed to alternate life history strategies, local hydrodynamic regimes, or in some cases, suboptimal sample sizes. Our findings provide important information for use by environmental managers, including summary maps of genetic diversity and barriers to gene flow, which will be used in spatial management decision-support tools.
Identifying functionally informative evolutionary sequence profiles.
Gil, Nelson; Fiser, Andras
2018-04-15
Multiple sequence alignments (MSAs) can provide essential input to many bioinformatics applications, including protein structure prediction and functional annotation. However, the optimal selection of sequences to obtain biologically informative MSAs for such purposes is poorly explored, and has traditionally been performed manually. We present Selection of Alignment by Maximal Mutual Information (SAMMI), an automated, sequence-based approach to objectively select an optimal MSA from a large set of alternatives sampled from a general sequence database search. The hypothesis of this approach is that the mutual information among MSA columns will be maximal for those MSAs that contain the most diverse set possible of the most structurally and functionally homogeneous protein sequences. SAMMI was tested to select MSAs for functional site residue prediction by analysis of conservation patterns on a set of 435 proteins obtained from protein-ligand (peptides, nucleic acids and small substrates) and protein-protein interaction databases. Availability and implementation: A freely accessible program, including source code, implementing SAMMI is available at https://github.com/nelsongil92/SAMMI.git. andras.fiser@einstein.yu.edu. Supplementary data are available at Bioinformatics online.
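The quantity the approach described above maximizes is the mutual information between pairs of alignment columns, averaged over all column pairs. The sketch below computes that average for a toy MSA; gap handling, sequence weighting and the MSA-sampling strategy of the actual method are omitted.

```python
import math
from collections import Counter

def column_mi(col_a, col_b):
    """Mutual information (bits) between two alignment columns of equal length."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    return sum((c / n) * math.log2((c / n) / ((pa[x] / n) * (pb[y] / n)))
               for (x, y), c in pab.items())

def average_mutual_information(msa):
    """msa: list of equal-length aligned sequences."""
    cols = list(zip(*msa))
    pairs = [(i, j) for i in range(len(cols)) for j in range(i + 1, len(cols))]
    return sum(column_mi(cols[i], cols[j]) for i, j in pairs) / len(pairs)

msa = ["ACDEF", "ACDQF", "SCDEF", "ACNEF"]   # toy alignment
print(round(average_mutual_information(msa), 3))
```

Given several candidate MSAs built from different database-search cutoffs, one would compute this score for each and keep the alignment with the highest value.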
Ristov, Strahil; Brajkovic, Vladimir; Cubric-Curik, Vlatka; Michieli, Ivan; Curik, Ino
2016-09-10
Identification of genes or even nucleotides that are responsible for quantitative and adaptive trait variation is a difficult task due to the complex interdependence between a large number of genetic and environmental factors. The polymorphism of the mitogenome is one of the factors that can contribute to quantitative trait variation. However, the effects of the mitogenome have not been comprehensively studied, since large numbers of mitogenome sequences and recorded phenotypes are required to reach adequate power of analysis. Current research in our group focuses on acquiring the necessary mitochondrial sequence information and analysing its influence on the phenotype of a quantitative trait. To facilitate these tasks we have produced software for processing pedigrees that is optimised for maternal lineage analysis. We present MaGelLAn 1.0 (maternal genealogy lineage analyser), a suite of four Python scripts (modules) that is designed to facilitate the analysis of the impact of mitogenome polymorphism on quantitative trait variation by combining molecular and pedigree information. MaGelLAn 1.0 is primarily used to: (1) optimise the sampling strategy for molecular analyses; (2) identify and correct pedigree inconsistencies; and (3) identify maternal lineages and assign the corresponding mitogenome sequences to all individuals in the pedigree, this information being used as input to any of the standard software for quantitative genetic (association) analysis. In addition, MaGelLAn 1.0 allows computation of the mitogenome (maternal) effective population size and the probability of mitogenome (maternal) identity, which are useful for conservation management of small populations. MaGelLAn is the first tool for pedigree analysis that focuses on quantitative genetic analyses of mitogenome data. It was conceived to significantly reduce the effort of handling and preparing large pedigrees and of processing the information linked to maternal lines. The software source code, along with the manual and the example files, can be downloaded at http://lissp.irb.hr/software/magellan-1-0/ and https://github.com/sristov/magellan .
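The core pedigree operation behind item (3) above is simple: every individual inherits its mitogenome along the dam (maternal) line, so tracing dams back to a founder female lets one maternal-line sequence be assigned to all of her descendants. The sketch below illustrates that idea with an invented pedigree and haplotype labels; it is not MaGelLAn code, and it ignores conflicts between multiple sequenced animals on the same line.

```python
def maternal_founder(animal, dam_of):
    """Follow dam links until an individual with no recorded dam is reached."""
    while dam_of.get(animal) is not None:
        animal = dam_of[animal]
    return animal

def assign_mitogenomes(dam_of, sequenced):
    """sequenced: {individual: haplotype} for the animals actually genotyped."""
    founder_hap = {maternal_founder(a, dam_of): h for a, h in sequenced.items()}
    return {a: founder_hap.get(maternal_founder(a, dam_of)) for a in dam_of}

# Hypothetical pedigree: value is the dam; founders have no recorded dam.
dam_of = {"F1": None, "F2": None, "c1": "F1", "c2": "c1", "c3": "F2", "c4": "c3"}
print(assign_mitogenomes(dam_of, {"c2": "HapA", "c4": "HapB"}))
# -> {'F1': 'HapA', 'F2': 'HapB', 'c1': 'HapA', 'c2': 'HapA', 'c3': 'HapB', 'c4': 'HapB'}
```

The same founder-tracing step also shows how sampling can be optimised: sequencing one animal per maternal line is enough to cover the whole pedigree.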
NASA Astrophysics Data System (ADS)
Moretti, M.; Govoni, A.; Nostro, C.; La Longa, F.; Crescimbene, M.; Pignone, M.; Selvaggi, G.; Working Group, C.
2009-12-01
The Centro Nazionale Terremoti (CNT - National Earthquake Center), a department of the Istituto Nazionale di Geofisica e Vulcanologia (INGV), has designed and set up a rapid-response emergency structure to face the occurrence of strong earthquakes. This structure is composed of a real-time satellite-telemetered temporary seismic network (see Abruzzese et al., 2009 Fall AGU) used to improve the hypocentral locations of the INGV National Seismic Network, a stand-alone temporary seismic network whose goal is primarily high-dynamic/high-resolution data acquisition in the epicentral area, and a mobile operational center, the COES (Centro Operativo Emergenza Sismica, Seismological Emergency Operational Center). The COES is a sort of mobile office equipped with satellite internet communication that can be rapidly installed in the disaster area to support all the INGV staff's operational needs and to cooperate with the Civil Protection department (DPC), aggregating all the scientific information available on the seismic sequence and providing updated information to Civil Protection for the decision-making stage during the emergency. The structure is equipped with a heavy-load trolley that carries a 6x6 inflatable tent, a satellite router, a UPS, computers, monitors and furniture. The facility can be installed in a couple of hours in the epicentral area and provides a fully featured office with a dedicated internet connection and VPN access to the INGV data management center in Rome. Just after the April 6, 2009 Mw 6.3 earthquake in L'Aquila (Central Italy), the COES was installed, upon request of the Italian Civil Protection (DPC), in the DICOMAC (Directorate of Command and Control, the central structure of the DPC that coordinates the emergency activities in the areas affected by the earthquake) located in the Guardia di Finanza headquarters in Coppito near L'Aquila (the same location that hosted the G8 meeting). The COES produces real-time reports on the evolution of the seismic sequence, consisting of a detailed sequence map, histograms showing the evolution of the magnitude and the number of earthquakes in time, and a daily event list. Moreover, it is possible to see the real-time hypocentral locations using "SISMAP" (Doumaz et al., 2008), the same tool used in the Seismic Monitoring Center in Rome. During the emergency the COES has also been a reference information point for all people involved in the crisis management and has provided psychological support to the rescuers and to the earthquake-affected population. This education and outreach activity has proved to be extremely effective. References Abruzzese L. et al., AGU Fall Meeting 2009. Doumaz F. et al., Geomedia, speciale geologia, pp. 10-13, 2008.
Cloud, Joann L; Harmsen, Dag; Iwen, Peter C; Dunn, James J; Hall, Gerri; Lasala, Paul Rocco; Hoggan, Karen; Wilson, Deborah; Woods, Gail L; Mellmann, Alexander
2010-04-01
Correct identification of nonfermenting Gram-negative bacilli (NFB) is crucial for patient management. We compared phenotypic identifications of 96 clinical NFB isolates with identifications obtained by 5' 16S rRNA gene sequencing. Sequencing identified 88 isolates (91.7%) with >99% similarity to a sequence from the assigned species; 61.5% of sequencing results were concordant with phenotypic results, indicating the usability of sequencing to identify NFB.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bowers, Robert M.; Kyrpides, Nikos C.; Stepanauskas, Ramunas
The number of genomes from uncultivated microbes will soon surpass the number of isolate genomes in public databases (Hugenholtz, Skarshewski, & Parks, 2016). Technological advancements in high-throughput sequencing and assembly, including single-cell genomics and the computational extraction of genomes from metagenomes (GFMs), are largely responsible. Here we propose community standards for reporting the Minimum Information about a Single-Cell Genome (MIxS-SCG) and Minimum Information about Genomes extracted From Metagenomes (MIxS-GFM) specific for Bacteria and Archaea. The standards have been developed in the context of the International Genomics Standards Consortium (GSC) community (Field et al., 2014) and can be viewed as a supplement to other GSC checklists including the Minimum Information about a Genome Sequence (MIGS), Minimum Information about a Metagenomic Sequence(s) (MIMS) (Field et al., 2008) and Minimum Information about a Marker Gene Sequence (MIMARKS) (P. Yilmaz et al., 2011). Community-wide acceptance of MIxS-SCG and MIxS-GFM for Bacteria and Archaea will enable broad comparative analyses of genomes from the majority of taxa that remain uncultivated, improving our understanding of microbial function, ecology, and evolution.
MIPS: analysis and annotation of proteins from whole genomes
Mewes, H. W.; Amid, C.; Arnold, R.; Frishman, D.; Güldener, U.; Mannhaupt, G.; Münsterkötter, M.; Pagel, P.; Strack, N.; Stümpflen, V.; Warfsmann, J.; Ruepp, A.
2004-01-01
The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein–protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de). PMID:14681354
From Information Management to Information Visualization
Karami, Mahtab
2016-01-01
Objective The development and implementation of a dashboard of medical imaging department (MID) performance indicators. Method Several articles discussing performance measures of imaging departments were reviewed for this study and all related measures were extracted. A panel of imaging experts was then asked to rate these measures, with an open-ended question to elicit further potential indicators; a second round was performed to confirm the ratings. The indicators and their ratings were then reviewed by an executive panel and, based on this final panel's ratings, a list of indicators to be used was developed. A team of information technology consultants was asked to determine a set of user interface requirements for building the dashboard; in a first round, a list of main features and requirements was determined based on the panel's ratings. Next, QlikView was utilized to implement the dashboard and visualize a set of selected KPI metrics. Finally, an evaluation of the dashboard was performed. Results 92 MID indicators were identified, and 53 main user interface requirements for building the prototype dashboard were determined. The project team then successfully implemented a prototype of the radiology management dashboard at the study site. The visual display that was designed was rated highly by users. Conclusion To develop a dashboard, management of information is essential. It is recommended that a quality map be designed for the MID; it can be used to specify the sequence of activities, their related indicators and the data required for calculating these indicators. To achieve both an effective dashboard and a comprehensive view of operations, it is necessary to design a data warehouse for gathering data from a variety of systems. Utilizing interoperability standards for exchanging data among different systems can also be effective in this regard. PMID:27437043
Applications of statistical physics and information theory to the analysis of DNA sequences
NASA Astrophysics Data System (ADS)
Grosse, Ivo
2000-10-01
DNA carries the genetic information of most living organisms, and the goal of genome projects is to uncover that genetic information. One basic task in the analysis of DNA sequences is the recognition of protein-coding genes. Powerful computer programs for gene recognition have been developed, but most of them are based on statistical patterns that vary from species to species. In this thesis I address the question of whether there exist universal statistical patterns that are different in coding and noncoding DNA of all living species, regardless of their phylogenetic origin. In search of such species-independent patterns I study the mutual information function of genomic DNA sequences, and find that it shows persistent period-three oscillations. To understand the biological origin of the observed period-three oscillations, I compare the mutual information function of genomic DNA sequences to the mutual information function of stochastic model sequences. I find that the pseudo-exon model is able to reproduce the mutual information function of genomic DNA sequences. Moreover, I find that a generalization of the pseudo-exon model can connect the existence and the functional form of long-range correlations to the presence and the length distributions of coding and noncoding regions. Based on these theoretical studies I am able to find an information-theoretical quantity, the average mutual information (AMI), whose probability distributions are significantly different in coding and noncoding DNA, while they are almost identical in all studied species. These findings show that there exist universal statistical patterns that are different in coding and noncoding DNA of all studied species, and they suggest that the AMI may be used to identify genes in different living species, irrespective of their taxonomic origin.
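The mutual information function discussed here is the mutual information I(k) between symbols that sit k positions apart in the sequence; in coding DNA it oscillates with period three because of the codon structure. The sketch below is a direct empirical estimate of I(k) without the finite-size corrections a careful analysis would apply, and the test sequence is an artificial codon-biased string, not real genomic data.

```python
import math
from collections import Counter

def mutual_information_function(seq, max_k=12):
    """Return [I(1), ..., I(max_k)] in bits for a DNA string."""
    seq = seq.upper()
    base_freq = Counter(seq)
    n = len(seq)
    result = []
    for k in range(1, max_k + 1):
        pairs = Counter(zip(seq, seq[k:]))
        total = n - k
        mi = sum((c / total) * math.log2(
                 (c / total) / ((base_freq[a] / n) * (base_freq[b] / n)))
                 for (a, b), c in pairs.items())
        result.append(mi)
    return result

coding_like = "ATGGCTGCAGCGGCTGCTGCAGCAGCGGCGGCT" * 10   # toy codon-biased sequence
print([round(x, 4) for x in mutual_information_function(coding_like)])
# values at k = 3, 6, 9, ... stand out, reflecting the period-three pattern
```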
Golay sequences coded coherent optical OFDM for long-haul transmission
NASA Astrophysics Data System (ADS)
Qin, Cui; Ma, Xiangrong; Hua, Tao; Zhao, Jing; Yu, Huilong; Zhang, Jian
2017-09-01
We propose to use binary Golay sequences in coherent optical orthogonal frequency division multiplexing (CO-OFDM) to improve long-haul transmission performance. The Golay sequences are generated by binary Reed-Muller codes, which have a low peak-to-average power ratio and a certain error-correction capability. A low-complexity decoding algorithm for the Golay sequences is then proposed to recover the signal. Under the same spectral efficiency, QPSK-modulated OFDM with binary Golay sequence coding, with and without discrete Fourier transform (DFT) spreading (DFTS-QPSK-GOFDM and QPSK-GOFDM), is compared with normal BPSK-modulated OFDM with and without DFT spreading (DFTS-BPSK-OFDM and BPSK-OFDM) after long-haul transmission. At a 7% forward error correction code threshold (Q2 factor of 8.5 dB), it is shown that DFTS-QPSK-GOFDM outperforms DFTS-BPSK-OFDM by extending the transmission distance by 29% and 18% in non-dispersion-managed and dispersion-managed links, respectively.
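A hedged sketch of why Golay sequences help here: starting from the length-1 pair ([1], [1]), the concatenations a|b and a|(-b) form a new complementary pair of twice the length, and for such a pair the sum of the two aperiodic autocorrelations vanishes at every non-zero lag. A well-known consequence is that an OFDM symbol whose subcarriers are modulated by one sequence of the pair has a peak-to-average power ratio of at most a factor of 2 (about 3 dB). The numerical check below illustrates both properties; it is not the paper's Reed-Muller encoder or decoder.

```python
import numpy as np

def golay_pair(m):
    """Return a binary (+/-1) Golay complementary pair of length 2**m."""
    a, b = np.array([1]), np.array([1])
    for _ in range(m):
        a, b = np.concatenate([a, b]), np.concatenate([a, -b])
    return a, b

def aperiodic_autocorrelation(x):
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) for k in range(n)])

a, b = golay_pair(4)                          # length-16 pair
acf_sum = aperiodic_autocorrelation(a) + aperiodic_autocorrelation(b)
print(acf_sum)                                # -> [32  0  0 ...  0]

# PAPR of the OFDM symbol whose 16 subcarriers carry the sequence `a`.
t = np.arange(0, 1, 1 / 256)                  # oversampled time grid over one symbol
signal = sum(c * np.exp(2j * np.pi * k * t) for k, c in enumerate(a))
papr = np.max(np.abs(signal) ** 2) / np.mean(np.abs(signal) ** 2)
print(round(papr, 2))                         # stays at or below 2.0
```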
Unipro UGENE: a unified bioinformatics toolkit.
Okonechnikov, Konstantin; Golosova, Olga; Fursov, Mikhail
2012-04-15
Unipro UGENE is a multiplatform open-source software with the main goal of assisting molecular biologists without much expertise in bioinformatics to manage, analyze and visualize their data. UGENE integrates widely used bioinformatics tools within a common user interface. The toolkit supports multiple biological data formats and allows the retrieval of data from remote data sources. It provides visualization modules for biological objects such as annotated genome sequences, Next Generation Sequencing (NGS) assembly data, multiple sequence alignments, phylogenetic trees and 3D structures. Most of the integrated algorithms are tuned for maximum performance by the usage of multithreading and special processor instructions. UGENE includes a visual environment for creating reusable workflows that can be launched on local resources or in a High Performance Computing (HPC) environment. UGENE is written in C++ using the Qt framework. The built-in plugin system and structured UGENE API make it possible to extend the toolkit with new functionality. UGENE binaries are freely available for MS Windows, Linux and Mac OS X at http://ugene.unipro.ru/download.html. UGENE code is licensed under the GPLv2; the information about the code licensing and copyright of integrated tools can be found in the LICENSE.3rd_party file provided with the source bundle.
Farlora, Rodolfo; Araya-Garay, José; Gallardo-Escárate, Cristian
2014-06-01
Understanding the molecular underpinnings involved in the reproduction of the salmon louse is critical for designing novel strategies of pest management for this ectoparasite. However, genomic information on sex-related genes is still limited. In the present work, sex-specific gene transcription was revealed in the salmon louse Caligus rogercresseyi using high-throughput Illumina sequencing. A total of 30,191,914 and 32,292,250 high quality reads were generated for females and males, and these were de novo assembled into 32,173 and 38,177 contigs, respectively. Gene ontology analysis showed a pattern of higher expression in the female as compared to the male transcriptome. Based on our sequence analysis and known sex-related proteins, several genes putatively involved in sex differentiation, including Dmrt3, FOXL2, VASA, and FEM1, and other potentially significant candidate genes in C. rogercresseyi, were identified for the first time. In addition, the occurrence of SNPs in several differentially expressed contigs annotating for sex-related genes was found. This transcriptome dataset provides a useful resource for future functional analyses, opening new opportunities for sea lice pest control. Copyright © 2014 Elsevier B.V. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Marshall, E.
1996-09-27
The genome program has issued guidelines to ensure that sequencing is done on DNA from diverse sources who have given informed consent and are anonymous. Most current sources don't meet those criteria. It may be the first question every nonexpert asks on learning about the Human Genome Project: Whose genome are we studying, anyway? It sounds naive, says one government scientist - so naive, in fact, that "we chuckle as we explain that we aren't sequencing anyone's genome in particular; we're sequencing a representative genome" made up of a mosaic of DNA from a variety of anonymous sources. And Bruce Birren, a clone-maker now at the Massachusetts Institute of Technology's (MIT's) Whitehead Center for Genome Research, says: "We spent many years pooh-poohing the question" of whose genome would be stored in the database. But now that labs have begun working on large stretches of human DNA - aiming to identify all 3 billion base pairs in the genetic code - the question no longer seems so laughable. To the distress of program managers in Bethesda, Maryland, the initial sources of DNA are not as diverse or as anonymous as they had assumed.
NASA Technical Reports Server (NTRS)
Callantine, Todd J.; Bienert, Nancy; Borade, Abhay; Gabriel, Conrad; Gujral, Vimmy; Jobe, Kim; Martin, Lynne; Omar, Faisal; Prevot, Thomas; Mercer, Joey
2016-01-01
A human-in-the-loop simulation study addressed terminal-area controller-workstation interface variations for interoperability between three new capabilities being introduced by the FAA. The capabilities are Terminal Sequencing and Spacing (TSAS), Automated Terminal Proximity Alert (ATPA), and wake-separation recategorization, or 'RECAT.' TSAS provides controllers with Controller-Managed Spacing (CMS) tools, including slot markers, speed advisories, and early/late indications, together with runway assignments and sequence numbers. ATPA provides automatic monitor, warning, and alert cones to inform controllers about spacing between aircraft on approach. ATPA cones are sized according to RECAT, an improved method of specifying wake-separation standards. The objective of the study was to identify potential issues and provide recommendations for integrating TSAS with ATPA and RECAT. Participants controlled arrival traffic under seven different display configurations, then tested an 'exploratory' configuration developed with participant input. All the display conditions were workable and acceptable, but controllers strongly preferred having the CMS tools available on Feeder positions, and both CMS tools and ATPA available on Final positions. Controllers found the integrated systems favorable and liked being able to tailor configurations to individual preferences.
Disclosing Medical Mistakes: A Communication Management Plan for Physicians
Petronio, Sandra; Torke, Alexia; Bosslet, Gabriel; Isenberg, Steven; Wocial, Lucia; Helft, Paul R
2013-01-01
Introduction: There is a growing consensus that disclosure of medical mistakes is ethically and legally appropriate, but such disclosures are made difficult by medical traditions of concern about medical malpractice suits and by physicians’ own emotional reactions. Because the physician may have compelling reasons both to keep the information private and to disclose it to the patient or family, these situations can be conceptualized as privacy dilemmas. These dilemmas may create barriers to effectively addressing the mistake and its consequences. Although a number of interventions exist to address privacy dilemmas that physicians face, current evidence suggests that physicians tend to be slow to adopt the practice of disclosing medical mistakes. Methods: This discussion proposes a theoretically based, streamlined, two-step plan that physicians can use as an initial guide for conversations with patients about medical mistakes. The mistake disclosure management plan uses the communication privacy management theory. Results: The steps are 1) physician preparation, such as talking about the physician’s emotions and seeking information about the mistake, and 2) use of mistake disclosure strategies that protect the physician-patient relationship. These include the optimal timing, context of disclosure delivery, content of mistake messages, sequencing, and apology. A case study highlighted the disclosure process. Conclusion: This Mistake Disclosure Management Plan may help physicians in the early stages after mistake discovery to prepare for the initial disclosure of a medical mistake. The next step is testing implementation of the procedures suggested. PMID:23704848
Expectation, information processing, and subjective duration.
Simchy-Gross, Rhimmon; Margulis, Elizabeth Hellmuth
2018-01-01
In research on psychological time, it is important to examine the subjective duration of entire stimulus sequences, such as those produced by music (Teki, Frontiers in Neuroscience, 10, 2016). Yet research on the temporal oddball illusion (according to which oddball stimuli seem longer than standard stimuli of the same duration) has examined only the subjective duration of single events contained within sequences, not the subjective duration of sequences themselves. Does the finding that oddballs seem longer than standards translate to entire sequences, such that entire sequences that contain oddballs seem longer than those that do not? Is this potential translation influenced by the mode of information processing-whether people are engaged in direct or indirect temporal processing? Two experiments aimed to answer both questions using different manipulations of information processing. In both experiments, musical sequences either did or did not contain oddballs (auditory sliding tones). To manipulate information processing, we varied the task (Experiment 1), the sequence event structure (Experiments 1 and 2), and the sequence familiarity (Experiment 2) independently within subjects. Overall, in both experiments, the sequences that contained oddballs seemed shorter than those that did not when people were engaged in direct temporal processing, but longer when people were engaged in indirect temporal processing. These findings support the dual-process contingency model of time estimation (Zakay, Attention, Perception & Psychophysics, 54, 656-664, 1993). Theoretical implications for attention-based and memory-based models of time estimation, the pacemaker accumulator and coding efficiency hypotheses of time perception, and dynamic attending theory are discussed.
NASA Astrophysics Data System (ADS)
Renschler, C.; Sheridan, M. F.; Patra, A. K.
2008-05-01
The impact and consequences of extreme geophysical events (hurricanes, floods, wildfires, volcanic flows, mudflows, etc.) on properties and processes should be continuously assessed by a well-coordinated interdisciplinary research and outreach approach addressing risk assessment and resilience. Communication between various involved disciplines and stakeholders is the key to a successful implementation of an integrated risk management plan. These issues become apparent at the level of decision support tools for extreme events/disaster management in natural and managed environments. The Geospatial Project Management Tool (GeoProMT) is a collaborative platform for research and training to document and communicate the fundamental steps in transforming information for extreme events at various scales for analysis and management. GeoProMT is an internet-based interface for the management of shared geo-spatial and multi-temporal information such as measurements, remotely sensed images, and other GIS data. This tool enhances collaborative research activities and the ability to assimilate data from diverse sources by integrating information management. This facilitates a better understanding of natural processes and enhances the integrated assessment of resilience against both the slow and fast onset of hazard risks. Fundamental to understanding and communicating complex natural processes are: (a) representation of spatiotemporal variability, extremes, and uncertainty of environmental properties and processes in the digital domain, (b) transformation of their spatiotemporal representation across scales (e.g. interpolation, aggregation, disaggregation) during data processing and modeling in the digital domain, and designing and developing tools for (c) geo-spatial data management, (d) geo-spatial process modeling and effective implementation, and (e) supporting decision- and policy-making in natural resources and hazard management at various spatial and temporal scales of interest. GeoProMT is useful for researchers, practitioners, and decision-makers, because it provides an integrated environmental system assessment and data management approach that considers the spatial and temporal scales and variability in natural processes. Particularly in the occurrence or onset of extreme events, it can utilize the latest data sources that are available at variable scales, combine them with existing information, and update assessment products such as risk and vulnerability assessment maps. Because integrated geo-spatial assessment requires careful consideration of all the steps in utilizing data, modeling and decision-making formats, each step in the sequence must be assessed in terms of how information is being scaled. At the process scale, various geophysical models (e.g. TITAN, LAHARZ, or many other examples) are appropriate for incorporation in the tool. Some examples that illustrate our approach include: 1) coastal parishes impacted by Hurricane Rita (Southwestern Louisiana), 2) a watershed affected by extreme rainfall-induced debris-flows (Madison County, Virginia; Panabaj, Guatemala; Casita, Nicaragua), and 3) the potential for pyroclastic flows to threaten a city (Tungurahua, Ecuador). This research was supported by the National Science Foundation.
2010-01-01
Background Suppression subtractive hybridization is a popular technique for gene discovery from non-model organisms without an annotated genome sequence, such as cowpea (Vigna unguiculata (L.) Walp). We aimed to use this method to enrich for genes expressed during drought stress in a drought tolerant cowpea line. However, current methods were inefficient in screening libraries and management of the sequence data, and thus there was a need to develop software tools to facilitate the process. Results Forward and reverse cDNA libraries enriched for cowpea drought response genes were screened on microarrays, and the R software package SSHscreen 2.0.1 was developed (i) to normalize the data effectively using spike-in control spot normalization, and (ii) to select clones for sequencing based on the calculation of enrichment ratios with associated statistics. Enrichment ratio 3 values for each clone showed that 62% of the forward library and 34% of the reverse library clones were significantly differentially expressed by drought stress (adjusted p value < 0.05). Enrichment ratio 2 calculations showed that > 88% of the clones in both libraries were derived from rare transcripts in the original tester samples, thus supporting the notion that suppression subtractive hybridization enriches for rare transcripts. A set of 118 clones were chosen for sequencing, and drought-induced cowpea genes were identified, the most interesting encoding a late embryogenesis abundant Lea5 protein, a glutathione S-transferase, a thaumatin, a universal stress protein, and a wound induced protein. A lipid transfer protein and several components of photosynthesis were down-regulated by the drought stress. Reverse transcriptase quantitative PCR confirmed the enrichment ratio values for the selected cowpea genes. SSHdb, a web-accessible database, was developed to manage the clone sequences and combine the SSHscreen data with sequence annotations derived from BLAST and Blast2GO. The self-BLAST function within SSHdb grouped redundant clones together and illustrated that the SSHscreen plots are a useful tool for choosing anonymous clones for sequencing, since redundant clones cluster together on the enrichment ratio plots. Conclusions We developed the SSHscreen-SSHdb software pipeline, which greatly facilitates gene discovery using suppression subtractive hybridization by improving the selection of clones for sequencing after screening the library on a small number of microarrays. Annotation of the sequence information and collaboration was further enhanced through a web-based SSHdb database, and we illustrated this through identification of drought responsive genes from cowpea, which can now be investigated in gene function studies. SSH is a popular and powerful gene discovery tool, and therefore this pipeline will have application for gene discovery in any biological system, particularly non-model organisms. SSHscreen 2.0.1 and a link to SSHdb are available from http://microarray.up.ac.za/SSHscreen. PMID:20359330
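The clone-selection step described above lends itself to a short illustration. The following is a minimal Python sketch, not the SSHscreen R package itself, of ranking clones by a mean log2 enrichment ratio with Benjamini-Hochberg adjusted p-values; the replicate values, thresholds, and variable names are hypothetical.

```python
# Minimal sketch (not the SSHscreen R package) of selecting clones by an
# enrichment ratio with Benjamini-Hochberg adjusted p-values.
# The toy data, cutoffs, and names are hypothetical.
import numpy as np
from scipy import stats

def bh_adjust(pvals):
    """Benjamini-Hochberg adjustment of a 1-D array of p-values."""
    p = np.asarray(pvals)
    order = np.argsort(p)
    ranked = p[order] * len(p) / (np.arange(len(p)) + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]   # enforce monotonicity
    adj = np.empty_like(ranked)
    adj[order] = np.clip(ranked, 0, 1)
    return adj

# log2 ratios (subtracted vs. unsubtracted signal) for 4 clones on 3 replicate arrays
log_ratios = np.array([
    [2.1, 1.8, 2.4],    # strongly enriched clone
    [0.1, -0.2, 0.3],   # unchanged clone
    [1.2, 0.9, 1.5],
    [-1.1, -0.8, -1.3],
])

enrichment = log_ratios.mean(axis=1)                       # mean log2 enrichment ratio
pvals = stats.ttest_1samp(log_ratios, 0.0, axis=1).pvalue  # test ratio != 0
adj_p = bh_adjust(pvals)

for i, (er, p) in enumerate(zip(enrichment, adj_p)):
    flag = "candidate for sequencing" if (er > 1 and p < 0.05) else "skip"
    print(f"clone {i}: enrichment={er:+.2f}, adj_p={p:.3f} -> {flag}")
```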
Giudicelli, Véronique; Duroux, Patrice; Kossida, Sofia; Lefranc, Marie-Paule
2017-06-26
IMGT®, the international ImMunoGeneTics information system® (http://www.imgt.org), was created in 1989 in Montpellier, France (CNRS and Montpellier University) to manage the huge and complex diversity of the antigen receptors, and is at the origin of immunoinformatics, a science at the interface between immunogenetics and bioinformatics. Immunoglobulins (IG) or antibodies and T cell receptors (TR) are managed and described in the IMGT® databases and tools at the level of receptor, chain and domain. The analysis of the IG and TR variable (V) domain rearranged nucleotide sequences is performed by IMGT/V-QUEST (online since 1997, 50 sequences per batch) and, for next generation sequencing (NGS), by IMGT/HighV-QUEST, the high throughput version of IMGT/V-QUEST (portal begun in 2010, 500,000 sequences per batch). In vitro combinatorial libraries of engineered antibody single chain Fragment variable (scFv) which mimic the in vivo natural diversity of the immune adaptive responses are extensively screened for the discovery of novel antigen binding specificities. However, the analysis of NGS full-length scFv (~850 bp) represents a challenge as they contain two V domains connected by a linker and there is no tool for the analysis of two V domains in a single chain. The functionality "Analysis of single chain Fragment variable (scFv)" has been implemented in IMGT/V-QUEST and, for NGS, in IMGT/HighV-QUEST for the analysis of the two V domains of IG and TR scFv. It proceeds in five steps: search for a first closest V-REGION, full characterization of the first V-(D)-J-REGION, then search for a second V-REGION and full characterization of the second V-(D)-J-REGION, and finally linker delimitation. For each sequence or NGS read, positions of the 5'V-DOMAIN, linker and 3'V-DOMAIN in the scFv are provided in the 'V-orientated' sense. Each V-DOMAIN is fully characterized (gene identification, sequence description, junction analysis, characterization of mutations and amino acid changes). The functionality is generic and can analyse any IG or TR single chain nucleotide sequence containing two V domains, provided that the corresponding species IMGT reference directory is available. The "Analysis of single chain Fragment variable (scFv)" implemented in IMGT/V-QUEST and, for NGS, in IMGT/HighV-QUEST provides the identification and full characterization of the two V domains of full-length scFv (~850 bp) nucleotide sequences from combinatorial libraries. The analysis can also be performed on concatenated paired chains of expressed antigen receptor IG or TR repertoires.
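The five-step flow described above can be outlined in code. The following Python sketch is not the IMGT/V-QUEST implementation; the reference seed sequences, fixed domain length, and helper names are hypothetical stand-ins for a real germline reference directory.

```python
# Minimal sketch of the five-step scFv analysis flow described above:
# (1) find the first closest V-REGION, (2) characterize the first V-(D)-J,
# (3) find the second V-REGION downstream, (4) characterize it, and
# (5) delimit the linker between the two domains.
# This is NOT the IMGT/V-QUEST code; seeds, lengths and names are toy values.
V_SEEDS = {
    "VH-like": "CAGGTGCAGCTG",   # illustrative 5' seed sequences only
    "VL-like": "GACATCCAGATG",
}
DOMAIN_LEN = 330                  # rough V-(D)-J length used for the sketch

def find_closest_v(seq, start=0):
    """Return (position, label) of the first matching V seed at/after `start`."""
    hits = [(seq.find(s, start), label) for label, s in V_SEEDS.items()]
    hits = [h for h in hits if h[0] != -1]
    return min(hits) if hits else None

def characterize(seq, pos, label):
    """Stand-in for full V-(D)-J characterization of one domain."""
    end = min(pos + DOMAIN_LEN, len(seq))
    return {"label": label, "start": pos, "end": end}

def analyse_scfv(read):
    first_hit = find_closest_v(read)
    if first_hit is None:
        return None
    dom1 = characterize(read, *first_hit)
    second_hit = find_closest_v(read, start=dom1["end"])
    if second_hit is None:
        return {"domain1": dom1, "domain2": None, "linker": None}
    dom2 = characterize(read, *second_hit)
    linker = read[dom1["end"]:dom2["start"]]      # step 5: linker delimitation
    return {"domain1": dom1, "domain2": dom2, "linker": linker}

if __name__ == "__main__":
    toy = "CAGGTGCAGCTG" + "N" * 318 + "GGTGGCGGTAGC" * 3 + "GACATCCAGATG" + "N" * 318
    print(analyse_scfv(toy))
```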
Brassica ASTRA: an integrated database for Brassica genomic research.
Love, Christopher G; Robinson, Andrew J; Lim, Geraldine A C; Hopkins, Clare J; Batley, Jacqueline; Barker, Gary; Spangenberg, German C; Edwards, David
2005-01-01
Brassica ASTRA is a public database for genomic information on Brassica species. The database incorporates expressed sequences with Swiss-Prot and GenBank comparative sequence annotation as well as secondary Gene Ontology (GO) annotation derived from the comparison with Arabidopsis TAIR GO annotations. Simple sequence repeat molecular markers are identified within resident sequences and mapped onto the closely related Arabidopsis genome sequence. Bacterial artificial chromosome (BAC) end sequences derived from the Multinational Brassica Genome Project are also mapped onto the Arabidopsis genome sequence enabling users to identify candidate Brassica BACs corresponding to syntenic regions of Arabidopsis. This information is maintained in a MySQL database with a web interface providing the primary means of interrogation. The database is accessible at http://hornbill.cspp.latrobe.edu.au.
EGenBio: A Data Management System for Evolutionary Genomics and Biodiversity
Nahum, Laila A; Reynolds, Matthew T; Wang, Zhengyuan O; Faith, Jeremiah J; Jonna, Rahul; Jiang, Zhi J; Meyer, Thomas J; Pollock, David D
2006-01-01
Background Evolutionary genomics requires management and filtering of large numbers of diverse genomic sequences for accurate analysis and inference on evolutionary processes of genomic and functional change. We developed Evolutionary Genomics and Biodiversity (EGenBio) to begin to address this. Description EGenBio is a system for manipulation and filtering of large numbers of sequences, integrating curated sequence alignments and phylogenetic trees, managing evolutionary analyses, and visualizing their output. EGenBio is organized into three conceptual divisions, Evolution, Genomics, and Biodiversity. The Genomics division includes tools for selecting pre-aligned sequences from different genes and species, and for modifying and filtering these alignments for further analysis. Species searches are handled through queries that can be modified based on a tree-based navigation system and saved. The Biodiversity division contains tools for analyzing individual sequences or sequence alignments, whereas the Evolution division contains tools involving phylogenetic trees. Alignments are annotated with analytical results and modification history using our PRAED format. A miscellaneous Tools section and Help framework are also available. EGenBio was developed around our comparative genomic research and a prototype database of mtDNA genomes. It utilizes MySQL relational databases and dynamic page generation, and calls numerous custom programs. Conclusion EGenBio was designed to serve as a platform for tools and resources to ease combined analysis in evolution, genomics, and biodiversity. PMID:17118150
Brown, Shawn P; Callaham, Mac A; Oliver, Alena K; Jumpponen, Ari
2013-12-01
Prescribed burning is a common management tool to control fuel loads, ground vegetation, and facilitate desirable game species. We evaluated soil fungal community responses to long-term prescribed fire treatments in a loblolly pine forest on the Piedmont of Georgia and utilized deep Internal Transcribed Spacer Region 1 (ITS1) amplicon sequencing afforded by the recent Ion Torrent Personal Genome Machine (PGM). These deep sequence data (19,000 + reads per sample after subsampling) indicate that frequent fires (3-year fire interval) shift soil fungus communities, whereas infrequent fires (6-year fire interval) permit system resetting to a state similar to that without prescribed fire. Furthermore, in nonmetric multidimensional scaling analyses, primarily ectomycorrhizal taxa were correlated with axes associated with long fire intervals, whereas soil saprobes tended to be correlated with the frequent fire recurrence. We conclude that (1) multiplexed Ion Torrent PGM analyses allow deep cost effective sequencing of fungal communities but may suffer from short read lengths and inconsistent sequence quality adjacent to the sequencing adaptor; (2) frequent prescribed fires elicit a shift in soil fungal communities; and (3) such shifts do not occur when fire intervals are longer. Our results emphasize the general responsiveness of these forests to management, and the importance of fire return intervals in meeting management objectives. © 2013 Federation of European Microbiological Societies. Published by John Wiley & Sons Ltd. All rights reserved.
Event-related potentials in response to violations of content and temporal event knowledge.
Drummer, Janna; van der Meer, Elke; Schaadt, Gesa
2016-01-08
Scripts that store knowledge of everyday events are fundamentally important for managing daily routines. Content event knowledge (i.e., knowledge about which events belong to a script) and temporal event knowledge (i.e., knowledge about the chronological order of events in a script) constitute qualitatively different forms of knowledge. However, there is limited information about each distinct process and the time course involved in accessing content and temporal event knowledge. Therefore, we analyzed event-related potentials (ERPs) in response to either correctly presented event sequences or event sequences that contained a content or temporal error. We found an N400, which was followed by a posteriorly distributed P600 in response to content errors in event sequences. By contrast, we did not find an N400 but an anteriorly distributed P600 in response to temporal errors in event sequences. Thus, the N400 seems to be elicited as a response to a general mismatch between an event and the established event model. We assume that the expectancy violation of content event knowledge, as indicated by the N400, induces the collapse of the established event model, a process indicated by the posterior P600. The expectancy violation of temporal event knowledge is assumed to induce an attempt to reorganize the event model in working memory, a process indicated by the frontal P600. Copyright © 2015 Elsevier Ltd. All rights reserved.
The"minimum information about an environmental sequence" (MIENS) specification
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yilmaz, P.; Kottmann, R.; Field, D.
We present the Genomic Standards Consortium's (GSC) 'Minimum Information about an ENvironmental Sequence' (MIENS) standard for describing marker genes. Adoption of MIENS will enhance our ability to analyze natural genetic diversity across the Tree of Life as it is currently being documented by massive DNA sequencing efforts from myriad ecosystems in our ever-changing biosphere.
Ghosh, Pritha; Mathew, Oommen K; Sowdhamini, Ramanathan
2016-10-07
RNA-binding proteins (RBPs) interact with their cognate RNA(s) to form large biomolecular assemblies. They are versatile in their functionality and are involved in a myriad of processes inside the cell. RBPs with similar structural features and common biological functions are grouped together into families and superfamilies. It will be useful to obtain an early understanding and association of the RNA-binding properties of gene product sequences. Here, we report a web server, RStrucFam, to predict the structure, type of cognate RNA(s) and function(s) of proteins, where possible, from mere sequence information. The web server employs Hidden Markov Model scan (hmmscan) to enable association to a back-end database of structural and sequence families. The database (HMMRBP) comprises 437 HMMs of RBP families of known structure that have been generated using structure-based sequence alignments and 746 sequence-centric RBP family HMMs. The input protein sequence is associated with structural or sequence domain families, if structure or sequence signatures exist. In the case of association of the protein with a family of known structures, output features such as a multiple structure-based sequence alignment (MSSA) of the query with all other members of that family are provided. Further, cognate RNA partner(s) for that protein, Gene Ontology (GO) annotations, if any, and a homology model of the protein can be obtained. The users can also browse through the database for details pertaining to each family, protein or RNA and their related information based on keyword search or RNA motif search. RStrucFam is a web server that exploits structurally conserved features of RBPs, derived from known family members and imprinted in mathematical profiles, to predict putative RBPs from sequence information. Proteins that fail to associate with such structure-centric families are further queried against the sequence-centric RBP family HMMs in the HMMRBP database. Further, all other essential information pertaining to an RBP, like overall function annotations, is provided. The web server can be accessed at the following link: http://caps.ncbs.res.in/rstrucfam.
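The two-tier search described above (structure-centric HMM library first, sequence-centric library as a fallback) could be driven, for example, by HMMER's hmmscan. The sketch below is not the RStrucFam server code; it assumes HMMER is installed and on the PATH, and the HMM database file names are hypothetical.

```python
# Minimal sketch (not RStrucFam itself) of a two-tier hmmscan search:
# scan a query against structure-centric family HMMs first and fall back to
# sequence-centric HMMs if there is no hit. The database file names are
# hypothetical placeholders for hmmpress-ed HMM libraries.
import os
import subprocess
import tempfile

def hmmscan_hits(hmm_db, fasta_path, evalue=1e-3):
    """Run hmmscan and return the names of families hit below `evalue`."""
    with tempfile.NamedTemporaryFile(suffix=".tbl", delete=False) as tbl:
        tbl_path = tbl.name
    subprocess.run(
        ["hmmscan", "--tblout", tbl_path, "-E", str(evalue), hmm_db, fasta_path],
        check=True, stdout=subprocess.DEVNULL)
    hits = []
    with open(tbl_path) as fh:
        for line in fh:
            if not line.startswith("#"):
                hits.append(line.split()[0])   # target (family) name
    os.unlink(tbl_path)
    return hits

def classify_rbp(fasta_path):
    structure_hits = hmmscan_hits("structure_centric_rbp.hmm", fasta_path)
    if structure_hits:
        return ("structure-centric family", structure_hits)
    sequence_hits = hmmscan_hits("sequence_centric_rbp.hmm", fasta_path)
    if sequence_hits:
        return ("sequence-centric family", sequence_hits)
    return ("no association", [])

# Example: classify_rbp("query_protein.fasta")
```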
SCALCE: boosting sequence compression algorithms using locally consistent encoding
Hach, Faraz; Numanagić, Ibrahim; Sahinalp, S Cenk
2012-01-01
Motivation: The high throughput sequencing (HTS) platforms generate unprecedented amounts of data that introduce challenges for the computational infrastructure. Data management, storage and analysis have become major logistical obstacles for those adopting the new platforms. The requirement for large investment for this purpose almost signalled the end of the Sequence Read Archive hosted at the National Center for Biotechnology Information (NCBI), which holds most of the sequence data generated worldwide. Currently, most HTS data are compressed through general purpose algorithms such as gzip. These algorithms are not designed for compressing data generated by the HTS platforms; for example, they do not take advantage of the specific nature of genomic sequence data, that is, limited alphabet size and high similarity among reads. Fast and efficient compression algorithms designed specifically for HTS data should be able to address some of the issues in data management, storage and communication. Such algorithms would also help with analysis provided they offer additional capabilities such as random access to any read and indexing for efficient sequence similarity search. Here we present SCALCE, a ‘boosting’ scheme based on the Locally Consistent Parsing technique, which reorganizes the reads in a way that results in a higher compression speed and compression rate, independent of the compression algorithm in use and without using a reference genome. Results: Our tests indicate that SCALCE can improve the compression rate achieved through gzip by a factor of 4.19 when the goal is to compress the reads alone. In fact, on SCALCE reordered reads, gzip running time can improve by a factor of 15.06 on a standard PC with a single core and 6 GB memory. Interestingly, even the running time of SCALCE + gzip improves that of gzip alone by a factor of 2.09. When compared with the recently published BEETL, which aims to sort the (inverted) reads in lexicographic order for improving bzip2, SCALCE + gzip provides up to 2.01 times better compression while improving the running time by a factor of 5.17. SCALCE also provides the option to compress the quality scores as well as the read names, in addition to the reads themselves. This is achieved by compressing the quality scores through order-3 Arithmetic Coding (AC) and the read names through gzip, exploiting the reordering SCALCE provides on the reads. This way, in comparison with gzip compression of the unordered FASTQ files (including reads, read names and quality scores), SCALCE (together with gzip and arithmetic encoding) can provide up to a 3.34-fold improvement in the compression rate and a 1.26-fold improvement in running time. Availability: Our algorithm, SCALCE (Sequence Compression Algorithm using Locally Consistent Encoding), is implemented in C++ with both gzip and bzip2 compression options. It also supports multithreading when the gzip option is selected and the pigz binary is available. It is available at http://scalce.sourceforge.net. Contact: fhach@cs.sfu.ca or cenk@cs.sfu.ca Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23047557
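A toy illustration of the boosting idea, not SCALCE's Locally Consistent Parsing, is sketched below: grouping reads that share similar content before handing them to gzip lets the compressor find long matches inside its 32 KB window. The minimizer-based grouping key and the simulated reads are crude stand-ins for SCALCE's LCP-derived cores and real FASTQ data.

```python
# Toy illustration (not SCALCE) of the boosting idea: reorder reads so that
# similar reads sit next to each other, then compress with a general-purpose
# compressor. The grouping key is each read's lexicographically smallest
# 16-mer (a minimizer), a crude stand-in for LCP-derived cores.
import gzip
import random

random.seed(1)
genome = "".join(random.choice("ACGT") for _ in range(50_000))
reads = [genome[i:i + 100] for i in
         (random.randrange(0, 49_900) for _ in range(5_000))]   # ~10x coverage
random.shuffle(reads)

def minimizer(read, k=16):
    """Lexicographically smallest k-mer of the read, used as a grouping key."""
    return min(read[i:i + k] for i in range(len(read) - k + 1))

def gz_size(read_list):
    return len(gzip.compress("\n".join(read_list).encode()))

unordered = gz_size(reads)
reordered = gz_size(sorted(reads, key=minimizer))
print(f"gzip on shuffled reads : {unordered} bytes")
print(f"gzip on grouped reads  : {reordered} bytes")
print(f"size ratio shuffled/grouped = {unordered / reordered:.2f}")
```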
Kuhn, Jens H.; Andersen, Kristian G.; Bào, Yīmíng; Bavari, Sina; Becker, Stephan; Bennett, Richard S.; Bergman, Nicholas H.; Blinkova, Olga; Bradfute, Steven; Brister, J. Rodney; Bukreyev, Alexander; Chandran, Kartik; Chepurnov, Alexander A.; Davey, Robert A.; Dietzgen, Ralf G.; Doggett, Norman A.; Dolnik, Olga; Dye, John M.; Enterlein, Sven; Fenimore, Paul W.; Formenty, Pierre; Freiberg, Alexander N.; Garry, Robert F.; Garza, Nicole L.; Gire, Stephen K.; Gonzalez, Jean-Paul; Griffiths, Anthony; Happi, Christian T.; Hensley, Lisa E.; Herbert, Andrew S.; Hevey, Michael C.; Hoenen, Thomas; Honko, Anna N.; Ignatyev, Georgy M.; Jahrling, Peter B.; Johnson, Joshua C.; Johnson, Karl M.; Kindrachuk, Jason; Klenk, Hans-Dieter; Kobinger, Gary; Kochel, Tadeusz J.; Lackemeyer, Matthew G.; Lackner, Daniel F.; Leroy, Eric M.; Lever, Mark S.; Mühlberger, Elke; Netesov, Sergey V.; Olinger, Gene G.; Omilabu, Sunday A.; Palacios, Gustavo; Panchal, Rekha G.; Park, Daniel J.; Patterson, Jean L.; Paweska, Janusz T.; Peters, Clarence J.; Pettitt, James; Pitt, Louise; Radoshitzky, Sheli R.; Ryabchikova, Elena I.; Saphire, Erica Ollmann; Sabeti, Pardis C.; Sealfon, Rachel; Shestopalov, Aleksandr M.; Smither, Sophie J.; Sullivan, Nancy J.; Swanepoel, Robert; Takada, Ayato; Towner, Jonathan S.; van der Groen, Guido; Volchkov, Viktor E.; Volchkova, Valentina A.; Wahl-Jensen, Victoria; Warren, Travis K.; Warfield, Kelly L.; Weidmann, Manfred; Nichol, Stuart T.
2014-01-01
Sequence determination of complete or coding-complete genomes of viruses is becoming common practice for supporting the work of epidemiologists, ecologists, virologists, and taxonomists. Sequencing duration and costs are rapidly decreasing, sequencing hardware is under modification for use by non-experts, and software is constantly being improved to simplify sequence data management and analysis. Thus, analysis of virus disease outbreaks on the molecular level is now feasible, including characterization of the evolution of individual virus populations in single patients over time. The increasing accumulation of sequencing data creates a management problem for the curators of commonly used sequence databases and an entry retrieval problem for end users. Therefore, utilizing the data to their fullest potential will require setting nomenclature and annotation standards for virus isolates and associated genomic sequences. The National Center for Biotechnology Information’s (NCBI’s) RefSeq is a non-redundant, curated database for reference (or type) nucleotide sequence records that supplies source data to numerous other databases. Building on recently proposed templates for filovirus variant naming [
New insights into phosphorus management in agriculture--A crop rotation approach.
Łukowiak, Remigiusz; Grzebisz, Witold; Sassenrath, Gretchen F
2016-01-15
This manuscript presents research results examining phosphorus (P) management in a soil–plant system for three variables: i) internal resources of soil available phosphorus, ii) cropping sequence, and iii) external input of phosphorus (manure, fertilizers). The research was conducted in long-term cropping sequences with oilseed rape (10 rotations) and maize (six rotations) over three consecutive growing seasons (2004/2005, 2005/2006, and 2006/2007) on a production farm on soils originating from Albic Luvisols in Poland. The soil available phosphorus pool, measured as calcium chloride extractable P (CCE-P), constituted 28% to 67% of the total phosphorus input (PTI) to the soil–plant system in the spring. Oilseed rape- and maize-dominant cropping sequences showed a significant potential to utilize the CCE-P pool within the soil profile. Cropping sequences containing oilseed rape significantly affected the CCE-P pool, and in turn contributed to the PTI. The PTI uptake use efficiency was 50% on average. Therefore, the CCE-P pool should be taken into account as an important component of a sound and reliable phosphorus balance. The instability of the yield prediction, based on the PTI, was mainly due to an imbalanced management of both farmyard manure and phosphorus fertilizer. Oilseed rape plants had a significant positive impact on the CCE-P pool after harvest, improving the productive stability of the entire cropping sequence. This phenomenon was documented by the PTI increase during wheat cultivation following oilseed rape. The Unit Phosphorus Uptake index also showed a higher stability in oilseed rape cropping systems compared to rotations based on maize. Cropping sequences are a primary factor impacting phosphorus management. Judicious implementation of crop rotations can improve soil P resources, efficiency of crop P use, and crop yield and yield stability. Use of cropping sequences can reduce the need for external P sources such as farmyard manure and chemical fertilizers.
Kumar, Akash; Dougherty, Max; Findlay, Gregory M; Geisheker, Madeleine; Klein, Jason; Lazar, John; Machkovech, Heather; Resnick, Jesse; Resnick, Rebecca; Salter, Alexander I; Talebi-Liasi, Faezeh; Arakawa, Christopher; Baudin, Jacob; Bogaard, Andrew; Salesky, Rebecca; Zhou, Qian; Smith, Kelly; Clark, John I; Shendure, Jay; Horwitz, Marshall S
2014-01-01
Even in cases where there is no obvious family history of disease, genome sequencing may contribute to clinical diagnosis and management. Clinical application of the genome has not yet become routine, however, in part because physicians are still learning how best to utilize such information. As an educational research exercise performed in conjunction with our medical school human anatomy course, we explored the potential utility of determining the whole genome sequence of a patient who had died following a clinical diagnosis of idiopathic pulmonary fibrosis (IPF). Medical students performed dissection and whole genome sequencing of the cadaver. Gross and microscopic findings were more consistent with the fibrosing variant of nonspecific interstitial pneumonia (NSIP), as opposed to IPF per se. Variants in genes causing Mendelian disorders predisposing to IPF were not detected. However, whole genome sequencing identified several common variants associated with IPF, including a single nucleotide polymorphism (SNP), rs35705950, located in the promoter region of the gene encoding mucin glycoprotein MUC5B. The MUC5B promoter polymorphism was recently found to markedly elevate risk for IPF, though a particular association with NSIP has not been previously reported, nor has its contribution to disease risk previously been evaluated in the genome-wide context of all genetic variants. We did not identify additional predicted functional variants in a region of linkage disequilibrium (LD) adjacent to MUC5B, nor did we discover other likely risk-contributing variants elsewhere in the genome. Whole genome sequencing thus corroborates the association of rs35705950 with MUC5B dysregulation and interstitial lung disease. This novel exercise additionally served a unique mission in bridging clinical and basic science education.
Bobrova, E V; Liakhovetskiĭ, V A; Borshchevskaia, E R
2011-01-01
The dependence of errors during reproduction of a sequence of hand movements without visual feedback on the previous right- and left-hand performance ("prehistory") and on the positions in space of the sequence elements (random or ordered by an explicit rule) was analyzed. It was shown that the preceding information about the ordered positions of the sequence elements was used during right-hand movements, whereas left-hand movements were performed with involvement of the information about the random sequence. The data testify to a central mechanism for the analysis of the spatial structure of sequence elements. This mechanism activates movement coding specific to the left hemisphere (vector coding) in the case of an ordered sequence structure and positional coding specific to the right hemisphere in the case of a random sequence structure.
Town, Katy; Bolt, Hikaru; Croxford, Sara; Cole, Michelle; Harris, Simon; Field, Nigel; Hughes, Gwenda
2018-06-01
Neisseria gonorrhoeae (NG) is a significant global public health concern due to rising diagnoses rates and antimicrobial resistance. Molecular combined with epidemiological data have been used to understand the distribution and spread of NG, as well as relationships between cases in sexual networks, but the public health value gained from these studies is unclear. We conducted a systematic review to examine how molecular epidemiological studies have informed understanding of sexual networks and NG transmission, and subsequent public health interventions. Five research databases were systematically searched up to 31st March 2017 for studies that used sequence-based DNA typing methods, including whole genome sequencing, and linked molecular data to patient-level epidemiological data. Data were extracted and summarised to identify common themes. Of the 49 studies included, 82% used NG Multi-antigen Sequence Typing. Gender and sexual orientation were commonly used to characterise sexual networks that were inferred using molecular clusters; clusters predominantly of one patient group often contained a small number of isolates from other patient groups. Suggested public health applications included using these data to target interventions at specific populations, confirm outbreaks, and inform partner management, but these were mainly untested. Combining molecular and epidemiological data has provided insight into sexual mixing patterns, and dissemination of NG, but few studies have applied these findings to design or evaluate public health interventions. Future studies should focus on the application of molecular epidemiology in public health practice to provide evidence for how to prevent and control NG. Copyright © 2018 The Authors. Published by Elsevier Ltd.. All rights reserved.
HIITE: HIV-1 incidence and infection time estimator.
Park, Sung Yong; Love, Tanzy M T; Kapoor, Shivankur; Lee, Ha Youn
2018-06-15
Around 2.1 million new HIV-1 infections were reported in 2015, alerting that the HIV-1 epidemic remains a significant global health challenge. Precise incidence assessment strengthens epidemic monitoring efforts and guides strategy optimization for prevention programs. Estimating the onset time of HIV-1 infection can facilitate optimal clinical management and identify key populations largely responsible for epidemic spread and thereby infer HIV-1 transmission chains. Our goal is to develop a genomic assay estimating the incidence and infection time in a single cross-sectional survey setting. We created a web-based platform, HIV-1 incidence and infection time estimator (HIITE), which processes envelope gene sequences using hierarchical clustering algorithms and informs the stage of infection, along with time since infection for incident cases. HIITE's performance was evaluated using 585 incident and 305 chronic specimens' envelope gene sequences collected from global cohorts including HIV-1 vaccine trial participants. HIITE precisely identified chronically infected individuals as being chronic with an error less than 1% and correctly classified 94% of recently infected individuals as being incident. Using a mixed-effect model, an incident specimen's time since infection was estimated from its single lineage diversity, showing 14% prediction error for time since infection. HIITE is the first algorithm to inform two key metrics from a single time point sequence sample. HIITE has the capacity for assessing not only population-level epidemic spread but also individual-level transmission events from a single survey, advancing HIV prevention and intervention programs. Web-based HIITE and source code of HIITE are available at http://www.hayounlee.org/software.html. Supplementary data are available at Bioinformatics online.
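The two ingredients described above, hierarchical clustering of within-host diversity and a diversity-to-time mapping for incident cases, can be sketched as follows. This is not the HIITE algorithm; the Hamming-distance metric, the 1% lineage cutoff, and the slope and intercept of the time model are illustrative placeholders only.

```python
# Minimal sketch (not HIITE) of calling infection stage from within-host
# envelope diversity via hierarchical clustering, plus a placeholder
# diversity-to-time mapping for incident cases. All cutoffs and coefficients
# are illustrative, not fitted values.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def pairwise_hamming(seqs):
    """Fraction of mismatching positions for every pair of equal-length reads."""
    arr = np.array([list(s) for s in seqs])
    n = len(seqs)
    d = np.zeros((n, n))
    for i in range(n):
        d[i] = (arr != arr[i]).mean(axis=1)
    return d

def classify(seqs, lineage_cutoff=0.01):
    d = pairwise_hamming(seqs)
    tree = linkage(squareform(d, checks=False), method="average")
    lineages = fcluster(tree, t=lineage_cutoff, criterion="distance")
    diversity = d[np.triu_indices(len(seqs), 1)].mean()
    stage = "incident" if lineages.max() == 1 and diversity < 0.02 else "chronic"
    days = 30 + 4000 * diversity if stage == "incident" else None  # placeholder model
    return stage, round(diversity, 4), days

toy_reads = ["ACGTACGTGT", "ACGTACGTGT", "ACGAACGTGT", "ACGTACGTCT"]  # toy env reads
print(classify(toy_reads))
```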
Federal Register 2010, 2011, 2012, 2013, 2014
2013-05-13
...; Sequence 1] RIN 3090-AI79 Federal Management Regulation (FMR); Mail Management; Financial Requirements for... Services Administration (GSA) is proposing to amend the Federal Management Regulation (FMR) by revising its mail management policy. A major part of the proposed revision involves the removal of the agency...
Multiplexed fragaria chloroplast genome sequencing
W. Njuguna; A. Liston; R. Cronn; N.V. Bassil
2010-01-01
A method to sequence multiple chloroplast genomes using ultra high throughput sequencing technologies was recently described. Complete chloroplast genome sequences can resolve phylogenetic relationships at low taxonomic levels and identify informative point mutations and indels. The objective of this research was to sequence multiple Fragaria...
Opinion: Clarifying Two Controversies about Information Mapping's Method.
ERIC Educational Resources Information Center
Horn, Robert E.
1992-01-01
Describes Information Mapping, a methodology for the analysis, organization, sequencing, and presentation of information and explains three major parts of the method: (1) content analysis, (2) project life-cycle synthesis and integration of the content analysis, and (3) sequencing and formatting. Major criticisms of the methodology are addressed.…
The path to enlightenment: making sense of genomic and proteomic information.
Maurer, Martin H
2004-05-01
Whereas genomics describes the study of genome, mainly represented by its gene expression on the DNA or RNA level, the term proteomics denotes the study of the proteome, which is the protein complement encoded by the genome. In recent years, the number of proteomic experiments increased tremendously. While all fields of proteomics have made major technological advances, the biggest step was seen in bioinformatics. Biological information management relies on sequence and structure databases and powerful software tools to translate experimental results into meaningful biological hypotheses and answers. In this resource article, I provide a collection of databases and software available on the Internet that are useful to interpret genomic and proteomic data. The article is a toolbox for researchers who have genomic or proteomic datasets and need to put their findings into a biological context.
NASA Technical Reports Server (NTRS)
Gatlin, L. L.
1974-01-01
Concepts of information theory are applied to examine various proteins in terms of their redundancy in natural originators such as animals and plants. The Monte Carlo method is used to derive information parameters for random protein sequences. Real protein sequence parameters are compared with the standard parameters of protein sequences having a specific length. The tendency of a chain to contain some amino acids more frequently than others and the tendency of a chain to contain certain amino acid pairs more frequently than other pairs are used as randomness measures of individual protein sequences. Non-periodic proteins are generally found to have random Shannon redundancies except in cases of constraints due to short chain length and genetic codes. Redundant characteristics of highly periodic proteins are discussed. A degree of periodicity parameter is derived.
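The information parameters described above can be illustrated with a short sketch: the first-order Shannon entropy of a protein's amino-acid composition, its redundancy relative to the log2(20) maximum, and a Monte Carlo baseline from random chains of the same length, which shows the finite-length effect the abstract mentions. The demo sequence is arbitrary.

```python
# Minimal sketch of the first-order Shannon entropy and redundancy of an
# amino-acid sequence, with a Monte Carlo baseline from random chains of the
# same length. The demo sequence is arbitrary.
import math
import random
from collections import Counter

AA = "ACDEFGHIKLMNPQRSTVWY"
H_MAX = math.log2(len(AA))          # log2(20): entropy of a uniform alphabet

def entropy(seq):
    counts = Counter(seq)
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def redundancy(seq):
    return 1.0 - entropy(seq) / H_MAX

def monte_carlo_redundancy(length, trials=2000, seed=0):
    rng = random.Random(seed)
    vals = [redundancy("".join(rng.choice(AA) for _ in range(length)))
            for _ in range(trials)]
    return sum(vals) / trials

seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQ"
print(f"H1 = {entropy(seq):.3f} bits/residue (max {H_MAX:.3f})")
print(f"redundancy = {redundancy(seq):.3f}")
print(f"random-chain baseline for length {len(seq)}: "
      f"{monte_carlo_redundancy(len(seq)):.3f}")
```

Note that even a random chain of finite length has a small positive redundancy from sampling effects alone, which is why short-chain constraints matter when interpreting these parameters.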
On the Origin of Protein Superfamilies and Superfolds
NASA Astrophysics Data System (ADS)
Magner, Abram; Szpankowski, Wojciech; Kihara, Daisuke
2015-02-01
Distributions of protein families and folds in genomes are highly skewed, having a small number of prevalent superfamiles/superfolds and a large number of families/folds of a small size. Why are the distributions of protein families and folds skewed? Why are there only a limited number of protein families? Here, we employ an information theoretic approach to investigate the protein sequence-structure relationship that leads to the skewed distributions. We consider that protein sequences and folds constitute an information theoretic channel and computed the most efficient distribution of sequences that code all protein folds. The identified distributions of sequences and folds are found to follow a power law, consistent with those observed for proteins in nature. Importantly, the skewed distributions of sequences and folds are suggested to have different origins: the skewed distribution of sequences is due to evolutionary pressure to achieve efficient coding of necessary folds, whereas that of folds is based on the thermodynamic stability of folds. The current study provides a new information theoretic framework for proteins that could be widely applied for understanding protein sequences, structures, functions, and interactions.
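For reference, the generic forms behind the quantities mentioned above can be written as follows; these are standard textbook expressions, not the paper's specific derivation.

```latex
% Generic forms only, not the paper's derivation.
% Skewed size distribution of protein families/folds as a power law:
P(k) \propto k^{-\gamma}, \qquad k = \text{number of members in a family or fold}.
% Sequences S and folds F treated as an information channel; an efficient
% sequence distribution conveys the folds with high mutual information:
I(S;F) = \sum_{s,\,f} p(s,f)\,\log_2 \frac{p(s,f)}{p(s)\,p(f)}.
```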
Shotgun metagenomic data streams: surfing without fear
DOE Office of Scientific and Technical Information (OSTI.GOV)
Berendzen, Joel R
2010-12-06
Timely information about bio-threat prevalence, consequence, propagation, attribution, and mitigation is needed to support decision-making, both routinely and in a crisis. One DNA sequencer can stream 25 Gbp of information per day, but sampling strategies and analysis techniques are needed to turn raw sequencing power into actionable knowledge. Shotgun metagenomics can enable biosurveillance at the level of a single city, hospital, or airplane. Metagenomics characterizes viruses and bacteria from complex environments such as soil, air filters, or sewage. Unlike targeted-primer-based sequencing, shotgun methods are not blind to sequences that are truly novel, and they can measure absolute prevalence. Shotgun metagenomic sampling can be non-invasive, efficient, and inexpensive while being informative. We have developed analysis techniques for shotgun metagenomic sequencing that rely upon phylogenetic signature patterns. They work by indexing local sequence patterns in a manner similar to web search engines. Our methods are laptop-fast, and favorable scaling properties ensure they will be sustainable as sequencing methods grow. We show examples of application to soil metagenomic samples.
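The "index local sequence patterns like a web search engine" idea can be sketched as a k-mer inverted index. The example below is not the analysis software described in the abstract; the reference sequences, k-mer size, and voting scheme are purely illustrative.

```python
# Minimal sketch of an inverted index from k-mers to reference taxa, queried
# with a shotgun read. Reference sequences and k are toy values.
from collections import Counter, defaultdict

K = 8
references = {
    "taxonA": "ATGACCGTTAAGGCTACCGGTTACCAGTACGATCCA",
    "taxonB": "ATGTCGCATCGGATTACGCGATAGCCGTTACGGATC",
}

def kmers(seq, k=K):
    return (seq[i:i + k] for i in range(len(seq) - k + 1))

# Build the inverted index: k-mer -> set of taxa containing it.
index = defaultdict(set)
for taxon, seq in references.items():
    for km in kmers(seq):
        index[km].add(taxon)

def classify_read(read):
    """Vote each k-mer of the read against the index; return taxon tallies."""
    votes = Counter()
    for km in kmers(read):
        for taxon in index.get(km, ()):
            votes[taxon] += 1
    return votes

print(classify_read("CCGTTAAGGCTACCGGTT"))   # fragment drawn from taxonA
```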
Reporting Differences Between Spacecraft Sequence Files
NASA Technical Reports Server (NTRS)
Khanampompan, Teerapat; Gladden, Roy E.; Fisher, Forest W.
2010-01-01
A suite of computer programs, called seq diff suite, reports differences between the products of other computer programs involved in the generation of sequences of commands for spacecraft. These products consist of files of several types: replacement sequence of events (RSOE), DSN keyword file [DKF (wherein DSN signifies Deep Space Network)], spacecraft activities sequence file (SASF), spacecraft sequence file (SSF), and station allocation file (SAF). These products can include line numbers, request identifications, and other pieces of information that are not relevant when generating command sequence products, though these fields can result in the appearance of many changes to the files, particularly when using the UNIX diff command to inspect file differences. The outputs of prior software tools for reporting differences between such products include differences in these non-relevant pieces of information. In contrast, seq diff suite removes the fields containing the irrelevant pieces of information before processing to extract differences, so that only relevant differences are reported. Thus, seq diff suite is especially useful for reporting changes between successive versions of the various products and, in particular, flagging differences in fields relevant to the sequence command generation and review process.
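A minimal sketch of the same approach, stripping fields that are irrelevant to command review before diffing, is shown below. This is not the seq diff suite code; the field patterns and example product lines are hypothetical.

```python
# Minimal sketch: strip fields that are irrelevant to command review (line
# numbers, request IDs, generation timestamps) before diffing two sequence
# products, so only meaningful differences are reported. The regular
# expressions stand in for whatever field layout the real products use.
import difflib
import re

IGNORED_FIELDS = [
    re.compile(r"^\s*\d+\s+"),          # leading line number
    re.compile(r"REQ_ID=\S+"),          # request identification
    re.compile(r"GENERATED=\S+"),       # generation timestamp
]

def normalize(lines):
    out = []
    for line in lines:
        for pattern in IGNORED_FIELDS:
            line = pattern.sub("", line)
        out.append(line.rstrip() + "\n")
    return out

def relevant_diff(old_lines, new_lines):
    return list(difflib.unified_diff(normalize(old_lines), normalize(new_lines),
                                     fromfile="old_product", tofile="new_product"))

old = ["0001 CMD,TURN_ON_HEATER REQ_ID=r41 GENERATED=2010-001T00:00\n",
       "0002 CMD,SET_RATE,5 REQ_ID=r42 GENERATED=2010-001T00:00\n"]
new = ["0001 CMD,TURN_ON_HEATER REQ_ID=r77 GENERATED=2010-014T12:00\n",
       "0002 CMD,SET_RATE,7 REQ_ID=r78 GENERATED=2010-014T12:00\n"]
print("".join(relevant_diff(old, new)))   # only the SET_RATE change is reported
```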
MIPS: analysis and annotation of genome information in 2007
Mewes, H. W.; Dietmann, S.; Frishman, D.; Gregory, R.; Mannhaupt, G.; Mayer, K. F. X.; Münsterkötter, M.; Ruepp, A.; Spannagl, M.; Stümpflen, V.; Rattei, T.
2008-01-01
The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) combines automatic processing of large amounts of sequences with manual annotation of selected model genomes. Due to the massive growth of the available data, the depth of annotation varies widely between independent databases. Also, the criteria for the transfer of information from known to orthologous sequences are diverse. Coping with the task of global in-depth genome annotation has become unfeasible. Therefore, our efforts are dedicated to three levels of annotation: (i) the curation of selected genomes, in particular from fungal and plant taxa (e.g. CYGD, MNCDB, MatDB), (ii) the comprehensive, consistent, automatic annotation employing exhaustive methods for the computation of sequence similarities and sequence-related attributes as well as the classification of individual sequences (SIMAP, PEDANT and FunCat) and (iii) the compilation of manually curated databases for protein interactions based on scrutinized information from the literature to serve as an accepted set of reliable annotated interaction data (MPACT, MPPI, CORUM). All databases and tools described as well as the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de). PMID:18158298
E-MSD: an integrated data resource for bioinformatics.
Velankar, S; McNeil, P; Mittard-Runte, V; Suarez, A; Barrell, D; Apweiler, R; Henrick, K
2005-01-01
The Macromolecular Structure Database (MSD) group (http://www.ebi.ac.uk/msd/) continues to enhance the quality and consistency of macromolecular structure data in the worldwide Protein Data Bank (wwPDB) and to work towards the integration of various bioinformatics data resources. One of the major obstacles to the improved integration of structural databases such as MSD and sequence databases like UniProt is the absence of up to date and well-maintained mapping between corresponding entries. We have worked closely with the UniProt group at the EBI to clean up the taxonomy and sequence cross-reference information in the MSD and UniProt databases. This information is vital for the reliable integration of the sequence family databases such as Pfam and Interpro with the structure-oriented databases of SCOP and CATH. This information has been made available to the eFamily group (http://www.efamily.org.uk/) and now forms the basis of the regular interchange of information between the member databases (MSD, UniProt, Pfam, Interpro, SCOP and CATH). This exchange of annotation information has enriched the structural information in the MSD database with annotation from wider sequence-oriented resources. This work was carried out under the 'Structure Integration with Function, Taxonomy and Sequences (SIFTS)' initiative (http://www.ebi.ac.uk/msd-srv/docs/sifts) in the MSD group.
Inferring Higher Functional Information for RIKEN Mouse Full-Length cDNA Clones With FACTS
Nagashima, Takeshi; Silva, Diego G.; Petrovsky, Nikolai; Socha, Luis A.; Suzuki, Harukazu; Saito, Rintaro; Kasukawa, Takeya; Kurochkin, Igor V.; Konagaya, Akihiko; Schönbach, Christian
2003-01-01
FACTS (Functional Association/Annotation of cDNA Clones from Text/Sequence Sources) is a semiautomated knowledge discovery and annotation system that integrates molecular function information derived from sequence analysis results (sequence inferred) with functional information extracted from text. Text-inferred information was extracted from keyword-based retrievals of MEDLINE abstracts and by matching of gene or protein names to OMIM, BIND, and DIP database entries. Using FACTS, we found that 47.5% of the 60,770 RIKEN mouse cDNA FANTOM2 clone annotations were informative for text searches. MEDLINE queries yielded molecular interaction-containing sentences for 23.1% of the clones. When disease MeSH and GO terms were matched with retrieved abstracts, 22.7% of clones were associated with potential diseases, and 32.5% with GO identifiers. A significant number (23.5%) of disease MeSH-associated clones were also found to have a hereditary disease association (OMIM Morbidmap). Inferred neoplastic and nervous system disease represented 49.6% and 36.0% of disease MeSH-associated clones, respectively. A comparison of sequence-based GO assignments with informative text-based GO assignments revealed that for 78.2% of clones, identical GO assignments were provided for that clone by either method, whereas for 21.8% of clones, the assignments differed. In contrast, for OMIM assignments, only 28.5% of clones had identical sequence-based and text-based OMIM assignments. Sequence, sentence, and term-based functional associations are included in the FACTS database (http://facts.gsc.riken.go.jp/), which permits results to be annotated and explored through web-accessible keyword and sequence search interfaces. The FACTS database will be a critical tool for investigating the functional complexity of the mouse transcriptome, cDNA-inferred interactome (molecular interactions), and pathome (pathologies). PMID:12819151
Larson, Wesley A; Seeb, Lisa W; Everett, Meredith V; Waples, Ryan K; Templin, William D; Seeb, James E
2014-01-01
Recent advances in population genomics have made it possible to detect previously unidentified structure, obtain more accurate estimates of demographic parameters, and explore adaptive divergence, potentially revolutionizing the way genetic data are used to manage wild populations. Here, we identified 10 944 single-nucleotide polymorphisms using restriction-site-associated DNA (RAD) sequencing to explore population structure, demography, and adaptive divergence in five populations of Chinook salmon (Oncorhynchus tshawytscha) from western Alaska. Patterns of population structure were similar to those of past studies, but our ability to assign individuals back to their region of origin was greatly improved (>90% accuracy for all populations). We also calculated effective size with and without removing physically linked loci identified from a linkage map, a novel method for nonmodel organisms. Estimates of effective size were generally above 1000 and were biased downward when physically linked loci were not removed. Outlier tests based on genetic differentiation identified 733 loci and three genomic regions under putative selection. These markers and genomic regions are excellent candidates for future research and can be used to create high-resolution panels for genetic monitoring and population assignment. This work demonstrates the utility of genomic data to inform conservation in highly exploited species with shallow population structure. PMID:24665338
Schlosberg, Arran
2016-01-01
The advent of next-generation sequencing (NGS) brings with it a need to manage large volumes of patient data in a manner that is compliant with both privacy laws and long-term archival needs. Outside of the realm of genomics there is a need in the broader medical community to store data, and although, radiology aside, the volume may be less than that of NGS, the concepts discussed herein are similarly relevant. The relation of so-called “privacy principles” to data protection and cryptographic techniques is explored with regard to the archival and backup storage of health data in Australia, and an example implementation of secure management of genomic archives is proposed with regard to this relation. Readers are presented with sufficient detail to have informed discussions – when implementing laboratory data protocols – with experts in the fields. PMID:26955504
Clinical management of scar tissue.
Kasch, M C
1988-01-01
This paper will review the physiology of scar formation including the properties of wound healing and scar remodeling. A clinical scar management program that includes evaluation of scar adhesions and use of a variety of therapy interventions to minimize the formation of scar will be described. Use of compression, massage, splints and functional activities is included in this program. The information is applicable for the general occupational therapist who sees patients with hand dysfunction as well as a therapist specializing in hand rehabilitation. Every therapist who treats hand trauma must be familiar with the sequence and the properties of scar formation in order to reestablish tendon gliding and facilitate early remodeling of scar tissue. Many treatment techniques can be directed toward scar adhesions and no one method is totally effective when used alone; used together, these techniques can positively influence scar formation and restore maximal hand function.
A Case for Pharmacogenomics in Management of Cardiac Arrhythmias
Kandoi, Gaurav; Nanda, Anjali; Scaria, Vinod; Sivasubbu, Sridhar
2012-01-01
Disorders of the cardiac rhythm are quite prevalent in clinical practice. Though the variability in drug response between individuals has been extensively studied, this information has not been widely used in clinical practice. Rapid advances in the field of pharmacogenomics have provided us with crucial insights on inter-individual genetic variability and its impact on drug metabolism and action. Technologies for faster and cheaper genetic testing and even personal genome sequencing would enable clinicians to optimize prescription based on the genetic makeup of the individual, which would open up new avenues in the area of personalized medicine. We have systematically looked at literature evidence on pharmacogenomics markers for anti-arrhythmic agents from the OpenPGx consortium collection and reason the applicability of genetics in the management of arrhythmia. We also discuss potential issues that need to be resolved before personalized pharmacogenomics becomes a reality in regular clinical practice. PMID:22557843
Hu, Zhi-Liang; Ramos, Antonio M.; Humphray, Sean J.; Rogers, Jane; Reecy, James M.; Rothschild, Max F.
2011-01-01
The newly available pig genome sequence has provided new information to fine map quantitative trait loci (QTL) in order to eventually identify causal variants. With targeted genomic sequencing efforts, we were able to obtain high quality BAC sequences that cover a region on pig chromosome 17 where a number of meat quality QTL have been previously discovered. Sequences from 70 BAC clones were assembled to form an 8-Mbp contig. Subsequently, we successfully mapped five previously identified QTL, three for meat color and two for lactate related traits, to the contig. With an additional 25 genetic markers that were identified by sequence comparison, we were able to carry out further linkage disequilibrium analysis to narrow down the genomic locations of these QTL, which allowed identification of the chromosomal regions that likely contain the causative variants. This research has provided one practical approach to combine genetic and molecular information for QTL mining. PMID:22303339
Human genomics projects and precision medicine.
Carrasco-Ramiro, F; Peiró-Pastor, R; Aguado, B
2017-09-01
The completion of the Human Genome Project (HGP) in 2001 opened the floodgates to a deeper understanding of medicine. There are dozens of HGP-like projects currently in progress, involving from a few tens to several million genomes and ranging from specialized goals to more general approaches. However, data generation, storage, management and analysis in public and private cloud computing platforms have raised concerns about privacy and security. The knowledge gained from further research has changed the field of genomics and is now slowly permeating into clinical medicine. The new precision (personalized) medicine, where genome sequencing and data analysis are essential components, allows tailored diagnosis and treatment according to the information from the patient's own genome and specific environmental factors. P4 (predictive, preventive, personalized and participatory) medicine is introducing new concepts, challenges and opportunities. This review summarizes current sequencing technologies, concentrates on ongoing human genomics projects, and provides some examples in which precision medicine has already demonstrated clinical impact in diagnosis and/or treatment.
Stark, Zornitza; Tan, Tiong Y; Chong, Belinda; Brett, Gemma R; Yap, Patrick; Walsh, Maie; Yeung, Alison; Peters, Heidi; Mordaunt, Dylan; Cowie, Shannon; Amor, David J; Savarirayan, Ravi; McGillivray, George; Downie, Lilian; Ekert, Paul G; Theda, Christiane; James, Paul A; Yaplito-Lee, Joy; Ryan, Monique M; Leventer, Richard J; Creed, Emma; Macciocca, Ivan; Bell, Katrina M; Oshlack, Alicia; Sadedin, Simon; Georgeson, Peter; Anderson, Charlotte; Thorne, Natalie; Melbourne Genomics Health Alliance; Gaff, Clara; White, Susan M
2016-11-01
To prospectively evaluate the diagnostic and clinical utility of singleton whole-exome sequencing (WES) as a first-tier test in infants with suspected monogenic disease. Singleton WES was performed as a first-tier sequencing test in infants recruited from a single pediatric tertiary center. This occurred in parallel with standard investigations, including single- or multigene panel sequencing when clinically indicated. The diagnosis rate, clinical utility, and impact on management of singleton WES were evaluated. Of 80 enrolled infants, 46 received a molecular genetic diagnosis through singleton WES (57.5%) compared with 11 (13.75%) who underwent standard investigations in the same patient group. Clinical management changed following exome diagnosis in 15 of 46 diagnosed participants (32.6%). Twelve relatives received a genetic diagnosis following cascade testing, and 28 couples were identified as being at high risk of recurrence in future pregnancies. This prospective study provides strong evidence for increased diagnostic and clinical utility of singleton WES as a first-tier sequencing test for infants with a suspected monogenic disorder. Singleton WES outperformed standard care in terms of diagnosis rate and the benefits of a diagnosis, namely, impact on management of the child and clarification of reproductive risks for the extended family in a timely manner. Genet Med 18(11), 1090-1096.
Antenatal management of twin-twin transfusion syndrome and twin anemia-polycythemia sequence.
Slaghekke, Femke; Zhao, Depeng P; Middeldorp, Johanna M; Klumper, Frans J; Haak, Monique C; Oepkes, Dick; Lopriore, Enrico
2016-08-01
Twin-twin transfusion syndrome (TTTS) and twin anemia polycythemia sequence (TAPS) are severe complications in monochorionic twin pregnancies associated with high mortality and morbidity risk if left untreated. Both diseases result from imbalanced inter-twin blood transfusion through placental vascular anastomoses. This review focuses on the differences in antenatal management between TTTS and TAPS. Expert commentary: The optimal management for TTTS is fetoscopic laser coagulation of the vascular anastomoses, preferably using the Solomon technique in which the whole vascular equator is coagulated. The Solomon technique is associated with a reduction of residual anastomosis and a reduction in post-operative complications. The optimal management for TAPS is not clear and includes expectant management, intra-uterine transfusion with or without partial exchange transfusion and fetoscopic laser surgery.
The Protein Information Resource: an integrated public resource of functional annotation of proteins
Wu, Cathy H.; Huang, Hongzhan; Arminski, Leslie; Castro-Alvear, Jorge; Chen, Yongxing; Hu, Zhang-Zhi; Ledley, Robert S.; Lewis, Kali C.; Mewes, Hans-Werner; Orcutt, Bruce C.; Suzek, Baris E.; Tsugita, Akira; Vinayaka, C. R.; Yeh, Lai-Su L.; Zhang, Jian; Barker, Winona C.
2002-01-01
The Protein Information Resource (PIR) serves as an integrated public resource of functional annotation of protein data to support genomic/proteomic research and scientific discovery. The PIR, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the PIR-International Protein Sequence Database (PSD), the major annotated protein sequence database in the public domain, containing about 250 000 proteins. To improve protein annotation and the coverage of experimentally validated data, a bibliography submission system is developed for scientists to submit, categorize and retrieve literature information. Comprehensive protein information is available from iProClass, which includes family classification at the superfamily, domain and motif levels, structural and functional features of proteins, as well as cross-references to over 40 biological databases. To provide timely and comprehensive protein data with source attribution, we have introduced a non-redundant reference protein database, PIR-NREF. The database consists of about 800 000 proteins collected from PIR-PSD, SWISS-PROT, TrEMBL, GenPept, RefSeq and PDB, with composite protein names and literature data. To promote database interoperability, we provide XML data distribution and open database schema, and adopt common ontologies. The PIR web site (http://pir.georgetown.edu/) features data mining and sequence analysis tools for information retrieval and functional identification of proteins based on both sequence and annotation information. The PIR databases and other files are also available by FTP (ftp://nbrfa.georgetown.edu/pir_databases). PMID:11752247
Rebeca, Carballar-Lejarazú; Zhu, Xiaoli; Guo, Yajie; Lin, Qiannan; Hu, Xia; Wang, Rong; Liang, Guanghong; Guan, Xiong
2017-01-01
The pine aphid Cinara pinitabulaeformis Zhang et Zhang is the main pine pest in China; it causes pine needles to produce dense dew (honeydew), which can lead to sooty mold (black filamentous saprophytic ascomycetes). Although common chemical and physical strategies are used to prevent the disease caused by C. pinitabulaeformis Zhang et Zhang, new strategies based on biological and/or genetic approaches are promising for controlling and eradicating the disease. However, there is no genomic, proteomic or transcriptomic information to allow the design of new control strategies for this pine aphid. We used next-generation sequencing technology to sequence the transcriptome of C. pinitabulaeformis Zhang et Zhang and built a transcriptome database. We identified 80,259 unigenes assigned to Gene Ontology (GO) terms, and information for a total of 11,609 classified unigenes was obtained in the Clusters of Orthologous Groups (COGs). A total of 10,806 annotated unigenes were analyzed to identify the represented biological pathways; among them, 8,845 unigenes matched 228 KEGG pathways. In addition, our data describe propagative viruses, nutrition-related genes, detoxification-related molecules, olfactory-related receptors, stress-related proteins, putative insecticide-resistance genes and possible insecticide targets. Moreover, this study provides valuable information about putative insecticide-resistance-related genes and for the design of new genetically/biologically based strategies to manage and control C. pinitabulaeformis Zhang et Zhang populations. PMID:28570707
Dash, Sudhansu; Campbell, Jacqueline D; Cannon, Ethalinda K S; Cleary, Alan M; Huang, Wei; Kalberer, Scott R; Karingula, Vijay; Rice, Alex G; Singh, Jugpreet; Umale, Pooja E; Weeks, Nathan T; Wilkey, Andrew P; Farmer, Andrew D; Cannon, Steven B
2016-01-04
Legume Information System (LIS), at http://legumeinfo.org, is a genomic data portal (GDP) for the legume family. LIS provides access to genetic and genomic information for major crop and model legumes. With more than two-dozen domesticated legume species, there are numerous specialists working on particular species, and also numerous GDPs for these species. LIS has been redesigned in the last three years both to better integrate data sets across the crop and model legumes, and to better accommodate specialized GDPs that serve particular legume species. To integrate data sets, LIS provides genome and map viewers, holds synteny mappings among all sequenced legume species and provides a set of gene families to allow traversal among orthologous and paralogous sequences across the legumes. To better accommodate other specialized GDPs, LIS uses open-source GMOD components where possible, and advocates use of common data templates, formats, schemas and interfaces so that data collected by one legume research community are accessible across all legume GDPs, through similar interfaces and using common APIs. This federated model for the legumes is managed as part of the 'Legume Federation' project (accessible via http://legumefederation.org), which can be thought of as an umbrella project encompassing LIS and other legume GDPs. Published by Oxford University Press on behalf of Nucleic Acids Research 2015. This work is written by (a) US Government employee(s) and is in the public domain in the US.
Salinity monitoring in Western Australia using remotely sensed and other spatial data.
Furby, Suzanne; Caccetta, Peter; Wallace, Jeremy
2010-01-01
The southwest of Western Australia is affected by dryland salinity that results in the loss of previously productive agricultural land, damage to buildings, roads, and other infrastructure, decline in pockets of remnant vegetation and biodiversity, and reduction in water quality. Accurate information on the location and rate of change of the extent of saline land over the region is required by resource managers. For the first time, comprehensive, spatially explicit maps of dryland salinity and its change over approximately 10 yr for the southwest agricultural region of Western Australia have been produced operationally in the 'Land Monitor' project. The methods rely on an integrated analysis of long-term sequences of Landsat TM satellite image data together with variables derived from digital elevation models (DEMs). Understanding of the physical process and surface expression of salinity provided by experts was used to guide the analyses. Ground data (the delineation of salt-affected land by field experts) were collected for training and validation. The results indicate that the land area currently affected by salinity in Western Australia's southwest is about 1 million hectares (in 1996) and the annual rate of increase is about 14,000 ha. This is a lesser extent than many previous estimates and a lower rate of change than generally predicted from limited hydrological data. The results are widely distributed and publicly available. The key to providing accurate mapping and monitoring information was the incorporation of time-series classification of a sequence of images over several years, combined with landform information.
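To make the "integrated analysis" concrete, here is a deliberately simplified, hypothetical sketch of a rule-based combination of a multi-year spectral-index stack with a DEM-derived landform mask. The index semantics, thresholds, and persistence rule are invented for illustration and are not the Land Monitor algorithm.

```python
# Hypothetical rule-based combination of a multi-year spectral index stack and a
# DEM-derived landform layer to flag likely salt-affected pixels. Thresholds and
# layer semantics are invented for illustration; not the Land Monitor method.
import numpy as np

rng = np.random.default_rng(0)
years, rows, cols = 5, 100, 100
index_stack = rng.uniform(0.0, 1.0, size=(years, rows, cols))   # e.g. a per-year vegetation/salinity index
valley_bottom = rng.uniform(0.0, 1.0, size=(rows, cols)) > 0.7  # DEM-derived low-landscape-position mask

persistently_bare = (index_stack < 0.2).sum(axis=0) >= 4        # low index in at least 4 of 5 years
likely_saline = persistently_bare & valley_bottom               # expert rule: persistence + landform

print("flagged pixels:", int(likely_saline.sum()))
```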
FALORNI, A.; MINARELLI, V.; EADS, C. M.; JOACHIM, C. M.; PERSANI, L.; ROSSETTI, R.; BEIM, P. YURTTAS; PELLEGRINI, V. A.; SCHNATZ, P. F.; RAFIQUE, S.; KISSELL, K.; CALIS, K. A.; POPAT, V.; NELSON, L. M.
2015-01-01
Large-scale medical sequencing provides a focal point around which to reorganize health care and health care research. Mobile health (mHealth) is also currently undergoing explosive growth and could be another innovation that will change the face of future health care. We are employing primary ovarian insufficiency (POI) as a model rare condition to explore the intersection of these potentials. As both sequencing capabilities and our ability to interpret this information improve, sequencing for medical purposes will play an increasing role in health care beyond basic research: it will help guide the delivery of care to patients. POI is a serious chronic disorder and syndrome characterized by hypergonadotrophic hypogonadism before the age of 40 years and most commonly presents with amenorrhea. It may have adverse health effects that become fully evident years after the initial diagnosis. The condition is most commonly viewed as one of infertility; however, it may also be associated with adverse long-term outcomes related to inadequate bone mineral density, increased risk of cardiovascular disease, adrenal insufficiency, hypothyroidism and, if pregnancy ensues, having a child with Fragile X Syndrome. There may also be adverse outcomes related to increased rates of anxiety and depression. POI is also a rare disease, and accordingly, presents special challenges. Too often advances in research are not effectively integrated into community care at the point of service for those with rare diseases. There is a need to connect community health providers in real time with investigators who have the requisite knowledge and expertise to help manage the rare disease and to conduct ongoing research. Here we review the pathophysiology and management of POI and propose the development of an international Clinical Research Integration Special Program (CRISP) for the condition. PMID:25288327
A time-and-motion approach to micro-costing of high-throughput genomic assays
Costa, S.; Regier, D.A.; Meissner, B.; Cromwell, I.; Ben-Neriah, S.; Chavez, E.; Hung, S.; Steidl, C.; Scott, D.W.; Marra, M.A.; Peacock, S.J.; Connors, J.M.
2016-01-01
Background Genomic technologies are increasingly used to guide clinical decision-making in cancer control. Economic evidence about the cost-effectiveness of genomic technologies is limited, in part because of a lack of published comprehensive cost estimates. In the present micro-costing study, we used a time-and-motion approach to derive cost estimates for 3 genomic assays and processes—digital gene expression profiling (gep), fluorescence in situ hybridization (fish), and targeted capture sequencing, including bioinformatics analysis—in the context of lymphoma patient management. Methods The setting for the study was the Department of Lymphoid Cancer Research laboratory at the BC Cancer Agency in Vancouver, British Columbia. Mean per-case hands-on time and resource measurements were determined from a series of direct observations of each assay. Per-case cost estimates were calculated using a bottom-up costing approach, with labour, capital and equipment, supplies and reagents, and overhead costs included. Results The most labour-intensive assay was found to be fish at 258.2 minutes per case, followed by targeted capture sequencing (124.1 minutes per case) and digital gep (14.9 minutes per case). Based on a historical case throughput of 180 cases annually, the mean per-case cost (2014 Canadian dollars) was estimated to be $1,029.16 for targeted capture sequencing and bioinformatics analysis, $596.60 for fish, and $898.35 for digital gep with an 807-gene code set. Conclusions With the growing emphasis on personalized approaches to cancer management, the need for economic evaluations of high-throughput genomic assays is increasing. Through economic modelling and budget-impact analyses, the cost estimates presented here can be used to inform priority-setting decisions about the implementation of such assays in clinical practice. PMID:27803594
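A minimal sketch of the bottom-up per-case costing logic described above (labour plus capital/equipment plus supplies/reagents plus overhead) is given below. The wage, amortization, reagent, and overhead figures are placeholders rather than the study's measured inputs; only the 124.1-minute hands-on time is taken from the abstract.

```python
# Bottom-up per-case cost, as in a time-and-motion micro-costing study.
# All input values below are placeholders for illustration, not the study's data.
def per_case_cost(hands_on_minutes, wage_per_hour, equipment_cost_per_case,
                  supplies_per_case, overhead_rate=0.20):
    labour = hands_on_minutes / 60.0 * wage_per_hour
    direct = labour + equipment_cost_per_case + supplies_per_case
    return direct * (1.0 + overhead_rate)            # overhead applied to direct costs

# Hypothetical assay: 124.1 min hands-on time, $35/h technician wage,
# $150 equipment amortization per case, $500 reagents per case, 20% overhead.
print(round(per_case_cost(124.1, 35.0, 150.0, 500.0), 2))
```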
Klymus, Katy E; Marshall, Nathaniel T; Stepien, Carol A
2017-01-01
Describing and monitoring biodiversity comprise integral parts of ecosystem management. Recent research coupling metabarcoding and environmental DNA (eDNA) demonstrates that these methods can serve as important tools for surveying biodiversity, while significantly decreasing the time, expense and resources spent on traditional survey methods. The literature emphasizes the importance of genetic marker development, as the markers dictate the applicability, sensitivity and resolution ability of an eDNA assay. The present study developed two metabarcoding eDNA assays using the mtDNA 16S rRNA gene with the Illumina MiSeq platform to detect invertebrate fauna in the Laurentian Great Lakes and surrounding waterways, with a focus on monitoring invasive bivalve and gastropod species. We employed careful primer design and in vitro testing with mock communities to assess the ability of the markers to amplify and sequence targeted species DNA, while retaining rank abundance information. In our mock communities, read abundances reflected the initial input abundance, with regressions having significant slopes (p<0.05) and high coefficients of determination (R2) for all comparisons. Tests on field environmental samples revealed a similar ability of our markers to measure relative abundance. Due to the limited reference sequence data available for these invertebrate species, care must be taken when analyzing results and identifying sequence reads to species level. These markers extend eDNA metabarcoding research for molluscs and appear relevant to other invertebrate taxa, such as rotifers and bryozoans. Furthermore, the sphaeriid mussel assay is group-specific, exclusively amplifying bivalves in the family Sphaeriidae and providing species-level identification. Our assays provide useful tools for managers and conservation scientists, facilitating early detection of invasive species as well as improving resolution of mollusc diversity.
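The mock-community check described above amounts to regressing observed read abundance on known input abundance and inspecting the slope's p-value and R2. A minimal sketch with made-up proportions (not the study's data) follows.

```python
# Minimal sketch of the mock-community check: regress read abundance on input
# abundance and report slope p-value and R^2. Numbers are made up for illustration.
from scipy import stats

input_proportion = [0.05, 0.10, 0.15, 0.20, 0.25, 0.25]   # known template proportions
read_proportion  = [0.04, 0.12, 0.13, 0.22, 0.23, 0.26]   # observed proportions of reads

fit = stats.linregress(input_proportion, read_proportion)
print(f"slope={fit.slope:.2f}, p={fit.pvalue:.3g}, R2={fit.rvalue**2:.2f}")
```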
Accessing and distributing EMBL data using CORBA (common object request broker architecture)
Wang, Lichun; Rodriguez-Tomé, Patricia; Redaschi, Nicole; McNeil, Phil; Robinson, Alan; Lijnzaad, Philip
2000-01-01
Background: The EMBL Nucleotide Sequence Database is a comprehensive database of DNA and RNA sequences and related information traditionally made available in flat-file format. Queries through tools such as SRS (Sequence Retrieval System) also return data in flat-file format. Flat files have a number of shortcomings, however, and the resources therefore currently lack a flexible environment to meet individual researchers' needs. The Object Management Group's common object request broker architecture (CORBA) is an industry standard that provides platform-independent programming interfaces and models for portable distributed object-oriented computing applications. Its independence from programming languages, computing platforms and network protocols makes it attractive for developing new applications for querying and distributing biological data. Results: A CORBA infrastructure developed by EMBL-EBI provides an efficient means of accessing and distributing EMBL data. The EMBL object model is defined such that it provides a basis for specifying interfaces in interface definition language (IDL) and thus for developing the CORBA servers. The mapping from the object model to the relational schema in the underlying Oracle database uses the facilities provided by PersistenceTM, an object/relational tool. The techniques of developing loaders and 'live object caching' with persistent objects achieve a smart live object cache where objects are created on demand. The objects are managed by an evictor pattern mechanism. Conclusions: The CORBA interfaces to the EMBL database address some of the problems of traditional flat-file formats and provide an efficient means for accessing and distributing EMBL data. CORBA also provides a flexible environment for users to develop their applications by building clients to our CORBA servers, which can be integrated into existing systems. PMID:11178259
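For readers unfamiliar with the "live object cache" and evictor pattern mentioned in the Results, the following is a generic sketch of the idea, written in Python for consistency with the other examples in this document even though the actual servers are CORBA components; the class and loader names are hypothetical.

```python
# Generic sketch of a create-on-demand object cache with LRU eviction, in the
# spirit of the "live object cache"/evictor pattern described above. Class and
# method names are hypothetical; the real servers are CORBA components.
from collections import OrderedDict

class EvictorCache:
    def __init__(self, loader, max_size=1000):
        self._loader = loader                 # callable: id -> object (e.g. database fetch)
        self._max_size = max_size
        self._cache = OrderedDict()           # id -> live object, kept in LRU order

    def get(self, object_id):
        if object_id in self._cache:
            self._cache.move_to_end(object_id)       # mark as recently used
            return self._cache[object_id]
        obj = self._loader(object_id)                # create the object on demand
        self._cache[object_id] = obj
        if len(self._cache) > self._max_size:
            self._cache.popitem(last=False)          # evict the least recently used entry
        return obj

# Usage (hypothetical loader): cache = EvictorCache(loader=lambda acc: load_entry(acc), max_size=500)
```

The point of the evictor is that memory stays bounded while frequently requested entries remain live, which is what allows objects to be "created on demand" without reloading everything from the relational store.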
Cardiac magnetic resonance imaging in heart failure: where the alphabet begins!
Aljizeeri, Ahmed; Sulaiman, Abdulbaset; Alhulaimi, Naji; Alsaileek, Ahmed; Al-Mallah, Mouaz H
2017-07-01
Cardiac magnetic resonance (CMR) imaging has become a cornerstone in the evaluation of heart failure. It provides a comprehensive evaluation by answering all the pertinent clinical questions across the full pathological spectrum of heart failure. Nowadays, CMR is considered the gold standard for the evaluation of ventricular volumes, wall motion and systolic function. Through its unique ability to characterize tissue, it provides incremental diagnostic and prognostic information and thus has emerged as a comprehensive imaging modality in heart failure. This review outlines the role of the main conventional CMR sequences in the evaluation of heart failure and their impact on management and prognosis.
Connecting Earth observation to high-throughput biodiversity data.
Bush, Alex; Sollmann, Rahel; Wilting, Andreas; Bohmann, Kristine; Cole, Beth; Balzter, Heiko; Martius, Christopher; Zlinszky, András; Calvignac-Spencer, Sébastien; Cobbold, Christina A; Dawson, Terence P; Emerson, Brent C; Ferrier, Simon; Gilbert, M Thomas P; Herold, Martin; Jones, Laurence; Leendertz, Fabian H; Matthews, Louise; Millington, James D A; Olson, John R; Ovaskainen, Otso; Raffaelli, Dave; Reeve, Richard; Rödel, Mark-Oliver; Rodgers, Torrey W; Snape, Stewart; Visseren-Hamakers, Ingrid; Vogler, Alfried P; White, Piran C L; Wooster, Martin J; Yu, Douglas W
2017-06-22
Understandably, given the fast pace of biodiversity loss, there is much interest in using Earth observation technology to track biodiversity, ecosystem functions and ecosystem services. However, because most biodiversity is invisible to Earth observation, indicators based on Earth observation could be misleading and reduce the effectiveness of nature conservation and even unintentionally decrease conservation effort. We describe an approach that combines automated recording devices, high-throughput DNA sequencing and modern ecological modelling to extract much more of the information available in Earth observation data. This approach is achievable now, offering efficient and near-real-time monitoring of management impacts on biodiversity and its functions and services.
Genetic analysis of captive proboscis monkeys.
Ogata, Mitsuaki; Seino, Satoru
2015-01-01
Information on the genetic relationships of captive founders is important for captive population management. In this study, we investigated DNA polymorphisms of four microsatellite loci and the mitochondrial control region sequence of five proboscis monkeys residing in a Japanese zoo as captive founders, to clarify their genetic relationship. We found that two of the five monkeys appeared to be genetically related. Furthermore, the haplotypes of the mitochondrial control region of the five monkeys were well differentiated from the haplotypes previously reported from wild populations from the northern area of Borneo, indicating a greater amount of genetic diversity in proboscis monkeys than previously reported. © 2014 Wiley Periodicals, Inc.
MGIS: managing banana (Musa spp.) genetic resources information and high-throughput genotyping data
Guignon, V.; Sempere, G.; Sardos, J.; Hueber, Y.; Duvergey, H.; Andrieu, A.; Chase, R.; Jenny, C.; Hazekamp, T.; Irish, B.; Jelali, K.; Adeka, J.; Ayala-Silva, T.; Chao, C.P.; Daniells, J.; Dowiya, B.; Effa effa, B.; Gueco, L.; Herradura, L.; Ibobondji, L.; Kempenaers, E.; Kilangi, J.; Muhangi, S.; Ngo Xuan, P.; Paofa, J.; Pavis, C.; Thiemele, D.; Tossou, C.; Sandoval, J.; Sutanto, A.; Vangu Paka, G.; Yi, G.; Van den houwe, I.; Roux, N.
2017-01-01
Unraveling the genetic diversity held in genebanks on a large scale is now underway, thanks to advances in next-generation sequencing (NGS)-based technologies that produce high-density genetic markers for large numbers of samples at low cost. Genebank users should be in a position to identify and select germplasm from the global genepool based on a combination of passport, genotypic and phenotypic data. To facilitate this, a new generation of information systems is being designed to efficiently handle data and link it with other external resources such as genome or breeding databases. The Musa Germplasm Information System (MGIS), the database for global ex situ-held banana genetic resources, has been developed to address those needs in a user-friendly way. In developing MGIS, we selected a generic database schema (Chado), the robust content management system Drupal for the user interface, and Tripal, a set of Drupal modules which links the Chado schema to Drupal. MGIS allows germplasm collection examination, accession browsing, advanced search functions, and germplasm orders. Additionally, we developed unique graphical interfaces to compare accessions and to explore them based on their taxonomic information. Accession-based data have been enriched with publications, genotyping studies and associated genotyping datasets reporting on germplasm use. Finally, an interoperability layer has been implemented to facilitate the link with complementary databases like the Banana Genome Hub and the MusaBase breeding database. Database URL: https://www.crop-diversity.org/mgis/ PMID:29220435
An information maximization model of eye movements
NASA Technical Reports Server (NTRS)
Renninger, Laura Walker; Coughlan, James; Verghese, Preeti; Malik, Jitendra
2005-01-01
We propose a sequential information maximization model as a general strategy for programming eye movements. The model reconstructs high-resolution visual information from a sequence of fixations, taking into account the fall-off in resolution from the fovea to the periphery. From this framework we get a simple rule for predicting fixation sequences: after each fixation, fixate next at the location that minimizes uncertainty (maximizes information) about the stimulus. By comparing our model performance to human eye movement data and to predictions from a saliency and random model, we demonstrate that our model is best at predicting fixation locations. Modeling additional biological constraints will improve the prediction of fixation sequences. Our results suggest that information maximization is a useful principle for programming eye movements.
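A toy sketch of the greedy rule ("fixate next where uncertainty reduction is largest") on a one-dimensional uncertainty map is shown below. The Gaussian acuity fall-off, map size, and constants are illustrative assumptions, not the authors' implementation.

```python
# Toy greedy "fixate where expected uncertainty reduction is largest" rule on a
# 1-D uncertainty map. The Gaussian acuity fall-off and all constants are
# illustrative assumptions, not the authors' model parameters.
import numpy as np

positions = np.arange(100)
uncertainty = np.ones(100)                       # start maximally uncertain everywhere

def acuity(center, sigma=5.0):
    """Fraction of uncertainty resolved at each position by fixating `center`."""
    return np.exp(-0.5 * ((positions - center) / sigma) ** 2)

fixations = []
for _ in range(5):
    # expected information gain of fixating each candidate location
    gains = [np.sum(uncertainty * acuity(c)) for c in positions]
    best = int(np.argmax(gains))
    fixations.append(best)
    uncertainty *= (1.0 - acuity(best))          # resolve uncertainty around the fixation
print(fixations)
```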
Brenner, Bluma G.; Ibanescu, Ruxandra-Ilinca; Hardy, Isabelle; Roger, Michel
2017-01-01
HIV continues to spread among vulnerable heterosexual (HET), men who have sex with men (MSM) and intravenous drug user (IDU) populations, influenced by a complex array of biological, behavioral and societal factors. Phylogenetic analyses of large sequence datasets from national drug resistance testing programs reveal the evolutionary interrelationships of viral strains implicated in the dynamic spread of HIV in different regional settings. Viral phylogenetics can be combined with demographic and behavioral information to gain insights into epidemiological processes shaping transmission networks at the population level. Drug resistance testing programs also reveal emergent mutational pathways leading to resistance to the 23 antiretroviral drugs used in HIV-1 management in low-, middle- and high-income settings. This article describes how genotypic and phylogenetic information from Quebec and elsewhere provides critical information on HIV transmission and resistance. Cumulative findings can be used to optimize public health strategies to tackle the challenges of HIV in “real-world” settings. PMID:29283390
Avatar DNA Nanohybrid System in Chip-on-a-Phone
NASA Astrophysics Data System (ADS)
Park, Dae-Hwan; Han, Chang Jo; Shul, Yong-Gun; Choy, Jin-Ho
2014-05-01
Long admired for their informational role and recognition function in multidisciplinary science, DNA nanohybrids have been emerging as ideal materials for molecular nanotechnology and genetic information coding. Here, we designed an optical machine-readable DNA icon on a microarray, Avatar DNA, for automatic identification and data capture such as Quick Response and ColorZip codes. The Avatar icon is made of telepathic DNA-DNA hybrids inscribed on chips, which can be identified by the camera of a smartphone with application software. Information encoded in base sequences can be accessed by connecting an off-line icon to an on-line web-server network to provide a message, index, or URL from a database library. Avatar DNA is thus converged with nano-bio-info-cogno science: each building block stands for inorganic nanosheets, nucleotides, digits, and pixels. This convergence could address item-level identification that strengthens supply-chain security against drug counterfeits. It can, therefore, provide molecular-level vision through a mobile network to coordinate and integrate data management channels for visual detection and recording.
Roadmap to a Comprehensive Clinical Data Warehouse for Precision Medicine Applications in Oncology
Foran, David J; Chen, Wenjin; Chu, Huiqi; Sadimin, Evita; Loh, Doreen; Riedlinger, Gregory; Goodell, Lauri A; Ganesan, Shridar; Hirshfield, Kim; Rodriguez, Lorna; DiPaola, Robert S
2017-01-01
Leading institutions throughout the country have established Precision Medicine programs to support personalized treatment of patients. A cornerstone for these programs is the establishment of enterprise-wide Clinical Data Warehouses. Working shoulder-to-shoulder, a team of physicians, systems biologists, engineers, and scientists at Rutgers Cancer Institute of New Jersey have designed, developed, and implemented the Warehouse with information originating from data sources, including Electronic Medical Records, Clinical Trial Management Systems, Tumor Registries, Biospecimen Repositories, Radiology and Pathology archives, and Next Generation Sequencing services. Innovative solutions were implemented to detect and extract unstructured clinical information that was embedded in paper/text documents, including synoptic pathology reports. Supporting important precision medicine use cases, the growing Warehouse enables physicians to systematically mine and review the molecular, genomic, image-based, and correlated clinical information of patient tumors individually or as part of large cohorts to identify changes and patterns that may influence treatment decisions and potential outcomes. PMID:28469389
Proprioceptive coordination of movement sequences: role of velocity and position information.
Cordo, P; Carlton, L; Bevan, L; Carlton, M; Kerr, G K
1994-05-01
1. Recent studies have shown that the CNS uses proprioceptive information to coordinate multijoint movement sequences; proprioceptive input related to the kinematics of one joint rotation in a movement sequence can be used to trigger a subsequent joint rotation. In this paper we adopt a broad definition of "proprioception," which includes all somatosensory information related to joint posture and kinematics. This paper addresses how the CNS uses proprioceptive information related to the velocity and position of joints to coordinate multijoint movement sequences. 2. Normal human subjects sat at an experimental apparatus and performed a movement sequence with the right arm without visual feedback. The apparatus passively rotated the right elbow horizontally in the extension direction with either a constant velocity trajectory or an unpredictable velocity trajectory. The subjects' task was to open briskly the right hand when the elbow passed through a prescribed target position, similar to backhand throwing in the horizontal plane. The randomization of elbow velocities and the absence of visual information was used to discourage subjects from using any information other than proprioceptive input to perform the task. 3. Our results indicate that the CNS is able to extract the necessary kinematic information from proprioceptive input to trigger the hand opening at the correct elbow position. We estimated the minimal sensory conduction and processing delay to be 150 ms, and on the basis of this estimate, we predicted the expected performance with different degrees of reduced proprioceptive information. These predictions were compared with the subjects' actual performances, revealing that the CNS was using proprioceptive input related to joint velocity in this motor task. To determine whether position information was also being used, we examined the subjects' performances with unpredictable velocity trajectories. The results from experiments with unpredictable velocity trajectories indicate that the CNS extracts proprioceptive information related to both the velocity and the angular position of the joint to trigger the hand movement in this movement sequence. 4. To determine the generality of proprioceptive triggering in movement sequences, we estimated the minimal movement duration with which proprioceptive information can be used as well as the amount of learning required to use proprioceptive input to perform the task. The temporal limits for proprioceptive processing in this movement task were established by determining the minimal movement time during which the task could be performed.(ABSTRACT TRUNCATED AT 400 WORDS)
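One practical implication of the estimated 150 ms delay is that the hand-opening command must be triggered before the elbow reaches the target by an angle proportional to elbow velocity. A short worked calculation with illustrative velocities (not the study's values) is below.

```python
# If proprioceptive processing takes ~150 ms, the trigger must be issued before the
# target by delay * angular velocity. Velocities here are illustrative values only.
delay_s = 0.150
for velocity_deg_per_s in (20, 40, 80):
    lead_angle = delay_s * velocity_deg_per_s
    print(f"{velocity_deg_per_s} deg/s -> trigger {lead_angle:.1f} deg before the target")
```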
USDA-ARS's Scientific Manuscript database
We present two standards developed by the Genomic Standards Consortium (GSC) for reporting bacterial and archaeal genome sequences. Both are extensions of the minimum information about any (x) sequence (MIxS). The standards are the minimum information about a single amplified genome (MISAG) and the ...
NASA Astrophysics Data System (ADS)
Jiao, Yong; Wakakuwa, Eyuri; Ogawa, Tomohiro
2018-02-01
We consider asymptotic convertibility of an arbitrary sequence of bipartite pure states into another by local operations and classical communication (LOCC). We adopt an information-spectrum approach to address cases where each element of the sequences is not necessarily a tensor power of a bipartite pure state. We derive necessary and sufficient conditions for the LOCC convertibility of one sequence to another in terms of spectral entropy rates of entanglement of the sequences. Based on these results, we also provide simple proofs for previously known results on the optimal rates of entanglement concentration and dilution of general sequences of bipartite pure states.
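For context, the i.i.d. baseline that the information-spectrum treatment generalizes is the standard result (recalled here for orientation, not a finding of this paper) that both entanglement concentration and dilution are governed by the entropy of entanglement,
$$E(\psi) = S(\rho_A) = -\operatorname{Tr}\left[\rho_A \log_2 \rho_A\right], \qquad \rho_A = \operatorname{Tr}_B\,|\psi\rangle\langle\psi|,$$
so that $|\psi\rangle^{\otimes n} \rightarrow |\phi\rangle^{\otimes m_n}$ is asymptotically achievable by LOCC if and only if $\limsup_{n\to\infty} m_n/n \le E(\psi)/E(\phi)$. The spectral entropy rates used in this work play the role of $E$ when the elements of the sequences are not tensor powers.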
Review of Current Conservation Genetic Analyses of Northeast Pacific Sharks.
Larson, Shawn E; Daly-Engel, Toby S; Phillips, Nicole M
Conservation genetics is an applied science that utilizes molecular tools to help solve problems in species conservation and management. It is an interdisciplinary specialty in which scientists apply the study of genetics in conjunction with traditional ecological fieldwork and other techniques to explore molecular variation, population boundaries, and evolutionary relationships with the goal of enabling resource managers to better protect biodiversity and identify unique populations. Several shark species in the northeast Pacific (NEP) have been studied using conservation genetics techniques, which are discussed here. The primary methods employed to study population genetics of sharks have historically been nuclear microsatellites and mitochondrial (mt) DNA. These markers have been used to assess genetic diversity, mating systems, parentage, relatedness, and genetically distinct populations to inform management decisions. Novel approaches in conservation genetics, including next-generation DNA and RNA sequencing, environmental DNA (eDNA), and epigenetics are just beginning to be applied to elasmobranch evolution, physiology, and ecology. Here, we review the methods and results of past studies, explore future directions for shark conservation genetics, and discuss the implications of molecular research and techniques for the long-term management of shark populations in the NEP. © 2017 Elsevier Ltd. All rights reserved.
dCITE: Measuring Necessary Cladistic Information Can Help You Reduce Polytomy Artefacts in Trees.
Wise, Michael J
2016-01-01
Biologists regularly create phylogenetic trees to better understand the evolutionary origins of their species of interest, and often use genomes as their data source. However, as more and more incomplete genomes are published, in many cases it may not be possible to compute genome-based phylogenetic trees due to large gaps in the assembled sequences. In addition, comparison of complete genomes may not even be desirable due to the presence of horizontally acquired and homologous genes. A decision must therefore be made about which gene, or gene combinations, should be used to compute a tree. Deflated Cladistic Information based on Total Entropy (dCITE) is proposed as an easily computed metric for measuring the cladistic information in multiple sequence alignments representing a range of taxa, without the need to first compute the corresponding trees. dCITE scores can be used to rank candidate genes or decide whether input sequences provide insufficient cladistic information, making artefactual polytomies more likely. The dCITE method can be applied to protein, nucleotide or encoded phenotypic data, so can be used to select which data-type is most appropriate, given the choice. In a series of experiments the dCITE method was compared with related measures. Then, as a practical demonstration, the ideas developed in the paper were applied to a dataset representing species from the order Campylobacterales; trees based on sequence combinations, selected on the basis of their dCITE scores, were compared with a tree constructed to mimic Multi-Locus Sequence Typing (MLST) combinations of fragments. We see that the greater the dCITE score the more likely it is that the computed phylogenetic tree will be free of artefactual polytomies. Secondly, cladistic information saturates, beyond which little additional cladistic information can be obtained by adding additional sequences. Finally, sequences with high cladistic information produce more consistent trees for the same taxa.
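The abstract does not give the dCITE formula, so no attempt is made to reproduce it here; as background only, the sketch below computes the column-wise Shannon entropy of a toy multiple sequence alignment, the kind of ingredient an entropy-based cladistic-information metric builds on. The alignment is invented, and the function is explicitly not dCITE.

```python
# Column-wise Shannon entropy of a multiple sequence alignment -- the kind of
# ingredient an entropy-based cladistic-information metric builds on. This is
# NOT the dCITE formula itself, which is defined in the paper.
import math
from collections import Counter

alignment = [        # toy alignment, one string per taxon
    "ACGTACGT",
    "ACGTACGA",
    "ACGAACGA",
    "ACGAACGT",
]

def column_entropy(column):
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

entropies = [column_entropy(col) for col in zip(*alignment)]
print([round(h, 2) for h in entropies])
```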
Kumar, Yadhu; Westram, Ralf; Kipfer, Peter; Meier, Harald; Ludwig, Wolfgang
2006-01-01
Background: The availability of high-resolution RNA crystal structures for the 30S and 50S ribosomal subunits and the subsequent validation of comparative secondary structure models have prompted biologists to use the three-dimensional structure of ribosomal RNA (rRNA) for evaluating sequence alignments of rRNA genes. Furthermore, the secondary and tertiary structural features of rRNA are highly useful and successfully employed in designing rRNA-targeted oligonucleotide probes intended for in situ hybridization experiments. RNA3D, a program to combine sequence alignment information with the three-dimensional structure of rRNA, was developed. Integration into the ARB software package, which is used extensively by the scientific community for phylogenetic analysis and molecular probe design, has substantially extended the functionality of the ARB software suite with a 3D environment. Results: The three-dimensional structure of rRNA is visualized in an OpenGL 3D environment with the ability to change the display and overlay information onto the molecule dynamically. Phylogenetic information derived from the multiple sequence alignments can be overlaid onto the molecule structure in real time. Superimposition of both statistical and non-statistical sequence-associated information onto the rRNA 3D structure can be done using a customizable color scheme, which is also applied to a textual sequence alignment for reference. Oligonucleotide probes designed by ARB probe design tools can be mapped onto the 3D structure along with the probe accessibility models for evaluation with respect to secondary and tertiary structural conformations of rRNA. Conclusion: Visualization of the three-dimensional structure of rRNA in an intuitive display provides biologists with greater possibilities for structure-based phylogenetic analysis. Coupled with secondary structure models of rRNA, the RNA3D program aids in validating the sequence alignments of rRNA genes and evaluating probe target sites. Superimposing the information derived from the multiple sequence alignment onto the molecule dynamically allows researchers to observe sequence-inherited characteristics (phylogenetic information) in a real-time environment. The extended ARB software package is made freely available to the scientific community via . PMID:16672074
Golden, Aaron; McLellan, Andrew S; Dubin, Robert A; Jing, Qiang; O Broin, Pilib; Moskowitz, David; Zhang, Zhengdong; Suzuki, Masako; Hargitai, Joseph; Calder, R Brent; Greally, John M
2012-01-01
Massively-parallel sequencing (MPS) technologies and their diverse applications in genomics and epigenomics research have yielded enormous new insights into the physiology and pathophysiology of the human genome. The biggest hurdle remains the magnitude and diversity of the datasets generated, compromising our ability to manage, organize, process and ultimately analyse data. The Wiki-based Automated Sequence Processor (WASP), developed at the Albert Einstein College of Medicine (hereafter Einstein), uniquely manages to tightly couple the sequencing platform, the sequencing assay, sample metadata and the automated workflows deployed on a heterogeneous high-performance computing cluster infrastructure that yield sequenced, quality-controlled and 'mapped' sequence data, all within one operating environment accessible by a web-based GUI interface. WASP at Einstein processes 4-6 TB of data per week; since its production cycle commenced, it has processed ~1 PB of data overall and has revolutionized how users interact with these new genomic technologies, leaving them blissfully unaware of the data storage, management and, most importantly, processing services they request. The abstraction of such computational complexity for the user in effect makes WASP an ideal middleware solution, and an appropriate basis for the development of a grid-enabled resource, the Einstein Genome Gateway, as part of the Extreme Science and Engineering Discovery Environment (XSEDE) program. In this paper we discuss the existing WASP system, its proposed middleware role, and its planned interaction with XSEDE to form the Einstein Genome Gateway.
Software for Managing Parametric Studies
NASA Technical Reports Server (NTRS)
Yarrow, Maurice; McCann, Karen M.; DeVivo, Adrian
2003-01-01
The Information Power Grid Virtual Laboratory (ILab) is a Practical Extraction and Reporting Language (PERL) graphical-user-interface computer program that generates shell scripts to facilitate parametric studies performed on the Grid. (The Grid denotes a worldwide network of supercomputers used for scientific and engineering computations involving data sets too large to fit on desktop computers.) Heretofore, parametric studies on the Grid have been impeded by the need to create control language scripts and edit input data files, painstaking tasks that are necessary for managing multiple jobs on multiple computers. ILab reflects an object-oriented approach to automation of these tasks: all data and operations are organized into packages in order to accelerate development and debugging. A container or document object in ILab, called an experiment, contains all the information (data and file paths) necessary to define a complex series of repeated, sequenced, and/or branching processes. For convenience and to enable reuse, this object is serialized to and from disk storage. At run time, the current ILab experiment is used to generate required input files and shell scripts, create directories, copy data files, and then both initiate and monitor the execution of all computational processes.
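ILab itself is a PERL program; to keep a single language across the examples in this document, the following is a hedged Python sketch of the same experiment-object idea: parameter ranges and paths serialized to disk and expanded into one directory and shell script per case. All names, paths, and the JSON serialization format are hypothetical, not ILab's actual formats.

```python
# Hypothetical sketch (in Python; ILab itself is written in PERL) of an
# "experiment" object that is serialized to disk and expanded into one shell
# script per parameter combination. Names and script contents are illustrative.
import itertools, json, pathlib

experiment = {
    "name": "wing_sweep",
    "executable": "./run_solver",
    "input_template": "input_{mach}_{alpha}.dat",
    "parameters": {"mach": [0.6, 0.8], "alpha": [0, 2, 4]},
}
pathlib.Path("experiment.json").write_text(json.dumps(experiment, indent=2))  # serialize for reuse

names = list(experiment["parameters"])
for values in itertools.product(*experiment["parameters"].values()):
    case = dict(zip(names, values))
    case_dir = pathlib.Path(experiment["name"]) / "_".join(f"{k}{v}" for k, v in case.items())
    case_dir.mkdir(parents=True, exist_ok=True)
    script = f"#!/bin/sh\ncd {case_dir}\n{experiment['executable']} {experiment['input_template'].format(**case)}\n"
    (case_dir / "run.sh").write_text(script)
```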
Holmes, Anne; Allison, Lesley; Ward, Melissa; Dallman, Timothy J; Clark, Richard; Fawkes, Angie; Murphy, Lee; Hanson, Mary
2015-11-01
Detailed laboratory characterization of Escherichia coli O157 is essential to inform epidemiological investigations. This study assessed the utility of whole-genome sequencing (WGS) for outbreak detection and epidemiological surveillance of E. coli O157, and the data were used to identify discernible associations between genotypes and clinical outcomes. One hundred five E. coli O157 strains isolated over a 5-year period from human fecal samples in Lothian, Scotland, were sequenced with the Ion Torrent Personal Genome Machine. A total of 8,721 variable sites in the core genome were identified among the 105 isolates; 47% of the single nucleotide polymorphisms (SNPs) were attributable to six "atypical" E. coli O157 strains and included recombinant regions. Phylogenetic analyses showed that WGS correlated well with the epidemiological data. Epidemiological links existed between cases whose isolates differed by three or fewer SNPs. WGS also correlated well with multilocus variable-number tandem repeat analysis (MLVA) typing data, with only three discordant results observed, all among isolates from cases not known to be epidemiologically related. WGS produced a better-supported, higher-resolution phylogeny than MLVA, confirming that the method is more suitable for epidemiological surveillance of E. coli O157. A combination of in silico analyses (VirulenceFinder, ResFinder, and local BLAST searches) were used to determine stx subtypes, multilocus sequence types (15 loci), and the presence of virulence and acquired antimicrobial resistance genes. There was a high level of correlation between the WGS data and our routine typing methods, although some discordant results were observed, mostly related to the limitation of short sequence read assembly. The data were used to identify sublineages and clades of E. coli O157, and when they were correlated with the clinical outcome data, they showed that one clade, Ic3, was significantly associated with severe disease. Together, the results show that WGS data can provide higher resolution of the relationships between E. coli O157 isolates than that provided by MLVA. The method has the potential to streamline the laboratory workflow and provide detailed information for the clinical management of patients and public health interventions. Copyright © 2015, Holmes et al.
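The epidemiological rule reported above is that linked cases differed by three or fewer core-genome SNPs. The sketch below computes pairwise SNP distances from a toy variant matrix and groups isolates under that threshold with naive single linkage; the isolates and variant calls are invented, not the Lothian data.

```python
# Pairwise SNP distances from a toy variant matrix, grouping isolates that
# differ by <= 3 SNPs (naive single linkage). Toy data only, not the study's isolates.
import numpy as np
from itertools import combinations

isolates = ["A", "B", "C", "D"]
variants = np.array([            # rows = isolates, columns = variable core-genome sites (0/1 alleles)
    [0, 0, 0, 0, 1, 0, 0, 1],
    [0, 0, 1, 0, 1, 0, 0, 1],    # differs from A by 1 SNP
    [1, 1, 1, 1, 0, 1, 1, 0],    # distant from A and B
    [1, 1, 1, 1, 0, 1, 0, 0],    # differs from C by 1 SNP
])

dist = {(a, b): int(np.sum(variants[i] != variants[j]))
        for (i, a), (j, b) in combinations(enumerate(isolates), 2)}

clusters = {name: {name} for name in isolates}   # naive single-linkage under a 3-SNP threshold
for (a, b), d in dist.items():
    if d <= 3:
        merged = clusters[a] | clusters[b]
        for member in merged:
            clusters[member] = merged
print(dist)
print({frozenset(c) for c in clusters.values()})
```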
Gattiker, Alexandre; Niederhauser-Wiederkehr, Christa; Moore, James; Hermida, Leandro; Primig, Michael
2007-01-01
We report a novel release of the GermOnline knowledgebase covering genes relevant for the cell cycle, gametogenesis and fertility. GermOnline was extended into a cross-species systems browser including information on DNA sequence annotation, gene expression and the function of gene products. The database covers eight model organisms and Homo sapiens, for which complete genome annotation data are available. The database is now built around a sophisticated genome browser (Ensembl), our own microarray information management and annotation system (MIMAS) used to extensively describe experimental data obtained with high-density oligonucleotide microarrays (GeneChips) and a comprehensive system for online editing of database entries (MediaWiki). The RNA data include results from classical microarrays as well as tiling arrays that yield information on RNA expression levels, transcript start sites and lengths as well as exon composition. Members of the research community are solicited to help GermOnline curators keep database entries on genes and gene products complete and accurate. The database is accessible at http://www.germonline.org/.
Neutral Theory is the Foundation of Conservation Genetics.
Yoder, Anne D; Poelstra, Jelmer; Tiley, George P; Williams, Rachel
2018-04-16
Kimura's neutral theory of molecular evolution has been essential to virtually every advance in evolutionary genetics, and by extension, is foundational to the field of conservation genetics. Conservation genetics utilizes the key concepts of neutral theory to identify species and populations at risk of losing evolutionary potential by detecting patterns of inbreeding depression and low effective population size. In turn, this information can inform the management of organisms and their habitat providing hope for the long-term preservation of both. We expand upon Avise's "inventorial" and "functional" categories of conservation genetics by proposing a third category that is linked to the coalescent and that we refer to as "process-driven." It is here that connections between Kimura's theory and conservation genetics are strongest. Process-driven conservation genetics can be especially applied to large genomic datasets to identify patterns of historical risk, such as population bottlenecks, and accordingly, yield informed intuitions for future outcomes. By examining inventorial, functional, and process-driven conservation genetics in sequence, we assess the progression from theory, to data collection and analysis, and ultimately, to the production of hypotheses that can inform conservation policies.
Hylind, Robyn; Smith, Maureen; Rasmussen-Torvik, Laura; Aufox, Sharon
2018-01-01
The management of secondary findings is a challenge to health-care providers relaying clinical genomic-sequencing results to patients. Understanding patients' expectations from non-diagnostic genomic sequencing could help guide this management. This study interviewed 14 individuals enrolled in the eMERGE (Electronic Medical Records and Genomics) study. Participants in eMERGE consent to undergo non-diagnostic genomic sequencing, receive results, and have results returned to their physicians. The interviews assessed expectations and intended use of results. The majority of interviewees were male (64%) and 43% identified as non-Caucasian. A unique theme identified was that many participants expressed uncertainty about the type of diseases they expected to receive results on, what results they wanted to learn about, and how they intended to use results. Participant uncertainty highlights the complex nature of deciding to undergo genomic testing and a deficiency in genomic knowledge. These results could help improve how genomic sequencing and secondary findings are discussed with patients.
García-Remesal, Miguel; Maojo, Victor; Crespo, José
2010-01-01
In this paper we present a knowledge engineering approach to automatically recognize and extract genetic sequences from scientific articles. To carry out this task, we use a preliminary recognizer based on a finite state machine to extract all candidate DNA/RNA sequences. The latter are then fed into a knowledge-based system that automatically discards false positives and refines noisy and incorrectly merged sequences. We created the knowledge base by manually analyzing different manuscripts containing genetic sequences. Our approach was evaluated using a test set of 211 full-text articles in PDF format containing 3134 genetic sequences. For this set, we achieved 87.76% precision and 97.70% recall, respectively. This method can facilitate different research tasks. These include text mining, information extraction, and information retrieval research dealing with large collections of documents containing genetic sequences.
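A minimal sketch of only the first stage described above, with a regular expression standing in for the finite state machine that flags candidate DNA/RNA runs in free text; the 20-residue minimum length and the example primer are illustrative assumptions, and the knowledge-based filtering of false positives is not shown.

```python
import re

# A run of upper-case nucleotide symbols (including IUPAC ambiguity codes)
# long enough to be a plausible genetic sequence; the minimum length of 20
# is an illustrative cutoff, not a parameter from the paper.
CANDIDATE = re.compile(r"[ACGTURYSWKMBDHVN]{20,}")

def candidate_sequences(text):
    """Return candidate DNA/RNA sequences found in free text.
    Whitespace is stripped first so sequences wrapped across lines are
    recovered as single strings."""
    collapsed = re.sub(r"\s+", "", text)
    return CANDIDATE.findall(collapsed)

example = """The primer used was
5'-ATGGCGTACCTGAAGTCGATCGGA-3' as described previously."""
print(candidate_sequences(example))  # ['ATGGCGTACCTGAAGTCGATCGGA']
```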
Storage and utilization of HLA genomic data--new approaches to HLA typing.
Helmberg, W
2000-01-01
Currently available DNA-based HLA typing assays can provide detailed information about sequence motifs of a tested sample. It is still a common practice, however, for information acquired by high-resolution sequence specific oligonucleotide probe (SSOP) typing or sequence specific priming (SSP) to be presented in a low-resolution serological format. Unfortunately, this representation can lead to significant loss of useful data in many cases. An alternative to assigning allele equivalents to such DNA typing results is simply to store the observed typing pattern and utilize the information with the help of Virtual DNA Analysis (VDA). Interpretation of the stored typing patterns can then be updated based on newly defined alleles, assuming the sequence motifs detected by the typing reagents are known. Rather than updating reagent specificities in individual laboratories, such updates should be performed in a central, publicly available sequence database. By referring to this database, HLA genomic data can then be stored and transferred between laboratories without loss of information. The 13th International Histocompatibility Workshop offers an ideal opportunity to begin building this common database for the entire human MHC.
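A minimal sketch of the re-interpretation idea behind Virtual DNA Analysis as described above: the laboratory stores the observed probe reaction pattern, and interpretation against a centrally maintained allele table can be repeated whenever that table is updated. The probe names, allele names and the exact-match rule are illustrative assumptions only.

```python
def compatible_alleles(observed_pattern, allele_patterns):
    """An allele is reported as compatible with a stored SSOP typing
    result if the probes it is expected to react with are exactly the
    probes observed to react (a deliberately simplified matching rule)."""
    observed = frozenset(observed_pattern)
    return [allele for allele, probes in allele_patterns.items()
            if frozenset(probes) == observed]

# Stored typing result: probes that gave a positive signal
observed = {"probe_01", "probe_07", "probe_12"}

# Allele table that can be updated centrally as new alleles are described
allele_patterns = {
    "allele_A": {"probe_01", "probe_07", "probe_12"},
    "allele_B": {"probe_02", "probe_07", "probe_12"},
}
print(compatible_alleles(observed, allele_patterns))  # ['allele_A']
```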
Reprint of: Synthesising the effects of land use on natural and managed landscapes.
Thackway, Richard; Specht, Alison
2015-11-15
To properly manage our natural and managed landscapes, and to restore or repair degraded areas, it is important to know the changes that have taken place over time, particularly with respect to land use and its cumulative effect on ecological function. In common with many places in the world, where the industrial revolution resulted in profound changes to land use and management, Australia's landscapes have been transformed in the last 200 years. Initially the VAST (Vegetation Assets, States and Transitions) system was developed to describe and map changes in vegetation over time through a series of condition states or classes; here we describe an enhancement to the VAST method which will enable identification of the factors contributing to those changes in state as a result of changes in management practice. The 'VAST-2' system provides a structure in which to compile, interpret and sequence a range of data about past management practices, their effect on site and vegetation condition. Alongside a systematic chronology of land use and management, a hierarchy of indices is used to build a picture of the condition of the vegetation through time: 22 indicators within ten criteria representing three components of vegetation condition-regenerative capacity, vegetation structure and species composition-are scored using information from a variety of sources. These indicators are assessed relative to a pre-European reference state, either actual or synthetic. Each component is weighted proportionally to its contribution to the whole, determined through expert opinion. These weighted condition components are used to produce an aggregated transformation score for the vegetation. The application of this system to a range of sites selected across Australia's tropical, sub-tropical and temperate bioregions is presented, illustrating the utility of the system. Notably, the method accommodates a range of different types of information to be aggregated. Copyright © 2015 Elsevier B.V. All rights reserved.
Pruitt, Kim D.; Tatusova, Tatiana; Maglott, Donna R.
2005-01-01
The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff. PMID:15608248
Diversity, Distribution, and Evolution of Tomato Viruses in China Uncovered by Small RNA Sequencing.
Xu, Chenxi; Sun, Xuepeng; Taylor, Angela; Jiao, Chen; Xu, Yimin; Cai, Xiaofeng; Wang, Xiaoli; Ge, Chenhui; Pan, Guanghui; Wang, Quanxi; Fei, Zhangjun; Wang, Quanhua
2017-06-01
Tomato is a major vegetable crop that has tremendous popularity. However, viral disease is still a major factor limiting tomato production. Here, we report the tomato virome identified through sequencing small RNAs of 170 field-grown samples collected in China. A total of 22 viruses were identified, including both well-documented and newly detected viruses. The tomato viral community is dominated by a few species, and they exhibit polymorphisms and recombination in the genomes with cold spots and hot spots. Most samples were coinfected by multiple viruses, and the majority of identified viruses are positive-sense single-stranded RNA viruses. Evolutionary analysis of one of the most dominant tomato viruses, Tomato yellow leaf curl virus (TYLCV), predicts its origin and the time back to its most recent common ancestor. The broadly sampled data have enabled us to identify several unreported viruses in tomato, including a completely new virus, which has a genome of ∼13.4 kb and groups with aphid-transmitted viruses in the genus Cytorhabdovirus. Although both DNA and RNA viruses can trigger the biogenesis of virus-derived small interfering RNAs (vsiRNAs), we show that features such as length distribution, paired distance, and base selection bias of vsiRNA sequences reflect different plant Dicer-like proteins and Argonautes involved in vsiRNA biogenesis. Collectively, this study offers insights into host-virus interaction in tomato and provides valuable information to facilitate the management of viral diseases. IMPORTANCE Tomato is an important source of micronutrients in the human diet and is extensively consumed around the world. Virus is among the major constraints on tomato production. Categorizing virus species that are capable of infecting tomato and understanding their diversity and evolution are challenging due to difficulties in detecting such fast-evolving biological entities. Here, we report the landscape of the tomato virome in China, the leading country in tomato production. We identified dozens of viruses present in tomato, including both well-documented and completely new viruses. Some newly emerged viruses in tomato were found to spread fast, and therefore, prompt attention is needed to control them. Moreover, we show that the virus genomes exhibit considerable degree of polymorphisms and recombination, and the virus-derived small interfering RNA (vsiRNA) sequences indicate distinct vsiRNA biogenesis mechanisms for different viruses. The Chinese tomato virome that we developed provides valuable information to facilitate the management of tomato viral diseases. Copyright © 2017 American Society for Microbiology.
Diversity, Distribution, and Evolution of Tomato Viruses in China Uncovered by Small RNA Sequencing
Xu, Chenxi; Taylor, Angela; Jiao, Chen; Xu, Yimin; Cai, Xiaofeng; Wang, Xiaoli; Ge, Chenhui; Pan, Guanghui; Wang, Quanxi
2017-01-01
ABSTRACT Tomato is a major vegetable crop that has tremendous popularity. However, viral disease is still a major factor limiting tomato production. Here, we report the tomato virome identified through sequencing small RNAs of 170 field-grown samples collected in China. A total of 22 viruses were identified, including both well-documented and newly detected viruses. The tomato viral community is dominated by a few species, and they exhibit polymorphisms and recombination in the genomes with cold spots and hot spots. Most samples were coinfected by multiple viruses, and the majority of identified viruses are positive-sense single-stranded RNA viruses. Evolutionary analysis of one of the most dominant tomato viruses, Tomato yellow leaf curl virus (TYLCV), predicts its origin and the time back to its most recent common ancestor. The broadly sampled data have enabled us to identify several unreported viruses in tomato, including a completely new virus, which has a genome of ∼13.4 kb and groups with aphid-transmitted viruses in the genus Cytorhabdovirus. Although both DNA and RNA viruses can trigger the biogenesis of virus-derived small interfering RNAs (vsiRNAs), we show that features such as length distribution, paired distance, and base selection bias of vsiRNA sequences reflect different plant Dicer-like proteins and Argonautes involved in vsiRNA biogenesis. Collectively, this study offers insights into host-virus interaction in tomato and provides valuable information to facilitate the management of viral diseases. IMPORTANCE Tomato is an important source of micronutrients in the human diet and is extensively consumed around the world. Virus is among the major constraints on tomato production. Categorizing virus species that are capable of infecting tomato and understanding their diversity and evolution are challenging due to difficulties in detecting such fast-evolving biological entities. Here, we report the landscape of the tomato virome in China, the leading country in tomato production. We identified dozens of viruses present in tomato, including both well-documented and completely new viruses. Some newly emerged viruses in tomato were found to spread fast, and therefore, prompt attention is needed to control them. Moreover, we show that the virus genomes exhibit considerable degree of polymorphisms and recombination, and the virus-derived small interfering RNA (vsiRNA) sequences indicate distinct vsiRNA biogenesis mechanisms for different viruses. The Chinese tomato virome that we developed provides valuable information to facilitate the management of tomato viral diseases. PMID:28331089
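The length-distribution feature mentioned in both records above can be tabulated with a simple counter; the reads and the reading of 21- and 22-nt peaks as signatures of different Dicer-like proteins are given only for illustration, not taken from the study's data.

```python
from collections import Counter

def length_distribution(reads):
    """Tabulate the length distribution of virus-derived small RNA reads;
    in plants, 21- and 22-nt peaks are commonly taken to reflect the
    activity of different Dicer-like proteins."""
    return Counter(len(read) for read in reads)

# Illustrative reads, not data from the study
reads = [
    "ACGTACGTACGTACGTACGTA",    # 21 nt
    "ACGTACGTACGTACGTACGTAC",   # 22 nt
    "ACGTACGTACGTACGTACGTA",    # 21 nt
]
print(sorted(length_distribution(reads).items()))  # [(21, 2), (22, 1)]
```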
Inferring Short-Range Linkage Information from Sequencing Chromatograms
Beggel, Bastian; Neumann-Fraune, Maria; Kaiser, Rolf; Verheyen, Jens; Lengauer, Thomas
2013-01-01
Direct Sanger sequencing of viral genome populations yields multiple ambiguous sequence positions. It is not straightforward to derive linkage information from sequencing chromatograms, which in turn hampers the correct interpretation of the sequence data. We present a method for determining the variants existing in a viral quasispecies in the case of two nearby ambiguous sequence positions by exploiting the effect of sequence context-dependent incorporation of dideoxynucleotides. The computational model was trained on data from sequencing chromatograms of clonal variants and was evaluated on two test sets of in vitro mixtures. The approach achieved high accuracies in identifying the mixture components of 97.4% on a test set in which the positions to be analyzed are only one base apart from each other, and of 84.5% on a test set in which the ambiguous positions are separated by three bases. In silico experiments suggest two major limitations of our approach in terms of accuracy. First, due to a basic limitation of Sanger sequencing, it is not possible to reliably detect minor variants with a relative frequency of no more than 10%. Second, the model cannot distinguish between mixtures of two or four clonal variants, if one of two sets of linear constraints is fulfilled. Furthermore, the approach requires repetitive sequencing of all variants that might be present in the mixture to be analyzed. Nevertheless, the effectiveness of our method on the two in vitro test sets shows that short-range linkage information of two ambiguous sequence positions can be inferred from Sanger sequencing chromatograms without any further assumptions on the mixture composition. Additionally, our model provides new insights into the established and widely used Sanger sequencing technology. The source code of our method is made available at http://bioinf.mpi-inf.mpg.de/publications/beggel/linkageinformation.zip. PMID:24376502
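The two-position linkage problem described above can be stated compactly: each ambiguous call mixes two bases, and a mixture of two clonal variants must phase them one way or the other. The sketch below only enumerates the two candidate phasings; deciding between them from chromatogram peak heights, which is what the published model does, is not shown, and the IUPAC table is limited to two-base codes for brevity.

```python
# Two-base IUPAC ambiguity codes
IUPAC2 = {"R": "AG", "Y": "CT", "S": "GC", "W": "AT", "K": "GT", "M": "AC"}

def candidate_phasings(call1, call2):
    """For two nearby ambiguous base calls, list the two pairs of linked
    variants that could explain them if the mixture contains exactly two
    clonal variants."""
    (x1, x2), (y1, y2) = IUPAC2[call1], IUPAC2[call2]
    return [{(x1, y1), (x2, y2)}, {(x1, y2), (x2, y1)}]

# An 'R' (A/G) and a 'Y' (C/T) a few bases apart could come from the
# variant pair {A..C, G..T} or from {A..T, G..C}.
print(candidate_phasings("R", "Y"))
```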
Ultraaccurate genome sequencing and haplotyping of single human cells.
Chu, Wai Keung; Edge, Peter; Lee, Ho Suk; Bansal, Vikas; Bafna, Vineet; Huang, Xiaohua; Zhang, Kun
2017-11-21
Accurate detection of variants and long-range haplotypes in genomes of single human cells remains very challenging. Common approaches require extensive in vitro amplification of genomes of individual cells using DNA polymerases and high-throughput short-read DNA sequencing. These approaches have two notable drawbacks. First, polymerase replication errors could generate tens of thousands of false-positive calls per genome. Second, relatively short sequence reads contain little to no haplotype information. Here we report a method, which is dubbed SISSOR (single-stranded sequencing using microfluidic reactors), for accurate single-cell genome sequencing and haplotyping. A microfluidic processor is used to separate the Watson and Crick strands of the double-stranded chromosomal DNA in a single cell and to randomly partition megabase-size DNA strands into multiple nanoliter compartments for amplification and construction of barcoded libraries for sequencing. The separation and partitioning of large single-stranded DNA fragments of the homologous chromosome pairs allows for the independent sequencing of each of the complementary and homologous strands. This enables the assembly of long haplotypes and reduction of sequence errors by using the redundant sequence information and haplotype-based error removal. We demonstrated the ability to sequence single-cell genomes with error rates as low as 10^-8 and average 500-kb-long DNA fragments that can be assembled into haplotype contigs with N50 greater than 7 Mb. The performance could be further improved with more uniform amplification and more accurate sequence alignment. The ability to obtain accurate genome sequences and haplotype information from single cells will enable applications of genome sequencing for diverse clinical needs. Copyright © 2017 the Author(s). Published by PNAS.
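The N50 figure quoted above is a standard assembly statistic; a minimal sketch of its calculation follows, with illustrative contig lengths rather than the study's data.

```python
def n50(lengths):
    """N50: the largest length L such that contigs of length >= L account
    for at least half of the total assembled length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length

# Illustrative haplotype contig lengths in bases
contigs = [9_000_000, 7_500_000, 4_000_000, 2_000_000, 500_000]
print(n50(contigs))  # 7500000
```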
STAR: an integrated solution to management and visualization of sequencing data.
Wang, Tao; Liu, Jie; Shen, Li; Tonti-Filippini, Julian; Zhu, Yun; Jia, Haiyang; Lister, Ryan; Whitaker, John W; Ecker, Joseph R; Millar, A Harvey; Ren, Bing; Wang, Wei
2013-12-15
Easy visualization of complex data features is a necessary step to conduct studies on next-generation sequencing (NGS) data. We developed STAR, an integrated web application that enables online management, visualization and track-based analysis of NGS data. STAR is a multilayer web service system. On the client side, STAR leverages JavaScript, HTML5 Canvas and asynchronous communications to deliver a smoothly scrolling desktop-like graphical user interface with a suite of in-browser analysis tools that range from providing simple track configuration controls to sophisticated feature detection within datasets. On the server side, STAR supports private session state retention via an account management system and provides data management modules that enable collection, visualization and analysis of third-party sequencing data from the public domain, with thousands of tracks hosted to date. Overall, STAR represents a next-generation data exploration solution to match the requirements of NGS data, enabling both intuitive visualization and dynamic analysis of data. The STAR browser system is freely available on the web at http://wanglab.ucsd.edu/star/browser and https://github.com/angell1117/STAR-genome-browser.
The utility of transcriptomics in fish conservation.
Connon, Richard E; Jeffries, Ken M; Komoroske, Lisa M; Todgham, Anne E; Fangue, Nann A
2018-01-29
There is growing recognition of the need to understand the mechanisms underlying organismal resilience (i.e. tolerance, acclimatization) to environmental change to support the conservation management of sensitive and economically important species. Here, we discuss how functional genomics can be used in conservation biology to provide a cellular-level understanding of organismal responses to environmental conditions. In particular, the integration of transcriptomics with physiological and ecological research is increasingly playing an important role in identifying functional physiological thresholds predictive of compensatory responses and detrimental outcomes, transforming the way we can study issues in conservation biology. Notably, with technological advances in RNA sequencing, transcriptome-wide approaches can now be applied to species where no prior genomic sequence information is available to develop species-specific tools and investigate sublethal impacts that can contribute to population declines over generations and undermine prospects for long-term conservation success. Here, we examine the use of transcriptomics as a means of determining organismal responses to environmental stressors and use key study examples of conservation concern in fishes to highlight the added value of transcriptome-wide data to the identification of functional response pathways. Finally, we discuss the gaps between the core science and policy frameworks and how thresholds identified through transcriptomic evaluations provide evidence that can be more readily used by resource managers. © 2018. Published by The Company of Biologists Ltd.
A Window Into Clinical Next-Generation Sequencing-Based Oncology Testing Practices.
Nagarajan, Rakesh; Bartley, Angela N; Bridge, Julia A; Jennings, Lawrence J; Kamel-Reid, Suzanne; Kim, Annette; Lazar, Alexander J; Lindeman, Neal I; Moncur, Joel; Rai, Alex J; Routbort, Mark J; Vasalos, Patricia; Merker, Jason D
2017-12-01
- Detection of acquired variants in cancer is a paradigm of precision medicine, yet little has been reported about clinical laboratory practices across a broad range of laboratories. - To use College of American Pathologists proficiency testing survey results to report on the results from surveys on next-generation sequencing-based oncology testing practices. - College of American Pathologists proficiency testing survey results from more than 250 laboratories currently performing molecular oncology testing were used to determine laboratory trends in next-generation sequencing-based oncology testing. - These presented data provide key information about the number of laboratories that currently offer or are planning to offer next-generation sequencing-based oncology testing. Furthermore, we present data from 60 laboratories performing next-generation sequencing-based oncology testing regarding specimen requirements and assay characteristics. The findings indicate that most laboratories are performing tumor-only targeted sequencing to detect single-nucleotide variants and small insertions and deletions, using desktop sequencers and predesigned commercial kits. Despite these trends, a diversity of approaches to testing exists. - This information should be useful to further inform a variety of topics, including national discussions involving clinical laboratory quality systems, regulation and oversight of next-generation sequencing-based oncology testing, and precision oncology efforts in a data-driven manner.
Non-redundant patent sequence databases with value-added annotations at two levels
Li, Weizhong; McWilliam, Hamish; de la Torre, Ana Richart; Grodowski, Adam; Benediktovich, Irina; Goujon, Mickael; Nauche, Stephane; Lopez, Rodrigo
2010-01-01
The European Bioinformatics Institute (EMBL-EBI) provides public access to patent data, including abstracts, chemical compounds and sequences. Sequences can appear multiple times due to the filing of the same invention with multiple patent offices, or the use of the same sequence by different inventors in different contexts. Information relating to the source invention may be incomplete, and biological information available in patent documents elsewhere may not be reflected in the annotation of the sequence. Search and analysis of these data have become increasingly challenging for both the scientific and intellectual-property communities. Here, we report a collection of non-redundant patent sequence databases, which cover the EMBL-Bank nucleotides patent class and the patent protein databases and contain value-added annotations from patent documents. The databases were created at two levels by the use of sequence MD5 checksums. Sequences within a level-1 cluster are 100% identical over their whole length. Level-2 clusters were defined by sub-grouping level-1 clusters based on patent family information. Value-added annotations, such as publication number corrections, earliest publication dates and feature collations, significantly enhance the quality of the data, allowing for better tracking and cross-referencing. The databases are available at: http://www.ebi.ac.uk/patentdata/nr/. PMID:19884134
Non-redundant patent sequence databases with value-added annotations at two levels.
Li, Weizhong; McWilliam, Hamish; de la Torre, Ana Richart; Grodowski, Adam; Benediktovich, Irina; Goujon, Mickael; Nauche, Stephane; Lopez, Rodrigo
2010-01-01
The European Bioinformatics Institute (EMBL-EBI) provides public access to patent data, including abstracts, chemical compounds and sequences. Sequences can appear multiple times due to the filing of the same invention with multiple patent offices, or the use of the same sequence by different inventors in different contexts. Information relating to the source invention may be incomplete, and biological information available in patent documents elsewhere may not be reflected in the annotation of the sequence. Search and analysis of these data have become increasingly challenging for both the scientific and intellectual-property communities. Here, we report a collection of non-redundant patent sequence databases, which cover the EMBL-Bank nucleotides patent class and the patent protein databases and contain value-added annotations from patent documents. The databases were created at two levels by the use of sequence MD5 checksums. Sequences within a level-1 cluster are 100% identical over their whole length. Level-2 clusters were defined by sub-grouping level-1 clusters based on patent family information. Value-added annotations, such as publication number corrections, earliest publication dates and feature collations, significantly enhance the quality of the data, allowing for better tracking and cross-referencing. The databases are available at: http://www.ebi.ac.uk/patentdata/nr/.
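A minimal sketch of the level-1 clustering described in both records above, keying sequences on an MD5 checksum so that entries that are 100% identical over their whole length collapse into one cluster; the identifiers and sequences are illustrative, and the level-2 sub-grouping by patent family is not shown.

```python
import hashlib

def level1_clusters(sequences):
    """Group sequences that are 100% identical over their whole length by
    keying them on the MD5 checksum of the sequence string."""
    clusters = {}
    for identifier, sequence in sequences.items():
        digest = hashlib.md5(sequence.upper().encode("ascii")).hexdigest()
        clusters.setdefault(digest, []).append(identifier)
    return clusters

# Illustrative entries: the same sequence filed at two patent offices
# collapses into a single level-1 cluster.
sequences = {
    "patent_seq_EP_1": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
    "patent_seq_US_1": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
    "patent_seq_US_2": "MKWVTFISLLLLFSSAYS",
}
print(level1_clusters(sequences))
```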
D'Angelo, Maria C; Jiménez, Luis; Milliken, Bruce; Lupiáñez, Juan
2013-01-01
Individuals experience less interference from conflicting information following events that contain conflicting information. Recently, Jiménez, Lupiáñez, and Vaquero (2009) demonstrated that such adaptations to conflict occur even when the source of conflict arises from implicit knowledge of sequences. There is accumulating evidence that momentary changes in adaptations made in response to conflicting information are conflict-type specific (e.g., Funes, Lupiáñez, & Humphreys, 2010a), suggesting that there are multiple modes of control. The current study examined whether conflict-specific sequential congruency effects occur when the 2 sources of conflict are implicitly learned. Participants implicitly learned a motor sequence while simultaneously learning a perceptual sequence. In a first experiment, after learning the 2 orthogonal sequences, participants expressed knowledge of the 2 sequences independently of each other in a transfer phase. In Experiments 2 and 3, within each sequence, the presence of a single control trial disrupted the expression of this specific type of learning on the following trial. There was no evidence of cross-conflict modulations in the expression of sequence learning. The results suggest that the mechanisms involved in transient shifts in conflict-specific control, as reflected in sequential congruency effects, are also engaged when the source of conflict is implicit. (c) 2013 APA, all rights reserved.
SEQATOMS: a web tool for identifying missing regions in PDB in sequence context.
Brandt, Bernd W; Heringa, Jaap; Leunissen, Jack A M
2008-07-01
With over 46 000 proteins, the Protein Data Bank (PDB) is the most important database with structural information of biological macromolecules. PDB files contain sequence and coordinate information. Residues present in the sequence can be absent from the coordinate section, which means their position in space is unknown. Similarity searches are routinely carried out against sequences taken from PDB SEQRES. However, no distinction is made there between residues that have a known or an unknown position in the 3D protein structure. We present a FASTA sequence database that is produced by combining the sequence and coordinate information. All residues absent from the PDB coordinate section are masked with lower-case letters, thereby providing a view of these residues in the context of the entire protein sequence, which facilitates inspecting 'missing' regions. We also provide a masked version of the CATH domain database. A user-friendly BLAST interface is available for similarity searching. In contrast to standard (stand-alone) BLAST output, which only contains upper-case letters, our output retains the lower-case letters of the masked regions. Thus, our server can be used to perform BLAST searching case-sensitively. Here, we have applied it to the study of missing regions in their sequence context. SEQATOMS is available at http://www.bioinformatics.nl/tools/seqatoms/.
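The masking scheme described above is easy to reproduce in a few lines; the sketch below lower-cases residues absent from the coordinate records, using an invented ten-residue chain rather than a real PDB entry.

```python
def mask_missing(seqres, observed_positions):
    """Lower-case residues absent from the coordinate section, leaving
    residues with known 3D positions in upper case."""
    return "".join(
        residue.upper() if index in observed_positions else residue.lower()
        for index, residue in enumerate(seqres)
    )

# Illustrative chain whose first three residues are missing from the ATOM
# records (for example a disordered N-terminus)
seqres = "MKTAYIAKQR"
observed = set(range(3, 10))
print(mask_missing(seqres, observed))  # mktAYIAKQR
```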
Future Newspaper Managers Learn Basics at Oregon.
ERIC Educational Resources Information Center
Halverson, Roy
1978-01-01
Describes an experimental program that prepares students for careers in newspaper management with a sequence of courses in journalism, accounting, marketing, management, finance, and statistics, ending with an internship in the business office of a daily or weekly newspaper. (RL)
Ordering Design Tasks Based on Coupling Strengths
NASA Technical Reports Server (NTRS)
Rogers, J. L.; Bloebaum, C. L.
1994-01-01
The design process associated with large engineering systems requires an initial decomposition of the complex system into modules of design tasks which are coupled through the transference of output data. In analyzing or optimizing such a coupled system, it is essential to be able to determine which interactions figure prominently enough to significantly affect the accuracy of the system solution. Many decomposition approaches assume the capability is available to determine what design tasks and interactions exist and what order of execution will be imposed during the analysis process. Unfortunately, this is often a complex problem and beyond the capabilities of a human design manager. A new feature for DeMAID (Design Manager's Aid for Intelligent Decomposition) will allow the design manager to use coupling strength information to find a proper sequence for ordering the design tasks. In addition, these coupling strengths aid in deciding if certain tasks or couplings could be removed (or temporarily suspended) from consideration to achieve computational savings without a significant loss of system accuracy. New rules are presented and two small test cases are used to show the effects of using coupling strengths in this manner.
Ordering design tasks based on coupling strengths
NASA Technical Reports Server (NTRS)
Rogers, James L., Jr.; Bloebaum, Christina L.
1994-01-01
The design process associated with large engineering systems requires an initial decomposition of the complex system into modules of design tasks which are coupled through the transference of output data. In analyzing or optimizing such a coupled system, it is essential to be able to determine which interactions figure prominently enough to significantly affect the accuracy of the system solution. Many decomposition approaches assume the capability is available to determine what design tasks and interactions exist and what order of execution will be imposed during the analysis process. Unfortunately, this is often a complex problem and beyond the capabilities of a human design manager. A new feature for DeMAID (Design Manager's Aid for Intelligent Decomposition) will allow the design manager to use coupling strength information to find a proper sequence for ordering the design tasks. In addition, these coupling strengths aid in deciding if certain tasks or couplings could be removed (or temporarily suspended) from consideration to achieve computational savings without a significant loss of system accuracy. New rules are presented and two small test cases are used to show the effects of using coupling strengths in this manner.
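A minimal sketch of strength-based sequencing in the spirit of the feature described in the two records above: couplings below a threshold are suspended, and the remaining tasks are placed greedily so that the task with the least unresolved incoming coupling strength goes next. The greedy rule, threshold and coupling values are illustrative assumptions, not the actual DeMAID rules.

```python
def order_tasks(couplings, suspend_below=0.0):
    """Greedy ordering of design tasks from a table of coupling strengths,
    couplings[(src, dst)] = strength of data passed from task src to dst.
    Couplings weaker than suspend_below are ignored; at each step the task
    with the least unresolved incoming coupling strength is placed next."""
    tasks = {task for pair in couplings for task in pair}
    active = {pair: w for pair, w in couplings.items() if w >= suspend_below}
    order, remaining = [], set(tasks)
    while remaining:
        def unresolved_in(task):
            return sum(w for (src, dst), w in active.items()
                       if dst == task and src in remaining)
        next_task = min(remaining, key=unresolved_in)
        order.append(next_task)
        remaining.remove(next_task)
    return order

# Illustrative coupling strengths between three design tasks
couplings = {
    ("aero", "structures"): 0.9,
    ("structures", "aero"): 0.1,   # weak feedback, candidate for suspension
    ("structures", "controls"): 0.6,
}
print(order_tasks(couplings, suspend_below=0.2))
# ['aero', 'structures', 'controls'] once the weak feedback is suspended
```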
Shuttle Abort Flight Management (SAFM) - Application Overview
NASA Technical Reports Server (NTRS)
Hu, Howard; Straube, Tim; Madsen, Jennifer; Ricard, Mike
2002-01-01
One of the most demanding tasks that must be performed by the Space Shuttle flight crew is the process of determining whether, when and where to abort the vehicle should engine or system failures occur during ascent or entry. Current Shuttle abort procedures involve paging through complicated paper checklists to decide on the type of abort and where to abort. Additional checklists then lead the crew through a series of actions to execute the desired abort. This process is even more difficult and time consuming in the absence of ground communications, since the ground flight controllers have the analysis tools and information that are currently not available in the Shuttle cockpit. Crew workload, specifically for abort procedures, will be greatly reduced with the implementation of the Space Shuttle Cockpit Avionics Upgrade (CAU) project. The intent of CAU is to maximize crew situational awareness and reduce flight workload through enhanced controls and displays, and onboard abort assessment and determination capability. SAFM was developed to help satisfy the CAU objectives by providing the crew with dynamic information about the capability of the vehicle to perform a variety of abort options during ascent and entry. This paper presents an overview of the SAFM application. As shown in Figure 1, SAFM processes the vehicle navigation state and other guidance information to provide the CAU displays with evaluations of abort options, as well as landing site recommendations. This is accomplished by three main SAFM components: the Sequencer Executive, the Powered Flight Function, and the Glided Flight Function. The Sequencer Executive dispatches the Powered and Glided Flight Functions to evaluate the vehicle's capability to execute the current mission (or current abort), as well as more than 15 hypothetical abort options or scenarios. Scenarios are sequenced and evaluated throughout powered and glided flight. Abort scenarios evaluated include Abort to Orbit (ATO), Transatlantic Abort Landing (TAL), East Coast Abort Landing (ECAL) and Return to Launch Site (RTLS). Sequential and simultaneous engine failures are assessed, and landing footprint information is provided during actual entry scenarios as well as hypothetical "loss of thrust now" scenarios during ascent.
E-MSD: an integrated data resource for bioinformatics
Velankar, S.; McNeil, P.; Mittard-Runte, V.; Suarez, A.; Barrell, D.; Apweiler, R.; Henrick, K.
2005-01-01
The Macromolecular Structure Database (MSD) group (http://www.ebi.ac.uk/msd/) continues to enhance the quality and consistency of macromolecular structure data in the worldwide Protein Data Bank (wwPDB) and to work towards the integration of various bioinformatics data resources. One of the major obstacles to the improved integration of structural databases such as MSD and sequence databases like UniProt is the absence of up to date and well-maintained mapping between corresponding entries. We have worked closely with the UniProt group at the EBI to clean up the taxonomy and sequence cross-reference information in the MSD and UniProt databases. This information is vital for the reliable integration of the sequence family databases such as Pfam and Interpro with the structure-oriented databases of SCOP and CATH. This information has been made available to the eFamily group (http://www.efamily.org.uk/) and now forms the basis of the regular interchange of information between the member databases (MSD, UniProt, Pfam, Interpro, SCOP and CATH). This exchange of annotation information has enriched the structural information in the MSD database with annotation from wider sequence-oriented resources. This work was carried out under the ‘Structure Integration with Function, Taxonomy and Sequences (SIFTS)’ initiative (http://www.ebi.ac.uk/msd-srv/docs/sifts) in the MSD group. PMID:15608192
MitoRes: a resource of nuclear-encoded mitochondrial genes and their products in Metazoa.
Catalano, Domenico; Licciulli, Flavio; Turi, Antonio; Grillo, Giorgio; Saccone, Cecilia; D'Elia, Domenica
2006-01-24
Mitochondria are sub-cellular organelles that have a central role in energy production and in other metabolic pathways of all eukaryotic respiring cells. In the last few years, with more and more genomes being sequenced, a huge amount of data has been generated providing an unprecedented opportunity to use the comparative analysis approach in studies of evolution and functional genomics with the aim of shedding light on molecular mechanisms regulating mitochondrial biogenesis and metabolism. In this context, the problem of the optimal extraction of representative datasets of genomic and proteomic data assumes a crucial importance. Specialised resources for nuclear-encoded mitochondria-related proteins already exist; however, no mitochondrial database is currently available with the same features of MitoRes, which is an update of the MitoNuc database extensively modified in its structure, data sources and graphical interface. It contains data on nuclear-encoded mitochondria-related products for any metazoan species for which this type of data is available and also provides comprehensive sequence datasets (gene, transcript and protein) as well as useful tools for their extraction and export. MitoRes http://www2.ba.itb.cnr.it/MitoRes/ consolidates information from publicly external sources and automatically annotates them into a relational database. Additionally, it also clusters proteins on the basis of their sequence similarity and interconnects them with genomic data. The search engine and sequence management tools allow the query/retrieval of the database content and the extraction and export of sequences (gene, transcript, protein) and related sub-sequences (intron, exon, UTR, CDS, signal peptide and gene flanking regions) ready to be used for in silico analysis. The tool we describe here has been developed to support lab scientists and bioinformaticians alike in the characterization of molecular features and evolution of mitochondrial targeting sequences. The way it provides for the retrieval and extraction of sequences allows the user to overcome the obstacles encountered in the integrative use of different bioinformatic resources and the completeness of the sequence collection allows intra- and interspecies comparison at different biological levels (gene, transcript and protein).
Kamatuka, Kenta; Hattori, Masahiro; Sugiyama, Tomoyasu
2016-12-01
RNA interference (RNAi) screening is extensively used in the field of reverse genetics. RNAi libraries constructed using random oligonucleotides have made this technology affordable. However, the new methodology requires exploration of the RNAi target gene information after screening because the RNAi library includes non-natural sequences that are not found in genes. Here, we developed a web-based tool to support RNAi screening. The system performs short hairpin RNA (shRNA) target prediction that is informed by comprehensive enquiry (SPICE). SPICE automates several tasks that are laborious but indispensable to evaluate the shRNAs obtained by RNAi screening. SPICE has four main functions: (i) sequence identification of shRNA in the input sequence (the sequence might be obtained by sequencing clones in the RNAi library), (ii) searching the target genes in the database, (iii) demonstrating biological information obtained from the database, and (iv) preparation of search result files that can be utilized in a local personal computer (PC). Using this system, we demonstrated that genes targeted by random oligonucleotide-derived shRNAs were not different from those targeted by organism-specific shRNA. The system facilitates RNAi screening, which requires sequence analysis after screening. The SPICE web application is available at http://www.spice.sugysun.org/.
Prefrontal neural correlates of memory for sequences.
Averbeck, Bruno B; Lee, Daeyeol
2007-02-28
The sequence of actions appropriate to solve a problem often needs to be discovered by trial and error and recalled in the future when faced with the same problem. Here, we show that when monkeys had to discover and then remember a sequence of decisions across trials, ensembles of prefrontal cortex neurons reflected the sequence of decisions the animal would make throughout the interval between trials. This signal could reflect either an explicit memory process or a sequence-planning process that begins far in advance of the actual sequence execution. This finding extended to error trials such that, when the neural activity during the intertrial interval specified the wrong sequence, the animal also attempted to execute an incorrect sequence. More specifically, we used a decoding analysis to predict the sequence the monkey was planning to execute at the end of the fore-period, just before sequence execution. When this analysis was applied to error trials, we were able to predict where in the sequence the error would occur, up to three movements into the future. This suggests that prefrontal neural activity can retain information about sequences between trials, and that regardless of whether information is remembered correctly or incorrectly, the prefrontal activity veridically reflects the animal's action plan.
Kaphingst, Kimberly A; Ivanovich, Jennifer; Lyons, Sarah; Biesecker, Barbara; Dresser, Rebecca; Elrick, Ashley; Matsen, Cindy; Goodman, Melody
2018-01-29
The growing importance of genome sequencing means that patients will increasingly face decisions regarding what results they would like to learn. The present study examined psychological and clinical factors that might affect these preferences. 1,080 women diagnosed with breast cancer at age 40 or younger completed an online survey. We assessed their interest in learning various types of genome sequencing results: risk of preventable disease or unpreventable disease, cancer treatment response, uncertain meaning, risk to relatives' health, and ancestry/physical traits. Multivariable logistic regression was used to examine whether being "very" interested in each result type was associated with clinical factors: BRCA1/2 mutation status, prior genetic testing, family history of breast cancer, and psychological factors: cancer recurrence worry, genetic risk worry, future orientation, health information orientation, and genome sequencing knowledge. The proportion of respondents who were very interested in learning each type of result ranged from 16% to 77%. In all multivariable models, those who were very interested in learning a result type had significantly higher knowledge about sequencing benefits, greater genetic risks worry, and stronger health information orientation compared to those with less interest (p-values < .05). Our findings indicate that high interest in return of various types of genome sequencing results was more closely related to psychological factors. Shared decision-making approaches that increase knowledge about genome sequencing and incorporate patient preferences for health information and learning about genetic risks may help support patients' informed choices about learning different types of sequencing results. © Society of Behavioral Medicine 2018.
Pal, Debojyoti; Sharma, Deepak; Kumar, Mukesh; Sandur, Santosh K
2016-09-01
S-glutathionylation of proteins plays an important role in various biological processes and is known to be a protective modification during oxidative stress. Since experimental detection of S-glutathionylation is labor-intensive and time-consuming, a bioinformatics-based approach is a viable alternative. Available methods require relatively longer sequence information, which may prevent prediction if sequence information is incomplete. Here, we present a model to predict glutathionylation sites from pentapeptide sequences. It is based upon the differential association of amino acids with glutathionylated and non-glutathionylated cysteines from a database of experimentally verified sequences. These data were used to calculate position-dependent F-scores, which measure how a particular amino acid at a particular position may affect the likelihood of a glutathionylation event. A glutathionylation score (G-score), indicating the propensity of a sequence to undergo glutathionylation, was calculated using the position-dependent F-scores for each amino acid. Cut-off values were used for prediction. Our model returned an accuracy of 58% with a Matthews correlation coefficient (MCC) value of 0.165. On an independent dataset, our model outperformed the currently available model, in spite of needing much less sequence information. Pentapeptide motifs having high abundance among glutathionylated proteins were identified. A list of potential glutathionylation hotspot sequences was obtained by assigning G-scores, and subsequent Protein-BLAST analysis revealed a total of 254 putative glutathionable proteins, a number of which were already known to be glutathionylated. Our model predicted glutathionylation sites in 93.93% of experimentally verified glutathionylated proteins. The outcome of this study may assist in discovering novel glutathionylation sites and finding candidate proteins for glutathionylation.
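A minimal sketch of how a pentapeptide score of the kind described above could be assembled from position-dependent F-scores; every number in the table and the cutoff are placeholders for illustration, not values from the published model.

```python
# Placeholder position-dependent F-scores, F_SCORES[position][amino_acid];
# the published model derives these from the differential association of
# amino acids with glutathionylated versus non-glutathionylated cysteines.
F_SCORES = {
    0: {"K": 0.4, "L": -0.2},
    1: {"Q": 0.3, "A": 0.1},
    2: {"C": 0.0},              # central cysteine, fixed
    3: {"E": 0.5, "G": -0.1},
    4: {"R": 0.2, "S": -0.3},
}

def g_score(pentapeptide):
    """Sum the position-dependent F-scores over a cysteine-centred
    pentapeptide; residues missing from the table contribute zero."""
    assert len(pentapeptide) == 5 and pentapeptide[2] == "C"
    return sum(F_SCORES[i].get(residue, 0.0)
               for i, residue in enumerate(pentapeptide))

def is_candidate_site(pentapeptide, cutoff=0.5):
    """Flag the central cysteine as a candidate glutathionylation site if
    the G-score reaches an (illustrative) cutoff."""
    return g_score(pentapeptide) >= cutoff

print(g_score("KQCER"), is_candidate_site("KQCER"))  # 1.4 True
```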
Arrays of probes for positional sequencing by hybridization
Cantor, Charles R [Boston, MA; Prezetakiewiczr, Marek [East Boston, MA; Smith, Cassandra L [Boston, MA; Sano, Takeshi [Waltham, MA
2008-01-15
This invention is directed to methods and reagents useful for sequencing nucleic acid targets utilizing sequencing by hybridization technology comprising probes, arrays of probes and methods whereby sequence information is obtained rapidly and efficiently in discrete packages. That information can be used for the detection, identification, purification and complete or partial sequencing of a particular target nucleic acid. When coupled with a ligation step, these methods can be performed under a single set of hybridization conditions. The invention also relates to the replication of probe arrays and methods for making and replicating arrays of probes which are useful for the large scale manufacture of diagnostic aids used to screen biological samples for specific target sequences. Arrays created using PCR technology may comprise probes with 5'- and/or 3'-overhangs.
Rubin, D A; Dores, R M
1995-06-01
In order to obtain a more resolute phylogeny of teleosts based on growth hormone (GH) sequences, phylogenetic analyses were performed in which deletions (gaps), which appear to be order specific, were upheld to maintain GH's structural information. Sequences were analyzed at 194 amino acid positions. In addition, the two closest genealogically related groups to the teleosts, Amia calva and Acipenser guldenstadti, were used as outgroups. Modified sequence alignments were also analyzed to determine clade stability. Analyses indicated, in the most parsimonious cladogram, that molecular and morphological relationships for the orders of fishes are congruent. With GH molecular sequence data it was possible to resolve all clades at the familial level. Analyses of the primary sequence data indicate that: (a) the halecomorphean and chondrostean GH sequences are the appropriate outgroups for generating the most parsimonious cladogram for teleosts; (b) proper alignment of teleost GH sequence by the inclusion of gaps is necessary for resolution of the Percomorpha; and (c) removal of sequence information by deleting improperly aligned sequence decreases the phylogenetic signal obtained.
Extension of the COG and arCOG databases by amino acid and nucleotide sequences
Meereis, Florian; Kaufmann, Michael
2008-01-01
Background: The current versions of the COG and arCOG databases, both excellent frameworks for studies in comparative and functional genomics, do not contain the nucleotide sequences corresponding to their protein or protein domain entries. Results: Using sequence information obtained from GenBank flat files covering the completely sequenced genomes of the COG and arCOG databases, we constructed NUCOCOG (nucleotide sequences containing COG databases) as an extended version including all nucleotide sequences and in addition the amino acid sequences originally utilized to construct the current COG and arCOG databases. We make available three comprehensive single XML files containing the complete databases including all sequence information. In addition, we provide a web interface as a utility suitable to browse the NUCOCOG database for sequence retrieval. The database is accessible at . Conclusion: NUCOCOG offers the possibility to analyze any sequence related property in the context of the COG and arCOG framework simply by using script languages such as PERL applied to a large but single XML document. PMID:19014535
Launch mission summary and sequence of events Telesat-F(anik-D1)/Delta-164
NASA Technical Reports Server (NTRS)
1982-01-01
The launch vehicle, spacecraft, and mission are summarized. Launch window information, vehicle telemetry coverage, real time data flow, telemetry coverage by station, selected trajectory information, and a brief sequence of flight events are included.
Beyond the bucket: testing the effect of experimental design on rate and sequence of decay
NASA Astrophysics Data System (ADS)
Gabbott, Sarah; Murdock, Duncan; Purnell, Mark
2016-04-01
Experimental decay has revealed the potential for profound biases in our interpretations of exceptionally preserved fossils, with non-random sequences of character loss distorting the position of fossil taxa in phylogenetic trees. By characterising these sequences we can rewind this distortion and make better-informed interpretations of the affinity of enigmatic fossil taxa. Equally, the rate of character loss is crucial for estimating the preservation potential of phylogenetically informative characters, and for revealing the mechanisms of preservation themselves. However, experimental decay has been criticised for poorly modelling 'real' conditions, and dismissed as unsophisticated 'bucket science'. Here we test the effect of differing experimental parameters on the rate and sequence of decay. By doing so, we can test the assumption that the results of decay experiments are applicable to informing interpretations of exceptionally preserved fossils from diverse preservational settings. The results of our experiments demonstrate the validity of using the sequence of character loss as a phylogenetic tool, and shed light on the extent to which environment must be considered before making decay-informed interpretations, or reconstructing taphonomic pathways. With careful consideration of experimental design, driven by testable hypotheses, decay experiments are robust and informative - experimental taphonomy needn't kick the bucket just yet.
Human Genome Sequencing in Health and Disease
Gonzaga-Jauregui, Claudia; Lupski, James R.; Gibbs, Richard A.
2013-01-01
Following the “finished,” euchromatic, haploid human reference genome sequence, the rapid development of novel, faster, and cheaper sequencing technologies is making possible the era of personalized human genomics. Personal diploid human genome sequences have been generated, and each has contributed to our better understanding of variation in the human genome. We have consequently begun to appreciate the vastness of individual genetic variation from single nucleotide to structural variants. Translation of genome-scale variation into medically useful information is, however, in its infancy. This review summarizes the initial steps undertaken in clinical implementation of personal genome information, and describes the application of whole-genome and exome sequencing to identify the cause of genetic diseases and to suggest adjuvant therapies. Better analysis tools and a deeper understanding of the biology of our genome are necessary in order to decipher, interpret, and optimize clinical utility of what the variation in the human genome can teach us. Personal genome sequencing may eventually become an instrument of common medical practice, providing information that assists in the formulation of a differential diagnosis. We outline herein some of the remaining challenges. PMID:22248320
Page, Morgan T.; Van Der Elst, Nicholas; Hardebeck, Jeanne L.; Felzer, Karen; Michael, Andrew J.
2016-01-01
Following a large earthquake, seismic hazard can be orders of magnitude higher than the long‐term average as a result of aftershock triggering. Because of this heightened hazard, emergency managers and the public demand rapid, authoritative, and reliable aftershock forecasts. In the past, U.S. Geological Survey (USGS) aftershock forecasts following large global earthquakes have been released on an ad hoc basis with inconsistent methods, and in some cases aftershock parameters adapted from California. To remedy this, the USGS is currently developing an automated aftershock product based on the Reasenberg and Jones (1989) method that will generate more accurate forecasts. To better capture spatial variations in aftershock productivity and decay, we estimate regional aftershock parameters for sequences within the García et al. (2012) tectonic regions. We find that regional variations for mean aftershock productivity reach almost a factor of 10. We also develop a method to account for the time‐dependent magnitude of completeness following large events in the catalog. In addition to estimating average sequence parameters within regions, we develop an inverse method to estimate the intersequence parameter variability. This allows for a more complete quantification of the forecast uncertainties and Bayesian updating of the forecast as sequence‐specific information becomes available.
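The Reasenberg and Jones (1989) model named above gives the rate of aftershocks of magnitude M or larger at time t after a mainshock as 10^(a + b(Mmain - M)) / (t + c)^p. A minimal sketch with commonly quoted generic parameter values follows; the defaults are placeholders, not the regional parameters estimated in the study.

```python
def rj_rate(t_days, mag, mainshock_mag, a=-1.67, b=0.91, c=0.05, p=1.08):
    """Reasenberg-Jones rate (per day) of aftershocks with magnitude >= mag
    at time t_days after a mainshock of magnitude mainshock_mag."""
    return 10 ** (a + b * (mainshock_mag - mag)) / (t_days + c) ** p

def expected_aftershocks(t1, t2, mag, mainshock_mag, steps=10_000, **params):
    """Expected number of aftershocks >= mag between t1 and t2 days,
    by midpoint numerical integration of the rate."""
    dt = (t2 - t1) / steps
    return sum(rj_rate(t1 + (i + 0.5) * dt, mag, mainshock_mag, **params) * dt
               for i in range(steps))

# Expected number of M>=5 aftershocks in the week after an M7 mainshock
print(round(expected_aftershocks(0.0, 7.0, 5.0, 7.0), 2))
```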
Zandrino, Franco; La Paglia, Ernesto; Musante, Francesco
2010-01-01
To assess the diagnostic accuracy of magnetic resonance imaging in local staging of endometrial carcinoma, and to review the results and pitfalls described in the literature. Thirty women with a histological diagnosis of endometrial carcinoma underwent magnetic resonance imaging. Unenhanced T2-weighted and dynamic contrast-enhanced T1-weighted sequences were obtained. Hysterectomy and salpingo-oophorectomy were performed in all patients. Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy were calculated for the detection of deep myometrial and cervical infiltration. For deep myometrial infiltration, T2-weighted sequences reached a sensitivity of 85%, specificity of 76%, PPV of 73%, NPV of 87%, and accuracy of 80%, while contrast-enhanced scans reached a sensitivity of 90%, specificity of 80%, PPV of 82%, NPV of 89%, and accuracy of 85%. For cervical infiltration, T2-weighted sequences reached a sensitivity of 75%, specificity of 88%, PPV of 50%, NPV of 96%, and accuracy of 87%, while contrast-enhanced scans reached a sensitivity of 100%, specificity of 94%, PPV of 75%, NPV of 100%, and accuracy of 95%. Unenhanced and dynamic gadolinium-enhanced magnetic resonance imaging allows accurate assessment of myometrial and cervical infiltration. Information provided by magnetic resonance imaging can define prognosis and management.
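The five figures of merit reported above all derive from a 2x2 table of imaging results against the surgical reference standard; a minimal sketch of the calculation, with invented counts rather than the study's data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV, NPV and accuracy from true/false
    positive and negative counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# Invented counts: 9 true positives, 2 false positives,
# 16 true negatives, 3 false negatives
print(diagnostic_metrics(tp=9, fp=2, tn=16, fn=3))
```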
DNA barcoding of human-biting black flies (Diptera: Simuliidae) in Thailand.
Pramual, Pairot; Thaijarern, Jiraporn; Wongpakam, Komgrit
2016-12-01
Black flies (Diptera: Simuliidae) are important insect vectors and pests of humans and animals. Accurate identification, therefore, is important for control and management. In this study, we used mitochondrial cytochrome oxidase I (COI) barcoding sequences to test the efficiency of species identification for the human-biting black flies in Thailand. We used human-biting specimens because they enabled us to link information with previous studies involving the immature stages. Three black fly taxa, Simulium nodosum, S. nigrogilvum and S. doipuiense complex, were collected. The S. doipuiense complex was confirmed for the first time as having human-biting habits. The COI sequences revealed considerable genetic diversity in all three species. Comparisons to a COI sequence library of black flies in Thailand and in a public database indicated a high efficiency for specimen identification for S. nodosum and S. nigrogilvum, but this method was not successful for the S. doipuiense complex. Phylogenetic analyses revealed two divergent lineages in the S. doipuiense complex. Human-biting specimens formed a separate clade from other members of this complex. The results are consistent with the Barcoding Index Number System (BINs) analysis that found six BINs in the S. doipuiense complex. Further taxonomic work is needed to clarify the species status of these human-biting specimens. Copyright © 2016 Elsevier B.V. All rights reserved.
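Barcode-based identification of the kind described above typically compares a query COI sequence against a reference library using simple pairwise distances and assigns the query to the closest reference below a threshold. A minimal sketch of the uncorrected p-distance underlying such comparisons is given below; the sequences shown are invented placeholders, not COI barcodes from the study.

```python
def p_distance(seq1, seq2):
    """Uncorrected pairwise distance between two aligned barcodes:
    the proportion of compared sites that differ (gaps and ambiguous
    characters are skipped)."""
    valid = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    diffs = sum(a != b for a, b in valid)
    return diffs / len(valid)

# hypothetical query and reference fragments
query = "ATGGCATTTCCACGAATAAATAACATAAGATTTTGA"
reference = "ATGGCATTCCCACGAATAAATAATATAAGATTTTGA"
print(round(p_distance(query, reference), 3))  # 0.056; small distances suggest conspecifics
```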
DNA fingerprints in physical anthropology.
Weiss, Mark L
1989-01-01
Hypervariable minisatellite DNA is a recently described class of nuclear sequences with no known biological function. The minisatellites form a subtype of restriction fragment length polymorphisms possessing several characteristics particularly intriguing to anthropologists interested in forensics, sociobiology, primate conservation, genetic variability, and molecular evolution. These sequences occupy at least five dozen loci scattered throughout the human genome. Unlike many polymorphisms, many of the loci have numerous alleles, each present at similar frequencies. Such a genetic structure produces exceptionally high levels of heterozygosity and thus provides a tool for the individualization of tissue samples. Additionally, as the alleles are inherited in a Mendelian fashion, the minisatellites provide a superb tool for the identification of paternity (or maternity). Unlike standard blood groups, levels of variability are so high in populations studied to date that parentage can be established by inclusion rather than exclusion. Homologous sequences are shown to exist in a variety of Old World primates. Visualization of genetic fingerprints in nonhumans may allow for determination of paternity where the pool of potential sires is available, while also providing information on levels of genetic variability. These capabilities will ultimately provide for better management of primate colonies. Used in concert with behavioral data, a number of sociobiological questions will also become more amenable to investigation. Copyright © 1989 Wiley-Liss, Inc., A Wiley Company.
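The link drawn above between many similarly frequent alleles and high heterozygosity can be made concrete with the standard expected-heterozygosity formula. The sketch below computes H = 1 - sum(p_i^2) for two hypothetical allele-frequency profiles; the frequencies are invented for illustration.

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity at one locus: H = 1 - sum(p_i**2).
    Many alleles at similar frequencies push H toward 1, which is what
    makes minisatellite loci so informative for individualization."""
    return 1.0 - sum(p * p for p in allele_freqs)

# a hypothetical minisatellite locus with ten equally frequent alleles
print(expected_heterozygosity([0.1] * 10))   # 0.9
# versus a typical two-allele blood-group-like marker
print(expected_heterozygosity([0.7, 0.3]))   # 0.42
```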
MIPS: a database for protein sequences and complete genomes.
Mewes, H W; Hani, J; Pfeiffer, F; Frishman, D
1998-01-01
The MIPS group [Munich Information Center for Protein Sequences of the German National Center for Environment and Health (GSF)] at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, is involved in a number of data collection activities, including a comprehensive database of the yeast genome, a database reflecting the progress in sequencing the Arabidopsis thaliana genome, the systematic analysis of other small genomes and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). Through its WWW server (http://www.mips.biochem.mpg.de) MIPS provides access to a variety of generic databases, including a database of protein families as well as automatically generated data by the systematic application of sequence analysis algorithms. The yeast genome sequence and its related information were also compiled on CD-ROM to provide dynamic interactive access to the 16 chromosomes of the first eukaryotic genome unraveled. PMID:9399795
Ghangal, Rajesh; Raghuvanshi, Saurabh; Sharma, Prakash C
2012-02-01
A cDNA library was constructed from the mature leaves of seabuckthorn (Hippophae rhamnoides). Expressed Sequence Tags (ESTs) were generated by single-pass sequencing of 4500 cDNA clones. We submitted 3412 ESTs to dbEST of NCBI. Clustering of these ESTs yielded 1665 unigenes comprising 345 contigs and 1320 singletons. Of the 1665 unigenes, 1278 were annotated by similarity search, while the remaining 387 unannotated unigenes were considered organism specific. Gene Ontology (GO) analysis of the unigene dataset assigned 691 unigenes to biological processes, 727 to molecular functions, and 588 to the cellular component category. On the basis of similarity search and GO annotation, 43 unigenes were found to be responsive to biotic and abiotic stresses. To validate this observation, 13 genes known from previous studies in Arabidopsis to be associated with cold stress tolerance and 3 novel transcripts were examined by real-time RT-PCR to understand the change in expression pattern under cold/freeze stress. In silico analysis of microsatellite occurrence in these ESTs revealed 62 Simple Sequence Repeats (SSRs), some of which are being explored to assess genetic diversity among seabuckthorn collections. This is the first report of transcriptome data providing information about genes involved in managing plant abiotic stress in seabuckthorn, a plant known for its enormous medicinal and ecological value. Copyright © 2011 Elsevier Masson SAS. All rights reserved.
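The in silico SSR mining step described above can be approximated with a simple tandem-repeat scan over the EST sequences. The sketch below searches for 2-4 bp motifs repeated at least five times; the example sequence, motif lengths, and repeat threshold are illustrative assumptions rather than the criteria used in the study.

```python
import re

def find_ssrs(seq, min_repeats=5, motif_lengths=(2, 3, 4)):
    """Scan a nucleotide sequence for simple sequence repeats (SSRs):
    motifs of 2-4 bp repeated in tandem at least `min_repeats` times.
    Returns (start position, motif, repeat count) tuples."""
    hits = []
    for k in motif_lengths:
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq.upper()):
            hits.append((m.start(), m.group(1), len(m.group(0)) // k))
    return hits

# hypothetical EST fragment containing an (ATAG)5 and an (AAC)5 repeat
est = "GCGATAGATAGATAGATAGATAGCCTTTAACAACAACAACAACAGG"
print(find_ssrs(est))  # reports the AAC x5 and ATAG x5 repeats
```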
When is it MODY? Challenges in the Interpretation of Sequence Variants in MODY Genes
Althari, Sara; Gloyn, Anna L.
2015-01-01
The genomics revolution has raised more questions than it has provided answers. Big data from large population-scale resequencing studies are increasingly deconstructing classic notions of Mendelian disease genetics, which support a simplistic correlation between mutational severity and phenotypic outcome. The boundaries are being blurred as the body of evidence showing monogenic disease-causing alleles in healthy genomes, and in the genomes of individuals with increased common complex disease risk, continues to grow. In this review, we focus on the newly emerging challenges which pertain to the interpretation of sequence variants in genes implicated in the pathogenesis of maturity-onset diabetes of the young (MODY), a presumed monogenic form of diabetes characterized by Mendelian inheritance. These challenges highlight the complexities surrounding the assignment of pathogenicity, in particular to rare protein-altering variants, and bring to the forefront some profound clinical diagnostic implications. As MODY is both genetically and clinically heterogeneous, an accurate molecular diagnosis and cautious extrapolation of sequence data are critical to effective disease management and treatment. The biological and translational value of sequence information can only be attained by adopting a multitude of confirmatory analyses, which interrogate variant implication in disease from every possible angle. Indeed, studies which have effectively detected rare damaging variants in known MODY genes in normoglycemic individuals question the existence of a single-gene mutation scenario: does monogenic diabetes exist when the genetic culprits of MODY have been systematically identified in individuals without MODY? PMID:27111119
Feasibility of 3.0T pelvic MR imaging in the evaluation of endometriosis.
Manganaro, L; Fierro, F; Tomei, A; Irimia, D; Lodise, P; Sergi, M E; Vinci, V; Sollazzo, P; Porpora, M G; Delfini, R; Vittori, G; Marini, M
2012-06-01
Endometriosis represents an important clinical problem in women of reproductive age, with a high impact on quality of life, work productivity and health care management. The aim of this study is to define the role of 3T Magnetom system MRI in the evaluation of endometriosis. Forty-six women, with transvaginal (TV) ultrasound examination positive for endometriosis, with pelvic pain, or infertile, underwent an MR 3.0T examination with the following protocol: T2-weighted FRFSE HR sequences, T2-weighted FRFSE HR CUBE 3D sequences, T1-weighted FSE sequences, and LAVA-flex sequences. Pelvic anatomy, macroscopic endometriosis implants, deep endometriosis implants, fallopian tube involvement, presence of adhesions, fluid effusion in the pouch of Douglas, associated uterine and renal pathologies or anomalies, and sacral nerve roots were assessed by two radiologists in consensus. Laparoscopy was considered the gold standard. MRI diagnosed deep endometriosis in 22/46 patients and endometriomas not associated with deep implants in 9/46 patients; 15/46 patients were negative for endometriosis, and 11 of the 22 patients with deep endometriosis had ovarian endometriotic cysts. We obtained high sensitivity (96.97%), specificity (100.00%), PPV (100.00%), and NPV (92.86%). Pelvic MRI performed with a 3T system guarantees high spatial and contrast resolution, providing accurate information about endometriosis implants, with good pre-surgical mapping of lesions involving the bowel, bladder surface and recto-uterine ligaments. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Cell illustrator 4.0: a computational platform for systems biology.
Nagasaki, Masao; Saito, Ayumu; Jeong, Euna; Li, Chen; Kojima, Kaname; Ikeda, Emi; Miyano, Satoru
2011-01-01
Cell Illustrator is a software platform for Systems Biology that uses the concept of the Petri net for modeling and simulating biopathways. It is intended for biological scientists working at the bench. The latest version, Cell Illustrator 4.0, uses Java Web Start technology and is enhanced with new capabilities, including: automatic graph grid layout algorithms using ontology information; tools using Cell System Markup Language (CSML) 3.0 and Cell System Ontology 3.0; a parameter search module; a high-performance simulation module; a CSML database management system; conversion from CSML models to programming languages (FORTRAN, C, C++, Java, Python and Perl); import from SBML, CellML, and BioPAX; and export to SVG and HTML. Cell Illustrator employs an extension of the hybrid Petri net in an object-oriented style so that biopathway models can include objects such as DNA sequence, molecular density, 3D localization information, transcription with frame-shift, translation with codon table, as well as biochemical reactions.
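The Petri net concept underlying the platform can be illustrated with a minimal discrete example: places hold token counts and a transition fires when all of its input places are sufficiently marked. The toy enzymatic pathway, transition names, and marking below are invented for illustration; Cell Illustrator itself uses a richer hybrid (discrete/continuous) formalism.

```python
# places -> token counts; transitions consume inputs and produce outputs
transitions = {
    "bind":     {"inputs": {"enzyme": 1, "substrate": 1}, "outputs": {"complex": 1}},
    "catalyze": {"inputs": {"complex": 1},                "outputs": {"enzyme": 1, "product": 1}},
}
marking = {"enzyme": 1, "substrate": 3, "complex": 0, "product": 0}

def fire(name, marking):
    """Fire a transition if enabled: consume input tokens, produce output tokens."""
    t = transitions[name]
    if all(marking[p] >= n for p, n in t["inputs"].items()):
        for p, n in t["inputs"].items():
            marking[p] -= n
        for p, n in t["outputs"].items():
            marking[p] += n
        return True
    return False

while fire("bind", marking) and fire("catalyze", marking):
    pass
print(marking)  # all substrate converted to product
```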
NASA Astrophysics Data System (ADS)
Phan, Sieu; Famili, Fazel; Liu, Ziying; Peña-Castillo, Lourdes
The advancement of omics technologies, in concert with enabling developments in information technology, has accelerated biological research into a new realm of speed and sophistication. The shift from limited single-gene assays to high-throughput microarray assays, and from the laborious manual counting of base pairs to robot-assisted machinery in genome sequencing, are two examples. More sophisticated still, recent developments in literature mining and artificial intelligence have allowed researchers to construct complex gene networks, unraveling many formidable biological puzzles. To harness these emerging technologies to their full potential for medical applications, the Bio-intelligence program at the Institute for Information Technology, National Research Council Canada, aims to develop and exploit artificial intelligence and bioinformatics technologies to facilitate the development of intelligent decision support tools and systems to improve patient care - for early detection, accurate diagnosis/prognosis of disease, and better personalized therapeutic management.
GSDC: A Unique Data Center in Korea for HEP research
NASA Astrophysics Data System (ADS)
Ahn, Sang-Un
2017-04-01
Global Science experimental Data hub Center (GSDC) at the Korea Institute of Science and Technology Information (KISTI) is a unique data center in South Korea, established to promote fundamental research fields by supporting them with expertise in Information and Communication Technology (ICT) and infrastructure for High Performance Computing (HPC), High Throughput Computing (HTC) and networking. GSDC has supported various research fields in South Korea dealing with large-scale data, e.g. the RENO experiment for neutrino research, the LIGO experiment for gravitational wave detection, genome sequencing projects for biomedical research, and HEP experiments such as CDF at FNAL, Belle at KEK, and STAR at BNL. In particular, GSDC has run a Tier-1 center for the ALICE experiment using the LHC at CERN since 2013. In this talk, we present an overview of the computing infrastructure that GSDC operates for these research fields and discuss the data center infrastructure management system deployed at GSDC.
Automatic processing of spoken dialogue in the home hemodialysis domain.
Lacson, Ronilda; Barzilay, Regina
2005-01-01
Spoken medical dialogue is a valuable source of information, and it forms a foundation for diagnosis, prevention and therapeutic management. However, understanding even a perfect transcript of spoken dialogue is challenging for humans because of the lack of structure and the verbosity of dialogues. This work presents a first step towards automatic analysis of spoken medical dialogue. The backbone of our approach is an abstraction of a dialogue into a sequence of semantic categories. This abstraction uncovers structure in informal, verbose conversation between a caregiver and a patient, thereby facilitating automatic processing of dialogue content. Our method induces this structure based on a range of linguistic and contextual features that are integrated in a supervised machine-learning framework. Our model has a classification accuracy of 73%, compared to 33% achieved by a majority baseline (p<0.01). This work demonstrates the feasibility of automatically processing spoken medical dialogue.
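The abstraction of dialogue turns into semantic categories described above is, at its core, a supervised text-classification task. The sketch below, assuming scikit-learn is available, trains a bag-of-words classifier on a few invented caregiver-patient turns; the categories, utterances, and feature set are illustrative stand-ins for the richer linguistic and contextual features used in the study.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# hypothetical caregiver-patient turns labeled with hypothetical semantic categories
turns = [
    "my blood pressure was one forty over ninety this morning",
    "did you take the phosphate binder with dinner",
    "the machine alarmed twice during the last exchange",
    "try lowering the dwell time and call me if it alarms again",
]
labels = ["clinical_datum", "medication", "technical_issue", "advice"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(turns, labels)
print(clf.predict(["the pump alarmed again tonight"]))
```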
De Bellis, Fabien; Malapa, Roger; Kagy, Valérie; Lebegin, Stéphane; Billot, Claire; Labouisse, Jean-Pierre
2016-08-01
Using next-generation sequencing technology, new microsatellite loci were characterized in Artocarpus altilis (Moraceae) and two congeners to increase the number of available markers for genotyping breadfruit cultivars. A total of 47,607 simple sequence repeat loci were obtained by sequencing a library of breadfruit genomic DNA with an Illumina MiSeq system. Among them, 50 single-locus markers were selected and assessed using 41 samples (39 A. altilis, one A. camansi, and one A. heterophyllus). All loci were polymorphic in A. altilis, 44 in A. camansi, and 21 in A. heterophyllus. The number of alleles per locus ranged from two to 19. The new markers will be useful for assessing the identity and genetic diversity of breadfruit cultivars on a small geographical scale, gaining a better understanding of farmer management practices, and will help to optimize breadfruit genebank management.
Neuwald, Andrew F
2009-08-01
The patterns of sequence similarity and divergence present within functionally diverse, evolutionarily related proteins contain implicit information about corresponding biochemical similarities and differences. A first step toward accessing such information is to statistically analyze these patterns, which, in turn, requires that one first identify and accurately align a very large set of protein sequences. Ideally, the set should include many distantly related, functionally divergent subgroups. Because it is extremely difficult, if not impossible for fully automated methods to align such sequences correctly, researchers often resort to manual curation based on detailed structural and biochemical information. However, multiply-aligning vast numbers of sequences in this way is clearly impractical. This problem is addressed using Multiply-Aligned Profiles for Global Alignment of Protein Sequences (MAPGAPS). The MAPGAPS program uses a set of multiply-aligned profiles both as a query to detect and classify related sequences and as a template to multiply-align the sequences. It relies on Karlin-Altschul statistics for sensitivity and on PSI-BLAST (and other) heuristics for speed. Using as input a carefully curated multiple-profile alignment for P-loop GTPases, MAPGAPS correctly aligned weakly conserved sequence motifs within 33 distantly related GTPases of known structure. By comparison, the sequence- and structurally based alignment methods hmmalign and PROMALS3D misaligned at least 11 and 23 of these regions, respectively. When applied to a dataset of 65 million protein sequences, MAPGAPS identified, classified and aligned (with comparable accuracy) nearly half a million putative P-loop GTPase sequences. A C++ implementation of MAPGAPS is available at http://mapgaps.igs.umaryland.edu. Supplementary data are available at Bioinformatics online.
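The idea of using a multiple alignment both to detect related sequences and to classify them can be illustrated with a toy position-specific scoring profile. The sketch below builds a log-odds profile from a handful of hypothetical P-loop-like motif strings and scores queries against it; the motif strings, pseudocount, and uniform background are assumptions for illustration and do not reproduce MAPGAPS or its Karlin-Altschul statistics.

```python
import math
from collections import Counter

def build_profile(aligned_seqs, pseudocount=1.0):
    """Column-wise log-odds profile from a gap-free multiple alignment,
    with a uniform background over the 20 amino acids."""
    alphabet = "ACDEFGHIKLMNPQRSTVWY"
    profile = []
    for i in range(len(aligned_seqs[0])):
        counts = Counter(s[i] for s in aligned_seqs if s[i] in alphabet)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        profile.append({aa: math.log(((counts.get(aa, 0) + pseudocount) / total)
                                     / (1 / len(alphabet)))
                        for aa in alphabet})
    return profile

def score(profile, seq):
    """Log-odds score of an ungapped query against the profile."""
    return sum(col.get(aa, 0.0) for col, aa in zip(profile, seq))

walker_a = ["GKSGSGKS", "GKTGTGKT", "GKSGSGKT", "GKTGAGKS"]  # hypothetical motif columns
prof = build_profile(walker_a)
print(score(prof, "GKSGSGKS") > score(prof, "AAAAAAAA"))  # True: motif scores higher
```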
Implementation of Quality Management in Core Service Laboratories
Creavalle, T.; Haque, K.; Raley, C.; Subleski, M.; Smith, M.W.; Hicks, B.
2010-01-01
CF-28 The Genetics and Genomics group of the Advanced Technology Program of SAIC-Frederick exists to bring innovative genomic expertise, tools and analysis to NCI and the scientific community. The Sequencing Facility (SF) provides next generation short read (Illumina) sequencing capacity to investigators using a streamlined production approach. The Laboratory of Molecular Technology (LMT) offers a wide range of genomics core services including microarray expression analysis, miRNA analysis, array comparative genome hybridization, long read (Roche) next generation sequencing, quantitative real time PCR, transgenic genotyping, Sanger sequencing, and clinical mutation detection services to investigators from across the NIH. As the technology supporting this genomic research becomes more complex, the need for basic quality processes within all aspects of the core service groups becomes critical. The Quality Management group works alongside members of these labs to establish or improve processes supporting operations control (equipment, reagent and materials management), process improvement (reengineering/optimization, automation, acceptance criteria for new technologies and tech transfer), and quality assurance and customer support (controlled documentation/SOPs, training, service deficiencies and continual improvement efforts). Implementation and expansion of quality programs within unregulated environments demonstrates SAIC-Frederick's dedication to providing the highest quality products and services to the NIH community.
Configuring the Orion Guidance, Navigation, and Control Flight Software for Automated Sequencing
NASA Technical Reports Server (NTRS)
Odegard, Ryan G.; Siliwinski, Tomasz K.; King, Ellis T.; Hart, Jeremy J.
2010-01-01
The Orion Crew Exploration Vehicle is being designed with greater automation capabilities than any other crewed spacecraft in NASA's history. The Guidance, Navigation, and Control (GN&C) flight software architecture is designed to provide a flexible and evolvable framework that accommodates increasing levels of automation over time. Within the GN&C flight software, a data-driven approach is used to configure software. This approach allows data reconfiguration and updates to automated sequences without requiring recompilation of the software. Because of the great dependency of the automation and the flight software on the configuration data, the data management is a vital component of the processes for software certification, mission design, and flight operations. To enable the automated sequencing and data configuration of the GN&C subsystem on Orion, a desktop database configuration tool has been developed. The database tool allows the specification of the GN&C activity sequences, the automated transitions in the software, and the corresponding parameter reconfigurations. These aspects of the GN&C automation on Orion are all coordinated via data management, and the database tool provides the ability to test the automation capabilities during the development of the GN&C software. In addition to providing the infrastructure to manage the GN&C automation, the database tool has been designed with capabilities to import and export artifacts for simulation analysis and documentation purposes. Furthermore, the database configuration tool, currently used to manage simulation data, is envisioned to evolve into a mission planning tool for generating and testing GN&C software sequences and configurations. A key enabler of the GN&C automation design, the database tool allows both the creation and maintenance of the data artifacts, as well as serving the critical role of helping to manage, visualize, and understand the data-driven parameters both during software development and throughout the life of the Orion project.
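The data-driven approach described above, in which automated sequences and transitions live in configuration data rather than compiled code, can be illustrated with a small sketch. The JSON schema, activity names, and trigger events below are invented for illustration and do not reflect the Orion GN&C database format.

```python
import json

# Toy data-driven sequencing: the activity sequence and mode transitions are
# defined in data, so they can be updated without recompiling the software.
sequence_config = json.loads("""
{
  "activities": [
    {"name": "coast",        "next": "deorbit_burn", "trigger": "burn_ignition_time"},
    {"name": "deorbit_burn", "next": "entry",        "trigger": "burn_complete"},
    {"name": "entry",        "next": "chute_deploy", "trigger": "below_deploy_altitude"}
  ]
}
""")

def next_activity(current, event, config=sequence_config):
    """Advance the sequence when the configured trigger event arrives."""
    for activity in config["activities"]:
        if activity["name"] == current and activity["trigger"] == event:
            return activity["next"]
    return current  # no transition defined for this event

print(next_activity("coast", "burn_ignition_time"))  # deorbit_burn
```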
Phylogeny and Haplotype Analysis of Fungi Within the Fusarium incarnatum-equiseti Species Complex.
Ramdial, H; Latchoo, R K; Hosein, F N; Rampersad, S N
2017-01-01
Fusarium spp. are ranked among the top 10 most economically and scientifically important plant-pathogenic fungi in the world and are associated with plant diseases that include fruit decay of a number of crops. Fusarium isolates infecting bell pepper in Trinidad were identified based on sequence comparisons of the translation elongation factor gene (EF-1a) with sequences of Fusarium incarnatum-equiseti species complex (FIESC) verified in the FUSARIUM-ID database. Eighty-two isolates were identified as belonging to one of four phylogenetic species within the subclades FIESC-1, FIESC-15, FIESC-16, and FIESC-26, with the majority of isolates belonging to FIESC-15. A comparison of the level of DNA polymorphism and phylogenetic inference for sequences of the internal transcribed spacer region (ITS1-5.8S-ITS2) and EF-1a sequences for Trinidad and FUSARIUM-ID type species was carried out. The ITS sequences were less informative, had lower haplotype diversity and restricted haplotype distribution, and resulted in poor resolution and taxa placement in the consensus maximum-likelihood tree. EF-1a sequences enabled strongly supported phylogenetic inference with highly resolved branching patterns of the 30 phylogenetic species within the FIESC and placement of representative Trinidad isolates. Therefore, global phylogeny was inferred from EF-1a sequences representing 11 countries, and separation into distinct Incarnatum and Equiseti clades was again evident. In total, 42 haplotypes were identified: 12 were shared and the remaining were unique haplotypes. The most diverse haplotype was represented by sequences from China, Indonesia, Malaysia, and Trinidad and consisted exclusively of F. incarnatum isolates. Spain had the highest haplotype diversity, perhaps because both F. equiseti and F. incarnatum sequences were represented; followed by the United States, which contributed both F. equiseti and F. incarnatum sequences to the data set; then by countries representing Southeast Asia (China, Indonesia, Malaysia, Thailand, and Philippines) and Trinidad; both of these regions were represented by only F. incarnatum sequences. Trinidad shared two haplotypes with China and one haplotype with the United States for only F. incarnatum isolates. The findings of this study are important for devising disease management strategies and for understanding the phylogenetic relationships among members of the FIESC.
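The haplotype-diversity comparisons reported above follow the standard Nei gene-diversity statistic computed from haplotype frequencies. A minimal sketch is shown below; the haplotype labels are hypothetical, not the EF-1a haplotypes from the dataset.

```python
from collections import Counter

def haplotype_diversity(haplotype_labels):
    """Nei's haplotype (gene) diversity: Hd = n/(n-1) * (1 - sum(x_i**2)),
    where x_i are haplotype frequencies in a sample of size n."""
    n = len(haplotype_labels)
    freqs = [count / n for count in Counter(haplotype_labels).values()]
    return n / (n - 1) * (1.0 - sum(x * x for x in freqs))

# hypothetical sample of eight sequences carrying four haplotypes
sample = ["H1", "H1", "H2", "H3", "H3", "H3", "H4", "H1"]
print(round(haplotype_diversity(sample), 3))  # 0.786
```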
Next-generation digital information storage in DNA.
Church, George M; Gao, Yuan; Kosuri, Sriram
2012-09-28
Digital information is accumulating at an astounding rate, straining our ability to store and archive it. DNA is among the most dense and stable information media known. The development of new technologies in both DNA synthesis and sequencing make DNA an increasingly feasible digital storage medium. We developed a strategy to encode arbitrary digital information in DNA, wrote a 5.27-megabit book using DNA microchips, and read the book by using next-generation DNA sequencing.
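The encoding strategy in this work mapped digital information onto bases at one bit per base (0 to A or C, 1 to G or T), with the freedom within each pair used to avoid problematic sequences such as homopolymer runs. The sketch below illustrates that one-bit-per-base idea in simplified form; the random choice within each base pair and the example message are illustrative, not the authors' actual encoding pipeline.

```python
import random

# 0 -> A or C, 1 -> G or T; here the base within each pair is chosen at random
def encode(bits):
    return "".join(random.choice("AC") if b == "0" else random.choice("GT") for b in bits)

def decode(dna):
    return "".join("0" if base in "AC" else "1" for base in dna)

message = format(ord("H"), "08b")   # "01001000", first byte of a hypothetical text
dna = encode(message)
assert decode(dna) == message
print(message, "->", dna)
```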
Dlugosch, Katrina M.; Lai, Zhao; Bonin, Aurélie; Hierro, José; Rieseberg, Loren H.
2013-01-01
Transcriptome sequences are becoming more broadly available for multiple individuals of the same species, providing opportunities to derive population genomic information from these datasets. Using the 454 Life Sciences Genome Sequencer FLX and FLX-Titanium next-generation platforms, we generated 11–430 Mbp of sequence for normalized cDNA for 40 wild genotypes of the invasive plant Centaurea solstitialis, yellow starthistle, from across its worldwide distribution. We examined the impact of sequencing effort on transcriptome recovery and overlap among individuals. To do this, we developed two novel publicly available software pipelines: SnoWhite for read cleaning before assembly, and AllelePipe for clustering of loci and allele identification in assembled datasets with or without a reference genome. AllelePipe is designed specifically for cases in which read depth information is not appropriate or available to assist with disentangling closely related paralogs from allelic variation, as in transcriptome or previously assembled libraries. We find that modest applications of sequencing effort recover most of the novel sequences present in the transcriptome of this species, including single-copy loci and a representative distribution of functional groups. In contrast, the coverage of variable sites, observation of heterozygosity, and overlap among different libraries are all highly dependent on sequencing effort. Nevertheless, the information gained from overlapping regions was informative regarding coarse population structure and variation across our small number of population samples, providing the first genetic evidence in support of hypothesized invasion scenarios. PMID:23390612