obtain sequence information: Topics by Science.gov

Sample records for obtain sequence information

Identifying functionally informative evolutionary sequence profiles.

PubMed

Gil, Nelson; Fiser, Andras

2018-04-15

Multiple sequence alignments (MSAs) can provide essential input to many bioinformatics applications, including protein structure prediction and functional annotation. However, the optimal selection of sequences to obtain biologically informative MSAs for such purposes is poorly explored, and has traditionally been performed manually. We present Selection of Alignment by Maximal Mutual Information (SAMMI), an automated, sequence-based approach to objectively select an optimal MSA from a large set of alternatives sampled from a general sequence database search. The hypothesis of this approach is that the mutual information among MSA columns will be maximal for those MSAs that contain the most diverse set possible of the most structurally and functionally homogeneous protein sequences. SAMMI was tested to select MSAs for functional site residue prediction by analysis of conservation patterns on a set of 435 proteins obtained from protein-ligand (peptides, nucleic acids and small substrates) and protein-protein interaction databases. Availability and implementation: A freely accessible program, including source code, implementing SAMMI is available at https://github.com/nelsongil92/SAMMI.git. Contact: andras.fiser@einstein.yu.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
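The quantity this abstract is built around, mutual information among MSA columns, can be illustrated with a minimal Python sketch. This is not the SAMMI implementation. The function names, the toy alignment, and the choice to average over all column pairs are assumptions made only for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def column_mutual_information(col_a, col_b):
    """Mutual information (in bits) between two equally long alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in count_ab.items():
        p_joint = count / n
        # p_joint / (p_a * p_b), written with raw counts
        mi += p_joint * math.log2(count * n / (count_a[a] * count_b[b]))
    return mi


def mean_pairwise_mi(msa):
    """Average mutual information over all pairs of columns (illustrative score only)."""
    columns = list(zip(*msa))
    pairs = list(combinations(range(len(columns)), 2))
    total = sum(column_mutual_information(columns[i], columns[j]) for i, j in pairs)
    return total / len(pairs)


# Hypothetical toy alignment: columns 1 and 3 co-vary, columns 2 and 4 are constant.
toy_msa = ["ACDE", "ACDE", "GCHE", "GCHE"]
print(mean_pairwise_mi(toy_msa))
```

In the toy alignment only one of the six column pairs co-varies (contributing 1 bit), so the average is 1/6 bit. A selection procedure in the spirit of the abstract would compare such a score across candidate alignments.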
Calculating the quality of public high-throughput sequencing data to obtain a suitable subset for reanalysis from the Sequence Read Archive.

PubMed

Ohta, Tazro; Nakazato, Takeru; Bono, Hidemasa

2017-06-01

It is important for public data repositories to promote the reuse of archived data. In the growing field of omics science, however, the increasing number of submissions of high-throughput sequencing (HTSeq) data to public repositories prevents users from choosing a suitable data set from among the large number of search results. Repository users need to be able to set a threshold to reduce the number of results to obtain a suitable subset of high-quality data for reanalysis. We calculated the quality of sequencing data archived in a public data repository, the Sequence Read Archive (SRA), by using the quality control software FastQC. We obtained quality values for 1 171 313 experiments, which can be used to evaluate the suitability of data for reuse. We also visualized the data distribution in SRA by integrating the quality information and metadata of experiments and samples. We provide quality information for all of the archived sequencing data, which enables users to obtain sequencing data of sufficient quality for reanalyses. The calculated quality data are available to the public in various formats. Our data also provide an example of enhancing the reuse of public data by adding metadata to published research data by a third party. © The Authors 2017. Published by Oxford University Press.
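The idea of setting a quality threshold to obtain a reusable subset can be sketched as follows. This is a minimal illustration, not the authors' pipeline, and it does not call FastQC. It assumes Phred+33 encoded FASTQ quality strings, and the experiment identifiers and the threshold value are hypothetical.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of one read, assuming Phred+33 encoding."""
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)


def filter_by_quality(experiments, threshold):
    """Return the IDs of experiments whose mean per-read quality meets the threshold.

    `experiments` maps a (hypothetical) experiment ID to a list of quality strings.
    """
    kept = []
    for experiment_id, quality_strings in experiments.items():
        per_read = [mean_phred(q) for q in quality_strings]
        if sum(per_read) / len(per_read) >= threshold:
            kept.append(experiment_id)
    return kept


# Hypothetical toy data: 'I' encodes Phred 40 and '#' encodes Phred 2.
toy_experiments = {
    "EXP_A": ["IIII", "IIII"],
    "EXP_B": ["####", "II##"],
}
print(filter_by_quality(toy_experiments, threshold=30))  # → ['EXP_A']
```

A real workflow would more likely read precomputed quality values, such as those the authors publish, than recompute them from raw reads.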
Calculating the quality of public high-throughput sequencing data to obtain a suitable subset for reanalysis from the Sequence Read Archive

PubMed Central

Nakazato, Takeru; Bono, Hidemasa

2017-01-01

It is important for public data repositories to promote the reuse of archived data. In the growing field of omics science, however, the increasing number of submissions of high-throughput sequencing (HTSeq) data to public repositories prevents users from choosing a suitable data set from among the large number of search results. Repository users need to be able to set a threshold to reduce the number of results to obtain a suitable subset of high-quality data for reanalysis. We calculated the quality of sequencing data archived in a public data repository, the Sequence Read Archive (SRA), by using the quality control software FastQC. We obtained quality values for 1 171 313 experiments, which can be used to evaluate the suitability of data for reuse. We also visualized the data distribution in SRA by integrating the quality information and metadata of experiments and samples. We provide quality information for all of the archived sequencing data, which enables users to obtain sequencing data of sufficient quality for reanalyses. The calculated quality data are available to the public in various formats. Our data also provide an example of enhancing the reuse of public data by adding metadata to published research data by a third party. PMID:28449062
Obtaining a more resolute teleost growth hormone phylogeny by the introduction of gaps in sequence alignment.

PubMed

Rubin, D A; Dores, R M

1995-06-01

In order to obtain a more resolute phylogeny of teleosts based on growth hormone (GH) sequences, phylogenetic analyses were performed in which deletions (gaps), which appear to be order specific, were upheld to maintain GH's structural information. Sequences were analyzed at 194 amino acid positions. In addition, the two closest genealogically related groups to the teleosts, Amia calva and Acipenser guldenstadti, were used as outgroups. Modified sequence alignments were also analyzed to determine clade stability. Analyses indicated, in the most parsimonious cladogram, that molecular and morphological relationships for the orders of fishes are congruent. With GH molecular sequence data it was possible to resolve all clades at the familial level. Analyses of the primary sequence data indicate that: (a) the halecomorphean and chondrostean GH sequences are the appropriate outgroups for generating the most parsimonious cladogram for teleosts; (b) proper alignment of teleost GH sequence by the inclusion of gaps is necessary for resolution of the Percomorpha; and (c) removal of sequence information by deleting improperly aligned sequence decreases the phylogenetic signal obtained.
Elman RNN based classification of proteins sequences on account of their mutual information.

PubMed

Mishra, Pooja; Nath Pandey, Paras

2012-10-21

In the present work, we estimate residue correlation within protein sequences by using the mutual information (MI) of adjacent residues, based on structural and solvent accessibility properties of amino acids. The long-range correlation between nonadjacent residues is improved by constructing a mutual information vector (MIV) for each protein sequence, so that each protein sequence is associated with its corresponding MIVs. These MIVs are given to an Elman RNN to obtain the classification of protein sequences. The modeling power of MIVs was shown to be significantly better, giving a new approach towards alignment-free classification of protein sequences. We also conclude that MIVs based on structural and solvent accessibility properties are better predictors. Copyright © 2012 Elsevier Ltd. All rights reserved.
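One possible reading of a mutual information vector for a single sequence is sketched below in Python. It is not the authors' construction. The two-class residue grouping, the use of separations 1 to D as vector entries, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter

# Hypothetical two-class grouping standing in for a structural or
# solvent-accessibility property; the paper's actual classes are not given here.
HYDROPHOBIC = set("AVLIMFWC")


def to_classes(sequence):
    """Map each residue to 'H' (hydrophobic) or 'P' (other)."""
    return "".join("H" if residue in HYDROPHOBIC else "P" for residue in sequence)


def mutual_information_at_distance(symbols, distance):
    """Mutual information (bits) between symbols separated by `distance` positions."""
    pairs = [(symbols[i], symbols[i + distance]) for i in range(len(symbols) - distance)]
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum(
        (count / n) * math.log2(count * n / (left[a] * right[b]))
        for (a, b), count in joint.items()
    )


def mutual_information_vector(sequence, max_distance):
    """Vector of MI values for separations 1..max_distance (adjacent residues first)."""
    symbols = to_classes(sequence)
    return [mutual_information_at_distance(symbols, d) for d in range(1, max_distance + 1)]


# Strictly alternating toy sequence: every separation is fully predictable.
print(mutual_information_vector("AKAKAKAK", max_distance=2))
```

Such a fixed-length vector could then be passed to any classifier. The Elman RNN used in the paper is not reproduced here.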
Sequence information gain based motif analysis.

PubMed

Maynou, Joan; Pairó, Erola; Marco, Santiago; Perera, Alexandre

2015-11-09

The detection of regulatory regions in candidate sequences is essential for the understanding of the regulation of a particular gene and the mechanisms involved. This paper proposes a novel methodology based on information theoretic metrics for finding regulatory sequences in promoter regions. This methodology (SIGMA) has been tested on genomic sequence data for Homo sapiens and Mus musculus. SIGMA has been compared with different publicly available alternatives for motif detection, such as MEME/MAST, Biostrings (Bioconductor package), MotifRegressor, and previous work such as Qresiduals projections or information theoretic based detectors. Comparative results, in the form of Receiver Operating Characteristic curves, show that, in 70% of the studied Transcription Factor Binding Sites, the SIGMA detector performs better and behaves more robustly than the methods compared, while having a similar computational time. The performance of SIGMA can be explained by its parametric simplicity in the modelling of the non-linear co-variability in the binding motif positions. Sequence Information Gain based Motif Analysis is a generalisation of a non-linear model of the cis-regulatory sequences detection based on Information Theory. This generalisation allows us to detect transcription factor binding sites with maximum performance disregarding the covariability observed in the positions of the training set of sequences. SIGMA is freely available to the public at http://b2slab.upc.edu.
Complete Genome Sequence of Mycobacterium marinum ATCC 927T, Obtained Using Nanopore and Illumina Sequencing Technologies.

PubMed

Yoshida, Mitsunori; Fukano, Hanako; Miyamoto, Yuji; Shibayama, Keigo; Suzuki, Masato; Hoshino, Yoshihiko

2018-05-17

Mycobacterium marinum is a slowly growing, broad-host-range mycobacterial species. Here, we report the complete genome sequence of a Mycobacterium marinum type strain that was isolated from tubercles of diseased fish. This sequence will provide essential information for future taxonomic and comparative genome studies of its relatives. Copyright © 2018 Yoshida et al.
When are pathogen genome sequences informative of transmission events?

PubMed Central

Ferguson, Neil; Jombart, Thibaut

2018-01-01

Recent years have seen the development of numerous methodologies for reconstructing transmission trees in infectious disease outbreaks from densely sampled whole genome sequence data. However, a fundamental and as of yet poorly addressed limitation of such approaches is the requirement for genetic diversity to arise on epidemiological timescales. Specifically, the position of infected individuals in a transmission tree can only be resolved by genetic data if mutations have accumulated between the sampled pathogen genomes. To quantify and compare the useful genetic diversity expected from genetic data in different pathogen outbreaks, we introduce here the concept of ‘transmission divergence’, defined as the number of mutations separating whole genome sequences sampled from transmission pairs. Using parameter values obtained by literature review, we simulate outbreak scenarios alongside sequence evolution using two models described in the literature to describe transmission divergence of ten major outbreak-causing pathogens. We find that while mean values vary significantly between the pathogens considered, their transmission divergence is generally very low, with many outbreaks characterised by large numbers of genetically identical transmission pairs. We describe the impact of transmission divergence on our ability to reconstruct outbreaks using two outbreak reconstruction tools, the R packages outbreaker and phybreak, and demonstrate that, in agreement with previous observations, genetic sequence data of rapidly evolving pathogens such as RNA viruses can provide valuable information on individual transmission events. Conversely, sequence data of pathogens with lower mean transmission divergence, including Streptococcus pneumoniae, Shigella sonnei and Clostridium difficile, provide little to no information about individual transmission events. Our results highlight the informational limitations of genetic sequence data in certain outbreak scenarios, and
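The definition of transmission divergence given in this abstract, the number of mutations separating genomes sampled from a transmission pair, can be sketched in Python as below. The sketch assumes aligned genomes of equal length and counts substitutions only. The case identifiers, toy genomes and summary fields are hypothetical, and the outbreak simulation described in the abstract is not reproduced.

```python
def transmission_divergence(genome_a, genome_b):
    """Number of differing positions between two aligned, equal-length genomes."""
    if len(genome_a) != len(genome_b):
        raise ValueError("genomes must be aligned to the same length")
    return sum(1 for x, y in zip(genome_a, genome_b) if x != y)


def divergence_summary(genomes, transmission_pairs):
    """Mean divergence and fraction of genetically identical transmission pairs."""
    distances = [
        transmission_divergence(genomes[infector], genomes[infectee])
        for infector, infectee in transmission_pairs
    ]
    identical = sum(1 for d in distances if d == 0)
    return {
        "mean": sum(distances) / len(distances),
        "fraction_identical": identical / len(distances),
    }


# Hypothetical toy outbreak: a chain of three transmission events.
toy_genomes = {
    "case1": "ACGTACGT",
    "case2": "ACGTACGT",
    "case3": "ACGTACGA",
    "case4": "ACCTACGA",
}
toy_pairs = [("case1", "case2"), ("case2", "case3"), ("case3", "case4")]
print(divergence_summary(toy_genomes, toy_pairs))
```

A pair with zero divergence, such as case1 and case2 here, carries no genetic signal about who infected whom, which is the limitation the abstract quantifies.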
Location of core diagnostic information across various sequences in brain MRI and implications for efficiency of MRI scanner utilization.

PubMed

Sharma, Aseem; Chatterjee, Arindam; Goyal, Manu; Parsons, Matthew S; Bartel, Seth

2015-04-01

Targeting redundancy within MRI can improve its cost-effective utilization. We sought to quantify potential redundancy in our brain MRI protocols. In this retrospective review, we aggregated 207 consecutive adults who underwent brain MRI and reviewed their medical records to document clinical indication, core diagnostic information provided by MRI, and its clinical impact. Contributory imaging abnormalities constituted positive core diagnostic information whereas absence of imaging abnormalities constituted negative core diagnostic information. The senior author selected core sequences deemed sufficient for extraction of core diagnostic information. For validating core sequences selection, four readers assessed the relative ease of extracting core diagnostic information from the core sequences. Potential redundancy was calculated by comparing the average number of core sequences to the average number of sequences obtained. Scanning had been performed using 9.4±2.8 sequences over 37.3±12.3 minutes. Core diagnostic information was deemed extractable from 2.1±1.1 core sequences, with an assumed scanning time of 8.6±4.8 minutes, reflecting a potential redundancy of 74.5%±19.1%. Potential redundancy was least in scans obtained for treatment planning (14.9%±25.7%) and highest in scans obtained for follow-up of benign diseases (81.4%±12.6%). In 97.4% of cases, all four readers considered core diagnostic information to be either easily extractable from core sequences or the ease to be equivalent to that from the entire study. With only one MRI lacking clinical impact (0.48%), overutilization did not seem to contribute to potential redundancy. High potential redundancy that can be targeted for more efficient scanner utilization exists in brain MRI protocols.
40 CFR 1515.10 - Obtaining available information.

Code of Federal Regulations, 2013 CFR

2013-07-01

... 40 Protection of Environment 34 2013-07-01 2013-07-01 false Obtaining available information. 1515.10 Section 1515.10 Protection of Environment COUNCIL ON ENVIRONMENTAL QUALITY FREEDOM OF INFORMATION ACT PROCEDURES Availability of Information § 1515.10 Obtaining available information. (a) When a...
Comparison of ZP3 protein sequences among vertebrate species: to obtain a consensus sequence for immunocontraception.

PubMed

Zhu, X; Naz, R K

1999-03-01

The deduced ZP3 amino acid (aa) sequences of 13 vertebrate species, namely mouse, hamster, rabbit, pig, porcine, cow, dog, cat, human, bonnet, marmoset, carp, and frog, were compared using the PILEUP and PRETTY alignment programs (GCG, Wisconsin, USA). The published aa sequences obtained from 13 vertebrate species indicated overall evolutionary conservation in the N-terminus, central region, and C-terminus of the ZP3 polypeptide. More variations of ZP3 polypeptide sequences were seen in the alignments of carp and frog from the 11 mammalian species, making the leader sequence more prominent. The canonical furin proteolytic processing signal at the C-terminus was found in all the ZP3 polypeptide sequences except those of carp and frog. In the central region, the ZP3 deduced aa sequences of all the 13 vertebrate species aligned well, and six relatively conserved sequences were found. There are 11 conserved cysteine residues in the central region across all species including carp and frog, indicating that these residues have a longer evolutionary history. The ZP3 aa sequence similarities were examined using the GAP program (GCG). The highest aa similarities are observed between the members of the same order within the class Mammalia, and also (95.4%) between pig (ungulata) and rabbit (lagomorpha). The deduced ZP3 aa sequences per se may not be enough to build a phylogenetic tree.
48 CFR 209.105-1 - Obtaining information.

Code of Federal Regulations, 2011 CFR

2011-10-01

... Information Retrieval System (PPIRS), available at http://www.ppirs.gov. Information relating to contract... 48 Federal Acquisition Regulations System 3 2011-10-01 2011-10-01 false Obtaining information. 209....105-1 Obtaining information. (1) For guidance on using the Excluded Parties List System, see PGI 209...
48 CFR 209.105-1 - Obtaining information.

Code of Federal Regulations, 2013 CFR

2013-10-01

... Performance Information Retrieval System (PPIRS), available at http://www.ppirs.gov. Information relating to... 48 Federal Acquisition Regulations System 3 2013-10-01 2013-10-01 false Obtaining information. 209....105-1 Obtaining information. (1) For guidance on using the System for Award Management Exclusions, see...
48 CFR 209.105-1 - Obtaining information.

Code of Federal Regulations, 2010 CFR

2010-10-01

... Information Retrieval System (PPIRS), available at http://www.ppirs.gov. Information relating to contract... 48 Federal Acquisition Regulations System 3 2010-10-01 2010-10-01 false Obtaining information. 209....105-1 Obtaining information. (1) For guidance on using the Excluded Parties List System, see PGI 209...
48 CFR 209.105-1 - Obtaining information.

Code of Federal Regulations, 2012 CFR

2012-10-01

... Information Retrieval System (PPIRS), available at http://www.ppirs.gov. Information relating to contract... 48 Federal Acquisition Regulations System 3 2012-10-01 2012-10-01 false Obtaining information. 209....105-1 Obtaining information. (1) For guidance on using the Excluded Parties List System, see PGI 209...
Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications

PubMed Central

Yilmaz, Pelin; Kottmann, Renzo; Field, Dawn; Knight, Rob; Cole, James R; Amaral-Zettler, Linda; Gilbert, Jack A; Karsch-Mizrachi, Ilene; Johnston, Anjanette; Cochrane, Guy; Vaughan, Robert; Hunter, Christopher; Park, Joonhong; Morrison, Norman; Rocca-Serra, Philippe; Sterk, Peter; Arumugam, Manimozhiyan; Bailey, Mark; Baumgartner, Laura; Birren, Bruce W; Blaser, Martin J; Bonazzi, Vivien; Booth, Tim; Bork, Peer; Bushman, Frederic D; Buttigieg, Pier Luigi; Chain, Patrick S G; Charlson, Emily; Costello, Elizabeth K; Huot-Creasy, Heather; Dawyndt, Peter; DeSantis, Todd; Fierer, Noah; Fuhrman, Jed A; Gallery, Rachel E; Gevers, Dirk; Gibbs, Richard A; Gil, Inigo San; Gonzalez, Antonio; Gordon, Jeffrey I; Guralnick, Robert; Hankeln, Wolfgang; Highlander, Sarah; Hugenholtz, Philip; Jansson, Janet; Kau, Andrew L; Kelley, Scott T; Kennedy, Jerry; Knights, Dan; Koren, Omry; Kuczynski, Justin; Kyrpides, Nikos; Larsen, Robert; Lauber, Christian L; Legg, Teresa; Ley, Ruth E; Lozupone, Catherine A; Ludwig, Wolfgang; Lyons, Donna; Maguire, Eamonn; Methé, Barbara A; Meyer, Folker; Muegge, Brian; Nakielny, Sara; Nelson, Karen E; Nemergut, Diana; Neufeld, Josh D; Newbold, Lindsay K; Oliver, Anna E; Pace, Norman R; Palanisamy, Giriprakash; Peplies, Jörg; Petrosino, Joseph; Proctor, Lita; Pruesse, Elmar; Quast, Christian; Raes, Jeroen; Ratnasingham, Sujeevan; Ravel, Jacques; Relman, David A; Assunta-Sansone, Susanna; Schloss, Patrick D; Schriml, Lynn; Sinha, Rohini; Smith, Michelle I; Sodergren, Erica; Spor, Aymé; Stombaugh, Jesse; Tiedje, James M; Ward, Doyle V; Weinstock, George M; Wendel, Doug; White, Owen; Whiteley, Andrew; Wilke, Andreas; Wortman, Jennifer R; Yatsunenko, Tanya; Glöckner, Frank Oliver

2012-01-01

Here we present a standard developed by the Genomic Standards Consortium (GSC) for reporting marker gene sequences—the minimum information about a marker gene sequence (MIMARKS). We also introduce a system for describing the environment from which a biological sample originates. The ‘environmental packages’ apply to any genome sequence of known origin and can be used in combination with MIMARKS and other GSC checklists. Finally, to establish a unified standard for describing sequence data and to provide a single point of entry for the scientific community to access and learn about GSC checklists, we present the minimum information about any (x) sequence (MIxS). Adoption of MIxS will enhance our ability to analyze natural genetic diversity documented by massive DNA sequencing efforts from myriad ecosystems in our ever-changing biosphere. PMID:21552244
48 CFR 9.105-1 - Obtaining information.

Code of Federal Regulations, 2013 CFR

2013-10-01

... 48 Federal Acquisition Regulations System 1 2013-10-01 2013-10-01 false Obtaining information. 9... information. (a) Before making a determination of responsibility, the contracting officer shall possess or obtain information sufficient to be satisfied that a prospective contractor currently meets the...
A multilevel ant colony optimization algorithm for classical and isothermic DNA sequencing by hybridization with multiplicity information available.

PubMed

Kwarciak, Kamil; Radom, Marcin; Formanowicz, Piotr

2016-04-01

Classical sequencing by hybridization takes into account binary information about sequence composition: a given element from an oligonucleotide library either is or is not a part of the target sequence. However, DNA chip technology has since developed, and it makes it possible to obtain partial information about the multiplicity of each oligonucleotide the analyzed sequence consists of. Currently, it is not possible to obtain exact data of this type, but even partial information should be very useful. Two realistic multiplicity information models are taken into consideration in this paper. The first one, called "one and many", assumes that it is possible to determine whether a given oligonucleotide occurs in a reconstructed sequence once or more than once. According to the second model, called "one, two and many", a biochemical experiment can reveal whether a given oligonucleotide is present in an analyzed sequence once, twice or at least three times. An ant colony optimization algorithm has been implemented to verify the above models and to compare with existing algorithms for sequencing by hybridization which utilize the additional information. The proposed algorithm solves the problem with any kind of hybridization errors. Computational experiment results confirm that using even partial information about multiplicity leads to increased quality of reconstructed sequences. Moreover, they also show that the more precise model yields better solutions and that the ant colony optimization algorithm outperforms the existing ones. Test data sets and the proposed ant colony optimization algorithm are available at: http://bioserver.cs.put.poznan.pl/download/ACO4mSBH.zip. Copyright © 2016 Elsevier Ltd. All rights reserved.
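The two multiplicity information models named in this abstract can be illustrated with a short Python sketch that builds an ideal, error-free spectrum from a known sequence. It shows only what information each model exposes. It is not the ant colony optimization algorithm, and the function names and toy sequence are hypothetical.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Exact number of occurrences of every k-mer in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def classify_multiplicity(count, model):
    """Reduce an exact count to the partial information allowed by the model."""
    if model == "one and many":
        return "one" if count == 1 else "many"
    if model == "one, two and many":
        if count == 1:
            return "one"
        if count == 2:
            return "two"
        return "many"
    raise ValueError(f"unknown model: {model}")


def multiplicity_spectrum(sequence, k, model):
    """Ideal (error-free) spectrum annotated with partial multiplicity information."""
    return {
        kmer: classify_multiplicity(count, model)
        for kmer, count in kmer_counts(sequence, k).items()
    }


# Toy target: ACG occurs three times, CGA and GAC twice, CGT once.
toy_sequence = "ACGACGACGT"
print(multiplicity_spectrum(toy_sequence, 3, "one and many"))
print(multiplicity_spectrum(toy_sequence, 3, "one, two and many"))
```

Under "one and many" the oligonucleotides ACG, CGA and GAC are indistinguishable, whereas "one, two and many" separates ACG from the other two. This is consistent with the abstract's observation that the more precise model supports better reconstructions.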
48 CFR 509.105-1 - Obtaining information.

Code of Federal Regulations, 2013 CFR

2013-10-01

... 48 Federal Acquisition Regulations System 4 2013-10-01 2013-10-01 false Obtaining information. 509... Obtaining information. (a) From a prospective contractor. FAR 9.105-1 lists a number of sources of information that a contracting officer may utilize before making a determination of responsibility. The...
48 CFR 509.105-1 - Obtaining information.

Code of Federal Regulations, 2011 CFR

2011-10-01

... 48 Federal Acquisition Regulations System 4 2011-10-01 2011-10-01 false Obtaining information. 509... Obtaining information. (a) From a prospective contractor. FAR 9.105-1 lists a number of sources of information that a contracting officer may utilize before making a determination of responsibility. The...

48 CFR 509.105-1 - Obtaining information.

Code of Federal Regulations, 2012 CFR

2012-10-01

... 48 Federal Acquisition Regulations System 4 2012-10-01 2012-10-01 false Obtaining information. 509... Obtaining information. (a) From a prospective contractor. FAR 9.105-1 lists a number of sources of information that a contracting officer may utilize before making a determination of responsibility. The...
48 CFR 509.105-1 - Obtaining information.

Code of Federal Regulations, 2014 CFR

2014-10-01

... 48 Federal Acquisition Regulations System 4 2014-10-01 2014-10-01 false Obtaining information. 509... Obtaining information. (a) From a prospective contractor. FAR 9.105-1 lists a number of sources of information that a contracting officer may utilize before making a determination of responsibility. The...
48 CFR 209.105-1 - Obtaining information.

Code of Federal Regulations, 2014 CFR

2014-10-01

... 48 Federal Acquisition Regulations System 3 2014-10-01 2014-10-01 false Obtaining information. 209....105-1 Obtaining information. (1) For guidance on using the Exclusions section of the System for Award... responsibility (see FAR 9.104-1(c)). One source of information relating to contractor performance is the Past...
5 CFR 1501.12 - Obtaining further information.

Code of Federal Regulations, 2010 CFR

2010-01-01

... 5 Administrative Personnel 3 2010-01-01 2010-01-01 false Obtaining further information. 1501.12 Section 1501.12 Administrative Personnel THE INTERNATIONAL ORGANIZATIONS EMPLOYEES LOYALTY BOARD OPERATIONS OF THE INTERNATIONAL ORGANIZATIONS EMPLOYEES LOYALTY BOARD § 1501.12 Obtaining further information...
Information capacity of nucleotide sequences and its applications.

PubMed

Sadovsky, M G

2006-05-01

The information capacity of nucleotide sequences is defined through the specific entropy of the frequency dictionary of a sequence, determined with respect to another one containing the most probable continuations of shorter strings. This measure distinguishes a sequence both from a random one and from an ordered entity. A comparison of sequences based on their information capacity is studied. An order within the genetic entities is found at the length scale ranging from 3 to 8. Some other applications of the developed methodology to genetics, bioinformatics, and molecular biology are discussed.
PIMS sequencing extension: a laboratory information management system for DNA sequencing facilities

PubMed Central

2011-01-01

Background: Facilities that provide a service for DNA sequencing typically support large numbers of users and experiment types. The cost of services is often reduced by the use of liquid handling robots but the efficiency of such facilities is hampered because the software for such robots does not usually integrate well with the systems that run the sequencing machines. Accordingly, there is a need for software systems capable of integrating different robotic systems and managing sample information for DNA sequencing services. In this paper, we describe an extension to the Protein Information Management System (PIMS) that is designed for DNA sequencing facilities. The new version of PIMS has a user-friendly web interface and integrates all aspects of the sequencing process, including sample submission, handling and tracking, together with capture and management of the data. Results: The PIMS sequencing extension has been in production since July 2009 at the University of Leeds DNA Sequencing Facility. It has completely replaced manual data handling and simplified the tasks of data management and user communication. Samples from 45 groups have been processed with an average throughput of 10000 samples per month. The current version of the PIMS sequencing extension works with Applied Biosystems 3130XL 96-well plate sequencer and MWG 4204 or Aviso Theonyx liquid handling robots, but is readily adaptable for use with other combinations of robots. Conclusions: PIMS has been extended to provide a user-friendly and integrated data management solution for DNA sequencing facilities that is accessed through a normal web browser and allows simultaneous access by multiple users as well as facility managers. The system integrates sequencing and liquid handling robots, manages the data flow, and provides remote access to the sequencing results. The software is freely available, for academic users, from http://www.pims-lims.org/. PMID:21385349
PIMS sequencing extension: a laboratory information management system for DNA sequencing facilities.

PubMed

Troshin, Peter V; Postis, Vincent Lg; Ashworth, Denise; Baldwin, Stephen A; McPherson, Michael J; Barton, Geoffrey J

2011-03-07

Facilities that provide a service for DNA sequencing typically support large numbers of users and experiment types. The cost of services is often reduced by the use of liquid handling robots but the efficiency of such facilities is hampered because the software for such robots does not usually integrate well with the systems that run the sequencing machines. Accordingly, there is a need for software systems capable of integrating different robotic systems and managing sample information for DNA sequencing services. In this paper, we describe an extension to the Protein Information Management System (PIMS) that is designed for DNA sequencing facilities. The new version of PIMS has a user-friendly web interface and integrates all aspects of the sequencing process, including sample submission, handling and tracking, together with capture and management of the data. The PIMS sequencing extension has been in production since July 2009 at the University of Leeds DNA Sequencing Facility. It has completely replaced manual data handling and simplified the tasks of data management and user communication. Samples from 45 groups have been processed with an average throughput of 10000 samples per month. The current version of the PIMS sequencing extension works with Applied Biosystems 3130XL 96-well plate sequencer and MWG 4204 or Aviso Theonyx liquid handling robots, but is readily adaptable for use with other combinations of robots. PIMS has been extended to provide a user-friendly and integrated data management solution for DNA sequencing facilities that is accessed through a normal web browser and allows simultaneous access by multiple users as well as facility managers. The system integrates sequencing and liquid handling robots, manages the data flow, and provides remote access to the sequencing results. The software is freely available, for academic users, from http://www.pims-lims.org/.
Functional region prediction with a set of appropriate homologous sequences-an index for sequence selection by integrating structure and sequence information with spatial statistics

PubMed Central

2012-01-01

Background: The detection of conserved residue clusters on a protein structure is one of the effective strategies for the prediction of functional protein regions. Various methods, such as Evolutionary Trace, have been developed based on this strategy. In such approaches, the conserved residues are identified through comparisons of homologous amino acid sequences. Therefore, the selection of homologous sequences is a critical step. It is empirically known that a certain degree of sequence divergence in the set of homologous sequences is required for the identification of conserved residues. However, the development of a method to select homologous sequences appropriate for the identification of conserved residues has not been sufficiently addressed. An objective and general method to select appropriate homologous sequences is desired for the efficient prediction of functional regions. Results: We have developed a novel index to select the sequences appropriate for the identification of conserved residues, and implemented the index within our method to predict the functional regions of a protein. The implementation of the index improved the performance of the functional region prediction. The index represents the degree of conserved residue clustering on the tertiary structure of the protein. For this purpose, the structure and sequence information were integrated within the index by the application of spatial statistics. Spatial statistics is a field of statistics in which not only the attributes but also the geometrical coordinates of the data are considered simultaneously. Higher degrees of clustering generate larger index scores. We adopted the set of homologous sequences with the highest index score, under the assumption that the best prediction accuracy is obtained when the degree of clustering is the maximum. The set of sequences selected by the index led to higher functional region prediction performance than the sets of sequences selected by other sequence
5 CFR 2411.5 - Procedure for obtaining information.

Code of Federal Regulations, 2010 CFR

2010-01-01

... 5 Administrative Personnel 3 2010-01-01 2010-01-01 false Procedure for obtaining information. 2411.5 Section 2411.5 Administrative Personnel FEDERAL LABOR RELATIONS AUTHORITY, GENERAL COUNSEL OF THE... OFFICIAL INFORMATION § 2411.5 Procedure for obtaining information. (a) Authority/General Counsel/Panel/IG...
Integration of Temporal and Ordinal Information During Serial Interception Sequence Learning

PubMed Central

Gobel, Eric W.; Sanchez, Daniel J.; Reber, Paul J.

2011-01-01

The expression of expert motor skills typically involves learning to perform a precisely timed sequence of movements (e.g., language production, music performance, athletic skills). Research examining incidental sequence learning has previously relied on a perceptually-cued task that gives participants exposure to repeating motor sequences but does not require timing of responses for accuracy. Using a novel perceptual-motor sequence learning task, learning a precisely timed cued sequence of motor actions is shown to occur without explicit instruction. Participants learned a repeating sequence through practice and showed sequence-specific knowledge via a performance decrement when switched to an unfamiliar sequence. In a second experiment, the integration of representation of action order and timing sequence knowledge was examined. When either action order or timing sequence information was selectively disrupted, performance was reduced to levels similar to completely novel sequences. Unlike prior sequence-learning research that has found timing information to be secondary to learning action sequences, when the task demands require accurate action and timing information, an integrated representation of these types of information is acquired. These results provide the first evidence for incidental learning of fully integrated action and timing sequence information in the absence of an independent representation of action order, and suggest that this integrative mechanism may play a material role in the acquisition of complex motor skills. PMID:21417511
[Complete genome sequencing and sequence analysis of BCG Tice].

PubMed

Wang, Zhiming; Pan, Yuanlong; Wu, Jun; Zhu, Baoli

2012-10-04

The objective of this study is to obtain the complete genome sequence of Bacillus Calmette-Guerin Tice (BCG Tice), in order to provide more information about the molecular biology of BCG Tice and to design more reasonable vaccines to prevent tuberculosis. We assembled the data from high-throughput sequencing with SOAPdenovo software, obtaining many contigs and scaffolds. Many sequence gaps and physical gaps remained as a result of regional low coverage and low quality. We designed primers at the ends of contigs and performed PCR amplification in order to link these contigs and scaffolds. By using various enzymes for PCR amplification, adjusting the PCR reaction conditions, and combining these with clone construction for sequencing, all the gaps were closed. We obtained the complete genome sequence of BCG Tice and submitted it to GenBank of the National Center for Biotechnology Information (NCBI). The genome of BCG Tice is 4334064 base pairs in length, with a GC content of 65.65%. The problems and strategies during the finishing step of BCG Tice sequencing are illuminated here, with the hope of affording some experience to those who are involved in the finishing step of genome sequencing. The microarray data were verified by our results.
RStrucFam: a web server to associate structure and cognate RNA for RNA-binding proteins from sequence information.

PubMed

Ghosh, Pritha; Mathew, Oommen K; Sowdhamini, Ramanathan

2016-10-07

RNA-binding proteins (RBPs) interact with their cognate RNA(s) to form large biomolecular assemblies. They are versatile in their functionality and are involved in a myriad of processes inside the cell. RBPs with similar structural features and common biological functions are grouped together into families and superfamilies. It is useful to obtain an early understanding and association of the RNA-binding properties of gene product sequences. Here, we report a web server, RStrucFam, to predict the structure, type of cognate RNA(s) and function(s) of proteins, where possible, from mere sequence information. The web server employs Hidden Markov Model scan (hmmscan) to enable association to a back-end database of structural and sequence families. The database (HMMRBP) comprises 437 HMMs of RBP families of known structure that have been generated using structure-based sequence alignments and 746 sequence-centric RBP family HMMs. The input protein sequence is associated with structural or sequence domain families, if structure or sequence signatures exist. When the protein is associated with a family of known structure, output features such as a multiple structure-based sequence alignment (MSSA) of the query with all other members of that family are provided. Further, cognate RNA partner(s) for that protein, Gene Ontology (GO) annotations, if any, and a homology model of the protein can be obtained. Users can also browse the database for details pertaining to each family, protein or RNA and their related information based on keyword search or RNA motif search. RStrucFam is a web server that exploits structurally conserved features of RBPs, derived from known family members and imprinted in mathematical profiles, to predict putative RBPs from sequence information. Proteins that fail to associate with such structure-centric families are further queried against the sequence-centric RBP family HMMs in the HMMRBP database. Further, all other essential
Prediction of glutathionylation sites in proteins using minimal sequence information and their experimental validation.

PubMed

Pal, Debojyoti; Sharma, Deepak; Kumar, Mukesh; Sandur, Santosh K

2016-09-01

S-glutathionylation of proteins plays an important role in various biological processes and is known to be a protective modification during oxidative stress. Since experimental detection of S-glutathionylation is labor intensive and time consuming, a bioinformatics-based approach is a viable alternative. Available methods require relatively long sequence information, which may prevent prediction if the sequence information is incomplete. Here, we present a model to predict glutathionylation sites from pentapeptide sequences. It is based upon the differential association of amino acids with glutathionylated and non-glutathionylated cysteines from a database of experimentally verified sequences. These data were used to calculate position-dependent F-scores, which measure how a particular amino acid at a particular position may affect the likelihood of a glutathionylation event. A glutathionylation score (G-score), indicating the propensity of a sequence to undergo glutathionylation, was calculated using position-dependent F-scores for each amino acid. Cut-off values were used for prediction. Our model returned an accuracy of 58% with a Matthews correlation coefficient (MCC) value of 0.165. On an independent dataset, our model outperformed the currently available model, in spite of needing much less sequence information. Pentapeptide motifs having high abundance among glutathionylated proteins were identified. A list of potential glutathionylation hotspot sequences was obtained by assigning G-scores, and subsequent Protein-BLAST analysis revealed a total of 254 putative glutathionable proteins, a number of which were already known to be glutathionylated. Our model predicted glutathionylation sites in 93.93% of experimentally verified glutathionylated proteins. The outcome of this study may assist in discovering novel glutathionylation sites and finding candidate proteins for glutathionylation.
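The abstract above reports both accuracy and the Matthews correlation coefficient (MCC). As a minimal sketch of how the two metrics relate, the Python below computes them from confusion-matrix counts. The function names and the example counts are hypothetical illustrations, not values or code from the study; the counts were chosen only to show that an accuracy near 58% can coexist with a low MCC.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts (NOT taken from the paper), for illustration only.
counts = dict(tp=30, tn=28, fp=22, fn=20)
print(f"accuracy = {accuracy(**counts):.2f}")
print(f"MCC      = {matthews_corrcoef(**counts):.3f}")
```

Because MCC accounts for all four cells of the confusion matrix, it stays close to zero for a weak classifier even when accuracy looks moderate.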
Information theory applications for biological sequence analysis.

PubMed

Vinga, Susana

2014-05-01

Information theory (IT) addresses the analysis of communication systems and has been widely applied in molecular biology. In particular, alignment-free sequence analysis and comparison greatly benefited from concepts derived from IT, such as entropy and mutual information. This review covers several aspects of IT applications, ranging from genome global analysis and comparison, including block-entropy estimation and resolution-free metrics based on iterative maps, to local analysis, comprising the classification of motifs, prediction of transcription factor binding sites and sequence characterization based on linguistic complexity and entropic profiles. IT has also been applied to high-level correlations that combine DNA, RNA or protein features with sequence-independent properties, such as gene mapping and phenotype analysis, and has also provided models based on communication systems theory to describe information transmission channels at the cell level and also during evolutionary processes. While not exhaustive, this review attempts to categorize existing methods and to indicate their relation with broader transversal topics such as genomic signatures, data compression and complexity, time series analysis and phylogenetic classification, providing a resource for future developments in this promising area.
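Two of the quantities named in the review above, entropy and block entropy, can be sketched in a few lines of Python. This is a plain plug-in (frequency-count) estimator written for illustration; the function names are hypothetical and the review itself may discuss more refined estimators.

```python
import math
from collections import Counter


def shannon_entropy(symbols) -> float:
    """Plug-in Shannon entropy (in bits) of a sequence of hashable symbols."""
    symbols = list(symbols)
    n = len(symbols)
    if n == 0:
        return 0.0
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def block_entropy(seq: str, k: int) -> float:
    """Entropy (in bits) of the overlapping length-k blocks (k-mers) of seq."""
    blocks = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    return shannon_entropy(blocks)


if __name__ == "__main__":
    dna = "ACGTACGTTTGACCA"  # hypothetical example sequence
    for k in (1, 2, 3):
        print(k, round(block_entropy(dna, k), 3))
```

The plug-in estimator is biased downward for short sequences and large k, which is one reason block-entropy estimation is treated as a topic in its own right.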
Effects of informed consent for individual genome sequencing on relevant knowledge.

PubMed

Kaphingst, K A; Facio, F M; Cheng, M-R; Brooks, S; Eidem, H; Linn, A; Biesecker, B B; Biesecker, L G

2012-11-01

Increasing availability of individual genomic information suggests that patients will need knowledge about genome sequencing to make informed decisions, but prior research is limited. In this study, we examined genome sequencing knowledge before and after informed consent among 311 participants enrolled in the ClinSeq™ sequencing study. An exploratory factor analysis of knowledge items yielded two factors (sequencing limitations knowledge; sequencing benefits knowledge). In multivariable analysis, high pre-consent sequencing limitations knowledge scores were significantly related to education [odds ratio (OR): 8.7, 95% confidence interval (CI): 2.45-31.10 for post-graduate education, and OR: 3.9; 95% CI: 1.05, 14.61 for college degree compared with less than college degree] and race/ethnicity (OR: 2.4, 95% CI: 1.09, 5.38 for non-Hispanic Whites compared with other racial/ethnic groups). Mean values increased significantly between pre- and post-consent for the sequencing limitations knowledge subscale (6.9-7.7, p < 0.0001) and sequencing benefits knowledge subscale (7.0-7.5, p < 0.0001); increase in knowledge did not differ by sociodemographic characteristics. This study highlights gaps in genome sequencing knowledge and underscores the need to target educational efforts toward participants with less education or from minority racial/ethnic groups. The informed consent process improved genome sequencing knowledge. Future studies could examine how genome sequencing knowledge influences informed decision making. © 2012 John Wiley & Sons A/S.
21 CFR 20.109 - Data and information obtained by contract.

Code of Federal Regulations, 2011 CFR

2011-04-01

... 21 Food and Drugs 1 2011-04-01 2011-04-01 false Data and information obtained by contract. 20.109... GENERAL PUBLIC INFORMATION Availability of Specific Categories of Records § 20.109 Data and information obtained by contract. (a) All data and information obtained by the Food and Drug Administration by contract...
48 CFR 9.105-1 - Obtaining information.

Code of Federal Regulations, 2011 CFR

2011-10-01

... the responsibility of prospective contractors, including requesting preaward surveys when necessary..., especially when research and development is involved, the contracting officer may obtain this information... information concerning (i) the low bidder or (ii) those offerors in range for award. (2) Preaward surveys...
48 CFR 9.105-1 - Obtaining information.

Code of Federal Regulations, 2014 CFR

2014-10-01

... the responsibility of prospective contractors, including requesting preaward surveys when necessary..., especially when research and development is involved, the contracting officer may obtain this information... information concerning (i) the low bidder or (ii) those offerors in range for award. (2) Preaward surveys...
48 CFR 9.105-1 - Obtaining information.

Code of Federal Regulations, 2012 CFR

2012-10-01

... the responsibility of prospective contractors, including requesting preaward surveys when necessary..., especially when research and development is involved, the contracting officer may obtain this information... information concerning (i) the low bidder or (ii) those offerors in range for award. (2) Preaward surveys...
Inferring Short-Range Linkage Information from Sequencing Chromatograms

PubMed Central

Beggel, Bastian; Neumann-Fraune, Maria; Kaiser, Rolf; Verheyen, Jens; Lengauer, Thomas

2013-01-01

Direct Sanger sequencing of viral genome populations yields multiple ambiguous sequence positions. It is not straightforward to derive linkage information from sequencing chromatograms, which in turn hampers the correct interpretation of the sequence data. We present a method for determining the variants existing in a viral quasispecies in the case of two nearby ambiguous sequence positions by exploiting the effect of sequence context-dependent incorporation of dideoxynucleotides. The computational model was trained on data from sequencing chromatograms of clonal variants and was evaluated on two test sets of in vitro mixtures. The approach achieved high accuracies in identifying the mixture components of 97.4% on a test set in which the positions to be analyzed are only one base apart from each other, and of 84.5% on a test set in which the ambiguous positions are separated by three bases. In silico experiments suggest two major limitations of our approach in terms of accuracy. First, due to a basic limitation of Sanger sequencing, it is not possible to reliably detect minor variants with a relative frequency of no more than 10%. Second, the model cannot distinguish between mixtures of two or four clonal variants, if one of two sets of linear constraints is fulfilled. Furthermore, the approach requires repetitive sequencing of all variants that might be present in the mixture to be analyzed. Nevertheless, the effectiveness of our method on the two in vitro test sets shows that short-range linkage information of two ambiguous sequence positions can be inferred from Sanger sequencing chromatograms without any further assumptions on the mixture composition. Additionally, our model provides new insights into the established and widely used Sanger sequencing technology. The source code of our method is made available at http://bioinf.mpi-inf.mpg.de/publications/beggel/linkageinformation.zip. PMID:24376502

Information-Theoretical Analysis of EEG Microstate Sequences in Python.

PubMed

von Wegner, Frederic; Laufs, Helmut

2018-01-01

We present an open-source Python package to compute information-theoretical quantities for electroencephalographic data. Electroencephalography (EEG) measures the electrical potential generated by the cerebral cortex and the set of spatial patterns projected by the brain's electrical potential on the scalp surface can be clustered into a set of representative maps called EEG microstates. Microstate time series are obtained by competitively fitting the microstate maps back into the EEG data set, i.e., by substituting the EEG data at a given time with the label of the microstate that has the highest similarity with the actual EEG topography. As microstate sequences consist of non-metric random variables, e.g., the letters A-D, we recently introduced information-theoretical measures to quantify these time series. In wakeful resting state EEG recordings, we found new characteristics of microstate sequences such as periodicities related to EEG frequency bands. The algorithms used are here provided as an open-source package and their use is explained in a tutorial style. The package is self-contained and the programming style is procedural, focusing on code intelligibility and easy portability. Using a sample EEG file, we demonstrate how to perform EEG microstate segmentation using the modified K-means approach, and how to compute and visualize the recently introduced information-theoretical tests and quantities. The time-lagged mutual information function is derived as a discrete symbolic alternative to the autocorrelation function for metric time series and confidence intervals are computed from Markov chain surrogate data. The software package provides an open-source extension to the existing implementations of the microstate transform and is specifically designed to analyze resting state EEG recordings.
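The time-lagged mutual information function mentioned above can be illustrated for a symbolic sequence such as microstate labels A-D. The following is an independent minimal sketch using empirical pair frequencies; it is not the API of the authors' package, and the function name and example sequence are assumptions made for illustration.

```python
import math
from collections import Counter


def lagged_mutual_information(seq, lag: int) -> float:
    """Estimate I(X_t; X_{t+lag}) in bits from empirical pair frequencies."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    # Pair each symbol with the symbol `lag` steps later.
    pairs = list(zip(seq, seq[lag:]))
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) / (p(a) p(b)) simplifies to c * n / (count_a * count_b).
        mi += (c / n) * math.log2(c * n / (left[a] * right[b]))
    return mi


if __name__ == "__main__":
    labels = "ABCDABCDABCDABCD"  # hypothetical, perfectly periodic sequence
    for lag in range(0, 6):
        print(lag, round(lagged_mutual_information(labels, lag), 3))
```

At lag 0 the estimate equals the entropy of the label distribution; for symbolic data the curve over increasing lags plays the role that an autocorrelation function plays for metric time series.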
48 CFR 9.105-1 - Obtaining information.

Code of Federal Regulations, 2010 CFR

2010-10-01

... information required concerning the adequacy of prospective contractors' accounting systems and these systems... accounting systems, and these systems' suitability for use in administering the proposed type of contract. (3... 48 Federal Acquisition Regulations System 1 2010-10-01 2010-10-01 false Obtaining information. 9...
Image encryption using random sequence generated from generalized information domain

NASA Astrophysics Data System (ADS)

Xia-Yan, Zhang; Guo-Ji, Zhang; Xuan, Li; Ya-Zhou, Ren; Jie-Hua, Wu

2016-05-01

A novel image encryption method based on the random sequence generated from the generalized information domain and permutation-diffusion architecture is proposed. The random sequence is generated by reconstruction from the generalized information file and discrete trajectory extraction from the data stream. The trajectory address sequence is used to generate a P-box to shuffle the plain image while random sequences are treated as keystreams. A new factor called drift factor is employed to accelerate and enhance the performance of the random sequence generator. An initial value is introduced to make the encryption method an approximately one-time pad. Experimental results show that the random sequences pass the NIST statistical test with a high ratio and extensive analysis demonstrates that the new encryption scheme has superior security.
Embedding strategies for effective use of information from multiple sequence alignments.

PubMed Central

Henikoff, S.; Henikoff, J. G.

1997-01-01

We describe a new strategy for utilizing multiple sequence alignment information to detect distant relationships in searches of sequence databases. A single sequence representing a protein family is enriched by replacing conserved regions with position-specific scoring matrices (PSSMs) or consensus residues derived from multiple alignments of family members. In comprehensive tests of these and other family representations, PSSM-embedded queries produced the best results overall when used with a special version of the Smith-Waterman searching algorithm. Moreover, embedding consensus residues instead of PSSMs improved performance with readily available single sequence query searching programs, such as BLAST and FASTA. Embedding PSSMs or consensus residues into a representative sequence improves searching performance by extracting multiple alignment information from motif regions while retaining single sequence information where alignment is uncertain. PMID:9070452
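The embedding idea described above can be sketched as follows: derive a consensus (or a position-specific scoring matrix) from an aligned, ungapped block of family members, then substitute it into the conserved region of a representative sequence. The Python below is a simplified illustration only. It uses a uniform background and a flat pseudocount, which are assumptions and not the weighting scheme of the original work, and all names and example sequences are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def column_counts(block):
    """Residue counts for each column of an ungapped aligned block."""
    return [Counter(col) for col in zip(*block)]


def consensus(block) -> str:
    """Most frequent residue in each column of the block."""
    return "".join(c.most_common(1)[0][0] for c in column_counts(block))


def pssm(block, pseudocount: float = 1.0):
    """Log-odds scores (bits) per column, assuming a uniform background."""
    background = 1.0 / len(ALPHABET)
    total = len(block) + pseudocount * len(ALPHABET)
    return [
        {aa: math.log2(((counts[aa] + pseudocount) / total) / background)
         for aa in ALPHABET}
        for counts in column_counts(block)
    ]


def embed_consensus(sequence: str, start: int, block) -> str:
    """Replace the region of `sequence` beginning at `start` with the consensus."""
    cons = consensus(block)
    return sequence[:start] + cons + sequence[start + len(cons):]


if __name__ == "__main__":
    block = ["ACDE", "ACDF", "AGDE"]   # hypothetical aligned motif block
    representative = "MKTAGDFLL"       # hypothetical family member
    print(embed_consensus(representative, 3, block))
```

Outside the embedded region the representative sequence is left unchanged, mirroring the abstract's point about retaining single-sequence information where the alignment is uncertain.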
BAsE-Seq: a method for obtaining long viral haplotypes from short sequence reads.

PubMed

Hong, Lewis Z; Hong, Shuzhen; Wong, Han Teng; Aw, Pauline P K; Cheng, Yan; Wilm, Andreas; de Sessions, Paola F; Lim, Seng Gee; Nagarajan, Niranjan; Hibberd, Martin L; Quake, Stephen R; Burkholder, William F

2014-01-01

We present a method for obtaining long haplotypes, of over 3 kb in length, using a short-read sequencer, Barcode-directed Assembly for Extra-long Sequences (BAsE-Seq). BAsE-Seq relies on transposing a template-specific barcode onto random segments of the template molecule and assembling the barcoded short reads into complete haplotypes. We applied BAsE-Seq on mixed clones of hepatitis B virus and accurately identified haplotypes occurring at frequencies greater than or equal to 0.4%, with >99.9% specificity. Applying BAsE-Seq to a clinical sample, we obtained over 9,000 viral haplotypes, which provided an unprecedented view of hepatitis B virus population structure during chronic infection. BAsE-Seq is readily applicable for monitoring quasispecies evolution in viral diseases.
MACSIMS : multiple alignment of complete sequences information management system

PubMed Central

Thompson, Julie D; Muller, Arnaud; Waterhouse, Andrew; Procter, Jim; Barton, Geoffrey J; Plewniak, Frédéric; Poch, Olivier

2006-01-01

Background In the post-genomic era, systems-level studies are being performed that seek to explain complex biological systems by integrating diverse resources from fields such as genomics, proteomics or transcriptomics. New information management systems are now needed for the collection, validation and analysis of the vast amount of heterogeneous data available. Multiple alignments of complete sequences provide an ideal environment for the integration of this information in the context of the protein family. Results MACSIMS is a multiple alignment-based information management program that combines the advantages of both knowledge-based and ab initio sequence analysis methods. Structural and functional information is retrieved automatically from the public databases. In the multiple alignment, homologous regions are identified and the retrieved data is evaluated and propagated from known to unknown sequences with these reliable regions. In a large-scale evaluation, the specificity of the propagated sequence features is estimated to be >99%, i.e. very few false positive predictions are made. MACSIMS is then used to characterise mutations in a test set of 100 proteins that are known to be involved in human genetic diseases. The number of sequence features associated with these proteins was increased by 60%, compared to the features available in the public databases. An XML format output file allows automatic parsing of the MACSIM results, while a graphical display using the JalView program allows manual analysis. Conclusion MACSIMS is a new information management system that incorporates detailed analyses of protein families at the structural, functional and evolutionary levels. MACSIMS thus provides a unique environment that facilitates knowledge extraction and the presentation of the most pertinent information to the biologist. A web server and the source code are available at . PMID:16792820
Phylogenetic characterization of a biogas plant microbial community integrating clone library 16S-rDNA sequences and metagenome sequence data obtained by 454-pyrosequencing.

PubMed

Kröber, Magdalena; Bekel, Thomas; Diaz, Naryttza N; Goesmann, Alexander; Jaenicke, Sebastian; Krause, Lutz; Miller, Dimitri; Runte, Kai J; Viehöver, Prisca; Pühler, Alfred; Schlüter, Andreas

2009-06-01

The phylogenetic structure of the microbial community residing in a fermentation sample from a production-scale biogas plant fed with maize silage, green rye and liquid manure was analysed by an integrated approach using clone library sequences and metagenome sequence data obtained by 454-pyrosequencing. Sequencing of 109 clones from a bacterial and an archaeal 16S-rDNA amplicon library revealed that the obtained nucleotide sequences are similar but not identical to 16S-rDNA database sequences derived from different anaerobic environments including digestors and bioreactors. Most of the bacterial 16S-rDNA sequences could be assigned to the phylum Firmicutes with the most abundant class Clostridia and to the class Bacteroidetes, whereas most archaeal 16S-rDNA sequences cluster close to the methanogen Methanoculleus bourgensis. Further sequences of the archaeal library most probably represent so far non-characterised species within the genus Methanoculleus. A similar result derived from phylogenetic analysis of mcrA clone sequences. The mcrA gene product encodes the alpha-subunit of methyl-coenzyme-M reductase involved in the final step of methanogenesis. BLASTn analysis applying stringent settings resulted in assignment of 16S-rDNA metagenome sequence reads to 62 16S-rDNA amplicon sequences thus enabling frequency of abundance estimations for 16S-rDNA clone library sequences. Ribosomal Database Project (RDP) Classifier processing of metagenome 16S-rDNA reads revealed abundance of the phyla Firmicutes, Bacteroidetes and Euryarchaeota and the orders Clostridiales, Bacteroidales and Methanomicrobiales. Moreover, a large fraction of 16S-rDNA metagenome reads could not be assigned to lower taxonomic ranks, demonstrating that numerous microorganisms in the analysed fermentation sample of the biogas plant are still unclassified or unknown.
19 CFR 201.9 - Methods employed in obtaining information.

Code of Federal Regulations, 2014 CFR

2014-04-01

... 19 Customs Duties 3 2014-04-01 2014-04-01 false Methods employed in obtaining information. 201.9 Section 201.9 Customs Duties UNITED STATES INTERNATIONAL TRADE COMMISSION GENERAL RULES OF GENERAL APPLICATION Initiation and Conduct of Investigations § 201.9 Methods employed in obtaining information. In...
19 CFR 201.9 - Methods employed in obtaining information.

Code of Federal Regulations, 2013 CFR

2013-04-01

... 19 Customs Duties 3 2013-04-01 2013-04-01 false Methods employed in obtaining information. 201.9 Section 201.9 Customs Duties UNITED STATES INTERNATIONAL TRADE COMMISSION GENERAL RULES OF GENERAL APPLICATION Initiation and Conduct of Investigations § 201.9 Methods employed in obtaining information. In...
Obtaining information by dynamic (effortful) touching

PubMed Central

Turvey, M. T.; Carello, Claudia

2011-01-01

Dynamic touching is effortful touching. It entails deformation of muscles and fascia and activation of the embedded mechanoreceptors, as when an object is supported and moved by the body. It is realized as exploratory activities that can vary widely in spatial and temporal extents (a momentary heft, an extended walk). Research has revealed the potential of dynamic touching for obtaining non-visual information about the body (e.g. limb orientation), attachments to the body (e.g. an object's height and width) and the relation of the body both to attachments (e.g. hand's location on a grasped object) and surrounding surfaces (e.g. places and their distances). Invariants over the exploratory activity (e.g. moments of a wielded object's mass distribution) seem to ground this ‘information about’. The conception of a haptic medium as a nested tensegrity structure has been proposed to express the obtained information realized by myofascia deformation, by its invariants and transformations. The tensegrity proposal rationalizes the relative indifference of dynamic touch to the site of mechanical contact (hand, foot, torso or probe) and the overtness of exploratory activity. It also provides a framework for dynamic touching's fractal nature, and the finding that its degree of fractality may matter to its accomplishments. PMID:21969694
National Practice Patterns of Obtaining Informed Consent for Stroke Thrombolysis.

PubMed

Mendelson, Scott J; Courtney, D Mark; Gordon, Elisa J; Thomas, Leena F; Holl, Jane L; Prabhakaran, Shyam

2018-03-01

No standard approach to obtaining informed consent for stroke thrombolysis with tPA (tissue-type plasminogen activator) currently exists. We aimed to assess current nationwide practice patterns of obtaining informed consent for tPA. An online survey was developed and distributed by e-mail to clinicians involved in acute stroke care. Multivariable logistic regression analyses were performed to determine independent factors contributing to always obtaining informed consent for tPA. Among 268 respondents, 36.7% reported always obtaining informed consent and 51.8% reported the informed consent process caused treatment delays. Being an emergency medicine physician (odds ratio, 5.8; 95% confidence interval, 2.9-11.5) and practicing at a nonacademic medical center (odds ratio, 2.1; 95% confidence interval, 1.0-4.3) were independently associated with always requiring informed consent. The most commonly cited cause of delay was waiting for a patient's family to reach consensus about treatment. Most clinicians always or often require informed consent for stroke thrombolysis. Future research should focus on standardizing content and delivery of tPA information to reduce delays. © 2018 American Heart Association, Inc.
Protocol to obtain targeted transcript sequence data from snake venom samples collected in the Colombian field.

PubMed

Fonseca, Alejandra; Renjifo-Ibáñez, Camila; Renjifo, Juan Manuel; Cabrera, Rodrigo

2018-03-21

Snake venoms are a mixture of different molecules that can be used in the design of drugs for various diseases. The study of these venoms has relied on strategies that use complete venom extracted from animals in captivity or from venom glands that require the sacrifice of the animals. Colombia, a country with political and geographical conflicts, has regions that are difficult to access. A strategy that can prevent the sacrifice of animals and could allow the study of samples collected in the field is necessary. We report the use of lyophilized venom from Crotalus durissus cumanensis as a model to test, for the first time, a protocol for the amplification of complete toxins from Colombian venom samples collected in the field. In this protocol, primers were designed from conserved regions of Crotalus sp. mRNA and EST sequences to maximize the likelihood of coding sequence amplification. We obtained the sequences of Metalloproteinases II, Disintegrins, Disintegrin-Like, Phospholipases A2, C-type Lectins and Serine proteinases from Crotalus durissus cumanensis and compared them to different Crotalus sp. sequences available in databases, obtaining concordance between the toxins amplified and those reported. Our strategy allows the use of lyophilized venom to obtain complete toxin sequences from samples collected in the field and the study of poorly characterized venoms in challenging environments. Copyright © 2018 Elsevier Ltd. All rights reserved.
28 CFR 51.38 - Obtaining information from others.

Code of Federal Regulations, 2011 CFR

2011-07-01

... 28 Judicial Administration 2 2011-07-01 2011-07-01 false Obtaining information from others. 51.38 Section 51.38 Judicial Administration DEPARTMENT OF JUSTICE (CONTINUED) PROCEDURES FOR THE ADMINISTRATION OF SECTION 5 OF THE VOTING RIGHTS ACT OF 1965, AS AMENDED Processing of Submissions § 51.38 Obtaining...
28 CFR 51.38 - Obtaining information from others.

Code of Federal Regulations, 2010 CFR

2010-07-01

... 28 Judicial Administration 2 2010-07-01 2010-07-01 false Obtaining information from others. 51.38 Section 51.38 Judicial Administration DEPARTMENT OF JUSTICE (CONTINUED) PROCEDURES FOR THE ADMINISTRATION OF SECTION 5 OF THE VOTING RIGHTS ACT OF 1965, AS AMENDED Processing of Submissions § 51.38 Obtaining...
Sequence and phylogenetic analysis of chicken anaemia virus obtained from backyard and commercial chickens in Nigeria.

PubMed

Oluwayelu, D O; Todd, D; Olaleye, O D

2008-12-01

This work reports the first molecular analysis of chicken anaemia virus (CAV) in backyard chickens in Africa, using molecular cloning and sequence analysis to characterize CAV strains obtained from commercial chickens and Nigerian backyard chickens. Partial VP1 gene sequences were determined for three CAVs from commercial chickens and for six CAV variants present in samples from a backyard chicken. Multiple alignment analysis revealed that the 6% and 4% nucleotide diversity obtained respectively for the commercial and backyard chicken strains translated to only 2% amino acid diversity for each breed. Overall, the amino acid composition of Nigerian CAVs was found to be highly conserved. Since the partial VP1 gene sequences of two cloned backyard chicken CAV strains (NGR/CI-8 and NGR/CI-9) were almost identical and evolutionarily closely related to the commercial chicken strains NGR-1, and NGR-4 and NGR-5, respectively, we concluded that CAV infections had crossed the farm boundary.
The "minimum information about an environmental sequence" (MIENS) specification

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yilmaz, P.; Kottmann, R.; Field, D.

We present the Genomic Standards Consortium's (GSC) 'Minimum Information about an ENvironmental Sequence' (MIENS) standard for describing marker genes. Adoption of MIENS will enhance our ability to analyze natural genetic diversity across the Tree of Life as it is currently being documented by massive DNA sequencing efforts from myriad ecosystems in our ever-changing biosphere.
Sequence information signal processor

DOEpatents

Peterson, John C.; Chow, Edward T.; Waterman, Michael S.; Hunkapillar, Timothy J.

1999-01-01

An electronic circuit is used to compare two sequences, such as genetic sequences, to determine which alignment of the sequences produces the greatest similarity. The circuit includes a linear array of series-connected processors, each of which stores a single element from one of the sequences and compares that element with each successive element in the other sequence. For each comparison, the processor generates a scoring parameter that indicates which segment ending at those two elements produces the greatest degree of similarity between the sequences. The processor uses the scoring parameter to generate a similar scoring parameter for a comparison between the stored element and the next successive element from the other sequence. The processor also delivers the scoring parameter to the next processor in the array for use in generating a similar scoring parameter for another pair of elements. The electronic circuit determines which processor and alignment of the sequences produce the scoring parameter with the highest value.
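The abstract describes hardware, but the per-cell score it mentions (the best-scoring pair of segments ending at a given pair of elements) has the same shape as the score in a Smith-Waterman style local alignment. The following is a minimal software sketch of that recurrence, not the patented circuit: the function name and the `match`, `mismatch`, and `gap` weights are illustrative assumptions, and a simple linear gap penalty is assumed.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    Assumed (hypothetical) scoring: +match for equal elements,
    mismatch for unequal elements, gap for each inserted or deleted element.
    """
    # prev[j] = best score of any aligned segment pair ending at a[i-2], b[j-1]
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a new segment here
                prev[j - 1] + sub,  # extend by aligning a[i-1] with b[j-1]
                prev[j] + gap,      # skip an element of a
                curr[j - 1] + gap,  # skip an element of b
            )
            best = max(best, curr[j])
        prev = curr
    return best
```

In this sketch each row of the table plays roughly the role of one stored element being compared against every successive element of the other sequence; a linear array of processors would evaluate those comparisons in parallel rather than in a nested loop.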
48 CFR 1809.505-4 - Obtaining access to sensitive information.

Code of Federal Regulations, 2011 CFR

2011-10-01

... 48 Federal Acquisition Regulations System 6 2011-10-01 2011-10-01 false Obtaining access to sensitive information. 1809.505-4 Section 1809.505-4 Federal Acquisition Regulations System NATIONAL... Organizational and Consultant Conflicts of Interest 1809.505-4 Obtaining access to sensitive information. (b) In...
A DNA sequence obtained by replacement of the dopamine RNA aptamer bases is not an aptamer.

PubMed

Álvarez-Martos, Isabel; Ferapontova, Elena E

2017-08-05

A unique specificity of the aptamer-ligand biorecognition and binding facilitates bioanalysis and biosensor development, contributing to discrimination of structurally related molecules, such as dopamine and other catecholamine neurotransmitters. The aptamer sequence capable of specific binding of dopamine is a 57-nucleotide-long RNA sequence reported in 1997 (Biochemistry, 1997, 36, 9726). Later, it was suggested that the DNA homologue of the RNA aptamer retains the specificity of dopamine binding (Biochem. Biophys. Res. Commun., 2009, 388, 732). Here, we show that the DNA sequence obtained by the replacement of the RNA aptamer bases with their DNA analogues is not capable of specific biorecognition of dopamine, in contrast to the original RNA aptamer sequence. This DNA sequence binds dopamine and structurally related catecholamine neurotransmitters non-specifically, as any DNA sequence does, and, thus, is not an aptamer and can be used neither for in vivo nor for in situ analysis of dopamine in the presence of structurally related neurotransmitters. Copyright © 2017 Elsevier Inc. All rights reserved.
Properties of some monkey DNA sequences obtained by a procedure that enriches for DNA replication origins.

PubMed

Zannis-Hadjopoulos, M; Kaufmann, G; Wang, S S; Lechner, R L; Karawya, E; Hesse, J; Martin, R G

1985-07-01

Twelve clones of monkey DNA obtained by a procedure that enriches 10(3)- to 10(4)-fold for nascent sequences activated early in S phase (G. Kaufmann, M. Zannis-Hadjopoulos, and R. G. Martin, Mol. Cell. Biol. 5:721-727, 1985) have been examined. Only 2 of the 12 ors sequences (origin-enriched sequences) are unique (ors1 and ors8). Three contain the highly reiterated Alu family (ors3, ors9, and ors11). One contains the highly reiterated alpha-satellite family (ors12), but none contain the Kpn family. Those remaining contain middle repetitive sequences. Two examples of the same middle repetitive sequence were found (ors2 and ors6). Three of the middle repetitive sequences (the ors2-ors6 pair, ors5, and ors10) are moderately dispersed; one (ors4) is highly dispersed. The last, ors7, has been mapped to the bona fide replication origin of the D loop of mitochondrial DNA. Of the nine ors sequences tested, half possess snapback (intrachain reannealing) properties.

Use of Genome Sequence Information for Meat Quality Trait QTL Mining for Causal Genes and Mutations on Pig Chromosome 17

PubMed Central

Hu, Zhi-Liang; Ramos, Antonio M.; Humphray, Sean J.; Rogers, Jane; Reecy, James M.; Rothschild, Max F.

2011-01-01

The newly available pig genome sequence has provided new information to fine map quantitative trait loci (QTL) in order to eventually identify causal variants. With targeted genomic sequencing efforts, we were able to obtain high quality BAC sequences that cover a region on pig chromosome 17 where a number of meat quality QTL have been previously discovered. Sequences from 70 BAC clones were assembled to form an 8-Mbp contig. Subsequently, we successfully mapped five previously identified QTL, three for meat color and two for lactate related traits, to the contig. With an additional 25 genetic markers that were identified by sequence comparison, we were able to carry out further linkage disequilibrium analysis to narrow down the genomic locations of these QTL, which allowed identification of the chromosomal regions that likely contain the causative variants. This research has provided one practical approach to combine genetic and molecular information for QTL mining. PMID:22303339
Proprioceptive coordination of movement sequences: role of velocity and position information.

PubMed

Cordo, P; Carlton, L; Bevan, L; Carlton, M; Kerr, G K

1994-05-01

1. Recent studies have shown that the CNS uses proprioceptive information to coordinate multijoint movement sequences; proprioceptive input related to the kinematics of one joint rotation in a movement sequence can be used to trigger a subsequent joint rotation. In this paper we adopt a broad definition of "proprioception," which includes all somatosensory information related to joint posture and kinematics. This paper addresses how the CNS uses proprioceptive information related to the velocity and position of joints to coordinate multijoint movement sequences. 2. Normal human subjects sat at an experimental apparatus and performed a movement sequence with the right arm without visual feedback. The apparatus passively rotated the right elbow horizontally in the extension direction with either a constant velocity trajectory or an unpredictable velocity trajectory. The subjects' task was to briskly open the right hand when the elbow passed through a prescribed target position, similar to backhand throwing in the horizontal plane. The randomization of elbow velocities and the absence of visual information were used to discourage subjects from using any information other than proprioceptive input to perform the task. 3. Our results indicate that the CNS is able to extract the necessary kinematic information from proprioceptive input to trigger the hand opening at the correct elbow position. We estimated the minimal sensory conduction and processing delay to be 150 ms, and on the basis of this estimate, we predicted the expected performance with different degrees of reduced proprioceptive information. These predictions were compared with the subjects' actual performances, revealing that the CNS was using proprioceptive input related to joint velocity in this motor task. To determine whether position information was also being used, we examined the subjects' performances with unpredictable velocity trajectories. The results from experiments with unpredictable velocity
Sequence information signal processor for local and global string comparisons

DOEpatents

Peterson, John C.; Chow, Edward T.; Waterman, Michael S.; Hunkapillar, Timothy J.

1997-01-01

A sequence information signal processing integrated circuit chip is designed to perform high speed calculation of a dynamic programming algorithm based upon the algorithm defined by Waterman and Smith. The signal processing chip of the present invention is designed to be a building block of a linear systolic array, the performance of which can be increased by connecting additional sequence information signal processing chips to the array. The chip provides a high speed, low cost linear array processor that can locate highly similar global sequences or segments thereof such as contiguous subsequences from two different DNA or protein sequences. The chip is implemented in a preferred embodiment using CMOS VLSI technology to provide the equivalent of about 400,000 transistors or 100,000 gates. Each chip provides 16 processing elements, and is designed to provide 16 bit, two's complement operation for maximum score precision of between -32,768 and +32,767. It is designed to provide a comparison between sequences as long as 4,194,304 elements without external software and between sequences of unlimited numbers of elements with the aid of external software. Each sequence can be assigned different deletion and insertion weight functions. Each processor is provided with a similarity measure device which is independently variable. Thus, each processor can contribute to maximum value score calculation using a different similarity measure.
Draft genome sequences of 9 LA-MRSA ST5 isolates obtained from humans after short term swine contact

USDA-ARS's Scientific Manuscript database

Livestock associated methicillin resistant Staphylococcus aureus (LA-MRSA) sequence type 5 have raised concerns surrounding the potential for these isolates to colonize or cause disease in humans with swine contact. Here, we report draft genome sequences for 9 LA-MRSA ST5 isolates obtained from huma...
12 CFR 1202.3 - What information can I obtain through FOIA?

Code of Federal Regulations, 2010 CFR

2010-01-01

... 12 Banks and Banking 7 2010-01-01 2010-01-01 false What information can I obtain through FOIA? 1202.3 Section 1202.3 Banks and Banking FEDERAL HOUSING FINANCE AGENCY ORGANIZATION AND OPERATIONS FREEDOM OF INFORMATION ACT § 1202.3 What information can I obtain through FOIA? (a) General. FHFA...
Information Avoidance Tendencies, Threat Management Resources, and Interest in Genetic Sequencing Feedback.

PubMed

Taber, Jennifer M; Klein, William M P; Ferrer, Rebecca A; Lewis, Katie L; Harris, Peter R; Shepperd, James A; Biesecker, Leslie G

2015-08-01

Information avoidance is a defensive strategy that undermines receipt of potentially beneficial but threatening health information and may especially occur when threat management resources are unavailable. We examined whether individual differences in information avoidance predicted intentions to receive genetic sequencing results for preventable and unpreventable (i.e., more threatening) disease and, secondarily, whether threat management resources of self-affirmation or optimism mitigated any effects. Participants (N = 493) in an NIH study (ClinSeq®) piloting the use of genome sequencing reported intentions to receive (optional) sequencing results and completed individual difference measures of information avoidance, self-affirmation, and optimism. Information avoidance tendencies corresponded with lower intentions to learn results, particularly for unpreventable diseases. The association was weaker among individuals higher in self-affirmation or optimism, but only for results regarding preventable diseases. Information avoidance tendencies may influence decisions to receive threatening health information; threat management resources hold promise for mitigating this association.
AMPLIFICATION OF RIBOSOMAL RNA SEQUENCES

EPA Science Inventory

This book chapter offers an overview of the use of ribosomal RNA sequences. A history of the technology traces the evolution of techniques to measure bacterial phylogenetic relationships and recent advances in obtaining rRNA sequence information. The manual also describes procedu...
48 CFR 1509.505-4 - Obtaining access to proprietary information.

Code of Federal Regulations, 2010 CFR

2010-10-01

... PROTECTION AGENCY ACQUISITION PLANNING CONTRACTOR QUALIFICATIONS Organizational Conflicts of Interests 1509.505-4 Obtaining access to proprietary information. Contractors gaining access to confidential business... business information. ...
Sequencing actions: an information-search study of tradeoffs of priorities against spatiotemporal constraints.

PubMed

Gärling, T

1996-09-01

How people choose between sequences of actions was investigated in an everyday errand-planning task. In this task subjects chose the preferred sequence of performing a number of errands in a fictitious environment. Two experiments were conducted with undergraduate students serving as subjects. One group searched information about each alternative. The same information was directly available to another group. In Experiment 1 the results showed that for two errands subjects took into account all attributes describing the errands, thus suggesting a tradeoff between priority, wait time, and travel distance with priority being the most important. Consistent with this finding predominantly intraalternative information search was observed. These results were replicated in Experiment 2 for three errands. In addition choice outcomes, information search, and sequence of responding suggested that for more than two actions sequence choices are made in stages.
Whole-Genome Sequences of Cronobacter sakazakii Isolates Obtained from Foods of Plant Origin and Dried-Food Manufacturing Environments.

PubMed

Jang, Hyein; Addy, Nicole; Ewing, Laura; Jean-Gilles Beaubrun, Junia; Lee, YouYoung; Woo, JungHa; Negrete, Flavia; Finkelstein, Samantha; Tall, Ben D; Lehner, Angelika; Eshwar, Athmanya; Gopinath, Gopal R

2018-04-12

Here, we present draft genome sequences of 29 Cronobacter sakazakii isolates obtained from foods of plant origin and dried-food manufacturing facilities. Assemblies and annotations resulted in genome sizes ranging from 4.3 to 4.5 Mb and 3,977 to 4,256 gene-coding sequences with G+C contents of ∼57.0%.
An improved and validated RNA HLA class I SBT approach for obtaining full length coding sequences.

PubMed

Gerritsen, K E H; Olieslagers, T I; Groeneweg, M; Voorter, C E M; Tilanus, M G J

2014-11-01

The functional relevance of human leukocyte antigen (HLA) class I allele polymorphism beyond exons 2 and 3 is difficult to address because more than 70% of the HLA class I alleles are defined by exons 2 and 3 sequences only. For routine application on clinical samples we improved and validated the HLA sequence-based typing (SBT) approach based on RNA templates, using either a single locus-specific or two overlapping group-specific polymerase chain reaction (PCR) amplifications, with three forward and three reverse sequencing reactions for full length sequencing. Locus-specific HLA typing with RNA SBT of a reference panel, representing the major antigen groups, showed identical results compared to DNA SBT typing. Alleles encountered with unknown exons in the IMGT/HLA database and three samples, two with Null and one with a Low expressed allele, have been addressed by the group-specific RNA SBT approach to obtain full length coding sequences. This RNA SBT approach has proven its value in our routine full length definition of alleles. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Whole-Genome Sequences of Cronobacter sakazakii Isolates Obtained from Foods of Plant Origin and Dried-Food Manufacturing Environments

PubMed Central

Addy, Nicole; Ewing, Laura; Jean-Gilles Beaubrun, Junia; Lee, YouYoung; Woo, JungHa; Negrete, Flavia; Finkelstein, Samantha; Tall, Ben D.; Lehner, Angelika; Eshwar, Athmanya; Gopinath, Gopal R.

2018-01-01

ABSTRACT Here, we present draft genome sequences of 29 Cronobacter sakazakii isolates obtained from foods of plant origin and dried-food manufacturing facilities. Assemblies and annotations resulted in genome sizes ranging from 4.3 to 4.5 Mb and 3,977 to 4,256 gene-coding sequences with G+C contents of ∼57.0%. PMID:29650569
21 CFR 20.109 - Data and information obtained by contract.

Code of Federal Regulations, 2010 CFR

2010-04-01

... 21 Food and Drugs 1 2010-04-01 2010-04-01 false Data and information obtained by contract. 20.109 Section 20.109 Food and Drugs FOOD AND DRUG ADMINISTRATION, DEPARTMENT OF HEALTH AND HUMAN SERVICES GENERAL PUBLIC INFORMATION Availability of Specific Categories of Records § 20.109 Data and information...
Poliovirus serotype-specific VP1 sequencing primers.

PubMed

Kilpatrick, David R; Iber, Jane C; Chen, Qi; Ching, Karen; Yang, Su-Ju; De, Lina; Mandelbaum, Mark D; Emery, Brian; Campagnoli, Ray; Burns, Cara C; Kew, Olen

2011-06-01

The Global Polio Laboratory Network routinely uses poliovirus-specific PCR primers and probes to determine the serotype and genotype of poliovirus isolates obtained as part of global poliovirus surveillance. To provide detailed molecular epidemiologic information, poliovirus isolates are further characterized by sequencing the ~900-nucleotide region encoding the major capsid protein, VP1. It is difficult to obtain quality sequence information when clinical or environmental samples contain poliovirus mixtures. As an alternative to conventional methods for resolving poliovirus mixtures, sets of serotype-specific primers were developed for amplifying and sequencing the VP1 regions of individual components of mixed populations of vaccine-vaccine, vaccine-wild, and wild-wild polioviruses. Published by Elsevier B.V.
Primer development to obtain complete coding sequence of HA and NA genes of influenza A/H3N2 virus.

PubMed

Agustiningsih, Agustiningsih; Trimarsanto, Hidayat; Setiawaty, Vivi; Artika, I Made; Muljono, David Handojo

2016-08-30

Influenza is an acute respiratory illness and has become a serious public health problem worldwide. The need to study the HA and NA genes in influenza A virus is essential since these genes frequently undergo mutations. This study describes the development of primer sets for RT-PCR to obtain the complete coding sequences of the Hemagglutinin (HA) and Neuraminidase (NA) genes of influenza A/H3N2 virus from Indonesia. The primers were developed based on influenza A/H3N2 sequences worldwide from the Global Initiative on Sharing All Influenza Data (GISAID) and further tested using Indonesian influenza A/H3N2 archived samples of influenza-like illness (ILI) surveillance from 2008 to 2009. An optimum RT-PCR condition was acquired for all HA and NA fragments designed to cover the complete coding sequences of the HA and NA genes. A total of 71 samples were successfully sequenced for the complete coding sequences of both HA and NA genes out of 145 samples of influenza A/H3N2 tested. The developed primer sets were suitable for obtaining complete coding sequences of HA and NA genes of Indonesian samples from 2008 to 2009.
Applications of statistical physics and information theory to the analysis of DNA sequences

NASA Astrophysics Data System (ADS)

Grosse, Ivo

2000-10-01

DNA carries the genetic information of most living organisms, and the goal of genome projects is to uncover that genetic information. One basic task in the analysis of DNA sequences is the recognition of protein coding genes. Powerful computer programs for gene recognition have been developed, but most of them are based on statistical patterns that vary from species to species. In this thesis I address the question of whether there exist universal statistical patterns that are different in coding and noncoding DNA of all living species, regardless of their phylogenetic origin. In search of such species-independent patterns I study the mutual information function of genomic DNA sequences, and find that it shows persistent period-three oscillations. To understand the biological origin of the observed period-three oscillations, I compare the mutual information function of genomic DNA sequences to the mutual information function of stochastic model sequences. I find that the pseudo-exon model is able to reproduce the mutual information function of genomic DNA sequences. Moreover, I find that a generalization of the pseudo-exon model can connect the existence and the functional form of long-range correlations to the presence and the length distributions of coding and noncoding regions. Based on these theoretical studies I am able to find an information-theoretical quantity, the average mutual information (AMI), whose probability distributions are significantly different in coding and noncoding DNA, while they are almost identical in all studied species. These findings show that there exist universal statistical patterns that are different in coding and noncoding DNA of all studied species, and they suggest that the AMI may be used to identify genes in different living species, irrespective of their taxonomic origin.
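The mutual information function mentioned in this abstract can be illustrated with a short sketch. It assumes the usual textbook definition, I(k) = sum over symbol pairs (x, y) of p_xy(k) * log2(p_xy(k) / (p_x * p_y)), where p_xy(k) is the frequency of finding x and y separated by k positions; the thesis may use a different estimator or corrections for finite sequence length, and the function name here is hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(seq, k):
    """Mutual information (in bits) between symbols k positions apart.

    Plain plug-in estimate from observed pair counts; no finite-size
    correction is applied (an assumption of this sketch).
    """
    pairs = [(seq[i], seq[i + k]) for i in range(len(seq) - k)]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(x for x, _ in pairs)    # marginal counts of the first symbol
    right = Counter(y for _, y in pairs)   # marginal counts of the second symbol
    mi = 0.0
    for (x, y), c in joint.items():
        # p_xy / (p_x * p_y) simplifies to c * n / (left[x] * right[y])
        mi += (c / n) * log2(c * n / (left[x] * right[y]))
    return mi
```

Evaluating `[mutual_information(seq, k) for k in range(1, 30)]` on a genomic sequence gives the profile in which a period-three oscillation of the kind described above would show up as peaks at k = 3, 6, 9, and so on.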
Genotype calling from next-generation sequencing data using haplotype information of reads

PubMed Central

Zhi, Degui; Wu, Jihua; Liu, Nianjun; Zhang, Kui

2012-01-01

Motivation: Low coverage sequencing provides an economic strategy for whole genome sequencing. When sequencing a set of individuals, genotype calling can be challenging due to low sequencing coverage. Linkage disequilibrium (LD) based refinement of genotype calling is essential to improve the accuracy. Current LD-based methods use read counts or genotype likelihoods at individual potential polymorphic sites (PPSs). Reads that span multiple PPSs (jumping reads) can provide additional haplotype information overlooked by current methods. Results: In this article, we introduce a new Hidden Markov Model (HMM)-based method that can take into account jumping reads information across adjacent PPSs and implement it in the HapSeq program. Our method extends the HMM in Thunder and explicitly models jumping reads information as emission probabilities conditional on the states of adjacent PPSs. Our simulation results show that, compared to Thunder, HapSeq reduces the genotyping error rate by 30%, from 0.86% to 0.60%. The results from the 1000 Genomes Project show that HapSeq reduces the genotyping error rate by 12 and 9%, from 2.24% and 2.76% to 1.97% and 2.50% for individuals with European and African ancestry, respectively. We expect our program can improve genotyping qualities of the large number of ongoing and planned whole genome sequencing projects. Contact: dzhi@ms.soph.uab.edu; kzhang@ms.soph.uab.edu Availability: The software package HapSeq and its manual can be found and downloaded at www.ssg.uab.edu/hapseq/. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22285565
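The quoted reductions are relative, not absolute, changes in error rate. As a small worked check, assuming the standard definition (old - new) / old, the figures in the abstract can be recomputed as follows; the helper name is hypothetical.

```python
def relative_reduction(old, new):
    """Fractional reduction when an error rate drops from old to new."""
    return (old - new) / old


# Error rates (in percent) as reported in the abstract above.
simulated = relative_reduction(0.86, 0.60)
european = relative_reduction(2.24, 1.97)
african = relative_reduction(2.76, 2.50)

for label, value in [("simulated", simulated),
                     ("European ancestry", european),
                     ("African ancestry", african)]:
    print(f"{label}: {value * 100:.1f}% relative reduction")
```

Rounded to whole percentages, these values agree with the 30%, 12%, and 9% stated in the abstract.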
Cryptic species revealed by molecular phylogenetic analysis of sequences obtained from basidiomata of Tulasnella.

PubMed

Cruz, Darío; Suárez, Juan Pablo; Kottke, Ingrid; Piepenbring, Meike

2014-01-01

Delimitation of species and the search for a proper threshold for defining phylogenetic species in fungi are under discussion. In this study, morphological and molecular data are correlated to delimit species of Tulasnella, the most important mycobionts of Orchidaceae, which suffer from poor taxonomy. Resupinate basidiomata of Tulasnella species were collected in Ecuador and Germany, and 11 specimens (seven from Ecuador, four from Germany) were assigned to traditional species concepts by use of morphological keys. The specimens were compared by micro-anatomical examination with 75 specimens of Tulasnella borrowed from fungaria to obtain better insights on variation of characters. Sequences of the ITS region (127) were obtained after cloning from the fresh basidiomata and from pure cultures. Proportional variability of ITS sequences was analyzed within and among the cultures and the specimens designated to different morphospecies. Results suggested an intragenomic variation of less than 2%, an intraspecific variation of up to 4% and an interspecific divergence of more than 9% in Tulasnella. Cryptic species in Tulasnella, mostly from Ecuador, were revealed by phylogenetic analyses with 4% intraspecific divergence as a minimum threshold for delimiting species. Conventional diagnostic morphological characters appeared insufficient for species characterization. Arguments are presented for molecular delimitation of the established species Tulasnella albida, T. asymmetrica, T. eichleriana, T. cf. pinicola, T. tomaculum and T. violea. © 2014 by The Mycological Society of America.
A method for partitioning the information contained in a protein sequence between its structure and function.

PubMed

Possenti, Andrea; Vendruscolo, Michele; Camilloni, Carlo; Tiana, Guido

2018-05-23

Proteins employ the information stored in the genetic code and translated into their sequences to carry out well-defined functions in the cellular environment. The possibility to encode for such functions is controlled by the balance between the amount of information supplied by the sequence and that left after the protein has folded into its structure. We study the amount of information necessary to specify the protein structure, providing an estimate that takes into account the thermodynamic properties of protein folding. We thus show that the information remaining in the protein sequence after encoding for its structure (the 'information gap') is very close to what is needed to encode for its function and interactions. Then, by predicting the information gap directly from the protein sequence, we show that it may be possible to use these insights from information theory to discriminate between ordered and disordered proteins, to identify unknown functions, and to optimize artificially-designed protein sequences. This article is protected by copyright. All rights reserved. © 2018 Wiley Periodicals, Inc.
Information performances and illative sequences: Sequential organization of explanations of chemical phase equilibrium

NASA Astrophysics Data System (ADS)

Brown, Nathaniel James Swanton

While there is consensus that conceptual change is surprisingly difficult, many competing theories of conceptual change co-exist in the literature. This dissertation argues that this discord is partly the result of an inadequate account of the unwritten rules of human social interaction that underlie the field's preferred methodology, semi-structured interviewing. To better understand the contributions of interaction during explanations, I analyze eight undergraduate general chemistry students as they attempt to explain to various people, for various reasons, why phenomena involving chemical phase equilibrium occur. Using the methods of interaction analysis, I characterize the unwritten, but systematic, rules that these participants follow as they explain. The result is a description of the contributions of interaction to explaining. Each step in each explanation is a jointly performed expression of a subject-predicate relation, an interactive accomplishment I call an information performance (in-form, for short). Unlike clauses, in-forms need not have a coherent grammatical structure. Unlike speaker turns, in-forms have the clear function of expressing information. Unlike both clauses and speaker turns, in-forms are a co-construction, jointly performed by both the primary speaker and the other interlocutor. The other interlocutor strongly affects the form and content of each explanation by giving or withholding feedback at the end of each in-form, moments I call feedback-relevant places. While in-forms are the bricks out of which the explanation is constructed, they are secured by a series of inferential links I call an illative sequence. Illative sequences are forward-searching, starting with a remembered fact or observation and following a chain of inferences in the hope it leads to the target phenomenon. The participants treat an explanation as a success if the illative sequence generates an in-form that describes the phenomenon. If the illative sequence does

Conservation of Shannon's redundancy for proteins. [information theory applied to amino acid sequences]

NASA Technical Reports Server (NTRS)

Gatlin, L. L.

1974-01-01

Concepts of information theory are applied to examine various proteins in terms of their redundancy in natural originators such as animals and plants. The Monte Carlo method is used to derive information parameters for random protein sequences. Real protein sequence parameters are compared with the standard parameters of protein sequences having a specific length. The tendency of a chain to contain some amino acids more frequently than others and the tendency of a chain to contain certain amino acid pairs more frequently than other pairs are used as randomness measures of individual protein sequences. Non-periodic proteins are generally found to have random Shannon redundancies except in cases of constraints due to short chain length and genetic codes. Redundant characteristics of highly periodic proteins are discussed. A degree of periodicity parameter is derived.
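The two randomness measures named in this abstract (uneven amino acid frequencies and uneven adjacent-pair frequencies) can be sketched with ordinary Shannon entropies. This is an illustration under stated assumptions rather than the paper's exact parameters: redundancy is taken as 1 - H/H_max over an assumed alphabet of 20 amino acids, pair dependence is taken as the mutual information between adjacent residues, and all function names are hypothetical.

```python
from collections import Counter
from math import log2

ALPHABET_SIZE = 20  # assumption: the 20 standard amino acids


def entropy(counts):
    """Shannon entropy in bits of the distribution given by a Counter."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def composition_redundancy(seq):
    """1 - H1/Hmax: how far residue frequencies deviate from uniform."""
    return 1.0 - entropy(Counter(seq)) / log2(ALPHABET_SIZE)


def pair_dependence(seq):
    """H(next) - H(next | current) in bits for adjacent residues."""
    pairs = Counter(zip(seq, seq[1:]))
    firsts = Counter(seq[:-1])
    h_conditional = entropy(pairs) - entropy(firsts)  # H(X, Y) - H(X)
    return entropy(Counter(seq[1:])) - h_conditional
```

To compare a real protein against random expectations in the spirit of the Monte Carlo approach mentioned above, one could shuffle the sequence many times (for example with `random.sample(seq, len(seq))`) and compare `pair_dependence` of the original with the distribution over shuffles; composition is unchanged by shuffling, so only the pair measure is informative there.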
Securing recruitment and obtaining informed consent in minority ethnic groups in the UK.

PubMed

Lloyd, Cathy E; Johnson, Mark R D; Mughal, Shanaz; Sturt, Jackie A; Collins, Gary S; Roy, Tapash; Bibi, Rukhsana; Barnett, Anthony H

2008-03-30

Previous health research has often explicitly excluded individuals from minority ethnic backgrounds due to perceived cultural and communication difficulties, including studies where there might be language/literacy problems in obtaining informed consent. This study addressed these difficulties by developing audio-recorded methods of obtaining informed consent and recording data. This report outlines 1) our experiences with securing recruitment to a qualitative study investigating alternative methods of data collection, and 2) the development of a standardised process for obtaining informed consent from individuals from minority ethnic backgrounds whose main language does not have an agreed written form. Two researchers from South Asian backgrounds recruited adults with Type 2 diabetes whose main language was spoken and not written, to attend a series of focus groups. A screening tool was used at recruitment in order to assess literacy skills in potential participants. Informed consent was obtained using audio-recordings of the patient information and recording patients' verbal consent. Participants' perceptions of this method of obtaining consent were recorded. Recruitment rates were improved by using telephone compared to face-to-face methods. The screening tool was found to be acceptable by all potential participants. Audio-recorded methods of obtaining informed consent were easy to implement and accepted by all participants. Attrition rates differed according to ethnic group. Snowballing techniques only partly improved participation rates. Audio-recorded methods of obtaining informed consent are an acceptable alternative to written consent in study populations where literacy skills are variable. Further exploration of issues relating to attrition is required, and a range of methods may be necessary in order to maximise response and participation rates.
50 CFR 23.86 - How can I obtain information on a CoP?

Code of Federal Regulations, 2014 CFR

2014-10-01

... 50 Wildlife and Fisheries 9 2014-10-01 2014-10-01 false How can I obtain information on a CoP? 23... FAUNA AND FLORA (CITES) CITES Administration § 23.86 How can I obtain information on a CoP? As we receive information on an upcoming CoP from the CITES Secretariat, we will notify the public either...
50 CFR 23.86 - How can I obtain information on a CoP?

Code of Federal Regulations, 2010 CFR

2010-10-01

... 50 Wildlife and Fisheries 6 2010-10-01 2010-10-01 false How can I obtain information on a CoP? 23... FAUNA AND FLORA (CITES) CITES Administration § 23.86 How can I obtain information on a CoP? As we receive information on an upcoming CoP from the CITES Secretariat, we will notify the public either...
50 CFR 23.86 - How can I obtain information on a CoP?

Code of Federal Regulations, 2012 CFR

2012-10-01

... 50 Wildlife and Fisheries 9 2012-10-01 2012-10-01 false How can I obtain information on a CoP? 23... FAUNA AND FLORA (CITES) CITES Administration § 23.86 How can I obtain information on a CoP? As we receive information on an upcoming CoP from the CITES Secretariat, we will notify the public either...
50 CFR 23.86 - How can I obtain information on a CoP?

Code of Federal Regulations, 2013 CFR

2013-10-01

... 50 Wildlife and Fisheries 9 2013-10-01 2013-10-01 false How can I obtain information on a CoP? 23... FAUNA AND FLORA (CITES) CITES Administration § 23.86 How can I obtain information on a CoP? As we receive information on an upcoming CoP from the CITES Secretariat, we will notify the public either...
50 CFR 23.86 - How can I obtain information on a CoP?

Code of Federal Regulations, 2011 CFR

2011-10-01

... 50 Wildlife and Fisheries 8 2011-10-01 2011-10-01 false How can I obtain information on a CoP? 23... FAUNA AND FLORA (CITES) CITES Administration § 23.86 How can I obtain information on a CoP? As we receive information on an upcoming CoP from the CITES Secretariat, we will notify the public either...
12 CFR 232.3 - Financial information exception for obtaining and using medical information.

Code of Federal Regulations, 2010 CFR

2010-01-01

... that the debt is current and that the consumer has no delinquencies in her repayment history. If the..., mental, or behavioral health, condition or history, type of treatment, or prognosis into account as part... example, to obtain and use information about: (i) The dollar amount, repayment terms, repayment history...
Highly Informative Simple Sequence Repeat (SSR) Markers for Fingerprinting Hazelnut

USDA-ARS's Scientific Manuscript database

Simple sequence repeat (SSR) or microsatellite markers have many applications in breeding and genetic studies of plants, including fingerprinting of cultivars and investigations of genetic diversity, and therefore provide information for better management of germplasm collections. They are repeatab...
40 CFR 2.311 - Special rules governing certain information obtained under the Motor Vehicle Information and Cost...

Code of Federal Regulations, 2010 CFR

2010-07-01

... 40 Protection of Environment 1 2010-07-01 2010-07-01 false Special rules governing certain information obtained under the Motor Vehicle Information and Cost Savings Act. 2.311 Section 2.311 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY GENERAL PUBLIC INFORMATION Confidentiality of Business Information § 2.311 Special rules governing...
Studying long 16S rDNA sequences with ultrafast-metagenomic sequence classification using exact alignments (Kraken).

PubMed

Valenzuela-González, Fabiola; Martínez-Porchas, Marcel; Villalpando-Canchola, Enrique; Vargas-Albores, Francisco

2016-03-01

Ultrafast-metagenomic sequence classification using exact alignments (Kraken) is a novel approach to classifying 16S rDNA sequences. The classifier is based on mapping short sequences to the lowest common ancestor and performing alignments to form subtrees with specific weights in each taxon node. This study aimed to evaluate the classification performance of Kraken with long 16S rDNA random environmental sequences produced by cloning followed by Sanger sequencing. A total of 480 clones were isolated and expanded, and 264 of these clones formed contigs (1352 ± 153 bp). The same sequences were analyzed using the Ribosomal Database Project (RDP) classifier. Kraken classified more deeply than the RDP classifier: 73% of the contigs were classified up to the species or variety levels, whereas 67% of these contigs were classified no further than the genus level by the RDP. The results also demonstrated that unassembled sequences analyzed by Kraken provide similar or even deeper information. Moreover, sequences that did not form contigs, which are usually discarded by other programs, provided meaningful information when analyzed by Kraken. Finally, it appears that the assembly step for Sanger sequences can be eliminated when using Kraken. Kraken accumulates the information from both sequence senses, providing additional elements for the classification. In conclusion, the results demonstrate that Kraken is an excellent choice for the taxonomic assignment of sequences obtained by Sanger sequencing or by third generation sequencing, whose main goal is to generate longer sequences. Copyright © 2016 Elsevier B.V. All rights reserved.
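The classification idea described in the abstract above (k-mers mapped to a lowest common ancestor, then weighted taxon nodes) can be illustrated with a toy sketch. This is not Kraken's code or API: the taxonomy, the k-mer table, the value of K and the function names are all hypothetical, and real databases use much longer k-mers.

```python
from collections import Counter

# Hypothetical toy taxonomy: child -> parent (the root has parent None).
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Vibrio": "Bacteria",
    "Vibrio cholerae": "Vibrio",
    "Vibrio harveyi": "Vibrio",
}

# Hypothetical k-mer table: k-mer -> lowest common ancestor of the genomes containing it.
K = 4
KMER_TO_LCA = {
    "ACGT": "Vibrio",
    "CGTA": "Vibrio cholerae",
    "GTAC": "Vibrio cholerae",
    "TACG": "Bacteria",
}


def lineage(taxon):
    """Return the path from a taxon up to the root, inclusive."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path


def classify(read):
    """Assign a read to the hit taxon whose root-to-node path carries the most k-mer hits."""
    hits = Counter()
    for i in range(len(read) - K + 1):
        taxon = KMER_TO_LCA.get(read[i:i + K])
        if taxon is not None:
            hits[taxon] += 1
    if not hits:
        return None  # no k-mer found in the table: unclassified

    def path_score(taxon):
        # Sum the node weights (hit counts) along the lineage of this taxon.
        return sum(hits[t] for t in lineage(taxon))

    # Ties are broken in favour of the deeper (more specific) taxon.
    return max(hits, key=lambda t: (path_score(t), len(lineage(t))))


print(classify("ACGTACG"))
```

A read with no k-mer in the table stays unclassified, so even short unassembled reads can be placed as soon as a few of their k-mers are informative.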
SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information

PubMed Central

2014-01-01

Background: The recent introduction of the Pacific Biosciences RS single molecule sequencing technology has opened new doors to scaffolding genome assemblies in a cost-effective manner. The long read sequence information promises to enhance the quality of incomplete and inaccurate draft assemblies constructed from Next Generation Sequencing (NGS) data. Results: Here we propose a novel hybrid assembly methodology that aims to scaffold pre-assembled contigs in an iterative manner using PacBio RS long read information as a backbone. On a test set comprising six bacterial draft genomes, assembled using either a single Illumina MiSeq or Roche 454 library, we show that even a 50× coverage of uncorrected PacBio RS long reads is sufficient to drastically reduce the number of contigs. Comparisons to the AHA scaffolder indicate that our strategy is better able to produce (nearly) complete bacterial genomes. Conclusions: The current work describes our SSPACE-LongRead software, which is designed to upgrade incomplete draft genomes using single molecule sequences. We conclude that the recent advances in PacBio sequencing technology and chemistry, in combination with the limited computational resources required to run our program, allow genomes to be scaffolded in a fast and reliable manner. PMID:24950923
40 CFR 166.34 - EPA review of information obtained in connection with emergency exemptions.

Code of Federal Regulations, 2010 CFR

2010-07-01

... 40 Protection of Environment 23 2010-07-01 2010-07-01 false EPA review of information obtained in... PESTICIDES UNDER EMERGENCY CONDITIONS Specific, Quarantine, and Public Health Exemptions § 166.34 EPA review of information obtained in connection with emergency exemptions. EPA shall review information...
Modeling genome coverage in single-cell sequencing

PubMed Central

Daley, Timothy; Smith, Andrew D.

2014-01-01

Motivation: Single-cell DNA sequencing is necessary for examining genetic variation at the cellular level, which remains hidden in bulk sequencing experiments. But because single-cell experiments begin with such small amounts of starting material, the amount of information obtained from a single-cell sequencing experiment is highly sensitive to the choice of protocol employed and to variability in library preparation. In particular, the fraction of the genome represented in single-cell sequencing libraries exhibits extreme variability due to quantitative biases in amplification and loss of genetic material. Results: We propose a method to predict the genome coverage of a deep sequencing experiment using information from an initial shallow sequencing experiment mapped to a reference genome. The observed coverage statistics are used in a non-parametric empirical Bayes Poisson model to estimate the gain in coverage from deeper sequencing. This approach allows researchers to know statistical features of deep sequencing experiments without actually sequencing deeply, providing a basis for optimizing and comparing single-cell sequencing protocols or screening libraries. Availability and implementation: The method is available as part of the preseq software package. Source code is available at http://smithlabresearch.org/preseq. Contact: andrewds@usc.edu Supplementary information: Supplementary material is available at Bioinformatics online. PMID:25107873
Construction of an Ostrea edulis database from genomic and expressed sequence tags (ESTs) obtained from Bonamia ostreae infected haemocytes: Development of an immune-enriched oligo-microarray.

PubMed

Pardo, Belén G; Álvarez-Dios, José Antonio; Cao, Asunción; Ramilo, Andrea; Gómez-Tato, Antonio; Planas, Josep V; Villalba, Antonio; Martínez, Paulino

2016-12-01

The flat oyster, Ostrea edulis, is one of the main farmed oysters, not only in Europe but also in the United States and Canada. Bonamiosis due to the parasite Bonamia ostreae has been associated with high mortality episodes in this species. This parasite is an intracellular protozoan that infects haemocytes, the main cells involved in oyster defence. Due to the economic and ecological importance of the flat oyster, genomic data are badly needed for genetic improvement of the species, but they are still very scarce. The objective of this study is to develop a sequence database, OedulisDB, with new genomic and transcriptomic resources, providing new data and convenient tools to improve our knowledge of the oyster's immune mechanisms. Transcriptomic and genomic sequences were obtained using 454 pyrosequencing and compiled into an O. edulis database, OedulisDB, consisting of two sets of 10,318 and 7159 unique sequences that represent the oyster's genome (WG) and de novo haemocyte transcriptome (HT), respectively. The flat oyster transcriptome was obtained from two strains (naïve and tolerant) challenged with B. ostreae, and from their corresponding non-challenged controls. Approximately 78.5% of 5619 HT unique sequences were successfully annotated by Blast search using public databases. A total of 984 sequences were identified as being related to immune response and several key immune genes were identified for the first time in flat oyster. Additionally, transcriptome information was used to design and validate the first oligo-microarray in flat oyster enriched with immune sequences from haemocytes. Our transcriptomic and genomic sequencing and subsequent annotation have greatly increased the scarce resources available for this economically important species and have enabled us to develop an OedulisDB database and accompanying tools for gene expression analysis. This study represents the first attempt to characterize in depth the O. edulis haemocyte transcriptome in
Pennsylvania StreamStats--A web-based application for obtaining water-resource-related information

USGS Publications Warehouse

Stuckey, Marla H.; Hoffman, Scott A.

2010-01-01

StreamStats is a national web-based Geographic Information System (GIS) application, developed by the U.S. Geological Survey (USGS), in cooperation with Environmental Systems Research Institute, Inc., to provide a variety of water-resource-related information. Users can easily obtain descriptive information, basin characteristics, and streamflow statistics for USGS streamgages and ungaged stream locations throughout Pennsylvania. StreamStats also allows users to search upstream and (or) downstream from user-selected points to identify locations of and obtain information for water-resource-related activities, such as dams and streamgages.
32 CFR Appendix G to Part 275 - Releasing Information Obtained From Financial Institutions

Code of Federal Regulations, 2013 CFR

2013-07-01

... 32 National Defense 2 2013-07-01 2013-07-01 false Releasing Information Obtained From Financial Institutions G Appendix G to Part 275 National Defense Department of Defense (Continued) OFFICE OF THE... FINANCIAL PRIVACY ACT OF 1978 Pt. 275, App. G Appendix G to Part 275—Releasing Information Obtained From...
32 CFR Appendix G to Part 275 - Releasing Information Obtained From Financial Institutions

Code of Federal Regulations, 2011 CFR

2011-07-01

... 32 National Defense 2 2011-07-01 2011-07-01 false Releasing Information Obtained From Financial Institutions G Appendix G to Part 275 National Defense Department of Defense (Continued) OFFICE OF THE... FINANCIAL PRIVACY ACT OF 1978 Pt. 275, App. G Appendix G to Part 275—Releasing Information Obtained From...
32 CFR Appendix G to Part 275 - Releasing Information Obtained From Financial Institutions

Code of Federal Regulations, 2012 CFR

2012-07-01

... 32 National Defense 2 2012-07-01 2012-07-01 false Releasing Information Obtained From Financial Institutions G Appendix G to Part 275 National Defense Department of Defense (Continued) OFFICE OF THE... FINANCIAL PRIVACY ACT OF 1978 Pt. 275, App. G Appendix G to Part 275—Releasing Information Obtained From...
32 CFR Appendix G to Part 275 - Releasing Information Obtained From Financial Institutions

Code of Federal Regulations, 2014 CFR

2014-07-01

... 32 National Defense 2 2014-07-01 2014-07-01 false Releasing Information Obtained From Financial Institutions G Appendix G to Part 275 National Defense Department of Defense (Continued) OFFICE OF THE... FINANCIAL PRIVACY ACT OF 1978 Pt. 275, App. G Appendix G to Part 275—Releasing Information Obtained From...

Studies of a biochemical factory: tomato trichome deep expressed sequence tag sequencing and proteomics.

PubMed

Schilmiller, Anthony L; Miner, Dennis P; Larson, Matthew; McDowell, Eric; Gang, David R; Wilkerson, Curtis; Last, Robert L

2010-07-01

Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces beta-caryophyllene and alpha-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells.
Nanopore Sequencing as a Rapidly Deployable Ebola Outbreak Tool.

PubMed

Hoenen, Thomas; Groseth, Allison; Rosenke, Kyle; Fischer, Robert J; Hoenen, Andreas; Judson, Seth D; Martellaro, Cynthia; Falzarano, Darryl; Marzi, Andrea; Squires, R Burke; Wollenberg, Kurt R; de Wit, Emmie; Prescott, Joseph; Safronetz, David; van Doremalen, Neeltje; Bushmaker, Trenton; Feldmann, Friederike; McNally, Kristin; Bolay, Fatorma K; Fields, Barry; Sealy, Tara; Rayfield, Mark; Nichol, Stuart T; Zoon, Kathryn C; Massaquoi, Moses; Munster, Vincent J; Feldmann, Heinz

2016-02-01

Rapid sequencing of RNA/DNA from pathogen samples obtained during disease outbreaks provides critical scientific and public health information. However, challenges exist for exporting samples to laboratories or establishing conventional sequencers in remote outbreak regions. We successfully used a novel, pocket-sized nanopore sequencer at a field diagnostic laboratory in Liberia during the current Ebola virus outbreak.
Nanopore Sequencing as a Rapidly Deployable Ebola Outbreak Tool

PubMed Central

Groseth, Allison; Rosenke, Kyle; Fischer, Robert J.; Hoenen, Andreas; Judson, Seth D.; Martellaro, Cynthia; Falzarano, Darryl; Marzi, Andrea; Squires, R. Burke; Wollenberg, Kurt R.; de Wit, Emmie; Prescott, Joseph; Safronetz, David; van Doremalen, Neeltje; Bushmaker, Trenton; Feldmann, Friederike; McNally, Kristin; Bolay, Fatorma K.; Fields, Barry; Sealy, Tara; Rayfield, Mark; Nichol, Stuart T.; Zoon, Kathryn C.; Massaquoi, Moses; Munster, Vincent J.; Feldmann, Heinz

2016-01-01

Rapid sequencing of RNA/DNA from pathogen samples obtained during disease outbreaks provides critical scientific and public health information. However, challenges exist for exporting samples to laboratories or establishing conventional sequencers in remote outbreak regions. We successfully used a novel, pocket-sized nanopore sequencer at a field diagnostic laboratory in Liberia during the current Ebola virus outbreak. PMID:26812583
MendeLIMS: a web-based laboratory information management system for clinical genome sequencing.

PubMed

Grimes, Susan M; Ji, Hanlee P

2014-08-27

Large clinical genomics studies using next generation DNA sequencing require the ability to select and track samples from a large population of patients through many experimental steps. With the number of clinical genome sequencing studies increasing, it is critical to maintain adequate laboratory information management systems to manage the thousands of patient samples that are subject to this type of genetic analysis. To meet the needs of clinical population studies using genome sequencing, we developed a web-based laboratory information management system (LIMS) with a flexible configuration that is adaptable to continuously evolving experimental protocols of next generation DNA sequencing technologies. Our system, referred to as MendeLIMS, is easily implemented with open source tools and is highly configurable and extensible. MendeLIMS has been invaluable in the management of our clinical genome sequencing studies. We maintain a publicly available demonstration version of the application for evaluation purposes at http://mendelims.stanford.edu. MendeLIMS is programmed in Ruby on Rails (RoR) and accesses data stored in SQL-compliant relational databases. Software is freely available for non-commercial use at http://dna-discovery.stanford.edu/software/mendelims/.
Deciphering the Hidden Informational Content of Protein Sequences

PubMed Central

Liu, Ming; Hua, Qing-xin; Hu, Shi-Quan; Jia, Wenhua; Yang, Yanwu; Saith, Sunil Evan; Whittaker, Jonathan; Arvan, Peter; Weiss, Michael A.

2010-01-01

Protein sequences encode both structure and foldability. Whereas the interrelationship of sequence and structure has been extensively investigated, the origins of folding efficiency are enigmatic. We demonstrate that the folding of proinsulin requires a flexible N-terminal hydrophobic residue that is dispensable for the structure, activity, and stability of the mature hormone. This residue (PheB1 in placental mammals) is variably positioned within crystal structures and exhibits 1H NMR motional narrowing in solution. Despite such flexibility, its deletion impaired insulin chain combination and led in cell culture to formation of non-native disulfide isomers with impaired secretion of the variant proinsulin. Cellular folding and secretion were maintained by hydrophobic substitutions at B1 but markedly perturbed by polar or charged side chains. We propose that, during folding, a hydrophobic side chain at B1 anchors transient long-range interactions by a flexible N-terminal arm (residues B1–B8) to mediate kinetic or thermodynamic partitioning among disulfide intermediates. Evidence for the overall contribution of the arm to folding was obtained by alanine scanning mutagenesis. Together, our findings demonstrate that efficient folding of proinsulin requires N-terminal sequences that are dispensable in the native state. Such arm-dependent folding can be abrogated by mutations associated with β-cell dysfunction and neonatal diabetes mellitus. PMID:20663888
32 CFR Appendix A to Part 275 - Obtaining Basic Identifying Account Information

Code of Federal Regulations, 2010 CFR

2010-07-01

... 32 National Defense 2 2010-07-01 2010-07-01 false Obtaining Basic Identifying Account Information... Information A. A DoD law enforcement office may issue a formal written request for basic identifying account... only the above specified basic identifying information concerning a customer's account. C. A format for...
48 CFR 1809.505-4 - Obtaining access to sensitive information.

Code of Federal Regulations, 2010 CFR

2010-10-01

... Organizational and Consultant Conflicts of Interest 1809.505-4 Obtaining access to sensitive information. (b) In... support management activities and administrative functions. The Assistant Administrator for Procurement... require contractors and subcontractors and their employees in procurements that support management...
Tagmentation on Microbeads: Restore Long-Range DNA Sequence Information Using Next Generation Sequencing with Library Prepared by Surface-Immobilized Transposomes.

PubMed

Chen, He; Yao, Jiacheng; Fu, Yusi; Pang, Yuhong; Wang, Jianbin; Huang, Yanyi

2018-04-11

Next generation sequencing (NGS) technologies have evolved rapidly and been applied to various research fields, but they often lose long-range information due to short library size and read length. Here, we develop a simple, cost-efficient, and versatile NGS library preparation method, called tagmentation on microbeads (TOM). This method is capable of recovering long-range information through tagmentation mediated by microbead-immobilized transposomes. Using transposomes with DNA barcodes to identically label adjacent sequences during tagmentation, we can restore the inter-read connections of fragments from the original DNA molecule by fragment-barcode linkage after sequencing. In our proof-of-principle experiment, more than 4.5% of the reads are linked with their adjacent reads, and the longest linkage is over 1112 bp. We demonstrate TOM with eight barcodes, but the number of barcodes can be scaled up by an ultrahigh complexity construction. We also show that this method has low amplification bias and is well suited to identifying copy number variations.
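The fragment-barcode linkage step in the abstract above might be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it assumes reads are already mapped to a reference and carry a barcode, and the tuple layout, the `max_gap` threshold and the function names are hypothetical.

```python
from collections import defaultdict


def link_reads(reads, max_gap=50):
    """Group mapped reads into linked chains.

    reads: iterable of (read_id, barcode, start, end) in reference coordinates.
    Assumption: two reads are linked when they share a barcode and the later
    read starts no more than max_gap bases after the chain so far ends.
    """
    by_barcode = defaultdict(list)
    for read in reads:
        by_barcode[read[1]].append(read)

    chains = []
    for barcode in sorted(by_barcode):
        group = sorted(by_barcode[barcode], key=lambda r: r[2])
        chain, chain_end = [group[0]], group[0][3]
        for read in group[1:]:
            if read[2] - chain_end <= max_gap:
                chain.append(read)
                chain_end = max(chain_end, read[3])
            else:
                chains.append(chain)
                chain, chain_end = [read], read[3]
        chains.append(chain)
    return chains


def chain_span(chain):
    """Length of reference sequence covered from the first to the last linked read."""
    return max(r[3] for r in chain) - min(r[2] for r in chain)


reads = [
    ("r1", "BC1", 0, 150),
    ("r2", "BC1", 160, 300),
    ("r3", "BC2", 170, 320),
    ("r4", "BC1", 900, 1050),
]
for chain in link_reads(reads):
    print([r[0] for r in chain], chain_span(chain))
```

With only a few barcodes, as in the eight-barcode demonstration, the positional check matters: sharing a barcode alone would link unrelated fragments.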
Identifying the Critical Time Period for Information Extraction when Recognizing Sequences of Play

ERIC Educational Resources Information Center

North, Jamie S.; Williams, A. Mark

2008-01-01

The authors attempted to determine the critical time period for information extraction when recognizing play sequences in soccer. Although efforts have been made to identify the perceptual information underpinning such decisions, no researchers have attempted to determine "when" this information may be extracted from the display. The authors…
Using Internet Search Engines to Obtain Medical Information: A Comparative Study

PubMed Central

Wang, Liupu; Wang, Juexin; Wang, Michael; Li, Yong; Liang, Yanchun

2012-01-01

Background The Internet has become one of the most important means to obtain health and medical information. It is often the first step in checking for basic information about a disease and its treatment. The search results are often useful to general users. Various search engines such as Google, Yahoo!, Bing, and Ask.com can play an important role in obtaining medical information for both medical professionals and lay people. However, the usability and effectiveness of various search engines for medical information have not been comprehensively compared and evaluated. Objective To compare major Internet search engines in their usability of obtaining medical and health information. Methods We applied usability testing as a software engineering technique and a standard industry practice to compare the four major search engines (Google, Yahoo!, Bing, and Ask.com) in obtaining health and medical information. For this purpose, we searched the keyword breast cancer in Google, Yahoo!, Bing, and Ask.com and saved the results of the top 200 links from each search engine. We combined nonredundant links from the four search engines and gave them to volunteer users in an alphabetical order. The volunteer users evaluated the websites and scored each website from 0 to 10 (lowest to highest) based on the usefulness of the content relevant to breast cancer. A medical expert identified six well-known websites related to breast cancer in advance as standards. We also used five keywords associated with breast cancer defined in the latest release of Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT) and analyzed their occurrence in the websites. Results Each search engine provided rich information related to breast cancer in the search results. All six standard websites were among the top 30 in search results of all four search engines. Google had the best search validity (in terms of whether a website could be opened), followed by Bing, Ask.com, and Yahoo!. The search
Using Internet search engines to obtain medical information: a comparative study.

PubMed

Wang, Liupu; Wang, Juexin; Wang, Michael; Li, Yong; Liang, Yanchun; Xu, Dong

2012-05-16

The Internet has become one of the most important means to obtain health and medical information. It is often the first step in checking for basic information about a disease and its treatment. The search results are often useful to general users. Various search engines such as Google, Yahoo!, Bing, and Ask.com can play an important role in obtaining medical information for both medical professionals and lay people. However, the usability and effectiveness of various search engines for medical information have not been comprehensively compared and evaluated. To compare major Internet search engines in their usability of obtaining medical and health information. We applied usability testing as a software engineering technique and a standard industry practice to compare the four major search engines (Google, Yahoo!, Bing, and Ask.com) in obtaining health and medical information. For this purpose, we searched the keyword breast cancer in Google, Yahoo!, Bing, and Ask.com and saved the results of the top 200 links from each search engine. We combined nonredundant links from the four search engines and gave them to volunteer users in an alphabetical order. The volunteer users evaluated the websites and scored each website from 0 to 10 (lowest to highest) based on the usefulness of the content relevant to breast cancer. A medical expert identified six well-known websites related to breast cancer in advance as standards. We also used five keywords associated with breast cancer defined in the latest release of Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT) and analyzed their occurrence in the websites. Each search engine provided rich information related to breast cancer in the search results. All six standard websites were among the top 30 in search results of all four search engines. Google had the best search validity (in terms of whether a website could be opened), followed by Bing, Ask.com, and Yahoo!. The search results highly overlapped between the
40 CFR 2.310 - Special rules governing certain information obtained under the Comprehensive Environmental...

Code of Federal Regulations, 2010 CFR

2010-07-01

... information obtained under the Comprehensive Environmental Response, Compensation, and Liability Act of 1980... information obtained under the Comprehensive Environmental Response, Compensation, and Liability Act of 1980, as amended. (a) Definitions. For purposes of this section: (1) Act means the Comprehensive...
Arrays of probes for positional sequencing by hybridization

DOEpatents

Cantor, Charles R [Boston, MA; Prezetakiewiczr, Marek [East Boston, MA; Smith, Cassandra L [Boston, MA; Sano, Takeshi [Waltham, MA

2008-01-15

This invention is directed to methods and reagents useful for sequencing nucleic acid targets utilizing sequencing by hybridization technology comprising probes, arrays of probes and methods whereby sequence information is obtained rapidly and efficiently in discrete packages. That information can be used for the detection, identification, purification and complete or partial sequencing of a particular target nucleic acid. When coupled with a ligation step, these methods can be performed under a single set of hybridization conditions. The invention also relates to the replication of probe arrays and methods for making and replicating arrays of probes which are useful for the large scale manufacture of diagnostic aids used to screen biological samples for specific target sequences. Arrays created using PCR technology may comprise probes with 5'- and/or 3'-overhangs.
MCTP system model based on linear programming optimization of apertures obtained from sequencing patient image data maps.

PubMed

Ureba, A; Salguero, F J; Barbeiro, A R; Jimenez-Ortega, E; Baeza, J A; Miras, H; Linares, R; Perucha, M; Leal, A

2014-08-01

The authors present a hybrid direct multileaf collimator (MLC) aperture optimization model exclusively based on sequencing of patient imaging data to be implemented on a Monte Carlo treatment planning system (MC-TPS) to allow the explicit radiation transport simulation of advanced radiotherapy treatments with optimal results in efficient times for clinical practice. The planning system (called CARMEN) is a full MC-TPS, controlled through a MATLAB interface, which is based on the sequencing of a novel map, called the "biophysical" map, which is generated from enhanced image data of patients to achieve a set of segments that are actually deliverable. In order to reduce the required computation time, the conventional fluence map has been replaced by the biophysical map, which is sequenced to provide direct apertures that will later be weighted by means of an optimization algorithm based on linear programming. A ray-casting algorithm throughout the patient CT assembles information about the structures found, the mass thickness crossed, and the PET values. Data are recorded to generate a biophysical map for each gantry angle. These maps are the input files for an in-house sequencer developed to take into account the interactions of photons and electrons with the MLC. For each linac (Axesse of Elekta and Primus of Siemens) and beam energy studied (6, 9, 12, 15 MeV and 6 MV), phase space files were simulated with the EGSnrc/BEAMnrc code. The dose calculation in the patient was carried out with the BEAMDOSE code. This code is a modified version of EGSnrc/DOSXYZnrc able to calculate the beamlet doses in order to combine them with different weights during the optimization process. Three complex radiotherapy treatments were selected to check the reliability of CARMEN in situations where the MC calculation can offer an added value: A head-and-neck case (Case I) with three targets delineated on PET/CT images and a demanding dose-escalation; a partial breast irradiation case (Case II) solved
Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering.

PubMed

Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor; Essex, M

2015-05-01

To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice.
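The quantities compared in the abstract above (variable sites, informative sites, fixed-width sliding windows) can be computed from an alignment along these lines. This is a minimal sketch under assumptions: "informative" is taken to mean parsimony-informative (at least two states, each seen in at least two sequences), sequences are assumed upper-case, gaps and ambiguity codes are ignored, and the window step is a free parameter because the abstract does not state it.

```python
from collections import Counter


def site_counts(alignment):
    """Return (variable, informative) site counts for equal-length aligned sequences."""
    variable = informative = 0
    for column in zip(*alignment):
        # Count only unambiguous nucleotides; gaps and ambiguity codes are skipped.
        counts = Counter(base for base in column if base in "ACGT")
        if len(counts) >= 2:
            variable += 1
            # Parsimony-informative: at least two states each present in >= 2 sequences.
            if sum(1 for n in counts.values() if n >= 2) >= 2:
                informative += 1
    return variable, informative


def sliding_windows(alignment, width, step):
    """Yield (start, sub-alignment) for every full-width window along the alignment."""
    length = len(alignment[0])
    for start in range(0, length - width + 1, step):
        yield start, [seq[start:start + width] for seq in alignment]


# Hypothetical toy alignment; real windows in the study were 1,000 or 2,000 bp wide.
seqs = ["ACGTAC", "ACGTAA", "ATGTAA", "ATGT-A"]
print(site_counts(seqs))
for start, window in sliding_windows(seqs, width=3, step=3):
    print(start, site_counts(window))
```

Per-window counts like these could then be set against the proportion of sequences found in clusters for each window.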
Optimizing multiple sequence alignments using a genetic algorithm based on three objectives: structural information, non-gaps percentage and totally conserved columns.

PubMed

Ortuño, Francisco M; Valenzuela, Olga; Rojas, Fernando; Pomares, Hector; Florido, Javier P; Urquiza, Jose M; Rojas, Ignacio

2013-09-01

Multiple sequence alignments (MSAs) are widely used approaches in bioinformatics to carry out other tasks such as structure predictions, biological function analyses or phylogenetic modeling. However, current tools usually provide partially optimal alignments, as each one is focused on specific biological features. Thus, the same set of sequences can produce different alignments, above all when sequences are less similar. Consequently, researchers and biologists do not agree about which is the most suitable way to evaluate MSAs. Recent evaluations tend to use more complex scores including further biological features. Among them, 3D structures are increasingly being used to evaluate alignments. Because structures are more conserved in proteins than sequences, scores with structural information are better suited to evaluate more distant relationships between sequences. The proposed multiobjective algorithm, based on the non-dominated sorting genetic algorithm, aims to jointly optimize three objectives: STRIKE score, non-gaps percentage and totally conserved columns. It was significantly assessed on the BAliBASE benchmark according to the Kruskal-Wallis test (P < 0.01). This algorithm also outperforms other aligners, such as ClustalW, Multiple Sequence Alignment Genetic Algorithm (MSA-GA), PRRP, DIALIGN, Hidden Markov Model Training (HMMT), Pattern-Induced Multi-sequence Alignment (PIMA), MULTIALIGN, Sequence Alignment Genetic Algorithm (SAGA), PILEUP, Rubber Band Technique Genetic Algorithm (RBT-GA) and Vertical Decomposition Genetic Algorithm (VDGA), according to the Wilcoxon signed-rank test (P < 0.05), whereas it shows results not significantly different to 3D-COFFEE (P > 0.05) with the advantage of being able to use less structures. Structural information is included within the objective function to evaluate more accurately the obtained alignments. The source code is available at http://www.ugr.es/~fortuno/MOSAStrE/MO-SAStrE.zip.
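Two of the three objectives named above, non-gaps percentage and totally conserved columns, can be computed directly from an alignment. The sketch below assumes straightforward definitions (share of non-gap characters over all cells, and columns holding one identical residue with no gaps); the paper's exact definitions may differ, and the STRIKE score is omitted because it requires 3D structure data.

```python
def non_gap_percentage(msa):
    """Percentage of alignment cells that are not the gap character '-'."""
    total = sum(len(row) for row in msa)
    gaps = sum(row.count("-") for row in msa)
    return 100.0 * (total - gaps) / total


def totally_conserved_columns(msa):
    """Number of columns with the same residue in every sequence and no gaps."""
    return sum(
        1 for column in zip(*msa)
        if "-" not in column and len(set(column)) == 1
    )


toy_msa = ["MK-LV", "MKALV", "MR-LV"]
print(round(non_gap_percentage(toy_msa), 2))
print(totally_conserved_columns(toy_msa))
```

A multiobjective optimizer would treat each of these values as a separate score to maximize rather than combining them into one weighted sum.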
MIPS: a database for protein sequences, homology data and yeast genome information.

PubMed Central

Mewes, H W; Albermann, K; Heumann, K; Liebl, S; Pfeiffer, F

1997-01-01

The MIPS group (Martinsried Institute for Protein Sequences) at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, collects, processes and distributes protein sequence data within the framework of the tripartite association of the PIR-International Protein Sequence Database. MIPS contributes nearly 50% of the data input to the PIR-International Protein Sequence Database. The database is distributed on CD-ROM together with PATCHX, an exhaustive supplement of unique, unverified protein sequences from external sources compiled by MIPS. Through its WWW server (http://www.mips.biochem.mpg.de/) MIPS permits internet access to sequence databases, homology data and to yeast genome information. (i) Sequence similarity results from the FASTA program are stored in the FASTA database for all proteins from PIR-International and PATCHX. The database is dynamically maintained and permits instant access to FASTA results. (ii) Starting with FASTA database queries, proteins have been classified into families and superfamilies (PROT-FAM). (iii) The HPT (hashed position tree) data structure developed at MIPS is a new approach for rapid sequence and pattern searching. (iv) MIPS provides access to the sequence and annotation of the complete yeast genome, the functional classification of yeast genes (FunCat) and its graphical display, the 'Genome Browser'. A CD-ROM based on the JAVA programming language providing dynamic interactive access to the yeast genome and the related protein sequences has been compiled and is available on request. PMID:9016498
40 CFR 712.7 - Report of readily obtainable information for subparts B and C.

Code of Federal Regulations, 2010 CFR

2010-07-01

... 40 Protection of Environment 30 2010-07-01 2010-07-01 false Report of readily obtainable... Report of readily obtainable information for subparts B and C. TSCA section 8(a) authorizes EPA to require persons to report information that is known to or reasonably ascertainable by them. For purposes...
Draft Genome Sequences of Endophytic Isolates of Klebsiella variicola and Klebsiella pneumoniae Obtained from the Same Sugarcane Plant

PubMed Central

2018-01-01

Endophytic Klebsiella variicola KvMx2 and Klebsiella pneumoniae KpMx1 isolates obtained from the same sugarcane stem were used for whole-genome sequencing. The genomes revealed clear differences in essential genes for plant growth, development, and detoxification, as well as nitrogen fixation, catalases, cellulases, and shared virulence factors described in the K. pneumoniae pathogen. PMID:29567733
28 CFR 51.37 - Obtaining information from the submitting authority.

Code of Federal Regulations, 2010 CFR

2010-07-01

... 28 Judicial Administration 2 2010-07-01 2010-07-01 false Obtaining information from the submitting authority. 51.37 Section 51.37 Judicial Administration DEPARTMENT OF JUSTICE (CONTINUED) PROCEDURES FOR THE ADMINISTRATION OF SECTION 5 OF THE VOTING RIGHTS ACT OF 1965, AS AMENDED Processing of Submissions § 51.37...

On the Search for Retrotransposons: Alternative Protocols to Obtain Sequences to Learn Profile Hidden Markov Models.

PubMed

Fischer, Carlos N; Campos, Victor De A; Barella, Victor H

2018-05-01

Profile hidden Markov models (pHMMs) have been used to search for transposable elements (TEs) in genomes. For the learning of pHMMs aimed to search for TEs of the retrotransposon class, the conventional protocol is to use the whole internal nucleotide portions of these elements as representative sequences. To further explore the potential of pHMMs in such a search, we propose five alternative ways to obtain the sets of representative sequences of TEs other than the conventional protocol. In this study, we are interested in Bel-PAO, Copia, Gypsy, and DIRS superfamilies from the retrotransposon class. We compared the pHMMs of all six protocols. The test results show that, for each TE superfamily, the pHMMs of at least two of the proposed protocols performed better than the conventional one and that the number of correct predictions provided by the latter can be improved by considering together the results of one or more of the alternative protocols.
Legal and psychological considerations for obtaining informed consent for reverse total shoulder arthroplasty.

PubMed

Blackwood, Craig; Dixon, Jen; Reilly, Peter; Emery, Roger J

2017-01-01

This paper seeks to outline recent legal developments and requirements pertinent to obtaining informed consent. We argue that this is of particular relevance to patients considering a reverse total shoulder arthroplasty, due to the high complication rate associated with this procedure. By examining the cognitive processes involved in decision-making, and other clinician-related factors such as delivery of information, gender bias and conflict of interest, we explore some of the barriers that can undermine the processes of shared decision-making and obtaining genuine informed consent. We argue that these issues highlight the importance for surgeons in understanding the cognitive processes and other influential factors involved in patients' comprehension and decision-making. We recommend, based on strong evidence, that decision aids could prove useful in overcoming such challenges and could provide one way of mitigating the ethical, professional and legal consequences of failing to obtain proper informed consent. They are not widely used in orthopaedics at present, although it would be in the interests of both the surgeon and patient for such measures to be explored.
Legal and psychological considerations for obtaining informed consent for reverse total shoulder arthroplasty

PubMed Central

Blackwood, Craig; Reilly, Peter; Emery, Roger J

2016-01-01

This paper seeks to outline recent legal developments and requirements pertinent to obtaining informed consent. We argue that this is of particular relevance to patients considering a reverse total shoulder arthroplasty, due to the high complication rate associated with this procedure. By examining the cognitive processes involved in decision-making, and other clinician-related factors such as delivery of information, gender bias and conflict of interest, we explore some of the barriers that can undermine the processes of shared decision-making and obtaining genuine informed consent. We argue that these issues highlight the importance for surgeons in understanding the cognitive processes and other influential factors involved in patients’ comprehension and decision-making. We recommend, based on strong evidence, that decision aids could prove useful in overcoming such challenges and could provide one way of mitigating the ethical, professional and legal consequences of failing to obtain proper informed consent. They are not widely used in orthopaedics at present, although it would be in the interests of both the surgeon and patient for such measures to be explored. PMID:28572846
25 CFR 162.520 - Who owns the energy resource information obtained under the WEEL?

Code of Federal Regulations, 2014 CFR

2014-04-01

... AND WATER LEASES AND PERMITS Wind and Solar Resource Leases Weels § 162.520 Who owns the energy resource information obtained under the WEEL? (a) The WEEL must specify the ownership of any energy... 25 Indians 1 2014-04-01 2014-04-01 false Who owns the energy resource information obtained under...
25 CFR 162.520 - Who owns the energy resource information obtained under the WEEL?

Code of Federal Regulations, 2013 CFR

2013-04-01

... AND WATER LEASES AND PERMITS Wind and Solar Resource Leases Weels § 162.520 Who owns the energy resource information obtained under the WEEL? (a) The WEEL must specify the ownership of any energy... 25 Indians 1 2013-04-01 2013-04-01 false Who owns the energy resource information obtained under...
Draft Genome Sequences of Endophytic Isolates of Klebsiella variicola and Klebsiella pneumoniae Obtained from the Same Sugarcane Plant.

PubMed

Reyna-Flores, Fernando; Barrios-Camacho, Humberto; Dantán-González, Edgar; Ramírez-Trujillo, José Augusto; Lozano Aguirre Beltrán, Luis Fernando; Rodríguez-Medina, Nadia; Garza-Ramos, Ulises; Suárez-Rodríguez, Ramón

2018-03-22

Endophytic Klebsiella variicola KvMx2 and Klebsiella pneumoniae KpMx1 isolates obtained from the same sugarcane stem were used for whole-genome sequencing. The genomes revealed clear differences in essential genes for plant growth, development, and detoxification, as well as nitrogen fixation, catalases, cellulases, and shared virulence factors described in the K. pneumoniae pathogen. Copyright © 2018 Reyna-Flores et al.
[The meaning of autonomy in Chinese culture: obtaining informed consent for operation].

PubMed

Lin, Mei-Ling; Wu, Jo Yung-Wei; Huang, Mei-Chih

2008-10-01

The purpose of gaining the patient's informed consent is ethical, lying in respect for his or her autonomy, and such consent forms the foundation for the performance of clinical medical treatment. In order to respect the patient's autonomy, for example, during decisions about operations, doctors have the obligation to clearly explain that patient's medical condition to him/her. A thorough briefing should be given prior to the obtaining of the patients' consent. In fulfillment of their duties as medical professionals, both doctors and nurses should be involved in clinically informing patients as well as in obtaining their signature for operation and anesthesia. Although informing patients about their physical state is not the responsibility of nurses, it remains absolutely necessary for nurses to understand how people in Asian cultures understand autonomy. This paper begins with a discussion of autonomy in ethics, and then outlines the differences between the Eastern and Western concepts of autonomy, before discussing the obtaining of the signature of consent, a process performed by the nursing staff during clinical treatment, and resulting in the provision of such signatures by patients with the legal capacity to provide them.
Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering

PubMed Central

Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor

2015-01-01

To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice. PMID:25560745
Information theory-based algorithm for in silico prediction of PCR products with whole genomic sequences as templates.

PubMed

Cao, Youfang; Wang, Lianjie; Xu, Kexue; Kou, Chunhai; Zhang, Yulei; Wei, Guifang; He, Junjian; Wang, Yunfang; Zhao, Liping

2005-07-26

A new algorithm for assessing similarity between primer and template has been developed based on the hypothesis that annealing of primer to template is an information transfer process. Primer sequence is converted to a vector of the full potential hydrogen numbers (3 for G or C, 2 for A or T), while template sequence is converted to a vector of the actual hydrogen bond numbers formed after primer annealing. The former is considered as source information and the latter destination information. An information coefficient is calculated as a measure for fidelity of this information transfer process and thus a measure of similarity between primer and potential annealing site on template. Successful prediction of PCR products from whole genomic sequences with a computer program based on the algorithm demonstrated the potential of this new algorithm in areas like in silico PCR and gene finding.
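The two vectors described above can be sketched as follows. The bond values (3 for G or C, 2 for A or T) come from the abstract; everything else is an assumption made for illustration: the candidate site is given already oriented so that position i pairs with primer position i, and a mismatched position is treated as forming zero hydrogen bonds. The abstract does not define the information coefficient, so it is not computed here.

```python
# Hydrogen bond counts per base, as stated in the abstract.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def potential_bonds(primer):
    """Source vector: the full potential hydrogen bond number at each primer base."""
    return [H_BONDS[base] for base in primer.upper()]


def actual_bonds(primer, site):
    """Destination vector: bonds actually formed at each position after annealing.

    Assumption: `site` is aligned position-by-position with `primer`, and a
    mismatch contributes 0 bonds (the abstract does not specify this).
    """
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return [
        H_BONDS[p] if COMPLEMENT[p] == s else 0
        for p, s in zip(primer.upper(), site.upper())
    ]


print(potential_bonds("GATC"))
print(actual_bonds("GATC", "CTAA"))
```

Comparing the two vectors position by position shows where potential bonding is lost, which is the raw material a similarity measure between primer and annealing site would work from.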
Application of Modified Spin-Echo–based Sequences for Hepatic MR Elastography: Evaluation, Comparison with the Conventional Gradient-Echo Sequence, and Preliminary Clinical Experience

PubMed Central

Mariappan, Yogesh K.; Dzyubak, Bogdan; Glaser, Kevin J.; Venkatesh, Sudhakar K.; Sirlin, Claude B.; Hooker, Jonathan; McGee, Kiaran P.

2017-01-01

Purpose: To (a) evaluate modified spin-echo (SE) magnetic resonance (MR) elastographic sequences for acquiring MR images with improved signal-to-noise ratio (SNR) in patients in whom the standard gradient-echo (GRE) MR elastographic sequence yields low hepatic signal intensity and (b) compare the stiffness values obtained with these sequences with those obtained with the conventional GRE sequence. Materials and Methods: This HIPAA-compliant retrospective study was approved by the institutional review board; the requirement to obtain informed consent was waived. Data obtained with modified SE and SE echo-planar imaging (EPI) MR elastographic pulse sequences with short echo times were compared with those obtained with the conventional GRE MR elastographic sequence in two patient cohorts, one that exhibited adequate liver signal intensity and one that exhibited low liver signal intensity. Shear stiffness values obtained with the three sequences in 130 patients with successful GRE-based examinations were retrospectively tested for statistical equivalence by using a 5% margin. In 47 patients in whom GRE examinations were considered to have failed because of low SNR, the SNR and confidence level with the SE-based sequences were compared with those with the GRE sequence. Results: The results of this study helped confirm the equivalence of SE MR elastography and SE-EPI MR elastography to GRE MR elastography (P = .0212 and P = .0001, respectively). The SE and SE-EPI MR elastographic sequences provided substantially improved SNR and stiffness inversion confidence level in 47 patients in whom GRE MR elastography had failed. Conclusion: Modified SE-based MR elastographic sequences provide higher SNR MR elastographic data and reliable stiffness measurements; thus, they enable quantification of stiffness in patients in whom the conventional GRE MR elastographic sequence failed owing to low signal intensity. The equivalence of the three sequences indicates that the current diagnostic
Studies of a Biochemical Factory: Tomato Trichome Deep Expressed Sequence Tag Sequencing and Proteomics

PubMed Central

Schilmiller, Anthony L.; Miner, Dennis P.; Larson, Matthew; McDowell, Eric; Gang, David R.; Wilkerson, Curtis; Last, Robert L.

2010-01-01

Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces β-caryophyllene and α-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells. PMID:20431087
Evaluating information content of SNPs for sample-tagging in re-sequencing projects.

PubMed

Hu, Hao; Liu, Xiang; Jin, Wenfei; Hilger Ropers, H; Wienker, Thomas F

2015-05-15

Sample-tagging is designed to identify accidental sample mix-ups, which are a major issue in re-sequencing studies. In this work, we develop a model to measure the information content of SNPs, so that we can optimize a panel of SNPs that approaches the maximal information for discrimination. The analysis shows that as few as 60 optimized SNPs can differentiate the individuals in a population as large as the present world population, and only 30 optimized SNPs are in practice sufficient for labeling up to 100 thousand individuals. In the simulated populations of 100 thousand individuals, the average Hamming distances generated by the optimized set of 30 SNPs are larger than 18, and the duality frequency is lower than 1 in 10 thousand. This strategy of sample discrimination proved robust with large sample sizes and across different datasets. The optimized sets of SNPs are designed for Whole Exome Sequencing, and a program is provided for SNP selection, allowing for customized SNP numbers and genes of interest. The sample-tagging plan based on this framework will improve re-sequencing projects in terms of reliability and cost-effectiveness.
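The Hamming distance used above to compare tagged samples can be sketched in a few lines. The genotype coding (0, 1, 2 copies of one allele) and the entropy-based information measure under Hardy-Weinberg proportions are assumptions chosen for illustration; the abstract does not describe the authors' actual model, so this is not a reconstruction of it.

```python
import math


def hamming(profile_a, profile_b):
    """Number of SNP positions at which two genotype profiles differ."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same SNP panel")
    return sum(1 for a, b in zip(profile_a, profile_b) if a != b)


def genotype_entropy(p):
    """Shannon entropy (bits) of one biallelic SNP's genotype distribution.

    Assumption: genotype frequencies follow Hardy-Weinberg proportions
    p^2, 2pq, q^2 for allele frequency p.
    """
    q = 1.0 - p
    freqs = (p * p, 2.0 * p * q, q * q)
    return -sum(f * math.log2(f) for f in freqs if f > 0.0)


# Genotypes coded as the number of copies of one allele (hypothetical data).
sample_1 = [0, 1, 2, 2]
sample_2 = [0, 2, 2, 1]
print(hamming(sample_1, sample_2))
print(genotype_entropy(0.5))
```

Under this assumed measure, a SNP with allele frequency 0.5 carries 1.5 bits, which gives an intuition for why a few dozen well-chosen, independent SNPs can separate very large numbers of individuals.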
The minimum information about a genome sequence (MIGS) specification

PubMed Central

Field, Dawn; Garrity, George; Gray, Tanya; Morrison, Norman; Selengut, Jeremy; Sterk, Peter; Tatusova, Tatiana; Thomson, Nicholas; Allen, Michael J; Angiuoli, Samuel V; Ashburner, Michael; Axelrod, Nelson; Baldauf, Sandra; Ballard, Stuart; Boore, Jeffrey; Cochrane, Guy; Cole, James; Dawyndt, Peter; De Vos, Paul; dePamphilis, Claude; Edwards, Robert; Faruque, Nadeem; Feldman, Robert; Gilbert, Jack; Gilna, Paul; Glöckner, Frank Oliver; Goldstein, Philip; Guralnick, Robert; Haft, Dan; Hancock, David; Hermjakob, Henning; Hertz-Fowler, Christiane; Hugenholtz, Phil; Joint, Ian; Kagan, Leonid; Kane, Matthew; Kennedy, Jessie; Kowalchuk, George; Kottmann, Renzo; Kolker, Eugene; Kravitz, Saul; Kyrpides, Nikos; Leebens-Mack, Jim; Lewis, Suzanna E; Li, Kelvin; Lister, Allyson L; Lord, Phillip; Maltsev, Natalia; Markowitz, Victor; Martiny, Jennifer; Methe, Barbara; Mizrachi, Ilene; Moxon, Richard; Nelson, Karen; Parkhill, Julian; Proctor, Lita; White, Owen; Sansone, Susanna-Assunta; Spiers, Andrew; Stevens, Robert; Swift, Paul; Taylor, Chris; Tateno, Yoshio; Tett, Adrian; Turner, Sarah; Ussery, David; Vaughan, Bob; Ward, Naomi; Whetzel, Trish; Gil, Ingio San; Wilson, Gareth; Wipat, Anil

2008-01-01

With the quantity of genomic data increasing at an exponential rate, it is imperative that these data be captured electronically, in a standard format. Standardization activities must proceed within the auspices of open-access and international working bodies. To tackle the issues surrounding the development of better descriptions of genomic investigations, we have formed the Genomic Standards Consortium (GSC). Here, we introduce the minimum information about a genome sequence (MIGS) specification with the intent of promoting participation in its development and discussing the resources that will be required to develop improved mechanisms of metadata capture and exchange. As part of its wider goals, the GSC also supports improving the ‘transparency’ of the information contained in existing genomic databases. PMID:18464787
Information recovery through image sequence fusion under wavelet transformation

NASA Astrophysics Data System (ADS)

He, Qiang

2010-04-01

Remote sensing is widely applied to provide information on areas with limited ground access, with applications such as assessing the destruction from natural disasters and planning relief and recovery operations. However, the collection of aerial digital images is constrained by bad weather, atmospheric conditions, and unstable cameras or camcorders. Therefore, recovering information from low-quality remote sensing images and enhancing image quality become very important for many visual understanding tasks, such as feature detection, object segmentation, and object recognition. The quality of remote sensing imagery can be improved through meaningful combination of images captured from different sensors or under different conditions through information fusion. Here we particularly address information fusion of remote sensing images under multi-resolution analysis of the employed image sequences. Image fusion recovers complete information by integrating multiple images captured from the same scene. Through image fusion, a new image that has high resolution or is more readily interpreted by humans and machines is created from a time series of low-quality images based on image registration between different video frames.
Towards rationally redesigning bacterial signaling systems using information encoded in abundant sequence data

NASA Astrophysics Data System (ADS)

Cheng, Ryan; Morcos, Faruck; Levine, Herbert; Onuchic, Jose

2014-03-01

An important challenge in biology is to distinguish the subset of residues that allow bacterial two-component signaling (TCS) proteins to preferentially interact with their correct TCS partner such that they can bind and transfer signal. Detailed knowledge of this information would allow one to search sequence-space for mutations that can systematically tune the signal transmission between TCS partners as well as re-encode a TCS protein to preferentially transfer signals to a non-partner. Motivated by the notion that this detailed information is found in sequence data, we explore the mutual sequence co-evolution between signaling partners to infer how mutations can positively or negatively alter their interaction. Using Direct Coupling Analysis (DCA) for determining evolutionarily conserved interprotein interactions, we apply a DCA-based metric to quantify mutational changes in the interaction between TCS proteins and demonstrate that it accurately correlates with experimental mutagenesis studies probing the mutational change in the in vitro phosphotransfer. Our methodology serves as a potential framework for the rational design of TCS systems as well as a framework for the system-level study of protein-protein interactions in sequence-rich systems. This research has been supported by the NSF INSPIRE award MCB-1241332 and by the CTBP sponsored by the NSF (Grant PHY-1308264).
Sequence-specific "gene signatures" can be obtained by PCR with single specific primers at low stringency.

PubMed Central

Pena, S D; Barreto, G; Vago, A R; De Marco, L; Reinach, F C; Dias Neto, E; Simpson, A J

1994-01-01

Low-stringency single specific primer PCR (LSSP-PCR) is an extremely simple PCR-based technique that detects single or multiple mutations in gene-sized DNA fragments. A purified DNA fragment is subjected to PCR using high concentrations of a single specific oligonucleotide primer, large amounts of Taq polymerase, and a very low annealing temperature. Under these conditions the primer hybridizes specifically to its complementary region and nonspecifically to multiple sites within the fragment, in a sequence-dependent manner, producing a heterogeneous set of reaction products resolvable by electrophoresis. The complex banding pattern obtained is significantly altered by even a single-base change and thus constitutes a unique "gene signature." Therefore LSSP-PCR will have almost unlimited application in all fields of genetics and molecular medicine where rapid and sensitive detection of mutations and sequence variations is important. The usefulness of LSSP-PCR is illustrated by applications in the study of mutants of smooth muscle myosin light chain, analysis of a family with X-linked nephrogenic diabetes insipidus, and identity testing using human mitochondrial DNA. PMID:8127912
BEAUTY: an enhanced BLAST-based search tool that integrates multiple biological information resources into sequence similarity search results.

PubMed

Worley, K C; Wiese, B A; Smith, R F

1995-09-01

BEAUTY (BLAST enhanced alignment utility) is an enhanced version of the NCBI's BLAST data base search tool that facilitates identification of the functions of matched sequences. We have created new data bases of conserved regions and functional domains for protein sequences in NCBI's Entrez data base, and BEAUTY allows this information to be incorporated directly into BLAST search results. A Conserved Regions Data Base, containing the locations of conserved regions within Entrez protein sequences, was constructed by (1) clustering the entire data base into families, (2) aligning each family using our PIMA multiple sequence alignment program, and (3) scanning the multiple alignments to locate the conserved regions within each aligned sequence. A separate Annotated Domains Data Base was constructed by extracting the locations of all annotated domains and sites from sequences represented in the Entrez, PROSITE, BLOCKS, and PRINTS data bases. BEAUTY performs a BLAST search of those Entrez sequences with conserved regions and/or annotated domains. BEAUTY then uses the information from the Conserved Regions and Annotated Domains data bases to generate, for each matched sequence, a schematic display that allows one to directly compare the relative locations of (1) the conserved regions, (2) annotated domains and sites, and (3) the locally aligned regions matched in the BLAST search. In addition, BEAUTY search results include World-Wide Web hypertext links to a number of external data bases that provide a variety of additional types of information on the function of matched sequences. This convenient integration of protein families, conserved regions, annotated domains, alignment displays, and World-Wide Web resources greatly enhances the biological informativeness of sequence similarity searches. 
BEAUTY searches can be performed remotely on our system using the "BCM Search Launcher" World-Wide Web pages (URL is http://gc.bcm.tmc.edu:8088/search-launcher/launcher.html).
On the Concept of Cis-regulatory Information: From Sequence Motifs to Logic Functions

NASA Astrophysics Data System (ADS)

Tarpine, Ryan; Istrail, Sorin

The regulatory genome is about the “system level organization of the core genomic regulatory apparatus, and how this is the locus of causality underlying the twin phenomena of animal development and animal evolution” (E.H. Davidson. The Regulatory Genome: Gene Regulatory Networks in Development and Evolution, Academic Press, 2006). Information processing in the regulatory genome is done through regulatory states, defined as sets of transcription factors (sequence-specific DNA binding proteins which determine gene expression) that are expressed and active at the same time. The core information processing machinery consists of modular DNA sequence elements, called cis-modules, that interact with transcription factors. The cis-modules “read” the information contained in the regulatory state of the cell through transcription factor binding, “process” it, and directly or indirectly communicate with the basal transcription apparatus to determine gene expression. This endowment of each gene with the information-receiving capacity through their cis-regulatory modules is essential for the response to every possible regulatory state to which it might be exposed during all phases of the life cycle and in all cell types. We present here a set of challenges addressed by our CYRENE research project aimed at studying the cis-regulatory code of the regulatory genome. The CYRENE Project is devoted to (1) the construction of a database, the cis-Lexicon, containing comprehensive information across species about experimentally validated cis-regulatory modules; and (2) the software development of a next-generation genome browser, the cis-Browser, specialized for the regulatory genome. The presentation is anchored on three main computational challenges: the Gene Naming Problem, the Consensus Sequence Bottleneck Problem, and the Logic Function Inference Problem.
Automatic prediction of protein domains from sequence information using a hybrid learning system.

PubMed

Nagarajan, Niranjan; Yona, Golan

2004-06-12

We describe a novel method for detecting the domain structure of a protein from sequence information alone. The method is based on analyzing multiple sequence alignments that are derived from a database search. Multiple measures are defined to quantify the domain information content of each position along the sequence and are combined into a single predictor using a neural network. The output is further smoothed and post-processed using a probabilistic model to predict the most likely transition positions between domains. The method was assessed using the domain definitions in SCOP and CATH for proteins of known structure and was compared with several other existing methods. Our method performs well both in terms of accuracy and sensitivity. It improves significantly over the best methods available, even some of the semi-manual ones, while being fully automatic. Our method can also be used to suggest and verify domain partitions based on structural data. A few examples of predicted domain definitions and alternative partitions, as suggested by our method, are also discussed. An online domain-prediction server is available at http://biozon.org/tools/domains/
40 CFR 2.302 - Special rules governing certain information obtained under the Clean Water Act.

Code of Federal Regulations, 2013 CFR

2013-07-01

... information obtained under the Clean Water Act. 2.302 Section 2.302 Protection of Environment ENVIRONMENTAL... governing certain information obtained under the Clean Water Act. (a) Definitions. For the purposes of this section: (1) Act means the Clean Water Act, as amended, 33 U.S.C. 1251 et seq. (2)(i) Effluent data means...

Accurately Decoding Visual Information from fMRI Data Obtained in a Realistic Virtual Environment

DTIC Science & Technology

2015-06-09

Center for Learning and Memory , The University of Texas at Austin, 100 E 24th Street, Stop C7000, Austin, TX 78712, USA afloren@utexas.edu Received: 18...information from fMRI data obtained in a realistic virtual environment. Front. Hum. Neurosci. 9:327. doi: 10.3389/fnhum.2015.00327 Accurately decoding...visual information from fMRI data obtained in a realistic virtual environment Andrew Floren 1*, Bruce Naylor 2, Risto Miikkulainen 3 and David Ress 4
Structure of the oligomers obtained by enzymatic hydrolysis of the glucomannan produced by the plant Amorphophallus konjac.

PubMed

Cescutti, Paola; Campa, Cristiana; Delben, Franco; Rizzo, Roberto

2002-11-29

Dimers and trimers obtained by enzymatic hydrolysis of the glucomannan produced by the plant Amorphophallus konjac were analysed in order to obtain information on the saccharidic sequences present in the polymer. The polysaccharide was digested with cellulase and beta-mannanase and the oligomers produced were isolated by means of size-exclusion chromatography. They were structurally characterised using electrospray mass spectrometry, capillary electrophoresis, and NMR. The investigation revealed that many possible sequences were present in the polymer backbone suggesting a Bernoulli-type chain.
Minimum Information for Reporting Next Generation Sequence Genotyping (MIRING): Guidelines for Reporting HLA and KIR Genotyping via Next Generation Sequencing

PubMed Central

Mack, Steven J.; Milius, Robert P.; Gifford, Benjamin D.; Sauter, Jürgen; Hofmann, Jan; Osoegawa, Kazutoyo; Robinson, James; Groeneweg, Mathijs; Turenchalk, Gregory S.; Adai, Alex; Holcomb, Cherie; Rozemuller, Erik H.; Penning, Maarten T.; Heuer, Michael L.; Wang, Chunlin; Salit, Marc L.; Schmidt, Alexander H.; Parham, Peter R.; Müller, Carlheinz; Hague, Tim; Fischer, Gottfried; Fernandez-Viňa, Marcelo; Hollenbach, Jill A; Norman, Paul J.; Maiers, Martin

2015-01-01

The development of next-generation sequencing (NGS) technologies for HLA and KIR genotyping is rapidly advancing knowledge of genetic variation of these highly polymorphic loci. NGS genotyping is poised to replace older methods for clinical use, but standard methods for reporting and exchanging these new, high quality genotype data are needed. The Immunogenomic NGS Consortium, a broad collaboration of histocompatibility and immunogenetics clinicians, researchers, instrument manufacturers and software developers, has developed the Minimum Information for Reporting Immunogenomic NGS Genotyping (MIRING) reporting guidelines. MIRING is a checklist that specifies the content of NGS genotyping results as well as a set of messaging guidelines for reporting the results. A MIRING message includes five categories of structured information – message annotation, reference context, full genotype, consensus sequence and novel polymorphism – and references to three categories of accessory information – NGS platform documentation, read processing documentation and primary data. These eight categories of information ensure the long-term portability and broad application of this NGS data for all current histocompatibility and immunogenetics use cases. In addition, MIRING can be extended to allow the reporting of genotype data generated using pre-NGS technologies. Because genotyping results reported using MIRING are easily updated in accordance with reference and nomenclature databases, MIRING represents a bold departure from previous methods of reporting HLA and KIR genotyping results, which have provided static and less-portable data. More information about MIRING can be found online at miring.immunogenomics.org. PMID:26407912
32 CFR Appendix G to Part 275 - Releasing Information Obtained From Financial Institutions

Code of Federal Regulations, 2010 CFR

2010-07-01

... FINANCIAL PRIVACY ACT OF 1978 Pt. 275, App. G Appendix G to Part 275—Releasing Information Obtained From... record was obtained pursuant to the Right to Financial Privacy Act of 1978, 12 U.S.C. 3401 et seq., and... transferring law enforcement office, personnel security element, or intelligence organization, or designee...
The Complete Sequence of a Human Parainfluenzavirus 4 Genome

PubMed Central

Yea, Carmen; Cheung, Rose; Collins, Carol; Adachi, Dena; Nishikawa, John; Tellier, Raymond

2009-01-01

Although the human parainfluenza virus 4 (HPIV4) has been known for a long time, its genome, alone among the human paramyxoviruses, has not been completely sequenced to date. In this study we obtained the first complete genomic sequence of HPIV4 from a clinical isolate named SKPIV4 obtained at the Hospital for Sick Children in Toronto (Ontario, Canada). The coding regions for the N, P/V, M, F and HN proteins show very high identities (95% to 97%) with previously available partial sequences for HPIV4B. The sequence for the L protein and the non-coding regions represent new information. A surprising feature of the genome is its length, more than 17 kb, making it the longest genome within the genus Rubulavirus, although the length is well within the known range of 15 kb to 19 kb for the subfamily Paramyxovirinae. The availability of a complete genomic sequence will facilitate investigations on a respiratory virus that is still not completely characterized. PMID:21994536
ACMES: fast multiple-genome searches for short repeat sequences with concurrent cross-species information retrieval

PubMed Central

Reneker, Jeff; Shyu, Chi-Ren; Zeng, Peiyu; Polacco, Joseph C.; Gassmann, Walter

2004-01-01

We have developed a web server for the life sciences community to use to search for short repeats of DNA sequence of length between 3 and 10 000 bases within multiple species. This search employs a unique and fast hash function approach. Our system also applies information retrieval algorithms to discover knowledge of cross-species conservation of repeat sequences. Furthermore, we have incorporated a part of the Gene Ontology database into our information retrieval algorithms to broaden the coverage of the search. Our web server and tutorial can be found at http://acmes.rnet.missouri.edu. PMID:15215469
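The abstract does not describe the hash function ACMES uses, so the following is only a minimal sketch of the general idea: index every k-length substring in a hash table, keep those that occur more than once, and intersect the results across species. The function names `find_repeats` and `shared_repeats` are hypothetical, and Python's built-in dictionary hashing stands in for the server's own hash function.

```python
from collections import defaultdict


def find_repeats(sequence, k, min_count=2):
    """Return {kmer: [start positions]} for k-mers seen at least min_count times.

    A dict (hash table) keyed by the k-mer gives expected O(1) lookups, so a
    whole sequence is indexed in a single pass.
    """
    index = defaultdict(list)
    for start in range(len(sequence) - k + 1):
        index[sequence[start:start + k]].append(start)
    return {kmer: pos for kmer, pos in index.items() if len(pos) >= min_count}


def shared_repeats(genomes, k):
    """Return k-mers that are repeated in every genome, with per-species positions.

    `genomes` maps a (hypothetical) species label to its sequence string.
    """
    if not genomes:
        return {}
    per_species = {name: find_repeats(seq, k) for name, seq in genomes.items()}
    common = set.intersection(*(set(hits) for hits in per_species.values()))
    return {
        kmer: {name: per_species[name][kmer] for name in genomes}
        for kmer in sorted(common)
    }
```

Calling `shared_repeats({"a": "ACGTACGTAC", "b": "TTACGTGGACGT"}, 4)` reports the one 4-mer that repeats in both toy sequences. Real genomes would need a more memory-efficient index than a dictionary of Python strings.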
A safe an easy method for building consensus HIV sequences from 454 massively parallel sequencing data.

PubMed

Fernández-Caballero Rico, Jose Ángel; Chueca Porcuna, Natalia; Álvarez Estévez, Marta; Mosquera Gutiérrez, María Del Mar; Marcos Maeso, María Ángeles; García, Federico

2018-02-01

To show how to generate a consensus sequence from the information of massive parallel sequences data obtained from routine HIV anti-retroviral resistance studies, and that may be suitable for molecular epidemiology studies. Paired Sanger (Trugene-Siemens) and next-generation sequencing (NGS) (454 GSJunior-Roche) HIV RT and protease sequences from 62 patients were studied. NGS consensus sequences were generated using Mesquite, using 10%, 15%, and 20% thresholds. Molecular evolutionary genetics analysis (MEGA) was used for phylogenetic studies. At a 10% threshold, NGS-Sanger sequences from 17/62 patients were phylogenetically related, with a median bootstrap value of 88% (IQR 83.5-95.5). Association increased to 36/62 sequences, median bootstrap 94% (IQR 85.5-98), using a 15% threshold. Maximum association was at the 20% threshold, with 61/62 sequences associated, and a median bootstrap value of 99% (IQR 98-100). A safe method is presented to generate consensus sequences from HIV-NGS data at 20% threshold, which will prove useful for molecular epidemiological studies. Copyright © 2016 Elsevier España, S.L.U. and Sociedad Española de Enfermedades Infecciosas y Microbiología Clínica. All rights reserved.
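The abstract does not say how Mesquite applies the threshold, so the sketch below shows one common interpretation rather than the authors' procedure. It assumes pre-aligned reads of equal length; at each column, every base whose frequency is at or above the threshold is kept, and several surviving bases are written as an IUPAC ambiguity code. The function name `consensus` is hypothetical.

```python
from collections import Counter

# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases they cover.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def consensus(reads, threshold):
    """Build a consensus from aligned reads, keeping bases with frequency >= threshold.

    Assumption: gaps and non-ACGT symbols are ignored when computing frequencies,
    and a column with no surviving base is written as 'N'.
    """
    if not reads:
        return ""
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must be aligned to the same length")
    out = []
    for column in zip(*reads):
        counts = Counter(base for base in column if base in "ACGT")
        total = sum(counts.values())
        kept = frozenset(b for b, c in counts.items() if total and c / total >= threshold)
        out.append(IUPAC.get(kept, "N"))
    return "".join(out)
```

Under this reading, raising the threshold drops minority variants, so the consensus carries fewer ambiguity codes and looks more like a single dominant sequence.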
77 FR 37886 - Notice of Intent To Obtain Information Regarding Organizations Who Are Assisting African...

Federal Register 2010, 2011, 2012, 2013, 2014

2012-06-25

... information regarding organizations active in this area, for the purpose of information sharing. SUMMARY: This notice announces that the U.S. Africa Command (AFRICOM) is seeking information about organizations, both... DEPARTMENT OF DEFENSE Office of the Secretary Notice of Intent To Obtain Information Regarding...
Coordination sequences and information spreading in small-world networks

NASA Astrophysics Data System (ADS)

Herrero, Carlos P.

2002-10-01

We study the spread of information in small-world networks generated from different d-dimensional regular lattices, with d=1, 2, and 3. With this purpose, we analyze by numerical simulations the behavior of the coordination sequence, i.e., the average number of sites C(n) that can be reached from a given node of the network in n steps along its bonds. For sufficiently large networks, we find an asymptotic behavior C(n) ~ ρ^n, with a constant ρ that depends on the network dimension d and on the rewiring probability p (which measures the disorder strength of a given network). A simple model of information spreading in these networks is studied, assuming that only a fraction q of the network sites are active. The number of active nodes reached in n steps has an asymptotic form λ^n, λ being a constant that depends on p and q, as well as on the dimension d of the underlying lattice. The information spreading presents two different regimes depending on the value of λ: For λ>1 the information propagates along the whole system, and for λ<1 the spreading is damped and the information remains confined in a limited region of the network. We discuss the connection of these results with site percolation in small-world networks.
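A coordination sequence can be estimated numerically with a breadth-first search. The sketch below is an illustration under stated assumptions, not the paper's code: it uses a 1D ring as the regular lattice, a Watts-Strogatz-style rewiring step (the abstract does not specify the construction), and takes C(n) to be the number of nodes at shortest-path distance exactly n, averaged over all starting nodes. All function names are hypothetical.

```python
import random
from collections import deque


def small_world(num_nodes, p, seed=0):
    """Ring lattice (d=1) in which each edge (i, i+1) is rewired with probability p."""
    rng = random.Random(seed)
    adj = {i: {(i - 1) % num_nodes, (i + 1) % num_nodes} for i in range(num_nodes)}
    for i in range(num_nodes):
        j = (i + 1) % num_nodes
        if j in adj[i] and rng.random() < p:
            candidates = [k for k in range(num_nodes) if k != i and k not in adj[i]]
            if candidates:
                new = rng.choice(candidates)
                adj[i].discard(j)
                adj[j].discard(i)
                adj[i].add(new)
                adj[new].add(i)
    return adj


def coordination_sequence(adj, source, max_steps):
    """Count nodes at each shortest-path distance 0..max_steps from `source` (BFS)."""
    dist = {source: 0}
    shells = [0] * (max_steps + 1)
    shells[0] = 1
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_steps:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                shells[dist[v]] += 1
                queue.append(v)
    return shells


def average_coordination_sequence(adj, max_steps):
    """Average the shell counts over every node taken as the starting point."""
    totals = [0] * (max_steps + 1)
    for source in adj:
        for n, count in enumerate(coordination_sequence(adj, source, max_steps)):
            totals[n] += count
    return [t / len(adj) for t in totals]
```

With p = 0 the ring gives a constant shell size of 2. With p > 0 the shortcuts let the shells grow much faster at small n, which is the regime in which exponential growth of the kind reported in the abstract would be looked for, until finite-size saturation sets in.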
Optical Processing Techniques For Pseudorandom Sequence Prediction

NASA Astrophysics Data System (ADS)

Gustafson, Steven C.

1983-11-01

Pseudorandom sequences are series of apparently random numbers generated, for example, by linear or nonlinear feedback shift registers. An important application of these sequences is in spread spectrum communication systems, in which, for example, the transmitted carrier phase is digitally modulated rapidly and pseudorandomly and in which the information to be transmitted is incorporated as a slow modulation in the pseudorandom sequence. In this case the transmitted information can be extracted only by a receiver that uses for demodulation the same pseudorandom sequence used by the transmitter, and thus this type of communication system has a very high immunity to third-party interference. However, if a third party can predict in real time the probable future course of the transmitted pseudorandom sequence given past samples of this sequence, then interference immunity can be significantly reduced. In this application effective pseudorandom sequence prediction techniques should be (1) applicable in real time to rapid (e.g., megahertz) sequence generation rates, (2) applicable to both linear and nonlinear pseudorandom sequence generation processes, and (3) applicable to error-prone past sequence samples of limited number and continuity. Certain optical processing techniques that may meet these requirements are discussed in this paper. In particular, techniques based on incoherent optical processors that perform general linear transforms or (more specifically) matrix-vector multiplications are considered. Computer simulation examples are presented which indicate that significant prediction accuracy can be obtained using these transforms for simple pseudorandom sequences. However, the useful prediction of more complex pseudorandom sequences will probably require the application of more sophisticated optical processing techniques.
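To make the linear case concrete, here is a minimal software sketch of a linear feedback shift register and of the corresponding predictor. It is not the optical technique of the paper. The 4-bit register, the tap positions (0, 1) and the function names are illustrative assumptions; these taps correspond to the recurrence a[n+4] = a[n] XOR a[n+1], which has period 15.

```python
def lfsr_bits(seed, taps, nbits, count):
    """Generate `count` output bits from a right-shifting Fibonacci LFSR.

    `seed` is the nonzero initial register content, `taps` are the bit positions
    XORed together to form the feedback bit, and `nbits` is the register width.
    """
    state = seed
    out = []
    for _ in range(count):
        out.append(state & 1)
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = (state >> 1) | (feedback << (nbits - 1))
    return out


def predict_next(past_bits, taps, nbits):
    """Predict the next bit from the last `nbits` samples of a known linear generator.

    This is a single row of a matrix-vector product over GF(2): the next bit is
    the XOR of the tapped samples in the most recent window.
    """
    window = past_bits[-nbits:]
    bit = 0
    for t in taps:
        bit ^= window[t]
    return bit
```

When the feedback taps are known and the samples are error-free, every future bit follows from the last `nbits` samples, which is why a purely linear generator offers little protection on its own. Unknown taps, nonlinear feedback, or noisy samples make the prediction problem considerably harder.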
New Stopping Criteria for Segmenting DNA Sequences

DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, Wentian

2001-06-18

We propose a solution on the stopping criterion in segmenting inhomogeneous DNA sequences with complex statistical patterns. This new stopping criterion is based on Bayesian information criterion in the model selection framework. When this criterion is applied to telomere of S. cerevisiae and the complete sequence of E. coli, borders of biologically meaningful units were identified, and a more reasonable number of domains was obtained. We also introduce a measure called segmentation strength which can be used to control the delineation of large domains. The relationship between the average domain size and the threshold of segmentation strength is determined for several genome sequences.
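The abstract gives no formulas, so the following is a sketch of one way a BIC-based stopping rule for recursive segmentation can be set up, not a reproduction of the paper's method. It assumes a base-composition model, picks the cut that maximizes the Jensen-Shannon divergence D between the two halves, and accepts the cut only if the likelihood gain 2·N·D exceeds a BIC penalty of K·ln(N). The value K = 4 (three free base frequencies plus the cut position), the definition of strength as the relative excess over the penalty, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(seq):
    """Shannon entropy (in nats) of the symbol composition of `seq`."""
    n = len(seq)
    return -sum((c / n) * math.log(c / n) for c in Counter(seq).values())


def best_cut(seq):
    """Return (D, cut) maximizing the Jensen-Shannon divergence of seq[:cut] vs seq[cut:].

    Recomputing entropies at each cut is O(n^2); running counts would make it O(n).
    """
    n = len(seq)
    h_all = entropy(seq)
    best = (0.0, None)
    for i in range(1, n):
        d = h_all - (i / n) * entropy(seq[:i]) - ((n - i) / n) * entropy(seq[i:])
        if d > best[0]:
            best = (d, i)
    return best


def segmentation_strength(seq, extra_params=4):
    """Relative excess of 2*N*D over the assumed BIC penalty extra_params * ln(N)."""
    n = len(seq)
    d, cut = best_cut(seq)
    penalty = extra_params * math.log(n)
    return (2 * n * d - penalty) / penalty, cut


def segment(seq, offset=0, min_strength=0.0):
    """Recursively split `seq`; stop when the strength is not above min_strength."""
    if len(seq) < 2:
        return []
    strength, cut = segmentation_strength(seq)
    if cut is None or strength <= min_strength:
        return []
    left = segment(seq[:cut], offset, min_strength)
    right = segment(seq[cut:], offset + cut, min_strength)
    return left + [offset + cut] + right
```

With `min_strength=0.0` the rule reduces to the plain BIC comparison. Raising `min_strength` keeps only the strongest borders, which is how a strength threshold could be used to control the delineation of larger domains.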
76 FR 28434 - Notice of Disclosure of Confidential Business Information Obtained Under the Comprehensive...

Federal Register 2010, 2011, 2012, 2013, 2014

2011-05-17

... Information Obtained Under the Comprehensive Environmental Response, Compensation and Liability Act to EPA Contractor Toeroek Associates Inc., and Their Subcontractor, Science Applications International Corp. AGENCY... disclose confidential business information (``CBI'') submitted to EPA Region 9 pursuant to CERCLA to EPA...
75 FR 76479 - Notice of Proposed Information Collection for Public Comment; Procedure for Obtaining...

Federal Register 2010, 2011, 2012, 2013, 2014

2010-12-08

... appropriate automated collection techniques or other forms of information technology; e.g., permitting... Information Collection for Public Comment; Procedure for Obtaining Certificates of Insurance for Capital... toll-free number) or e-mail Ms. [email protected] . Persons with hearing or speech impairments...
Sequence Capture versus Restriction Site Associated DNA Sequencing for Shallow Systematics.

PubMed

Harvey, Michael G; Smith, Brian Tilston; Glenn, Travis C; Faircloth, Brant C; Brumfield, Robb T

2016-09-01

Sequence capture and restriction site associated DNA sequencing (RAD-Seq) are two genomic enrichment strategies for applying next-generation sequencing technologies to systematics studies. At shallow timescales, such as within species, RAD-Seq has been widely adopted among researchers, although there has been little discussion of the potential limitations and benefits of RAD-Seq and sequence capture. We discuss a series of issues that may impact the utility of sequence capture and RAD-Seq data for shallow systematics in non-model species. We review prior studies that used both methods, and investigate differences between the methods by re-analyzing existing RAD-Seq and sequence capture data sets from a Neotropical bird (Xenops minutus). We suggest that the strengths of RAD-Seq data sets for shallow systematics are the wide dispersion of markers across the genome, the relative ease and cost of laboratory work, the deep coverage and read overlap at recovered loci, and the high overall information that results. Sequence capture's benefits include flexibility and repeatability in the genomic regions targeted, success using low-quality samples, more straightforward read orthology assessment, and higher per-locus information content. The utility of a method in systematics, however, rests not only on its performance within a study, but on the comparability of data sets and inferences with those of prior work. In RAD-Seq data sets, comparability is compromised by low overlap of orthologous markers across species and the sensitivity of genetic diversity in a data set to an interaction between the level of natural heterozygosity in the samples examined and the parameters used for orthology assessment. In contrast, sequence capture of conserved genomic regions permits interrogation of the same loci across divergent species, which is preferable for maintaining comparability among data sets and studies for the purpose of drawing general conclusions about the impact of
Detection of hepatitis C virus sequences in brain tissue obtained in recurrent hepatitis C after liver transplantation.

PubMed

Vargas, Hugo E; Laskus, Tomasz; Radkowski, Marek; Wilkinson, Jeff; Balan, Vijay; Douglas, David D; Harrison, M Edwyn; Mulligan, David C; Olden, Kevin; Adair, Debra; Rakela, Jorge

2002-11-01

Patients with chronic hepatitis C frequently report tiredness, easy fatigability, and depression. The aim of this study is to determine whether hepatitis C virus (HCV) replication could be found in brain tissue in patients with hepatitis C and depression. We report two patients with recurrent hepatitis C after liver transplantation who also developed severe depression. One patient died of multiorgan failure and the other of septicemia caused by Staphylococcus aureus. Both patients had evidence of severe hepatitis C recurrence with features of cholestatic fibrosing hepatitis. We were able to study samples of their central nervous system obtained at autopsy for evidence of HCV replication. The presence of HCV RNA-negative strand, which is the viral replicative form, was determined by strand-specific Tth-based reverse-transcriptase polymerase chain reaction. Viral sequences were compared by means of single-strand conformation polymorphism and direct sequencing. HCV RNA-negative strands were found in subcortical white matter from one patient and cerebral cortex from the other patient. HCV RNA-negative strands amplified from brain tissue differed by several nucleotide substitutions from serum consensus sequences in the 5' untranslated region. These findings support the concept of HCV neuroinvasion, and we speculate that it may provide a biological substrate to neuropsychiatric disorders observed in patients with chronic hepatitis C. The exact lineage of cells permissive for HCV replication and the possible interaction between viral replication and cerebral function that may lead to depression remain to be elucidated.
Characterizing informative sequence descriptors and predicting binding affinities of heterodimeric protein complexes.

PubMed

Srinivasulu, Yerukala Sathipati; Wang, Jyun-Rong; Hsu, Kai-Ti; Tsai, Ming-Ju; Charoenkwan, Phasit; Huang, Wen-Lin; Huang, Hui-Ling; Ho, Shinn-Ying

2015-01-01

Protein-protein interactions (PPIs) are involved in various biological processes, and underlying mechanism of the interactions plays a crucial role in therapeutics and protein engineering. Most machine learning approaches have been developed for predicting the binding affinity of protein-protein complexes based on structure and functional information. This work aims to predict the binding affinity of heterodimeric protein complexes from sequences only. This work proposes a support vector machine (SVM) based binding affinity classifier, called SVM-BAC, to classify heterodimeric protein complexes based on the prediction of their binding affinity. SVM-BAC identified 14 of 580 sequence descriptors (physicochemical, energetic and conformational properties of the 20 amino acids) to classify 216 heterodimeric protein complexes into low and high binding affinity. SVM-BAC yielded the training accuracy, sensitivity, specificity, AUC and test accuracy of 85.80%, 0.89, 0.83, 0.86 and 83.33%, respectively, better than existing machine learning algorithms. The 14 features and support vector regression were further used to estimate the binding affinities (Pkd) of 200 heterodimeric protein complexes. Prediction performance of a Jackknife test was the correlation coefficient of 0.34 and mean absolute error of 1.4. We further analyze three informative physicochemical properties according to their contribution to prediction performance. Results reveal that the following properties are effective in predicting the binding affinity of heterodimeric protein complexes: apparent partition energy based on buried molar fractions, relations between chemical structure and biological activity in principal component analysis IV, and normalized frequency of beta turn. The proposed sequence-based prediction method SVM-BAC uses an optimal feature selection method to identify 14 informative features to classify and predict binding affinity of heterodimeric protein complexes. The characterization
40 CFR 2.304 - Special rules governing certain information obtained under the Safe Drinking Water Act.

Code of Federal Regulations, 2010 CFR

2010-07-01

... information obtained under the Safe Drinking Water Act. 2.304 Section 2.304 Protection of Environment... Special rules governing certain information obtained under the Safe Drinking Water Act. (a) Definitions. For the purposes of this section: (1) Act means the Safe Drinking Water Act, 42 U.S.C. 300f et seq. (2...
40 CFR 2.304 - Special rules governing certain information obtained under the Safe Drinking Water Act.

Code of Federal Regulations, 2011 CFR

2011-07-01

... information obtained under the Safe Drinking Water Act. 2.304 Section 2.304 Protection of Environment... Special rules governing certain information obtained under the Safe Drinking Water Act. (a) Definitions. For the purposes of this section: (1) Act means the Safe Drinking Water Act, 42 U.S.C. 300f et seq. (2...
Community and gene composition of a human dental plaque microbiota obtained by metagenomic sequencing

PubMed Central

Xie, G.; Chain, P.S.G.; Lo, C.; Liu, K-L.; Gans, J.; Merritt, J.; Qi, F.

2010-01-01

SUMMARY Human dental plaque is a complex microbial community containing an estimated 700 to 19,000 species/phylotypes. Despite numerous studies analysing species richness in healthy and diseased human subjects, the true genomic composition of the human dental plaque microbiota remains unknown. Here we report a metagenomic analysis of a healthy human plaque sample using a combination of second-generation sequencing platforms. A total of 860 million base pairs of non-human sequences were generated. Various analysis tools revealed the presence of 12 well-characterized phyla, members of the TM-7 and BRC1 clade, and sequences that could not be classified. Both pathogens and opportunistic pathogens were identified, supporting the ecological plaque hypothesis for oral diseases. Mapping the metagenomic reads to sequenced reference genomes demonstrated that 4% of the reads could be assigned to the sequenced species. Preliminary annotation identified genes belonging to all known functional categories. Interestingly, although 73% of the total assembled contig sequences were predicted to code for proteins, only 51% of them could be assigned a functional role. Furthermore, ~ 2.8% of the total predicted genes coded for proteins involved in resistance to antibiotics and toxic compounds, suggesting that the oral cavity is an important reservoir for antimicrobial resistance. PMID:21040513
Community and gene composition of a human dental plaque microbiota obtained by metagenomic sequencing.

PubMed

Xie, G; Chain, P S G; Lo, C-C; Liu, K-L; Gans, J; Merritt, J; Qi, F

2010-12-01

Human dental plaque is a complex microbial community containing an estimated 700 to 19,000 species/phylotypes. Despite numerous studies analysing species richness in healthy and diseased human subjects, the true genomic composition of the human dental plaque microbiota remains unknown. Here we report a metagenomic analysis of a healthy human plaque sample using a combination of second-generation sequencing platforms. A total of 860 million base pairs of non-human sequences were generated. Various analysis tools revealed the presence of 12 well-characterized phyla, members of the TM-7 and BRC1 clade, and sequences that could not be classified. Both pathogens and opportunistic pathogens were identified, supporting the ecological plaque hypothesis for oral diseases. Mapping the metagenomic reads to sequenced reference genomes demonstrated that 4% of the reads could be assigned to the sequenced species. Preliminary annotation identified genes belonging to all known functional categories. Interestingly, although 73% of the total assembled contig sequences were predicted to code for proteins, only 51% of them could be assigned a functional role. Furthermore, ~2.8% of the total predicted genes coded for proteins involved in resistance to antibiotics and toxic compounds, suggesting that the oral cavity is an important reservoir for antimicrobial resistance. © 2010 John Wiley & Sons A/S.

How could disclosing incidental information from whole-genome sequencing affect patient behavior?

PubMed Central

Christensen, Kurt D; Green, Robert C

2013-01-01

In this article, we argue that disclosure of incidental findings from whole-genome sequencing has the potential to motivate individuals to change health behaviors through psychological mechanisms that differ from typical risk assessment interventions. Their ability to do so, however, is likely to be highly contingent upon the nature of the incidental findings and how they are disclosed, the context of the disclosure and the characteristics of the patient. Moreover, clinicians need to be aware that behavioral responses may occur in unanticipated ways. This article argues for commentators and policy makers to take a cautious but optimistic perspective while empirical evidence is collected through ongoing research involving whole-genome sequencing and the disclosure of incidental information. PMID:24319470
How could disclosing incidental information from whole-genome sequencing affect patient behavior?

PubMed

Christensen, Kurt D; Green, Robert C

2013-06-01

In this article, we argue that disclosure of incidental findings from whole-genome sequencing has the potential to motivate individuals to change health behaviors through psychological mechanisms that differ from typical risk assessment interventions. Their ability to do so, however, is likely to be highly contingent upon the nature of the incidental findings and how they are disclosed, the context of the disclosure and the characteristics of the patient. Moreover, clinicians need to be aware that behavioral responses may occur in unanticipated ways. This article argues for commentators and policy makers to take a cautious but optimistic perspective while empirical evidence is collected through ongoing research involving whole-genome sequencing and the disclosure of incidental information.
49 CFR 835.11 - Obtaining Board accident reports, factual accident reports, and supporting information.

Code of Federal Regulations, 2012 CFR

2012-10-01

... 49 Transportation 7 2012-10-01 2012-10-01 false Obtaining Board accident reports, factual accident reports, and supporting information. 835.11 Section 835.11 Transportation Other Regulations Relating to Transportation (Continued) NATIONAL TRANSPORTATION SAFETY BOARD TESTIMONY OF BOARD EMPLOYEES § 835.11 Obtaining Board accident reports, factual...
49 CFR 835.11 - Obtaining Board accident reports, factual accident reports, and supporting information.

Code of Federal Regulations, 2011 CFR

2011-10-01

... 49 Transportation 7 2011-10-01 2011-10-01 false Obtaining Board accident reports, factual accident reports, and supporting information. 835.11 Section 835.11 Transportation Other Regulations Relating to Transportation (Continued) NATIONAL TRANSPORTATION SAFETY BOARD TESTIMONY OF BOARD EMPLOYEES § 835.11 Obtaining Board accident reports, factual...
49 CFR 835.11 - Obtaining Board accident reports, factual accident reports, and supporting information.

Code of Federal Regulations, 2014 CFR

2014-10-01

... 49 Transportation 7 2014-10-01 2014-10-01 false Obtaining Board accident reports, factual accident reports, and supporting information. 835.11 Section 835.11 Transportation Other Regulations Relating to Transportation (Continued) NATIONAL TRANSPORTATION SAFETY BOARD TESTIMONY OF BOARD EMPLOYEES § 835.11 Obtaining Board accident reports, factual...
49 CFR 835.11 - Obtaining Board accident reports, factual accident reports, and supporting information.

Code of Federal Regulations, 2013 CFR

2013-10-01

... 49 Transportation 7 2013-10-01 2013-10-01 false Obtaining Board accident reports, factual accident reports, and supporting information. 835.11 Section 835.11 Transportation Other Regulations Relating to Transportation (Continued) NATIONAL TRANSPORTATION SAFETY BOARD TESTIMONY OF BOARD EMPLOYEES § 835.11 Obtaining Board accident reports, factual...
Ultraaccurate genome sequencing and haplotyping of single human cells.

PubMed

Chu, Wai Keung; Edge, Peter; Lee, Ho Suk; Bansal, Vikas; Bafna, Vineet; Huang, Xiaohua; Zhang, Kun

2017-11-21

Accurate detection of variants and long-range haplotypes in genomes of single human cells remains very challenging. Common approaches require extensive in vitro amplification of genomes of individual cells using DNA polymerases and high-throughput short-read DNA sequencing. These approaches have two notable drawbacks. First, polymerase replication errors could generate tens of thousands of false-positive calls per genome. Second, relatively short sequence reads contain little to no haplotype information. Here we report a method, which is dubbed SISSOR (single-stranded sequencing using microfluidic reactors), for accurate single-cell genome sequencing and haplotyping. A microfluidic processor is used to separate the Watson and Crick strands of the double-stranded chromosomal DNA in a single cell and to randomly partition megabase-size DNA strands into multiple nanoliter compartments for amplification and construction of barcoded libraries for sequencing. The separation and partitioning of large single-stranded DNA fragments of the homologous chromosome pairs allows for the independent sequencing of each of the complementary and homologous strands. This enables the assembly of long haplotypes and reduction of sequence errors by using the redundant sequence information and haplotype-based error removal. We demonstrated the ability to sequence single-cell genomes with error rates as low as 10^-8 and average 500-kb-long DNA fragments that can be assembled into haplotype contigs with N50 greater than 7 Mb. The performance could be further improved with more uniform amplification and more accurate sequence alignment. The ability to obtain accurate genome sequences and haplotype information from single cells will enable applications of genome sequencing for diverse clinical needs. Copyright © 2017 the Author(s). Published by PNAS.
41 CFR 102-192.40 - Where can we obtain more information about the classes of mail?

Code of Federal Regulations, 2010 CFR

2010-07-01

... PROGRAMS 192-MAIL MANAGEMENT Introduction to this Part § 102-192.40 Where can we obtain more information... 41 Public Contracts and Property Management 3 2010-07-01 2010-07-01 false Where can we obtain more information about the classes of mail? 102-192.40 Section 102-192.40 Public Contracts and Property Management...
An exploration of how Mexican American WIC mothers obtain information about behaviors associated with childhood obesity risk

PubMed Central

Cole, Suzanne M.; McKenney-Shubert, Shannon J.; Jones, Sonya J.; Peterson, Karen E.

2016-01-01

Objective: To explore how a sample of Mexican American mothers with preschool-aged children recruited from a Midwestern WIC clinic obtain information about 4 behaviors associated with childhood obesity risk: eating, physical activity, screen time and sleep. Methods: One-on-one, structured interviews, in which participants were asked how they communicated with family, learned to take care of their first infant and obtained information about the 4 targeted behaviors for their preschool-aged child. Setting: An urban WIC clinic in the Midwest. Participants: Forty Mexican-descent WIC mothers with children ages 3–4. Phenomenon of Interest: Exposure to information about the 4 targeted behaviors among Mexican-descent mothers participating in WIC. Analysis: Quantitative and qualitative data were used to characterize and compare across participants. Results: Participants primarily obtained information from their child’s maternal grandmother during their first child’s infancy and from health professionals for their preschool-aged child. Participants typically obtained information through interpersonal communication, television and magazines. Participants were most interested in healthy eating information and least interested in screen time information. Some participants did not seek information. Conclusions: Participants engaged in different patterns of information seeking across their child’s development and the 4 behaviors, suggesting that future research should be behaviorally specific. Findings from this study suggest several hypotheses to test in future research. PMID:27876321
Repetitive sequences based on genotyping of Candida albicans isolates obtained from Iranian patients with human immunodeficiency virus

PubMed Central

Tamai, Iradj Ashrafi; Salehi, Taghi Zahraei; Sharifzadeh, Aghil; Shokri, Hojjatollah; Khosravi, Ali Reza

2014-01-01

Objective(s): Candidiasis caused by Candida albicans is known to be a major problem in patients with immune disorders. The objective of this study was to genotype the C. albicans isolates obtained from the oral cavity of human immunodeficiency virus-positive (HIV+) patients with and/or without oropharyngeal candidiasis (OPC). Materials and Methods: A total of 100 C. albicans isolates from Iranian HIV+ patients were genotyped using specific PCR primers for the 25S rDNA and RPS genes. Results: The frequencies of genotypes A, B and C, determined using 25S rDNA, were 66, 24 and 10 percent, respectively. Genotypes D and E were not found in this study. Each C. albicans genotype was further classified into four subtypes (types 2, 3, 2/3 and 3/4) by PCR amplification targeting the RPS sequence. Conclusion: In general, genotype A3 constituted the majority of the clinical isolates under study obtained from the oral cavity of Iranian HIV+ patients. PMID:25691923
Simultaneous and complete genome sequencing of influenza A and B with high coverage by Illumina MiSeq Platform.

PubMed

Rutvisuttinunt, Wiriya; Chinnawirotpisan, Piyawan; Simasathien, Sriluck; Shrestha, Sanjaya K; Yoon, In-Kyu; Klungthong, Chonticha; Fernandez, Stefan

2013-11-01

Active global surveillance and characterization of influenza viruses are essential for better preparation against possible pandemic events. Obtaining comprehensive information about the influenza genome can improve our understanding of the evolution of influenza viruses and the emergence of new strains, and improve accuracy when designing preventive vaccines. This study investigated the use of deep sequencing on the next-generation sequencing (NGS) Illumina MiSeq Platform to obtain complete genome sequence information from influenza virus isolates. The influenza virus isolates were cultured from 6 acute respiratory clinical specimens collected in Thailand and Nepal. DNA libraries obtained from each viral isolate were mixed and all were sequenced simultaneously. A total of 2.6 Gbases was obtained from a density of 455±14 K/mm2, with 95.76% (8,571,655/8,950,724 clusters) of the clusters passing quality control (QC) filters. Approximately 93.7% of all sequences from Read1 and 83.5% from Read2 contained high-quality sequences that were ≥Q30, a base calling QC score standard. Alignment analysis identified three seasonal influenza A H3N2 strains, one 2009 pandemic influenza A H1N1 strain and two influenza B strains. Nearly the entire genomes of all six virus isolates were covered at a sequence depth of 600-fold or greater. The MiSeq Platform efficiently identified seasonal influenza A H3N2, 2009 pandemic influenza A H1N1 and influenza B in the DNA library mixtures. Copyright © 2013 The Authors. Published by Elsevier B.V. All rights reserved.
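For readers unfamiliar with the ≥Q30 threshold cited in this record: a Phred quality score Q corresponds to an estimated base-calling error probability of 10^(-Q/10), so Q30 means roughly one error per 1,000 bases. The minimal Python sketch below is not part of the study's pipeline. It assumes Phred+33 ASCII encoding of FASTQ quality lines, and the function names and the example quality string are hypothetical.

```python
def phred_to_error_prob(q: int) -> float:
    """Convert a Phred quality score to an estimated error probability."""
    return 10 ** (-q / 10)


def fraction_at_least_q(quality_string: str, threshold: int = 30, offset: int = 33) -> float:
    """Fraction of bases whose Phred score is >= threshold.

    Assumes Phred+33 ASCII encoding of the FASTQ quality line.
    """
    if not quality_string:
        return 0.0
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(score >= threshold for score in scores) / len(scores)


# Hypothetical quality line: 'I' encodes Q40 and '5' encodes Q20 under Phred+33.
example_quality = "IIII5"
q30_fraction = fraction_at_least_q(example_quality)
```

Applied to every read in a run, the same calculation would give per-read percentages of the kind reported for Read1 and Read2.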
First Complete Genomic Sequence of a Rabies Virus from the Republic of Tajikistan Obtained Directly from a Flinders Technology Associates Card

PubMed Central

Goharriz, H.; Marston, D. A.; Sharifzoda, F.; Ellis, R. J.; Horton, D. L.; Khakimov, T.; Whatmore, A.; Khamroev, K.; Makhmadshoev, A. N.; Bazarov, M.; Fooks, A. R.

2017-01-01

ABSTRACT A brain homogenate derived from a rabid dog in the district of Tojikobod, Republic of Tajikistan, was applied to a Flinders Technology Associates (FTA) card. A full-genome sequence of rabies virus (RABV) was generated from the FTA card directly without extraction, demonstrating the utility of these cards for readily obtaining genetic data. PMID:28684566
First Complete Genomic Sequence of a Rabies Virus from the Republic of Tajikistan Obtained Directly from a Flinders Technology Associates Card.

PubMed

Goharriz, H; Marston, D A; Sharifzoda, F; Ellis, R J; Horton, D L; Khakimov, T; Whatmore, A; Khamroev, K; Makhmadshoev, A N; Bazarov, M; Fooks, A R; Banyard, A C

2017-07-06

A brain homogenate derived from a rabid dog in the district of Tojikobod, Republic of Tajikistan, was applied to a Flinders Technology Associates (FTA) card. A full-genome sequence of rabies virus (RABV) was generated from the FTA card directly without extraction, demonstrating the utility of these cards for readily obtaining genetic data. © Crown copyright 2017.
Content Analysis of Informed Consent for Whole Genome Sequencing Offered by Direct-to-Consumer Genetic Testing Companies.

PubMed

Niemiec, Emilia; Borry, Pascal; Pinxten, Wim; Howard, Heidi Carmen

2016-12-01

Whole exome sequencing (WES) and whole genome sequencing (WGS) have become increasingly available in the research and clinical settings and are now also being offered by direct-to-consumer (DTC) genetic testing (GT) companies. This offer can be perceived as amplifying the already identified concerns regarding adequacy of informed consent (IC) for both WES/WGS and the DTC GT context. We performed a qualitative content analysis of Websites of four companies offering WES/WGS DTC regarding the following elements of IC: pre-test counseling, benefits and risks, and incidental findings (IFs). The analysis revealed concerns, including the potential lack of pre-test counseling in three of the companies studied, missing relevant information in the risks and benefits sections, and potentially misleading information for consumers. Regarding IFs, only one company, which provides opportunistic screening, provides basic information about their management. In conclusion, some of the information (and related practices) present on the companies' Web pages salient to the consent process are not adequate in reference to recommendations for IC for WGS or WES in the clinical context. Requisite resources should be allocated to ensure that commercial companies are offering high-throughput sequencing under responsible conditions, including an adequate consent process. © 2016 WILEY PERIODICALS, INC.
Methods for making nucleotide probes for sequencing and synthesis

DOEpatents

Church, George M; Zhang, Kun; Chou, Joseph

2014-07-08

Compositions and methods for making a plurality of probes for analyzing a plurality of nucleic acid samples are provided. Compositions and methods for analyzing a plurality of nucleic acid samples to obtain sequence information in each nucleic acid sample are also provided.
Transcriptome analysis by strand-specific sequencing of complementary DNA

PubMed Central

Parkhomchuk, Dmitri; Borodina, Tatiana; Amstislavskiy, Vyacheslav; Banaru, Maria; Hallen, Linda; Krobitsch, Sylvia; Lehrach, Hans; Soldatov, Alexey

2009-01-01

High-throughput complementary DNA sequencing (RNA-Seq) is a powerful tool for whole-transcriptome analysis, supplying information about a transcript's expression level and structure. However, it is difficult to determine the polarity of transcripts, and therefore identify which strand is transcribed. Here, we present a simple cDNA sequencing protocol that preserves information about a transcript's direction. Using Saccharomyces cerevisiae and mouse brain transcriptomes as models, we demonstrate that knowing the transcript's orientation allows more accurate determination of the structure and expression of genes. It also helps to identify new genes and enables studying promoter-associated and antisense transcription. The transcriptional landscapes we obtained are available online. PMID:19620212
Transcriptome analysis by strand-specific sequencing of complementary DNA.

PubMed

Parkhomchuk, Dmitri; Borodina, Tatiana; Amstislavskiy, Vyacheslav; Banaru, Maria; Hallen, Linda; Krobitsch, Sylvia; Lehrach, Hans; Soldatov, Alexey

2009-10-01

High-throughput complementary DNA sequencing (RNA-Seq) is a powerful tool for whole-transcriptome analysis, supplying information about a transcript's expression level and structure. However, it is difficult to determine the polarity of transcripts, and therefore identify which strand is transcribed. Here, we present a simple cDNA sequencing protocol that preserves information about a transcript's direction. Using Saccharomyces cerevisiae and mouse brain transcriptomes as models, we demonstrate that knowing the transcript's orientation allows more accurate determination of the structure and expression of genes. It also helps to identify new genes and enables studying promoter-associated and antisense transcription. The transcriptional landscapes we obtained are available online.
An Exploration of How Mexican American WIC Mothers Obtain Information About Behaviors Associated With Childhood Obesity Risk.

PubMed

Davis, Rachel E; Cole, Suzanne M; McKenney-Shubert, Shannon J; Jones, Sonya J; Peterson, Karen E

2017-03-01

To explore how a sample of Mexican American mothers with preschool-aged children recruited from a Midwestern Special Supplemental Nutrition Program for Women, Infants, and Children (WIC) clinic obtained information about 4 behaviors associated with childhood obesity risk: eating, physical activity, screen time, and sleep. One-on-one structured interviews in which participants were asked how they communicated with family, learned to take care of their first infant, and obtained information about the 4 targeted behaviors for their preschool-aged child. An urban WIC clinic in the Midwest. Forty Mexican-descent mothers enrolled in WIC with children aged 3-4 years. Exposure to information about the 4 targeted behaviors among Mexican-descent mothers participating in WIC. Quantitative and qualitative data were used to characterize and compare across participants. Participants primarily obtained information from their child's maternal grandmother during their first child's infancy and from health professionals for their preschool-aged child. Participants typically obtained information through interpersonal communication, television, and magazines. Participants were most interested in healthy eating information and least interested in screen time information. Some participants did not seek information. Participants engaged in different patterns of information seeking across their child's development and the 4 behaviors, which suggests that future research should be behaviorally specific. Findings from this study suggest several hypotheses to test in future research. Copyright © 2016 Society for Nutrition Education and Behavior. Published by Elsevier Inc. All rights reserved.
Fundamental Bounds for Sequence Reconstruction from Nanopore Sequencers.

PubMed

Magner, Abram; Duda, Jarosław; Szpankowski, Wojciech; Grama, Ananth

2016-06-01

Nanopore sequencers are emerging as promising new platforms for high-throughput sequencing. As with other technologies, sequencer errors pose a major challenge for their effective use. In this paper, we present a novel information theoretic analysis of the impact of insertion-deletion (indel) errors in nanopore sequencers. In particular, we consider the following problems: (i) for given indel error characteristics and rate, what is the probability of accurate reconstruction as a function of sequence length; (ii) using replicated extrusion (the process of passing a DNA strand through the nanopore), what is the number of replicas needed to accurately reconstruct the true sequence with high probability? Our results provide a number of important insights: (i) the probability of accurate reconstruction of a sequence from a single sample in the presence of indel errors tends quickly (i.e., exponentially) to zero as the length of the sequence increases; and (ii) replicated extrusion is an effective technique for accurate reconstruction. We show that for typical distributions of indel errors, the required number of replicas is a slow function (polylogarithmic) of sequence length - implying that through replicated extrusion, we can sequence large reads using nanopore sequencers. Moreover, we show that in certain cases, the required number of replicas can be related to information-theoretic parameters of the indel error distributions.
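The exponential decay described in result (i) can be illustrated with a deliberately simplified model. This model is an assumption made here for illustration and is not the paper's analysis: if each base is read without an indel error independently with probability 1 - p, then a single read of length n is error-free with probability (1 - p)^n. The function name and the 5% error rate below are hypothetical.

```python
def prob_error_free_read(length: int, indel_rate: float) -> float:
    """Probability that a single read contains no indel error,
    assuming independent per-base errors (a simplifying assumption)."""
    if not 0.0 <= indel_rate <= 1.0:
        raise ValueError("indel_rate must lie in [0, 1]")
    return (1.0 - indel_rate) ** length


# Hypothetical 5% per-base indel rate; the probability shrinks geometrically with length.
for n in (10, 100, 1000):
    print(n, prob_error_free_read(n, 0.05))
```

Under this toy model the probability falls by a constant factor with every added base, which is why a single pass is not enough for long sequences and why replicated reads become attractive.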
Whole genome sequencing of Mycobacterium bovis to obtain molecular fingerprints in human and cattle isolates from Baja California, Mexico.

PubMed

Sandoval-Azuara, Sarai Estrella; Muñiz-Salazar, Raquel; Perea-Jacobo, Ricardo; Robbe-Austerman, Suelee; Perera-Ortiz, Alejandro; López-Valencia, Gilberto; Bravo, Doris M; Sanchez-Flores, Alejandro; Miranda-Guzmán, Daniela; Flores-López, Carlos Alberto; Zenteno-Cuevas, Roberto; Laniado-Laborín, Rafael; de la Cruz, Fabiola Lafarga; Stuber, Tod P

2017-10-01

To determine genetic diversity by comparing the whole genome sequences of cattle and human Mycobacterium bovis isolates from Baja California. A whole genome sequencing strategy was used to obtain the molecular fingerprints of 172 isolates of M. bovis obtained from Baja California, Mexico; 155 isolates were from cattle and 17 isolates were from humans. Spoligotypes were characterized in silico and single nucleotide polymorphism (SNP) differences between the isolates were evaluated. A total of 12 M. bovis spoligotype patterns were identified in cattle and humans. Two predominant spoligotype patterns were seen in both cattle and humans: SB0145 and SB1040. The SB0145 spoligotype represented 59% of cattle isolates (n=91) and 65% of human isolates (n=11), while the SB1040 spoligotype represented 30% of cattle isolates (n=47) and 30% of human isolates (n=5). When evaluating SNP differences, the human isolates were intimately intertwined with the cattle isolates. All isolates from humans had spoligotype patterns that matched those observed in the cattle isolates, and all human isolates shared common ancestors with cattle in Baja California based on SNP analysis. This suggests that most human tuberculosis caused by M. bovis in Baja California is derived from M. bovis circulating in Baja California cattle. These results reinforce the importance of bovine tuberculosis surveillance and control in this region. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.
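A pairwise SNP difference of the kind evaluated in this record can be thought of as the number of positions at which two aligned sequences differ. The Python sketch below is a generic illustration only. The isolate names and sequences are made up, and it is not the pipeline used in the study.

```python
from itertools import combinations


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two equal-length aligned sequences differ,
    ignoring positions with a gap or ambiguous base in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        a != b
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in valid and b in valid
    )


# Hypothetical aligned SNP sites for three isolates (names are illustrative).
isolates = {
    "cattle_1": "ACGTACGT",
    "cattle_2": "ACGTACGA",
    "human_1": "ACGAACGA",
}

# Distance for every unordered pair of isolates.
pairwise = {
    (x, y): snp_distance(isolates[x], isolates[y])
    for x, y in combinations(isolates, 2)
}
```

Small distances between a human isolate and cattle isolates, as in this toy matrix, are the kind of pattern that would be consistent with a shared recent ancestor.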

Draft Genome Sequence of Ideonella sp. Strain A 288, Isolated from an Iron-Precipitating Biofilm

PubMed Central

Künzel, Sven; Szewzyk, Ulrich

2017-01-01

ABSTRACT Here, we report the draft genome sequence of the betaproteobacterium Ideonella sp. strain A_288. This isolate, obtained from a bog iron ore-containing floodplain area in Germany, provides valuable information about the genetic diversity of neutrophilic iron-depositing bacteria. The Illumina NextSeq technique was used to generate the draft genome sequence of the strain. PMID:28818902
NetTurnP--neural network prediction of beta-turns by use of evolutionary information and predicted protein sequence features.

PubMed

Petersen, Bent; Lundegaard, Claus; Petersen, Thomas Nordahl

2010-11-30

β-turns are the most common type of non-repetitive structures, and constitute on average 25% of the amino acids in proteins. The formation of β-turns plays an important role in protein folding, protein stability and molecular recognition processes. In this work we present the neural network method NetTurnP, for prediction of two-class β-turns and prediction of the individual β-turn types, by use of evolutionary information and predicted protein sequence features. It has been evaluated against a commonly used dataset BT426, and achieves a Matthews correlation coefficient of 0.50, which is the highest reported performance on a two-class prediction of β-turn and not-β-turn. Furthermore NetTurnP shows improved performance on some of the specific β-turn types. In the present work, neural network methods have been trained to predict β-turn or not and individual β-turn types from the primary amino acid sequence. The individual β-turn types I, I', II, II', VIII, VIa1, VIa2, VIba and IV have been predicted based on classifications by PROMOTIF, and the two-class prediction of β-turn or not is a superset comprised of all β-turn types. The performance is evaluated using a golden set of non-homologous sequences known as BT426. Our two-class prediction method achieves a performance of: MCC=0.50, Qtotal=82.1%, sensitivity=75.6%, PPV=68.8% and AUC=0.864. We have compared our performance to eleven other prediction methods that obtain Matthews correlation coefficients in the range of 0.17-0.47. For the type specific β-turn predictions, only type I and II can be predicted with reasonable Matthews correlation coefficients, where we obtain performance values of 0.36 and 0.31, respectively. The NetTurnP method has been implemented as a webserver, which is freely available at http://www.cbs.dtu.dk/services/NetTurnP/. NetTurnP is the only available webserver that allows submission of multiple sequences.
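The evaluation measures quoted in this record (MCC, sensitivity, PPV) all derive from the four counts of a two-class confusion matrix. The sketch below shows the standard formulas in Python. The counts are hypothetical and are not taken from the NetTurnP evaluation, and treating Qtotal as overall accuracy is an assumption made here.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute MCC, sensitivity, PPV and overall accuracy from confusion-matrix counts."""
    total = tp + fp + tn + fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        # Fraction of true positives that were predicted positive.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Positive predictive value: fraction of positive predictions that are correct.
        "ppv": tp / (tp + fp) if (tp + fp) else 0.0,
        # Overall accuracy (assumed here to correspond to Qtotal).
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=60, fp=20, tn=100, fn=20)
```

Unlike accuracy, MCC stays informative when the classes are unbalanced, which is relevant here because β-turn residues make up only about a quarter of all residues.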
24 CFR 5.905 - What special authority is there to obtain access to sex offender registration information?

Code of Federal Regulations, 2012 CFR

2012-04-01

... obtain access to sex offender registration information? 5.905 Section 5.905 Housing and Urban Development... access to sex offender registration information? (a) PHA obligation to obtain sex offender registration... applying for admission to any federally assisted housing program is subject to a lifetime sex offender...
24 CFR 5.905 - What special authority is there to obtain access to sex offender registration information?

Code of Federal Regulations, 2013 CFR

2013-04-01

... obtain access to sex offender registration information? 5.905 Section 5.905 Housing and Urban Development... access to sex offender registration information? (a) PHA obligation to obtain sex offender registration... applying for admission to any federally assisted housing program is subject to a lifetime sex offender...
24 CFR 5.905 - What special authority is there to obtain access to sex offender registration information?

Code of Federal Regulations, 2010 CFR

2010-04-01

... obtain access to sex offender registration information? 5.905 Section 5.905 Housing and Urban Development... access to sex offender registration information? (a) PHA obligation to obtain sex offender registration... applying for admission to any federally assisted housing program is subject to a lifetime sex offender...
24 CFR 5.905 - What special authority is there to obtain access to sex offender registration information?

Code of Federal Regulations, 2014 CFR

2014-04-01

... obtain access to sex offender registration information? 5.905 Section 5.905 Housing and Urban Development... access to sex offender registration information? (a) PHA obligation to obtain sex offender registration... applying for admission to any federally assisted housing program is subject to a lifetime sex offender...
24 CFR 5.905 - What special authority is there to obtain access to sex offender registration information?

Code of Federal Regulations, 2011 CFR

2011-04-01

... obtain access to sex offender registration information? 5.905 Section 5.905 Housing and Urban Development... access to sex offender registration information? (a) PHA obligation to obtain sex offender registration... applying for admission to any federally assisted housing program is subject to a lifetime sex offender...
Generation of expressed sequence tags for discovery of genes responsible for floral traits of Chrysanthemum morifolium by next-generation sequencing technology.

PubMed

Sasaki, Katsutomo; Mitsuda, Nobutaka; Nashima, Kenji; Kishimoto, Kyutaro; Katayose, Yuichi; Kanamori, Hiroyuki; Ohmiya, Akemi

2017-09-04

Chrysanthemum morifolium is one of the most economically valuable ornamental plants worldwide. Chrysanthemum is an allohexaploid plant with a large genome that is commercially propagated by vegetative reproduction. New cultivars with different floral traits, such as color, morphology, and scent, have been generated mainly by classical cross-breeding and mutation breeding. However, only limited genetic resources and their genome information are available for the generation of new floral traits. To obtain useful information about molecular bases for floral traits of chrysanthemums, we read expressed sequence tags (ESTs) of chrysanthemums by high-throughput sequencing using the 454 pyrosequencing technology. We constructed normalized cDNA libraries, consisting of full-length, 3'-UTR, and 5'-UTR cDNAs derived from various tissues of chrysanthemums. These libraries produced a total number of 3,772,677 high-quality reads, which were assembled into 213,204 contigs. By comparing the data obtained with those of full genome-sequenced species, we confirmed that our chrysanthemum contig set contained the majority of all expressed genes, which was sufficient for further molecular analysis in chrysanthemums. We confirmed that our chrysanthemum EST set (contigs) contained a number of contigs that encoded transcription factors and enzymes involved in pigment and aroma compound metabolism that was comparable to that of other species. This information can serve as an informative resource for identifying genes involved in various biological processes in chrysanthemums. Moreover, the findings of our study will contribute to a better understanding of the floral characteristics of chrysanthemums including the myriad cultivars at the molecular level.
Genome sequence determination and metagenomic characterization of a Dehalococcoides mixed culture grown on cis-1,2-dichloroethene.

PubMed

Yohda, Masafumi; Yagi, Osami; Takechi, Ayane; Kitajima, Mizuki; Matsuda, Hisashi; Miyamura, Naoaki; Aizawa, Tomoko; Nakajima, Mutsuyasu; Sunairi, Michio; Daiba, Akito; Miyajima, Takashi; Teruya, Morimi; Teruya, Kuniko; Shiroma, Akino; Shimoji, Makiko; Tamotsu, Hinako; Juan, Ayaka; Nakano, Kazuma; Aoyama, Misako; Terabayashi, Yasunobu; Satou, Kazuhito; Hirano, Takashi

2015-07-01

A Dehalococcoides-containing bacterial consortium that performed dechlorination of 0.20 mM cis-1,2-dichloroethene to ethene in 14 days was obtained from the sediment mud of a lotus field. To obtain detailed information about the consortium, the metagenome was analyzed using the short-read next-generation sequencer SOLiD 3. Matching the obtained sequence tags with the reference genome sequences indicated that the Dehalococcoides sp. in the consortium was highly homologous to Dehalococcoides mccartyi CBDB1 and BAV1. Sequence comparison with the reference sequence constructed from 16S rRNA gene sequences in a public database showed the presence of Sedimentibacter, Sulfurospirillum, Clostridium, Desulfovibrio, Parabacteroides, Alistipes, Eubacterium, Peptostreptococcus and Proteocatella in addition to Dehalococcoides sp. After further enrichment, the members of the consortium were narrowed down to almost three species. Finally, the full-length circular genome sequence of the Dehalococcoides sp. in the consortium, D. mccartyi IBARAKI, was determined by analyzing the metagenome with the single-molecule DNA sequencer PacBio RS. The accuracy of the sequence was confirmed by matching it to the tag sequences obtained by SOLiD 3. The genome is 1,451,062 nt and the number of CDS is 1566, which includes 3 rRNA genes and 47 tRNA genes. There are twenty-eight RDase genes that are accompanied by the genes for anchor proteins. The genome exhibits significant sequence identity with other Dehalococcoides spp. throughout the genome, but there are significant differences in the distribution of RDase genes. The combination of a short-read next-generation DNA sequencer and a long-read single-molecule DNA sequencer gives detailed information about a bacterial consortium. Copyright © 2014 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.
Differential evolution-simulated annealing for multiple sequence alignment

NASA Astrophysics Data System (ADS)

Addawe, R. C.; Addawe, J. M.; Sueño, M. R. K.; Magadia, J. C.

2017-10-01

Multiple sequence alignments (MSAs) are used in the analysis of molecular evolution and sequence-structure relationships. In this paper, a hybrid algorithm, Differential Evolution - Simulated Annealing (DESA), is applied to optimize MSAs based on structural information, non-gaps percentage and totally conserved columns. DESA is a robust algorithm characterized by self-organization, mutation, crossover, and an SA-like selection scheme of the strategy parameters. Here, the MSA problem is treated as a multi-objective optimization problem for the hybrid evolutionary algorithm DESA. Thus, we name the algorithm DESA-MSA. Simulated sequences and alignments were generated to evaluate the accuracy and efficiency of DESA-MSA using different indel sizes, sequence lengths, deletion rates and insertion rates. The proposed hybrid algorithm obtained acceptable solutions, particularly for the MSA problem evaluated based on the three objectives.
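Two of the three objectives named in this record, non-gaps percentage and totally conserved columns, are simple to compute for a given alignment. The Python sketch below uses plausible definitions that are assumptions made here, since the paper's exact formulas may differ. The structural-information objective is omitted, and the toy alignment is hypothetical.

```python
def non_gap_percentage(alignment: list[str], gap: str = "-") -> float:
    """Percentage of alignment cells that are not gap characters."""
    total = sum(len(row) for row in alignment)
    if total == 0:
        return 0.0
    non_gaps = sum(ch != gap for row in alignment for ch in row)
    return 100.0 * non_gaps / total


def totally_conserved_columns(alignment: list[str], gap: str = "-") -> int:
    """Number of columns in which every sequence has the same non-gap residue."""
    return sum(
        1
        for column in zip(*alignment)
        if column[0] != gap and all(ch == column[0] for ch in column)
    )


# Hypothetical toy alignment: three sequences, six columns.
toy_alignment = [
    "ACG-TA",
    "ACGTTA",
    "A-GTTC",
]

ngp = non_gap_percentage(toy_alignment)
tcc = totally_conserved_columns(toy_alignment)
```

An optimizer such as DESA would evaluate functions like these on each candidate alignment and favor candidates that score well on all objectives at once.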
Toward allotetraploid cotton genome assembly: integration of a high-density molecular genetic linkage map with DNA sequence information

PubMed Central

2012-01-01

Background Cotton is the world’s most important natural textile fiber and a significant oilseed crop. Decoding cotton genomes will provide the ultimate reference and resource for research and utilization of the species. Integration of high-density genetic maps with genomic sequence information will largely accelerate the process of whole-genome assembly in cotton. Results In this paper, we update a high-density interspecific genetic linkage map of allotetraploid cultivated cotton. An additional 1,167 marker loci have been added to our previously published map of 2,247 loci. Three new marker types, InDel (insertion-deletion) and SNP (single nucleotide polymorphism) developed from gene information, and REMAP (retrotransposon-microsatellite amplified polymorphism), were used to increase map density. The updated map consists of 3,414 loci in 26 linkage groups covering 3,667.62 cM with an average inter-locus distance of 1.08 cM. Furthermore, genome-wide sequence analysis was finished using 3,324 informative sequence-based markers and publicly-available Gossypium DNA sequence information. A total of 413,113 EST and 195 BAC sequences were physically anchored and clustered by 3,324 sequence-based markers. Of these, 14,243 ESTs and 188 BACs from different species of Gossypium were clustered and specifically anchored to the high-density genetic map. A total of 2,748 candidate unigenes from 2,111 ESTs clusters and 63 BACs were mined for functional annotation and classification. The 337 ESTs/genes related to fiber quality traits were integrated with 132 previously reported cotton fiber quality quantitative trait loci, which demonstrated the important roles in fiber quality of these genes. Higher-level sequence conservation between different cotton species and between the A- and D-subgenomes in tetraploid cotton was found, indicating a common evolutionary origin for orthologous and paralogous loci in Gossypium. Conclusion This study will serve as a valuable genomic resource
Rapid in silico cloning of genes using expressed sequence tags (ESTs).

PubMed

Gill, R W; Sanseau, P

2000-01-01

Expressed sequence tags (ESTs) are short single-pass DNA sequences obtained from either end of cDNA clones. These ESTs are derived from a vast number of cDNA libraries obtained from different species. Human ESTs make up the bulk of the data and have been widely used to identify new members of gene families, as markers on human chromosomes, to discover polymorphism sites and to compare expression patterns in different tissues or pathological states. Information strategies have been devised to query EST databases. Since most of the analysis is performed with a computer, the term "in silico" strategy has been coined. In this chapter we will review the current status of EST databases, the pros and cons of EST-type data and describe possible strategies to retrieve meaningful information.
Skylign: a tool for creating informative, interactive logos representing sequence alignments and profile hidden Markov models

PubMed Central

2014-01-01

Background Logos are commonly used in molecular biology to provide a compact graphical representation of the conservation pattern of a set of sequences. They render the information contained in sequence alignments or profile hidden Markov models by drawing a stack of letters for each position, where the height of the stack corresponds to the conservation at that position, and the height of each letter within a stack depends on the frequency of that letter at that position. Results We present a new tool and web server, called Skylign, which provides a unified framework for creating logos for both sequence alignments and profile hidden Markov models. In addition to static image files, Skylign creates a novel interactive logo plot for inclusion in web pages. These interactive logos enable scrolling, zooming, and inspection of underlying values. Skylign can avoid sampling bias in sequence alignments by down-weighting redundant sequences and by combining observed counts with informed priors. It also simplifies the representation of gap parameters, and can optionally scale letter heights based on alternate calculations of the conservation of a position. Conclusion Skylign is available as a website, a scriptable web service with a RESTful interface, and as a software package for download. Skylign’s interactive logos are easily incorporated into a web page with just a few lines of HTML markup. Skylign may be found at http://skylign.org. PMID:24410852
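The relationship between stack height and letter height described in this record can be made concrete with the classic information-content logo calculation. The sketch below is a generic textbook version, not Skylign's implementation. It applies no small-sample correction, sequence weighting, or background prior, and the function name and example columns are hypothetical.

```python
import math
from collections import Counter


def logo_letter_heights(column: str, alphabet_size: int = 4) -> dict:
    """Letter heights (in bits) for one alignment column.

    Stack height = log2(alphabet_size) - Shannon entropy of the column;
    each letter's height = its frequency * stack height.
    """
    counts = Counter(column)
    total = sum(counts.values())
    if total == 0:
        return {}
    freqs = {letter: n / total for letter, n in counts.items()}
    entropy = -sum(f * math.log2(f) for f in freqs.values())
    stack_height = math.log2(alphabet_size) - entropy
    return {letter: f * stack_height for letter, f in freqs.items()}


# Hypothetical DNA columns: one fully conserved, one evenly split between two bases.
conserved = logo_letter_heights("AAAA")
split = logo_letter_heights("AACC")
```

A fully conserved DNA column reaches the maximum stack height of 2 bits, while a column split evenly between two bases has a 1-bit stack shared equally by the two letters.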
3D knee segmentation based on three MRI sequences from different planes.

PubMed

Zhou, L; Chav, R; Cresson, T; Chartrand, G; de Guise, J

2016-08-01

In clinical practice, knee MRI sequences with 3.5–5 mm slice distance in the sagittal, coronal, and axial planes are often requested for knee examination, since their acquisition is faster than that of a high-resolution MRI sequence in a single plane, thereby reducing the probability of motion artifact. In order to take advantage of the three sequences from different planes, a 3D segmentation method based on the combination of three knee models obtained from the three sequences is proposed in this paper. In the method, sub-segmentation is performed separately on the sagittal, coronal, and axial MRI sequences in the image coordinate system. With each sequence, an initial knee model is hierarchically deformed, and then the three deformed models are mapped to the reference coordinate system defined by the DICOM standard and combined to obtain a patient-specific model. The experimental results verified that the three sub-segmentation results can complement each other, and that their integration can compensate for the insufficiency of boundary information caused by the 3.5–5 mm gap between consecutive slices. Therefore, the obtained patient-specific model is substantially more accurate than each individual sub-segmentation result.
Characterizing informative sequence descriptors and predicting binding affinities of heterodimeric protein complexes

PubMed Central

2015-01-01

Background Protein-protein interactions (PPIs) are involved in various biological processes, and the underlying mechanism of these interactions plays a crucial role in therapeutics and protein engineering. Most machine learning approaches have been developed for predicting the binding affinity of protein-protein complexes based on structural and functional information. This work aims to predict the binding affinity of heterodimeric protein complexes from sequences only. Results This work proposes a support vector machine (SVM) based binding affinity classifier, called SVM-BAC, to classify heterodimeric protein complexes based on the prediction of their binding affinity. SVM-BAC identified 14 of 580 sequence descriptors (physicochemical, energetic and conformational properties of the 20 amino acids) to classify 216 heterodimeric protein complexes into low and high binding affinity. SVM-BAC yielded a training accuracy, sensitivity, specificity, AUC and test accuracy of 85.80%, 0.89, 0.83, 0.86 and 83.33%, respectively, better than existing machine learning algorithms. The 14 features and support vector regression were further used to estimate the binding affinities (Pkd) of 200 heterodimeric protein complexes. In a jackknife test, prediction performance was a correlation coefficient of 0.34 and a mean absolute error of 1.4. We further analyze three informative physicochemical properties according to their contribution to prediction performance. Results reveal that the following properties are effective in predicting the binding affinity of heterodimeric protein complexes: apparent partition energy based on buried molar fractions, relations between chemical structure and biological activity in principal component analysis IV, and normalized frequency of beta turn. Conclusions The proposed sequence-based prediction method SVM-BAC uses an optimal feature selection method to identify 14 informative features to classify and predict binding affinity of heterodimeric protein
A Workshop Report on Wheat Genome Sequencing

PubMed Central

Gill, Bikram S.; Appels, Rudi; Botha-Oberholster, Anna-Maria; Buell, C. Robin; Bennetzen, Jeffrey L.; Chalhoub, Boulos; Chumley, Forrest; Dvořák, Jan; Iwanaga, Masaru; Keller, Beat; Li, Wanlong; McCombie, W. Richard; Ogihara, Yasunari; Quetier, Francis; Sasaki, Takuji

2004-01-01

Sponsored by the National Science Foundation and the U.S. Department of Agriculture, a wheat genome sequencing workshop was held November 10–11, 2003, in Washington, DC. It brought together 63 scientists of diverse research interests and institutions, including 45 from the United States and 18 from a dozen foreign countries (see list of participants at http://www.ksu.edu/igrow). The objectives of the workshop were to discuss the status of wheat genomics, obtain feedback from ongoing genome sequencing projects, and develop strategies for sequencing the wheat genome. The purpose of this report is to convey the information discussed at the workshop and provide the basis for an ongoing dialogue, bringing forth comments and suggestions from the genetics community. PMID:15514080
Negative Ion In-Source Decay Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry for Sequencing Acidic Peptides

NASA Astrophysics Data System (ADS)

McMillen, Chelsea L.; Wright, Patience M.; Cassady, Carolyn J.

2016-05-01

Matrix-assisted laser desorption/ionization (MALDI) in-source decay was studied in the negative ion mode on deprotonated peptides to determine its usefulness for obtaining extensive sequence information for acidic peptides. Eight biological acidic peptides, ranging in size from 11 to 33 residues, were studied by negative ion mode ISD (nISD). The matrices 2,5-dihydroxybenzoic acid, 2-aminobenzoic acid, 2-aminobenzamide, 1,5-diaminonaphthalene, 5-amino-1-naphthol, 3-aminoquinoline, and 9-aminoacridine were used with each peptide. Optimal fragmentation was produced with 1,5-diaminonaphthalene (DAN), and extensive sequence informative fragmentation was observed for every peptide except hirudin(54-65). Cleavage at the N-Cα bond of the peptide backbone, producing c' and z' ions, was dominant for all peptides. Cleavage of the N-Cα bond N-terminal to proline residues was not observed. The formation of c and z ions is also found in electron transfer dissociation (ETD), electron capture dissociation (ECD), and positive ion mode ISD, which are considered to be radical-driven techniques. Oxidized insulin chain A, which has four highly acidic oxidized cysteine residues, had less extensive fragmentation. This peptide also exhibited the only charged localized fragmentation, with more pronounced product ion formation adjacent to the highly acidic residues. In addition, spectra were obtained by positive ion mode ISD for each protonated peptide; more sequence informative fragmentation was observed via nISD for all peptides. Three of the peptides studied had no product ion formation in ISD, but extensive sequence informative fragmentation was found in their nISD spectra. The results of this study indicate that nISD can be used to readily obtain sequence information for acidic peptides.
Negative Ion In-Source Decay Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry for Sequencing Acidic Peptides.

PubMed

McMillen, Chelsea L; Wright, Patience M; Cassady, Carolyn J

2016-05-01

Matrix-assisted laser desorption/ionization (MALDI) in-source decay was studied in the negative ion mode on deprotonated peptides to determine its usefulness for obtaining extensive sequence information for acidic peptides. Eight biological acidic peptides, ranging in size from 11 to 33 residues, were studied by negative ion mode ISD (nISD). The matrices 2,5-dihydroxybenzoic acid, 2-aminobenzoic acid, 2-aminobenzamide, 1,5-diaminonaphthalene, 5-amino-1-naphthol, 3-aminoquinoline, and 9-aminoacridine were used with each peptide. Optimal fragmentation was produced with 1,5-diaminonaphthalene (DAN), and extensive sequence informative fragmentation was observed for every peptide except hirudin(54-65). Cleavage at the N-Cα bond of the peptide backbone, producing c' and z' ions, was dominant for all peptides. Cleavage of the N-Cα bond N-terminal to proline residues was not observed. The formation of c and z ions is also found in electron transfer dissociation (ETD), electron capture dissociation (ECD), and positive ion mode ISD, which are considered to be radical-driven techniques. Oxidized insulin chain A, which has four highly acidic oxidized cysteine residues, had less extensive fragmentation. This peptide also exhibited the only charged localized fragmentation, with more pronounced product ion formation adjacent to the highly acidic residues. In addition, spectra were obtained by positive ion mode ISD for each protonated peptide; more sequence informative fragmentation was observed via nISD for all peptides. Three of the peptides studied had no product ion formation in ISD, but extensive sequence informative fragmentation was found in their nISD spectra. The results of this study indicate that nISD can be used to readily obtain sequence information for acidic peptides.
Automatic phylogenetic classification of bacterial beta-lactamase sequences including structural and antibiotic substrate preference information.

PubMed

Ma, Jianmin; Eisenhaber, Frank; Maurer-Stroh, Sebastian

2013-12-01

Beta lactams comprise the largest and still most effective group of antibiotics, but bacteria can gain resistance through different beta lactamases that can degrade these antibiotics. We developed a user-friendly tree-building web server that allows users to assign beta lactamase sequences to their respective molecular classes and subclasses. Further clinically relevant information includes whether the gene is typically chromosomal or transferable through plasmids, as well as a list of the antibiotics that the most closely related reference sequences are known to target and cause resistance against. This web server can automatically build three phylogenetic trees: the first tree with closely related sequences from a Tachyon search against the NCBI nr database, the second tree with curated reference beta lactamase sequences, and the third tree built specifically from substrate binding pocket residues of the curated reference beta lactamase sequences. We show that the latter is better suited to recover antibiotic substrate assignments through nearest neighbor annotation transfer. The users can also choose to build a structural model for the query sequence and view the binding pocket residues of their query relative to other beta lactamases in the sequence alignment as well as in the 3D structure relative to bound antibiotics. This web server is freely available at http://blac.bii.a-star.edu.sg/.
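The idea of nearest neighbor annotation transfer mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the web server's implementation: the reference names, pocket strings, substrate labels, and the use of plain position-wise identity as the similarity measure are all hypothetical and chosen only for illustration.

```python
def identity(a, b):
    """Fraction of identical positions between two aligned, equal-length residue strings."""
    if len(a) != len(b):
        raise ValueError("pocket strings must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def transfer_annotation(query_pocket, references):
    """Return (name, annotation, identity) of the reference most similar to the query.

    references maps a reference name to (pocket_residues, annotation).
    """
    best_name = max(
        references,
        key=lambda name: identity(query_pocket, references[name][0]),
    )
    pocket, annotation = references[best_name]
    return best_name, annotation, identity(query_pocket, pocket)


# Hypothetical reference set: made-up pocket residues and placeholder labels.
references = {
    "ref_1": ("SKSDNE", "substrate group X"),
    "ref_2": ("SKTGAR", "substrate group Y"),
}

result = transfer_annotation("SKSGAR", references)
print(result)
```

The query inherits the annotation of whichever reference shares the most pocket residues with it; a real analysis would use a phylogenetic tree or a substitution-aware distance rather than raw identity.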
A Teaching-Learning Sequence of Colour Informed by History and Philosophy of Science

ERIC Educational Resources Information Center

Maurício, Paulo; Valente, Bianor; Chagas, Isabel

2017-01-01

In this work, we present a teaching-learning sequence on colour intended for a pre-service elementary teacher programme informed by History and Philosophy of Science. Working in a socio-constructivist framework, we made an excursion on the history of colour. Our excursion through history of colour, as well as the reported misconception on colour…

40 CFR 2.303 - Special rules governing certain information obtained under the Noise Control Act of 1972.

Code of Federal Regulations, 2011 CFR

2011-07-01

... information obtained under the Noise Control Act of 1972. 2.303 Section 2.303 Protection of Environment... Special rules governing certain information obtained under the Noise Control Act of 1972. (a) Definitions. For the purposes of this section: (1) Act means the Noise Control Act of 1972, 42 U.S.C. 4901 et seq...
40 CFR 2.303 - Special rules governing certain information obtained under the Noise Control Act of 1972.

Code of Federal Regulations, 2010 CFR

2010-07-01

... information obtained under the Noise Control Act of 1972. 2.303 Section 2.303 Protection of Environment... Special rules governing certain information obtained under the Noise Control Act of 1972. (a) Definitions. For the purposes of this section: (1) Act means the Noise Control Act of 1972, 42 U.S.C. 4901 et seq...
An efficient and scalable graph modeling approach for capturing information at different levels in next generation sequencing reads

PubMed Central

2013-01-01

Background Next generation sequencing technologies have greatly advanced many research areas of the biomedical sciences through their capability to generate massive amounts of genetic information at unprecedented rates. The advent of next generation sequencing has led to the development of numerous computational tools to analyze and assemble the millions to billions of short sequencing reads produced by these technologies. While these tools filled an important gap, current approaches for storing, processing, and analyzing short read datasets generally have remained simple and lack the complexity needed to efficiently model the produced reads and assemble them correctly. Results Previously, we presented an overlap graph coarsening scheme for modeling read overlap relationships on multiple levels. Most current read assembly and analysis approaches use a single graph or set of clusters to represent the relationships among a read dataset. Instead, we use a series of graphs to represent the reads and their overlap relationships across a spectrum of information granularity. At each information level our algorithm is capable of generating clusters of reads from the reduced graph, forming an integrated graph modeling and clustering approach for read analysis and assembly. Previously we applied our algorithm to simulated and real 454 datasets to assess its ability to efficiently model and cluster next generation sequencing data. In this paper we extend our algorithm to large simulated and real Illumina datasets to demonstrate that our algorithm is practical for both sequencing technologies. Conclusions Our overlap graph theoretic algorithm is able to model next generation sequencing reads at various levels of granularity through the process of graph coarsening. Additionally, our model allows for efficient representation of the read overlap relationships, is scalable for large datasets, and is practical for both Illumina and 454 sequencing technologies. PMID:24564333
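The notion of modeling reads with a series of progressively coarser overlap graphs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not specify the coarsening rule, so greedy heavy-edge matching (a common choice in multilevel graph methods) is swapped in here, and the function names, the toy reads, and the minimum overlap of 3 are all hypothetical.

```python
def overlap_length(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def build_overlap_graph(reads, min_len=3):
    """Undirected weighted graph {(i, j): weight}, weight = best overlap in either direction."""
    edges = {}
    for i in range(len(reads)):
        for j in range(i + 1, len(reads)):
            w = max(
                overlap_length(reads[i], reads[j], min_len),
                overlap_length(reads[j], reads[i], min_len),
            )
            if w:
                edges[(i, j)] = w
    return edges


def coarsen(nodes, edges):
    """One coarsening level using greedy heavy-edge matching.

    nodes: list of frozensets of read indices (one per graph node).
    edges: {(u, v): weight} with u < v indexing into nodes.
    Returns the coarser (nodes, edges); each new node is a cluster of reads.
    """
    matched = set()
    groups = []
    # Visit the heaviest edges first and merge endpoints that are still unmatched.
    for (u, v), _ in sorted(edges.items(), key=lambda kv: (-kv[1], kv[0])):
        if u not in matched and v not in matched:
            matched.update((u, v))
            groups.append((u, v))
    # Unmatched nodes are carried over unchanged.
    groups.extend((u,) for u in range(len(nodes)) if u not in matched)

    new_id = {}
    new_nodes = []
    for idx, group in enumerate(groups):
        new_nodes.append(frozenset().union(*(nodes[g] for g in group)))
        for g in group:
            new_id[g] = idx

    # Re-map edges onto the merged nodes, summing weights of parallel edges.
    new_edges = {}
    for (u, v), w in edges.items():
        a, b = new_id[u], new_id[v]
        if a != b:
            key = (min(a, b), max(a, b))
            new_edges[key] = new_edges.get(key, 0) + w
    return new_nodes, new_edges


# Toy reads, invented for illustration only.
reads = ["ACGTAC", "GTACGG", "ACGGTT", "TTTAAA"]
graph = build_overlap_graph(reads)
nodes = [frozenset({i}) for i in range(len(reads))]
level1_nodes, level1_edges = coarsen(nodes, graph)
level2_nodes, level2_edges = coarsen(level1_nodes, level1_edges)
print(level1_nodes, level1_edges)
print(level2_nodes, level2_edges)
```

Each call to `coarsen` yields one level of the hierarchy, and the node sets at any level can be read off directly as read clusters at that granularity.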
WebPrInSeS: automated full-length clone sequence identification and verification using high-throughput sequencing data.

PubMed

Massouras, Andreas; Decouttere, Frederik; Hens, Korneel; Deplancke, Bart

2010-07-01

High-throughput sequencing (HTS) is revolutionizing our ability to obtain cheap, fast and reliable sequence information. Many experimental approaches are expected to benefit from the incorporation of such sequencing features in their pipeline. Consequently, software tools that facilitate such an incorporation should be of great interest. In this context, we developed WebPrInSeS, a web server tool allowing automated full-length clone sequence identification and verification using HTS data. WebPrInSeS encompasses two separate software applications. The first is WebPrInSeS-C which performs automated sequence verification of user-defined open-reading frame (ORF) clone libraries. The second is WebPrInSeS-E, which identifies positive hits in cDNA or ORF-based library screening experiments such as yeast one- or two-hybrid assays. Both tools perform de novo assembly using HTS data from any of the three major sequencing platforms. Thus, WebPrInSeS provides a highly integrated, cost-effective and efficient way to sequence-verify or identify clones of interest. WebPrInSeS is available at http://webprinses.epfl.ch/ and is open to all users.
WebPrInSeS: automated full-length clone sequence identification and verification using high-throughput sequencing data

PubMed Central

Massouras, Andreas; Decouttere, Frederik; Hens, Korneel; Deplancke, Bart

2010-01-01

High-throughput sequencing (HTS) is revolutionizing our ability to obtain cheap, fast and reliable sequence information. Many experimental approaches are expected to benefit from the incorporation of such sequencing features in their pipeline. Consequently, software tools that facilitate such an incorporation should be of great interest. In this context, we developed WebPrInSeS, a web server tool allowing automated full-length clone sequence identification and verification using HTS data. WebPrInSeS encompasses two separate software applications. The first is WebPrInSeS-C which performs automated sequence verification of user-defined open-reading frame (ORF) clone libraries. The second is WebPrInSeS-E, which identifies positive hits in cDNA or ORF-based library screening experiments such as yeast one- or two-hybrid assays. Both tools perform de novo assembly using HTS data from any of the three major sequencing platforms. Thus, WebPrInSeS provides a highly integrated, cost-effective and efficient way to sequence-verify or identify clones of interest. WebPrInSeS is available at http://webprinses.epfl.ch/ and is open to all users. PMID:20501601
Information Topics of Greatest Interest for Return of Genome Sequencing Results among Women Diagnosed with Breast Cancer at a Young Age.

PubMed

Seo, Joann; Ivanovich, Jennifer; Goodman, Melody S; Biesecker, Barbara B; Kaphingst, Kimberly A

2017-06-01

We investigated what information women diagnosed with breast cancer at a young age would want to learn when genome sequencing results are returned. We conducted 60 semi-structured interviews with women diagnosed with breast cancer at age 40 or younger. We examined what specific information participants would want to learn across result types and for each type of result, as well as how much information they would want. Genome sequencing was not offered to participants as part of the study. Two coders independently coded interview transcripts; analysis was conducted using NVivo10. Across result types, participants wanted to learn about health implications, risk and prevalence in quantitative terms, causes of variants, and causes of diseases. Participants wanted to learn actionable information for variants affecting risk of preventable or treatable disease, medication response, and carrier status. The amount of desired information differed for variants affecting risk of unpreventable or untreatable disease, with uncertain significance, and not health-related. Women diagnosed with breast cancer at a young age recognize the value of genome sequencing results in identifying potential causes and effective treatments and expressed interest in using the information to help relatives and to further understand their other health risks. Our findings can inform the development of effective feedback strategies for genome sequencing that meet patients' information needs and preferences.
42 CFR 478.24 - Opportunity for a party to obtain and submit information.

Code of Federal Regulations, 2011 CFR

2011-10-01

... 42 Public Health 4 2011-10-01 2011-10-01 false Opportunity for a party to obtain and submit information. 478.24 Section 478.24 Public Health CENTERS FOR MEDICARE & MEDICAID SERVICES, DEPARTMENT OF HEALTH AND HUMAN SERVICES (CONTINUED) QUALITY IMPROVEMENT ORGANIZATIONS RECONSIDERATIONS AND APPEALS...
42 CFR 478.24 - Opportunity for a party to obtain and submit information.

Code of Federal Regulations, 2014 CFR

2014-10-01

... 42 Public Health 4 2014-10-01 2014-10-01 false Opportunity for a party to obtain and submit information. 478.24 Section 478.24 Public Health CENTERS FOR MEDICARE & MEDICAID SERVICES, DEPARTMENT OF HEALTH AND HUMAN SERVICES (CONTINUED) QUALITY IMPROVEMENT ORGANIZATIONS RECONSIDERATIONS AND APPEALS...
42 CFR 478.24 - Opportunity for a party to obtain and submit information.

Code of Federal Regulations, 2013 CFR

2013-10-01

... 42 Public Health 4 2013-10-01 2013-10-01 false Opportunity for a party to obtain and submit information. 478.24 Section 478.24 Public Health CENTERS FOR MEDICARE & MEDICAID SERVICES, DEPARTMENT OF HEALTH AND HUMAN SERVICES (CONTINUED) QUALITY IMPROVEMENT ORGANIZATIONS RECONSIDERATIONS AND APPEALS...
42 CFR 478.24 - Opportunity for a party to obtain and submit information.

Code of Federal Regulations, 2012 CFR

2012-10-01

... 42 Public Health 4 2012-10-01 2012-10-01 false Opportunity for a party to obtain and submit information. 478.24 Section 478.24 Public Health CENTERS FOR MEDICARE & MEDICAID SERVICES, DEPARTMENT OF HEALTH AND HUMAN SERVICES (CONTINUED) QUALITY IMPROVEMENT ORGANIZATIONS RECONSIDERATIONS AND APPEALS...
Haemagglutinin and neuraminidase sequencing delineate nosocomial influenza outbreaks with accuracy equivalent to whole genome sequencing.

PubMed

Houghton, Rebecca; Ellis, Joanna; Galiano, Monica; Clark, Tristan W; Wyllie, Sarah

2017-04-01

We describe haemagglutinin (HA) and neuraminidase (NA) sequencing in an apparent cross-site influenza A(H1N1) outbreak in renal transplant and haemodialysis patients, confirmed with whole genome sequencing (WGS). Isolates were sequenced from influenza positive individuals. Phylogenetic trees were constructed using HA and NA sequencing and subsequently WGS. Sequence data were analysed to determine genetic relatedness of viruses obtained from inpatient and outpatient cohorts and compared with epidemiological outbreak information. There were 6 patient cases of influenza in the inpatient renal ward cohort (associated with 3 deaths) and 9 patient cases in the outpatient haemodialysis unit cohort (no deaths). WGS confirmed clustered transmission of two genetically different influenza A(H1N1)pdm09 strains initially identified by analysis of HA and NA genes. WGS took longer, and in this case was not required to determine whether or not the two seemingly linked outbreaks were related. Rapid sequencing of HA and NA genes may be sufficient to aid early influenza outbreak investigation, making it appealing for future outbreak investigation. However, as next generation sequencing becomes cheaper and more widely available, and bioinformatics software is now freely accessible, next generation whole genome analysis may increasingly become a valuable tool for real-time influenza outbreak investigation. Crown Copyright © 2017. Published by Elsevier Ltd. All rights reserved.
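As a rough illustration of how gene sequences can separate two seemingly linked outbreaks, the Python sketch below groups isolates whose aligned sequences differ at only a few positions. It is a deliberately simplified stand-in for the phylogenetic trees used in the study: the isolate names, the short sequences, the Hamming distance, and the threshold of one difference are all invented for the example.

```python
def hamming(a, b):
    """Number of differing positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def group_isolates(sequences, max_diff):
    """Single-linkage grouping: isolates within max_diff differences share a group."""
    names = list(sequences)
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if hamming(sequences[a], sequences[b]) <= max_diff:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


# Hypothetical aligned gene fragments, not real influenza data.
isolates = {
    "ward_1": "ATGAAGGCAA",
    "ward_2": "ATGAAGGCAA",
    "unit_1": "ATGCAGACTA",
    "unit_2": "ATGCAGACTT",
}

clusters = group_isolates(isolates, max_diff=1)
print(clusters)
```

With these toy inputs the ward and unit isolates fall into separate groups, mirroring the kind of conclusion (two distinct strains rather than one linked outbreak) that the abstract reports.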
40 CFR 2.302 - Special rules governing certain information obtained under the Clean Water Act.

Code of Federal Regulations, 2014 CFR

2014-07-01

... provide the information was issued under section 309(a)(3) of the Act, 33 U.S.C. 1319(a)(3), whether a civil action was brought under section 309(b) of the Act, 33 U.S.C. 1319(b), and whether the information... specifically does not apply to information obtained under section 310(d) or 312(g)(3) of the Act, 33 U.S.C...
40 CFR 2.302 - Special rules governing certain information obtained under the Clean Water Act.

Code of Federal Regulations, 2012 CFR

2012-07-01

... provide the information was issued under section 309(a)(3) of the Act, 33 U.S.C. 1319(a)(3), whether a civil action was brought under section 309(b) of the Act, 33 U.S.C. 1319(b), and whether the information... specifically does not apply to information obtained under section 310(d) or 312(g)(3) of the Act, 33 U.S.C...
40 CFR 2.302 - Special rules governing certain information obtained under the Clean Water Act.

Code of Federal Regulations, 2011 CFR

2011-07-01

... provide the information was issued under section 309(a)(3) of the Act, 33 U.S.C. 1319(a)(3), whether a civil action was brought under section 309(b) of the Act, 33 U.S.C. 1319(b), and whether the information... specifically does not apply to information obtained under section 310(d) or 312(g)(3) of the Act, 33 U.S.C...
Implicit Sequence Learning in Dyslexia: A Within-Sequence Comparison of First- and Higher-Order Information

ERIC Educational Resources Information Center

Du, Wenchong; Kelly, Steve W.

2013-01-01

The present study examines implicit sequence learning in adult dyslexics with a focus on comparing sequence transitions with different statistical complexities. Learning of a 12-item deterministic sequence was assessed in 12 dyslexic and 12 non-dyslexic university students. Both groups showed equivalent standard reaction time increments when the…
Genome sequencing of a single tardigrade Hypsibius dujardini individual

PubMed Central

Arakawa, Kazuharu; Yoshida, Yuki; Tomita, Masaru

2016-01-01

Tardigrades are ubiquitous microscopic animals that play an important role in the study of metazoan phylogeny. Most terrestrial tardigrades can withstand extreme environments by entering an ametabolic desiccated state termed anhydrobiosis. Due to their small size and the non-axenic nature of laboratory cultures, molecular studies of tardigrades are prone to contamination. To minimize the possibility of microbial contaminations and to obtain high-quality genomic information, we have developed an ultra-low input library sequencing protocol to enable the genome sequencing of a single tardigrade Hypsibius dujardini individual. Here, we describe the details of our sequencing data and the ultra-low input library preparation methodologies. PMID:27529330
Genome sequencing of a single tardigrade Hypsibius dujardini individual.

PubMed

Arakawa, Kazuharu; Yoshida, Yuki; Tomita, Masaru

2016-08-16

Tardigrades are ubiquitous microscopic animals that play an important role in the study of metazoan phylogeny. Most terrestrial tardigrades can withstand extreme environments by entering an ametabolic desiccated state termed anhydrobiosis. Due to their small size and the non-axenic nature of laboratory cultures, molecular studies of tardigrades are prone to contamination. To minimize the possibility of microbial contaminations and to obtain high-quality genomic information, we have developed an ultra-low input library sequencing protocol to enable the genome sequencing of a single tardigrade Hypsibius dujardini individual. Here, we describe the details of our sequencing data and the ultra-low input library preparation methodologies.
Intelligent Access to Sequence and Structure Databases (IASSD) - an interface for accessing information from major web databases.

PubMed

Ganguli, Sayak; Gupta, Manoj Kumar; Basu, Protip; Banik, Rahul; Singh, Pankaj Kumar; Vishal, Vineet; Bera, Abhisek Ranjan; Chakraborty, Hirak Jyoti; Das, Sasti Gopal

2014-01-01

With the advent of the age of big data and advances in high-throughput technology, accessing data has become one of the most important steps in the entire knowledge discovery process. Most users are not able to decipher the query result that is obtained when non-specific keywords or a combination of keywords are used. Intelligent Access to Sequence and Structure Databases (IASSD) is a desktop application for the Windows operating system. It is written in Java and utilizes the Web Services Description Language (WSDL) files and JAR files of E-utilities of various databases such as the National Center for Biotechnology Information (NCBI) and the Protein Data Bank (PDB). In addition, IASSD allows the user to view protein structure using a Jmol application which supports conditional editing. The JAR file is freely available through e-mail from the corresponding author.
SCPRED: accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences.

PubMed

Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke

2008-05-01

Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose the SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. SCPRED can accurately find similar structures for sequences that share low identity with sequences used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are
SCPRED: Accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences

PubMed Central

Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke

2008-01-01

Background Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose the SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. Results SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. Conclusion SCPRED can accurately find similar structures for sequences that share low identity with sequences used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of

"Negotiorum Gestio" in Family Medicine, Informed Consent Obtainment, and Disciplinary Responsibility.

PubMed

Birkeland, Søren

2016-01-01

Introduction. Negotiorum gestio (NG) denotes an action where a person, acting with good intentions, acts on behalf of another without obtaining the latter's prior consent. In broad terms, NG-like actions have played a considerable role in health care provision. In some settings, health care delivery with only little or presumed patient consent has been the rule rather than the exception. However, bioethical principles regarding patient autonomy and obtainment of the patient's informed consent (IC) before intervention are now increasingly materialized in the law of many countries. Aim. To study legal consequences of NG in family medicine and IC handling options. Methods. Case law examination. Results. A disciplinary board case is described concerning a family doctor conducting unlawful NG by not meeting legal IC requirements. Discussion and Conclusion. The practical and legal implications of IC and the possible role of novel Shared Decision-Making approaches in meeting regulatory and bioethical demands are discussed. It is concluded that a doctor may run an unnecessary legal risk when conducting NG in decision-competent patients, and it is furthermore suggested that novel Shared Decision-Making approaches could help in obtaining a rightful and practicable IC.
Phylum-Level Conservation of Regulatory Information in Nematodes despite Extensive Non-coding Sequence Divergence

PubMed Central

Gordon, Kacy L.; Arthur, Robert K.; Ruvinsky, Ilya

2015-01-01

Gene regulatory information guides development and shapes the course of evolution. To test conservation of gene regulation within the phylum Nematoda, we compared the functions of putative cis-regulatory sequences of four sets of orthologs (unc-47, unc-25, mec-3 and elt-2) from distantly-related nematode species. These species, Caenorhabditis elegans, its congeneric C. briggsae, and three parasitic species Meloidogyne hapla, Brugia malayi, and Trichinella spiralis, represent four of the five major clades in the phylum Nematoda. Despite the great phylogenetic distances sampled and the extensive sequence divergence of nematode genomes, all but one of the regulatory elements we tested are able to drive at least a subset of the expected gene expression patterns. We show that functionally conserved cis-regulatory elements have no more extended sequence similarity to their C. elegans orthologs than would be expected by chance, but they do harbor motifs that are important for proper expression of the C. elegans genes. These motifs are too short to be distinguished from the background level of sequence similarity, and while identical in sequence they are not conserved in orientation or position. Functional tests reveal that some of these motifs contribute to proper expression. Our results suggest that conserved regulatory circuitry can persist despite considerable turnover within cis elements. PMID:26020930
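The observation that motifs can be identical in sequence yet differ in orientation and position suggests scanning both strands of each regulatory region. A minimal Python sketch follows; the motif `GATAA` and the two short sequences are placeholders invented for the example and are not taken from the study.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string containing only A, C, G, T."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def find_motif(sequence, motif):
    """Return (position, strand) for every occurrence of motif on either strand.

    Positions are 0-based offsets on the given (forward) sequence. A palindromic
    motif is reported once per site, on the '+' strand.
    """
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(sequence) - len(motif) + 1):
        window = sequence[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits


# Hypothetical regulatory fragments from two species.
hits_a = find_motif("CCGATAACC", "GATAA")
hits_b = find_motif("TTTTTATCGG", "GATAA")
print(hits_a, hits_b)
```

Here the same motif is found in both fragments, but at a different offset and on the opposite strand, which is the kind of pattern an alignment-based similarity search would miss.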
Next-generation sequencing for molecular diagnosis of lung adenocarcinoma specimens obtained by fine needle aspiration cytology

NASA Astrophysics Data System (ADS)

Qiu, Tian; Guo, Huiqin; Zhao, Huan; Wang, Luhua; Zhang, Zhihui

2015-06-01

Identification of multi-gene variations has led to the development of new targeted therapies in lung adenocarcinoma patients, and identification of an appropriate patient population with a reliable screening method is the key to the overall success of tumor targeted therapies. In this study, we used the Ion Torrent next-generation sequencing (NGS) technique to screen for mutations in 89 cases of lung adenocarcinoma metastatic lymph node specimens obtained by fine-needle aspiration cytology (FNAC). Of the 89 specimens, 30 (34%) were found to harbor epidermal growth factor receptor (EGFR) kinase domain mutations. Seven (8%) samples harbored KRAS mutations, and three (3%) samples had BRAF mutations involving exon 11 (G469A) and exon 15 (V600E). Eight (9%) samples harbored PIK3CA mutations. One (1%) sample had a HRAS G12C mutation. Thirty-two (36%) samples harbored TP53 mutations. Mutations in other genes, including APC, ATM, MET, PTPN11, GNAS, HRAS, RB1, SMAD4 and STK11, were each found in one case. Our study has demonstrated that NGS using the Ion Torrent technology is a useful tool for gene mutation screening in lung adenocarcinoma metastatic lymph node specimens obtained by FNAC, and may promote the development of new targeted therapies in lung adenocarcinoma patients.
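The percentages quoted in this abstract follow from the reported counts over 89 specimens. The short Python sketch below reproduces that arithmetic; the counts are those given in the abstract, while the variable and function names are illustrative.

```python
# Number of specimens with a mutation in each gene, as reported in the abstract.
TOTAL_SPECIMENS = 89
mutation_counts = {
    "EGFR": 30,
    "KRAS": 7,
    "BRAF": 3,
    "PIK3CA": 8,
    "HRAS": 1,
    "TP53": 32,
}


def mutation_frequencies(counts, total):
    """Percentage of specimens per gene, rounded to the nearest whole percent."""
    return {gene: round(100 * n / total) for gene, n in counts.items()}


frequencies = mutation_frequencies(mutation_counts, TOTAL_SPECIMENS)
for gene, pct in frequencies.items():
    print(f"{gene}: {mutation_counts[gene]}/{TOTAL_SPECIMENS} = {pct}%")
```

Note that the per-gene percentages need not sum to 100, since a specimen can carry mutations in more than one gene or in none of those listed.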
Improving phylogenetic analyses by incorporating additional information from genetic sequence databases.

PubMed

Liang, Li-Jung; Weiss, Robert E; Redelings, Benjamin; Suchard, Marc A

2009-10-01

Statistical analyses of phylogenetic data culminate in uncertain estimates of underlying model parameters. Lack of additional data hinders the ability to reduce this uncertainty, as the original phylogenetic dataset is often complete, containing the entire gene or genome information available for the given set of taxa. Informative priors in a Bayesian analysis can reduce posterior uncertainty; however, publicly available phylogenetic software specifies vague priors for model parameters by default. We build objective and informative priors using hierarchical random effect models that combine additional datasets whose parameters are not of direct interest but are similar to the analysis of interest. We propose principled statistical methods that permit more precise parameter estimates in phylogenetic analyses by creating informative priors for parameters of interest. Using additional sequence datasets from our lab or public databases, we construct a fully Bayesian semiparametric hierarchical model to combine datasets. A dynamic iteratively reweighted Markov chain Monte Carlo algorithm conveniently recycles posterior samples from the individual analyses. We demonstrate the value of our approach by examining the insertion-deletion (indel) process in the enolase gene across the Tree of Life using the phylogenetic software BALI-PHY; we incorporate prior information about indels from 82 curated alignments downloaded from the BAliBASE database.
Correlation between protein sequence similarity and x-ray diffraction quality in the protein data bank.

PubMed

Lu, Hui-Meng; Yin, Da-Chuan; Ye, Ya-Jing; Luo, Hui-Min; Geng, Li-Qiang; Li, Hai-Sheng; Guo, Wei-Hong; Shang, Peng

2009-01-01

As the most widely utilized technique to determine the 3-dimensional structure of protein molecules, X-ray crystallography can provide structures of the highest resolution among the developed techniques. The resolution obtained via X-ray crystallography is known to be influenced by many factors, such as the crystal quality, diffraction techniques, and X-ray sources. In this paper, we found that the protein sequence could also be one of these factors. We extracted information on the resolution and the sequence of proteins from the Protein Data Bank (PDB), classified the proteins into different clusters according to the sequence similarity, and statistically analyzed the relationship between the sequence similarity and the best resolution obtained. The results showed that there was a pronounced correlation between the sequence similarity and the obtained resolution. These results indicate that protein structure itself is one variable that may affect resolution when X-ray crystallography is used.
Heuristics for multiobjective multiple sequence alignment.

PubMed

Abbasi, Maryam; Paquete, Luís; Pereira, Francisco B

2016-07-15

Aligning multiple sequences arises in many tasks in Bioinformatics. However, the alignments produced by the current software packages are highly dependent on the parameter settings, such as the relative importance of opening gaps with respect to the increase of similarity. Choosing only one parameter setting may introduce an undesirable bias in further steps of the analysis and give overly simplistic interpretations. In this work, we reformulate multiple sequence alignment from a multiobjective point of view. The goal is to generate several sequence alignments that represent a trade-off between maximizing the substitution score and minimizing the number of indels/gaps in the sum-of-pairs score function. This trade-off gives the practitioner further information about the similarity of the sequences, from which she could analyse and choose the most plausible alignment. We introduce several heuristic approaches, based on local search procedures, that compute a set of sequence alignments, which are representative of the trade-off between the two objectives (substitution score and indels). Several algorithm design options are discussed and analysed, with particular emphasis on the influence of the starting alignment and neighborhood search definitions on the overall performance. A perturbation technique is proposed to improve the local search, which provides a wide range of high-quality alignments. The proposed approach is tested experimentally on a wide range of instances. We performed several experiments with sequences obtained from the benchmark database BAliBASE 3.0. To evaluate the quality of the results, we calculate the hypervolume indicator of the set of score vectors returned by the algorithms. The results obtained allow us to identify reasonably good choices of parameters for our approach. Further, we compared our method in terms of correctly aligned pairs ratio and columns correctly aligned ratio with respect to reference alignments. Experimental results show
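The two-objective view described in this abstract can be sketched in a few lines. The example below is an illustration under simplifying assumptions, not the authors' heuristics: it scores substitutions with a toy match/mismatch scheme instead of a substitution matrix, counts one gap for every residue aligned against a gap character in a sequence pair, and then keeps only the non-dominated (substitution score, gap count) vectors.

```python
from itertools import combinations


def objectives(alignment, match=1, mismatch=-1):
    """Return (substitution_score, gap_count) summed over all sequence pairs.

    `alignment` is a list of equal-length strings using '-' for gaps.
    The match/mismatch values are a toy stand-in for a substitution matrix.
    """
    substitution_score = 0
    gap_count = 0
    for seq_a, seq_b in combinations(alignment, 2):
        for x, y in zip(seq_a, seq_b):
            if x == "-" and y == "-":
                continue  # gap against gap contributes to neither objective
            if x == "-" or y == "-":
                gap_count += 1
            elif x == y:
                substitution_score += match
            else:
                substitution_score += mismatch
    return substitution_score, gap_count


def pareto_front(points):
    """Keep score vectors not dominated by any other vector.

    A vector dominates another if its substitution score is at least as high
    and its gap count at least as low, with the two vectors not identical.
    """
    front = []
    for p in points:
        dominated = any(
            q != p and q[0] >= p[0] and q[1] <= p[1] for q in points
        )
        if not dominated:
            front.append(p)
    return front


print(objectives(["AC-T", "ACGT"]))                      # (3, 1)
print(pareto_front([(3, 1), (2, 1), (4, 3), (1, 0)]))    # [(3, 1), (4, 3), (1, 0)]
```

Each vector left in the front corresponds to an alignment that cannot be improved in one objective without worsening the other, which is the set a practitioner would inspect.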
Sequencing, Analysis, and Annotation of Expressed Sequence Tags for Camelus dromedarius

PubMed Central

Al-Swailem, Abdulaziz M.; Shehata, Maher M.; Abu-Duhier, Faisel M.; Al-Yamani, Essam J.; Al-Busadah, Khalid A.; Al-Arawi, Mohammed S.; Al-Khider, Ali Y.; Al-Muhaimeed, Abdullah N.; Al-Qahtani, Fahad H.; Manee, Manee M.; Al-Shomrani, Badr M.; Al-Qhtani, Saad M.; Al-Harthi, Amer S.; Akdemir, Kadir C.; Otu, Hasan H.

2010-01-01

Despite its economic, cultural, and biological importance, there has not been a large scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, cluster and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and ∼40% hits extending to the start codons of full length cDNAs, suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. The accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools with the possibility to add sequences from the public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies, enabling a starting point for whole genome sequencing of the organism. PMID:20502665
Sequence investigation of 34 forensic autosomal STRs with massively parallel sequencing.

PubMed

Zhang, Suhua; Niu, Yong; Bian, Yingnan; Dong, Rixia; Liu, Xiling; Bao, Yun; Jin, Chao; Zheng, Hancheng; Li, Chengtao

2018-05-01

STRs vary not only in the length of the repeat units and the number of repeats but also in the region with which they conform to an incremental repeat pattern. Massively parallel sequencing (MPS) offers new possibilities in the analysis of STRs since it can simultaneously sequence multiple targets in a single reaction and capture potential internal sequence variations. Here, we sequenced 34 STRs applied in the forensic community of China with a custom-designed panel. MPS performance was evaluated from sequencing reads analysis, concordance study and sensitivity testing. High coverage sequencing data were obtained to determine the constitute ratios and heterozygous balance. No actual inconsistent genotypes were observed between capillary electrophoresis (CE) and MPS, demonstrating the reliability of the panel and the MPS technology. With the sequencing data from the 200 investigated individuals, 346 and 418 alleles were obtained via CE and MPS technologies, respectively, at the 34 STRs, indicating that MPS technology provides higher discrimination than CE detection. The whole study demonstrated that STR genotyping with the custom panel and MPS technology has the potential not only to reveal length and sequence variations but also to satisfy the demands of high throughput and high multiplexing with acceptable sensitivity.
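The difference in allele counts between CE and MPS comes from what each method can distinguish: CE separates alleles by fragment length, while MPS also sees internal sequence variation. The sketch below illustrates that idea on made-up repeat sequences; the sequences and the function name are hypothetical and are not taken from the study's panel.

```python
def count_alleles(sequences):
    """Return (length_based_count, sequence_based_count) for observed alleles.

    Length-based counting mimics what a size-only method can resolve;
    sequence-based counting treats any difference in bases as a new allele.
    """
    length_based = len({len(seq) for seq in sequences})
    sequence_based = len(set(sequences))
    return length_based, sequence_based


# Hypothetical alleles at one locus: two share a length but differ internally.
observed = [
    "TCTA" * 10,           # 10 plain repeats, 40 bases
    "TCTA" * 9 + "TCTG",   # same length, one variant repeat unit
    "TCTA" * 11,           # 11 repeats, 44 bases
]
print(count_alleles(observed))  # (2, 3)
```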
Novel Approach to Analyzing MFE of Noncoding RNA Sequences

PubMed Central

George, Tina P.; Thomas, Tessamma

2016-01-01

Genomic studies have become noncoding RNA (ncRNA) centric after the study of different genomes provided enormous information on ncRNA over the past decades. The function of ncRNA is decided by its secondary structure, and across organisms, the secondary structure is more conserved than the sequence itself. In this study, the optimal secondary structure or the minimum free energy (MFE) structure of ncRNA was found based on the thermodynamic nearest neighbor model. MFE of over 2600 ncRNA sequences was analyzed in view of its signal properties. Mathematical models linking MFE to the signal properties were found for each of the four classes of ncRNA analyzed. MFE values computed with the proposed models were in concordance with those obtained with the standard web servers. A total of 95% of the sequences analyzed had deviation of MFE values within ±15% relative to those obtained from standard web servers. PMID:27695341
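The ±15% agreement criterion reported above can be expressed as a simple relative-deviation check. This is a minimal sketch with invented MFE values (in arbitrary energy units); it does not reproduce the authors' models or the web-server outputs.

```python
def fraction_within(model_values, reference_values, tolerance=0.15):
    """Fraction of model values within +/- tolerance of their reference value.

    Deviation is measured relative to the magnitude of the reference value.
    """
    within = sum(
        abs(m - r) <= tolerance * abs(r)
        for m, r in zip(model_values, reference_values)
    )
    return within / len(reference_values)


# Hypothetical MFE values: the last model value deviates by 20%.
model = [-10.0, -22.0, -30.0, -40.0]
reference = [-11.0, -20.0, -31.0, -50.0]
print(fraction_within(model, reference))  # 0.75
```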
Novel Approach to Analyzing MFE of Noncoding RNA Sequences.

PubMed

George, Tina P; Thomas, Tessamma

2016-01-01

Genomic studies have become noncoding RNA (ncRNA) centric after the study of different genomes provided enormous information on ncRNA over the past decades. The function of ncRNA is decided by its secondary structure, and across organisms, the secondary structure is more conserved than the sequence itself. In this study, the optimal secondary structure or the minimum free energy (MFE) structure of ncRNA was found based on the thermodynamic nearest neighbor model. MFE of over 2600 ncRNA sequences was analyzed in view of its signal properties. Mathematical models linking MFE to the signal properties were found for each of the four classes of ncRNA analyzed. MFE values computed with the proposed models were in concordance with those obtained with the standard web servers. A total of 95% of the sequences analyzed had deviation of MFE values within ±15% relative to those obtained from standard web servers.
Gene finding in metatranscriptomic sequences.

PubMed

Ismail, Wazim Mohammed; Ye, Yuzhen; Tang, Haixu

2014-01-01

Metatranscriptomic sequencing is a highly sensitive bioassay of functional activity in a microbial community, providing complementary information to the metagenomic sequencing of the community. The acquisition of the metatranscriptomic sequences will enable us to refine the annotations of the metagenomes, and to study the gene activities and their regulation in complex microbial communities and their dynamics. In this paper, we present TransGeneScan, a software tool for finding genes in assembled transcripts from metatranscriptomic sequences. By incorporating several features of metatranscriptomic sequencing, including strand-specificity, short intergenic regions, and putative antisense transcripts into a Hidden Markov Model, TransGeneScan can predict a sense transcript containing one or multiple genes (in an operon) or an antisense transcript. We tested TransGeneScan on a mock metatranscriptomic data set containing three known bacterial genomes. The results showed that TransGeneScan performs better than metagenomic gene finders (MetaGeneMark and FragGeneScan) on predicting protein coding genes in assembled transcripts, and achieves comparable or even higher accuracy than gene finders for microbial genomes (Glimmer and GeneMark). These results imply that, with the assistance of metatranscriptomic sequencing, we can obtain a broad and precise picture of the genes (and their functions) in a microbial community. TransGeneScan is available as open-source software on SourceForge at https://sourceforge.net/projects/transgenescan/.
What are Whole Exome Sequencing and Whole Genome Sequencing?

MedlinePlus

... the future. For more information about DNA sequencing technologies and their use: Genetics Home Reference discusses whether ... University in St. Louis describes the different sequencing technologies and what the new technologies have meant for ...
The perspectives of researchers on obtaining informed consent in developing countries.

PubMed

Newton, Sam K; Appiah-Poku, John

2007-04-01

The doctrine of informed consent (IC) exists to protect individuals from exploitation or harm. This study into IC was carried out to investigate how different researchers perceived the process whereby researchers obtained consent. It also examined researchers' perspectives on what constituted IC, and how different settings influenced the process. The study recorded in-depth interviews with 12 lecturers and five doctoral students, who had carried out research in developing countries, at a leading school of public health in the United Kingdom. A purposive, snowballing approach was used to identify interviewees. Although the concept and application of the doctrine of IC should have been the same, irrespective of where the research was carried out, the process of obtaining it had to be different. The setting had to be taken into consideration and the autonomy of the subject had to be respected at all times. In areas of high illiteracy, and where understanding of the subject was likely to be a problem, there was an added responsibility placed on the researcher to devise innovative ways of carrying out the study, taking into consideration the peculiarities of the environment. The ethical issues for IC were the same, irrespective of where the research was conducted. However, because the backgrounds, setting, and knowledge of populations differed, there was the need to be similarly sensitive in obtaining consent. The problems of obtaining genuine IC were not limited to developing countries.
Extension of the COG and arCOG databases by amino acid and nucleotide sequences

PubMed Central

Meereis, Florian; Kaufmann, Michael

2008-01-01

Background: The current versions of the COG and arCOG databases, both excellent frameworks for studies in comparative and functional genomics, do not contain the nucleotide sequences corresponding to their protein or protein domain entries. Results: Using sequence information obtained from GenBank flat files covering the completely sequenced genomes of the COG and arCOG databases, we constructed NUCOCOG (nucleotide sequences containing COG databases) as an extended version including all nucleotide sequences and in addition the amino acid sequences originally utilized to construct the current COG and arCOG databases. We make available three comprehensive single XML files containing the complete databases including all sequence information. In addition, we provide a web interface as a utility suitable to browse the NUCOCOG database for sequence retrieval. The database is accessible at . Conclusion: NUCOCOG offers the possibility to analyze any sequence related property in the context of the COG and arCOG framework simply by using script languages such as PERL applied to a large but single XML document. PMID:19014535
40 CFR 2.305 - Special rules governing certain information obtained under the Solid Waste Disposal Act, as amended.

Code of Federal Regulations, 2010 CFR

2010-07-01

... information obtained under the Solid Waste Disposal Act, as amended. 2.305 Section 2.305 Protection of... § 2.305 Special rules governing certain information obtained under the Solid Waste Disposal Act, as amended. (a) Definitions. For purposes of this section: (1) Act means the Solid Waste Disposal Act, as...
40 CFR 2.305 - Special rules governing certain information obtained under the Solid Waste Disposal Act, as amended.

Code of Federal Regulations, 2011 CFR

2011-07-01

... information obtained under the Solid Waste Disposal Act, as amended. 2.305 Section 2.305 Protection of... § 2.305 Special rules governing certain information obtained under the Solid Waste Disposal Act, as amended. (a) Definitions. For purposes of this section: (1) Act means the Solid Waste Disposal Act, as...
Sequence Diversity Diagram for comparative analysis of multiple sequence alignments.

PubMed

Sakai, Ryo; Aerts, Jan

2014-01-01

The sequence logo is a graphical representation of a set of aligned sequences, commonly used to depict conservation of amino acid or nucleotide sequences. Although it effectively communicates the amount of information present at every position, this visual representation falls short when the domain task is to compare between two or more sets of aligned sequences. We present a new visual presentation called a Sequence Diversity Diagram and validate our design choices with a case study. Our software was developed using the open-source program called Processing. It loads multiple sequence alignment FASTA files and a configuration file, which can be modified as needed to change the visualization. The redesigned figure improves on the visual comparison of two or more sets, and it additionally encodes information on sequential position conservation. In our case study of the adenylate kinase lid domain, the Sequence Diversity Diagram reveals unexpected patterns and new insights, for example the identification of subgroups within the protein subfamily. Our future work will integrate this visual encoding into interactive visualization tools to support higher level data exploration tasks.
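The "amount of information present at every position" that a sequence logo conveys is usually computed as the maximum possible entropy minus the observed Shannon entropy of each alignment column. The sketch below shows that calculation for nucleotide alignments; it omits the small-sample correction that logo tools often apply, ignores gap characters, and is not the Sequence Diversity Diagram software itself.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=4):
    """Information content (bits) of one alignment column.

    Computed as log2(alphabet_size) minus the Shannon entropy of the
    observed residue frequencies. Gap characters ('-') are ignored.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    entropy = -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(residues).values()
    )
    return math.log2(alphabet_size) - entropy


def alignment_information(alignment, alphabet_size=4):
    """Information content for every column of an aligned set of sequences."""
    return [column_information(col, alphabet_size) for col in zip(*alignment)]


# Toy alignment: two fully conserved columns, one partly and one not conserved.
print(alignment_information(["ACGT", "ACGA", "ACTC", "ACAG"]))
```

For protein alignments the same function applies with `alphabet_size=20`. Computing the profile separately for two sets of sequences gives the per-position values that a comparative visualization would then need to contrast.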
IMM estimator with out-of-sequence measurements

NASA Astrophysics Data System (ADS)

Bar-Shalom, Yaakov; Chen, Huimin

2004-08-01

In multisensor tracking systems that operate in a centralized information processing architecture, measurements from the same target obtained by different sensors can arrive at the processing center out of sequence. In order to avoid either a delay in the output or the need for reordering and reprocessing an entire sequence of measurements, such measurements have to be processed as out-of-sequence measurements (OOSM). Recent work developed procedures for incorporating OOSMs into a Kalman filter (KF). Since the state of the art tracker for real (maneuvering) targets is the Interacting Multiple Model (IMM) estimator, this paper presents the algorithm for incorporating OOSMs into an IMM estimator. Both data association and estimation are considered. Simulation results are presented for two realistic problems using measurements from two airborne GMTI sensors. It is shown that the proposed algorithm for incorporating OOSMs into an IMM estimator yields practically the same performance as the reordering and in-sequence reprocessing of the measurements.
TRX-LOGOS - a graphical tool to demonstrate DNA information content dependent upon backbone dynamics in addition to base sequence.

PubMed

Fortin, Connor H; Schulze, Katharina V; Babbitt, Gregory A

2015-01-01

It is now widely accepted that DNA sequences defining DNA-protein interactions functionally depend upon local biophysical features of the DNA backbone that are important in defining sites of binding interaction in the genome (e.g. DNA shape, charge and intrinsic dynamics). However, these physical features of the DNA polymer are not directly apparent when analyzing and viewing Shannon information content calculated at single nucleobases in a traditional sequence logo plot. Thus, sequence logo plots are severely limited in that they convey no explicit information regarding the structural dynamics of the DNA backbone, a feature often critical to binding specificity. We present TRX-LOGOS, an R software package and Perl wrapper code that interfaces with the JASPAR database for computational regulatory genomics. TRX-LOGOS extends the traditional sequence logo plot to include Shannon information content calculated with regard to the dinucleotide-based BI-BII conformation shifts in phosphate linkages on the DNA backbone, thereby adding a visual measure of intrinsic DNA flexibility that can be critical for many DNA-protein interactions. TRX-LOGOS is available as an R graphics module offered both at SourceForge and as a download supplement at this journal. To demonstrate the general utility of TRX logo plots, we first calculated the information content for 416 Saccharomyces cerevisiae transcription factor binding sites functionally confirmed in the Yeastract database and matched to previously published yeast genomic alignments. We discovered that flanking regions contain significantly higher information content at phosphate linkages than can be observed at nucleobases. We also examined broader transcription factor classifications defined by the JASPAR database, and discovered that many general signatures of transcription factor binding are locally more information rich at the level of DNA backbone dynamics than nucleobase sequence. We used TRX-logos in combination with MEGA 6.0 software
Multilocus sequence typing of total-genome-sequenced bacteria.

PubMed

Larsen, Mette V; Cosentino, Salvatore; Rasmussen, Simon; Friis, Carsten; Hasman, Henrik; Marvig, Rasmus Lykke; Jelsbak, Lars; Sicheritz-Pontén, Thomas; Ussery, David W; Aarestrup, Frank M; Lund, Ole

2012-04-01

Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the "gold standard" of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS) continue to decline, it becomes increasingly available to scientists and routine diagnostic laboratories. Currently, the cost is below that of traditional MLST. The new challenges will be how to extract the relevant information from the large amount of data so as to allow for comparison over time and between laboratories. Ideally, this information should also allow for comparison to historical data. We developed a Web-based method for MLST of 66 bacterial species based on WGS data. As input, the method uses short sequence reads from four sequencing platforms or preassembled genomes. Updates from the MLST databases are downloaded monthly, and the best-matching MLST alleles of the specified MLST scheme are found using a BLAST-based ranking method. The sequence type is then determined by the combination of alleles identified. The method was tested on preassembled genomes from 336 isolates covering 56 MLST schemes, on short sequence reads from 387 isolates covering 10 schemes, and on a small test set of short sequence reads from 29 isolates for which the sequence type had been determined by traditional methods. The method presented here enables investigators to determine the sequence types of their isolates on the basis of WGS data. This method is publicly available at www.cbs.dtu.dk/services/MLST.
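The final step described above, determining the sequence type from the combination of identified alleles, amounts to a lookup of the allele profile in the scheme's profile table. The sketch below assumes the best-matching allele number for each locus has already been found (the BLAST-based ranking is not shown); the locus names, allele numbers, and sequence types are hypothetical.

```python
# Hypothetical three-locus scheme; real MLST schemes typically use about seven loci.
LOCI = ("locusA", "locusB", "locusC")

# Hypothetical profile table mapping allele combinations to sequence types.
PROFILES = {
    (1, 1, 1): "ST1",
    (1, 2, 1): "ST2",
    (3, 2, 4): "ST3",
}


def sequence_type(alleles: dict) -> str:
    """Return the sequence type for a {locus: allele_number} mapping.

    Profiles not present in the table are reported as "unknown".
    """
    profile = tuple(alleles[locus] for locus in LOCI)
    return PROFILES.get(profile, "unknown")


print(sequence_type({"locusA": 1, "locusB": 2, "locusC": 1}))  # ST2
print(sequence_type({"locusA": 9, "locusB": 9, "locusC": 9}))  # unknown
```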

Hidden Markov models of biological primary sequence information.

PubMed Central

Baldi, P; Chauvin, Y; Hunkapiller, T; McClure, M A

1994-01-01

Hidden Markov model (HMM) techniques are used to model families of biological sequences. A smooth and convergent algorithm is introduced to iteratively adapt the transition and emission parameters of the models from the examples in a given family. The HMM approach is applied to three protein families: globins, immunoglobulins, and kinases. In all cases, the models derived capture the important statistical characteristics of the family and can be used for a number of tasks, including multiple alignments, motif detection, and classification. For K sequences of average length N, this approach yields an effective multiple-alignment algorithm which requires O(KN^2) operations, linear in the number of sequences. PMID:8302831
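To make the role of transition and emission parameters concrete, the sketch below implements the standard forward algorithm, which computes the probability of a sequence under a given HMM. It is a generic textbook routine on a toy two-state model with invented parameters, not the training algorithm or the profile-HMM architecture of the paper.

```python
def forward(observations, states, start_p, trans_p, emit_p):
    """Probability of an observation sequence under an HMM (forward algorithm).

    start_p[s]     : probability of starting in state s
    trans_p[r][s]  : probability of moving from state r to state s
    emit_p[s][o]   : probability of emitting symbol o in state s
    The cost is proportional to len(observations) * len(states) ** 2.
    """
    # Initialise with the first symbol.
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    # Propagate through the remaining symbols.
    for symbol in observations[1:]:
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


# Toy model with invented parameters: a GC-rich state "H" and an AT-rich state "L".
STATES = ("H", "L")
START = {"H": 0.5, "L": 0.5}
TRANS = {"H": {"H": 0.5, "L": 0.5}, "L": {"H": 0.5, "L": 0.5}}
EMIT = {
    "H": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "L": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}

print(forward("GA", STATES, START, TRANS, EMIT))
```

Scoring each of K sequences independently against one model in this way is what makes the overall cost grow linearly with the number of sequences.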
A public HTLV-1 molecular epidemiology database for sequence management and data mining.

PubMed

Araujo, Thessika Hialla Almeida; Souza-Brito, Leandro Inacio; Libin, Pieter; Deforche, Koen; Edwards, Dustin; de Albuquerque-Junior, Antonio Eduardo; Vandamme, Anne-Mieke; Galvao-Castro, Bernardo; Alcantara, Luiz Carlos Junior

2012-01-01

It is estimated that 15 to 20 million people are infected with the human T-cell lymphotropic virus type 1 (HTLV-1). At present, there are more than 2,000 unique HTLV-1 isolate sequences published. A central database to aggregate sequence information from a range of epidemiological aspects including HTLV-1 infections, pathogenesis, origins, and evolutionary dynamics would be useful to scientists and physicians worldwide. Here we describe a database that collects and annotates sequence data and can be accessed through a user-friendly search interface. The HTLV-1 Molecular Epidemiology Database website is available at http://htlv1db.bahia.fiocruz.br/. All data were obtained from publications available at GenBank or through contact with the authors. The database was developed using Apache Webserver 2.1.6 and SGBD MySQL. The webpage interfaces were developed in HTML and server-side scripting written in PHP. The HTLV-1 Molecular Epidemiology Database is hosted on the Gonçalo Moniz/FIOCRUZ Research Center server. There are currently 2,457 registered sequences with 2,024 (82.37%) of those sequences representing unique isolates. Of these sequences, 803 (39.67%) contain information about clinical status (TSP/HAM, 17.19%; ATL, 7.41%; asymptomatic, 12.89%; other diseases, 2.17%; and no information, 60.32%). Further, 7.26% of sequences contain information on patient gender while 5.23% of sequences provide the age of the patient. The HTLV-1 Molecular Epidemiology Database retrieves and stores annotated HTLV-1 proviral sequences from clinical, epidemiological, and geographical studies. The collected sequences and related information are now accessible on a publicly available and user-friendly website. This open-access database will support clinical research and vaccine development related to viral genotype.
40 CFR 2.308 - Special rules governing certain information obtained under the Federal Food, Drug and Cosmetic Act.

Code of Federal Regulations, 2012 CFR

2012-07-01

... information obtained under the Federal Food, Drug and Cosmetic Act. 2.308 Section 2.308 Protection of... § 2.308 Special rules governing certain information obtained under the Federal Food, Drug and Cosmetic... Cosmetic Act, as amended, 21 U.S.C. 301 et seq. (2) Petition means a petition for the issuance of a...
40 CFR 2.308 - Special rules governing certain information obtained under the Federal Food, Drug and Cosmetic Act.

Code of Federal Regulations, 2013 CFR

2013-07-01

... information obtained under the Federal Food, Drug and Cosmetic Act. 2.308 Section 2.308 Protection of... § 2.308 Special rules governing certain information obtained under the Federal Food, Drug and Cosmetic... Cosmetic Act, as amended, 21 U.S.C. 301 et seq. (2) Petition means a petition for the issuance of a...
40 CFR 2.308 - Special rules governing certain information obtained under the Federal Food, Drug and Cosmetic Act.

Code of Federal Regulations, 2014 CFR

2014-07-01

... information obtained under the Federal Food, Drug and Cosmetic Act. 2.308 Section 2.308 Protection of... § 2.308 Special rules governing certain information obtained under the Federal Food, Drug and Cosmetic... Cosmetic Act, as amended, 21 U.S.C. 301 et seq. (2) Petition means a petition for the issuance of a...
40 CFR 2.308 - Special rules governing certain information obtained under the Federal Food, Drug and Cosmetic Act.

Code of Federal Regulations, 2011 CFR

2011-07-01

... information obtained under the Federal Food, Drug and Cosmetic Act. 2.308 Section 2.308 Protection of... § 2.308 Special rules governing certain information obtained under the Federal Food, Drug and Cosmetic... Cosmetic Act, as amended, 21 U.S.C. 301 et seq. (2) Petition means a petition for the issuance of a...
40 CFR 2.308 - Special rules governing certain information obtained under the Federal Food, Drug and Cosmetic Act.

Code of Federal Regulations, 2010 CFR

2010-07-01

... information obtained under the Federal Food, Drug and Cosmetic Act. 2.308 Section 2.308 Protection of... § 2.308 Special rules governing certain information obtained under the Federal Food, Drug and Cosmetic... Cosmetic Act, as amended, 21 U.S.C. 301 et seq. (2) Petition means a petition for the issuance of a...
14 CFR Appendix A to Part 440 - Information Requirements for Obtaining a Maximum Probable Loss Determination for Licensed or...

Code of Federal Regulations, 2014 CFR

2014-01-01

... Requirements for Licensed Launch, Including Suborbital Launch I. General Information A. Mission description. 1.... Orbit altitudes (apogee and perigee). 2. Flight sequence. 3. Staging events and the time for each event... shall cover the range of launch trajectories, inclinations and orbits for which authorization is sought...
14 CFR Appendix A to Part 440 - Information Requirements for Obtaining a Maximum Probable Loss Determination for Licensed or...

Code of Federal Regulations, 2013 CFR

2013-01-01

... Requirements for Licensed Launch, Including Suborbital Launch I. General Information A. Mission description. 1.... Orbit altitudes (apogee and perigee). 2. Flight sequence. 3. Staging events and the time for each event... shall cover the range of launch trajectories, inclinations and orbits for which authorization is sought...
14 CFR Appendix A to Part 440 - Information Requirements for Obtaining a Maximum Probable Loss Determination for Licensed or...

Code of Federal Regulations, 2012 CFR

2012-01-01

... Requirements for Licensed Launch, Including Suborbital Launch I. General Information A. Mission description. 1.... Orbit altitudes (apogee and perigee). 2. Flight sequence. 3. Staging events and the time for each event... shall cover the range of launch trajectories, inclinations and orbits for which authorization is sought...
14 CFR Appendix A to Part 440 - Information Requirements for Obtaining a Maximum Probable Loss Determination for Licensed or...

Code of Federal Regulations, 2011 CFR

2011-01-01

... Requirements for Licensed Launch, Including Suborbital Launch I. General Information A. Mission description. 1.... Orbit altitudes (apogee and perigee). 2. Flight sequence. 3. Staging events and the time for each event... shall cover the range of launch trajectories, inclinations and orbits for which authorization is sought...
Allegations of Failure to Obtain Informed Consent in Spinal Surgery Medical Malpractice Claims

PubMed Central

Grauberger, Jennifer; Kerezoudis, Panagiotis; Choudhry, Asad J.; Alvi, Mohammed Ali; Nassr, Ahmad; Currier, Bradford

2017-01-01

Importance: Predictive factors associated with increased risk of medical malpractice litigation have been identified, including severity of injury, physician sex, and error in diagnosis. However, there is a paucity of literature investigating informed consent in spinal surgery malpractice. Objective: To investigate the failure to obtain informed consent as an allegation in medical malpractice claims for patients undergoing a spinal procedure. Design, Setting, and Participants: In this retrospective cohort study, a national medicolegal database was searched for malpractice claim cases related to spinal surgery for all years available (ie, January 1, 1980, through December 31, 2015). Main Outcomes and Measures: Failure to obtain informed consent and associated medical malpractice case verdict. Results: A total of 233 patients (117 [50.4%] male and 116 [49.8%] female; 80 with no informed consent allegation and 153 who cited lack of informed consent) who underwent spinal surgery and filed a malpractice claim were studied (mean [SD] age, 47.1 [13.1] years in the total group, 45.8 [12.9] years in the control group, and 47.9 [13.3] years in the informed consent group). Median interval between year of surgery and year of verdict was 5.4 years (interquartile range, 4-7 years). The most common informed consent allegations were failure to explain risks and adverse effects of surgery (52 [30.4%]) and failure to explain alternative treatment options (17 [9.9%]). In bivariate analysis, patients in the control group were more likely to require additional surgery (45 [56.3%] vs 53 [34.6%], P = .002) and have more permanent injuries compared with the informed consent group (46 [57.5%] vs 63 [42.0%], P = .03). On multivariable regression analysis, permanent injuries were more often associated with indemnity payment after a plaintiff verdict (odds ratio [OR], 3.12; 95% CI, 1.46-6.65; P = .003) or a settlement (OR, 6.26; 95% CI, 1.06-36.70; P = .04). Informed consent
Allegations of Failure to Obtain Informed Consent in Spinal Surgery Medical Malpractice Claims.

PubMed

Grauberger, Jennifer; Kerezoudis, Panagiotis; Choudhry, Asad J; Alvi, Mohammed Ali; Nassr, Ahmad; Currier, Bradford; Bydon, Mohamad

2017-06-21

Predictive factors associated with increased risk of medical malpractice litigation have been identified, including severity of injury, physician sex, and error in diagnosis. However, there is a paucity of literature investigating informed consent in spinal surgery malpractice. To investigate the failure to obtain informed consent as an allegation in medical malpractice claims for patients undergoing a spinal procedure. In this retrospective cohort study, a national medicolegal database was searched for malpractice claim cases related to spinal surgery for all years available (ie, January 1, 1980, through December 31, 2015). Failure to obtain informed consent and associated medical malpractice case verdict. A total of 233 patients (117 [50.4%] male and 116 [49.8%] female; 80 with no informed consent allegation and 153 who cited lack of informed consent) who underwent spinal surgery and filed a malpractice claim were studied (mean [SD] age, 47.1 [13.1] years in the total group, 45.8 [12.9] years in the control group, and 47.9 [13.3] years in the informed consent group). Median interval between year of surgery and year of verdict was 5.4 years (interquartile range, 4-7 years). The most common informed consent allegations were failure to explain risks and adverse effects of surgery (52 [30.4%]) and failure to explain alternative treatment options (17 [9.9%]). In bivariate analysis, patients in the control group were more likely to require additional surgery (45 [56.3%] vs 53 [34.6%], P = .002) and have more permanent injuries compared with the informed consent group (46 [57.5%] vs 63 [42.0%], P = .03). On multivariable regression analysis, permanent injuries were more often associated with indemnity payment after a plaintiff verdict (odds ratio [OR], 3.12; 95% CI, 1.46-6.65; P = .003) or a settlement (OR, 6.26; 95% CI, 1.06-36.70; P = .04). Informed consent allegations were significantly associated with less severe (temporary or emotional) injury (OR
Analysis of the Quality of Information Obtained About Uterine Artery Embolization From the Internet

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tavare, Aniket N.; Alsafi, Ali, E-mail: ali.alsafi03@imperial.ac.uk; Hamady, Mohamad S.

Purpose: The Internet is widely used by patients to source health care-related information. We sought to analyse the quality of information available on the Internet about uterine artery embolization (UAE). Materials and Methods: We searched three major search engines for the phrase 'uterine artery embolization' and compiled the top 50 results from each engine. After excluding repeated sites, scientific articles, and links to documents, the remaining 50 sites were assessed using the LIDA instrument, which scores sites across the domains of accessibility, usability, and reliability. The Flesch reading ease score (FRES) was calculated for each of the sites. Finally, we checked the country of origin and the presence of certification by the Health On the Net Foundation (HONcode) as well as their effect on LIDA and FRES scores. Results: The following mean scores were obtained: accessibility 48/60 (80%), usability 42/54 (77%), reliability 20/51 (39%), total LIDA 110/165 (67%), and FRES 42/100 (42%). Nine sites had HONcode certification, and this was associated with significantly greater (p < 0.05) reliability and total LIDA and FRES scores. When comparing sites between the United Kingdom and the United States, there was marked variation in the quality of results obtained when searching for information on UAE (p < 0.05). Conclusion: In general, sites were well designed and easy to use. However, many scored poorly on the reliability of their information either because they were produced in a non-evidence-based way or because they lacked currency. It is important that patients are guided to reputable, location-specific sources of information online, especially because prominent search engine rank does not guarantee reliability of information.
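The reading ease score mentioned in this abstract can be illustrated with a minimal sketch. It uses the standard published Flesch formula and takes word, sentence, and syllable counts as inputs; the function name is hypothetical, and how the authors counted syllables or sentences is not stated in the abstract.

```python
def flesch_reading_ease(total_words, total_sentences, total_syllables):
    """Flesch reading ease score from raw text counts.

    Standard formula: 206.835 - 1.015 * (words per sentence)
                              - 84.6 * (syllables per word).
    Higher scores indicate text that is easier to read.
    """
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    # Hypothetical counts for a short web page, for illustration only.
    print(round(flesch_reading_ease(100, 5, 150), 3))
```

Counting syllables automatically is itself heuristic, which is why the sketch leaves that step to the caller.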
Understanding of how older adults with low vision obtain, process, and understand health information and services.

PubMed

Kim, Hyung Nam

2017-10-16

Twenty-five years after the Americans with Disabilities Act, there has still been a lack of advancement of accessibility in healthcare for people with visual impairments, particularly older adults with low vision. This study aims to advance understanding of how older adults with low vision obtain, process, and use health information and services, and to seek opportunities for information technology to support them. A convenience sample of 10 older adults with low vision participated in semi-structured phone interviews, which were audio-recorded and transcribed verbatim for analysis. Participants shared various concerns in accessing, understanding, and using health information, care services, and multimedia technologies. Two main themes and nine subthemes emerged from the analysis. Due to these concerns, older adults with low vision tended to fail to obtain the full range of health information and services to meet their specific needs. Those with low vision still rely on residual vision, so multimedia-based information can be useful, but it should still be designed to ensure its accessibility, usability, and understandability.
Critical assessment of pediatric neurosurgery patient/parent educational information obtained via the Internet.

PubMed

Garcia, Michael; Daugherty, Christopher; Ben Khallouq, Bertha; Maugans, Todd

2018-05-01

OBJECTIVE The Internet is used frequently by patients and family members to acquire information about pediatric neurosurgical conditions. The sources, nature, accuracy, and usefulness of this information have not been examined recently. The authors analyzed the results from searches of 10 common pediatric neurosurgical terms using a novel scoring test to assess the value of the educational information obtained. METHODS Google and Bing searches were performed for 10 common pediatric neurosurgical topics (concussion, craniosynostosis, hydrocephalus, pediatric brain tumor, pediatric Chiari malformation, pediatric epilepsy surgery, pediatric neurosurgery, plagiocephaly, spina bifida, and tethered spinal cord). The first 10 "hits" obtained with each search engine were analyzed using the Currency, Relevance, Authority, Accuracy, and Purpose (CRAAP) test, which assigns a numerical score in each of 5 domains. Agreement between results was assessed for 1) concurrent searches with Google and Bing; 2) Google searches over time (6 months apart); 3) Google searches using mobile and PC platforms concurrently; and 4) searches using privacy settings. Readability was assessed with an online analytical tool. RESULTS Google and Bing searches yielded information with similar CRAAP scores (mean 72% and 75%, respectively), but with frequently differing results (58% concordance/matching results). There was a high level of agreement (72% concordance) over time for Google searches and also between searches using general and privacy settings (92% concordance). Government sources scored the best in both CRAAP score and readability. Hospitals and universities were the most prevalent sources, but these sources had the lowest CRAAP scores, due in part to an abundance of self-marketing. The CRAAP scores for mobile and desktop platforms did not differ significantly (p = 0.49). CONCLUSIONS Google and Bing searches yielded useful educational information, using either mobile or PC platforms. Most
SequenceCEROSENE: a computational method and web server to visualize spatial residue neighborhoods at the sequence level.

PubMed

Heinke, Florian; Bittrich, Sebastian; Kaiser, Florian; Labudde, Dirk

2016-01-01

To understand the molecular function of biopolymers, studying their structural characteristics is of central importance. Graphics programs are often utilized to conceive these properties, but with the increasing number of available structures in databases or structure models produced by automated modeling frameworks this process requires assistance from tools that allow automated structure visualization. In this paper a web server and its underlying method for generating graphical sequence representations of molecular structures is presented. The method, called SequenceCEROSENE (color encoding of residues obtained by spatial neighborhood embedding), retrieves the sequence of each amino acid or nucleotide chain in a given structure and produces a color coding for each residue based on three-dimensional structure information. From this, color-highlighted sequences are obtained, where residue coloring represents three-dimensional residue locations in the structure. This color encoding thus provides a one-dimensional representation, from which spatial interactions, proximity and relations between residues or entire chains can be deduced quickly and solely from color similarity. Furthermore, additional heteroatoms and chemical compounds bound to the structure, like ligands or coenzymes, are processed and reported as well. To provide free access to SequenceCEROSENE, a web server has been implemented that allows generating color codings for structures deposited in the Protein Data Bank or structure models uploaded by the user. Besides retrieving visualizations in popular graphic formats, underlying raw data can be downloaded as well. In addition, the server provides user interactivity with generated visualizations and the three-dimensional structure in question. Color encoded sequences generated by SequenceCEROSENE can help to quickly perceive the general characteristics of a structure of interest (or entire sets of complexes), thus supporting the researcher in the initial
Application of next generation sequencing in clinical microbiology and infection prevention.

PubMed

Deurenberg, Ruud H; Bathoorn, Erik; Chlebowicz, Monika A; Couto, Natacha; Ferdous, Mithila; García-Cobos, Silvia; Kooistra-Smid, Anna M D; Raangs, Erwin C; Rosema, Sigrid; Veloo, Alida C M; Zhou, Kai; Friedrich, Alexander W; Rossen, John W A

2017-02-10

Current molecular diagnostics of human pathogens provide limited information that is often not sufficient for outbreak and transmission investigation. Next generation sequencing (NGS) determines the DNA sequence of a complete bacterial genome in a single sequence run, and from these data, information on resistance and virulence, as well as information for typing is obtained, useful for outbreak investigation. The obtained genome data can be further used for the development of an outbreak-specific screening test. In this review, a general introduction to NGS is presented, including the library preparation and the major characteristics of the most common NGS platforms, such as the MiSeq (Illumina) and the Ion PGM™ (ThermoFisher). An overview of the software used for NGS data analyses used at the medical microbiology diagnostic laboratory in the University Medical Center Groningen in The Netherlands is given. Furthermore, applications of NGS in the clinical setting are described, such as outbreak management, molecular case finding, characterization and surveillance of pathogens, rapid identification of bacteria using the 16S-23S rRNA region, taxonomy, metagenomics approaches on clinical samples, and the determination of the transmission of zoonotic micro-organisms from animals to humans. Finally, we share our vision on the use of NGS in personalised microbiology in the near future, pointing out specific requirements. Copyright © 2016 The Author(s). Published by Elsevier B.V. All rights reserved.
The recurrence sequences via Sylvester matrices

NASA Astrophysics Data System (ADS)

Karaduman, Erdal; Deveci, Ömür

2017-07-01

In this work, we define the Pell-Jacobsthal-Sylvester sequence and the Jacobsthal-Pell-Sylvester sequence by using the Sylvester matrices which are obtained from the characteristic polynomials of the Pell and Jacobsthal sequences, and then we study the sequences defined modulo m. Also, we obtain the cyclic groups and the semigroups from the generating matrices of these sequences when read modulo m, and then we derive the relationships among the orders of the cyclic groups and the periods of the sequences. Furthermore, we redefine the Pell-Jacobsthal-Sylvester sequence and the Jacobsthal-Pell-Sylvester sequence by means of the elements of the groups, and then we examine them in the finite groups.
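The idea of reading a recurrence sequence modulo m and measuring its period can be illustrated with a minimal sketch. It covers only the ordinary Pell sequence, P(n) = 2P(n-1) + P(n-2), and the ordinary Jacobsthal sequence, J(n) = J(n-1) + 2J(n-2), both starting from 0, 1. It does not reproduce the authors' Sylvester-matrix sequences, and all function names are hypothetical.

```python
from math import gcd


def period_mod(a, b, m):
    """Period of x(n) = a*x(n-1) + b*x(n-2) (mod m), with x(0)=0, x(1)=1.

    Requires gcd(b, m) == 1: the companion matrix is then invertible
    modulo m, so the sequence is purely periodic and the starting
    pair (0, 1) is guaranteed to recur.
    """
    if m < 2 or gcd(b, m) != 1:
        raise ValueError("need m >= 2 and gcd(b, m) == 1")
    prev, cur = 0, 1
    steps = 0
    while True:
        # Advance the pair (x(n-1), x(n)) by one step modulo m.
        prev, cur = cur, (a * cur + b * prev) % m
        steps += 1
        if (prev, cur) == (0, 1):
            return steps


def pell_period(m):
    """Period of the Pell sequence modulo m (valid for every m >= 2)."""
    return period_mod(2, 1, m)


def jacobsthal_period(m):
    """Period of the Jacobsthal sequence modulo m (odd m only here)."""
    return period_mod(1, 2, m)


if __name__ == "__main__":
    for m in (3, 5, 7):
        print(m, pell_period(m), jacobsthal_period(m))
```

The restriction to odd m for the Jacobsthal case is a choice made for this sketch: for even m the sequence is eventually periodic rather than purely periodic, so the simple "wait for (0, 1) to return" test would not terminate.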
Through Increasing "Information Literacy" Capital and Habitus (Agency): The Complementary Impact on Composition Skills When Appropriately Sequenced

ERIC Educational Resources Information Center

Karas, Timothy

2017-01-01

Through a case study approach of a cohort of community college students at a single community college, the impact on success rates in composition courses was analyzed based on the sequence of completing an information literacy course. Two student cohorts were sampled based on completing an information literacy course prior to, or concurrently with…

Necessary Sequencing Depth and Clustering Method to Obtain Relatively Stable Diversity Patterns in Studying Fish Gut Microbiota.

PubMed

Xiao, Fanshu; Yu, Yuhe; Li, Jinjin; Juneau, Philippe; Yan, Qingyun

2018-05-25

The 16S rRNA gene has been one of the most commonly used molecular markers for estimating bacterial diversity during the past decades. However, there is no consistency in the sequencing depth (from thousands to millions of sequences per sample), and the clustering methods used to generate OTUs may also differ among studies. These inconsistent premises make effective comparisons among studies difficult or unreliable. This study aims to examine the sequencing depth and clustering method needed to ensure stable diversity patterns when studying fish gut microbiota. A dataset of 42 samples of Siniperca chuatsi (carnivorous fish) gut microbiota was used to test how the sequencing depth and clustering may affect the alpha and beta diversity patterns of fish intestinal microbiota. Interestingly, we found that the sequencing depth (resampling 1000-11,000 per sample) and the clustering methods (UPARSE and UCLUST) did not bias the estimates of the diversity patterns during the fish development from larva to adult. Although we should acknowledge that a suitable sequencing depth may differ case by case, our finding indicates that shallow sequencing, such as 1000 sequences per sample, may also be enough to reflect the general diversity patterns of fish gut microbiota. However, we have shown in the present study that strict pre-processing of the original sequences is required to ensure reliable results. This study provides evidence to help make a sound scientific choice of the sequencing depth and clustering method for future studies on fish gut microbiota patterns, while reducing as much as possible the costs related to the analysis.
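Resampling reads to a fixed depth and then computing an alpha diversity index can be sketched as follows. This is a minimal illustration under stated assumptions: each read is represented by the label of the OTU it was assigned to, the Shannon index stands in for whichever alpha diversity measures the authors used, and the function names and toy data are hypothetical.

```python
import math
import random
from collections import Counter


def rarefy(otu_labels, depth, seed=0):
    """Randomly subsample `depth` reads without replacement."""
    if depth > len(otu_labels):
        raise ValueError("depth exceeds the number of available reads")
    rng = random.Random(seed)
    return rng.sample(otu_labels, depth)


def shannon_index(otu_labels):
    """Shannon diversity H = -sum(p_i * ln p_i) over OTU proportions."""
    counts = Counter(otu_labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical sample: 6000 reads spread over three OTUs.
    reads = ["otu1"] * 3000 + ["otu2"] * 2000 + ["otu3"] * 1000
    for depth in (1000, 3000, 6000):
        sub = rarefy(reads, depth)
        print(depth, round(shannon_index(sub), 3))
```

Comparing the index across depths for the same sample is one simple way to see whether a shallower depth already reflects the overall pattern.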
Haplotype estimation using sequencing reads.

PubMed

Delaneau, Olivier; Howie, Bryan; Cox, Anthony J; Zagury, Jean-François; Marchini, Jonathan

2013-10-03

High-throughput sequencing technologies produce short sequence reads that can contain phase information if they span two or more heterozygote genotypes. This information is not routinely used by current methods that infer haplotypes from genotype data. We have extended the SHAPEIT2 method to use phase-informative sequencing reads to improve phasing accuracy. Our model incorporates the read information in a probabilistic model through base quality scores within each read. The method is primarily designed for high-coverage sequence data or data sets that already have genotypes called. One important application is phasing of single samples sequenced at high coverage for use in medical sequencing and studies of rare diseases. Our method can also use existing panels of reference haplotypes. We tested the method by using a mother-father-child trio sequenced at high-coverage by Illumina together with the low-coverage sequence data from the 1000 Genomes Project (1000GP). We found that use of phase-informative reads increases the mean distance between switch errors by 22% from 274.4 kb to 328.6 kb. We also used male chromosome X haplotypes from the 1000GP samples to simulate sequencing reads with varying insert size, read length, and base error rate. When using short 100 bp paired-end reads, we found that using mixtures of insert sizes produced the best results. When using longer reads with high error rates (5-20 kb read with 4%-15% error per base), phasing performance was substantially improved. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Transforming Legacy Systems to Obtain Information Superiority

DTIC Science & Technology

2001-01-01

is imperative that innovative technologies be developed to enable legacy weapon systems to exploit the information revolution, achieve information ... dominance, and meet the required operational tempo. This paper presents an embedded-system architecture, open system middleware services, and a software
Sequence analysis of lacI mutations obtained from lung cells of radon-exposed Big Blue™ transgenic mice

DOE Office of Scientific and Technical Information (OSTI.GOV)

Layton, A.D.; Cross, F.T.; Steigler, G.L.

1994-12-31

We have exposed Big Blue™ transgenic mice by inhalation to 320, 640 and 960 Working Level Months (WLM) of radon progeny. Mice were sacrificed after 3, 6 and 9 days, the time periods required to obtain the exposures. Control mice were also sacrificed at each time interval. In each case all tissues were excised, flash frozen in liquid nitrogen, and stored at -80°C for further analysis. Twelve lacI mutations have been isolated from the lung tissue of a mouse from the 960-WLM exposure group; the lacI genes from these mutants have been sequenced. Sequence data indicate that three of the mutants have a C:G deletion at BP 978 and are possibly clonal in origin. Two mutants have multiple events within the gene: one has an A:T to C:G transversion and a C:G insertion separated by 291 BPs; the second has a G:C to A:T transition as well as an A:T deletion followed 6 base pairs downstream by a T:A insertion. Other mutations include a single G:C to A:T transition, a two base pair deletion, and a C:G to T:A transition. Mutant plaques are being evaluated from individual mice at other dose levels. Time course experiments are also planned. These studies will help define the molecular fine structure of mutations induced by high-LET radiation exposure.
Development of a Method to Obtain More Accurate General and Oral Health Related Information Retrospectively

PubMed Central

Golkari, A; Sabokseir, A; Blane, D; Sheiham, A; Watt, RG

2017-01-01

Statement of Problem: Early childhood is a crucial period of life as it affects one’s future health. However, precise data on adverse events during this period is usually hard to access or collect, especially in developing countries. Objectives: This paper first reviews the existing methods for retrospective data collection in health and social sciences, and then introduces a new method/tool for obtaining more accurate general and oral health related information from early childhood retrospectively. Materials and Methods: The Early Childhood Events Life-Grid (ECEL) was developed to collect information on the type and time of health-related adverse events during the early years of life, by questioning the parents. The validity of ECEL and the accuracy of information obtained by this method were assessed in a pilot study and in a main study of 30 parents of 8 to 11 year old children from Shiraz (Iran). Responses obtained from parents using the final ECEL were compared with the recorded health insurance documents. Results: There was an almost perfect agreement between the health insurance and ECEL data sets (Kappa value=0.95 and p < 0.001). Interviewees remembered the important events more accurately (100% exact timing match in case of hospitalization). Conclusions: The Early Childhood Events Life-Grid method proved to be highly accurate when compared with recorded medical documents. PMID:28959773
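The agreement statistic reported in this abstract can be illustrated with a minimal sketch of Cohen's kappa for two sets of categorical records. That the reported Kappa is Cohen's kappa is an assumption here, and the function name and toy data are hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa: (p_observed - p_expected) / (1 - p_expected)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Observed proportion of items on which the two sources agree.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each source's marginal frequencies.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    if p_expected == 1:
        return 1.0  # both sources use a single identical category
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Hypothetical yes/no records of an event from two sources.
    parent_report = ["yes", "yes", "no", "no", "yes", "no"]
    insurance_record = ["yes", "yes", "no", "no", "no", "no"]
    print(round(cohen_kappa(parent_report, insurance_record), 3))
```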
NetTurnP – Neural Network Prediction of Beta-turns by Use of Evolutionary Information and Predicted Protein Sequence Features

PubMed Central

Petersen, Bent; Lundegaard, Claus; Petersen, Thomas Nordahl

2010-01-01

β-turns are the most common type of non-repetitive structures, and constitute on average 25% of the amino acids in proteins. The formation of β-turns plays an important role in protein folding, protein stability and molecular recognition processes. In this work we present the neural network method NetTurnP, for prediction of two-class β-turns and prediction of the individual β-turn types, by use of evolutionary information and predicted protein sequence features. It has been evaluated against a commonly used dataset BT426, and achieves a Matthews correlation coefficient of 0.50, which is the highest reported performance on a two-class prediction of β-turn and not-β-turn. Furthermore NetTurnP shows improved performance on some of the specific β-turn types. In the present work, neural network methods have been trained to predict β-turn or not and individual β-turn types from the primary amino acid sequence. The individual β-turn types I, I', II, II', VIII, VIa1, VIa2, VIba and IV have been predicted based on classifications by PROMOTIF, and the two-class prediction of β-turn or not is a superset comprised of all β-turn types. The performance is evaluated using a golden set of non-homologous sequences known as BT426. Our two-class prediction method achieves a performance of: MCC = 0.50, Qtotal = 82.1%, sensitivity = 75.6%, PPV = 68.8% and AUC = 0.864. We have compared our performance to eleven other prediction methods that obtain Matthews correlation coefficients in the range of 0.17 – 0.47. For the type specific β-turn predictions, only type I and II can be predicted with reasonable Matthews correlation coefficients, where we obtain performance values of 0.36 and 0.31, respectively. Conclusion The NetTurnP method has been implemented as a webserver, which is freely available at http://www.cbs.dtu.dk/services/NetTurnP/. NetTurnP is the only available webserver that allows submission of multiple sequences. PMID:21152409
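The performance measures quoted in this abstract can be computed from a two-class confusion matrix, as in the minimal sketch below. The counts are hypothetical and are not taken from the BT426 evaluation; treating Qtotal as overall accuracy is an assumption, and the function name is illustrative.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return MCC, sensitivity, PPV and overall accuracy from counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / total
    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"mcc": mcc, "sensitivity": sensitivity,
            "ppv": ppv, "accuracy": accuracy}


if __name__ == "__main__":
    # Hypothetical residue-level counts for a turn / not-turn predictor.
    print(binary_metrics(tp=75, fp=30, tn=170, fn=25))
```

Unlike accuracy, the MCC stays informative when the two classes are unbalanced, which is relevant when β-turn residues make up only about a quarter of a protein.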
[The use of automated processing of information obtained during space flights for the monitoring and evaluation of airborne pollution].

PubMed

Bagmanov, B Kh; Mikhaĭlova, A Iu; Pavlov, S V

1997-01-01

The article describes experience on use of automated processing of information obtained during spaceflights for analysis of urban air pollution. The authors present a method for processing of information obtained during spaceflights and show how to identify foci of industrial release and area of their spread within and beyond the cities.
24 CFR 960.205 - Drug use by applicants: Obtaining information from drug treatment facility.

Code of Federal Regulations, 2012 CFR

2012-04-01

.... This section addresses a PHA's authority to request and obtain information from drug abuse treatment... household member. (2) Drug abuse treatment facility. An entity: (i) That holds itself out as providing, and... consent forms signed by such household member that: (i) Requests any drug abuse treatment facility to...
24 CFR 960.205 - Drug use by applicants: Obtaining information from drug treatment facility.

Code of Federal Regulations, 2011 CFR

2011-04-01

.... This section addresses a PHA's authority to request and obtain information from drug abuse treatment... household member. (2) Drug abuse treatment facility. An entity: (i) That holds itself out as providing, and... consent forms signed by such household member that: (i) Requests any drug abuse treatment facility to...
24 CFR 960.205 - Drug use by applicants: Obtaining information from drug treatment facility.

Code of Federal Regulations, 2010 CFR

2010-04-01

.... This section addresses a PHA's authority to request and obtain information from drug abuse treatment... household member. (2) Drug abuse treatment facility. An entity: (i) That holds itself out as providing, and... consent forms signed by such household member that: (i) Requests any drug abuse treatment facility to...
24 CFR 960.205 - Drug use by applicants: Obtaining information from drug treatment facility.

Code of Federal Regulations, 2013 CFR

2013-04-01

.... This section addresses a PHA's authority to request and obtain information from drug abuse treatment... household member. (2) Drug abuse treatment facility. An entity: (i) That holds itself out as providing, and... consent forms signed by such household member that: (i) Requests any drug abuse treatment facility to...
24 CFR 960.205 - Drug use by applicants: Obtaining information from drug treatment facility.

Code of Federal Regulations, 2014 CFR

2014-04-01

.... This section addresses a PHA's authority to request and obtain information from drug abuse treatment... household member. (2) Drug abuse treatment facility. An entity: (i) That holds itself out as providing, and... consent forms signed by such household member that: (i) Requests any drug abuse treatment facility to...
Use of social media and internet to obtain health information by rural adolescent mothers.

PubMed

Logsdon, M Cynthia; Mittelberg, Meghan; Myers, John

2015-02-01

Adolescent mothers residing in rural areas need accurate health information to care for themselves and their babies. The purpose of this study was to determine the use of social media and Internet by adolescent mothers residing in rural areas, particularly in regard to obtaining health information. Using a cross-sectional design, a convenience sample of adolescent mothers living in a rural county in a state located in the southern U.S. (n = 15), completed the Pew Internet Survey during home visits with nurses from a community health agency. All adolescent mothers accessed Internet using cell phones (93%) or computers (100%). Many adolescent mothers sent or received over 50 text messages per day. Thirty-three percent of adolescent mothers searched for health information on the Internet every few weeks; 27% received health information from Facebook. Communication of health information using the Internet and social media may be effective with adolescent mothers residing in rural areas. Copyright © 2014 Elsevier Inc. All rights reserved.
Informational structure of genetic sequences and nature of gene splicing

NASA Astrophysics Data System (ADS)

Trifonov, E. N.

1991-10-01

Only about 1/20 of DNA of higher organisms codes for proteins, by means of classical triplet code. The rest of DNA sequences is largely silent, with unclear functions, if any. The triplet code is not the only code (message) carried by the sequences. There are three levels of molecular communication, where the same sequence "talks" to various biomolecules, while having, respectively, three different appearances: DNA, RNA and protein. Since the molecular structures and, hence, sequence specific preferences of these are substantially different, the original DNA sequence has to carry simultaneously three types of sequence patterns (codes, messages), thus being a composite structure in which one and the same letter (nucleotide) is frequently involved in several overlapping codes of different nature. This multiplicity and overlapping of the codes is a unique feature of Gnomic, the language of genetic sequences. The coexisting codes have to be degenerate in various degrees to allow an optimal and concerted performance of all the encoded functions. There is an obvious conflict between the best possible performance of a given function and the necessity to compromise the quality of a given sequence pattern in favor of other patterns. It appears that the major role of various changes in the sequences on their "ontogenetic" way from DNA to RNA to protein, like RNA editing and splicing, or protein post-translational modifications, is to resolve such conflicts. New data are presented strongly indicating that gene splicing is such a device to resolve the conflict between the code of DNA folding in chromatin and the triplet code for protein synthesis.
40 CFR 1515.10 - What information is available, and how can it be obtained?

Code of Federal Regulations, 2010 CFR

2010-07-01

... can it be obtained? 1515.10 Section 1515.10 Protection of Environment COUNCIL ON ENVIRONMENTAL QUALITY... Council on Environmental Quality will permit copying of any available material but will reserve the right... the Freedom of Information Act as amended (5 U.S.C. 552(b)). (c) The legislative history of the...
A proposed clinical decision support architecture capable of supporting whole genome sequence information.

PubMed

Welch, Brandon M; Loya, Salvador Rodriguez; Eilbeck, Karen; Kawamoto, Kensaku

2014-04-04

Whole genome sequence (WGS) information may soon be widely available to help clinicians personalize the care and treatment of patients. However, considerable barriers exist, which may hinder the effective utilization of WGS information in a routine clinical care setting. Clinical decision support (CDS) offers a potential solution to overcome such barriers and to facilitate the effective use of WGS information in the clinic. However, genomic information is complex and will require significant considerations when developing CDS capabilities. As such, this manuscript lays out a conceptual framework for a CDS architecture designed to deliver WGS-guided CDS within the clinical workflow. To handle the complexity and breadth of WGS information, the proposed CDS framework leverages service-oriented capabilities and orchestrates the interaction of several independently-managed components. These independently-managed components include the genome variant knowledge base, the genome database, the CDS knowledge base, a CDS controller and the electronic health record (EHR). A key design feature is that genome data can be stored separately from the EHR. This paper describes in detail: (1) each component of the architecture; (2) the interaction of the components; and (3) how the architecture attempts to overcome the challenges associated with WGS information. We believe that service-oriented CDS capabilities will be essential to using WGS information for personalized medicine.
Evolution of biological sequences implies an extreme value distribution of type I for both global and local pairwise alignment scores.

PubMed

Bastien, Olivier; Maréchal, Eric

2008-08-07

Confidence in pairwise alignments of biological sequences, obtained by various methods such as Blast or Smith-Waterman, is critical for automatic analyses of genomic data. Two statistical models have been proposed. In the asymptotic limit of long sequences, the Karlin-Altschul model is based on the computation of a P-value, assuming that the number of high scoring matching regions above a threshold is Poisson distributed. Alternatively, the Lipman-Pearson model is based on the computation of a Z-value from a random score distribution obtained by a Monte-Carlo simulation. Z-values allow the deduction of an upper bound of the P-value (1/Z-value²) following the TULIP theorem. Simulated Z-value distributions are known to fit a Gumbel law. This remarkable property was not demonstrated and had no obvious biological support. We built a model of evolution of sequences based on aging, as meant in Reliability Theory, using the fact that the amount of information shared between an initial sequence and the sequences in its lineage (i.e., mutual information in Information Theory) is a decreasing function of time. This quantity is simply measured by a sequence alignment score. In systems aging, the failure rate is related to the system's longevity. The system can be a machine with structured components, or a living entity or population. "Reliability" refers to the ability to operate properly according to a standard. Here, the "reliability" of a sequence refers to the ability to conserve a sufficient functional level at the folded and maturated protein level (positive selection pressure). Homologous sequences were considered as systems 1) having a high redundancy of information reflected by the magnitude of their alignment scores, 2) whose components are the amino acids that can independently be damaged by random DNA mutations. From these assumptions, we deduced that information shared at each amino acid position evolved with a constant rate, corresponding to the
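The Z-value computation described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the random score distribution is produced by shuffling one of the two sequences, the local alignment uses a simple match/mismatch/linear-gap scheme with hypothetical parameter values rather than a substitution matrix, and all function names are illustrative. The upper bound on the P-value is the 1/Z² relation quoted in the abstract.

```python
import random
import statistics


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Local alignment: a cell score never drops below zero.
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


def z_value(a, b, n_shuffles=200, seed=0):
    """Z = (observed score - mean of shuffled scores) / their std dev."""
    rng = random.Random(seed)
    observed = local_alignment_score(a, b)
    letters = list(b)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # keeps composition, destroys residue order
        null_scores.append(local_alignment_score(a, "".join(letters)))
    sigma = statistics.pstdev(null_scores)
    if sigma == 0:
        raise ValueError("shuffled scores have zero variance")
    return (observed - statistics.mean(null_scores)) / sigma


def p_value_upper_bound(z):
    """Upper bound 1/Z^2 on the P-value, capped at 1 (trivial for Z <= 0)."""
    return min(1.0, 1.0 / z**2) if z > 0 else 1.0


if __name__ == "__main__":
    s1 = "ACGTTGCAAGCTAGCTAGGATCC"
    s2 = "ACGTTGCTAGCTAGCAAGGATCC"
    z = z_value(s1, s2)
    print(z, p_value_upper_bound(z))
```

Shuffling preserves the composition of the sequence, so the resulting Z-value measures how far the observed score stands out from what the composition alone would produce.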
Archean metamorphic sequence and surfaces, Kangerdlugssuaq Fjord, East Greenland

NASA Technical Reports Server (NTRS)

Kays, M. A.

1986-01-01

The characteristics of Archean metamorphic surfaces and fabrics of a mapped sequence of rocks older than about 3000 Ma provide information basic to an understanding of the structural evolution and metamorphic history in Kangerdlugssuaq Fjord, east Greenland. This information and the additional results of petrologic and geochemical studies have culminated in an extended chronology of Archean plutonic, metamorphic, and tectonic events. The basis for the chronology is considered, especially the nature of the metamorphic fabrics and surfaces in the Archean sequence. The surfaces, which are planar mineral parageneses, may prove to be mappable outside Kangerdlugssuaq Fjord, and if so, will be helpful in extending the events that they represent to other Archean sequences in east Greenland. The surfaces will become especially important reference planes if the absolute ages of their metamorphic assemblages can be determined in at least one location where strain was low subsequent to their recrystallization. Once an isochron is obtained, the dynamothermal age of the regionally identifiable metamorphic surface is determined everywhere it can be mapped.
Clinical genomics information management software linking cancer genome sequence and clinical decisions.

PubMed

Watt, Stuart; Jiao, Wei; Brown, Andrew M K; Petrocelli, Teresa; Tran, Ben; Zhang, Tong; McPherson, John D; Kamel-Reid, Suzanne; Bedard, Philippe L; Onetto, Nicole; Hudson, Thomas J; Dancey, Janet; Siu, Lillian L; Stein, Lincoln; Ferretti, Vincent

2013-09-01

Using sequencing information to guide clinical decision-making requires coordination of a diverse set of people and activities. In clinical genomics, the process typically includes sample acquisition, template preparation, genome data generation, analysis to identify and confirm variant alleles, interpretation of clinical significance, and reporting to clinicians. We describe a software application developed within a clinical genomics study, to support this entire process. The software application tracks patients, samples, genomic results, decisions and reports across the cohort, monitors progress and sends reminders, and works alongside an electronic data capture system for the trial's clinical and genomic data. It incorporates systems to read, store, analyze and consolidate sequencing results from multiple technologies, and provides a curated knowledge base of tumor mutation frequency (from the COSMIC database) annotated with clinical significance and drug sensitivity to generate reports for clinicians. By supporting the entire process, the application provides deep support for clinical decision making, enabling the generation of relevant guidance in reports for verification by an expert panel prior to forwarding to the treating physician. Copyright © 2013 Elsevier Inc. All rights reserved.
Sequencing of Oligourea Foldamers by Tandem Mass Spectrometry

NASA Astrophysics Data System (ADS)

Bathany, Katell; Owens, Neil W.; Guichard, Gilles; Schmitter, Jean-Marie

2013-03-01

This study is focused on sequence analysis of peptidomimetic helical oligoureas by means of tandem mass spectrometry, to build a basis for de novo sequencing for future high-throughput combinatorial library screening of oligourea foldamers. After the evaluation of MS/MS spectra obtained for model compounds with either MALDI or ESI sources, we found that the MALDI-TOF-TOF instrument gave more satisfactory results. MS/MS spectra of oligoureas generated by decay of singly charged precursor ions show major ion series corresponding to fragmentation across both CO-NH and N'H-CO urea bonds. Oligourea backbones fragment to produce a pattern of a, x, b, and y type fragment ions. De novo decoding of spectral information is facilitated by the occurrence of low mass reporter ions, representative of constitutive monomers, in an analogous manner to the use of immonium ions for peptide sequencing.

Metagenome Sequence Analysis of Filamentous Microbial Communities Obtained from Geochemically Distinct Geothermal Channels Reveals Specialization of Three Aquificales Lineages

PubMed Central

Takacs-Vesbach, Cristina; Inskeep, William P.; Jay, Zackary J.; Herrgard, Markus J.; Rusch, Douglas B.; Tringe, Susannah G.; Kozubal, Mark A.; Hamamura, Natsuko; Macur, Richard E.; Fouke, Bruce W.; Reysenbach, Anna-Louise; McDermott, Timothy R.; Jennings, Ryan deM.; Hengartner, Nicolas W.; Xie, Gary

2013-01-01

The Aquificales are thermophilic microorganisms that inhabit hydrothermal systems worldwide and are considered one of the earliest lineages of the domain Bacteria. We analyzed metagenome sequence obtained from six thermal “filamentous streamer” communities (∼40 Mbp per site), which targeted three different groups of Aquificales found in Yellowstone National Park (YNP). Unassembled metagenome sequence and PCR-amplified 16S rRNA gene libraries revealed that acidic, sulfidic sites were dominated by Hydrogenobaculum (Aquificaceae) populations, whereas the circum-neutral pH (6.5–7.8) sites containing dissolved sulfide were dominated by Sulfurihydrogenibium spp. (Hydrogenothermaceae). Thermocrinis (Aquificaceae) populations were found primarily in the circum-neutral sites with undetectable sulfide, and to a lesser extent in one sulfidic system at pH 8. Phylogenetic analysis of assembled sequence containing 16S rRNA genes as well as conserved protein-encoding genes revealed that the composition and function of these communities varied across geochemical conditions. Each Aquificales lineage contained genes for CO2 fixation by the reverse-TCA cycle, but only the Sulfurihydrogenibium populations perform citrate cleavage using ATP citrate lyase (Acl). The Aquificaceae populations use an alternative pathway catalyzed by two separate enzymes, citryl-CoA synthetase (Ccs), and citryl-CoA lyase (Ccl). All three Aquificales lineages contained evidence of aerobic respiration, albeit due to completely different types of heme Cu oxidases (subunit I) involved in oxygen reduction. The distribution of Aquificales populations and differences among functional genes involved in energy generation and electron transport is consistent with the hypothesis that geochemical parameters (e.g., pH, sulfide, H2, O2) have resulted in niche specialization among members of the Aquificales. PMID:23755042
A Study of Two Instructional Sequences Informed by Alternative Learning Progressions in Genetics

NASA Astrophysics Data System (ADS)

Duncan, Ravit Golan; Choi, Jinnie; Castro-Faix, Moraima; Cavera, Veronica L.

2017-12-01

Learning progressions (LPs) are hypothetical models of how learning in a domain develops over time with appropriate instruction. In the domain of genetics, there are two independently developed alternative LPs. The main difference between the two progressions hinges on their assumptions regarding the accessibility of classical (Mendelian) versus molecular genetics and the order in which they should be taught. In order to determine the relative difficulty of the different genetic ideas included in the two progressions, and to test which one is a better fit with students' actual learning, we developed two modules in classical and molecular genetics and alternated their sequence in an implementation study with 11th grade students studying biology. We developed a set of 56 ordered multiple-choice items that collectively assessed both molecular and classical genetic ideas. We found significant gains in students' learning in both molecular and classical genetics, with the largest gain relating to understanding the informational content of genes and the smallest gain in understanding modes of inheritance. Using multidimensional item response modeling, we found no statistically significant differences between the two instructional sequences. However, there was a trend of slightly higher gains for the molecular-first sequence for all genetic ideas.
Sequence Bundles: a novel method for visualising, discovering and exploring sequence motifs

PubMed Central

2014-01-01

Background: We introduce Sequence Bundles, a novel data visualisation method for representing multiple sequence alignments (MSAs). We identify and address key limitations of the existing bioinformatics data visualisation methods (i.e. the Sequence Logo) by enabling Sequence Bundles to give salient visual expression to sequence motifs and other data features, which would otherwise remain hidden. Methods: For the development of Sequence Bundles we employed research-led information design methodologies. Sequences are encoded as uninterrupted, semi-opaque lines plotted on a 2-dimensional reconfigurable grid. Each line represents a single sequence. The thickness and opacity of the stack at each residue in each position indicate the level of conservation and the lines' curved paths expose patterns in correlation and functionality. Several MSAs can be visualised in a composite image. The Sequence Bundles method is designed to favour a tangible, continuous and intuitive display of information. Results: We have developed a software demonstration application for generating a Sequence Bundles visualisation of MSAs provided for the BioVis 2013 redesign contest. A subsequent exploration of the visualised line patterns allowed for the discovery of a number of interesting features in the dataset. Reported features include the extreme conservation of sequences displaying a specific residue and bifurcations of the consensus sequence. Conclusions: Sequence Bundles is a novel method for visualisation of MSAs and the discovery of sequence motifs. It can aid in generating new insight and hypothesis making. Sequence Bundles is well disposed for future implementation as an interactive visual analytics software, which can complement existing visualisation tools. PMID:25237395
25 CFR 1000.73 - Once a Tribe/Consortium has been awarded a grant, may the Tribe/Consortium obtain information...

Code of Federal Regulations, 2013 CFR

2013-04-01

... 25 Indians 2 2013-04-01 2013-04-01 false Once a Tribe/Consortium has been awarded a grant, may the Tribe/Consortium obtain information from a non-BIA bureau? 1000.73 Section 1000.73 Indians OFFICE OF THE... § 1000.73 Once a Tribe/Consortium has been awarded a grant, may the Tribe/Consortium obtain information...
RNA-Seq Analysis of Cocos nucifera: Transcriptome Sequencing and De Novo Assembly for Subsequent Functional Genomics Approaches

PubMed Central

Xia, Wei; Mason, Annaliese S.; Xia, Zhihui; Qiao, Fei; Zhao, Songlin; Tang, Haoru

2013-01-01

Background: Cocos nucifera (coconut), a member of the Arecaceae family, is an economically important woody palm grown in tropical regions. Despite its agronomic importance, previous germplasm assessment studies have relied solely on morphological and agronomical traits. Molecular biology techniques have been scarcely used in assessment of genetic resources and for improvement of important agronomic and quality traits in Cocos nucifera, mostly due to the absence of available sequence information. Methodology/Principal Findings: To provide basic information for molecular breeding and further molecular biological analysis in Cocos nucifera, we applied RNA-seq technology and de novo assembly to gain a global overview of the Cocos nucifera transcriptome from mixed tissue samples. Using Illumina sequencing, we obtained 54.9 million short reads and conducted de novo assembly to obtain 57,304 unigenes with an average length of 752 base pairs. Sequence comparison between assembled unigenes and released cDNA sequences of Cocos nucifera and Elaeis guineensis indicated that the assembled sequences were of high quality. Approximately 99.9% of unigenes were novel compared to the released coconut EST sequences. Using BLASTX, 68.2% of unigenes were successfully annotated based on the Genbank non-redundant (Nr) protein database. The annotated unigenes were then further classified using the Gene Ontology (GO), Clusters of Orthologous Groups (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Conclusions/Significance: Our study provides a large quantity of novel genetic information for Cocos nucifera. This information will act as a valuable resource for further molecular genetic studies and breeding in coconut, as well as for isolation and characterization of functional genes involved in different biochemical pathways in this important tropical crop species. PMID:23555859
RNA-Seq analysis of Cocos nucifera: transcriptome sequencing and de novo assembly for subsequent functional genomics approaches.

PubMed

Fan, Haikuo; Xiao, Yong; Yang, Yaodong; Xia, Wei; Mason, Annaliese S; Xia, Zhihui; Qiao, Fei; Zhao, Songlin; Tang, Haoru

2013-01-01

Cocos nucifera (coconut), a member of the Arecaceae family, is an economically important woody palm grown in tropical regions. Despite its agronomic importance, previous germplasm assessment studies have relied solely on morphological and agronomical traits. Molecular biology techniques have been scarcely used in assessment of genetic resources and for improvement of important agronomic and quality traits in Cocos nucifera, mostly due to the absence of available sequence information. To provide basic information for molecular breeding and further molecular biological analysis in Cocos nucifera, we applied RNA-seq technology and de novo assembly to gain a global overview of the Cocos nucifera transcriptome from mixed tissue samples. Using Illumina sequencing, we obtained 54.9 million short reads and conducted de novo assembly to obtain 57,304 unigenes with an average length of 752 base pairs. Sequence comparison between assembled unigenes and released cDNA sequences of Cocos nucifera and Elaeis guineensis indicated that the assembled sequences were of high quality. Approximately 99.9% of unigenes were novel compared to the released coconut EST sequences. Using BLASTX, 68.2% of unigenes were successfully annotated based on the Genbank non-redundant (Nr) protein database. The annotated unigenes were then further classified using the Gene Ontology (GO), Clusters of Orthologous Groups (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Our study provides a large quantity of novel genetic information for Cocos nucifera. This information will act as a valuable resource for further molecular genetic studies and breeding in coconut, as well as for isolation and characterization of functional genes involved in different biochemical pathways in this important tropical crop species.
Challenges of Obtaining Informed Consent in Emergency Ward: A Qualitative Study in One Iranian Hospital

PubMed Central

Davoudi, Nayyereh; Nayeri, Nahid Dehghan; Zokaei, Mohammad Saeed; Fazeli, Nematallah

2017-01-01

Background and Objective: The emergency ward has unique characteristics that affect informed consent processes by creating specific challenges. Hence, it seems necessary to identify the process and challenges of informed consent in the emergency ward through a qualitative study to understand actual patients’ and health care providers’ experiences, beliefs, values, and feelings about informed consent in the emergency ward. Through such studies, new insight can be gained on the process of informed consent and its challenges with the hope that the resulting knowledge will enable the promotion of ethical, legal as well as effective health services to the patients in the emergency ward. Method: In this qualitative study, the research field was one of the emergency wards of educational and public hospitals in Iran. Field work and participant observation were carried out for 515 hours from June 2014 to March 2016. Also, conversations and semi-structured interviews based on the observations were conducted. The participants of the study were nurses and physicians working in the emergency ward, as well as patients and their attendants who were involved in the process of obtaining informed consent. Results: Three main categories were extracted from the data: a sense of frustration; reverse protection; and a culture of paternalism in the consent process. Conclusion: Findings of this study can be utilized in correcting the structures and processes of obtaining informed consent together with promotion of patients' ethical and legal care in the emergency ward. In this way, the approach to the consent process will change from a paternalistic approach to patient-centered care, which concomitantly protects the patient’s autonomy. PMID:29399235
The quest for rare variants: pooled multiplexed next generation sequencing in plants.

PubMed

Marroni, Fabio; Pinosio, Sara; Morgante, Michele

2012-01-01

Next generation sequencing (NGS) instruments produce an unprecedented amount of sequence data at contained costs. This gives researchers the possibility of designing studies with adequate power to identify rare variants at a fraction of the economic and labor resources required by individual Sanger sequencing. As of today, a few research groups working in plant sciences have exploited this potential, showing that pooled NGS provides results in excellent agreement with those obtained by individual Sanger sequencing. The aim of this review is to convey to the reader the general ideas underlying the use of pooled NGS for the identification of rare variants. To facilitate a thorough understanding of the possibilities of the method, we will explain in detail the possible experimental and analytical approaches and discuss their advantages and disadvantages. We will show that information on allele frequency obtained by pooled NGS can be used to accurately compute basic population genetics indexes such as allele frequency, nucleotide diversity, and Tajima's D. Finally, we will discuss applications and future perspectives of the multiplexed NGS approach.
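The population genetics indexes named in this abstract can be illustrated with a minimal Python sketch. It is not the review's own procedure: it assumes biallelic sites, treats the pooled allele frequencies as exact sample frequencies from `n` chromosomes, and ignores the coverage and sequencing-error corrections that real pooled data would need. The function names are hypothetical.

```python
import math


def pairwise_diversity(freqs, n):
    """Total expected pairwise differences over all sites.

    freqs: alternate-allele frequency per biallelic site (assumed exact).
    n: number of chromosomes in the pool (assumption; must be >= 2).
    Divide the result by the sequence length to get per-site diversity.
    """
    return sum(n / (n - 1) * 2.0 * p * (1.0 - p) for p in freqs)


def tajimas_d(pi_total, seg_sites, n):
    """Tajima's D from total pairwise diversity, segregating sites and n."""
    if seg_sites == 0:
        return float("nan")  # D is undefined without segregating sites
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi_total - seg_sites / a1) / math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical pool of 20 chromosomes with three segregating sites.
    freqs = [0.05, 0.10, 0.50]
    pi_total = pairwise_diversity(freqs, 20)
    print(pi_total, tajimas_d(pi_total, len(freqs), 20))
```

A negative D indicates an excess of low-frequency variants relative to the neutral expectation, which is why accurate frequency estimates for rare variants matter for this statistic.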
Multiplexed Fragaria chloroplast genome sequencing

Treesearch

W. Njuguna; A. Liston; R. Cronn; N.V. Bassil

2010-01-01

A method to sequence multiple chloroplast genomes using ultra high throughput sequencing technologies was recently described. Complete chloroplast genome sequences can resolve phylogenetic relationships at low taxonomic levels and identify informative point mutations and indels. The objective of this research was to sequence multiple Fragaria...
Biological sequence compression algorithms.

PubMed

Matsumoto, T; Sadakane, K; Imai, H

2000-01-01

Today, more and more DNA sequences are becoming available. The information about DNA sequences is stored in molecular biology databases. The size and importance of these databases will keep growing in the future, therefore this information must be stored or communicated efficiently. Furthermore, sequence compression can be used to define similarities between biological sequences. The standard compression algorithms such as gzip or compress cannot compress DNA sequences, but only expand them in size. On the other hand, CTW (Context Tree Weighting Method) can compress DNA sequences to less than two bits per symbol. These algorithms do not use special structures of biological sequences. Two characteristic structures of DNA sequences are known. One is called palindromes or reverse complements and the other structure is approximate repeats. Several specific algorithms for DNA sequences that use these structures can compress them to less than two bits per symbol. In this paper, we improve the CTW so that characteristic structures of DNA sequences are available. Before encoding the next symbol, the algorithm searches for an approximate repeat and palindrome using hash and dynamic programming. If there is a palindrome or an approximate repeat with enough length then our algorithm represents it with length and distance. By using this preprocessing, a new program achieves a slightly higher compression ratio than that of existing DNA-oriented compression algorithms. We also describe a new compression algorithm for protein sequences.
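The "two bits per symbol" figure in this abstract is the baseline that DNA-specific compressors try to beat: four bases fit in one byte with no modelling at all. The Python sketch below shows that baseline together with the reverse-complement operation the abstract calls a palindrome. It is an illustration only, not the authors' CTW-based algorithm; the function names are hypothetical and the input is assumed to contain only the uppercase letters A, C, G and T.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def pack(seq):
    """Pack a DNA string into bytes at two bits per base (four bases per byte)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)


def unpack(data, length):
    """Recover the first `length` bases from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:length])


def reverse_complement(seq):
    """Complement every base and reverse the strand."""
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    seq = "ACGTTGCAAC"
    packed = pack(seq)
    print(len(seq), "bases ->", len(packed), "bytes")
    print(unpack(packed, len(seq)) == seq)
    print(reverse_complement(seq))
```

A compressor that replaces a repeat or reverse complement with a (length, distance) pair, as the abstract describes, spends fewer than two bits per base on that region, which is where the gain over this fixed-width encoding comes from.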
Dual-pathway multi-echo sequence for simultaneous frequency and T2 mapping

NASA Astrophysics Data System (ADS)

Cheng, Cheng-Chieh; Mei, Chang-Sheng; Duryea, Jeffrey; Chung, Hsiao-Wen; Chao, Tzu-Cheng; Panych, Lawrence P.; Madore, Bruno

2016-04-01

Purpose: To present a dual-pathway multi-echo steady state sequence and reconstruction algorithm to capture T2, T2∗ and field map information. Methods: Typically, pulse sequences based on spin echoes are needed for T2 mapping while gradient echoes are needed for field mapping, making it difficult to jointly acquire both types of information. A dual-pathway multi-echo pulse sequence is employed here to generate T2 and field maps from the same acquired data. The approach might be used, for example, to obtain both thermometry and tissue damage information during thermal therapies, or susceptibility and T2 information from the same head scan, or to generate bonus T2 maps during a knee scan. Results: Quantitative T2, T2∗ and field maps were generated in gel phantoms, ex vivo bovine muscle, and twelve volunteers. T2 results were validated against a spin-echo reference standard: A linear regression based on ROI analysis in phantoms provided close agreement (slope/R² = 0.99/0.998). A pixel-wise in vivo Bland-Altman analysis of R2 = 1/T2 showed a bias of 0.034 Hz (about 0.3%), as averaged over four volunteers. Ex vivo results, with and without motion, suggested that tissue damage detection based on T2 rather than temperature-dose measurements might prove more robust to motion. Conclusion: T2, T2∗ and field maps were obtained simultaneously, from the same datasets, in thermometry, susceptibility-weighted imaging and knee-imaging contexts.
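The Bland-Altman comparison of relaxation rates mentioned in the Results can be sketched in a few lines of Python. This is a generic illustration rather than the authors' analysis pipeline: the function names are hypothetical, the T2 values in the usage example are made up, and the 1.96 standard deviation limits of agreement are the conventional choice, assumed here.

```python
from statistics import mean, stdev


def relaxation_rate(t2_seconds):
    """Convert T2 values in seconds to relaxation rates R2 = 1/T2 in Hz."""
    return [1.0 / t for t in t2_seconds]


def bland_altman(measured, reference):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias is the mean of measured - reference; the limits of agreement are
    bias +/- 1.96 sample standard deviations of the differences.
    Needs at least two pairs.
    """
    if len(measured) != len(reference):
        raise ValueError("measured and reference must have the same length")
    diffs = [m - r for m, r in zip(measured, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    # Made-up T2 values in seconds for four pixels (illustration only).
    t2_new = [0.081, 0.092, 0.100, 0.120]
    t2_ref = [0.080, 0.093, 0.101, 0.118]
    bias, low, high = bland_altman(relaxation_rate(t2_new),
                                   relaxation_rate(t2_ref))
    print(f"bias = {bias:.3f} Hz, limits = [{low:.3f}, {high:.3f}] Hz")
```

Working in R2 rather than T2 keeps the differences in Hz, which is the unit in which the abstract reports its bias.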
A Proposed Clinical Decision Support Architecture Capable of Supporting Whole Genome Sequence Information

PubMed Central

Welch, Brandon M.; Rodriguez Loya, Salvador; Eilbeck, Karen; Kawamoto, Kensaku

2014-01-01

Whole genome sequence (WGS) information may soon be widely available to help clinicians personalize the care and treatment of patients. However, considerable barriers exist, which may hinder the effective utilization of WGS information in a routine clinical care setting. Clinical decision support (CDS) offers a potential solution to overcome such barriers and to facilitate the effective use of WGS information in the clinic. However, genomic information is complex and will require significant considerations when developing CDS capabilities. As such, this manuscript lays out a conceptual framework for a CDS architecture designed to deliver WGS-guided CDS within the clinical workflow. To handle the complexity and breadth of WGS information, the proposed CDS framework leverages service-oriented capabilities and orchestrates the interaction of several independently-managed components. These independently-managed components include the genome variant knowledge base, the genome database, the CDS knowledge base, a CDS controller and the electronic health record (EHR). A key design feature is that genome data can be stored separately from the EHR. This paper describes in detail: (1) each component of the architecture; (2) the interaction of the components; and (3) how the architecture attempts to overcome the challenges associated with WGS information. We believe that service-oriented CDS capabilities will be essential to using WGS information for personalized medicine. PMID:25411644
Professionally Responsible Disclosure of Genomic Sequencing Results in Pediatric Practice

PubMed Central

Brothers, Kyle B.; Chung, Wendy K.; Joffe, Steven; Koenig, Barbara A.; Wilfond, Benjamin; Yu, Joon-Ho

2015-01-01

Genomic sequencing is being rapidly introduced into pediatric clinical practice. The results of sequencing are distinctive for their complexity and subsequent challenges of interpretation for generalist and specialist pediatricians, parents, and patients. Pediatricians therefore need to prepare for the professionally responsible disclosure of sequencing results to parents and patients and guidance of parents and patients in the interpretation and use of these results, including managing uncertain data. This article provides an ethical framework to guide and evaluate the professionally responsible disclosure of the results of genomic sequencing in pediatric practice. The ethical framework comprises 3 core concepts of pediatric ethics: the best interests of the child standard, parental surrogate decision-making, and pediatric assent. When recommending sequencing, pediatricians should explain the nature of the proposed test, its scope and complexity, the categories of results, and the concept of a secondary or incidental finding. Pediatricians should obtain the informed permission of parents and the assent of mature adolescents about the scope of sequencing to be performed and the return of results. PMID:26371191
An exploration of strategies used by older people to obtain information about health- and social care services in the community.

PubMed

Mc Grath, Margaret; Clancy, Kathleen; Kenny, Anne

2016-10-01

To explore the strategies used by older people living in Ireland to obtain information about community health and social services. A qualitative exploratory design was used. Focus groups (n = 3) were conducted with community dwelling older people (n = 17). A series of vignettes were used to guide discussion regarding hypothetical situations that approximated real-life scenarios for older people. Data were transcribed verbatim and analysed using content analysis. Obtaining information about community health and social services is an ongoing process that requires continuous commitment by older adults. Key strategies which emerged from the data included (i) taking a proactive stance towards accessing health information, (ii) making use of personal networks in your community and (iii) developing 'insider' knowledge. Older people in this study had a proactive approach to obtaining health information and identified the importance of taking responsibility for managing their own needs. Despite this, obtaining basic information about community health and social services was a challenging and time-consuming process. Future research should focus on developing health literacy interventions that build upon and expand the strategies currently used by older people. © 2015 The Authors. Health Expectations published by John Wiley & Sons Ltd.
Reprint of "Application of next generation sequencing in clinical microbiology and infection prevention".

PubMed

Deurenberg, Ruud H; Bathoorn, Erik; Chlebowicz, Monika A; Couto, Natacha; Ferdous, Mithila; García-Cobos, Silvia; Kooistra-Smid, Anna M D; Raangs, Erwin C; Rosema, Sigrid; Veloo, Alida C M; Zhou, Kai; Friedrich, Alexander W; Rossen, John W A

2017-05-20

Current molecular diagnostics of human pathogens provide limited information that is often not sufficient for outbreak and transmission investigation. Next generation sequencing (NGS) determines the DNA sequence of a complete bacterial genome in a single sequence run, and from these data, information on resistance and virulence, as well as information for typing, is obtained, which is useful for outbreak investigation. The obtained genome data can be further used for the development of an outbreak-specific screening test. In this review, a general introduction to NGS is presented, including the library preparation and the major characteristics of the most common NGS platforms, such as the MiSeq (Illumina) and the Ion PGM™ (ThermoFisher). An overview is given of the software for NGS data analyses used at the medical microbiology diagnostic laboratory in the University Medical Center Groningen in The Netherlands. Furthermore, applications of NGS in the clinical setting are described, such as outbreak management, molecular case finding, characterization and surveillance of pathogens, rapid identification of bacteria using the 16S-23S rRNA region, taxonomy, metagenomics approaches on clinical samples, and the determination of the transmission of zoonotic micro-organisms from animals to humans. Finally, we share our vision on the use of NGS in personalised microbiology in the near future, pointing out specific requirements. Copyright © 2017. Published by Elsevier B.V.
The Survey of Fires in Buildings. Third Report: The Use of Information Obtained From Fire Surveys

NASA Technical Reports Server (NTRS)

Silcock, A.

1973-01-01

The previous two reports in this series gave details of the general scope of the pilot exercise and methods by which it was carried out. In addition the nature of the information obtained was illustrated by preliminary analyses of the house and industrial fires surveyed. Some brief comments on the use of the information were made. This report indicates a method of assessing the nationwide effects of applying conclusions drawn from the results of limited numbers of surveys and considers the use of the information for specific purposes.
Molecular Diagnosis of Orthopedic-Device-Related Infection Directly from Sonication Fluid by Metagenomic Sequencing

PubMed Central

Sanderson, Nicholas D.; Atkins, Bridget L.; Brent, Andrew J.; Cole, Kevin; Foster, Dona; McNally, Martin A.; Oakley, Sarah; Peto, Leon; Taylor, Adrian; Peto, Tim E. A.; Crook, Derrick W.; Eyre, David W.

2017-01-01

Culture of multiple periprosthetic tissue samples is the current gold standard for microbiological diagnosis of prosthetic joint infections (PJI). Additional diagnostic information may be obtained through culture of sonication fluid from explants. However, current techniques can have relatively low sensitivity, with prior antimicrobial therapy and infection by fastidious organisms influencing results. We assessed if metagenomic sequencing of total DNA extracts obtained directly from sonication fluid can provide an alternative rapid and sensitive tool for diagnosis of PJI. We compared metagenomic sequencing with standard aerobic and anaerobic culture in 97 sonication fluid samples from prosthetic joint and other orthopedic device infections. Reads from Illumina MiSeq sequencing were taxonomically classified using Kraken. Using 50 derivation samples, we determined optimal thresholds for the number and proportion of bacterial reads required to identify an infection and confirmed our findings in 47 independent validation samples. Compared to results from sonication fluid culture, the species-level sensitivity of metagenomic sequencing was 61/69 (88%; 95% confidence interval [CI], 77 to 94%; for derivation samples 35/38 [92%; 95% CI, 79 to 98%]; for validation samples, 26/31 [84%; 95% CI, 66 to 95%]), and genus-level sensitivity was 64/69 (93%; 95% CI, 84 to 98%). Species-level specificity, adjusting for plausible fastidious causes of infection, species found in concurrently obtained tissue samples, and prior antibiotics, was 85/97 (88%; 95% CI, 79 to 93%; for derivation samples, 43/50 [86%; 95% CI, 73 to 94%]; for validation samples, 42/47 [89%; 95% CI, 77 to 96%]). High levels of human DNA contamination were seen despite the use of laboratory methods to remove it. Rigorous laboratory good practice was required to minimize bacterial DNA contamination. We demonstrate that metagenomic sequencing can provide accurate diagnostic information in PJI. Our findings
Analysis and functional annotation of expressed sequence tags from the fall armyworm Spodoptera frugiperda

PubMed Central

Deng, Youping; Dong, Yinghua; Thodima, Venkata; Clem, Rollie J; Passarelli, A Lorena

2006-01-01

Background: Little is known about the genome sequences of lepidopteran insects, although this group of insects has been studied extensively in the fields of endocrinology, development, immunity, and pathogen-host interactions. In addition, cell lines derived from Spodoptera frugiperda and other lepidopteran insects are routinely used for baculovirus foreign gene expression. This study reports the results of an expressed sequence tag (EST) sequencing project in cells from the lepidopteran insect S. frugiperda, the fall armyworm. Results: We have constructed an EST database using two cDNA libraries from the S. frugiperda-derived cell line, SF-21. The database consists of 2,367 ESTs which were assembled into 244 contigs and 951 singlets for a total of 1,195 unique sequences. Conclusion: S. frugiperda is an agriculturally important pest insect and genomic information will be instrumental for establishing initial transcriptional profiling and gene function studies, and for obtaining information about genes manipulated during infections by insect pathogens such as baculoviruses. PMID:17052344
Protein Sequence Classification with Improved Extreme Learning Machine Algorithms

PubMed Central

2014-01-01

Precisely classifying a protein sequence from a large biological protein sequences database plays an important role for developing competitive pharmacological products. Comparing the unseen sequence with all the identified protein sequences and returning the category index with the highest similarity scored protein, conventional methods are usually time-consuming. Therefore, it is urgent and necessary to build an efficient protein sequence classification system. In this paper, we study the performance of protein sequence classification using single hidden layer feedforward neural networks (SLFNs). The recent efficient extreme learning machine (ELM) and its variants are utilized as the training algorithms. The optimal pruned ELM (OP-ELM) is first employed for protein sequence classification in this paper. To further enhance the performance, the ensemble based SLFNs structure is constructed where multiple SLFNs with the same number of hidden nodes and the same activation function are used as ensembles. For each ensemble, the same training algorithm is adopted. The final category index is derived using the majority voting method. Two approaches, namely, the basic ELM and the OP-ELM, are adopted for the ensemble based SLFNs. The performance is analyzed and compared with several existing methods using datasets obtained from the Protein Information Resource center. The experimental results show the superiority of the proposed algorithms. PMID:24795876
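The majority voting step described in the abstract above can be sketched as follows. This is only an illustration of the voting rule, not the authors' implementation; the function name and the toy predictions are hypothetical, and the tie-breaking behaviour is an assumption.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most frequent category index among ensemble members.

    Ties go to the label encountered first (an assumption of this sketch).
    """
    counts = Counter(predictions)
    return counts.most_common(1)[0][0]

# Hypothetical category indices predicted by five ensemble members
# for one unseen protein sequence.
member_predictions = [3, 1, 3, 2, 3]
final_category = majority_vote(member_predictions)
print(final_category)
```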
Draft Genome Sequence of Thermoanaerobacter sp. Strain A7A, Reconstructed from a Metagenome Obtained from a High-Temperature Hydrocarbon Reservoir in the Bass Strait, Australia

PubMed Central

Li, Dongmei; Greenfield, Paul; Rosewarne, Carly P.

2013-01-01

The draft genome sequence of Thermoanaerobacter sp. strain A7A was reconstructed from a metagenome of a microbial consortium obtained from the Tuna oil field in the Gippsland Basin, Australia. The organism is a strict anaerobe that is predicted to ferment a range of simple sugars and undertake sulfur reduction. PMID:24029756

Epidemiological information is key when interpreting whole genome sequence data - lessons learned from a large Legionella pneumophila outbreak in Warstein, Germany, 2013.

PubMed

Petzold, Markus; Prior, Karola; Moran-Gilad, Jacob; Harmsen, Dag; Lück, Christian

2017-11-01

Introduction: Whole genome sequencing (WGS) is increasingly used in Legionnaires' disease (LD) outbreak investigations, owing to its higher resolution than sequence-based typing, the gold standard typing method for Legionella pneumophila, in the analysis of endemic strains. Recently, a gene-by-gene typing approach based on 1,521 core genes called core genome multilocus sequence typing (cgMLST) was described that enables a robust and standardised typing of L. pneumophila. Methods: We applied this cgMLST scheme to isolates obtained during the largest outbreak of LD reported so far in Germany. In this outbreak, the epidemic clone ST345 had been isolated from patients and four different environmental sources. In total 42 clinical and environmental isolates were retrospectively typed. Results: Epidemiologically unrelated ST345 isolates were clearly distinguishable from the epidemic clone. Remarkably, epidemic isolates split up into two distinct clusters, ST345-A and ST345-B, each respectively containing a mix of clinical and epidemiologically-related environmental samples. Discussion/conclusion: The outbreak was therefore likely caused by both variants of the single sequence type, which pre-existed in the environmental reservoirs. The two clusters differed by 40 alleles located in two neighbouring genomic regions of ca 42 and 26 kb. Additional analysis supported horizontal gene transfer of the two regions as responsible for the difference between the variants. Both regions comprise virulence genes and have previously been reported to be involved in recombination events. This corroborates the notion that genomic outbreak investigations should always take epidemiological information into consideration when making inferences. Overall, cgMLST proved helpful in disentangling the complex genomic epidemiology of the outbreak.
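The statement that two clusters "differed by 40 alleles" refers to a gene-by-gene comparison of allele profiles. A minimal sketch of such a pairwise allele distance is given below, assuming each isolate is represented as a mapping from locus name to allele number and that loci with missing calls are skipped. The locus names and allele numbers are hypothetical toy data, not values from the study.

```python
def allele_distance(profile_a, profile_b):
    """Count loci whose allele numbers differ between two profiles.

    Loci that are absent or uncalled (None) in either profile are ignored.
    """
    shared = profile_a.keys() & profile_b.keys()
    return sum(
        1
        for locus in shared
        if profile_a[locus] is not None
        and profile_b[locus] is not None
        and profile_a[locus] != profile_b[locus]
    )

# Hypothetical toy profiles (a real cgMLST scheme would have 1,521 loci).
isolate_1 = {"locus_1": 4, "locus_2": 7, "locus_3": 2, "locus_4": None}
isolate_2 = {"locus_1": 4, "locus_2": 9, "locus_3": 5, "locus_4": 1}
distance = allele_distance(isolate_1, isolate_2)
print(distance)
```

How missing loci are handled is a design choice; skipping them, as here, avoids counting failed allele calls as true differences.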
The Canterbury Tales: Lessons from the Canterbury Earthquake Sequence to Inform Better Public Communication Models

NASA Astrophysics Data System (ADS)

McBride, S.; Tilley, E. N.; Johnston, D. M.; Becker, J.; Orchiston, C.

2015-12-01

This research evaluates the public education earthquake information prior to the Canterbury Earthquake sequence (2010-present), and examines communication learnings to create recommendations for improvement in implementation for these types of campaigns in future. The research comes from a practitioner perspective of someone who worked on these campaigns in Canterbury prior to the Earthquake Sequence and who also was the Public Information Manager Second in Command during the earthquake response in February 2011. Documents, specifically those addressing seismic risk, that were created prior to the earthquake sequence, were analyzed, using a "best practice matrix" created by the researcher, for how closely these aligned to best practice academic research. Readability tests and word counts are also employed to assist with triangulation of the data as was practitioner involvement. This research also outlines the lessons learned by practitioners and explores their experiences in regards to creating these materials and how they perceive these now, given all that has happened since the inception of the booklets. The findings from the research showed these documents lacked many of the attributes of best practice. The overly long, jargon filled text had little positive outcome expectancy messages. This probably would have failed to persuade anyone that earthquakes were a real threat in Canterbury. Paradoxically, it is likely these booklets may have created fatalism in publics who read the booklets. While the overall intention was positive, for scientists to explain earthquakes, tsunami, landslides and other risks to encourage the public to prepare for these events, the implementation could be greatly improved. This final component of the research highlights points of improvement for implementation for more successful campaigns in future. The importance of preparedness and science information campaigns can be not only in preparing the population but also into development of
Genotyping by sequencing resolves shallow population structure to inform conservation of Chinook salmon (Oncorhynchus tshawytscha)

PubMed Central

Larson, Wesley A; Seeb, Lisa W; Everett, Meredith V; Waples, Ryan K; Templin, William D; Seeb, James E

2014-01-01

Recent advances in population genomics have made it possible to detect previously unidentified structure, obtain more accurate estimates of demographic parameters, and explore adaptive divergence, potentially revolutionizing the way genetic data are used to manage wild populations. Here, we identified 10 944 single-nucleotide polymorphisms using restriction-site-associated DNA (RAD) sequencing to explore population structure, demography, and adaptive divergence in five populations of Chinook salmon (Oncorhynchus tshawytscha) from western Alaska. Patterns of population structure were similar to those of past studies, but our ability to assign individuals back to their region of origin was greatly improved (>90% accuracy for all populations). We also calculated effective size with and without removing physically linked loci identified from a linkage map, a novel method for nonmodel organisms. Estimates of effective size were generally above 1000 and were biased downward when physically linked loci were not removed. Outlier tests based on genetic differentiation identified 733 loci and three genomic regions under putative selection. These markers and genomic regions are excellent candidates for future research and can be used to create high-resolution panels for genetic monitoring and population assignment. This work demonstrates the utility of genomic data to inform conservation in highly exploited species with shallow population structure. PMID:24665338
Reflecting on Earlier Experiences with Unsolicited Findings: Points to Consider for Next-Generation Sequencing and Informed Consent in Diagnostics

PubMed Central

Rigter, Tessel; Henneman, Lidewij; Kristoffersson, Ulf; Hall, Alison; Yntema, Helger G; Borry, Pascal; Tönnies, Holger; Waisfisz, Quinten; Elting, Mariet W; Dondorp, Wybo J; Cornel, Martina C

2013-01-01

High-throughput nucleotide sequencing (often referred to as next-generation sequencing; NGS) is increasingly being chosen as a diagnostic tool for cases of expected but unresolved genetic origin. When exploring a higher number of genetic variants, there is a higher chance of detecting unsolicited findings. The consequential increased need for decisions on disclosure of these unsolicited findings poses a challenge for the informed consent procedure. This article discusses the ethical and practical dilemmas encountered when contemplating informed consent for NGS in diagnostics from a multidisciplinary point of view. By exploring recent similar experiences with unsolicited findings in other settings, an attempt is made to describe what can be learned so far for implementing NGS in standard genetic diagnostics. The article concludes with a set of points to consider in order to guide decision-making on the extent of return of results in relation to the mode of informed consent. We hereby aim to provide a sound basis for developing guidelines for optimizing the informed consent procedure. PMID:23784691
Mining SNPs from EST sequences using filters and ensemble classifiers.

PubMed

Wang, J; Zou, Q; Guo, M Z

2010-05-04

Abundant single nucleotide polymorphisms (SNPs) provide the most complete information for genome-wide association studies. However, due to the bottleneck of manual discovery of putative SNPs and the inaccessibility of the original sequencing reads, it is essential to develop a more efficient and accurate computational method for automated SNP detection. We propose a novel computational method to rapidly find true SNPs in publicly available EST (expressed sequence tag) databases; this method is implemented as SNPDigger. EST sequences are clustered and aligned. SNP candidates are then obtained according to a measure of redundant frequency. Several new informative biological features, such as the structural neighbor profiles and the physical position of the SNP, were extracted from EST sequences, and the effectiveness of these features was demonstrated. An ensemble classifier, which employs a carefully selected feature set, was included for the imbalanced training data. The sensitivity and specificity of our method both exceeded 80% for human genetic data in the cross validation. Our method enables detection of SNPs from the user's own EST dataset and can be used on species for which there is no genome data. Our tests showed that this method can effectively guide SNP discovery in ESTs and will be useful for reducing the cost of biological analyses.
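The idea of obtaining SNP candidates from aligned ESTs by redundancy can be sketched as below. This is a simplified illustration, not SNPDigger's actual measure: it assumes a candidate is any alignment column in which at least two different bases are each supported by a minimum number of reads. The function name, threshold, and toy alignment are hypothetical.

```python
from collections import Counter

def snp_candidates(aligned_reads, min_support=2):
    """Return (column, base counts) for columns where at least two
    different bases each occur in min_support or more reads."""
    candidates = []
    for col, bases in enumerate(zip(*aligned_reads)):
        counts = Counter(b for b in bases if b in "ACGT")  # skip gaps and N
        supported = [b for b, n in counts.items() if n >= min_support]
        if len(supported) >= 2:
            candidates.append((col, counts))
    return candidates

# Hypothetical cluster of aligned EST reads (equal length, '-' marks a gap).
cluster = [
    "ACGTTAGC",
    "ACGTTAGC",
    "ACGCTAGC",
    "ACGCTAGC",
    "ACG-TAGA",
]
columns = [col for col, _ in snp_candidates(cluster)]
print(columns)
```

Requiring support from more than one read for each allele is what separates a plausible polymorphism from a single-read sequencing error, such as the lone `A` in the last column of the toy data.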
Clinical sequencing in leukemia with the assistance of artificial intelligence.

PubMed

Tojo, Arinobu

2017-01-01

Next generation sequencing (NGS) of cancer genomes is now becoming a prerequisite for accurate diagnosis and proper treatment in clinical oncology. Because the genomic regions for NGS expand from a certain set of genes to the whole exome or whole genome, the resulting sequence data becomes incredibly enormous and makes it quite laborious to translate the genomic data into medicine, so-called annotation and curation. We organized a clinical sequencing team and established a bidirectional (bed-to-bench and bench-to-bed) system to integrate clinical and genomic data for hematological malignancies. We also started a collaborative research project with IBM Japan to adopt the artificial intelligence Watson for Genomics (WfG) to the pipeline of medical informatics. Genomic DNA was prepared from malignant as well as normal tissues in each patient and subjected to NGS. Sequence data was analyzed using an in-house semi-automated pipeline in combination with WfG, which was used to identify candidate driver mutations and relevant pathways from which applicable drug information was deduced. Currently, we have analyzed more than 150 patients with hematological disorders, including AML and ALL, and obtained many informative findings. In this presentation, I will introduce some of the achievements we have made so far.
7 CFR 1.641 - How may parties obtain discovery of information needed for the case?

Code of Federal Regulations, 2010 CFR

2010-01-01

... 7 Agriculture 1 2010-01-01 2010-01-01 false How may parties obtain discovery of information needed for the case? 1.641 Section 1.641 Agriculture Office of the Secretary of Agriculture ADMINISTRATIVE... legal theories of an attorney. (g) Experts. Unless restricted by the ALJ, a party may discover any facts...
Clinical decision support for whole genome sequence information leveraging a service-oriented architecture: a prototype.

PubMed

Welch, Brandon M; Rodriguez-Loya, Salvador; Eilbeck, Karen; Kawamoto, Kensaku

2014-01-01

Whole genome sequence (WGS) information could soon be routinely available to clinicians to support the personalized care of their patients. At such time, clinical decision support (CDS) integrated into the clinical workflow will likely be necessary to support genome-guided clinical care. Nevertheless, developing CDS capabilities for WGS information presents many unique challenges that need to be overcome for such approaches to be effective. In this manuscript, we describe the development of a prototype CDS system that is capable of providing genome-guided CDS at the point of care and within the clinical workflow. To demonstrate the functionality of this prototype, we implemented a clinical scenario of a hypothetical patient at high risk for Lynch Syndrome based on his genomic information. We demonstrate that this system can effectively use service-oriented architecture principles and standards-based components to deliver point of care CDS for WGS information in real-time.
Constructing a sequence of palaeoDEMs to obtain erosion rates in a drainage basin.

NASA Astrophysics Data System (ADS)

Castelltort, F. Xavier; Carles Balasch, J.; Cirés, Jordi; Colombo, Ferran

2017-04-01

DEMs made in a present-day drainage basin, considering it as a geomorphic unit, represent the end result of a landscape evolution. This process has had to follow a model of erosion. Trying to establish a conceptual erosion model in landscape evolution represents the first difficulty in constructing a sequence of palaeoDEMs. But if one is able to do it, the result will be easier to obtain and more believable. The next step is to make a catalogue of base level types present in the drainage basin. The list has to include elements with determinate position and elevation (x, y, z) from the centre of the basin to the hillslopes. A list of base level types may contain fluvial terrace remnants, erosive surfaces, palaeosols, alluvial covers of glacis, alluvial fans, rockfalls, landslides and scree zones. It is very important to know the spatial and temporal relations between the elements of the list, even if they are disconnected by erosion processes. Relative chronologies have to be set for all elements of the catalogue, and as far as possible absolute chronologies. To do it, it is essential to have established first the spatial relations between them, including those elements that are gone. Moreover, it is also essential to have adapted all the elements to the conceptual erosion model proposed. In this step, it has to be kept in mind that erosion rates can be very different in determinate areas within the same geomorphic unit. Erosion processes are focused in specific zones while other areas are maintained in stability. A good technique to construct a palaeoDEM is to start by making, by hand, a map of contour lines. At this point, it is valuable to use the elements' catalogue. The use of those elements belonging to the same palaeosurface will result in a map. Several maps can be obtained from a catalogue. Contour maps can be gridded into a 3D surface by means of a specific application and a set of surfaces will be obtained. Algebraic operations can be done with palaeoDEMs obtaining
Next generation sequencing and its applications in forensic genetics.

PubMed

Børsting, Claus; Morling, Niels

2015-09-01

It has been almost a decade since the first next generation sequencing (NGS) technologies emerged and quickly changed the way genetic research is conducted. Today, full genomes are mapped and published almost weekly and with ever increasing speed and decreasing costs. NGS methods and platforms have matured during the last 10 years, and the quality of the sequences has reached a level where NGS is used in clinical diagnostics of humans. Forensic genetic laboratories have also explored NGS technologies and especially in the last year, there has been a small explosion in the number of scientific articles and presentations at conferences with forensic aspects of NGS. These contributions have demonstrated that NGS offers new possibilities for forensic genetic case work. More information may be obtained from unique samples in a single experiment by analyzing combinations of markers (STRs, SNPs, insertion/deletions, mRNA) that cannot be analyzed simultaneously with the standard PCR-CE methods used today. The true variation in core forensic STR loci has been uncovered, and previously unknown STR alleles have been discovered. The detailed sequence information may aid mixture interpretation and will increase the statistical weight of the evidence. In this review, we will give an introduction to NGS and single-molecule sequencing, and we will discuss the possible applications of NGS in forensic genetics. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
76 FR 64010 - Special Rules Governing Certain Information Obtained Under the Clean Air Act: Technical Correction

Federal Register 2010, 2011, 2012, 2013, 2014

2011-10-17

... natural gas. 211112 Natural gas liquid extraction facilities. Petrochemical Production 32511 Ethylene.... Suppliers of Natural Gas and NGLs 221210 Natural gas distribution facilities. 211112 Natural gas liquid... Gas Reporting Rule, which are provided in the Special Rules Governing Certain Information Obtained...
Whole Exome Sequencing in Pediatric Neurology Patients: Clinical Implications and Estimated Cost Analysis.

PubMed

Nolan, Danielle; Carlson, Martha

2016-06-01

Genetic heterogeneity in neurologic disorders has been an obstacle to phenotype-based diagnostic testing. The authors hypothesized that information compiled via whole exome sequencing will improve clinical diagnosis and management of pediatric neurology patients. The authors performed a retrospective chart review of patients evaluated in the University of Michigan Pediatric Neurology clinic between 6/2011 and 6/2015. The authors recorded previous diagnostic testing, indications for whole exome sequencing, and whole exome sequencing results. Whole exome sequencing was recommended for 135 patients and obtained in 53 patients. Insurance barriers often precluded whole exome sequencing. The most common indication for whole exome sequencing was neurodevelopmental disorders. Whole exome sequencing improved the presumptive diagnostic rate in the patient cohort from 25% to 48%. Clinical implications included family planning, medication selection, and systemic investigation. Compared to current second tier testing, whole exome sequencing can result in lower long-term charges and more timely diagnosis. Overcoming barriers related to whole exome sequencing insurance authorization could allow for more efficient and fruitful diagnostic neurological evaluations. © The Author(s) 2016.
28 CFR 115.341 - Obtaining information from residents.

Code of Federal Regulations, 2012 CFR

2012-07-01

... ACT NATIONAL STANDARDS Standards for Juvenile Facilities Screening for Risk of Sexual Victimization... use information about each resident's personal history and behavior to reduce the risk of sexual abuse... instrument. (c) At a minimum, the agency shall attempt to ascertain information about: (1) Prior sexual...
28 CFR 115.341 - Obtaining information from residents.

Code of Federal Regulations, 2013 CFR

2013-07-01

... ACT NATIONAL STANDARDS Standards for Juvenile Facilities Screening for Risk of Sexual Victimization... use information about each resident's personal history and behavior to reduce the risk of sexual abuse... instrument. (c) At a minimum, the agency shall attempt to ascertain information about: (1) Prior sexual...
28 CFR 115.341 - Obtaining information from residents.

Code of Federal Regulations, 2014 CFR

2014-07-01

... ACT NATIONAL STANDARDS Standards for Juvenile Facilities Screening for Risk of Sexual Victimization... use information about each resident's personal history and behavior to reduce the risk of sexual abuse... instrument. (c) At a minimum, the agency shall attempt to ascertain information about: (1) Prior sexual...
50 CFR 221.41 - How may parties obtain discovery of information needed for the case?

Code of Federal Regulations, 2010 CFR

2010-10-01

... materials, it must show: (i) That it has substantial need of the materials in preparing its own case; and... legal theories of an attorney. (g) Experts. Unless restricted by the ALJ, a party may discover any facts...: (i) That it has a compelling need for the information; and (ii) That it cannot practicably obtain the...
Contamination of sequence databases with adaptor sequences

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yoshikawa, Takeo; Sanders, A.R.; Detera-Wadleigh, S.D.

Because of the exponential increase in the amount of DNA sequences being added to the public databases on a daily basis, it has become imperative to identify sources of contamination rapidly. Previously, contaminations of sequence databases have been reported to alert the scientific community to the problem. These contaminations can be divided into two categories. The first category comprises host sequences that have been difficult for submitters to manage or control. Examples include anomalous sequences derived from Escherichia coli, which are inserted into the chromosomes (and plasmids) of the bacterial hosts. Insertion sequences are highly mobile and are capable of transposing themselves into plasmids during cloning manipulation. Another example of the first category is the infection with yeast genomic DNA or with bacterial DNA of some commercially available cDNA libraries from Clontech. The second category of database contamination is due to the inadvertent inclusion of nonhost sequences. This category includes incorporation of cloning-vector sequences and multicloning sites in the database submission. M13-derived artifacts have been common, since M13-based vectors have been widely used for subcloning DNA fragments. Recognizing this problem, the National Center for Biotechnology Information (NCBI) started to screen, in April 1994, all sequences directly submitted to GenBank, against a set of vector data retrieved from GenBank by use of key-word searches, such as "vector." In this report, we present evidence for another sequence artifact that is widespread but that, to our knowledge, has not yet been reported. 11 refs., 1 tab.
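Screening a sequence for an adaptor, in the spirit of the contamination checks described above, can be sketched with an exact-match search on both strands. This is a minimal illustration only: the adaptor string below is a hypothetical placeholder rather than a sequence taken from the report, and real screening would normally tolerate mismatches and partial matches.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq.upper()))

def find_adaptor(seq, adaptor):
    """Return (position, strand) for every exact match of the adaptor
    or of its reverse complement within seq."""
    seq = seq.upper()
    hits = []
    for strand, query in (("+", adaptor.upper()), ("-", reverse_complement(adaptor))):
        start = seq.find(query)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(query, start + 1)
    return hits

ADAPTOR = "GAATTCGGCACGAG"  # hypothetical placeholder adaptor
# Toy database record with the adaptor at the 5' end and its reverse
# complement at the 3' end, flanking an insert.
record = "GAATTCGGCACGAG" + "ATGGCCTTAAG" + "CTCGTGCCGAATTC"
hits = find_adaptor(record, ADAPTOR)
print(hits)
```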
Downsizing genomic medicine: approaching the ethical complexity of whole-genome sequencing by starting small.

PubMed

Sharp, Richard R

2011-03-01

As we look to a time when whole-genome sequencing is integrated into patient care, it is possible to anticipate a number of ethical challenges that will need to be addressed. The most intractable of these concern informed consent and the responsible management of very large amounts of genetic information. Given the range of possible findings, it remains unclear to what extent it will be possible to obtain meaningful patient consent to genomic testing. Equally unclear is how clinicians will disseminate the enormous volume of genetic information produced by whole-genome sequencing. Toward developing practical strategies for managing these ethical challenges, we propose a research agenda that approaches multiplexed forms of clinical genetic testing as natural laboratories in which to develop best practices for managing the ethical complexities of genomic medicine.
Recognition of Drainage Tunnels during Glacier Lake Outburst Events from Terrestrial Image Sequences

NASA Astrophysics Data System (ADS)

Schwalbe, E.; Koschitzki, R.; Maas, H.-G.

2016-06-01

In recent years, many glaciers all over the world have been distinctly retreating and thinning. One of the consequences of this is the increase of so-called glacier lake outburst flood events (GLOFs). The mechanisms ruling such GLOF events are still not yet fully understood by glaciologists. Thus, there is a demand for data and measurements that can help to understand and model the phenomena. Thereby, a main issue is to obtain information about the location and formation of subglacial channels through which some lakes, dammed by a glacier, start to drain. The paper will show how photogrammetric image sequence analysis can be used to collect such data. For the purpose of detecting a subglacial tunnel, a camera has been installed in a pilot study to observe the area of the Colonia Glacier (Northern Patagonian Ice Field) where it dams the Lake Cachet II. To verify the hypothesis that the course of the subglacial tunnel is indicated by irregular surface motion patterns during its collapse, the camera acquired image sequences of the glacier surface during several GLOF events. Applying tracking techniques to these image sequences, surface feature motion trajectories could be obtained for a dense raster of glacier points. Since only a single camera has been used for image sequence acquisition, depth information is required to scale the trajectories. Thus, for scaling and georeferencing of the measurements a GPS-supported photogrammetric network has been measured. The obtained motion fields of the Colonia Glacier deliver information about the glacier's behaviour before, during, and after a GLOF event. If the daily vertical motion of the glacier is integrated over a period of several days and projected into a satellite image, the location and shape of the drainage channel underneath the glacier become visible. The high temporal resolution of the motion fields may also allow for an analysis of the tunnel's dynamics in comparison to the changing water level of the lake.
Structure-Templated Predictions of Novel Protein Interactions from Sequence Information

PubMed Central

Betel, Doron; Breitkreuz, Kevin E; Isserlin, Ruth; Dewar-Darch, Danielle; Tyers, Mike; Hogue, Christopher W. V

2007-01-01

The multitude of functions performed in the cell are largely controlled by a set of carefully orchestrated protein interactions often facilitated by specific binding of conserved domains in the interacting proteins. Interacting domains commonly exhibit distinct binding specificity to short and conserved recognition peptides called binding profiles. Although many conserved domains are known in nature, only a few have well-characterized binding profiles. Here, we describe a novel predictive method known as domain–motif interactions from structural topology (D-MIST) for elucidating the binding profiles of interacting domains. A set of domains and their corresponding binding profiles were derived from extant protein structures and protein interaction data and then used to predict novel protein interactions in yeast. A number of the predicted interactions were verified experimentally, including new interactions of the mitotic exit network, RNA polymerases, nucleotide metabolism enzymes, and the chaperone complex. These results demonstrate that new protein interactions can be predicted exclusively from sequence information. PMID:17892321

Inter-laboratory evaluation of the EUROFORGEN Global ancestry-informative SNP panel by massively parallel sequencing using the Ion PGM™.

PubMed

Eduardoff, M; Gross, T E; Santos, C; de la Puente, M; Ballard, D; Strobl, C; Børsting, C; Morling, N; Fusco, L; Hussing, C; Egyed, B; Souto, L; Uacyisrael, J; Syndercombe Court, D; Carracedo, Á; Lareu, M V; Schneider, P M; Parson, W; Phillips, C

2016-07-01

The EUROFORGEN Global ancestry-informative SNP (AIM-SNPs) panel is a forensic multiplex of 128 markers designed to differentiate an individual's ancestry from amongst the five continental population groups of Africa, Europe, East Asia, Native America, and Oceania. A custom multiplex of AmpliSeq™ PCR primers was designed for the Global AIM-SNPs to perform massively parallel sequencing using the Ion PGM™ system. This study assessed individual SNP genotyping precision using the Ion PGM™, the forensic sensitivity of the multiplex using dilution series, degraded DNA plus simple mixtures, and the ancestry differentiation power of the final panel design, which required substitution of three original ancestry-informative SNPs with alternatives. Fourteen populations that had not been previously analyzed were genotyped using the custom multiplex and these studies allowed assessment of genotyping performance by comparison of data across five laboratories. Results indicate a low level of genotyping error can still occur from sequence misalignment caused by homopolymeric tracts close to the target SNP, despite careful scrutiny of candidate SNPs at the design stage. Such sequence misalignment required the exclusion of component SNP rs2080161 from the Global AIM-SNPs panel. However, the overall genotyping precision and sensitivity of this custom multiplex indicates the Ion PGM™ assay for the Global AIM-SNPs is highly suitable for forensic ancestry analysis with massively parallel sequencing. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Rényi continuous entropy of DNA sequences.

PubMed

Vinga, Susana; Almeida, Jonas S

2004-12-07

Entropy measures of DNA sequences estimate their randomness or, inversely, their repeatability. L-block Shannon discrete entropy accounts for the empirical distribution of all length-L words and has convergence problems for finite sequences. A new entropy measure that extends Shannon's formalism is proposed. Renyi's quadratic entropy calculated with Parzen window density estimation method applied to CGR/USM continuous maps of DNA sequences constitutes a novel technique to evaluate sequence global randomness without some of the former method's drawbacks. The asymptotic behaviour of this new measure was analytically deduced and the calculation of entropies for several synthetic and experimental biological sequences was performed. The results obtained were compared with the distributions of the null model of randomness obtained by simulation. The biological sequences have shown a different p-value according to the kernel resolution of Parzen's method, which might indicate an unknown level of organization of their patterns. This new technique can be very useful in the study of DNA sequence complexity and provide additional tools for DNA entropy estimation. The main MATLAB applications developed and additional material are available at the webpage . Specialized functions can be obtained from the authors.
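The measure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' MATLAB code: it uses one common chaos game representation (CGR) corner assignment (A, C, G, T at the corners of the unit square), an isotropic 2-D Gaussian Parzen window, and an arbitrary kernel width `sigma`. With such a kernel, the quadratic entropy estimate is the negative logarithm of the mean pairwise kernel value, where the pairwise kernel has variance 2*sigma^2.

```python
import math
import random

# Assumed corner convention for the CGR map.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr_points(seq):
    """Chaos game representation: move halfway toward each base's corner."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points

def renyi_quadratic_entropy(points, sigma=0.1):
    """H2 = -ln( (1/N^2) * sum_ij G(x_i - x_j; 2*sigma^2) ), 2-D Gaussian G."""
    n = len(points)
    var = 2 * sigma ** 2               # variance of the convolved kernel
    norm = 1 / (2 * math.pi * var)     # 2-D Gaussian normalisation
    total = 0.0
    for xi, yi in points:
        for xj, yj in points:
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2
            total += norm * math.exp(-d2 / (2 * var))
    return -math.log(total / n ** 2)

rng = random.Random(0)  # fixed seed so the toy sequence is reproducible
random_seq = "".join(rng.choice("ACGT") for _ in range(200))
repeat_seq = "A" * 200
h_random = renyi_quadratic_entropy(cgr_points(random_seq))
h_repeat = renyi_quadratic_entropy(cgr_points(repeat_seq))
print(h_random, h_repeat)
```

A homopolymer collapses onto one corner of the map and should therefore score a lower entropy than a random sequence, whose points spread over the whole square. The double loop is quadratic in sequence length, which is acceptable for a sketch but not for long genomes.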
20 CFR 603.23 - What information must State UC agencies obtain from other agencies, and crossmatch with wage...

Code of Federal Regulations, 2010 CFR

2010-04-01

... 20 Employees' Benefits 3 2010-04-01 2010-04-01 false What information must State UC agencies obtain from other agencies, and crossmatch with wage information, for purposes of an IEVS? 603.23 Section 603.23 Employees' Benefits EMPLOYMENT AND TRAINING ADMINISTRATION, DEPARTMENT OF LABOR FEDERAL-STATE UNEMPLOYMENT COMPENSATION (UC) PROGRAM;...
Obtaining subjects' consent to publish identifying personal information: current practices and identifying potential issues.

PubMed

Yoshida, Akiko; Dowa, Yuri; Murakami, Hiromi; Kosugi, Shinji

2013-11-25

In studies publishing identifying personal information, obtaining consent is regarded as necessary, as it is impossible to ensure complete anonymity. However, current journal practices around specific points to consider when obtaining consent, the contents of consent forms and how consent forms are managed have not yet been fully examined. This study was conducted to identify potential issues surrounding consent to publish identifying personal information. Content analysis was carried out on instructions for authors and consent forms developed by academic journals in four fields (as classified by Journal Citation Reports): medicine general and internal, genetics and heredity, pediatrics, and psychiatry. An online questionnaire survey of editors working for journals that require the submission of consent forms was also conducted. Instructions for authors were reviewed for 491 academic journals (132 for medicine general and internal, 147 for genetics and heredity, 100 for pediatrics, and 112 for psychiatry). Approximately 40% (203: 74 for medicine general and internal, 31 for genetics and heredity, 58 for pediatrics, and 40 for psychiatry) stated that subject consent was necessary. The submission of consent forms was required by 30% (154) of the journals studied, and 10% (50) provided their own consent forms for authors to use. Two journals mentioned that the possible effects of publication on subjects should be considered. Many journal consent forms mentioned the difficulties in ensuring complete anonymity of subjects, but few addressed the study objective, the subjects' right to refuse consent and the withdrawal of consent. The main reason for requiring the submission of consent forms was to confirm that consent had been obtained. Approximately 40% of journals required subject consent to be obtained. However, differences were observed depending on the fields. Specific considerations were not always documented. There is a need to address issues around the study
Obtaining subjects’ consent to publish identifying personal information: current practices and identifying potential issues

PubMed Central

2013-01-01

Background In studies publishing identifying personal information, obtaining consent is regarded as necessary, as it is impossible to ensure complete anonymity. However, current journal practices around specific points to consider when obtaining consent, the contents of consent forms and how consent forms are managed have not yet been fully examined. This study was conducted to identify potential issues surrounding consent to publish identifying personal information. Methods Content analysis was carried out on instructions for authors and consent forms developed by academic journals in four fields (as classified by Journal Citation Reports): medicine general and internal, genetics and heredity, pediatrics, and psychiatry. An online questionnaire survey of editors working for journals that require the submission of consent forms was also conducted. Results Instructions for authors were reviewed for 491 academic journals (132 for medicine general and internal, 147 for genetics and heredity, 100 for pediatrics, and 112 for psychiatry). Approximately 40% (203: 74 for medicine general and internal, 31 for genetics and heredity, 58 for pediatrics, and 40 for psychiatry) stated that subject consent was necessary. The submission of consent forms was required by 30% (154) of the journals studied, and 10% (50) provided their own consent forms for authors to use. Two journals mentioned that the possible effects of publication on subjects should be considered. Many journal consent forms mentioned the difficulties in ensuring complete anonymity of subjects, but few addressed the study objective, the subjects’ right to refuse consent and the withdrawal of consent. The main reason for requiring the submission of consent forms was to confirm that consent had been obtained. Conclusion Approximately 40% of journals required subject consent to be obtained. However, differences were observed depending on the fields. Specific considerations were not always documented. There is a need
shRNA target prediction informed by comprehensive enquiry (SPICE): a supporting system for high-throughput screening of shRNA library.

PubMed

Kamatuka, Kenta; Hattori, Masahiro; Sugiyama, Tomoyasu

2016-12-01

RNA interference (RNAi) screening is extensively used in the field of reverse genetics. RNAi libraries constructed using random oligonucleotides have made this technology affordable. However, the new methodology requires exploration of the RNAi target gene information after screening because the RNAi library includes non-natural sequences that are not found in genes. Here, we developed a web-based tool to support RNAi screening. The system performs short hairpin RNA (shRNA) target prediction that is informed by comprehensive enquiry (SPICE). SPICE automates several tasks that are laborious but indispensable to evaluate the shRNAs obtained by RNAi screening. SPICE has four main functions: (i) sequence identification of shRNA in the input sequence (the sequence might be obtained by sequencing clones in the RNAi library), (ii) searching the target genes in the database, (iii) demonstrating biological information obtained from the database, and (iv) preparation of search result files that can be utilized in a local personal computer (PC). Using this system, we demonstrated that genes targeted by random oligonucleotide-derived shRNAs were not different from those targeted by organism-specific shRNA. The system facilitates RNAi screening, which requires sequence analysis after screening. The SPICE web application is available at http://www.spice.sugysun.org/.
Personalized Oncology Through Integrative High-Throughput Sequencing: A Pilot Study

PubMed Central

Roychowdhury, Sameek; Iyer, Matthew K.; Robinson, Dan R.; Lonigro, Robert J.; Wu, Yi-Mi; Cao, Xuhong; Kalyana-Sundaram, Shanker; Sam, Lee; Balbin, O. Alejandro; Quist, Michael J.; Barrette, Terrence; Everett, Jessica; Siddiqui, Javed; Kunju, Lakshmi P.; Navone, Nora; Araujo, John C.; Troncoso, Patricia; Logothetis, Christopher J.; Innis, Jeffrey W.; Smith, David C.; Lao, Christopher D.; Kim, Scott Y.; Roberts, J. Scott; Gruber, Stephen B.; Pienta, Kenneth J.; Talpaz, Moshe; Chinnaiyan, Arul M.

2012-01-01

Individual cancers harbor a set of genetic aberrations that can be informative for identifying rational therapies currently available or in clinical trials. We implemented a pilot study to explore the practical challenges of applying high-throughput sequencing in clinical oncology. We enrolled patients with advanced or refractory cancer who were eligible for clinical trials. For each patient, we performed whole-genome sequencing of the tumor, targeted whole-exome sequencing of tumor and normal DNA, and transcriptome sequencing (RNA-Seq) of the tumor to identify potentially informative mutations in a clinically relevant time frame of 3 to 4 weeks. With this approach, we detected several classes of cancer mutations including structural rearrangements, copy number alterations, point mutations, and gene expression alterations. A multidisciplinary Sequencing Tumor Board (STB) deliberated on the clinical interpretation of the sequencing results obtained. We tested our sequencing strategy on human prostate cancer xenografts. Next, we enrolled two patients into the clinical protocol and were able to review the results at our STB within 24 days of biopsy. The first patient had metastatic colorectal cancer in which we identified somatic point mutations in NRAS, TP53, AURKA, FAS, and MYH11, plus amplification and overexpression of cyclin-dependent kinase 8 (CDK8). The second patient had malignant melanoma, in which we identified a somatic point mutation in HRAS and a structural rearrangement affecting CDKN2C. The STB identified the CDK8 amplification and Ras mutation as providing a rationale for clinical trials with CDK inhibitors or MEK (mitogen-activated or extracellular signal–regulated protein kinase kinase) and PI3K (phosphatidylinositol 3-kinase) inhibitors, respectively. Integrative high-throughput sequencing of patients with advanced cancer generates a comprehensive, individual mutational landscape to facilitate biomarker-driven clinical trials in oncology. PMID
45 CFR 73.735-803 - Prohibition against involvement in financial transactions based on information obtained through...

Code of Federal Regulations, 2010 CFR

2010-10-01

... 45 Public Welfare 1 2010-10-01 2010-10-01 false Prohibition against involvement in financial transactions based on information obtained through Federal employment. 73.735-803 Section 73.735-803 Public Welfare DEPARTMENT OF HEALTH AND HUMAN SERVICES GENERAL ADMINISTRATION STANDARDS OF CONDUCT Financial...
The European Classical Swine Fever Virus Database: Blueprint for a Pathogen-Specific Sequence Database with Integrated Sequence Analysis Tools

PubMed Central

Postel, Alexander; Schmeiser, Stefanie; Zimmermann, Bernd; Becher, Paul

2016-01-01

Molecular epidemiology has become an indispensable tool in the diagnosis of diseases and in tracing the infection routes of pathogens. Due to advances in conventional sequencing and the development of high throughput technologies, the field of sequence determination is in the process of being revolutionized. Platforms for sharing sequence information and providing standardized tools for phylogenetic analyses are becoming increasingly important. The database (DB) of the European Union (EU) and World Organisation for Animal Health (OIE) Reference Laboratory for classical swine fever offers one of the world’s largest semi-public virus-specific sequence collections combined with a module for phylogenetic analysis. The classical swine fever (CSF) DB (CSF-DB) became a valuable tool for supporting diagnosis and epidemiological investigations of this highly contagious disease in pigs with high socio-economic impacts worldwide. The DB has been re-designed and now allows for the storage and analysis of traditionally used, well established genomic regions and of larger genomic regions including complete viral genomes. We present an application example for the analysis of highly similar viral sequences obtained in an endemic disease situation and introduce the new geographic “CSF Maps” tool. The concept of this standardized and easy-to-use DB with an integrated genetic typing module is suited to serve as a blueprint for similar platforms for other human or animal viruses. PMID:27827988
WEB-server for search of a periodicity in amino acid and nucleotide sequences

NASA Astrophysics Data System (ADS)

Frenkel, F. E.; Skryabin, K. G.; Korotkov, E. V.

2017-12-01

A new web server (http://victoria.biengi.ac.ru/splinter/login.php) was designed and developed to search for periodicity in nucleotide and amino acid sequences. The web server's operation is based on a new mathematical method of searching for multiple alignments, which is founded on the optimization of position weight matrices and on the implementation of two-dimensional dynamic programming. This approach allows the construction of multiple alignments of weakly similar amino acid and nucleotide sequences that have accumulated more than 1.5 substitutions per amino acid or nucleotide, without performing pairwise sequence comparisons. The article examines the principles of the web server's operation and two examples of studying amino acid and nucleotide sequences, as well as the information that can be obtained using the web server.
Epidemiological information is key when interpreting whole genome sequence data – lessons learned from a large Legionella pneumophila outbreak in Warstein, Germany, 2013

PubMed Central

Petzold, Markus; Prior, Karola; Moran-Gilad, Jacob; Harmsen, Dag; Lück, Christian

2017-01-01

Introduction: Whole genome sequencing (WGS) is increasingly used in Legionnaires’ disease (LD) outbreak investigations, owing to its higher resolution than sequence-based typing, the gold standard typing method for Legionella pneumophila, in the analysis of endemic strains. Recently, a gene-by-gene typing approach based on 1,521 core genes called core genome multilocus sequence typing (cgMLST) was described that enables a robust and standardised typing of L. pneumophila. Methods: We applied this cgMLST scheme to isolates obtained during the largest outbreak of LD reported so far in Germany. In this outbreak, the epidemic clone ST345 had been isolated from patients and four different environmental sources. In total 42 clinical and environmental isolates were retrospectively typed. Results: Epidemiologically unrelated ST345 isolates were clearly distinguishable from the epidemic clone. Remarkably, epidemic isolates split up into two distinct clusters, ST345-A and ST345-B, each respectively containing a mix of clinical and epidemiologically-related environmental samples. Discussion/conclusion: The outbreak was therefore likely caused by both variants of the single sequence type, which pre-existed in the environmental reservoirs. The two clusters differed by 40 alleles located in two neighbouring genomic regions of ca 42 and 26 kb. Additional analysis supported horizontal gene transfer of the two regions as responsible for the difference between the variants. Both regions comprise virulence genes and have previously been reported to be involved in recombination events. This corroborates the notion that genomic outbreak investigations should always take epidemiological information into consideration when making inferences. Overall, cgMLST proved helpful in disentangling the complex genomic epidemiology of the outbreak. PMID:29162202
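A statement such as "the two clusters differed by 40 alleles" rests on a gene-by-gene comparison of allele profiles. The sketch below shows one simple way such an allele distance could be counted in Python. The profile layout (locus name mapped to an allele number, `None` for a missing call), the locus names, and the choice to skip missing loci are assumptions for illustration, not the scheme's actual software.

```python
def allele_distance(profile_a, profile_b):
    """Count loci with different allele numbers in two cgMLST-style profiles.

    Loci that are absent or uncalled (None) in either profile are skipped,
    which is one common convention; others treat them as differences.
    """
    shared = profile_a.keys() & profile_b.keys()
    return sum(
        1
        for locus in shared
        if profile_a[locus] is not None
        and profile_b[locus] is not None
        and profile_a[locus] != profile_b[locus]
    )


# Hypothetical three-locus profiles for two isolates.
isolate_1 = {"locus_1": 1, "locus_2": 5, "locus_3": None}
isolate_2 = {"locus_1": 1, "locus_2": 7, "locus_3": 2}
distance = allele_distance(isolate_1, isolate_2)
```

Here only `locus_2` counts as a difference, because `locus_3` has no call in the first isolate.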
NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins

PubMed Central

Pruitt, Kim D.; Tatusova, Tatiana; Maglott, Donna R.

2005-01-01

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff. PMID:15608248
LLNL Genomic Assessment: Viral and Bacterial Sequencing Needs for TMTI, Tier 1 Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Slezak, T; Borucki, M; Lenhoff, R

2009-09-29

The Lawrence Livermore National Lab Bioinformatics group has recently taken on a role in DTRA's Transformation Medical Technologies Initiative (TMTI). The high-level goal of TMTI is to accelerate the development of broad-spectrum countermeasures. To achieve those goals, TMTI has a near term need to obtain more sequence information across a large range of pathogens, near neighbors, and across a broad geographical and host range. Our role in this project is to research available sequence data for the organisms of interest and identify critical microbial sequence and knowledge gaps that need to be filled to meet TMTI objectives. This effort includes: (1) assessing current genomic sequence for each agent including phylogenetic and geographical diversity, host range, date of isolation range, virulence, sequence availability of key near neighbors, and other characteristics; (2) identifying Subject Matter Experts (SMEs) and potential holders of isolate collections, contacting appropriate SMEs with known expertise and isolate collections to obtain information on isolate availability and specific recommendations; (3) identifying sequence as well as knowledge gaps (e.g., virulence, host range, and antibiotic resistance determinants); (4) providing specific recommendations as to the most valuable strains to be placed on the DTRA sequencing queue. We acknowledge that criteria for prioritization of isolates for sequencing fall into two categories aligning with priority queues 1 and 2 as described in the summary. (Priority queue 0 relates to DTRA operational isolates whose availability is not predictable in advance.) 1. Selection of isolates that appear to have likelihood to provide information on virulence and antibiotic resistance. This will include sequence of known virulent strains. Particularly valuable would be virulent strains that have genetically similar yet avirulent, or non human transmissible, counterparts that can be used for comparison to help identify key
Single-virion sequencing of lamivudine-treated HBV populations reveal population evolution dynamics and demographic history.

PubMed

Zhu, Yuan O; Aw, Pauline P K; de Sessions, Paola Florez; Hong, Shuzhen; See, Lee Xian; Hong, Lewis Z; Wilm, Andreas; Li, Chen Hao; Hue, Stephane; Lim, Seng Gee; Nagarajan, Niranjan; Burkholder, William F; Hibberd, Martin

2017-10-27

Viral populations are complex, dynamic, and fast evolving. The evolution of groups of closely related viruses in a competitive environment is termed quasispecies. To fully understand the role that quasispecies play in viral evolution, characterizing the trajectories of viral genotypes in an evolving population is the key. In particular, long-range haplotype information for thousands of individual viruses is critical; yet generating this information is non-trivial. Popular deep sequencing methods generate relatively short reads that do not preserve linkage information, while third generation sequencing methods have higher error rates that make detection of low frequency mutations a bioinformatics challenge. Here we applied BAsE-Seq, an Illumina-based single-virion sequencing technology, to eight samples from four chronic hepatitis B (CHB) patients - once before antiviral treatment and once after viral rebound due to resistance. With single-virion sequencing, we obtained 248-8796 single-virion sequences per sample, which allowed us to find evidence for both hard and soft selective sweeps. We were able to reconstruct population demographic history that was independently verified by clinically collected data. We further verified four of the samples independently through PacBio SMRT and Illumina Pooled deep sequencing. Overall, we showed that single-virion sequencing yields insight into viral evolution and population dynamics in an efficient and high throughput manner. We believe that single-virion sequencing is widely applicable to the study of viral evolution in the context of drug resistance and host adaptation, allows differentiation between soft or hard selective sweeps, and may be useful in the reconstruction of intra-host viral population demographic history.
Informational and linguistic analysis of large genomic sequence collections via efficient Hadoop cluster algorithms.

PubMed

Ferraro Petrillo, Umberto; Roscigno, Gianluca; Cattaneo, Giuseppe; Giancarlo, Raffaele

2018-06-01

Information theoretic and compositional/linguistic analysis of genomes have a central role in bioinformatics, even more so since the associated methodologies are becoming very valuable also for epigenomic and meta-genomic studies. The kernel of those methods is based on the collection of k-mer statistics, i.e. how many times each k-mer in {A,C,G,T}^k occurs in a DNA sequence. Although this problem is computationally very simple and efficiently solvable on a conventional computer, the sheer amount of data now available in applications demands resorting to parallel and distributed computing. Indeed, those types of algorithms have been developed to collect k-mer statistics in the realm of genome assembly. However, they are so specialized to this domain that they do not extend easily to the computation of informational and linguistic indices, concurrently on sets of genomes. Following the well-established approach in many disciplines, and with a growing success also in bioinformatics, to resort to MapReduce and Hadoop to deal with 'Big Data' problems, we present KCH, the first set of MapReduce algorithms able to perform concurrently informational and linguistic analysis of large collections of genomic sequences on a Hadoop cluster. The benchmarking of KCH that we provide indicates that it is quite effective and versatile. It is also competitive with respect to the parallel and distributed algorithms highly specialized to k-mer statistics collection for genome assembly problems. In conclusion, KCH is a much needed addition to the growing number of algorithms and tools that use MapReduce for bioinformatics core applications. The software, including instructions for running it over Amazon AWS, as well as the datasets are available at http://www.di-srv.unisa.it/KCH. umberto.ferraro@uniroma1.it. Supplementary data are available at Bioinformatics online.
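The k-mer statistics at the core of the abstract above are easy to state in code. The following single-machine Python sketch is not KCH and does not use Hadoop; it only mirrors the map step (count k-mers per sequence) and the reduce step (merge the counts) under the assumption that sequences fit in memory. The function names are illustrative.

```python
from collections import Counter
from functools import reduce


def count_kmers(seq, k):
    """Map step: count every overlapping k-mer in one sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def merge_counts(counters):
    """Reduce step: sum per-sequence k-mer counts into one table."""
    return reduce(lambda left, right: left + right, counters, Counter())


# Two toy sequences counted separately, then merged.
per_sequence = [count_kmers("ACGTAC", 2), count_kmers("CGT", 2)]
merged = merge_counts(per_sequence)
```

Keeping the per-sequence counters as well as the merged table matters when indices are computed for each genome in a collection rather than for the pooled data.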
Predicting Protein-Protein Interactions by Combing Various Sequence-Derived.

PubMed

Zhao, Xiao-Wei; Ma, Zhi-Qiang; Yin, Ming-Hao

2011-09-20

Knowledge of protein-protein interactions (PPIs) plays an important role in constructing protein interaction networks and understanding the general machineries of biological systems. In this study, a new method is proposed to predict PPIs using a comprehensive set of 930 features based only on sequence information; these features measure, from different aspects, the interactions between residues a certain distance apart in the protein sequences. To achieve better performance, principal component analysis (PCA) is first employed to obtain an optimized feature subset. Then, the resulting 67-dimensional feature vectors are fed to a Support Vector Machine (SVM). Experimental results on Drosophila melanogaster and Helicobacter pylori datasets show that our method is very promising for predicting PPIs and may at least be a useful supplementary tool to existing methods.
25 CFR 162.539 - Must I obtain a WEEL before obtaining a WSR lease?

Code of Federal Regulations, 2014 CFR

2014-04-01

... AND PERMITS Wind and Solar Resource Leases Wsr Leases § 162.539 Must I obtain a WEEL before obtaining... direct result of energy resource information gathered from a WEEL activity, obtaining a WEEL is not a...
25 CFR 162.539 - Must I obtain a WEEL before obtaining a WSR lease?

Code of Federal Regulations, 2013 CFR

2013-04-01

... AND PERMITS Wind and Solar Resource Leases Wsr Leases § 162.539 Must I obtain a WEEL before obtaining... direct result of energy resource information gathered from a WEEL activity, obtaining a WEEL is not a...
Counting Patterns in Degenerated Sequences

NASA Astrophysics Data System (ADS)

Nuel, Grégory

Biological sequences like DNA or proteins are always obtained through a sequencing process which might produce some uncertainty. As a result, such sequences are usually written in a degenerated alphabet where some symbols may correspond to several possible letters (e.g., the IUPAC DNA alphabet). When counting patterns in such degenerated sequences, the question that naturally arises is: how to deal with degenerated positions? Since most (usually 99%) of the positions are not degenerated, it is considered harmless to discard the degenerated positions in order to get an observation, but the exact consequences of such a practice are unclear. In this paper, we introduce a rigorous method to take into account the uncertainty of sequencing for biological sequences (DNA, proteins). We first introduce a Forward-Backward approach to compute the marginal distribution of the constrained sequence and use it both to perform an Expectation-Maximization estimation of parameters and to derive a heterogeneous Markov distribution for the constrained sequence. This distribution is then used along with known DFA-based pattern approaches to obtain the exact distribution of the pattern count under the constraints. As an illustration, we consider an EST dataset from the EMBL database. Despite the fact that only 1% of the positions in this dataset are degenerated, we show that not taking into account these positions might lead to erroneous observations, further proving the interest of our approach.
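To make the problem in the abstract above concrete, the Python sketch below contrasts two naive treatments of IUPAC-degenerate DNA. It is not the Forward-Backward/Markov method of the paper: it assumes, purely for illustration, that each degenerate symbol is uniformly distributed over its possible letters and that positions are independent. The function names are hypothetical.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expected_count(seq, pattern):
    """Expected occurrences of `pattern` under the uniform, independent
    interpretation of degenerate symbols (an illustrative assumption)."""
    m = len(pattern)
    total = 0.0
    for start in range(len(seq) - m + 1):
        prob = 1.0
        for symbol, letter in zip(seq[start:start + m], pattern):
            options = IUPAC[symbol]
            # Probability that this position reads as the pattern letter.
            prob *= 1.0 / len(options) if letter in options else 0.0
        total += prob
    return total


def count_ignoring_degenerate(seq, pattern):
    """Count only exact matches, so windows with degenerate symbols never match."""
    m = len(pattern)
    return sum(1 for s in range(len(seq) - m + 1) if seq[s:s + m] == pattern)
```

For the toy sequence `"ANG"` and pattern `"AG"`, discarding the degenerate position gives a count of 0, whereas the expectation under the uniform assumption is 0.5. The two treatments can therefore disagree even on very short inputs.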
Unlocking hidden genomic sequence

PubMed Central

Keith, Jonathan M.; Cochran, Duncan A. E.; Lala, Gita H.; Adams, Peter; Bryant, Darryn; Mitchelson, Keith R.

2004-01-01

Despite the success of conventional Sanger sequencing, significant regions of many genomes still present major obstacles to sequencing. Here we propose a novel approach with the potential to alleviate a wide range of sequencing difficulties. The technique involves extracting target DNA sequence from variants generated by introduction of random mutations. The introduction of mutations does not destroy original sequence information, but distributes it amongst multiple variants. Some of these variants lack problematic features of the target and are more amenable to conventional sequencing. The technique has been successfully demonstrated with mutation levels up to an average 18% base substitution and has been used to read previously intractable poly(A), AT-rich and GC-rich motifs. PMID:14973330

Whole Genome Amplification and Reduced-Representation Genome Sequencing of Schistosoma japonicum Miracidia

PubMed Central

Shortt, Jonathan A.; Card, Daren C.; Schield, Drew R.; Liu, Yang; Zhong, Bo; Castoe, Todd A.

2017-01-01

Background In areas where schistosomiasis control programs have been implemented, morbidity and prevalence have been greatly reduced. However, to sustain these reductions and move towards interruption of transmission, new tools for disease surveillance are needed. Genomic methods have the potential to help trace the sources of new infections, and allow us to monitor drug resistance. Large-scale genotyping efforts for schistosome species have been hindered by cost, limited numbers of established target loci, and the small amount of DNA obtained from miracidia, the life stage most readily acquired from humans. Here, we present a method using next generation sequencing to provide high-resolution genomic data from S. japonicum for population-based studies. Methodology/Principal Findings We applied whole genome amplification followed by double digest restriction site associated DNA sequencing (ddRADseq) to individual S. japonicum miracidia preserved on Whatman FTA cards. We found that we could effectively and consistently survey hundreds of thousands of variants from 10,000 to 30,000 loci from archived miracidia as old as six years. An analysis of variation from eight miracidia obtained from three hosts in two villages in Sichuan showed clear population structuring by village and host even within this limited sample. Conclusions/Significance This high-resolution sequencing approach yields three orders of magnitude more information than microsatellite genotyping methods that have been employed over the last decade, creating the potential to answer detailed questions about the sources of human infections and to monitor drug resistance. Costs per sample range from $50-$200, depending on the amount of sequence information desired, and we expect these costs can be reduced further given continued reductions in sequencing costs, improvement of protocols, and parallelization. This approach provides new promise for using modern genome-scale sampling to S. japonicum surveillance
Whole Genome Amplification and Reduced-Representation Genome Sequencing of Schistosoma japonicum Miracidia.

PubMed

Shortt, Jonathan A; Card, Daren C; Schield, Drew R; Liu, Yang; Zhong, Bo; Castoe, Todd A; Carlton, Elizabeth J; Pollock, David D

2017-01-01

In areas where schistosomiasis control programs have been implemented, morbidity and prevalence have been greatly reduced. However, to sustain these reductions and move towards interruption of transmission, new tools for disease surveillance are needed. Genomic methods have the potential to help trace the sources of new infections, and allow us to monitor drug resistance. Large-scale genotyping efforts for schistosome species have been hindered by cost, limited numbers of established target loci, and the small amount of DNA obtained from miracidia, the life stage most readily acquired from humans. Here, we present a method using next generation sequencing to provide high-resolution genomic data from S. japonicum for population-based studies. We applied whole genome amplification followed by double digest restriction site associated DNA sequencing (ddRADseq) to individual S. japonicum miracidia preserved on Whatman FTA cards. We found that we could effectively and consistently survey hundreds of thousands of variants from 10,000 to 30,000 loci from archived miracidia as old as six years. An analysis of variation from eight miracidia obtained from three hosts in two villages in Sichuan showed clear population structuring by village and host even within this limited sample. This high-resolution sequencing approach yields three orders of magnitude more information than microsatellite genotyping methods that have been employed over the last decade, creating the potential to answer detailed questions about the sources of human infections and to monitor drug resistance. Costs per sample range from $50-$200, depending on the amount of sequence information desired, and we expect these costs can be reduced further given continued reductions in sequencing costs, improvement of protocols, and parallelization. This approach provides new promise for using modern genome-scale sampling to S. japonicum surveillance, and could be applied to other schistosome species and other
Comparison of DNA Microarray, Loop-Mediated Isothermal Amplification (LAMP) and Real-Time PCR with DNA Sequencing for Identification of Fusarium spp. Obtained from Patients with Hematologic Malignancies.

PubMed

de Souza, Marcela; Matsuzawa, Tetsuhiro; Sakai, Kanae; Muraosa, Yasunori; Lyra, Luzia; Busso-Lopes, Ariane Fidelis; Levin, Anna Sara Shafferman; Schreiber, Angélica Zaninelli; Mikami, Yuzuru; Gonoi, Tohoru; Kamei, Katsuhiko; Moretti, Maria Luiza; Trabasso, Plínio

2017-08-01

The performance of three molecular biology techniques, i.e., DNA microarray, loop-mediated isothermal amplification (LAMP), and real-time PCR, was compared with DNA sequencing for the proper identification of 20 isolates of Fusarium spp. obtained from the bloodstream as the etiologic agent of invasive infections in patients with hematologic malignancies. DNA microarray, LAMP and real-time PCR identified 16 (80%) out of 20 samples as Fusarium solani species complex (FSSC) and four (20%) as Fusarium spp. The agreement among the techniques was 100%. LAMP exhibited 100% specificity, while DNA microarray, LAMP and real-time PCR showed 100% sensitivity. The three techniques had 100% agreement with DNA sequencing. Sixteen isolates were identified as FSSC by sequencing: five Fusarium keratoplasticum, nine Fusarium petroliphilum and two Fusarium solani. On the other hand, sequencing identified four isolates as Fusarium non-solani species complex (FNSSC): three Fusarium napiforme and one Fusarium oxysporum. Finally, LAMP proved to be faster and more accessible than DNA microarray and real-time PCR, since it does not require a thermocycler. Therefore, LAMP emerges as a promising methodology for routine identification of Fusarium spp. among cases of invasive fungal infections.
Complete genome sequence of Southern tomato virus naturally infecting tomatoes in Bangladesh using small RNA deep sequencing

USDA-ARS's Scientific Manuscript database

The complete genome sequence of a Southern tomato virus (STV) isolate on tomato plants in a seed production field in Bangladesh was obtained for the first time using next generation sequencing. The identified isolate STV_BD-13 shares a high degree of sequence identity (99%) with several known STV isol...
Validation of Splicing Events in Transcriptome Sequencing Data

PubMed Central

Kaisers, Wolfgang; Ptok, Johannes; Schwender, Holger; Schaal, Heiner

2017-01-01

Genomic alignments of sequenced cellular messenger RNA contain gapped alignments which are interpreted as a consequence of intron removal. The resulting gap-sites, genomic locations of alignment gaps, are landmarks representing potential splice-sites. As alignment algorithms report gap-sites with a considerable false discovery rate, validations are required. We describe two quality scores, gap quality score (gqs) and weighted gap information score (wgis), developed for validation of putative splicing events: while gqs relies solely on alignment data, wgis additionally considers information from the genomic sequence. FASTQ files obtained from 54 human dermal fibroblast samples were aligned against the human genome (GRCh38) using TopHat and STAR aligner. Statistical properties of gap-sites validated by gqs and wgis were evaluated by their sequence similarity to known exon-intron borders. Within the 54 samples, TopHat identifies 1,000,380 and STAR reports 6,487,577 gap-sites. Due to the lack of strand information, however, the percentage of identified GT-AG gap-sites is rather low. While gap-sites from TopHat contain ≈89% GT-AG, gap-sites from STAR only contain ≈42% GT-AG dinucleotide pairs in merged data from 54 fibroblast samples. Validation with gqs yields 156,251 gap-sites from TopHat alignments and 166,294 from STAR alignments. Validation with wgis yields 770,327 gap-sites from TopHat alignments and 1,065,596 from STAR alignments. Both alignment algorithms, TopHat and STAR, report gap-sites with a considerable false discovery rate, which can be drastically reduced by validation with gqs and wgis. PMID:28545234
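The GT-AG percentages quoted in this abstract can be illustrated with a minimal sketch. It is not the authors' gqs or wgis score, which the abstract does not define; it only counts how many gap-sites are flanked by the canonical dinucleotide pair. The function names, the 0-based half-open coordinates, the toy genome, and the decision to also accept CT...AC (the same motif seen from the reverse strand) are all assumptions made for illustration.

```python
def is_canonical(genome: str, start: int, end: int) -> bool:
    """True if the gap [start, end) begins with GT and ends with AG.

    CT...AC is accepted as well, because it is GT...AG read from the
    reverse strand (an assumption of this sketch, not of the paper).
    """
    donor = genome[start:start + 2].upper()
    acceptor = genome[end - 2:end].upper()
    return (donor, acceptor) in {("GT", "AG"), ("CT", "AC")}


def canonical_fraction(genome: str, gap_sites: list[tuple[int, int]]) -> float:
    """Fraction of gap-sites flanked by a canonical dinucleotide pair."""
    if not gap_sites:
        return 0.0
    hits = sum(is_canonical(genome, s, e) for s, e in gap_sites)
    return hits / len(gap_sites)


# Toy genome (hypothetical): one canonical gap and one non-canonical gap.
genome = "AAAA" + "GTCCCCAG" + "TTTT" + "GGCCCCTT" + "AAAA"
gap_sites = [(4, 12), (16, 24)]
fraction = canonical_fraction(genome, gap_sites)
```

Dropping the `("CT", "AC")` entry gives a strand-unaware count, which would be expected to report a lower percentage when strand information is missing.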
Concordance of the ForenSeq™ system and characterisation of sequence-specific autosomal STR alleles across two major population groups.

PubMed

Devesse, Laurence; Ballard, David; Davenport, Lucinda; Riethorst, Immy; Mason-Buck, Gabriella; Syndercombe Court, Denise

2018-05-01

By using sequencing technology to genotype loci of forensic interest it is possible to simultaneously target autosomal, X and Y STRs as well as identity, ancestry and phenotypic informative SNPs, resulting in a breadth of data obtained from a single run that is considerable when compared to that generated with standard technologies. It is important, however, that this information aligns with the genotype data currently obtained using commercially available kits for CE-based investigations such that results are compatible with existing databases and hence can be of use to the forensic community. In this work, 400 samples were typed using commercially available STR kits and CE, as well as using the Illumina ForenSeq™ DNA Signature Prep Kit and MiSeq® FGx to assess concordance of autosomal STRs and population variability. Results show a concordance rate between the two technologies exceeding 99.98% while numerous novel sequence based alleles are described. In order to make use of the sequence variation observed, sequence specific allele frequencies were generated for White British and British Chinese populations. Copyright © 2017 Elsevier B.V. All rights reserved.
Extracting Both Peptide Sequence and Glycan Structural Information by 157 nm Photodissociation of N-Linked Glycopeptides

PubMed Central

Zhang, Liangyi; Reilly, James P.

2009-01-01

157 nm photodissociation of N-linked glycopeptides was investigated in MALDI tandem time-of-flight (TOF) and linear ion trap mass spectrometers. Singly-charged glycopeptides yielded abundant peptide and glycan fragments. The peptide fragments included a series of x-, y-, v- and w- ions with the glycan remaining intact. These provide information about the peptide sequence and the glycosylation site. In addition to glycosidic fragments, abundant cross-ring glycan fragments that are not observed in low-energy CID were detected. These fragments provide insight into the glycan sequence and linkages. Doubly-charged glycopeptides generated by nanospray in the linear ion trap mass spectrometer also yielded peptide and glycan fragments. However, the former were dominated by low-energy fragments such as b- and y- type ions while glycan was primarily cleaved at glycosidic bonds. PMID:19113943
Reporting Differences Between Spacecraft Sequence Files

NASA Technical Reports Server (NTRS)

Khanampompan, Teerapat; Gladden, Roy E.; Fisher, Forest W.

2010-01-01

A suite of computer programs, called seq diff suite, reports differences between the products of other computer programs involved in the generation of sequences of commands for spacecraft. These products consist of files of several types: replacement sequence of events (RSOE), DSN keyword file [DKF (wherein DSN signifies Deep Space Network)], spacecraft activities sequence file (SASF), spacecraft sequence file (SSF), and station allocation file (SAF). These products can include line numbers, request identifications, and other pieces of information that are not relevant when generating command sequence products, though these fields can result in the appearance of many changes to the files, particularly when using the UNIX diff command to inspect file differences. The outputs of prior software tools for reporting differences between such products include differences in these non-relevant pieces of information. In contrast, seq diff suite removes the fields containing the irrelevant pieces of information before processing to extract differences, so that only relevant differences are reported. Thus, seq diff suite is especially useful for reporting changes between successive versions of the various products and in particular for flagging differences in fields relevant to the sequence command generation and review process.
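The idea of removing irrelevant fields before differencing can be sketched as follows. This is not seq diff suite itself; the record layout (a leading line number, then a request ID of the form `REQ-...`, then the command text) is a hypothetical stand-in, since the real RSOE, SASF and SSF formats are not described here. Only the Python standard library `difflib` and `re` modules are used.

```python
import difflib
import re

# Hypothetical record layout: "<line number> <request id> <command text>".
# The real file formats differ; this pattern is an assumption.
IRRELEVANT = re.compile(r"^\d+\s+REQ-\w+\s+")


def strip_irrelevant(lines):
    """Drop the leading line-number and request-ID fields from each record."""
    return [IRRELEVANT.sub("", line) for line in lines]


def relevant_diff(old, new):
    """Added/removed records after the irrelevant fields are stripped."""
    diff = difflib.unified_diff(
        strip_irrelevant(old), strip_irrelevant(new), lineterm="", n=0
    )
    # Keep '+' and '-' records; skip the '---', '+++' and '@@' header lines.
    # (A record whose own text starts with '--' would also be skipped.)
    return [d for d in diff if d[0] in "+-" and not d.startswith(("---", "+++"))]


old = ["1 REQ-A1 POWER_ON heater", "2 REQ-A2 SLEW 10deg"]
new = ["7 REQ-B9 POWER_ON heater", "8 REQ-C3 SLEW 12deg"]
changes = relevant_diff(old, new)
```

A plain line-by-line diff of `old` and `new` would flag both records because every line number and request ID changed; after stripping, only the `SLEW` command differs.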
Taxonomic and functional assignment of cloned sequences from high Andean forest soil metagenome.

PubMed

Montaña, José Salvador; Jiménez, Diego Javier; Hernández, Mónica; Angel, Tatiana; Baena, Sandra

2012-02-01

Total metagenomic DNA was isolated from high Andean forest soil and subjected to taxonomical and functional composition analyses by means of clone library generation and sequencing. The obtained yield of 1.7 μg of DNA/g of soil was used to construct a metagenomic library of approximately 20,000 clones (in the plasmid p-Bluescript II SK+) with an average insert size of 4 Kb, covering 80 Mb of the total metagenomic DNA. Metagenomic sequences near the plasmid cloning site were sequenced and then trimmed and assembled, obtaining 299 reads and 31 contigs (0.3 Mb). Taxonomic assignment of total sequences was performed by BLASTX, resulting in 68.8, 44.8 and 24.5% classification into taxonomic groups using the metagenomic RAST server v2.0, WebCARMA v1.0 online system and MetaGenome Analyzer v3.8 software, respectively. Most clone sequences were classified as Bacteria belonging to phyla Actinobacteria, Proteobacteria and Acidobacteria. Among the most represented orders were Actinomycetales (34% average), Rhizobiales, Burkholderiales and Myxococcales, with a greater number of sequences in the genus Mycobacterium (7% average), Frankia, Streptomyces and Bradyrhizobium. The vast majority of sequences were associated with the metabolism of carbohydrates, proteins, lipids and catalytic functions, such as phosphatases, glycosyltransferases, dehydrogenases, methyltransferases, dehydratases and epoxide hydrolases. In this study we compared different methods of taxonomic and functional assignment of metagenomic clone sequences to evaluate microbial diversity in an unexplored soil ecosystem, searching for putative enzymes of biotechnological interest and generating important information for further functional screening of clone libraries.
Non-invasive method to obtain DNA from freshwater mussels (Bivalvia: Unionidae)

USGS Publications Warehouse

Henley, W.F.; Grobler, P.J.; Neves, R.J.

2006-01-01

To determine whether DNA could be isolated from tissues obtained by brush-swabbing the mantle, viscera and foot, mantle-clips and swabbed cells were obtained from eight Quadrula pustulosa (Lea, 1831). DNA yields from clips and swabbings were 447.0 and 975.3 ng/μL, respectively. Furthermore, comparisons of sequences from the ND-1 mitochondrial gene region showed a 100% sequence agreement of DNA from cells obtained by clips and swabs. To determine the number of swabs needed to obtain adequate yields of DNA for analyses, the viscera and feet of 5 Q. pustulosa each were successively swabbed 2, 4 and 6 times. DNA yields from the 2, 4 and 6 swabbed mussel groups were 399.4, 833.8 and 852.6 ng/μL, respectively. ND-1 sequences from the lowest yield still provided 846-901 bp for the ND-1 region. Nevertheless, to ensure adequate DNA yield from cell samples obtained by swabbing, we recommend that 4 swab-strokes of the viscera and foot be obtained. The use of integumental swabbing for collection of cells for determination of genetic relationships among freshwater mussels is noninvasive, when compared with tissue collection by mantle-clipping. Therefore, its use is recommended for freshwater mussels, especially state-protected or federally listed mussel species.
Entropic fluctuations in DNA sequences

NASA Astrophysics Data System (ADS)

Thanos, Dimitrios; Li, Wentian; Provata, Astero

2018-03-01

The Local Shannon Entropy (LSE) in blocks is used as a complexity measure to study the information fluctuations along DNA sequences. The LSE of a DNA block maps the local base arrangement information to a single numerical value. It is shown that despite this reduction of information, LSE allows the extraction of meaningful information related to the detection of repetitive sequences in whole chromosomes and is useful in finding evolutionary differences between organisms. More specifically, large regions of tandem repeats, such as centromeres, can be detected based on their low LSE fluctuations along the chromosome. Furthermore, an empirical investigation of the appropriate block sizes is provided and the relationship of LSE properties with the structure of the underlying repetitive units is revealed by using both computational and mathematical methods. Sequence similarity between the genomic DNA of closely related species also leads to similar LSE values at the orthologous regions. As an application, the LSE covariance function is used to measure the evolutionary distance between several primate genomes.
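A block-wise Shannon entropy profile of the kind described here can be sketched in a few lines. The abstract does not say whether the authors compute entropy over single bases or longer words, nor whether blocks overlap; the version below assumes single-base frequencies in non-overlapping blocks, and the function names and toy sequence are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(block: str) -> float:
    """Shannon entropy (bits) of the base frequencies in one block."""
    counts = Counter(block)
    total = len(block)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def local_entropy(sequence: str, block_size: int) -> list[float]:
    """Entropy of each consecutive, non-overlapping block of the sequence."""
    last_start = len(sequence) - block_size
    return [
        shannon_entropy(sequence[i:i + block_size])
        for i in range(0, last_start + 1, block_size)
    ]


# Toy sequence: a homopolymer run, a four-base mix, and a two-base repeat.
sequence = "AAAAAAAA" + "ACGTACGT" + "ATATATAT"
profile = local_entropy(sequence, 8)
```

With four equally frequent bases the entropy reaches its maximum of 2 bits, while a single repeated base gives 0 bits, so long runs of identical values in the profile point to repetitive regions.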
Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus) Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms

PubMed Central

Bertolini, Francesca; Scimone, Concetta; Geraci, Claudia; Schiavo, Giuseppina; Utzeri, Valerio Joe; Chiofalo, Vincenzo; Fontanesi, Luca

2015-01-01

Few studies have investigated the donkey (Equus asinus) at the whole genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer) and compared the obtained sequence information with the available donkey draft genome (and the Illumina reads from which it originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than the number aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~ 0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionarily important in equids. Comparing Y-chromosome regions we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated combining sequencing data from Ion Proton (whole genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources. PMID:26151450
Tidying Up International Nucleotide Sequence Databases: Ecological, Geographical and Sequence Quality Annotation of ITS Sequences of Mycorrhizal Fungi

PubMed Central

Tedersoo, Leho; Abarenkov, Kessy; Nilsson, R. Henrik; Schüssler, Arthur; Grelet, Gwen-Aëlle; Kohout, Petr; Oja, Jane; Bonito, Gregory M.; Veldre, Vilmar; Jairus, Teele; Ryberg, Martin; Larsson, Karl-Henrik; Kõljalg, Urmas

2011-01-01

Sequence analysis of the ribosomal RNA operon, particularly the internal transcribed spacer (ITS) region, provides a powerful tool for identification of mycorrhizal fungi. The sequence data deposited in the International Nucleotide Sequence Databases (INSD) are, however, unfiltered for quality and are often poorly annotated with metadata. To detect chimeric and low-quality sequences and assign the ectomycorrhizal fungi to phylogenetic lineages, fungal ITS sequences were downloaded from INSD, aligned within family-level groups, and examined through phylogenetic analyses and BLAST searches. By combining the fungal sequence database UNITE and the annotation and search tool PlutoF, we also added metadata from the literature to these accessions. Altogether 35,632 sequences belonged to mycorrhizal fungi or originated from ericoid and orchid mycorrhizal roots. Of these sequences, 677 were considered chimeric and 2,174 of low read quality. Information detailing country of collection, geographical coordinates, interacting taxon and isolation source were supplemented to cover 78.0%, 33.0%, 41.7% and 96.4% of the sequences, respectively. These annotated sequences are publicly available via UNITE (http://unite.ut.ee/) for downstream biogeographic, ecological and taxonomic analyses. In European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/), the annotated sequences have a special link-out to UNITE. We intend to expand the data annotation to additional genes and all taxonomic groups and functional guilds of fungi. PMID:21949797
Feasibility of Obtaining Quantitative 3-Dimensional Information Using Conventional Endoscope: A Pilot Study

PubMed Central

Hyun, Jong Jin; Keum, Bora; Seo, Yeon Seok; Kim, Yong Sik; Jeen, Yoon Tae; Lee, Hong Sik; Um, Soon Ho; Kim, Chang Duck; Ryu, Ho Sang; Lim, Jong-Wook; Woo, Dong-Gi; Kim, Young-Joong; Lim, Myo-Taeg

2012-01-01

Background/Aims Three-dimensional (3D) imaging is gaining popularity and has been partly adopted in laparoscopic surgery or robotic surgery but has not been applied to gastrointestinal endoscopy. As a first step, we conducted an experiment to evaluate whether images obtained by conventional gastrointestinal endoscopy could be used to acquire quantitative 3D information. Methods Two endoscopes (GIF-H260) were used in a Borrmann type I tumor model made of clay. The endoscopes were calibrated by correcting the barrel distortion and perspective distortion. Obtained images were converted to gray-level image, and the characteristics of the images were obtained by edge detection. Finally, data on 3D parameters were measured by using epipolar geometry, two view geometry, and pinhole camera model. Results The focal length (f) of endoscope at 30 mm was 258.49 pixels. Two endoscopes were fixed at predetermined distance, 12 mm (d12). After matching and calculating disparity (v2-v1), which was 106 pixels, the calculated length between the camera and object (L) was 29.26 mm. The height of the object projected onto the image (h) was then applied to the pinhole camera model, and the result of H (height and width) was 38.21 mm and 41.72 mm, respectively. Measurements were conducted from 2 different locations. The measurement errors ranged from 2.98% to 7.00% with the current Borrmann type I tumor model. Conclusions It was feasible to obtain parameters necessary for 3D analysis and to apply the data to epipolar geometry with conventional gastrointestinal endoscope to calculate the size of an object. PMID:22977798
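The distance reported in this abstract follows from the standard two-view relation L = f · d / disparity, and the object size from the pinhole relation H = h · L / f. The sketch below reproduces the distance from the reported focal length, baseline and disparity; the function names are hypothetical, and because the abstract does not report the projected size h in pixels, the value used for it is a made-up placeholder.

```python
def depth_from_disparity(focal_px: float, baseline_mm: float,
                         disparity_px: float) -> float:
    """Camera-to-object distance (mm) from two-view geometry."""
    return focal_px * baseline_mm / disparity_px


def object_size(image_size_px: float, depth_mm: float,
                focal_px: float) -> float:
    """Real object size (mm) from its projected size via the pinhole model."""
    return image_size_px * depth_mm / focal_px


# Values reported in the abstract.
focal_px = 258.49      # focal length f at 30 mm, in pixels
baseline_mm = 12.0     # distance between the two endoscopes
disparity_px = 106.0   # v2 - v1

distance_mm = depth_from_disparity(focal_px, baseline_mm, disparity_px)

# Hypothetical projected height; the abstract does not give h in pixels.
example_h_px = 100.0
example_height_mm = object_size(example_h_px, distance_mm, focal_px)
```

Rounded to two decimals, `distance_mm` agrees with the 29.26 mm quoted in the Results.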
40 CFR 2.309 - Special rules governing certain information obtained under the Marine Protection, Research and...

Code of Federal Regulations, 2010 CFR

2010-07-01

... information obtained under the Marine Protection, Research and Sanctuaries Act of 1972. 2.309 Section 2.309... Protection, Research and Sanctuaries Act of 1972. (a) Definitions. For the purposes of this section: (1) Act means the Marine Protection, Research and Sanctuaries Act of 1972, 33 U.S.C. 1401 et seq. (2) Permit...
Using hidden Markov models to align multiple sequences.

PubMed

Mount, David W

2009-07-01

A hidden Markov model (HMM) is a probabilistic model of a multiple sequence alignment (msa) of proteins. In the model, each column of symbols in the alignment is represented by a frequency distribution of the symbols (called a "state"), and insertions and deletions are represented by other states. One moves through the model along a particular path from state to state in a Markov chain (i.e., random choice of next move), trying to match a given sequence. The next matching symbol is chosen from each state, recording its probability (frequency) and also the probability of going to that state from a previous one (the transition probability). State and transition probabilities are multiplied to obtain a probability of the given sequence. The hidden nature of the HMM is due to the lack of information about the value of a specific state, which is instead represented by a probability distribution over all possible values. This article discusses the advantages and disadvantages of HMMs in msa and presents algorithms for calculating an HMM and the conditions for producing the best HMM.
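The multiplication of state and transition probabilities along one path can be sketched as below. This is a simplified illustration, not a full profile HMM: it uses only match states, ignores insert and delete states and any end-state transition, and the state names and probability tables are hypothetical.

```python
def path_probability(sequence, path, emissions, transitions, start="begin"):
    """Probability of emitting `sequence` along one given state path.

    Each step multiplies the transition probability into the next state
    by the probability of that state emitting the next symbol.
    """
    if len(sequence) != len(path):
        raise ValueError("sequence and path must have the same length")
    prob = 1.0
    previous = start
    for symbol, state in zip(sequence, path):
        prob *= transitions[previous][state] * emissions[state][symbol]
        previous = state
    return prob


# Hypothetical two-column model over the alphabet {A, C}.
emissions = {
    "M1": {"A": 0.8, "C": 0.2},
    "M2": {"A": 0.1, "C": 0.9},
}
transitions = {
    "begin": {"M1": 1.0},
    "M1": {"M2": 0.9},
}
p = path_probability("AC", ["M1", "M2"], emissions, transitions)
```

Here the path contributes 1.0 × 0.8 for the first column and 0.9 × 0.9 for the second. For long sequences, implementations usually sum log-probabilities instead to avoid numerical underflow.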
Analysis of endoscopic third ventriculostomy patency by MRI: value of different pulse sequences, the sequence parameters, and the imaging planes for investigation of flow void.

PubMed

Dinçer, Alp; Yildiz, Erdem; Kohan, Saeed; Memet Özek, M

2011-01-01

The aim of the study is to evaluate the efficiency of turbo spin-echo (TSE), three-dimensional constructive interference in the steady state (3D CISS) and cine phase contrast (Cine PC) sequences in determining flow through the endoscopic third ventriculostomy (ETV) fenestration, and to determine the effect of various TSE sequence parameters. The study was approved by our institutional review board and informed consent from all patients was obtained. Two groups of patients were included: group I (24 patients with good clinical outcome after ETV) and group II (22 patients with hydrocephalus evaluated preoperatively). The imaging protocol for both groups was identical. TSE T2 with various sequence parameters and imaging planes, and 3D CISS, followed by cine PC were obtained. Flow void was graded on a four-point scale. The sensitivity, specificity, accuracy, positive and negative predictive values of sequences were calculated. Bidirectional flow through the fenestration was detected in all group I patients by cine PC. Stroke volumes through the fenestration in group I ranged from 10 to 160.8 ml/min. There was no correlation between the presence of reversed flow and flow void grading. Also, there was no correlation between the stroke volumes and flow void grading. The sensitivity of 3D CISS was low, and 2 mm sagittal TSE T2, nearly equal to cine PC, provided the best result. Cine PC and TSE T2 both have high confidence in the assessment of the flow through the fenestration. However, sequence parameters significantly affect the efficiency of TSE T2.
Procedures of recruiting, obtaining informed consent, and compensating research participants in Qatar: findings from a qualitative investigation.

PubMed

Killawi, Amal; Khidir, Amal; Elnashar, Maha; Abdelrahim, Huda; Hammoud, Maya; Elliott, Heather; Thurston, Michelle; Asad, Humna; Al-Khal, Abdul Latif; Fetters, Michael D

2014-02-04

Very few researchers have reported on procedures of recruiting, obtaining informed consent, and compensating participants in health research in the Arabian Gulf Region. Empirical research can inform the debate about whether to adjust these procedures for culturally diverse settings. Our objective was to delineate procedures related to recruiting, obtaining informed consent, and compensating health research participants in the extremely high-density multicultural setting of Qatar. During a multistage mixed methods project, field observations and qualitative interviews were conducted in a general medicine clinic of a major medical center in Qatar. Participants were chosen based on gender, age, literacy, and preferred language, i.e., Arabic, English, Hindi and Urdu. Qualitative analysis identified themes about recruitment, informed consent, compensation, and other research procedures. A total of 153 individuals were approached and 84 enrolled; the latter showed a diverse age range (18 to 75 years); varied language representation: Arabic (n = 24), English (n = 20), Hindi (n = 20), and Urdu (n = 20); and balanced gender distribution: women (n = 43) and men (n = 41). Primary reasons for 30 declinations included concern about interview length and recording. The study achieved a 74% participation rate. Qualitative analytics revealed key themes about hesitation to participate, decisions about participation with family members as well as discussions with them as "incidental research participants", the informed consent process, privacy and gender rules of the interview environment, reactions to member checking and compensation, and motivation for participating. Vulnerability emerged as a recurring issue throughout the process among a minority of participants. This study from Qatar is the first to provide empirical data on recruitment, informed consent, compensation and other research procedures in a general adult population in the Middle East and Arabian Gulf. This
Next-Generation Sequencing Platforms

NASA Astrophysics Data System (ADS)

Mardis, Elaine R.

2013-06-01

Automated DNA sequencing instruments embody an elegant interplay among chemistry, engineering, software, and molecular biology and have built upon Sanger's founding discovery of dideoxynucleotide sequencing to perform once-unfathomable tasks. Combined with innovative physical mapping approaches that helped to establish long-range relationships between cloned stretches of genomic DNA, fluorescent DNA sequencers produced reference genome sequences for model organisms and for the reference human genome. New types of sequencing instruments that permit amazing acceleration of data-collection rates for DNA sequencing have been developed. The ability to generate genome-scale data sets is now transforming the nature of biological inquiry. Here, I provide an historical perspective of the field, focusing on the fundamental developments that predated the advent of next-generation sequencing instruments and providing information about how these instruments work, their application to biological research, and the newest types of sequencers that can extract data from single DNA molecules.

Validation of Genotyping-By-Sequencing Analysis in Populations of Tetraploid Alfalfa by 454 Sequencing

PubMed Central

Rocher, Solen; Jean, Martine; Castonguay, Yves; Belzile, François

2015-01-01

Genotyping-by-sequencing (GBS) is a relatively low-cost high throughput genotyping technology based on next generation sequencing and is applicable to orphan species with no reference genome. A combination of genome complexity reduction and multiplexing with DNA barcoding provides a simple and affordable way to resolve allelic variation between plant samples or populations. GBS was performed on ApeKI libraries using DNA from 48 genotypes each of two heterogeneous populations of tetraploid alfalfa (Medicago sativa spp. sativa): the synthetic cultivar Apica (ATF0) and a derived population (ATF5) obtained after five cycles of recurrent selection for superior tolerance to freezing (TF). Nearly 400 million reads were obtained from two lanes of an Illumina HiSeq 2000 sequencer and analyzed with the Universal Network-Enabled Analysis Kit (UNEAK) pipeline designed for species with no reference genome. Following the application of whole dataset-level filters, 11,694 single nucleotide polymorphism (SNP) loci were obtained. About 60% had a significant match on the Medicago truncatula syntenic genome. The accuracy of allelic ratios and genotype calls based on GBS data was directly assessed using 454 sequencing on a subset of SNP loci scored in eight plant samples. Sequencing depth in this study was not sufficient for accurate tetraploid allelic dosage, but reliable genotype calls based on diploid allelic dosage were obtained when using additional quality filtering. Principal Component Analysis of SNP loci in plant samples revealed that a small proportion (<5%) of the genetic variability assessed by GBS is able to differentiate ATF0 and ATF5. Our results confirm that analysis of GBS data using UNEAK is a reliable approach for genome-wide discovery of SNP loci in outcrossed polyploids. PMID:26115486
Effect of multimedia information sequencing on educational outcome in orthodontic training.

PubMed

Aly, Medhat; Willems, Guy; Van Den Noortgate, Wim; Elen, Jan

2012-08-01

The aim of this research was to compare the effectiveness of hierarchical sequencing (HS) versus elaboration sequencing (ES) models in improving educational outcome of clinical knowledge when using instructional multimedia programs in postgraduate orthodontic training. Twenty-four postgraduate and 24 undergraduate dental students participated in this study. The postgraduates were following an orthodontic speciality training programme. The undergraduates were fourth- and fifth-year dental students. Twelve instructional multimedia modules were developed, six logically sequenced (LS) discussing six different orthodontic topics. Another six modules on identical topics were sequenced according to one macro-sequencing (MS) model. The implemented MS model was either HS or ES. The only difference between LS and MS modules was the adopted sequencing model. All participants were assigned into consistent pairs of students and were randomly divided into a test and a control group. In each pair, one student studied the LS module (control group) while the other studied the MS version (test group). Pre- and post-evaluation tests of each pair of participants were performed to measure knowledge, understanding and application of each participant with regard to the discussed topic. A multilevel analysis was conducted to assess the estimated effect of the different sequencing models. The level of significance was set at 0.05. At baseline, no significant differences (P > 0.05) were found in pre-test scores between groups. The HS model showed a significant effect on the scores achieved (P = 0.05). The test group showed a significantly higher estimated probability of correct answers to the questions (P = 0.003) when applying the HS model. The HS model may improve educational outcome when using instructional multimedia programs in postgraduate orthodontic training.
Metagenome assembly through clustering of next-generation sequencing data using protein sequences.

PubMed

Sim, Mikang; Kim, Jaebum

2015-02-01

The study of environmental microbial communities, called metagenomics, has gained a lot of attention because of the recent advances in next-generation sequencing (NGS) technologies. Microbes play a critical role in changing their environments, and the mode of their effect can be solved by investigating metagenomes. However, the difficulty of metagenomes, such as the combination of multiple microbes and different species abundance, makes metagenome assembly tasks more challenging. In this paper, we developed a new metagenome assembly method by utilizing protein sequences, in addition to the NGS read sequences. Our method (i) builds read clusters by using mapping information against available protein sequences, and (ii) creates contig sequences by finding consensus sequences through probabilistic choices from the read clusters. By using simulated NGS read sequences from real microbial genome sequences, we evaluated our method in comparison with four existing assembly programs. We found that our method could generate relatively long and accurate metagenome assemblies, indicating that the idea of using protein sequences, as a guide for the assembly, is promising. Copyright © 2015 Elsevier B.V. All rights reserved.
Multi-locus and long amplicon sequencing approach to study microbial diversity at species level using the MinION™ portable nanopore sequencer

PubMed Central

Sanz, Yolanda

2017-01-01

Abstract The miniaturized and portable DNA sequencer MinION™ has demonstrated great potential in different analyses such as genome-wide sequencing, pathogen outbreak detection and surveillance, human genome variability, and microbial diversity. In this study, we tested the ability of the MinION™ platform to perform long amplicon sequencing in order to design new approaches to study microbial diversity using a multi-locus approach. After compiling a robust database by parsing and extracting the rrn bacterial region from more than 67000 complete or draft bacterial genomes, we demonstrated that the data obtained during sequencing of the long amplicon in the MinION™ device using R9 and R9.4 chemistries were sufficient to study 2 mock microbial communities in a multiplex manner and to almost completely reconstruct the microbial diversity contained in the HM782D and D6305 mock communities. Although nanopore-based sequencing produces reads with lower per-base accuracy compared with other platforms, we presented a novel approach consisting of multi-locus and long amplicon sequencing using the MinION™ MkIb DNA sequencer and R9 and R9.4 chemistries that help to overcome the main disadvantage of this portable sequencing platform. Furthermore, the nanopore sequencing library, constructed with the last releases of pore chemistry (R9.4) and sequencing kit (SQK-LSK108), permitted the retrieval of the higher level of 1D read accuracy sufficient to characterize the microbial species present in each mock community analysed. Improvements in nanopore chemistry, such as minimizing base-calling errors and new library protocols able to produce rapid 1D libraries, will provide more reliable information in the near future. Such data will be useful for more comprehensive and faster specific detection of microbial species and strains in complex ecosystems. PMID:28605506
Distress vocalization sequences broadcasted by bats carry redundant information.

PubMed

Hechavarría, Julio C; Beetz, M Jerome; Macias, Silvio; Kössl, Manfred

2016-07-01

Distress vocalizations (also known as alarm or screams) are an important component of the vocal repertoire of a number of animal species, including bats, humans, monkeys and birds, among others. Although the behavioral relevance of distress vocalizations is undeniable, at present, little is known about the rules that govern vocalization production when in alarmful situations. In this article, we show that when distressed, bats of the species Carollia perspicillata produce repetitive vocalization sequences in which consecutive syllables are likely to be similar to one another regarding their physical attributes. The uttered distress syllables are broadband (12-73 kHz) with most of their energy focussing at 23 kHz. Distress syllables are short (~4 ms), their average sound pressure level is close to 70 dB SPL, and they are produced at high repetition rates (every 14 ms). We discuss that, because of their physical attributes, bat distress vocalizations could serve a dual purpose: (1) advertising threatful situations to conspecifics, and (2) informing the threatener that the bats are ready to defend themselves. We also discuss possible advantages of advertising danger/discomfort using repetitive utterances, a calling strategy that appears to be ubiquitous across the animal kingdom.
The VirusBanker database uses a Java program to allow flexible searching through Bunyaviridae sequences.

PubMed

Fourment, Mathieu; Gibbs, Mark J

2008-02-05

Viruses of the Bunyaviridae have segmented negative-stranded RNA genomes and several of them cause significant disease. Many partial sequences have been obtained from the segments so that GenBank searches give complex results. Sequence databases usually use HTML pages to mediate remote sorting, but this approach can be limiting and may discourage a user from exploring a database. The VirusBanker database contains Bunyaviridae sequences and alignments and is presented as two spreadsheets generated by a Java program that interacts with a MySQL database on a server. Sequences are displayed in rows and may be sorted using information that is displayed in columns and includes data relating to the segment, gene, protein, species, strain, sequence length, terminal sequence and date and country of isolation. Bunyaviridae sequences and alignments may be downloaded from the second spreadsheet with titles defined by the user from the columns, or viewed when passed directly to the sequence editor, Jalview. VirusBanker allows large datasets of aligned nucleotide and protein sequences from the Bunyaviridae to be compiled and winnowed rapidly using criteria that are formulated heuristically.
GOLabeler: Improving Sequence-based Large-scale Protein Function Prediction by Learning to Rank.

PubMed

You, Ronghui; Zhang, Zihan; Xiong, Yi; Sun, Fengzhu; Mamitsuka, Hiroshi; Zhu, Shanfeng

2018-03-07

Gene Ontology (GO) has been widely used to annotate functions of proteins and understand their biological roles. Currently only <1% of more than 70 million proteins in UniProtKB have experimental GO annotations, implying the strong necessity of automated function prediction (AFP) of proteins, where AFP is a hard multilabel classification problem due to one protein with a diverse number of GO terms. Most of these proteins have only sequences as input information, indicating the importance of sequence-based AFP (SAFP: sequences are the only input). Furthermore, homology-based SAFP tools are competitive in AFP competitions, while they do not necessarily work well for so-called difficult proteins, which have <60% sequence identity to proteins with annotations already. Thus the vital and challenging problem now is how to develop a method for SAFP, particularly for difficult proteins. The key of this method is to extract not only homology information but also diverse, deep-rooted information/evidence from sequence inputs and integrate them into a predictor in a both effective and efficient manner. We propose GOLabeler, which integrates five component classifiers, trained from different features, including GO term frequency, sequence alignment, amino acid trigram, domains and motifs, and biophysical properties, etc., in the framework of learning to rank (LTR), a paradigm of machine learning, especially powerful for multilabel classification. The empirical results obtained by examining GOLabeler extensively and thoroughly by using large-scale datasets revealed numerous favorable aspects of GOLabeler, including significant performance advantage over state-of-the-art AFP methods. http://datamining-iip.fudan.edu.cn/golabeler. zhusf@fudan.edu.cn. Supplementary data are available at Bioinformatics online.
Always look on both sides: Phylogenetic information conveyed by simple sequence repeat allele sequences

USDA-ARS's Scientific Manuscript database

Simple sequence repeat (SSR) markers are widely used tools for inferences about genetic diversity, phylogeography and spatial genetic structure. Their applications assume that variation among alleles is essentially caused by an expansion or contraction of the number of repeats and that, accessorily,...
Accurate multiple sequence-structure alignment of RNA sequences using combinatorial optimization.

PubMed

Bauer, Markus; Klau, Gunnar W; Reinert, Knut

2007-07-27

The discovery of functional non-coding RNA sequences has led to an increasing interest in algorithms related to RNA analysis. Traditional sequence alignment algorithms, however, fail at computing reliable alignments of low-homology RNA sequences. The spatial conformation of RNA sequences largely determines their function, and therefore RNA alignment algorithms have to take structural information into account. We present a graph-based representation for sequence-structure alignments, which we model as an integer linear program (ILP). We sketch how we compute an optimal or near-optimal solution to the ILP using methods from combinatorial optimization, and present results on a recently published benchmark set for RNA alignments. The implementation of our algorithm yields better alignments in terms of two published scores than the other programs that we tested: This is especially the case with an increasing number of input sequences. Our program LARA is freely available for academic purposes from http://www.planet-lisa.net.
Plastid, nuclear and reverse transcriptase sequences in the mitochondrial genome of Oenothera: is genetic information transferred between organelles via RNA?

PubMed Central

Schuster, W; Brennicke, A

1987-01-01

We describe an open reading frame (ORF) with high homology to reverse transcriptase in the mitochondrial genome of Oenothera. This ORF displays all the characteristics of an active plant mitochondrial gene with a possible ribosome binding site and 39% T in the third codon position. It is located between a sequence fragment from the plastid genome and one of nuclear origin downstream from the gene encoding subunit 5 of the NADH dehydrogenase. The nuclear derived sequence consists of 528 nucleotides from the small ribosomal RNA and contains an expansion segment unique to nuclear rRNAs. The plastid sequence contains part of the ribosomal protein S4 and the complete tRNA(Ser). The observation that only transcribed sequences have been found in more than one subcellular compartment in higher plants suggests that interorganellar transfer of genetic information may occur via RNA and subsequent local reverse transcription and genomic integration. PMID:14650433
Who decides and what are people willing-to-pay for whole genome sequencing information?

PubMed Central

Marshall, DA; Gonzalez, JM; Johnson, FR; MacDonald, KV; Pugh, A; Douglas, MP; Phillips, KA

2016-01-01

PURPOSE Whole genome sequencing (WGS) can be used as a powerful diagnostic tool which could also be used for screening but may generate anxiety, unnecessary testing and overtreatment. Current guidelines suggest reporting clinically actionable secondary findings when diagnostic testing is performed. We estimated preferences for receiving WGS results. METHODS A US nationally representative survey (n=410 adults) was used to rank preferences for who decides (expert panel, your doctor, you) which WGS results are reported. We estimated the value of information about variants with varying levels of clinical usefulness using willingness-to-pay contingent valuation questions. RESULTS 43% preferred to decide themselves what information is included in the WGS report. 38% (95% CI:33–43%) would not pay for actionable variants, and 3% (95% CI:1–5%) would pay more than $1000. 55% (95% CI:50–60%) would not pay for variants in which medical treatment is currently unclear, and 7% (95% CI:5–9%) would pay more than $400. CONCLUSION Most people prefer to decide what WGS results are reported. Despite valuing actionable information more, some respondents perceive that genetic information could negatively impact them. Preference heterogeneity for WGS information should be considered in the development of policies, particularly to integrate patient preferences with personalized medicine and shared decision making. PMID:27253734
Information Entropy Analysis of the H1N1 Genetic Code

NASA Astrophysics Data System (ADS)

Martwick, Andy

2010-03-01

During the current H1N1 pandemic, viral samples are being obtained from large numbers of infected people world-wide and are being sequenced on the NCBI Influenza Virus Resource Database. The information entropy of the sequences was computed from the probability of occurrence of each nucleotide base at every position of each set of sequences using Shannon's definition of information entropy, $H = \sum_b p_b \log_2\left(\frac{1}{p_b}\right)$, where $H$ is the observed information entropy at each nucleotide position and $p_b$ is the probability of the base pair of the nucleotides A, C, G, U. Information entropy of the current H1N1 pandemic is compared to reference human and swine H1N1 entropy. As expected, the current H1N1 entropy is in a low entropy state and has a very large mutation potential. Using the entropy method in mature genes we can identify low entropy regions of nucleotides that generally correlate to critical protein function.
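The per-position entropy described in this record can be illustrated with a minimal Python sketch. It assumes the sequences are already aligned to equal length; the function names and the toy alignment are hypothetical illustrations, not the author's code or real H1N1 data.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy in bits of one alignment column: sum_b p_b * log2(1 / p_b)."""
    counts = Counter(column)
    total = sum(counts.values())
    # p_b = n / total, so log2(1 / p_b) = log2(total / n)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def positional_entropy(sequences):
    """Entropy at every position of a set of equal-length aligned sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    # zip(*sequences) yields one tuple of bases per alignment column
    return [column_entropy(col) for col in zip(*sequences)]


# Hypothetical toy alignment (assumption for illustration only)
toy = ["ACGU", "ACGA", "ACCA", "AGCA"]
entropies = positional_entropy(toy)
print(entropies)
```

A fully conserved column gives 0 bits, and a column split evenly between two bases gives 1 bit, so low values mark conserved positions.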
Phylogenetic Placement of Exact Amplicon Sequences Improves Associations with Clinical Information

PubMed Central

McDonald, Daniel; Gonzalez, Antonio; Navas-Molina, Jose A.; Jiang, Lingjing; Xu, Zhenjiang Zech; Winker, Kevin; Kado, Deborah M.; Orwoll, Eric; Manary, Mark; Mirarab, Siavash

2018-01-01

ABSTRACT Recent algorithmic advances in amplicon-based microbiome studies enable the inference of exact amplicon sequence fragments. These new methods enable the investigation of sub-operational taxonomic units (sOTU) by removing erroneous sequences. However, short (e.g., 150-nucleotide [nt]) DNA sequence fragments do not contain sufficient phylogenetic signal to reproduce a reasonable tree, introducing a barrier in the utilization of critical phylogenetically aware metrics such as Faith’s PD or UniFrac. Although fragment insertion methods do exist, those methods have not been tested for sOTUs from high-throughput amplicon studies in insertions against a broad reference phylogeny. We benchmarked the SATé-enabled phylogenetic placement (SEPP) technique explicitly against 16S V4 sequence fragments and showed that it outperforms the conceptually problematic but often-used practice of reconstructing de novo phylogenies. In addition, we provide a BSD-licensed QIIME2 plugin (https://github.com/biocore/q2-fragment-insertion) for SEPP and integration into the microbial study management platform QIITA. IMPORTANCE The move from OTU-based to sOTU-based analysis, while providing additional resolution, also introduces computational challenges. We demonstrate that one popular method of dealing with sOTUs (building a de novo tree from the short sequences) can provide incorrect results in human gut metagenomic studies and show that phylogenetic placement of the new sequences with SEPP resolves this problem while also yielding other benefits over existing methods. PMID:29719869
SMARTIV: combined sequence and structure de-novo motif discovery for in-vivo RNA binding data.

PubMed

Polishchuk, Maya; Paz, Inbal; Yakhini, Zohar; Mandel-Gutfreund, Yael

2018-05-25

Gene expression regulation is highly dependent on binding of RNA-binding proteins (RBPs) to their RNA targets. Growing evidence supports the notion that both RNA primary sequence and its local secondary structure play a role in specific Protein-RNA recognition and binding. Despite the great advance in high-throughput experimental methods for identifying sequence targets of RBPs, predicting the specific sequence and structure binding preferences of RBPs remains a major challenge. We present a novel webserver, SMARTIV, designed for discovering and visualizing combined RNA sequence and structure motifs from high-throughput RNA-binding data, generated from in-vivo experiments. The uniqueness of SMARTIV is that it predicts motifs from enriched k-mers that combine information from ranked RNA sequences and their predicted secondary structure, obtained using various folding methods. Consequently, SMARTIV generates Position Weight Matrices (PWMs) in a combined sequence and structure alphabet with assigned P-values. SMARTIV concisely represents the sequence and structure motif content as a single graphical logo, which is informative and easy for visual perception. SMARTIV was examined extensively on a variety of high-throughput binding experiments for RBPs from different families, generated from different technologies, showing consistent and accurate results. Finally, SMARTIV is a user-friendly webserver, highly efficient in run-time and freely accessible via http://smartiv.technion.ac.il/.
Analysis and Visualization Tool for Targeted Amplicon Bisulfite Sequencing on Ion Torrent Sequencers

PubMed Central

Pabinger, Stephan; Ernst, Karina; Pulverer, Walter; Kallmeyer, Rainer; Valdes, Ana M.; Metrustry, Sarah; Katic, Denis; Nuzzo, Angelo; Kriegner, Albert; Vierlinger, Klemens; Weinhaeusel, Andreas

2016-01-01

Targeted sequencing of PCR amplicons generated from bisulfite deaminated DNA is a flexible, cost-effective way to study methylation of a sample at single CpG resolution and perform subsequent multi-target, multi-sample comparisons. Currently, no platform specific protocol, support, or analysis solution is provided to perform targeted bisulfite sequencing on a Personal Genome Machine (PGM). Here, we present a novel tool, called TABSAT, for analyzing targeted bisulfite sequencing data generated on Ion Torrent sequencers. The workflow starts with raw sequencing data, performs quality assessment, and uses a tailored version of Bismark to map the reads to a reference genome. The pipeline visualizes results as lollipop plots and is able to deduce specific methylation-patterns present in a sample. The obtained profiles are then summarized and compared between samples. In order to assess the performance of the targeted bisulfite sequencing workflow, 48 samples were used to generate 53 different Bisulfite-Sequencing PCR amplicons from each sample, resulting in 2,544 amplicon targets. We obtained a mean coverage of 282X using 1,196,822 aligned reads. Next, we compared the sequencing results of these targets to the methylation level of the corresponding sites on an Illumina 450k methylation chip. The calculated average Pearson correlation coefficient of 0.91 confirms the sequencing results with one of the industry-leading CpG methylation platforms and shows that targeted amplicon bisulfite sequencing provides an accurate and cost-efficient method for DNA methylation studies, e.g., to provide platform-independent confirmation of Illumina Infinium 450k methylation data. TABSAT offers a novel way to analyze data generated by Ion Torrent instruments and can also be used with data from the Illumina MiSeq platform. It can be easily accessed via the Platomics platform, which offers a web-based graphical user interface along with sample and parameter storage. TABSAT is freely
Initial sequence characterization of the rhabdoviruses of squamate reptiles, including a novel rhabdovirus from a caiman lizard (Dracaena guianensis).

PubMed

Wellehan, James F X; Pessier, Allan P; Archer, Linda L; Childress, April L; Jacobson, Elliott R; Tesh, Robert B

2012-08-17

Rhabdoviruses infect a variety of hosts, including non-avian reptiles. Consensus PCR techniques were used to obtain partial RNA-dependent RNA polymerase gene sequence from five rhabdoviruses of South American lizards; Marco, Chaco, Timbo, Sena Madureira, and a rhabdovirus from a caiman lizard (Dracaena guianensis). The caiman lizard rhabdovirus formed inclusions in erythrocytes, which may be a route for infecting hematophagous insects. This is the first information on behavior of a rhabdovirus in squamates. We also obtained sequence from two rhabdoviruses of Australian lizards, confirming previous Charleville virus sequence and finding that, unlike a previous sequence report but in agreement with serologic reports, Almpiwar virus is clearly distinct from Charleville virus. Bayesian and maximum likelihood phylogenetic analysis revealed that most known rhabdoviruses of squamates cluster in the Almpiwar subgroup. The exception is Marco virus, which is found in the Hart Park group. Copyright © 2012 Elsevier B.V. All rights reserved.
Initial sequence characterization of the rhabdoviruses of squamate reptiles, including a novel rhabdovirus from a caiman lizard (Dracaena guianensis)

PubMed Central

Wellehan, James F.X.; Pessier, Allan P.; Archer, Linda L.; Childress, April L.; Jacobson, Elliott R.; Tesh, Robert B.

2012-01-01

Rhabdoviruses infect a variety of hosts, including non-avian reptiles. Consensus PCR techniques were used to obtain partial RNA-dependent RNA polymerase gene sequence from five rhabdoviruses of South American lizards; Marco, Chaco, Timbo, Sena Madureira, and a rhabdovirus from a caiman lizard (Dracaena guianensis). The caiman lizard rhabdovirus formed inclusions in erythrocytes, which may be a route for infecting hematophagous insects. This is the first information on behavior of a rhabdovirus in squamates. We also obtained sequence from two rhabdoviruses of Australian lizards, confirming previous Charleville virus sequence and finding that, unlike a previous sequence report but in agreement with serologic reports, Almpiwar virus is clearly distinct from Charleville virus. Bayesian and maximum likelihood phylogenetic analysis revealed that most known rhabdoviruses of squamates cluster in the Almpiwar subgroup. The exception is Marco virus, which is found in the Hart Park group. PMID:22397930
Draft genome sequences of 50 MRSA ST5 isolates obtained from a U.S. hospital

USDA-ARS's Scientific Manuscript database

Methicillin resistant Staphylococcus aureus (MRSA) can be a commensal or pathogen in humans. Pathogenicity and disease are related to the acquisition of mobile genetic elements encoding virulence and antimicrobial resistance genes. Here, we report draft genome sequences for 50 clinical MRSA isolates...
[Isolation and identification of specific sequences correlated to cytoplasmic male sterility and fertile maintenance in cauliflower (Brassica oleracea var. botrytis)].

PubMed

Wang, Chun Guo; Chen, Xiao Qiang; Li, Hui; Zhao, Qian Cheng; Sun, De Ling; Song, Wen Qin

2008-02-01

Analysis of ISSR (Inter-Simple Sequence Repeat) and DDRT-PCR (Differential Display Reverse Transcriptase Polymerase Chain Reaction) was performed between the cytoplasmic male sterility cauliflower line ogura-A and its corresponding maintainer line ogura-B. In total, 306 detectable bands were obtained by ISSR using thirty oligonucleotide primers. Commonly, six to twelve bands were produced per primer. Among all these primers, only the amplification of primer ISSR3 was polymorphic: an 1100 bp specific band, named ISSR3(1100), was detected only in the maintainer line. Analysis of this sequence indicated that ISSR3(1100) was highly homologous with the corresponding sequences of the mitochondrial genome in Brassica napus and Arabidopsis thaliana, which suggested that ISSR3(1100) may derive from the mitochondrial genome in cauliflower. To carry out DDRT-PCR analysis, three anchor primers and fifteen random primers were selected and combined. In total, 1122 bands from 1,000 bp to 50 bp were detected. However, only four bands, named ogura-A205, ogura-A383, ogura-B307 and ogura-B352, were confirmed to be differentially displayed between the two lines. This result was further confirmed by reverse Northern dot blotting analysis. Among these four bands, ogura-A205 and ogura-A383 were expressed only in the cytoplasmic male sterility line, while ogura-B307 and ogura-B352 were detected only in the maintainer line. Analysis of these sequences indicated that this was the first time these four sequences were reported in cauliflower. Interestingly, ogura-A205 and ogura-B307 did not exhibit any similarity to reported sequences in other species, and more investigations are required to obtain further information. ogura-A383 and ogura-B352 were also two new sequences; they showed high similarity to corresponding chloroplast sequences of Arabidopsis thaliana and Brassica rapa subsp. pekinensis. So we speculated that these two sequences may derive from the chloroplast genome. All these results obtained in this study offer new and
MS/MS-Assisted Design of Sequence-Controlled Synthetic Polymers for Improved Reading of Encoded Information

NASA Astrophysics Data System (ADS)

Charles, Laurence; Cavallo, Gianni; Monnier, Valérie; Oswald, Laurence; Szweda, Roza; Lutz, Jean-François

2017-06-01

In order to improve their MS/MS sequencing, structure of sequence-controlled synthetic polymers can be optimized based on considerations regarding their fragmentation behavior in collision-induced dissociation conditions, as demonstrated here for two digitally encoded polymer families. In poly(triazole amide)s, the main dissociation route proceeded via cleavage of the amide bond in each monomer, hence allowing the chains to be safely sequenced. However, a competitive cleavage of an ether bond in a tri(ethylene glycol) spacer placed between each coding moiety complicated MS/MS spectra while not bringing new structural information. Changing the tri(ethylene glycol) spacer to an alkyl group of the same size allowed this unwanted fragmentation pathway to be avoided, hence greatly simplifying the MS/MS reading step for such undecyl-based poly(triazole amide)s. In poly(alkoxyamine phosphodiester)s, a single dissociation pathway was achieved with repeating units containing an alkoxyamine linkage, which, by very low dissociation energy, made any other chemical bonds MS/MS-silent. Structure of these polymers was further tailored to enhance the stability of those precursor ions with a negatively charged phosphate group per monomer in order to improve their MS/MS readability. Increasing the size of both the alkyl coding moiety and the nitroxide spacer allowed sufficient distance between phosphate groups for all of them to be deprotonated simultaneously. Because the charge state of product ions increased with their polymerization degree, MS/MS spectra typically exhibited groups of fragments at one or the other side of the precursor ion depending on the original α or ω end-group they contain, allowing sequence reconstruction in a straightforward manner.

Sequence Analysis and Domain Motifs in the Porcine Skin Decorin Glycosaminoglycan Chain*

PubMed Central

Zhao, Xue; Yang, Bo; Solakylidirim, Kemal; Joo, Eun Ji; Toida, Toshihiko; Higashi, Kyohei; Linhardt, Robert J.; Li, Lingyun

2013-01-01

Decorin proteoglycan is comprised of a core protein containing a single O-linked dermatan sulfate/chondroitin sulfate glycosaminoglycan (GAG) chain. Although the sequence of the decorin core protein is determined by the gene encoding its structure, the structure of its GAG chain is determined in the Golgi. The recent application of modern MS to bikunin, a far simpler chondroitin sulfate proteoglycan, suggests that it has a single or small number of defined sequences. On this basis, a similar approach was undertaken to sequence the much larger and more structurally complex dermatan sulfate/chondroitin sulfate GAG chain of porcine skin decorin. This approach resulted in information on the consistency/variability of its linkage region at the reducing end of the GAG chain, its iduronic acid-rich domain, glucuronic acid-rich domain, and non-reducing end. A general motif for the porcine skin decorin GAG chain was established. A single small decorin GAG chain was sequenced using MS/MS analysis. The data obtained in the study suggest that the decorin GAG chain has a small or a limited number of sequences. PMID:23423381
Flexible, fast and accurate sequence alignment profiling on GPGPU with PaSWAS.

PubMed

Warris, Sven; Yalcin, Feyruz; Jackson, Katherine J L; Nap, Jan Peter

2015-01-01

To obtain large-scale sequence alignments in a fast and flexible way is an important step in the analyses of next generation sequencing data. Applications based on the Smith-Waterman (SW) algorithm are often either not fast enough, limited to dedicated tasks or not sufficiently accurate due to statistical issues. Current SW implementations that run on graphics hardware do not report the alignment details necessary for further analysis. With the Parallel SW Alignment Software (PaSWAS) it is possible (a) to have easy access to the computational power of NVIDIA-based general purpose graphics processing units (GPGPUs) to perform high-speed sequence alignments, and (b) retrieve relevant information such as score, number of gaps and mismatches. The software reports multiple hits per alignment. The added value of the new SW implementation is demonstrated with two test cases: (1) tag recovery in next generation sequence data and (2) isotype assignment within an immunoglobulin 454 sequence data set. Both cases show the usability and versatility of the new parallel Smith-Waterman implementation.
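To make concrete what reporting "score, number of gaps and mismatches" for a Smith-Waterman alignment involves, here is a minimal CPU-only Python sketch of local alignment with traceback. It is not the PaSWAS GPU implementation: the linear gap penalty, the scoring values, the function name and the returned dictionary keys are all assumptions chosen for illustration, and only a single best hit is reported.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment of a and b with a linear gap penalty (assumed scoring)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the scoring matrix; cells never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    top, bottom = [], []
    gaps = mismatches = 0
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            if a[i - 1] != b[j - 1]:
                mismatches += 1
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            gaps += 1
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            gaps += 1
            j -= 1

    return {
        "score": best,
        "gaps": gaps,
        "mismatches": mismatches,
        "aligned_a": "".join(reversed(top)),
        "aligned_b": "".join(reversed(bottom)),
    }


# Hypothetical toy sequences (assumption for illustration only)
result = smith_waterman("TTACGTTT", "GGACGGG")
print(result)
```

The quadratic matrix fill is the part that GPU implementations parallelize; the traceback is what supplies the alignment details beyond the score.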
Googling DNA sequences on the World Wide Web.

PubMed

Hajibabaei, Mehrdad; Singer, Gregory A C

2009-11-10

New web-based technologies provide an excellent opportunity for sharing and accessing information and using the web as a platform for interaction and collaboration. Although several specialized tools are available for analyzing DNA sequence information, conventional web-based tools have not been utilized for bioinformatics applications. We have developed a novel algorithm and implemented it for searching species-specific genomic sequences, DNA barcodes, by using popular web-based methods such as Google. We developed an alignment-independent, character-based algorithm based on dividing a sequence library (DNA barcodes) and query sequence into words. The actual search is conducted by conventional search tools such as the freely available Google Desktop Search. We implemented our algorithm in two exemplar packages. We developed pre- and post-processing software to provide customized input and output services, respectively. Our analysis of all publicly available DNA barcode sequences shows a high accuracy as well as rapid results. Our method makes use of conventional web-based technologies for specialized genetic data. It provides a robust and efficient solution for sequence search on the web. The integration of our search method for large-scale sequence libraries such as DNA barcodes provides an excellent web-based tool for accessing this information and linking it to other available categories of information on the web.
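The idea of dividing library and query sequences into words and handing the lookup to a text-search tool can be sketched in Python as follows. This is only an illustration under stated assumptions: the abstract does not give the word length or whether words overlap, so the overlapping 8-letter words, the in-memory inverted index standing in for a desktop search engine, and all names and sequences below are hypothetical.

```python
from collections import Counter, defaultdict


def to_words(seq, k=8):
    """Split a sequence into overlapping words of length k (assumed scheme)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(library, k=8):
    """Inverted index: word -> set of library entries containing that word."""
    index = defaultdict(set)
    for name, seq in library.items():
        for word in to_words(seq, k):
            index[word].add(name)
    return index


def search(query, index, k=8):
    """Rank library entries by the number of distinct query words they share."""
    hits = Counter()
    for word in set(to_words(query, k)):
        for name in index.get(word, ()):
            hits[name] += 1
    return hits.most_common()


# Hypothetical toy barcode library and query (assumptions, not real barcodes)
library = {
    "species_A": "ACGTACGTTAGCCGATAGCTAGGCTA",
    "species_B": "TTGACCGGTATTCCGGAAGTCCTAGA",
}
index = build_index(library)
query = "CGTTAGCCGATAGC"
ranking = search(query, index)
print(ranking)
```

Because matching is done on exact words rather than on an alignment, a general-purpose text indexer can carry out the lookup, which is the property the record relies on.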
Obtaining and providing health information in the community pharmacy setting.

PubMed

Iwanowicz, Susan L; Marciniak, Macary Weck; Zeolla, Mario M

2006-06-15

Community pharmacists are a valuable information resource for patients and other healthcare providers. The advent of new information technology, most notably the Internet, coupled with the rapid availability of new healthcare information, has fueled this demand. Pharmacy students must receive training that enables them to meet this need. Community advanced pharmacy practice experiences (APPEs) provide an excellent opportunity for students to develop and master drug information skills in a real-world setting. Preceptors must ensure that students are familiar with drug information resources and can efficiently identify the most useful resource for a given topic. Students must also be trained to assess the quality of resources and use this information to effectively respond to drug or health information inquiries. This article will discuss key aspects of providing drug information in the community pharmacy setting and can serve as a guide and resource for APPE preceptors.
Obtaining and Providing Health Information in the Community Pharmacy Setting

PubMed Central

Iwanowicz, Susan L.; Marciniak, Macary Weck; Zeolla, Mario M.

2006-01-01

Community pharmacists are a valuable information resource for patients and other healthcare providers. The advent of new information technology, most notably the Internet, coupled with the rapid availability of new healthcare information, has fueled this demand. Pharmacy students must receive training that enables them to meet this need. Community advanced pharmacy practice experiences (APPEs) provide an excellent opportunity for students to develop and master drug information skills in a real-world setting. Preceptors must ensure that students are familiar with drug information resources and can efficiently identify the most useful resource for a given topic. Students must also be trained to assess the quality of resources and use this information to effectively respond to drug or health information inquiries. This article will discuss key aspects of providing drug information in the community pharmacy setting and can serve as a guide and resource for APPE preceptors. PMID:17136178
Environmental DNA sequencing primers for eutardigrades and bdelloid rotifers

PubMed Central

2009-01-01

Background The time it takes to isolate individuals from environmental samples and then extract DNA from each individual is one of the problems with generating molecular data from meiofauna such as eutardigrades and bdelloid rotifers. The lack of consistent morphological information and the extreme abundance of these classes make morphological identification of rare, or even common cryptic taxa a large and unwieldy task. This limits the ability to perform large-scale surveys of the diversity of these organisms. Here we demonstrate a culture-independent molecular survey approach that enables the generation of large amounts of eutardigrade and bdelloid rotifer sequence data directly from soil. Our PCR primers, specific to the 18S small-subunit rRNA gene, were developed for both eutardigrades and bdelloid rotifers. Results The developed primers successfully amplified DNA of their target organism from various soil DNA extracts. This was confirmed by both the BLAST similarity searches and phylogenetic analyses. Tardigrades showed much better phylogenetic resolution than bdelloids. Both groups of organisms exhibited varying levels of endemism. Conclusion The development of clade-specific primers for characterizing eutardigrades and bdelloid rotifers from environmental samples should greatly increase our ability to characterize the composition of these taxa in environmental samples. Environmental sequencing as shown here differs from other molecular survey methods in that there is no need to pre-isolate the organisms of interest from soil in order to amplify their DNA. The DNA sequences obtained from methods that do not require culturing can be identified post-hoc and placed phylogenetically as additional closely related sequences are obtained from morphologically identified conspecifics. Our non-cultured environmental sequence based approach will be able to provide a rapid and large-scale screening of the presence, absence and diversity of Bdelloidea and Eutardigrada in
Multilevel analysis of sports video sequences

NASA Astrophysics Data System (ADS)

Han, Jungong; Farin, Dirk; de With, Peter H. N.

2006-01-01

We propose a fully automatic and flexible framework for analysis and summarization of tennis broadcast video sequences, using visual features and specific game-context knowledge. Our framework can analyze a tennis video sequence at three levels, which provides a broad range of different analysis results. The proposed framework includes novel pixel-level and object-level tennis video processing algorithms, such as a moving-player detection algorithm that takes both the color and the court (playing-field) information into account, and a player-position tracking algorithm based on a 3-D camera model. Additionally, we employ scene-level models for detecting events, like service, base-line rally and net-approach, based on a number of real-world visual features. The system can summarize three forms of information: (1) all court-view playing frames in a game, (2) the moving trajectory and real speed of each player, as well as the relative position between the player and the court, (3) the semantic event segments in a game. The proposed framework is flexible in choosing the level of analysis that is desired. It is effective because the framework makes use of several visual cues obtained from the real-world domain to model important events like service, thereby increasing the accuracy of the scene-level analysis. The paper presents attractive experimental results highlighting the system efficiency and analysis capabilities.
Genome Sequences of Eight Aspergillus flavus spp. and One A. parasiticus sp., Isolated From Peanut Seeds in Georgia

USDA-ARS's Scientific Manuscript database

Aspergillus flavus and A. parasiticus fungi, producers of carcinogenic mycotoxins, infect peanut seeds, with considerable impact on both human health and the economy. Here we report 9 genome sequences of Aspergillus spp. isolated from peanut seeds. The information obtained will allow conducting biodiv...
Complementary DNA sequencing and identification of mRNAs from the venomous gland of Agkistrodon piscivorus leucostoma.

PubMed

Jia, Ying; Cantu, Bruno A; Sánchez, Elda E; Pérez, John C

2008-06-15

To advance our knowledge of snake venom composition and of the transcripts expressed in the venom gland at the molecular level, we constructed a cDNA library from the venom gland of Agkistrodon piscivorus leucostoma for the generation of an expressed sequence tag (EST) database. From the randomly sequenced 2112 independent clones, we have obtained ESTs for 1309 (62%) cDNAs, which showed significant deduced amino acid sequence similarity (scores >80) to previously characterized proteins in the National Center for Biotechnology Information (NCBI) database. Ribosomal proteins make up 47 clones (2%) and the remaining 756 (36%) cDNAs either are of unknown identity or show BLASTX sequence identity scores of <80 with known GenBank accessions. The most highly expressed gene, encoding phospholipase A(2) (PLA(2)) and accounting for 35% of A. p. leucostoma venom gland cDNAs, was identified and further confirmed by applying crude venom to sodium dodecyl sulfate/polyacrylamide gel electrophoresis (SDS-PAGE) and by protein sequencing. A total of 180 representative genes were obtained from the sequence assemblies and deposited in the EST database. Clones showing sequence identity to disintegrins, thrombin-like enzymes, hemorrhagic toxins, fibrinogen clotting inhibitors and plasminogen activators were also identified in our EST database. These data can be used to develop a research program that will help us identify genes encoding proteins that are of medical importance or proteins involved in the mechanisms of the venom toxins.
Groupwise registration of cardiac perfusion MRI sequences using normalized mutual information in high dimension

NASA Astrophysics Data System (ADS)

Hamrouni, Sameh; Rougon, Nicolas; Prêteux, Françoise

2011-03-01

In perfusion MRI (p-MRI) exams, short-axis (SA) image sequences are captured at multiple slice levels along the long-axis of the heart during the transit of a vascular contrast agent (Gd-DTPA) through the cardiac chambers and muscle. Compensating cardio-thoracic motions is a requirement for enabling computer-aided quantitative assessment of myocardial ischaemia from contrast-enhanced p-MRI sequences. The classical paradigm consists of registering each sequence frame on a reference image using some intensity-based matching criterion. In this paper, we introduce a novel unsupervised method for the spatio-temporal groupwise registration of cardiac p-MRI exams based on normalized mutual information (NMI) between high-dimensional feature distributions. Here, local contrast enhancement curves are used as a dense set of spatio-temporal features, and statistically matched through variational optimization to a target feature distribution derived from a registered reference template. The hard issue of probability density estimation in high-dimensional state spaces is bypassed by using consistent geometric entropy estimators, allowing NMI to be computed directly from feature samples. Specifically, a computationally efficient kth-nearest neighbor (kNN) estimation framework is retained, leading to closed-form expressions for the gradient flow of NMI over finite- and infinite-dimensional motion spaces. This approach is applied to the groupwise alignment of cardiac p-MRI exams using a free-form Deformation (FFD) model for cardio-thoracic motions. Experiments on simulated and natural datasets suggest its accuracy and robustness for registering p-MRI exams comprising more than 30 frames.
Comparative Sequence Analysis of Multidrug-Resistant IncA/C Plasmids from Salmonella enterica.

PubMed

Hoffmann, Maria; Pettengill, James B; Gonzalez-Escalona, Narjol; Miller, John; Ayers, Sherry L; Zhao, Shaohua; Allard, Marc W; McDermott, Patrick F; Brown, Eric W; Monday, Steven R

2017-01-01

Determinants of multidrug resistance (MDR) are often encoded on mobile elements, such as plasmids, transposons, and integrons, which have the potential to transfer among foodborne pathogens, as well as to other virulent pathogens, increasing the threats these traits pose to human and veterinary health. Our understanding of MDR among Salmonella has been limited by the lack of closed plasmid genomes for comparisons across resistance phenotypes, due to difficulties in effectively separating the DNA of these high-molecular-weight, low-copy-number plasmids from chromosomal DNA. To resolve this problem, we demonstrate an efficient protocol for isolating, sequencing and closing IncA/C plasmids from Salmonella sp. using single molecule real-time sequencing on a Pacific Biosciences (PacBio) RS II Sequencer. We obtained six Salmonella enterica isolates from poultry, representing six different serovars, each exhibiting the MDR-AmpC resistance profile. Salmonella plasmids were obtained using a modified mini preparation and transformed with Escherichia coli DH10Br. A Qiagen Large-Construct kit™ was used to recover highly concentrated and purified plasmid DNA that was sequenced using PacBio technology. These six closed IncA/C plasmids ranged in size from 104 to 191 kb and shared a stable, conserved backbone containing 98 core genes, with only six differences among those core genes. The plasmids encoded a number of antimicrobial resistance genes, including those for quaternary ammonium compounds and mercury. We then compared our six IncA/C plasmid sequences: first with 14 IncA/C plasmids derived from S. enterica available at the National Center for Biotechnology Information (NCBI), and then with an additional 38 IncA/C plasmids derived from different taxa. These comparisons allowed us to build an evolutionary picture of how antimicrobial resistance may be mediated by this common plasmid backbone. Our project provides detailed genetic information about resistance genes in
Draft genome sequences of 1 MSSA and 7 MRSA ST5 isolates obtained from California

USDA-ARS's Scientific Manuscript database

Staphylococcus aureus is a commensal of humans that can cause a spectrum of diseases. An isolate’s capacity to cause disease is partially attributed to the acquisition of novel mobile genetic elements. This report provides the draft genome sequence of one methicillin susceptible and seven methicilli...
Use of Intragenic Sequence Ribotyping (ISR) for serotyping Salmonella obtained from poultry and their environment

USDA-ARS's Scientific Manuscript database

BACKGROUND: The dkgB-linked ribosomal region of Salmonella enterica flanking a 5S gene shows genetic heterogeneity that distinguishes closely related serovars such as Enteritidis, Dublin, Gallinarum and Pullorum (Morales et al, 2006). We wanted to know how sequence-based ISR compared to the traditio...
Whole-Genome Sequencing in Outbreak Analysis

PubMed Central

Turner, Stephen D.; Riley, Margaret F.; Petri, William A.; Hewlett, Erik L.

2015-01-01

SUMMARY In addition to the ever-present concern of medical professionals about epidemics of infectious diseases, the relative ease of access and low cost of obtaining, producing, and disseminating pathogenic organisms or biological toxins mean that bioterrorism activity should also be considered when facing a disease outbreak. Utilization of whole-genome sequencing (WGS) in outbreak analysis facilitates the rapid and accurate identification of virulence factors of the pathogen and can be used to identify the path of disease transmission within a population and provide information on the probable source. Molecular tools such as WGS are being refined and advanced at a rapid pace to provide robust and higher-resolution methods for identifying, comparing, and classifying pathogenic organisms. If these methods of pathogen characterization are properly applied, they will enable an improved public health response whether a disease outbreak was initiated by natural events or by accidental or deliberate human activity. The current application of next-generation sequencing (NGS) technology to microbial WGS and microbial forensics is reviewed. PMID:25876885
Cloning, analysis and functional annotation of expressed sequence tags from the Earthworm Eisenia fetida

PubMed Central

Pirooznia, Mehdi; Gong, Ping; Guan, Xin; Inouye, Laura S; Yang, Kuan; Perkins, Edward J; Deng, Youping

2007-01-01

Background Eisenia fetida, commonly known as red wiggler or compost worm, belongs to the Lumbricidae family of the Annelida phylum. Little is known about its genome sequence although it has been extensively used as a test organism in terrestrial ecotoxicology. In order to understand its gene expression response to environmental contaminants, we cloned 4032 cDNAs or expressed sequence tags (ESTs) from two E. fetida libraries enriched with genes responsive to ten ordnance related compounds using suppressive subtractive hybridization-PCR. Results A total of 3144 good quality ESTs (GenBank dbEST accession number EH669363–EH672369 and EL515444–EL515580) were obtained from the raw clone sequences after cleaning. Clustering analysis yielded 2231 unique sequences including 448 contigs (from 1361 ESTs) and 1783 singletons. Comparative genomic analysis showed that 743 or 33% of the unique sequences shared high similarity with existing genes in the GenBank nr database. Provisional function annotation assigned 830 Gene Ontology terms to 517 unique sequences based on their homology with the annotated genomes of four model organisms: Drosophila melanogaster, Mus musculus, Saccharomyces cerevisiae, and Caenorhabditis elegans. Seven percent of the unique sequences were further mapped to 99 Kyoto Encyclopedia of Genes and Genomes pathways based on their matching Enzyme Commission numbers. All the information is stored in, and retrievable from, a high-performance, web-based and user-friendly relational database called EST model database or ESTMD version 2. Conclusion The ESTMD containing the sequence and annotation information of 4032 E. fetida ESTs is publicly accessible at . PMID:18047730
Does an integrated Emergency Department Information System change the sequence of clinical work? A mixed-method cross-site study.

PubMed

Callen, Joanne; Li, Ling; Georgiou, Andrew; Paoloni, Richard; Gibson, Kathryn; Li, Julie; Stewart, Michael; Braithwaite, Jeffrey; Westbrook, Johanna I

2014-12-01

(1) to describe Emergency Department (ED) physicians' and nurses' perceptions about the sequence of work related to patient management with use of an integrated Emergency Department Information System (EDIS), and (2) to measure changes in the sequence of clinician access to patient information. A mixed method study was conducted in four metropolitan EDs. Each used the same EDIS which is a module of the hospitals' enterprise-wide clinical information system composed of many components of an electronic medical record. This enabled access to clinical and management information relating to patients attending all hospitals in the region. Phase one - data were collected from ED physicians and nurses (n=97) by 69 in-depth interviews, five focus groups (28 participants), and 26 h of observations. Phase two - physicians (n=34) in one ED were observed over 2 weeks. Data included whether and what type of information was accessed from the EDIS prior to first examination of the patient. Clinicians reported, and phase 2 observations confirmed, that the integrated EDIS led to changes to the order of information access, which held implications for when tests were ordered and results accessed. Most physicians accessed patient information using EDIS prior to taking the patients' first medical history (77/116; 66.4%, 95% CI: 57.8-75.0%). Previous discharge summaries (74%) and past test results (61%) were most frequently accessed and junior doctors were more likely to access electronic past history information than their senior colleagues (χ(2)=20.717, d.f.=1, p<0.001). The integrated EDIS created new ways of working for ED clinicians. Such changes could hold positive implications for: time taken to reach a diagnosis and deliver treatments; length of stay; patient outcomes and experiences. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
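The interval reported above for 77/116 can be reproduced with a normal-approximation (Wald) interval. The abstract does not say which method the authors used, so the choice of the Wald formula here is an assumption that happens to match the printed figures; the function name is illustrative.

```python
import math

def wald_interval(successes, total, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.

    Returns (point estimate, lower bound, upper bound) as fractions.
    """
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, p - half_width, p + half_width

# 77 of 116 physicians accessed EDIS before taking the first history
p, low, high = wald_interval(77, 116)
print(f"{100 * p:.1f}% (95% CI {100 * low:.1f}-{100 * high:.1f}%)")
# prints: 66.4% (95% CI 57.8-75.0%)
```

For small samples or proportions near 0 or 1, a Wilson or exact interval would usually be preferred over this approximation.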
NGS Catalog: A Database of Next Generation Sequencing Studies in Humans

PubMed Central

Xia, Junfeng; Wang, Qingguo; Jia, Peilin; Wang, Bing; Pao, William; Zhao, Zhongming

2015-01-01

Next generation sequencing (NGS) technologies have been rapidly applied in biomedical and biological research since its advent only a few years ago, and they are expected to advance at an unprecedented pace in the following years. To provide the research community with a comprehensive NGS resource, we have developed the database Next Generation Sequencing Catalog (NGS Catalog, http://bioinfo.mc.vanderbilt.edu/NGS/index.html), a continually updated database that collects, curates and manages available human NGS data obtained from published literature. NGS Catalog deposits publication information of NGS studies and their mutation characteristics (SNVs, small insertions/deletions, copy number variations, and structural variants), as well as mutated genes and gene fusions detected by NGS. Other functions include user data upload, NGS general analysis pipelines, and NGS software. NGS Catalog is particularly useful for investigators who are new to NGS but would like to take advantage of these powerful technologies for their own research. Finally, based on the data deposited in NGS Catalog, we summarized features and findings from whole exome sequencing, whole genome sequencing, and transcriptome sequencing studies for human diseases or traits. PMID:22517761
Protein sequences from mastodon and Tyrannosaurus rex revealed by mass spectrometry.

PubMed

Asara, John M; Schweitzer, Mary H; Freimark, Lisa M; Phillips, Matthew; Cantley, Lewis C

2007-04-13

Fossilized bones from extinct taxa harbor the potential for obtaining protein or DNA sequences that could reveal evolutionary links to extant species. We used mass spectrometry to obtain protein sequences from bones of a 160,000- to 600,000-year-old extinct mastodon (Mammut americanum) and a 68-million-year-old dinosaur (Tyrannosaurus rex). The presence of T. rex sequences indicates that their peptide bonds were remarkably stable. Mass spectrometry can thus be used to determine unique sequences from ancient organisms from peptide fragmentation patterns, a valuable tool to study the evolution and adaptation of ancient taxa from which genomic sequences are unlikely to be obtained.
MuffinInfo: HTML5-Based Statistics Extractor from Next-Generation Sequencing Data.

PubMed

Alic, Andy S; Blanquer, Ignacio

2016-09-01

Usually, the information known a priori about a newly sequenced organism is limited. Even resequencing the same organism can generate unpredictable output. We introduce MuffinInfo, a FastQ/Fasta/SAM information extractor implemented in HTML5 capable of offering insights into next-generation sequencing (NGS) data. Our new tool can run on any software or hardware environment, in command line or graphically, and in browser or standalone. It presents information such as average length, base distribution, quality scores distribution, k-mer histogram, and homopolymers analysis. MuffinInfo improves upon the existing extractors by adding the ability to save and then reload the results obtained after a run as a navigable file (also supporting saving pictures of the charts), by supporting custom statistics implemented by the user, and by offering user-adjustable parameters involved in the processing, all in one software. At the moment, the extractor works with all base space technologies such as Illumina, Roche, Ion Torrent, Pacific Biosciences, and Oxford Nanopore. Owing to HTML5, our software demonstrates the readiness of web technologies for mild intensive tasks encountered in bioinformatics.
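The kinds of summary statistics listed in this abstract (average length, base distribution, k-mer histogram, homopolymers) can be sketched in a few lines. This is a minimal illustration in Python, not MuffinInfo's code (which is HTML5-based); the function name and the returned dictionary keys are hypothetical, and reads are assumed to be already parsed into plain strings.

```python
from collections import Counter
from itertools import groupby

def read_stats(reads, k=3):
    """Summarize a list of read sequences given as plain strings of bases."""
    total_bases = sum(len(r) for r in reads)
    base_counts = Counter("".join(reads))
    # Count every overlapping k-mer in every read
    kmer_counts = Counter(
        r[i:i + k] for r in reads for i in range(len(r) - k + 1)
    )
    # Longest run of one repeated base across all reads
    longest_homopolymer = max(
        (len(list(run)) for r in reads for _, run in groupby(r)), default=0
    )
    return {
        "average_length": total_bases / len(reads) if reads else 0.0,
        "base_distribution": {b: n / total_bases for b, n in base_counts.items()},
        "kmer_histogram": kmer_counts,
        "longest_homopolymer": longest_homopolymer,
    }

stats = read_stats(["ACGTTTTA", "GGGACG"])
print(stats["average_length"], stats["longest_homopolymer"])  # 7.0 4
```

Quality-score distributions are left out because they require the FastQ quality line and a known encoding offset.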
Characterization of a new apple luteovirus identified by high-throughput sequencing.

PubMed

Liu, Huawei; Wu, Liping; Nikolaeva, Ekaterina; Peter, Kari; Liu, Zongrang; Mollov, Dimitre; Cao, Mengji; Li, Ruhui

2018-05-15

'Rapid Apple Decline' (RAD) is a newly emerging problem of young, dwarf apple trees in the Northeastern USA. The affected trees show trunk necrosis, cracking and canker before collapse in summer. In this study, we discovered and characterized a new luteovirus from apple trees in RAD-affected orchards using high-throughput sequencing (HTS) technology and subsequent Sanger sequencing. Illumina NextSeq sequencing was applied to total RNAs prepared from three diseased apple trees. Sequence reads were de novo assembled, and contigs were annotated by BLASTx. RT-PCR and 5'/3' RACE sequencing were used to obtain the complete genome of a new virus. RT-PCR was used to detect the virus. Three common apple viruses and a new luteovirus were identified from the diseased trees by HTS and RT-PCR. Sequence analyses of the complete genome of the new virus show that it is a new species of the genus Luteovirus in the family Luteoviridae. The virus is graft transmissible and detected by RT-PCR in apple trees in a couple of orchards. A new luteovirus and/or three known viruses were found to be associated with RAD. Molecular characterization of the new luteovirus provides important information for further investigation of its distribution and etiological role.

Hairpin Bisulfite Sequencing: Synchronous Methylation Analysis on Complementary DNA Strands of Individual Chromosomes.

PubMed

Giehr, Pascal; Walter, Jörn

2018-01-01

The accurate and quantitative detection of 5-methylcytosine is of great importance in the field of epigenetics. The method of choice is usually bisulfite sequencing because of its high resolution and the possibility of combining it with next generation sequencing. Nevertheless, this method also has its limitations. Following the bisulfite treatment, DNA strands are no longer complementary, so that in a subsequent PCR amplification the DNA methylation pattern information of only one of the two DNA strands is preserved. Several years ago, Hairpin Bisulfite sequencing was developed as a method to obtain the pattern information on complementary DNA strands. The method requires fragmentation (usually by enzymatic cleavage) of genomic DNA followed by covalent linking of both DNA strands through ligation of a short DNA hairpin oligonucleotide to both strands. The ligated, covalently linked dsDNA products are then subjected to a conventional bisulfite treatment during which all unmodified cytosines are converted to uracils. During the treatment the DNA is denatured, forming noncomplementary ssDNA circles. These circles serve as a template for a locus-specific PCR to amplify chromosomal patterns of the region of interest. As a result, one ends up with a linearized product that contains the methylation information of both complementary DNA strands.
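The loss of complementarity after bisulfite treatment, and the per-strand readout that a hairpin product makes possible, can be illustrated in silico. The sketch below is a simplified model under stated assumptions: the 8-base locus and the methylated positions are hypothetical, the hairpin linker and PCR step are omitted, and converted uracils are written as T, as they would be read after amplification.

```python
def bisulfite_convert(strand, methylated_positions):
    """Convert unmethylated C to T (U after treatment, read as T after PCR).

    Positions in `methylated_positions` (0-based) are 5-methylcytosines
    and are protected from conversion.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(strand)
    )

def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def cpg_dyad_states(top, top_conv, bottom_conv):
    """For each CpG in the unconverted top strand, report whether the
    cytosine on the top strand and the paired-CpG cytosine on the bottom
    strand survived conversion, i.e. were methylated."""
    n = len(top)
    states = {}
    for i in range(n - 1):
        if top[i:i + 2] == "CG":
            # The bottom-strand C of this CpG pairs with top position i + 1
            states[i] = (top_conv[i] == "C", bottom_conv[n - 2 - i] == "C")
    return states

top = "ACGTCCGA"                   # hypothetical locus, 5'->3'
bottom = reverse_complement(top)   # complementary strand, 5'->3'

top_conv = bisulfite_convert(top, {1, 5})     # both top CpGs methylated
bottom_conv = bisulfite_convert(bottom, {5})  # only one bottom CpG methylated

# After conversion the two strands are no longer complementary
print(reverse_complement(bottom_conv) == top_conv)  # False
print(cpg_dyad_states(top, top_conv, bottom_conv))
# {1: (True, True), 5: (True, False)}
```

In this toy case the CpG at position 1 is fully methylated and the CpG at position 5 is hemimethylated, a distinction that requires information from both strands of the same molecule.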
BAC sequencing using pooled methods.

PubMed

Saski, Christopher A; Feltus, F Alex; Parida, Laxmi; Haiminen, Niina

2015-01-01

Shotgun sequencing and assembly of a large, complex genome can be expensive, and accurately reconstructing the true genome sequence can be challenging. Repetitive DNA arrays, paralogous sequences, polyploidy, and heterozygosity are the main factors that plague de novo genome sequencing projects, which typically result in highly fragmented assemblies from which it is difficult to extract biological meaning. Targeted, sub-genomic sequencing offers complexity reduction by removing distal segments of the genome and a systematic mechanism for exploring prioritized genomic content through BAC sequencing. If one isolates and sequences the genome fraction that encodes the relevant biological information, then it is possible to reduce overall sequencing costs and efforts that target a genomic segment. This chapter describes the sub-genome assembly protocol for an organism based upon a BAC tiling path derived from a genome-scale physical map or from fine mapping using BACs to target sub-genomic regions. Methods that are described include BAC isolation and mapping, DNA sequencing, and sequence assembly.
Prefrontal neural correlates of memory for sequences.

PubMed

Averbeck, Bruno B; Lee, Daeyeol

2007-02-28

The sequence of actions appropriate to solve a problem often needs to be discovered by trial and error and recalled in the future when faced with the same problem. Here, we show that when monkeys had to discover and then remember a sequence of decisions across trials, ensembles of prefrontal cortex neurons reflected the sequence of decisions the animal would make throughout the interval between trials. This signal could reflect either an explicit memory process or a sequence-planning process that begins far in advance of the actual sequence execution. This finding extended to error trials such that, when the neural activity during the intertrial interval specified the wrong sequence, the animal also attempted to execute an incorrect sequence. More specifically, we used a decoding analysis to predict the sequence the monkey was planning to execute at the end of the fore-period, just before sequence execution. When this analysis was applied to error trials, we were able to predict where in the sequence the error would occur, up to three movements into the future. This suggests that prefrontal neural activity can retain information about sequences between trials, and that regardless of whether information is remembered correctly or incorrectly, the prefrontal activity veridically reflects the animal's action plan.
Draft genome sequences of two closely related aflatoxigenic Aspergillus species obtained from the Ivory Coast

USDA-ARS's Scientific Manuscript database

The genomes of the A. ochraceoroseus and A. rambellii type strains were sequenced using a personal genome machine, followed by annotation of their genes. The genome size for A. ochraceoroseus was found to be approximately 23 Mb and contained 7,837 genes, while the A. rambellii genome was found to be...
Automatic summarization of changes in biological image sequences using algorithmic information theory.

PubMed

Cohen, Andrew R; Bjornsson, Christopher S; Temple, Sally; Banker, Gary; Roysam, Badrinath

2009-08-01

An algorithmic information-theoretic method is presented for object-level summarization of meaningful changes in image sequences. Object extraction and tracking data are represented as an attributed tracking graph (ATG). Time courses of object states are compared using an adaptive information distance measure, aided by a closed-form multidimensional quantization. The notion of meaningful summarization is captured by using the gap statistic to estimate the randomness deficiency from algorithmic statistics. The summary is the clustering result and feature subset that maximize the gap statistic. This approach was validated on four bioimaging applications: 1) It was applied to a synthetic data set containing two populations of cells differing in the rate of growth, for which it correctly identified the two populations and the single feature out of 23 that separated them; 2) it was applied to 59 movies of three types of neuroprosthetic devices being inserted in the brain tissue at three speeds each, for which it correctly identified insertion speed as the primary factor affecting tissue strain; 3) when applied to movies of cultured neural progenitor cells, it correctly distinguished neurons from progenitors without requiring the use of a fixative stain; and 4) when analyzing intracellular molecular transport in cultured neurons undergoing axon specification, it automatically confirmed the role of kinesins in axon specification.
The Cucurbitaceae of India: Accepted names, synonyms, geographic distribution, and information on images and DNA sequences

PubMed Central

Renner, Susanne S.; Pandey, Arun K.

2013-01-01

Abstract The most recent critical checklists of the Cucurbitaceae of India are 30 years old. Since then, botanical exploration, online availability of specimen images and taxonomic literature, and molecular-phylogenetic studies have led to modified taxon boundaries and geographic ranges. We present a checklist of the Cucurbitaceae of India that treats 400 relevant names and provides information on the collecting locations and herbaria for all types. We accept 94 species (10 of them endemic) in 31 genera. For accepted species, we provide their geographic distribution inside and outside India, links to online images of herbarium or living specimens, and information on publicly available DNA sequences to highlight gaps in the current understanding of Indian cucurbit diversity. Of the 94 species, 79% have DNA sequences in GenBank, albeit rarely from Indian material. The most species-rich genera are Trichosanthes with 22 species, Cucumis with 11 (all but two wild), Momordica with 8, and Zehneria with 5. From an evolutionary point of view, India is of special interest because it harbors a wide range of lineages, many of them relatively old and phylogenetically isolated. Phytogeographically, the north eastern and peninsular regions are richest in species, while the Jammu Kashmir and Himachal regions have few Cucurbitaceae. Our checklist probably underestimates the true diversity of Indian Cucurbitaceae, but should help focus efforts towards the least known species and regions. PMID:23717193
Sequencing and comparative analyses of the genomes of zoysiagrasses

PubMed Central

Tanaka, Hidenori; Hirakawa, Hideki; Kosugi, Shunichi; Nakayama, Shinobu; Ono, Akiko; Watanabe, Akiko; Hashiguchi, Masatsugu; Gondo, Takahiro; Ishigaki, Genki; Muguerza, Melody; Shimizu, Katsuya; Sawamura, Noriko; Inoue, Takayasu; Shigeki, Yuichi; Ohno, Naoki; Tabata, Satoshi; Akashi, Ryo; Sato, Shusei

2016-01-01

Zoysia is a warm-season turfgrass, which comprises 11 allotetraploid species (2n = 4x = 40), each possessing different morphological and physiological traits. To characterize the genetic systems of Zoysia plants and to analyse their structural and functional differences in individual species and accessions, we sequenced the genomes of Zoysia species using HiSeq and MiSeq platforms. As a reference sequence of Zoysia species, we generated a high-quality draft sequence of the genome of Z. japonica accession ‘Nagirizaki’ (334 Mb) in which 59,271 protein-coding genes were predicted. In parallel, draft genome sequences of Z. matrella ‘Wakaba’ and Z. pacifica ‘Zanpa’ were also generated for comparative analyses. To investigate the genetic diversity among the Zoysia species, genome sequence reads of three additional accessions, Z. japonica ‘Kyoto’, Z. japonica ‘Miyagi’ and Z. matrella ‘Chiba Fair Green’, were accumulated, and aligned against the reference genome of ‘Nagirizaki’ along with those from ‘Wakaba’ and ‘Zanpa’. As a result, we detected 7,424,163 single-nucleotide polymorphisms and 852,488 short indels among these species. The information obtained in this study will be valuable for basic studies on zoysiagrass evolution and genetics as well as for the breeding of zoysiagrasses, and is made available in the ‘Zoysia Genome Database’ at http://zoysia.kazusa.or.jp. PMID:26975196
Sequencing and comparative analyses of the genomes of zoysiagrasses.

PubMed

Tanaka, Hidenori; Hirakawa, Hideki; Kosugi, Shunichi; Nakayama, Shinobu; Ono, Akiko; Watanabe, Akiko; Hashiguchi, Masatsugu; Gondo, Takahiro; Ishigaki, Genki; Muguerza, Melody; Shimizu, Katsuya; Sawamura, Noriko; Inoue, Takayasu; Shigeki, Yuichi; Ohno, Naoki; Tabata, Satoshi; Akashi, Ryo; Sato, Shusei

2016-04-01

Zoysia is a warm-season turfgrass, which comprises 11 allotetraploid species (2n = 4x = 40), each possessing different morphological and physiological traits. To characterize the genetic systems of Zoysia plants and to analyse their structural and functional differences in individual species and accessions, we sequenced the genomes of Zoysia species using HiSeq and MiSeq platforms. As a reference sequence of Zoysia species, we generated a high-quality draft sequence of the genome of Z. japonica accession 'Nagirizaki' (334 Mb) in which 59,271 protein-coding genes were predicted. In parallel, draft genome sequences of Z. matrella 'Wakaba' and Z. pacifica 'Zanpa' were also generated for comparative analyses. To investigate the genetic diversity among the Zoysia species, genome sequence reads of three additional accessions, Z. japonica 'Kyoto', Z. japonica 'Miyagi' and Z. matrella 'Chiba Fair Green', were accumulated, and aligned against the reference genome of 'Nagirizaki' along with those from 'Wakaba' and 'Zanpa'. As a result, we detected 7,424,163 single-nucleotide polymorphisms and 852,488 short indels among these species. The information obtained in this study will be valuable for basic studies on zoysiagrass evolution and genetics as well as for the breeding of zoysiagrasses, and is made available in the 'Zoysia Genome Database' at http://zoysia.kazusa.or.jp. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
[Study on ITS sequences of Aconitum vilmorinianum and its medicinal adulterant].

PubMed

Zhang, Xiao-nan; Du, Chun-hua; Fu, De-huan; Gao, Li; Zhou, Pei-jun; Wang, Li

2012-09-01

To analyze and compare the ITS sequences of Aconitum vilmorinianum and its medicinal adulterant Aconitum austroyunnanense. Total genomic DNA was extracted from sample materials by an improved CTAB method; ITS sequences of the samples were amplified by PCR, directly sequenced, and analyzed using the software DNAStar, ClustalX 1.81 and MEGA 4.0. In the ITS1 sequences, 299 consistent sites, 19 variable sites and 13 informative sites were found; in the 5.8S sequences, 162 consistent sites, 2 variable sites and 1 informative site; and in the ITS2 sequences, 217 consistent sites, 3 variable sites and 1 informative site. Comparing the ITS sequence data matrix, the 5.8S sequences were the only region in which no base transition or transversion was found; 2 transitions and 1 transversion were found in the ITS1 sequences, and only 1 transversion was found in the ITS2 sequences. By analyzing the ITS sequence data matrix from 2 populations of Aconitum vilmorinianum and 3 populations of Aconitum austroyunnanense, we found a stable informative site at the 596th base in the ITS2 sequences: in all the samples of Aconitum vilmorinianum the base was C, and in all the samples of Aconitum austroyunnanense the base was A. Aconitum vilmorinianum and Aconitum austroyunnanense can be identified by the characters of their ITS sequences, and the ITS1 sequences contain more variable sites than the ITS2 sequences.
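A stable informative site of the kind described above (one base in every sample of one species, a different base in every sample of the other) can be located by scanning the columns of an alignment. The sketch below is a minimal illustration, assuming the sequences are already aligned to equal length; the toy sequences are hypothetical and are not the published ITS data, and the function name is illustrative.

```python
def diagnostic_sites(group_a, group_b):
    """Return 1-based alignment columns that are fixed within each group
    but differ between the two groups, with the base seen in each group."""
    length = len(group_a[0])
    sites = []
    for col in range(length):
        bases_a = {seq[col] for seq in group_a}
        bases_b = {seq[col] for seq in group_b}
        if len(bases_a) == 1 and len(bases_b) == 1 and bases_a != bases_b:
            sites.append((col + 1, bases_a.pop(), bases_b.pop()))
    return sites

# Toy aligned sequences (hypothetical, not the published ITS data)
vilmorinianum = ["ACGTCA", "ACGTCA"]
austroyunnanense = ["ACGTAA", "ATGTAA", "ACGTAA"]

print(diagnostic_sites(vilmorinianum, austroyunnanense))  # [(5, 'C', 'A')]
```

Column 2 varies within the second group and is therefore not reported, whereas column 5 is fixed for C in one group and A in the other, mirroring the C/A pattern reported at position 596.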
Progressive structure-based alignment of homologous proteins: Adopting sequence comparison strategies.

PubMed

Joseph, Agnel Praveen; Srinivasan, Narayanaswamy; de Brevern, Alexandre G

2012-09-01

Comparison of multiple protein structures has a broad range of applications in the analysis of protein structure, function and evolution. Multiple structure alignment tools (MSTAs) are necessary to obtain a simultaneous comparison of a family of related folds. In this study, we have developed a method for multiple structure comparison largely based on sequence alignment techniques. A widely used Structural Alphabet named Protein Blocks (PBs) was used to transform the information on 3D protein backbone conformation into a 1D sequence string. A progressive alignment strategy similar to CLUSTALW was adopted for multiple PB sequence alignment (mulPBA). Highly similar stretches identified by the pairwise alignments are given higher weights during the alignment. The residue equivalences from PB-based alignments are used to obtain a three-dimensional fit of the structures, followed by an iterative refinement of the structural superposition. Systematic comparisons using benchmark datasets of MSTAs underline that the alignment quality is better than MULTIPROT, MUSTANG and the alignments in HOMSTRAD in more than 85% of the cases. Comparison with other rigid-body and flexible MSTAs also indicates that mulPBA alignments are superior to most of the rigid-body MSTAs and highly comparable to the flexible alignment methods. Copyright © 2012 Elsevier Masson SAS. All rights reserved.
Correction of projective distortion in long-image-sequence mosaics without prior information

NASA Astrophysics Data System (ADS)

Yang, Chenhui; Mao, Hongwei; Abousleman, Glen; Si, Jennie

2010-04-01

Image mosaicking is the process of piecing together multiple video frames or still images from a moving camera to form a wide-area or panoramic view of the scene being imaged. Mosaics have widespread applications in many areas such as security surveillance, remote sensing, geographical exploration, agricultural field surveillance, virtual reality, digital video, and medical image analysis, among others. When mosaicking a large number of still images or video frames, the quality of the resulting mosaic is compromised by projective distortion. That is, during the mosaicking process, the image frames that are transformed and pasted to the mosaic become significantly scaled down and appear out of proportion with respect to the mosaic. As more frames continue to be transformed, important target information in the frames can be lost since the transformed frames become too small, which eventually leads to the inability to continue further. Some projective distortion correction techniques make use of prior information such as GPS information embedded within the image, or camera internal and external parameters. Alternatively, this paper proposes a new algorithm to reduce the projective distortion without using any prior information whatsoever. Based on the analysis of the projective distortion, we approximate the projective matrix that describes the transformation between image frames using an affine model. Using singular value decomposition, we can deduce the affine model scaling factor that is usually very close to 1. By resetting the image scale of the affine model to 1, the transformed image size remains unchanged. Even though the proposed correction introduces some error in the image matching, this error is typically acceptable and more importantly, the final mosaic preserves the original image size after transformation. We demonstrate the effectiveness of this new correction algorithm on two real-world unmanned air vehicle (UAV) sequences. The proposed method is
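The scale-reset idea described in this abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' algorithm: the affine approximation is taken here by simply dropping the perspective terms of the 3x3 matrix, the scaling factor is assumed to be the geometric mean of the two singular values of the 2x2 linear part, and all function names are hypothetical. The singular values are computed in closed form from the eigenvalues of A^T A so that the sketch needs only the standard library.

```python
import math


def singular_values_2x2(a, b, c, d):
    """Singular values of [[a, b], [c, d]] via the eigenvalues of A^T A."""
    trace = a * a + b * b + c * c + d * d      # trace of A^T A
    det = (a * d - b * c) ** 2                 # determinant of A^T A
    disc = math.sqrt(max(trace * trace / 4.0 - det, 0.0))
    return (math.sqrt(trace / 2.0 + disc),
            math.sqrt(max(trace / 2.0 - disc, 0.0)))


def reset_affine_scale(H):
    """Approximate a projective matrix by an affine one with unit scale.

    Assumptions (not from the abstract): perspective terms are discarded,
    and the scale is the geometric mean of the singular values.
    """
    w = H[2][2]
    a, b, tx = H[0][0] / w, H[0][1] / w, H[0][2] / w
    c, d, ty = H[1][0] / w, H[1][1] / w, H[1][2] / w
    s1, s2 = singular_values_2x2(a, b, c, d)
    scale = math.sqrt(s1 * s2)
    affine = [[a / scale, b / scale, tx],
              [c / scale, d / scale, ty],
              [0.0, 0.0, 1.0]]
    return affine, scale


# Hypothetical frame-to-frame transform: 30 degree rotation, 2% shrink.
theta, shrink = math.pi / 6, 0.98
H_example = [[shrink * math.cos(theta), -shrink * math.sin(theta), 5.0],
             [shrink * math.sin(theta), shrink * math.cos(theta), -3.0],
             [1e-4, 2e-4, 1.0]]
affine_example, scale_example = reset_affine_scale(H_example)
```

After the reset, the linear part of the returned matrix has determinant 1, so repeated composition over a long sequence no longer shrinks the pasted frames, at the cost of a small matching error.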
The VirusBanker database uses a Java program to allow flexible searching through Bunyaviridae sequences

PubMed Central

Fourment, Mathieu; Gibbs, Mark J

2008-01-01

Background: Viruses of the Bunyaviridae have segmented negative-stranded RNA genomes and several of them cause significant disease. Many partial sequences have been obtained from the segments so that GenBank searches give complex results. Sequence databases usually use HTML pages to mediate remote sorting, but this approach can be limiting and may discourage a user from exploring a database. Results: The VirusBanker database contains Bunyaviridae sequences and alignments and is presented as two spreadsheets generated by a Java program that interacts with a MySQL database on a server. Sequences are displayed in rows and may be sorted using information that is displayed in columns and includes data relating to the segment, gene, protein, species, strain, sequence length, terminal sequence and date and country of isolation. Bunyaviridae sequences and alignments may be downloaded from the second spreadsheet with titles defined by the user from the columns, or viewed when passed directly to the sequence editor, Jalview. Conclusion: VirusBanker allows large datasets of aligned nucleotide and protein sequences from the Bunyaviridae to be compiled and winnowed rapidly using criteria that are formulated heuristically. PMID:18251994
From cultured to uncultured genome sequences: metagenomics and modeling microbial ecosystems.

PubMed

Garza, Daniel R; Dutilh, Bas E

2015-11-01

Microorganisms and the viruses that infect them are the most numerous biological entities on Earth and enclose its greatest biodiversity and genetic reservoir. With strength in their numbers, these microscopic organisms are major players in the cycles of energy and matter that sustain all life. Scientists have only scratched the surface of this vast microbial world through culture-dependent methods. Recent developments in generating metagenomes, large random samples of nucleic acid sequences isolated directly from the environment, are providing comprehensive portraits of the composition, structure, and functioning of microbial communities. Moreover, advances in metagenomic analysis have created the possibility of obtaining complete or nearly complete genome sequences from uncultured microorganisms, providing important means to study their biology, ecology, and evolution. Here we review some of the recent developments in the field of metagenomics, focusing on the discovery of genetic novelty and on methods for obtaining uncultured genome sequences, including through the recycling of previously published datasets. Moreover we discuss how metagenomics has become a core scientific tool to characterize eco-evolutionary patterns of microbial ecosystems, thus allowing us to simultaneously discover new microbes and study their natural communities. We conclude by discussing general guidelines and challenges for modeling the interactions between uncultured microorganisms and viruses based on the information contained in their genome sequences. These models will significantly advance our understanding of the functioning of microbial ecosystems and the roles of microbes in the environment.
Single nucleotide polymorphism analysis of Korean native chickens using next generation sequencing data.

PubMed

Seo, Dong-Won; Oh, Jae-Don; Jin, Shil; Song, Ki-Duk; Park, Hee-Bok; Heo, Kang-Nyeong; Shin, Younhee; Jung, Myunghee; Park, Junhyung; Jo, Cheorun; Lee, Hak-Kyo; Lee, Jun-Heon

2015-02-01

There are five native chicken lines in Korea, which are mainly classified by plumage colors (black, white, red, yellow, gray). These five lines are very important genetic resources in the Korean poultry industry. Based on a next generation sequencing technology, whole genome sequence and reference assemblies were performed using Gallus_gallus_4.0 (NCBI) with whole genome sequences from these lines to identify common and novel single nucleotide polymorphisms (SNPs). We obtained 36,660,731,136 ± 1,257,159,120 bp of raw sequence and average 26.6-fold of 25-29 billion reference assembly sequences representing 97.288 % coverage. Also, 4,006,068 ± 97,534 SNPs were observed from 29 autosomes and the Z chromosome and, of these, 752,309 SNPs are the common SNPs across lines. Among the identified SNPs, the number of novel- and known-location assigned SNPs was 1,047,951 ± 14,956 and 2,948,648 ± 81,414, respectively. The number of unassigned known SNPs was 1,181 ± 150 and unassigned novel SNPs was 8,238 ± 1,019. Synonymous SNPs, non-synonymous SNPs, and SNPs having character changes were 26,266 ± 1,456, 11,467 ± 604, 8,180 ± 458, respectively. Overall, 443,048 ± 26,389 SNPs in each bird were identified by comparing with dbSNP in NCBI. The presently obtained genome sequence and SNP information in Korean native chickens have wide applications for further genome studies such as genetic diversity studies to detect causative mutations for economic and disease related traits.
SOMKE: kernel density estimation over data streams by sequences of self-organizing maps.

PubMed

Cao, Yuan; He, Haibo; Man, Hong

2012-08-01

In this paper, we propose a novel method SOMKE, for kernel density estimation (KDE) over data streams based on sequences of self-organizing map (SOM). In many stream data mining applications, the traditional KDE methods are infeasible because of the high computational cost, processing time, and memory requirement. To reduce the time and space complexity, we propose a SOM structure in this paper to obtain well-defined data clusters to estimate the underlying probability distributions of incoming data streams. The main idea of this paper is to build a series of SOMs over the data streams via two operations, that is, creating and merging the SOM sequences. The creation phase produces the SOM sequence entries for windows of the data, which obtains clustering information of the incoming data streams. The size of the SOM sequences can be further reduced by combining the consecutive entries in the sequence based on the measure of Kullback-Leibler divergence. Finally, the probability density functions over arbitrary time periods along the data streams can be estimated using such SOM sequences. We compare SOMKE with two other KDE methods for data streams, the M-kernel approach and the cluster kernel approach, in terms of accuracy and processing time for various stationary data streams. Furthermore, we also investigate the use of SOMKE over nonstationary (evolving) data streams, including a synthetic nonstationary data stream, a real-world financial data stream and a group of network traffic data streams. The simulation results illustrate the effectiveness and efficiency of the proposed approach.
Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data.

PubMed Central

Drummond, Alexei J; Nicholls, Geoff K; Rodrigo, Allen G; Solomon, Wiremu

2002-01-01

Molecular sequences obtained at different sampling times from populations of rapidly evolving pathogens and from ancient subfossil and fossil sources are increasingly available with modern sequencing technology. Here, we present a Bayesian statistical inference approach to the joint estimation of mutation rate and population size that incorporates the uncertainty in the genealogy of such temporally spaced sequences by using Markov chain Monte Carlo (MCMC) integration. The Kingman coalescent model is used to describe the time structure of the ancestral tree. We recover information about the unknown true ancestral coalescent tree, population size, and the overall mutation rate from temporally spaced data, that is, from nucleotide sequences gathered at different times, from different individuals, in an evolving haploid population. We briefly discuss the methodological implications and show what can be inferred, in various practically relevant states of prior knowledge. We develop extensions for exponentially growing population size and joint estimation of substitution model parameters. We illustrate some of the important features of this approach on a genealogy of HIV-1 envelope (env) partial sequences. PMID:12136032
Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data.

PubMed

Drummond, Alexei J; Nicholls, Geoff K; Rodrigo, Allen G; Solomon, Wiremu

2002-07-01

Molecular sequences obtained at different sampling times from populations of rapidly evolving pathogens and from ancient subfossil and fossil sources are increasingly available with modern sequencing technology. Here, we present a Bayesian statistical inference approach to the joint estimation of mutation rate and population size that incorporates the uncertainty in the genealogy of such temporally spaced sequences by using Markov chain Monte Carlo (MCMC) integration. The Kingman coalescent model is used to describe the time structure of the ancestral tree. We recover information about the unknown true ancestral coalescent tree, population size, and the overall mutation rate from temporally spaced data, that is, from nucleotide sequences gathered at different times, from different individuals, in an evolving haploid population. We briefly discuss the methodological implications and show what can be inferred, in various practically relevant states of prior knowledge. We develop extensions for exponentially growing population size and joint estimation of substitution model parameters. We illustrate some of the important features of this approach on a genealogy of HIV-1 envelope (env) partial sequences.
Some identities of generalized Fibonacci sequence

NASA Astrophysics Data System (ADS)

Chong, Chin-Yoon; Cheah, C. L.; Ho, C. K.

2014-07-01

We introduced the generalized Fibonacci sequence $\{U_n\}$ defined by $U_0 = 0$, $U_1 = 1$, and $U_{n+2} = pU_{n+1} + qU_n$ for all $p, q \in \mathbb{Z}^+$ and for all non-negative integers $n$. In this paper, we obtained some recursive formulas of the sequence.
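As a small illustration of the recurrence defined in this abstract, the following Python sketch generates the first terms of the sequence for given positive integers p and q. The function name is hypothetical, and the sketch shows only the definition, not the identities derived in the paper.

```python
def generalized_fibonacci(p, q, count):
    """Return the first `count` terms of U with U0 = 0, U1 = 1 and
    U(n+2) = p * U(n+1) + q * U(n), for positive integers p and q."""
    if p <= 0 or q <= 0:
        raise ValueError("p and q must be positive integers")
    if count < 0:
        raise ValueError("count must be non-negative")
    terms = [0, 1]
    while len(terms) < count:
        terms.append(p * terms[-1] + q * terms[-2])
    return terms[:count]


classic = generalized_fibonacci(1, 1, 8)   # ordinary Fibonacci numbers
pell = generalized_fibonacci(2, 1, 6)      # p = 2, q = 1 gives the Pell numbers
```

With p = q = 1 the recurrence reduces to the ordinary Fibonacci sequence, which gives a quick sanity check for any identity stated in terms of general p and q.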
Collaborative Filtering Recommendation on Users' Interest Sequences.

PubMed

Cheng, Weijie; Yin, Guisheng; Dong, Yuxin; Dong, Hongbin; Zhang, Wansong

2016-01-01

As an important factor for improving recommendations, time information has been introduced to model users' dynamic preferences in many papers. However, the sequence of users' behaviour is rarely studied in recommender systems. Due to the users' unique behavior evolution patterns and personalized interest transitions among items, users' similarity in sequential dimension should be introduced to further distinguish users' preferences and interests. In this paper, we propose a new collaborative filtering recommendation method based on users' interest sequences (IS) that rank users' ratings or other online behaviors according to the timestamps when they occurred. This method extracts the semantics hidden in the interest sequences by the length of users' longest common sub-IS (LCSIS) and the count of users' total common sub-IS (ACSIS). Then, these semantics are utilized to obtain users' IS-based similarities and, further, to refine the similarities acquired from traditional collaborative filtering approaches. With these updated similarities, transition characteristics and dynamic evolution patterns of users' preferences are considered. Our new proposed method was compared with state-of-the-art time-aware collaborative filtering algorithms on datasets MovieLens, Flixster and Ciao. The experimental results validate that the proposed recommendation method is effective and outperforms several existing algorithms in the accuracy of rating prediction.
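The LCSIS quantity mentioned in this abstract can be sketched with the classic longest-common-subsequence dynamic program. This is a sketch under assumptions: it treats a "sub-IS" as an order-preserving (not necessarily contiguous) subsequence of item identifiers, which the abstract does not spell out, and it does not attempt the ACSIS count or the authors' similarity formula. Function and variable names are hypothetical.

```python
def lcs_length(seq_a, seq_b):
    """Length of the longest common subsequence of two interest sequences.

    Each sequence is a list of item identifiers ordered by timestamp.
    Uses the standard O(len(a) * len(b)) dynamic program with one row
    of memory.
    """
    prev = [0] * (len(seq_b) + 1)
    for item_a in seq_a:
        curr = [0]
        for j, item_b in enumerate(seq_b, start=1):
            if item_a == item_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


# Hypothetical interest sequences of two users (item ids in time order).
user_1 = [1, 2, 3, 4, 5]
user_2 = [2, 4, 5, 1]
shared_length = lcs_length(user_1, user_2)
```

Here both users encountered items 2, 4 and 5 in the same order, so the shared length is 3 even though item 1 appears at different positions in the two histories.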
Collaborative Filtering Recommendation on Users’ Interest Sequences

PubMed Central

Cheng, Weijie; Yin, Guisheng; Dong, Yuxin; Dong, Hongbin; Zhang, Wansong

2016-01-01

As an important factor for improving recommendations, time information has been introduced to model users’ dynamic preferences in many papers. However, the sequence of users’ behaviour is rarely studied in recommender systems. Due to the users’ unique behavior evolution patterns and personalized interest transitions among items, users’ similarity in sequential dimension should be introduced to further distinguish users’ preferences and interests. In this paper, we propose a new collaborative filtering recommendation method based on users’ interest sequences (IS) that rank users’ ratings or other online behaviors according to the timestamps when they occurred. This method extracts the semantics hidden in the interest sequences by the length of users’ longest common sub-IS (LCSIS) and the count of users’ total common sub-IS (ACSIS). Then, these semantics are utilized to obtain users’ IS-based similarities and, further, to refine the similarities acquired from traditional collaborative filtering approaches. With these updated similarities, transition characteristics and dynamic evolution patterns of users’ preferences are considered. Our new proposed method was compared with state-of-the-art time-aware collaborative filtering algorithms on datasets MovieLens, Flixster and Ciao. The experimental results validate that the proposed recommendation method is effective and outperforms several existing algorithms in the accuracy of rating prediction. PMID:27195787

Integrating biogeochemistry with multiomic sequence information in a model oxygen minimum zone

PubMed Central

Hawley, Alyse K.; Katsev, Sergei; Torres-Beltran, Monica; Bhatia, Maya P.; Kheirandish, Sam; Michiels, Céline C.; Capelle, David; Lavik, Gaute; Doebeli, Michael; Crowe, Sean A.; Hallam, Steven J.

2016-01-01

Microorganisms are the most abundant lifeform on Earth, mediating global fluxes of matter and energy. Over the past decade, high-throughput molecular techniques generating multiomic sequence information (DNA, mRNA, and protein) have transformed our perception of this microcosmos, conceptually linking microorganisms at the individual, population, and community levels to a wide range of ecosystem functions and services. Here, we develop a biogeochemical model that describes metabolic coupling along the redox gradient in Saanich Inlet—a seasonally anoxic fjord with biogeochemistry analogous to oxygen minimum zones (OMZs). The model reproduces measured biogeochemical process rates as well as DNA, mRNA, and protein concentration profiles across the redox gradient. Simulations make predictions about the role of ubiquitous OMZ microorganisms in mediating carbon, nitrogen, and sulfur cycling. For example, nitrite “leakage” during incomplete sulfide-driven denitrification by SUP05 Gammaproteobacteria is predicted to support inorganic carbon fixation and intense nitrogen loss via anaerobic ammonium oxidation. This coupling creates a metabolic niche for nitrous oxide reduction that completes denitrification by currently unidentified community members. These results quantitatively improve previous conceptual models describing microbial metabolic networks in OMZs. Beyond OMZ-specific predictions, model results indicate that geochemical fluxes are robust indicators of microbial community structure and reciprocally, that gene abundances and geochemical conditions largely determine gene expression patterns. The integration of real observational data, including geochemical profiles and process rate measurements as well as metagenomic, metatranscriptomic and metaproteomic sequence data, into a biogeochemical model, as shown here, enables holistic insight into the microbial metabolic network driving nutrient and energy flow at ecosystem scales. PMID:27655888
Integrating biogeochemistry with multiomic sequence information in a model oxygen minimum zone.

PubMed

Louca, Stilianos; Hawley, Alyse K; Katsev, Sergei; Torres-Beltran, Monica; Bhatia, Maya P; Kheirandish, Sam; Michiels, Céline C; Capelle, David; Lavik, Gaute; Doebeli, Michael; Crowe, Sean A; Hallam, Steven J

2016-10-04

Microorganisms are the most abundant lifeform on Earth, mediating global fluxes of matter and energy. Over the past decade, high-throughput molecular techniques generating multiomic sequence information (DNA, mRNA, and protein) have transformed our perception of this microcosmos, conceptually linking microorganisms at the individual, population, and community levels to a wide range of ecosystem functions and services. Here, we develop a biogeochemical model that describes metabolic coupling along the redox gradient in Saanich Inlet, a seasonally anoxic fjord with biogeochemistry analogous to oxygen minimum zones (OMZs). The model reproduces measured biogeochemical process rates as well as DNA, mRNA, and protein concentration profiles across the redox gradient. Simulations make predictions about the role of ubiquitous OMZ microorganisms in mediating carbon, nitrogen, and sulfur cycling. For example, nitrite "leakage" during incomplete sulfide-driven denitrification by SUP05 Gammaproteobacteria is predicted to support inorganic carbon fixation and intense nitrogen loss via anaerobic ammonium oxidation. This coupling creates a metabolic niche for nitrous oxide reduction that completes denitrification by currently unidentified community members. These results quantitatively improve previous conceptual models describing microbial metabolic networks in OMZs. Beyond OMZ-specific predictions, model results indicate that geochemical fluxes are robust indicators of microbial community structure and reciprocally, that gene abundances and geochemical conditions largely determine gene expression patterns. The integration of real observational data, including geochemical profiles and process rate measurements as well as metagenomic, metatranscriptomic and metaproteomic sequence data, into a biogeochemical model, as shown here, enables holistic insight into the microbial metabolic network driving nutrient and energy flow at ecosystem scales.
Efficient generation of complete sequences of MDR-encoding plasmids by rapid assembly of MinION barcoding sequencing data.

PubMed

Li, Ruichao; Xie, Miaomiao; Dong, Ning; Lin, Dachuan; Yang, Xuemei; Wong, Marcus Ho Yin; Chan, Edward Wai-Chi; Chen, Sheng

2018-03-01

Multidrug resistance (MDR)-encoding plasmids are considered major molecular vehicles responsible for transmission of antibiotic resistance genes among bacteria of the same or different species. Delineating the complete sequences of such plasmids could provide valuable insight into the evolution and transmission mechanisms underlying bacterial antibiotic resistance development. However, due to the presence of multiple repeats of mobile elements, complete sequencing of MDR plasmids remains technically complicated, expensive, and time-consuming. Here, we demonstrate a rapid and efficient approach to obtaining multiple MDR plasmid sequences through the use of the MinION nanopore sequencing platform, which is incorporated in a portable device. By assembling the long sequencing reads generated by a single MinION run according to a rapid barcoding sequencing protocol, we obtained the complete sequences of 20 plasmids harbored by multiple bacterial strains. Importantly, single long reads covering a plasmid end-to-end were recorded, indicating that de novo assembly may be unnecessary if the single reads exhibit high accuracy. This workflow represents a convenient and cost-effective approach for systematic assessment of MDR plasmids responsible for treatment failure of bacterial infections, offering the opportunity to perform detailed molecular epidemiological studies to probe the evolutionary and transmission mechanisms of MDR-encoding elements.
Identification of Sequence Specificity of 5-Methylcytosine Oxidation by Tet1 Protein with High-Throughput Sequencing.

PubMed

Kizaki, Seiichiro; Chandran, Anandhakumar; Sugiyama, Hiroshi

2016-03-02

Tet (ten-eleven translocation) family proteins have the ability to oxidize 5-methylcytosine (mC) to 5-hydroxymethylcytosine (hmC), 5-formylcytosine (fC), and 5-carboxycytosine (caC). However, the oxidation reaction of Tet is not understood completely. Evaluation of genomic-level epigenetic changes by Tet protein requires unbiased identification of the highly selective oxidation sites. In this study, we used high-throughput sequencing to investigate the sequence specificity of mC oxidation by Tet1. A 6.6×10^4-member mC-containing random DNA-sequence library was constructed. The library was subjected to Tet-reactive pulldown followed by high-throughput sequencing. Analysis of the obtained sequence data identified the Tet1-reactive sequences. We identified mCpG as a highly reactive sequence of Tet1 protein. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Empirical Bayes Estimation of Coalescence Times from Nucleotide Sequence Data.

PubMed

King, Leandra; Wakeley, John

2016-09-01

We demonstrate the advantages of using information at many unlinked loci to better calibrate estimates of the time to the most recent common ancestor (TMRCA) at a given locus. To this end, we apply a simple empirical Bayes method to estimate the TMRCA. This method is both asymptotically optimal, in the sense that the estimator converges to the true value when the number of unlinked loci for which we have information is large, and has the advantage of not making any assumptions about demographic history. The algorithm works as follows: we first split the sample at each locus into inferred left and right clades to obtain many estimates of the TMRCA, which we can average to obtain an initial estimate of the TMRCA. We then use nucleotide sequence data from other unlinked loci to form an empirical distribution that we can use to improve this initial estimate. Copyright © 2016 by the Genetics Society of America.
Medical Comics as Tools to Aid in Obtaining Informed Consent for Stroke Care.

PubMed

Furuno, Yuichi; Sasajima, Hiroyasu

2015-07-01

Informed consent has now become common in medical practice. However, a gap still exists between doctors and patients in the understanding of clinical conditions. We designed medical comics about "subarachnoid hemorrhage" and "intracerebral hemorrhage" to help doctors obtain informed consent intuitively, quickly, and comprehensively. Between September 2010 and September 2012, we carried out a questionnaire survey about medical comics with the families of patients who had suffered an intracerebral or subarachnoid hemorrhage. The questionnaire consisted of 6 questions inquiring about their mental condition, reading time, usefulness of the comics in understanding brain function and anatomy, pathogenesis, doctor's explanation, and applicability of these comics. The results showed that 93.8% of responders would prefer or strongly prefer the use of comics in other medical situations. When considering the level of understanding of brain function and anatomy, pathology of disease, and doctor's explanation, 81.3%, 75.0%, and 68.8% of responders, respectively, rated these comics as very useful or useful. We think that the visual and narrative illustrations in medical comics would be more helpful for patients than a lengthy explanation by a doctor. Most of the responders hoped that medical comics would be applied to other medical cases. Thus, medical comics could work as a new communication tool between doctors and patients.
Flavitrack: an annotated database of flavivirus sequences

PubMed Central

Misra, Milind

2009-01-01

Motivation: Properly annotated sequence data for flaviviruses, which cause diseases such as tick-borne encephalitis (TBE), dengue fever (DF), West Nile (WN) and yellow fever (YF), can aid in the design of antiviral drugs and vaccines to prevent their spread. Flavitrack was designed to help identify conserved sequence motifs, interpret mutational and structural data and track evolution of phenotypic properties. Summary: Flavitrack contains over 590 complete flavivirus genome/protein sequences and information on known mutations and literature references. Each sequence has been manually annotated according to its date and place of isolation, phenotype and lethality. Internal tools are provided to rapidly determine relationships between viruses in Flavitrack and sequences provided by the user. Availability: http://carnot.utmb.edu/flavitrack Contact: chschein@utmb.edu Supplementary information: http://carnot.utmb.edu/flavitrack/B1S1.html PMID:17660525
De novo transcriptome sequencing of axolotl blastema for identification of differentially expressed genes during limb regeneration

PubMed Central

2013-01-01

Background: Salamanders are unique among vertebrates in their ability to completely regenerate amputated limbs through the mediation of blastema cells located at the stump ends. This regeneration is nerve-dependent because blastema formation and regeneration do not occur after limb denervation. To obtain the genomic information of blastema tissues, de novo transcriptomes from both blastema tissues and denervated stump ends of Ambystoma mexicanum (axolotls) 14 days post-amputation were sequenced and compared using Solexa DNA sequencing. Results: The sequencing done for this study produced 40,688,892 reads that were assembled into 307,345 transcribed sequences. The N50 of transcribed sequence length was 562 bases. A similarity search with known proteins identified 39,200 different genes to be expressed during limb regeneration with a cut-off E-value exceeding 10^-5. We annotated assembled sequences by using gene descriptions, gene ontology, and clusters of orthologous group terms. Targeted searches using these annotations showed that the majority of the genes were in the categories of essential metabolic pathways, transcription factors and conserved signaling pathways, and novel candidate genes for regenerative processes. We discovered and confirmed numerous sequences of the candidate genes by using quantitative polymerase chain reaction and in situ hybridization. Conclusion: The results of this study demonstrate that de novo transcriptome sequencing allows gene expression analysis in a species lacking genome information and provides the most comprehensive mRNA sequence resources for axolotls. The characterization of the axolotl transcriptome can help elucidate the molecular mechanisms underlying blastema formation during limb regeneration. PMID:23815514
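The N50 statistic quoted in this abstract can be illustrated with a short Python sketch. It uses the common definition, assumed here because the abstract does not restate it: the length L such that sequences of length at least L together contain at least half of all assembled bases. The function name and the toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; N50 is the length of
    the sequence at which the running total first reaches half of the
    total number of bases.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical assembly of nine transcripts, not data from the study.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_lengths)
```

For the toy lengths the total is 54 bases, and the three longest sequences (10 + 9 + 8 = 27) reach half of it, so the N50 is 8.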
Rapid and Accurate Sequencing of Enterovirus Genomes Using MinION Nanopore Sequencer.

PubMed

Wang, Ji; Ke, Yue Hua; Zhang, Yong; Huang, Ke Qiang; Wang, Lei; Shen, Xin Xin; Dong, Xiao Ping; Xu, Wen Bo; Ma, Xue Jun

2017-10-01

Knowledge of an enterovirus genome sequence is very important in epidemiological investigation to identify transmission patterns and ascertain the extent of an outbreak. The MinION sequencer is increasingly used to sequence various viral pathogens in many clinical situations because of its long reads, portability, real-time accessibility of sequenced data, and very low initial costs. However, information is lacking on MinION sequencing of enterovirus genomes. In this proof-of-concept study using Enterovirus 71 (EV71) and Coxsackievirus A16 (CA16) strains as examples, we established an amplicon-based whole genome sequencing method using MinION. We explored the accuracy, minimum sequencing time, discrimination and high-throughput sequencing ability of MinION, and compared its performance with Sanger sequencing. Within the first minute (min) of sequencing, the accuracy of MinION was 98.5% for the single EV71 strain and 94.12%-97.33% for 10 genetically-related CA16 strains. In as little as 14 min, 99% identity was reached for the single EV71 strain, and in 17 min (on average), 99% identity was achieved for 10 CA16 strains in a single run. MinION is suitable for whole genome sequencing of enteroviruses with sufficient accuracy and fine discrimination and has the potential as a fast, reliable and convenient method for routine use. Copyright © 2017 The Editorial Board of Biomedical and Environmental Sciences. Published by China CDC. All rights reserved.
Constructing and Modifying Sequence Statistics for relevent Using informR in 𝖱

PubMed Central

Marcum, Christopher Steven; Butts, Carter T.

2015-01-01

The informR package greatly simplifies the analysis of complex event histories in 𝖱 by providing user friendly tools to build sufficient statistics for the relevent package. Historically, building sufficient statistics to model event sequences (of the form a→b) using the egocentric generalization of Butts’ (2008) relational event framework for modeling social action has been cumbersome. The informR package simplifies the construction of the complex list of arrays needed by the rem() model fitting for a variety of cases involving egocentric event data, multiple event types, and/or support constraints. This paper introduces these tools using examples from real data extracted from the American Time Use Survey. PMID:26185488
Phylogenomics of Phrynosomatid Lizards: Conflicting Signals from Sequence Capture versus Restriction Site Associated DNA Sequencing

PubMed Central

Leaché, Adam D.; Chavez, Andreas S.; Jones, Leonard N.; Grummer, Jared A.; Gottscho, Andrew D.; Linkem, Charles W.

2015-01-01

Sequence capture and restriction site associated DNA sequencing (RADseq) are popular methods for obtaining large numbers of loci for phylogenetic analysis. These methods are typically used to collect data at different evolutionary timescales; sequence capture is primarily used for obtaining conserved loci, whereas RADseq is designed for discovering single nucleotide polymorphisms (SNPs) suitable for population genetic or phylogeographic analyses. Phylogenetic questions that span both “recent” and “deep” timescales could benefit from either type of data, but studies that directly compare the two approaches are lacking. We compared phylogenies estimated from sequence capture and double digest RADseq (ddRADseq) data for North American phrynosomatid lizards, a species-rich and diverse group containing nine genera that began diversifying approximately 55 Ma. Sequence capture resulted in 584 loci that provided a consistent and strong phylogeny using concatenation and species tree inference. However, the phylogeny estimated from the ddRADseq data was sensitive to the bioinformatics steps used for determining homology, detecting paralogs, and filtering missing data. The topological conflicts among the SNP trees were not restricted to any particular timescale, but instead were associated with short internal branches. Species tree analysis of the largest SNP assembly, which also included the most missing data, supported a topology that matched the sequence capture tree. This preferred phylogeny provides strong support for the paraphyly of the earless lizard genera Holbrookia and Cophosaurus, suggesting that the earless morphology either evolved twice or evolved once and was subsequently lost in Callisaurus. PMID:25663487
Detection of Obstacles in Monocular Image Sequences

NASA Technical Reports Server (NTRS)

Kasturi, Rangachar; Camps, Octavia

1997-01-01

The ability to detect and locate runways/taxiways and obstacles in images captured using on-board sensors is an essential first step in the automation of the low-altitude flight, landing, takeoff, and taxiing phases of aircraft navigation. Automation of these functions under different weather and lighting situations can be facilitated by using sensors of different modalities. An aircraft-based Synthetic Vision System (SVS), with sensors of different modalities mounted on-board, complements the current ground-based systems in functions such as detection and prevention of potential runway collisions, airport surface navigation, and landing and takeoff in all weather conditions. In this report, we address the problem of detecting objects in monocular image sequences obtained from two types of sensors, a Passive Millimeter Wave (PMMW) sensor and a video camera mounted on-board a landing aircraft. Since the sensors differ in their spatial resolution, and the quality of the images obtained using these sensors is not the same, different approaches are used for detecting obstacles depending on the sensor type. These approaches are described separately in two parts of this report. The goal of the first part of the report is to develop a method for detecting runways/taxiways and objects on the runway in a sequence of images obtained from a moving PMMW sensor. Since the sensor resolution is low and the image quality is very poor, we propose a model-based approach for detecting runways/taxiways. We use the approximate runway model and the position information of the camera provided by the Global Positioning System (GPS) to define regions of interest in the image plane in which to search for the image features corresponding to the runway markers. Once the runway region is identified, we use histogram-based thresholding to detect obstacles on the runway and regions outside the runway. This algorithm is tested using image sequences simulated from a single real PMMW image.
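The report does not say which histogram-based thresholding rule was used. As a minimal sketch only, the following assumes Otsu's method (a common histogram-based choice, swapped in here for illustration) applied to a flat list of 8-bit gray levels standing in for the runway region; the function name and the toy pixel values are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    Pixels with value > t are treated as candidate obstacle pixels.
    Otsu's rule is an assumption; the report only says "histogram-based".
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels at or below the candidate threshold
    sum0 = 0    # sum of gray levels at or below the candidate threshold
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


if __name__ == "__main__":
    # Toy "runway region": dark pavement plus a small bright object.
    region = [10] * 50 + [12] * 30 + [200] * 20
    t = otsu_threshold(region)
    obstacle_mask = [p > t for p in region]
    print(t, sum(obstacle_mask))
```

In practice the threshold would be computed only over the region of interest identified by the runway model, so that sky and off-runway clutter do not dominate the histogram.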
Health Information Obtained From the Internet and Changes in Medical Decision Making: Questionnaire Development and Cross-Sectional Survey.

PubMed

Chen, Yen-Yuan; Li, Chia-Ming; Liang, Jyh-Chong; Tsai, Chin-Chung

2018-02-12

medical decision making (P=.01), consulting with others (P<.001), and promoting self-efficacy on deliberating the online health information (P<.001) based on the online health information they obtained. Present health care professionals have a responsibility to acknowledge that patients' medical decision making may be changed based on additional online health information. Health care professionals should assist patients' medical decision making by initiating as much dialogue with patients as possible, providing credible and convincing health information to patients, and guiding patients where to look for accurate, comprehensive, and understandable online health information. By doing so, patients will avoid becoming overwhelmed with extraneous and often conflicting health information. Educational interventions to promote health information seekers' ability to identify, locate, obtain, read, understand, evaluate, and effectively use online health information are highly encouraged. ©Yen-Yuan Chen, Chia-Ming Li, Jyh-Chong Liang, Chin-Chung Tsai. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 12.02.2018.
De novo sequencing and analysis of the transcriptome of Panax ginseng in the leaf-expansion period.

PubMed

Liu, Shichao; Wang, Siming; Liu, Meichen; Yang, Fei; Zhang, Hui; Liu, Shiyang; Wang, Qun; Zhao, Yu

2016-08-01

Panax ginseng, a traditional Chinese medicine, is used worldwide for its variety of health benefits and its treatment efficacy. However, it is difficult to cultivate due to its vulnerability to environmental stresses. The present study provided the first report, to the best of our knowledge, of transcriptome analysis of ginseng at the leaf-expansion stage. Using the Illumina sequencing platform, >40,000,000 high-quality paired-end reads were obtained and assembled into 100,533 unique sequences. When the sequences were searched against the publicly available National Center for Biotechnology Information protein database using the Basic Local Alignment Search Tool, 61,599 sequences exhibited similarity to known proteins. Functional annotation and classification, including use of the Gene Ontology, Clusters of Orthologous Groups, and Kyoto Encyclopedia of Genes and Genomes databases, revealed that the activated genes in ginseng were predominantly ribonuclease-like storage genes, environmental stress genes, pathogenesis-related genes and other antioxidant genes. A number of candidate genes in environmental stress-associated pathways were also identified. These novel data provide useful information on the growth and development stages of ginseng, and serve as an important public information platform for further understanding of the molecular mechanisms and functional genomics of ginseng.
Simulations Using Random-Generated DNA and RNA Sequences

ERIC Educational Resources Information Center

Bryce, C. F. A.

1977-01-01

Using a very simple computer program written in BASIC, a very large number of random-generated DNA or RNA sequences are obtained. Students use these sequences to predict complementary sequences and translational products, evaluate base compositions, determine frequencies of particular triplet codons, and suggest possible secondary structures.…
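The original program was written in BASIC and is not reproduced in this record. The following Python sketch only illustrates the kind of exercises the abstract lists (random sequences, complementary strands, base composition, triplet codon frequencies); all function names are hypothetical, and equal base probabilities are an assumption.

```python
import random
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def random_dna(length, rng):
    """Return a random DNA string, assuming all four bases are equally likely."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def complement(seq):
    """Base-by-base complement, written in the same direction as seq."""
    return "".join(COMPLEMENT[base] for base in seq)


def reverse_complement(seq):
    """Complementary strand read 5' to 3'."""
    return complement(seq)[::-1]


def transcribe(seq):
    """RNA version of a DNA coding strand (T replaced by U)."""
    return seq.replace("T", "U")


def base_composition(seq):
    """Fraction of each base in seq."""
    counts = Counter(seq)
    return {base: counts[base] / len(seq) for base in "ACGT"}


def codon_counts(seq):
    """Count non-overlapping triplets, reading from the first base."""
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))


if __name__ == "__main__":
    rng = random.Random(1977)  # fixed seed so a class sees the same sequence
    dna = random_dna(30, rng)
    print(dna)
    print(reverse_complement(dna))
    print(transcribe(dna))
    print(base_composition(dna))
    print(codon_counts(dna).most_common(3))
```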
Sequences characterization of microsatellite DNA sequences in Pacific abalone (Haliotis discus hannai)

NASA Astrophysics Data System (ADS)

Li, Qi; Akihiro, Kijima

2007-01-01

The microsatellite-enriched library was constructed using the magnetic bead hybridization selection method, and the microsatellite DNA sequences were analyzed in Pacific abalone Haliotis discus hannai. Three hundred and fifty white colonies were screened using a PCR-based technique, and 84 clones were identified as potentially containing a microsatellite repeat motif. The 84 clones were sequenced, and 42 microsatellites and 4 minisatellites with a minimum of five repeats were found (13.1% of white colonies screened). Besides the CA motif contained in the oligoprobe, we also found 16 other types of microsatellite repeats, including a dinucleotide repeat, two tetranucleotide repeats, twelve pentanucleotide repeats and a hexanucleotide repeat. According to Weber (1990), the microsatellite sequences obtained could be categorized structurally into perfect repeats (73.3%), imperfect repeats (13.3%), and compound repeats (13.4%). Among the microsatellite repeats, relatively short arrays (<20 repeats) were most abundant, accounting for 75.0%. The longest microsatellite had 48 repeats, and the average number of repeats was 13.4. The data on the composition and length distribution of microsatellites obtained in the present study can be useful for choosing the repeat motifs for microsatellite isolation in other abalone species.
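The abstract does not describe how repeats were located in the clone sequences. As an illustrative sketch only, perfect repeats with a minimum of five units (the cutoff stated above) can be found with a regular-expression back-reference; the function name, the 2 to 6 bp motif range, and the toy sequence are assumptions, and imperfect or compound repeats are not handled.

```python
import re


def find_microsatellites(seq, min_repeats=5, max_motif=6):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    The lazy quantifier tries the shortest motif first, so (CA)n is
    reported with motif "CA" rather than "CACA". Runs of a single base
    are skipped because they are not di- to hexanucleotide repeats.
    """
    pattern = re.compile(
        r"([ACGT]{2,%d}?)\1{%d,}" % (max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # homopolymer run such as AAAAAAAAAA
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits


if __name__ == "__main__":
    clone = "GGT" + "CA" * 8 + "TTG" + "GATA" * 5 + "CC"
    for start, motif, repeats in find_microsatellites(clone):
        size_class = "short (<20 repeats)" if repeats < 20 else "long"
        print(start, motif, repeats, size_class)
```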
Human Genome Sequencing in Health and Disease

PubMed Central

Gonzaga-Jauregui, Claudia; Lupski, James R.; Gibbs, Richard A.

2013-01-01

Following the “finished,” euchromatic, haploid human reference genome sequence, the rapid development of novel, faster, and cheaper sequencing technologies is making possible the era of personalized human genomics. Personal diploid human genome sequences have been generated, and each has contributed to our better understanding of variation in the human genome. We have consequently begun to appreciate the vastness of individual genetic variation from single nucleotide to structural variants. Translation of genome-scale variation into medically useful information is, however, in its infancy. This review summarizes the initial steps undertaken in clinical implementation of personal genome information, and describes the application of whole-genome and exome sequencing to identify the cause of genetic diseases and to suggest adjuvant therapies. Better analysis tools and a deeper understanding of the biology of our genome are necessary in order to decipher, interpret, and optimize clinical utility of what the variation in the human genome can teach us. Personal genome sequencing may eventually become an instrument of common medical practice, providing information that assists in the formulation of a differential diagnosis. We outline herein some of the remaining challenges. PMID:22248320
Denoising DNA deep sequencing data—high-throughput sequencing errors and their correction

PubMed Central

Laehnemann, David; Borkhardt, Arndt

2016-01-01

Characterizing the errors generated by common high-throughput sequencing platforms and telling true genetic variation from technical artefacts are two interdependent steps, essential to many analyses such as single nucleotide variant calling, haplotype inference, sequence assembly and evolutionary studies. Both random and systematic errors can show a specific occurrence profile for each of the six prominent sequencing platforms surveyed here: 454 pyrosequencing, Complete Genomics DNA nanoball sequencing, Illumina sequencing by synthesis, Ion Torrent semiconductor sequencing, Pacific Biosciences single-molecule real-time sequencing and Oxford Nanopore sequencing. There is a large variety of programs available for error removal in sequencing read data, which differ in the error models and statistical techniques they use, the features of the data they analyse, the parameters they determine from them and the data structures and algorithms they use. We highlight the assumptions they make and the data types for which these hold, providing guidance on which tools to consider for benchmarking with regard to the data properties. While no benchmarking results are included here, such specific benchmarks would greatly inform tool choices and future software development. The development of stand-alone error correctors, as well as single nucleotide variant and haplotype callers, could also benefit from using more of the knowledge about error profiles and from (re)combining ideas from the existing approaches presented here. PMID:26026159
Integrating De Novo Transcriptome Assembly and Cloning to Obtain Chicken Ovocleidin-17 Full-Length cDNA

PubMed Central

Zhang, Quan; Liu, Long; Zhu, Feng; Ning, ZhongHua; Hincke, Maxwell T.; Yang, Ning; Hou, ZhuoCheng

2014-01-01

Efficiently obtaining full-length cDNA for a target gene is the key step for functional studies and probing genetic variations. However, almost all sequenced domestic animal genomes are not ‘finished’. Many functionally important genes are located in these gapped regions. It can be difficult to obtain full-length cDNA for which only partial amino acid/EST sequences exist. In this study we report a general pipeline to obtain full-length cDNA, and illustrate this approach for one important gene (Ovocleidin-17, OC-17) that is associated with chicken eggshell biomineralization. Chicken OC-17 is one of the best candidates to control and regulate the deposition of calcium carbonate in the calcified eggshell layer. OC-17 protein has been purified, sequenced, and has had its three-dimensional structure solved. However, researchers still cannot conduct OC-17 mRNA related studies because the mRNA sequence is unknown and the gene is absent from the current chicken genome. We used RNA-Seq to obtain the entire transcriptome of the adult hen uterus, and then conducted de novo transcriptome assembling with bioinformatics analysis to obtain candidate OC-17 transcripts. Based on this sequence, we used RACE and PCR cloning methods to successfully obtain the full-length OC-17 cDNA. Temporal and spatial OC-17 mRNA expression analyses were also performed to demonstrate that OC-17 is predominantly expressed in the adult hen uterus during the laying cycle and barely at immature developmental stages. Differential uterine expression of OC-17 was observed in hens laying eggs with weak versus strong eggshell, confirming its important role in the regulation of eggshell mineralization and providing a new tool for genetic selection for eggshell quality parameters. This study is the first one to report the full-length OC-17 cDNA sequence, and builds a foundation for OC-17 mRNA related studies. We provide a general method for biologists experiencing difficulty in obtaining candidate gene full
Integrating de novo transcriptome assembly and cloning to obtain chicken Ovocleidin-17 full-length cDNA.

PubMed

Zhang, Quan; Liu, Long; Zhu, Feng; Ning, ZhongHua; Hincke, Maxwell T; Yang, Ning; Hou, ZhuoCheng

2014-01-01

Efficiently obtaining full-length cDNA for a target gene is the key step for functional studies and probing genetic variations. However, almost all sequenced domestic animal genomes are not 'finished'. Many functionally important genes are located in these gapped regions. It can be difficult to obtain full-length cDNA for which only partial amino acid/EST sequences exist. In this study we report a general pipeline to obtain full-length cDNA, and illustrate this approach for one important gene (Ovocleidin-17, OC-17) that is associated with chicken eggshell biomineralization. Chicken OC-17 is one of the best candidates to control and regulate the deposition of calcium carbonate in the calcified eggshell layer. OC-17 protein has been purified, sequenced, and has had its three-dimensional structure solved. However, researchers still cannot conduct OC-17 mRNA related studies because the mRNA sequence is unknown and the gene is absent from the current chicken genome. We used RNA-Seq to obtain the entire transcriptome of the adult hen uterus, and then conducted de novo transcriptome assembling with bioinformatics analysis to obtain candidate OC-17 transcripts. Based on this sequence, we used RACE and PCR cloning methods to successfully obtain the full-length OC-17 cDNA. Temporal and spatial OC-17 mRNA expression analyses were also performed to demonstrate that OC-17 is predominantly expressed in the adult hen uterus during the laying cycle and barely at immature developmental stages. Differential uterine expression of OC-17 was observed in hens laying eggs with weak versus strong eggshell, confirming its important role in the regulation of eggshell mineralization and providing a new tool for genetic selection for eggshell quality parameters. This study is the first one to report the full-length OC-17 cDNA sequence, and builds a foundation for OC-17 mRNA related studies. We provide a general method for biologists experiencing difficulty in obtaining candidate gene full

Whole-Genome Sequencing and Variant Analysis of Human Papillomavirus 16 Infections.

PubMed

van der Weele, Pascal; Meijer, Chris J L M; King, Audrey J

2017-10-01

Human papillomavirus (HPV) is a strongly conserved DNA virus, high-risk types of which can cause cervical cancer in persistent infections. The most common type found in HPV-attributable cancer is HPV16, which can be subdivided into four lineages (A to D) with different carcinogenic properties. Studies have shown HPV16 sequence diversity in different geographical areas, but only limited information is available regarding HPV16 diversity within a population, especially at the whole-genome level. We analyzed HPV16 major variant diversity and conservation in persistent infections and performed a single nucleotide polymorphism (SNP) comparison between persistent and clearing infections. Materials were obtained in the Netherlands from a cohort study with longitudinal follow-up for up to 3 years. Our analysis shows a remarkably large variant diversity in the population. Whole-genome sequences were obtained for 57 persistent and 59 clearing HPV16 infections, resulting in 109 unique variants. Interestingly, persistent infections were completely conserved through time. One reinfection event was identified where the initial and follow-up samples clustered differently. Non-A1/A2 variants seemed to clear preferentially (P = 0.02). Our analysis shows that population-wide HPV16 sequence diversity is very large. In persistent infections, the HPV16 sequence was fully conserved. Sequencing can identify HPV16 reinfections, although occurrence is rare. SNP comparison identified no strongly acting effect of the viral genome affecting HPV16 infection clearance or persistence in up to 3 years of follow-up. These findings suggest the progression of an early HPV16 infection could be host related. IMPORTANCE: Human papillomavirus 16 (HPV16) is the predominant type found in cervical cancer. Progression of initial infection to cervical cancer has been linked to sequence properties; however, knowledge of variants circulating in European populations, especially with longitudinal follow-up, is
Sequence History Update Tool

NASA Technical Reports Server (NTRS)

Khanampompan, Teerapat; Gladden, Roy; Fisher, Forest; DelGuercio, Chris

2008-01-01

The Sequence History Update Tool performs Web-based sequence statistics archiving for Mars Reconnaissance Orbiter (MRO). Using a single UNIX command, the software takes advantage of sequencing conventions to automatically extract the needed statistics from multiple files. This information is then used to populate a PHP database, which is then seamlessly formatted into a dynamic Web page. This tool replaces a previous tedious and error-prone process of manually editing HTML code to construct a Web-based table. Because the tool manages all of the statistics gathering and file delivery to and from multiple data sources spread across multiple servers, there is also a considerable savings in time and effort. With the Sequence History Update Tool, what previously took minutes is now done in less than 30 seconds, and the result is a more accurate archival record of the sequence commanding for MRO.
Integrating alignment-based and alignment-free sequence similarity measures for biological sequence classification

PubMed Central

Borozan, Ivan; Watt, Stuart; Ferretti, Vincent

2015-01-01

Motivation: Alignment-based sequence similarity searches, while accurate for some types of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed in many bacterial and viral genomes. Here, we propose a classification model that exploits the complementary nature of alignment-based and alignment-free similarity measures with the aim of improving the accuracy with which DNA and protein sequences are characterized. Results: Our model classifies sequences using a combined sequence similarity score calculated by adaptively weighting the contribution of different sequence similarity measures. Weights are determined independently for each sequence in the test set and reflect the discriminatory ability of individual similarity measures in the training set. Because the similarity between some sequences is determined more accurately with one type of measure rather than another, our classifier allows different sets of weights to be associated with different sequences. Using five different similarity measures, we show that our model significantly improves the classification accuracy over the current composition- and alignment-based models, when predicting the taxonomic lineage for both short viral sequence fragments and complete viral sequences. We also show that our model can be used effectively for the classification of reads from a real metagenome dataset as well as protein sequences. Availability and implementation: All the datasets and the code used in this study are freely available at https://collaborators.oicr.on.ca/vferretti/borozan_csss/csss.html. Contact: ivan.borozan@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25573913
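The abstract does not give the weighting formula, so the sketch below is not the authors' model. It only illustrates the general idea of a per-query weighted combination of an alignment-free measure and an alignment-style measure. Everything here is an assumption made for illustration: the two measures (k-mer cosine similarity and Python's difflib matching ratio as a stand-in for a real aligner), the margin-based weights, and all function names.

```python
from collections import Counter
from difflib import SequenceMatcher
from math import sqrt


def kmer_cosine(a, b, k=3):
    """Alignment-free measure: cosine similarity of k-mer count vectors."""
    ca = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    cb = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    dot = sum(ca[kmer] * cb[kmer] for kmer in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def matching_ratio(a, b):
    """Stand-in for an alignment-based measure (difflib, not a real aligner)."""
    return SequenceMatcher(None, a, b).ratio()


MEASURES = [kmer_cosine, matching_ratio]


def class_scores(query, training, measure):
    """Best similarity between the query and any training sequence per class."""
    scores = {}
    for seq, label in training:
        scores[label] = max(scores.get(label, 0.0), measure(query, seq))
    return scores


def classify(query, training):
    """Pick the class with the highest weighted sum of per-measure scores.

    Hypothetical weighting rule: a measure counts for more on this query
    when it separates the best class from the runner-up by a wider margin.
    """
    combined = Counter()
    for measure in MEASURES:
        scores = class_scores(query, training, measure)
        ranked = sorted(scores.values(), reverse=True)
        margin = ranked[0] - ranked[1] if len(ranked) > 1 else 1.0
        weight = margin + 1e-9  # keep every measure minimally active
        for label, score in scores.items():
            combined[label] += weight * score
    return max(combined, key=combined.get)


if __name__ == "__main__":
    training = [
        ("ACGTACGTACGTACGT", "A"),
        ("ACGTACGAACGTACGT", "A"),
        ("TTGGCCAATTGGCCAA", "B"),
        ("TTGGCCTATTGGCCAA", "B"),
    ]
    print(classify("ACGTACGTACGAACGT", training))
```

Because the weights are recomputed for every query, two queries can rely on different measures, which is the property the abstract emphasizes.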
Effects of Sequences of Cognitions on Group Performance Over Time

PubMed Central

Molenaar, Inge; Chiu, Ming Ming

2017-01-01

Extending past research showing that sequences of low cognitions (low-level processing of information) and high cognitions (high-level processing of information through questions and elaborations) influence the likelihoods of subsequent high and low cognitions, this study examines whether sequences of cognitions are related to group performance over time; 54 primary school students (18 triads) discussed and wrote an essay about living in another country (32,375 turns of talk). Content analysis and statistical discourse analysis showed that within each lesson, groups with more low cognitions or more sequences of low cognition followed by high cognition added more essay words. Groups with more high cognitions, sequences of low cognition followed by low cognition, or sequences of high cognition followed by an action followed by low cognition, showed different words and sequences, suggestive of new ideas. The links between cognition sequences and group performance over time can inform facilitation and assessment of student discussions. PMID:28490854
Effects of Sequences of Cognitions on Group Performance Over Time.

PubMed

Molenaar, Inge; Chiu, Ming Ming

2017-04-01

Extending past research showing that sequences of low cognitions (low-level processing of information) and high cognitions (high-level processing of information through questions and elaborations) influence the likelihoods of subsequent high and low cognitions, this study examines whether sequences of cognitions are related to group performance over time; 54 primary school students (18 triads) discussed and wrote an essay about living in another country (32,375 turns of talk). Content analysis and statistical discourse analysis showed that within each lesson, groups with more low cognitions or more sequences of low cognition followed by high cognition added more essay words. Groups with more high cognitions, sequences of low cognition followed by low cognition, or sequences of high cognition followed by an action followed by low cognition, showed different words and sequences, suggestive of new ideas. The links between cognition sequences and group performance over time can inform facilitation and assessment of student discussions.
PolyA_DB 3 catalogs cleavage and polyadenylation sites identified by deep sequencing in multiple genomes

PubMed Central

Wang, Ruijia; Nambiar, Ram; Zheng, Dinghai

2018-01-01

PolyA_DB is a database cataloging cleavage and polyadenylation sites (PASs) in several genomes. Previous versions were based mainly on expressed sequence tags (ESTs), which were limited in number and could lead to inaccurate PAS identification due to the presence of internal A-rich sequences in transcripts. Here, we present an updated version of the database based solely on deep sequencing data. First, PASs are mapped by the 3′ region extraction and deep sequencing (3′READS) method, ensuring unequivocal PAS identification. Second, a large volume of data based on diverse biological samples increases PAS coverage by 3.5-fold over the EST-based version and provides PAS usage information. Third, strand-specific RNA-seq data are used to extend annotated 3′ ends of genes to obtain more thorough annotations of alternative polyadenylation (APA) sites. Fourth, conservation information of PASs across mammals sheds light on the significance of APA sites. The database (URL: http://www.polya-db.org/v3) currently holds PASs in human, mouse, rat and chicken, and has links to the UCSC genome browser for further visualization and for integration with other genomic data. PMID:29069441
Recently published protein sequences. I.

NASA Technical Reports Server (NTRS)

Jukes, T. H.; Holmquist, R.

1972-01-01

Some polypeptide sequences that have been published in the 1972 scientific literature are listed. Only selected sequences are included. The compilation has two objectives: to assemble current information between periods when more comprehensive compilations are published, and to encourage the use of data that do not include arrangements of unsequenced peptides for 'maximum homology'.
DNA and RNA sequencing by nanoscale reading through programmable electrophoresis and nanoelectrode-gated tunneling and dielectric detection

DOEpatents

Lee, James W.; Thundat, Thomas G.

2005-06-14

An apparatus and method for performing nucleic acid (DNA and/or RNA) sequencing on a single molecule. The genetic sequence information is obtained by probing through a DNA or RNA molecule base by base at nanometer scale as though looking through a strip of movie film. This DNA sequencing nanotechnology has the theoretical capability of performing DNA sequencing at a maximal rate of about 1,000,000 bases per second. This enhanced performance is made possible by a series of innovations including: novel applications of a fine-tuned nanometer gap for passage of a single DNA or RNA molecule; thin layer microfluidics for sample loading and delivery; and programmable electric fields for precise control of DNA or RNA movement. Detection methods include nanoelectrode-gated tunneling current measurements, dielectric molecular characterization, and atomic force microscopy/electrostatic force microscopy (AFM/EFM) probing for nanoscale reading of the nucleic acid sequences.
How the public uses social media WeChat to obtain health information in China: a survey study.

PubMed

Zhang, Xingting; Wen, Dong; Liang, Jun; Lei, Jianbo

2017-07-05

On average, 570 million users, 93% in China's first-tier cities, log on to WeChat every day. WeChat has become the most widely and frequently used social media in China, and has been profoundly integrated into the daily life of many Chinese people. A variety of health-related information may be found on WeChat. The objective of this study is to understand how the general public views the impact of the rapidly emerging social media on health information acquisition. A self-administered questionnaire was designed, distributed, collected, and analyzed utilizing the online survey tool Sojump. WeChat was adopted to randomly release the questionnaires using convenience sampling and collect the results after a certain amount of time. (1) A total of 1636 questionnaires (WeChat customers) were collected from 32 provinces. (2) The primary means by which respondents received health education was via the Internet (71.79%). Baidu and WeChat were the top 2 search tools utilized (90.71% and 28.30%, respectively). Only 12.41% of respondents were satisfied with their online health information search. (3) Almost all had seen (98.35%) or read (97.68%) health information; however, only 14.43% believed that WeChat health information could improve health. Nearly one-third frequently received and read health information through WeChat. WeChat was selected (63.26%) as the most expected means for obtaining health information. (4) The major concerns regarding health information through WeChat included the following: excessively homogeneous information, the lack of a guarantee of professionalism, and the presence of advertisements. (5) Finally, the general public was most interested in individualized and interactive health information provided by managing clinicians; respondents would benefit more from using social media than from Internet search tools. The current state of health information acquisition proves worrisome. The public is highly likely to access health information via WeChat. The growing popularity of
The challenge of obtaining information necessary for multi-criteria decision analysis implementation: the case of physiotherapy services in Canada

PubMed Central

2013-01-01

Background As fiscal constraints dominate health policy discussions across Canada and globally, priority-setting exercises are becoming more common to guide the difficult choices that must be made. In this context, it becomes highly desirable to have accurate estimates of the value of specific health care interventions. Economic evaluation is a well-accepted method to estimate the value of health care interventions. However, economic evaluation has significant limitations, which have led to an increase in the use of Multi-Criteria Decision Analysis (MCDA). One key concern with MCDA is the availability of the information necessary for implementation. In the fall of 2011, the Canadian Physiotherapy Association embarked on a project aimed at providing a valuation of physiotherapy services that is both evidence-based and relevant to resource allocation decisions. The framework selected for this project was MCDA. We report on how we addressed the challenge of obtaining some of the information necessary for MCDA implementation. Methods MCDA criteria were selected and areas of physiotherapy practice were identified. Building up the necessary information base was a three-step process. First, there was a literature review for each practice area, on each criterion. The next step was to conduct interviews with experts in each of the practice areas to critique the results of the literature review and to fill in gaps where there was no or insufficient literature. Finally, the results of the individual interviews were validated by a national committee to ensure consistency across all practice areas and that a national-level perspective was applied. Results Despite a lack of research evidence on many of the considerations relevant to the estimation of the value of physiotherapy services (the criteria), sufficient information was obtained to facilitate MCDA implementation at the local level. Conclusions The results of this research project serve two purposes: 1) a method to
A nationwide database linking information on the hosts with sequence data of their virus strains: A useful tool for the eradication of bovine viral diarrhea (BVD) in Switzerland.

PubMed

Stalder, Hanspeter; Hug, Corinne; Zanoni, Reto; Vogt, Hans-Rudolf; Peterhans, Ernst; Schweizer, Matthias; Bachofen, Claudia

2016-06-15

Pestiviruses infect a wide variety of animals of the order Artiodactyla, with bovine viral diarrhea virus (BVDV) being an economically important pathogen of livestock globally. BVDV is maintained in the cattle population by infecting fetuses early in gestation and, thus, by generating persistently infected (PI) animals that efficiently transmit the virus throughout their lifetime. In 2008, Switzerland started a national control campaign with the aim to eradicate BVDV from all bovines in the country by searching for and eliminating every PI cattle. Different from previous eradication programs, all animals of the entire population were tested for virus within one year, followed by testing each newborn calf in the subsequent four years. Overall, 3,855,814 animals were tested from 2008 through 2011, 20,553 of which returned an initial BVDV-positive result. We were able to obtain samples from at least 36% of all initially positive tested animals. We sequenced the 5' untranslated region (UTR) of more than 7400 pestiviral strains and compiled the sequence data in a database together with an array of information on the PI animals, among others, the location of the farm in which they were born, their dams, and the locations where the animals had lived. To our knowledge, this is the largest database combining viral sequences with animal data of an endemic viral disease. Using unique identification tags, the different datasets within the database were connected to run diverse molecular epidemiological analyses. The large sets of animal and sequence data made it possible to run analyses in both directions, i.e., starting from a likely epidemiological link, or starting from related sequences. We present the results of three epidemiological investigations in detail and a compilation of 122 individual investigations that show the usefulness of such a database in a country-wide BVD eradication program. Copyright © 2015 Elsevier B.V. All rights reserved.
Optimization of sequence alignment for simple sequence repeat regions.

PubMed

Jighly, Abdulqader; Hamwieh, Aladdin; Ogbonnaya, Francis C

2011-07-20

Microsatellites, or simple sequence repeats (SSRs), are tandemly repeated DNA sequences, including tandem copies of specific sequences no longer than six bases, that are distributed in the genome. SSRs have been used as molecular markers because they are easy to detect and are used in a range of applications, including genetic diversity, genome mapping, and marker-assisted selection. They are also very mutable because of slipping in the DNA polymerase during DNA replication. This unique mutation increases the insertion/deletion (INDEL) mutation frequency to a high ratio - more than other types of molecular markers such as single nucleotide polymorphisms (SNPs). SNPs are more frequent than INDELs. Therefore, all designed algorithms for sequence alignment fit the vast majority of the genomic sequence without considering microsatellite regions as unique sequences that require special consideration. The old algorithm is limited in its application because there are many overlaps between different repeat units, which result in false evolutionary relationships. To overcome the limitation of the aligning algorithm when dealing with SSR loci, a new algorithm was developed using a PERL script with a Tk graphical interface. This program is based on aligning sequences after determining the repeated units first, and the last SSR nucleotide positions. This results in a shifting process according to the inserted repeated unit type. When studying the phylogenic relations before and after applying the new algorithm, many differences in the trees were obtained by increasing the SSR length and complexity. However, less distance between different lineages was observed after applying the new algorithm. The new algorithm produces better estimates for aligning SSR loci because it reflects more reliable evolutionary relations between different lineages. It reduces overlapping during SSR alignment, which results in a more realistic phylogenic relationship.
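The first step described in this abstract, locating the repeat unit and the boundaries of the SSR before aligning, can be illustrated with a small sketch. This is not the authors' PERL program; it assumes perfect (uninterrupted) repeats, and the function name `find_ssr` and the `min_repeats` parameter are hypothetical choices made for illustration only.

```python
import re

def find_ssr(seq, max_unit=6, min_repeats=3):
    """Return (unit, start, end) of the longest perfect tandem repeat, or None.

    `end` is exclusive. Units longer than `max_unit` bases are not considered,
    following the "no longer than six bases" definition of an SSR.
    """
    best = None
    for unit_len in range(1, max_unit + 1):
        # A unit of unit_len bases followed by at least (min_repeats - 1) more copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (unit_len, min_repeats - 1))
        for m in pattern.finditer(seq):
            start, end = m.start(1), m.end(1)
            # Keep only strictly longer runs, so shorter units win ties.
            if best is None or (end - start) > (best[2] - best[1]):
                best = (m.group(2), start, end)
    return best

print(find_ssr("GGATCCACACACACATTGAC"))
```

Once the unit and its first and last positions are known for each sequence, flanking regions could be aligned separately from the repeat tract, which is one plausible reading of the "shifting process" mentioned above.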
Medical Comics as Tools to Aid in Obtaining Informed Consent for Stroke Care

PubMed Central

Furuno, Yuichi; Sasajima, Hiroyasu

2015-01-01

Informed consent has now become common in medical practice. However, a gap still exists between doctors and patients in the understanding of clinical conditions. We designed medical comics about “subarachnoid hemorrhage” and “intracerebral hemorrhage” to help doctors obtain informed consent intuitively, quickly, and comprehensively. Between September 2010 and September 2012, we carried out a questionnaire survey about medical comics with the families of patients who had suffered an intracerebral or subarachnoid hemorrhage. The questionnaire consisted of 6 questions inquiring about their mental condition, reading time, usefulness of the comics in understanding brain function and anatomy, pathogenesis, doctor's explanation, and applicability of these comics. The results showed that 93.8% of responders would prefer or strongly prefer the use of comics in other medical situations. When considering the level of understanding of brain function and anatomy, pathology of disease, and doctor's explanation, 81.3%, 75.0%, and 68.8% of responders, respectively, rated these comics as very useful or useful. We think that the visual and narrative illustrations in medical comics would be more helpful for patients than a lengthy explanation by a doctor. Most of the responders hoped that medical comics would be applied to other medical cases. Thus, medical comics could work as a new communication tool between doctors and patients. PMID:26131830
A discrete choice experiment to obtain a tariff for valuing informal care situations measured with the CarerQol instrument.

PubMed

Hoefman, Renske J; van Exel, Job; Rose, John M; van de Wetering, E J; Brouwer, Werner B F

2014-01-01

Economic evaluations adopting a societal perspective need to include informal care whenever relevant. However, in practice, informal care is often neglected, because there are few validated instruments to measure and value informal care for inclusion in economic evaluations. The CarerQol, which is such an instrument, measures the impact of informal care on 7 important burden dimensions (CarerQol-7D) and values this in terms of general quality of life (CarerQol-VAS). The objective of the study was to calculate utility scores based on relative utility weights for the CarerQol-7D. These tariffs will facilitate inclusion of informal care in economic evaluations. The CarerQol-7D tariff was derived with a discrete choice experiment conducted as an Internet survey among the general adult population in the Netherlands (N = 992). The choice set contained 2 unlabeled alternatives described in terms of the 7 CarerQol-7D dimensions (level range: "no," "some," and "a lot"). An efficient experimental design with priors obtained from a pilot study (N = 104) was used. Data were analyzed with a panel mixed multinomial parameter model including main and interaction effects of the attributes. The utility attached to informal care situations was significantly higher when this situation was more attractive in terms of fewer problems and more fulfillment or support. The interaction term between the CarerQol-7D dimensions physical health and mental health problems also significantly explained this utility. The tariff was constructed by adding up the relative utility weights per category of all CarerQol-7D dimensions and the interaction term. We obtained a tariff providing standard utility scores for caring situations described with the CarerQol-7D. This facilitates the inclusion of informal care in economic evaluations.
Methodologic European external quality assurance for DNA sequencing: the EQUALseq program.

PubMed

Ahmad-Nejad, Parviz; Dorn-Beineke, Alexandra; Pfeiffer, Ulrike; Brade, Joachim; Geilenkeuser, Wolf-Jochen; Ramsden, Simon; Pazzagli, Mario; Neumaier, Michael

2006-04-01

DNA sequencing is a key technique in molecular diagnostics, but to date no comprehensive methodologic external quality assessment (EQA) programs have been instituted. Between 2003 and 2005, the European Union funded, as specific support actions, the EQUAL initiative to develop methodologic EQA schemes for genotyping (EQUALqual), quantitative PCR (EQUALquant), and sequencing (EQUALseq). Here we report on the results of the EQUALseq program. The participating laboratories received a 4-sample set comprising 2 DNA plasmids, a PCR product, and a finished sequencing reaction to be analyzed. Data and information from detailed questionnaires were uploaded online and evaluated by use of a scoring system for technical skills and proficiency of data interpretation. Sixty laboratories from 21 European countries registered, and 43 participants (72%) returned data and samples. Capillary electrophoresis was the predominant platform (n = 39; 91%). The median contiguous correct sequence stretch was 527 nucleotides with considerable variation in quality of both primary data and data evaluation. The association between laboratory performance and the number of sequencing assays/year was statistically significant (P <0.05). Interestingly, more than 30% of participants neither added comments to their data nor made efforts to identify the gene sequences or mutational positions. Considerable variations exist even in a highly standardized methodology such as DNA sequencing. Methodologic EQAs are appropriate tools to uncover strengths and weaknesses in both technique and proficiency, and our results emphasize the need for mandatory EQAs. The results of EQUALseq should help improve the overall quality of molecular genetics findings obtained by DNA sequencing.
Success of University Student Volunteers in Obtaining Consent for Reviewing Private Health Information for Emergency Research.

PubMed

Kramer, Adam I; Stephenson, Elizabeth; Betel, Adam; Crudden, Johanna; Boutis, Kathy

2017-01-01

This study aimed to determine the success of university student volunteers in obtaining consent from parents to allow review of their child's personal health information (PHI) for emergency research screening. This study also aimed to examine the variables associated with successful consent. This was a prospective cross-sectional study conducted at a pediatric emergency department (ED). University students, who functioned as delegates of the health information custodian, approached parents for consent. Of 2,506 parents, 1,852 (73.9%) provided consent to allow review of their child's PHI for research screening. Variables associated with successful consent were high (≥12 months) versus low (<12 months) volunteer experience (OR = 2.0), research related (vs. unrelated) to the child's chief complaint (OR = 2.0), child treated regularly by specialists at the study institution (OR = 1.7), and ED presentation mid-week vs. weekend (OR = 1.7) and morning vs. evening presentation (OR = 1.4). When approached by a university student volunteer, about 25% of parents declined to have their child's PHI reviewed for research screening. This model of obtaining consent may put some emergency research at risk for selection bias. Variables that increase the odds of successful consent can be considered in program design to improve the effectiveness of this model.
Assessing information content and interactive relationships of subgenomic DNA sequences of the MHC using complexity theory approaches based on the non-extensive statistical mechanics

NASA Astrophysics Data System (ADS)

Karakatsanis, L. P.; Pavlos, G. P.; Iliopoulos, A. C.; Pavlos, E. G.; Clark, P. M.; Duke, J. L.; Monos, D. S.

2018-09-01

This study combines two independent domains of science, the high throughput DNA sequencing capabilities of Genomics and complexity theory from Physics, to assess the information encoded by the different genomic segments of exonic, intronic and intergenic regions of the Major Histocompatibility Complex (MHC) and identify possible interactive relationships. The dynamic and non-extensive statistical characteristics of two well characterized MHC sequences from the homozygous cell lines, PGF and COX, in addition to two other genomic regions of comparable size, used as controls, have been studied using the reconstructed phase space theorem and the non-extensive statistical theory of Tsallis. The results reveal similar non-linear dynamical behavior as far as complexity and self-organization features. In particular, the low-dimensional deterministic nonlinear chaotic and non-extensive statistical character of the DNA sequences was verified with strong multifractal characteristics and long-range correlations. The nonlinear indices repeatedly verified that MHC sequences, whether exonic, intronic or intergenic include varying levels of information and reveal an interaction of the genes with intergenic regions, whereby the lower the number of genes in a region, the less the complexity and information content of the intergenic region. Finally we showed the significance of the intergenic region in the production of the DNA dynamics. The findings reveal interesting content information in all three genomic elements and interactive relationships of the genes with the intergenic regions. The results most likely are relevant to the whole genome and not only to the MHC. These findings are consistent with the ENCODE project, which has now established that the non-coding regions of the genome remain to be of relevance, as they are functionally important and play a significant role in the regulation of expression of genes and coordination of the many biological processes of the cell.
Program for Editing Spacecraft Command Sequences

NASA Technical Reports Server (NTRS)

Gladden, Roy; Waggoner, Bruce; Kordon, Mark; Hashemi, Mahnaz; Hanks, David; Salcedo, Jose

2006-01-01

Sequence Translator, Editor, and Expander Resource (STEER) is a computer program that facilitates construction of sequences and blocks of sequences (hereafter denoted generally as sequence products) for commanding a spacecraft. STEER also provides mechanisms for translating among various sequence product types and quickly expanding activities of a given sequence in chronological order for review and analysis of the sequence. To date, construction of sequence products has generally been done by use of such clumsy mechanisms as text-editor programs, translating among sequence product types has been challenging, and expanding sequences to time-ordered lists has involved arduous processes of converting sequence products to "real" sequences and running them through Class-A software (defined, loosely, as flight and ground software critical to a spacecraft mission). Also, heretofore, generating sequence products in standard formats has been troublesome because precise formatting and syntax are required. STEER alleviates these issues by providing a graphical user interface containing intuitive fields in which the user can enter the necessary information. The STEER expansion function provides a "quick and dirty" means of seeing how a sequence and sequence block would expand into a chronological list, without the need to use Class-A software.
Efficient error correction for next-generation sequencing of viral amplicons.

PubMed

Skums, Pavel; Dimitrova, Zoya; Campo, David S; Vaughan, Gilberto; Rossi, Livia; Forbi, Joseph C; Yokosawa, Jonny; Zelikovsky, Alex; Khudyakov, Yury

2012-06-25

Next-generation sequencing allows the analysis of an unprecedented number of viral sequence variants from infected patients, presenting a novel opportunity for understanding virus evolution, drug resistance and immune escape. However, sequencing in bulk is error prone. Thus, the generated data require error identification and correction. Most error-correction methods to date are not optimized for amplicon analysis and assume that the error rate is randomly distributed. Recent quality assessment of amplicon sequences obtained using 454-sequencing showed that the error rate is strongly linked to the presence and size of homopolymers, position in the sequence and length of the amplicon. All these parameters are strongly sequence specific and should be incorporated into the calibration of error-correction algorithms designed for amplicon sequencing. In this paper, we present two new efficient error correction algorithms optimized for viral amplicons: (i) k-mer-based error correction (KEC) and (ii) empirical frequency threshold (ET). Both were compared to a previously published clustering algorithm (SHORAH), in order to evaluate their relative performance on 24 experimental datasets obtained by 454-sequencing of amplicons with known sequences. All three algorithms show similar accuracy in finding true haplotypes. However, KEC and ET were significantly more efficient than SHORAH in removing false haplotypes and estimating the frequency of true ones. Both algorithms, KEC and ET, are highly suitable for rapid recovery of error-free haplotypes obtained by 454-sequencing of amplicons from heterogeneous viruses. The implementations of the algorithms and data sets used for their testing are available at: http://alan.cs.gsu.edu/NGS/?q=content/pyrosequencing-error-correction-algorithm.
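The general idea behind k-mer-based error detection can be shown with a short sketch: k-mers that occur rarely across all reads are more likely to contain a sequencing error. This is a generic illustration, not the KEC algorithm itself, whose details are not given in the abstract. The function names and the values of `k` and `min_count` are hypothetical.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count every k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def suspicious_positions(read, counts, k, min_count):
    """Return start indices of k-mers in `read` seen fewer than `min_count` times."""
    return [
        i for i in range(len(read) - k + 1)
        if counts[read[i:i + k]] < min_count
    ]

# Toy data: five identical reads and one read with a single substitution.
reads = ["ACGTAC"] * 5 + ["ACGAAC"]
counts = kmer_counts(reads, k=3)
print(suspicious_positions("ACGAAC", counts, k=3, min_count=2))
```

In the toy data, the single substituted base makes every k-mer that overlaps it rare, so a run of consecutive flagged positions points to the likely error site.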
Sequence-Dependent Persistence Length of Long DNA

NASA Astrophysics Data System (ADS)

Chuang, Hui-Min; Reifenberger, Jeffrey G.; Cao, Han; Dorfman, Kevin D.

2017-12-01

Using a high-throughput genome-mapping approach, we obtained circa 50 million measurements of the extension of internal human DNA segments in a 41 nm × 41 nm nanochannel. The underlying DNA sequences, obtained by mapping to the reference human genome, are 2.5-393 kilobase pairs long and contain percent GC contents between 32.5% and 60%. Using Odijk's theory for a channel-confined wormlike chain, these data reveal that the DNA persistence length increases by almost 20% as the percent GC content increases. The increased persistence length is rationalized by a model, containing no adjustable parameters, that treats the DNA as a statistical terpolymer with a sequence-dependent intrinsic persistence length and a sequence-independent electrostatic persistence length.

LookSeq: a browser-based viewer for deep sequencing data.

PubMed

Manske, Heinrich Magnus; Kwiatkowski, Dominic P

2009-11-01

Sequencing a genome to great depth can be highly informative about heterogeneity within an individual or a population. Here we address the problem of how to visualize the multiple layers of information contained in deep sequencing data. We propose an interactive AJAX-based web viewer for browsing large data sets of aligned sequence reads. By enabling seamless browsing and fast zooming, the LookSeq program assists the user to assimilate information at different levels of resolution, from an overview of a genomic region to fine details such as heterogeneity within the sample. A specific problem, particularly if the sample is heterogeneous, is how to depict information about structural variation. LookSeq provides a simple graphical representation of paired sequence reads that is more revealing about potential insertions and deletions than are conventional methods.
Detection of integrated papillomavirus sequences by ligation-mediated PCR (DIPS-PCR) and molecular characterization in cervical cancer cells.

PubMed

Luft, F; Klaes, R; Nees, M; Dürst, M; Heilmann, V; Melsheimer, P; von Knebel Doeberitz, M

2001-04-01

Human papillomavirus (HPV) genomes usually persist as episomal molecules in HPV-associated preneoplastic lesions, whereas they are frequently integrated into the host cell genome in HPV-related cancer cells. This suggests that malignant conversion of HPV-infected epithelia is linked to recombination of cellular and viral sequences. Due to technical limitations, precise sequence information on viral-cellular junctions has been obtained for only a few cell lines and primary lesions. In order to facilitate the molecular analysis of genomic HPV integration, we established a ligation-mediated PCR assay for the detection of integrated papillomavirus sequences (DIPS-PCR). DIPS-PCR was initially used to amplify genomic viral-cellular junctions from HPV-associated cervical cancer cell lines (C4-I, C4-II, SW756, and HeLa) and HPV-immortalized keratinocyte lines (HPKIA, HPKII). In addition to junctions already reported in public databases, various new fusion fragments were identified. Subsequently, 22 different viral-cellular junctions were amplified from 17 cervical carcinomas and 1 vulval intraepithelial neoplasia (VIN III). Sequence analysis of each junction revealed that the viral E1 open reading frame (ORF) was fused to cellular sequences in 20 of 22 (91%) cases. Chromosomal integration loci mapped to chromosomes 1 (2n), 2 (3n), 7 (2n), 8 (3n), 10 (1n), 14 (5n), 16 (1n), 17 (2n), and mitochondrial DNA (1n), suggesting random distribution of chromosomal integration sites. Precise sequence information obtained by DIPS-PCR was further used to monitor the monoclonal origin of 4 cervical cancers, 1 case of recurrent premalignant lesions and 1 lymph node metastasis. Therefore, DIPS-PCR might allow efficient therapy control and prediction of relapse in patients with HPV-associated anogenital cancers. Copyright 2001 Wiley-Liss, Inc.
RNA Sequencing Analysis of the Gametophyte Transcriptome from the Liverwort, Marchantia polymorpha

PubMed Central

Sharma, Niharika; Jung, Chol-Hee; Bhalla, Prem L.; Singh, Mohan B.

2014-01-01

The liverwort Marchantia polymorpha is a member of the most basal lineage of land plants (embryophytes) and likely retains many ancestral morphological, physiological and molecular characteristics. Despite its phylogenetic importance and the availability of previous EST studies, M. polymorpha’s lack of economic importance limits accessible genomic resources for this species. We employed Illumina RNA-Seq technology to sequence the gametophyte transcriptome of M. polymorpha. cDNA libraries from 6 different male and female developmental tissues were sequenced to delineate a global view of the M. polymorpha transcriptome. Approximately 80 million short reads were obtained and assembled into a non-redundant set of 46,533 transcripts (≥ 200 bp) from 46,070 loci. The average length and the N50 length of the transcripts were 757 bp and 471 bp, respectively. Sequence comparison of assembled transcripts with non-redundant proteins from embryophytes resulted in the annotation of 43% of the transcripts. The transcripts were also compared with M. polymorpha expressed sequence tags (ESTs), and approximately 69.5% of the transcripts appeared to be novel. Twenty-one percent of the transcripts were assigned GO terms to improve annotation. In addition, 6,112 simple sequence repeats (SSRs) were identified as potential molecular markers, which may be useful in studies of genetic diversity. A comparative genomics approach revealed that a substantial proportion of the genes (35.5%) expressed in M. polymorpha were conserved across phylogenetically related species, such as Selaginella and Physcomitrella, and identified 580 genes that are potentially unique to liverworts. Our study presents an extensive amount of novel sequence information for M. polymorpha. This information will serve as a valuable genomics resource for further molecular, developmental and comparative evolutionary studies, as well as for the isolation and characterization of functional genes that are involved in
High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features.

PubMed

Jones, David T; Kandathil, Shaun M

2018-04-26

In addition to substitution frequency data from protein sequence alignments, many state-of-the-art methods for contact prediction rely on additional sources of information, or features, of protein sequences in order to predict residue-residue contacts, such as solvent accessibility, predicted secondary structure, and scores from other contact prediction methods. It is unclear how much of this information is needed to achieve state-of-the-art results. Here, we show that using deep neural network models, simple alignment statistics contain sufficient information to achieve state-of-the-art precision. Our prediction method, DeepCov, uses fully convolutional neural networks operating on amino-acid pair frequency or covariance data derived directly from sequence alignments, without using global statistical methods such as sparse inverse covariance or pseudolikelihood estimation. Comparisons against CCMpred and MetaPSICOV2 show that using pairwise covariance data calculated from raw alignments as input allows us to match or exceed the performance of both of these methods. Almost all of the achieved precision is obtained when considering relatively local windows (around 15 residues) around any member of a given residue pairing; larger window sizes have comparable performance. Assessment on a set of shallow sequence alignments (fewer than 160 effective sequences) indicates that the new method is substantially more precise than CCMpred and MetaPSICOV2 in this regime, suggesting that improved precision is attainable on smaller sequence families. Overall, the performance of DeepCov is competitive with the state of the art, and our results demonstrate that global models, which employ features from all parts of the input alignment when predicting individual contacts, are not strictly needed in order to attain precise contact predictions. DeepCov is freely available at https://github.com/psipred/DeepCov. d.t.jones@ucl.ac.uk.
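The "covariance data derived directly from sequence alignments" can be made concrete with a small sketch. For two alignment columns i and j and residues a and b, a common definition is cov(a, b) = f_ij(a, b) - f_i(a) * f_j(b), where f denotes observed frequencies. The code below assumes this definition and omits details such as sequence weighting and gap handling, which the abstract does not describe; it is not the DeepCov implementation, and the function name is hypothetical.

```python
from collections import Counter

def pair_covariance(alignment, i, j):
    """Return {(a, b): f_ij(a, b) - f_i(a) * f_j(b)} for columns i and j.

    `alignment` is a list of equal-length strings (one aligned sequence each).
    Only residues actually observed in each column are included.
    """
    n = len(alignment)
    f_i = Counter(seq[i] for seq in alignment)
    f_j = Counter(seq[j] for seq in alignment)
    f_ij = Counter((seq[i], seq[j]) for seq in alignment)
    return {
        (a, b): f_ij[(a, b)] / n - (f_i[a] / n) * (f_j[b] / n)
        for a in f_i
        for b in f_j
    }

# Toy alignment in which the two columns co-vary perfectly.
cov = pair_covariance(["AK", "AK", "GE", "GE"], 0, 1)
print(cov[("A", "K")], cov[("A", "E")])
```

Positive values mark residue pairs that co-occur more often than expected from the single-column frequencies. Stacking such values for every pair of columns would give the kind of pairwise input a convolutional network could operate on.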
19 CFR 356.11 - Procedures for obtaining access to privileged information.

Code of Federal Regulations, 2013 CFR

2013-04-01

... information. 356.11 Section 356.11 Customs Duties INTERNATIONAL TRADE ADMINISTRATION, DEPARTMENT OF COMMERCE... Government. Where, in the course of a panel review, the panel has reviewed privileged information under a Protective Order for Privileged Information, and the issue to which such information pertains is relevant to...
19 CFR 356.11 - Procedures for obtaining access to privileged information.

Code of Federal Regulations, 2014 CFR

2014-04-01

... information. 356.11 Section 356.11 Customs Duties INTERNATIONAL TRADE ADMINISTRATION, DEPARTMENT OF COMMERCE... Government. Where, in the course of a panel review, the panel has reviewed privileged information under a Protective Order for Privileged Information, and the issue to which such information pertains is relevant to...
19 CFR 356.11 - Procedures for obtaining access to privileged information.

Code of Federal Regulations, 2011 CFR

2011-04-01

... information. 356.11 Section 356.11 Customs Duties INTERNATIONAL TRADE ADMINISTRATION, DEPARTMENT OF COMMERCE... Government. Where, in the course of a panel review, the panel has reviewed privileged information under a Protective Order for Privileged Information, and the issue to which such information pertains is relevant to...
Use of the Internet by patients and their families to obtain genetics-related information.

PubMed

Taylor, M R; Alman, A; Manchester, D K

2001-08-01

To characterize use of the Internet by patients and their families referred to general genetics clinics. We developed a survey to assess Internet use among patients visiting urban and rural clinics in Colorado and Wyoming. One hundred eighty-nine surveys were distributed to patients and their family members visiting outpatient general genetics clinics in spring 2000. The 8-page anonymous survey instrument asked about use of the Internet to obtain genetics-related information (GRI). All participants were asked whether a physician or health professional had referred them to the Internet for GRI. Subjects who had previously used the Internet to search for GRI were asked to rate whether they considered the GRI they encountered to be accurate, inaccurate, easy to understand, confusing, or trustworthy. One hundred fifty-seven surveys (83%) were returned (52% urban; 48% rural). Ninety (60%) of 149 respondents were at the clinic for a new-patient visit, and 59 (40%) were follow-up visits. All respondents were older than 17 years; 141 (91%) of 155 respondents were the patient's parent or guardian. Seventy-three (47%) of 155 respondents had searched the Internet for GRI prior to their clinic visit. The patients and families themselves initiated the majority of such efforts; only 8 (5%) of 148 respondents had been referred to a site on the World Wide Web by a physician. Interestingly, 136 (92%) of 147 respondents indicated that they would be likely to visit a Web site that was recommended by a geneticist. The most compelling reasons for searching the Internet for GRI were to get information in layperson's terms (60/131 [46%]); to get information about treatment (16/131 [12%]); and to get information about genetic research (16/131 [12%]). Among respondents who reported visiting GRI Web sites, 24 (41%) of 58 agreed that information was confusing or difficult to understand, 35 (53%) of 66 agreed that information was accurate and trustworthy, and 44 (77%) of 57 agreed that using
Impact of Genomic Counseling on Informed Decision-Making among ostensibly Healthy Individuals Seeking Personal Genome Sequencing: the HealthSeq Project.

PubMed

Suckiel, Sabrina A; Linderman, Michael D; Sanderson, Saskia C; Diaz, George A; Wasserstein, Melissa; Kasarskis, Andrew; Schadt, Eric E; Zinberg, Randi E

2016-10-01

Personal genome sequencing is increasingly utilized by healthy individuals for predispositional screening and other applications. However, little is known about the impact of 'genomic counseling' on informed decision-making in this context. Our primary aim was to compare measures of participants' informed decision-making before and after genomic counseling in the HealthSeq project, a longitudinal cohort study of individuals receiving personal results from whole genome sequencing (WGS). Our secondary aims were to assess the impact of the counseling on WGS knowledge and concerns, and to explore participants' satisfaction with the counseling. Questionnaires were administered to participants (n = 35) before and after their pre-test genomic counseling appointment. Informed decision-making was measured using the Decisional Conflict Scale (DCS) and the Satisfaction with Decision Scale (SDS). DCS scores decreased after genomic counseling (mean: 11.34 before vs. 5.94 after; z = -4.34, p < 0.001, r = 0.52), and SDS scores increased (mean: 27.91 vs. 29.06 respectively; z = 2.91, p = 0.004, r = 0.35). Satisfaction with counseling was high (mean (SD) = 26.91 (2.68), on a scale where 6 = low and 30 = high satisfaction). HealthSeq participants felt that their decision regarding receiving personal results from WGS was more informed after genomic counseling. Further research comparing the impact of different genomic counseling models is needed.
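The reported effect sizes appear consistent with the common convention r = |z| / sqrt(N) for rank-based paired tests, taking N as the total number of observations (2 x 35 = 70). This is an inference from the reported numbers, not something the abstract states, so the sketch below should be read as a plausibility check under that assumption.

```python
import math

def effect_size_r(z, n_observations):
    """Effect size r = |z| / sqrt(N), with N the total number of observations."""
    return abs(z) / math.sqrt(n_observations)

# Assumption: N = 70, i.e. 35 participants measured before and after counseling.
print(round(effect_size_r(-4.34, 70), 2))  # DCS
print(round(effect_size_r(2.91, 70), 2))   # SDS
```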
Global Transcriptome Analysis of the Tentacle of the Jellyfish Cyanea capillata Using Deep Sequencing and Expressed Sequence Tags: Insight into the Toxin- and Degenerative Disease-Related Transcripts

PubMed Central

Liu, Dan; Wang, Qianqian; Ruan, Zengliang; He, Qian; Zhang, Liming

2015-01-01

Background Jellyfish contain diverse toxins and other bioactive components. However, large-scale identification of novel toxins and bioactive components from jellyfish has been hampered by the low efficiency of traditional isolation and purification methods. Results We performed de novo transcriptome sequencing of the tentacle tissue of the jellyfish Cyanea capillata. A total of 51,304,108 reads were obtained and assembled into 50,536 unigenes. Of these, 21,357 unigenes had homologues in public databases, but the remaining unigenes had no significant matches due to the limited sequence information available and species-specific novel sequences. Functional annotation of the unigenes also revealed general gene expression profile characteristics in the tentacle of C. capillata. A primary goal of this study was to identify putative toxin transcripts. As expected, we screened many transcripts encoding proteins similar to several well-known toxin families including phospholipases, metalloproteases, serine proteases and serine protease inhibitors. In addition, some transcripts also resembled molecules with potential toxic activities, including cnidarian CfTX-like toxins with hemolytic activity, plancitoxin-1, venom toxin-like peptide-6, histamine-releasing factor, neprilysin, dipeptidyl peptidase 4, vascular endothelial growth factor A, angiotensin-converting enzyme-like and endothelin-converting enzyme 1-like proteins. Most of these molecules have not been previously reported in jellyfish. Interestingly, we also characterized a number of transcripts with similarities to proteins relevant to several degenerative diseases, including Huntington’s, Alzheimer’s and Parkinson’s diseases. This is the first description of degenerative disease-associated genes in jellyfish. Conclusion We obtained a well-categorized and annotated transcriptome of C. capillata tentacle that will be an important and valuable resource for further understanding of jellyfish at the molecular
Global Transcriptome Analysis of the Tentacle of the Jellyfish Cyanea capillata Using Deep Sequencing and Expressed Sequence Tags: Insight into the Toxin- and Degenerative Disease-Related Transcripts.

PubMed

Liu, Guoyan; Zhou, Yonghong; Liu, Dan; Wang, Qianqian; Ruan, Zengliang; He, Qian; Zhang, Liming

2015-01-01

Jellyfish contain diverse toxins and other bioactive components. However, large-scale identification of novel toxins and bioactive components from jellyfish has been hampered by the low efficiency of traditional isolation and purification methods. We performed de novo transcriptome sequencing of the tentacle tissue of the jellyfish Cyanea capillata. A total of 51,304,108 reads were obtained and assembled into 50,536 unigenes. Of these, 21,357 unigenes had homologues in public databases, but the remaining unigenes had no significant matches due to the limited sequence information available and species-specific novel sequences. Functional annotation of the unigenes also revealed general gene expression profile characteristics in the tentacle of C. capillata. A primary goal of this study was to identify putative toxin transcripts. As expected, we screened many transcripts encoding proteins similar to several well-known toxin families including phospholipases, metalloproteases, serine proteases and serine protease inhibitors. In addition, some transcripts also resembled molecules with potential toxic activities, including cnidarian CfTX-like toxins with hemolytic activity, plancitoxin-1, venom toxin-like peptide-6, histamine-releasing factor, neprilysin, dipeptidyl peptidase 4, vascular endothelial growth factor A, angiotensin-converting enzyme-like and endothelin-converting enzyme 1-like proteins. Most of these molecules have not been previously reported in jellyfish. Interestingly, we also characterized a number of transcripts with similarities to proteins relevant to several degenerative diseases, including Huntington's, Alzheimer's and Parkinson's diseases. This is the first description of degenerative disease-associated genes in jellyfish. We obtained a well-categorized and annotated transcriptome of C. capillata tentacle that will be an important and valuable resource for further understanding of jellyfish at the molecular level and information on the underlying
Sequence Composition and Gene Content of the Short Arm of Rye (Secale cereale) Chromosome 1

PubMed Central

Fluch, Silvia; Kopecky, Dieter; Burg, Kornel; Šimková, Hana; Taudien, Stefan; Petzold, Andreas; Kubaláková, Marie; Platzer, Matthias; Berenyi, Maria; Krainer, Siegfried; Doležel, Jaroslav; Lelley, Tamas

2012-01-01

Background The purpose of the study is to elucidate the sequence composition of the short arm of rye chromosome 1 (Secale cereale) with special focus on its gene content, because this portion of the rye genome is an integrated part of several hundreds of bread wheat varieties worldwide. Methodology/Principal Findings Multiple Displacement Amplification of 1RS DNA, obtained from flow sorted 1RS chromosomes, using 1RS ditelosomic wheat-rye addition line, and subsequent Roche 454FLX sequencing of this DNA yielded 195,313,589 bp sequence information. This quantity of sequence information resulted in 0.43× sequence coverage of the 1RS chromosome arm, permitting the identification of genes with estimated probability of 95%. A detailed analysis revealed that more than 5% of the 1RS sequence consisted of gene space, identifying at least 3,121 gene loci representing 1,882 different gene functions. Repetitive elements comprised about 72% of the 1RS sequence, Gypsy/Sabrina (13.3%) being the most abundant. More than four thousand simple sequence repeat (SSR) sites mostly located in gene related sequence reads were identified for possible marker development. The existence of chloroplast insertions in 1RS has been verified by identifying chimeric chloroplast-genomic sequence reads. Synteny analysis of 1RS to the full genomes of Oryza sativa and Brachypodium distachyon revealed that about half of the genes of 1RS correspond to the distal end of the short arm of rice chromosome 5 and the proximal region of the long arm of Brachypodium distachyon chromosome 2. Comparison of the gene content of 1RS to 1HS barley chromosome arm revealed high conservation of genes related to chromosome 5 of rice. Conclusions The present study revealed the gene content and potential gene functions on this chromosome arm and demonstrated numerous sequence elements like SSRs and gene-related sequences, which can be utilised for future research as well as in breeding of wheat and rye. PMID:22328922
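The coverage figure quoted in this record can be checked with simple arithmetic. The Python sketch below back-calculates the chromosome-arm size implied by 195,313,589 bp at 0.43× coverage. As an illustrative assumption that the abstract does not state, it also applies the Poisson (Lander-Waterman) idealization of uniformly random reads to estimate the fraction of bases touched by at least one read. The function names are hypothetical.

```python
import math

# Values taken from the abstract.
sequenced_bp = 195_313_589
coverage = 0.43


def implied_target_size(total_bp, cov):
    """Size of the sequenced target implied by total bases and mean coverage."""
    return total_bp / cov


def fraction_covered(cov):
    """Expected fraction of bases hit at least once.

    Assumption: reads land uniformly at random (Poisson model); real
    amplified DNA is usually less uniform than this.
    """
    return 1.0 - math.exp(-cov)


arm_size = implied_target_size(sequenced_bp, coverage)
print(f"implied 1RS size: {arm_size / 1e6:.0f} Mb")            # about 454 Mb
print(f"bases covered at least once: {fraction_covered(coverage):.2f}")  # about 0.35
```

Under this idealization roughly a third of the arm is sampled. The 95% gene-identification probability reported above depends on gene length as well, which this sketch does not model.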
Alternation blindness in the representation of binary sequences.

PubMed

Yu, Ru Qi; Osherson, Daniel; Zhao, Jiaying

2018-03-01

Binary information is prevalent in the environment and contains 2 distinct outcomes. Binary sequences consist of a mixture of alternation and repetition. Understanding how people perceive such sequences would contribute to a general theory of information processing. In this study, we examined how people process alternation and repetition in binary sequences. Across 4 paradigms involving estimation, working memory, change detection, and visual search, we found that the number of alternations is underestimated compared with repetitions (Experiment 1). Moreover, recall for binary sequences deteriorates as the sequence alternates more (Experiment 2). Changes in bits are also harder to detect as the sequence alternates more (Experiment 3). Finally, visual targets superimposed on bits of a binary sequence take longer to process as alternation increases (Experiment 4). Overall, our results indicate that compared with repetition, alternation in a binary sequence is less salient in the sense of requiring more attention for successful encoding. The current study thus reveals the cognitive constraints in the representation of alternation and provides a new explanation for the overalternation bias in randomness perception. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
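The distinction between alternation and repetition in this record can be made concrete with a short Python sketch. It counts adjacent pairs that differ (alternations) and adjacent pairs that match (repetitions). The function names are hypothetical and are not taken from the study.

```python
def count_transitions(bits):
    """Return (alternations, repetitions) over adjacent pairs of a sequence."""
    if not bits:
        return 0, 0
    alternations = sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    repetitions = len(bits) - 1 - alternations
    return alternations, repetitions


def alternation_rate(bits):
    """Fraction of adjacent pairs that alternate (0.0 if there are no pairs)."""
    alternations, repetitions = count_transitions(bits)
    total = alternations + repetitions
    return alternations / total if total else 0.0


print(count_transitions("0101"))  # fully alternating
print(count_transitions("0011"))  # mostly repeating
```

A sequence such as "0101" has an alternation rate of 1, while "0011" has a rate of 1/3. A fair coin produces a rate near 0.5 on average.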
Simultaneous virus identification and characterization of severe unexplained pneumonia cases using a metagenomics sequencing technique.

PubMed

Zou, Xiaohui; Tang, Guangpeng; Zhao, Xiang; Huang, Yan; Chen, Tao; Lei, Mingyu; Chen, Wenbing; Yang, Lei; Zhu, Wenfei; Zhuang, Li; Yang, Jing; Feng, Zhaomin; Wang, Dayan; Wang, Dingming; Shu, Yuelong

2017-03-01

Many viruses can cause respiratory diseases in humans. Although great advances have been achieved in methods of diagnosis, it remains challenging to identify pathogens in unexplained pneumonia (UP) cases. In this study, we applied next-generation sequencing (NGS) technology and a metagenomic approach to detect and characterize respiratory viruses in UP cases from Guizhou Province, China. A total of 33 oropharyngeal swabs were obtained from hospitalized UP patients and subjected to NGS. An unbiased metagenomic analysis pipeline identified 13 virus species in 16 samples. Human rhinovirus C was the virus most frequently detected and was identified in seven samples. Human measles virus, adenovirus B 55 and coxsackievirus A10 were also identified. Metagenomic sequencing also provided virus genomic sequences, which enabled genotype characterization and phylogenetic analysis. For cases of multiple infection, metagenomic sequencing afforded information regarding the quantity of each virus in the sample, which could be used to evaluate each virus's role in the disease. Our study highlights the potential of metagenomic sequencing for pathogen identification in UP cases.
Enhanced sequencing coverage with digital droplet multiple displacement amplification

PubMed Central

Sidore, Angus M.; Lan, Freeman; Lim, Shaun W.; Abate, Adam R.

2016-01-01

Sequencing small quantities of DNA is important for applications ranging from the assembly of uncultivable microbial genomes to the identification of cancer-associated mutations. To obtain sufficient quantities of DNA for sequencing, the small amount of starting material must be amplified significantly. However, existing methods often yield errors or non-uniform coverage, reducing sequencing data quality. Here, we describe digital droplet multiple displacement amplification, a method that enables massive amplification of low-input material while maintaining sequence accuracy and uniformity. The low-input material is compartmentalized as single molecules in millions of picoliter droplets. Because the molecules are isolated in compartments, they amplify to saturation without competing for resources; this yields uniform representation of all sequences in the final product and, in turn, enhances the quality of the sequence data. We demonstrate the ability to uniformly amplify the genomes of single Escherichia coli cells, comprising just 4.7 fg of starting DNA, and obtain sequencing coverage distributions that rival those of unamplified material. Digital droplet multiple displacement amplification provides a simple and effective method for amplifying minute amounts of DNA for accurate and uniform sequencing. PMID:26704978
Reading biological processes from nucleotide sequences

NASA Astrophysics Data System (ADS)

Murugan, Anand

Cellular processes have traditionally been investigated by techniques of imaging and biochemical analysis of the molecules involved. The recent rapid progress in our ability to manipulate and read nucleic acid sequences gives us direct access to the genetic information that directs and constrains biological processes. While sequence data is being used widely to investigate genotype-phenotype relationships and population structure, here we use sequencing to understand biophysical mechanisms. We present work on two different systems. First, in chapter 2, we characterize the stochastic genetic editing mechanism that produces diverse T-cell receptors in the human immune system. We do this by inferring statistical distributions of the underlying biochemical events that generate T-cell receptor coding sequences from the statistics of the observed sequences. This inferred model quantitatively describes the potential repertoire of T-cell receptors that can be produced by an individual, providing insight into its potential diversity and the probability of generation of any specific T-cell receptor. Then in chapter 3, we present work on understanding the functioning of regulatory DNA sequences in both prokaryotes and eukaryotes. Here we use experiments that measure the transcriptional activity of large libraries of mutagenized promoters and enhancers and infer models of the sequence-function relationship from this data. For the bacterial promoter, we infer a physically motivated 'thermodynamic' model of the interaction of DNA-binding proteins and RNA polymerase determining the transcription rate of the downstream gene. For the eukaryotic enhancers, we infer heuristic models of the sequence-function relationship and use these models to find synthetic enhancer sequences that optimize inducibility of expression. Both projects demonstrate the utility of sequence information in conjunction with sophisticated statistical inference techniques for dissecting underlying biophysical
Dactyl Alphabet Gesture Recognition in a Video Sequence Using Microsoft Kinect

NASA Astrophysics Data System (ADS)

Artyukhin, S. G.; Mestetskiy, L. M.

2015-05-01

This paper presents an efficient framework for static gesture recognition based on data obtained from a web camera and the Kinect depth sensor (RGB-D data). Each gesture is given by a pair of images: a color image and a depth map. The database stores each gesture of the alphabet as a feature description generated from a frame. The recognition algorithm takes a video sequence (a sequence of frames) as input and labels it: it either assigns each frame a gesture from the database or decides that the database contains no suitable gesture. First, each frame of the video sequence is classified separately, without interframe information. Then, a run of successfully labeled frames showing the same gesture is grouped into a single static gesture. We propose a method that combines segmentation of the frame by depth map and by RGB image. The primary segmentation is based on the depth map; it gives information about the position of the hand and yields a rough hand border. The border is then refined using the color image, and the shape of the hand is analyzed. The continuous skeleton method is used to generate features. We propose a method based on the terminal branches of the skeleton, which makes it possible to determine the positions of the fingers and wrist. The classification features for a gesture describe the positions of the fingers relative to the wrist. Experiments with the developed algorithm were carried out using American Sign Language as an example. An American Sign Language gesture has several components, including the shape of the hand, its orientation in space and the type of movement. The accuracy of the proposed method is evaluated on a collection of gestures consisting of 2700 frames.
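The step in this record that merges per-frame labels into single static gestures can be sketched in Python as run grouping. This is a minimal illustration and not the authors' implementation. It assumes each frame has already been classified independently, uses `None` for frames with no suitable gesture, and introduces a hypothetical `min_run` parameter to discard very short runs.

```python
from itertools import groupby


def group_frames(labels, min_run=1):
    """Collapse consecutive identical frame labels into gestures.

    labels  : per-frame labels; None marks a frame with no matching gesture.
    min_run : hypothetical threshold; shorter runs are dropped.
    Returns a list of (label, first_frame_index, run_length) tuples.
    """
    gestures = []
    index = 0
    for label, run in groupby(labels):
        length = len(list(run))
        if label is not None and length >= min_run:
            gestures.append((label, index, length))
        index += length
    return gestures


frames = ["A", "A", None, "B", "B", "B", "A"]
print(group_frames(frames))
print(group_frames(frames, min_run=2))
```

Raising `min_run` trades missed brief gestures for fewer spurious detections caused by isolated misclassified frames.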
Comparative and Joint Analysis of Two Metagenomic Datasets from a Biogas Fermenter Obtained by 454-Pyrosequencing

PubMed Central

Jaenicke, Sebastian; Ander, Christina; Bekel, Thomas; Bisdorf, Regina; Dröge, Marcus; Gartemann, Karl-Heinz; Jünemann, Sebastian; Kaiser, Olaf; Krause, Lutz; Tille, Felix; Zakrzewski, Martha; Pühler, Alfred

2011-01-01

Biogas production from renewable resources is attracting increased attention as an alternative energy source due to the limited availability of traditional fossil fuels. Many countries are promoting the use of alternative energy sources for sustainable energy production. In this study, a metagenome from a production-scale biogas fermenter was analysed employing Roche's GS FLX Titanium technology and compared to a previous dataset obtained from the same community DNA sample that was sequenced on the GS FLX platform. Taxonomic profiling based on 16S rRNA-specific sequences and an Environmental Gene Tag (EGT) analysis employing CARMA demonstrated that both approaches benefit from the longer read lengths obtained on the Titanium platform. Results confirmed Clostridia as the most prevalent taxonomic class, whereas species of the order Methanomicrobiales are dominant among methanogenic Archaea. However, the analyses also identified additional taxa that were missed by the previous study, including members of the genera Streptococcus, Acetivibrio, Garciella, Tissierella, and Gelria, which might also play a role in the fermentation process leading to the formation of methane. Taking advantage of the CARMA feature to correlate taxonomic information of sequences with their assigned functions, it appeared that Firmicutes, followed by Bacteroidetes and Proteobacteria, dominate within the functional context of polysaccharide degradation whereas Methanomicrobiales represent the most abundant taxonomic group responsible for methane production. Clostridia is the most important class involved in the reductive CoA pathway (Wood-Ljungdahl pathway) that is characteristic for acetogenesis. Based on binning of 16S rRNA-specific sequences allocated to the dominant genus Methanoculleus, it could be shown that this genus is represented by several different species. Phylogenetic analysis of these sequences placed them in close proximity to the hydrogenotrophic methanogen Methanoculleus
Detecting false positive sequence homology: a machine learning approach.

PubMed

Fujimoto, M Stanley; Suvorov, Anton; Jensen, Nicholas O; Clement, Mark J; Bybee, Seth M

2016-02-24

Accurate detection of homologous relationships of biological sequences (DNA or amino acid) amongst organisms is an important and often difficult task that is essential to various evolutionary studies, ranging from building phylogenies to predicting functional gene annotations. There are many existing heuristic tools, most commonly based on bidirectional BLAST searches, that are used to identify homologous genes and combine them into two fundamentally distinct classes: orthologs and paralogs. Because they use only heuristic filtering based on significance score cutoffs and have no cluster post-processing tools available, these methods can often produce multiple clusters containing unrelated (non-homologous) sequences. Therefore, sequencing data extracted from incomplete genome/transcriptome assemblies that originate from low-coverage sequencing or are produced by de novo processes without a reference genome are susceptible to high false positive rates of homology detection. In this paper we develop biologically informative features that can be extracted from multiple sequence alignments of putative homologous genes (orthologs and paralogs) and further utilized in the context of guided experimentation to verify false positive outcomes. We demonstrate that our machine learning method, trained on both known homology clusters obtained from OrthoDB and randomly generated sequence alignments (non-homologs), successfully determines apparent false positives inferred by heuristic algorithms, especially among proteomes recovered from low-coverage RNA-seq data. Approximately 42% and 25% of the putative homologies predicted by InParanoid and HaMStR, respectively, were classified as false positives on the experimental data set. Our process increases the quality of output from other clustering algorithms by providing a novel post-processing method that is both fast and efficient at removing low quality clusters of putative homologous genes recovered by heuristic-based approaches.
Subcellular location prediction of proteins using support vector machines with alignment of block sequences utilizing amino acid composition.

PubMed

Tamura, Takeyuki; Akutsu, Tatsuya

2007-11-30

Subcellular location prediction of proteins is an important and well-studied problem in bioinformatics. This is a problem of predicting which part of a cell a given protein is transported to, where the amino acid sequence of the protein is given as input. This problem is becoming more important since information on subcellular location is helpful for annotation of proteins and genes and the number of complete genomes is rapidly increasing. Since existing predictors are based on various heuristics, it is important to develop a simple method with high prediction accuracies. In this paper, we propose a novel and general prediction method that combines techniques for sequence alignment with feature vectors based on amino acid composition. We implemented this method with support vector machines on plant data sets extracted from the TargetP database. Through fivefold cross validation tests, the obtained overall accuracy and average MCC were 0.9096 and 0.8655, respectively. We also applied our method to other datasets including that of WoLF PSORT. Although there is a predictor which uses the information of gene ontology and yields higher accuracy than ours, our accuracies are higher than those of existing predictors which use only sequence information. Since such information as gene ontology can be obtained only for known proteins, our predictor is considered to be useful for subcellular location prediction of newly-discovered proteins. Furthermore, the idea of combining alignment and amino acid frequency is novel and general, so it may be applied to other problems in bioinformatics. Our method for plants is also implemented as a web-system and available on http://sunflower.kuicr.kyoto-u.ac.jp/~tamura/slpfa.html.
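A feature vector based on amino acid composition, as mentioned in this record, can be sketched in Python as follows. This is only the generic composition vector (20 relative frequencies); how the authors combine it with block alignment is not described in the abstract and is not reproduced here. The name `composition_vector` is hypothetical.

```python
# The 20 standard amino acids, in a fixed order that defines the vector layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_vector(sequence):
    """Return the relative frequency of each standard amino acid.

    Non-standard symbols (e.g. X, gaps) are ignored. A sequence with no
    standard residues yields an all-zero vector.
    """
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]


vector = composition_vector("MKTAYIAKQR")
print(len(vector), round(sum(vector), 6))
```

Vectors of this kind have a fixed length regardless of protein length, which is what makes them convenient inputs for a support vector machine.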

The Sequencing of Basic Chemistry Topics by Physical Science Teachers

ERIC Educational Resources Information Center

Sibanda, Doras; Hobden, Paul

2016-01-01

The purpose of this study was to find out teachers' preferred teaching sequence for basic chemistry topics in Physical Science in South Africa, to obtain their reasons underpinning their preferred sequence, and to compare these sequences with the prescribed sequences in the current curriculum. The study was located within a pragmatic paradigm and…
Efficient error correction for next-generation sequencing of viral amplicons

PubMed Central

2012-01-01

Background Next-generation sequencing allows the analysis of an unprecedented number of viral sequence variants from infected patients, presenting a novel opportunity for understanding virus evolution, drug resistance and immune escape. However, sequencing in bulk is error prone. Thus, the generated data require error identification and correction. Most error-correction methods to date are not optimized for amplicon analysis and assume that the error rate is randomly distributed. Recent quality assessment of amplicon sequences obtained using 454-sequencing showed that the error rate is strongly linked to the presence and size of homopolymers, position in the sequence and length of the amplicon. All these parameters are strongly sequence specific and should be incorporated into the calibration of error-correction algorithms designed for amplicon sequencing. Results In this paper, we present two new efficient error correction algorithms optimized for viral amplicons: (i) k-mer-based error correction (KEC) and (ii) empirical frequency threshold (ET). Both were compared to a previously published clustering algorithm (SHORAH), in order to evaluate their relative performance on 24 experimental datasets obtained by 454-sequencing of amplicons with known sequences. All three algorithms show similar accuracy in finding true haplotypes. However, KEC and ET were significantly more efficient than SHORAH in removing false haplotypes and estimating the frequency of true ones. Conclusions Both algorithms, KEC and ET, are highly suitable for rapid recovery of error-free haplotypes obtained by 454-sequencing of amplicons from heterogeneous viruses. The implementations of the algorithms and data sets used for their testing are available at: http://alan.cs.gsu.edu/NGS/?q=content/pyrosequencing-error-correction-algorithm PMID:22759430
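The idea behind k-mer-based error correction can be illustrated with a short Python sketch that counts k-mers across reads and flags the rare ones as candidate errors. This is not the KEC implementation. The abstract does not describe how KEC sets its thresholds, so the fixed `min_count` cutoff and the function names here are assumptions for illustration only.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmers(reads, k, min_count):
    """Return k-mers seen fewer than min_count times (candidate errors).

    min_count is a hypothetical fixed cutoff; real tools typically derive
    the threshold from the observed count distribution.
    """
    counts = kmer_counts(reads, k)
    return {kmer for kmer, n in counts.items() if n < min_count}


reads = ["ACGT", "ACGT", "ACGA"]
print(weak_kmers(reads, k=3, min_count=2))
```

In this toy input the k-mer "CGA" occurs once while "CGT" occurs twice, so the last base of the third read is the position a corrector would examine.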
Elimination sequence optimization for SPAR

NASA Technical Reports Server (NTRS)

Hogan, Harry A.

1986-01-01

SPAR is a large-scale computer program for finite element structural analysis. The program allows user specification of the order in which the joints of a structure are to be eliminated since this order can have significant influence over solution performance, in terms of both storage requirements and computer time. An efficient elimination sequence can improve performance by over 50% for some problems. Obtaining such sequences, however, requires the expertise of an experienced user and can take hours of tedious effort to effect. Thus, an automatic elimination sequence optimizer would enhance productivity by reducing the analysts' problem definition time and by lowering computer costs. Two possible methods for automating the elimination sequence specifications were examined. Several algorithms based on the graph theory representations of sparse matrices were studied with mixed results. Significant improvement in the program performance was achieved, but sequencing by an experienced user still yields substantially better results. The initial results provide encouraging evidence that the potential benefits of such an automatic sequencer would be well worth the effort.
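Why the elimination order matters can be seen in a small graph model of a sparse matrix: eliminating a joint connects all of its remaining neighbours, and each new connection is "fill-in" that costs storage and time. The Python sketch below uses the classic minimum-degree heuristic as a stand-in. The abstract does not say which graph algorithms were studied, so this choice and all function names are assumptions.

```python
def eliminate(graph, node):
    """Remove node, connect its neighbours pairwise, return edges added."""
    neighbours = sorted(graph.pop(node))
    for u in neighbours:
        graph[u].discard(node)
    added = 0
    for i, u in enumerate(neighbours):
        for v in neighbours[i + 1:]:
            if v not in graph[u]:
                graph[u].add(v)
                graph[v].add(u)
                added += 1
    return added


def fill_in_for_order(adjacency, order):
    """Total fill-in produced by eliminating nodes in the given order."""
    graph = {n: set(nbrs) for n, nbrs in adjacency.items()}
    return sum(eliminate(graph, n) for n in order)


def minimum_degree_order(adjacency):
    """Greedy ordering: always eliminate a node of smallest current degree."""
    graph = {n: set(nbrs) for n, nbrs in adjacency.items()}
    order = []
    while graph:
        node = min(graph, key=lambda n: (len(graph[n]), n))
        eliminate(graph, node)
        order.append(node)
    return order


# Star-shaped structure: joint 0 is connected to joints 1, 2 and 3.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(fill_in_for_order(star, [0, 1, 2, 3]))              # hub first: 3 new edges
print(fill_in_for_order(star, minimum_degree_order(star)))  # leaves first: 0
```

The adjacency mapping is assumed to be symmetric, with node labels that can be sorted.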
A Proteomic Workflow Using High-Throughput De Novo Sequencing Towards Complementation of Genome Information for Improved Comparative Crop Science.

PubMed

Turetschek, Reinhard; Lyon, David; Desalegn, Getinet; Kaul, Hans-Peter; Wienkoop, Stefanie

2016-01-01

The proteomic study of non-model organisms, such as many crop plants, is challenging due to the lack of comprehensive genome information. Changing environmental conditions require the study and selection of adapted cultivars. Mutations, inherent to cultivars, hamper protein identification and thus considerably complicate the qualitative and quantitative comparison in large-scale systems biology approaches. With this workflow, cultivar-specific mutations are detected from high-throughput comparative MS analyses, by extracting sequence polymorphisms with de novo sequencing. Stringent criteria are suggested to filter for high-confidence mutations. Subsequently, these polymorphisms complement the initially used database, which is ready to use with any preferred database search algorithm. In our example, we thereby identified 26 specific mutations in two cultivars of Pisum sativum and achieved an increased number (17%) of peptide spectrum matches.
Analysis of expressed sequence tags for Frankliniella occidentalis, the western flower thrips.

PubMed

Rotenberg, D; Whitfield, A E

2010-08-01

Thrips are members of the insect order Thysanoptera and Frankliniella occidentalis (the western flower thrips) is the most economically important pest within this order. F. occidentalis is both a direct pest of crops and an efficient vector of plant viruses, including Tomato spotted wilt virus (TSWV). Despite the world-wide importance of thrips in agriculture, there is little knowledge of the F. occidentalis genome or gene functions at this time. A normalized cDNA library was constructed from first instar thrips and 13 839 expressed sequence tags (ESTs) were obtained. Our EST data assembled into 894 contigs and 11 806 singletons (12 700 nonredundant sequences). We found that 31% of these sequences had significant similarity (E ≤ 10^-10) to protein sequences in the National Center for Biotechnology Information nonredundant (nr) protein database, and 25% were functionally annotated using Blast2GO. We identified 74 sequences with putative homology to proteins associated with insect innate immunity. Sixteen sequences had significant similarity to proteins associated with small RNA-mediated gene silencing pathways (RNA interference; RNAi), including the antiviral pathway (short interfering RNA-mediated pathway). Our EST collection provides new sequence resources for characterizing gene functions in F. occidentalis and other thrips species with regards to vital biological processes, studying the mechanism of interactions with the viruses harboured and transmitted by the vector, and identifying new insect gene-centred targets for plant disease and insect control.
dCITE: Measuring Necessary Cladistic Information Can Help You Reduce Polytomy Artefacts in Trees.

PubMed

Wise, Michael J

2016-01-01

Biologists regularly create phylogenetic trees to better understand the evolutionary origins of their species of interest, and often use genomes as their data source. However, as more and more incomplete genomes are published, in many cases it may not be possible to compute genome-based phylogenetic trees due to large gaps in the assembled sequences. In addition, comparison of complete genomes may not even be desirable due to the presence of horizontally acquired and homologous genes. A decision must therefore be made about which gene, or gene combinations, should be used to compute a tree. Deflated Cladistic Information based on Total Entropy (dCITE) is proposed as an easily computed metric for measuring the cladistic information in multiple sequence alignments representing a range of taxa, without the need to first compute the corresponding trees. dCITE scores can be used to rank candidate genes or decide whether input sequences provide insufficient cladistic information, making artefactual polytomies more likely. The dCITE method can be applied to protein, nucleotide or encoded phenotypic data, so can be used to select which data-type is most appropriate, given the choice. In a series of experiments the dCITE method was compared with related measures. Then, as a practical demonstration, the ideas developed in the paper were applied to a dataset representing species from the order Campylobacterales; trees based on sequence combinations, selected on the basis of their dCITE scores, were compared with a tree constructed to mimic Multi-Locus Sequence Typing (MLST) combinations of fragments. We see that the greater the dCITE score the more likely it is that the computed phylogenetic tree will be free of artefactual polytomies. Secondly, cladistic information saturates, beyond which little additional cladistic information can be obtained by adding additional sequences. Finally, sequences with high cladistic information produce more consistent trees for the same taxa.
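As a rough illustration of an entropy-based measure computed directly from a multiple sequence alignment, the Python sketch below calculates the Shannon entropy of each alignment column. This is not the dCITE formula, which the abstract does not give. It only shows the kind of per-column quantity such a metric could build on. Gap characters are treated as ordinary symbols here, which is an assumption.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def alignment_entropies(alignment):
    """Entropy of every column of an alignment given as equal-length strings."""
    return [column_entropy(col) for col in zip(*alignment)]


alignment = ["ACGT", "ACGA", "ACTA", "AGTA"]
print([round(h, 3) for h in alignment_entropies(alignment)])
```

A fully conserved column scores 0 bits and carries no information for separating taxa. Columns in which the taxa are split evenly between symbols score highest.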
From metaphor to practices: The introduction of "information engineers" into the first DNA sequence database.

PubMed

García-Sancho, Miguel

2011-01-01

This paper explores the introduction of professional systems engineers and information management practices into the first centralized DNA sequence database, developed at the European Molecular Biology Laboratory (EMBL) during the 1980s. In so doing, it complements the literature on the emergence of an information discourse after World War II and its subsequent influence in biological research. By analyzing the careers of the database creators and the computer algorithms they designed, I show that, from the mid-1960s onwards, information in biology gradually shifted from a pervasive metaphor to being embodied in practices and professionals such as those incorporated at the EMBL. I then investigate the reception of these database professionals by the EMBL biological staff, which evolved from initial disregard to necessary collaboration as the relationship between DNA, genes, and proteins turned out to be more complex than expected. The trajectories of the database professionals at the EMBL suggest that the initial subject matter of the historiography of genomics should be the long-standing practices that emerged after World War II and to a large extent originated outside biomedicine and academia. Only after addressing these practices may historians turn to their further disciplinary assemblage in fields such as bioinformatics or biotechnology.
Draft Genome Sequences of Six Mycobacterium immunogenum, Strains Obtained from a Chloraminated Drinking Water Distribution System Simulator

EPA Science Inventory

We report the draft genome sequences of six Mycobacterium immunogenum strains isolated from a chloraminated drinking water distribution system simulator subjected to changes in operational parameters. M. immunogenum, a rapidly growing mycobacterium previously reported as the cause of hyp...
High-Throughput Next-Generation Sequencing of Polioviruses

PubMed Central

Montmayeur, Anna M.; Schmidt, Alexander; Zhao, Kun; Magaña, Laura; Iber, Jane; Castro, Christina J.; Chen, Qi; Henderson, Elizabeth; Ramos, Edward; Shaw, Jing; Tatusov, Roman L.; Dybdahl-Sissoko, Naomi; Endegue-Zanga, Marie Claire; Adeniji, Johnson A.; Oberste, M. Steven; Burns, Cara C.

2016-01-01

ABSTRACT The poliovirus (PV) is currently targeted for worldwide eradication and containment. Sanger-based sequencing of the viral protein 1 (VP1) capsid region is currently the standard method for PV surveillance. However, the whole-genome sequence is sometimes needed for higher resolution global surveillance. In this study, we optimized whole-genome sequencing protocols for poliovirus isolates and FTA cards using next-generation sequencing (NGS), aiming for high sequence coverage, efficiency, and throughput. We found that DNase treatment of poliovirus RNA followed by random reverse transcription (RT), amplification, and the use of the Nextera XT DNA library preparation kit produced significantly better results than other preparations. The average viral reads per total reads, a measurement of efficiency, was as high as 84.2% ± 15.6%. PV genomes covering >99 to 100% of the reference length were obtained and validated with Sanger sequencing. A total of 52 PV genomes were generated, multiplexing as many as 64 samples in a single Illumina MiSeq run. This high-throughput, sequence-independent NGS approach facilitated the detection of a diverse range of PVs, especially for those in vaccine-derived polioviruses (VDPV), circulating VDPV, or immunodeficiency-related VDPV. In contrast to results from previous studies on other viruses, our results showed that filtration and nuclease treatment did not discernibly increase the sequencing efficiency of PV isolates. However, DNase treatment after nucleic acid extraction to remove host DNA significantly improved the sequencing results. This NGS method has been successfully implemented to generate PV genomes for molecular epidemiology of the most recent PV isolates. Additionally, the ability to obtain full PV genomes from FTA cards will aid in facilitating global poliovirus surveillance. PMID:27927929
7 CFR 4290.620 - Requirements to obtain information from Portfolio Concerns.

Code of Federal Regulations, 2010 CFR

2010-01-01

... English. (a) Information for initial Financing decision. Before extending any Financing, you must require... financing proceeds), cash flow analyses, projections, and such economic development information about the Enterprise, as are necessary to support your investment decision. The information submitted must be...
Nucleotide sequences encoding a thermostable alkaline protease

DOEpatents

Wilson, David B.; Lao, Guifang

1998-01-01

Nucleotide sequences, derived from a thermophilic actinomycete microorganism, which encode a thermostable alkaline protease are disclosed. Also disclosed are variants of the nucleotide sequences which encode a polypeptide having thermostable alkaline proteolytic activity. Recombinant thermostable alkaline protease or recombinant polypeptide may be obtained by culturing in a medium a host cell genetically engineered to contain and express a nucleotide sequence according to the present invention, and recovering the recombinant thermostable alkaline protease or recombinant polypeptide from the culture medium.
Genetic characterization of human herpesvirus type 1: Full-length genome sequence of strain obtained from an encephalitis case from India.

PubMed

Bondre, Vijay P; Sankararaman, Vasudha; Andhare, Vijaysinh; Tupekar, Manisha; Sapkal, Gajanan N

2016-11-01

Human herpes simplex virus 1 (HSV-1) is the most common cause of sporadic encephalitis in humans that contributes to >10 per cent of the encephalitis cases occurring worldwide. Availability of limited full genome sequences from a small number of isolates resulted in poor understanding of host and viral factors responsible for variable clinical outcome. In this study genetic relationship, extent and source of recombination using full-length genome sequence derived from a newly isolated HSV-1 isolate was studied in comparison with those sampled from patients with varied clinical outcome. Full genome sequence of HSV-1 isolated from cerebrospinal fluid (CSF) of a patient with acute encephalitis syndrome (AES) by inoculation in baby hamster kidney-21 (BHK-21) cells was determined using next-generation sequencing (NGS) technology. Phylogenetic analysis of the newly generated sequence in comparison with 33 additional full-length genomes defined genetic relationship with worldwide distributed strains. The bootscan and similarity plot analysis defined recombination crossovers and similarities between newly isolated Indian HSV-1 with six Asian and a total of 34 worldwide isolated strains. Mapping of 376,332 reads amplified from HSV-1 DNA by NGS generated full-length genome of 151,024 bp from newly isolated Indian HSV-1. Phylogenetic analysis classified worldwide distributed strains into three major evolutionary lineages correlating to their geographic distribution. Lineage 1 containing strains were isolated from America and Europe; lineage 2 contained all the strains from Asian countries along with the North American KOS and RE strains whereas the South African isolates were distributed into two groups under lineage 3. Recombination analysis confirmed events of recombination in Indian HSV-1 genome resulting from mixing of different strains evolved in Asian countries. Our results showed that the full-length genome sequence generated from an Indian HSV-1 isolate shared close
Genetic characterization of human herpesvirus type 1: Full-length genome sequence of strain obtained from an encephalitis case from India

PubMed Central

Bondre, Vijay P.; Sankararaman, Vasudha; Andhare, Vijaysinh; Tupekar, Manisha; Sapkal, Gajanan N.

2016-01-01

Background & objectives: Human herpes simplex virus 1 (HSV-1) is the most common cause of sporadic encephalitis in humans that contributes to >10 per cent of the encephalitis cases occurring worldwide. Availability of limited full genome sequences from a small number of isolates resulted in poor understanding of host and viral factors responsible for variable clinical outcome. In this study genetic relationship, extent and source of recombination using full-length genome sequence derived from a newly isolated HSV-1 isolate was studied in comparison with those sampled from patients with varied clinical outcome. Methods: Full genome sequence of HSV-1 isolated from cerebrospinal fluid (CSF) of a patient with acute encephalitis syndrome (AES) by inoculation in baby hamster kidney-21 (BHK-21) cells was determined using next-generation sequencing (NGS) technology. Phylogenetic analysis of the newly generated sequence in comparison with 33 additional full-length genomes defined genetic relationship with worldwide distributed strains. The bootscan and similarity plot analysis defined recombination crossovers and similarities between newly isolated Indian HSV-1 with six Asian and a total of 34 worldwide isolated strains. Results: Mapping of 376,332 reads amplified from HSV-1 DNA by NGS generated full-length genome of 151,024 bp from newly isolated Indian HSV-1. Phylogenetic analysis classified worldwide distributed strains into three major evolutionary lineages correlating to their geographic distribution. Lineage 1 containing strains were isolated from America and Europe; lineage 2 contained all the strains from Asian countries along with the North American KOS and RE strains whereas the South African isolates were distributed into two groups under lineage 3. Recombination analysis confirmed events of recombination in Indian HSV-1 genome resulting from mixing of different strains evolved in Asian countries. Interpretation & conclusions: Our results showed that the full
Sequencing technologies - the next generation.

PubMed

Metzker, Michael L

2010-01-01

Demand has never been greater for revolutionary technologies that deliver fast, inexpensive and accurate genome information. This challenge has catalysed the development of next-generation sequencing (NGS) technologies. The inexpensive production of large volumes of sequence data is the primary advantage over conventional methods. Here, I present a technical review of template preparation, sequencing and imaging, genome alignment and assembly approaches, and recent advances in current and near-term commercially available NGS instruments. I also outline the broad range of applications for NGS technologies, in addition to providing guidelines for platform selection to address biological questions of interest.
Combining Physicochemical and Evolutionary Information for Protein Contact Prediction

PubMed Central

Schneider, Michael; Brock, Oliver

2014-01-01

We introduce a novel contact prediction method that achieves high prediction accuracy by combining evolutionary and physicochemical information about native contacts. We obtain evolutionary information from multiple-sequence alignments and physicochemical information from predicted ab initio protein structures. These structures represent low-energy states in an energy landscape and thus capture the physicochemical information encoded in the energy function. Such low-energy structures are likely to contain native contacts, even if their overall fold is not native. To differentiate native from non-native contacts in those structures, we develop a graph-based representation of the structural context of contacts. We then use this representation to train a support vector machine classifier to identify the most likely native contacts in otherwise non-native structures. The resulting contact predictions are highly accurate. As a result of combining two sources of information (evolutionary and physicochemical), we maintain prediction accuracy even when only a few sequence homologs are present. We show that the predicted contacts help to improve ab initio structure prediction. A web service is available at http://compbio.robotics.tu-berlin.de/epc-map/. PMID:25338092
Sequence-specific ¹H NMR resonance assignments of Bacillus subtilis HPr: Use of spectra obtained from mutants to resolve spectral overlap

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wittekind, M.; Klevit, R.E.; Reizer, J.

1990-08-07

On the basis of an analysis of two-dimensional ¹H NMR spectra, the complete sequence-specific ¹H NMR assignments are presented for the phosphocarrier protein HPr from the Gram-positive bacterium Bacillus subtilis. During the assignment procedure, extensive use was made of spectra obtained from point mutants of HPr in order to resolve spectral overlap and to provide verification of assignments. Regions of regular secondary structure were identified by characteristic patterns of sequential backbone proton NOEs and slowly exchanging amide protons. B. subtilis HPr contains four β-strands that form a single antiparallel β-sheet and two well-defined α-helices. There are two stretches of extended backbone structure, one of which contains the active site His15. The overall fold of the protein is very similar to that of Escherichia coli HPr determined by NMR studies.
Sequencing and comparative genomic analysis of 1227 Felis catus cDNA sequences enriched for developmental, clinical and nutritional phenotypes

PubMed Central

2012-01-01

Background The feline genome is valuable to the veterinary and model organism genomics communities because the cat is an obligate carnivore and a model for endangered felids. The initial public release of the Felis catus genome assembly provided a framework for investigating the genomic basis of feline biology. However, the entire set of protein coding genes has not been elucidated. Results We identified and characterized 1227 protein coding feline sequences, of which 913 map to public sequences and 314 are novel. These sequences have been deposited into NCBI's GenBank database and complement public genomic resources by providing additional protein coding sequences that fill in some of the gaps in the feline genome assembly. Through functional and comparative genomic analyses, we gained an understanding of the role of these sequences in feline development, nutrition and health. Specifically, we identified 104 orthologs of human genes associated with Mendelian disorders. We detected negative selection within sequences with gene ontology annotations associated with intracellular trafficking, cytoskeleton and muscle functions. We detected relatively less negative selection on protein sequences encoding extracellular networks, apoptotic pathways and mitochondrial gene ontology annotations. Additionally, we characterized feline cDNA sequences that have mouse orthologs associated with clinical, nutritional and developmental phenotypes. Together, this analysis provides an overview of the value of our cDNA sequences and enhances our understanding of how the feline genome is similar to, and different from, other mammalian genomes. Conclusions The cDNA sequences reported here expand existing feline genomic resources by providing high-quality sequences annotated with comparative genomic information providing functional, clinical, nutritional and orthologous gene information. PMID:22257742
Rapid Diagnostics of Onboard Sequences

NASA Technical Reports Server (NTRS)

Starbird, Thomas W.; Morris, John R.; Shams, Khawaja S.; Maimone, Mark W.

2012-01-01

Keeping track of sequences onboard a spacecraft is challenging. When reviewing Event Verification Records (EVRs) of sequence executions on the Mars Exploration Rover (MER), operators often found themselves wondering which version of a named sequence the EVR corresponded to. The lack of this information drastically impacts the operators' diagnostic capabilities as well as their situational awareness with respect to the commands the spacecraft has executed, since the EVRs do not provide argument values or explanatory comments. Having this information immediately available can be instrumental in diagnosing critical events and can significantly enhance the overall safety of the spacecraft. This software provides auditing capability that can eliminate that uncertainty while diagnosing critical conditions. Furthermore, the RESTful interface provides a simple way for sequencing tools to automatically retrieve binary compiled sequence SCMFs (Space Command Message Files) on demand. It also enables developers to change the underlying database while maintaining the same interface to the existing applications. The logging capabilities are also beneficial to operators when they are trying to recall how they solved a similar problem many days ago: this software enables automatic recovery of SCMF and RML (Robot Markup Language) sequence files directly from the command EVRs, eliminating the need for people to find and validate the corresponding sequences. To address the lack of auditing capability for sequences onboard a spacecraft during earlier missions, extensive logging support was added on the Mars Science Laboratory (MSL) sequencing server. This server is responsible for generating all MSL binary SCMFs from RML input sequences. The sequencing server logs every SCMF it generates into a MySQL database, as well as the high-level RML file and dictionary name inputs used to create the SCMF. The SCMF is then indexed by a hash value that is automatically included in all command
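The hash-indexed logging described in this record can be sketched roughly as follows. This is not the MSL server's code: the standard-library sqlite3 module stands in for MySQL, SHA-256 is an assumed hash (the record does not name one), and the table, column, and function names are hypothetical.

```python
import hashlib
import sqlite3


def open_log(path=":memory:"):
    """Open (or create) the log database; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scmf_log ("
        " digest TEXT PRIMARY KEY,"
        " scmf BLOB NOT NULL,"
        " rml_source TEXT NOT NULL,"
        " dictionary_name TEXT NOT NULL)"
    )
    return conn


def log_scmf(conn, scmf_bytes, rml_source, dictionary_name):
    """Store a generated binary together with its inputs, keyed by its hash."""
    digest = hashlib.sha256(scmf_bytes).hexdigest()  # assumed hash function
    conn.execute(
        "INSERT OR IGNORE INTO scmf_log VALUES (?, ?, ?, ?)",
        (digest, scmf_bytes, rml_source, dictionary_name),
    )
    conn.commit()
    return digest


def lookup(conn, digest):
    """Recover the binary and its inputs from a hash seen in a later record."""
    return conn.execute(
        "SELECT scmf, rml_source, dictionary_name FROM scmf_log WHERE digest = ?",
        (digest,),
    ).fetchone()
```

Keying the table on a content hash means that two versions of a sequence with the same name still receive distinct entries, which is the ambiguity the record says operators faced on MER.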
Sequence analysis of Leukemia DNA

NASA Astrophysics Data System (ADS)

Nacong, Nasria; Lusiyanti, Desy; Irawan, Muhammad Isa

2018-03-01

Cancer is a deadly disease, and leukemia, better known as blood cancer, is one of its forms. Cancer cells can be detected by analyzing DNA in a laboratory test. This study focuses on the local alignment of leukemia and non-leukemia DNA sequences obtained from NCBI, using the Smith-Waterman algorithm, which was introduced by T. F. Smith and M. S. Waterman in 1981. The algorithm searches for the most similar region between a pair of sequences by assigning a negative score to unequal base pairs (mismatches) and a positive score to identical base pairs (matches). The maximum positive score marks the end of the alignment, and the minimum value marks its start. This study will use sequences of leukemia and 3 sequences of non-leukemia.
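The local alignment procedure described in this record can be sketched in Python as below. The scoring values (match +2, mismatch -1, linear gap penalty -2) are assumptions chosen for illustration; the record states only that matches score positively and mismatches negatively, and it does not mention gap handling.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; scores are clamped at zero so an alignment can restart.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


# Hypothetical toy sequences sharing the motif ACGT.
print(smith_waterman("TTACGTAA", "GGACGTCC"))
```

The traceback starts at the cell holding the maximum score, which is the end of the local alignment, and stops at the first zero cell, which is its start.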
