Sequence-based screening for self-sufficient P450 monooxygenase from a metagenome library.
Kim, B S; Kim, S Y; Park, J; Park, W; Hwang, K Y; Yoon, Y J; Oh, W K; Kim, B Y; Ahn, J S
2007-05-01
Cytochrome P450 monooxygenases (CYPs) are useful catalysts for oxidation reactions. Self-sufficient CYPs harbour a reductive domain covalently connected to a P450 domain and are known for their robust catalytic activity with great potential as biocatalysts. In an effort to expand genetic sources of self-sufficient CYPs, we devised a sequence-based screening system to identify them in a soil metagenome. We constructed a soil metagenome library and performed sequence-based screening for self-sufficient CYP genes. A new CYP gene, syk181, was identified from the metagenome library. Phylogenetic analysis revealed that SYK181 formed a distinct phylogenetic line with 46% amino-acid-sequence identity to CYP102A1, which has been extensively studied as a fatty acid hydroxylase. The heterologously expressed SYK181 showed significant hydroxylase activity towards naphthalene and phenanthrene as well as towards fatty acids. Sequence-based screening of metagenome libraries is expected to be a useful approach for searching for self-sufficient CYP genes. The translated product of syk181 shows self-sufficient hydroxylase activity towards fatty acids and aromatic compounds. SYK181 is the first self-sufficient CYP obtained directly from a metagenome library. The genetic and biochemical information on SYK181 is expected to be helpful for engineering self-sufficient CYPs with broader catalytic activities towards various substrates, which would be useful for bioconversion of natural products and biodegradation of organic chemicals.
Constructing and Modifying Sequence Statistics for relevent Using informR in 𝖱
Marcum, Christopher Steven; Butts, Carter T.
2015-01-01
The informR package greatly simplifies the analysis of complex event histories in 𝖱 by providing user friendly tools to build sufficient statistics for the relevent package. Historically, building sufficient statistics to model event sequences (of the form a→b) using the egocentric generalization of Butts’ (2008) relational event framework for modeling social action has been cumbersome. The informR package simplifies the construction of the complex list of arrays needed by the rem() model fitting for a variety of cases involving egocentric event data, multiple event types, and/or support constraints. This paper introduces these tools using examples from real data extracted from the American Time Use Survey. PMID:26185488
NASA Astrophysics Data System (ADS)
Jiao, Yong; Wakakuwa, Eyuri; Ogawa, Tomohiro
2018-02-01
We consider asymptotic convertibility of an arbitrary sequence of bipartite pure states into another by local operations and classical communication (LOCC). We adopt an information-spectrum approach to address cases where each element of the sequences is not necessarily a tensor power of a bipartite pure state. We derive necessary and sufficient conditions for the LOCC convertibility of one sequence to another in terms of spectral entropy rates of entanglement of the sequences. Based on these results, we also provide simple proofs for previously known results on the optimal rates of entanglement concentration and dilution of general sequences of bipartite pure states.
The least channel capacity for chaos synchronization.
Wang, Mogei; Wang, Xingyuan; Liu, Zhenzhen; Zhang, Huaguang
2011-03-01
Recently researchers have found that a channel with capacity exceeding the Kolmogorov-Sinai entropy of the drive system (h(KS)) is theoretically necessary and sufficient to sustain the unidirectional synchronization to arbitrarily high precision. In this study, we use symbolic dynamics and the automaton reset sequence to distinguish the information that is required in identifying the current drive word and obtaining the synchronization. Then, we show that the least channel capacity that is sufficient to transmit the distinguished information and attain the synchronization of arbitrarily high precision is h(KS). Numerical simulations provide support for our conclusions.
COI (cytochrome oxidase-I) sequence based studies of Carangid fishes from Kakinada coast, India.
Persis, M; Chandra Sekhar Reddy, A; Rao, L M; Khedkar, G D; Ravinder, K; Nasruddin, K
2009-09-01
Mitochondrial DNA, cytochrome oxidase-1 gene sequences were analyzed for species identification and phylogenetic relationship among the very high food value and commercially important Indian carangid fish species. Sequence analysis of COI gene very clearly indicated that all the 28 fish species fell into five distinct groups, which are genetically distant from each other and exhibited identical phylogenetic reservation. All the COI gene sequences from 28 fishes provide sufficient phylogenetic information and evolutionary relationship to distinguish the carangid species unambiguously. This study proves the utility of mtDNA COI gene sequence based approach in identifying fish species at a faster pace.
Sharma, Aseem; Chatterjee, Arindam; Goyal, Manu; Parsons, Matthew S; Bartel, Seth
2015-04-01
Targeting redundancy within MRI can improve its cost-effective utilization. We sought to quantify potential redundancy in our brain MRI protocols. In this retrospective review, we aggregated 207 consecutive adults who underwent brain MRI and reviewed their medical records to document clinical indication, core diagnostic information provided by MRI, and its clinical impact. Contributory imaging abnormalities constituted positive core diagnostic information whereas absence of imaging abnormalities constituted negative core diagnostic information. The senior author selected core sequences deemed sufficient for extraction of core diagnostic information. For validating core sequences selection, four readers assessed the relative ease of extracting core diagnostic information from the core sequences. Potential redundancy was calculated by comparing the average number of core sequences to the average number of sequences obtained. Scanning had been performed using 9.4±2.8 sequences over 37.3±12.3 minutes. Core diagnostic information was deemed extractable from 2.1±1.1 core sequences, with an assumed scanning time of 8.6±4.8 minutes, reflecting a potential redundancy of 74.5%±19.1%. Potential redundancy was least in scans obtained for treatment planning (14.9%±25.7%) and highest in scans obtained for follow-up of benign diseases (81.4%±12.6%). In 97.4% of cases, all four readers considered core diagnostic information to be either easily extractable from core sequences or the ease to be equivalent to that from the entire study. With only one MRI lacking clinical impact (0.48%), overutilization did not seem to contribute to potential redundancy. High potential redundancy that can be targeted for more efficient scanner utilization exists in brain MRI protocols.
Faulon, Jean-Loup; Misra, Milind; Martin, Shawn; ...
2007-11-23
Motivation: Identifying protein enzymatic or pharmacological activities is an important area of research in biology and chemistry. Biological and chemical databases are increasingly being populated with linkages between protein sequences and chemical structures. Additionally, there is now sufficient information to apply machine-learning techniques to predict interactions between chemicals and proteins at a genome scale. Current machine-learning techniques use as input either protein sequences and structures or chemical information. We propose here a method to infer protein–chemical interactions using heterogeneous input consisting of both protein sequence and chemical information. Results: Our method relies on expressing proteins and chemicals with a common cheminformatics representation. We demonstrate our approach by predicting whether proteins can catalyze reactions not present in training sets. We also predict whether a given drug can bind a target, in the absence of prior binding information for that drug and target. Lastly, such predictions cannot be made with current machine-learning techniques requiring binding information for individual reactions or individual targets.
Turner, Barbara; Paun, Ovidiu; Munzinger, Jérôme; Chase, Mark W.; Samuel, Rosabelle
2016-01-01
Background and Aims Some plant groups, especially on islands, have been shaped by strong ancestral bottlenecks and rapid, recent radiation of phenotypic characters. Single molecular markers are often not informative enough for phylogenetic reconstruction in such plant groups. Whole plastid genomes and nuclear ribosomal DNA (nrDNA) are viewed by many researchers as sources of information for phylogenetic reconstruction of groups in which expected levels of divergence in standard markers are low. Here we evaluate the usefulness of these data types to resolve phylogenetic relationships among closely related Diospyros species. Methods Twenty-two closely related Diospyros species from New Caledonia were investigated using whole plastid genomes and nrDNA data from low-coverage next-generation sequencing (NGS). Phylogenetic trees were inferred using maximum parsimony, maximum likelihood and Bayesian inference on separate plastid and nrDNA and combined matrices. Key Results The plastid and nrDNA sequences were, singly and together, unable to provide well supported phylogenetic relationships among the closely related New Caledonian Diospyros species. In the nrDNA, a 6-fold greater percentage of parsimony-informative characters compared with plastid DNA was found, but the total number of informative sites was greater for the much larger plastid DNA genomes. Combining the plastid and nuclear data improved resolution. Plastid results showed a trend towards geographical clustering of accessions rather than following taxonomic species. Conclusions In plant groups in which multiple plastid markers are not sufficiently informative, an investigation at the level of the entire plastid genome may also not be sufficient for detailed phylogenetic reconstruction. 
Sequencing of complete plastid genomes and nrDNA repeats seems to clarify some relationships among the New Caledonian Diospyros species, but the higher percentage of parsimony-informative characters in nrDNA compared with plastid DNA did not help to resolve the phylogenetic tree because the total number of variable sites was much lower than in the entire plastid genome. The geographical clustering of the individuals against a background of overall low sequence divergence could indicate transfer of plastid genomes due to hybridization and introgression following secondary contact. PMID:27098088
SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information
2014-01-01
Background The recent introduction of the Pacific Biosciences RS single molecule sequencing technology has opened new doors to scaffolding genome assemblies in a cost-effective manner. The long read sequence information promises to enhance the quality of incomplete and inaccurate draft assemblies constructed from Next Generation Sequencing (NGS) data. Results Here we propose a novel hybrid assembly methodology that aims to scaffold pre-assembled contigs in an iterative manner using PacBio RS long read information as a backbone. On a test set comprising six bacterial draft genomes, assembled using either a single Illumina MiSeq or Roche 454 library, we show that even a 50× coverage of uncorrected PacBio RS long reads is sufficient to drastically reduce the number of contigs. Comparisons to the AHA scaffolder indicate our strategy is better capable of producing (nearly) complete bacterial genomes. Conclusions The current work describes our SSPACE-LongRead software, which is designed to upgrade incomplete draft genomes using single molecule sequences. We conclude that the recent advances of the PacBio sequencing technology and chemistry, in combination with the limited computational resources required to run our program, allow genomes to be scaffolded in a fast and reliable manner. PMID:24950923
34 CFR 675.19 - Fiscal procedures and records.
Code of Federal Regulations, 2011 CFR
2011-07-01
... EDUCATION, DEPARTMENT OF EDUCATION FEDERAL WORK-STUDY PROGRAMS Federal Work-Study Program § 675.19 Fiscal... paid on an hourly basis, a time record showing the hours each student worked in clock time sequence, or the total hours worked per day; (ii) Include a payroll voucher containing sufficient information to...
34 CFR 675.19 - Fiscal procedures and records.
Code of Federal Regulations, 2010 CFR
2010-07-01
... EDUCATION, DEPARTMENT OF EDUCATION FEDERAL WORK-STUDY PROGRAMS Federal Work-Study Program § 675.19 Fiscal... paid on an hourly basis, a time record showing the hours each student worked in clock time sequence, or the total hours worked per day; (ii) Include a payroll voucher containing sufficient information to...
Cataloging the 1811-1812 New Madrid, central U.S., earthquake sequence
Hough, S.E.
2009-01-01
The three principal New Madrid, central U.S., mainshocks of 1811-1812 were followed by extensive aftershock sequences that included numerous felt events. Although no instrumental data are available for the sequence, historical accounts provide information that can be used to estimate magnitudes and locations for the large aftershocks as well as the mainshocks. Several detailed eyewitness accounts of the sequence provide sufficient information to identify times and rough magnitude estimates for a number of aftershocks that have not been analyzed previously. I also use three extended compilations of felt events to explore the overall sequence productivity. Although one generally cannot estimate magnitudes or locations for individual events, the intensity distributions of recent, instrumentally recorded earthquakes in the region provide a basis for estimation of the magnitude distribution of 1811-1812 aftershocks. The distribution is consistent with a b-value distribution. I estimate Mw 6-6.3 for the three largest identifiable aftershocks, apart from the so-called dawn aftershock on 16 December 1811.
Kit for detecting nucleic acid sequences using competitive hybridization probes
Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.
2001-01-01
A kit is provided for detecting a target nucleic acid sequence in a sample, the kit comprising: a first hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the first hybridization probe including a first complexing agent for forming a binding pair with a second complexing agent; and a second hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the first hybridization probe does not selectively hybridize, the second hybridization probe including a detectable marker; a third hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the third hybridization probe including the same detectable marker as the second hybridization probe; and a fourth hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the third hybridization probe does not selectively hybridize, the fourth hybridization probe including the first complexing agent for forming a binding pair with the second complexing agent; wherein the first and second hybridization probes are capable of simultaneously hybridizing to the target sequence and the third and fourth hybridization probes are capable of simultaneously hybridizing to the target sequence, the detectable marker is not present on the first or fourth hybridization probes and the first, second, third, and fourth hybridization probes each include a competitive nucleic acid sequence which is sufficiently complementary to a third portion of the target sequence that the competitive sequences of the first, second, third, and fourth hybridization probes compete with each other to hybridize to the third portion of the target sequence.
Isotopically enhanced triple-quantum-dot qubit
Eng, Kevin; Ladd, Thaddeus D.; Smith, Aaron; Borselli, Matthew G.; Kiselev, Andrey A.; Fong, Bryan H.; Holabird, Kevin S.; Hazard, Thomas M.; Huang, Biqin; Deelman, Peter W.; Milosavljevic, Ivan; Schmitz, Adele E.; Ross, Richard S.; Gyure, Mark F.; Hunter, Andrew T.
2015-01-01
Like modern microprocessors today, future processors of quantum information may be implemented using all-electrical control of silicon-based devices. A semiconductor spin qubit may be controlled without the use of magnetic fields by using three electrons in three tunnel-coupled quantum dots. Triple dots have previously been implemented in GaAs, but this material suffers from intrinsic nuclear magnetic noise. Reduction of this noise is possible by fabricating devices using isotopically purified silicon. We demonstrate universal coherent control of a triple-quantum-dot qubit implemented in an isotopically enhanced Si/SiGe heterostructure. Composite pulses are used to implement spin-echo type sequences, and differential charge sensing enables single-shot state readout. These experiments demonstrate sufficient control with sufficiently low noise to enable the long pulse sequences required for exchange-only two-qubit logic and randomized benchmarking. PMID:26601186
Evaluating information content of SNPs for sample-tagging in re-sequencing projects.
Hu, Hao; Liu, Xiang; Jin, Wenfei; Hilger Ropers, H; Wienker, Thomas F
2015-05-15
Sample-tagging is designed for identification of accidental sample mix-up, which is a major issue in re-sequencing studies. In this work, we develop a model to measure the information content of SNPs, so that we can optimize a panel of SNPs that approaches the maximal information for discrimination. The analysis shows that as few as 60 optimized SNPs can differentiate the individuals in a population as large as the present world, and only 30 optimized SNPs are in practice sufficient for labeling up to 100 thousand individuals. In the simulated populations of 100 thousand individuals, the average Hamming distance generated by the optimized set of 30 SNPs is larger than 18, and the duality frequency is lower than 1 in 10 thousand. This strategy of sample discrimination proved robust across large sample sizes and different datasets. The optimized sets of SNPs are designed for Whole Exome Sequencing, and a program is provided for SNP selection, allowing for customized SNP numbers and genes of interest. The sample-tagging plan based on this framework will improve re-sequencing projects in terms of reliability and cost-effectiveness.
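The Hamming-distance criterion mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' program: the genotype coding (0/1/2 copies of the minor allele), the function names, and the reading of "duality frequency" as the fraction of sample pairs with identical tags are all assumptions made for illustration.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length genotype tags differ."""
    if len(a) != len(b):
        raise ValueError("tags must have the same length")
    return sum(x != y for x, y in zip(a, b))


def tag_panel_summary(tags):
    """Return (mean pairwise Hamming distance, fraction of identical pairs)."""
    distances = [hamming(a, b) for a, b in combinations(tags, 2)]
    mean_distance = sum(distances) / len(distances)
    # Assumed reading of "duality": two samples carrying the same tag.
    duplicate_fraction = sum(d == 0 for d in distances) / len(distances)
    return mean_distance, duplicate_fraction


# Hypothetical genotypes at 6 SNPs, coded as copies of the minor allele.
samples = ["012012", "012012", "210210", "000111"]
mean_distance, duplicate_fraction = tag_panel_summary(samples)
print(mean_distance, duplicate_fraction)
```

A panel is more useful for mix-up detection when the mean distance is large and the fraction of identical pairs is close to zero; the toy panel above deliberately contains one duplicated tag.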
Rational Protein Engineering Guided by Deep Mutational Scanning
Shin, HyeonSeok; Cho, Byung-Kwan
2015-01-01
Sequence–function relationship in a protein is commonly determined by the three-dimensional protein structure followed by various biochemical experiments. However, with the explosive increase in the number of genome sequences, facilitated by recent advances in sequencing technology, the gap between protein sequences available and three-dimensional structures is rapidly widening. A recently developed method termed deep mutational scanning explores the functional phenotype of thousands of mutants via massive sequencing. Coupled with a highly efficient screening system, this approach assesses the phenotypic changes made by the substitution of each amino acid sequence that constitutes a protein. Such an informational resource provides the functional role of each amino acid sequence, thereby providing sufficient rationale for selecting target residues for protein engineering. Here, we discuss the current applications of deep mutational scanning and consider experimental design. PMID:26404267
Gate sequence for continuous variable one-way quantum computation
Su, Xiaolong; Hao, Shuhong; Deng, Xiaowei; Ma, Lingyu; Wang, Meihong; Jia, Xiaojun; Xie, Changde; Peng, Kunchi
2013-01-01
Measurement-based one-way quantum computation using cluster states as resources provides an efficient model to perform computation and information processing of quantum codes. Arbitrary Gaussian quantum computation can be implemented sufficiently by long single-mode and two-mode gate sequences. However, continuous variable gate sequences have not been realized so far due to an absence of cluster states larger than four submodes. Here we present the first continuous variable gate sequence consisting of a single-mode squeezing gate and a two-mode controlled-phase gate based on a six-mode cluster state. The quantum property of this gate sequence is confirmed by the fidelities and the quantum entanglement of two output modes, which depend on both the squeezing and controlled-phase gates. The experiment demonstrates the feasibility of implementing Gaussian quantum computation by means of accessible gate sequences.
Roden, Suzanne E; Dutton, Peter H; Morin, Phillip A
2009-01-01
The green sea turtle, Chelonia mydas, was used as a case study for single nucleotide polymorphism (SNP) discovery in a species that has little genetic sequence information available. As green turtles have a complex population structure, additional nuclear markers other than microsatellites could add to our understanding of their complex life history. Amplified fragment length polymorphism technique was used to generate sets of random fragments of genomic DNA, which were then electrophoretically separated with precast gels, stained with SYBR green, excised, and directly sequenced. It was possible to perform this method without the use of polyacrylamide gels, radioactive or fluorescent labeled primers, or hybridization methods, reducing the time, expense, and safety hazards of SNP discovery. Within 13 loci, 2547 base pairs were screened, resulting in the discovery of 35 SNPs. Using this method, it was possible to yield a sufficient number of loci to screen for SNP markers without the availability of prior sequence information.
RSEQtools: a modular framework to analyze RNA-Seq data using compact, anonymized data summaries.
Habegger, Lukas; Sboner, Andrea; Gianoulis, Tara A; Rozowsky, Joel; Agarwal, Ashish; Snyder, Michael; Gerstein, Mark
2011-01-15
The advent of next-generation sequencing for functional genomics has given rise to quantities of sequence information that are often so large that they are difficult to handle. Moreover, sequence reads from a specific individual can contain sufficient information to potentially identify and genetically characterize that person, raising privacy concerns. In order to address these issues, we have developed the Mapped Read Format (MRF), a compact data summary format for both short and long read alignments that enables the anonymization of confidential sequence information, while allowing one to still carry out many functional genomics studies. We have developed a suite of tools (RSEQtools) that use this format for the analysis of RNA-Seq experiments. These tools consist of a set of modules that perform common tasks such as calculating gene expression values, generating signal tracks of mapped reads and segmenting that signal into actively transcribed regions. Moreover, the tools can readily be used to build customizable RNA-Seq workflows. In addition to the anonymization afforded by MRF, this format also facilitates the decoupling of the alignment of reads from downstream analyses. RSEQtools is implemented in C and the source code is available at http://rseqtools.gersteinlab.org/.
Makowsky, Robert; Cox, Christian L; Roelke, Corey; Chippindale, Paul T
2010-11-01
Determining the appropriate gene for phylogeny reconstruction can be a difficult process. Rapidly evolving genes tend to resolve recent relationships, but suffer from alignment issues and increased homoplasy among distantly related species. Conversely, slowly evolving genes generally perform best for deeper relationships, but lack sufficient variation to resolve recent relationships. We determine the relationship between sequence divergence and Bayesian phylogenetic reconstruction ability using both natural and simulated datasets. The natural data are based on 28 well-supported relationships within the subphylum Vertebrata. Sequences of 12 genes were acquired and Bayesian analyses were used to determine phylogenetic support for correct relationships. Simulated datasets were designed to determine whether an optimal range of sequence divergence exists across extreme phylogenetic conditions. Across all genes we found that an optimal range of divergence for resolving the correct relationships does exist, although this level of divergence expectedly depends on the distance metric. Simulated datasets show that an optimal range of sequence divergence exists across diverse topologies and models of evolution. We determine that a simple to measure property of genetic sequences (genetic distance) is related to phylogenic reconstruction ability in Bayesian analyses. This information should be useful for selecting the most informative gene to resolve any relationships, especially those that are difficult to resolve, as well as minimizing both cost and confounding information during project design. Copyright © 2010. Published by Elsevier Inc.
Automated use of mutagenesis data in structure prediction.
Nanda, Vikas; DeGrado, William F
2005-05-15
In the absence of experimental structural determination, numerous methods are available to indirectly predict or probe the structure of a target molecule. Genetic modification of a protein sequence is a powerful tool for identifying key residues involved in binding reactions or protein stability. Mutagenesis data is usually incorporated into the modeling process either through manual inspection of model compatibility with empirical data, or through the generation of geometric constraints linking sensitive residues to a binding interface. We present an approach derived from statistical studies of lattice models for introducing mutation information directly into the fitness score. The approach takes into account the phenotype of mutation (neutral or disruptive) and calculates the energy for a given structure over an ensemble of sequences. The structure prediction procedure searches for the optimal conformation where neutral sequences either have no impact or improve stability and disruptive sequences reduce stability relative to wild type. We examine three types of sequence ensembles: information from saturation mutagenesis, scanning mutagenesis, and homologous proteins. Incorporating multiple sequences into a statistical ensemble serves to energetically separate the native state and misfolded structures. As a result, the prediction of structure with a poor force field is sufficiently enhanced by mutational information to improve accuracy. Furthermore, by separating misfolded conformations from the target score, the ensemble energy serves to speed up conformational search algorithms such as Monte Carlo-based methods. Copyright 2005 Wiley-Liss, Inc.
Nakazato, Takeru; Bono, Hidemasa
2017-01-01
Abstract It is important for public data repositories to promote the reuse of archived data. In the growing field of omics science, however, the increasing number of submissions of high-throughput sequencing (HTSeq) data to public repositories prevents users from choosing a suitable data set from among the large number of search results. Repository users need to be able to set a threshold to reduce the number of results to obtain a suitable subset of high-quality data for reanalysis. We calculated the quality of sequencing data archived in a public data repository, the Sequence Read Archive (SRA), by using the quality control software FastQC. We obtained quality values for 1 171 313 experiments, which can be used to evaluate the suitability of data for reuse. We also visualized the data distribution in SRA by integrating the quality information and metadata of experiments and samples. We provide quality information of all of the archived sequencing data, which enable users to obtain sufficient quality sequencing data for reanalyses. The calculated quality data are available to the public in various formats. Our data also provide an example of enhancing the reuse of public data by adding metadata to published research data by a third party. PMID:28449062
Sanz, Yolanda
2017-01-01
Abstract The miniaturized and portable DNA sequencer MinION™ has demonstrated great potential in different analyses such as genome-wide sequencing, pathogen outbreak detection and surveillance, human genome variability, and microbial diversity. In this study, we tested the ability of the MinION™ platform to perform long amplicon sequencing in order to design new approaches to study microbial diversity using a multi-locus approach. After compiling a robust database by parsing and extracting the rrn bacterial region from more than 67000 complete or draft bacterial genomes, we demonstrated that the data obtained during sequencing of the long amplicon in the MinION™ device using R9 and R9.4 chemistries were sufficient to study 2 mock microbial communities in a multiplex manner and to almost completely reconstruct the microbial diversity contained in the HM782D and D6305 mock communities. Although nanopore-based sequencing produces reads with lower per-base accuracy compared with other platforms, we presented a novel approach consisting of multi-locus and long amplicon sequencing using the MinION™ MkIb DNA sequencer and R9 and R9.4 chemistries that help to overcome the main disadvantage of this portable sequencing platform. Furthermore, the nanopore sequencing library, constructed with the last releases of pore chemistry (R9.4) and sequencing kit (SQK-LSK108), permitted the retrieval of the higher level of 1D read accuracy sufficient to characterize the microbial species present in each mock community analysed. Improvements in nanopore chemistry, such as minimizing base-calling errors and new library protocols able to produce rapid 1D libraries, will provide more reliable information in the near future. Such data will be useful for more comprehensive and faster specific detection of microbial species and strains in complex ecosystems. PMID:28605506
Linear reduction method for predictive and informative tag SNP selection.
He, Jingwu; Westbrooks, Kelly; Zelikovsky, Alexander
2005-01-01
Constructing a complete human haplotype map is helpful when associating complex diseases with their related SNPs. Unfortunately, the number of SNPs is very large and it is costly to sequence many individuals. Therefore, it is desirable to reduce the number of SNPs that should be sequenced to a small number of informative representatives called tag SNPs. In this paper, we propose a new linear algebra-based method for selecting and using tag SNPs. We measure the quality of our tag SNP selection algorithm by comparing actual SNPs with SNPs predicted from selected linearly independent tag SNPs. Our experiments show that for sufficiently long haplotypes, knowing only 0.4% of all SNPs, the proposed linear reduction method predicts an unknown haplotype with an error rate below 2% based on 10% of the population.
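The idea of selecting linearly independent tag SNPs and predicting the remaining SNPs from them can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' algorithm: haplotypes are coded 0/1, tag SNPs are taken to be the pivot columns of the reduced row echelon form of the training matrix, and predicted values are rounded at 1/2. The function names `rref` and `predict` are hypothetical.

```python
from fractions import Fraction


def rref(matrix):
    """Reduced row echelon form over the rationals; returns (rows, pivot columns)."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    pivots = []
    r = 0
    for c in range(len(rows[0])):
        pivot_row = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot_row is None:
            continue  # column c depends on earlier columns: not a tag SNP
        rows[r], rows[pivot_row] = rows[pivot_row], rows[r]
        pivot = rows[r][c]
        rows[r] = [v / pivot for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    return rows, pivots


def predict(tag_alleles, reduced, pivots, n_snps):
    """Rebuild a full 0/1 haplotype from its alleles at the tag SNPs."""
    full = []
    for j in range(n_snps):
        # Column j of the training matrix equals sum_i reduced[i][j] * (tag column i).
        value = sum(reduced[i][j] * tag_alleles[i] for i in range(len(pivots)))
        full.append(1 if value >= Fraction(1, 2) else 0)
    return full


# Hypothetical training haplotypes (rows) over 4 SNPs (columns).
training = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 1, 1],
]
reduced, tags = rref(training)
print(tags)                                    # indices of the tag SNPs
print(predict([1, 0], reduced, tags, 4))       # haplotype rebuilt from 2 tag alleles
```

In this toy matrix SNPs 2 and 3 are copies of SNPs 0 and 1, so two tag SNPs suffice; real data are noisier, which is why the abstract reports an error rate rather than exact recovery.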
The fuzzy polynucleotide space: basic properties.
Torres, Angela; Nieto, Juan J
2003-03-22
Any triplet codon may be regarded as a 12-dimensional fuzzy code. Sufficient information about a particular sequence may not be available in certain situations. The investigator will be confronted with imprecise sequences, yet want to make comparisons of sequences. Fuzzy polynucleotides can be compared by using geometrical interpretation of fuzzy sets as points in a hypercube. We introduce the space of fuzzy polynucleotides and a means of measuring dissimilitudes between them. We establish mathematical principles to measure dissimilarities between fuzzy polynucleotides and present several examples in this metric space. We calculate the frequencies of the nucleotides at the three base sites of a codon in the coding sequences of Escherichia coli K-12 and Mycobacterium tuberculosis H37Rv, and consider them as points in that fuzzy space. We compute the distance between the genomes of E.coli and M.tuberculosis.
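The representation of a codon as a point in a 12-dimensional hypercube can be sketched as below. Two details are assumptions not stated in the abstract: the ordering of the four membership slots per site (U, C, A, G) and the specific distance, taken here as the sum of absolute differences divided by the sum of coordinate-wise maxima, a dissimilarity often used for fuzzy sets.

```python
BASES = "UCAG"  # assumed ordering of the four membership slots at each codon site


def crisp_codon(codon):
    """Encode an exactly known codon such as 'UAC' as a 12-dimensional 0/1 vector."""
    vector = []
    for base in codon:
        if base not in BASES:
            raise ValueError(f"unexpected base {base!r}")
        vector.extend(1.0 if base == b else 0.0 for b in BASES)
    return vector


def fuzzy_distance(p, q):
    """Assumed dissimilarity: sum |p_i - q_i| / sum max(p_i, q_i)."""
    numerator = sum(abs(a - b) for a, b in zip(p, q))
    denominator = sum(max(a, b) for a, b in zip(p, q))
    return numerator / denominator if denominator else 0.0


uac = crisp_codon("UAC")
uau = crisp_codon("UAU")
# Imprecise codon: third site is U or C with equal membership.
ua_fuzzy = crisp_codon("UA") + [0.5, 0.5, 0.0, 0.0]

print(fuzzy_distance(uac, uac))       # identical codons
print(fuzzy_distance(uac, uau))       # crisp codons differing at one site
print(fuzzy_distance(uac, ua_fuzzy))  # crisp codon versus imprecise codon
```

Base frequencies at each codon site of a genome fit the same encoding, since they are also three groups of four values in [0, 1].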
Bastien, Olivier; Maréchal, Eric
2008-08-07
Confidence in pairwise alignments of biological sequences, obtained by various methods such as Blast or Smith-Waterman, is critical for automatic analyses of genomic data. Two statistical models have been proposed. In the asymptotic limit of long sequences, the Karlin-Altschul model is based on the computation of a P-value, assuming that the number of high scoring matching regions above a threshold is Poisson distributed. Alternatively, the Lipman-Pearson model is based on the computation of a Z-value from a random score distribution obtained by a Monte-Carlo simulation. Z-values allow the deduction of an upper bound of the P-value (1/Z-value²) following the TULIP theorem. Simulated Z-value distributions are known to fit a Gumbel law. This remarkable property had not been demonstrated and had no obvious biological support. We built a model of evolution of sequences based on aging, as meant in Reliability Theory, using the fact that the amount of information shared between an initial sequence and the sequences in its lineage (i.e., mutual information in Information Theory) is a decreasing function of time. This quantity is simply measured by a sequence alignment score. In systems aging, the failure rate is related to the system's longevity. The system can be a machine with structured components, or a living entity or population. "Reliability" refers to the ability to operate properly according to a standard. Here, the "reliability" of a sequence refers to the ability to conserve a sufficient functional level at the folded and maturated protein level (positive selection pressure). Homologous sequences were considered as systems 1) having a high redundancy of information reflected by the magnitude of their alignment scores, 2) whose components are the amino acids that can independently be damaged by random DNA mutations.
From these assumptions, we deduced that information shared at each amino acid position evolved with a constant rate, corresponding to the information hazard rate, and that pairwise sequence alignment scores should follow a Gumbel distribution, whose parameters could be given a theoretical rationale. In particular, one parameter corresponds to the information hazard rate. Extreme value distribution of alignment scores, assessed from high-scoring segment pairs following the Karlin-Altschul model, can also be deduced from the Reliability Theory applied to molecular sequences. It reflects the redundancy of information between homologous sequences, under functional conservative pressure. This model also provides a link between concepts of biological sequence analysis and of systems biology.
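The Monte-Carlo Z-value and the 1/Z² bound mentioned in this abstract can be sketched in a few lines of Python. This is a minimal illustration only: the scoring scheme (match 2, mismatch -1, linear gap -2), the number of shuffles, and all function names are assumptions, not the authors' settings.

```python
import random
import statistics


def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


def z_value(a, b, n_shuffles=200, seed=0):
    """Z-value of the observed score against scores of shuffled copies of b."""
    rng = random.Random(seed)
    observed = sw_score(a, b)
    letters = list(b)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # preserves composition, destroys order
        null_scores.append(sw_score(a, "".join(letters)))
    mu = statistics.mean(null_scores)
    sigma = statistics.pstdev(null_scores)
    if sigma == 0:
        raise ValueError("shuffled scores have zero variance")
    return (observed - mu) / sigma


def p_value_upper_bound(z):
    """Upper bound 1/Z^2 on the P-value, capped at 1; trivial bound if Z <= 0."""
    return min(1.0, 1.0 / z ** 2) if z > 0 else 1.0


z = z_value("MKTAYIAKQRQISFVKSHFSRQ", "MKTAYIAKQRQISFVKSHFSRQ")
print(z, p_value_upper_bound(z))
```

Shuffling only one of the two sequences is one common convention; shuffling both is an equally reasonable variant.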
Flexible, fast and accurate sequence alignment profiling on GPGPU with PaSWAS.
Warris, Sven; Yalcin, Feyruz; Jackson, Katherine J L; Nap, Jan Peter
2015-01-01
To obtain large-scale sequence alignments in a fast and flexible way is an important step in the analyses of next generation sequencing data. Applications based on the Smith-Waterman (SW) algorithm are often either not fast enough, limited to dedicated tasks or not sufficiently accurate due to statistical issues. Current SW implementations that run on graphics hardware do not report the alignment details necessary for further analysis. With the Parallel SW Alignment Software (PaSWAS) it is possible (a) to have easy access to the computational power of NVIDIA-based general purpose graphics processing units (GPGPUs) to perform high-speed sequence alignments, and (b) retrieve relevant information such as score, number of gaps and mismatches. The software reports multiple hits per alignment. The added value of the new SW implementation is demonstrated with two test cases: (1) tag recovery in next generation sequence data and (2) isotype assignment within an immunoglobulin 454 sequence data set. Both cases show the usability and versatility of the new parallel Smith-Waterman implementation.
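To make concrete what "alignment details" beyond the score look like, here is a small CPU-only Python sketch of Smith-Waterman with a traceback that counts matches, mismatches and gaps. It is not the PaSWAS GPU implementation or its API; the function name, the scoring parameters and the single-best-hit traceback are assumptions chosen for brevity.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment returning score plus match/mismatch/gap counts."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the local alignment starts (score 0).
    i, j = best_pos
    matches = mismatches = gaps = 0
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            if a[i - 1] == b[j - 1]:
                matches += 1
            else:
                mismatches += 1
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            gaps += 1  # gap in b
            i -= 1
        else:
            gaps += 1  # gap in a
            j -= 1
    return {"score": best, "matches": matches,
            "mismatches": mismatches, "gaps": gaps}


print(smith_waterman("ACGTACGT", "ACGTTACGT"))
```

Reporting several hits per sequence pair, as the abstract describes, would require repeating the traceback from other high-scoring cells; that step is omitted here.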
Ma, Xin; Guo, Jing; Sun, Xiao
2015-01-01
The prediction of RNA-binding proteins is one of the most challenging problems in computational biology. Although some studies have investigated this problem, the accuracy of prediction is still not sufficient. In this study, a highly accurate method was developed to predict RNA-binding proteins from amino acid sequences using random forests with the minimum redundancy maximum relevance (mRMR) method, followed by incremental feature selection (IFS). We incorporated conjoint triad features and three novel features: binding propensity (BP), nonbinding propensity (NBP), and evolutionary information combined with physicochemical properties (EIPP). The results showed that these novel features have important roles in improving the performance of the predictor. Using the mRMR-IFS method, our predictor achieved the best performance (86.62% accuracy and 0.737 Matthews correlation coefficient). High prediction accuracy and successful prediction performance suggested that our method can be a useful approach to identify RNA-binding proteins from sequence information.
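For readers unfamiliar with the two reported metrics, the sketch below computes accuracy and the Matthews correlation coefficient from a binary confusion matrix. The counts in the usage line are hypothetical and are not taken from the study.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is taken as 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical counts, for illustration only.
print(accuracy(90, 80, 20, 10), matthews_corrcoef(90, 80, 20, 10))
```

Unlike accuracy, the MCC stays near zero for a predictor that ignores the minority class, which is why the two are usually reported together on unbalanced data.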
Protein 3D Structure Computed from Evolutionary Sequence Variation
Sheridan, Robert; Hopf, Thomas A.; Pagnani, Andrea; Zecchina, Riccardo; Sander, Chris
2011-01-01
The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7–4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org).
This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes. PMID:22163331
Hellmuth, Marc; Wieseke, Nicolas; Lechner, Marcus; Lenhof, Hans-Peter; Middendorf, Martin; Stadler, Peter F.
2015-01-01
Phylogenomics heavily relies on well-curated sequence data sets that comprise, for each gene, exclusively 1:1 orthologs. Paralogs are treated as a dangerous nuisance that has to be detected and removed. We show here that this severe restriction of the data sets is not necessary. Building upon recent advances in mathematical phylogenetics, we demonstrate that gene duplications convey meaningful phylogenetic information and allow the inference of plausible phylogenetic trees, provided orthologs and paralogs can be distinguished with a degree of certainty. Starting from tree-free estimates of orthology, cograph editing can sufficiently reduce the noise to find correct event-annotated gene trees. The information of gene trees can then directly be translated into constraints on the species trees. Although the resolution is very poor for individual gene families, we show that genome-wide data sets are sufficient to generate fully resolved phylogenetic trees, even in the presence of horizontal gene transfer. PMID:25646426
Ohta, Tazro; Nakazato, Takeru; Bono, Hidemasa
2017-06-01
It is important for public data repositories to promote the reuse of archived data. In the growing field of omics science, however, the increasing number of submissions of high-throughput sequencing (HTSeq) data to public repositories prevents users from choosing a suitable data set from among the large number of search results. Repository users need to be able to set a threshold to reduce the number of results to obtain a suitable subset of high-quality data for reanalysis. We calculated the quality of sequencing data archived in a public data repository, the Sequence Read Archive (SRA), by using the quality control software FastQC. We obtained quality values for 1 171 313 experiments, which can be used to evaluate the suitability of data for reuse. We also visualized the data distribution in SRA by integrating the quality information and metadata of experiments and samples. We provide quality information of all of the archived sequencing data, which enable users to obtain sufficient quality sequencing data for reanalyses. The calculated quality data are available to the public in various formats. Our data also provide an example of enhancing the reuse of public data by adding metadata to published research data by a third party. © The Authors 2017. Published by Oxford University Press.
Top-down analysis of protein samples by de novo sequencing techniques
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vyatkina, Kira; Wu, Si; Dekker, Lennard J. M.
MOTIVATION: Recent technological advances have made high-resolution mass spectrometers affordable to many laboratories, thus boosting rapid development of top-down mass spectrometry, and implying a need for efficient methods for analyzing this kind of data. RESULTS: We describe a method for analysis of protein samples from top-down tandem mass spectrometry data, which capitalizes on de novo sequencing of fragments of the proteins present in the sample. Our algorithm takes as input a set of de novo amino acid strings derived from the given mass spectra using the recently proposed Twister approach, and combines them into aggregated strings endowed with offsets. The former typically constitute accurate sequence fragments of sufficiently well-represented proteins from the sample being analyzed, while the latter indicate their location in the protein sequence, and also bear information on post-translational modifications and fragmentation patterns.
Protein Structure Determination using Metagenome sequence data
Ovchinnikov, Sergey; Park, Hahnbeom; Varghese, Neha; Huang, Po-Ssu; Pavlopoulos, Georgios A.; Kim, David E.; Kamisetty, Hetunandan; Kyrpides, Nikos C.; Baker, David
2017-01-01
Despite decades of work by structural biologists, there are still ~5200 protein families with unknown structure outside the range of comparative modeling. We show that Rosetta structure prediction guided by residue-residue contacts inferred from evolutionary information can accurately model proteins that belong to large families, and that metagenome sequence data more than triples the number of protein families with sufficient sequences for accurate modeling. We then integrate metagenome data, contact based structure matching and Rosetta structure calculations to generate models for 614 protein families with currently unknown structures; 206 are membrane proteins and 137 have folds not represented in the PDB. This approach provides the representative models for large protein families originally envisioned as the goal of the protein structure initiative at a fraction of the cost. PMID:28104891
1982-12-01
Sequence dj Estimate of the Desired Signal DEL Sampling Time Interval DS Direct Sequence c Sufficient Statistic E/T Signal Power Erfc Complimentary Error...Namely, a white Gaussian noise (WGN) generator was added. Also, a statistical subroutine was added in order to assess performance improvement at the...reference code and then passed through a correlation detector whose output is the sufficient 1 statistic , e . Using a threshold device and the sufficient
Similarity-based gene detection: using COGs to find evolutionarily-conserved ORFs.
Powell, Bradford C; Hutchison, Clyde A
2006-01-19
Experimental verification of gene products has not kept pace with the rapid growth of microbial sequence information. However, existing annotations of gene locations contain sufficient information to screen for probable errors. Furthermore, comparisons among genomes become more informative as more genomes are examined. We studied all open reading frames (ORFs) of at least 30 codons from the genomes of 27 sequenced bacterial strains. We grouped the potential peptide sequences encoded from the ORFs by forming Clusters of Orthologous Groups (COGs). We used this grouping in order to find homologous relationships that would not be distinguishable from noise when using simple BLAST searches. Although COG analysis was initially developed to group annotated genes, we applied it to the task of grouping anonymous DNA sequences that may encode proteins. "Mixed COGs" of ORFs (clusters in which some sequences correspond to annotated genes and some do not) are attractive targets when seeking errors of gene prediction. Examination of mixed COGs reveals some situations in which genes appear to have been missed in current annotations and a smaller number of regions that appear to have been annotated as gene loci erroneously. This technique can also be used to detect potential pseudogenes or sequencing errors. Our method uses an adjustable parameter for degree of conservation among the studied genomes (stringency). We detail results for one level of stringency at which we found 83 potential genes which had not previously been identified, 60 potential pseudogenes, and 7 sequences with existing gene annotations that are probably incorrect. Systematic study of sequence conservation offers a way to improve existing annotations by identifying potentially homologous regions where the annotation of the presence or absence of a gene is inconsistent among genomes.
Similarity-based gene detection: using COGs to find evolutionarily-conserved ORFs
Powell, Bradford C; Hutchison, Clyde A
2006-01-01
Background Experimental verification of gene products has not kept pace with the rapid growth of microbial sequence information. However, existing annotations of gene locations contain sufficient information to screen for probable errors. Furthermore, comparisons among genomes become more informative as more genomes are examined. We studied all open reading frames (ORFs) of at least 30 codons from the genomes of 27 sequenced bacterial strains. We grouped the potential peptide sequences encoded from the ORFs by forming Clusters of Orthologous Groups (COGs). We used this grouping in order to find homologous relationships that would not be distinguishable from noise when using simple BLAST searches. Although COG analysis was initially developed to group annotated genes, we applied it to the task of grouping anonymous DNA sequences that may encode proteins. Results "Mixed COGs" of ORFs (clusters in which some sequences correspond to annotated genes and some do not) are attractive targets when seeking errors of gene prediction. Examination of mixed COGs reveals some situations in which genes appear to have been missed in current annotations and a smaller number of regions that appear to have been annotated as gene loci erroneously. This technique can also be used to detect potential pseudogenes or sequencing errors. Our method uses an adjustable parameter for degree of conservation among the studied genomes (stringency). We detail results for one level of stringency at which we found 83 potential genes which had not previously been identified, 60 potential pseudogenes, and 7 sequences with existing gene annotations that are probably incorrect. Conclusion Systematic study of sequence conservation offers a way to improve existing annotations by identifying potentially homologous regions where the annotation of the presence or absence of a gene is inconsistent among genomes. PMID:16423288
Sasaki, Katsutomo; Mitsuda, Nobutaka; Nashima, Kenji; Kishimoto, Kyutaro; Katayose, Yuichi; Kanamori, Hiroyuki; Ohmiya, Akemi
2017-09-04
Chrysanthemum morifolium is one of the most economically valuable ornamental plants worldwide. Chrysanthemum is an allohexaploid plant with a large genome that is commercially propagated by vegetative reproduction. New cultivars with different floral traits, such as color, morphology, and scent, have been generated mainly by classical cross-breeding and mutation breeding. However, only limited genetic resources and their genome information are available for the generation of new floral traits. To obtain useful information about molecular bases for floral traits of chrysanthemums, we read expressed sequence tags (ESTs) of chrysanthemums by high-throughput sequencing using the 454 pyrosequencing technology. We constructed normalized cDNA libraries, consisting of full-length, 3'-UTR, and 5'-UTR cDNAs derived from various tissues of chrysanthemums. These libraries produced a total number of 3,772,677 high-quality reads, which were assembled into 213,204 contigs. By comparing the data obtained with those of full genome-sequenced species, we confirmed that our chrysanthemum contig set contained the majority of all expressed genes, which was sufficient for further molecular analysis in chrysanthemums. We confirmed that our chrysanthemum EST set (contigs) contained a number of contigs that encoded transcription factors and enzymes involved in pigment and aroma compound metabolism that was comparable to that of other species. This information can serve as an informative resource for identifying genes involved in various biological processes in chrysanthemums. Moreover, the findings of our study will contribute to a better understanding of the floral characteristics of chrysanthemums including the myriad cultivars at the molecular level.
Impact of sequencing depth in ChIP-seq experiments
Jung, Youngsook L.; Luquette, Lovelace J.; Ho, Joshua W.K.; Ferrari, Francesco; Tolstorukov, Michael; Minoda, Aki; Issner, Robbyn; Epstein, Charles B.; Karpen, Gary H.; Kuroda, Mitzi I.; Park, Peter J.
2014-01-01
In a chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) experiment, an important consideration in experimental design is the minimum number of sequenced reads required to obtain statistically significant results. We present an extensive evaluation of the impact of sequencing depth on identification of enriched regions for key histone modifications (H3K4me3, H3K36me3, H3K27me3 and H3K9me2/me3) using deep-sequenced datasets in human and fly. We propose to define sufficient sequencing depth as the number of reads at which detected enrichment regions increase <1% for an additional million reads. Although the required depth depends on the nature of the mark and the state of the cell in each experiment, we observe that sufficient depth is often reached at <20 million reads for fly. For human, there are no clear saturation points for the examined datasets, but our analysis suggests 40–50 million reads as a practical minimum for most marks. We also devise a mathematical model to estimate the sufficient depth and total genomic coverage of a mark. Lastly, we find that the five algorithms tested do not agree well for broad enrichment profiles, especially at lower depths. Our findings suggest that sufficient sequencing depth and an appropriate peak-calling algorithm are essential for ensuring robustness of conclusions derived from ChIP-seq data. PMID:24598259
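The saturation criterion proposed in this abstract (fewer than 1% more enriched regions per additional million reads) can be applied to a subsampling curve as in the following Python sketch. The function name, the input layout and the example numbers are hypothetical; where subsampling steps are larger than one million reads, the gain is simply averaged per million, which is an approximation introduced here.

```python
def sufficient_depth(curve, threshold=0.01):
    """Return the first depth (in millions of reads) at which the relative
    gain in detected regions per additional million reads drops below
    `threshold`, or None if the curve never saturates.

    `curve` is a list of (reads_in_millions, n_regions) pairs sorted by depth.
    """
    for (d0, r0), (d1, r1) in zip(curve, curve[1:]):
        if r0 <= 0 or d1 <= d0:
            continue  # skip degenerate points
        gain_per_million = (r1 - r0) / r0 / (d1 - d0)
        if gain_per_million < threshold:
            return d0
    return None


# Hypothetical subsampling results: (million reads, enriched regions found).
saturating = [(5, 1000), (10, 1500), (15, 1700), (20, 1750), (25, 1760)]
growing = [(10, 1000), (20, 1500), (30, 2200), (40, 3200)]
print(sufficient_depth(saturating))  # depth where gains fall under 1%/M reads
print(sufficient_depth(growing))     # None: no saturation point in the data
```

A `None` result corresponds to the situation the abstract reports for the human datasets, where no clear saturation point was observed.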
Deblurring sequential ocular images from multi-spectral imaging (MSI) via mutual information.
Lian, Jian; Zheng, Yuanjie; Jiao, Wanzhen; Yan, Fang; Zhao, Bojun
2018-06-01
Multi-spectral imaging (MSI) produces a sequence of spectral images to capture the inner structure of different species, which was recently introduced into ocular disease diagnosis. However, the quality of MSI images can be significantly degraded by motion blur caused by the inevitable saccades and exposure time required for maintaining a sufficiently high signal-to-noise ratio. This degradation may confuse an ophthalmologist, reduce the examination quality, or defeat various image analysis algorithms. We present an early work specifically on deblurring sequential MSI images, which is distinguished from many of the current image deblurring techniques by resolving the blur kernel simultaneously for all the images in an MSI sequence. It is accomplished by incorporating several a priori constraints including the sharpness of the latent clear image, the spatial and temporal smoothness of the blur kernel and the similarity between temporally-neighboring images in an MSI sequence. Specifically, we model the similarity between MSI images with mutual information considering the different wavelengths used for capturing different images in an MSI sequence. The optimization of the proposed approach is based on a multi-scale framework and stepwise optimization strategy. Experimental results from 22 MSI sequences validate that our approach outperforms several state-of-the-art techniques in natural image deblurring.
All-atom 3D structure prediction of transmembrane β-barrel proteins from sequences.
Hayat, Sikander; Sander, Chris; Marks, Debora S; Elofsson, Arne
2015-04-28
Transmembrane β-barrels (TMBs) carry out major functions in substrate transport and protein biogenesis but experimental determination of their 3D structure is challenging. Encouraged by successful de novo 3D structure prediction of globular and α-helical membrane proteins from sequence alignments alone, we developed an approach to predict the 3D structure of TMBs. The approach combines the maximum-entropy evolutionary coupling method for predicting residue contacts (EVfold) with a machine-learning approach (boctopus2) for predicting β-strands in the barrel. In a blinded test for 19 TMB proteins of known structure that have a sufficient number of diverse homologous sequences available, this combined method (EVfold_bb) predicts hydrogen-bonded residue pairs between adjacent β-strands at an accuracy of ∼70%. This accuracy is sufficient for the generation of all-atom 3D models. In the transmembrane barrel region, the average 3D structure accuracy [template-modeling (TM) score] of top-ranked models is 0.54 (ranging from 0.36 to 0.85), with a higher (44%) number of residue pairs in correct strand-strand registration than in earlier methods (18%). Although the nonbarrel regions are predicted less accurately overall, the evolutionary couplings identify some highly constrained loop residues and, for FecA protein, the barrel including the structure of a plug domain can be accurately modeled (TM score = 0.68). Lower prediction accuracy tends to be associated with insufficient sequence information and we therefore expect increasing numbers of β-barrel families to become accessible to accurate 3D structure prediction as the number of available sequences increases.
Jones, David T; Kandathil, Shaun M
2018-04-26
In addition to substitution frequency data from protein sequence alignments, many state-of-the-art methods for contact prediction rely on additional sources of information, or features, of protein sequences in order to predict residue-residue contacts, such as solvent accessibility, predicted secondary structure, and scores from other contact prediction methods. It is unclear how much of this information is needed to achieve state-of-the-art results. Here, we show that using deep neural network models, simple alignment statistics contain sufficient information to achieve state-of-the-art precision. Our prediction method, DeepCov, uses fully convolutional neural networks operating on amino-acid pair frequency or covariance data derived directly from sequence alignments, without using global statistical methods such as sparse inverse covariance or pseudolikelihood estimation. Comparisons against CCMpred and MetaPSICOV2 show that using pairwise covariance data calculated from raw alignments as input allows us to match or exceed the performance of both of these methods. Almost all of the achieved precision is obtained when considering relatively local windows (around 15 residues) around any member of a given residue pairing; larger window sizes have comparable performance. Assessment on a set of shallow sequence alignments (fewer than 160 effective sequences) indicates that the new method is substantially more precise than CCMpred and MetaPSICOV2 in this regime, suggesting that improved precision is attainable on smaller sequence families. Overall, the performance of DeepCov is competitive with the state of the art, and our results demonstrate that global models, which employ features from all parts of the input alignment when predicting individual contacts, are not strictly needed in order to attain precise contact predictions. DeepCov is freely available at https://github.com/psipred/DeepCov. d.t.jones@ucl.ac.uk.
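The kind of input described here, amino-acid pair frequencies and covariances computed directly from an alignment, can be illustrated with a short Python sketch for a single pair of columns. This is not DeepCov's code: the function name is hypothetical, and the sketch omits sequence weighting, gap handling and pseudocounts, all of which are assumptions of simplicity on my part.

```python
from collections import Counter


def pair_covariance(alignment, i, j):
    """Covariance of residue identities at columns i and j of an alignment.

    For each residue pair (a, b) returns f_ij(a, b) - f_i(a) * f_j(b),
    where f are unweighted frequencies over the aligned sequences.
    """
    n = len(alignment)
    if n == 0:
        raise ValueError("alignment is empty")
    fi = Counter(seq[i] for seq in alignment)
    fj = Counter(seq[j] for seq in alignment)
    fij = Counter((seq[i], seq[j]) for seq in alignment)
    cov = {}
    for a in fi:
        for b in fj:
            cov[(a, b)] = fij[(a, b)] / n - (fi[a] / n) * (fj[b] / n)
    return cov


# Toy alignments: columns 0 and 1 co-vary in the first, not in the second.
coupled = ["AC", "AC", "GT", "GT"]
independent = ["AC", "AT", "GC", "GT"]
print(pair_covariance(coupled, 0, 1))
print(pair_covariance(independent, 0, 1))
```

Repeating this for every column pair and every residue pair yields the per-position-pair feature maps that a convolutional network could take as input channels.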
Mumps virus F gene and HN gene sequencing as a molecular tool to study mumps virus transmission.
Gouma, Sigrid; Cremer, Jeroen; Parkkali, Saara; Veldhuijzen, Irene; van Binnendijk, Rob S; Koopmans, Marion P G
2016-11-01
Various mumps outbreaks have occurred in the Netherlands since 2004, particularly among persons who had received 2 doses of measles, mumps, and rubella (MMR) vaccination. Genomic typing of pathogens can be used to track outbreaks, but the established genotyping of mumps virus based on the small hydrophobic (SH) gene sequences did not provide sufficient resolution. Therefore, we expanded the sequencing to include fusion (F) gene and haemagglutinin-neuraminidase (HN) gene sequences in addition to the SH gene sequences from 109 mumps virus genotype G strains obtained between 2004 and mid 2015 in the Netherlands. When the molecular information from these 3 genes was combined, we were able to identify separate mumps virus clusters and track mumps virus transmission. The analyses suggested that multiple mumps virus introductions occurred in the Netherlands between 2004 and 2015 resulting in several mumps outbreaks throughout this period, whereas during some local outbreaks the molecular data pointed towards endemic circulation. Combined analysis of epidemiological data and sequence data collected in 2015 showed good support for the phylogenetic clustering. Copyright © 2016 Elsevier B.V. All rights reserved.
Automatic detection of pelvic lymph nodes using multiple MR sequences
NASA Astrophysics Data System (ADS)
Yan, Michelle; Lu, Yue; Lu, Renzhi; Requardt, Martin; Moeller, Thomas; Takahashi, Satoru; Barentsz, Jelle
2007-03-01
A system for automatic detection of pelvic lymph nodes is developed by incorporating complementary information extracted from multiple MR sequences. A single MR sequence lacks sufficient diagnostic information for lymph node localization and staging. Correct diagnosis often requires input from multiple complementary sequences which makes manual detection of lymph nodes very labor intensive. Small lymph nodes are often missed even by highly-trained radiologists. The proposed system is aimed at assisting radiologists in finding lymph nodes faster and more accurately. To the best of our knowledge, this is the first such system reported in the literature. A 3-dimensional (3D) MR angiography (MRA) image is employed for extracting blood vessels that serve as a guide in searching for pelvic lymph nodes. Segmentation, shape and location analysis of potential lymph nodes are then performed using a high resolution 3D T1-weighted VIBE (T1-vibe) MR sequence acquired on a Siemens 3T scanner. An optional contrast-agent enhanced MR image, such as post ferumoxtran-10 T2*-weighted MEDIC sequence, can also be incorporated to further improve detection accuracy of malignant nodes. The system outputs a list of potential lymph node locations that are overlaid onto the corresponding MR sequences and presents them to users with associated confidence levels as well as their sizes and lengths in each axis. Preliminary studies demonstrate the feasibility of automatic lymph node detection and scenarios in which this system may be used to assist radiologists in diagnosis and reporting.
Application of next generation sequencing in clinical microbiology and infection prevention.
Deurenberg, Ruud H; Bathoorn, Erik; Chlebowicz, Monika A; Couto, Natacha; Ferdous, Mithila; García-Cobos, Silvia; Kooistra-Smid, Anna M D; Raangs, Erwin C; Rosema, Sigrid; Veloo, Alida C M; Zhou, Kai; Friedrich, Alexander W; Rossen, John W A
2017-02-10
Current molecular diagnostics of human pathogens provide limited information that is often not sufficient for outbreak and transmission investigation. Next generation sequencing (NGS) determines the DNA sequence of a complete bacterial genome in a single sequence run, and from these data, information on resistance and virulence, as well as information for typing is obtained, useful for outbreak investigation. The obtained genome data can be further used for the development of an outbreak-specific screening test. In this review, a general introduction to NGS is presented, including the library preparation and the major characteristics of the most common NGS platforms, such as the MiSeq (Illumina) and the Ion PGM™ (ThermoFisher). An overview of the software used for NGS data analyses used at the medical microbiology diagnostic laboratory in the University Medical Center Groningen in The Netherlands is given. Furthermore, applications of NGS in the clinical setting are described, such as outbreak management, molecular case finding, characterization and surveillance of pathogens, rapid identification of bacteria using the 16S-23S rRNA region, taxonomy, metagenomics approaches on clinical samples, and the determination of the transmission of zoonotic micro-organisms from animals to humans. Finally, we share our vision on the use of NGS in personalised microbiology in the near future, pointing out specific requirements. Copyright © 2016 The Author(s). Published by Elsevier B.V. All rights reserved.
Deurenberg, Ruud H; Bathoorn, Erik; Chlebowicz, Monika A; Couto, Natacha; Ferdous, Mithila; García-Cobos, Silvia; Kooistra-Smid, Anna M D; Raangs, Erwin C; Rosema, Sigrid; Veloo, Alida C M; Zhou, Kai; Friedrich, Alexander W; Rossen, John W A
2017-05-20
Current molecular diagnostics of human pathogens provide limited information that is often not sufficient for outbreak and transmission investigation. Next generation sequencing (NGS) determines the DNA sequence of a complete bacterial genome in a single sequence run, and from these data, information on resistance and virulence, as well as information for typing is obtained, useful for outbreak investigation. The obtained genome data can be further used for the development of an outbreak-specific screening test. In this review, a general introduction to NGS is presented, including the library preparation and the major characteristics of the most common NGS platforms, such as the MiSeq (Illumina) and the Ion PGM™ (ThermoFisher). An overview of the software used for NGS data analyses used at the medical microbiology diagnostic laboratory in the University Medical Center Groningen in The Netherlands is given. Furthermore, applications of NGS in the clinical setting are described, such as outbreak management, molecular case finding, characterization and surveillance of pathogens, rapid identification of bacteria using the 16S-23S rRNA region, taxonomy, metagenomics approaches on clinical samples, and the determination of the transmission of zoonotic micro-organisms from animals to humans. Finally, we share our vision on the use of NGS in personalised microbiology in the near future, pointing out specific requirements. Copyright © 2017. Published by Elsevier B.V.
2012-01-01
Background: The detection of conserved residue clusters on a protein structure is one of the effective strategies for the prediction of functional protein regions. Various methods, such as Evolutionary Trace, have been developed based on this strategy. In such approaches, the conserved residues are identified through comparisons of homologous amino acid sequences. Therefore, the selection of homologous sequences is a critical step. It is empirically known that a certain degree of sequence divergence in the set of homologous sequences is required for the identification of conserved residues. However, the development of a method to select homologous sequences appropriate for the identification of conserved residues has not been sufficiently addressed. An objective and general method to select appropriate homologous sequences is desired for the efficient prediction of functional regions. Results: We have developed a novel index to select the sequences appropriate for the identification of conserved residues, and implemented the index within our method to predict the functional regions of a protein. The implementation of the index improved the performance of the functional region prediction. The index represents the degree of conserved residue clustering on the tertiary structure of the protein. For this purpose, the structure and sequence information were integrated within the index by the application of spatial statistics. Spatial statistics is a field of statistics in which not only the attributes but also the geometrical coordinates of the data are considered simultaneously. Higher degrees of clustering generate larger index scores. We adopted the set of homologous sequences with the highest index score, under the assumption that the best prediction accuracy is obtained when the degree of clustering is the maximum. The set of sequences selected by the index led to higher functional region prediction performance than the sets of sequences selected by other sequence-based methods. Conclusions: Appropriate homologous sequences are selected automatically and objectively by the index. Such sequence selection improved the performance of functional region prediction. As far as we know, this is the first approach in which spatial statistics have been applied to protein analyses. Such integration of structure and sequence information would be useful for other bioinformatics problems. PMID:22643026
Sequence co-evolution gives 3D contacts and structures of protein complexes
Hopf, Thomas A; Schärfe, Charlotta P I; Rodrigues, João P G L M; Green, Anna G; Kohlbacher, Oliver; Sander, Chris; Bonvin, Alexandre M J J; Marks, Debora S
2014-01-01
Protein–protein interactions are fundamental to many biological processes. Experimental screens have identified tens of thousands of interactions, and structural biology has provided detailed functional insight for select 3D protein complexes. An alternative rich source of information about protein interactions is the evolutionary sequence record. Building on earlier work, we show that analysis of correlated evolutionary sequence changes across proteins identifies residues that are close in space with sufficient accuracy to determine the three-dimensional structure of the protein complexes. We evaluate prediction performance in blinded tests on 76 complexes of known 3D structure, predict protein–protein contacts in 32 complexes of unknown structure, and demonstrate how evolutionary couplings can be used to distinguish between interacting and non-interacting protein pairs in a large complex. With the current growth of sequences, we expect that the method can be generalized to genome-wide elucidation of protein–protein interaction networks and used for interaction predictions at residue resolution. DOI: http://dx.doi.org/10.7554/eLife.03430.001 PMID:25255213
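The idea of "correlated evolutionary sequence changes" can be illustrated with a much simpler statistic than the one used in the paper. The abstract does not spell out the scoring method, so the sketch below is a stand-in: it computes the mutual information between one alignment column from protein A and one from protein B, assuming the two alignments have already been paired row by row (same species in the same row). The function name and the toy columns are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two paired alignment columns.

    col_a and col_b are equal-length sequences of residue symbols; position i
    in both columns must come from the same organism.
    """
    if len(col_a) != len(col_b) or len(col_a) == 0:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    total = 0.0
    for (x, y), c in count_ab.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        total += (c / n) * log2(c * n / (count_a[x] * count_b[y]))
    return total


# Toy example: residues that always change together carry 1 bit of information,
# residues that vary independently carry none.
coupled = mutual_information("AAVV", "DDKK")
independent = mutual_information("AAVV", "DKDK")
```

High-scoring column pairs would be candidate inter-protein contacts. Raw mutual information is known to be confounded by phylogeny and indirect correlations, which is one reason global statistical models are preferred in practice.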
Chang, D D; Clayton, D A
1986-01-01
Transcription of the heavy strand of mouse mitochondrial DNA starts from two closely spaced, distinct sites located in the displacement loop region of the genome. We report here an analysis of regulatory sequences required for faithful transcription from these two sites. Data obtained from in vitro assays demonstrated that a 51-base-pair region, encompassing nucleotides -40 to +11 of the downstream start site, contains sufficient information for accurate transcription from both start sites. Deletion of the 3' flanking sequences, including one or both start sites to -17, resulted in the initiation of transcription by the mitochondrial RNA polymerase from alternative sites within vector DNA sequences. This feature places the mouse heavy-strand promoter uniquely among other known mitochondrial promoters, all of which absolutely require cognate start sites for transcription. Comparison of the heavy-strand promoter with those of other vertebrate mitochondrial DNAs revealed a remarkably high rate of sequence divergence among species. PMID:3785226
A novel, privacy-preserving cryptographic approach for sharing sequencing data
Cassa, Christopher A; Miller, Rachel A; Mandl, Kenneth D
2013-01-01
Objective: DNA samples are often processed and sequenced in facilities external to the point of collection. These samples are routinely labeled with patient identifiers or pseudonyms, allowing for potential linkage to identity and private clinical information if intercepted during transmission. We present a cryptographic scheme to securely transmit externally generated sequence data which does not require any patient identifiers, public key infrastructure, or the transmission of passwords. Materials and methods: This novel encryption scheme cryptographically protects participant sequence data using a shared secret key that is derived from a unique subset of an individual’s genetic sequence. This scheme requires access to a subset of an individual’s genetic sequence to acquire full access to the transmitted sequence data, which helps to prevent sample mismatch. Results: We validate that the proposed encryption scheme is robust to sequencing errors, population uniqueness, and sibling disambiguation, and provides sufficient cryptographic key space. Discussion: Access to a set of an individual’s genotypes and a mutually agreed cryptographic seed is needed to unlock the full sequence, which provides additional sample authentication and authorization security. We present modest fixed and marginal costs to implement this transmission architecture. Conclusions: It is possible for genomics researchers who sequence participant samples externally to protect the transmission of sequence data using unique features of an individual’s genetic sequence. PMID:23125421
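The core idea, a shared key derived from a subset of genotypes plus an agreed seed, can be sketched with a standard key-derivation function. This is not the authors' scheme: the marker identifiers, the serialisation format, and the choice of PBKDF2 are all assumptions made for illustration, and unlike the published scheme this sketch has no tolerance for sequencing errors (a single differing genotype yields a different key).

```python
import hashlib


def derive_key(genotypes, seed, iterations=100_000):
    """Derive a 32-byte key from genotypes and a mutually agreed seed.

    genotypes: dict mapping a (hypothetical) marker id to a genotype string,
               e.g. {"rs1": "AG"}.
    seed:      bytes agreed between the two parties, used as the salt.
    """
    # Canonical form: markers sorted by id, alleles sorted within each
    # genotype, so that "AG" and "GA" give the same key.
    canonical = ";".join(
        f"{marker}={''.join(sorted(alleles))}"
        for marker, alleles in sorted(genotypes.items())
    )
    return hashlib.pbkdf2_hmac(
        "sha256", canonical.encode("utf-8"), seed, iterations
    )


# Both parties hold the same genotype subset and seed, so both can compute
# the key independently; no password or patient identifier is transmitted.
key = derive_key({"rs1": "AG", "rs2": "CC"}, b"agreed-seed", iterations=1000)
```

The derived bytes could then be used as the key of an authenticated symmetric cipher. Using many markers matters because the key space is only as large as the genotype subset is unpredictable.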
King, Timothy L.; Eackles, Michael S.; Reshetnikov, Andrey N.
2015-01-01
Human-mediated translocations and subsequent large-scale colonization by the invasive fish rotan (Perccottus glenii Dybowski, 1877; Perciformes, Odontobutidae), also known as Amur or Chinese sleeper, have resulted in dramatic transformations of small lentic ecosystems. However, no detailed genetic information exists on population structure, levels of effective movement, or relatedness among geographic populations of P. glenii within the European part of the range. We used massively parallel genomic DNA shotgun sequencing on the semiconductor-based Ion Torrent Personal Genome Machine (PGM) sequencing platform to identify nuclear microsatellite and mitochondrial DNA sequences in P. glenii from European Russia. Here we describe the characterization of nine nuclear microsatellite loci, ascertain levels of allelic diversity, heterozygosity, and demographic status of P. glenii collected from Ilev, Russia, one of several initial introduction points in European Russia. In addition, we mapped sequence reads to the complete P. glenii mitochondrial DNA sequence to identify polymorphic regions. Nuclear microsatellite markers developed for P. glenii yielded sufficient genetic diversity to: (1) produce unique multilocus genotypes; (2) elucidate structure among geographic populations; and (3) provide unique perspectives for analysis of population sizes and historical demographics. Among 4.9 million filtered P. glenii Ion Torrent PGM sequence reads, 11,304 mapped to the mitochondrial genome (NC_020350). This resulted in 100% coverage of this genome to a mean coverage depth of 102X. A total of 130 variable sites were observed between the publicly available genome from China and the studied composite mitochondrial genome. Among these, 82 were diagnostic and monomorphic between the mitochondrial genomes and distributed among 15 genome regions. The polymorphic sites (N = 48) were distributed among 11 mitochondrial genome regions. Our results also indicate that sequence reads generated from two three-hour runs on the Ion Torrent PGM can generate a sufficient number of nuclear and mitochondrial markers to improve understanding of the evolutionary and ecological dynamics of non-model and, in particular, invasive species.
Lincoln, A; Sorock, G; Courtney, T; Wellman, H; Smith, G; Amoroso, P
2004-01-01
Objective: To determine whether narrative text in safety reports contains sufficient information regarding contributing factors and precipitating mechanisms to prioritize occupational back injury prevention strategies. Design, setting, subjects, and main outcome measures: Nine essential data elements were identified in narratives and coded sections of safety reports for each of 94 cases of back injuries to United States Army truck drivers reported to the United States Army Safety Center between 1987 and 1997. The essential elements of each case were used to reconstruct standardized event sequences. A taxonomy of the event sequences was then developed to identify common hazard scenarios and opportunities for primary interventions. Results: Coded data typically only identified five data elements (broad activity, task, event/exposure, nature of injury, and outcomes) while narratives provided additional elements (contributing factor, precipitating mechanism, primary source) essential for developing our taxonomy. Three hazard scenarios were associated with back injuries among Army truck drivers accounting for 83% of cases: struck by/against events during motor vehicle crashes; falls resulting from slips/trips or loss of balance; and overexertion from lifting activities. Conclusions: Coded data from safety investigations lacked sufficient information to thoroughly characterize the injury event. However, the combination of existing narrative text (similar to that collected by many injury surveillance systems) and coded data enabled us to develop a more complete taxonomy of injury event characteristics and identify common hazard scenarios. This study demonstrates that narrative text can provide the additional information on contributing factors and precipitating mechanisms needed to target prevention strategies. PMID:15314055
Chandler, Natalie; Best, Sunayna; Hayward, Jane; Faravelli, Francesca; Mansour, Sahar; Kivuva, Emma; Tapon, Dagmar; Male, Alison; DeVile, Catherine; Chitty, Lyn S
2018-03-29
Purpose: Unexpected fetal abnormalities occur in 2-5% of pregnancies. While traditional cytogenetic and microarray approaches achieve diagnosis in around 40% of cases, lack of diagnosis in others impedes parental counseling, informed decision making, and pregnancy management. Postnatally exome sequencing yields high diagnostic rates, but relies on careful phenotyping to interpret genotype results. Here we used a multidisciplinary approach to explore the utility of rapid fetal exome sequencing for prenatal diagnosis using skeletal dysplasias as an exemplar. Methods: Parents in pregnancies undergoing invasive testing because of sonographic fetal abnormalities, where multidisciplinary review considered skeletal dysplasia a likely etiology, were consented for exome trio sequencing (both parents and fetus). Variant interpretation focused on a virtual panel of 240 genes known to cause skeletal dysplasias. Results: Definitive molecular diagnosis was made in 13/16 (81%) cases. In some cases, fetal ultrasound findings alone were of sufficient severity for parents to opt for termination. In others, molecular diagnosis informed accurate prediction of outcome, improved parental counseling, and enabled parents to terminate or continue the pregnancy with certainty. Conclusion: Trio sequencing with expert multidisciplinary review for case selection and data interpretation yields timely, high diagnostic rates in fetuses presenting with unexpected skeletal abnormalities. This improves parental counseling and pregnancy management. Genetics in Medicine advance online publication, 29 March 2018; doi:10.1038/gim.2018.30.
First Pass Annotation of Promoters on Human Chromosome 22
Scherf, Matthias; Klingenhoff, Andreas; Frech, Kornelie; Quandt, Kerstin; Schneider, Ralf; Grote, Korbinian; Frisch, Matthias; Gailus-Durner, Valérie; Seidel, Alexander; Brack-Werner, Ruth; Werner, Thomas
2001-01-01
The publication of the first almost complete sequence of a human chromosome (chromosome 22) is a major milestone in human genomics. Together with the sequence, an excellent annotation of genes was published which certainly will serve as an information resource for numerous future projects. We noted that the annotation did not cover regulatory regions; in particular, no promoter annotation has been provided. Here we present an analysis of the complete published chromosome 22 sequence for promoters. A recent breakthrough in specific in silico prediction of promoter regions enabled us to attempt large-scale prediction of promoter regions on chromosome 22. Scanning of sequence databases revealed only 20 experimentally verified promoters, of which 10 were correctly predicted by our approach. Nearly 40% of our 465 predicted promoter regions are supported by the currently available gene annotation. Promoter finding also provides a biologically meaningful method for “chromosomal scaffolding”, by which long genomic sequences can be divided into segments starting with a gene. As one example, the combination of promoter region prediction with exon/intron structure predictions greatly enhances the specificity of de novo gene finding. The present study demonstrates that it is possible to identify promoters in silico on the chromosomal level with sufficient reliability for experimental planning and indicates that a wealth of information about regulatory regions can be extracted from current large-scale (megabase) sequencing projects. Results are available on-line at http://genomatix.gsf.de/chr22/. PMID:11230158
Pandey, Gunjan; Pandey, Janmejay; Jain, Rakesh K
2006-05-01
Monitoring of micro-organisms released deliberately into the environment is essential to assess their movement during the bio-remediation process. During the last few years, DNA-based genetic methods have emerged as the preferred method for such monitoring; however, their use is restricted in cases where organisms used for bio-remediation are not well characterized or where the public domain databases do not provide sufficient information regarding their sequence. For monitoring of such micro-organisms, alternate approaches have to be undertaken. In this study, we have specifically monitored a p-nitrophenol (PNP)-degrading organism, Arthrobacter protophormiae RKJ100, using molecular methods during PNP degradation in soil microcosm. Cells were tagged with a transposon-based foreign DNA sequence prior to their introduction into PNP-contaminated microcosms. Later, this artificially introduced DNA sequence was PCR-amplified to distinguish the bio-augmented organism from the indigenous microflora during PNP bio-remediation.
Karyotype Analysis of Four Vicia Species using In Situ Hybridization with Repetitive Sequences
NAVRÁTILOVÁ, ALICE; NEUMANN, PAVEL; MACAS, JIŘÍ
2003-01-01
Mitotic chromosomes of four Vicia species (V. sativa, V. grandiflora, V. pannonica and V. narbonensis) were subjected to in situ hybridization with probes derived from conserved plant repetitive DNA sequences (18S–25S and 5S rDNA, telomeres) and genus‐specific satellite repeats (VicTR‐A and VicTR‐B). Numbers and positions of hybridization signals provided cytogenetic landmarks suitable for unambiguous identification of all chromosomes, and establishment of the karyotypes. The VicTR‐A and ‐B sequences, in particular, produced highly informative banding patterns that alone were sufficient for discrimination of all chromosomes. However, these patterns were not conserved among species and thus could not be employed for identification of homologous chromosomes. This fact, together with observed variations in positions and numbers of rDNA loci, suggests considerable divergence between karyotypes of the species studied. PMID:12770847
Ferreira, Diogo C; van der Linden, Marx G; de Oliveira, Leandro C; Onuchic, José N; de Araújo, Antônio F Pereira
2016-04-01
Recent ab initio folding simulations for a limited number of small proteins have corroborated a previous suggestion that atomic burial information obtainable from sequence could be sufficient for tertiary structure determination when combined to sequence-independent geometrical constraints. Here, we use simulations parameterized by native burials to investigate the required amount of information in a diverse set of globular proteins comprising different structural classes and a wide size range. Burial information is provided by a potential term pushing each atom towards one among a small number L of equiprobable concentric layers. An upper bound for the required information is provided by the minimal number of layers L(min) still compatible with correct folding behavior. We obtain L(min) between 3 and 5 for seven small to medium proteins with 50 ≤ Nr ≤ 110 residues while for a larger protein with Nr = 141 we find that L ≥ 6 is required to maintain native stability. We additionally estimate the usable redundancy for a given L ≥ L(min) from the burial entropy associated to the largest folding-compatible fraction of "superfluous" atoms, for which the burial term can be turned off or target layers can be chosen randomly. The estimated redundancy for small proteins with L = 4 is close to 0.8. Our results are consistent with the above-average quality of burial predictions used in previous simulations and indicate that the fraction of approachable proteins could increase significantly with even a mild, plausible, improvement on sequence-dependent burial prediction or on sequence-independent constraints that augment the detectable redundancy during simulations. © 2016 Wiley Periodicals, Inc.
Effects of temperature and mass conservation on the typical chemical sequences of hydrogen oxidation
NASA Astrophysics Data System (ADS)
Nicholson, Schuyler B.; Alaghemandi, Mohammad; Green, Jason R.
2018-01-01
Macroscopic properties of reacting mixtures are necessary to design synthetic strategies, determine yield, and improve the energy and atom efficiency of many chemical processes. The set of time-ordered sequences of chemical species are one representation of the evolution from reactants to products. However, only a fraction of the possible sequences is typical, having the majority of the joint probability and characterizing the succession of chemical nonequilibrium states. Here, we extend a variational measure of typicality and apply it to atomistic simulations of a model for hydrogen oxidation over a range of temperatures. We demonstrate an information-theoretic methodology to identify typical sequences under the constraints of mass conservation. Including these constraints leads to an improved ability to learn the chemical sequence mechanism from experimentally accessible data. From these typical sequences, we show that two quantities defining the variational typical set of sequences—the joint entropy rate and the topological entropy rate—increase linearly with temperature. These results suggest that, away from explosion limits, data over a narrow range of thermodynamic parameters could be sufficient to extrapolate these typical features of combustion chemistry to other conditions.
2012-01-01
Background: As Next-Generation Sequencing data become available, existing hardware environments do not provide sufficient storage space and computational power to store and process the data, owing to their enormous size. This is, and will remain, a frequent problem encountered every day by researchers working on genetic data. There are some options available for compressing and storing such data, such as general-purpose compression software, PBAT/PLINK binary format, etc. However, these currently available methods either do not offer sufficient compression rates, or require a great amount of CPU time for decompression and loading every time the data is accessed. Results: Here, we propose a novel and simple algorithm for storing such sequencing data. We show that the compression factor of the algorithm ranges from 16 to several hundreds, which potentially allows SNP data of hundreds of Gigabytes to be stored in hundreds of Megabytes. We provide a C++ implementation of the algorithm, which supports direct loading and parallel loading of the compressed format without requiring extra time for decompression. By applying the algorithm to simulated and real datasets, we show that the algorithm achieves a greater compression rate than the commonly used compression methods, and that the data-loading process takes less time. The C++ library also provides direct data-retrieval functions, which allow the compressed information to be easily accessed by other C++ programs. Conclusions: The SpeedGene algorithm enables the storage and the analysis of next generation sequencing data in current hardware environments, making system upgrades unnecessary. PMID:22591016
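The abstract does not describe the SpeedGene encoding itself, so the sketch below shows only the baseline idea behind compact binary genotype formats of the kind it is compared against: a biallelic SNP genotype takes one of a few values, so it fits in 2 bits and four genotypes fit in one byte. The coding (0, 1, 2 = minor-allele count, 3 = missing) and the function names are assumptions for illustration. Individual genotypes can be read back by index without unpacking the whole array, which is the "direct loading" property in miniature.

```python
def pack_genotypes(genotypes):
    """Pack genotype codes (0, 1, 2, or 3 for missing) into 2 bits each."""
    packed = bytearray((len(genotypes) + 3) // 4)
    for i, g in enumerate(genotypes):
        if g not in (0, 1, 2, 3):
            raise ValueError(f"genotype code out of range: {g!r}")
        # Genotype i lives in byte i // 4, at bit offset 2 * (i % 4).
        packed[i // 4] |= g << (2 * (i % 4))
    return bytes(packed)


def get_genotype(packed, i):
    """Read genotype i directly from the packed bytes."""
    return (packed[i // 4] >> (2 * (i % 4))) & 0b11


codes = [0, 1, 2, 3, 2, 1]
packed = pack_genotypes(codes)          # 6 genotypes stored in 2 bytes
restored = [get_genotype(packed, i) for i in range(len(codes))]
```

Relative to one byte per genotype this is a factor of 4. The much larger factors reported in the abstract would have to come from exploiting further structure in the data, which this sketch does not attempt.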
Coordination sequences and information spreading in small-world networks
NASA Astrophysics Data System (ADS)
Herrero, Carlos P.
2002-10-01
We study the spread of information in small-world networks generated from different d-dimensional regular lattices, with d=1, 2, and 3. With this purpose, we analyze by numerical simulations the behavior of the coordination sequence, i.e., the average number of sites C(n) that can be reached from a given node of the network in n steps along its bonds. For sufficiently large networks, we find an asymptotic behavior C(n) ~ ρ^n, with a constant ρ that depends on the network dimension d and on the rewiring probability p (which measures the disorder strength of a given network). A simple model of information spreading in these networks is studied, assuming that only a fraction q of the network sites are active. The number of active nodes reached in n steps has an asymptotic form λ^n, λ being a constant that depends on p and q, as well as on the dimension d of the underlying lattice. The information spreading presents two different regimes depending on the value of λ: For λ>1 the information propagates along the whole system, and for λ<1 the spreading is damped and the information remains confined in a limited region of the network. We discuss the connection of these results with site percolation in small-world networks.
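A coordination sequence of this kind can be measured with a breadth-first search from every node. The sketch below is a minimal illustration for d=1 only, under two assumptions not stated in the abstract: the network is built by Watts-Strogatz-style rewiring of a ring lattice, and C(n) is taken as the number of sites whose shortest-path distance from the source is exactly n. All function names are hypothetical.

```python
import random
from collections import deque


def small_world_ring(n_sites, k, p, rng):
    """Ring lattice with k neighbours per side; each bond rewired with probability p."""
    adj = {i: set() for i in range(n_sites)}
    for i in range(n_sites):
        for j in range(1, k + 1):
            target = (i + j) % n_sites
            if rng.random() < p:
                # Rewire the far end to a random site, avoiding self-loops
                # and bonds that already exist.
                candidates = [c for c in range(n_sites)
                              if c != i and c not in adj[i]]
                target = rng.choice(candidates)
            adj[i].add(target)
            adj[target].add(i)
    return adj


def shell_sizes(adj, source):
    """Return {n: number of sites at shortest-path distance n from source}."""
    dist = {source: 0}
    counts = {}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                counts[dist[v]] = counts.get(dist[v], 0) + 1
                queue.append(v)
    return counts


def coordination_sequence(adj, max_n):
    """Average shell size C(n) over all sources, for n = 1..max_n."""
    totals = [0.0] * (max_n + 1)
    for source in adj:
        for n, c in shell_sizes(adj, source).items():
            if n <= max_n:
                totals[n] += c
    return [t / len(adj) for t in totals[1:]]


rng = random.Random(0)
regular = coordination_sequence(small_world_ring(20, 2, 0.0, rng), 4)
disordered = coordination_sequence(small_world_ring(200, 2, 0.1, rng), 6)
```

With p = 0 the sequence is constant (each shell of the ring holds 2k sites), whereas with p > 0 the shortcuts make early shells grow with n until finite-size effects set in. Estimating ρ would mean fitting that growth on networks far larger than the toy sizes used here.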
Methods for determining the genetic affinity of microorganisms and viruses
NASA Technical Reports Server (NTRS)
Fox, George E. (Inventor); Willson, III, Richard C. (Inventor); Zhang, Zhengdong (Inventor)
2012-01-01
A method for selecting which sub-sequences in a database of nucleic acid sequences, such as 16S rRNA, are highly characteristic of particular groupings of bacteria, microorganisms, fungi, etc. on a substantially phylogenetic tree. Also applicable to viruses comprising viral genomic RNA or DNA. A catalogue of highly characteristic sequences identified by this method is assembled to establish the genetic identity of an unknown organism. The characteristic sequences are used to design nucleic acid hybridization probes that include the characteristic sequence or its complement, or are derived from one or more characteristic sequences. A plurality of these characteristic sequences is used in hybridization to determine the phylogenetic tree position of the organism(s) in a sample. Target organisms that are represented in the original sequence database with sufficient characteristic sequences can be identified to the species or subspecies level. Oligonucleotide arrays of many probes are especially preferred. A hybridization signal can comprise fluorescence, chemiluminescence, or isotopic labeling, etc.; or sequences in a sample can be detected by direct means, e.g. mass spectrometry. The method's characteristic sequences can also be used to design specific PCR primers. The method uniquely identifies the phylogenetic affinity of an unknown organism without requiring prior knowledge of what is present in the sample. Even if the organism has not been previously encountered, the method still provides useful information about which phylogenetic tree bifurcation nodes encompass the organism.
2014-01-01
Background: Deciphering of the information content of eukaryotic promoters has remained confined to universal landmarks and conserved sequence elements such as enhancers and transcription factor binding motifs, which are considered sufficient for gene activation and regulation. Gene-specific sequences, interspersed between the canonical transacting factor binding sites or adjoining them within a promoter, are generally taken to be devoid of any regulatory information and have therefore been largely ignored. An unanswered question therefore is, do gene-specific sequences within a eukaryotic promoter have a role in gene activation? Here, we present an exhaustive experimental analysis of a gene-specific sequence adjoining the heat shock element (HSE) in the proximal promoter of the small heat shock protein gene, αB-crystallin (cryab). These sequences are highly conserved between the rodents and the humans. Results: Using human retinal pigment epithelial cells in culture as the host, we have identified a 10-bp gene-specific promoter sequence (GPS), which, unlike an enhancer, controls expression from the promoter of this gene, only when in appropriate position and orientation. Notably, the data suggest that GPS in comparison with the HSE works in a context-independent fashion. Additionally, when moved upstream, about a nucleosome length of DNA (−154 bp) from the transcription start site (TSS), the activity of the promoter is markedly inhibited, suggesting its involvement in local promoter access. Importantly, we demonstrate that deletion of the GPS results in complete loss of cryab promoter activity in transgenic mice. Conclusions: These data suggest that gene-specific sequences such as the GPS, identified here, may have critical roles in regulating gene-specific activity from eukaryotic promoters. PMID:24589182
Rapid and Accurate Sequencing of Enterovirus Genomes Using MinION Nanopore Sequencer.
Wang, Ji; Ke, Yue Hua; Zhang, Yong; Huang, Ke Qiang; Wang, Lei; Shen, Xin Xin; Dong, Xiao Ping; Xu, Wen Bo; Ma, Xue Jun
2017-10-01
Knowledge of an enterovirus genome sequence is very important in epidemiological investigation to identify transmission patterns and ascertain the extent of an outbreak. The MinION sequencer is increasingly used to sequence various viral pathogens in many clinical situations because of its long reads, portability, real-time accessibility of sequenced data, and very low initial costs. However, information is lacking on MinION sequencing of enterovirus genomes. In this proof-of-concept study using Enterovirus 71 (EV71) and Coxsackievirus A16 (CA16) strains as examples, we established an amplicon-based whole genome sequencing method using MinION. We explored the accuracy, minimum sequencing time, discrimination and high-throughput sequencing ability of MinION, and compared its performance with Sanger sequencing. Within the first minute (min) of sequencing, the accuracy of MinION was 98.5% for the single EV71 strain and 94.12%-97.33% for 10 genetically-related CA16 strains. In as little as 14 min, 99% identity was reached for the single EV71 strain, and in 17 min (on average), 99% identity was achieved for 10 CA16 strains in a single run. MinION is suitable for whole genome sequencing of enteroviruses with sufficient accuracy and fine discrimination and has the potential as a fast, reliable and convenient method for routine use. Copyright © 2017 The Editorial Board of Biomedical and Environmental Sciences. Published by China CDC. All rights reserved.
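Accuracy and identity figures such as those quoted above are typically percent identities between a consensus sequence and a reference (here, the Sanger sequence). The abstract does not state exactly how the values were computed, so the following is only a sketch of one common definition, assuming the two sequences have already been aligned to equal length with `-` marking gaps; the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; a gap opposite a
    base counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / compared


# One mismatch in ten aligned positions.
identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
```

Other definitions divide by the length of the shorter sequence or exclude all gapped columns, so reported identities are only comparable when the convention is the same.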
Structural genomics reveals EVE as a new ASCH/PUA-related domain
Bertonati, Claudia; Punta, Marco; Fischer, Markus; Yachdav, Guy; Forouhar, Farhad; Zhou, Weihong; Kuzin, Alexander P.; Seetharaman, Jayaraman; Abashidze, Mariam; Ramelot, Theresa A.; Kennedy, Michael A.; Cort, John R.; Belachew, Adam; Hunt, John F.; Tong, Liang; Montelione, Gaetano T.; Rost, Burkhard
2014-01-01
Summary We report on several proteins recently solved by structural genomics consortia, in particular by the Northeast Structural Genomics consortium (NESG). The proteins considered in this study differ substantially in their sequences but they share a similar structural core, characterized by a pseudobarrel five-stranded beta sheet. This core corresponds to the PUA domain-like architecture in the SCOP database. By connecting sequence information with structural knowledge, we characterize a new subgroup of these proteins that we propose to be distinctly different from previously described PUA domain-like domains such as PUA proper or ASCH. We refer to these newly defined domains as EVE. Although EVE may have retained the ability of PUA domains to bind RNA, the available experimental and computational data suggests that both the details of its molecular function and its cellular function differ from those of other PUA domain-like domains. This study of EVE and its relatives illustrates how the combination of structure and genomics creates new insights by connecting a cornucopia of structures that map to the same evolutionary potential. Primary sequence information alone would have not been sufficient to reveal these evolutionary links. PMID:19191354
Structural Genomics Reveals EVE as a New ASCH/PUA-Related Domain
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bertonati, C.; Punta, M; Fischer, M
2008-01-01
We report on several proteins recently solved by structural genomics consortia, in particular by the Northeast Structural Genomics consortium (NESG). The proteins considered in this study differ substantially in their sequences but they share a similar structural core, characterized by a pseudobarrel five-stranded beta sheet. This core corresponds to the PUA domain-like architecture in the SCOP database. By connecting sequence information with structural knowledge, we characterize a new subgroup of these proteins that we propose to be distinctly different from previously described PUA domain-like domains such as PUA proper or ASCH. We refer to these newly defined domains as EVE. Although EVE may have retained the ability of PUA domains to bind RNA, the available experimental and computational data suggests that both the details of its molecular function and its cellular function differ from those of other PUA domain-like domains. This study of EVE and its relatives illustrates how the combination of structure and genomics creates new insights by connecting a cornucopia of structures that map to the same evolutionary potential. Primary sequence information alone would have not been sufficient to reveal these evolutionary links.
Xiong, H; Campelo, D; Pollack, R J; Raoult, D; Shao, R; Alem, M; Ali, J; Bilcha, K; Barker, S C
2014-08-01
The Illumina Hiseq platform was used to sequence the entire mitochondrial coding-regions of 20 body lice, Pediculus humanus Linnaeus, and head lice, P. capitis De Geer (Phthiraptera: Pediculidae), from eight towns and cities in five countries: Ethiopia, France, China, Australia and the U.S.A. These data (∼310 kb) were used to see how much more informative entire mitochondrial coding-region sequences were than partial mitochondrial coding-region sequences, and thus to guide the design of future studies of the phylogeny, origin, evolution and taxonomy of body lice and head lice. Phylogenies were compared from entire coding-region sequences (∼15.4 kb), entire cox1 (∼1.5 kb), partial cox1 (∼700 bp) and partial cytb (∼600 bp) sequences. On the one hand, phylogenies from entire mitochondrial coding-region sequences (∼15.4 kb) were much more informative than phylogenies from entire cox1 sequences (∼1.5 kb) and partial gene sequences (∼600 to ∼700 bp). For example, 19 branches had > 95% bootstrap support in our maximum likelihood tree from the entire mitochondrial coding-regions (∼15.4 kb) whereas the tree from 700 bp cox1 had only two branches with bootstrap support > 95%. Yet, by contrast, partial cytb (∼600 bp) and partial cox1 (∼486 bp) sequences were sufficient to genotype lice to Clade A, B or C. The sequences of the mitochondrial genomes of the P. humanus, P. capitis and P. schaeffi Fahrenholz studied are in NCBI GenBank under the accession numbers KC660761-800, KC685631-6330, KC241882-97, EU219988-95, HM241895-8 and JX080388-407. © 2014 The Royal Entomological Society.
Collins, Richard A; Stajich, Jason E; Field, Deborah J; Olive, Joan E; DeAbreu, Diane M
2015-05-01
When we expressed a small (0.9 kb) nonprotein-coding transcript derived from the mitochondrial VS plasmid in the nucleus of Neurospora we found that it was efficiently spliced at one or more of eight 5' splice sites and ten 3' splice sites, which are present apparently by chance in the sequence. Further experimental and bioinformatic analyses of other mitochondrial plasmids, random sequences, and natural nuclear genes in Neurospora and other fungi indicate that fungal spliceosomes recognize a wide range of 5' splice site and branchpoint sequences and predict introns to be present at high frequency in random sequence. In contrast, analysis of intronless fungal nuclear genes indicates that branchpoint, 5' splice site and 3' splice site consensus sequences are underrepresented compared with random sequences. This underrepresentation of splicing signals is sufficient to deplete the nuclear genome of splice sites at locations that do not comprise biologically relevant introns. Thus, the splicing machinery can recognize a wide range of splicing signal sequences, but splicing still occurs with great accuracy, not because the splicing machinery distinguishes correct from incorrect introns, but because incorrect introns are substantially depleted from the genome. © 2015 Collins et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Houghton, Rebecca; Ellis, Joanna; Galiano, Monica; Clark, Tristan W; Wyllie, Sarah
2017-04-01
We describe haemagglutinin (HA) and neuraminidase (NA) sequencing in an apparent cross-site influenza A(H1N1) outbreak in renal transplant and haemodialysis patients, confirmed with whole genome sequencing (WGS). Isolates were sequenced from influenza positive individuals. Phylogenetic trees were constructed using HA and NA sequencing and subsequently WGS. Sequence data was analysed to determine genetic relatedness of viruses obtained from inpatient and outpatient cohorts and compared with epidemiological outbreak information. There were 6 patient cases of influenza in the inpatient renal ward cohort (associated with 3 deaths) and 9 patient cases in the outpatient haemodialysis unit cohort (no deaths). WGS confirmed clustered transmission of two genetically different influenza A(H1N1)pdm09 strains initially identified by analysis of HA and NA genes. WGS took longer, and in this case was not required to determine whether or not the two seemingly linked outbreaks were related. Rapid sequencing of HA and NA genes may be sufficient to aid early influenza outbreak investigation making it appealing for future outbreak investigation. However, as next generation sequencing becomes cheaper and more widely available and bioinformatics software is now freely accessible next generation whole genome analysis may increasingly become a valuable tool for real-time Influenza outbreak investigation. Crown Copyright © 2017. Published by Elsevier Ltd. All rights reserved.
Tan, Joon Liang; Khang, Tsung Fei; Ngeow, Yun Fong; Choo, Siew Woh
2013-12-13
Mycobacterium abscessus is a rapidly growing mycobacterium that is often associated with human infections. The taxonomy of this species has undergone several revisions and is still being debated. In this study, we sequenced the genomes of 12 M. abscessus strains and used phylogenomic analysis to perform subspecies classification. A data mining approach was used to rank and select informative genes based on the relative entropy metric for the construction of a phylogenetic tree. The resulting tree topology was similar to that generated using the concatenation of five classical housekeeping genes: rpoB, hsp65, secA, recA and sodA. Additional support for the reliability of the subspecies classification came from the analysis of erm41 and ITS gene sequences, single nucleotide polymorphisms (SNPs)-based classification and strain clustering demonstrated by a variable number tandem repeat (VNTR) assay and a multilocus sequence analysis (MLSA). We subsequently found that the concatenation of a minimal set of three median-ranked genes: DNA polymerase III subunit alpha (polC), 4-hydroxy-2-ketovalerate aldolase (Hoa) and cell division protein FtsZ (ftsZ), is sufficient to recover the same tree topology. PCR assays designed specifically for these genes showed that all three genes could be amplified in the reference strain of M. abscessus ATCC 19977T. This study provides proof of concept that whole-genome sequence-based data mining approach can provide confirmatory evidence of the phylogenetic informativeness of existing markers, as well as lead to the discovery of a more economical and informative set of markers that produces similar subspecies classification in M. abscessus. The systematic procedure used in this study to choose the informative minimal set of gene markers can potentially be applied to species or subspecies classification of other bacteria.
Ethical issues in consumer genome sequencing: Use of consumers' samples and data
Niemiec, Emilia; Howard, Heidi Carmen
2016-01-01
High throughput approaches such as whole genome sequencing (WGS) and whole exome sequencing (WES) create an unprecedented amount of data providing powerful resources for clinical care and research. Recently, WGS and WES services have been made available by commercial direct-to-consumer (DTC) companies. The DTC offer of genetic testing (GT) has already brought attention to potentially problematic issues such as the adequacy of consumers' informed consent and transparency of companies' research activities. In this study, we analysed the websites of four DTC GT companies offering WGS and/or WES with regard to their policies governing storage and future use of consumers' data and samples. The results are discussed in relation to recommendations and guiding principles such as the “Statement of the European Society of Human Genetics on DTC GT for health-related purposes” (2010) and the “Framework for responsible sharing of genomic and health-related data” (Global Alliance for Genomics and Health, 2014). The analysis reveals that some companies may store and use consumers' samples or sequencing data for unspecified research and share the data with third parties. Moreover, the companies do not provide sufficient or clear information to consumers about this, which can undermine the validity of the consent process. Furthermore, while all companies state that they provide privacy safeguards for data and mention the limitations of these, information about the possibility of re-identification is lacking. Finally, although the companies that may conduct research do include information regarding proprietary claims and commercialisation of the results, it is not clear whether consumers are aware of the consequences of these policies. These results indicate that DTC GT companies still need to improve the transparency regarding handling of consumers' samples and data, including having an explicit and clear consent process for research activities. PMID:27047756
Sma3s: a three-step modular annotator for large sequence datasets.
Muñoz-Mérida, Antonio; Viguera, Enrique; Claros, M Gonzalo; Trelles, Oswaldo; Pérez-Pulido, Antonio J
2014-08-01
Automatic sequence annotation is an essential component of modern 'omics' studies, which aim to extract information from large collections of sequence data. Most existing tools use sequence homology to establish evolutionary relationships and assign putative functions to sequences. However, it can be difficult to define a similarity threshold that achieves sufficient coverage without sacrificing annotation quality. Defining the correct configuration is critical and can be challenging for non-specialist users. Thus, the development of robust automatic annotation techniques that generate high-quality annotations without needing expert knowledge would be very valuable for the research community. We present Sma3s, a tool for automatically annotating very large collections of biological sequences from any kind of gene library or genome. Sma3s is composed of three modules that progressively annotate query sequences using either: (i) very similar homologues, (ii) orthologous sequences or (iii) terms enriched in groups of homologous sequences. We trained the system using several random sets of known sequences, demonstrating average sensitivity and specificity values of ~85%. In conclusion, Sma3s is a versatile tool for high-throughput annotation of a wide variety of sequence datasets that outperforms the accuracy of other well-established annotation algorithms, and it can enrich existing database annotations and uncover previously hidden features. Importantly, Sma3s has already been used in the functional annotation of two published transcriptomes. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Rapid resistome mapping using nanopore sequencing
Imamovic, Lejla; Hashim Ellabaan, Mostafa M.; van Schaik, Willem; Koza, Anna
2017-01-01
The emergence of antibiotic resistance in human pathogens has become a major threat to modern medicine. The outcome of antibiotic treatment can be affected by the composition of the gut. Accordingly, knowledge of the gut resistome composition could enable more effective and individualized treatment of bacterial infections. Yet, rapid workflows for resistome characterization are lacking. To address this challenge we developed the poreFUME workflow, which applies functional metagenomic selections and nanopore sequencing to resistome mapping. We demonstrate the approach by functionally characterizing the gut resistome of an ICU (intensive care unit) patient. At >97%, the accuracy of the poreFUME pipeline is sufficient for the annotation of antibiotic resistance genes. The poreFUME pipeline provides a promising approach for efficient resistome profiling that could inform antibiotic treatment decisions in the future. PMID:28062856
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harwood, Caroline S.
Rhodopseudomonas palustris is a common soil and water bacterium that makes its living by converting sunlight to cellular energy and by absorbing atmospheric carbon dioxide and converting it to biomass. This microbe can also degrade and recycle components of the woody tissues of plants, wood being the most abundant polymer on earth. Because of its intimate involvement in carbon management and recycling, R. palustris was selected by the DOE Carbon Management Program to have its genome sequenced by the Joint Genome Institute (JGI). This award provided funds for the preparation of R. palustris genomic DNA which was then supplied to the JGI in sufficient amounts to enable the complete sequencing of the R. palustris genome. The PI also supplied the JGI with technical information about the molecular biology of R. palustris.
Haller, Sebastian; Eller, Christoph; Hermes, Julia; Kaase, Martin; Steglich, Matthias; Radonić, Aleksandar; Dabrowski, Piotr Wojtek; Nitsche, Andreas; Pfeifer, Yvonne; Werner, Guido; Wunderle, Werner; Velasco, Edward; Abu Sin, Muna; Eckmanns, Tim; Nübel, Ulrich
2015-01-01
Objective: We aimed to retrospectively reconstruct the timing of transmission events and pathways in order to understand why extensive preventive measures and investigations were not sufficient to prevent new cases. Methods: We extracted available information from patient charts to describe cases and to compare them to the normal population of the ward. We conducted a cohort study to identify risk factors for pathogen acquisition. We sequenced the available isolates to determine the phylogenetic relatedness of Klebsiella pneumoniae isolates on the basis of their genome sequences. Results: The investigation comprises 37 cases and the 10 cases with ESBL (extended-spectrum beta-lactamase)-producing K. pneumoniae bloodstream infection. Descriptive epidemiology indicated that a continuous transmission from person to person was most likely. Results from the cohort study showed that ‘frequent manipulation’ (a proxy for increased exposure to medical procedures) was significantly associated with being a case (RR 1.44, 95% CI 1.02 to 2.19). Genome sequences revealed that all 48 bacterial isolates available for sequencing from 31 cases were closely related (maximum genetic distance, 12 single nucleotide polymorphisms). Based on our calculation of evolutionary rate and sequence diversity, we estimate that the outbreak strain was endemic since 2008. Conclusions: Epidemiological and phylogenetic analyses consistently indicated that there were additional, undiscovered cases prior to the onset of microbiological screening and that the spread of the pathogen remained undetected over several years, driven predominantly by person-to-person transmission. Whole-genome sequencing provided valuable information on the onset, course and size of the outbreak, and on possible ways of transmission. PMID:25967999
Top-down analysis of protein samples by de novo sequencing techniques.
Vyatkina, Kira; Wu, Si; Dekker, Lennard J M; VanDuijn, Martijn M; Liu, Xiaowen; Tolić, Nikola; Luider, Theo M; Paša-Tolić, Ljiljana; Pevzner, Pavel A
2016-09-15
Recent technological advances have made high-resolution mass spectrometers affordable to many laboratories, thus boosting rapid development of top-down mass spectrometry, and implying a need for efficient methods for analyzing this kind of data. We describe a method for analysis of protein samples from top-down tandem mass spectrometry data, which capitalizes on de novo sequencing of fragments of the proteins present in the sample. Our algorithm takes as input a set of de novo amino acid strings derived from the given mass spectra using the recently proposed Twister approach, and combines them into aggregated strings endowed with offsets. The former typically constitute accurate sequence fragments of sufficiently well-represented proteins from the sample being analyzed, while the latter indicate their location in the protein sequence, and also bear information on post-translational modifications and fragmentation patterns. Freely available on the web at http://bioinf.spbau.ru/en/twister. Contact: vyatkina@spbau.ru or ppevzner@ucsd.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
How close is close: 16S rRNA sequence identity may not be sufficient to guarantee species identity
NASA Technical Reports Server (NTRS)
Fox, G. E.; Wisotzkey, J. D.; Jurtshuk, P. Jr
1992-01-01
16S rRNA (genes coding for rRNA) sequence comparisons were conducted with the following three psychrophilic strains: Bacillus globisporus W25T (T = type strain) and Bacillus psychrophilus W16AT, and W5. These strains exhibited more than 99.5% sequence identity and within experimental uncertainty could be regarded as identical. Their close taxonomic relationship was further documented by phenotypic similarities. In contrast, previously published DNA-DNA hybridization results have convincingly established that these strains do not belong to the same species if current standards are used. These results emphasize the important point that effective identity of 16S rRNA sequences is not necessarily a sufficient criterion to guarantee species identity. Thus, although 16S rRNA sequences can be used routinely to distinguish and establish relationships between genera and well-resolved species, very recently diverged species may not be recognizable.
de la Bastide, Paul Y; Leung, Wai Lam; Hintz, William E
2015-01-01
The ITS region of the rDNA gene was compared for Saprolegnia spp. in order to improve our understanding of nucleotide sequence variability within and between species of this genus, determine species composition in Canadian fin fish aquaculture facilities, and to assess the utility of ITS sequence variability in genetic marker development. From a collection of more than 400 field isolates, ITS region nucleotide sequences were studied and it was determined that there was sufficient consistent inter-specific variation to support the designation of species identity based on ITS sequence data. This non-subjective approach to species identification does not rely upon transient morphological features. Phylogenetic analyses comparing our ITS sequences and species designations with data from previous studies generally supported the clade scheme of Diéguez-Uribeondo et al. (2007) and found agreement with the molecular taxonomic cluster system of Sandoval-Sierra et al. (2014). Our Canadian ITS sequence collection will thus contribute to the public database and assist the clarification of Saprolegnia spp. taxonomy. The analysis of ITS region sequence variability facilitated genus- and species-level identification of unknown samples from aquaculture facilities and provided useful information on species composition. A unique ITS-RFLP for the identification of S. parasitica was also described. Copyright © 2014 The British Mycological Society. Published by Elsevier Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Merkley, Eric D.; Sego, Landon H.; Lin, Andy
Adaptive processes in bacterial species can occur rapidly in laboratory culture, leading to genetic divergence between naturally occurring and laboratory-adapted strains. Differentiating wild and closely-related laboratory strains is clearly important for biodefense and bioforensics; however, DNA sequence data alone has thus far not provided a clear signature, perhaps due to lack of understanding of how diverse genome changes lead to adapted phenotypes. Protein abundance profiles from mass spectrometry-based proteomics analyses are a molecular measure of phenotype. Proteomics data contains sufficient information that powerful statistical methods can uncover signatures that distinguish wild strains of Yersinia pestis from laboratory-adapted strains.
Numeric promoter description - A comparative view on concepts and general application.
Beier, Rico; Labudde, Dirk
2016-01-01
Nucleic acid molecules play a key role in a variety of biological processes. Starting from storage and transfer tasks, this also comprises the triggering of biological processes, regulatory effects and the active influence gained by target binding. Based on the experimental output (in this case promoter sequences), further in silico analyses aid in gaining new insights into these processes and interactions. The numerical description of nucleic acids thereby constitutes a bridge between the concrete biological issues and the analytical methods. Hence, this study compares 26 descriptor sets obtained by applying well-known numerical description concepts to an established dataset of 38 DNA promoter sequences. The suitability of the description sets was evaluated by computing partial least squares regression models and assessing the model accuracy. We conclude that the major importance regarding the descriptive power is attached to positional information rather than to explicitly incorporated physico-chemical information, since a sufficient amount of implicit physico-chemical information is already encoded in the nucleobase classification. The regression models especially benefited from employing the information that is encoded in the sequential and structural neighborhood of the nucleobases. Thus, the analyses of n-grams (short fragments of length n) suggested that they are valuable descriptors for DNA target interactions. A mixed n-gram descriptor set thereby yielded the best description of the promoter sequences. The corresponding regression model was checked and found to be plausible as it was able to reproduce the characteristic binding motifs of promoter sequences to a reasonable degree. As most functional nucleic acids are based on the principle of molecular recognition, the findings are not restricted to promoter sequences, but can rather be transferred to other kinds of functional nucleic acids.
Thus, the concepts presented in this study could provide advantages for future nucleic acid-based technologies, like biosensoring, therapeutics and molecular imaging. Copyright © 2015 Elsevier Inc. All rights reserved.
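As an illustration of the n-gram descriptors mentioned in the abstract above, the following minimal Python sketch turns a DNA sequence into a mixed n-gram feature vector. It is not the authors' descriptor set: the function name `ngram_descriptor`, the choice of n = 1, 2, 3 and the use of relative frequencies are assumptions made only for this example.

```python
from collections import Counter
from itertools import product


def ngram_descriptor(seq, sizes=(1, 2, 3)):
    """Relative frequency of every possible DNA n-gram, for each n in `sizes`.

    Hypothetical helper: the n-gram sizes and the normalisation are
    illustrative assumptions, not the scheme used in the study.
    """
    seq = seq.upper()
    features = {}
    for n in sizes:
        # All overlapping windows of length n.
        windows = [seq[i:i + n] for i in range(len(seq) - n + 1)]
        counts = Counter(windows)
        total = len(windows)
        # Enumerate all 4**n possible n-grams so every sequence yields
        # a vector of the same length (unseen n-grams get 0.0).
        for gram in map("".join, product("ACGT", repeat=n)):
            features[gram] = counts[gram] / total if total else 0.0
    return features


desc = ngram_descriptor("TATAAT")  # toy input, a Pribnow-box-like hexamer
```

Fixed-length vectors of this kind could then be passed to a regression method such as partial least squares, which is the evaluation route the abstract describes.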
Informative priors based on transcription factor structural class improve de novo motif discovery.
Narlikar, Leelavati; Gordân, Raluca; Ohler, Uwe; Hartemink, Alexander J
2006-07-15
An important problem in molecular biology is to identify the locations at which a transcription factor (TF) binds to DNA, given a set of DNA sequences believed to be bound by that TF. In previous work, we showed that information in the DNA sequence of a binding site is sufficient to predict the structural class of the TF that binds it. In particular, this suggests that we can predict which locations in any DNA sequence are more likely to be bound by certain classes of TFs than others. Here, we argue that traditional methods for de novo motif finding can be significantly improved by adopting an informative prior probability that a TF binding site occurs at each sequence location. To demonstrate the utility of such an approach, we present priority, a powerful new de novo motif finding algorithm. Using data from TRANSFAC, we train three classifiers to recognize binding sites of basic leucine zipper, forkhead, and basic helix loop helix TFs. These classifiers are used to equip priority with three class-specific priors, in addition to a default prior to handle TFs of other classes. We apply priority and a number of popular motif finding programs to sets of yeast intergenic regions that are reported by ChIP-chip to be bound by particular TFs. priority identifies motifs the other methods fail to identify, and correctly predicts the structural class of the TF recognizing the identified binding sites. Supplementary material and code can be found at http://www.cs.duke.edu/~amink/.
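The role of a positional prior described in the abstract above can be sketched in a few lines of Python. This is not the priority algorithm itself (which the abstract does not specify in detail); it only shows the single step of weighting each candidate start position by a prior and a motif likelihood. The function `site_posterior`, the toy position weight matrix and both priors are hypothetical, and the background sequence model is deliberately ignored.

```python
import math


def site_posterior(seq, pwm, prior):
    """Posterior over motif start positions: prior[i] * P(window_i | pwm), normalised.

    `pwm` is a list of dicts (one per motif column) mapping base -> probability.
    Simplifying assumption: no background model, exactly one site per sequence.
    """
    width = len(pwm)
    n_starts = len(seq) - width + 1
    if len(prior) != n_starts:
        raise ValueError("prior needs one entry per candidate start position")
    scores = []
    for start in range(n_starts):
        likelihood = math.prod(pwm[j][seq[start + j]] for j in range(width))
        scores.append(prior[start] * likelihood)
    total = sum(scores)
    return [s / total for s in scores]


def column(consensus):
    """Toy PWM column: 0.7 for the consensus base, 0.1 for each other base."""
    return {b: (0.7 if b == consensus else 0.1) for b in "ACGT"}


toy_pwm = [column(b) for b in "TGA"]    # hypothetical 3-bp motif
toy_seq = "ACTGAC"                      # four candidate start positions
flat_prior = [0.25] * 4                 # uninformative prior
skewed_prior = [0.01, 0.01, 0.01, 0.97]  # hypothetical informative prior

flat_post = site_posterior(toy_seq, toy_pwm, flat_prior)
skewed_post = site_posterior(toy_seq, toy_pwm, skewed_prior)
```

In this toy case the strong motif match at the third position dominates under both priors, while the informative prior shifts probability mass toward the last position, which is the kind of effect a class-specific prior is meant to have on weaker matches.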
Quinn, J S; Guglich, E; Seutin, G; Lau, R; Marsolais, J; Parna, L; Boag, P T; White, B N
1992-02-01
The first tandemly repeated sequence examined in a passerine bird, a 431-bp PstI fragment named pMAT1, has been cloned from the genome of the brown-headed cowbird (Molothrus ater). The sequence represents about 5-10% of the genome (about 4 × 10^5 copies) and yields prominent ethidium bromide stained bands when genomic DNA cut with a variety of restriction enzymes is electrophoresed in agarose gels. A particularly striking ladder of fragments is apparent when the DNA is cut with HinfI, indicative of a tandem arrangement of the monomer. The cloned PstI monomer has been sequenced, revealing no internal repeated structure. There are sequences that hybridize with pMAT1 found in related nine-primaried oscines but not in more distantly related oscines, suboscines, or nonpasserine species. Little sequence similarity to tandemly repeated PstI cut sequences from the merlin (Falco columbarius), sarus crane (Grus antigone), or Puerto Rican parrot (Amazona vittata) or to HinfI digested sequence from the Toulouse goose (Anser anser) was detected. The isolated sequence was used as a probe to examine DNA samples of eight members of the tribe Icterini. This examination revealed phylogenetically informative characters. The repeat contains cutting sites from a number of restriction enzymes, which, if sufficiently polymorphic, would provide new phylogenetic characters. Sequences like these, conserved within a species, but variable between closely related species, may be very useful for phylogenetic studies of closely related taxa.
Quasispecies Analyses of the HIV-1 Near-full-length Genome With Illumina MiSeq
Ode, Hirotaka; Matsuda, Masakazu; Matsuoka, Kazuhiro; Hachiya, Atsuko; Hattori, Junko; Kito, Yumiko; Yokomaku, Yoshiyuki; Iwatani, Yasumasa; Sugiura, Wataru
2015-01-01
Human immunodeficiency virus type-1 (HIV-1) exhibits high between-host genetic diversity and within-host heterogeneity, recognized as quasispecies. Because HIV-1 quasispecies fluctuate in terms of multiple factors, such as antiretroviral exposure and host immunity, analyzing the HIV-1 genome is critical for selecting effective antiretroviral therapy and understanding within-host viral coevolution mechanisms. Here, to obtain HIV-1 genome sequence information that includes minority variants, we sought to develop a method for evaluating quasispecies throughout the HIV-1 near-full-length genome using the Illumina MiSeq benchtop deep sequencer. To ensure the reliability of minority mutation detection, we applied an analysis method of sequence read mapping onto a consensus sequence derived from de novo assembly followed by iterative mapping and subsequent unique error correction. Deep sequencing analyses of an HIV-1 clone showed that the analysis method reduced erroneous base prevalence below 1% in each sequence position and discarded only < 1% of all collected nucleotides, maximizing the usage of the collected genome sequences. Further, we designed primer sets to amplify the HIV-1 near-full-length genome from clinical plasma samples. Deep sequencing of 92 samples in combination with the primer sets and our analysis method provided sufficient coverage to identify >1%-frequency sequences throughout the genome. When we evaluated sequences of pol genes from 18 treatment-naïve patients' samples, the deep sequencing results were in agreement with Sanger sequencing and identified numerous additional minority mutations. The results suggest that our deep sequencing method would be suitable for identifying within-host viral population dynamics throughout the genome. PMID:26617593
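The ">1%-frequency" criterion in the abstract above can be illustrated with a minimal Python sketch that reports non-majority bases exceeding a frequency threshold at each position. It is not the authors' pipeline (their mapping and error-correction steps are not reproduced); the function `minority_variants`, the input layout and the read counts are all hypothetical.

```python
def minority_variants(base_counts, min_freq=0.01):
    """Report non-majority bases whose frequency exceeds `min_freq`.

    `base_counts` is a list with one dict per genome position, mapping
    base -> number of reads supporting it (hypothetical input layout).
    Returns (position, base, frequency) tuples with 1-based positions.
    """
    calls = []
    for pos, counts in enumerate(base_counts, start=1):
        depth = sum(counts.values())
        if depth == 0:
            continue  # no coverage, nothing to call
        major = max(counts, key=counts.get)
        for base, n_reads in sorted(counts.items()):
            freq = n_reads / depth
            if base != major and freq > min_freq:
                calls.append((pos, base, freq))
    return calls


# Invented read counts for three positions, for illustration only.
pileup = [
    {"A": 980, "G": 20},          # G at 2%: reported
    {"C": 1000},                  # no variation
    {"T": 950, "C": 45, "A": 5},  # C at 4.5%: reported, A at 0.5%: ignored
]
calls = minority_variants(pileup)
```

A threshold of this kind is only meaningful when the per-position error rate is below it, which is why the abstract stresses reducing erroneous base prevalence below 1%.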
Extraction of High Molecular Weight DNA from Fungal Rust Spores for Long Read Sequencing.
Schwessinger, Benjamin; Rathjen, John P
2017-01-01
Wheat rust fungi are complex organisms with a complete life cycle that involves two different host plants and five different spore types. During the asexual infection cycle on wheat, rusts produce massive amounts of dikaryotic urediniospores. These spores are dikaryotic (two nuclei) with each nucleus containing one haploid genome. This dikaryotic state is likely to contribute to their evolutionary success, making them some of the major wheat pathogens globally. Despite this, most published wheat rust genomes are highly fragmented and contain very little haplotype-specific sequence information. Current long-read sequencing technologies hold great promise to provide more contiguous and haplotype-phased genome assemblies. Long reads are able to span repetitive regions and phase structural differences between the haplomes. This increased genome resolution enables the identification of complex loci and the study of genome evolution beyond simple nucleotide polymorphisms. Long-read technologies require pure high molecular weight DNA as an input for sequencing. Here, we describe a DNA extraction protocol for rust spores that yields pure double-stranded DNA molecules with molecular weight of >50 kilo-base pairs (kbp). The isolated DNA is of sufficient purity for PacBio long-read sequencing, but may require additional purification for other sequencing technologies such as Nanopore and 10× Genomics.
Genotyping of ancient Mycobacterium tuberculosis strains reveals historic genetic diversity.
Müller, Romy; Roberts, Charlotte A; Brown, Terence A
2014-04-22
The evolutionary history of the Mycobacterium tuberculosis complex (MTBC) has previously been studied by analysis of sequence diversity in extant strains, but not addressed by direct examination of strain genotypes in archaeological remains. Here, we use ancient DNA sequencing to type 11 single nucleotide polymorphisms and two large sequence polymorphisms in the MTBC strains present in 10 archaeological samples from skeletons from Britain and Europe dating to the second-nineteenth centuries AD. The results enable us to assign the strains to groupings and lineages recognized in the extant MTBC. We show that at least during the eighteenth-nineteenth centuries AD, strains of M. tuberculosis belonging to different genetic groups were present in Britain at the same time, possibly even at a single location, and we present evidence for a mixed infection in at least one individual. Our study shows that ancient DNA typing applied to multiple samples can provide sufficiently detailed information to contribute to both archaeological and evolutionary knowledge of the history of tuberculosis.
Besnard, Fabrice; Koutsovoulos, Georgios; Dieudonné, Sana; Blaxter, Mark; Félix, Marie-Anne
2017-01-01
Mapping-by-sequencing has become a standard method to map and identify phenotype-causing mutations in model species. Here, we show that a fragmented draft assembly is sufficient to perform mapping-by-sequencing in nonmodel species. We generated a draft assembly and annotation of the genome of the free-living nematode Oscheius tipulae, a distant relative of the model Caenorhabditis elegans. We used this draft to identify the likely causative mutations at the O. tipulae cov-3 locus, which affect vulval development. The cov-3 locus encodes the O. tipulae ortholog of C. elegans mig-13, and we further show that Cel-mig-13 mutants also have an unsuspected vulval-development phenotype. In a virtuous circle, we were able to use the linkage information collected during mutant mapping to improve the genome assembly. These results showcase the promise of genome-enabled forward genetics in nonmodel species. PMID:28630114
Besnard, Fabrice; Koutsovoulos, Georgios; Dieudonné, Sana; Blaxter, Mark; Félix, Marie-Anne
2017-08-01
Mapping-by-sequencing has become a standard method to map and identify phenotype-causing mutations in model species. Here, we show that a fragmented draft assembly is sufficient to perform mapping-by-sequencing in nonmodel species. We generated a draft assembly and annotation of the genome of the free-living nematode Oscheius tipulae, a distant relative of the model Caenorhabditis elegans. We used this draft to identify the likely causative mutations at the O. tipulae cov-3 locus, which affect vulval development. The cov-3 locus encodes the O. tipulae ortholog of C. elegans mig-13, and we further show that Cel-mig-13 mutants also have an unsuspected vulval-development phenotype. In a virtuous circle, we were able to use the linkage information collected during mutant mapping to improve the genome assembly. These results showcase the promise of genome-enabled forward genetics in nonmodel species. Copyright © 2017 by the Genetics Society of America.
Botti, Sara; Giuffra, Elisabetta
2010-08-23
DNA barcodes are a global standard for species identification and have countless applications in the medical, forensic and alimentary fields, but few barcoding methods work efficiently in samples in which DNA is degraded, e.g. foods and archival specimens. This limits the choice of target regions harbouring a sufficient number of diagnostic polymorphisms. The method described here uses existing PCR and sequencing methodologies to detect mitochondrial DNA polymorphisms in complex matrices such as foods. The reported application allowed the discrimination among 17 fish species of the Scombridae family with high commercial interest such as mackerels, bonitos and tunas which are often present in processed seafood. The approach can be easily upgraded with the release of new genetic diversity information to increase the range of detected species. Cocktails of primers are designed for PCR using publicly available sequences of the target sequence. They are composed of a fixed 5' region and of variable 3' cocktail portions that allow amplification of any member of a group of species of interest. The population of short amplicons is directly sequenced and indexed using primers containing a longer 5' region and the non-polymorphic portion of the cocktail portion. A 226 bp region of CytB was selected as target after collection and screening of 148 online sequences; 85 SNPs were found, of which 75 were present in at least two sequences. Primers were also designed for two shorter sub-fragments that could be amplified from highly degraded samples. The test was used on 103 samples of seafood (canned tuna and scomber, tuna salad, tuna sauce) and could successfully detect the presence of different or additional species that were not identified on the labelling of canned tuna, tuna salad and sauce samples. The described method is largely independent of the degree of degradation of DNA source and can thus be applied to processed seafood.
Moreover, the method is highly flexible: publicly available sequence information on mitochondrial genomes are rapidly increasing for most species, facilitating the choice of target sequences and the improvement of resolution of the test. This is particularly important for discrimination of marine and aquaculture species for which genome information is still limited.
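The SNP counts quoted above (85 variable positions, 75 of them seen in at least two sequences) come from screening aligned CytB sequences. A minimal sketch of that kind of tally follows, assuming the sequences are already aligned to equal length; the function name and the toy sequences are hypothetical and are not taken from the study, and gaps or ambiguous bases are not handled.

```python
from collections import Counter

def variable_sites(aligned):
    """Return (all_snp_columns, shared_snp_columns) for aligned sequences.

    A column is a SNP if it holds more than one base; it is "shared" if
    some non-majority base occurs in at least two sequences.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    all_sites, shared_sites = [], []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned)
        if len(counts) < 2:
            continue  # monomorphic column, not a SNP
        all_sites.append(col)
        minor = counts.most_common()[1:]  # every base except the most common
        if any(n >= 2 for _, n in minor):
            shared_sites.append(col)
    return all_sites, shared_sites

# Hypothetical toy alignment (not real CytB data).
toy = ["ACGTAC", "ACGTAC", "ATGTAC", "ATGTAA"]
snps, shared = variable_sites(toy)
print(snps, shared)
```

Columns are reported as 0-based indices into the alignment.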
Exploring the sequence-structure protein landscape in the glycosyltransferase family
Zhang, Ziding; Kochhar, Sunil; Grigorov, Martin
2003-01-01
To understand the molecular basis of glycosyltransferases’ (GTFs) catalytic mechanism, extensive structural information is required. Here, fold recognition methods were employed to assign 3D protein shapes (folds) to the currently known GTF sequences, available in public databases such as GenBank and Swissprot. First, GTF sequences were retrieved and classified into clusters, based on sequence similarity only. Intracluster sequence similarity was chosen sufficiently high to ensure that the same fold is found within a given cluster. Then, a representative sequence from each cluster was selected to compose a subset of GTF sequences. The members of this reduced set were processed by three different fold recognition methods: 3D-PSSM, FUGUE, and GeneFold. Finally, the results from different fold recognition methods were analyzed and compared to sequence-similarity search methods (i.e., BLAST and PSI-BLAST). It was established that the folds of about 70% of all currently known GTF sequences can be confidently assigned by fold recognition methods, a value which is higher than the fold identification rate based on sequence comparison alone (48% for BLAST and 64% for PSI-BLAST). The identified folds were submitted to 3D clustering, and we found that most of the GTF sequences adopt the typical GTF A or GTF B folds. Our results indicate a lack of evidence that new GTF folds (i.e., folds other than GTF A and B) exist. Based on cases where fold identification was not possible, we suggest several sequences as the most promising targets for a structural genomics initiative focused on the GTF protein family. PMID:14500887
Phylogenetic Placement of Exact Amplicon Sequences Improves Associations with Clinical Information
McDonald, Daniel; Gonzalez, Antonio; Navas-Molina, Jose A.; Jiang, Lingjing; Xu, Zhenjiang Zech; Winker, Kevin; Kado, Deborah M.; Orwoll, Eric; Manary, Mark; Mirarab, Siavash
2018-01-01
ABSTRACT Recent algorithmic advances in amplicon-based microbiome studies enable the inference of exact amplicon sequence fragments. These new methods enable the investigation of sub-operational taxonomic units (sOTU) by removing erroneous sequences. However, short (e.g., 150-nucleotide [nt]) DNA sequence fragments do not contain sufficient phylogenetic signal to reproduce a reasonable tree, introducing a barrier in the utilization of critical phylogenetically aware metrics such as Faith’s PD or UniFrac. Although fragment insertion methods do exist, those methods have not been tested for sOTUs from high-throughput amplicon studies in insertions against a broad reference phylogeny. We benchmarked the SATé-enabled phylogenetic placement (SEPP) technique explicitly against 16S V4 sequence fragments and showed that it outperforms the conceptually problematic but often-used practice of reconstructing de novo phylogenies. In addition, we provide a BSD-licensed QIIME2 plugin (https://github.com/biocore/q2-fragment-insertion) for SEPP and integration into the microbial study management platform QIITA. IMPORTANCE The move from OTU-based to sOTU-based analysis, while providing additional resolution, also introduces computational challenges. We demonstrate that one popular method of dealing with sOTUs (building a de novo tree from the short sequences) can provide incorrect results in human gut metagenomic studies and show that phylogenetic placement of the new sequences with SEPP resolves this problem while also yielding other benefits over existing methods. PMID:29719869
Mass loss from red giants - Infrared spectroscopy
NASA Technical Reports Server (NTRS)
Wannier, P. G.
1985-01-01
A discussion is presented of IR spectroscopy, particularly high-resolution spectroscopy in the approximately 1-20 micron band, as it impacts the study of circumstellar envelopes. The molecular bands within this region contain an enormous amount of information, especially when observed with sufficient resolution to obtain kinematic information. In a single spectrum, it is possible to resolve lines from up to 50 different rotational/vibrational levels of a given molecule and to detect several different isotopic variants. When high resolution techniques are combined with mapping techniques and/or time sequence observations of variable stars, the resulting information can paint a very detailed picture of the mass-loss phenomenon. To date, near-IR observations have been made of 20 molecular species. CO is the most widely observed molecule and useful information has been gleaned from the observed rotational excitation, kinematics, time variability and spatial structure of its lines. Examples of different observing techniques are discussed in the following sections.
Draft versus finished sequence data for DNA and protein diagnostic signature development
Gardner, Shea N.; Lam, Marisa W.; Smith, Jason R.; Torres, Clinton L.; Slezak, Tom R.
2005-01-01
Sequencing pathogen genomes is costly, demanding careful allocation of limited sequencing resources. We built a computational Sequencing Analysis Pipeline (SAP) to guide decisions regarding the amount of genomic sequencing necessary to develop high-quality diagnostic DNA and protein signatures. SAP uses simulations to estimate the number of target genomes and close phylogenetic relatives (near neighbors or NNs) to sequence. We use SAP to assess whether draft data are sufficient or finished sequencing is required using Marburg and variola virus sequences. Simulations indicate that intermediate to high-quality draft with error rates of 10⁻³ to 10⁻⁵ (∼8× coverage) of target organisms is suitable for DNA signature prediction. Low-quality draft with error rates of ∼1% (3× to 6× coverage) of target isolates is inadequate for DNA signature prediction, although low-quality draft of NNs is sufficient, as long as the target genomes are of high quality. For protein signature prediction, sequencing errors in target genomes substantially reduce the detection of amino acid sequence conservation, even if the draft is of high quality. In summary, high-quality draft of target and low-quality draft of NNs appears to be a cost-effective investment for DNA signature prediction, but may lead to underestimation of predicted protein signatures. PMID:16243783
Ferragut, Fátima; Vega, Celina G; Mauroy, Axel; Conceição-Neto, Nádia; Zeller, Mark; Heylen, Elisabeth; Uriarte, Enrique Louge; Bilbao, Gladys; Bok, Marina; Matthijnssens, Jelle; Thiry, Etienne; Badaracco, Alejandra; Parreño, Viviana
2016-06-01
Bovine noroviruses are enteric pathogens detected in fecal samples of both diarrheic and non-diarrheic calves from several countries worldwide. However, epidemiological information regarding bovine noroviruses is still lacking for many important cattle producing countries from South America. In this study, three bovine norovirus genogroup III sequences were determined by conventional RT-PCR and Sanger sequencing in feces from diarrheic dairy calves from Argentina (B4836, B4848, and B4881, all collected in 2012). Phylogenetic studies based on a partial coding region for the RNA-dependent RNA polymerase (RdRp, 503 nucleotides) of these three samples suggested that two of them (B4836 and B4881) belong to genotype 2 (GIII.2) while the third one (B4848) was more closely related to genotype 1 (GIII.1) strains. By deep sequencing, the capsid region from two of these strains could be determined. This confirmed the circulation of genotype 1 (B4848) together with the presence of another sequence (B4881) sharing its highest genetic relatedness with genotype 1, but sufficiently distant to constitute a new genotype. This latter strain was shown in silico to be a recombinant: phylogenetic divergence was detected between its RNA-dependent RNA polymerase coding sequence (genotype GIII.2) and its capsid protein coding sequence (genotype GIII.1 or a potential norovirus genotype). According to this data, this strain could be the second genotype GIII.2_GIII.1 bovine norovirus recombinant described in literature worldwide. Further analysis suggested that this strain could even be a potential norovirus GIII genotype, tentatively named GIII.4. The data provides important epidemiological and evolutionary information on bovine noroviruses circulating in South America. Copyright © 2016. Published by Elsevier B.V.
Thomas, W. Kelley; Vida, J. T.; Frisse, Linda M.; Mundo, Manuel; Baldwin, James G.
1997-01-01
To effectively integrate DNA sequence analysis and classical nematode taxonomy, we must be able to obtain DNA sequences from formalin-fixed specimens. Microdissected sections of nematodes were removed from specimens fixed in formalin, using standard protocols and without destroying morphological features. The fixed sections provided sufficient template for multiple polymerase chain reaction-based DNA sequence analyses. PMID:19274156
2011-01-01
Background Sequence homology considerations widely used to transfer functional annotation to uncharacterized protein sequences require special precautions in the case of non-globular sequence segments including membrane-spanning stretches composed of non-polar residues. Simple, quantitative criteria are desirable for identifying transmembrane helices (TMs) that must be included into or should be excluded from start sequence segments in similarity searches aimed at finding distant homologues. Results We found that there are two types of TMs in membrane-associated proteins. On the one hand, there are so-called simple TMs with elevated hydrophobicity, low sequence complexity and extraordinary enrichment in long aliphatic residues. They merely serve as membrane-anchoring device. In contrast, so-called complex TMs have lower hydrophobicity, higher sequence complexity and some functional residues. These TMs have additional roles besides membrane anchoring such as intra-membrane complex formation, ligand binding or a catalytic role. Simple and complex TMs can occur both in single- and multi-membrane-spanning proteins essentially in any type of topology. Whereas simple TMs have the potential to confuse searches for sequence homologues and to generate unrelated hits with seemingly convincing statistical significance, complex TMs contain essential evolutionary information. Conclusion For extending the homology concept onto membrane proteins, we provide a necessary quantitative criterion to distinguish simple TMs (and a sufficient criterion for complex TMs) in query sequences prior to their usage in homology searches based on assessment of hydrophobicity and sequence complexity of the TM sequence segments. Reviewers This article was reviewed by Shamil Sunyaev, L. Aravind and Arcady Mushegian. PMID:22024092
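The criterion described above rests on two per-segment quantities: hydrophobicity and sequence complexity. The sketch below computes one plausible version of each, assuming the Kyte-Doolittle hydropathy scale and Shannon entropy of residue composition as stand-ins; the abstract does not say which scale or complexity measure the authors used, and no decision thresholds are reproduced here. The example segments are hypothetical.

```python
import math
from collections import Counter

# Kyte-Doolittle hydropathy values (an assumed choice of scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def mean_hydropathy(segment):
    """Average Kyte-Doolittle hydropathy over a segment."""
    return sum(KD[aa] for aa in segment) / len(segment)

def shannon_entropy(segment):
    """Compositional complexity in bits per residue."""
    n = len(segment)
    counts = Counter(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical segments: an aliphatic-rich anchor and a more varied helix.
for seg in ("LLLLLVLLLLALLLLLLILL", "GYSTWNLCFAHQMPVTSGIE"):
    print(seg, round(mean_hydropathy(seg), 2), round(shannon_entropy(seg), 2))
```

In this picture a "simple" TM would score high on hydropathy and low on entropy, and a "complex" TM the reverse; where to draw the line is left to the paper.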
Vears, D F; Niemiec, E; Howard, H C; Borry, P
2018-06-10
Whole exome and whole genome sequencing are increasingly being offered to patients in the clinical setting. Yet, the question of whether, and to what extent, unsolicited findings (UF) and/or secondary findings (SF) should be returned to patients remains open and little is known about how diagnostic consent forms address this issue. We systematically identified consent forms for diagnostic genomic sequencing online and used inductive content analysis to determine if and how they discuss reporting of UF and SF, and whether patients are given options regarding the return of these results. Fifty-four forms representing 38 laboratories/clinics were analyzed. A quarter of the forms did not mention UF or SF. Forms used a variety of terms to discuss UF and SF, sometimes using these interchangeably or incorrectly. Reporting policies for UF varied: five forms stated that UF will not be returned, 15 indicated UF may be returned, and 28 did not specify their policy. One-third indicated their laboratory returns SF. Addressing inconsistent terminology and providing sufficient information about UF/SF in consent forms will increase patient understanding and help ensure adequate informed consent. This article is protected by copyright. All rights reserved.
Chen, Mingchen; Lin, Xingcheng; Zheng, Weihua; Onuchic, José N; Wolynes, Peter G
2016-08-25
The associative memory, water mediated, structure and energy model (AWSEM) is a coarse-grained force field with transferable tertiary interactions that incorporates local in sequence energetic biases using bioinformatically derived structural information about peptide fragments with locally similar sequences that we call memories. The memory information from the protein data bank (PDB) database guides proper protein folding. The structural information about available sequences in the database varies in quality and can sometimes lead to frustrated free energy landscapes locally. One way out of this difficulty is to construct the input fragment memory information from all-atom simulations of portions of the complete polypeptide chain. In this paper, we investigate this approach first put forward by Kwac and Wolynes in a more complete way by studying the structure prediction capabilities of this approach for six α-helical proteins. This scheme which we call the atomistic associative memory, water mediated, structure and energy model (AAWSEM) amounts to an ab initio protein structure prediction method that starts from the ground up without using bioinformatic input. The free energy profiles from AAWSEM show that atomistic fragment memories are sufficient to guide the correct folding when tertiary forces are included. AAWSEM combines the efficiency of coarse-grained simulations on the full protein level with the local structural accuracy achievable from all-atom simulations of only parts of a large protein. The results suggest that a hybrid use of atomistic fragment memory and database memory in structural predictions may well be optimal for many practical applications.
Ensemble codes involving hippocampal neurons are at risk during delayed performance tests.
Hampson, R E; Deadwyler, S A
1996-11-26
Multielectrode recording techniques were used to record ensemble activity from 10 to 16 simultaneously active CA1 and CA3 neurons in the rat hippocampus during performance of a spatial delayed-nonmatch-to-sample task. Extracted sources of variance were used to assess the nature of two different types of errors that accounted for 30% of total trials. The two types of errors included ensemble "miscodes" of sample phase information and errors associated with delay-dependent corruption or disappearance of sample information at the time of the nonmatch response. Statistical assessment of trial sequences and associated "strength" of hippocampal ensemble codes revealed that miscoded error trials always followed delay-dependent error trials in which encoding was "weak," indicating that the two types of errors were "linked." It was determined that the occurrence of weakly encoded, delay-dependent error trials initiated an ensemble encoding "strategy" that increased the chances of being correct on the next trial and avoided the occurrence of further delay-dependent errors. Unexpectedly, the strategy involved "strongly" encoding response position information from the prior (delay-dependent) error trial and carrying it forward to the sample phase of the next trial. This produced a miscode type error on trials in which the "carried over" information obliterated encoding of the sample phase response on the next trial. Application of this strategy, irrespective of outcome, was sufficient to reorient the animal to the proper between trial sequence of response contingencies (nonmatch-to-sample) and boost performance to 73% correct on subsequent trials. The capacity for ensemble analyses of strength of information encoding combined with statistical assessment of trial sequences therefore provided unique insight into the "dynamic" nature of the role hippocampus plays in delay type memory tasks.
Language extraction from zinc sulfide
NASA Astrophysics Data System (ADS)
Varn, Dowman Parks
2001-09-01
Recent advances in the analysis of one-dimensional temporal and spatial series allow for detailed characterization of disorder and computation in physical systems. One such system that has defied theoretical understanding since its discovery in 1912 is polytypism. Polytypes are layered compounds, exhibiting crystallinity in two dimensions, yet having complicated stacking sequences in the third direction. They can show both ordered and disordered sequences, sometimes each in the same specimen. We demonstrate a method for extracting two-layer correlation information from ZnS diffraction patterns and employ a novel technique for epsilon-machine reconstruction. We solve a long-standing problem, that of determining structural information for disordered materials from their diffraction patterns, for this special class of disorder. Our solution offers the most complete possible statistical description of the disorder. Furthermore, from our reconstructed epsilon-machines we find the effective range of the interlayer interaction in these materials, as well as the configurational energy of both ordered and disordered specimens. Finally, we can determine the 'language' (in terms of the Chomsky Hierarchy) these small rocks speak, and we find that regular languages are sufficient to describe them.
View-invariant gait recognition method by three-dimensional convolutional neural network
NASA Astrophysics Data System (ADS)
Xing, Weiwei; Li, Ying; Zhang, Shunli
2018-01-01
Gait, as an important biometric feature, can identify a human at a long distance. View change is one of the most challenging factors for gait recognition. To address the cross-view issues in gait recognition, we propose a view-invariant gait recognition method based on a three-dimensional (3-D) convolutional neural network. First, a 3-D convolutional neural network (3DCNN) is introduced to learn view-invariant features, capturing spatial and temporal information simultaneously from normalized silhouette sequences. Second, a network training method based on cross-domain transfer learning is proposed to solve the problem of limited gait training samples. We choose C3D as the basic model, which is pretrained on Sports-1M, and then fine-tune the C3D model to adapt it to gait recognition. In the recognition stage, we use the fine-tuned model to extract gait features and use Euclidean distance to measure the similarity of gait sequences. Extensive experiments are carried out on the CASIA-B dataset, and the experimental results demonstrate that our method outperforms many other methods.
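The recognition stage described above reduces to nearest-neighbour matching of feature vectors under Euclidean distance. A minimal sketch of that step is given below; the feature vectors, subject labels and function names are hypothetical, and the feature extraction by the fine-tuned network is assumed to have happened already.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(probe, gallery):
    """Return the gallery label whose features lie closest to the probe."""
    return min(gallery, key=lambda label: euclidean(probe, gallery[label]))

# Hypothetical 2-D features; real gait features would be much longer.
gallery = {"subject_01": [0.0, 0.0], "subject_02": [3.0, 4.0]}
print(identify([2.5, 3.5], gallery))
```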
A 3D approximate maximum likelihood localization solver
DOE Office of Scientific and Technical Information (OSTI.GOV)
2016-09-23
A robust three-dimensional solver was needed to accurately and efficiently estimate the time sequence of locations of fish tagged with acoustic transmitters and vocalizing marine mammals to describe in sufficient detail the information needed to assess the function of dam-passage design alternatives and support Marine Renewable Energy. An approximate maximum likelihood solver was developed using measurements of time difference of arrival from all hydrophones in receiving arrays on which a transmission was detected. Field experiments demonstrated that the developed solver performed significantly better in tracking efficiency and accuracy than other solvers described in the literature.
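To illustrate the time-difference-of-arrival idea, the sketch below scores candidate source positions by the squared mismatch between measured and predicted arrival-time differences and keeps the best one. Under independent, equal-variance Gaussian timing errors, minimising this sum is the maximum likelihood estimate; the brute-force grid search, the 1500 m/s sound speed, the hydrophone layout and all function names are assumptions for illustration and do not describe the released solver.

```python
import math
from itertools import product

SOUND_SPEED = 1500.0  # m/s, assumed nominal value for water

def tdoas(source, hydrophones, c=SOUND_SPEED):
    """Arrival-time differences relative to the first hydrophone."""
    times = [math.dist(source, h) / c for h in hydrophones]
    return [t - times[0] for t in times[1:]]

def locate(measured, hydrophones, axis_values, c=SOUND_SPEED):
    """Grid search for the position minimising squared TDOA residuals."""
    best, best_cost = None, float("inf")
    for candidate in product(axis_values, repeat=3):
        predicted = tdoas(candidate, hydrophones, c)
        cost = sum((m - p) ** 2 for m, p in zip(measured, predicted))
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best

# Hypothetical array (metres) and a noise-free transmission from (4, 6, 2).
array = [(0, 0, 0), (10, 0, 0), (0, 10, 0), (0, 0, 10), (10, 10, 10)]
measured = tdoas((4, 6, 2), array)
print(locate(measured, array, range(0, 11)))
```

A practical solver would replace the grid with a continuous optimiser and weight each residual by its timing uncertainty.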
Wei, Yunzhou; Chesne, Megan T.; Terns, Rebecca M.; Terns, Michael P.
2015-01-01
CRISPR-Cas systems are RNA-based immune systems that protect prokaryotes from invaders such as phages and plasmids. In adaptation, the initial phase of the immune response, short foreign DNA fragments are captured and integrated into host CRISPR loci to provide heritable defense against encountered foreign nucleic acids. Each CRISPR contains a ∼100–500 bp leader element that typically includes a transcription promoter, followed by an array of captured ∼35 bp sequences (spacers) sandwiched between copies of an identical ∼35 bp direct repeat sequence. New spacers are added immediately downstream of the leader. Here, we have analyzed adaptation to phage infection in Streptococcus thermophilus at the CRISPR1 locus to identify cis-acting elements essential for the process. We show that the leader and a single repeat of the CRISPR locus are sufficient for adaptation in this system. Moreover, we identified a leader sequence element capable of stimulating adaptation at a dormant repeat. We found that sequences within 10 bp of the site of integration, in both the leader and repeat of the CRISPR, are required for the process. Our results indicate that information at the CRISPR leader-repeat junction is critical for adaptation in this Type II-A system and likely other CRISPR-Cas systems. PMID:25589547
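The locus layout described above (leader, then alternating repeats and spacers, with new spacers added next to the leader) can be made concrete with a small string model. This is only a sketch: it assumes exact, identical repeat copies and uses hypothetical 4-nt toy sequences instead of real ∼35 bp repeats and spacers.

```python
def parse_crispr_locus(locus, repeat):
    """Split a locus into (leader, spacers), assuming exact repeat copies."""
    parts = locus.split(repeat)
    if len(parts) < 2:
        raise ValueError("repeat not found in locus")
    # parts[0] is the leader; the final element follows the last repeat.
    return parts[0], parts[1:-1]

def integrate_spacer(leader, spacers, repeat, new_spacer):
    """Rebuild the locus with a new repeat-spacer unit at the leader end."""
    units = "".join(repeat + s for s in [new_spacer] + spacers)
    return leader + units + repeat

# Hypothetical toy locus: leader + repeat + spacer + repeat + spacer + repeat.
repeat = "GGAC"
locus = "TTTT" + repeat + "AAAT" + repeat + "CCCA" + repeat
leader, spacers = parse_crispr_locus(locus, repeat)
adapted = integrate_spacer(leader, spacers, repeat, "TCTC")
print(parse_crispr_locus(adapted, repeat))
```

A locus consisting of the leader and a single repeat parses to an empty spacer list, which is the minimal configuration the abstract reports as sufficient for adaptation.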
Non-Genomic Origins of Proteins and Metabolism
NASA Technical Reports Server (NTRS)
Pohorille, Andrew
2003-01-01
It is proposed that evolution of inanimate matter to cells endowed with a nucleic acid-based coding of genetic information was preceded by an evolutionary phase, in which peptides not coded by nucleic acids were able to self-organize into networks capable of evolution towards increasing metabolic complexity. Recent findings that truly different, simple peptides (Keefe and Szostak, 2001) can perform the same function (such as ATP binding) provide experimental support for this mechanism of early protobiological evolution. The central concept underlying this mechanism is that the reproduction of cellular functions alone was sufficient for self-maintenance of protocells, and that self-replication of macromolecules was not required at this stage of evolution. The precise transfer of information between successive generations of the earliest protocells was unnecessary and, possibly, undesirable. The key requirement in the initial stage of protocellular evolution was an ability to rapidly explore a large number of protein sequences in order to discover a set of molecules capable of supporting self-maintenance and growth of protocells. Undoubtedly, the essential protocellular functions were carried out by molecules not nearly as efficient or as specific as contemporary proteins. Many, potentially unrelated sequences could have performed each of these functions at an evolutionarily acceptable level. As evolution progressed, however, proteins must have performed their functions with increasing efficiency and specificity. This, in turn, put additional constraints on protein sequences and the fraction of proteins capable of performing their functions at the required level decreased. At some point, the likelihood of generating a sufficiently efficient set of proteins through a non-coded synthesis was so small that further evolution was not possible without storing information about the sequences of these proteins.
Beyond this point, further evolution required coupling between proteins and informational polymers that is characteristic to all known forms of life. The emergence of such coupling must be postulated in any scenario of the origin of life, no matter whether it starts with RNA or proteins. To examine the evolutionary potential of non-genomic systems, a simple, computationally tractable model, which is still capable of capturing the essential features of the real system, has been studied computationally. Both constructive and destructive processes have been introduced into the model in a stochastic manner. Instead of assuming random reaction sets, only a suite of protobiologically plausible reactions has been considered. Peptides have been explicitly considered as protoenzymes and their catalytic efficiencies have been assigned on the basis of biochemical principles and experimental estimates. Simulations have been carried out using a novel approach (The Next Reaction Method) that is appropriate even for very low concentrations of reactants. Studies have focused on global autocatalytic processes and their diversity.
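The stochastic treatment of constructive and destructive processes can be illustrated with a small exact simulation. The sketch below uses the simpler direct method of Gillespie's stochastic simulation algorithm instead of the Next Reaction Method named above, which samples the same kind of process more efficiently; the two-reaction toy system, the rate constants and all names are hypothetical and are not taken from the study.

```python
import random

def simulate(state, reactions, t_end, seed=0):
    """Gillespie direct method.

    reactions: list of (rate_constant, reactants, products), where
    reactants and products are lists of distinct species names.
    """
    rng = random.Random(seed)
    state = dict(state)
    t = 0.0
    while True:
        # Propensity = rate constant times the count of each reactant.
        props = []
        for k, reactants, _ in reactions:
            a = k
            for species in reactants:
                a *= state[species]
            props.append(a)
        total = sum(props)
        if total == 0:
            break  # nothing can react any more
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_end:
            break
        # Pick a reaction with probability proportional to its propensity.
        r = rng.random() * total
        acc = 0.0
        for a, (_, reactants, products) in zip(props, reactions):
            acc += a
            if r < acc:
                break
        for species in reactants:
            state[species] -= 1
        for species in products:
            state[species] += 1
    return state

# Hypothetical system: condensation A + B -> P and hydrolysis P -> A + B.
reactions = [(0.01, ["A", "B"], ["P"]), (0.1, ["P"], ["A", "B"])]
print(simulate({"A": 50, "B": 50, "P": 0}, reactions, t_end=10.0))
```

Because each event changes integer molecule counts one at a time, this style of simulation remains valid at the very low reactant concentrations the abstract mentions.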
Rapid resistome mapping using nanopore sequencing.
van der Helm, Eric; Imamovic, Lejla; Hashim Ellabaan, Mostafa M; van Schaik, Willem; Koza, Anna; Sommer, Morten O A
2017-05-05
The emergence of antibiotic resistance in human pathogens has become a major threat to modern medicine. The outcome of antibiotic treatment can be affected by the composition of the gut. Accordingly, knowledge of the gut resistome composition could enable more effective and individualized treatment of bacterial infections. Yet, rapid workflows for resistome characterization are lacking. To address this challenge we developed the poreFUME workflow that deploys functional metagenomic selections and nanopore sequencing to resistome mapping. We demonstrate the approach by functionally characterizing the gut resistome of an ICU (intensive care unit) patient. With an accuracy of >97%, the poreFUME pipeline is sufficiently accurate for the annotation of antibiotic resistance genes. The poreFUME pipeline provides a promising approach for efficient resistome profiling that could inform antibiotic treatment decisions in the future. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
When Is Information Sufficient for Action? Search with Unreliable yet Informative Intelligence
2016-03-30
When Is Information Sufficient for Action? Search with Unreliable yet Informative Intelligence. Michael Atkinson... Operations Research. Published online in Articles in Advance 30 Mar 2016. http://dx.doi.org/10.1287/opre.2016.1488. ISSN 1526-5463 (online). © 2016 INFORMS.
López-Urrutia, Eduardo; Valdés, Jesús; Bonilla-Moreno, Raúl; Martínez-Salazar, Martha; Martínez-Garcia, Martha; Berumen, Jaime; Villegas-Sepúlveda, Nicolás
2012-06-01
The HPV-16 E6/E7 genes, which contain intron 1, are processed by alternative splicing and their transcripts are detected with a heterogeneous profile in tumour cells. Frequently, the HPV-16 positive carcinoma cells bear viral variants that contain single nucleotide polymorphisms in their DNA sequence. We were interested in analysing the contribution of this polymorphism to the heterogeneity in the pattern of the E6/E7 spliced transcripts. Using the E6/E7 sequences from three closely related HPV-16 variants, we have shown that a few nucleotide changes are sufficient to produce heterogeneity in the splicing profile. Furthermore, using mutants that contained a single SNP, we also showed that one nucleotide change was sufficient to reproduce the heterogeneous splicing profile. Additionally, a difference of two or three SNPs among these viral sequences was sufficient to recruit differentially several splicing factors to the polymorphic E6/E7 transcripts. Moreover, only one SNP was sufficient to alter the binding site of at least one splicing factor, changing the ability of splicing factors to bind the transcript. Finally, the factors that were differentially bound to the short form of intron 1 of one of these E6/E7 variants were identified as TIA1 and/or TIAR and U1-70k, while U2AF65, U5-52k and PTB were preferentially bound to the transcript of the other variants. Copyright © 2012 Elsevier B.V. All rights reserved.
Report on the Human Genome Initiative for the Office of Health and Environmental Research
DOE R&D Accomplishments Database
Tinoco, I.; Cahill, G.; Cantor, C.; Caskey, T.; Dulbecco, R.; Engelhardt, D. L.; Hood, L.; Lerman, L. S.; Mendelsohn, M. L.; Sinsheimer, R. L.; Smith, T.; Soll, D.; Stormo, G.; White, R. L.
1987-04-01
The report urges DOE and the Nation to commit to a large, multi-year, multidisciplinary, technological undertaking to order and sequence the human genome. This effort will first require significant innovation in general capability to manipulate DNA, major new analytical methods for ordering and sequencing, theoretical developments in computer science and mathematical biology, and great expansions in our ability to store and manipulate the information and to interface it with other large and diverse genetic databases. The actual ordering and sequencing involves the coordinated processing of some 3 billion bases from a reference human genome. Science is poised on the rudimentary edge of being able to read and understand human genes. A concerted, broadly based, scientific effort to provide new methods of sufficient power and scale should transform this activity from an inefficient one-gene-at-a-time, single laboratory effort into a coordinated, worldwide, comprehensive reading of "the book of man". The effort will be extraordinary in scope and magnitude, but so will be the benefit to biological understanding, new technology and the diagnosis and treatment of human disease.
Thess, Andreas; Grund, Stefanie; Mui, Barbara L; Hope, Michael J; Baumhof, Patrick; Fotin-Mleczek, Mariola; Schlake, Thomas
2015-01-01
Being a transient carrier of genetic information, mRNA could be a versatile, flexible, and safe means for protein therapies. While recent findings highlight the enormous therapeutic potential of mRNA, evidence that mRNA-based protein therapies are feasible beyond small animals such as mice is still lacking. Previous studies imply that mRNA therapeutics require chemical nucleoside modifications to obtain sufficient protein expression and avoid activation of the innate immune system. Here we show that chemically unmodified mRNA can achieve those goals as well by applying sequence-engineered molecules. Using erythropoietin (EPO) driven production of red blood cells as the biological model, engineered Epo mRNA elicited meaningful physiological responses from mice to nonhuman primates. Even in pigs of about 20 kg in weight, a single adequate dose of engineered mRNA encapsulated in lipid nanoparticles (LNPs) induced high systemic Epo levels and strong physiological effects. Our results demonstrate that sequence-engineered mRNA has the potential to revolutionize human protein therapies. PMID:26050989
NASA Astrophysics Data System (ADS)
Charles, Laurence; Cavallo, Gianni; Monnier, Valérie; Oswald, Laurence; Szweda, Roza; Lutz, Jean-François
2017-06-01
In order to improve their MS/MS sequencing, structure of sequence-controlled synthetic polymers can be optimized based on considerations regarding their fragmentation behavior in collision-induced dissociation conditions, as demonstrated here for two digitally encoded polymer families. In poly(triazole amide)s, the main dissociation route proceeded via cleavage of the amide bond in each monomer, hence allowing the chains to be safely sequenced. However, a competitive cleavage of an ether bond in a tri(ethylene glycol) spacer placed between each coding moiety complicated MS/MS spectra while not bringing new structural information. Changing the tri(ethylene glycol) spacer to an alkyl group of the same size allowed this unwanted fragmentation pathway to be avoided, hence greatly simplifying the MS/MS reading step for such undecyl-based poly(triazole amide)s. In poly(alkoxyamine phosphodiester)s, a single dissociation pathway was achieved with repeating units containing an alkoxyamine linkage, which, by very low dissociation energy, made any other chemical bonds MS/MS-silent. Structure of these polymers was further tailored to enhance the stability of those precursor ions with a negatively charged phosphate group per monomer in order to improve their MS/MS readability. Increasing the size of both the alkyl coding moiety and the nitroxide spacer allowed sufficient distance between phosphate groups for all of them to be deprotonated simultaneously. Because the charge state of product ions increased with their polymerization degree, MS/MS spectra typically exhibited groups of fragments at one or the other side of the precursor ion depending on the original α or ω end-group they contain, allowing sequence reconstruction in a straightforward manner.
Deller, Timothy W; Khalighi, Mohammad Mehdi; Jansen, Floris P; Glover, Gary H
2018-01-01
The recent introduction of simultaneous whole-body PET/MR scanners has enabled new research taking advantage of the complementary information obtainable with PET and MRI. One such application is kinetic modeling, which requires high levels of PET quantitative stability. To accomplish the required PET stability levels, the PET subsystem must be sufficiently isolated from the effects of MR activity. Performance measurements have previously been published, demonstrating sufficient PET stability in the presence of MR pulsing for typical clinical use; however, PET stability during radiofrequency (RF)-intensive and gradient-intensive sequences has not previously been evaluated for a clinical whole-body scanner. In this work, PET stability of the GE SIGNA PET/MR was examined during simultaneous scanning of aggressive MR pulse sequences. Methods: PET performance tests were acquired with MR idle and during simultaneous MR pulsing. Recent system improvements mitigating RF interference and gain variation were used. A fast recovery fast spin echo MR sequence was selected for high RF power, and an echo planar imaging sequence was selected for its high heat-inducing gradients. Measurements were performed to determine PET stability under varying MR conditions using the following metrics: sensitivity, scatter fraction, contrast recovery, uniformity, count rate performance, and image quantitation. A final PET quantitative stability assessment for simultaneous PET scanning during functional MRI studies was performed with a spiral in-and-out gradient echo sequence. Results: Quantitation stability of a 68 Ge flood phantom was demonstrated within 0.34%. Normalized sensitivity was stable during simultaneous scanning within 0.3%. Scatter fraction measured with a 68 Ge line source in the scatter phantom was stable within the range of 40.4%-40.6%. Contrast recovery and uniformity were comparable for PET images acquired simultaneously with multiple MR conditions. 
Peak noise equivalent count rate was 224 kcps at an effective activity concentration of 18.6 kBq/mL, and the count rate curves and scatter fraction curve were consistent for the alternating MR pulsing states. A final test demonstrated quantitative stability during a spiral functional MRI sequence. Conclusion: PET stability metrics demonstrated that PET quantitation was not affected during simultaneous aggressive MRI. This stability enables demanding applications such as kinetic modeling. © 2018 by the Society of Nuclear Medicine and Molecular Imaging.
Neptune: a bioinformatics tool for rapid discovery of genomic variation in bacterial populations
Marinier, Eric; Zaheer, Rahat; Berry, Chrystal; Weedmark, Kelly A.; Domaratzki, Michael; Mabon, Philip; Knox, Natalie C.; Reimer, Aleisha R.; Graham, Morag R.; Chui, Linda; Patterson-Fortin, Laura; Zhang, Jian; Pagotto, Franco; Farber, Jeff; Mahony, Jim; Seyer, Karine; Bekal, Sadjia; Tremblay, Cécile; Isaac-Renton, Judy; Prystajecky, Natalie; Chen, Jessica; Slade, Peter
2017-01-01
The ready availability of vast amounts of genomic sequence data has created the need to rethink comparative genomics algorithms using ‘big data’ approaches. Neptune is an efficient system for rapidly locating differentially abundant genomic content in bacterial populations using an exact k-mer matching strategy, while accommodating k-mer mismatches. Neptune’s loci discovery process identifies sequences that are sufficiently common to a group of target sequences and sufficiently absent from non-targets using probabilistic models. Neptune uses parallel computing to efficiently identify and extract these loci from draft genome assemblies without requiring multiple sequence alignments or other computationally expensive comparative sequence analyses. Tests on simulated and real datasets showed that Neptune rapidly identifies regions that are both sensitive and specific. We demonstrate that this system can identify trait-specific loci from different bacterial lineages. Neptune is broadly applicable for comparative bacterial analyses, yet will particularly benefit pathogenomic applications, owing to efficient and sensitive discovery of differentially abundant genomic loci. The software is available for download at: http://github.com/phac-nml/neptune. PMID:29048594
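The idea of k-mers that are "sufficiently common" to targets and "sufficiently absent" from non-targets can be illustrated with a small sketch. This is not Neptune's implementation: Neptune uses probabilistic models and tolerates mismatches, whereas the sketch below assumes exact k-mer matching and fixed fractional cut-offs. The function names and the parameters min_in and max_out are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def signature_kmers(targets, non_targets, k=5, min_in=1.0, max_out=0.0):
    """Find k-mers present in at least min_in of the targets and in at
    most max_out of the non-targets (both given as fractions, 0..1).

    Assumption: fixed cut-offs stand in for the probabilistic models
    described in the abstract.
    """
    in_counts = Counter()
    for seq in targets:
        in_counts.update(kmers(seq, k))  # each genome counts a k-mer once
    out_counts = Counter()
    for seq in non_targets:
        out_counts.update(kmers(seq, k))

    result = set()
    for kmer, n in in_counts.items():
        frac_in = n / len(targets)
        frac_out = out_counts[kmer] / len(non_targets) if non_targets else 0.0
        if frac_in >= min_in and frac_out <= max_out:
            result.add(kmer)
    return result


if __name__ == "__main__":
    targets = ["ACGTTGCA", "GGACGTTG"]
    non_targets = ["ACGTAAAA"]
    print(sorted(signature_kmers(targets, non_targets, k=5)))
```

Relaxing min_in below 1.0 or raising max_out above 0.0 trades specificity for sensitivity, which is the same trade-off the probabilistic thresholds in the abstract are meant to control.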
NASA Astrophysics Data System (ADS)
Han, Zhaofang; Xiao, Shijun; Liu, Xiande; Liu, Yang; Li, Jiakai; Xie, Yangjie; Wang, Zhiyong
2017-03-01
The large yellow croaker, Larimichthys crocea is an important marine fish in China with a high economic value. In the last decade, the stock conservation and aquaculture industry of this species have been facing severe challenges because of wild population collapse and degeneration of important economic traits. However, genes contributing to growth and immunity in L. crocea have not been thoroughly analyzed, and available molecular markers are still not sufficient for genetic resource management and molecular selection. In this work, we sequenced the transcriptome in L. crocea liver tissue with a Roche 454 sequencing platform and assembled the transcriptome into 93 801 transcripts. Of them, 38 856 transcripts were successfully annotated in nt, nr, Swiss-Prot, InterPro, COG, GO and KEGG databases. Based on the annotation information, 3 165 unigenes related to growth and immunity were identified. Additionally, a total of 6 391 simple sequence repeats (SSRs) were identified from the transcriptome, among which 4 498 SSRs had enough flanking regions to design primers for polymerase chain reactions (PCR). To assess the polymorphism of these markers, 30 primer pairs were randomly selected for PCR amplification and validation in 30 individuals, and 12 primer pairs (40.0%) exhibited obvious length polymorphisms. This work applied RNA-Seq to assemble and analyze a liver transcriptome in L. crocea. With gene annotation and sequence information, genes related to growth and immunity were identified and massive SSR markers were developed, providing valuable genetic resources for future gene functional analysis and selective breeding of L. crocea.
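Identifying SSRs with enough flanking sequence for primer design can be sketched with a regular expression. The abstract does not state which tool or thresholds were used, so the motif lengths, the minimum repeat count, and the flank length below are assumptions chosen only for illustration.

```python
import re


def find_ssrs(seq, min_repeats=5, min_motif=2, max_motif=6, flank=0):
    """Return (motif, repeat_count, start, end) for each tandem repeat.

    A hit is kept only if at least `flank` bases lie on both sides,
    mimicking the requirement of room for primer design. All thresholds
    are illustrative assumptions, not the study's settings.
    """
    # Lazy motif group so the shortest repeating unit is tried first.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq):
        start, end = m.span()
        if start >= flank and len(seq) - end >= flank:
            motif = m.group(1)
            hits.append((motif, (end - start) // len(motif), start, end))
    return hits


if __name__ == "__main__":
    example = "TTT" + "AC" * 5 + "GGG"
    print(find_ssrs(example, flank=3))
```

In practice the flank would be tens of bases, and homopolymer runs would need separate handling because a run such as AAAAAAAAAA also matches a two-base motif here.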
Kiraz, Nuri; Oz, Yasemin; Aslan, Huseyin; Erturan, Zayre; Ener, Beyza; Akdagli, Sevtap Arikan; Muslumanoglu, Hamza; Cetinkaya, Zafer
2015-10-01
Although conventional identification of pathogenic fungi is based on the combination of tests evaluating their morphological and biochemical characteristics, these tests can fail to identify less common species or to differentiate closely related species. In addition these tests are time consuming, labour-intensive and require experienced personnel. We evaluated the feasibility and sufficiency of DNA extraction by Whatman FTA filter matrix technology and DNA sequencing of the D1-D2 region of the large ribosomal subunit gene for identification of clinical isolates of 21 yeast and 160 moulds in our clinical mycology laboratory. While the yeast isolates were identified at species level with 100% homology, 102 (63.75%) clinically important mould isolates were identified at species level, 56 (35%) isolates at genus level against fungal sequences existing in DNA databases and two (1.25%) isolates could not be identified. Consequently, Whatman FTA filter matrix technology was a useful method for extraction of fungal DNA; extremely rapid, practical and successful. Sequence analysis of the D1-D2 region of the large ribosomal subunit gene was found sufficient for identification to genus level for most clinical fungi. However, the identification to species level and especially discrimination of closely related species may require additional analysis. © 2015 Blackwell Verlag GmbH.
Shuttle operations simulation model programmers'/users' manual
NASA Technical Reports Server (NTRS)
Porter, D. G.
1972-01-01
The prospective user of the shuttle operations simulation (SOS) model is given sufficient information to enable him to perform simulation studies of the space shuttle launch-to-launch operations cycle. The procedures used for modifying the SOS model to meet user requirements are described. The various control card sequences required to execute the SOS model are given. The report is written for users with varying computer simulation experience. A description of the components of the SOS model is included that presents both an explanation of the logic involved in the simulation of the shuttle operations cycle and a description of the routines used to support the actual simulation.
Event-triggered consensus tracking of multi-agent systems with Lur'e nonlinear dynamics
NASA Astrophysics Data System (ADS)
Huang, Na; Duan, Zhisheng; Wen, Guanghui; Zhao, Yu
2016-05-01
In this paper, the distributed consensus tracking problem for networked Lur'e systems is investigated based on event-triggered information interactions. An event-triggered control algorithm is designed with the advantages of reducing controller update frequency and sensor energy consumption. By using tools of S-procedure and Lyapunov functional method, some sufficient conditions are derived to guarantee that consensus tracking is achieved under a directed communication topology. Meanwhile, it is shown that Zeno behaviour of triggering time sequences is excluded for the proposed event-triggered rule. Finally, some numerical simulations on coupled Chua's circuits are performed to illustrate the effectiveness of the theoretical algorithms.
Efficient high-throughput sequencing of a laser microdissected chromosome arm
2013-01-01
Background Genomic sequence assemblies are key tools for a broad range of gene function and evolutionary studies. The diploid amphibian Xenopus tropicalis plays a pivotal role in these fields due to its combination of experimental flexibility, diploid genome, and early-branching tetrapod taxonomic position, having diverged from the amniote lineage ~360 million years ago. A genome assembly and a genetic linkage map have recently been made available. Unfortunately, large gaps in the linkage map attenuate long-range integrity of the genome assembly. Results We laser dissected the short arm of X. tropicalis chromosome 7 for next generation sequencing and computational mapping to the reference genome. This arm is of particular interest as it encodes the sex determination locus, but its genetic map contains large gaps which undermine available genome assemblies. Whole genome amplification of 15 laser-microdissected 7p arms followed by next generation sequencing yielded ~35 million reads, over four million of which uniquely mapped to the X. tropicalis genome. Our analysis placed more than 200 previously unmapped scaffolds on the analyzed chromosome arm, providing valuable low-resolution physical map information for de novo genome assembly. Conclusion We present a new approach for improving and validating genetic maps and sequence assemblies. Whole genome amplification of 15 microdissected chromosome arms provided sufficient high-quality material for localizing previously unmapped scaffolds and genes as well as recognizing mislocalized scaffolds. PMID:23714049
Molecular variation and distribution of Anopheles fluviatilis (Diptera: Culicidae) complex in Iran.
Naddaf, Saied Reza; Razavi, Mohammad Reza; Bahramali, Golnaz
2010-09-01
Anopheles fluviatilis James (Diptera: Culicidae) is one of the known malaria vectors in south and southeastern Iran. Earlier ITS2 sequences analysis of specimens from Iran demonstrated only a single genotype that was identical to species Y in India, which is also the same as species T. We identified 2 haplotypes in the An. fluviatilis populations of Iran based on differences in nucleotide sequences of D3 domain of the 28S locus of ribosomal DNA (rDNA). Comparison of sequence data from 44 Iranian specimens with those publicly available in the Genbank database showed that all of the 28S-D3 sequences from Kazeroun and Khesht regions in Fars Province were identical to the database entry representing species U in India. In other regions, all the individuals showed heterozygosity at the single nucleotide position, which identifies species U and T. It is argued that the 2 species may co-occur in some regions and hybridize; however, the heterozygosity in the 28S-D3 locus was not reflected in ITS2 sequences and this locus for all individuals was identical to species T. This study shows that in a newly diverged species, like members of An. fluviatilis complex, a single molecular marker may not be sufficiently discriminatory to identify all the taxa over a vast geographical area. In addition, other molecular markers may provide more reliable information for species discrimination.
Differential Effects of Paced and Unpaced Responding on Delayed Serial Order Recall in Schizophrenia
Hill, S. Kristian; Griffin, Ginny B.; Houk, James C.; Sweeney, John A.
2011-01-01
Working memory for temporal order is a component of working memory that is especially dependent on striatal systems, but has not been extensively studied in schizophrenia. This study was designed to characterize serial order reproduction by adapting a spatial serial order task developed for nonhuman primate studies, while controlling for working memory load and whether responses were initiated freely (unpaced) or in an externally paced format. Clinically stable schizophrenia patients (n=27) and psychiatrically healthy individuals (n=25) were comparable on demographic variables and performance on standardized tests of immediate serial order recall (Digit Span, Spatial Span). No group differences were observed for serial order recall when read sequence reproduction was unpaced. However, schizophrenia patients exhibited significant impairments when responding was paced, regardless of sequence length or retention delay. Intact performance by schizophrenia patients during the unpaced condition indicates that prefrontal storage and striatal output systems are sufficiently intact to learn novel response sequences and hold them in working memory to perform serial order tasks. However, retention for newly learned response sequences was disrupted in schizophrenia patients by paced responding, when read-out of each element in the response sequence was externally controlled. The disruption of memory for serial order in paced read-out condition indicates a deficit in frontostriatal interaction characterized by an inability to update working memory stores and deconstruct ‘chunked’ information. PMID:21705197
Derkach, Andriy; Chiang, Theodore; Gong, Jiafen; Addis, Laura; Dobbins, Sara; Tomlinson, Ian; Houlston, Richard; Pal, Deb K.; Strug, Lisa J.
2014-01-01
Motivation: Sufficiently powered case–control studies with next-generation sequence (NGS) data remain prohibitively expensive for many investigators. If feasible, a more efficient strategy would be to include publicly available sequenced controls. However, these studies can be confounded by differences in sequencing platform; alignment, single nucleotide polymorphism and variant calling algorithms; read depth; and selection thresholds. Assuming one can match cases and controls on the basis of ethnicity and other potential confounding factors, and one has access to the aligned reads in both groups, we investigate the effect of systematic differences in read depth and selection threshold when comparing allele frequencies between cases and controls. We propose a novel likelihood-based method, the robust variance score (RVS), that substitutes genotype calls by their expected values given observed sequence data. Results: We show theoretically that the RVS eliminates read depth bias in the estimation of minor allele frequency. We also demonstrate that, using simulated and real NGS data, the RVS method controls Type I error and has comparable power to the ‘gold standard’ analysis with the true underlying genotypes for both common and rare variants. Availability and implementation: An RVS R script and instructions can be found at strug.research.sickkids.ca, and at https://github.com/strug-lab/RVS. Contact: lisa.strug@utoronto.ca Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24733292
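The central substitution in the RVS, replacing a hard genotype call with its expected value given the observed reads, can be written as E[G | D] = sum over g of g * P(G = g | D). The sketch below is only an illustration of that expectation: it assumes the genotype likelihoods P(D | G = g) are already available and uses a Hardy-Weinberg prior built from a minor allele frequency, which is an assumption made here and not necessarily how the authors estimate genotype frequencies.

```python
def expected_genotype(likelihoods, maf):
    """Expected minor-allele count E[G | D] for one individual at one site.

    likelihoods: (P(D|G=0), P(D|G=1), P(D|G=2)) from the aligned reads.
    maf: minor allele frequency used to build a Hardy-Weinberg prior
         (an illustrative assumption).
    """
    prior = ((1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2)
    joint = [lik * p for lik, p in zip(likelihoods, prior)]
    total = sum(joint)
    if total == 0:
        raise ValueError("likelihoods and prior give zero total probability")
    posterior = [j / total for j in joint]
    # Weighted average of genotype values 0, 1, 2.
    return sum(g * p for g, p in enumerate(posterior))


if __name__ == "__main__":
    # Uninformative reads: the expectation falls back to 2 * maf.
    print(expected_genotype((1.0, 1.0, 1.0), maf=0.1))
    # Reads that only support a heterozygote.
    print(expected_genotype((0.0, 1.0, 0.0), maf=0.1))
```

With uninformative reads (low depth) the value shrinks toward the prior mean, while a hard call would be forced to 0, 1 or 2, which illustrates why read depth differences between cases and controls can bias called genotypes.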
Zeng, Qiwei; Chen, Hongyu; Zhang, Chao; Han, Minjing; Li, Tian; Qi, Xiwu; Xiang, Zhonghuai; He, Ningjia
2015-01-01
Mulberry, belonging to the order Rosales, family Moraceae, and genus Morus, has received attention because of both its economic and medicinal value, as well as for its important ecological function. The genus Morus has a worldwide distribution, however, its taxonomy remains complex and disputed. Many studies have attempted to classify Morus species, resulting in varied numbers of designated Morus spp. To address this issue, we used information from internal transcribed spacer (ITS) genetic sequences to study the taxonomy of all the members of generally accepted genus Morus. We found that intraspecific 5.8S rRNA sequences were identical but that interspecific 5.8S sequences were diverse. M. alba and M. notabilis showed the shortest (215 bp) and the longest (233 bp) ITS1 sequence length, respectively. With the completion of the mulberry genome, we could identify single nucleotide polymorphisms within the ITS locus in the M. notabilis genome. From reconstruction of a phylogenetic tree based on the complete ITS data, we propose that the Morus genus should be classified into eight species, including M. alba, M. nigra, M. notabilis, M. serrata, M. celtidifolia, M. insignis, M. rubra, and M. mesozygia. Furthermore, the classification of the ITS sequences of known interspecific hybrid clones into both paternal and maternal clades indicated that ITS variation was sufficient to distinguish interspecific hybrids in the genus Morus. PMID:26266951
Conformational Entropy of Intrinsically Disordered Proteins from Amino Acid Triads
Baruah, Anupaul; Rani, Pooja; Biswas, Parbati
2015-01-01
This work quantitatively characterizes intrinsic disorder in proteins in terms of sequence composition and backbone conformational entropy. Analysis of the normalized relative composition of the amino acid triads highlights a distinct boundary between globular and disordered proteins. The conformational entropy is calculated from the dihedral angles of the middle amino acid in the amino acid triad for the conformational ensemble of the globular, partially and completely disordered proteins relative to the non-redundant database. Both Monte Carlo (MC) and Molecular Dynamics (MD) simulations are used to characterize the conformational ensemble of the representative proteins of each group. The results show that the globular proteins span approximately half of the allowed conformational states in the Ramachandran space, while the amino acid triads in disordered proteins sample the entire range of the allowed dihedral angle space following Flory’s isolated-pair hypothesis. Therefore, only the sequence information in terms of the relative amino acid triad composition may be sufficient to predict protein disorder and the backbone conformational entropy, even in the absence of well-defined structure. The predicted entropies are found to agree with those calculated using mutual information expansion and the histogram method. PMID:26138206
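A histogram estimate of backbone conformational entropy from dihedral angles can be sketched as S/k_B = -sum over bins of p_i ln p_i, where p_i is the fraction of sampled (phi, psi) pairs in bin i. The bin width and function name below are assumptions; the sketch does not reproduce the triad-based analysis or the mutual information expansion mentioned in the abstract.

```python
import math
from collections import Counter


def dihedral_entropy(angles, bin_width=30.0):
    """Histogram entropy (in units of k_B) of (phi, psi) pairs in degrees.

    angles: iterable of (phi, psi) with values in [-180, 180].
    bin_width: grid spacing in degrees (illustrative choice).
    """
    nbins = int(360 / bin_width)

    def bin_index(x):
        # Clamp so that exactly +180 falls in the last bin.
        return min(int((x + 180.0) // bin_width), nbins - 1)

    counts = Counter((bin_index(phi), bin_index(psi)) for phi, psi in angles)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())


if __name__ == "__main__":
    helix_only = [(-60.0, -45.0)] * 4
    mixed = [(-60.0, -45.0), (-60.0, -45.0), (-120.0, 130.0), (-120.0, 130.0)]
    print(dihedral_entropy(helix_only), dihedral_entropy(mixed))
```

An ensemble confined to one region of Ramachandran space gives zero entropy, and one spread evenly over two bins gives ln 2, so broader sampling by disordered proteins shows up as a larger value.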
Bouchez, Valérie; Guglielmini, Julien; Dazas, Mélody; Landier, Annie; Toubiana, Julie; Guillot, Sophie; Criscuolo, Alexis; Brisse, Sylvain
2018-06-01
Bordetella pertussis causes whooping cough, a highly contagious respiratory disease that is reemerging in many world regions. The spread of antigen-deficient strains may threaten acellular vaccine efficacy. Dynamics of strain transmission are poorly defined because of shortcomings in current strain genotyping methods. Our objective was to develop a whole-genome genotyping strategy with sufficient resolution for local epidemiologic questions and sufficient reproducibility to enable international comparisons of clinical isolates. We defined a core genome multilocus sequence typing scheme comprising 2,038 loci and demonstrated its congruence with whole-genome single-nucleotide polymorphism variation. Most cases of intrafamilial groups of isolates or of multiple isolates recovered from the same patient were distinguished from temporally and geographically cocirculating isolates. However, epidemiologically unrelated isolates were sometimes nearly undistinguishable. We set up a publicly accessible core genome multilocus sequence typing database to enable global comparisons of B. pertussis isolates, opening the way for internationally coordinated surveillance.
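Comparing isolates under a core genome multilocus sequence typing scheme usually comes down to counting loci at which two allelic profiles differ. The sketch below assumes profiles are dictionaries mapping a locus name to an allele number, with None for an uncalled locus; the locus names and the handling of missing data are hypothetical and not taken from the study's database.

```python
def allelic_distance(profile_a, profile_b):
    """Count loci with different alleles between two cgMLST profiles.

    Profiles map locus name -> allele number (None if not called).
    Returns (mismatches, loci_compared); loci missing in either
    profile are skipped (an illustrative convention).
    """
    mismatches = 0
    compared = 0
    for locus in profile_a.keys() & profile_b.keys():
        a, b = profile_a[locus], profile_b[locus]
        if a is None or b is None:
            continue
        compared += 1
        if a != b:
            mismatches += 1
    return mismatches, compared


if __name__ == "__main__":
    # Hypothetical four-locus profiles for two isolates.
    isolate_1 = {"locus1": 1, "locus2": 4, "locus3": 2, "locus4": None}
    isolate_2 = {"locus1": 1, "locus2": 5, "locus3": 2, "locus4": 7}
    print(allelic_distance(isolate_1, isolate_2))
```

Reporting the number of loci compared alongside the mismatch count makes it visible when a small distance only reflects many uncalled loci.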
[Multiplexing mapping of human cDNAs]. Final report, September 1, 1991--February 28, 1994
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
Using PCR with automated product analysis, 329 human brain cDNA sequences have been assigned to individual human chromosomes. Primers were designed from single-pass cDNA sequences, expressed sequence tags (ESTs). Primers were used in PCR reactions with DNA from somatic cell hybrid mapping panels as templates, often with multiplexing. Many ESTs mapped match sequence database records. To evaluate these matches, the position of the primers relative to the matching region (In), the BLAST scores and the Poisson probability values of the EST/sequence record match were determined. In cases where the gene product stringently identified by the sequence match had already been mapped, the gene locus determined by EST was consistent with the previous position, which strongly supports the validity of assigning unknown genes to human chromosomes based on the EST sequence matches. In the present cases mapping the ESTs to a chromosome can also be considered to have mapped the known gene product: rolipram-sensitive cAMP phosphodiesterase, chromosome 1; protein phosphatase 2Aβ, chromosome 4; alpha-catenin, chromosome 5; the ELE1 oncogene, chromosome 10q11.2 or q2.1-q23; MXII protein, chromosome 10q24-qter; ribosomal protein L18a homologue, chromosome 14; ribosomal protein L3, chromosome 17; and moesin, Xp11-cen. There were also ESTs mapped that were closely related to non-human sequence records. These matches therefore can be considered to identify human counterparts of known gene products, or members of known gene families. Examples of these include membrane proteins, translation-associated proteins, structural proteins, and enzymes. These data then demonstrate that single pass sequence information is sufficient to design PCR primers useful for assigning cDNA sequences to human chromosomes. 
When the EST sequence matches previous sequence database records, the chromosome assignments of the EST can be used to make preliminary assignments of the human gene to a chromosome.
Ramu, P; Kassahun, B; Senthilvel, S; Ashok Kumar, C; Jayashree, B; Folkertsma, R T; Reddy, L Ananda; Kuruvinashetti, M S; Haussmann, B I G; Hash, C T
2009-11-01
The sequencing and detailed comparative functional analysis of genomes of a number of select botanical models open new doors into comparative genomics among the angiosperms, with potential benefits for improvement of many orphan crops that feed large populations. In this study, a set of simple sequence repeat (SSR) markers was developed by mining the expressed sequence tag (EST) database of sorghum. Among the SSR-containing sequences, only those sharing considerable homology with rice genomic sequences across the lengths of the 12 rice chromosomes were selected. Thus, 600 SSR-containing sorghum EST sequences (50 homologous sequences on each of the 12 rice chromosomes) were selected, with the intention of providing coverage for corresponding homologous regions of the sorghum genome. Primer pairs were designed and polymorphism detection ability was assessed using parental pairs of two existing sorghum mapping populations. About 28% of these new markers detected polymorphism in this 4-entry panel. A subset of 55 polymorphic EST-derived SSR markers were mapped onto the existing skeleton map of a recombinant inbred population derived from cross N13 x E 36-1, which is segregating for Striga resistance and the stay-green component of terminal drought tolerance. These new EST-derived SSR markers mapped across all 10 sorghum linkage groups, mostly to regions expected based on prior knowledge of rice-sorghum synteny. The ESTs from which these markers were derived were then mapped in silico onto the aligned sorghum genome sequence, and 88% of the best hits corresponded to linkage-based positions. This study demonstrates the utility of comparative genomic information in targeted development of markers to fill gaps in linkage maps of related crop species for which sufficient genomic tools are not available.
de Bruijn, Gert-Jan; Visscher, Ilse; Mollen, Saar
2015-01-01
To test the effects of descriptive norm and message framing on fruit intake (intentions) in Dutch adults. Randomized pretest-posttest study using a 2 × 2 design. Internet-based. Dutch adults recruited via leaflets and announcements on intranet and Internet and who provided immediate intention (n = 294) and 1-week follow-up intention and fruit intake data (n = 177). Messages combining information on intake of others (low vs high intake) with information about positive or negative outcomes of (in)sufficient fruit intake. Fruit intake intentions and fruit intake. Analyses of covariance. Those already consuming sufficient fruit and receiving negative information about insufficient fruit intake increased their motivation to consume sufficient fruit immediately (P = .03), but not at 1-week follow-up. Those who read positive information about sufficient fruit intake reported higher fruit consumption than those who read negative information about insufficient fruit intake (P = .03). This was stronger in those already consuming sufficient fruit. There were no effects of descriptive norm information (P > .19). Information about outcomes was more persuasive than descriptive majority norm information. Effects were generally stronger in those already consuming sufficient fruit. Copyright © 2015 Society for Nutrition Education and Behavior. Published by Elsevier Inc. All rights reserved.
Whole-Genome Sequences of Thirteen Isolates of Borrelia burgdorferi
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schutzer S. E.; Dunn J.; Fraser-Liggett, C. M.
2011-02-01
Borrelia burgdorferi is a causative agent of Lyme disease in North America and Eurasia. The first complete genome sequence of B. burgdorferi strain B31, available for more than a decade, has assisted research on the pathogenesis of Lyme disease. Because a single genome sequence is not sufficient to understand the relationship between genotypic and geographic variation and disease phenotype, we determined the whole-genome sequences of 13 additional B. burgdorferi isolates that span the range of natural variation. These sequences should allow improved understanding of pathogenesis and provide a foundation for novel detection, diagnosis, and prevention strategies.
Using string alignment in a query-by-humming system for real world applications
NASA Astrophysics Data System (ADS)
Sailer, Christian
2005-09-01
Though query by humming (i.e., retrieving music or information about music by singing a characteristic melody) has been a popular research topic during the past decade, few approaches have reached a level of usefulness beyond mere scientific interest. One of the main problems is the inherent contradiction between error tolerance and discriminative power in conventional melody matching algorithms that rely on a melody contour approach to handle intonation or transcription errors. Adopting the string matching/alignment techniques from bioinformatics to melody sequences allows the similarity between two melodies to be assessed directly. This method takes an MPEG-7 compliant melody sequence (i.e., a list of note intervals and length ratios) as query and evaluates the steps necessary to transform it into the reference sequence. By introducing a musically founded cost-of-replace function and adequate post-processing, this method yields a measure for melodic similarity. Thus it is possible to construct a query by humming system that can properly discriminate between thousands of melodies and still be sufficiently error tolerant to be used by untrained singers. The robustness has been verified in extensive tests and real world applications.
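The alignment step, counting the cost of transforming a query melody into a reference, can be sketched as a weighted edit distance over note-interval sequences. The abstract does not give the actual cost-of-replace function, so replace_cost below is a hypothetical stand-in (interval error in semitones scaled by an octave and capped at 1), and length ratios are ignored for brevity.

```python
def replace_cost(a, b):
    """Hypothetical replace cost: small interval errors are cheap.

    a, b: note intervals in semitones. The scaling by 12 and the cap
    at 1.0 are assumptions, not the published cost function.
    """
    return min(abs(a - b) / 12.0, 1.0)


def melody_distance(query, reference, indel=1.0):
    """Weighted edit distance between two interval sequences."""
    rows, cols = len(query) + 1, len(reference) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = i * indel  # delete every query note
    for j in range(1, cols):
        d[0][j] = j * indel  # insert every reference note
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(
                d[i - 1][j] + indel,      # deletion
                d[i][j - 1] + indel,      # insertion
                d[i - 1][j - 1] + replace_cost(query[i - 1], reference[j - 1]),
            )
    return d[-1][-1]


if __name__ == "__main__":
    reference = [2, 2, -4, 5]
    sung = [2, 3, -4, 5]  # one interval a semitone sharp
    print(melody_distance(sung, reference))
```

Because a slightly mistuned interval costs far less than a missing or extra note, the measure stays tolerant of intonation errors while still separating melodies that differ in structure.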
Mammoth and Mastodon collagen sequences; survival and utility
NASA Astrophysics Data System (ADS)
Buckley, M.; Larkin, N.; Collins, M.
2011-04-01
Near-complete collagen (I) sequences are proposed for elephantid and mammutid taxa, based upon available African elephant genomic data and supported with LC-MALDI-MS/MS and LC-ESI-MS/MS analyses of collagen digests from proboscidean bone. Collagen sequence coverage was investigated from several specimens of two extinct mammoths (Mammuthus trogontherii and Mammuthus primigenius), the extinct American mastodon (Mammut americanum), the extinct straight-tusked elephant (Elephas (Palaeoloxodon) antiquus) and extant Asian (Elephas maximus) and African (Loxodonta africana) elephants and compared between the two ionization techniques used. Two suspected mammoth fossils from the British Middle Pleistocene (Cromerian) deposits of the West Runton Forest Bed were analysed to investigate the potential use of peptide mass spectrometry for fossil identification. Despite the age of the fossils, sufficient peptides were obtained to identify these as elephantid, and sufficient sequence variation to discriminate elephantid and mammutid collagen (I). In-depth LC-MS analyses further failed to identify a peptide that could be used to reliably distinguish between the three genera of elephantids (Elephas, Loxodonta and Mammuthus), an observation consistent with predicted amino acid substitution rates between these species.
DNA Extraction Protocols for Whole-Genome Sequencing in Marine Organisms.
Panova, Marina; Aronsson, Henrik; Cameron, R Andrew; Dahl, Peter; Godhe, Anna; Lind, Ulrika; Ortega-Martinez, Olga; Pereyra, Ricardo; Tesson, Sylvie V M; Wrange, Anna-Lisa; Blomberg, Anders; Johannesson, Kerstin
2016-01-01
The marine environment harbors a large proportion of the total biodiversity on this planet, including the majority of the earth's different phyla and classes. Studying the genomes of marine organisms can bring interesting insights into genome evolution. Today, almost all marine organismal groups are understudied with respect to their genomes. One potential reason is that extraction of high-quality DNA in sufficient amounts is challenging for many marine species. This is due to high polysaccharide content, polyphenols and other secondary metabolites that will inhibit downstream DNA library preparations. Consequently, protocols developed for vertebrates and plants do not always perform well for invertebrates and algae. In addition, many marine species have large population sizes and, as a consequence, highly variable genomes. Thus, to facilitate the sequence read assembly process during genome sequencing, it is desirable to obtain enough DNA from a single individual, which is a challenge in many species of invertebrates and algae. Here, we present DNA extraction protocols for seven marine species (four invertebrates, two algae, and a marine yeast), optimized to provide sufficient DNA quality and yield for de novo genome sequencing projects.
An Exchange-Only Qubit in Isotopically Enriched 28Si
NASA Astrophysics Data System (ADS)
Gyure, Mark
2015-03-01
We demonstrate coherent manipulation and universal control of a qubit composed of a triple quantum dot implemented in an isotopically enhanced Si/SiGe heterostructure, which requires no local AC or DC magnetic fields for operation. Strong control over tunnel rates is enabled by a dopantless, accumulation-only device design, and an integrated measurement dot enables single-shot measurement. Reduction of magnetic noise is achieved via isotopic purification of the silicon quantum well. We demonstrate universal control using composite pulses and employ these pulses for spin-echo-type sequences to measure both magnetic noise and charge noise. The noise measured is sufficiently low to enable the long pulse sequences required for exchange-only quantum information processing. Sponsored by United States Department of Defense. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressly or implied, of the United States Department of Defense or the U.S. Government. Approved for public release, distribution unlimited.
Classification of HCV and HIV-1 Sequences with the Branching Index
Hraber, Peter; Kuiken, Carla; Waugh, Mark; Geer, Shaun; Bruno, William J.; Leitner, Thomas
2009-01-01
SUMMARY Classification of viral sequences should be fast, objective, accurate, and reproducible. Most methods that classify sequences use either pairwise distances or phylogenetic relations, but cannot discern when a sequence is unclassifiable. The branching index (BI) combines distance and phylogeny methods to compute a ratio that quantifies how closely a query sequence clusters with a subtype clade. In the hypothesis-testing framework of statistical inference, the BI is compared with a threshold to test whether sufficient evidence exists for the query sequence to be classified among known sequences. If above the threshold, the null hypothesis of no support for the subtype relation is rejected and the sequence is taken as belonging to the subtype clade with which it clusters on the tree. This study evaluates statistical properties of the branching index for subtype classification in HCV and HIV-1. Pairs of BI values with known positive and negative test results were computed from 10,000 random fragments of reference alignments. Sampled fragments were of sufficient length to contain phylogenetic signal that groups reference sequences together properly into subtype clades. For HCV, a threshold BI of 0.71 yields 95.1% agreement with reference subtypes, with equal false positive and false negative rates. For HIV-1, a threshold of 0.66 yields 93.5% agreement. Higher thresholds can be used where lower false positive rates are required. In synthetic recombinants, regions without breakpoints are recognized accurately; regions with breakpoints do not uniquely represent any known subtype. Web-based services for viral subtype classification with the branching index are available online. PMID:18753218
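The threshold test described in this abstract can be sketched in a few lines. This is only an illustration of the decision rule, assuming the branching index has already been computed elsewhere; the function name `classify_by_bi` and the dictionary `BI_THRESHOLDS` are hypothetical, and only the two threshold values come from the abstract.

```python
# Thresholds reported in the abstract; everything else here is illustrative.
BI_THRESHOLDS = {"HCV": 0.71, "HIV-1": 0.66}


def classify_by_bi(virus, clade, branching_index):
    """Return the clade if the BI exceeds the virus-specific threshold.

    Returns None when there is not enough support, i.e. the query is
    treated as unclassifiable rather than forced into a subtype.
    """
    threshold = BI_THRESHOLDS[virus]
    if branching_index > threshold:
        # Null hypothesis (no support for the subtype relation) is rejected.
        return clade
    return None


if __name__ == "__main__":
    print(classify_by_bi("HCV", "1a", 0.80))
    print(classify_by_bi("HIV-1", "B", 0.50))
```

A stricter caller could pass in a higher threshold where lower false positive rates are required, as the abstract suggests.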
Risks Posed by Reston, the Forgotten Ebolavirus
Cantoni, Diego; Hamlet, Arran; Michaelis, Martin; Wass, Mark N.
2016-01-01
ABSTRACT Out of the five members of the Ebolavirus family, four cause life-threatening disease, whereas the fifth, Reston virus (RESTV), is nonpathogenic in humans. The reasons for this discrepancy remain unclear. In this review, we analyze the currently available information to provide a state-of-the-art summary of the factors that determine the human pathogenicity of Ebolaviruses. RESTV causes sporadic infections in cynomolgus monkeys and is found in domestic pigs throughout the Philippines and China. Phylogenetic analyses revealed that RESTV is most closely related to the Sudan virus, which causes a high mortality rate in humans. Amino acid sequence differences between RESTV and the other Ebolaviruses are found in all nine Ebolavirus proteins, though no one residue appears sufficient to confer pathogenicity. Changes in the glycoprotein contribute to differences in Ebolavirus pathogenicity but are not sufficient to confer pathogenicity on their own. Similarly, differences in VP24 and VP35 affect viral immune evasion and are associated with changes in human pathogenicity. A recent in silico analysis systematically determined the functional consequences of sequence variations between RESTV and human-pathogenic Ebolaviruses. Multiple positions in VP24 were differently conserved between RESTV and the other Ebolaviruses and may alter human pathogenicity. In conclusion, the factors that determine the pathogenicity of Ebolaviruses in humans remain insufficiently understood. An improved understanding of these pathogenicity-determining factors is of crucial importance for disease prevention and for the early detection of emergent and potentially human-pathogenic RESTVs. PMID:28066813
NASA Astrophysics Data System (ADS)
James, Ryan G.; Mahoney, John R.; Crutchfield, James P.
2017-06-01
One of the most basic characterizations of the relationship between two random variables, X and Y, is the value of their mutual information. Unfortunately, calculating it analytically and estimating it empirically are often stymied by the extremely large dimension of the variables. One might hope to replace such a high-dimensional variable by a smaller one that preserves its relationship with the other. It is well known that either X (or Y) can be replaced by its minimal sufficient statistic about Y (or X) while preserving the mutual information. While intuitively reasonable, it is not obvious or straightforward that both variables can be replaced simultaneously. We demonstrate that this is in fact possible: the information X's minimal sufficient statistic preserves about Y is exactly the information that Y's minimal sufficient statistic preserves about X. We call this procedure information trimming. As an important corollary, we consider the case where one variable is a stochastic process' past and the other its future. In this case, the mutual information is the channel transmission rate between the channel's effective states. That is, the past-future mutual information (the excess entropy) is the amount of information about the future that can be predicted using the past. Translating our result about minimal sufficient statistics, this is equivalent to the mutual information between the forward- and reverse-time causal states of computational mechanics. We close by discussing multivariate extensions to this use of minimal sufficient statistics.
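The claims in this abstract can be written compactly as follows. The notation is an assumption made for illustration and is not taken from the paper: f(X) stands for the minimal sufficient statistic of X about Y, g(Y) for the minimal sufficient statistic of Y about X, and S^+ and S^- for the forward- and reverse-time causal states.

```latex
% Notation assumed for illustration only (not taken from the source):
%   f(X): minimal sufficient statistic of X about Y
%   g(Y): minimal sufficient statistic of Y about X
\begin{align}
I[X : Y] &= I[f(X) : Y] = I[X : g(Y)]
  && \text{(one-sided replacement, well known)} \\
I[X : Y] &= I[f(X) : g(Y)]
  && \text{(simultaneous replacement, ``information trimming'')} \\
\mathbf{E} = I[\text{past} : \text{future}] &= I[\mathcal{S}^{+} : \mathcal{S}^{-}]
  && \text{(corollary for a stochastic process)}
\end{align}
```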
Scaglione, Davide; Acquadro, Alberto; Portis, Ezio; Taylor, Christopher A; Lanteri, Sergio; Knapp, Steven J
2009-01-01
Background The globe artichoke (Cynara cardunculus var. scolymus L.) is a significant crop in the Mediterranean basin. Despite its commercial importance and both its dietary and pharmaceutical value, knowledge of its genetics and genomics remains scant. Microsatellite markers have become a key tool in genetic and genomic analysis, and we have exploited recently acquired EST (expressed sequence tag) sequence data (Composite Genome Project - CGP) to develop an extensive set of microsatellite markers. Results A unigene assembly was created from over 36,000 globe artichoke EST sequences, containing 6,621 contigs and 12,434 singletons. Over 12,000 of these unigenes were functionally assigned on the basis of homology with Arabidopsis thaliana reference proteins. A total of 4,219 perfect repeats, located within 3,308 unigenes, were identified, and gene ontology (GO) analysis highlighted enrichment of some GO terms among different classes of microsatellites with respect to their position. Sufficient flanking sequence was available to enable the design of primers to amplify 2,311 of these microsatellites, and a set of 300 was tested against a DNA panel derived from 28 C. cardunculus genotypes. Consistent amplification and polymorphism were obtained from 236 of these assays. Their polymorphic information content (PIC) ranged from 0.04 to 0.90 (mean 0.66). Between 176 and 198 of the assays were informative in at least one of the three available mapping populations. Conclusion EST-based microsatellites have provided a large set of de novo genetic markers, which show significant amounts of polymorphism both between and within the three taxa of C. cardunculus. They are thus well suited as assays for phylogenetic analysis, the construction of genetic maps, marker-assisted breeding, transcript mapping and other genomic applications in the species. PMID:19785740
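For readers unfamiliar with the PIC values quoted above, the sketch below computes PIC from allele frequencies using the widely used formula of Botstein et al. (1980), PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2. The abstract does not state which formula the authors used, so this choice, the function name `pic`, and the example frequencies are assumptions.

```python
def pic(allele_freqs):
    """Polymorphic information content from allele frequencies.

    Assumes the Botstein et al. (1980) formula; frequencies must sum to 1.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    n = len(allele_freqs)
    # Probability that two random alleles are identical.
    homozygosity = sum(p * p for p in allele_freqs)
    # Correction for matings that are heterozygous but uninformative.
    correction = sum(
        2.0 * allele_freqs[i] ** 2 * allele_freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return 1.0 - homozygosity - correction


if __name__ == "__main__":
    # Two equally frequent alleles (made-up example, not data from the study).
    print(pic([0.5, 0.5]))
```

A marker with a single allele gives a PIC of 0, and the value grows as more alleles with balanced frequencies are present.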
Ortí, G; Meyer, A
1996-04-01
The rate and pattern of DNA evolution of ependymin, a single-copy gene coding for a highly expressed glycoprotein in the brain matrix of teleost fishes, is characterized and its phylogenetic utility for fish systematics is assessed. DNA sequences were determined from catfish, electric fish, and characiforms and compared with published ependymin sequences from cyprinids, salmon, pike, and herring. Among these groups, ependymin amino acid sequences were highly divergent (up to 60% sequence difference), but had surprisingly similar hydropathy profiles and invariant glycosylation sites, suggesting that functional properties of the proteins are conserved. Comparison of base composition at third codon positions and introns revealed AT-rich introns and GC-rich third codon positions, suggesting that the biased codon usage observed might not be due to mutational bias. Phylogenetic information content of third codon positions was surprisingly high and sufficient to recover the most basal nodes of the tree, in spite of the observation that pairwise distances (at third codon positions) were well above the presumed saturation level. This finding can be explained by the high proportion of phylogenetically informative nonsynonymous changes at third codon positions among these highly divergent proteins. Ependymin DNA sequences have established the first molecular evidence for the monophyly of a group containing salmonids and esociforms. In addition, ependymin suggests a sister group relationship of electric fish (Gymnotiformes) and Characiformes, constituting a significant departure from currently accepted classifications. However, relationships among characiform lineages were not completely resolved by ependymin sequences in spite of seemingly appropriate levels of variation among taxa and considerably low levels of homoplasy in the data (consistency index = 0.7). 
If the diversification of Characiformes took place in an "explosive" manner, over a relatively short period of time, this pattern should also be observed using other phylogenetic markers. Poor conservation of ependymin's primary structure hinders the design of efficient primers for PCR that could be used in wide-ranging fish systematic studies. However, alternative methods like PCR amplification from cDNA used here should provide promising comparative sequence data for the resolution of phylogenetic relationships among other basal lineages of teleost fishes.
Tarr, Sarah J; Cryar, Adam; Thalassinos, Konstantinos; Haldar, Kasturi; Osborne, Andrew R
2013-01-01
The malaria parasite exports proteins across its plasma membrane and a surrounding parasitophorous vacuole membrane, into its host erythrocyte. Most exported proteins contain a Host Targeting motif (HT motif) that targets them for export. In the parasite secretory pathway, the HT motif is cleaved by the protease plasmepsin V, but the role of the newly generated N-terminal sequence in protein export is unclear. Using a model protein that is cleaved by an exogenous viral protease, we show that the new N-terminal sequence, normally generated by plasmepsin V cleavage, is sufficient to target a protein for export, and that cleavage by plasmepsin V is not coupled directly to the transfer of a protein to the next component in the export pathway. Mutation of the fourth and fifth positions of the HT motif, as well as amino acids further downstream, block or affect the efficiency of protein export indicating that this region is necessary for efficient export. We also show that the fifth position of the HT motif is important for plasmepsin V cleavage. Our results indicate that plasmepsin V cleavage is required to generate a new N-terminal sequence that is necessary and sufficient to mediate protein export by the malaria parasite. PMID:23279267
On the Topical Structure of Medical Charts
Archbold, Armar A.; Evans, David A.
1989-01-01
In a study of 55 H&P sections of hospital charts, we tested the hypothesis that topic-sub-topic sequencing is sufficiently regular to provide ‘missing’ information in the construction of explicit propositions from elliptical text. ‘Propositions’ were taken to be frames with the slots topic, sub-topic, method, site, attribute, value, and qualifier. Topic was identifiable in 96% of all cases; attribute-value pairs were uniquely recoverable from topics in 69% of all cases; site was co-determined by topic, method, and attribute. Our results suggest that uncertainties in the automated processing of H&P statements can be overcome by appealing to knowledge about the topical structure of medical charts.
Simple method for experimentally testing any form of quantum contextuality
NASA Astrophysics Data System (ADS)
Cabello, Adán
2016-03-01
Contextuality provides a unifying paradigm for nonclassical aspects of quantum probabilities and resources of quantum information. Unfortunately, most forms of quantum contextuality remain experimentally unexplored due to the difficulty of performing sequences of projective measurements on individual quantum systems. Here we show that two-point correlations between binary compatible observables are sufficient to reveal any form of contextuality. This allows us to design simple experiments that are more robust against imperfections and easier to analyze, thus opening the door for observing interesting forms of contextuality, including those requiring quantum systems of high dimensions. In addition, it allows us to connect contextuality to communication complexity scenarios and reformulate a recent result relating contextuality and quantum computation.
Merkley, Eric D.; Sego, Landon H.; Lin, Andy; ...
2017-08-30
Adaptive processes in bacterial species can occur rapidly in laboratory culture, leading to genetic divergence between naturally occurring and laboratory-adapted strains. Differentiating wild and closely-related laboratory strains is clearly important for biodefense and bioforensics; however, DNA sequence data alone has thus far not provided a clear signature, perhaps due to lack of understanding of how diverse genome changes lead to adapted phenotypes. Protein abundance profiles from mass spectrometry-based proteomics analyses are a molecular measure of phenotype. Proteomics data contains sufficient information that powerful statistical methods can uncover signatures that distinguish wild strains of Yersinia pestis from laboratory-adapted strains.
NASA Astrophysics Data System (ADS)
Kojima, Yasufumi; Okamoto, Satoki
2018-04-01
A magnetar's magnetosphere gradually evolves by the injection of energy and helicity from the interior. Axisymmetric static solutions for a relativistic force-free magnetosphere with a power-law current model are numerically obtained. They provide information about the configurations in which the stored energy is large. The energy along a sequence of equilibria increases and becomes sufficient to open the magnetic field. A magnetic flux rope, in which a large amount of toroidal field is confined, is formed in the vicinity of the star, for states exceeding the open field energy. These states are energetically metastable, and the excess energy may be ejected as a magnetar outburst.
A vertebrate case study of the quality of assemblies derived from next-generation sequences
2011-01-01
The unparalleled efficiency of next-generation sequencing (NGS) has prompted widespread adoption, but significant problems remain in the use of NGS data for whole genome assembly. We explore the advantages and disadvantages of chicken genome assemblies generated using a variety of sequencing and assembly methodologies. NGS assemblies are equivalent in some ways to a Sanger-based assembly yet deficient in others. Nonetheless, these assemblies are sufficient for the identification of the majority of genes and can reveal novel sequences when compared to existing assembly references. PMID:21453517
SAM: String-based sequence search algorithm for mitochondrial DNA database queries
Röck, Alexander; Irwin, Jodi; Dür, Arne; Parsons, Thomas; Parson, Walther
2011-01-01
The analysis of the haploid mitochondrial (mt) genome has numerous applications in forensic and population genetics, as well as in disease studies. Although mtDNA haplotypes are usually determined by sequencing, they are rarely reported as a nucleotide string. Traditionally they are presented in a difference-coded position-based format relative to the corrected version of the first sequenced mtDNA. This convention requires recommendations for standardized sequence alignment that is known to vary between scientific disciplines, even between laboratories. As a consequence, database searches that are vital for the interpretation of mtDNA data can suffer from biased results when query and database haplotypes are annotated differently. In the forensic context that would usually lead to underestimation of the absolute and relative frequencies. To address this issue we introduce SAM, a string-based search algorithm that converts query and database sequences to position-free nucleotide strings and thus eliminates the possibility that identical sequences will be missed in a database query. The mere application of a BLAST algorithm would not be a sufficient remedy as it uses a heuristic approach and does not address properties specific to mtDNA, such as phylogenetically stable but also rapidly evolving insertion and deletion events. The software presented here provides additional flexibility to incorporate phylogenetic data, site-specific mutation rates, and other biologically relevant information that would refine the interpretation of mitochondrial DNA data. The manuscript is accompanied by freeware and example data sets that can be used to evaluate the new software (http://stringvalidation.org). PMID:21056022
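The core idea described above, turning a difference-coded haplotype into a position-free nucleotide string so that alignment conventions no longer matter, can be illustrated with a toy sketch. This is not the SAM software: the function name `apply_differences`, the simplified notation ("3T" for a substitution, "5del" for a deletion, "4.1C" for an insertion after position 4), and the short toy reference are all assumptions made for the example.

```python
import re

# Simplified, hypothetical notation: "<pos><base>", "<pos>del", "<pos>.<n><base>"
_DIFF = re.compile(r"(\d+)(?:\.(\d+))?([ACGT]|del)")


def apply_differences(reference, differences):
    """Rebuild the full nucleotide string from a difference-coded haplotype."""
    slots = list(reference)   # slots[i] is the base at reference position i + 1
    inserts = {}              # position -> {insertion index -> base}
    for diff in differences:
        m = _DIFF.fullmatch(diff)
        if m is None:
            raise ValueError(f"unrecognised difference: {diff!r}")
        pos, ins_idx, base = int(m[1]), m[2], m[3]
        if not 1 <= pos <= len(reference):
            raise ValueError(f"position out of range: {diff!r}")
        if ins_idx is not None:
            if base == "del":
                raise ValueError(f"insertion needs a base: {diff!r}")
            inserts.setdefault(pos, {})[int(ins_idx)] = base
        elif base == "del":
            slots[pos - 1] = ""
        else:
            slots[pos - 1] = base
    out = []
    for pos, base in enumerate(slots, start=1):
        out.append(base)
        # Emit insertions recorded after this reference position, in order.
        for idx in sorted(inserts.get(pos, {})):
            out.append(inserts[pos][idx])
    return "".join(out)


if __name__ == "__main__":
    toy_reference = "ACCCT"  # toy sequence, not the real mtDNA reference
    # Two laboratories annotate the same extra C in the C-run differently.
    lab_a = apply_differences(toy_reference, ["4.1C"])
    lab_b = apply_differences(toy_reference, ["2.1C"])
    print(lab_a, lab_b, lab_a == lab_b)
```

Comparing the rebuilt strings rather than the annotations means the two differently annotated haplotypes above are recognised as identical, which is the failure mode of position-based searches that the abstract describes.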
Lisi, Simonetta; Chirichella, Michele; Arisi, Ivan; Goracci, Martina; Cremisi, Federico; Cattaneo, Antonino
2017-01-01
Antibody libraries are important resources to derive antibodies to be used for a wide range of applications, from structural and functional studies to intracellular protein interference studies to developing new diagnostics and therapeutics. Whatever the goal, the key parameter for an antibody library is its complexity (also known as diversity), i.e. the number of distinct elements in the collection, which directly reflects the probability of finding in the library an antibody against a given antigen, of sufficiently high affinity. Quantitative evaluation of antibody library complexity and quality has long been inadequately addressed, due to the high similarity and length of the sequences of the library. Complexity was usually inferred by the transformation efficiency and tested either by fingerprinting and/or sequencing of a few hundred random library elements. Inferring complexity from such a small sampling is, however, rudimentary and gives limited information about the real diversity, because complexity does not scale linearly with sample size. Next-generation sequencing (NGS) has opened new ways to tackle the antibody library complexity quality assessment. However, much remains to be done to fully exploit the potential of NGS for the quantitative analysis of antibody repertoires and to overcome current limitations. To obtain a more reliable estimate of antibody library complexity, we present a new PCR-free NGS approach for sequencing antibody libraries on the Illumina platform, coupled to a new bioinformatic analysis and software (Diversity Estimator of Antibody Library, DEAL) that reliably estimates the complexity while taking the sequencing error into consideration. PMID:28505201
Fantini, Marco; Pandolfini, Luca; Lisi, Simonetta; Chirichella, Michele; Arisi, Ivan; Terrigno, Marco; Goracci, Martina; Cremisi, Federico; Cattaneo, Antonino
2017-01-01
Antibody libraries are important resources to derive antibodies to be used for a wide range of applications, from structural and functional studies to intracellular protein interference studies to developing new diagnostics and therapeutics. Whatever the goal, the key parameter for an antibody library is its complexity (also known as diversity), i.e. the number of distinct elements in the collection, which directly reflects the probability of finding in the library an antibody against a given antigen, of sufficiently high affinity. Quantitative evaluation of antibody library complexity and quality has long been inadequately addressed, due to the high similarity and length of the sequences of the library. Complexity was usually inferred by the transformation efficiency and tested either by fingerprinting and/or sequencing of a few hundred random library elements. Inferring complexity from such a small sampling is, however, rudimentary and gives limited information about the real diversity, because complexity does not scale linearly with sample size. Next-generation sequencing (NGS) has opened new ways to tackle the antibody library complexity quality assessment. However, much remains to be done to fully exploit the potential of NGS for the quantitative analysis of antibody repertoires and to overcome current limitations. To obtain a more reliable estimate of antibody library complexity, we present a new PCR-free NGS approach for sequencing antibody libraries on the Illumina platform, coupled to a new bioinformatic analysis and software (Diversity Estimator of Antibody Library, DEAL) that reliably estimates the complexity while taking the sequencing error into consideration.
Convolutional neural network architectures for predicting DNA–protein binding
Zeng, Haoyang; Edwards, Matthew D.; Liu, Ge; Gifford, David K.
2016-01-01
Motivation: Convolutional neural networks (CNN) have outperformed conventional methods in modeling the sequence specificity of DNA–protein binding. Yet inappropriate CNN architectures can yield poorer performance than simpler models. Thus an in-depth understanding of how to match CNN architecture to a given task is needed to fully harness the power of CNNs for computational biology applications. Results: We present a systematic exploration of CNN architectures for predicting DNA sequence binding using a large compendium of transcription factor datasets. We identify the best-performing architectures by varying CNN width, depth and pooling designs. We find that adding convolutional kernels to a network is important for motif-based tasks. We show the benefits of CNNs in learning rich higher-order sequence features, such as secondary motifs and local sequence context, by comparing network performance on multiple modeling tasks ranging in difficulty. We also demonstrate how careful construction of sequence benchmark datasets, using approaches that control potentially confounding effects like positional or motif strength bias, is critical in making fair comparisons between competing methods. We explore how to establish the sufficiency of training data for these learning tasks, and we have created a flexible cloud-based framework that permits the rapid exploration of alternative neural network architectures for problems in computational biology. Availability and Implementation: All the models analyzed are available at http://cnn.csail.mit.edu. Contact: gifford@mit.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307608
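As a rough illustration of what a single convolutional kernel does in this setting, the sketch below one-hot encodes a DNA sequence, slides one kernel over it, applies a ReLU, and takes the global maximum. It is a hand-written toy in plain Python, not the authors' framework or any of their architectures; the names `one_hot`, `motif_kernel`, and `conv_relu_maxpool` and the example motif are assumptions.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]


def motif_kernel(motif):
    """Build a kernel that scores +1 for each position matching the motif."""
    return one_hot(motif)


def conv_relu_maxpool(seq, kernel):
    """Slide one kernel over the sequence, apply ReLU, then global max pool.

    Assumes len(seq) >= len(kernel); a trained CNN would learn many such
    kernels with real-valued weights instead of this hand-made one.
    """
    x = one_hot(seq)
    width = len(kernel)
    activations = []
    for start in range(len(x) - width + 1):
        score = sum(
            x[start + i][c] * kernel[i][c]
            for i in range(width)
            for c in range(len(BASES))
        )
        activations.append(max(score, 0.0))  # ReLU
    return max(activations)                   # global max pooling


if __name__ == "__main__":
    kernel = motif_kernel("TGA")
    print(conv_relu_maxpool("AATGAC", kernel))  # contains the full motif
    print(conv_relu_maxpool("AAAAAA", kernel))  # only a partial match
```

Each additional kernel adds another motif detector of this kind, which gives some intuition for why the number of kernels matters for motif-based tasks.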
No need to replace an "anomalous" primate (Primates) with an "anomalous" bear (Carnivora, Ursidae).
Gutiérrez, Eliécer E; Pine, Ronald H
2015-01-01
By means of mitochondrial 12S rRNA sequencing of putative "yeti", "bigfoot", and other "anomalous primate" hair samples, a recent study concluded that two samples, presented as from the Himalayas, do not belong to an "anomalous primate", but to an unknown, anomalous type of ursid. That is, that they match 12S rRNA sequences of a fossil Polar Bear (Ursus maritimus), but neither of modern Polar Bears, nor of Brown Bears (Ursus arctos), the closest relative of Polar Bears, and one that occurs today in the Himalayas. We have undertaken direct comparison of sequences; replication of the original comparative study; inference of phylogenetic relationships of the two samples with respect to those from all extant species of Ursidae (except for the Giant Panda, Ailuropoda melanoleuca) and two extinct Pleistocene species; and application of a non-tree-based population aggregation approach for species diagnosis and identification. Our results demonstrate that the very short fragment of the 12S rRNA gene sequenced by Sykes et al. is not sufficiently informative to support the hypotheses provided by these authors with respect to the taxonomic identity of the individuals from which these sequences were obtained. We have concluded that there is no reason to believe that the two samples came from anything other than Brown Bears. These analyses afforded an opportunity to test the monophyly of morphologically defined species and to comment on both their phylogenetic relationships and future efforts necessary to advance our understanding of ursid systematics.
Ülker, Bekir; Hommelsheim, Carl Maximilian; Berson, Tobias; Thomas, Stefan; Chandrasekar, Balakumaran; Olcay, Ahmet Can; Berendzen, Kenneth Wayne; Frantzeskakis, Lamprinos
2012-01-01
A widely used approach for assessing genome instability in plants makes use of somatic homologous recombination (SHR) reporter lines. Here, we review the published characteristics and uses of SHR lines. We found a lack of detailed information on these lines and a lack of sufficient evidence that they report only homologous recombination. We postulate that instead of SHR, these lines might be reporting a number of alternative stress-induced stochastic events known to occur at transcriptional, posttranscriptional, and posttranslational levels. We conclude that the reliability and usefulness of the somatic homologous recombination reporter lines requires revision. Thus, more detailed information about these reporter lines is needed before they can be used with confidence to measure genome instability, including the complete sequences of SHR constructs, the genomic location of reporter genes and, importantly, molecular evidence that reconstituted gene expression in these lines is indeed a result of somatic recombination. PMID:23144181
Denis, Jean-Baptiste; Vandenbogaert, Mathias; Caro, Valérie
2016-01-01
The detection and characterization of emerging infectious agents has been a continuing public health concern. High Throughput Sequencing (HTS) or Next-Generation Sequencing (NGS) technologies have proven to be promising approaches for efficient and unbiased detection of pathogens in complex biological samples, providing access to comprehensive analyses. As NGS approaches typically yield millions of putatively representative reads per sample, efficient data management and visualization resources have become mandatory. Most usually, those resources are implemented through a dedicated Laboratory Information Management System (LIMS), solely to provide perspective regarding the available information. We developed an easily deployable web-interface, facilitating management and bioinformatics analysis of metagenomics data-samples. It was engineered to run associated and dedicated Galaxy workflows for the detection and eventually classification of pathogens. The web application allows easy interaction with existing Galaxy metagenomic workflows, facilitates the organization, exploration and aggregation of the most relevant sample-specific sequences among millions of genomic sequences, allowing them to determine their relative abundance, and associate them to the most closely related organism or pathogen. The user-friendly Django-Based interface, associates the users’ input data and its metadata through a bio-IT provided set of resources (a Galaxy instance, and both sufficient storage and grid computing power). Galaxy is used to handle and analyze the user’s input data from loading, indexing, mapping, assembly and DB-searches. Interaction between our application and Galaxy is ensured by the BioBlend library, which gives API-based access to Galaxy’s main features. Metadata about samples, runs, as well as the workflow results are stored in the LIMS. 
For metagenomic classification and exploration purposes, we show, as a proof of concept, that integration of intuitive exploratory tools, like Krona for representation of taxonomic classification, can be achieved very easily. In the trend of Galaxy, the interface enables the sharing of scientific results to fellow team members. PMID:28451381
Correia, Damien; Doppelt-Azeroual, Olivia; Denis, Jean-Baptiste; Vandenbogaert, Mathias; Caro, Valérie
2015-01-01
The detection and characterization of emerging infectious agents has been a continuing public health concern. High Throughput Sequencing (HTS) or Next-Generation Sequencing (NGS) technologies have proven to be promising approaches for efficient and unbiased detection of pathogens in complex biological samples, providing access to comprehensive analyses. As NGS approaches typically yield millions of putatively representative reads per sample, efficient data management and visualization resources have become mandatory. Most usually, those resources are implemented through a dedicated Laboratory Information Management System (LIMS), solely to provide perspective regarding the available information. We developed an easily deployable web-interface, facilitating management and bioinformatics analysis of metagenomics data-samples. It was engineered to run associated and dedicated Galaxy workflows for the detection and eventually classification of pathogens. The web application allows easy interaction with existing Galaxy metagenomic workflows, facilitates the organization, exploration and aggregation of the most relevant sample-specific sequences among millions of genomic sequences, allowing them to determine their relative abundance, and associate them to the most closely related organism or pathogen. The user-friendly Django-Based interface, associates the users' input data and its metadata through a bio-IT provided set of resources (a Galaxy instance, and both sufficient storage and grid computing power). Galaxy is used to handle and analyze the user's input data from loading, indexing, mapping, assembly and DB-searches. Interaction between our application and Galaxy is ensured by the BioBlend library, which gives API-based access to Galaxy's main features. Metadata about samples, runs, as well as the workflow results are stored in the LIMS. 
For metagenomic classification and exploration purposes, we show, as a proof of concept, that integration of intuitive exploratory tools, like Krona for representation of taxonomic classification, can be achieved very easily. In the trend of Galaxy, the interface enables the sharing of scientific results to fellow team members.
Matrix Transformations between Certain Sequence Spaces over the Non-Newtonian Complex Field
Efe, Hakan
2014-01-01
In some cases, the most general linear operator between two sequence spaces is given by an infinite matrix. So the theory of matrix transformations has always been of great interest in the study of sequence spaces. In the present paper, we introduce the matrix transformations in sequence spaces over the field ℂ* and characterize some classes of infinite matrices with respect to the non-Newtonian calculus. Also we give the necessary and sufficient conditions on an infinite matrix transforming one of the classical sets over ℂ* to another one. Furthermore, the concept for sequence-to-sequence and series-to-series methods of summability is given with some illustrated examples. PMID:25110740
Use of Formulaic Sequences in Monologues of Chinese EFL Learners
ERIC Educational Resources Information Center
Qi, Yan; Ding, Yanren
2011-01-01
The literature on formulaic language lacks sufficient research on how L2 learners make progress in native-like formulaicity of their target language. This study analyzed the use of formulaic sequences (FSs) by 56 Chinese university English majors in their prepared monologues at the beginning and end of a three-year period and compared the student…
Strain/species identification in metagenomes using genome-specific markers
Tu, Qichao; He, Zhili; Zhou, Jizhong
2014-01-01
Shotgun metagenome sequencing has become a fast, cheap and high-throughput technology for characterizing microbial communities in complex environments and human body sites. However, accurate identification of microorganisms at the strain/species level remains extremely challenging. We present a novel k-mer-based approach, termed GSMer, that identifies genome-specific markers (GSMs) from currently sequenced microbial genomes, which were then used for strain/species-level identification in metagenomes. Using 5390 sequenced microbial genomes, 8 770 321 50-mer strain-specific and 11 736 360 species-specific GSMs were identified for 4088 strains and 2005 species (4933 strains), respectively. The GSMs were first evaluated against mock community metagenomes, recently sequenced genomes and real metagenomes from different body sites, suggesting that the identified GSMs were specific to their targeting genomes. Sensitivity evaluation against synthetic metagenomes with different coverage suggested that 50 GSMs per strain were sufficient to identify most microbial strains with ≥0.25× coverage, and 10% of selected GSMs in a database should be detected for confident positive callings. Application of GSMs identified 45 and 74 microbial strains/species significantly associated with type 2 diabetes patients and obese/lean individuals from corresponding gastrointestinal tract metagenomes, respectively. Our result agreed with previous studies but provided strain-level information. The approach can be directly applied to identify microbial strains/species from raw metagenomes, without the effort of complex data pre-processing. PMID:24523352
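The notion of a genome-specific k-mer can be sketched as a set difference: keep the k-mers of one genome that occur in no other genome in the collection. This toy version is not the GSMer pipeline; it ignores reverse complements, stretch filtering, and scale, and the function names and the tiny example sequences are assumptions (the study used 50-mers, whereas k = 4 here only keeps the example readable).

```python
def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def genome_specific_markers(genomes, k):
    """Map each genome name to the k-mers found in that genome only.

    `genomes` is a dict of name -> sequence string.
    """
    kmer_sets = {name: kmers(seq, k) for name, seq in genomes.items()}
    markers = {}
    for name, own in kmer_sets.items():
        # Union of the k-mers of every other genome in the collection.
        others = set().union(
            *(s for other, s in kmer_sets.items() if other != name)
        )
        markers[name] = own - others
    return markers


if __name__ == "__main__":
    toy_genomes = {"strain_a": "ACGTAC", "strain_b": "CGTACC"}  # made-up data
    for name, found in genome_specific_markers(toy_genomes, k=4).items():
        print(name, sorted(found))
```

Detecting a strain in a metagenome would then amount to counting how many of its markers appear among the reads, in the spirit of the coverage and detection thresholds quoted in the abstract.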
Improving the prospects of cleavage-based nanopore sequencing engines
NASA Astrophysics Data System (ADS)
Brady, Kyle T.; Reiner, Joseph E.
2015-08-01
Recently proposed methods for DNA sequencing involve the use of cleavage-based enzymes attached to the opening of a nanopore. The idea is that DNA interacting with either an exonuclease or polymerase protein will lead to a small molecule being cleaved near the mouth of the nanopore, and subsequent entry into the pore will yield information about the DNA sequence. The prospects for this approach seem promising, but it has been shown that diffusion related effects impose a limit on the capture probability of molecules by the pore, which limits the efficacy of the technique. Here, we revisit the problem with the goal of optimizing the capture probability via a step decrease in the nucleotide diffusion coefficient between the pore and bulk solutions. It is shown through random walk simulations and a simplified analytical model that decreasing the molecule's diffusion coefficient in the bulk relative to its value in the pore increases the nucleotide capture probability. Specifically, we show that at sufficiently high applied transmembrane potentials (≥100 mV), increasing the potential by a factor f is equivalent to decreasing the diffusion coefficient ratio Dbulk/Dpore by the same factor f. This suggests a promising route toward implementation of cleavage-based sequencing protocols. We also discuss the feasibility of forming a step function in the diffusion coefficient across the pore-bulk interface.
Video Salient Object Detection via Fully Convolutional Networks.
Wang, Wenguan; Shen, Jianbing; Shao, Ling
This paper proposes a deep learning model to efficiently detect salient regions in videos. It addresses two important issues: 1) deep video saliency model training with the absence of sufficiently large and pixel-wise annotated video data and 2) fast video saliency training and detection. The proposed deep video saliency network consists of two modules, for capturing the spatial and temporal saliency information, respectively. The dynamic saliency model, explicitly incorporating saliency estimates from the static saliency model, directly produces spatiotemporal saliency inference without time-consuming optical flow computation. We further propose a novel data augmentation technique that simulates video training data from existing annotated image data sets, which enables our network to learn diverse saliency information and prevents overfitting with the limited number of training videos. Leveraging our synthetic video data (150K video sequences) and real videos, our deep video saliency model successfully learns both spatial and temporal saliency cues, thus producing accurate spatiotemporal saliency estimate. We advance the state-of-the-art on the densely annotated video segmentation data set (MAE of .06) and the Freiburg-Berkeley Motion Segmentation data set (MAE of .07), and do so with much improved speed (2 fps with all steps).
Minimal sufficient positive-operator valued measure on a separable Hilbert space
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kuramochi, Yui, E-mail: kuramochi.yui.22c@st.kyoto-u.ac.jp
We introduce a concept of a minimal sufficient positive-operator valued measure (POVM), which is the least redundant POVM among the POVMs that have the equivalent information about the measured quantum system. Assuming the system Hilbert space to be separable, we show that for a given POVM, a sufficient statistic called a Lehmann-Scheffé-Bahadur statistic induces a minimal sufficient POVM. We also show that every POVM has an equivalent minimal sufficient POVM and that such a minimal sufficient POVM is unique up to relabeling neglecting null sets. We apply these results to discrete POVMs and information conservation conditions proposed by the author.
Clinical next-generation sequencing in patients with non-small cell lung cancer.
Hagemann, Ian S; Devarakonda, Siddhartha; Lockwood, Christina M; Spencer, David H; Guebert, Kalin; Bredemeyer, Andrew J; Al-Kateb, Hussam; Nguyen, TuDung T; Duncavage, Eric J; Cottrell, Catherine E; Kulkarni, Shashikant; Nagarajan, Rakesh; Seibert, Karen; Baggstrom, Maria; Waqar, Saiama N; Pfeifer, John D; Morgensztern, Daniel; Govindan, Ramaswamy
2015-02-15
A clinical assay was implemented to perform next-generation sequencing (NGS) of genes commonly mutated in multiple cancer types. This report describes the feasibility and diagnostic yield of this assay in 381 consecutive patients with non-small cell lung cancer (NSCLC). Clinical targeted sequencing of 23 genes was performed with DNA from formalin-fixed, paraffin-embedded (FFPE) tumor tissue. The assay used Agilent SureSelect hybrid capture followed by Illumina HiSeq 2000, MiSeq, or HiSeq 2500 sequencing in a College of American Pathologists-accredited, Clinical Laboratory Improvement Amendments-certified laboratory. Single-nucleotide variants and insertion/deletion events were reported. This assay was performed before methods were developed to detect rearrangements by NGS. Two hundred nine of all requisitioned samples (55%) were successfully sequenced. The most common reason for not performing the sequencing was an insufficient quantity of tissue available in the blocks (29%). Excisional, endoscopic, and core biopsy specimens were sufficient for testing in 95%, 66%, and 40% of the cases, respectively. The median turnaround time (TAT) in the pathology laboratory was 21 days, and there was a trend of an improved TAT with more rapid sequencing platforms. Sequencing yielded a mean coverage of 1318×. Potentially actionable mutations (ie, predictive or prognostic) were identified in 46% of 209 samples and were most commonly found in KRAS (28%), epidermal growth factor receptor (14%), phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit alpha (4%), phosphatase and tensin homolog (1%), and BRAF (1%). Five percent of the samples had multiple actionable mutations. A targeted therapy was instituted on the basis of NGS in 11% of the sequenced patients or in 6% of all patients. NGS-based diagnostics are feasible in NSCLC and provide clinically relevant information from readily available FFPE tissue. 
The sample type is associated with the probability of successful testing. © 2014 American Cancer Society.
Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification.
Sinclair, Robert M; Ravantti, Janne J; Bamford, Dennis H
2017-04-15
Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allows them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. 
Importantly, we detected similarity at the nucleotide level between capsid protein-coding regions from viruses infecting cells belonging to all three domains of life, reproducing a previously established structure-based classification of icosahedral viral capsids. Copyright © 2017 Sinclair et al.
Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification
Sinclair, Robert M.; Ravantti, Janne J.
2017-01-01
ABSTRACT Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allows them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. 
Importantly, we detected similarity at the nucleotide level between capsid protein-coding regions from viruses infecting cells belonging to all three domains of life, reproducing a previously established structure-based classification of icosahedral viral capsids. PMID:28122979
Ley, A C; Hardy, O J
2010-11-01
Species delimitation is a fundamental biological concept which is frequently discussed and altered to integrate new insights. These revealed that speciation is not a one step phenomenon but an ongoing process and morphological characters alone are not sufficient anymore to properly describe the results of this process. Here we want to assess the degree of speciation in two closely related lianescent taxa from the tropical African genus Haumania which display distinct vegetative traits despite a high similarity in reproductive traits and a partial overlap in distribution area which might facilitate gene flow. To this end, we combined phylogenetic and phylogeographic analyses using nuclear (nr) and chloroplast (cp) DNA sequences in comparison to morphological species descriptions. The nuclear dataset unambiguously supports the morphological species concept in Haumania. However, the main chloroplastic haplotypes are shared between species and, although a geographic analysis of cpDNA diversity confirms that individuals from the same taxon are more related than individuals from distinct taxa, cp-haplotypes display correlated geographic distributions between species. Hybridization is the most plausible reason for this pattern. A scenario involving speciation in geographic isolation followed by range expansion is outlined. The study highlights the gain of information on the speciation process in Haumania by adding georeferenced molecular data to the morphological characteristics. It also shows that nr and cp sequence data might provide different but complementary information, questioning the reliability of the unique use of chloroplast data for species recognition by DNA barcoding. Copyright © 2010 Elsevier Inc. All rights reserved.
NASA Technical Reports Server (NTRS)
Kelley, H. J.; Cliff, E. M.; Lutze, F. H.
1981-01-01
Maneuvers available to a spacecraft having sufficient propellant to escape an antisatellite satellite (ASAT) attack are examined. The ASAT and the evading spacecraft are regarded as being in circular orbits, and equations of motion are developed for the ASAT to commence a two-impulse maneuver sequence. The ASAT employs thrust impulses which yield a minimum-time-to-rendezvous, considering available fuel. Optimal evasion is shown to involve only in-plane maneuvers, and begins as soon as the ASAT launch information is gathered and thrust activation can be initiated. A closest approach, along with a maximum evasion by the target spacecraft, is calculated to be 14,400 ft. Further research to account for ASATs in parking orbit and for generalization of a continuous control-modeled differential game is indicated.
Slowing down DNA translocation by a nanofiber meshed layer
NASA Astrophysics Data System (ADS)
Zhao, Yue; Xie, Wanyi; Tian, Enling; Ren, Yiwei; Zhu, Jifeng; Deng, Yunsheng; He, Shixuan; Liang, Liyuan; Wang, Yunjiao; Zhou, Daming; Wang, Deqiang
2018-01-01
Due to the weak interaction between DNA molecules and the inner surface of nanopores, DNA translocation is very fast, leaving only a short current drop that carries insufficient information to recognise the nucleotide sequence in the strand. In this paper, we propose a nanopore-nanofiber mesh hybridized structure to decelerate DNA translocation. Experimental results reveal that, due to hydrophobic interaction between the DNA fragments and the nanofibers, DNA translocation can be slowed by two orders of magnitude. Furthermore, according to theoretical simulations, the additional fiber layer reduces the electric field in the channel but elongates the capture region at the pore orifice, which helps to increase the capture rate while also extending the DNA dwell time.
A model of the human observer and decision maker
NASA Technical Reports Server (NTRS)
Wewerinke, P. H.
1981-01-01
The decision process is described in terms of classical sequential decision theory by considering the hypothesis that an abnormal condition has occurred by means of a generalized likelihood ratio test. For this, a sufficient statistic is provided by the innovation sequence which is the result of the perception and information processing submodel of the human observer. On the basis of only two model parameters, the model predicts the decision speed/accuracy trade-off and various attentional characteristics. A preliminary test of the model for single variable failure detection tasks resulted in a very good fit of the experimental data. In a formal validation program, a variety of multivariable failure detection tasks was investigated and the predictive capability of the model was demonstrated.
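A sequential test on an innovation sequence can be sketched as follows. This is not the model from the report: it swaps the generalized likelihood ratio test for Wald's simpler sequential probability ratio test with a known mean shift, and the Gaussian assumption, the function name `sprt`, and the parameters `mu1`, `sigma`, `alpha`, and `beta` are all illustrative choices. The accumulated log-likelihood ratio is compared against two thresholds after each sample, which is what produces a speed/accuracy trade-off.

```python
import math

def sprt(innovations, mu1, sigma, alpha=0.05, beta=0.05):
    """Wald sequential probability ratio test for a mean shift.

    H0: innovations ~ N(0, sigma^2)    (normal operation, assumed)
    H1: innovations ~ N(mu1, sigma^2)  (abnormal condition, assumed)
    Returns (decision, number_of_samples_used).
    """
    upper = math.log((1 - beta) / alpha)   # decide "abnormal" at or above this
    lower = math.log(beta / (1 - alpha))   # decide "normal" at or below this
    llr = 0.0                              # accumulated log-likelihood ratio
    for n, x in enumerate(innovations, start=1):
        llr += (mu1 / sigma**2) * (x - mu1 / 2)
        if llr >= upper:
            return "abnormal", n
        if llr <= lower:
            return "normal", n
    return "undecided", len(innovations)

# Hypothetical innovation sequences with a constant value per sample.
shifted = [1.0] * 20
print(sprt(shifted, mu1=1.0, sigma=1.0))                          # looser error rates
print(sprt(shifted, mu1=1.0, sigma=1.0, alpha=0.01, beta=0.01))   # stricter error rates
```

With stricter error rates the thresholds move further apart, so the same evidence stream needs more samples before a decision is reached.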
Complete genome sequence of a potyvirus infecting yam beans (Pachyrhizus spp.) in Peru.
Fuentes, Segundo; Heider, Bettina; Tasso, Ruby Carolina; Romero, Elisa; Zum Felde, Thomas; Kreuze, Jan Frederik
2012-04-01
In 2010, yam beans in a field trial in Peru showed viral disease symptoms. Graft-transmission and positive ELISA results using potyvirus-specific antibodies suggested that the symptoms could be the result of a potyviral infection. Small interfering RNA (siRNA) were extracted from one of the samples and sent for high-throughput sequencing. The full genome of a new potyvirus could be assembled from the resulting siRNA sequences, and it was sufficiently different from other sequences to be considered a member of a new species, which we have designated Yam bean mosaic virus (YBMV). Sequence similarity suggests that YBMV has also been detected in yam beans in Indonesia.
Wickersheim, Michelle L; Blumenstiel, Justin P
2013-11-01
A large number of methods are available to deplete ribosomal RNA reads from high-throughput RNA sequencing experiments. Such methods are critical for sequencing Drosophila small RNAs between 20 and 30 nucleotides because size selection is not typically sufficient to exclude the highly abundant class of 30 nucleotide 2S rRNA. Here we demonstrate that pre-annealing terminator oligos complementary to Drosophila 2S rRNA prior to 5' adapter ligation and reverse transcription efficiently depletes 2S rRNA sequences from the sequencing reaction in a simple and inexpensive way. This depletion is highly specific and is achieved with minimal perturbation of miRNA and piRNA profiles.
Code of Federal Regulations, 2013 CFR
2013-10-01
... Regulations Relating to Public Welfare (Continued) OFFICE OF HUMAN DEVELOPMENT SERVICES, DEPARTMENT OF HEALTH... sufficient information for ANA to determine the extent to which the recipient meets ANA project evaluation standards. Sufficient information means information adequate to enable ANA to compare the recipient's...
Code of Federal Regulations, 2011 CFR
2011-10-01
... Regulations Relating to Public Welfare (Continued) OFFICE OF HUMAN DEVELOPMENT SERVICES, DEPARTMENT OF HEALTH... sufficient information for ANA to determine the extent to which the recipient meets ANA project evaluation standards. Sufficient information means information adequate to enable ANA to compare the recipient's...
Code of Federal Regulations, 2010 CFR
2010-10-01
... Regulations Relating to Public Welfare (Continued) OFFICE OF HUMAN DEVELOPMENT SERVICES, DEPARTMENT OF HEALTH... sufficient information for ANA to determine the extent to which the recipient meets ANA project evaluation standards. Sufficient information means information adequate to enable ANA to compare the recipient's...
Code of Federal Regulations, 2014 CFR
2014-10-01
... Regulations Relating to Public Welfare (Continued) OFFICE OF HUMAN DEVELOPMENT SERVICES, DEPARTMENT OF HEALTH... sufficient information for ANA to determine the extent to which the recipient meets ANA project evaluation standards. Sufficient information means information adequate to enable ANA to compare the recipient's...
Sequencing Needs for Viral Diagnostics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gardner, S N; Lam, M; Mulakken, N J
2004-01-26
We built a system to guide decisions regarding the amount of genomic sequencing required to develop diagnostic DNA signatures, which are short sequences that are sufficient to uniquely identify a viral species. We used our existing DNA diagnostic signature prediction pipeline, which selects regions of a target species genome that are conserved among strains of the target (for reliability, to prevent false negatives) and unique relative to other species (for specificity, to avoid false positives). We performed simulations, based on existing sequence data, to assess the number of genome sequences of a target species and of close phylogenetic relatives ("near neighbors") that are required to predict diagnostic signature regions that are conserved among strains of the target species and unique relative to other bacterial and viral species. For DNA viruses such as variola (smallpox), three target genomes provide sufficient guidance for selecting species-wide signatures. Three near neighbor genomes are critical for species specificity. In contrast, most RNA viruses require four target genomes and no near neighbor genomes, since lack of conservation among strains is more limiting than uniqueness. SARS and Ebola Zaire are exceptional, as additional target genomes currently do not improve predictions, but near neighbor sequences are urgently needed. Our results also indicate that double stranded DNA viruses are more conserved among strains than are RNA viruses, since in most cases there was at least one conserved signature candidate for the DNA viruses and zero conserved signature candidates for the RNA viruses.
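The two selection criteria in this abstract, conservation among target strains and uniqueness relative to other species, can be sketched with plain set operations on k-mers. This is an illustrative simplification, not the authors' signature prediction pipeline: the function names, the exact-match k-mer representation, and the toy sequences are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def signature_candidates(targets, near_neighbors, k):
    """k-mers present in every target genome and in no near-neighbor genome."""
    if not targets:
        return set()
    # Conserved: shared by all target strains (guards against false negatives).
    conserved = set.intersection(*(kmers(t, k) for t in targets))
    # Background: anything seen in a near neighbor (guards against false positives).
    background = set().union(*(kmers(n, k) for n in near_neighbors))
    return conserved - background

# Toy data: hypothetical sequences and k=5.
targets = ["AAGCTTGGCA", "TTGCTTGGCC"]
near_neighbors = ["GGCTTGGA"]
without_neighbors = signature_candidates(targets, [], k=5)
with_neighbors = signature_candidates(targets, near_neighbors, k=5)
print(sorted(without_neighbors), sorted(with_neighbors))
```

Adding a near-neighbor genome removes candidates that only looked unique because the background was incomplete, which mirrors the abstract's point that near-neighbor sequences matter for specificity.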
Zumaraga, Mark Pretzel; Medina, Paul Julius; Recto, Juan Miguel; Abrahan, Lauro; Azurin, Edelyn; Tanchoco, Celeste C; Jimeno, Cecilia A; Palmes-Saloma, Cynthia
2017-03-01
This study aimed to discover genetic variants in the entire 101-kb vitamin D receptor (VDR) gene for vitamin D deficiency in a group of postmenopausal Filipino women using a targeted next-generation sequencing (TNGS) approach in a case-control study design. A total of 50 women with and without osteoporotic fracture seen at the Philippine Orthopedic Center were included. Blood samples were collected for determination of serum vitamin D, calcium, phosphorus, glucose, blood urea nitrogen, creatinine, aspartate aminotransferase, alanine aminotransferase and as primary source for targeted VDR gene sequencing using the Ion Torrent Personal Genome Machine. The variant calling was based on the GATK best practice workflow and annotated using the Annovar tool. A total of 1496 unique variants in the whole 101-kb VDR gene were identified. Novel sequence variations not registered in the dbSNP database were found among cases and controls at a rate of 23.1% and 16.6% of total discovered variants, respectively. One disease-associated enhancer showed statistically significant association with low serum 25-hydroxy vitamin D levels (Pearson chi-square P-value=0.009). The transcription factor binding site prediction program PROMO predicted the disruption of three transcription factor binding sites in this enhancer region. These findings show the power of TNGS in identifying sequence variations in a very large gene and the surprising results obtained in this study greatly expand the catalog of known VDR sequence variants that may represent an important clue in the emergence of vitamin D deficiency. Such information will also provide the additional guidance necessary toward personalized nutritional advice to reach sufficient vitamin D status. Copyright © 2016 Elsevier Inc. All rights reserved.
Wonczak, Stephan; Thiele, Holger; Nieroda, Lech; Jabbari, Kamel; Borowski, Stefan; Sinha, Vishal; Gunia, Wilfried; Lang, Ulrich; Achter, Viktor; Nürnberg, Peter
2015-01-01
Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files. PMID:25942438
Inhalable Microorganisms in Beijing’s PM2.5 and PM10 Pollutants during a Severe Smog Event
2014-01-01
Particulate matter (PM) air pollution poses a formidable public health threat to the city of Beijing. Among the various hazards of PM pollutants, microorganisms in PM2.5 and PM10 are thought to be responsible for various allergies and for the spread of respiratory diseases. While the physical and chemical properties of PM pollutants have been extensively studied, much less is known about the inhalable microorganisms. Most existing data on airborne microbial communities using 16S or 18S rRNA gene sequencing to categorize bacteria or fungi into the family or genus levels do not provide information on their allergenic and pathogenic potentials. Here we employed metagenomic methods to analyze the microbial composition of Beijing’s PM pollutants during a severe January smog event. We show that with sufficient sequencing depth, airborne microbes including bacteria, archaea, fungi, and dsDNA viruses can be identified at the species level. Our results suggested that the majority of the inhalable microorganisms were soil-associated and nonpathogenic to human. Nevertheless, the sequences of several respiratory microbial allergens and pathogens were identified and their relative abundance appeared to have increased with increased concentrations of PM pollution. Our findings may serve as an important reference for environmental scientists, health workers, and city planners. PMID:24456276
NASA Astrophysics Data System (ADS)
Mielke, Steven P.; Grønbech-Jensen, Niels; Krishnan, V. V.; Fink, William H.; Benham, Craig J.
2005-09-01
The topological state of DNA in vivo is dynamically regulated by a number of processes that involve interactions with bound proteins. In one such process, the tracking of RNA polymerase along the double helix during transcription, restriction of rotational motion of the polymerase and associated structures, generates waves of overtwist downstream and undertwist upstream from the site of transcription. The resulting superhelical stress is often sufficient to drive double-stranded DNA into a denatured state at locations such as promoters and origins of replication, where sequence-specific duplex opening is a prerequisite for biological function. In this way, transcription and other events that actively supercoil the DNA provide a mechanism for dynamically coupling genetic activity with regulatory and other cellular processes. Although computer modeling has provided insight into the equilibrium dynamics of DNA supercoiling, to date no model has appeared for simulating sequence-dependent DNA strand separation under the nonequilibrium conditions imposed by the dynamic introduction of torsional stress. Here, we introduce such a model and present results from an initial set of computer simulations in which the sequences of dynamically superhelical, 147 base pair DNA circles were systematically altered in order to probe the accuracy with which the model can predict location, extent, and time of stress-induced duplex denaturation. The results agree both with well-tested statistical mechanical calculations and with available experimental information. Additionally, we find that sites susceptible to denaturation show a propensity for localizing to supercoil apices, suggesting that base sequence determines locations of strand separation not only through the energetics of interstrand interactions, but also by influencing the geometry of supercoiling.
Rector, Annabel; Tachezy, Ruth; Van Ranst, Marc
2004-01-01
The discovery of novel viruses has often been accomplished by using hybridization-based methods that necessitate the availability of a previously characterized virus genome probe or knowledge of the viral nucleotide sequence to construct consensus or degenerate PCR primers. In their natural replication cycle, certain viruses employ a rolling-circle mechanism to propagate their circular genomes, and multiply primed rolling-circle amplification (RCA) with φ29 DNA polymerase has recently been applied in the amplification of circular plasmid vectors used in cloning. We employed an isothermal RCA protocol that uses random hexamer primers to amplify the complete genomes of papillomaviruses without the need for prior knowledge of their DNA sequences. We optimized this RCA technique with extracted human papillomavirus type 16 (HPV-16) DNA from W12 cells, using a real-time quantitative PCR assay to determine amplification efficiency, and obtained a 2.4 × 10⁴-fold increase in HPV-16 DNA concentration. We were able to clone the complete HPV-16 genome from this multiply primed RCA product. The optimized protocol was subsequently applied to a bovine fibropapillomatous wart tissue sample. Whereas no papillomavirus DNA could be detected by restriction enzyme digestion of the original sample, multiply primed RCA enabled us to obtain a sufficient amount of papillomavirus DNA for restriction enzyme analysis, cloning, and subsequent sequencing of a novel variant of bovine papillomavirus type 1. The multiply primed RCA method allows the discovery of previously unknown papillomaviruses, and possibly also other circular DNA viruses, without a priori sequence information. PMID:15113879
NASA Technical Reports Server (NTRS)
Romano, Laura A.; Wray, Gregory A.
2003-01-01
Evolutionary changes in transcriptional regulation undoubtedly play an important role in creating morphological diversity. However, there is little information about the evolutionary dynamics of cis-regulatory sequences. This study examines the functional consequence of evolutionary changes in the Endo16 promoter of sea urchins. The Endo16 gene encodes a large extracellular protein that is expressed in the endoderm and may play a role in cell adhesion. Its promoter has been characterized in exceptional detail in the purple sea urchin, Strongylocentrotus purpuratus. We have characterized the structure and function of the Endo16 promoter from a second sea urchin species, Lytechinus variegatus. The Endo16 promoter sequences have evolved in a strongly mosaic manner since these species diverged approximately 35 million years ago: the most proximal region (module A) is conserved, but the remaining modules (B-G) are unalignable. Despite extensive divergence in promoter sequences, the pattern of Endo16 transcription is largely conserved during embryonic and larval development. Transient expression assays demonstrate that 2.2 kb of upstream sequence in either species is sufficient to drive GFP reporter expression that correctly mimics this pattern of Endo16 transcription. Reciprocal cross-species transient expression assays imply that changes have also evolved in the set of transcription factors that interact with the Endo16 promoter. Taken together, these results suggest that stabilizing selection on the transcriptional output may have operated to maintain a similar pattern of Endo16 expression in S. purpuratus and L. variegatus, despite dramatic divergence in promoter sequence and mechanisms of transcriptional regulation.
Mielke, Steven P; Grønbech-Jensen, Niels; Krishnan, V V; Fink, William H; Benham, Craig J
2005-09-22
The topological state of DNA in vivo is dynamically regulated by a number of processes that involve interactions with bound proteins. In one such process, the tracking of RNA polymerase along the double helix during transcription, restriction of rotational motion of the polymerase and associated structures, generates waves of overtwist downstream and undertwist upstream from the site of transcription. The resulting superhelical stress is often sufficient to drive double-stranded DNA into a denatured state at locations such as promoters and origins of replication, where sequence-specific duplex opening is a prerequisite for biological function. In this way, transcription and other events that actively supercoil the DNA provide a mechanism for dynamically coupling genetic activity with regulatory and other cellular processes. Although computer modeling has provided insight into the equilibrium dynamics of DNA supercoiling, to date no model has appeared for simulating sequence-dependent DNA strand separation under the nonequilibrium conditions imposed by the dynamic introduction of torsional stress. Here, we introduce such a model and present results from an initial set of computer simulations in which the sequences of dynamically superhelical, 147 base pair DNA circles were systematically altered in order to probe the accuracy with which the model can predict location, extent, and time of stress-induced duplex denaturation. The results agree both with well-tested statistical mechanical calculations and with available experimental information. Additionally, we find that sites susceptible to denaturation show a propensity for localizing to supercoil apices, suggesting that base sequence determines locations of strand separation not only through the energetics of interstrand interactions, but also by influencing the geometry of supercoiling.
No need to replace an “anomalous” primate (Primates) with an “anomalous” bear (Carnivora, Ursidae)
Gutiérrez, Eliécer E.; Pine, Ronald H.
2015-01-01
Abstract By means of mitochondrial 12S rRNA sequencing of putative “yeti”, “bigfoot”, and other “anomalous primate” hair samples, a recent study concluded that two samples, presented as from the Himalayas, do not belong to an “anomalous primate”, but to an unknown, anomalous type of ursid. That is, that they match 12S rRNA sequences of a fossil Polar Bear (Ursus maritimus), but neither of modern Polar Bears, nor of Brown Bears (Ursus arctos), the closest relative of Polar Bears, and one that occurs today in the Himalayas. We have undertaken direct comparison of sequences; replication of the original comparative study; inference of phylogenetic relationships of the two samples with respect to those from all extant species of Ursidae (except for the Giant Panda, Ailuropoda melanoleuca) and two extinct Pleistocene species; and application of a non-tree-based population aggregation approach for species diagnosis and identification. Our results demonstrate that the very short fragment of the 12S rRNA gene sequenced by Sykes et al. is not sufficiently informative to support the hypotheses provided by these authors with respect to the taxonomic identity of the individuals from which these sequences were obtained. We have concluded that there is no reason to believe that the two samples came from anything other than Brown Bears. These analyses afforded an opportunity to test the monophyly of morphologically defined species and to comment on both their phylogenetic relationships and future efforts necessary to advance our understanding of ursid systematics. PMID:25829853
How the Sequence of a Gene Specifies Structural Symmetry in Proteins
Shen, Xiaojuan; Huang, Tongcheng; Wang, Guanyu; Li, Guanglin
2015-01-01
Internal symmetry is commonly observed in the majority of fundamental protein folds. Meanwhile, sufficient evidence suggests that nascent polypeptide chains of proteins have the potential to start the co-translational folding process and this process allows mRNA to contain additional information on protein structure. In this paper, we study the relationship between gene sequences and protein structures from the viewpoint of symmetry to explore how gene sequences code for structural symmetry in proteins. We found that, for a set of two-fold symmetric proteins from left-handed beta-helix fold, intragenic symmetry always exists in their corresponding gene sequences. Meanwhile, codon usage bias and local mRNA structure might be involved in modulating translation speed for the formation of structural symmetry: a major decrease of local codon usage bias in the middle of the codon sequence can be identified as a common feature; and major or consecutive decreases in local mRNA folding energy near the boundaries of the symmetric substructures can also be observed. The results suggest that gene duplication and fusion may be an evolutionarily conserved process for this protein fold. In addition, the usage of rare codons and the formation of higher order of secondary structure near the boundaries of symmetric substructures might have coevolved as conserved mechanisms to slow down translation elongation and to facilitate effective folding of symmetric substructures. These findings provide valuable insights into our understanding of the mechanisms of translation and its evolution, as well as the design of proteins via symmetric modules. PMID:26641668
Federal Register 2010, 2011, 2012, 2013, 2014
2011-07-05
... Information Collection: Transformation Initiative Family Self-Sufficiency Demonstration Small Grants AGENCY... information: Title of Proposal: Notice of Funding Availability for the Transformation Initiative Family Self..., think tanks, consortia, Institutions of higher education accredited by a national or regional...
Rapid Multistep Synthesis of 1,2,4-Oxadiazoles in a Single Continuous Microreactor Sequence
Grant, Daniel; Dahl, Russell; Cosford, Nicholas D. P.
2009-01-01
A general method for the synthesis of bis-substituted 1,2,4-oxadiazoles from readily available arylnitriles and activated carbonyls in a single continuous microreactor sequence is described. The synthesis incorporates three sequential microreactors to produce 1,2,4-oxadiazoles in ~30 min in quantities (40–80 mg) sufficient for full characterization and rapid library supply. PMID:18687005
Dichosa, Armand E. K.; Davenport, Karen W.; Li, Po-E; ...
2015-03-19
We report here the genome sequence of Thauera sp. strain SWB20, isolated from a Singaporean wastewater treatment facility using gel microdroplets (GMDs) and single-cell genomics (SCG). This approach provided a single clonal microcolony that was sufficient to obtain a 4.9-Mbp genome assembly of an ecologically relevant Thauera species.
2014-01-01
Background Protein sequence similarities to any types of non-globular segments (coiled coils, low complexity regions, transmembrane regions, long loops, etc. where either positional sequence conservation is the result of a very simple, physically induced pattern or rather integral sequence properties are critical) are pertinent sources for mistaken homologies. Regretfully, these considerations regularly escape attention in large-scale annotation studies since, often, there is no substitute to manual handling of these cases. Quantitative criteria are required to suppress events of function annotation transfer as a result of false homology assignments. Results The sequence homology concept is based on the similarity comparison between the structural elements, the basic building blocks for conferring the overall fold of a protein. We propose to dissect the total similarity score into fold-critical and other, remaining contributions and suggest that, for a valid homology statement, the fold-relevant score contribution should at least be significant on its own. As part of the article, we provide the DissectHMMER software program for dissecting HMMER2/3 scores into segment-specific contributions. We show that DissectHMMER reproduces HMMER2/3 scores with sufficient accuracy and that it is useful in automated decisions about homology for instructive sequence examples. To generalize the dissection concept for cases without 3D structural information, we find that a dissection based on alignment quality is an appropriate surrogate. The approach was applied to a large-scale study of SMART and PFAM domains in the space of seed sequences and in the space of UniProt/SwissProt. 
Conclusions Sequence similarity core dissection with regard to fold-critical and other contributions systematically suppresses false hits and, additionally, recovers previously obscured homology relationships such as the one between aquaporins and formate/nitrite transporters that, so far, was only supported by structure comparison. PMID:24890864
Mosaic organization of DNA nucleotides
NASA Technical Reports Server (NTRS)
Peng, C. K.; Buldyrev, S. V.; Havlin, S.; Simons, M.; Stanley, H. E.; Goldberger, A. L.
1994-01-01
Long-range power-law correlations have been reported recently for DNA sequences containing noncoding regions. We address the question of whether such correlations may be a trivial consequence of the known mosaic structure ("patchiness") of DNA. We analyze two classes of controls consisting of patchy nucleotide sequences generated by different algorithms--one without and one with long-range power-law correlations. Although both types of sequences are highly heterogenous, they are quantitatively distinguishable by an alternative fluctuation analysis method that differentiates local patchiness from long-range correlations. Application of this analysis to selected DNA sequences demonstrates that patchiness is not sufficient to account for long-range correlation properties.
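The kind of fluctuation analysis described above can be illustrated with a minimal sketch. It assumes the commonly used purine/pyrimidine "DNA walk" mapping and a linear detrending step inside non-overlapping boxes; the function names and the choice of box sizes are illustrative and are not taken from the abstract.

```python
import math

def dna_walk(seq):
    """Cumulative walk: +1 for a pyrimidine (C/T), -1 for a purine (A/G).
    Other symbols (e.g. N) are skipped; this is a simplifying assumption."""
    walk, total = [], 0
    for base in seq.upper():
        if base in "CT":
            total += 1
        elif base in "AG":
            total -= 1
        else:
            continue
        walk.append(total)
    return walk

def detrended_fluctuation(walk, box):
    """RMS deviation of the walk from a least-squares line fitted in each box."""
    n_boxes = len(walk) // box
    if box < 2 or n_boxes == 0:
        raise ValueError("box must be >= 2 and no longer than the walk")
    xs = range(box)
    x_mean = (box - 1) / 2
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sq_sum = 0.0
    for b in range(n_boxes):
        seg = walk[b * box:(b + 1) * box]
        y_mean = sum(seg) / box
        slope = sum((x - x_mean) * (v - y_mean) for x, v in zip(xs, seg)) / sxx
        # Sum of squared residuals around the local linear trend.
        sq_sum += sum((v - (y_mean + slope * (x - x_mean))) ** 2
                      for x, v in zip(xs, seg))
    return math.sqrt(sq_sum / (n_boxes * box))

def scaling_exponent(walk, boxes):
    """Slope of log F(box) versus log box, by ordinary least squares."""
    lx = [math.log(b) for b in boxes]
    ly = [math.log(detrended_fluctuation(walk, b)) for b in boxes]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    return (sum((a - mx) * (c - my) for a, c in zip(lx, ly))
            / sum((a - mx) ** 2 for a in lx))
```

For an uncorrelated random sequence the exponent should come out close to 0.5; values clearly above 0.5 over a wide range of box sizes would point to long-range correlations rather than local patchiness.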
Recombination of polynucleotide sequences using random or defined primers
Arnold, Frances H.; Shao, Zhixin; Affholter, Joseph A.; Zhao, Huimin H; Giver, Lorraine J.
2000-01-01
A method for in vitro mutagenesis and recombination of polynucleotide sequences based on polymerase-catalyzed extension of primer oligonucleotides is disclosed. The method involves priming template polynucleotide(s) with random-sequence or defined-sequence primers to generate a pool of short DNA fragments with a low level of point mutations. The DNA fragments are subjected to denaturation followed by annealing and further enzyme-catalyzed DNA polymerization. This procedure is repeated a sufficient number of times to produce full-length genes which comprise mutants of the original template polynucleotides. These genes can be further amplified by the polymerase chain reaction and cloned into a vector for expression of the encoded proteins.
Recombination of polynucleotide sequences using random or defined primers
Arnold, Frances H.; Shao, Zhixin; Affholter, Joseph A.; Zhao, Huimin; Giver, Lorraine J.
2001-01-01
A method for in vitro mutagenesis and recombination of polynucleotide sequences based on polymerase-catalyzed extension of primer oligonucleotides is disclosed. The method involves priming template polynucleotide(s) with random-sequence or defined-sequence primers to generate a pool of short DNA fragments with a low level of point mutations. The DNA fragments are subjected to denaturation followed by annealing and further enzyme-catalyzed DNA polymerization. This procedure is repeated a sufficient number of times to produce full-length genes which comprise mutants of the original template polynucleotides. These genes can be further amplified by the polymerase chain reaction and cloned into a vector for expression of the encoded proteins.
A draft annotation and overview of the human genome
Wright, Fred A; Lemon, William J; Zhao, Wei D; Sears, Russell; Zhuo, Degen; Wang, Jian-Ping; Yang, Hee-Yung; Baer, Troy; Stredney, Don; Spitzner, Joe; Stutz, Al; Krahe, Ralf; Yuan, Bo
2001-01-01
Background The recent draft assembly of the human genome provides a unified basis for describing genomic structure and function. The draft is sufficiently accurate to provide useful annotation, enabling direct observations of previously inferred biological phenomena. Results We report here a functionally annotated human gene index placed directly on the genome. The index is based on the integration of public transcript, protein, and mapping information, supplemented with computational prediction. We describe numerous global features of the genome and examine the relationship of various genetic maps with the assembly. In addition, initial sequence analysis reveals highly ordered chromosomal landscapes associated with paralogous gene clusters and distinct functional compartments. Finally, these annotation data were synthesized to produce observations of gene density and number that accord well with historical estimates. Such a global approach had previously been described only for chromosomes 21 and 22, which together account for 2.2% of the genome. Conclusions We estimate that the genome contains 65,000-75,000 transcriptional units, with exon sequences comprising 4%. The creation of a comprehensive gene index requires the synthesis of all available computational and experimental evidence. PMID:11516338
Role of sufficient statistics in stochastic thermodynamics and its implication to sensory adaptation
NASA Astrophysics Data System (ADS)
Matsumoto, Takumi; Sagawa, Takahiro
2018-04-01
A sufficient statistic is a significant concept in statistics, which means a probability variable that has sufficient information required for an inference task. We investigate the roles of sufficient statistics and related quantities in stochastic thermodynamics. Specifically, we prove that for general continuous-time bipartite networks, the existence of a sufficient statistic implies that an informational quantity called the sensory capacity takes the maximum. Since the maximal sensory capacity imposes a constraint that the energetic efficiency cannot exceed one-half, our result implies that the existence of a sufficient statistic is inevitably accompanied by energetic dissipation. We also show that, in a particular parameter region of linear Langevin systems there exists the optimal noise intensity at which the sensory capacity, the information-thermodynamic efficiency, and the total entropy production are optimized at the same time. We apply our general result to a model of sensory adaptation of E. coli and find that the sensory capacity is nearly maximal with experimentally realistic parameters.
Kurzeja, Patrick
2016-05-01
Modern imaging techniques, increased simulation capabilities and extended theoretical frameworks naturally drive the development of multiscale modelling by the question: which new information should be considered? Given the need for concise constitutive relationships and efficient data evaluation, however, one important question is often neglected: which information is sufficient? For this reason, this work introduces the formalized criterion of subscale sufficiency. This criterion states whether a chosen constitutive relationship transfers all necessary information from micro to macroscale within a multiscale framework. It further provides a scheme to improve constitutive relationships. Direct application to static capillary pressure demonstrates usefulness and conditions for subscale sufficiency of saturation and interfacial areas.
BiQ Analyzer HT: locus-specific analysis of DNA methylation by high-throughput bisulfite sequencing
Lutsik, Pavlo; Feuerbach, Lars; Arand, Julia; Lengauer, Thomas; Walter, Jörn; Bock, Christoph
2011-01-01
Bisulfite sequencing is a widely used method for measuring DNA methylation in eukaryotic genomes. The assay provides single-base pair resolution and, given sufficient sequencing depth, its quantitative accuracy is excellent. High-throughput sequencing of bisulfite-converted DNA can be applied either genome wide or targeted to a defined set of genomic loci (e.g. using locus-specific PCR primers or DNA capture probes). Here, we describe BiQ Analyzer HT (http://biq-analyzer-ht.bioinf.mpi-inf.mpg.de/), a user-friendly software tool that supports locus-specific analysis and visualization of high-throughput bisulfite sequencing data. The software facilitates the shift from time-consuming clonal bisulfite sequencing to the more quantitative and cost-efficient use of high-throughput sequencing for studying locus-specific DNA methylation patterns. In addition, it is useful for locus-specific visualization of genome-wide bisulfite sequencing data. PMID:21565797
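The principle behind locus-specific methylation calling can be sketched in a few lines. Bisulfite treatment converts unmethylated cytosines so that they are read as T, while methylated cytosines are still read as C. The sketch below assumes reads that are already aligned to the reference on the same strand, with equal length and no indels; the function name and data layout are hypothetical and do not reflect BiQ Analyzer HT's implementation.

```python
def methylation_levels(reference, reads):
    """Per-CpG methylation level: fraction of informative reads showing C.

    Returns a dict mapping the 0-based position of each CpG cytosine in the
    reference to its methylation level, or None when no read is informative.
    """
    reference = reference.upper()
    cpg_positions = [i for i in range(len(reference) - 1)
                     if reference[i:i + 2] == "CG"]
    levels = {}
    for pos in cpg_positions:
        calls = [read[pos].upper() for read in reads]
        methylated = calls.count("C")    # protected from conversion
        unmethylated = calls.count("T")  # converted by bisulfite
        covered = methylated + unmethylated
        levels[pos] = methylated / covered if covered else None
    return levels
```

The estimate at each site is a simple proportion, which is why its accuracy depends directly on how many reads cover the locus.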
Gene and translation initiation site prediction in metagenomic sequences
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hyatt, Philip Douglas; LoCascio, Philip F; Hauser, Loren John
2012-01-01
Gene prediction in metagenomic sequences remains a difficult problem. Current sequencing technologies do not achieve sufficient coverage to assemble the individual genomes in a typical sample; consequently, sequencing runs produce a large number of short sequences whose exact origin is unknown. Since these sequences are usually smaller than the average length of a gene, algorithms must make predictions based on very little data. We present MetaProdigal, a metagenomic version of the gene prediction program Prodigal, that can identify genes in short, anonymous coding sequences with a high degree of accuracy. The novel value of the method consists of enhanced translation initiation site identification, ability to identify sequences that use alternate genetic codes and confidence values for each gene call. We compare the results of MetaProdigal with other methods and conclude with a discussion of future improvements.
Spontaneous emergence of autocatalytic information-coding polymers
Tkachenko, Alexei V.; Maslov, Sergei
2015-07-28
Self-replicating systems based on information-coding polymers are of crucial importance in biology. They also recently emerged as a paradigm in material design on nano- and micro-scales. We present a general theoretical and numerical analysis of the problem of spontaneous emergence of autocatalysis for heteropolymers capable of template-assisted ligation driven by cyclic changes in the environment. Our central result is the existence of the first order transition between the regime dominated by free monomers and that with a self-sustaining population of sufficiently long chains. We provide a simple, mathematically tractable model supported by numerical simulations, which predicts the distribution of chain lengths and the onset of autocatalysis in terms of the overall monomer concentration and two fundamental rate constants. Another key result of our study is the emergence of the kinetically limited optimal overlap length between a template and each of its two substrates. The template-assisted ligation allows for heritable transmission of the information encoded in chain sequences thus opening up the possibility of long-term memory and evolvability in such systems.
Spontaneous emergence of autocatalytic information-coding polymers
NASA Astrophysics Data System (ADS)
Tkachenko, Alexei V.; Maslov, Sergei
2015-07-01
Self-replicating systems based on information-coding polymers are of crucial importance in biology. They also recently emerged as a paradigm in material design on nano- and micro-scales. We present a general theoretical and numerical analysis of the problem of spontaneous emergence of autocatalysis for heteropolymers capable of template-assisted ligation driven by cyclic changes in the environment. Our central result is the existence of the first order transition between the regime dominated by free monomers and that with a self-sustaining population of sufficiently long chains. We provide a simple, mathematically tractable model supported by numerical simulations, which predicts the distribution of chain lengths and the onset of autocatalysis in terms of the overall monomer concentration and two fundamental rate constants. Another key result of our study is the emergence of the kinetically limited optimal overlap length between a template and each of its two substrates. The template-assisted ligation allows for heritable transmission of the information encoded in chain sequences thus opening up the possibility of long-term memory and evolvability in such systems.
Regulating the dorsal neural tube expression of Ptf1a through a distal 3' enhancer.
Mona, Bishakha; Avila, John M; Meredith, David M; Kollipara, Rahul K; Johnson, Jane E
2016-10-01
Generating the correct balance of inhibitory and excitatory neurons in a neural network is essential for normal functioning of a nervous system. The neural network in the dorsal spinal cord functions in somatosensation where it modulates and relays sensory information from the periphery. PTF1A is a key transcriptional regulator present in a specific subset of neural progenitor cells in the dorsal spinal cord, cerebellum and retina that functions to specify an inhibitory neuronal fate while suppressing excitatory neuronal fates. Thus, the regulation of Ptf1a expression is critical for determining mechanisms controlling neuronal diversity in these regions of the nervous system. Here we identify a sequence conserved, tissue-specific enhancer located 10.8kb 3' of the Ptf1a coding region that is sufficient to direct expression to dorsal neural tube progenitors that give rise to neurons in the dorsal spinal cord in chick and mouse. DNA binding motifs for Paired homeodomain (Pd-HD) and zinc finger (ZF) transcription factors are required for enhancer activity. Mutations in these sequences implicate the Pd-HD motif for activator function and the ZF motif for repressor function. Although no repressor transcription factor was identified, both PAX6 and SOX3 can increase enhancer activity in reporter assays. Thus, Ptf1a is regulated by active and repressive inputs integrated through multiple sequence elements within a highly conserved sequence downstream of the Ptf1a gene. Copyright © 2016 Elsevier Inc. All rights reserved.
Kempton, Colton E.; Heninger, Justin R.; Johnson, Steven M.
2014-01-01
Nucleosomes and their positions in the eukaryotic genome play an important role in regulating gene expression by influencing accessibility to DNA. Many factors influence a nucleosome's final position in the chromatin landscape including the underlying genomic sequence. One of the primary reasons for performing in vitro nucleosome reconstitution experiments is to identify how the underlying DNA sequence will influence a nucleosome's position in the absence of other compounding cellular factors. However, concerns have been raised about the reproducibility of data generated from these kinds of experiments. Here we present data for in vitro nucleosome reconstitution experiments performed on linear plasmid DNA that demonstrate that, when coverage is deep enough, these reconstitution experiments are exquisitely reproducible and highly consistent. Our data also suggest that a coverage depth of 35X be maintained for maximal confidence when assaying nucleosome positions, but lower coverage levels may be generally sufficient. These coverage depth recommendations are sufficient in the experimental system and conditions used in this study, but may vary depending on the exact parameters used in other systems. PMID:25093869
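As a rough planning aid, a coverage-depth target such as the 35X mentioned above can be converted into a read count with the usual mean-coverage formula (reads × read length ÷ target length). This is a generic back-of-the-envelope sketch, not the procedure used in the study; the function names and the example numbers are assumptions.

```python
import math

def mean_coverage(n_reads, read_len, target_len):
    """Average number of reads covering each base of the target."""
    return n_reads * read_len / target_len

def reads_needed(depth, target_len, read_len):
    """Smallest read count whose mean coverage reaches the requested depth."""
    return math.ceil(depth * target_len / read_len)

# Hypothetical example: a 10 kb linearized plasmid and 100-base fragments.
print(reads_needed(35, 10_000, 100))
```

The formula gives an average only; local coverage still varies from position to position, so some margin above the computed read count is usually sensible.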
NASA Astrophysics Data System (ADS)
Feng, Xiao-Li; Li, Yu-Xiao; Gu, Jian-Zhong; Zhuo, Yi-Zhong
2009-10-01
The relaxation property of both Eigen model and Crow-Kimura model with a single peak fitness landscape is studied from phase transition point of view. We first analyze the eigenvalue spectra of the replication mutation matrices. For sufficiently long sequences, the almost crossing point between the largest and second-largest eigenvalues locates the error threshold at which critical slowing down behavior appears. We calculate the critical exponent in the limit of infinite sequence lengths and compare it with the result from numerical curve fittings at sufficiently long sequences. We find that for both models the relaxation time diverges with exponent 1 at the error (mutation) threshold point. Results obtained from both methods agree quite well. From the unlimited correlation length feature, the first order phase transition is further confirmed. Finally with linear stability theory, we show that the two model systems are stable for all ranges of mutation rate. The Eigen model is asymptotically stable in terms of mutant classes, and the Crow-Kimura model is completely stable.
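For orientation, the error threshold on a single-peak landscape can be estimated with the standard textbook approximation that neglects back mutation to the master sequence. This is not the eigenvalue analysis described in the abstract; the symbols A (fitness of the master sequence relative to all others), u (per-site error rate) and L (sequence length) are introduced here as assumptions.

```python
def copy_fidelity(u, L):
    """Probability Q that a sequence of length L is copied without error."""
    return (1.0 - u) ** L

def master_fraction(A, u, L):
    """Stationary fraction of the master sequence, neglecting back mutation:
    x = (A*Q - 1) / (A - 1), clipped at zero beyond the threshold."""
    Q = copy_fidelity(u, L)
    return max(0.0, (A * Q - 1.0) / (A - 1.0))

def error_threshold(A, L):
    """Per-site error rate at which A*Q = 1; roughly ln(A)/L for large L."""
    return 1.0 - A ** (-1.0 / L)
```

In this approximation the master fraction falls to zero at the threshold, which is the regime in which the abstract reports the divergence of the relaxation time.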
Mutual information identifies spurious Hurst phenomena in resting state EEG and fMRI data
NASA Astrophysics Data System (ADS)
von Wegner, Frederic; Laufs, Helmut; Tagliazucchi, Enzo
2018-02-01
Long-range memory in time series is often quantified by the Hurst exponent H, a measure of the signal's variance across several time scales. We analyze neurophysiological time series from electroencephalography (EEG) and functional magnetic resonance imaging (fMRI) resting state experiments with two standard Hurst exponent estimators and with the time-lagged mutual information function applied to discretized versions of the signals. A confidence interval for the mutual information function is obtained from surrogate Markov processes with equilibrium distribution and transition matrix identical to the underlying signal. For EEG signals, we construct an additional mutual information confidence interval from a short-range correlated, tenth-order autoregressive model. We reproduce the previously described Hurst phenomenon (H > 0.5) in the analytical amplitude of alpha frequency band oscillations, in EEG microstate sequences, and in fMRI signals, but we show that the Hurst phenomenon occurs without long-range memory in the information-theoretical sense. We find that the mutual information function of neurophysiological data behaves differently from fractional Gaussian noise (fGn), for which the Hurst phenomenon is a sufficient condition to prove long-range memory. Two other well-characterized, short-range correlated stochastic processes (Ornstein-Uhlenbeck, Cox-Ingersoll-Ross) also yield H > 0.5, whereas their mutual information functions lie within the Markovian confidence intervals, similar to neural signals. In these processes, which do not have long-range memory by construction, a spurious Hurst phenomenon occurs due to slow relaxation times and heteroscedasticity (time-varying conditional variance). In summary, we find that mutual information correctly distinguishes long-range from short-range dependence in the theoretical and experimental cases discussed.
Our results also suggest that the stationary fGn process is not sufficient to describe neural data, which seem to belong to a more general class of stochastic processes, in which multiscale variance effects produce Hurst phenomena without long-range dependence. In our experimental data, the Hurst phenomenon and long-range memory appear as different system properties that should be estimated and interpreted independently.
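The time-lagged mutual information function of a discretized signal can be sketched as a plug-in estimate from empirical pair frequencies. The function name is hypothetical, and the surrogate-based confidence intervals described in the abstract are not reproduced here.

```python
import math
from collections import Counter

def lagged_mutual_information(symbols, lag):
    """Plug-in estimate of I(X_t; X_{t+lag}) in bits for a discrete sequence."""
    if lag < 1 or lag >= len(symbols):
        raise ValueError("lag must satisfy 1 <= lag < len(symbols)")
    pairs = list(zip(symbols[:-lag], symbols[lag:]))
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)    # marginal of X_t
    right = Counter(b for _, b in pairs)   # marginal of X_{t+lag}
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts.
        mi += (count / n) * math.log2(count * n / (left[a] * right[b]))
    return mi
```

Evaluating the function over a range of lags gives the decay curve that would then be compared against surrogate data; plug-in estimates are biased upward for short sequences, so the comparison with surrogates matters more than the absolute values.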
Mutations that Cause Human Disease: A Computational/Experimental Approach
DOE Office of Scientific and Technical Information (OSTI.GOV)
Beernink, P; Barsky, D; Pesavento, B
International genome sequencing projects have produced billions of nucleotides (letters) of DNA sequence data, including the complete genome sequences of 74 organisms. These genome sequences have created many new scientific opportunities, including the ability to identify sequence variations among individuals within a species. These genetic differences, which are known as single nucleotide polymorphisms (SNPs), are particularly important in understanding the genetic basis for disease susceptibility. Since the report of the complete human genome sequence, over two million human SNPs have been identified, including a large-scale comparison of an entire chromosome from twenty individuals. Of the protein coding SNPs (cSNPs), approximately half lead to a single amino acid change in the encoded protein (non-synonymous coding SNPs). Most of these changes are functionally silent, while the remainder negatively impact the protein and sometimes cause human disease. To date, over 550 SNPs have been found to cause single locus (monogenic) diseases and many others have been associated with polygenic diseases. SNPs have been linked to specific human diseases, including late-onset Parkinson disease, autism, rheumatoid arthritis and cancer. The ability to predict accurately the effects of these SNPs on protein function would represent a major advance toward understanding these diseases. To date several attempts have been made toward predicting the effects of such mutations. The most successful of these is a computational approach called "Sorting Intolerant From Tolerant" (SIFT). This method uses sequence conservation among many similar proteins to predict which residues in a protein are functionally important. However, this method suffers from several limitations. First, a query sequence must have a sufficient number of relatives to infer sequence conservation.
Second, this method does not make use of or provide any information on protein structure, which can be used to understand how an amino acid change affects the protein. The experimental methods that provide the most detailed structural information on proteins are X-ray crystallography and NMR spectroscopy. However, these methods are labor intensive and currently cannot be carried out on a genomic scale. Nonetheless, Structural Genomics projects are being pursued by more than a dozen groups and consortia worldwide and as a result the number of experimentally determined structures is rising exponentially. Based on the expectation that protein structures will continue to be determined at an ever-increasing rate, reliable structure prediction schemes will become increasingly valuable, leading to information on protein function and disease for many different proteins. Given known genetic variability and experimentally determined protein structures, can we accurately predict the effects of single amino acid substitutions? An objective assessment of this question would involve comparing predicted and experimentally determined structures, which thus far has not been rigorously performed. The completed research leveraged existing expertise at LLNL in computational and structural biology, as well as significant computing resources, to address this question.
Singh, Kumar Saurabh; Thual, Dominique; Spurio, Roberto; Cannata, Nicola
2015-01-01
One of the most crucial characteristics of day-to-day laboratory information management is the collection, storage and retrieval of information about research subjects and environmental or biomedical samples. An efficient link between sample data and experimental results is essential for the successful outcome of a collaborative project. Currently available software solutions are largely limited to large-scale, expensive commercial Laboratory Information Management Systems (LIMS). Acquiring such a LIMS can indeed bring laboratory information management to a higher level, but most of the time this requires a considerable investment of money, time and technical effort. There is a clear need for a lightweight open source system which can easily be managed on local servers and handled by individual researchers. Here we present SaDA, a software tool for storing, retrieving and analyzing data originating from microorganism monitoring experiments. SaDA is fully integrated in the management of environmental samples, oligonucleotide sequences, microarray data and the subsequent downstream analysis procedures. It is simple, generic software that can be extended and customized for various environmental and biomedical studies. PMID:26047146
Influence of gag and RRE Sequences on HIV-1 RNA Packaging Signal Structure and Function.
Kharytonchyk, Siarhei; Brown, Joshua D; Stilger, Krista; Yasin, Saif; Iyer, Aishwarya S; Collins, John; Summers, Michael F; Telesnitsky, Alice
2018-07-06
The packaging signal (Ψ) and Rev-responsive element (RRE) enable unspliced HIV-1 RNAs' export from the nucleus and packaging into virions. For some retroviruses, engrafting Ψ onto a heterologous RNA is sufficient to direct encapsidation. In contrast, HIV-1 RNA packaging requires 5' leader Ψ elements plus poorly defined additional features. We previously defined minimal 5' leader sequences competitive with intact Ψ for HIV-1 packaging, and here examined the potential roles of additional downstream elements. The findings confirmed that together, HIV-1 5' leader Ψ sequences plus a nuclear export element are sufficient to specify packaging. However, RNAs trafficked using a heterologous export element did not compete well with RNAs using HIV-1's RRE. Furthermore, some RNA additions to well-packaged minimal vectors rendered them packaging-defective. These defects were rescued by extending gag sequences in their native context. To understand these packaging defects' causes, in vitro dimerization properties of RNAs containing minimal packaging elements were compared to RNAs with sequence extensions that were or were not compatible with packaging. In vitro dimerization was found to correlate with packaging phenotypes, suggesting that HIV-1 evolved to prevent 5' leader residues' base pairing with downstream residues and misfolding of the packaging signal. Our findings explain why gag sequences have been implicated in packaging and show that RRE's packaging contributions appear more specific than nuclear export alone. Paired with recent work showing that sequences upstream of Ψ can dictate RNA folds, the current work explains how genetic context of minimal packaging elements contributes to HIV-1 RNA fate determination. Copyright © 2018 Elsevier Ltd. All rights reserved.
Atibalentja, N; Noel, G R; Ciancio, A
2004-03-01
For many years the taxonomy of the genus Pasteuria has been marred by confusion because the bacterium could not be cultured in vitro and, therefore, descriptions were based solely on morphological, developmental, and pathological characteristics. The current study sought to devise a simple method for PCR amplification, cloning, and sequencing of Pasteuria 16S rDNA from small numbers of endospores, with no need for prior DNA purification. Results show that DNA extracts from plain glass bead-beating of crude suspensions containing 10,000 endospores at 0.2 x 10 endospores ml(-1) were sufficient for PCR amplification of Pasteuria 16S rDNA when used in conjunction with specific primers. These results imply that for P. penetrans and P. nishizawae only one parasitized female of Meloidogyne spp. and Heterodera glycines, respectively, should be sufficient, and that as few as eight cadavers of Belonolaimus longicaudatus, with an average of 1,250 endospores of "Candidatus Pasteuria usgae" each, are needed for PCR amplification of Pasteuria 16S rDNA. The method described in this paper should facilitate the sequencing of the 16S rDNA of the many Pasteuria isolates that have been reported on nematodes and, consequently, expedite the classification of those isolates through comparative sequence analysis.
Non-rigid Motion Correction in 3D Using Autofocusing with Localized Linear Translations
Cheng, Joseph Y.; Alley, Marcus T.; Cunningham, Charles H.; Vasanawala, Shreyas S.; Pauly, John M.; Lustig, Michael
2012-01-01
MR scans are sensitive to motion effects due to the scan duration. To properly suppress artifacts from non-rigid body motion, complex models with elements such as translation, rotation, shear, and scaling have been incorporated into the reconstruction pipeline. However, these techniques are computationally intensive and difficult to implement for online reconstruction. On a sufficiently small spatial scale, the different types of motion can be well approximated as simple linear translations. This formulation allows for a practical autofocusing algorithm that locally minimizes a given motion metric – more specifically, the proposed localized gradient-entropy metric. To reduce the vast search space for an optimal solution, possible motion paths are limited to the motion measured from multi-channel navigator data. The novel navigation strategy is based on the so-called “Butterfly” navigators, which are modifications to the spin-warp sequence that provide intrinsic translational motion information with negligible overhead. With a 32-channel abdominal coil, a sufficient number of motion measurements was found to approximate possible linear motion paths for every image voxel. The correction scheme was applied to free-breathing abdominal patient studies. In these scans, a reduction in artifacts from complex, non-rigid motion was observed. PMID:22307933
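As a rough illustration of the kind of metric an autofocusing scheme can minimize, the sketch below computes a plain (global) gradient entropy for a small 2D image. It is not the authors' localized metric or reconstruction code; the function name `gradient_entropy`, the forward-difference gradient, and the toy images are assumptions chosen for brevity.

```python
import math

def gradient_entropy(image):
    """Entropy of the normalized gradient-magnitude distribution of a 2D image.

    `image` is a list of equal-length rows of numbers. Lower values mean the
    gradient energy is concentrated in fewer pixels (sharper edges).
    Hypothetical helper, not taken from the paper.
    """
    rows, cols = len(image), len(image[0])
    magnitudes = []
    for r in range(rows):
        for c in range(cols):
            # Forward differences; set to zero at the far border.
            dx = image[r][c + 1] - image[r][c] if c + 1 < cols else 0.0
            dy = image[r + 1][c] - image[r][c] if r + 1 < rows else 0.0
            magnitudes.append(math.hypot(dx, dy))
    total = sum(magnitudes)
    if total == 0:
        return 0.0  # flat image: no gradient energy at all
    entropy = 0.0
    for m in magnitudes:
        if m > 0:
            p = m / total
            entropy -= p * math.log(p)
    return entropy

# Toy images: a sharp vertical edge and the same edge smeared over four pixels.
sharp = [[0, 0, 0, 1, 1, 1] for _ in range(4)]
blurred = [[0, 0.25, 0.5, 0.75, 1, 1] for _ in range(4)]
print(gradient_entropy(sharp) < gradient_entropy(blurred))  # True
```

In an autofocusing loop, a candidate motion correction would be kept when it lowers such a metric in the image region under consideration.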
Michel, Audrey M.; Ahern, Anna M.; Donohue, Claire A.
2015-01-01
The boundaries of protein coding sequences are more difficult to define at the 5′ end than at the 3′ end due to potential multiple translation initiation sites (TISs). Even in the presence of phylogenetic data, the use of sequence information only may not be sufficient for the accurate identification of TISs. Traditional proteomics approaches may also fail because the N‐termini of newly synthesized proteins are often processed. Thus ribosome profiling (ribo‐seq), producing a snapshot of the ribosome distribution across the entire transcriptome, is an attractive experimental technique for the purpose of TIS location exploration. The GWIPS‐viz (Genome Wide Information on Protein Synthesis visualized) browser (http://gwips.ucc.ie) provides free access to the genomic alignments of ribo‐seq data and corresponding mRNA‐seq data along with relevant annotation tracks. In this brief, we illustrate how GWIPS‐viz can be used to explore the ribosome occupancy at the 5′ ends of protein coding genes to assess the activity of AUG and non‐AUG TISs responsible for the synthesis of proteoforms with alternative or heterogeneous N‐termini. The presence of ribo‐seq tracks for various organisms allows for cross‐species comparison of orthologous genes and the availability of datasets from multiple laboratories permits the assessment of the technical reproducibility of the ribosome densities. PMID:25736862
Discovery of the "RNA continent" through a contrarian's research strategy.
Hayashizaki, Yoshihide
2011-01-01
The International Human Genome Sequencing Consortium completed the decoding of the human genome sequence in 2003. Readers will be aware of the paradigm shift which has occurred since then in the field of life science research. At last, mankind has been able to focus on a complete picture of the full extent of the genome, on which is recorded the basic information that controls all life. Meanwhile, another genome project, centered on Japan and known as the mouse genome encyclopedia project, was progressing with participation from around the world. Led by our research group at RIKEN, it was a full-length cDNA project which aimed to decode the whole RNA (transcriptome) using the mouse as a model. The basic information that controls all life is recorded on the genome, but in order to obtain a complete picture of this extensive information, the decoding of the genome alone is far from sufficient. These two genome projects established that the number of letters in the genome, which is the blueprint of life, is finite, that the number of RNA molecules derived from it is also finite, and that the number of protein molecules derived from the RNA is probably finite too. A massive number of combinations is still involved, but we are now able to understand one section of the network formed by these data. Once an object of study has been understood to be finite, establishing an image of the whole is certain to lead us to an understanding of the whole. Omics is an approach that views the information controlling life as finite and seeks to assemble and analyze it as a whole. Here, I would like to present our transcriptome research while making reference to our unique research strategy.
Kalaycioglu, Atila T; Baykal, Atakan; Guldemir, Dilek; Bakkaloglu, Zekiye; Korukluoglu, Gulay; Coskun, Aslihan; Torunoglu, Mehmet Ali; Ertek, Mustafa; Durmaz, Riza
2013-12-01
Genetic characterization of measles viruses (MVs) combined with acquisition of epidemiologic information is essential for measles surveillance programs used in determining transmission pathways. This study describes the molecular characterization of 26 MV strains (3 from 2010, 23 from 2011) obtained from urine or throat swabs harvested from patients in Turkey. MV RNA samples (n = 26) were subjected to sequence analysis of 450 nucleotides comprising the most variable C-terminal region of the nucleoprotein (N) gene. Phylogenetic analysis revealed 20 strains from 2011 belonged to genotype D9, 3 to D4, 2 strains from 2010 to genotype D4 and 1 to genotype B3. This study represents the first report describing the involvement of MV genotype D9 in an outbreak in Turkey. The sequence of the majority of genotype D9 strains was identical to those identified in Russia, Malaysia, Japan, and the UK. Despite lack of sufficient epidemiologic information, the presence of variants observed following phylogenetic analysis suggested that exposure to genotype D9 might have occurred due to importation more than once. Phylogenetic analysis of five genotype D4 strains revealed the presence of four variants. Epidemiological information and phylogenetic analysis suggested that three genotype D4 strains and one genotype B3 strain were associated with importation. This study suggests the presence of pockets of unimmunized individuals making Turkey susceptible to outbreaks. Continuing molecular surveillance of measles strains in Turkey is essential as a means of acquiring epidemiologic information to define viral transmission patterns and determine the effectiveness of measles vaccination programs designed to eliminate this virus. © 2013 Wiley Periodicals, Inc.
Kress, W John; Erickson, David L
2007-06-06
A useful DNA barcode requires sufficient sequence variation to distinguish between species and ease of application across a broad range of taxa. Discovery of a DNA barcode for land plants has been limited by intrinsically lower rates of sequence evolution in plant genomes than that observed in animals. This low rate has complicated the trade-off in finding a locus that is universal and readily sequenced and has sufficiently high sequence divergence at the species-level. Here, a global plant DNA barcode system is evaluated by comparing universal application and degree of sequence divergence for nine putative barcode loci, including coding and non-coding regions, singly and in pairs across a phylogenetically diverse set of 48 genera (two species per genus). No single locus could discriminate among species in a pair in more than 79% of genera, whereas discrimination increased to nearly 88% when the non-coding trnH-psbA spacer was paired with one of three coding loci, including rbcL. In silico trials were conducted in which DNA sequences from GenBank were used to further evaluate the discriminatory power of a subset of these loci. These trials supported the earlier observation that trnH-psbA coupled with rbcL can correctly identify and discriminate among related species. A combination of the non-coding trnH-psbA spacer region and a portion of the coding rbcL gene is recommended as a two-locus global land plant barcode that provides the necessary universality and species discrimination.
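The comparison of single loci against locus pairs can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: sequences are taken to be pre-aligned, "discrimination" is simplified to one or more differing sites between the two species of a genus, and all names and toy sequences are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # skip gaps and ambiguous bases
        compared += 1
        differing += a != b
    return differing / compared if compared else 0.0

def discrimination_rate(pairs_by_genus):
    """Fraction of genera whose two species differ at one or more sites."""
    hits = sum(1 for a, b in pairs_by_genus.values() if p_distance(a, b) > 0)
    return hits / len(pairs_by_genus)

def concatenate(locus_1, locus_2):
    """Join two loci per genus to mimic a two-locus barcode."""
    return {g: (locus_1[g][0] + locus_2[g][0], locus_1[g][1] + locus_2[g][1])
            for g in locus_1}

# Made-up data: four genera, two species each, one coding and one spacer locus.
coding = {"A": ("ATGGCC", "ATGGCC"), "B": ("ATGGCC", "ATGGCA"),
          "C": ("ATGGCC", "ATGGCC"), "D": ("ATGGCC", "ATGACC")}
spacer = {"A": ("TTAAGG", "TTAAGC"), "B": ("TTAAGG", "TTAAGG"),
          "C": ("TTAAGG", "TTAAGG"), "D": ("TTAAGG", "TAAAGG")}

print(discrimination_rate(coding))                       # 0.5
print(discrimination_rate(spacer))                       # 0.5
print(discrimination_rate(concatenate(coding, spacer)))  # 0.75
```

The toy numbers only show why pairing loci can raise the discrimination rate: a pair succeeds whenever either member locus separates the two species.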
Tertiary alphabet for the observable protein structural universe.
Mackenzie, Craig O; Zhou, Jianfu; Grigoryan, Gevorg
2016-11-22
Here, we systematically decompose the known protein structural universe into its basic elements, which we dub tertiary structural motifs (TERMs). A TERM is a compact backbone fragment that captures the secondary, tertiary, and quaternary environments around a given residue, comprising one or more disjoint segments (three on average). We seek the set of universal TERMs that capture all structure in the Protein Data Bank (PDB), finding remarkable degeneracy. Only ∼600 TERMs are sufficient to describe 50% of the PDB at sub-Angstrom resolution. However, more rare geometries also exist, and the overall structural coverage grows logarithmically with the number of TERMs. We go on to show that universal TERMs provide an effective mapping between sequence and structure. We demonstrate that TERM-based statistics alone are sufficient to recapitulate close-to-native sequences given either NMR or X-ray backbones. Furthermore, sequence variability predicted from TERM data agrees closely with evolutionary variation. Finally, locations of TERMs in protein chains can be predicted from sequence alone based on sequence signatures emergent from TERM instances in the PDB. For multisegment motifs, this method identifies spatially adjacent fragments that are not contiguous in sequence, a major bottleneck in structure prediction. Although all TERMs recur in diverse proteins, some appear specialized for certain functions, such as interface formation, metal coordination, or even water binding. Structural biology has benefited greatly from previously observed degeneracies in structure. The decomposition of the known structural universe into a finite set of compact TERMs offers exciting opportunities toward better understanding, design, and prediction of protein structure.
2013-01-01
Background Longan is a tropical/subtropical fruit tree of great economic importance in Southeast Asia. Progress in understanding molecular mechanisms of longan embryogenesis, which is the primary influence on fruit quality and yield, is slowed by lack of transcriptomic and genomic information. Illumina second-generation sequencing is suitable for generating enormous numbers of transcript sequences that can be used for functional genomic analysis of longan. Results In this study, a longan embryogenic callus (EC) cDNA library was sequenced using an Illumina HiSeq 2000 system. A total of 64,876,258 clean reads comprising 5.84 Gb of nucleotides were assembled into 68,925 unigenes of 448-bp mean length, with unigenes ≥1000 bp accounting for 8.26% of the total. Using BLASTx, 40,634 unigenes were found to have significant similarity with accessions in Nr and Swiss-Prot databases. Of these, 38,845 unigenes were assigned to 43 GO sub-categories and 17,118 unigenes were classified into 25 COG sub-groups. In addition, 17,306 unigenes mapped to 199 KEGG pathways, with the categories of Metabolic pathways, Plant-pathogen interaction, Biosynthesis of secondary metabolites, and Genetic information processing being well represented. Analyses of unigenes ≥1000 bp revealed 328 embryogenesis-related unigenes as well as numerous unigenes expressed in EC associated with functions of reproductive growth, such as flowering, gametophytogenesis, and fertility, and vegetative growth, such as root and shoot growth. Furthermore, 23 unigenes related to embryogenesis and reproductive and vegetative growth were validated by quantitative real time PCR (qPCR) in samples from different stages of longan somatic embryogenesis (SE); their differential expression in the various embryogenic cultures indicated their possible roles in longan SE.
Conclusions The quantity and variety of expressed EC genes identified in this study is sufficient to serve as a global transcriptome dataset for longan EC and to provide more molecular resources for longan functional genomics. PMID:23957614
Qiu, Ping; Stevens, Richard; Wei, Bo; Lahser, Fred; Howe, Anita Y. M.; Klappenbach, Joel A.; Marton, Matthew J.
2015-01-01
Genotyping of hepatitis C virus (HCV) plays an important role in the treatment of HCV. As new genotype-specific treatment options become available, it has become increasingly important to have accurate HCV genotype and subtype information to ensure that the most appropriate treatment regimen is selected. Most current genotyping methods are unable to detect mixed genotypes from two or more HCV infections. Next generation sequencing (NGS) allows for rapid and low cost mass sequencing of viral genomes and provides an opportunity to probe the viral population from a single host. In this paper, the possibility of using short NGS reads for direct HCV genotyping without genome assembly was evaluated. We surveyed the publicly-available genetic content of three HCV drug target regions (NS3, NS5A, NS5B) in terms of whether these genes contained genotype-specific regions that could predict genotype. Six genotypes and 38 subtypes were included in this study. An automated phylogenetic analysis based HCV genotyping method was implemented and used to assess different HCV target gene regions. Candidate regions of 250-bp each were found for all three genes that have enough genetic information to predict HCV genotypes/subtypes. Validation using public datasets shows 100% genotyping accuracy. To test whether these 250-bp regions were sufficient to identify mixed genotypes, we developed a random primer-based method to sequence HCV plasma samples containing mixtures of two HCV genotypes in different ratios. We were able to determine the genotypes without ambiguity and to quantify the ratio of the abundances of the mixed genotypes in the samples. These data provide a proof-of-concept that this random primed, NGS-based short-read genotyping approach does not need prior information about the viral population and is capable of detecting mixed viral infection. PMID:25830316
2016-01-01
Modern imaging techniques, increased simulation capabilities and extended theoretical frameworks naturally drive the development of multiscale modelling by the question: which new information should be considered? Given the need for concise constitutive relationships and efficient data evaluation, however, one important question is often neglected: which information is sufficient? For this reason, this work introduces the formalized criterion of subscale sufficiency. This criterion states whether a chosen constitutive relationship transfers all necessary information from micro to macroscale within a multiscale framework. It further provides a scheme to improve constitutive relationships. Direct application to static capillary pressure demonstrates usefulness and conditions for subscale sufficiency of saturation and interfacial areas. PMID:27279769
Information encoded in non-native states drives substrate-chaperone pairing.
Mapa, Koyeli; Tiwari, Satyam; Kumar, Vignesh; Jayaraj, Gopal Gunanathan; Maiti, Souvik
2012-09-05
Many proteins refold in vitro through kinetic folding intermediates that are believed to be by-products of native-state centric evolution. These intermediates are postulated to play only minor roles, if any, in vivo because they lack any information related to translation-associated vectorial folding. We demonstrate that a refolding intermediate of a test protein, generated in vitro, is able to find its cognate chaperone from the whole complement of Escherichia coli soluble chaperones. Cognate chaperone binding uniquely alters the conformation of the non-native substrate. Importantly, precise chaperone targeting of substrates is maintained as long as physiological molar ratios of chaperones remain unaltered. Using a library of different chaperone substrates, we demonstrate that kinetically trapped refolding intermediates contain sufficient structural features for precise targeting to cognate chaperones. We posit that evolution favors sequences that, in addition to coding for a functional native state, encode folding intermediates with higher affinity for cognate chaperones than noncognate ones. Copyright © 2012 Elsevier Ltd. All rights reserved.
Joy, Nisha; Asha, Srinivasan; Mallika, Vijayan; Soniya, Eppurathu Vasudevan
2013-01-01
Next generation sequencing offers an advantage for the transformational development of species with limited available sequence data, as it helps to decode the genome and transcriptome. We carried out de novo sequencing using the Illumina HiSeq™ 2000 to generate the first leaf transcriptome of black pepper (Piper nigrum L.), an important spice variety native to South India and also grown in other tropical regions. Despite the economic and biochemical importance of pepper, a scientifically rigorous study at the molecular level is far from complete due to the lack of sufficient sequence information and the cytological complexity of its genome. The 55 million raw reads obtained, when assembled using the Trinity program, generated 223,386 contigs and 128,157 unigenes. Reports suggest that repeat-rich genomic regions give rise to small non-coding functional RNAs. MicroRNAs (miRNAs) are the most abundant type of non-coding regulatory RNAs. In spite of the widespread research on miRNAs, little is known about the hairpin precursors of miRNAs bearing Simple Sequence Repeats (SSRs). We used the array of transcripts generated for the in silico prediction and detection of ‘43 pre-miRNA candidates bearing different types of SSR motifs’. The analysis identified 3913 different types of SSR motifs with an average of one SSR per 3.04 MB of the transcriptome. About 0.033% of the transcriptome constituted ‘pre-miRNA candidates bearing SSRs’. The abundance, type and distribution of SSR motifs studied across the hairpin miRNA precursors showed a significant bias in the position of SSRs towards the downstream end of predicted ‘pre-miRNA candidates’. The catalogue of transcripts identified, together with the demonstration of reliable existence of SSRs in the miRNA precursors, opens future opportunities for understanding the genetic mechanism of black pepper and the likely functions of ‘tandem repeats’ in miRNAs. PMID:23469176
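In silico SSR detection of the kind mentioned above is often done by scanning sequences for perfect tandem repeats. The following is a minimal sketch under stated assumptions, not the tool or parameters used in the study: the function name `find_ssrs` and the minimum repeat counts per motif length are illustrative choices.

```python
import re

def find_ssrs(sequence, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    `min_repeats` maps motif length to the minimum number of tandem copies;
    the defaults below are illustrative assumptions, not the study's settings.
    """
    if min_repeats is None:
        min_repeats = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
    sequence = sequence.upper()
    hits = []
    for unit_len, min_count in sorted(min_repeats.items()):
        # One captured motif followed by (min_count - 1) or more copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_count - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT").
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)

# Toy transcript with an (AT)7 and an (AAG)5 repeat.
transcript = "GGC" + "AT" * 7 + "CCT" + "AAG" * 5 + "T"
print(find_ssrs(transcript))  # [(3, 'AT', 7), (20, 'AAG', 5)]
```

Positions are zero-based. Comparing hit positions against predicted hairpin coordinates would be one way to examine positional bias, but that step is not shown here.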
Hawkins, Troy; Chitale, Meghana; Luban, Stanislav; Kihara, Daisuke
2009-02-15
Protein function prediction is a central problem in bioinformatics, increasing in importance recently due to the rapid accumulation of biological data awaiting interpretation. Sequence data represents the bulk of this new stock and is the obvious target for consideration as input, as newly sequenced organisms often lack any other type of biological characterization. We have previously introduced PFP (Protein Function Prediction) as our sequence-based predictor of Gene Ontology (GO) functional terms. PFP interprets the results of a PSI-BLAST search by extracting and scoring individual functional attributes, searching a wide range of E-value sequence matches, and utilizing conventional data mining techniques to fill in missing information. We have shown it to be effective in predicting both specific and low-resolution functional attributes when sufficient data is unavailable. Here we describe (1) significant improvements to the PFP infrastructure, including the addition of prediction significance and confidence scores, (2) a thorough benchmark of performance and comparisons to other related prediction methods, and (3) applications of PFP predictions to genome-scale data. We applied PFP predictions to uncharacterized protein sequences from 15 organisms. Among these sequences, 60-90% could be annotated with a GO molecular function term at high confidence (≥80%). We also applied our predictions to the protein-protein interaction network of the Malaria plasmodium (Plasmodium falciparum). High confidence GO biological process predictions (≥90%) from PFP increased the number of fully enriched interactions in this dataset from 23% of interactions to 94%. Our benchmark comparison shows significant performance improvement of PFP relative to GOtcha, InterProScan, and PSI-BLAST predictions. This is consistent with the performance of PFP as the overall best predictor in both the AFP-SIG '05 and CASP7 function (FN) assessments.
PFP is available as a web service at http://dragon.bio.purdue.edu/pfp/. (c) 2008 Wiley-Liss, Inc.
Transforming clinical microbiology with bacterial genome sequencing.
Didelot, Xavier; Bowden, Rory; Wilson, Daniel J; Peto, Tim E A; Crook, Derrick W
2012-09-01
Whole-genome sequencing of bacteria has recently emerged as a cost-effective and convenient approach for addressing many microbiological questions. Here, we review the current status of clinical microbiology and how it has already begun to be transformed by using next-generation sequencing. We focus on three essential tasks: identifying the species of an isolate, testing its properties, such as resistance to antibiotics and virulence, and monitoring the emergence and spread of bacterial pathogens. We predict that the application of next-generation sequencing will soon be sufficiently fast, accurate and cheap to be used in routine clinical microbiology practice, where it could replace many complex current techniques with a single, more efficient workflow.
Transforming clinical microbiology with bacterial genome sequencing
2016-01-01
Whole genome sequencing of bacteria has recently emerged as a cost-effective and convenient approach for addressing many microbiological questions. Here we review the current status of clinical microbiology and how it has already begun to be transformed by the use of next-generation sequencing. We focus on three essential tasks: identifying the species of an isolate, testing its properties such as resistance to antibiotics and virulence, and monitoring the emergence and spread of bacterial pathogens. The application of next-generation sequencing will soon be sufficiently fast, accurate and cheap to be used in routine clinical microbiology practice, where it could replace many complex current techniques with a single, more efficient workflow. PMID:22868263
Foerster, Rebecca M
2018-03-01
Before acting, humans saccade to a target object to extract relevant visual information. Even when acting on remembered objects, locations previously occupied by relevant objects are fixated during imagery and memory tasks - a phenomenon called "looking-at-nothing". While looking-at-nothing was robustly found in tasks encouraging declarative memory build-up, results are mixed in the case of procedural sensorimotor tasks. Eye-guidance to manual targets in complete darkness was observed in a task practiced for days beforehand, while investigations using only a single session did not find fixations to remembered action targets. Here, it is asked whether looking-at-nothing can be found in a single sensorimotor session, and thus independent of sleep consolidation, and how it progresses when visual information is repeatedly unavailable. Eye movements were investigated in a computerized version of the trail making test. Participants clicked on numbered circles in ascending sequence. Fifty trials were performed with the same spatial arrangement of 9 visual targets to enable long-term memory consolidation. During 50 consecutive trials, participants had to click the remembered target sequence on an empty screen. Participants scanned the visual targets and also the empty target locations sequentially with their eyes, although the latter less precisely than the former. Over the course of the memory trials, manual and oculomotor sequential target scanning became more similar to the visual trials. Results argue for robust looking-at-nothing during procedural sensorimotor tasks provided that long-term memory information is sufficient. Copyright © 2018 Elsevier Ltd. All rights reserved.
Pickard, Mark R.; Williams, Gwyn T.
2016-01-01
Growth arrest-specific 5 (GAS5) lncRNA promotes apoptosis, and its expression is down-regulated in breast cancer. GAS5 lncRNA is a decoy of glucocorticoid/related receptors; a stem-loop sequence constitutes the GAS5 hormone response element mimic (HREM), which is essential for the regulation of breast cancer cell apoptosis. This preclinical study aimed to determine if the GAS5 HREM sequence alone promotes the apoptosis of breast cancer cells. Nucleofection of hormone-sensitive and -insensitive breast cancer cell lines with a GAS5 HREM DNA oligonucleotide increased both basal and ultraviolet-C-induced apoptosis, and decreased culture viability and clonogenic growth, similar to GAS5 lncRNA. The HREM oligonucleotide demonstrated similar sequence specificity to the native HREM for its functional activity and had no effect on endogenous GAS5 lncRNA levels. Certain chemically modified HREM oligonucleotides, notably DNA and RNA phosphorothioates, retained pro-apoptotic activity. Crucially, the HREM oligonucleotide could overcome apoptosis resistance secondary to deficient endogenous GAS5 lncRNA levels. Thus, the GAS5 lncRNA HREM sequence alone is sufficient to induce apoptosis in breast cancer cells, including triple-negative breast cancer cells. These findings further suggest that emerging knowledge of structure/function relationships in the field of lncRNA biology can be exploited for the development of entirely novel, oligonucleotide mimic-based, cancer therapies. PMID:26862727
Tataru, Paula; Hobolth, Asger
2011-12-05
Continuous-time Markov chains (CTMCs) are a widely used model for describing the evolution of DNA sequences on the nucleotide, amino acid or codon level. The sufficient statistics for CTMCs are the time spent in a state and the number of changes between any two states. In applications, past evolutionary events (exact times and types of changes) are inaccessible and the past must be inferred from DNA sequence data observed in the present. We describe and implement three algorithms for computing linear combinations of expected values of the sufficient statistics, conditioned on the end-points of the chain, and compare their performance with respect to accuracy and running time. The first algorithm is based on an eigenvalue decomposition of the rate matrix (EVD), the second on uniformization (UNI), and the third on integrals of matrix exponentials (EXPM). The implementation in R of the algorithms is available at http://www.birc.au.dk/~paula/. We use two different models to analyze the accuracy and eight experiments to investigate the speed of the three algorithms. We find that they have similar accuracy and that EXPM is the slowest method. Furthermore, we find that UNI is usually faster than EVD.
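As a rough illustration of the uniformization idea named above, the following Python sketch approximates the transition matrix P(t) = exp(Qt) of a CTMC as a Poisson-weighted sum of powers of a discrete-time transition matrix. It is not the authors' R implementation and it does not compute the conditional expectations of the sufficient statistics, which require additional sums over the same terms; the function name, the tolerance and the term limit are assumptions made for this example.

```python
import math

def transition_probs_uniformization(Q, t, tol=1e-12, max_terms=10000):
    """Approximate P(t) = exp(Q t) for a rate matrix Q (list of lists).

    Hypothetical helper for illustration only. Suitable for moderate
    values of mu * t; exp(-mu * t) underflows for very large values.
    """
    n = len(Q)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    mu = max(-Q[i][i] for i in range(n))  # uniformization rate
    if mu == 0.0:
        return identity  # no transitions are possible
    # R = I + Q / mu is the transition matrix of the uniformized chain.
    R = [[identity[i][j] + Q[i][j] / mu for j in range(n)] for i in range(n)]
    P = [[0.0] * n for _ in range(n)]
    Rk = identity                 # R^0
    weight = math.exp(-mu * t)    # Poisson probability of 0 jumps
    cumulative = 0.0
    for k in range(max_terms):
        for i in range(n):
            for j in range(n):
                P[i][j] += weight * Rk[i][j]
        cumulative += weight
        if cumulative >= 1.0 - tol:
            break                 # remaining Poisson mass is below tol
        # Advance to the next term: Poisson weight for k + 1 jumps, R^(k+1).
        weight *= mu * t / (k + 1)
        Rk = [[sum(Rk[i][m] * R[m][j] for m in range(n)) for j in range(n)]
              for i in range(n)]
    return P
```

For a symmetric two-state chain with unit rates, the result can be checked against the closed form P00(t) = 1/2 + exp(-2t)/2.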
Bill, Anke; Rosethorne, Elizabeth M; Kent, Toby C; Fawcett, Lindsay; Burchell, Lynn; van Diepen, Michiel T; Marelli, Anthony; Batalov, Sergey; Miraglia, Loren; Orth, Anthony P; Renaud, Nicole A; Charlton, Steven J; Gosling, Martin; Gaither, L Alex; Groot-Kormelink, Paul J
2014-01-01
The human prostacyclin receptor (hIP receptor) is a seven-transmembrane G protein-coupled receptor (GPCR) that plays a critical role in vascular smooth muscle relaxation and platelet aggregation. hIP receptor dysfunction has been implicated in numerous cardiovascular abnormalities, including myocardial infarction, hypertension, thrombosis and atherosclerosis. Genomic sequencing has discovered several genetic variations in the PTGIR gene coding for hIP receptor; however, its structure-function relationship has not been sufficiently explored. Here we set out to investigate the applicability of high throughput random mutagenesis to study the structure-function relationship of hIP receptor. While chemical mutagenesis was not suitable to generate a mutagenesis library with sufficient coverage, our data demonstrate error-prone PCR (epPCR) mediated mutagenesis as a valuable method for the unbiased screening of residues regulating hIP receptor function and expression. Here we describe the generation and functional characterization of an epPCR derived mutagenesis library comprising >4000 mutants of the hIP receptor. We introduce next generation sequencing as a useful tool to validate the quality of mutagenesis libraries by providing information about the coverage, mutation rate and mutational bias. We identified 18 mutants of the hIP receptor that were expressed at the cell surface, but demonstrated impaired receptor function. A total of 38 non-synonymous mutations were identified within the coding region of the hIP receptor, mapping to 36 distinct residues, including several mutations previously reported to affect the signaling of the hIP receptor. Thus, our data demonstrate epPCR mediated random mutagenesis as a valuable and practical method to study the structure-function relationship of GPCRs. PMID:24886841
Gao, Chunsheng; Xin, Pengfei; Cheng, Chaohua; Tang, Qing; Chen, Ping; Wang, Changbiao; Zang, Gonggu; Zhao, Lining
2014-01-01
Cannabis sativa L. is an important economic plant for the production of food, fiber, oils, and intoxicants. However, lack of sufficient simple sequence repeat (SSR) markers has limited the development of cannabis genetic research. Here, large-scale development of expressed sequence tag simple sequence repeat (EST-SSR) markers was performed to obtain more informative genetic markers, and to assess genetic diversity in cannabis (Cannabis sativa L.). Based on the cannabis transcriptome, 4,577 SSRs were identified from 3,624 ESTs. From there, a total of 3,442 complementary primer pairs were designed as SSR markers. Among these markers, trinucleotide repeat motifs (50.99%) were the most abundant, followed by hexanucleotide (25.13%), dinucleotide (16.34%), tetranucleotide (3.8%), and pentanucleotide (3.74%) repeat motifs, respectively. The AAG/CTT trinucleotide repeat (17.96%) was the most abundant motif detected in the SSRs. One hundred and seventeen EST-SSR markers were randomly selected to evaluate primer quality in 24 cannabis varieties. Among these 117 markers, 108 (92.31%) were successfully amplified and 87 (74.36%) were polymorphic. Forty-five polymorphic primer pairs were selected to evaluate genetic diversity and relatedness among the 115 cannabis genotypes. The results showed that 115 varieties could be divided into 4 groups primarily based on geography: Northern China, Europe, Central China, and Southern China. Moreover, the coefficient of similarity when comparing cannabis from Northern China with the European group cannabis was higher than that when comparing with cannabis from the other two groups, owing to a similar climate. This study outlines the first large-scale development of SSR markers for cannabis. These data may serve as a foundation for the development of genetic linkage, quantitative trait loci mapping, and marker-assisted breeding of cannabis. PMID:25329551
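To make the notion of an SSR concrete, the sketch below scans a DNA string for perfect tandem repeats of a fixed unit length using a regular expression. It is only an illustration: the abstract does not say which tool or thresholds were used to detect SSRs, so the function name, the minimum repeat count and the toy sequence are all assumptions.

```python
import re

def find_ssrs(seq, unit_len, min_repeats):
    """Return (start, unit, repeat_count) for each perfect tandem repeat.

    Hypothetical helper: looks for a unit of `unit_len` bases repeated
    at least `min_repeats` times in a row. Units made of a single base
    (e.g. "AAA") are skipped, since they are mononucleotide runs.
    """
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if len(set(unit)) > 1:
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return hits

# Toy example: a GAA repeat, which belongs to the AAG/CTT motif class
# because GAA is a cyclic rotation of AAG.
print(find_ssrs("GGAAGAAGAAGAAGTT", unit_len=3, min_repeats=4))
```

A fuller scan would loop over unit lengths 2 to 6 and merge rotations and reverse complements into motif classes such as AAG/CTT; that bookkeeping is omitted here.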
Seo, Sunhee; Kim, Og Yeon; Shim, Soonmi
2014-06-01
BACKGROUND/OBJECTIVES The purpose of this study is to identify how level of information affected intention, using the Theory of Planned Behavior. SUBJECTS/METHODS A survey was conducted in diverse community centers and shopping malls in Seoul, which yielded N = 209 datasets. To compare processed foods consumption behavior, we divided samples into two groups based on level of information about food additives (whether respondents felt that information on food additives was sufficient or not). We analyzed differences in attitudes toward food additives and toward purchasing processed foods, subjective norms, perceived behavioral control, and behavioral intentions toward processed foods between the sufficient-information group and the insufficient-information group. RESULTS The results confirmed that more than 78% of respondents thought information on food additives was insufficient. However, the group who felt information was sufficient had more positive attitudes about consuming processed foods and behavioral intentions than the group who thought information was inadequate. This study found that people who consider that they have sufficient information on food additives tend to have more positive attitudes toward processed foods and intention to consume processed foods. CONCLUSIONS This study suggests an increasing need for nutrition education on the appropriate use of processed foods. Designing useful nutrition education requires a good understanding of the factors that influence processed foods consumption. PMID:24944779
Production Scheduling of Sequenced Tapes for Printed Circuit Pack Assembly.
1987-07-09
detail. The subject matter of this thesis is inspired directly from their technical report. The goals of this research are twofold: 1) Test their...The subject matter of the following chapters describes a heuristic approach to another variation of the sequenced tape production scheduling problem...assignment problem, comprise the subject matter of Chapter 5. It is sufficient to note that the three definitions of the term common correspond to the
NASA Astrophysics Data System (ADS)
Slyusarchuk, Vasilii E.
2009-02-01
Necessary and sufficient conditions are found for the invertibility of the nonlinear difference operator $(\mathscr{R}x)(n)=H(x(n),x(n+1))$, $n\in\mathbb{Z}$, in the space of bounded two-sided number sequences. Here $H\colon \mathbb{R}^2\to \mathbb{R}$ is a continuous function. Bibliography: 29 titles.
Augmentation of machine structure to improve its diagnosability
NASA Technical Reports Server (NTRS)
Hsieh, L.
1973-01-01
Two methods of augmenting the structure of a sequential machine so that it is diagnosable are presented. The checkable (checking sequences) and repeated symbol distinguishing sequences (RDS) are discussed. It was found that as few as twice the number of outputs of the given machine is sufficient for constructing a state-output augmentation with RDS. Techniques for minimizing the number of states in resolving convergences and in resolving equivalent and nonreduced cycles are developed.
Photometric binary stars in Praesepe and the search for globular cluster binaries
NASA Technical Reports Server (NTRS)
Bolte, Michael
1991-01-01
A radial velocity study of the stars which are located on a second sequence above the single-star zero-age main sequence at a given color in the color-magnitude diagram of the open cluster Praesepe (NGC 2632) shows that 10, and possibly 11, of 17 are binary systems. Of the binary systems, five have full amplitudes for their velocity variations that are greater than 50 km/s. To the extent that they can be applied to globular clusters, these results suggest that (1) observations of 'second-sequence' stars in globular clusters would be an efficient way of finding main-sequence binary systems in globulars, and (2) current instrumentation on large telescopes is sufficient for establishing unambiguously the existence of main-sequence binary systems in nearby globular clusters.
Participants' recall and understanding of genomic research and large-scale data sharing.
Robinson, Jill Oliver; Slashinski, Melody J; Wang, Tao; Hilsenbeck, Susan G; McGuire, Amy L
2013-10-01
As genomic researchers are urged to openly share generated sequence data with other researchers, it is important to examine the utility of informed consent documents and processes, particularly as these relate to participants' engagement with and recall of the information presented to them, their objective or subjective understanding of the key elements of genomic research (e.g., data sharing), as well as how these factors influence or mediate the decisions they make. We conducted a randomized trial of three experimental informed consent documents (ICDs) with participants (n = 229) being recruited to genomic research studies; each document afforded varying control over breadth of release of genetic information. Recall and understanding, their impact on data sharing decisions, and comfort in decision making were assessed in a follow-up structured interview. Over 25% did not remember signing an ICD to participate in a genomic study, and the majority (54%) could not correctly identify with whom they had agreed to share their genomic data. However, participants felt that they understood enough to make an informed decision, and lack of recall did not impact final data sharing decisions or satisfaction with participation. These findings raise questions about the types of information participants need in order to provide valid informed consent, and whether subjective understanding and comfort with decision making are sufficient to satisfy the ethical principle of respect for persons.
Casal, J I; Langeveld, J P; Cortés, E; Schaaper, W W; van Dijk, E; Vela, C; Kamstrup, S; Meloen, R H
1995-01-01
The N-terminal domain of the major capsid protein VP2 of canine parvovirus was shown to be an excellent target for development of a synthetic peptide vaccine, but detailed information about number of epitopes, optimal length, sequence choice, and site of coupling to the carrier protein was lacking. Therefore, several overlapping peptides based on this N terminus were synthesized to establish conditions for optimal and reproducible induction of neutralizing antibodies in rabbits. The specificity and neutralizing ability of the antibody response for these peptides were determined. Within the N-terminal 23 residues of VP2, two subsites able to induce neutralizing antibodies and which overlapped by only two glycine residues at positions 10 and 11 could be discriminated. The shortest sequence sufficient for neutralization induction was nine residues. Peptides longer than 13 residues consistently induced neutralization, provided that their N termini were located between positions 1 and 11 of VP2. The orientation of the peptides at the carrier protein was also of importance, being more effective when coupled through the N terminus than through the C terminus to keyhole limpet hemocyanin. The results suggest that the presence of amino acid residues 2 to 21 (and probably 3 to 17) of VP2 in a single peptide is preferable for a synthetic peptide vaccine. PMID:7474152
Genome-wide uniformity of human ‘open’ pre-initiation complexes
Lai, William K.M.; Pugh, B. Franklin
2017-01-01
Transcription of protein-coding and noncoding DNA occurs pervasively throughout the mammalian genome. Their sites of initiation are generally inferred from transcript 5′ ends and are thought to be either locally dispersed or focused. How these two modes of initiation relate is unclear. Here, we apply permanganate treatment and chromatin immunoprecipitation (PIP-seq) of initiation factors to identify the precise location of melted DNA separately associated with the preinitiation complex (PIC) and the adjacent paused complex (PC). This approach revealed the two known modes of transcription initiation. However, in contrast to prevailing views, they co-occurred within the same promoter region: initiation originating from a focused PIC, and broad nucleosome-linked initiation. PIP-seq allowed transcriptional orientation of Pol II to be determined, which may be useful near promoters where sufficient sense/anti-sense transcript mapping information is lacking. PIP-seq detected divergently oriented Pol II at both coding and noncoding promoters, as well as at enhancers. Their occupancy levels were not necessarily coupled in the two orientations. DNA sequence and shape analysis of initiation complex sites suggest that both sequence and shape contribute to specificity, but in a context-restricted manner. That is, initiation sites have the locally “best” initiator (INR) sequence and/or shape. These findings reveal a common core to pervasive Pol II initiation throughout the human genome. PMID:27927716
Alignment-free sequence comparison (II): theoretical power of comparison statistics.
Wan, Lin; Reinert, Gesine; Sun, Fengzhu; Waterman, Michael S
2010-11-01
Rapid methods for alignment-free sequence comparison make large-scale comparisons between sequences increasingly feasible. Here we study the power of the statistic D2, which counts the number of matching k-tuples between two sequences, as well as D2*, which uses centralized counts, and D2S, which is a self-standardized version, both from a theoretical viewpoint and numerically, providing an easy-to-use program. The power is assessed under two alternative hidden Markov models; the first one assumes that the two sequences share a common motif, whereas the second model is a pattern transfer model; the null model is that the two sequences are composed of independent and identically distributed letters and they are independent. Under the first alternative model, the means of the tuple counts in the individual sequences change, whereas under the second alternative model, the marginal means are the same as under the null model. Using the limit distributions of the count statistics under the null and the alternative models, we find that generally, asymptotically D2S has the largest power, followed by D2*, whereas the power of D2 can even be zero in some cases. In contrast, even for sequences of length 140,000 bp, in simulations D2* generally has the largest power. Under the first alternative model of a shared motif, the power of D2* approaches 100% when sufficiently many motifs are shared, and we recommend the use of D2* for such practical applications. Under the second alternative model of pattern transfer, the power for all three count statistics does not increase with sequence length when the sequence is sufficiently long, and hence none of the three statistics under consideration can be recommended in such a situation. We illustrate the approach on 323 transcription factor binding motifs with length at most 10 from JASPAR CORE (October 12, 2009 version), verifying that D2* is generally more powerful than D2.
The program to calculate the power of D2, D2* and D2S can be downloaded from http://meta.cmb.usc.edu/d2. Supplementary Material is available at www.liebertonline.com/cmb.
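The plain D2 statistic described above, the number of matching k-tuples between two sequences, can be sketched in a few lines of Python. This is an illustrative reimplementation, not the authors' program; the function names are assumptions, and the centred and self-standardized variants D2* and D2S, which also need letter-frequency estimates, are not shown.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every overlapping k-tuple (k-mer) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def d2(seq_a, seq_b, k):
    """D2 = sum over words w of (count of w in seq_a) * (count of w in seq_b).

    This equals the number of position pairs (i, j) at which the
    k-tuples of the two sequences match exactly.
    """
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    return sum(n * counts_b[word] for word, n in counts_a.items()
               if word in counts_b)

# "ACGT" and "ACGA" share the 2-tuples AC and CG, each occurring once.
print(d2("ACGT", "ACGA", 2))
```

Counting words first and then multiplying the counts avoids comparing every pair of positions directly, which matters for sequences of the lengths discussed in the abstract.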
Raventós, D; Jensen, A B; Rask, M B; Casacuberta, J M; Mundy, J; San Segundo, B
1995-01-01
Transient gene expression assays in barley aleurone protoplasts were used to identify a cis-regulatory element involved in the elicitor-responsive expression of the maize PRms gene. Analysis of transcriptional fusions between PRms 5' upstream sequences and a chloramphenicol acetyltransferase reporter gene, as well as chimeric promoters containing PRms promoter fragments or repeated oligonucleotides fused to a minimal promoter, delineated a 20 bp sequence which functioned as an elicitor-response element (ERE). This sequence contains a motif (-246 AATTGACC) similar to sequences found in promoters of other pathogen-responsive genes. The analysis also indicated that an enhancing sequence(s) between -397 and -296 is required for full PRms activation by elicitors. The protein kinase inhibitor staurosporine was found to completely block the transcriptional activation induced by elicitors. These data indicate that protein phosphorylation is involved in the signal transduction pathway leading to PRms expression.
Kress, W. John; Erickson, David L.
2007-01-01
Background A useful DNA barcode requires sufficient sequence variation to distinguish between species and ease of application across a broad range of taxa. Discovery of a DNA barcode for land plants has been limited by intrinsically lower rates of sequence evolution in plant genomes than that observed in animals. This low rate has complicated the trade-off in finding a locus that is universal and readily sequenced and has sufficiently high sequence divergence at the species-level. Methodology/Principal Findings Here, a global plant DNA barcode system is evaluated by comparing universal application and degree of sequence divergence for nine putative barcode loci, including coding and non-coding regions, singly and in pairs across a phylogenetically diverse set of 48 genera (two species per genus). No single locus could discriminate among species in a pair in more than 79% of genera, whereas discrimination increased to nearly 88% when the non-coding trnH-psbA spacer was paired with one of three coding loci, including rbcL. In silico trials were conducted in which DNA sequences from GenBank were used to further evaluate the discriminatory power of a subset of these loci. These trials supported the earlier observation that trnH-psbA coupled with rbcL can correctly identify and discriminate among related species. Conclusions/Significance A combination of the non-coding trnH-psbA spacer region and a portion of the coding rbcL gene is recommended as a two-locus global land plant barcode that provides the necessary universality and species discrimination. PMID:17551588
Christen, Matthias; Deutsch, Samuel; Christen, Beat
2015-08-21
Recent advances in synthetic biology have resulted in an increasing demand for the de novo synthesis of large-scale DNA constructs. Any process improvement that enables fast and cost-effective streamlining of digitized genetic information into fabricable DNA sequences holds great promise to study, mine, and engineer genomes. Here, we present Genome Calligrapher, a computer-aided design web tool intended for whole genome refactoring of bacterial chromosomes for de novo DNA synthesis. By applying a neutral recoding algorithm, Genome Calligrapher optimizes GC content and removes obstructive DNA features known to interfere with the synthesis of double-stranded DNA and the higher order assembly into large DNA constructs. Subsequent bioinformatics analysis revealed that synthesis constraints are prevalent among bacterial genomes. However, a low level of codon replacement is sufficient for refactoring bacterial genomes into easy-to-synthesize DNA sequences. To test the algorithm, 168 kb of synthetic DNA comprising approximately 20 percent of the synthetic essential genome of the cell-cycle bacterium Caulobacter crescentus was streamlined and then ordered from a commercial supplier of low-cost de novo DNA synthesis. The successful assembly into eight 20 kb segments indicates that Genome Calligrapher algorithm can be efficiently used to refactor difficult-to-synthesize DNA. Genome Calligrapher is broadly applicable to recode biosynthetic pathways, DNA sequences, and whole bacterial genomes, thus offering new opportunities to use synthetic biology tools to explore the functionality of microbial diversity. The Genome Calligrapher web tool can be accessed at https://christenlab.ethz.ch/GenomeCalligrapher .
Du, Ruofei; Mercante, Donald; Fang, Zhide
2013-01-01
In functional metagenomics, BLAST homology search is a common method to classify metagenomic reads into protein/domain sequence families such as Clusters of Orthologous Groups of proteins (COGs) in order to quantify the abundance of each COG in the community. The resulting functional profile of the community is then used in downstream analysis to correlate the change in abundance to environmental perturbation, clinical variation, and so on. However, the short read length associated with next-generation sequencing technologies poses a barrier to this approach, essentially because similarity significance cannot be discerned by searching with short reads. Consequently, artificial functional families are produced, and those with a large number of assigned reads dramatically decrease the accuracy of the functional profile. There is no method available to address this problem. We aim to fill this gap in this paper. We revealed that BLAST similarity scores of homologues for short reads from the coding sequences of COG protein members are distributed differently from the scores of those derived elsewhere. We showed that, by choosing an appropriate score cut-off, we are able to filter out most artificial families while preserving sufficient information to build the functional profile. We also showed that, by combined application of BLAST and RPS-BLAST, some artificial families with large read counts can be further identified after the score cut-off filtration. Evaluated on three experimental metagenomic datasets with different coverages, the proposed method was found to be robust against read coverage and to consistently outperform the other E-value cut-off methods currently used in the literature. PMID:23516532
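The score cut-off step can be pictured with a small Python sketch that keeps only read-to-family assignments whose similarity score reaches a threshold and then counts reads per family. The input format, the best-hit-per-read rule, the family labels and the cut-off value are all assumptions for illustration; the abstract does not specify them, and the RPS-BLAST follow-up step is not modelled.

```python
from collections import Counter

def functional_profile(hits, score_cutoff):
    """Build a read-count profile per family from filtered hits.

    hits: iterable of (read_id, family_id, score) tuples (hypothetical
    format). A read is assigned to the family of its best-scoring hit,
    provided that score is at least score_cutoff.
    """
    best = {}
    for read_id, family, score in hits:
        if score < score_cutoff:
            continue  # discard hits that fall below the cut-off
        if read_id not in best or score > best[read_id][1]:
            best[read_id] = (family, score)
    return Counter(family for family, _ in best.values())

# Toy data: read r2 has only a weak hit, so its family drops out.
example_hits = [
    ("r1", "COG0001", 55.0),
    ("r1", "COG0002", 40.0),
    ("r2", "COG0002", 28.0),
    ("r3", "COG0001", 61.5),
]
print(functional_profile(example_hits, score_cutoff=35.0))
```

In this toy case the family supported only by low-scoring hits disappears from the profile, which is the intended effect of the filtration described above.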
DOE Office of Scientific and Technical Information (OSTI.GOV)
Richardson, C.C.
1993-12-31
This project focuses on the DNA polymerase (gene 5 protein) of phage T7 for use in DNA sequence analysis. Gene 5 protein interacts with accessory proteins to acquire properties essential for DNA replication. One goal is to understand these interactions in order to modify the proteins for use in DNA sequencing. E. coli thioredoxin binds to gene 5 protein and clamps it to a primer-template. They have analyzed the binding of gene 5 protein-thioredoxin to primer-templates and have defined the optimal conditions to form an extremely stable complex with a dNTP in the polymerase catalytic site. The spatial proximity of these components has been determined using fluorescence emission anisotropy. The T7 DNA binding protein, the gene 2.5 protein, interacts with gene 5 protein and gene 4 protein to increase processivity and primer synthesis, respectively. Mutant gene 2.5 proteins have been isolated that do not interact with T7 DNA polymerase and cannot support T7 growth. The nucleotide binding site of the T7 helicase has been identified and mutations affecting the site provide information on how the hydrolysis of NTPs fuels its unidirectional translocation. The sequence GTC has been shown to be necessary and sufficient for recognition by the T7 primase. The T7 gene 5.5 protein interacts with the E. coli nucleoid protein, H-NS, and also overcomes the phage λ rex restriction system.
Phylogenetic distribution of plant snoRNA families.
Patra Bhattacharya, Deblina; Canzler, Sebastian; Kehr, Stephanie; Hertel, Jana; Grosse, Ivo; Stadler, Peter F
2016-11-24
Small nucleolar RNAs (snoRNAs) are one of the most ancient families amongst non-protein-coding RNAs. They are ubiquitous in Archaea and Eukarya but absent in bacteria. Their main function is to target chemical modifications of ribosomal RNAs. They fall into two classes, box C/D snoRNAs and box H/ACA snoRNAs, which are clearly distinguished by conserved sequence motifs and the type of chemical modification that they govern. Similarly to microRNAs, snoRNAs appear in distinct families of homologs that affect homologous targets. In animals, snoRNAs and their evolution have been studied in much detail. In plants, however, their evolution has attracted comparably little attention. In order to chart the phylogenetic distribution of individual snoRNA families in plants, we applied a sophisticated approach for identifying homologs of known plant snoRNAs across the plant kingdom. In response to the relatively fast evolution of snoRNAs, information on conserved sequence boxes, target sequences, and secondary structure is combined to identify additional snoRNAs. We identified 296 families of snoRNAs in 24 species and traced their evolution throughout the plant kingdom. Many of the plant snoRNA families comprise paralogs. We also found that targets are well-conserved for most snoRNA families. The sequence conservation of snoRNAs is sufficient to establish homologies between phyla. The degree of this conservation tapers off, however, between land plants and algae. Plant snoRNAs are frequently organized in highly conserved spatial clusters. As a resource for further investigations we provide carefully curated and annotated alignments for each snoRNA family under investigation.
Updating during reading comprehension: why causality matters.
Kendeou, Panayiota; Smith, Emily R; O'Brien, Edward J
2013-05-01
The present set of 7 experiments systematically examined the effectiveness of adding causal explanations to simple refutations in reducing or eliminating the impact of outdated information on subsequent comprehension. The addition of a single causal-explanation sentence to a refutation was sufficient to eliminate any measurable disruption in comprehension caused by the outdated information (Experiment 1) but was not sufficient to eliminate its reactivation (Experiment 2). However, a 3 sentence causal-explanation addition to a refutation eliminated both any measurable disruption in comprehension (Experiment 3) and the reactivation of the outdated information (Experiment 4). A direct comparison between the 1 and 3 causal-explanation conditions provided converging evidence for these findings (Experiment 5). Furthermore, a comparison of the 3 sentence causal-explanation condition with a 3 sentence qualified-elaboration condition demonstrated that even though both conditions were sufficient to eliminate any measurable disruption in comprehension (Experiment 6), only the causal-explanation condition was sufficient to eliminate the reactivation of the outdated information (Experiment 7). These results establish a boundary condition under which outdated information will influence comprehension; they also have broader implications for both the updating process and knowledge revision in general.
Whole genome sequence of an unusual Borrelia burgdorferi sensu lato isolate
DOE Office of Scientific and Technical Information (OSTI.GOV)
Casjens, S.R.; Dunn, J.; Fraser-Liggett, C. M.
2011-03-01
Human Lyme disease is caused by a number of related Borrelia burgdorferi sensu lato species. We report here the complete genome sequence of Borrelia sp. isolate SV1 from Finland. This isolate is to date the closest known relative of B. burgdorferi sensu stricto, but it is sufficiently genetically distinct from that species that it and its close relatives warrant its candidacy for new-species status. We suggest that this isolate should be named 'Borrelia finlandensis.'
Jia, Haiwei; Zhang, Xiaojuan; Wang, Wenjun; Bai, Yuanyuan; Ling, Youguo; Cao, Cheng; Ma, Runlin Z; Zhong, Hui; Wang, Xue; Xu, Quanbin
2015-02-27
Mps1, an essential component of the mitotic checkpoint, is also an important interphase regulator and has roles in DNA damage response, cytokinesis and centrosome duplication. Mps1 predominantly resides in the cytoplasm and relocates into the nucleus at the late G2 phase. So far, the mechanism underlying the Mps1 translocation between the cytoplasm and nucleus has been unclear. In this work, a dynamic export process of Mps1 from the nucleus to the cytoplasm in interphase was revealed. This process was blocked by the Crm1 inhibitor Leptomycin B, suggesting that export of Mps1 is Crm1 dependent. Consistent with this speculation, a direct association between Mps1 and Crm1 was found. Furthermore, a putative nuclear export sequence (pNES) motif at the N-terminus of Mps1 was identified by motif analysis of Mps1. This motif shows high sequence similarity to the classic NES, and fusion of this motif with EGFP results in dramatic exclusion of the fusion protein from the nucleus. Additionally, an Mps1 mutant in which pNES integrity was disrupted by replacing leucine with alanine showed a diffuse subcellular distribution, in contrast to the wild-type protein, which resides predominantly in the cytoplasm. Taken together, these findings led to the conclusion that the pNES sequence is sufficient for Mps1 export from the nucleus during interphase.
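A motif scan of the kind this abstract describes can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes the commonly cited classical NES consensus Φ-x(2,3)-Φ-x(2,3)-Φ-x-Φ, with Φ being L, I, V, F or M, and the toy sequence and the function names `find_nes_candidates` and `mutate_leucines` are hypothetical.

```python
import re

# Assumed consensus for a classical leucine-rich NES (not taken from the abstract):
# PHI-x(2,3)-PHI-x(2,3)-PHI-x-PHI, where PHI is a hydrophobic residue.
PHI = "[LIVFM]"
# The lookahead lets overlapping candidates be reported as well.
NES_PATTERN = re.compile(f"(?=({PHI}.{{2,3}}{PHI}.{{2,3}}{PHI}.{PHI}))")


def find_nes_candidates(protein):
    """Return (0-based start, matched segment) for every candidate NES."""
    return [(m.start(), m.group(1)) for m in NES_PATTERN.finditer(protein.upper())]


def mutate_leucines(sequence):
    """Replace every leucine with alanine, mimicking a leucine-to-alanine mutant."""
    return sequence.replace("L", "A")


toy = "MSEELALKLAGLDINKTE"  # hypothetical sequence, not Mps1
wild_type_hits = find_nes_candidates(toy)
mutant_hits = find_nes_candidates(mutate_leucines(toy))
print(wild_type_hits, mutant_hits)
```

In this toy case the scan reports one candidate in the unmodified sequence and none after the leucines are replaced, which mirrors the logic of the mutant experiment without saying anything about the real protein.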
Kim, Kyungsub; Sim, Se-Hoon; Jeon, Che Ok; Lee, Younghoon; Lee, Kangseok
2011-02-01
RNase III, a double-stranded RNA-specific endoribonuclease, degrades bdm mRNA via cleavage at specific sites. To better understand the mechanism of cleavage site selection by RNase III, we performed a genetic screen for sequences containing mutations at the bdm RNA cleavage sites that resulted in altered mRNA stability using a transcriptional bdm'-'cat fusion construct. While most of the isolated mutants showed the increased bdm'-'cat mRNA stability that resulted from the inability of RNase III to cleave the mutated sequences, one mutant sequence (wt-L) displayed in vivo RNA stability similar to that of the wild-type sequence. In vivo and in vitro analyses of the wt-L RNA substrate showed that it was cut only once on the RNA strand to the 5'-terminus by RNase III, while the binding constant of RNase III to this mutant substrate was moderately increased. A base substitution at the uncleaved RNase III cleavage site in wt-L mutant RNA found in another mutant lowered the RNA-binding affinity by 11-fold and abolished the hydrolysis of scissile bonds by RNase III. Our results show that base substitutions at sites forming the scissile bonds are sufficient to alter RNA cleavage as well as the binding activity of RNase III. © 2010 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.
Single reaction, real time RT-PCR detection of all known avian and human metapneumoviruses.
Lemaitre, E; Allée, C; Vabret, A; Eterradossi, N; Brown, P A
2018-01-01
Current molecular methods for the detection of avian and human metapneumovirus (AMPV, HMPV) are specifically targeted towards each virus species or individual subgroups of these. Here a broad range SYBR Green I real time RT-PCR was developed which amplified a highly conserved fragment of sequence in the N open reading frame. This method was sufficiently efficient and specific in detecting all MPVs. Its validation according to the NF U47-600 norm for the four AMPV subgroups estimated low limits of detection between 1000 and 10 copies/μL, similar to detection levels described previously for real time RT-PCRs targeting specific subgroups. RNA viruses present a challenge for the design of durable molecular diagnostic tests due to the rate of change in their genome sequences, which can vary substantially in different areas and over time. The fact that the regions of sequence for primer hybridization in the described method have remained sufficiently conserved since the AMPV and HMPV diverged should give the best chance of continued detection of current subgroups and of potential unknown or future emerging MPV strains. Copyright © 2017 Elsevier B.V. All rights reserved.
Online Learning in Higher Education: Necessary and Sufficient Conditions
ERIC Educational Resources Information Center
Lim, Cher Ping
2005-01-01
The spectacular development of information and communication technologies through the Internet has provided opportunities for students to explore the virtual world of information. In this article, the author discusses the necessary and sufficient conditions for successful online learning in educational institutions. The necessary conditions…
Integration of Temporal and Ordinal Information During Serial Interception Sequence Learning
Gobel, Eric W.; Sanchez, Daniel J.; Reber, Paul J.
2011-01-01
The expression of expert motor skills typically involves learning to perform a precisely timed sequence of movements (e.g., language production, music performance, athletic skills). Research examining incidental sequence learning has previously relied on a perceptually-cued task that gives participants exposure to repeating motor sequences but does not require timing of responses for accuracy. Using a novel perceptual-motor sequence learning task, learning a precisely timed cued sequence of motor actions is shown to occur without explicit instruction. Participants learned a repeating sequence through practice and showed sequence-specific knowledge via a performance decrement when switched to an unfamiliar sequence. In a second experiment, the integration of representation of action order and timing sequence knowledge was examined. When either action order or timing sequence information was selectively disrupted, performance was reduced to levels similar to completely novel sequences. Unlike prior sequence-learning research that has found timing information to be secondary to learning action sequences, when the task demands require accurate action and timing information, an integrated representation of these types of information is acquired. These results provide the first evidence for incidental learning of fully integrated action and timing sequence information in the absence of an independent representation of action order, and suggest that this integrative mechanism may play a material role in the acquisition of complex motor skills. PMID:21417511
Involvement of Alternative Splicing in Barley Seed Germination
Zhang, Qisen; Zhang, Xiaoqi; Wang, Songbo; Tan, Cong; Zhou, Gaofeng; Li, Chengdao
2016-01-01
Seed germination activates many new biological processes including DNA, membrane and mitochondrial repairs and requires active protein synthesis and sufficient energy supply. Alternative splicing (AS) regulates many cellular processes including cell differentiation and environmental adaptations. However, limited information is available on the regulation of seed germination at post-transcriptional levels. We have conducted RNA-sequencing experiments to dissect AS events in barley seed germination. We identified between 552 and 669 common AS transcripts in germinating barley embryos from four barley varieties (Hordeum vulgare L. Bass, Baudin, Harrington and Stirling). Alternative 3’ splicing (34%-45%), intron retention (32%-34%) and alternative 5’ splicing (16%-21%) were three major AS events in germinating embryos. The AS transcripts were predominantly mapped onto ribosome, RNA transport machineries, spliceosome, plant hormone signal transduction, glycolysis, sugar and carbon metabolism pathways. Transcripts of these genes were also very abundant in the early stage of seed germination. Correlation analysis of gene expression showed that AS hormone responsive transcripts could also be co-expressed with genes responsible for protein biosynthesis and sugar metabolisms. Our RNA-sequencing data revealed that AS could play important roles in barley seed germination. PMID:27031341
Esteve-Codina, Anna; Arpi, Oriol; Martinez-García, Maria; Pineda, Estela; Mallo, Mar; Gut, Marta; Carrato, Cristina; Rovira, Anna; Lopez, Raquel; Tortosa, Avelina; Dabad, Marc; Del Barco, Sonia; Heath, Simon; Bagué, Silvia; Ribalta, Teresa; Alameda, Francesc; de la Iglesia, Nuria
2017-01-01
The molecular classification of glioblastoma (GBM) based on gene expression might better explain outcome and response to treatment than clinical factors. Whole transcriptome sequencing using next-generation sequencing platforms is rapidly becoming accepted as a tool for measuring gene expression for both research and clinical use. Fresh frozen (FF) tissue specimens of GBM are difficult to obtain since tumor tissue obtained at surgery is often scarce and necrotic and diagnosis is prioritized over freezing. After diagnosis, leftover tissue is usually stored as formalin-fixed paraffin-embedded (FFPE) tissue. However, RNA from FFPE tissues is usually degraded, which could hamper gene expression analysis. We compared RNA-Seq data obtained from matched pairs of FF and FFPE GBM specimens. Only three FFPE out of eleven FFPE-FF matched samples yielded informative results. Several quality-control measurements showed that RNA from FFPE samples was highly degraded but maintained transcriptomic similarities to RNA from FF samples. Certain issues regarding mutation analysis and subtype prediction were detected. Nevertheless, our results suggest that RNA-Seq of FFPE GBM specimens provides reliable gene expression data that can be used in molecular studies of GBM if the RNA is sufficiently preserved. PMID:28122052
Cell type discovery using single-cell transcriptomics: implications for ontological representation.
Aevermann, Brian D; Novotny, Mark; Bakken, Trygve; Miller, Jeremy A; Diehl, Alexander D; Osumi-Sutherland, David; Lasken, Roger S; Lein, Ed S; Scheuermann, Richard H
2018-05-01
Cells are fundamental functional units of multicellular organisms, with different cell types playing distinct physiological roles in the body. The recent advent of single-cell transcriptional profiling using RNA sequencing is producing 'big data', enabling the identification of novel human cell types at an unprecedented rate. In this review, we summarize recent work characterizing cell types in the human central nervous and immune systems using single-cell and single-nuclei RNA sequencing, and discuss the implications that these discoveries are having on the representation of cell types in the reference Cell Ontology (CL). We propose a method, based on random forest machine learning, for identifying sets of necessary and sufficient marker genes, which can be used to assemble consistent and reproducible cell type definitions for incorporation into the CL. The representation of defined cell type classes and their relationships in the CL using this strategy will make the cell type classes being identified by high-throughput/high-content technologies findable, accessible, interoperable and reusable (FAIR), allowing the CL to serve as a reference knowledgebase of information about the role that distinct cellular phenotypes play in human health and disease.
Keeley, Fred W; Bellingham, Catherine M; Woodhouse, Kimberley A
2002-02-28
Elastin is the major extracellular matrix protein of large arteries such as the aorta, imparting characteristics of extensibility and elastic recoil. Once laid down in tissues, polymeric elastin is not subject to turnover, but is able to sustain its mechanical resilience through thousands of millions of cycles of extension and recoil. Elastin consists of ca. 36 domains with alternating hydrophobic and cross-linking characteristics. It has been suggested that these hydrophobic domains, predominantly containing glycine, proline, leucine and valine, often occurring in tandemly repeated sequences, are responsible for the ability of elastin to align monomeric chains for covalent cross-linking. We have shown that small, recombinantly expressed polypeptides based on sequences of human elastin contain sufficient information to self-organize into fibrillar structures and promote the formation of lysine-derived cross-links. These cross-linked polypeptides can also be fabricated into membrane structures that have solubility and mechanical properties reminiscent of native insoluble elastin. Understanding the basis of the self-organizational ability of elastin-based polypeptides may provide important clues for the general design of self-assembling biomaterials.
Tertiary alphabet for the observable protein structural universe
Mackenzie, Craig O.; Zhou, Jianfu; Grigoryan, Gevorg
2016-01-01
Here, we systematically decompose the known protein structural universe into its basic elements, which we dub tertiary structural motifs (TERMs). A TERM is a compact backbone fragment that captures the secondary, tertiary, and quaternary environments around a given residue, comprising one or more disjoint segments (three on average). We seek the set of universal TERMs that capture all structure in the Protein Data Bank (PDB), finding remarkable degeneracy. Only ∼600 TERMs are sufficient to describe 50% of the PDB at sub-Angstrom resolution. However, more rare geometries also exist, and the overall structural coverage grows logarithmically with the number of TERMs. We go on to show that universal TERMs provide an effective mapping between sequence and structure. We demonstrate that TERM-based statistics alone are sufficient to recapitulate close-to-native sequences given either NMR or X-ray backbones. Furthermore, sequence variability predicted from TERM data agrees closely with evolutionary variation. Finally, locations of TERMs in protein chains can be predicted from sequence alone based on sequence signatures emergent from TERM instances in the PDB. For multisegment motifs, this method identifies spatially adjacent fragments that are not contiguous in sequence—a major bottleneck in structure prediction. Although all TERMs recur in diverse proteins, some appear specialized for certain functions, such as interface formation, metal coordination, or even water binding. Structural biology has benefited greatly from previously observed degeneracies in structure. The decomposition of the known structural universe into a finite set of compact TERMs offers exciting opportunities toward better understanding, design, and prediction of protein structure. PMID:27810958
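The idea of a small motif set covering a large share of known structure, with diminishing returns for rarer geometries, can be illustrated with a greedy set-cover sketch. This is an assumption for illustration only: the abstract does not say how the universal TERMs were selected, and the motif names, residue indices, and the function `greedy_cover` are hypothetical.

```python
def greedy_cover(candidates, universe):
    """Pick motifs greedily by how many still-uncovered residues each one adds.

    candidates: dict mapping motif name -> set of residue ids it covers.
    universe:   set of all residue ids to be covered.
    Returns a list of (motif name, cumulative fraction of the universe covered).
    """
    uncovered = set(universe)
    remaining = dict(candidates)
    curve = []
    while uncovered and remaining:
        # Choose the motif that covers the most residues not yet covered.
        best = max(remaining, key=lambda name: len(remaining[name] & uncovered))
        gain = remaining.pop(best) & uncovered
        if not gain:
            break  # no remaining motif covers anything new
        uncovered -= gain
        curve.append((best, 1 - len(uncovered) / len(universe)))
    return curve


# Hypothetical toy data: ten residues and four candidate motifs.
universe = set(range(10))
motifs = {
    "helix_pair": {0, 1, 2, 3, 4},
    "beta_turn": {4, 5, 6},
    "rare_loop": {9},
    "hairpin": {6, 7, 8},
}
curve = greedy_cover(motifs, universe)
for name, fraction in curve:
    print(f"{name}: {fraction:.0%} covered")
```

The first pick covers half of the toy universe, while each later pick adds less, which is the qualitative shape of the coverage curve the abstract reports.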
The chemical structure of DNA sequence signals for RNA transcription
NASA Technical Reports Server (NTRS)
George, D. G.; Dayhoff, M. O.
1982-01-01
The proposed recognition sites for RNA transcription for E. coli RNA polymerase, bacteriophage T7 RNA polymerase, and eukaryotic RNA polymerase Pol II are evaluated in the light of the requirements for efficient recognition. It is shown that although there is good experimental evidence that specific nucleic acid sequence patterns are involved in transcriptional regulation in bacteria and bacterial viruses, among the sequences now available, only in the case of the promoters recognized by bacteriophage T7 polymerase does it seem likely that the pattern is sufficient. It is concluded that the eukaryotic pattern that is investigated is not restrictive enough to serve as a recognition site.
Ingham, Richard J; Battilocchio, Claudio; Fitzpatrick, Daniel E; Sliwinski, Eric; Hawkins, Joel M; Ley, Steven V
2015-01-01
Performing reactions in flow can offer major advantages over batch methods. However, laboratory flow chemistry processes are currently often limited to single steps or short sequences due to the complexity involved with operating a multi-step process. Using new modular components for downstream processing, coupled with control technologies, more advanced multi-step flow sequences can be realized. These tools are applied to the synthesis of 2-aminoadamantane-2-carboxylic acid. A system comprising three chemistry steps and three workup steps was developed, having sufficient autonomy and self-regulation to be managed by a single operator. PMID:25377747
Arnold, Frances H.; Shao, Zhixin; Zhao, Huimin; Giver, Lorraine J.
2002-01-01
A method for in vitro mutagenesis and recombination of polynucleotide sequences based on polymerase-catalyzed extension of primer oligonucleotides is disclosed. The method involves priming template polynucleotide(s) with random-sequence or defined-sequence primers to generate a pool of short DNA fragments with a low level of point mutations. The DNA fragments are subjected to denaturation followed by annealing and further enzyme-catalyzed DNA polymerization. This procedure is repeated a sufficient number of times to produce full-length genes which comprise mutants of the original template polynucleotides. These genes can be further amplified by the polymerase chain reaction and cloned into a vector for expression of the encoded proteins.
Enhanced sequencing coverage with digital droplet multiple displacement amplification
Sidore, Angus M.; Lan, Freeman; Lim, Shaun W.; Abate, Adam R.
2016-01-01
Sequencing small quantities of DNA is important for applications ranging from the assembly of uncultivable microbial genomes to the identification of cancer-associated mutations. To obtain sufficient quantities of DNA for sequencing, the small amount of starting material must be amplified significantly. However, existing methods often yield errors or non-uniform coverage, reducing sequencing data quality. Here, we describe digital droplet multiple displacement amplification, a method that enables massive amplification of low-input material while maintaining sequence accuracy and uniformity. The low-input material is compartmentalized as single molecules in millions of picoliter droplets. Because the molecules are isolated in compartments, they amplify to saturation without competing for resources; this yields uniform representation of all sequences in the final product and, in turn, enhances the quality of the sequence data. We demonstrate the ability to uniformly amplify the genomes of single Escherichia coli cells, comprising just 4.7 fg of starting DNA, and obtain sequencing coverage distributions that rival that of unamplified material. Digital droplet multiple displacement amplification provides a simple and effective method for amplifying minute amounts of DNA for accurate and uniform sequencing. PMID:26704978
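The femtogram scale quoted for a single E. coli genome can be checked with a back-of-the-envelope calculation. The sketch below is illustrative only: the genome length (about 4.64 Mbp, roughly that of E. coli K-12) and the average molar mass per base pair (values from about 618 to 650 g/mol are commonly quoted) are assumptions, not figures from the abstract, and `genome_mass_fg` is a hypothetical helper.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def genome_mass_fg(genome_bp, g_per_mol_per_bp):
    """Mass in femtograms of one double-stranded genome copy."""
    grams = genome_bp * g_per_mol_per_bp / AVOGADRO
    return grams * 1e15  # 1 g = 1e15 fg


# Assumed inputs (see the note above): genome length and mass per base pair.
low = genome_mass_fg(4.64e6, 618.0)
high = genome_mass_fg(4.64e6, 650.0)
print(f"{low:.2f} to {high:.2f} fg per genome copy")
```

Under these assumptions one genome copy weighs a few femtograms, the same order of magnitude as the starting amount reported in the abstract.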
Pollen, Alex A; Nowakowski, Tomasz J; Shuga, Joe; Wang, Xiaohui; Leyrat, Anne A; Lui, Jan H; Li, Nianzhen; Szpankowski, Lukasz; Fowler, Brian; Chen, Peilin; Ramalingam, Naveen; Sun, Gang; Thu, Myo; Norris, Michael; Lebofsky, Ronald; Toppani, Dominique; Kemp, Darnell W; Wong, Michael; Clerkson, Barry; Jones, Brittnee N; Wu, Shiquan; Knutsson, Lawrence; Alvarado, Beatriz; Wang, Jing; Weaver, Lesley S; May, Andrew P; Jones, Robert C; Unger, Marc A; Kriegstein, Arnold R; West, Jay A A
2014-10-01
Large-scale surveys of single-cell gene expression have the potential to reveal rare cell populations and lineage relationships but require efficient methods for cell capture and mRNA sequencing. Although cellular barcoding strategies allow parallel sequencing of single cells at ultra-low depths, the limitations of shallow sequencing have not been investigated directly. By capturing 301 single cells from 11 populations using microfluidics and analyzing single-cell transcriptomes across downsampled sequencing depths, we demonstrate that shallow single-cell mRNA sequencing (~50,000 reads per cell) is sufficient for unbiased cell-type classification and biomarker identification. In the developing cortex, we identify diverse cell types, including multiple progenitor and neuronal subtypes, and we identify EGR1 and FOS as previously unreported candidate targets of Notch signaling in human but not mouse radial glia. Our strategy establishes an efficient method for unbiased analysis and comparison of cell populations from heterogeneous tissue by microfluidic single-cell capture and low-coverage sequencing of many cells.
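Downsampling sequencing depth, as this abstract describes, can be sketched by drawing a fixed number of reads without replacement from each cell's per-gene read counts. This is an illustration under stated assumptions rather than the authors' pipeline: the read counts are invented, and `downsample_counts` is a hypothetical helper.

```python
import random
from collections import Counter


def downsample_counts(gene_counts, depth, seed=0):
    """Subsample `depth` reads without replacement from per-gene read counts."""
    total = sum(gene_counts.values())
    if depth >= total:
        return dict(gene_counts)  # already at or below the target depth
    # Expand the counts into one entry per read, then sample without replacement.
    reads = [gene for gene, n in gene_counts.items() for _ in range(n)]
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return dict(Counter(rng.sample(reads, depth)))


cell = {"EGR1": 120, "FOS": 80, "GAPDH": 800}  # hypothetical read counts
shallow = downsample_counts(cell, depth=100)
print(shallow)
```

Repeating this at several depths and re-running the classification on each downsampled matrix is one way to ask how shallow the sequencing can be before cell-type assignments change.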
Ethical issues in pediatric genetic testing and screening.
Botkin, Jeffrey R
2016-12-01
Developments in genetic test technologies enable a detailed analysis of the genomes of individuals across the range of human development from embryos to adults with increased precision and lower cost. These powerful technologies raise a number of ethical issues in pediatrics, primarily because of the frequent lack of clinical utility of genetic information, the generation of secondary results and questions over the proper scope of parental authority for testing. Several professional organizations in the fields of genetics and pediatrics have published new guidance on the ethical, legal, and policy issues relevant to genetic testing in children. The roles of predictive testing for adult-onset conditions, the management of secondary findings and the role of informed consent for newborn screening remain controversial. However, research and experience are not demonstrating serious adverse psychosocial impacts from genetic testing and screening in children. The use of these technologies is expanding with the notion that the personal utility of test results, rather than clinical utility, may be sufficient to justify testing. The use of microarray and genome sequencing technologies is expanding in the care of children. More deference to parental decision-making is evolving in contexts wherein information and counseling can be made readily available.
A homozygous mutation in the stem II domain of RNU4ATAC causes typical Roifman syndrome.
Dinur Schejter, Yael; Ovadia, Adi; Alexandrova, Roumiana; Thiruvahindrapuram, Bhooma; Pereira, Sergio L; Manson, David E; Vincent, Ajoy; Merico, Daniele; Roifman, Chaim M
2017-01-01
Roifman syndrome (OMIM# 616651) is a complex syndrome encompassing skeletal dysplasia, immunodeficiency, retinal dystrophy and developmental delay, and is caused by compound heterozygous mutations involving the Stem II region and one of the other domains of the RNU4ATAC gene. This small nuclear RNA gene is essential for minor intron splicing. The Canadian Centre for Primary Immunodeficiency Registry and Repository were used to derive patient information as well as tissues. Utilising RNA sequencing methodologies, we analysed samples from patients with Roifman syndrome and assessed intron retention. We demonstrate that a homozygous mutation in Stem II is sufficient to cause the full spectrum of features associated with typical Roifman syndrome. Further, we demonstrate the same pattern of aberration in minor intron retention as found in cases with compound heterozygous mutations.
Kohoutek, Tobias K.; Mautz, Rainer; Wegner, Jan D.
2013-01-01
We present a novel approach for autonomous location estimation and navigation in indoor environments using range images and prior scene knowledge from a GIS database (CityGML). What makes this task challenging is the arbitrary relative spatial relation between GIS and Time-of-Flight (ToF) range camera further complicated by a markerless configuration. We propose to estimate the camera's pose solely based on matching of GIS objects and their detected location in image sequences. We develop a coarse-to-fine matching strategy that is able to match point clouds without any initial parameters. Experiments with a state-of-the-art ToF point cloud show that our proposed method delivers an absolute camera position with decimeter accuracy, which is sufficient for many real-world applications (e.g., collision avoidance). PMID:23435055
Smith, Scott A; Kalcic, Christine L; Safran, Kyle A; Stemmer, Paul M; Dantus, Marcos; Reid, Gavin E
2010-12-01
To develop an improved understanding of the regulatory role that post-translational modifications (PTMs) involving phosphorylation play in the maintenance of normal cellular function, tandem mass spectrometry (MS/MS) strategies coupled with ion activation techniques such as collision-induced dissociation (CID) and electron-transfer dissociation (ETD) are typically employed to identify the presence and site-specific locations of the phosphate moieties within a given phosphoprotein of interest. However, the ability of these techniques to obtain sufficient structural information for unambiguous phosphopeptide identification and characterization is highly dependent on the ion activation method employed and the properties of the precursor ion that is subjected to dissociation. Herein, we describe the application of a recently developed alternative ion activation technique for phosphopeptide analysis, termed femtosecond laser-induced ionization/dissociation (fs-LID). In contrast to CID and ETD, fs-LID is shown to be particularly suited to the analysis of singly protonated phosphopeptide ions, yielding a wide range of product ions including a, b, c, x, y, and z sequence ions, as well as ions that are potentially diagnostic of the positions of phosphorylation (e.g., 'a(n)+1-98'). Importantly, the lack of phosphate moiety losses or phosphate group 'scrambling' provides unambiguous information for sequence identification and phosphorylation site characterization. Therefore, fs-LID-MS/MS can serve as a complementary technique to established methodologies for phosphoproteomic analysis. Copyright © 2010. Published by Elsevier Inc.
Greenberg, Jay R.; Perry, Robert P.
1971-01-01
The relationship of the DNA sequences from which polyribosomal messenger RNA (mRNA) and heterogeneous nuclear RNA (NRNA) of mouse L cells are transcribed was investigated by means of hybridization kinetics and thermal denaturation of the hybrids. Hybridization was performed in formamide solutions at DNA excess. Under these conditions most of the hybridizing mRNA and NRNA react at values of D₀t (DNA concentration multiplied by time) expected for RNA transcribed from the nonrepeated or rarely repeated fraction of the genome. However, a fraction of both mRNA and NRNA hybridize at values of D₀t about 10,000 times lower, and therefore must be transcribed from highly redundant DNA sequences. The fraction of NRNA hybridizing to highly repeated sequences is about 1.7 times greater than the corresponding fraction of mRNA. The hybrids formed by the rapidly reacting fractions of both NRNA and mRNA melt over a narrow temperature range with a midpoint about 11°C below that of native L cell DNA. This indicates that these hybrids consist of partially complementary sequences with approximately 11% mismatching of bases. Hybrids formed by the slowly reacting fraction of NRNA melt within 4°–6°C of native DNA, indicating very little, if any, mismatching of bases. Hybrids of the slowly reacting components of mRNA, formed under conditions of sufficiently low RNA input, have a high thermal stability, similar to that observed for hybrids of the slowly reacting NRNA component. However, when higher inputs of mRNA are used, hybrids are formed which have a strikingly lower thermal stability. This observation can be explained by assuming that there is sufficient similarity among the relatively rare DNA sequences coding for mRNA so that under hybridization conditions, in which these DNA sequences are not truly in excess, reversible hybrids exhibiting a considerable amount of mispairing are formed.
The fact that a comparable phenomenon has not been observed for NRNA may mean that there is less similarity among the relatively rare DNA sequences coding for NRNA than there is among the rare sequences coding for mRNA. PMID:4999767
Vinner, Lasse; Mourier, Tobias; Friis-Nielsen, Jens; Gniadecki, Robert; Dybkaer, Karen; Rosenberg, Jacob; Langhoff, Jill Levin; Cruz, David Flores Santa; Fonager, Jannik; Izarzugaza, Jose M G; Gupta, Ramneek; Sicheritz-Ponten, Thomas; Brunak, Søren; Willerslev, Eske; Nielsen, Lars Peter; Hansen, Anders Johannes
2015-08-19
Although nearly one fifth of all human cancers have an infectious aetiology, the causes for the majority of cancers remain unexplained. Despite the enormous data output from high-throughput shotgun sequencing, viral DNA in a clinical sample typically constitutes a proportion of host DNA that is too small to be detected. Sequence variation among virus genomes complicates application of sequence-specific, and highly sensitive, PCR methods. Therefore, we aimed to develop and characterize a method that permits sensitive detection of sequences despite considerable variation. We demonstrate that our low-stringency in-solution hybridization method enables detection of <100 viral copies. Furthermore, distantly related proviral sequences may be enriched by orders of magnitude, enabling discovery of hitherto unknown viral sequences by high-throughput sequencing. The sensitivity was sufficient to detect retroviral sequences in clinical samples. We used this method to conduct an investigation for novel retrovirus in samples from three cancer types. In accordance with recent studies our investigation revealed no retroviral infections in human B-cell lymphoma cells, cutaneous T-cell lymphoma or colorectal cancer biopsies. Nonetheless, our generally applicable method makes sensitive detection possible and permits sequencing of distantly related sequences from complex material.
Porter, Joanne L; Sabatini, Selina; Manning, Jack; Tavanti, Michele; Galman, James L; Turner, Nicholas J; Flitsch, Sabine L
2018-06-01
Cytochrome P450 monooxygenases are able to catalyse a range of synthetically challenging reactions ranging from hydroxylation and demethylation to sulfoxidation and epoxidation. As such they have great potential for biocatalytic applications but are underutilised due to often-poor expression, stability and solubility in recombinant bacterial hosts. The use of self-sufficient P450s with fused haem and reductase domains has already contributed heavily to improving catalytic efficiency and simplifying an otherwise more complex multi-component system of P450 and redox partners. Herein, we present a new addition to the class VII family with the cloning, sequencing and characterisation of the self-sufficient CYP116B62 Hal1 from Halomonas sp. NCIMB 172, the genome of which has not yet been sequenced. Hal1 exhibits high levels of expression in a recombinant E. coli host and can be utilised from cell lysate or used in purified form. Hal1 favours NADPH as electron donor and displays a diverse range of activities including hydroxylation, demethylation and sulfoxidation. These properties make Hal1 suitable for future biocatalytic applications or as a template for optimisation through engineering. Copyright © 2018 Elsevier Inc. All rights reserved.
Struniawski, R; Szpechcinski, A; Poplawska, B; Skronski, M; Chorostowska-Wynimko, J
2013-01-01
Dried blood spot (DBS) specimens have been successfully employed for the large-scale diagnostics of α1-antitrypsin (AAT) deficiency as an easy to collect and transport alternative to plasma/serum. In the present study we propose a fast, efficient, and cost-effective protocol of DNA extraction from DBS samples that provides sufficient quantity and quality of DNA and effectively eliminates any natural PCR inhibitors, allowing for successful AAT genotyping by real-time PCR and direct sequencing. DNA extracted from 84 DBS samples from chronic obstructive pulmonary disease patients was genotyped for AAT deficiency variants by real-time PCR. The results of DBS AAT genotyping were validated by serum IEF phenotyping and AAT concentration measurement. The proposed protocol allowed successful DNA extraction from all analyzed DBS samples. Both quantity and quality of DNA were sufficient for further real-time PCR and, if necessary, for genetic sequence analysis. A 100% concordance between AAT DBS genotypes and serum phenotypes in positive detection of the two major deficiency alleles, S and Z, was achieved. Both assays, DBS AAT genotyping by real-time PCR and serum AAT phenotyping by IEF, positively identified the PI*S and PI*Z alleles in 8 out of the 84 (9.5%) and 16 out of 84 (19.0%) patients, respectively. In conclusion, the proposed protocol noticeably reduces the costs and the hands-on time of DBS sample preparation, providing genomic DNA of sufficient quantity and quality for further real-time PCR or genetic sequence analysis. Consequently, it is ideally suited for large-scale AAT deficiency screening programs and should be the method of choice.
Learning multiple variable-speed sequences in striatum via cortical tutoring.
Murray, James M; Escola, G Sean
2017-05-08
Sparse, sequential patterns of neural activity have been observed in numerous brain areas during timekeeping and motor sequence tasks. Inspired by such observations, we construct a model of the striatum, an all-inhibitory circuit where sequential activity patterns are prominent, addressing the following key challenges: (i) obtaining control over temporal rescaling of the sequence speed, with the ability to generalize to new speeds; (ii) facilitating flexible expression of distinct sequences via selective activation, concatenation, and recycling of specific subsequences; and (iii) enabling the biologically plausible learning of sequences, consistent with the decoupling of learning and execution suggested by lesion studies showing that cortical circuits are necessary for learning, but that subcortical circuits are sufficient to drive learned behaviors. The same mechanisms that we describe can also be applied to circuits with both excitatory and inhibitory populations, and hence may underlie general features of sequential neural activity pattern generation in the brain.
Effects of informed consent for individual genome sequencing on relevant knowledge.
Kaphingst, K A; Facio, F M; Cheng, M-R; Brooks, S; Eidem, H; Linn, A; Biesecker, B B; Biesecker, L G
2012-11-01
Increasing availability of individual genomic information suggests that patients will need knowledge about genome sequencing to make informed decisions, but prior research is limited. In this study, we examined genome sequencing knowledge before and after informed consent among 311 participants enrolled in the ClinSeq™ sequencing study. An exploratory factor analysis of knowledge items yielded two factors (sequencing limitations knowledge; sequencing benefits knowledge). In multivariable analysis, high pre-consent sequencing limitations knowledge scores were significantly related to education [odds ratio (OR): 8.7, 95% confidence interval (CI): 2.45-31.10 for post-graduate education, and OR: 3.9; 95% CI: 1.05, 14.61 for college degree compared with less than college degree] and race/ethnicity (OR: 2.4, 95% CI: 1.09, 5.38 for non-Hispanic Whites compared with other racial/ethnic groups). Mean values increased significantly between pre- and post-consent for the sequencing limitations knowledge subscale (6.9-7.7, p < 0.0001) and sequencing benefits knowledge subscale (7.0-7.5, p < 0.0001); increase in knowledge did not differ by sociodemographic characteristics. This study highlights gaps in genome sequencing knowledge and underscores the need to target educational efforts toward participants with less education or from minority racial/ethnic groups. The informed consent process improved genome sequencing knowledge. Future studies could examine how genome sequencing knowledge influences informed decision making. © 2012 John Wiley & Sons A/S.
Federal Register 2010, 2011, 2012, 2013, 2014
2012-02-23
... DEPARTMENT OF HOUSING AND URBAN DEVELOPMENT [Docket No. FR-5603-N-16] Notice of Submission of Proposed Information Collection to OMB Application for the Resident Opportunities and Self Sufficiency... Program and Family Self-Sufficiency for Public Housing. Eligible applicants are PHAs, Tribes/TDHEs, Non...
Assessment of genetic diversity of Bermudagrass (Cynodon dactylon) using ISSR markers.
Farsani, Tayebeh Mohammadi; Etemadi, Nematollah; Sayed-Tabatabaei, Badraldin Ebrahim; Talebi, Majid
2012-01-01
Bermudagrass (Cynodon spp.) is a major turfgrass for home lawns, public parks, golf courses and sport fields and is known to have originated in the Middle East. Morphological and physiological characteristics are not sufficient to differentiate some bermudagrass genotypes because the differences between them are often subtle and subject to environmental influences. In this study, twenty-seven bermudagrass accessions and introductions, mostly from different parts of Iran, were assayed by inter-simple sequence repeat (ISSR) markers to differentiate them and explore their genetic relationships. Fourteen ISSR primers amplified 389 fragments, of which 313 (80.5%) were polymorphic. The average polymorphism information content (PIC) was 0.328, which shows that the majority of primers are informative. Cluster analysis using the unweighted pair group method with arithmetic average (UPGMA) and Jaccard's similarity coefficient (r = 0.828) grouped the accessions into six main clusters that corresponded, to some degree, to their geographical origin, chromosome number and some morphological characteristics. It can be concluded that there exists a wide genetic base of bermudagrass in Iran and that ISSR markers are effective in determining genetic diversity and relationships among the accessions.
Assessment of Genetic Diversity of Bermudagrass (Cynodon dactylon) Using ISSR Markers
Farsani, Tayebeh Mohammadi; Etemadi, Nematollah; Sayed-Tabatabaei, Badraldin Ebrahim; Talebi, Majid
2012-01-01
Bermudagrass (Cynodon spp.) is a major turfgrass for home lawns, public parks, golf courses and sport fields and is known to have originated in the Middle East. Morphological and physiological characteristics are not sufficient to differentiate some bermudagrass genotypes because the differences between them are often subtle and subject to environmental influences. In this study, twenty-seven bermudagrass accessions and introductions, mostly from different parts of Iran, were assayed by inter-simple sequence repeat (ISSR) markers to differentiate them and explore their genetic relationships. Fourteen ISSR primers amplified 389 fragments, of which 313 (80.5%) were polymorphic. The average polymorphism information content (PIC) was 0.328, which shows that the majority of primers are informative. Cluster analysis using the unweighted pair group method with arithmetic average (UPGMA) and Jaccard’s similarity coefficient (r = 0.828) grouped the accessions into six main clusters that corresponded, to some degree, to their geographical origin, chromosome number and some morphological characteristics. It can be concluded that there exists a wide genetic base of bermudagrass in Iran and that ISSR markers are effective in determining genetic diversity and relationships among the accessions. PMID:22312259
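The two quantities quoted in this abstract, the share of polymorphic fragments and Jaccard's similarity coefficient, can be illustrated with a minimal sketch. The band profiles below are hypothetical and are not data from the study; the function names are likewise illustrative, and the sketch assumes ISSR bands are scored as 1 (present) or 0 (absent).

```python
def jaccard_similarity(profile_a, profile_b):
    """Jaccard coefficient between two binary band profiles (1 = band present)."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must score the same set of bands")
    # Bands present in both accessions.
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    # Bands present in at least one accession; shared absences are ignored.
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return shared / either if either else 0.0


def percent_polymorphic(polymorphic, total):
    """Percentage of amplified fragments that are polymorphic."""
    return 100.0 * polymorphic / total


# Hypothetical band profiles for two accessions (not data from the study).
accession_1 = [1, 1, 0, 1, 0, 1]
accession_2 = [1, 0, 0, 1, 1, 1]

print(jaccard_similarity(accession_1, accession_2))
# Fragment counts as reported in the abstract: 313 polymorphic of 389.
print(round(percent_polymorphic(313, 389), 1))
```

A matrix of such pairwise coefficients is the usual input to a UPGMA clustering step; that step is omitted here.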
Strategic approaches to unraveling genetic causes of cardiovascular diseases
USDA-ARS's Scientific Manuscript database
DNA sequence variants are major components of the "causal field" for virtually all medical phenotypes, whether single gene familial disorders or complex traits without a clear familial aggregation. The causal variants in single gene disorders are necessary and sufficient to impart large effects. In ...
A general method for the purification of restriction enzymes.
Greene, P J; Heyneker, H L; Bolivar, F; Rodriguez, R L; Betlach, M C; Covarrubias, A A; Backman, K; Russel, D J; Tait, R; Boyer, H W
1978-01-01
An abbreviated procedure has been developed for the purification of restriction endonucleases. This procedure uses chromatography on phosphocellulose and hydroxylapatite and results in enzymes of sufficient purity to permit their use in the sequencing, molecular cloning, and physical mapping of DNA. PMID:673857
Stellar evolution of high mass based on the Ledoux criterion for convection
NASA Technical Reports Server (NTRS)
Stothers, R.; Chin, C.
1972-01-01
Theoretical evolutionary sequences of models for stars of 15 and 30 solar masses were computed from the zero-age main sequence to the end of core helium burning. During the earliest stages of core helium depletion, the envelope rapidly expands into the red-supergiant configuration. At 15 solar masses, a blue loop on the H-R diagram ensues if the initial metals abundance, initial helium abundance, or C-12 + alpha particle reaction rate is sufficiently large, or if the 3-alpha reaction rate is sufficiently small. These quantities affect the opacity of the base of the outer convection zone, the mass of the core, and the thermal properties of the core. The blue loop occurs abruptly and fully developed when the critical value of any of these quantities is exceeded, and the effective temperature range and fraction of the lifetime of core helium burning during the slow phase of the blue loop vary surprisingly little. At 30 solar masses, no blue loop occurs for any reasonable set of input parameters.
Formation of high-field magnetic white dwarfs from common envelopes
Nordhaus, Jason; Wellons, Sarah; Spiegel, David S.; Metzger, Brian D.; Blackman, Eric G.
2011-01-01
The origin of highly magnetized white dwarfs has remained a mystery since their initial discovery. Recent observations indicate that the formation of high-field magnetic white dwarfs is intimately related to strong binary interactions during post-main-sequence phases of stellar evolution. If a low-mass companion, such as a planet, brown dwarf, or low-mass star, is engulfed by a post-main-sequence giant, gravitational torques in the envelope of the giant lead to a reduction of the companion’s orbit. Sufficiently low-mass companions in-spiral until they are shredded by the strong gravitational tides near the white dwarf core. Subsequent formation of a super-Eddington accretion disk from the disrupted companion inside a common envelope can dramatically amplify magnetic fields via a dynamo. Here, we show that these disk-generated fields are sufficiently strong to explain the observed range of magnetic field strengths for isolated, high-field magnetic white dwarfs. A higher-mass binary analogue may also contribute to the origin of magnetar fields. PMID:21300910
Rubio-Moraga, Angela; Candel-Perez, David; Lucas-Borja, Manuel E; Tiscar, Pedro A; Viñegla, Benjamin; Linares, Juan C; Gómez-Gómez, Lourdes; Ahrazem, Oussama
2012-01-01
Eight Pinus nigra Arn. populations from Southern Spain and Northern Morocco were examined using inter-simple sequence repeat markers to characterize the genetic variability amongst populations. Pair-wise population genetic distance ranged from 0.031 to 0.283, with a mean of 0.150 between populations. The highest inter-population average distance was between PaCU from Cuenca and YeCA from Cazorla, while the lowest distance was between TaMO from Morocco and MA Sierra Mágina populations. Analysis of molecular variance (AMOVA) and Nei's genetic diversity analyses revealed higher genetic variation within the same population than among different populations. Genetic differentiation (Gst) was 0.233. Cuenca showed the highest Nei's genetic diversity followed by the Moroccan region, Sierra Mágina, and Cazorla region. However, clustering of populations was not in accordance with their geographical locations. Principal component analysis showed the presence of two major groups (Group 1 contained all populations from Cuenca while Group 2 contained populations from Cazorla, Sierra Mágina and Morocco), while Bayesian analysis revealed the presence of three clusters. The low genetic diversity observed in PaCU and YeCA is probably a consequence of inappropriate management since no estimation of genetic variability was performed before the silvicultural treatments. Data indicates that the inter-simple sequence repeat (ISSR) method is sufficiently informative and powerful to assess genetic variability among populations of P. nigra.
Rubio-Moraga, Angela; Candel-Perez, David; Lucas-Borja, Manuel E.; Tiscar, Pedro A.; Viñegla, Benjamin; Linares, Juan C.; Gómez-Gómez, Lourdes; Ahrazem, Oussama
2012-01-01
Eight Pinus nigra Arn. populations from Southern Spain and Northern Morocco were examined using inter-simple sequence repeat markers to characterize the genetic variability amongst populations. Pair-wise population genetic distance ranged from 0.031 to 0.283, with a mean of 0.150 between populations. The highest inter-population average distance was between PaCU from Cuenca and YeCA from Cazorla, while the lowest distance was between TaMO from Morocco and MA Sierra Mágina populations. Analysis of molecular variance (AMOVA) and Nei’s genetic diversity analyses revealed higher genetic variation within the same population than among different populations. Genetic differentiation (Gst) was 0.233. Cuenca showed the highest Nei’s genetic diversity followed by the Moroccan region, Sierra Mágina, and Cazorla region. However, clustering of populations was not in accordance with their geographical locations. Principal component analysis showed the presence of two major groups—Group 1 contained all populations from Cuenca while Group 2 contained populations from Cazorla, Sierra Mágina and Morocco—while Bayesian analysis revealed the presence of three clusters. The low genetic diversity observed in PaCU and YeCA is probably a consequence of inappropriate management since no estimation of genetic variability was performed before the silvicultural treatments. Data indicates that the inter-simple sequence repeat (ISSR) method is sufficiently informative and powerful to assess genetic variability among populations of P. nigra. PMID:22754321
Assessment of an automated capillary system for Plasmodium vivax microsatellite genotyping.
Manrique, Paulo; Hoshi, Mari; Fasabi, Manuel; Nolasco, Oscar; Yori, Pablo; Calderón, Martiza; Gilman, Robert H; Kosek, Margaret N; Vinetz, Joseph M; Gamboa, Dionicia
2015-08-21
Several platforms have been used to generate the primary data for microsatellite analysis of malaria parasite genotypes. Each has relative advantages, but all share the limitation of being time- and cost-intensive. A commercially available automated capillary gel cartridge system was assessed in the microsatellite analysis of Plasmodium vivax diversity in the Peruvian Amazon. The reproducibility and accuracy of this system, QIAxcel, were assessed using a sequenced PCR product of 227 base pairs. This product was measured 42 times, then 27 P. vivax samples from Peruvian Amazon subjects were analyzed with this instrument using five informative microsatellites. Results from the QIAxcel system were compared with those from a Sanger-type sequencing machine, the ABI PRISM® 3100 Genetic Analyzer. Significant differences were seen between the sequenced amplicons and the results from the QIAxcel instrument. Different runs, plates and cartridges yielded significantly different results. Additionally, allele size decreased with each run by 0.045, or 1 bp, every three plates. The QIAxcel system gave values different from those obtained by ABI PRISM, and too many (i.e. inaccurate) alleles per locus were also seen with the automated instrument. While P. vivax diversity could generally be estimated using an automated capillary gel cartridge system, the data demonstrate that this system is not sufficiently precise for reliably identifying parasite strains via microsatellite analysis. This conclusion, reached after systematic analysis, was due both to inadequate precision and to poor reproducibility in measuring PCR product size.
Jacob, Jacob H; Hussein, Emad I; Shakhatreh, Muhamad Ali K; Cornelison, Christopher T
2017-10-01
Amplicon sequencing using next-generation technology (bTEFAP®) has been utilized in describing the diversity of Dead Sea microbiota. The investigated area is a well-known salt lake in the western part of Jordan found in the lowest geographical location in the world (more than 420 m below sea level) and characterized by extreme salinity (approximately 34%) in addition to other extreme conditions (low pH, unique ionic composition different from sea water). DNA was extracted from Dead Sea water. A total of 314,310 small subunit rRNA (SSU rRNA) sequences were parsed, and 288,452 sequences were then clustered. For alpha diversity analysis, the sample was rarefied to 3,000 sequences. The Shannon-Wiener index curve plot reached a plateau at approximately 3,000 sequences, indicating that sequencing depth was sufficient to capture the full scope of microbial diversity. Archaea were found to dominate the sequences (52%), whereas Bacteria constituted 45% of the sequences. Altogether, prokaryotic sequences (which constitute 97% of all sequences) were found to predominate. The findings expand on previous studies by using high-throughput amplicon sequencing to describe the microbial community in an environment which in recent years has been shown to hide some interesting diversity. © 2017 The Authors. MicrobiologyOpen published by John Wiley & Sons Ltd.
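The rarefaction and Shannon-Wiener steps mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names are hypothetical, reads are assumed to be already assigned to taxon labels, the index uses the natural logarithm, and rarefaction is modelled as random subsampling without replacement.

```python
import math
import random
from collections import Counter


def shannon_index(counts):
    """Shannon-Wiener index H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def rarefy(read_labels, depth, seed=0):
    """Subsample `depth` reads without replacement and tally them per taxon."""
    rng = random.Random(seed)
    return Counter(rng.sample(read_labels, depth))


# Hypothetical community: one taxon label per read (not data from the study).
reads = ["Archaea_A"] * 520 + ["Bacteria_B"] * 450 + ["Other"] * 30

# If H' changes little between successive depths, the curve has plateaued.
for depth in (100, 300, 1000):
    subsample = rarefy(reads, depth)
    print(depth, round(shannon_index(list(subsample.values())), 3))
```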
Deep sampling of the Palomero maize transcriptome by a high throughput strategy of pyrosequencing.
Vega-Arreguín, Julio C; Ibarra-Laclette, Enrique; Jiménez-Moraila, Beatriz; Martínez, Octavio; Vielle-Calzada, Jean Philippe; Herrera-Estrella, Luis; Herrera-Estrella, Alfredo
2009-07-06
In-depth sequencing analysis has not been able to determine the overall complexity of transcriptional activity of a plant organ or tissue sample. In some cases, deep parallel sequencing of Expressed Sequence Tags (ESTs), although not yet optimized for the sequencing of cDNAs, has represented an efficient procedure for validating gene prediction and estimating overall gene coverage. This approach could be very valuable for complex plant genomes. In addition, little emphasis has been given to efforts aiming at an estimation of the overall transcriptional universe found in a multicellular organism at a specific developmental stage. To explore, in depth, the transcriptional diversity in an ancient maize landrace, we developed a protocol to optimize the sequencing of cDNAs and performed 4 consecutive GS20-454 pyrosequencing runs of a cDNA library obtained from 2 week-old Palomero Toluqueño maize plants. The protocol reported here allowed obtaining over 90% of informative sequences. These GS20-454 runs generated over 1.5 Million reads, representing the largest amount of sequences reported from a single plant cDNA library. A collection of 367,391 quality-filtered reads (30.09 Mb) from a single run was sufficient to identify transcripts corresponding to 34% of public maize ESTs databases; total sequences generated after 4 filtered runs increased this coverage to 50%. Comparisons of all 1.5 Million reads to the Maize Assembled Genomic Islands (MAGIs) provided evidence for the transcriptional activity of 11% of MAGIs. We estimate that 5.67% (86,069 sequences) do not align with public ESTs or annotated genes, potentially representing new maize transcripts. Following the assembly of 74.4% of the reads in 65,493 contigs, real-time PCR of selected genes confirmed a predicted correlation between the abundance of GS20-454 sequences and corresponding levels of gene expression. 
A protocol was developed that significantly increases the number, length and quality of cDNA reads using massive 454 parallel sequencing. We show that recurrent 454 pyrosequencing of a single cDNA sample is necessary to attain a thorough representation of the transcriptional universe present in maize, that can also be used to estimate transcript abundance of specific genes. This data suggests that the molecular and functional diversity contained in the vast native landraces remains to be explored, and that large-scale transcriptional sequencing of a presumed ancestor of the modern maize varieties represents a valuable approach to characterize the functional diversity of maize for future agricultural and evolutionary studies.
Transposon facilitated DNA sequencing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Berg, D.E.; Berg, C.M.; Huang, H.V.
1990-01-01
The purpose of this research is to investigate and develop methods that exploit the power of bacterial transposable elements for large-scale DNA sequencing. Our premise is that the use of transposons to put primer binding sites randomly in target DNAs should provide access to all portions of large DNA fragments, without the inefficiencies of methods involving random subcloning and attendant repetitive sequencing, or of sequential synthesis of many oligonucleotide primers that are used to march systematically along a DNA molecule. Two unrelated bacterial transposons, Tn5 and γδ, are being used because they have both proven useful for molecular analyses, and because they differ sufficiently in mechanism and specificity of transposition to merit parallel development.
Chandra, Amaresh; Jain, Radha; Solomon, Sushil; Shrivastava, Shiksha; Roy, Ajoy K
2013-02-04
Sugarcane is an important cash crop, providing 70% of the global raw sugar as well as raw material for biofuel production. Genetic analysis is hindered in sugarcane because of its large and complex polyploid genome and lack of sufficiently informative gene-tagged markers. Modern genomics has produced large numbers of ESTs, which can be exploited to develop molecular markers based on comparative analysis with EST datasets of related crops and the whole rice genome sequence, and to accentuate their cross-technical functionality in orphan crops like tropical grasses. Utilising 246,180 Saccharum officinarum EST sequences and comparing them with ESTs of sorghum and barley and the whole rice genome sequence, we have developed 3425 novel gene-tagged markers, namely conserved-intron scanning primers (CISP), using the web program GeMprospector. Rice orthologue annotation results indicated homology of 1096 sequences with expressed proteins and 491 with hypothetical proteins. The remaining 1838 were miscellaneous in nature. A total of 367 primer pairs were tested in a diverse panel of samples. The data indicate amplification of 41% polymorphic bands leading to 0.52 PIC and 3.50 MI with a set of sugarcane varieties and Saccharum species. In addition, a moderate technical functionality of a set of such markers with orphan tropical grasses (22%) and fodder cum cereal oat (33%) is observed. Developed gene-tagged CISP markers exhibited considerable technical functionality with varieties of sugarcane and unexplored species of tropical grasses. These markers would thus be particularly useful in identifying the economical traits in sugarcane and developing conservation strategies for orphan tropical grasses.
MOCCS: Clarifying DNA-binding motif ambiguity using ChIP-Seq data.
Ozaki, Haruka; Iwasaki, Wataru
2016-08-01
As a key mechanism of gene regulation, transcription factors (TFs) bind to DNA by recognizing specific short sequence patterns that are called DNA-binding motifs. A single TF can accept ambiguity within its DNA-binding motifs, which comprise both canonical (typical) and non-canonical motifs. Clarification of such DNA-binding motif ambiguity is crucial for revealing gene regulatory networks and evaluating mutations in cis-regulatory elements. Although chromatin immunoprecipitation sequencing (ChIP-seq) now provides abundant data on the genomic sequences to which a given TF binds, existing motif discovery methods are unable to directly answer whether a given TF can bind to a specific DNA-binding motif. Here, we report a method for clarifying the DNA-binding motif ambiguity, MOCCS. Given ChIP-Seq data of any TF, MOCCS comprehensively analyzes and describes every k-mer to which that TF binds. Analysis of simulated datasets revealed that MOCCS is applicable to various ChIP-Seq datasets, requiring only a few minutes per dataset. Application to the ENCODE ChIP-Seq datasets proved that MOCCS directly evaluates whether a given TF binds to each DNA-binding motif, even if known position weight matrix models do not provide sufficient information on DNA-binding motif ambiguity. Furthermore, users are not required to provide numerous parameters or background genomic sequence models that are typically unavailable. MOCCS is implemented in Perl and R and is freely available via https://github.com/yuifu/moccs. By complementing existing motif-discovery software, MOCCS will contribute to the basic understanding of how the genome controls diverse cellular processes via DNA-protein interactions. Copyright © 2016 Elsevier Ltd. All rights reserved.
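The idea of enumerating every k-mer in a set of bound sequences can be shown with a minimal sketch. This is not the MOCCS algorithm or its interface, whose details are not given in the abstract; it only counts k-mers in a list of hypothetical peak sequences, skipping windows that contain ambiguous bases. The function name and the example sequences are illustrative assumptions.

```python
from collections import Counter

DNA_ALPHABET = set("ACGT")


def count_kmers(sequences, k):
    """Count every k-mer in the given sequences, skipping windows with non-ACGT bases."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= DNA_ALPHABET:
                counts[kmer] += 1
    return counts


# Hypothetical sequences around ChIP-Seq peak summits (not real data).
peaks = ["ACGTACGT", "ttacgtaa", "GGNACGTC"]
print(count_kmers(peaks, 4).most_common(3))
```

A real analysis would also weigh where each k-mer occurs relative to the peak position and compare against a background; that part is left out here.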
Yamada, Kazunori D.; Tomii, Kentaro; Katoh, Kazutaka
2016-01-01
Motivation: Large multiple sequence alignments (MSAs), consisting of thousands of sequences, are becoming more and more common, due to advances in sequencing technologies. The MAFFT MSA program has several options for building large MSAs, but their performances have not been sufficiently assessed yet, because realistic benchmarking of large MSAs has been difficult. Recently, such assessments have been made possible through the HomFam and ContTest benchmark protein datasets. Along with the development of these datasets, an interesting theory was proposed: chained guide trees increase the accuracy of MSAs of structurally conserved regions. This theory challenges the basis of progressive alignment methods and needs to be examined by being compared with other known methods including computationally intensive ones. Results: We used HomFam, ContTest and OXFam (an extended version of OXBench) to evaluate several methods enabled in MAFFT: (1) a progressive method with approximate guide trees, (2) a progressive method with chained guide trees, (3) a combination of an iterative refinement method and a progressive method and (4) a less approximate progressive method that uses a rigorous guide tree and consistency score. Other programs, Clustal Omega and UPP, available for large MSAs, were also included into the comparison. The effect of method 2 (chained guide trees) was positive in ContTest but negative in HomFam and OXFam. Methods 3 and 4 increased the benchmark scores more consistently than method 2 for the three datasets, suggesting that they are safer to use. Availability and Implementation: http://mafft.cbrc.jp/alignment/software/ Contact: katoh@ifrec.osaka-u.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27378296
Zhang, Na; Zhang, Lei; Yang, Qi; Pei, Anqi; Tong, Xiaoxin; Chung, Yiu-Cho; Liu, Xin
2017-06-01
To implement a fast (~15 min) MRI protocol for carotid plaque screening using 3D multi-contrast MRI sequences without contrast agent on a 3 Tesla MRI scanner. 7 healthy volunteers and 25 patients with clinically confirmed transient ischemic attack or suspected cerebrovascular ischemia were included in this study. The proposed protocol, including 3D T1-weighted and T2-weighted SPACE (variable-flip-angle 3D turbo spin echo), and T1-weighted magnetization prepared rapid acquisition gradient echo (MPRAGE) was performed first and was followed by 2D T1-weighted and T2-weighted turbo spin echo, and post-contrast T1-weighted SPACE sequences. Image quality, number of plaques, and vessel wall thicknesses measured at the intersection of the plaques were evaluated and compared between sequences. Average examination time of the proposed protocol was 14.6 min. The average image quality scores of 3D T1-weighted SPACE, T2-weighted SPACE, and T1-weighted MPRAGE were 3.69, 3.75, and 3.48, respectively. There was no significant difference in detecting the number of plaques and vulnerable plaques using pre-contrast 3D images with or without post-contrast T1-weighted SPACE. The 3D SPACE and 2D turbo spin echo sequences had excellent agreement (R = 0.96 for T1-weighted and 0.98 for T2-weighted, p < 0.001) regarding vessel wall thickness measurements. The proposed protocol demonstrated the feasibility of attaining carotid plaque screening within a 15-minute scan, which provided sufficient anatomical coverage and critical diagnostic information. This protocol offers the potential for rapid and reliable screening for carotid plaques without contrast agent. Copyright © 2016. Published by Elsevier Inc.
Henry, Kelli F.; Kawashima, Tomokazu; Goldberg, Robert B.
2015-03-22
Little is known about the molecular mechanisms by which the embryo proper and suspensor of plant embryos activate specific gene sets shortly after fertilization. We analyzed the upstream region of the Scarlet Runner Bean (Phaseolus coccineus) G564 gene in order to understand how genes are activated specifically in the suspensor during early embryo development. Previously, we showed that a 54-bp fragment of the G564 upstream region is sufficient for suspensor transcription and contains at least three required cis-regulatory sequences, including the 10-bp motif (5'-GAAAAGCGAA-3'), the 10 bp-like motif (5'-GAAAAACGAA-3'), and Region 2 motif (partial sequence 5'-TTGGT-3'). Here, we use site-directed mutagenesis experiments in transgenic tobacco globular-stage embryos to identify two additional cis-regulatory elements within the 54-bp cis-regulatory module that are required for G564 suspensor transcription: the Fifth motif (5'-GAGTTA-3') and a third 10-bp-related sequence (5'-GAAAACCACA-3'). Further deletion of the 54-bp fragment revealed that a 47-bp fragment containing the five motifs (the 10-bp, 10-bp-like, 10-bp-related, Region 2 and Fifth motifs) is sufficient for suspensor transcription, and represents a cis-regulatory module. A consensus sequence for each type of motif was determined by comparing motif sequences shown to activate suspensor transcription. Phylogenetic analyses suggest that the regulation of G564 is evolutionarily conserved. Lastly, a homologous cis-regulatory module was found upstream of the G564 ortholog in the Common Bean (Phaseolus vulgaris), indicating that the regulation of G564 is evolutionarily conserved in closely related bean species.
Henry, Kelli F; Kawashima, Tomokazu; Goldberg, Robert B
2015-06-01
Little is known about the molecular mechanisms by which the embryo proper and suspensor of plant embryos activate specific gene sets shortly after fertilization. We analyzed the upstream region of the Scarlet Runner Bean (Phaseolus coccineus) G564 gene in order to understand how genes are activated specifically in the suspensor during early embryo development. Previously, we showed that a 54-bp fragment of the G564 upstream region is sufficient for suspensor transcription and contains at least three required cis-regulatory sequences, including the 10-bp motif (5'-GAAAAGCGAA-3'), the 10 bp-like motif (5'-GAAAAACGAA-3'), and Region 2 motif (partial sequence 5'-TTGGT-3'). Here, we use site-directed mutagenesis experiments in transgenic tobacco globular-stage embryos to identify two additional cis-regulatory elements within the 54-bp cis-regulatory module that are required for G564 suspensor transcription: the Fifth motif (5'-GAGTTA-3') and a third 10-bp-related sequence (5'-GAAAACCACA-3'). Further deletion of the 54-bp fragment revealed that a 47-bp fragment containing the five motifs (the 10-bp, 10-bp-like, 10-bp-related, Region 2 and Fifth motifs) is sufficient for suspensor transcription, and represents a cis-regulatory module. A consensus sequence for each type of motif was determined by comparing motif sequences shown to activate suspensor transcription. Phylogenetic analyses suggest that the regulation of G564 is evolutionarily conserved. A homologous cis-regulatory module was found upstream of the G564 ortholog in the Common Bean (Phaseolus vulgaris), indicating that the regulation of G564 is evolutionarily conserved in closely related bean species.
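As a rough illustration of how the five motifs quoted in the abstract could be located in a candidate upstream region, the sketch below performs an exact-match scan on one strand. The motif strings are taken from the abstract (Region 2 is only a partial sequence there); the function name `find_motifs` and the example sequence are hypothetical, and the example sequence is not the real G564 upstream region.

```python
# Motif sequences as quoted in the abstract (Region 2 is a partial sequence).
MOTIFS = {
    "10-bp": "GAAAAGCGAA",
    "10-bp-like": "GAAAAACGAA",
    "10-bp-related": "GAAAACCACA",
    "Region 2 (partial)": "TTGGT",
    "Fifth": "GAGTTA",
}


def find_motifs(sequence, motifs=MOTIFS):
    """Return {motif name: [0-based start positions]} for exact matches
    on the given strand only (no reverse complement, no mismatches)."""
    sequence = sequence.upper()
    hits = {}
    for name, motif in motifs.items():
        positions = []
        start = sequence.find(motif)
        while start != -1:
            positions.append(start)
            start = sequence.find(motif, start + 1)  # allow overlapping hits
        hits[name] = positions
    return hits


# Hypothetical sequence for illustration only; NOT the real G564 region.
example = "ccGAAAAGCGAAttTTGGTaaGAGTTAcc"
hits = find_motifs(example)
```

An exact scan is the simplest choice; the consensus sequences mentioned in the abstract would call for a degenerate-base or position-weight match instead, which is not attempted here.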
Method and apparatus for biological sequence comparison
Marr, T.G.; Chang, W.I.
1997-12-23
A method and apparatus are disclosed for comparing biological sequences from a known source of sequences, with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM), and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level, and are long enough to be statistically significant. The invention device filters out fragments from the known sequences that are too short, or have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The determined set of short fragment best matches for the block provide an upper threshold on alignment values. Regions of a certain length from the known sequences that have a mean alignment value upper threshold greater than a target unit score are concatenated to form a union. The current block is compared to the union and provides an indication of best local alignment with the subject sequence. 5 figs.
Method and apparatus for biological sequence comparison
Marr, Thomas G.; Chang, William I-Wei
1997-01-01
A method and apparatus for comparing biological sequences from a known source of sequences, with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM), and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level, and are long enough to be statistically significant. The invention device filters out fragments from the known sequences that are too short, or have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The determined set of short fragment best matches for the block provide an upper threshold on alignment values. Regions of a certain length from the known sequences that have a mean alignment value upper threshold greater than a target unit score are concatenated to form a union. The current block is compared to the union and provides an indication of best local alignment with the subject sequence.
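The block-division step described above can be sketched as follows. The abstract only says that blocks overlap and are large enough to contain a minimum-length alignment; the specific choice here (blocks of twice the minimum alignment length, starting every minimum-length positions) is an assumption made for illustration, and `overlapping_blocks` is a hypothetical name, not something taken from the patent.

```python
def overlapping_blocks(query, min_align_len):
    """Split `query` into blocks of length 2*min_align_len that start every
    min_align_len positions, so that any window of min_align_len residues
    lies entirely inside at least one block.

    Returns a list of (start offset, block string) pairs."""
    if min_align_len <= 0:
        raise ValueError("min_align_len must be positive")
    block_len = 2 * min_align_len
    last_start = max(len(query) - min_align_len, 1)
    return [
        (start, query[start:start + block_len])
        for start in range(0, last_start, min_align_len)
    ]


blocks = overlapping_blocks("ABCDEFGHIJ", 3)
# -> [(0, "ABCDEF"), (3, "DEFGHI"), (6, "GHIJ")]
```

With this layout each residue is covered by at most two blocks, which keeps the filtering work roughly linear in the query length.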
Bhassu, Subha; Tan, Yee Shin; Vikineswary, Sabaratnam
2014-01-01
Identification of edible mushrooms, particularly those of the genus Pleurotus, has been restricted by various obstacles. The present study attempted to use the combination of two variable regions, IGS1 and ITS, for classifying the economically cultivated Pleurotus species. Integration of the two regions proved highly effective: it not only clearly distinguished the species but also provided sufficient intraspecies variation. The phylogenetic tree (IGS1 + ITS) showed seven distinct clades, each clade belonging to a separate species group. Moreover, the species differentiation was tested by AMOVA, and the results were reconfirmed by appropriate amounts of divergence (91.82% among and 8.18% within the species). Despite achieving a proper classification of species by the combination of IGS1 and ITS sequences, the phylogenetic tree misclassified P. nebrodensis and P. eryngii var. ferulae with other strains of P. eryngii. However, the constructed median joining (MJ) network could not only differentiate between these species but also offer a profound perception of the species' evolutionary process. Eventually, due to the sufficient variation among and within species, distinct sequences, simple amplification, and location between ideal conserved ribosomal genes, the integration of IGS1 and ITS sequences is recommended as a desirable DNA barcode. PMID:24587752
García-Sastre, Adolfo; Palese, Peter
2005-01-01
In public databases, we identified sequences reported as human genes expressed in kidney mesangial cells. The similarity of these genes to paramyxovirus matrix, fusion, and phosphoprotein genes suggests that they are derived from a novel paramyxovirus. These genes are sufficiently unique to suggest the existence of a novel paramyxovirus genus. PMID:15705331
T-IDBA: A de novo Iterative de Bruijn Graph Assembler for Transcriptome
NASA Astrophysics Data System (ADS)
Peng, Yu; Leung, Henry C. M.; Yiu, S. M.; Chin, Francis Y. L.
RNA-seq data produced by next-generation sequencing technology is a useful tool for analyzing transcriptomes. However, existing de novo transcriptome assemblers do not fully utilize the properties of transcriptomes and may result in short contigs because of the splicing nature (shared exons) of the genes. We propose the T-IDBA algorithm to reconstruct expressed isoforms without reference genome. By using pair-end information to solve the problem of long repeats in different genes and branching in the same gene due to alternative splicing, the graph can be decomposed into small components, each corresponds to a gene. The most possible isoforms with sufficient support from the pair-end reads will be found heuristically. In practice, our de novo transcriptome assembler, T-IDBA, outperforms Abyss substantially in terms of sensitivity and precision for both simulated and real data. T-IDBA is available at http://www.cs.hku.hk/~alse/tidba/
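For readers unfamiliar with the underlying data structure, here is a minimal textbook de Bruijn graph built from reads. It is not the T-IDBA algorithm (it has no iteration over k, no paired-end resolution, and no isoform search); the function name and the toy reads are assumptions chosen only to show how shared sequence followed by divergence appears as a branching node.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it
    in at least one k-mer of at least one read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Two toy reads that share a prefix and then diverge, loosely mimicking
# two isoforms that share an exon.
reads = ["ACGTT", "ACGTA"]
graph = de_bruijn_graph(reads, k=3)

# Nodes with more than one successor mark where the paths split.
branch_nodes = sorted(n for n, succ in graph.items() if len(succ) > 1)
```

In this toy case the only branching node is the (k-1)-mer just before the reads diverge; an assembler then needs extra evidence, such as the paired-end reads mentioned in the abstract, to decide which paths correspond to real transcripts.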
Schlosberg, Arran
2016-01-01
The advent of next-generation sequencing (NGS) brings with it a need to manage large volumes of patient data in a manner that is compliant with both privacy laws and long-term archival needs. Outside of the realm of genomics there is a need in the broader medical community to store data, and although radiology aside the volume may be less than that of NGS, the concepts discussed herein are similarly relevant. The relation of so-called "privacy principles" to data protection and cryptographic techniques is explored with regards to the archival and backup storage of health data in Australia, and an example implementation of secure management of genomic archives is proposed with regards to this relation. Readers are presented with sufficient detail to have informed discussions - when implementing laboratory data protocols - with experts in the fields.
Schlosberg, Arran
2016-01-01
The advent of next-generation sequencing (NGS) brings with it a need to manage large volumes of patient data in a manner that is compliant with both privacy laws and long-term archival needs. Outside of the realm of genomics there is a need in the broader medical community to store data, and although radiology aside the volume may be less than that of NGS, the concepts discussed herein are similarly relevant. The relation of so-called “privacy principles” to data protection and cryptographic techniques is explored with regards to the archival and backup storage of health data in Australia, and an example implementation of secure management of genomic archives is proposed with regards to this relation. Readers are presented with sufficient detail to have informed discussions – when implementing laboratory data protocols – with experts in the fields. PMID:26955504
Li, Xinya; Deng, Z. Daniel; ...
2014-11-27
Better understanding of fish behavior is vital for recovery of many endangered species including salmon. The Juvenile Salmon Acoustic Telemetry System (JSATS) was developed to observe the out-migratory behavior of juvenile salmonids tagged by surgical implantation of acoustic micro-transmitters and to estimate the survival when passing through dams on the Snake and Columbia Rivers. A robust three-dimensional solver was needed to accurately and efficiently estimate the time sequence of locations of fish tagged with JSATS acoustic transmitters, to describe in sufficient detail the information needed to assess the function of dam-passage design alternatives. An approximate maximum likelihood solver was developed using measurements of time difference of arrival from all hydrophones in receiving arrays on which a transmission was detected. Field experiments demonstrated that the developed solver performed significantly better in tracking efficiency and accuracy than other solvers described in the literature.
NASA Astrophysics Data System (ADS)
Li, Xinya; Deng, Z. Daniel; Sun, Yannan; Martinez, Jayson J.; Fu, Tao; McMichael, Geoffrey A.; Carlson, Thomas J.
2014-11-01
Better understanding of fish behavior is vital for recovery of many endangered species including salmon. The Juvenile Salmon Acoustic Telemetry System (JSATS) was developed to observe the out-migratory behavior of juvenile salmonids tagged by surgical implantation of acoustic micro-transmitters and to estimate the survival when passing through dams on the Snake and Columbia Rivers. A robust three-dimensional solver was needed to accurately and efficiently estimate the time sequence of locations of fish tagged with JSATS acoustic transmitters, to describe in sufficient detail the information needed to assess the function of dam-passage design alternatives. An approximate maximum likelihood solver was developed using measurements of time difference of arrival from all hydrophones in receiving arrays on which a transmission was detected. Field experiments demonstrated that the developed solver performed significantly better in tracking efficiency and accuracy than other solvers described in the literature.
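To make the time-difference-of-arrival (TDOA) idea concrete, the sketch below locates a source by brute-force least squares over a grid. Under independent Gaussian timing noise, minimizing this squared error coincides with maximum likelihood, but this is not the approximate maximum likelihood solver of the paper. The sound speed, hydrophone layout, grid, and all function names are assumptions made for illustration.

```python
import itertools
import math

SOUND_SPEED = 1480.0  # m/s; assumed nominal value, not taken from the paper


def predicted_tdoas(source, hydrophones, c=SOUND_SPEED):
    """Arrival-time differences relative to the first hydrophone."""
    times = [math.dist(source, h) / c for h in hydrophones]
    return [t - times[0] for t in times[1:]]


def tdoa_cost(source, hydrophones, measured, c=SOUND_SPEED):
    """Sum of squared differences between predicted and measured TDOAs."""
    predicted = predicted_tdoas(source, hydrophones, c)
    return sum((p - m) ** 2 for p, m in zip(predicted, measured))


def grid_locate(hydrophones, measured, axes, c=SOUND_SPEED):
    """Return the grid point (x, y, z) with the smallest TDOA cost."""
    return min(
        itertools.product(*axes),
        key=lambda point: tdoa_cost(point, hydrophones, measured, c),
    )


# Hypothetical five-hydrophone array (metres) and a noise-free transmission.
hydrophones = [(0, 0, 0), (100, 0, 0), (0, 100, 0), (0, 0, 50), (100, 100, 50)]
true_position = (40, 30, 10)
measured = predicted_tdoas(true_position, hydrophones)

axes = [range(0, 101, 10), range(0, 101, 10), range(0, 51, 10)]
estimate = grid_locate(hydrophones, measured, axes)
```

A grid search is slow but has no convergence issues, which makes it a convenient baseline; a practical solver would refine the estimate with an iterative optimizer and handle measurement noise and outliers.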
Li, Xinya; Deng, Z. Daniel; Sun, Yannan; Martinez, Jayson J.; Fu, Tao; McMichael, Geoffrey A.; Carlson, Thomas J.
2014-01-01
Better understanding of fish behavior is vital for recovery of many endangered species including salmon. The Juvenile Salmon Acoustic Telemetry System (JSATS) was developed to observe the out-migratory behavior of juvenile salmonids tagged by surgical implantation of acoustic micro-transmitters and to estimate the survival when passing through dams on the Snake and Columbia Rivers. A robust three-dimensional solver was needed to accurately and efficiently estimate the time sequence of locations of fish tagged with JSATS acoustic transmitters, to describe in sufficient detail the information needed to assess the function of dam-passage design alternatives. An approximate maximum likelihood solver was developed using measurements of time difference of arrival from all hydrophones in receiving arrays on which a transmission was detected. Field experiments demonstrated that the developed solver performed significantly better in tracking efficiency and accuracy than other solvers described in the literature. PMID:25427517
Li, Xinya; Deng, Z Daniel; Sun, Yannan; Martinez, Jayson J; Fu, Tao; McMichael, Geoffrey A; Carlson, Thomas J
2014-11-27
Better understanding of fish behavior is vital for recovery of many endangered species including salmon. The Juvenile Salmon Acoustic Telemetry System (JSATS) was developed to observe the out-migratory behavior of juvenile salmonids tagged by surgical implantation of acoustic micro-transmitters and to estimate the survival when passing through dams on the Snake and Columbia Rivers. A robust three-dimensional solver was needed to accurately and efficiently estimate the time sequence of locations of fish tagged with JSATS acoustic transmitters, to describe in sufficient detail the information needed to assess the function of dam-passage design alternatives. An approximate maximum likelihood solver was developed using measurements of time difference of arrival from all hydrophones in receiving arrays on which a transmission was detected. Field experiments demonstrated that the developed solver performed significantly better in tracking efficiency and accuracy than other solvers described in the literature.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Xinya; Deng, Z. Daniel
Better understanding of fish behavior is vital for recovery of many endangered species including salmon. The Juvenile Salmon Acoustic Telemetry System (JSATS) was developed to observe the out-migratory behavior of juvenile salmonids tagged by surgical implantation of acoustic micro-transmitters and to estimate the survival when passing through dams on the Snake and Columbia Rivers. A robust three-dimensional solver was needed to accurately and efficiently estimate the time sequence of locations of fish tagged with JSATS acoustic transmitters, to describe in sufficient detail the information needed to assess the function of dam-passage design alternatives. An approximate maximum likelihood solver was developed using measurements of time difference of arrival from all hydrophones in receiving arrays on which a transmission was detected. Field experiments demonstrated that the developed solver performed significantly better in tracking efficiency and accuracy than other solvers described in the literature.
The current status and portability of our sequence handling software.
Staden, R
1986-01-01
I describe the current status of our sequence analysis software. The package contains a comprehensive suite of programs for managing large shotgun sequencing projects, a program containing 61 functions for analysing single sequences and a program for comparing pairs of sequences for similarity. The programs that have been described before have been improved by the addition of new functions and by being made very much easier to use. The major interactive programs have 125 pages of online help available from within them. Several new programs are described including screen editing of aligned gel readings for shotgun sequencing projects; a method to highlight errors in aligned gel readings, new methods for searching for putative signals in sequences. We use the programs on a VAX computer but the whole package has been rewritten to make it easy to transport it to other machines. I believe the programs will now run on any machine with a FORTRAN77 compiler and sufficient memory. We are currently putting the programs onto an IBM PC XT/AT and another micro running under UNIX. PMID:3511446
On some new normed sequence spaces
NASA Astrophysics Data System (ADS)
Pranajaya, G.; Herawati, E.
2018-01-01
The sequence spaces (c_0)Λ, cΛ, and (ℓ_∞)Λ were introduced and studied by Mursaleen and Noman [11]. In the present paper, where M is a generalization of an Orlicz function, we extend Mursaleen and Noman's spaces to [c_0(M)]Λ, [c(M)]Λ, and [ℓ_∞(M)]Λ, respectively, and investigate some topological properties of these spaces. Finally, we determine necessary and sufficient conditions for an infinite matrix A to belong to the classes (c_0(M), c_0(M)), (c(M), c(M)), and (ℓ_∞(M), ℓ_∞(M)).
Introduction of optical tweezers in advanced physics laboratory
NASA Astrophysics Data System (ADS)
Wang, Gang
2017-08-01
Laboratories are an essential part of undergraduate optoelectronics and photonics education. Of particular interest is a sequence of laboratories that offers students meaningful research experience within a reasonable time frame limited by regular laboratory hours. We will present our introduction of optical tweezers into the upper-level physics laboratory. We developed the sequence of experiments in the Advanced Lab to offer students sufficient freedom to explore, rather than simply setting up a demonstration following certain recipes. We will also present its impact on the current curriculum of the optoelectronics concentration within our physics program.
Embedding strategies for effective use of information from multiple sequence alignments.
Henikoff, S.; Henikoff, J. G.
1997-01-01
We describe a new strategy for utilizing multiple sequence alignment information to detect distant relationships in searches of sequence databases. A single sequence representing a protein family is enriched by replacing conserved regions with position-specific scoring matrices (PSSMs) or consensus residues derived from multiple alignments of family members. In comprehensive tests of these and other family representations, PSSM-embedded queries produced the best results overall when used with a special version of the Smith-Waterman searching algorithm. Moreover, embedding consensus residues instead of PSSMs improved performance with readily available single sequence query searching programs, such as BLAST and FASTA. Embedding PSSMs or consensus residues into a representative sequence improves searching performance by extracting multiple alignment information from motif regions while retaining single sequence information where alignment is uncertain. PMID:9070452
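The two family representations named in the abstract, consensus residues and position-specific scoring matrices, can be derived from an ungapped alignment block roughly as sketched below. This is a generic construction under simplifying assumptions (uniform background frequencies, a flat pseudocount, no sequence weighting) and does not reproduce the authors' PSSM method or the embedding step itself; the function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_counts(alignment):
    """Residue counts for each column of equal-length aligned sequences."""
    return [Counter(column) for column in zip(*alignment)]


def consensus(alignment):
    """Most frequent residue per column."""
    return "".join(c.most_common(1)[0][0] for c in column_counts(alignment))


def pssm(alignment, alphabet, pseudocount=1.0):
    """Log2-odds scores per column against a uniform background
    (assumed here for simplicity)."""
    background = 1.0 / len(alphabet)
    total = len(alignment) + pseudocount * len(alphabet)
    return [
        {a: math.log2(((counts[a] + pseudocount) / total) / background)
         for a in alphabet}
        for counts in column_counts(alignment)
    ]


# Toy conserved block over a reduced five-letter alphabet.
block = ["ACDE", "ACDF", "ACEE"]
family_consensus = consensus(block)
matrix = pssm(block, alphabet="ACDEF")
```

In the embedding strategy described above, such a consensus string or matrix would stand in for the conserved region of a representative sequence, while the flanking regions keep their original residues.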
Bowers, Robert M.; Kyrpides, Nikos C.; Stepanauskas, Ramunas; ...
2017-08-08
Here, we present two standards developed by the Genomic Standards Consortium (GSC) for reporting bacterial and archaeal genome sequences. Both are extensions of the Minimum Information about Any (x) Sequence (MIxS). The standards are the Minimum Information about a Single Amplified Genome (MISAG) and the Minimum Information about a Metagenome-Assembled Genome (MIMAG), including, but not limited to, assembly quality, and estimates of genome completeness and contamination. These standards can be used in combination with other GSC checklists, including the Minimum Information about a Genome Sequence (MIGS), Minimum Information about a Metagenomic Sequence (MIMS), and Minimum Information about a Marker Gene Sequence (MIMARKS). Community-wide adoption of MISAG and MIMAG will facilitate more robust comparative genomic analyses of bacterial and archaeal diversity.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bowers, Robert M.; Kyrpides, Nikos C.; Stepanauskas, Ramunas
Here, we present two standards developed by the Genomic Standards Consortium (GSC) for reporting bacterial and archaeal genome sequences. Both are extensions of the Minimum Information about Any (x) Sequence (MIxS). The standards are the Minimum Information about a Single Amplified Genome (MISAG) and the Minimum Information about a Metagenome-Assembled Genome (MIMAG), including, but not limited to, assembly quality, and estimates of genome completeness and contamination. These standards can be used in combination with other GSC checklists, including the Minimum Information about a Genome Sequence (MIGS), Minimum Information about a Metagenomic Sequence (MIMS), and Minimum Information about a Marker Gene Sequence (MIMARKS). Community-wide adoption of MISAG and MIMAG will facilitate more robust comparative genomic analyses of bacterial and archaeal diversity.
Udatha, D B R K Gupta; Kouskoumvekaki, Irene; Olsson, Lisbeth; Panagiotou, Gianni
2011-01-01
One of the most intriguing groups of enzymes, the feruloyl esterases (FAEs), is ubiquitous in both simple and complex organisms. FAEs have gained importance in biofuel, medicine and food industries due to their capability of acting on a large range of substrates for cleaving ester bonds and synthesizing high-added value molecules through esterification and transesterification reactions. During the past two decades extensive studies have been carried out on the production and partial characterization of FAEs from fungi, while much less is known about FAEs of bacterial or plant origin. Initial classification studies on FAEs were restricted on sequence similarity and substrate specificity on just four model substrates and considered only a handful of FAEs belonging to the fungal kingdom. This study centers on the descriptor-based classification and structural analysis of experimentally verified and putative FAEs; nevertheless, the framework presented here is applicable to every poorly characterized enzyme family. 365 FAE-related sequences of fungal, bacterial and plantae origin were collected and they were clustered using Self Organizing Maps followed by k-means clustering into distinct groups based on amino acid composition and physico-chemical composition descriptors derived from the respective amino acid sequence. A Support Vector Machine model was subsequently constructed for the classification of new FAEs into the pre-assigned clusters. The model successfully recognized 98.2% of the training sequences and all the sequences of the blind test. The underlying functionality of the 12 proposed FAE families was validated against a combination of prediction tools and published experimental data. Another important aspect of the present work involves the development of pharmacophore models for the new FAE families, for which sufficient information on known substrates existed. 
Knowing the pharmacophoric features of a small molecule that are essential for binding to the members of a certain family opens a window of opportunities for tailored applications of FAEs. Copyright © 2010 Elsevier Inc. All rights reserved.
Miya, M.; Sato, Y.; Fukunaga, T.; Sado, T.; Poulsen, J. Y.; Sato, K.; Minamoto, T.; Yamamoto, S.; Yamanaka, H.; Araki, H.; Kondoh, M.; Iwasaki, W.
2015-01-01
We developed a set of universal PCR primers (MiFish-U/E) for metabarcoding environmental DNA (eDNA) from fishes. Primers were designed using aligned whole mitochondrial genome (mitogenome) sequences from 880 species, supplemented by partial mitogenome sequences from 160 elasmobranchs (sharks and rays). The primers target a hypervariable region of the 12S rRNA gene (163–185 bp), which contains sufficient information to identify fishes to taxonomic family, genus and species except for some closely related congeners. To test versatility of the primers across a diverse range of fishes, we sampled eDNA from four tanks in the Okinawa Churaumi Aquarium with known species compositions, prepared dual-indexed libraries and performed paired-end sequencing of the region using high-throughput next-generation sequencing technologies. Out of the 180 marine fish species contained in the four tanks with reference sequences in a custom database, we detected 168 species (93.3%) distributed across 59 families and 123 genera. These fishes are not only taxonomically diverse, ranging from sharks and rays to higher teleosts, but are also greatly varied in their ecology, including both pelagic and benthic species living in shallow coastal to deep waters. We also sampled natural seawaters around coral reefs near the aquarium and detected 93 fish species using this approach. Of the 93 species, 64 were not detected in the four aquarium tanks, rendering the total number of species detected to 232 (from 70 families and 152 genera). The metabarcoding approach presented here is non-invasive, more efficient, more cost-effective and more sensitive than the traditional survey methods. It has the potential to serve as an alternative (or complementary) tool for biodiversity monitoring that revolutionizes natural resource management and ecological studies of fish communities on larger spatial and temporal scales. PMID:26587265
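The species counts reported in this abstract can be cross-checked with a few lines of arithmetic. All numbers come from the abstract; the variable names are illustrative, and the count of species shared between the tanks and the seawater samples is derived here rather than stated in the source.

```python
# Counts quoted in the abstract.
tank_species_with_refs = 180   # tank species with reference sequences
detected_in_tanks = 168
detected_in_seawater = 93
seawater_only = 64             # seawater species not detected in the tanks

detection_rate = 100 * detected_in_tanks / tank_species_with_refs
shared_species = detected_in_seawater - seawater_only   # derived, not reported
total_detected = detected_in_tanks + seawater_only

print(f"tank detection rate: {detection_rate:.1f}%")  # 93.3%
print(f"total species detected: {total_detected}")    # 232
```

The derived totals agree with the reported 93.3% and 232 species.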
Miya, M; Sato, Y; Fukunaga, T; Sado, T; Poulsen, J Y; Sato, K; Minamoto, T; Yamamoto, S; Yamanaka, H; Araki, H; Kondoh, M; Iwasaki, W
2015-07-01
We developed a set of universal PCR primers (MiFish-U/E) for metabarcoding environmental DNA (eDNA) from fishes. Primers were designed using aligned whole mitochondrial genome (mitogenome) sequences from 880 species, supplemented by partial mitogenome sequences from 160 elasmobranchs (sharks and rays). The primers target a hypervariable region of the 12S rRNA gene (163-185 bp), which contains sufficient information to identify fishes to taxonomic family, genus and species except for some closely related congeners. To test versatility of the primers across a diverse range of fishes, we sampled eDNA from four tanks in the Okinawa Churaumi Aquarium with known species compositions, prepared dual-indexed libraries and performed paired-end sequencing of the region using high-throughput next-generation sequencing technologies. Out of the 180 marine fish species contained in the four tanks with reference sequences in a custom database, we detected 168 species (93.3%) distributed across 59 families and 123 genera. These fishes are not only taxonomically diverse, ranging from sharks and rays to higher teleosts, but are also greatly varied in their ecology, including both pelagic and benthic species living in shallow coastal to deep waters. We also sampled natural seawaters around coral reefs near the aquarium and detected 93 fish species using this approach. Of the 93 species, 64 were not detected in the four aquarium tanks, rendering the total number of species detected to 232 (from 70 families and 152 genera). The metabarcoding approach presented here is non-invasive, more efficient, more cost-effective and more sensitive than the traditional survey methods. It has the potential to serve as an alternative (or complementary) tool for biodiversity monitoring that revolutionizes natural resource management and ecological studies of fish communities on larger spatial and temporal scales.
Blenda, Anna; Fang, David D.; Rami, Jean-François; Garsmeur, Olivier; Luo, Feng; Lacape, Jean-Marc
2012-01-01
A consensus genetic map of tetraploid cotton was constructed using six high-density maps and after the integration of a sequence-based marker redundancy check. Public cotton SSR libraries (17,343 markers) were curated for sequence redundancy using 90% as a similarity cutoff. As a result, 20% of the markers (3,410) could be considered as redundant with some other markers. The marker redundancy information had been a crucial part of the map integration process, in which the six most informative interspecific Gossypium hirsutum×G. barbadense genetic maps were used for assembling a high density consensus (HDC) map for tetraploid cotton. With redundant markers being removed, the HDC map could be constructed thanks to the sufficient number of collinear non-redundant markers in common between the component maps. The HDC map consists of 8,254 loci, originating from 6,669 markers, and spans 4,070 cM, with an average of 2 loci per cM. The HDC map presents a high rate of locus duplications, as 1,292 markers among the 6,669 were mapped in more than one locus. Two thirds of the duplications are bridging homoeologous AT and DT chromosomes constitutive of allopolyploid cotton genome, with an average of 64 duplications per AT/DT chromosome pair. Sequences of 4,744 mapped markers were used for a mutual blast alignment (BBMH) with the 13 major scaffolds of the recently released Gossypium raimondii genome indicating high level of homology between the diploid D genome and the tetraploid cotton genetic map, with only a few minor possible structural rearrangements. Overall, the HDC map will serve as a valuable resource for trait QTL comparative mapping, map-based cloning of important genes, and better understanding of the genome structure and evolution of tetraploid cotton. PMID:23029214
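The rounded summary figures in this abstract follow from the raw counts it reports, as the short check below shows. The inputs are taken from the abstract; the variable names are illustrative, and the percentage of multi-locus markers is derived here rather than stated in the source.

```python
# Counts quoted in the abstract.
total_ssr_markers = 17_343
redundant_markers = 3_410
loci = 8_254
mapped_markers = 6_669
map_length_cm = 4_070
multi_locus_markers = 1_292

redundant_pct = 100 * redundant_markers / total_ssr_markers  # reported as 20%
loci_per_cm = loci / map_length_cm                           # reported as 2 per cM
multi_locus_pct = 100 * multi_locus_markers / mapped_markers # derived, not reported
```

Here `redundant_pct` is about 19.7 and `loci_per_cm` about 2.03, consistent with the rounded values in the abstract.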
Kinkar, Liina; Laurimäe, Teivi; Acosta-Jamett, Gerardo; Andresiuk, Vanessa; Balkaya, Ibrahim; Casulli, Adriano; Gasser, Robin B; González, Luis Miguel; Haag, Karen L; Zait, Houria; Irshadullah, Malik; Jabbar, Abdul; Jenkins, David J; Manfredi, Maria Teresa; Mirhendi, Hossein; M'rad, Selim; Rostami-Nejad, Mohammad; Oudni-M'rad, Myriam; Pierangeli, Nora Beatriz; Ponce-Gordo, Francisco; Rehbein, Steffen; Sharbatkhori, Mitra; Kia, Eshrat Beigom; Simsek, Sami; Soriano, Silvia Viviana; Sprong, Hein; Šnábel, Viliam; Umhang, Gérald; Varcasia, Antonio; Saarma, Urmas
2018-06-21
Cystic echinococcosis (CE), a zoonotic disease caused by tapeworms of the species complex Echinococcus granulosus sensu lato, represents a substantial global health and economic burden. Within this complex, E. granulosus sensu stricto (genotypes G1 and G3) is the most frequent causative agent of human CE. Currently, there is no fully reliable method for assigning samples to genotypes G1 and G3, as the commonly used mitochondrial cox1 and nad1 genes are not sufficiently consistent for the identification and differentiation of these genotypes. Thus, a new genetic assay is required for the accurate assignment of G1 and G3. Here we use a large dataset of near-complete mtDNA sequences (n = 303) to reveal the extent of genetic variation of G1 and G3 on a broad geographical scale and to identify reliable informative positions for G1 and G3. Based on extensive sampling and sequencing data, we developed a new method, that is simple and cost-effective, to designate samples to genotypes G1 and G3. We found that the nad5 is the best gene in mtDNA to differentiate between G1 and G3, and developed new primers for the analysis. Our results also highlight problems related to the commonly used cox1 and nad1. To guarantee consistent identification of G1 and G3, we suggest using the sequencing of the nad5 gene region (680 bp). This region contains six informative positions within a relatively short fragment of the mtDNA, allowing differentiation of G1 and G3 with confidence. Our method offers clear advantages over the previous ones, providing a significantly more consistent means to distinguish G1 and G3 than the commonly used cox1 and nad1. Copyright © 2018. Published by Elsevier B.V.
Possenti, Andrea; Vendruscolo, Michele; Camilloni, Carlo; Tiana, Guido
2018-05-23
Proteins employ the information stored in the genetic code and translated into their sequences to carry out well-defined functions in the cellular environment. The possibility to encode for such functions is controlled by the balance between the amount of information supplied by the sequence and that left after the protein has folded into its structure. We study the amount of information necessary to specify the protein structure, providing an estimate that takes into account the thermodynamic properties of protein folding. We thus show that the information remaining in the protein sequence after encoding for its structure (the 'information gap') is very close to what is needed to encode for its function and interactions. Then, by predicting the information gap directly from the protein sequence, we show that it may be possible to use these insights from information theory to discriminate between ordered and disordered proteins, to identify unknown functions, and to optimize artificially-designed protein sequences. This article is protected by copyright. All rights reserved. © 2018 Wiley Periodicals, Inc.
40 CFR 90.612 - Exemptions and exclusions.
Code of Federal Regulations, 2013 CFR
2013-07-01
... proving conformity of individual engines is to contain sufficiently organized data or evidence... all the information required by this part, or is not sufficiently organized, EPA will notify the...
40 CFR 90.612 - Exemptions and exclusions.
Code of Federal Regulations, 2014 CFR
2014-07-01
... proving conformity of individual engines is to contain sufficiently organized data or evidence... all the information required by this part, or is not sufficiently organized, EPA will notify the...
40 CFR 90.612 - Exemptions and exclusions.
Code of Federal Regulations, 2012 CFR
2012-07-01
... proving conformity of individual engines is to contain sufficiently organized data or evidence... all the information required by this part, or is not sufficiently organized, EPA will notify the...
[Study on ITS sequences of Aconitum vilmorinianum and its medicinal adulterant].
Zhang, Xiao-nan; Du, Chun-hua; Fu, De-huan; Gao, Li; Zhou, Pei-jun; Wang, Li
2012-09-01
The aim was to analyze and compare the ITS sequences of Aconitum vilmorinianum and its medicinal adulterant Aconitum austroyunnanense. Total genomic DNA was extracted from sample materials by an improved CTAB method; ITS sequences were amplified by PCR, directly sequenced, and analyzed using the software DNAStar, ClustalX1.81 and MEGA 4.0. In the ITS1 sequences, 299 consistent sites, 19 variable sites and 13 informative sites were found; in the 5.8S sequences, 162 consistent sites, 2 variable sites and 1 informative site; and in the ITS2 sequences, 217 consistent sites, 3 variable sites and 1 informative site. Comparison of the ITS sequence data matrix showed that base transitions and transversions were absent only in the 5.8S sequences; 2 transition sites and 1 transversion site were found in the ITS1 sequences, and only 1 transversion site in the ITS2 sequences. By analyzing the ITS sequence data matrix from 2 populations of Aconitum vilmorinianum and 3 populations of Aconitum austroyunnanense, we found a stable informative site at the 596th base in the ITS2 sequences: in all samples of Aconitum vilmorinianum the base was C, and in all samples of Aconitum austroyunnanense the base was A. Aconitum vilmorinianum and Aconitum austroyunnanense can be identified by the characters of their ITS sequences, and there are more variable sites in the ITS1 sequences than in the ITS2 sequences.
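The idea of a stable informative site (a column where every sample of one species carries one base and every sample of the other species carries a different one) can be sketched in a few lines of Python. This is only an illustration: the function name `diagnostic_sites` and the toy sequences are assumptions made here, not data or software from the study.

```python
def diagnostic_sites(group_a, group_b):
    """Return (position, base_a, base_b) for alignment columns that are
    fixed within each group but differ between groups (1-based positions)."""
    length = len(group_a[0])
    if any(len(s) != length for s in group_a + group_b):
        raise ValueError("sequences must be aligned to equal length")
    sites = []
    for i in range(length):
        col_a = {s[i] for s in group_a}
        col_b = {s[i] for s in group_b}
        # A diagnostic column has exactly one base per group, and they differ.
        if len(col_a) == 1 and len(col_b) == 1 and col_a != col_b:
            sites.append((i + 1, col_a.pop(), col_b.pop()))
    return sites


# Toy aligned fragments, invented for illustration (not real ITS data).
species_1 = ["ACGTCA", "ACGTCA"]
species_2 = ["ACGTAA", "ACTTAA", "ACGTAA"]
print(diagnostic_sites(species_1, species_2))
```

In the toy data, column 3 varies within the second group and is therefore not reported; only column 5 separates the two groups cleanly.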
Zhang, Q; Baldwin, V J; Acland, G M; Parshall, C J; Haskel, J; Aguirre, G D; Ray, K
1999-01-01
Photoreceptor dysplasia (pd) is one of a group of at least six distinct autosomal and one X-linked retinal disorders identified in dogs which are collectively known as progressive retinal atrophy (PRA). It is an early onset retinal disease identified in miniature schnauzer dogs, and pedigree analysis and breeding studies have established autosomal recessive inheritance of the disease. Using a gene-based approach, a number of retina-expressed genes, including some members of the phototransduction pathway, have been causally implicated in retinal diseases of humans and other animals. Here we examined seven such potential candidate genes (opsin, RDS/peripherin, ROM1, rod cGMP-gated cation channel alpha-subunit, and three subunits of transducin) for their causal association with the pd locus by testing segregation of intragenic markers with the disease locus, or, in the absence of informative polymorphisms, sequencing of the coding regions of the genes. Based on these results, we have conclusively excluded four photoreceptor-specific genes as candidates for pd by linkage analysis. For three other photoreceptor-specific genes, we did not find any mutation in the coding sequences of the genes and have excluded them provisionally. Formal exclusion would require investigation of the levels of expression of the candidate genes in pd-affected dogs relative to age-matched controls. At present we are building suitable informative pedigrees for the disease locus with a sufficient number of meioses to be useful for genomewide screening. This should identify markers linked to the disease locus and eventually permit progress toward the identification of the photoreceptor dysplasia gene and the disease-causing mutation.
Qiu, Wang-Ren; Sun, Bi-Qian; Xiao, Xuan; Xu, Dong; Chou, Kuo-Chen
2017-05-01
Protein phosphorylation plays a critical role in the human body by altering the structural conformation of a protein, causing it to become activated or deactivated, or by modifying its function. Given an uncharacterized protein sequence, can we predict whether or not it may be phosphorylated? This is no doubt a very meaningful problem for both basic research and drug development. Unfortunately, to the best of our knowledge, no high-throughput bioinformatics tool has so far been developed to address this very basic but important problem, owing to its extreme complexity and the lack of sufficient training data. Here we propose a predictor called iPhos-PseEvo by (1) incorporating the protein sequence evolutionary information into the general pseudo amino acid composition (PseAAC) via the grey system theory, (2) balancing out the skewed training datasets by the asymmetric bootstrap approach, and (3) constructing an ensemble predictor by fusing an array of individual random forest classifiers through a voting system. Rigorous jackknife tests indicate that very promising success rates are achieved by iPhos-PseEvo even for such a difficult problem. A user-friendly web server for iPhos-PseEvo has been established at http://www.jci-bioinfo.cn/iPhos-PseEvo, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved. It has not escaped our notice that the formulation and approach presented here can be used to analyze many other problems in protein science as well. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Multiplexed microsatellite recovery using massively parallel sequencing
Jennings, T.N.; Knaus, B.J.; Mullins, T.D.; Haig, S.M.; Cronn, R.C.
2011-01-01
Conservation and management of natural populations requires accurate and inexpensive genotyping methods. Traditional microsatellite, or simple sequence repeat (SSR), marker analysis remains a popular genotyping method because of the comparatively low cost of marker development, ease of analysis and high power of genotype discrimination. With the availability of massively parallel sequencing (MPS), it is now possible to sequence microsatellite-enriched genomic libraries in multiplex pools. To test this approach, we prepared seven microsatellite-enriched, barcoded genomic libraries from diverse taxa (two conifer trees, five birds) and sequenced these on one lane of the Illumina Genome Analyzer using paired-end 80-bp reads. In this experiment, we screened 6.1 million sequences and identified 356958 unique microreads that contained di- or trinucleotide microsatellites. Examination of four species shows that our conversion rate from raw sequences to polymorphic markers compares favourably to Sanger- and 454-based methods. The advantage of multiplexed MPS is that the staggering capacity of modern microread sequencing is spread across many libraries; this reduces sample preparation and sequencing costs to less than $400 (USD) per species. This price is sufficiently low that microsatellite libraries could be prepared and sequenced for all 1373 organisms listed as 'threatened' and 'endangered' in the United States for under $0.5M (USD).
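Screening short reads for di- or trinucleotide microsatellites, as described above, can be approximated with a regular expression. The sketch below is an assumption-laden illustration: the abstract does not state a minimum repeat count, so the threshold of five tandem copies, the function name `find_ssrs` and the example reads are all hypothetical.

```python
import re

# A 2- or 3-base motif followed by at least four further copies of itself,
# i.e. five or more tandem repeats (the threshold is an assumption).
SSR_PATTERN = re.compile(r"([ACGT]{2,3}?)\1{4,}")


def find_ssrs(read):
    """Return (start, motif, repeat_count) for di-/trinucleotide repeats."""
    hits = []
    for match in SSR_PATTERN.finditer(read.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAAAA
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


dinucleotide_read = "TTG" + "AC" * 6 + "GGT"
trinucleotide_read = "CC" + "GAT" * 5 + "CC"
print(find_ssrs(dinucleotide_read))
print(find_ssrs(trinucleotide_read))
```

A production pipeline would also need to handle sequencing errors and compound repeats, which this sketch ignores.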
Warris, Sven; Boymans, Sander; Muiser, Iwe; Noback, Michiel; Krijnen, Wim; Nap, Jan-Peter
2014-01-13
Small RNAs are important regulators of genome function, yet their prediction in genomes is still a major computational challenge. Statistical analyses of pre-miRNA sequences indicated that their 2D structure tends to have a minimal free energy (MFE) significantly lower than MFE values of equivalently randomized sequences with the same nucleotide composition, in contrast to other classes of non-coding RNA. The computation of many MFEs is, however, too intensive to allow for genome-wide screenings. Using a local grid infrastructure, MFE distributions of random sequences were pre-calculated on a large scale. These distributions follow a normal distribution and can be used to determine the MFE distribution for any given sequence composition by interpolation. This allows on-the-fly calculation of the normal distribution for any candidate sequence composition. The speedup achieved makes genome-wide screening with this characteristic of a pre-miRNA sequence practical. Although this property alone is not sufficiently discriminative to distinguish miRNAs from other sequences, the MFE-based P-value should be added to the parameters used to select potential miRNA candidates for experimental verification.
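Once the mean and standard deviation of the random-sequence MFE distribution are known for a given composition, the MFE-based P-value reduces to a normal cumulative distribution evaluation. The following sketch assumes those two parameters are already available (in the study they come from interpolation over pre-calculated distributions); the function name and the numbers are hypothetical.

```python
import math


def mfe_p_value(candidate_mfe, random_mean, random_sd):
    """One-sided P-value: probability that a randomized sequence of the same
    composition has an MFE at least as low as the candidate, assuming the
    random MFEs are normally distributed."""
    if random_sd <= 0:
        raise ValueError("standard deviation must be positive")
    z = (candidate_mfe - random_mean) / random_sd
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical values in kcal/mol: candidate 3 standard deviations below the mean.
p = mfe_p_value(candidate_mfe=-40.0, random_mean=-25.0, random_sd=5.0)
print(f"P = {p:.5f}")
```

Because only a closed-form evaluation is needed per candidate, no additional folding of shuffled sequences is required at screening time, which is the source of the speedup the abstract describes.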
NASA Technical Reports Server (NTRS)
Crouch, A.; Barnes, G.
2008-01-01
We demonstrate that the azimuthal ambiguity that is present in solar vector magnetogram data can be resolved with line-of-sight and horizontal heliographic derivative information by using the divergence-free property of magnetic fields without additional assumptions. We discuss the specific derivative information that is sufficient to resolve the ambiguity away from disk center, with particular emphasis on the line-of-sight derivative of the various components of the magnetic field. Conversely, we also show cases where ambiguity resolution fails because sufficient line-of-sight derivative information is not available. For example, knowledge of only the line-of-sight derivative of the line-of-sight component of the field is not sufficient to resolve the ambiguity away from disk center.
Noise-gating to Clean Astrophysical Image Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
DeForest, C. E.
I present a family of algorithms to reduce noise in astrophysical images and image sequences, preserving more information from the original data than is retained by conventional techniques. The family uses locally adaptive filters (“noise gates”) in the Fourier domain to separate coherent image structure from background noise based on the statistics of local neighborhoods in the image. Processing of solar data limited by simple shot noise or by additive noise reveals image structure not easily visible in the originals, preserves photometry of observable features, and reduces shot noise by a factor of 10 or more with little to no apparent loss of resolution. This reveals faint features that were either not directly discernible or not sufficiently strongly detected for quantitative analysis. The method works best on image sequences containing related subjects, for example movies of solar evolution, but is also applicable to single images provided that there are enough pixels. The adaptive filter uses the statistical properties of noise and of local neighborhoods in the data to discriminate between coherent features and incoherent noise without reference to the specific shape or evolution of those features. The technique can potentially be modified in a straightforward way to exploit additional a priori knowledge about the functional form of the noise.
Wang, Cheng-Long; Ding, Meng-Qi; Zou, Chen-Yan; Zhu, Xue-Mei; Tang, Yu; Zhou, Mei-Liang; Shao, Ji-Rong
2017-07-26
Buckwheat is a nutritionally and economically important crop belonging to the genus Fagopyrum (Polygonaceae). To better understand the mutation patterns and evolutionary trends in the chloroplast (cp) genome of buckwheat, and to find a sufficient number of variable regions to explore the phylogenetic relationships of this genus, two complete cp genomes of buckwheat, Fagopyrum dibotrys (F. dibotrys) and Fagopyrum luojishanense (F. luojishanense), were sequenced, and two other Fagopyrum cp genomes were used for comparative analysis. Morphological analysis showed that the main differences among these buckwheats were height, leaf shape, seeds and flower type. F. luojishanense was easily distinguishable from the cultivated species. Although F. dibotrys and the two cultivated species have some similarities, they differ in habit and component contents. The cp genome of F. dibotrys was 159,320 bp, while that of F. luojishanense was 159,265 bp. 48 and 61 SSRs were found in F. dibotrys and F. luojishanense, respectively. Meanwhile, 10 highly variable regions among these buckwheat species were located precisely. The phylogenetic relationships among the four Fagopyrum species based on complete cp genomes were shown. The results suggested that F. dibotrys is more closely related to Fagopyrum tataricum. These data provide valuable genetic information for Fagopyrum species identification, taxonomy, phylogenetic study and molecular breeding.
Cascade Error Projection with Low Bit Weight Quantization for High Order Correlation Data
NASA Technical Reports Server (NTRS)
Duong, Tuan A.; Daud, Taher
1998-01-01
In this paper, we reinvestigate the solution of the chaotic time series prediction problem using a neural network approach. The nature of this problem is such that the data sequences never repeat but instead lie in a chaotic region. However, these data sequences are correlated in high order between past, present, and future data. We use the Cascade Error Projection (CEP) learning algorithm to capture the high-order correlation between past and present data in order to predict future data under limited weight quantization constraints. This will help to predict future information that will provide better estimation in time for intelligent control systems. In our earlier work, it was shown that CEP can sufficiently learn the 5-8 bit parity problem with 4 or more bits, and the color segmentation problem with 7 or more bits, of weight quantization. In this paper, we demonstrate that chaotic time series can be learned and generalized well with as low as 4-bit weight quantization using round-off and truncation techniques. The results show that generalization suffers less as more bits of weight quantization become available, and that error surfaces with the round-off technique are more symmetric around zero than error surfaces with the truncation technique. This study suggests that CEP is an implementable learning technique for hardware consideration.
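The difference between round-off and truncation quantization can be illustrated on the weights alone, independently of CEP. The sketch below is not the authors' implementation: it assumes a signed fixed-point grid on [-1, 1), models truncation as rounding toward negative infinity (as in two's-complement hardware), and uses evenly spaced toy weights.

```python
import math


def quantize(weight, bits, mode="round", w_max=1.0):
    """Map a weight onto a signed fixed-point grid with the given bit width."""
    step = w_max / 2 ** (bits - 1)
    scaled = weight / step
    if mode == "round":
        level = math.floor(scaled + 0.5)   # round to the nearest level
    elif mode == "truncate":
        level = math.floor(scaled)         # drop the fractional part
    else:
        raise ValueError("mode must be 'round' or 'truncate'")
    lowest, highest = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return max(lowest, min(highest, level)) * step


def mean_error(weights, bits, mode):
    """Average signed quantization error over a list of weights."""
    return sum(quantize(w, bits, mode) - w for w in weights) / len(weights)


# Toy weights spread symmetrically around zero.
weights = [-0.8 + 1.6 * k / 1000 for k in range(1001)]
round_bias = mean_error(weights, 4, "round")
trunc_bias = mean_error(weights, 4, "truncate")
print(f"4-bit mean error, round-off:  {round_bias:+.4f}")
print(f"4-bit mean error, truncation: {trunc_bias:+.4f}")
```

Under these assumptions the round-off error averages out near zero, whereas truncation introduces a systematic negative bias of about half a quantization step, which is consistent with the less symmetric error surfaces reported for truncation.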
Bilateral wilms tumor with TP53-related anaplasia.
Popov, Sergey D; Vujanic, Gordan M; Sebire, Neil J; Chagtai, Tasnim; Williams, Richard; Vaidya, Sucheta; Pritchard-Jones, Kathy
2013-01-01
Wilms tumor (WT) with diffuse anaplasia has an unfavorable prognosis and is often (>70%) associated with mutations in the TP53 gene. Although most WTs are unilateral, 5-10% are bilateral, and they are almost always present with nephrogenic rests. The latter are considered a precursor of WT. Two cases of bilateral WTs with nephroblastomatosis, in which anaplastic changes were detected over a period of time, were analyzed using clinical, radiological, histopathological, and molecular-genetic data. TP53 was analyzed by direct sequencing of its full coding sequence and intron-exon boundaries in 11 fragments. DNA was extracted from paraffin-embedded or frozen specimens. High-resolution genomic copy number profiling was carried out by UCL Genomics on the Affymetrix Human Mapping 250K Nsp or Genome-Wide Human SNP Array 6.0 platform. Both cases demonstrated a strong association between the appearance of anaplastic clones and TP53 mutations. Synchronous ganglioneuroma was diagnosed in one case. Our cases are unique as they represent a long disease history and demonstrate the difficulties in managing rare cases of bilateral WT with anaplasia. These cases also emphasize the practical importance of modern molecular-genetic techniques and their clinical application. Moreover, they highlight the issue of the adequate sampling needed in order to gather comprehensive, efficient, and sufficient information about genetic events in a single tumor.
Noise-gating to Clean Astrophysical Image Data
NASA Astrophysics Data System (ADS)
DeForest, C. E.
2017-04-01
I present a family of algorithms to reduce noise in astrophysical images and image sequences, preserving more information from the original data than is retained by conventional techniques. The family uses locally adaptive filters (“noise gates”) in the Fourier domain to separate coherent image structure from background noise based on the statistics of local neighborhoods in the image. Processing of solar data limited by simple shot noise or by additive noise reveals image structure not easily visible in the originals, preserves photometry of observable features, and reduces shot noise by a factor of 10 or more with little to no apparent loss of resolution. This reveals faint features that were either not directly discernible or not sufficiently strongly detected for quantitative analysis. The method works best on image sequences containing related subjects, for example movies of solar evolution, but is also applicable to single images provided that there are enough pixels. The adaptive filter uses the statistical properties of noise and of local neighborhoods in the data to discriminate between coherent features and incoherent noise without reference to the specific shape or evolution of those features. The technique can potentially be modified in a straightforward way to exploit additional a priori knowledge about the functional form of the noise.
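The core idea of gating in the Fourier domain can be shown on a one-dimensional toy signal. The sketch below is a heavily simplified stand-in, not the published algorithm: it applies a single global threshold to a whole signal instead of locally adaptive gates on image neighborhoods, estimates the noise level from the median spectral magnitude, and uses a naive O(n²) transform. The threshold factor of 3 and all names are assumptions.

```python
import cmath
import math
import random


def dft(values, inverse=False):
    """Naive discrete Fourier transform (adequate for a short toy signal)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [
        sum(values[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def noise_gate(signal, factor=3.0):
    """Zero every Fourier component weaker than factor x the median magnitude."""
    spectrum = dft(signal)
    magnitudes = sorted(abs(c) for c in spectrum)
    noise_level = magnitudes[len(magnitudes) // 2]
    gated = [c if abs(c) > factor * noise_level else 0 for c in spectrum]
    return [v.real for v in dft(gated, inverse=True)]


def rms(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


rng = random.Random(0)
n = 64
clean = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
noisy = [c + rng.gauss(0.0, 0.3) for c in clean]
cleaned = noise_gate(noisy)
rms_before = rms(noisy, clean)
rms_after = rms(cleaned, clean)
print(f"RMS error before gating: {rms_before:.3f}, after gating: {rms_after:.3f}")
```

The coherent sinusoid concentrates its energy in two strong Fourier components that pass the gate, while most noise components fall below it and are removed.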
Guimaraes, S; Pruvost, M; Daligault, J; Stoetzel, E; Bennett, E A; Côté, N M-L; Nicolas, V; Lalis, A; Denys, C; Geigl, E-M; Grange, T
2017-05-01
We present a cost-effective metabarcoding approach, aMPlex Torrent, which relies on an improved multiplex PCR adapted to highly degraded DNA, combining barcoding and next-generation sequencing to simultaneously analyse many heterogeneous samples. We demonstrate the strength of these improvements by generating a phylochronology through the genotyping of ancient rodent remains from a Moroccan cave whose stratigraphy covers the last 120 000 years. Rodents are important for epidemiology, agronomy and ecological investigations and can act as bioindicators for human- and/or climate-induced environmental changes. Efficient and reliable genotyping of ancient rodent remains has the potential to deliver valuable phylogenetic and paleoecological information. The analysis of multiple ancient skeletal remains of very small size with poor DNA preservation, however, requires a sensitive high-throughput method to generate sufficient data. We show this approach to be particularly adapted at accessing this otherwise difficult taxonomic and genetic resource. As a highly scalable, lower cost and less labour-intensive alternative to targeted sequence capture approaches, we propose the aMPlex Torrent strategy to be a useful tool for the genetic analysis of multiple degraded samples in studies involving ecology, archaeology, conservation and evolutionary biology. © 2016 John Wiley & Sons Ltd.
Acylation-dependent protein export in Leishmania.
Denny, P W; Gokool, S; Russell, D G; Field, M C; Smith, D F
2000-04-14
The surface of the protozoan parasite Leishmania is unusual in that it consists predominantly of glycosylphosphatidylinositol-anchored glycoconjugates and proteins. Additionally, a family of hydrophilic acylated surface proteins (HASPs) has been localized to the extracellular face of the plasma membrane in infective parasite stages. These surface polypeptides lack a recognizable endoplasmic reticulum secretory signal sequence, transmembrane spanning domain, or glycosylphosphatidylinositol-anchor consensus sequence, indicating that novel mechanisms are involved in their transport and localization. Here, we show that the N-terminal domain of HASPB contains primary structural information that directs both N-myristoylation and palmitoylation and is essential for correct localization of the protein to the plasma membrane. Furthermore, the N-terminal 18 amino acids of HASPB, encoding the dual acylation site, are sufficient to target the heterologous Aequorea victoria green fluorescent protein to the cell surface of Leishmania. Mutagenesis of the predicted acylated residues confirms that modification by both myristate and palmitate is required for correct trafficking. These data suggest that HASPB is a representative of a novel class of proteins whose translocation onto the surface of eukaryotic cells is dependent upon a "non-classical" pathway involving N-myristoylation/palmitoylation. Significantly, HASPB is also translocated on to the extracellular face of the plasma membrane of transfected mammalian cells, indicating that the export signal for HASPB is recognized by a higher eukaryotic export mechanism.
Seinen, Erwin; Burgerhof, Johannes G. M.; Jansen, Ritsert C.; Sibon, Ody C. M.
2010-01-01
Background RNAi technology is widely used to downregulate specific gene products. Investigating the phenotype induced by downregulation of gene products provides essential information about the function of the specific gene of interest. When RNAi is applied in Drosophila melanogaster or Caenorhabditis elegans, often large dsRNAs are used. One of the drawbacks of RNAi technology is that unwanted gene products with sequence similarity to the gene of interest can be downregulated too. To verify the outcome of an RNAi experiment and to avoid these unwanted off-target effects, an additional non-overlapping dsRNA can be used to downregulate the same gene. However, it has never been tested whether this approach is sufficient to reduce the risk of off-targets. Methodology We created a novel tool to analyse the occurrence of off-target effects in Drosophila and we analyzed 99 randomly chosen genes. Principal Findings Here we show that nearly all genes contain non-overlapping internal sequences that do show overlap in a common off-target gene. Conclusion Based on our in silico findings, off-target effects should not be ignored and our presented on-line tool enables the identification of two RNA interference constructs, free of overlapping off-targets, from any gene of interest. PMID:20957038
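The check for a common off-target shared by two non-overlapping constructs can be sketched as a k-mer intersection. This is an illustration only, not the authors' on-line tool: the abstract does not state the match length, so `k` is an assumption (a tiny value is used for the toy data), reverse complements are ignored, and all sequences and gene names are invented.

```python
def kmers(sequence, k):
    """Set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def shared_off_targets(dsrna_a, dsrna_b, other_transcripts, k):
    """Names of non-target transcripts that share at least one k-mer with
    BOTH constructs, i.e. off-targets the second construct cannot rule out."""
    kmers_a, kmers_b = kmers(dsrna_a, k), kmers(dsrna_b, k)
    shared = []
    for name, sequence in other_transcripts.items():
        transcript_kmers = kmers(sequence, k)
        if kmers_a & transcript_kmers and kmers_b & transcript_kmers:
            shared.append(name)
    return sorted(shared)


# Toy data with a deliberately short k so the example stays readable.
construct_1 = "AAACCCGGG"
construct_2 = "TTTGGGCCA"
transcripts = {
    "geneX": "GACCCGTTTTGGGCA",  # matches both constructs
    "geneY": "ACCCGAAAAAAAA",    # matches only construct_1
}
print(shared_off_targets(construct_1, construct_2, transcripts, k=4))
```

A pair of constructs is only useful as a mutual control when this list is empty, which is the property the abstract says the tool is designed to guarantee.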
Phylogenetic mixtures and linear invariants for equal input models.
Casanellas, Marta; Steel, Mike
2017-04-01
The reconstruction of phylogenetic trees from molecular sequence data relies on modelling site substitutions by a Markov process, or a mixture of such processes. In general, allowing mixed processes can result in different tree topologies becoming indistinguishable from the data, even for infinitely long sequences. However, when the underlying Markov process supports linear phylogenetic invariants, then provided these are sufficiently informative, the identifiability of the tree topology can be restored. In this paper, we investigate a class of processes that support linear invariants once the stationary distribution is fixed, the 'equal input model'. This model generalizes the 'Felsenstein 1981' model (and thereby the Jukes-Cantor model) from four states to an arbitrary number of states (finite or infinite), and it can also be described by a 'random cluster' process. We describe the structure and dimension of the vector spaces of phylogenetic mixtures and of linear invariants for any fixed phylogenetic tree (and for all trees-the so called 'model invariants'), on any number n of leaves. We also provide a precise description of the space of mixtures and linear invariants for the special case of [Formula: see text] leaves. By combining techniques from discrete random processes and (multi-) linear algebra, our results build on a classic result that was first established by James Lake (Mol Biol Evol 4:167-191, 1987).
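For readers unfamiliar with the equal input model, its transition probabilities have the same closed form as in the Felsenstein 1981 model, extended to any finite number of states: a site stays unchanged with probability e^(-rate·t) and is otherwise redrawn from the stationary distribution. The sketch below assumes that standard form; the function name and the `rate` parameter are labels chosen here, not notation from the paper.

```python
import math


def equal_input_transition_matrix(pi, t, rate=1.0):
    """P[i][j] = pi[j] * (1 - exp(-rate * t)) + (exp(-rate * t) if i == j else 0)."""
    if abs(sum(pi) - 1.0) > 1e-9 or min(pi) < 0:
        raise ValueError("pi must be a probability distribution")
    stay = math.exp(-rate * t)
    n = len(pi)
    return [
        [pi[j] * (1.0 - stay) + (stay if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


# Four states with unequal frequencies (Felsenstein 1981 case).
f81 = equal_input_transition_matrix([0.1, 0.2, 0.3, 0.4], t=0.5)
# Uniform frequencies recover the Jukes-Cantor special case.
jc = equal_input_transition_matrix([0.25] * 4, t=0.5)
for row in f81:
    print([round(p, 4) for p in row])
```

Note that every row differs from the others only on the diagonal, which reflects the defining property that the destination state does not depend on the state being replaced.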
Kwarciak, Kamil; Radom, Marcin; Formanowicz, Piotr
2016-04-01
Classical sequencing by hybridization takes into account only binary information about sequence composition: a given element from an oligonucleotide library either is or is not a part of the target sequence. However, DNA chip technology has advanced and now makes it possible to obtain partial information about the multiplicity of each oligonucleotide that the analyzed sequence consists of. Currently, it is not possible to obtain exact data of this type, but even partial information should be very useful. Two realistic multiplicity information models are taken into consideration in this paper. The first one, called "one and many", assumes that it is possible to determine whether a given oligonucleotide occurs in a reconstructed sequence once or more than once. According to the second model, called "one, two and many", a biochemical experiment can reveal whether a given oligonucleotide is present in an analyzed sequence once, twice, or at least three times. An ant colony optimization algorithm has been implemented to verify the above models and to compare with existing algorithms for sequencing by hybridization which utilize the additional information. The proposed algorithm solves the problem with any kind of hybridization errors. Computational experiment results confirm that using even partial information about multiplicity leads to increased quality of reconstructed sequences. Moreover, they also show that the more precise model makes it possible to obtain better solutions and that the ant colony optimization algorithm outperforms the existing ones. Test data sets and the proposed ant colony optimization algorithm are available at: http://bioserver.cs.put.poznan.pl/download/ACO4mSBH.zip. Copyright © 2016 Elsevier Ltd. All rights reserved.
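The two multiplicity models can be made concrete by computing an idealized, error-free spectrum of a known sequence and capping the counts. The sketch below only illustrates the form of the input data under each model; it is not the ant colony optimization algorithm, and the function name and toy sequence are assumptions.

```python
from collections import Counter

# Largest distinguishable count under each model: counts at or above the cap
# are reported as "many".
MODEL_CAPS = {"binary": 1, "one and many": 2, "one, two and many": 3}


def spectrum(sequence, k, model="binary"):
    """Map each k-mer of the sequence to its (capped) multiplicity."""
    cap = MODEL_CAPS[model]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return {oligo: min(count, cap) for oligo, count in counts.items()}


target = "ACGACGACGT"  # ACG occurs 3 times, CGA and GAC twice, CGT once
for model in MODEL_CAPS:
    print(model, spectrum(target, 3, model))
```

In the toy sequence the classical binary spectrum cannot tell the repeated oligonucleotides from the unique one, "one and many" separates CGT from the rest, and "one, two and many" additionally separates ACG from CGA and GAC.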
Visual perception of fatigued lifting actions.
Fischer, Steven L; Albert, Wayne J; McGarry, Tim
2012-12-01
Fatigue-related changes in lifting kinematics may expose workers to undue injury risks. Early detection of accumulating fatigue offers the prospect of intervention strategies to mitigate such fatigue-related risks. In a first step towards this objective, this study investigated whether fatigue detection was accessible to visual perception and, if so, what was the key visual information required for successful fatigue discrimination. Eighteen participants were tasked with identifying fatigued lifts when viewing 24 trials presented using both video and point-light representations. Each trial comprised a pair of lifting actions containing a fresh and a fatigued lift from the same individual presented in counter-balanced sequence. Confidence intervals demonstrated that the frequency of correct responses for both sexes exceeded chance expectations (50%) for both video (68%±12%) and point-light representations (67%±10%), demonstrating that fatigued lifting kinematics are open to visual perception. There were no significant differences between sexes or viewing condition, the latter result indicating kinematic dynamics as providing sufficient information for successful fatigue discrimination. Moreover, results from single viewer investigation reported fatigue detection (75%) from point-light information describing only the kinematics of the box lifted. These preliminary findings may have important workplace applications if fatigue discrimination rates can be improved upon through future research. Copyright © 2012 Elsevier B.V. All rights reserved.
Temporal Precision of Neuronal Information in a Rapid Perceptual Judgment
Ghose, Geoffrey M.; Harrison, Ian T.
2009-01-01
In many situations, such as pedestrians crossing a busy street or prey evading predators, rapid decisions based on limited perceptual information are critical for survival. The brevity of these perceptual judgments constrains how neuronal signals are integrated or pooled over time because the underlying sequence of processes, from sensation to perceptual evaluation to motor planning and execution, all occur within several hundred milliseconds. Because most previous physiological studies of these processes have relied on tasks requiring considerably longer temporal integration, the neuronal basis of such rapid decisions remains largely unexplored. In this study, we examine the temporal precision of neuronal activity associated with a rapid perceptual judgment. We find that the activity of individual neurons over tens of milliseconds can reliably convey information about sensory events and was well correlated with the animals' judgments. There was a strong correlation between sensory reliability and the correlation with behavioral choice, suggesting that rapid decisions were preferentially based on the most reliable sensory signals. We also find that a simple model in which the responses of a small number of individual neurons (<5) are summed can completely explain behavioral performance. These results suggest that neuronal circuits are sufficiently precise to allow for cognitive decisions to be based on small numbers of action potentials from highly reliable neurons. PMID:19109454
Spontaneous emergence of autocatalytic information-coding polymers
NASA Astrophysics Data System (ADS)
Tkachenko, Alexei; Maslov, Sergei
2015-03-01
Self-replicating systems based on information-coding polymers are of crucial importance in biology. They also recently emerged as a paradigm in design on nano- and micro-scales. We present a general theoretical and numerical analysis of the problem of spontaneous emergence of autocatalysis for heteropolymers capable of template-assisted ligation driven by cyclic changes in the environment. Our central result is the existence of the first order transition between the regime dominated by free monomers and that with a self-sustaining population of sufficiently long oligomers. We provide a simple mathematically tractable model that predicts the parameters for the onset of autocatalysis and the distribution of chain lengths, in terms of monomer concentration, and two fundamental rate constants. Another key result is the emergence of the kinetically-limited optimal overlap length between a template and its two substrates. Template-assisted ligation allows for heritable transmission of information encoded in oligomer sequences thus opening up the possibility of long-term memory and evolvability of such systems. Research was carried out in part at the Center for Functional Nanomaterials at Brookhaven National Laboratory, which is supported by the U.S. Department of Energy, Office of Basic Energy Sciences, under Contract No. DE-AC02-98CH10886. Work at Biosciences Department was supported by US Department of Energy Office of Biological Research Grant PM-031.
Temporal precision of neuronal information in a rapid perceptual judgment.
Ghose, Geoffrey M; Harrison, Ian T
2009-03-01
In many situations, such as pedestrians crossing a busy street or prey evading predators, rapid decisions based on limited perceptual information are critical for survival. The brevity of these perceptual judgments constrains how neuronal signals are integrated or pooled over time because the underlying sequence of processes, from sensation to perceptual evaluation to motor planning and execution, all occur within several hundred milliseconds. Because most previous physiological studies of these processes have relied on tasks requiring considerably longer temporal integration, the neuronal basis of such rapid decisions remains largely unexplored. In this study, we examine the temporal precision of neuronal activity associated with a rapid perceptual judgment. We find that the activity of individual neurons over tens of milliseconds can reliably convey information about sensory events and was well correlated with the animals' judgments. There was a strong correlation between sensory reliability and the correlation with behavioral choice, suggesting that rapid decisions were preferentially based on the most reliable sensory signals. We also find that a simple model in which the responses of a small number of individual neurons (<5) are summed can completely explain behavioral performance. These results suggest that neuronal circuits are sufficiently precise to allow for cognitive decisions to be based on small numbers of action potentials from highly reliable neurons.
Fields, Chris
2011-01-01
The perception of persisting visual objects is mediated by transient intermediate representations, object files, that are instantiated in response to some, but not all, visual trajectories. The standard object file concept does not, however, provide a mechanism sufficient to account for all experimental data on visual object persistence, object tracking, and the ability to perceive spatially disconnected stimuli as continuously existing objects. Based on relevant anatomical, functional, and developmental data, a functional model is constructed that bases visual object individuation on the recognition of temporal sequences of apparent center-of-mass positions that are specifically identified as trajectories by dedicated “trajectory recognition networks” downstream of the medial–temporal motion-detection area. This model is shown to account for a wide range of data, and to generate a variety of testable predictions. Individual differences in the recognition, abstraction, and encoding of trajectory information are expected to generate distinct object persistence judgments and object recognition abilities. Dominance of trajectory information over feature information in stored object tokens during early infancy, in particular, is expected to disrupt the ability to re-identify human and other individuals across perceptual episodes, and lead to developmental outcomes with characteristics of autism spectrum disorders. PMID:21716599
Atrx promotes heterochromatin formation at retrotransposons
Sadic, Dennis; Schmidt, Katharina; Groh, Sophia; Kondofersky, Ivan; Ellwart, Joachim; Fuchs, Christiane; Theis, Fabian J; Schotta, Gunnar
2015-01-01
More than 50% of mammalian genomes consist of retrotransposon sequences. Silencing of retrotransposons by heterochromatin is essential to ensure genomic stability and transcriptional integrity. Here, we identified a short sequence element in intracisternal A particle (IAP) retrotransposons that is sufficient to trigger heterochromatin formation. We used this sequence in a genome-wide shRNA screen and identified the chromatin remodeler Atrx as a novel regulator of IAP silencing. Atrx binds to IAP elements and is necessary for efficient heterochromatin formation. In addition, Atrx facilitates a robust and largely inaccessible heterochromatin structure as Atrx knockout cells display increased chromatin accessibility at retrotransposons and non-repetitive heterochromatic loci. In summary, we demonstrate a direct role of Atrx in the establishment and robust maintenance of heterochromatin. PMID:26012739
Pardo, Belén G; Fernández, Carlos; Millán, Adrián; Bouza, Carmen; Vázquez-López, Araceli; Vera, Manuel; Alvarez-Dios, José A; Calaza, Manuel; Gómez-Tato, Antonio; Vázquez, María; Cabaleiro, Santiago; Magariños, Beatriz; Lemos, Manuel L; Leiro, José M; Martínez, Paulino
2008-01-01
Background: The turbot (Scophthalmus maximus; Scophthalmidae; Pleuronectiformes) is a flatfish species of great relevance for marine aquaculture in Europe. In contrast to other cultured flatfish, very few genomic resources are available in this species. Aeromonas salmonicida and Philasterides dicentrarchi are two pathogens that affect turbot culture causing serious economic losses to the turbot industry. Little is known about the molecular mechanisms for disease resistance and host-pathogen interactions in this species. In this work, thousands of ESTs for functional genomic studies and potential markers linked to ESTs for mapping (microsatellites and single nucleotide polymorphisms (SNPs)) are provided. This information enabled us to obtain a preliminary view of regulated genes in response to these pathogens and it constitutes the basis for subsequent and more accurate microarray analysis. Results: A total of 12584 cDNAs partially sequenced from three different cDNA libraries of turbot (Scophthalmus maximus) infected with Aeromonas salmonicida, Philasterides dicentrarchi and from healthy fish were analyzed. Three immune-relevant tissues (liver, spleen and head kidney) were sampled at several time points in the infection process for library construction. The sequences were processed into 9256 high-quality sequences, which constituted the source for the turbot EST database. Clustering and assembly of these sequences revealed 3482 different putative transcripts: 1073 contigs and 2409 singletons. BLAST searches with public databases detected significant similarity (e-value ≤ 1e-5) in 1766 (50.7%) sequences and 816 of them (23.4%) could be functionally annotated. Two hundred three of these genes (24.9%), encoding for defence/immune-related proteins, were mostly identified for the first time in turbot. Some ESTs showed significant differences in the number of transcripts when comparing the three libraries, suggesting regulation in response to these pathogens. A total of 191 microsatellites, with 104 having sufficient flanking sequences for primer design, and 1158 putative SNPs were identified from these EST resources in turbot. Conclusion: A collection of 9256 high-quality ESTs was generated representing 3482 unique turbot sequences. A large proportion of defence/immune-related genes were identified, many of them regulated in response to specific pathogens. Putative microsatellites and SNPs were identified. These genome resources constitute the basis to develop a microarray for functional genomics studies and marker validation for genetic linkage and QTL analysis in turbot. PMID:18817567
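The counts and percentages quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the reported figures; the variable names and the `percent` helper are illustrative and do not come from the paper.

```python
# Counts quoted in the abstract above.
contigs = 1073
singletons = 2409
putative_transcripts = contigs + singletons  # reported as 3482

blast_hits = 1766      # sequences with significant BLAST similarity
annotated = 816        # sequences that could be functionally annotated
immune_related = 203   # annotated genes encoding defence/immune proteins


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# The first two percentages are relative to all putative transcripts;
# the third is relative to the functionally annotated subset.
blast_pct = percent(blast_hits, putative_transcripts)
annotated_pct = percent(annotated, putative_transcripts)
immune_pct = percent(immune_related, annotated)

print(putative_transcripts, blast_pct, annotated_pct, immune_pct)
```

The check makes the denominators explicit: 50.7% and 23.4% refer to the 3482 putative transcripts, whereas 24.9% refers to the 816 annotated sequences.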
Solis, Armando D
2015-12-01
To reduce complexity, understand generalized rules of protein folding, and facilitate de novo protein design, the 20-letter amino acid alphabet is commonly reduced to a smaller alphabet by clustering amino acids based on some measure of similarity. In this work, we seek the optimal alphabet that preserves as much of the structural information found in long-range (contact) interactions among amino acids in natively-folded proteins. We employ the Information Maximization Device, based on information theory, to partition the amino acids into well-defined clusters. Numbering from 2 to 19 groups, these optimal clusters of amino acids, while generated automatically, embody well-known properties of amino acids such as hydrophobicity/polarity, charge, size, and aromaticity, and are demonstrated to maintain the discriminative power of long-range interactions with minimal loss of mutual information. Our measurements suggest that reduced alphabets (of less than 10) are able to capture virtually all of the information residing in native contacts and may be sufficient for fold recognition, as demonstrated by extensive threading tests. In an expansive survey of the literature, we observe that alphabets derived from various approaches-including those derived from physicochemical intuition, local structure considerations, and sequence alignments of remote homologs-fare consistently well in preserving contact interaction information, highlighting a convergence in the various factors thought to be relevant to the folding code. Moreover, we find that alphabets commonly used in experimental protein design are nearly optimal and are largely coherent with observations that have arisen in this work. © 2015 Wiley Periodicals, Inc.
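As a rough illustration of what reducing the amino acid alphabet means in practice, the sketch below maps a protein sequence onto a small set of group labels. The six groups used here are a hypothetical partition by broad physicochemical class, chosen only for illustration; they are not the optimal clusters derived in the work summarized above, and the names `GROUPS`, `LOOKUP` and `reduce_sequence` are assumptions.

```python
# Hypothetical 6-letter reduced alphabet (illustrative only; NOT the
# clusters reported in the abstract). Each label covers several residues.
GROUPS = {
    "h": "AVLIMC",  # aliphatic / hydrophobic
    "a": "FWYH",    # aromatic
    "p": "STNQ",    # polar, uncharged
    "+": "KR",      # positively charged
    "-": "DE",      # negatively charged
    "s": "GP",      # small / special backbone geometry
}

# Invert the grouping into a residue -> label lookup table.
LOOKUP = {aa: label for label, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Rewrite a one-letter protein sequence in the reduced alphabet.

    Residues outside the 20 standard amino acids are mapped to 'x'.
    """
    return "".join(LOOKUP.get(aa, "x") for aa in seq.upper())


print(reduce_sequence("MKTAYIAKQR"))
```

Any partition of the 20 residues can be swapped in by editing `GROUPS`; comparing how much contact information different partitions retain is the question the abstract addresses.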
Moser, Aline; Wüthrich, Daniel; Bruggmann, Rémy; Eugster-Meier, Elisabeth; Meile, Leo; Irmler, Stefan
2017-01-01
The advent of massive parallel sequencing technologies has opened up possibilities for the study of the bacterial diversity of ecosystems without the need for enrichment or single strain isolation. By exploiting 78 genome data-sets from Lactobacillus helveticus strains, we found that the slpH locus that encodes a putative surface layer protein displays sufficient genetic heterogeneity to be a suitable target for strain typing. Based on high-throughput slpH gene sequencing and the detection of single-base DNA sequence variations, we established a culture-independent method to assess the biodiversity of the L. helveticus strains present in fermented dairy food. When we applied the method to study the L. helveticus strain composition in 15 natural whey cultures (NWCs) collected at different production facilities for Gruyère, a protected designation of origin (PDO) cheese, we detected a total of 10 sequence types (STs). In addition, we monitored the development of a three-strain mix in raclette cheese for 17 weeks. PMID:28775722
Nanoliter reactors improve multiple displacement amplification of genomes from single cells.
Marcy, Yann; Ishoey, Thomas; Lasken, Roger S; Stockwell, Timothy B; Walenz, Brian P; Halpern, Aaron L; Beeson, Karen Y; Goldberg, Susanne M D; Quake, Stephen R
2007-09-01
Since only a small fraction of environmental bacteria are amenable to laboratory culture, there is great interest in genomic sequencing directly from single cells. Sufficient DNA for sequencing can be obtained from one cell by the Multiple Displacement Amplification (MDA) method, thereby eliminating the need to develop culture methods. Here we used a microfluidic device to isolate individual Escherichia coli and amplify genomic DNA by MDA in 60-nl reactions. Our results confirm a report that reduced MDA reaction volume lowers nonspecific synthesis that can result from contaminant DNA templates and unfavourable interaction between primers. The quality of the genome amplification was assessed by qPCR and compared favourably to single-cell amplifications performed in standard 50-µl volumes. Amplification bias was greatly reduced in nanoliter volumes, thereby providing a more even representation of all sequences. Single-cell amplicons from both microliter and nanoliter volumes provided high-quality sequence data by high-throughput pyrosequencing, thereby demonstrating a straightforward route to sequencing genomes from single cells.
Yelina, Nataliya E; Lambing, Christophe; Hardcastle, Thomas J; Zhao, Xiaohui; Santos, Bruno; Henderson, Ian R
2015-10-15
During meiosis, homologous chromosomes undergo crossover recombination, which is typically concentrated in narrow hot spots that are controlled by genetic and epigenetic information. Arabidopsis chromosomes are highly DNA methylated in the repetitive centromeres, which are also crossover-suppressed. Here we demonstrate that RNA-directed DNA methylation is sufficient to locally silence Arabidopsis euchromatic crossover hot spots and is associated with increased nucleosome density and H3K9me2. However, loss of CG DNA methylation maintenance in met1 triggers epigenetic crossover remodeling at the chromosome scale, with pericentromeric decreases and euchromatic increases in recombination. We used recombination mutants that alter interfering and noninterfering crossover repair pathways (fancm and zip4) to demonstrate that remodeling primarily involves redistribution of interfering crossovers. Using whole-genome bisulfite sequencing, we show that crossover remodeling is driven by loss of CG methylation within the centromeric regions. Using cytogenetics, we profiled meiotic DNA double-strand break (DSB) foci in met1 and found them unchanged relative to wild type. We propose that met1 chromosome structure is altered, causing centromere-proximal DSBs to be inhibited from maturation into interfering crossovers. These data demonstrate that DNA methylation is sufficient to silence crossover hot spots and plays a key role in establishing domains of meiotic recombination along chromosomes. © 2015 Yelina et al.; Published by Cold Spring Harbor Laboratory Press.
Yilmaz, Pelin; Kottmann, Renzo; Field, Dawn; Knight, Rob; Cole, James R; Amaral-Zettler, Linda; Gilbert, Jack A; Karsch-Mizrachi, Ilene; Johnston, Anjanette; Cochrane, Guy; Vaughan, Robert; Hunter, Christopher; Park, Joonhong; Morrison, Norman; Rocca-Serra, Philippe; Sterk, Peter; Arumugam, Manimozhiyan; Bailey, Mark; Baumgartner, Laura; Birren, Bruce W; Blaser, Martin J; Bonazzi, Vivien; Booth, Tim; Bork, Peer; Bushman, Frederic D; Buttigieg, Pier Luigi; Chain, Patrick S G; Charlson, Emily; Costello, Elizabeth K; Huot-Creasy, Heather; Dawyndt, Peter; DeSantis, Todd; Fierer, Noah; Fuhrman, Jed A; Gallery, Rachel E; Gevers, Dirk; Gibbs, Richard A; Gil, Inigo San; Gonzalez, Antonio; Gordon, Jeffrey I; Guralnick, Robert; Hankeln, Wolfgang; Highlander, Sarah; Hugenholtz, Philip; Jansson, Janet; Kau, Andrew L; Kelley, Scott T; Kennedy, Jerry; Knights, Dan; Koren, Omry; Kuczynski, Justin; Kyrpides, Nikos; Larsen, Robert; Lauber, Christian L; Legg, Teresa; Ley, Ruth E; Lozupone, Catherine A; Ludwig, Wolfgang; Lyons, Donna; Maguire, Eamonn; Methé, Barbara A; Meyer, Folker; Muegge, Brian; Nakielny, Sara; Nelson, Karen E; Nemergut, Diana; Neufeld, Josh D; Newbold, Lindsay K; Oliver, Anna E; Pace, Norman R; Palanisamy, Giriprakash; Peplies, Jörg; Petrosino, Joseph; Proctor, Lita; Pruesse, Elmar; Quast, Christian; Raes, Jeroen; Ratnasingham, Sujeevan; Ravel, Jacques; Relman, David A; Assunta-Sansone, Susanna; Schloss, Patrick D; Schriml, Lynn; Sinha, Rohini; Smith, Michelle I; Sodergren, Erica; Spor, Aymé; Stombaugh, Jesse; Tiedje, James M; Ward, Doyle V; Weinstock, George M; Wendel, Doug; White, Owen; Whiteley, Andrew; Wilke, Andreas; Wortman, Jennifer R; Yatsunenko, Tanya; Glöckner, Frank Oliver
2012-01-01
Here we present a standard developed by the Genomic Standards Consortium (GSC) for reporting marker gene sequences—the minimum information about a marker gene sequence (MIMARKS). We also introduce a system for describing the environment from which a biological sample originates. The ‘environmental packages’ apply to any genome sequence of known origin and can be used in combination with MIMARKS and other GSC checklists. Finally, to establish a unified standard for describing sequence data and to provide a single point of entry for the scientific community to access and learn about GSC checklists, we present the minimum information about any (x) sequence (MIxS). Adoption of MIxS will enhance our ability to analyze natural genetic diversity documented by massive DNA sequencing efforts from myriad ecosystems in our ever-changing biosphere. PMID:21552244
Deroost, Natacha; Coomans, Daphné
2018-02-01
We examined the role of sequence awareness in a pure perceptual sequence learning design. Participants had to react to the target's colour that changed according to a perceptual sequence. By varying the mapping of the target's colour onto the response keys, motor responses changed randomly. The effect of sequence awareness on perceptual sequence learning was determined by manipulating the learning instructions (explicit versus implicit) and assessing the amount of sequence awareness after the experiment. In the explicit instruction condition (n = 15), participants were instructed to intentionally search for the colour sequence, whereas in the implicit instruction condition (n = 15), they were left uninformed about the sequenced nature of the task. Sequence awareness after the sequence learning task was tested by means of a questionnaire and the process-dissociation-procedure. The results showed that the instruction manipulation had no effect on the amount of perceptual sequence learning. Based on their report to have actively applied their sequence knowledge during the experiment, participants were subsequently regrouped in a sequence strategy group (n = 14, of which 4 participants from the implicit instruction condition and 10 participants from the explicit instruction condition) and a no-sequence strategy group (n = 16, of which 11 participants from the implicit instruction condition and 5 participants from the explicit instruction condition). Only participants of the sequence strategy group showed reliable perceptual sequence learning and sequence awareness. These results indicate that perceptual sequence learning depends upon the continuous employment of strategic cognitive control processes on sequence knowledge. Sequence awareness is suggested to be a necessary but not sufficient condition for perceptual learning to take place. Copyright © 2018 Elsevier B.V. All rights reserved.
Phase 2 of the array automated assembly task for the low cost solar array project
NASA Technical Reports Server (NTRS)
Campbell, R. B.; Davis, J. R.; Ostroski, J. W.; Rai-Choudhury, P.; Rohatgi, A.; Seman, E. J.; Stapleton, R. E.
1979-01-01
The process sequence for the fabrication of dendritic web silicon into solar panels was modified to include aluminum back surface field formation. Plasma etching was found to be a feasible technique for pre-diffusion cleaning of the web. Several contacting systems were studied. The total plated Pd-Ni system was not compatible with the process sequence; however, the evaporated TiPd-electroplated Cu system was shown to be stable under life testing. Ultrasonic bonding parameters were determined for various interconnect and contact metals, but the yield of the process was not sufficiently high to use for module fabrication at this time. Over 400 solar cells were fabricated according to the modified sequence. No sub-process incompatibility was seen. These cells were used to fabricate four demonstration modules. A cost analysis of the modified process sequence resulted in a selling price of $0.75/peak watt.
On Statistical Modeling of Sequencing Noise in High Depth Data to Assess Tumor Evolution
NASA Astrophysics Data System (ADS)
Rabadan, Raul; Bhanot, Gyan; Marsilio, Sonia; Chiorazzi, Nicholas; Pasqualucci, Laura; Khiabanian, Hossein
2018-07-01
One cause of cancer mortality is tumor evolution to therapy-resistant disease. First line therapy often targets the dominant clone, and drug resistance can emerge from preexisting clones that gain fitness through therapy-induced natural selection. Such mutations may be identified using targeted sequencing assays by analysis of noise in high-depth data. Here, we develop a comprehensive, unbiased model for sequencing error background. We find that noise in sufficiently deep DNA sequencing data can be approximated by aggregating negative binomial distributions. Mutations with frequencies above noise may have prognostic value. We evaluate our model with simulated exponentially expanded populations as well as data from cell line and patient sample dilution experiments, demonstrating its utility in prognosticating tumor progression. Our results may have the potential to identify significant mutations that can cause recurrence. These results are relevant in the pretreatment clinical setting to determine appropriate therapy and prepare for potential recurrence pretreatment.
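To make the idea of calling mutations "above noise" concrete, the sketch below scores an observed count of variant-supporting reads against a single negative binomial background and flags it when the upper-tail probability is small. This is a simplified illustration, not the authors' model: the abstract describes aggregating several negative binomial distributions, whereas this uses one, and the parameters `r`, `p`, the threshold `alpha` and all function names are hypothetical.

```python
import math


def nb_log_pmf(k, r, p):
    """Log-probability of k error reads under a negative binomial NB(r, p).

    Parameterization: P(X = k) = Gamma(k + r) / (k! Gamma(r)) * p**r * (1 - p)**k,
    so the mean background count is r * (1 - p) / p.
    """
    return (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
            + r * math.log(p) + k * math.log1p(-p))


def nb_upper_tail(k, r, p):
    """P(X >= k): chance that background noise alone yields k or more reads."""
    below = sum(math.exp(nb_log_pmf(i, r, p)) for i in range(k))
    return max(0.0, 1.0 - below)


def is_above_noise(alt_reads, r, p, alpha=1e-3):
    """Flag a variant whose read count is unlikely under the background model."""
    return nb_upper_tail(alt_reads, r, p) < alpha


# Hypothetical background fitted elsewhere: mean of 3 error reads per site.
R_BACKGROUND, P_BACKGROUND = 2.0, 0.4
print(is_above_noise(25, R_BACKGROUND, P_BACKGROUND))  # far above the mean
print(is_above_noise(3, R_BACKGROUND, P_BACKGROUND))   # at the mean
```

In a real assay the background parameters would be estimated per site or per substitution type from control data, and the threshold would need correction for the number of positions tested.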
Design of nucleic acid strands with long low-barrier folding pathways.
Condon, Anne; Kirkpatrick, Bonnie; Maňuch, Ján
2017-01-01
A major goal of natural computing is to design biomolecules, such as nucleic acid sequences, that can be used to perform computations. We design sequences of nucleic acids that are "guaranteed" to have long folding pathways relative to their length. These particular sequences with high probability follow low-barrier folding pathways that visit a large number of distinct structures. Long folding pathways are interesting, because they demonstrate that natural computing can potentially support long and complex computations. Formally, we provide the first scalable designs of molecules whose low-barrier folding pathways, with respect to a simple, stacked pair energy model, grow superlinearly with the molecule length, but for which all significantly shorter alternative folding pathways have an energy barrier that is [Formula: see text] times that of the low-barrier pathway for any [Formula: see text] and a sufficiently long sequence.
On Statistical Modeling of Sequencing Noise in High Depth Data to Assess Tumor Evolution
NASA Astrophysics Data System (ADS)
Rabadan, Raul; Bhanot, Gyan; Marsilio, Sonia; Chiorazzi, Nicholas; Pasqualucci, Laura; Khiabanian, Hossein
2017-12-01
One cause of cancer mortality is tumor evolution to therapy-resistant disease. First line therapy often targets the dominant clone, and drug resistance can emerge from preexisting clones that gain fitness through therapy-induced natural selection. Such mutations may be identified using targeted sequencing assays by analysis of noise in high-depth data. Here, we develop a comprehensive, unbiased model for sequencing error background. We find that noise in sufficiently deep DNA sequencing data can be approximated by aggregating negative binomial distributions. Mutations with frequencies above noise may have prognostic value. We evaluate our model with simulated exponentially expanded populations as well as data from cell line and patient sample dilution experiments, demonstrating its utility in prognosticating tumor progression. Our results may have the potential to identify significant mutations that can cause recurrence. These results are relevant in the pretreatment clinical setting to determine appropriate therapy and prepare for potential recurrence pretreatment.
Bradshaw, Charles Richard; Surendranath, Vineeth; Henschel, Robert; Mueller, Matthias Stefan; Habermann, Bianca Hermine
2011-03-10
Conserved domains in proteins are one of the major sources of functional information for experimental design and genome-level annotation. Though search tools for conserved domain databases such as Hidden Markov Models (HMMs) are sensitive in detecting conserved domains in proteins when they share sufficient sequence similarity, they tend to miss more divergent family members, as they lack a reliable statistical framework for the detection of low sequence similarity. We have developed a greatly improved HMMerThread algorithm that can detect remotely conserved domains in highly divergent sequences. HMMerThread combines relaxed conserved domain searches with fold recognition to eliminate false positive, sequence-based identifications. With an accuracy of 90%, our software is able to automatically predict highly divergent members of conserved domain families with an associated 3-dimensional structure. We give additional confidence to our predictions by validation across species. We have run HMMerThread searches on eight proteomes including human and present a rich resource of remotely conserved domains, which adds significantly to the functional annotation of entire proteomes. We find ∼4500 cross-species validated, remotely conserved domain predictions in the human proteome alone. As an example, we find a DNA-binding domain in the C-terminal part of the A-kinase anchor protein 10 (AKAP10), a PKA adaptor that has been implicated in cardiac arrhythmias and premature cardiac death, which upon stress likely translocates from mitochondria to the nucleus/nucleolus. Based on our prediction, we propose that with this HLH-domain, AKAP10 is involved in the transcriptional control of stress response. Further remotely conserved domains we discuss are examples from areas such as sporulation, chromosome segregation and signalling during immune response. 
The HMMerThread algorithm is able to automatically detect the presence of remotely conserved domains in proteins based on weak sequence similarity. Our predictions open up new avenues for biological and medical studies. Genome-wide HMMerThread domains are available at http://vm1-hmmerthread.age.mpg.de.
Bradshaw, Charles Richard; Surendranath, Vineeth; Henschel, Robert; Mueller, Matthias Stefan; Habermann, Bianca Hermine
2011-01-01
Conserved domains in proteins are one of the major sources of functional information for experimental design and genome-level annotation. Though search tools for conserved domain databases such as Hidden Markov Models (HMMs) are sensitive in detecting conserved domains in proteins when they share sufficient sequence similarity, they tend to miss more divergent family members, as they lack a reliable statistical framework for the detection of low sequence similarity. We have developed a greatly improved HMMerThread algorithm that can detect remotely conserved domains in highly divergent sequences. HMMerThread combines relaxed conserved domain searches with fold recognition to eliminate false positive, sequence-based identifications. With an accuracy of 90%, our software is able to automatically predict highly divergent members of conserved domain families with an associated 3-dimensional structure. We give additional confidence to our predictions by validation across species. We have run HMMerThread searches on eight proteomes including human and present a rich resource of remotely conserved domains, which adds significantly to the functional annotation of entire proteomes. We find ∼4500 cross-species validated, remotely conserved domain predictions in the human proteome alone. As an example, we find a DNA-binding domain in the C-terminal part of the A-kinase anchor protein 10 (AKAP10), a PKA adaptor that has been implicated in cardiac arrhythmias and premature cardiac death, which upon stress likely translocates from mitochondria to the nucleus/nucleolus. Based on our prediction, we propose that with this HLH-domain, AKAP10 is involved in the transcriptional control of stress response. Further remotely conserved domains we discuss are examples from areas such as sporulation, chromosome segregation and signalling during immune response. 
The HMMerThread algorithm is able to automatically detect the presence of remotely conserved domains in proteins based on weak sequence similarity. Our predictions open up new avenues for biological and medical studies. Genome-wide HMMerThread domains are available at http://vm1-hmmerthread.age.mpg.de. PMID:21423752
Gries, Jasmin; Schumacher, Dirk; Arand, Julia; Lutsik, Pavlo; Markelova, Maria Rivera; Fichtner, Iduna; Walter, Jörn; Sers, Christine; Tierling, Sascha
2013-01-01
The use of next generation sequencing has expanded our view on whole mammalian methylome patterns. In particular, it provides a genome-wide insight of local DNA methylation diversity at single nucleotide level and enables the examination of single chromosome sequence sections at a sufficient statistical power. We describe a bisulfite-based sequence profiling pipeline, Bi-PROF, which is based on the 454 GS-FLX Titanium technology and yields up to one million sequence stretches at single base pair resolution without laborious subcloning. To illustrate the performance of the experimental workflow connected to a bioinformatics program pipeline (BiQ Analyzer HT) we present a test analysis set of 68 different epigenetic marker regions (amplicons) in five individual patient-derived xenograft tissue samples of colorectal cancer and one healthy colon epithelium sample as a control. After the 454 GS-FLX Titanium run, sequence read processing and sample decoding, the obtained alignments are quality controlled and statistically evaluated. Comprehensive methylation pattern interpretation (profiling) assessed by analyzing 10^2 to 10^4 sequence reads per amplicon allows an unprecedented deep view on pattern formation and methylation marker heterogeneity in tissues affected by complex diseases like cancer. PMID:23803588
Scalable whole-exome sequencing of cell-free DNA reveals high concordance with metastatic tumors.
Adalsteinsson, Viktor A; Ha, Gavin; Freeman, Samuel S; Choudhury, Atish D; Stover, Daniel G; Parsons, Heather A; Gydush, Gregory; Reed, Sarah C; Rotem, Denisse; Rhoades, Justin; Loginov, Denis; Livitz, Dimitri; Rosebrock, Daniel; Leshchiner, Ignaty; Kim, Jaegil; Stewart, Chip; Rosenberg, Mara; Francis, Joshua M; Zhang, Cheng-Zhong; Cohen, Ofir; Oh, Coyin; Ding, Huiming; Polak, Paz; Lloyd, Max; Mahmud, Sairah; Helvie, Karla; Merrill, Margaret S; Santiago, Rebecca A; O'Connor, Edward P; Jeong, Seong H; Leeson, Rachel; Barry, Rachel M; Kramkowski, Joseph F; Zhang, Zhenwei; Polacek, Laura; Lohr, Jens G; Schleicher, Molly; Lipscomb, Emily; Saltzman, Andrea; Oliver, Nelly M; Marini, Lori; Waks, Adrienne G; Harshman, Lauren C; Tolaney, Sara M; Van Allen, Eliezer M; Winer, Eric P; Lin, Nancy U; Nakabayashi, Mari; Taplin, Mary-Ellen; Johannessen, Cory M; Garraway, Levi A; Golub, Todd R; Boehm, Jesse S; Wagle, Nikhil; Getz, Gad; Love, J Christopher; Meyerson, Matthew
2017-11-06
Whole-exome sequencing of cell-free DNA (cfDNA) could enable comprehensive profiling of tumors from blood but the genome-wide concordance between cfDNA and tumor biopsies is uncertain. Here we report ichorCNA, software that quantifies tumor content in cfDNA from 0.1× coverage whole-genome sequencing data without prior knowledge of tumor mutations. We apply ichorCNA to 1439 blood samples from 520 patients with metastatic prostate or breast cancers. In the earliest tested sample for each patient, 34% of patients have ≥10% tumor-derived cfDNA, sufficient for standard coverage whole-exome sequencing. Using whole-exome sequencing, we validate the concordance of clonal somatic mutations (88%), copy number alterations (80%), mutational signatures, and neoantigens between cfDNA and matched tumor biopsies from 41 patients with ≥10% cfDNA tumor content. In summary, we provide methods to identify patients eligible for comprehensive cfDNA profiling, revealing its applicability to many patients, and demonstrate high concordance of cfDNA and metastatic tumor whole-exome sequencing.
Information capacity of nucleotide sequences and its applications.
Sadovsky, M G
2006-05-01
The information capacity of nucleotide sequences is defined through the specific entropy of frequency dictionary of a sequence determined with respect to another one containing the most probable continuations of shorter strings. This measure distinguishes a sequence both from a random one, and from ordered entity. A comparison of sequences based on their information capacity is studied. An order within the genetic entities is found at the length scale ranged from 3 to 8. Some other applications of the developed methodology to genetics, bioinformatics, and molecular biology are discussed.
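One way to read the definition in this abstract is as a relative entropy between the observed frequencies of k-mers and the frequencies expected from their most probable continuation of shorter strings. The sketch below works under that reading and rests on two assumptions that the abstract does not state: the expected frequency of a k-mer is taken as f(prefix) * f(suffix) / f(core), built from its (k-1)-mer prefix and suffix and their shared (k-2)-mer core, and the sequence is closed into a ring so that the dictionaries of different lengths are mutually consistent. The function names are illustrative.

```python
import math
from collections import Counter


def frequency_dictionary(seq, k):
    """Frequencies of all k-mers, reading the sequence as a ring (assumption)."""
    if k == 0:
        return {"": 1.0}
    n = len(seq)
    ring = seq + seq[:k - 1]  # wrap around so every position starts a k-mer
    counts = Counter(ring[i:i + k] for i in range(n))
    return {word: count / n for word, count in counts.items()}


def information_capacity(seq, k):
    """Relative entropy of the real k-mer dictionary against the one
    reconstructed from (k-1)-mers; requires k >= 2.
    """
    real = frequency_dictionary(seq, k)
    shorter = frequency_dictionary(seq, k - 1)
    core = frequency_dictionary(seq, k - 2)
    total = 0.0
    for word, f in real.items():
        # Expected frequency if the k-mer is the most probable continuation
        # of its overlapping (k-1)-mers.
        expected = shorter[word[:-1]] * shorter[word[1:]] / core[word[1:-1]]
        total += f * math.log(f / expected)
    return total


periodic = "ACGT" * 10
print(information_capacity(periodic, 2))
print(information_capacity(periodic, 3))
```

For the strictly periodic test sequence, pairs are far from what independent single letters would predict, so the value at k = 2 is large, while triples are fully determined by pairs, so the value at k = 3 vanishes. A value of zero at a given k therefore means the longer words carry no information beyond the shorter ones.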
Cui, Xuefeng; Lu, Zhiwu; Wang, Sheng; Jing-Yan Wang, Jim; Gao, Xin
2016-06-15
Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades on sequence alignment, threading and alignment-free methods, protein homology detection remains a challenging open problem. Recently, network methods that try to find transitive paths in the protein structure space demonstrate the importance of incorporating network information of the structure space. Yet, current methods merge the sequence space and the structure space into a single space, and thus introduce inconsistency in combining different sources of information. We present a novel network-based protein homology detection method, CMsearch, based on cross-modal learning. Instead of exploring a single network built from the mixture of sequence and structure space information, CMsearch builds two separate networks to represent the sequence space and the structure space. It then learns sequence-structure correlation by simultaneously taking sequence information, structure information, sequence space information and structure space information into consideration. We tested CMsearch on two challenging tasks, protein homology detection and protein structure prediction, by querying all 8332 PDB40 proteins. Our results demonstrate that CMsearch is insensitive to the similarity metrics used to define the sequence and the structure spaces. By using HMM-HMM alignment as the sequence similarity metric, CMsearch clearly outperforms state-of-the-art homology detection methods and the CASP-winning template-based protein structure prediction methods. Our program is freely available for download from http://sfb.kaust.edu.sa/Pages/Software.aspx. Contact: xin.gao@kaust.edu.sa. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Mauchline, T H; Mohan, S; Davies, K G; Schaff, J E; Opperman, C H; Kerry, B R; Hirsch, P R
2010-05-01
To establish a reliable protocol to extract DNA from Pasteuria penetrans endospores for use as template in multiple strand amplification, thus providing sufficient material for genetic analyses. To develop a highly sensitive PCR-based diagnostic tool for P. penetrans. An optimized method to decontaminate endospores, release and purify DNA enabled multiple strand amplification. DNA purity was assessed by cloning and sequencing gyrB and 16S rRNA gene fragments obtained from PCR using generic primers. Samples indicated to be 100% P. penetrans by the gyrB assay were estimated at 46% using the 16S rRNA gene. No bias was detected on cloning and sequencing 12 housekeeping and sporulation gene fragments from amplified DNA. The detection limit by PCR with Pasteuria-specific 16S rRNA gene primers following multiple strand amplification of DNA extracted using the method was a single endospore. Generation of large quantities of DNA will facilitate genomic sequencing of P. penetrans. Apparent differences in sample purity are explained by variations in 16S rRNA gene copy number in Eubacteria leading to exaggerated estimations of sample contamination. Detection of single endospores will facilitate investigations of P. penetrans molecular ecology. These methods will advance studies on P. penetrans and facilitate research on other obligate and fastidious micro-organisms where it is currently impractical to obtain DNA in sufficient quantity and quality.
29 CFR 1926.752 - Site layout, site-specific erection plan and construction sequence.
Code of Federal Regulations, 2011 CFR
2011-07-01
... standard test method of field-cured samples, either 75 percent of the intended minimum compressive design... the basis of an appropriate ASTM standard test method of field-cured samples, either 75 percent of the intended minimum compressive design strength or sufficient strength to support the loads imposed during...
29 CFR 1926.752 - Site layout, site-specific erection plan and construction sequence.
Code of Federal Regulations, 2013 CFR
2013-07-01
... standard test method of field-cured samples, either 75 percent of the intended minimum compressive design... the basis of an appropriate ASTM standard test method of field-cured samples, either 75 percent of the intended minimum compressive design strength or sufficient strength to support the loads imposed during...
29 CFR 1926.752 - Site layout, site-specific erection plan and construction sequence.
Code of Federal Regulations, 2012 CFR
2012-07-01
... standard test method of field-cured samples, either 75 percent of the intended minimum compressive design... the basis of an appropriate ASTM standard test method of field-cured samples, either 75 percent of the intended minimum compressive design strength or sufficient strength to support the loads imposed during...
29 CFR 1926.752 - Site layout, site-specific erection plan and construction sequence.
Code of Federal Regulations, 2010 CFR
2010-07-01
... standard test method of field-cured samples, either 75 percent of the intended minimum compressive design... the basis of an appropriate ASTM standard test method of field-cured samples, either 75 percent of the intended minimum compressive design strength or sufficient strength to support the loads imposed during...
29 CFR 1926.752 - Site layout, site-specific erection plan and construction sequence.
Code of Federal Regulations, 2014 CFR
2014-07-01
... standard test method of field-cured samples, either 75 percent of the intended minimum compressive design... the basis of an appropriate ASTM standard test method of field-cured samples, either 75 percent of the intended minimum compressive design strength or sufficient strength to support the loads imposed during...
USDA-ARS's Scientific Manuscript database
The comprehensive identification of genes underlying phenotypic variation of complex traits such as disease resistance remains one of the greatest challenges in biology despite having genome sequences and more powerful tools. Most genome-wide screens lack sufficient resolving power as they typically...
Pride, Pity, Anger, Guilt: Thought-Affect Sequences in the Classroom.
ERIC Educational Resources Information Center
Weiner, Bernard
A set of prevalent emotions, including pity, anger, guilt, pride (self-esteem), gratitude, and resignation, shares a common characteristic, i.e., causal attributions appear to be sufficient antecedents for their elicitation. Research in the field of emotions has shown that the underlying properties or dimensions of attributions are the significant…
FSH is an important regulator of mammalian gametogenesis and the female reproductive cycle. Although little is known about the transcriptional regulation of the beta-subunit (the rate-limiting subunit of FSH synthesis), sequence analysis of the ovine FSHbeta promoter has revealed...
Inflammation Thread Runs across Medical Laboratory Specialities.
Nydegger, Urs; Lung, Thomas; Risch, Lorenz; Risch, Martin; Medina Escobar, Pedro; Bodmer, Thomas
2016-01-01
We work on the assumption that four major specialities or sectors of medical laboratory assays, comprising clinical chemistry, haematology, immunology, and microbiology, embraced by genome sequencing techniques, are routinely in use. Medical laboratory markers for inflammation serve as model: they are allotted to most fields of medical lab assays including genomics. Incessant coding of assays aligns each of them in the long lists of big data. As exemplified with the complement gene family, containing C2, C3, C8A, C8B, CFH, CFI, and ITGB2, heritability patterns/risk factors associated with diseases with genetic glitch of complement components are unfolding. The C4 component serum levels depend on sufficient vitamin D whilst low vitamin D is inversely related to IgG1, IgA, and C3 linking vitamin sufficiency to innate immunity. Whole genome sequencing of microbial organisms may distinguish virulent from nonvirulent and antibiotic resistant from nonresistant varieties of the same species and thus can be listed in personal big data banks including microbiological pathology; the big data warehouse continues to grow.
Inflammation Thread Runs across Medical Laboratory Specialities
Lung, Thomas; Risch, Lorenz; Risch, Martin; Medina Escobar, Pedro; Bodmer, Thomas
2016-01-01
We work on the assumption that four major specialities or sectors of medical laboratory assays, comprising clinical chemistry, haematology, immunology, and microbiology, embraced by genome sequencing techniques, are routinely in use. Medical laboratory markers for inflammation serve as model: they are allotted to most fields of medical lab assays including genomics. Incessant coding of assays aligns each of them in the long lists of big data. As exemplified with the complement gene family, containing C2, C3, C8A, C8B, CFH, CFI, and ITGB2, heritability patterns/risk factors associated with diseases with genetic glitch of complement components are unfolding. The C4 component serum levels depend on sufficient vitamin D whilst low vitamin D is inversely related to IgG1, IgA, and C3 linking vitamin sufficiency to innate immunity. Whole genome sequencing of microbial organisms may distinguish virulent from nonvirulent and antibiotic resistant from nonresistant varieties of the same species and thus can be listed in personal big data banks including microbiological pathology; the big data warehouse continues to grow. PMID:27493451
Achieving high confidence protein annotations in a sea of unknowns
NASA Astrophysics Data System (ADS)
Timmins-Schiffman, E.; May, D. H.; Noble, W. S.; Nunn, B. L.; Mikan, M.; Harvey, H. R.
2016-02-01
Increased sensitivity of mass spectrometry (MS) technology allows deep and broad insight into community functional analyses. Metaproteomics holds the promise to reveal functional responses of natural microbial communities, whereas metagenomics alone can only hint at potential functions. The complex datasets resulting from ocean MS have the potential to inform diverse realms of the biological, chemical, and physical ocean sciences, yet the extent of bacterial functional diversity and redundancy has not been fully explored. To take advantage of these impressive datasets, we need a clear bioinformatics pipeline for metaproteomics peptide identification and annotation with a database that can provide confident identifications. Researchers must consider whether it is sufficient to leverage the vast quantities of available ocean sequence data or if they must invest in site-specific metagenomic sequencing. We have sequenced, to our knowledge, the first western arctic metagenomes from the Bering Strait and the Chukchi Sea. We have addressed the long standing question: Is a metagenome required to accurately complete metaproteomics and assess the biological distribution of metabolic functions controlling nutrient acquisition in the ocean? Two different protein databases were constructed from 1) a site-specific metagenome and 2) subarctic/arctic groups available in NCBI's non-redundant database. Multiple proteomic search strategies were employed, against each individual database and against both databases combined, to determine the algorithm and approach that yielded the balance of high sensitivity and confident identification. Results yielded over 8200 confidently identified proteins. Our comparison of these results allows us to quantify the utility of investing resources in a metagenome versus using the constantly expanding and immediately available public databases for metaproteomic studies.
A paleomagnetic record in loess-paleosol sequences since late Pleistocene in the arid Central Asia
NASA Astrophysics Data System (ADS)
Li, Guanhua; Xia, Dunsheng; Appel, Erwin; Wang, Youjun; Jia, Jia; Yang, Xiaoqiang
2018-03-01
Geomagnetic excursions during the Brunhes epoch have come to the forefront of paleomagnetic study, as they provide key information about Earth's interior dynamics and could serve as another tool for stratigraphic correlation among different lithologies. Loess-paleosol sequences provide good archives for decoding geomagnetic excursions. However, the detailed pattern of these excursions has not been sufficiently clarified due to pedogenic influence. In this study, paleomagnetic analysis was performed in loess-paleosol sequences on the northern piedmont of the Tianshan Mountains (northwestern China). By radiocarbon and luminescence dating, the loess section was chronologically constrained to mainly the last c.130 ka, a period when several distinct geomagnetic excursions were involved. The rock magnetic properties in this loess section are dominated by magnetite and maghemite in a pseudo-single-domain state. The rock magnetic properties and magnetic anisotropy indicate a weak pedogenic influence on the magnetic record. The stable component of remanent magnetization derived from thermal demagnetization revealed the presence of two intervals of directional anomalies with corresponding intensity lows in the Brunhes epoch. The age control in the key layers indicates these anomalies are likely associated with the Laschamp and Blake excursions, respectively. In addition, relative paleointensity in the loess section is basically compatible with other regional and global relative paleointensity records and indicates two low-paleointensity zones, possibly corresponding to the Blake and Laschamp excursions, respectively. As a result, this study suggests that the loess section may have the potential to record short-lived excursions, which largely reflect the variation of dipole components in the global archives.
Phylogeny of Lagos bat virus: challenges for lyssavirus taxonomy.
Markotter, W; Kuzmin, I; Rupprecht, C E; Nel, L H
2008-07-01
Lagos bat virus (LBV) belongs to genotype 2 of the Lyssavirus genus. The complete nucleoprotein (N), phosphoprotein (P), matrixprotein (M) and glycoprotein (G) genes of 13 LBV isolates were sequenced and phylogenetically compared with other lyssavirus representatives. The results identified three different lineages of LBV. One of these lineages demonstrated sufficient sequence diversity to be considered a new lyssavirus genotype (Dakar bat lyssavirus). The suggested quantitative separation of lyssavirus genotypes using the N, P, M and G genes was also investigated using P-distances matrixes. Results indicated that the current criteria should be revised since overlaps between intergenotypic and intragenotypic variation occur.
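The P-distance comparison mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the usual definition of p-distance (the proportion of compared sites that differ between two aligned sequences) and skips gapped sites; the function names and the gap convention are illustrative choices, not taken from the study.

```python
from itertools import combinations
from typing import Dict, Tuple


def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def p_distance_matrix(seqs: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Pairwise p-distances keyed by (name_a, name_b) for every unordered pair."""
    return {
        (name_a, name_b): p_distance(seqs[name_a], seqs[name_b])
        for name_a, name_b in combinations(seqs, 2)
    }
```

Comparing the spread of within-group values with the spread of between-group values in such a matrix is one way to see the kind of overlap the abstract reports.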
Origins of the protein synthesis cycle
NASA Technical Reports Server (NTRS)
Fox, S. W.
1981-01-01
Largely derived from experiments in molecular evolution, a theory of protein synthesis cycles has been constructed. The sequence begins with ordered thermal proteins resulting from the self-sequencing of mixed amino acids. Ordered thermal proteins then aggregate to cell-like structures. When they contained proteinoids sufficiently rich in lysine, the structures were able to synthesize offspring peptides. Since lysine-rich proteinoid (LRP) also catalyzes the polymerization of nucleoside triphosphate to polynucleotides, the same microspheres containing LRP could have synthesized both original cellular proteins and cellular nucleic acids. The LRP within protocells would have provided proximity advantageous for the origin and evolution of the genetic code.
Cotten, Matthew; Oude Munnink, Bas; Canuti, Marta; Deijs, Martin; Watson, Simon J; Kellam, Paul; van der Hoek, Lia
2014-01-01
We have developed a full genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to double-stranded DNA and subjected to Illumina MiSeq sequencing. The resulting short reads were processed with a novel iterative Python algorithm SLIM for the identification of sequences with homology to known viruses. De novo assembly was then used to generate full viral genomes. The sensitivity of this process was demonstrated with a set of fecal samples from HIV-1 infected patients. A quantitative assessment of the mammalian, plant, and bacterial virus content of this compartment was generated and the deep sequencing data were sufficient to assemble 12 complete viral genomes from 6 virus families. The method detected high levels of enteropathic viruses that are normally controlled in healthy adults, but may be involved in the pathogenesis of HIV-1 infection and will provide a powerful tool for virus detection and for analyzing changes in the fecal virome associated with HIV-1 progression and pathogenesis.
Cotten, Matthew; Oude Munnink, Bas; Canuti, Marta; Deijs, Martin; Watson, Simon J.; Kellam, Paul; van der Hoek, Lia
2014-01-01
We have developed a full genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to double-stranded DNA and subjected to Illumina MiSeq sequencing. The resulting short reads were processed with a novel iterative Python algorithm SLIM for the identification of sequences with homology to known viruses. De novo assembly was then used to generate full viral genomes. The sensitivity of this process was demonstrated with a set of fecal samples from HIV-1 infected patients. A quantitative assessment of the mammalian, plant, and bacterial virus content of this compartment was generated and the deep sequencing data were sufficient to assemble 12 complete viral genomes from 6 virus families. The method detected high levels of enteropathic viruses that are normally controlled in healthy adults, but may be involved in the pathogenesis of HIV-1 infection and will provide a powerful tool for virus detection and for analyzing changes in the fecal virome associated with HIV-1 progression and pathogenesis. PMID:24695106
Okura, Hiromichi; Takahashi, Tsuyoshi; Mihara, Hisakazu
2012-06-01
Successful approaches to de novo protein design suggest a great potential to create novel structural folds and to understand natural rules of protein folding. For these purposes, smaller and simpler de novo proteins have been developed. Here, we constructed smaller proteins by removing the terminal sequences from stable de novo vTAJ proteins and compared stabilities between mutant and original proteins. vTAJ proteins were screened from an α3β3 binary-patterned library which was designed with polar/nonpolar periodicities of α-helix and β-sheet. vTAJ proteins have additional terminal sequences due to the method of constructing the genetically repeated library sequences. By removing parts of these sequences, we successfully obtained stable, smaller de novo protein mutants with fewer amino acid alphabets than the originals. However, these mutants showed differences in ANS binding properties and in stability against denaturant and pH change. The terminal sequences, which were designed just as flexible linkers, not as secondary structure units, sufficiently affected these physicochemical details. This study showed implications for adjusting protein stabilities by designing N- and C-terminal sequences.
Polyomavirus BK non-coding control region rearrangements in health and disease.
Sharma, Preety M; Gupta, Gaurav; Vats, Abhay; Shapiro, Ron; Randhawa, Parmjeet S
2007-08-01
BK virus is an increasingly recognized pathogen in transplanted patients. DNA sequencing of this virus shows considerable genomic variability. To understand the clinical significance of rearrangements in the non-coding control region (NCCR) of BK virus (BKV), we report a meta-analysis of 507 sequences, including 40 sequences generated in our own laboratory, for associations between rearrangements and disease, tissue tropism, geographic origin, and viral genotype. NCCR rearrangements were less frequent in (a) asymptomatic BKV viruria compared to patients with viral nephropathy (1.7% vs. 22.5%), and (b) viral genotype 1 compared to other genotypes (2.4% vs. 11.2%). Rearrangements were more common in malignancy (78.6%) and in Norwegians (45.7%), and less common in East Indians (0%) and Japanese (4.3%). A surprising number of rearranged sequences were reported from mononuclear cells of healthy subjects, whereas most plasma sequences were archetypal. This difference could not be related to potential recombinase activity in lymphocytes, as consensus recombination signal sequences could not be found in the NCCR region. NCCR rearrangements are neither a necessary nor a sufficient condition to produce clinical disease. BKV nephropathy and hemorrhagic cystitis are not associated with any unique NCCR configuration or nucleotide sequence.
Savic, Branislav; Müri, René; Meier, Beat
Transcranial direct current stimulation (tDCS) is assumed to affect cortical excitability and, depending on the specific stimulation conditions, to either increase or decrease learning. The purpose of this study was to modulate implicit task sequence learning with tDCS. As cortico-striatal loops are critically involved in implicit task sequence learning, tDCS was applied above the dorsolateral prefrontal cortex (DLPFC). In Experiment 1, anodal, cathodal, or sham tDCS was applied before the start of the sequence learning task. In Experiment 2, stimulation was applied during the sequence learning task. Consolidation of learning was assessed after 24 h. The results of both experiments showed that implicit task sequence learning occurred consistently but it was not modulated by different tDCS conditions. Similarly, consolidation measured after a 24 h-interval including sleep was also not affected by stimulation. These results indicate that a single session of DLPFC tDCS is not sufficient to modulate implicit task sequence learning. This study adds to the accumulating evidence that tDCS may not be as effective as originally thought. Copyright © 2017 Elsevier Inc. All rights reserved.
Information Assurance in Wireless Networks
NASA Astrophysics Data System (ADS)
Kabara, Joseph; Krishnamurthy, Prashant; Tipper, David
2001-09-01
Emerging wireless networks will contain a hybrid infrastructure based on fixed, mobile and ad hoc topologies and technologies. In such a dynamic architecture, we define information assurance as the provisions for both information security and information availability. The implications of this definition are that the wireless network architecture must (a) provide sufficient security measures, (b) be survivable under node or link attack or failure and (c) be designed such that sufficient capacity remains for all critical services (and preferably most other services) in the event of attack or component failure. We have begun a research project to investigate the provision of information assurance for wireless networks viz. survivability, security and availability and here discuss the issues and challenges therein.
Gärling, T
1996-09-01
How people choose between sequences of actions was investigated in an everyday errand-planning task. In this task subjects chose the preferred sequence of performing a number of errands in a fictitious environment. Two experiments were conducted with undergraduate students serving as subjects. One group searched information about each alternative. The same information was directly available to another group. In Experiment 1 the results showed that for two errands subjects took into account all attributes describing the errands, thus suggesting a tradeoff between priority, wait time, and travel distance with priority being the most important. Consistent with this finding predominantly intraalternative information search was observed. These results were replicated in Experiment 2 for three errands. In addition choice outcomes, information search, and sequence of responding suggested that for more than two actions sequence choices are made in stages.
Eaton, Deren A R; Spriggs, Elizabeth L; Park, Brian; Donoghue, Michael J
2017-05-01
Restriction-site associated DNA (RAD) sequencing and related methods rely on the conservation of enzyme recognition sites to isolate homologous DNA fragments for sequencing, with the consequence that mutations disrupting these sites lead to missing information. There is thus a clear expectation for how missing data should be distributed, with fewer loci recovered between more distantly related samples. This observation has led to a related expectation: that RAD-seq data are insufficiently informative for resolving deeper scale phylogenetic relationships. Here we investigate the relationship between missing information among samples at the tips of a tree and information at edges within it. We re-analyze and review the distribution of missing data across ten RAD-seq data sets and carry out simulations to determine expected patterns of missing information. We also present new empirical results for the angiosperm clade Viburnum (Adoxaceae, with a crown age >50 Ma) for which we examine phylogenetic information at different depths in the tree and with varied sequencing effort. The total number of loci, the proportion that are shared, and phylogenetic informativeness varied dramatically across the examined RAD-seq data sets. Insufficient or uneven sequencing coverage accounted for similar proportions of missing data as dropout from mutation-disruption. Simulations reveal that mutation-disruption, which results in phylogenetically distributed missing data, can be distinguished from the more stochastic patterns of missing data caused by low sequencing coverage. In Viburnum, doubling sequencing coverage nearly doubled the number of parsimony informative sites, and increased by >10X the number of loci with data shared across >40 taxa. Our analysis leads to a set of practical recommendations for maximizing phylogenetic information in RAD-seq studies. 
[hierarchical redundancy; phylogenetic informativeness; quartet informativeness; Restriction-site associated DNA (RAD) sequencing; sequencing coverage; Viburnum.]. © The authors 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For permissions, please e-mail: journals.permission@oup.com.
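The abstract above reports counts of parsimony informative sites. As a minimal sketch of how such a count can be computed, assuming the standard definition (a site is parsimony informative when at least two character states each occur in at least two taxa) and assuming that `N`, `-` and `?` denote missing data; the function names are illustrative and this is not the authors' pipeline.

```python
from collections import Counter
from typing import Iterable, Sequence

# Characters treated as missing data (an assumption for this sketch).
MISSING = {"N", "-", "?"}


def is_parsimony_informative(column: Iterable[str]) -> bool:
    """True if at least two states each appear in at least two taxa."""
    counts = Counter(c.upper() for c in column if c.upper() not in MISSING)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def count_parsimony_informative(alignment: Sequence[str]) -> int:
    """Count informative columns in an alignment given as equal-length strings."""
    if len({len(s) for s in alignment}) > 1:
        raise ValueError("all sequences must have the same aligned length")
    return sum(is_parsimony_informative(col) for col in zip(*alignment))
```

In this sketch, a site where missing data leaves fewer than two taxa per state stops being informative, which is how missing data reduces the count.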
Gardiner, Laura-Jayne; Gawroński, Piotr; Olohan, Lisa; Schnurbusch, Thorsten; Hall, Neil; Hall, Anthony
2014-12-01
Mapping-by-sequencing analyses have largely required a complete reference sequence and employed whole genome re-sequencing. In species such as wheat, no finished genome reference sequence is available. Additionally, because of its large genome size (17 Gb), re-sequencing at sufficient depth of coverage is not practical. Here, we extend the utility of mapping by sequencing, developing a bespoke pipeline and algorithm to map an early-flowering locus in einkorn wheat (Triticum monococcum L.) that is closely related to the bread wheat genome A progenitor. We have developed a genomic enrichment approach using the gene-rich regions of hexaploid bread wheat to design a 110-Mbp NimbleGen SeqCap EZ in solution capture probe set, representing the majority of genes in wheat. Here, we use the capture probe set to enrich and sequence an F2 mapping population of the mutant. The mutant locus was identified in T. monococcum, which lacks a complete genome reference sequence, by mapping the enriched data set onto pseudo-chromosomes derived from the capture probe target sequence, with a long-range order of genes based on synteny of wheat with Brachypodium distachyon. Using this approach we are able to map the region and identify a set of deleted genes within the interval. © 2014 The Authors. The Plant Journal published by Society for Experimental Biology and John Wiley & Sons Ltd.
Elman RNN based classification of proteins sequences on account of their mutual information.
Mishra, Pooja; Nath Pandey, Paras
2012-10-21
In the present work we have employed the method of estimating residue correlation within protein sequences by using the mutual information (MI) of adjacent residues, based on structural and solvent accessibility properties of amino acids. The long range correlation between nonadjacent residues is improved by constructing a mutual information vector (MIV) for a single protein sequence; in this way each protein sequence is associated with its corresponding MIV. These MIVs are given to an Elman RNN to obtain the classification of protein sequences. The modeling power of MIV was shown to be significantly better, giving a new approach towards alignment-free classification of protein sequences. We also conclude that MIVs based on sequence structural and solvent accessibility properties are better predictors. Copyright © 2012 Elsevier Ltd. All rights reserved.
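The mutual information of adjacent residues described above can be illustrated with a minimal sketch. It assumes the standard definition MI = Σ p(x,y) log2[p(x,y) / (p(x) p(y))] taken over adjacent pairs, with probabilities estimated from pair counts in a single sequence. The optional `classes` mapping, which collapses residues into property groups, is a hypothetical stand-in for the structural and solvent accessibility groupings used in the paper, which the abstract does not specify.

```python
import math
from collections import Counter
from typing import Mapping, Optional


def adjacent_mutual_information(
    sequence: str, classes: Optional[Mapping[str, str]] = None
) -> float:
    """Mutual information (in bits) between neighbouring symbols of a sequence.

    If `classes` is given, each residue is first replaced by its class label
    (hypothetical property grouping; not the paper's actual alphabet).
    """
    symbols = [classes[r] for r in sequence] if classes else list(sequence)
    pairs = list(zip(symbols, symbols[1:]))
    if not pairs:
        raise ValueError("need at least two residues")
    n = len(pairs)
    joint = Counter(pairs)                    # counts of (left, right) pairs
    first = Counter(a for a, _ in pairs)      # marginal counts of left symbols
    second = Counter(b for _, b in pairs)     # marginal counts of right symbols
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((first[a] / n) * (second[b] / n)))
    return mi
```

A vector of such values, one per property grouping, would be one plausible reading of the MIV idea, though the exact construction in the paper may differ.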
Image encryption using random sequence generated from generalized information domain
NASA Astrophysics Data System (ADS)
Xia-Yan, Zhang; Guo-Ji, Zhang; Xuan, Li; Ya-Zhou, Ren; Jie-Hua, Wu
2016-05-01
A novel image encryption method based on the random sequence generated from the generalized information domain and permutation-diffusion architecture is proposed. The random sequence is generated by reconstruction from the generalized information file and discrete trajectory extraction from the data stream. The trajectory address sequence is used to generate a P-box to shuffle the plain image while random sequences are treated as keystreams. A new factor called drift factor is employed to accelerate and enhance the performance of the random sequence generator. An initial value is introduced to make the encryption method an approximately one-time pad. Experimental results show that the random sequences pass the NIST statistical test with a high ratio and extensive analysis demonstrates that the new encryption scheme has superior security.
Code of Federal Regulations, 2012 CFR
2012-01-01
... statement of the basis for the appeal with sufficient facts, information, analysis, and explanation to rebut... materials are received. (2) Additional information. FHFA may request additional information or further...
Code of Federal Regulations, 2014 CFR
2014-01-01
... statement of the basis for the appeal with sufficient facts, information, analysis, and explanation to rebut... materials are received. (2) Additional information. FHFA may request additional information or further...
Code of Federal Regulations, 2011 CFR
2011-01-01
... statement of the basis for the appeal with sufficient facts, information, analysis, and explanation to rebut... materials are received. (2) Additional information. FHFA may request additional information or further...
Code of Federal Regulations, 2013 CFR
2013-01-01
... statement of the basis for the appeal with sufficient facts, information, analysis, and explanation to rebut... materials are received. (2) Additional information. FHFA may request additional information or further...
NASA Astrophysics Data System (ADS)
Shang-Guan, Li-Ying; Sun, Hong-Xiang; Wen, Qiao-Yan; Zhu, Fu-Chen
2009-12-01
Firstly, we investigate the necessary and sufficient conditions that an entangled channel of n-qubits should satisfy to carry out perfect teleportation of an arbitrary single qubit state and dense coding. It is shown that the sender can transmit two classical bits of information by sending one qubit. Further, the case of high-dimension quantum state is also considered. Utilizing n-qudit state as quantum channel, it is proposed that the necessary and sufficient conditions are (d+2)(d-1)/2 in all to teleport an arbitrary single qudit state. The sender can transmit 2 log_2 d classical bits of information to the receiver conditioned on the constraints.
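As a quick consistency check on the two expressions quoted in the abstract above, the following worked arithmetic evaluates them for small d. The symbols N_cond (number of conditions) and C (classical bits transmitted) are labels introduced here for illustration only; the abstract does not name them.

```latex
\[
N_{\mathrm{cond}}(d) = \frac{(d+2)(d-1)}{2},
\qquad
C(d) = 2\log_2 d \ \text{bits}
\]
\[
d = 2:\quad N_{\mathrm{cond}} = \frac{4 \cdot 1}{2} = 2,
\qquad C = 2\log_2 2 = 2 \ \text{bits}
\]
\[
d = 3:\quad N_{\mathrm{cond}} = \frac{5 \cdot 2}{2} = 5,
\qquad C = 2\log_2 3 \approx 3.17 \ \text{bits}
\]
```

The d = 2 case reproduces the qubit statement in the abstract: two classical bits transmitted by sending one qubit.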
2014-01-01
Background: Small RNAs are important regulators of genome function, yet their prediction in genomes is still a major computational challenge. Statistical analyses of pre-miRNA sequences indicated that their 2D structure tends to have a minimal free energy (MFE) significantly lower than MFE values of equivalently randomized sequences with the same nucleotide composition, in contrast to other classes of non-coding RNA. The computation of many MFEs is, however, too intensive to allow for genome-wide screenings. Results: Using a local grid infrastructure, MFE distributions of random sequences were pre-calculated on a large scale. These distributions follow a normal distribution and can be used to determine the MFE distribution for any given sequence composition by interpolation. It allows on-the-fly calculation of the normal distribution for any candidate sequence composition. Conclusion: The speedup achieved makes genome-wide screening with this characteristic of a pre-miRNA sequence practical. Although this property alone is not sufficiently discriminative to distinguish miRNAs from other sequences, the MFE-based P-value should be added to the parameters used in selecting potential miRNA candidates for experimental verification. PMID:24418292
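The MFE-based P-value described above can be sketched as follows, assuming the mean and standard deviation of the random-sequence MFE distribution are already known for nearby compositions. The one-dimensional linear interpolation (over GC fraction, for instance) and all function names are simplifying assumptions for illustration; the abstract does not state which composition variables or interpolation scheme the authors used.

```python
import math


def interpolate(x: float, x0: float, y0: float, x1: float, y1: float) -> float:
    """Linear interpolation of a precomputed parameter between two grid points."""
    if x1 == x0:
        raise ValueError("grid points must differ")
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def mfe_lower_tail_p(mfe: float, mean: float, sd: float) -> float:
    """P(random MFE <= observed MFE) under a normal distribution.

    A small value means the candidate folds more stably than
    composition-matched random sequences.
    """
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    z = (mfe - mean) / sd
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


# Hypothetical grid values: mean MFE at 40% and 60% GC for a fixed length.
mean_at_50 = interpolate(0.5, 0.4, -28.0, 0.6, -32.0)
p_value = mfe_lower_tail_p(-40.0, mean_at_50, 5.0)
```

Because only a lookup, an interpolation and one `erfc` evaluation are needed per candidate, no random sequences have to be folded at screening time, which is the speedup the abstract refers to.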
Taber, Jennifer M; Klein, William M P; Ferrer, Rebecca A; Lewis, Katie L; Harris, Peter R; Shepperd, James A; Biesecker, Leslie G
2015-08-01
Information avoidance is a defensive strategy that undermines receipt of potentially beneficial but threatening health information and may especially occur when threat management resources are unavailable. We examined whether individual differences in information avoidance predicted intentions to receive genetic sequencing results for preventable and unpreventable (i.e., more threatening) disease and, secondarily, whether threat management resources of self-affirmation or optimism mitigated any effects. Participants (N = 493) in an NIH study (ClinSeq®) piloting the use of genome sequencing reported intentions to receive (optional) sequencing results and completed individual difference measures of information avoidance, self-affirmation, and optimism. Information avoidance tendencies corresponded with lower intentions to learn results, particularly for unpreventable diseases. The association was weaker among individuals higher in self-affirmation or optimism, but only for results regarding preventable diseases. Information avoidance tendencies may influence decisions to receive threatening health information; threat management resources hold promise for mitigating this association.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bowers, Robert M.; Kyrpides, Nikos C.; Stepanauskas, Ramunas
We present two standards developed by the Genomic Standards Consortium (GSC) for reporting bacterial and archaeal genome sequences. Both are extensions of the Minimum Information about Any (x) Sequence (MIxS). The standards are the Minimum Information about a Single Amplified Genome (MISAG) and the Minimum Information about a Metagenome-Assembled Genome (MIMAG), including, but not limited to, assembly quality, and estimates of genome completeness and contamination. These standards can be used in combination with other GSC checklists, including the Minimum Information about a Genome Sequence (MIGS), Minimum Information about a Metagenomic Sequence (MIMS), and Minimum Information about a Marker Gene Sequence (MIMARKS). Community-wide adoption of MISAG and MIMAG will facilitate more robust comparative genomic analyses of bacterial and archaeal diversity.
Effects of Sequences of Cognitions on Group Performance Over Time
Molenaar, Inge; Chiu, Ming Ming
2017-01-01
Extending past research showing that sequences of low cognitions (low-level processing of information) and high cognitions (high-level processing of information through questions and elaborations) influence the likelihoods of subsequent high and low cognitions, this study examines whether sequences of cognitions are related to group performance over time; 54 primary school students (18 triads) discussed and wrote an essay about living in another country (32,375 turns of talk). Content analysis and statistical discourse analysis showed that within each lesson, groups with more low cognitions or more sequences of low cognition followed by high cognition added more essay words. Groups with more high cognitions, sequences of low cognition followed by low cognition, or sequences of high cognition followed by an action followed by low cognition, showed different words and sequences, suggestive of new ideas. The links between cognition sequences and group performance over time can inform facilitation and assessment of student discussions. PMID:28490854
Clifford, Jacob; Adami, Christoph
2015-09-02
Transcription factor binding to the surface of DNA regulatory regions is one of the primary mechanisms regulating gene expression levels. One probabilistic approach to modeling protein-DNA interactions at the sequence level uses position weight matrices (PWMs), which estimate the joint probability of a DNA binding site sequence by assuming positional independence within the DNA sequence. Here we construct conditional PWMs that depend on the motif signatures in the flanking DNA sequence, by conditioning known binding site loci on the presence or absence of additional binding sites in the flanking sequence of each site's locus. Pooling known sites with similar flanking sequence patterns allows for the estimation of the conditional distribution function over the binding site sequences. We apply our model to the Dorsal transcription factor binding sites active in patterning the Dorsal-Ventral axis of Drosophila development. We find that those binding sites that cooperate with nearby Twist sites on average contain about 0.5 bits of information about the presence of Twist transcription factor binding sites in the flanking sequence. We also find that Dorsal binding site detectors conditioned on flanking sequence information make better predictions about what is a Dorsal site relative to background DNA than detection without information about flanking sequence features.
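The PWM idea in this abstract can be sketched in a few lines. The example below builds a PWM from aligned sites under the positional-independence assumption, scores a sequence as a log-odds against a uniform background, and pools sites by a flanking-context flag to get one PWM per condition. The site strings, the pseudocount, the uniform background, and the function names are all illustrative assumptions; the boolean flag merely stands in for "a Twist site is present in the flank".

```python
import math
from collections import Counter

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0):
    """Column-wise base frequencies from equal-length aligned sites."""
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        pwm.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return pwm


def log_odds(pwm, seq, background=0.25):
    """Sum of per-position log2 ratios (positions treated as independent)."""
    return sum(math.log2(col[b] / background) for col, b in zip(pwm, seq))


def conditional_pwms(labeled_sites, pseudocount=1.0):
    """Pool sites by flanking-context flag and build one PWM per pool."""
    pools = {}
    for site, has_flanking_site in labeled_sites:
        pools.setdefault(has_flanking_site, []).append(site)
    return {flag: build_pwm(sites, pseudocount) for flag, sites in pools.items()}
```

Comparing `log_odds` under the two conditional PWMs for the same candidate site gives a simple way to see how much the flanking context shifts the score.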
2014-11-01
Kullback, S., & Leibler, R. (1951). On information and sufficiency. Annals of Mathematical Statistics, 22, 79...cognitive challenges of sensemaking only informally using conceptual notions like "framing" and "re-framing", which are not sufficient to support T&E in...appropriate frame(s) from memory. Assess the Frame: Evaluate the quality of fit between data and frame. Generate Hypotheses: Use the current
de Hooge, Manouk; van den Berg, Rosaline; Navarro-Compán, Victoria; van Gaalen, Floris; van der Heijde, Désirée; Huizinga, Tom; Reijnierse, Monique
2013-07-01
To investigate the additional value of T1 fat-saturated after gadolinium (T1/Gd) compared with T1 and short tau inversion recovery (STIR) sequences in detecting active lesions of the SI joints typical of axial SpA (axSpA) in a prospective cohort study, the SpondyloArthritis Caught Early (SPACE) cohort, and to assess its influence on final MRI diagnosis of the SI joint (MRI-SIJ) based on the Assessment of Spondyloarthritis International Society (ASAS) definition of active sacroiliitis. Patients in the SPACE cohort received baseline and 3-month follow-up MRI-SIJ with coronal oblique T1, STIR and T1/Gd sequences. Bone marrow oedema (BME), capsulitis/enthesitis and synovitis and active sacroiliitis according to the ASAS definition were evaluated by three blinded readers. A total of 127 patients received an MRI-SIJ at baseline and 67 patients also received an MRI-SIJ at 3 months follow-up since the Gd protocol was added some months after the start of the SPACE project. Twenty-five of the 127 patients (19.7%) with a baseline MRI-SIJ and 14 of 67 patients (20.6%) with a follow-up MRI-SIJ presented BME on the STIR sequence sufficient to fulfill the ASAS definition for a positive MRI-SIJ. In eight patients, additional synovitis and/or capsulitis/enthesitis was observed; however, no additional BME was visualized on T1/Gd. One patient, without clinical diagnosis of axSpA, showed synovitis as an isolated finding. Synovitis and capsulitis/enthesitis are detectable with the administration of Gd. However, they are always observed in the presence of BME. Therefore, T1 and STIR sequences alone are sufficient in the MRI assessment that, among others, is used for diagnosing patients with early axSpA.
1995-01-01
The permeation of monovalent cations through the cGMP-gated channel of catfish cone outer segments was examined by measuring permeability and conductance ratios under biionic conditions. For monovalent cations presented on the cytoplasmic side of the channel, the permeability ratios with respect to extracellular Na followed the sequence NH4 > K > Li > Rb = Na > Cs while the conductance ratios at +50 mV followed the sequence Na ≈ NH4 > K > Rb > Li = Cs. These patterns are broadly similar to the amphibian rod channel. The symmetry of the channel was tested by presenting the test ion on the extracellular side and using Na as the common reference ion on the cytoplasmic side. Under these biionic conditions, the permeability ratios with respect to Na at the intracellular side followed the sequence NH4 > Li > K > Na > Rb > Cs while the conductance ratios at +50 mV followed the sequence NH4 > K ≈ Na > Rb > Li > Cs. Thus, the channel is asymmetric with respect to external and internal cations. Under symmetrical 120 mM ionic conditions, the single-channel conductance at +50 mV ranged from 58 pS in NH4 to 15 pS for Cs and was in the order NH4 > Na > K > Rb > Cs. Unexpectedly, the single-channel current-voltage relation showed sufficient outward rectification to account for the rectification observed in multichannel patches without invoking voltage dependence in gating. The concentration dependence of the reversal potential for K showed that chloride was impermeant. Anomalous mole fraction behavior was not observed, nor, over a limited concentration range, were multiple dissociation constants. An Eyring rate theory model with a single binding site was sufficient to explain these observations. PMID:8786344
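One standard way to turn a biionic reversal potential into a permeability ratio is the Goldman-Hodgkin-Katz voltage equation; the abstract does not state which formula the authors used, so the sketch below is an assumption. It takes the test cation X on the cytoplasmic side, Na on the extracellular side, the membrane potential as inside minus outside, and a hypothetical temperature of 22 °C.

```python
import math

R = 8.314462618    # gas constant, J/(mol*K)
F = 96485.33212    # Faraday constant, C/mol


def permeability_ratio(v_rev_mv, conc_x_in_mm, conc_na_out_mm, temp_c=22.0):
    """P_X / P_Na for monovalent cations under biionic conditions.

    From V_rev = (RT/F) * ln(P_Na*[Na]_out / (P_X*[X]_in)) it follows that
    P_X/P_Na = ([Na]_out / [X]_in) * exp(-V_rev * F / (R*T)).
    """
    rt_over_f_mv = 1000.0 * R * (temp_c + 273.15) / F
    return (conc_na_out_mm / conc_x_in_mm) * math.exp(-v_rev_mv / rt_over_f_mv)
```

With equal concentrations on both sides, a reversal potential of zero gives a ratio of 1, and a negative reversal potential indicates that the cytoplasmic test ion is more permeant than extracellular Na.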
Functional linear models for association analysis of quantitative traits.
Fan, Ruzong; Wang, Yifan; Mills, James L; Wilson, Alexander F; Bailey-Wilson, Joan E; Xiong, Momiao
2013-11-01
Functional linear models are developed in this paper for testing associations between quantitative traits and genetic variants, which can be rare variants or common variants or the combination of the two. By treating multiple genetic variants of an individual in a human population as a realization of a stochastic process, the genome of an individual in a chromosome region is a continuum of sequence data rather than discrete observations. The genome of an individual is viewed as a stochastic function that contains both linkage and linkage disequilibrium (LD) information of the genetic markers. By using techniques of functional data analysis, both fixed and mixed effect functional linear models are built to test the association between quantitative traits and genetic variants adjusting for covariates. After extensive simulation analysis, it is shown that the F-distributed tests of the proposed fixed effect functional linear models have higher power than that of sequence kernel association test (SKAT) and its optimal unified test (SKAT-O) for three scenarios in most cases: (1) the causal variants are all rare, (2) the causal variants are both rare and common, and (3) the causal variants are common. The superior performance of the fixed effect functional linear models is most likely due to its optimal utilization of both genetic linkage and LD information of multiple genetic variants in a genome and similarity among different individuals, while SKAT and SKAT-O only model the similarities and pairwise LD but do not model linkage and higher order LD information sufficiently. In addition, the proposed fixed effect models generate accurate type I error rates in simulation studies. We also show that the functional kernel score tests of the proposed mixed effect functional linear models are preferable in candidate gene analysis and small sample problems. The methods are applied to analyze three biochemical traits in data from the Trinity Students Study. 
© 2013 WILEY PERIODICALS, INC.
Discovery of the porcine NGN3 gene and testing its endocrine function in the pig
USDA-ARS's Scientific Manuscript database
Neurogenin 3 (NGN3) is a member of the basic helix-loop-helix transcription factor family. NGN3 is both necessary and sufficient to drive endocrine differentiation in the developing pancreas in mouse and humans. Until now, the sequence for NGN3 eluded discovery despite completion of the pig genome a...
USDA-ARS's Scientific Manuscript database
The identification of specific genes underlying phenotypic variation of complex traits remains one of the greatest challenges in biology despite having genome sequences and more powerful tools. Most genome-wide screens lack sufficient resolving power as they typically depend on linkage. One altern...
Supporting Teachers' Use of Research-Based Instructional Sequences
ERIC Educational Resources Information Center
Cobb, Paul; Jackson, Kara
2015-01-01
In this paper, we frame the dissemination of the products of classroom design studies as a process of supporting the learning of large numbers of teachers. We argue that high-quality pull-out professional development is essential but not sufficient, and go on to consider teacher collaboration and one-on-one coaching in the classroom as additional…
Pre-processing SAR image stream to facilitate compression for transport on bandwidth-limited-link
Rush, Bobby G.; Riley, Robert
2015-09-29
Pre-processing is applied to a raw VideoSAR (or similar near-video rate) product to transform the image frame sequence into a product that resembles more closely the type of product for which conventional video codecs are designed, while sufficiently maintaining utility and visual quality of the product delivered by the codec.
Learning viewpoint invariant perceptual representations from cluttered images.
Spratling, Michael W
2005-05-01
In order to perform object recognition, it is necessary to form perceptual representations that are sufficiently specific to distinguish between objects, but that are also sufficiently flexible to generalize across changes in location, rotation, and scale. A standard method for learning perceptual representations that are invariant to viewpoint is to form temporal associations across image sequences showing object transformations. However, this method requires that individual stimuli be presented in isolation and is therefore unlikely to succeed in real-world applications where multiple objects can co-occur in the visual input. This paper proposes a simple modification to the learning method that can overcome this limitation and results in more robust learning of invariant representations.
Kamihigashi, Takashi
2017-01-01
Given a sequence [Formula: see text] of measurable functions on a σ-finite measure space such that the integral of each [Formula: see text] as well as that of [Formula: see text] exists in [Formula: see text], we provide a sufficient condition for the following inequality to hold: [Formula: see text]. Our condition is considerably weaker than sufficient conditions known in the literature such as uniform integrability (in the case of a finite measure) and equi-integrability. As an application, we obtain a new result on the existence of an optimal path for deterministic infinite-horizon optimization problems in discrete time.
LookSeq: a browser-based viewer for deep sequencing data.
Manske, Heinrich Magnus; Kwiatkowski, Dominic P
2009-11-01
Sequencing a genome to great depth can be highly informative about heterogeneity within an individual or a population. Here we address the problem of how to visualize the multiple layers of information contained in deep sequencing data. We propose an interactive AJAX-based web viewer for browsing large data sets of aligned sequence reads. By enabling seamless browsing and fast zooming, the LookSeq program assists the user to assimilate information at different levels of resolution, from an overview of a genomic region to fine details such as heterogeneity within the sample. A specific problem, particularly if the sample is heterogeneous, is how to depict information about structural variation. LookSeq provides a simple graphical representation of paired sequence reads that is more revealing about potential insertions and deletions than are conventional methods.
Taschner, Christian A; Le Thuc, Vianney; Reyns, Nicolas; Gieseke, Juergen; Gauvrit, Jean-Yves; Pruvo, Jean-Pierre; Leclerc, Xavier
2007-10-01
The aim of this study was to develop an algorithm for the integration of time-resolved contrast-enhanced magnetic resonance (MR) angiography into dosimetry planning for Gamma Knife surgery (GKS) of arteriovenous malformations (AVMs) in the brain. Twelve patients harboring brain AVMs referred for GKS underwent intraarterial digital subtraction (DS) angiography and time-resolved MR angiography while wearing an externally applied cranial stereotactic frame. Time-resolved MR angiography was performed on a 1.5-tesla MR unit (Achieva, Philips Medical Systems) using contrast-enhanced 3D fast field echo sequencing with stochastic central k-space ordering. Postprocessing with interactive data language (Research Systems, Inc.) produced hybrid data sets containing dynamic angiographic information and the MR markers necessary for stereotactic transformation. Image files were sent to the Leksell GammaPlan system (Elekta) for dosimetry planning. Stereotactic transformation of the hybrid data sets containing the time-resolved MR angiography information with automatic detection of the MR markers was possible in all 12 cases. The stereotactic coordinates of vascular structures predefined from time-resolved MR angiography matched with DS angiography data in all cases. In 10 patients dosimetry planning could be performed based on time-resolved MR angiography data. In two patients, time-resolved MR angiography data alone were considered insufficient. The target volumes showed a notable shift of centers between modalities. Integration of time-resolved MR angiography data into the Leksell GammaPlan system for patients with brain AVMs is feasible. The proposed algorithm seems concise and sufficiently robust for clinical application. The quality of the time-resolved MR angiography sequencing needs further improvement.
Tempo and mode of genomic mutations unveil human evolutionary history.
Hara, Yuichiro
2015-01-01
Mutations that have occurred in human genomes provide insight into various aspects of evolutionary history such as speciation events and degrees of natural selection. Comparing genome sequences between human and great apes or among humans is a feasible approach for inferring human evolutionary history. Recent advances in high-throughput or so-called 'next-generation' DNA sequencing technologies have enabled the sequencing of thousands of individual human genomes, as well as a variety of reference genomes of hominids, many of which are publicly available. These sequence data can help to unveil the detailed demographic history of the lineage leading to humans as well as the explosion of modern human population size in the last several thousand years. In addition, high-throughput sequencing illustrates the tempo and mode of de novo mutations, which are producing human genetic variation at this moment. Pedigree-based human genome sequencing has shown that mutation rates vary significantly across the human genome. These studies have also provided an improved timescale of human evolution, because the mutation rate estimated from pedigree analysis is half that estimated from traditional analyses based on molecular phylogeny. Because of the dramatic reduction in sequencing cost, sequencing on-demand samples designed for specific studies is now also becoming popular. To produce data of sufficient quality to meet the requirements of the study, it is necessary to set an explicit sequencing plan that includes the choice of sample collection methods, sequencing platforms, and number of sequence reads.
Nöth, Ulrike; Laufs, Helmut; Stoermer, Robert; Deichmann, Ralf
2012-03-01
To describe heating effects to be expected in simultaneous electroencephalography (EEG) and magnetic resonance imaging (MRI) when deviating from the EEG manufacturer's instructions; to test which anatomical MRI sequences have a sufficiently low specific absorption rate (SAR) to be performed with the EEG equipment in place; and to suggest precautions to reduce the risk of heating. Heating was determined in vivo below eight EEG electrodes, using both head and body coil transmission and sequences covering the whole range of SAR values. Head transmit coil: temperature increases were below 2.2°C for low SAR sequences, but reached 4.6°C (one subject, clavicle) for high SAR sequences; the equilibrium temperature T(eq) remained below 39°C. Body transmit coil: temperature increases were higher and more frequent over subjects and electrodes, with values below 2.6°C for low SAR sequences, reaching 6.9°C for high SAR sequences (T8 electrode) with T(eq) exceeding a critical level of 40°C. Anatomical imaging should be based on T1-weighted sequences (FLASH, MPRAGE, MDEFT) with an SAR below values for functional MRI sequences based on gradient echo planar imaging. Anatomical sequences with a high SAR can pose a significant risk, which is reduced by using head coil transmission. Copyright © 2011 Wiley-Liss, Inc.
Haplotype estimation using sequencing reads.
Delaneau, Olivier; Howie, Bryan; Cox, Anthony J; Zagury, Jean-François; Marchini, Jonathan
2013-10-03
High-throughput sequencing technologies produce short sequence reads that can contain phase information if they span two or more heterozygote genotypes. This information is not routinely used by current methods that infer haplotypes from genotype data. We have extended the SHAPEIT2 method to use phase-informative sequencing reads to improve phasing accuracy. Our model incorporates the read information in a probabilistic model through base quality scores within each read. The method is primarily designed for high-coverage sequence data or data sets that already have genotypes called. One important application is phasing of single samples sequenced at high coverage for use in medical sequencing and studies of rare diseases. Our method can also use existing panels of reference haplotypes. We tested the method by using a mother-father-child trio sequenced at high-coverage by Illumina together with the low-coverage sequence data from the 1000 Genomes Project (1000GP). We found that use of phase-informative reads increases the mean distance between switch errors by 22% from 274.4 kb to 328.6 kb. We also used male chromosome X haplotypes from the 1000GP samples to simulate sequencing reads with varying insert size, read length, and base error rate. When using short 100 bp paired-end reads, we found that using mixtures of insert sizes produced the best results. When using longer reads with high error rates (5-20 kb read with 4%-15% error per base), phasing performance was substantially improved. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
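The notion of a phase-informative read used in this abstract (a read spanning two or more heterozygous genotypes) can be illustrated with a minimal sketch. It is not the SHAPEIT2 model: it ignores base quality scores, paired-end inserts, and the probabilistic phasing itself, and the interval representation and function name are assumptions made for illustration.

```python
def phase_informative_reads(reads, het_positions):
    """Return reads that cover at least two heterozygous sites.

    reads: iterable of (start, end) coordinates, inclusive.
    het_positions: positions of heterozygous genotypes in one sample.
    """
    hets = sorted(het_positions)
    informative = []
    for start, end in reads:
        covered = [p for p in hets if start <= p <= end]
        if len(covered) >= 2:
            # Only these reads link alleles at different sites.
            informative.append(((start, end), covered))
    return informative
```

Reads covering a single heterozygous site say nothing about which alleles travel together, which is why longer reads or mixed insert sizes raise the fraction of informative reads.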
Quantum Markov chains, sufficiency of quantum channels, and Rényi information measures
NASA Astrophysics Data System (ADS)
Datta, Nilanjana; Wilde, Mark M.
2015-12-01
A short quantum Markov chain is a tripartite state $\rho_{ABC}$ such that system A can be recovered perfectly by acting on system C of the reduced state $\rho_{BC}$. Such states have conditional mutual information $I(A;B|C)$ equal to zero and are the only states with this property. A quantum channel $\mathcal{N}$ is sufficient for two states $\rho$ and $\sigma$ if there exists a recovery channel using which one can perfectly recover $\rho$ from $\mathcal{N}(\rho)$ and $\sigma$ from $\mathcal{N}(\sigma)$. The relative entropy difference $D(\rho \parallel \sigma) - D(\mathcal{N}(\rho) \parallel \mathcal{N}(\sigma))$ is equal to zero if and only if $\mathcal{N}$ is sufficient for $\rho$ and $\sigma$. In this paper, we show that these properties extend to Rényi generalizations of these information measures which were proposed in (Berta et al 2015 J. Math. Phys. 56 022205; Seshadreesan et al 2015 J. Phys. A: Math. Theor. 48 395303), thus providing an alternate characterization of short quantum Markov chains and sufficient quantum channels. These results give further support to these quantities as being legitimate Rényi generalizations of the conditional mutual information and the relative entropy difference. Along the way, we solve some open questions of Ruskai and Zhang, regarding the trace of particular matrices that arise in the study of monotonicity of relative entropy under quantum operations and strong subadditivity of the von Neumann entropy.
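For reference, the two quantities named in this abstract have the following standard definitions, written here with $S$ for the von Neumann entropy (a symbol not used in the abstract itself). The inequalities are strong subadditivity and monotonicity of relative entropy; the abstract concerns the cases where they hold with equality.

```latex
\begin{align}
I(A;B|C)_{\rho} &= S(AC)_{\rho} + S(BC)_{\rho} - S(C)_{\rho} - S(ABC)_{\rho} \;\ge\; 0, \\
D(\rho \parallel \sigma) &= \operatorname{Tr}\!\left[\rho\,(\log\rho - \log\sigma)\right], \\
D(\rho \parallel \sigma) - D(\mathcal{N}(\rho) \parallel \mathcal{N}(\sigma)) &\ge 0 .
\end{align}
```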
A public HTLV-1 molecular epidemiology database for sequence management and data mining.
Araujo, Thessika Hialla Almeida; Souza-Brito, Leandro Inacio; Libin, Pieter; Deforche, Koen; Edwards, Dustin; de Albuquerque-Junior, Antonio Eduardo; Vandamme, Anne-Mieke; Galvao-Castro, Bernardo; Alcantara, Luiz Carlos Junior
2012-01-01
It is estimated that 15 to 20 million people are infected with the human T-cell lymphotropic virus type 1 (HTLV-1). At present, there are more than 2,000 unique HTLV-1 isolate sequences published. A central database to aggregate sequence information from a range of epidemiological aspects including HTLV-1 infections, pathogenesis, origins, and evolutionary dynamics would be useful to scientists and physicians worldwide. Here we describe a database that collects and annotates sequence data and can be accessed through a user-friendly search interface. The HTLV-1 Molecular Epidemiology Database website is available at http://htlv1db.bahia.fiocruz.br/. All data was obtained from publications available at GenBank or through contact with the authors. The database was developed using Apache Webserver 2.1.6 and SGBD MySQL. The webpage interfaces were developed in HTML and server-side scripting written in PHP. The HTLV-1 Molecular Epidemiology Database is hosted on the Gonçalo Moniz/FIOCRUZ Research Center server. There are currently 2,457 registered sequences with 2,024 (82.37%) of those sequences representing unique isolates. Of these sequences, 803 (39.67%) contain information about clinical status (TSP/HAM, 17.19%; ATL, 7.41%; asymptomatic, 12.89%; other diseases, 2.17%; and no information, 60.32%). Further, 7.26% of sequences contain information on patient gender while 5.23% of sequences provide the age of the patient. The HTLV-1 Molecular Epidemiology Database retrieves and stores annotated HTLV-1 proviral sequences from clinical, epidemiological, and geographical studies. The collected sequences and related information are now accessible on a publicly available and user-friendly website. This open-access database will support clinical research and vaccine development related to viral genotype.
2010-01-01
Background Primer and probe sequences are the main components of nucleic acid-based detection systems. Biologists use primers and probes for different tasks, some related to the diagnosis and prescription of infectious diseases. The biological literature is the main information source for empirically validated primer and probe sequences. Therefore, it is becoming increasingly important for researchers to navigate this important information. In this paper, we present a four-phase method for extracting and annotating primer/probe sequences from the literature. These phases are: (1) convert each document into a tree of paper sections, (2) detect the candidate sequences using a set of finite state machine-based recognizers, (3) refine problem sequences using a rule-based expert system, and (4) annotate the extracted sequences with their related organism/gene information. Results We tested our approach using a test set composed of 297 manuscripts. The extracted sequences and their organism/gene annotations were manually evaluated by a panel of molecular biologists. The results of the evaluation show that our approach is suitable for automatically extracting DNA sequences, achieving precision/recall rates of 97.98% and 95.77%, respectively. In addition, 76.66% of the detected sequences were correctly annotated with their organism name. The system also provided correct gene-related information for 46.18% of the sequences assigned a correct organism name. Conclusions We believe that the proposed method can facilitate routine tasks for biomedical researchers using molecular methods to diagnose and prescribe different infectious diseases. In addition, the proposed method can be expanded to detect and extract other biological sequences from the literature. The extracted information can also be used to readily update available primer/probe databases or to create new databases from scratch. PMID:20682041
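Phase (2) of the method, detecting candidate sequences with finite-state recognizers, can be approximated with a regular expression, since a regular expression compiles to a finite-state machine. The sketch below is an assumption about what such a recognizer might look like, not the authors' implementation: the IUPAC alphabet, the 15-character minimum length, and the function name are illustrative choices.

```python
import re

# IUPAC nucleotide codes (assumed alphabet for primers and probes).
IUPAC = "ACGTURYKMSWBDHVN"


def find_candidate_sequences(text, min_len=15):
    """Return uppercase nucleotide runs of at least min_len characters
    that are not embedded in a longer alphabetic word."""
    pattern = re.compile(
        rf"(?<![A-Za-z])[{IUPAC}]{{{min_len},}}(?![A-Za-z])"
    )
    return pattern.findall(text)


if __name__ == "__main__":
    sample = ("Forward primer 5'-AGAGTTTGATCCTGGCTCAG-3' and probe "
              "TGCCAGCAGCCGCGGTAATAC were used; DNA was extracted.")
    print(find_candidate_sequences(sample))
```

A recognizer this simple would still need the later refinement and annotation phases described in the abstract, for example to rejoin sequences split across lines or to reject acronyms that happen to use nucleotide letters.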
Randomizer for High Data Rates
NASA Technical Reports Server (NTRS)
Garon, Howard; Sank, Victor J.
2018-01-01
NASA as well as a number of other space agencies now recognize that the current recommended CCSDS randomizer used for telemetry (TM) is too short. When multiple applications of the PN8 Maximal Length Sequence (MLS) are required in order to fully cover a channel access data unit (CADU), spectral problems in the form of elevated spurious discretes (spurs) appear. Originally the randomizer was called a bit transition generator (BTG) precisely because it was thought that its primary value was to ensure sufficient bit transitions to allow the bit/symbol synchronizer to lock and remain locked. We, NASA, have shown that the old BTG concept is a limited view of the real value of the randomizer sequence and that the randomizer also aids in signal acquisition as well as minimizing the potential for false decoder lock. Under the guidelines we considered here there are multiple maximal length sequences under GF(2) which appear attractive in this application. Although there may be mitigating reasons why another MLS sequence could be selected, one sequence in particular possesses a combination of desired properties which sets it apart from the others.
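The repetition problem described in this abstract can be made concrete with a small linear-feedback shift register sketch. It assumes the 8-bit polynomial h(x) = x^8 + x^7 + x^5 + x^3 + 1 with an all-ones seed, which is how the CCSDS telemetry randomizer is commonly specified; neither detail appears in the abstract, and the function names are illustrative.

```python
def pn8_bits(n_bits):
    """Generate n_bits of the assumed PN8 sequence, first bit first.

    Recurrence over GF(2): a[n+8] = a[n] ^ a[n+3] ^ a[n+5] ^ a[n+7],
    seeded with eight ones.
    """
    state = [1] * 8          # holds a[n] .. a[n+7]
    out = []
    for _ in range(n_bits):
        out.append(state[0])
        feedback = state[0] ^ state[3] ^ state[5] ^ state[7]
        state = state[1:] + [feedback]
    return out


def pack_bits(bits):
    """Pack a bit list (length a multiple of 8) into bytes, MSB first."""
    return bytes(
        int("".join(str(b) for b in bits[i:i + 8]), 2)
        for i in range(0, len(bits), 8)
    )


if __name__ == "__main__":
    print(pack_bits(pn8_bits(40)).hex())
    bits = pn8_bits(510)
    print(bits[:255] == bits[255:])
```

Because an 8-stage register can produce at most 255 distinct bits before repeating, any data unit longer than 255 bits is covered by several copies of the same pattern, which is the source of the periodicity the abstract associates with spurs.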
DOE Office of Scientific and Technical Information (OSTI.GOV)
Williams, L.E.; Detter, C.; Barrie, K.
2006-06-01
Sequencing of the large (>50 kb), low-copy-number (<5 per cell) plasmids that mediate horizontal gene transfer has been hindered by the difficulty and expense of isolating DNA from individual plasmids of this class. We report here that a kit method previously devised for purification of bacterial artificial chromosomes (BACs) can be adapted for effective preparation of individual plasmids up to 220 kb from wild gram-negative and gram-positive bacteria. Individual plasmid DNA recovered from less than 10 ml of Escherichia coli, Staphylococcus, and Corynebacterium cultures was of sufficient quantity and quality for construction of high-coverage libraries, as shown by sequencing five native plasmids ranging in size from 30 kb to 94 kb. We also report recommendations for vector screening to optimize plasmid sequence assembly, preliminary annotation of novel plasmid genomes, and insights on mobile genetic element biology derived from these sequences. Adaptation of this BAC method for large plasmid isolation removes one major technical hurdle to expanding our knowledge of the natural plasmid gene pool.
cWINNOWER algorithm for finding fuzzy DNA motifs
NASA Technical Reports Server (NTRS)
Liang, S.; Samanta, M. P.; Biegel, B. A.
2004-01-01
The cWINNOWER algorithm detects fuzzy motifs in DNA sequences rich in protein-binding signals. A signal is defined as any short nucleotide pattern having up to d mutations differing from a motif of length l. The algorithm finds such motifs if a clique consisting of a sufficiently large number of mutated copies of the motif (i.e., the signals) is present in the DNA sequence. The cWINNOWER algorithm substantially improves the sensitivity of the winnower method of Pevzner and Sze by imposing a consensus constraint, enabling it to detect much weaker signals. We studied the minimum detectable clique size qc as a function of sequence length N for random sequences. We found that qc increases linearly with N for a fast version of the algorithm based on counting three-member sub-cliques. Imposing consensus constraints reduces qc by a factor of three in this case, which makes the algorithm dramatically more sensitive. Our most sensitive algorithm, which counts four-member sub-cliques, needs a minimum of only 13 signals to detect motifs in a sequence of length N = 12,000 for (l, d) = (15, 4). Copyright Imperial College Press.
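The definition of a signal in this abstract (an l-mer within d mutations of the motif) implies that any two signals of the same motif differ in at most 2d positions, which is what makes a clique search possible without knowing the motif. The sketch below illustrates only that property on a toy sequence; it is not the cWINNOWER algorithm, it omits the consensus constraint and sub-clique counting, and the graph construction and function names are simplifying assumptions.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def lmers(seq, l):
    """All (start, l-mer) pairs in seq."""
    return [(i, seq[i:i + l]) for i in range(len(seq) - l + 1)]


def find_signals(seq, motif, d):
    """Start positions of l-mers within d mismatches of a known motif."""
    return [i for i, w in lmers(seq, len(motif)) if hamming(w, motif) <= d]


def similarity_edges(seq, l, d):
    """Pairs of non-overlapping l-mers that differ in at most 2*d positions.

    Signals of one (l, d) motif are pairwise connected in this graph,
    so they form a clique that can be searched for without the motif.
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(lmers(seq, l), 2)
        if j - i >= l and hamming(a, b) <= 2 * d
    ]
```

In random sequence many spurious edges also appear, which is why the abstract reports a minimum detectable clique size that grows with sequence length.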
Method for performing site-specific affinity fractionation for use in DNA sequencing
Mirzabekov, Andrei Darievich; Lysov, Yuri Petrovich; Dubley, Svetlana A.
1999-01-01
A method for fractionating and sequencing DNA via affinity interaction is provided comprising contacting cleaved DNA to a first array of oligonucleotide molecules to facilitate hybridization between said cleaved DNA and the molecules; extracting the hybridized DNA from the molecules; contacting said extracted hybridized DNA with a second array of oligonucleotide molecules, wherein the oligonucleotide molecules in the second array have specified base sequences that are complementary to said extracted hybridized DNA; and attaching labeled DNA to the second array of oligonucleotide molecules, wherein the labeled re-hybridized DNA have sequences that are complementary to the oligomers. The invention further provides a method for performing multi-step conversions of the chemical structure of compounds comprising supplying an array of polyacrylamide vessels separated by hydrophobic surfaces; immobilizing a plurality of reactants, such as enzymes, in the vessels so that each vessel contains one reactant; contacting the compounds to each of the vessels in a predetermined sequence and for a sufficient time to convert the compounds to a desired state; and isolating the converted compounds from said array.
cWINNOWER Algorithm for Finding Fuzzy DNA Motifs
NASA Technical Reports Server (NTRS)
Liang, Shoudan
2003-01-01
The cWINNOWER algorithm detects fuzzy motifs in DNA sequences rich in protein-binding signals. A signal is defined as any short nucleotide pattern having up to d mutations differing from a motif of length l. The algorithm finds such motifs if multiple mutated copies of the motif (i.e., the signals) are present in the DNA sequence in sufficient abundance. The cWINNOWER algorithm substantially improves the sensitivity of the winnower method of Pevzner and Sze by imposing a consensus constraint, enabling it to detect much weaker signals. We studied the minimum number of detectable motifs qc as a function of sequence length N for random sequences. We found that qc increases linearly with N for a fast version of the algorithm based on counting three-member sub-cliques. Imposing consensus constraints reduces qc by a factor of three in this case, which makes the algorithm dramatically more sensitive. Our most sensitive algorithm, which counts four-member sub-cliques, needs a minimum of only 13 signals to detect motifs in a sequence of length N = 12000 for (l,d) = (15,4).
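The signal definition in this abstract (a length-l pattern within d mutations of a motif) can be illustrated with a Hamming-distance check. The sketch below is not the cWINNOWER algorithm itself; it only shows what counts as a signal when the motif is known, plus the pairwise test that a graph-based method can apply when the motif is unknown (two signals of the same motif differ in at most 2d positions). The function names and the toy sequence are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_signals(sequence: str, motif: str, d: int) -> list[int]:
    """Start positions of every window within d mismatches of the motif."""
    l = len(motif)
    return [
        i
        for i in range(len(sequence) - l + 1)
        if hamming(sequence[i:i + l], motif) <= d
    ]


def could_share_motif(a: str, b: str, d: int) -> bool:
    """Two signals of one motif, each with up to d mutations,
    can differ from each other in at most 2 * d positions."""
    return hamming(a, b) <= 2 * d


# Toy example: motif ACGT with at most one mutation allowed.
positions = find_signals("ACGTACGAACGT", "ACGT", 1)
```

In this toy sequence the windows at positions 0 and 8 match the motif exactly and the window at position 4 (ACGA) carries one mutation, so all three count as signals.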
Mirzabekov, Andrei Darievich; Lysov, Yuri Petrovich; Dubley, Svetlana A.
2000-01-01
A method for fractionating and sequencing DNA via affinity interaction is provided comprising contacting cleaved DNA to a first array of oligonucleotide molecules to facilitate hybridization between said cleaved DNA and the molecules; extracting the hybridized DNA from the molecules; contacting said extracted hybridized DNA with a second array of oligonucleotide molecules, wherein the oligonucleotide molecules in the second array have specified base sequences that are complementary to said extracted hybridized DNA; and attaching labeled DNA to the second array of oligonucleotide molecules, wherein the labeled re-hybridized DNA have sequences that are complementary to the oligomers. The invention further provides a method for performing multi-step conversions of the chemical structure of compounds comprising supplying an array of polyacrylamide vessels separated by hydrophobic surfaces; immobilizing a plurality of reactants, such as enzymes, in the vessels so that each vessel contains one reactant; contacting the compounds to each of the vessels in a predetermined sequence and for a sufficient time to convert the compounds to a desired state; and isolating the converted compounds from said array.
Resolving the Complexity of Human Skin Metagenomes Using Single-Molecule Sequencing
Tsai, Yu-Chih; Deming, Clayton; Segre, Julia A.; Kong, Heidi H.; Korlach, Jonas
2016-01-01
Deep metagenomic shotgun sequencing has emerged as a powerful tool to interrogate composition and function of complex microbial communities. Computational approaches to assemble genome fragments have been demonstrated to be an effective tool for de novo reconstruction of genomes from these communities. However, the resultant “genomes” are typically fragmented and incomplete due to the limited ability of short-read sequence data to assemble complex or low-coverage regions. Here, we use single-molecule, real-time (SMRT) sequencing to reconstruct a high-quality, closed genome of a previously uncharacterized Corynebacterium simulans and its companion bacteriophage from a skin metagenomic sample. Considerable improvement in assembly quality occurs in hybrid approaches incorporating short-read data, with even relatively small amounts of long-read data being sufficient to improve metagenome reconstruction. Using short-read data to evaluate strain variation of this C. simulans in its skin community at single-nucleotide resolution, we observed a dominant C. simulans strain with moderate allelic heterozygosity throughout the population. We demonstrate the utility of SMRT sequencing and hybrid approaches in metagenome quantitation, reconstruction, and annotation. PMID:26861018
Ikram, Najmul; Qadir, Muhammad Abdul; Afzal, Muhammad Tanvir
2018-01-01
Sequence similarity is a commonly used measure to compare proteins. With the increasing use of ontologies, semantic (function) similarity is gaining importance. The correlation between these measures has been applied in the evaluation of new semantic similarity methods, and in protein function prediction. In this research, we investigate the relationship between the two similarity methods. The results suggest the absence of a strong correlation between sequence and semantic similarities. There is a large number of proteins with low sequence similarity and high semantic similarity. We observe that Pearson's correlation coefficient is not sufficient to explain the nature of this relationship. Interestingly, the term semantic similarity values above 0 and below 1 do not seem to play a role in improving the correlation. That is, the correlation coefficient depends only on the number of common GO terms in proteins under comparison, and the semantic similarity measurement method does not influence it. Semantic similarity and sequence similarity behave distinctly. These findings have significant implications for future work on protein comparison, and will help to better understand the semantic similarity between proteins.
Method for performing site-specific affinity fractionation for use in DNA sequencing
Mirzabekov, A.D.; Lysov, Y.P.; Dubley, S.A.
1999-05-18
A method for fractionating and sequencing DNA via affinity interaction is provided comprising contacting cleaved DNA to a first array of oligonucleotide molecules to facilitate hybridization between the cleaved DNA and the molecules; extracting the hybridized DNA from the molecules; contacting the extracted hybridized DNA with a second array of oligonucleotide molecules, wherein the oligonucleotide molecules in the second array have specified base sequences that are complementary to the extracted hybridized DNA; and attaching labeled DNA to the second array of oligonucleotide molecules, wherein the labeled re-hybridized DNA have sequences that are complementary to the oligomers. The invention further provides a method for performing multi-step conversions of the chemical structure of compounds comprising supplying an array of polyacrylamide vessels separated by hydrophobic surfaces; immobilizing a plurality of reactants, such as enzymes, in the vessels so that each vessel contains one reactant; contacting the compounds to each of the vessels in a predetermined sequence and for a sufficient time to convert the compounds to a desired state; and isolating the converted compounds from the array. 14 figs.
A Method for Preparing DNA Sequencing Templates Using a DNA-Binding Microplate
Yang, Yu; Hebron, Haroun R.; Hang, Jun
2009-01-01
A DNA-binding matrix was immobilized on the surface of a 96-well microplate and used for plasmid DNA preparation for DNA sequencing. The same DNA-binding plate was used for bacterial growth, cell lysis, DNA purification, and storage. In a single step using one buffer, bacterial cells were lysed by enzymes, and released DNA was captured on the plate simultaneously. After two wash steps, DNA was eluted and stored in the same plate. Inclusion of phosphates in the culture medium was found to enhance the yield of plasmid significantly. Purified DNA samples were used successfully in DNA sequencing with high consistency and reproducibility. Eleven vectors and nine libraries were tested using this method. In 10 μl sequencing reactions using 3 μl sample and 0.25 μl BigDye Terminator v3.1, the results from a 3730xl sequencer gave a success rate of 90–95% and read-lengths of 700 bases or more. The method is fully automatable and convenient for manual operation as well. It enables reproducible, high-throughput, rapid production of DNA with purity and yields sufficient for high-quality DNA sequencing at a substantially reduced cost. PMID:19568455
Discovery sequence and the nature of low permeability gas accumulations
Attanasi, E.D.
2005-01-01
There is an ongoing discussion regarding the geologic nature of accumulations that host gas in low-permeability sandstone environments. This note examines the discovery sequence of the accumulations in low permeability sandstone plays that were classified as continuous-type by the U.S. Geological Survey for the 1995 National Oil and Gas Assessment. It compares the statistical character of historical discovery sequences of accumulations associated with continuous-type sandstone gas plays to those of conventional plays. The seven sandstone plays with sufficient data exhibit declining size with sequence order, on average, and in three of the seven the trend is statistically significant. Simulation experiments show that both a skewed endowment size distribution and a discovery process that mimics sampling proportional to size are necessary to generate a discovery sequence that consistently produces a statistically significant negative size order relationship. The empirical findings suggest that discovery sequence could be used to constrain assessed gas in untested areas. The plays examined represent 134 of the 265 trillion cubic feet of recoverable gas assessed in undeveloped areas of continuous-type gas plays in low permeability sandstone environments reported in the 1995 National Assessment. © 2005 International Association for Mathematical Geology.
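The simulation idea described in this abstract (a skewed size distribution combined with discovery that mimics sampling proportional to size) can be sketched as follows. This is an illustrative toy, not the author's experiment: the lognormal endowment, its parameters, the number of accumulations, and the function names are all assumptions.

```python
import random


def simulate_discovery(sizes, rng):
    """Discover accumulations one at a time, without replacement,
    each draw having probability proportional to accumulation size."""
    remaining = list(sizes)
    order = []
    while remaining:
        idx = rng.choices(range(len(remaining)), weights=remaining, k=1)[0]
        order.append(remaining.pop(idx))
    return order


def mean_half_difference(n_fields=50, n_runs=200, seed=0):
    """Average (mean size of first half of discoveries) minus
    (mean size of second half) over many simulated plays."""
    rng = random.Random(seed)
    total = 0.0
    half = n_fields // 2
    for _ in range(n_runs):
        # Assumed skewed endowment: lognormal sizes (hypothetical parameters).
        sizes = [rng.lognormvariate(0.0, 1.5) for _ in range(n_fields)]
        order = simulate_discovery(sizes, rng)
        early = sum(order[:half]) / half
        late = sum(order[half:]) / (n_fields - half)
        total += early - late
    return total / n_runs


difference = mean_half_difference()
```

A positive `difference` means that, on average, earlier discoveries are larger than later ones, which is the declining size-order relationship the abstract refers to.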
Analysis of the cytochrome c oxidase subunit II (COX2) gene in giant panda, Ailuropoda melanoleuca.
Ling, S S; Zhu, Y; Lan, D; Li, D S; Pang, H Z; Wang, Y; Li, D Y; Wei, R P; Zhang, H M; Wang, C D; Hu, Y D
2017-01-23
The giant panda, Ailuropoda melanoleuca (Ursidae), has a unique bamboo-based diet; however, this low-energy intake has been sufficient to maintain the metabolic processes of this species since the fourth ice age. As mitochondria are the main sites for energy metabolism in animals, the protein-coding genes involved in mitochondrial respiratory chains, particularly cytochrome c oxidase subunit II (COX2), which is the rate-limiting enzyme in electron transfer, could play an important role in giant panda metabolism. Therefore, the present study aimed to isolate, sequence, and analyze the COX2 DNA from individuals kept at the Giant Panda Protection and Research Center, China, and compare these sequences with those of the other Ursidae family members. Multiple sequence alignment showed that the COX2 gene had three point mutations that defined three haplotypes, with 60% of the sequences corresponding to haplotype I. The neutrality tests revealed that the COX2 gene was conserved throughout evolution, and the maximum likelihood phylogenetic analysis, using homologous sequences from other Ursidae species, showed clustering of the COX2 sequences of giant pandas, suggesting that this gene evolved differently in them.
[Learning and Repetive Reproduction of Memorized Sequences by the Right and the Left Hand].
Bobrova, E V; Lyakhovetskii, V A; Bogacheva, I N
2015-01-01
An important stage of learning a new skill is repetitive reproduction of one and the same sequence of movements, which plays a significant role in forming movement stereotypes. Two groups of right-handers repeatedly memorized (6-10 repetitions) sequences of transitions of their hand, guided by the experimenter, among 6 positions, first with the right hand (RH) and then with the left hand (LH), or vice versa. Random sequences previously unknown to the volunteers were reproduced in the 11 series. Modified sequences were tested in the 2nd and 3rd series, where the same elements' positions were presented in different order. The processes of repetitive sequence reproduction were similar for RH and LH. However, the learning of the modified sequences differed: information about the elements' positions, disregarding the reproduction order, was used only when the LH initiated task performance. This information was not used when the LH followed the RH or when the RH performed the task. Consequently, the type of information coding activated by the LH helped in learning the positions of sequence elements, while the type of information coding activated by the RH prevented learning. This is presumably connected with the predominant role of the right hemisphere in the processes of positional coding and motor learning.
On a new class of completely integrable nonlinear wave equations. II. Multi-Hamiltonian structure
NASA Astrophysics Data System (ADS)
Nutku, Y.
1987-11-01
The multi-Hamiltonian structure of a class of nonlinear wave equations governing the propagation of finite amplitude waves is discussed. Infinitely many conservation laws had earlier been obtained for these equations. Starting from a (primary) Hamiltonian formulation of these equations the necessary and sufficient conditions for the existence of bi-Hamiltonian structure are obtained and it is shown that the second Hamiltonian operator can be constructed solely through a knowledge of the first Hamiltonian function. The recursion operator which first appears at the level of bi-Hamiltonian structure gives rise to an infinite sequence of conserved Hamiltonians. It is found that in general there exist two different infinite sequences of conserved quantities for these equations. The recursion relation defining higher Hamiltonian structures enables one to obtain the necessary and sufficient conditions for the existence of the (k+1)st Hamiltonian operator which depends on the kth Hamiltonian function. The infinite sequence of conserved Hamiltonians are common to all the higher Hamiltonian structures. The equations of gas dynamics are discussed as an illustration of this formalism and it is shown that in general they admit tri-Hamiltonian structure with two distinct infinite sets of conserved quantities. The isothermal case of γ=1 is an exceptional one that requires separate treatment. This corresponds to a specialization of the equations governing the expansion of plasma into vacuum which will be shown to be equivalent to Poisson's equation in nonlinear acoustics.
Computing camera heading: A study
NASA Astrophysics Data System (ADS)
Zhang, John Jiaxiang
2000-08-01
An accurate estimate of the motion of a camera is a crucial first step for the 3D reconstruction of sites, objects, and buildings from video. Solutions to the camera heading problem can be readily applied to many areas, such as robotic navigation, surgical operation, video special effects, multimedia, and lately even in internet commerce. From image sequences of a real world scene, the problem is to calculate the directions of the camera translations. The presence of rotations makes this problem very hard. This is because rotations and translations can have similar effects on the images, and are thus hard to tell apart. However, the visual angles between the projection rays of point pairs are unaffected by rotations, and their changes over time contain sufficient information to determine the direction of camera translation. We developed a new formulation of the visual angle disparity approach, first introduced by Tomasi, to the camera heading problem. Our new derivation makes theoretical analysis possible. Most notably, a theorem is obtained that locates all possible singularities of the residual function for the underlying optimization problem. This makes it possible to identify all computational trouble spots beforehand and to design reliable and accurate optimization methods. A bootstrap-jackknife resampling method simultaneously reduces complexity and tolerates outliers well. Experiments with image sequences show accurate results when compared with the true camera motion as measured with mechanical devices.
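The key observation in this abstract, that the visual angle between the projection rays of two points is unaffected by camera rotation but does change under translation, can be checked numerically. The sketch below is only a demonstration of that invariance, not the thesis's heading estimator; the point coordinates, rotation angles, and function names are hypothetical.

```python
import math


def rot_z(v, a):
    """Rotate a 3D vector about the z axis by angle a (radians)."""
    x, y, z = v
    return (x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a), z)


def rot_x(v, a):
    """Rotate a 3D vector about the x axis by angle a (radians)."""
    x, y, z = v
    return (x, y * math.cos(a) - z * math.sin(a), y * math.sin(a) + z * math.cos(a))


def angle_between(u, v):
    """Visual angle (radians) between two projection rays."""
    dot = sum(p * q for p, q in zip(u, v))
    norm_u = math.sqrt(sum(p * p for p in u))
    norm_v = math.sqrt(sum(q * q for q in v))
    cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos_angle)


def ray(point, camera):
    """Projection ray from the camera centre to a scene point."""
    return tuple(p - c for p, c in zip(point, camera))


p1, p2 = (1.0, 0.0, 5.0), (-1.0, 0.5, 4.0)  # toy scene points
origin = (0.0, 0.0, 0.0)

base = angle_between(ray(p1, origin), ray(p2, origin))

# Pure rotation: apply the same rotation to both rays.
r1 = rot_x(rot_z(ray(p1, origin), 0.3), -0.7)
r2 = rot_x(rot_z(ray(p2, origin), 0.3), -0.7)
after_rotation = angle_between(r1, r2)

# Pure translation: move the camera centre sideways.
moved = (0.5, 0.0, 0.0)
after_translation = angle_between(ray(p1, moved), ray(p2, moved))
```

Here `after_rotation` agrees with `base` up to floating-point error, while `after_translation` differs, which is why changes in visual angles over time carry information about translation only.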
TopHat: discovering splice junctions with RNA-Seq
Trapnell, Cole; Pachter, Lior; Salzberg, Steven L.
2009-01-01
Motivation: A new protocol for sequencing the messenger RNA in a cell, known as RNA-Seq, generates millions of short sequence fragments in a single run. These fragments, or ‘reads’, can be used to measure levels of gene expression and to identify novel splice variants of genes. However, current software for aligning RNA-Seq data to a genome relies on known splice junctions and cannot identify novel ones. TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites. Results: We mapped the RNA-Seq reads from a recent mammalian RNA-Seq experiment and recovered more than 72% of the splice junctions reported by the annotation-based software from that study, along with nearly 20 000 previously unreported junctions. The TopHat pipeline is much faster than previous systems, mapping nearly 2.2 million reads per CPU hour, which is sufficient to process an entire RNA-Seq experiment in less than a day on a standard desktop computer. We describe several challenges unique to ab initio splice site discovery from RNA-Seq reads that will require further algorithm development. Availability: TopHat is free, open-source software available from http://tophat.cbcb.umd.edu Contact: cole@cs.umd.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19289445
Kashiwagi, Tom; Maxwell, Elisabeth A; Marshall, Andrea D; Christensen, Ana B
2015-01-01
Sharks and rays are increasingly being identified as high-risk species for extinction, prompting urgent assessments of their local or regional populations. Advanced genetic analyses can contribute relevant information on effective population size and connectivity among populations, although acquiring sufficient regional sample sizes can be challenging. DNA is typically amplified from tissue samples, which are collected by hand spears with modified biopsy punch tips. This technique is not always popular, mainly because of a perception that invasive sampling might harm the rays, change their behaviour, or have a negative impact on tourism. To explore alternative methods, we evaluated the yields and PCR success of DNA template prepared from manta ray mucus collected underwater and captured and stored on a Whatman FTA™ Elute card. The pilot study demonstrated that mucus can be effectively collected underwater using a toothbrush. DNA stored on cards was found to be reliable for PCR-based population genetics studies. We successfully amplified mtDNA ND5, nuclear DNA RAG1, and microsatellite loci for all samples and confirmed that the sequences and genotypes were those of the target species. As the yields of DNA with the tested method were low, further improvements are desirable for assays that may require larger amounts of DNA, such as population genomic studies using emerging next-gen sequencing.
A Secure Alignment Algorithm for Mapping Short Reads to Human Genome.
Zhao, Yongan; Wang, Xiaofeng; Tang, Haixu
2018-05-09
Elastic and inexpensive computing resources such as clouds have been recognized as a useful solution for analyzing massive human genomic data (e.g., acquired by using next-generation sequencers) in biomedical research. However, outsourcing human genome computation to public or commercial clouds has been hindered by privacy concerns: even a small number of human genome sequences contain sufficient information for identifying the donor of the genomic data. This issue cannot be directly addressed by existing security and cryptographic techniques (such as homomorphic encryption), because they are too heavyweight to carry out practical genome computation tasks on massive data. In this article, we present a secure algorithm to accomplish read mapping, one of the most basic tasks in human genomic data analysis, based on a hybrid cloud computing model. Compared with existing approaches, our algorithm delegates most computation to the public cloud, while only performing encryption and decryption on the private cloud, and thus makes maximum use of the computing resources of the public cloud. Furthermore, our algorithm reports results similar to those of nonsecure read mapping algorithms, including the alignment between reads and the reference genome, which can be directly used in downstream analyses such as the inference of genomic variations. We implemented the algorithm in C++ and Python on a hybrid cloud system, in which the public cloud uses an Apache Spark system.
Vinatzer, Boris A; Weisberg, Alexandra J; Monteil, Caroline L; Elmarakeby, Haitham A; Sheppard, Samuel K; Heath, Lenwood S
2017-01-01
Taxonomy of plant pathogenic bacteria is challenging because pathogens of different crops often belong to the same named species but current taxonomy does not provide names for bacteria below the subspecies level. The introduction of the host range-based pathovar system in the 1980s provided a temporary solution to this problem but has many limitations. The affordability of genome sequencing now provides the opportunity for developing a new genome-based taxonomic framework. We already proposed to name individual bacterial isolates based on pairwise genome similarity. Here, we expand on this idea and propose to use genome similarity-based codes, which we now call life identification numbers (LINs), to describe and name bacterial taxa. Using 93 genomes of Pseudomonas syringae sensu lato, LINs were compared with a P. syringae genome tree whereby the assigned LINs were found to be informative of a majority of phylogenetic relationships. LINs also reflected host range and outbreak association for strains of P. syringae pathovar actinidiae, a pathovar for which many genome sequences are available. We conclude that LINs could provide the basis for a new taxonomic framework to address the shortcomings of the current pathovar system and to complement the current taxonomic system of bacteria in general.
Maxwell, Elisabeth A.; Marshall, Andrea D.; Christensen, Ana B.
2015-01-01
Sharks and rays are increasingly being identified as high-risk species for extinction, prompting urgent assessments of their local or regional populations. Advanced genetic analyses can contribute relevant information on effective population size and connectivity among populations, although acquiring sufficient regional sample sizes can be challenging. DNA is typically amplified from tissue samples, which are collected by hand spears with modified biopsy punch tips. This technique is not always popular, mainly because of a perception that invasive sampling might harm the rays, change their behaviour, or have a negative impact on tourism. To explore alternative methods, we evaluated the yields and PCR success of DNA template prepared from manta ray mucus collected underwater and captured and stored on a Whatman FTA™ Elute card. The pilot study demonstrated that mucus can be effectively collected underwater using a toothbrush. DNA stored on cards was found to be reliable for PCR-based population genetics studies. We successfully amplified mtDNA ND5, nuclear DNA RAG1, and microsatellite loci for all samples and confirmed that the sequences and genotypes were those of the target species. As the yields of DNA with the tested method were low, further improvements are desirable for assays that may require larger amounts of DNA, such as population genomic studies using emerging next-gen sequencing. PMID:26413431
Cvicek, Vaclav; Goddard, William A.; Abrol, Ravinder
2016-01-01
The understanding of G-protein coupled receptors (GPCRs) is undergoing a revolution due to increased information about their signaling and the experimental determination of structures for more than 25 receptors. The availability of at least one receptor structure for each of the GPCR classes, well separated in sequence space, enables an integrated superfamily-wide analysis to identify signatures involving the role of conserved residues, conserved contacts, and downstream signaling in the context of receptor structures. In this study, we align the transmembrane (TM) domains of all experimental GPCR structures to maximize the conserved inter-helical contacts. The resulting superfamily-wide GpcR Sequence-Structure (GRoSS) alignment of the TM domains for all human GPCR sequences is sufficient to generate a phylogenetic tree that correctly distinguishes all different GPCR classes, suggesting that the class-level differences in the GPCR superfamily are encoded at least partly in the TM domains. The inter-helical contacts conserved across all GPCR classes describe the evolutionarily conserved GPCR structural fold. The corresponding structural alignment of the inactive and active conformations, available for a few GPCRs, identifies activation hot-spot residues in the TM domains that get rewired upon activation. Many GPCR mutations, known to alter receptor signaling and cause disease, are located at these conserved contact and activation hot-spot residue positions. The GRoSS alignment places the chemosensory receptor subfamilies for bitter taste (TAS2R) and pheromones (Vomeronasal, VN1R) in the rhodopsin family, known to contain the chemosensory olfactory receptor subfamily. The GRoSS alignment also enables the quantification of the structural variability in the TM regions of experimental structures, useful for homology modeling and structure prediction of receptors. Furthermore, this alignment identifies structurally and functionally important residues in all human GPCRs. These residues can be used to make testable hypotheses about the structural basis of receptor function and about the molecular basis of disease-associated single nucleotide polymorphisms. PMID:27028541
30 CFR 780.22 - Geologic information.
Code of Federal Regulations, 2014 CFR
2014-07-01
... the collection and analysis of such data is unnecessary because other equivalent information is... 30 Mineral Resources 3 2014-07-01 2014-07-01 false Geologic information. 780.22 Section 780.22... Geologic information. (a) General. Each application shall include geologic information in sufficient detail...
A computer simulation experiment of supervisory control of remote manipulation. M.S. Thesis
NASA Technical Reports Server (NTRS)
Mccandlish, S. G.
1966-01-01
A computer simulation of a remote manipulation task and a rate-controlled manipulator is described. Some low-level automatic decision making ability which could be used at the operator's discretion to augment his direct continuous control was built into the manipulator. Experiments were made on the effect of transmission delay, dynamic lag, and intermittent vision on human manipulative ability. Delay does not make remote manipulation impossible. Intermittent visual feedback, and the absence of rate information in the display presented to the operator do not seem to impair the operator's performance. A small-capacity visual feedback channel may be sufficient for remote manipulation tasks, or one channel might be time-shared between several operators. In other experiments the operator called in sequence various on-site automatic control programs of the machine, and thereby acted as a supervisor. The supervisory mode of operation has some advantages when the task to be performed is difficult for a human controlling directly.
Ribosomal synthesis and folding of peptide-helical aromatic foldamer hybrids
NASA Astrophysics Data System (ADS)
Rogers, Joseph M.; Kwon, Sunbum; Dawson, Simon J.; Mandal, Pradeep K.; Suga, Hiroaki; Huc, Ivan
2018-03-01
Translation, the mRNA-templated synthesis of peptides by the ribosome, can be manipulated to incorporate variants of the 20 cognate amino acids. Such approaches for expanding the range of chemical entities that can be produced by the ribosome may accelerate the discovery of molecules that can perform functions for which poorly folded, short peptidic sequences are ill suited. Here, we show that the ribosome tolerates some artificial helical aromatic oligomers, so-called foldamers. Using a flexible tRNA-acylation ribozyme—flexizyme—foldamers were attached to tRNA, and the resulting acylated tRNAs were delivered to the ribosome to initiate the synthesis of non-cyclic and cyclic foldamer-peptide hybrid molecules. Passing through the ribosome exit tunnel requires the foldamers to unfold. Yet foldamers encode sufficient folding information to influence the peptide structure once translation is completed. We also show that in cyclic hybrids, the foldamer portion can fold into a helix and force the peptide segment to adopt a constrained and stretched conformation.
DNA extraction and amplification from contemporary Polynesian bark-cloth.
Moncada, Ximena; Payacán, Claudia; Arriaza, Francisco; Lobos, Sergio; Seelenfreund, Daniela; Seelenfreund, Andrea
2013-01-01
Paper mulberry has been used for thousands of years in Asia and Oceania for making paper and bark-cloth, respectively. Museums around the world hold valuable collections of Polynesian bark-cloth. Genetic analysis of the plant fibers from which the textiles were made may answer a number of questions of interest related to provenance, authenticity or species used in the manufacture of these textiles. Recovery of nucleic acids from paper mulberry bark-cloth has not been reported before. We describe a simple method for the extraction of PCR-amplifiable DNA from small samples of contemporary Polynesian bark-cloth (tapa) using two types of nuclear markers. We report the amplification of about 300 bp sequences of the ITS1 region and of a microsatellite marker. Sufficient DNA was retrieved from all bark-cloth samples to permit successful PCR amplification. This method shows a means of obtaining useful genetic information from modern bark-cloth samples and opens perspectives for the analyses of small fragments derived from ethnographic materials.
An entangled-LED-driven quantum relay over 1 km
NASA Astrophysics Data System (ADS)
Varnava, Christiana; Stevenson, R. Mark; Nilsson, Jonas; Skiba-Szymanska, Joanna; Dzurňák, Branislav; Lucamarini, Marco; Penty, Richard V.; Farrer, Ian; Ritchie, David A.; Shields, Andrew J.
2016-03-01
Quantum cryptography allows confidential information to be communicated between two parties, with secrecy guaranteed by the laws of nature alone. However, upholding guaranteed secrecy over networks poses a further challenge, as classical receive-and-resend routing nodes can only be used conditional on trust by the communicating parties, which arguably diminishes the value of the underlying quantum cryptography. Quantum relays offer a potential solution by teleporting qubits from a sender to a receiver, without demanding additional trust from end users. Here we demonstrate the operation of a quantum relay over 1 km of optical fibre, which teleports a sequence of photonic quantum bits to a receiver by utilising entangled photons emitted by a semiconductor light-emitting diode. The average relay fidelity of the link is 0.90±0.03, exceeding the classical bound of 0.75 for the set of states used, and sufficiently high to allow error correction. The fundamentally low multiphoton emission statistics and the integration potential of the source present an appealing platform for future quantum networks.
DNA Extraction and Amplification from Contemporary Polynesian Bark-Cloth
Moncada, Ximena; Payacán, Claudia; Arriaza, Francisco; Lobos, Sergio; Seelenfreund, Daniela; Seelenfreund, Andrea
2013-01-01
Background: Paper mulberry has been used for thousands of years in Asia and Oceania for making paper and bark-cloth, respectively. Museums around the world hold valuable collections of Polynesian bark-cloth. Genetic analysis of the plant fibers from which the textiles were made may answer a number of questions of interest related to provenance, authenticity or species used in the manufacture of these textiles. Recovery of nucleic acids from paper mulberry bark-cloth has not been reported before. Methodology: We describe a simple method for the extraction of PCR-amplifiable DNA from small samples of contemporary Polynesian bark-cloth (tapa) using two types of nuclear markers. We report the amplification of about 300 bp sequences of the ITS1 region and of a microsatellite marker. Conclusions: Sufficient DNA was retrieved from all bark-cloth samples to permit successful PCR amplification. This method shows a means of obtaining useful genetic information from modern bark-cloth samples and opens perspectives for the analyses of small fragments derived from ethnographic materials. PMID:23437166
Modeling genome coverage in single-cell sequencing
Daley, Timothy; Smith, Andrew D.
2014-01-01
Motivation: Single-cell DNA sequencing is necessary for examining genetic variation at the cellular level, which remains hidden in bulk sequencing experiments. However, because such experiments begin with very small amounts of starting material, the amount of information obtained from a single-cell sequencing experiment is highly sensitive to the choice of protocol employed and to variability in library preparation. In particular, the fraction of the genome represented in single-cell sequencing libraries exhibits extreme variability due to quantitative biases in amplification and loss of genetic material. Results: We propose a method to predict the genome coverage of a deep sequencing experiment using information from an initial shallow sequencing experiment mapped to a reference genome. The observed coverage statistics are used in a non-parametric empirical Bayes Poisson model to estimate the gain in coverage from deeper sequencing. This approach allows researchers to know statistical features of deep sequencing experiments without actually sequencing deeply, providing a basis for optimizing and comparing single-cell sequencing protocols or screening libraries. Availability and implementation: The method is available as part of the preseq software package. Source code is available at http://smithlabresearch.org/preseq. Contact: andrewds@usc.edu Supplementary information: Supplementary material is available at Bioinformatics online. PMID:25107873
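As a rough illustration of how coverage gain can be predicted from a shallow run, the sketch below applies the classical Good-Toulmin estimator to a histogram of per-bin read counts. This is an assumption made for illustration and is not the preseq implementation; the function name, the input format and the toy histogram are hypothetical, and the plain alternating series is only reliable when the additional sequencing effort t is no larger than the initial experiment (t ≤ 1).

```python
def good_toulmin_new_bins(count_histogram, t):
    """Expected number of newly covered bins after sequencing t times more reads.

    count_histogram maps j -> number of genomic bins covered by exactly j reads
    in the initial shallow experiment (hypothetical input format).
    """
    if not 0 <= t <= 1:
        raise ValueError("the plain Good-Toulmin series is only stable for 0 <= t <= 1")
    # Alternating series: sum over j of (-1)^(j+1) * t^j * n_j
    return sum((-1) ** (j + 1) * (t ** j) * n_j for j, n_j in count_histogram.items())


# Toy histogram: 10 bins seen once, 3 bins seen twice.
shallow = {1: 10, 2: 3}
gain_if_doubled = good_toulmin_new_bins(shallow, 1.0)
gain_if_half_more = good_toulmin_new_bins(shallow, 0.5)
```

Extrapolating further than a doubling of the data requires a stabilised form of the series, which this sketch does not attempt.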
Quero, Sara; García-Núñez, Marian; Párraga-Niño, Noemí; Barrabeig, Irene; Pedro-Botet, Maria L; de Simon, Mercè; Sopena, Nieves; Sabrià, Miquel
2016-06-01
To compare the discriminatory power of pulsed-field gel electrophoresis (PFGE) and sequence-based typing (SBT) in Legionella outbreaks for determining the infection source. Twenty-five investigations of Legionnaires' disease were analyzed by PFGE, SBT and Dresden monoclonal antibody. The results suggested that monoclonal antibody could reduce the number of Legionella isolates to be characterized by molecular methods. The epidemiological concordance PFGE-SBT was 100%, while the molecular concordance was 64%. Adjusted Wallace index (AW) showed that PFGE has better discriminatory power than SBT (AW(SBT→PFGE) = 0.767; AW(PFGE→SBT) = 1). The discrepancies appeared mostly in sequence type (ST) 1, a worldwide distributed ST for which PFGE discriminated different profiles. SBT discriminatory power was not sufficient for verifying the infection source, especially in worldwide distributed STs, which were classified into different PFGE patterns.
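The adjusted Wallace index quoted above can be computed from two typing assignments of the same isolates. The sketch below assumes the commonly used definition AW(A→B) = (W − Wi) / (1 − Wi), where W is the fraction of isolate pairs sharing a type under method A that also share a type under method B, and Wi is the value expected if the two methods were independent. The function name and the toy labels are hypothetical and are not data from the study.

```python
from collections import Counter
from itertools import combinations


def adjusted_wallace(labels_a, labels_b):
    """Adjusted Wallace coefficient AW(A -> B) for two typing methods."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both methods must type the same isolates")
    n = len(labels_a)
    # Pairs of isolates that method A places in the same type.
    same_a = [(i, j) for i, j in combinations(range(n), 2)
              if labels_a[i] == labels_a[j]]
    if not same_a:
        raise ValueError("method A places no two isolates in the same type")
    same_both = sum(1 for i, j in same_a if labels_b[i] == labels_b[j])
    wallace = same_both / len(same_a)
    # Chance agreement: probability that two random isolates share a B type.
    expected = sum(c * (c - 1) for c in Counter(labels_b).values()) / (n * (n - 1))
    if expected == 1:
        raise ValueError("method B assigns a single type; AW is undefined")
    return (wallace - expected) / (1 - expected)


# Toy example: the finer method (pfge) fully predicts the coarser one (sbt).
pfge = ["P1", "P2", "P3", "P3"]
sbt = ["ST1", "ST1", "ST2", "ST2"]
aw_pfge_to_sbt = adjusted_wallace(pfge, sbt)
aw_sbt_to_pfge = adjusted_wallace(sbt, pfge)
```

In this toy case the index is higher in the pfge→sbt direction, which mirrors the direction of the asymmetry reported in the abstract.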
OSG-GEM: Gene Expression Matrix Construction Using the Open Science Grid
Poehlman, William L.; Rynge, Mats; Branton, Chris; Balamurugan, D.; Feltus, Frank A.
2016-01-01
High-throughput DNA sequencing technology has revolutionized the study of gene expression while introducing significant computational challenges for biologists. These computational challenges include access to sufficient computer hardware and functional data processing workflows. Both these challenges are addressed with our scalable, open-source Pegasus workflow for processing high-throughput DNA sequence datasets into a gene expression matrix (GEM) using computational resources available to U.S.-based researchers on the Open Science Grid (OSG). We describe the usage of the workflow (OSG-GEM), discuss workflow design, inspect performance data, and assess accuracy in mapping paired-end sequencing reads to a reference genome. A target OSG-GEM user is proficient with the Linux command line and possesses basic bioinformatics experience. The user may run this workflow directly on the OSG or adapt it to novel computing environments. PMID:27499617
Recombinant pinoresinol/lariciresinol reductase, recombinant dirigent protein, and methods of use
Lewis, Norman G.; Davin, Laurence B.; Dinkova-Kostova, Albena T.; Fujita, Masayuki; Gang, David R.; Sarkanen, Simo; Ford, Joshua D.
2001-04-03
Dirigent proteins and pinoresinol/lariciresinol reductases have been isolated, together with cDNAs encoding dirigent proteins and pinoresinol/lariciresinol reductases. Accordingly, isolated DNA sequences are provided which code for the expression of dirigent proteins and pinoresinol/lariciresinol reductases. In other aspects, replicable recombinant cloning vehicles are provided which code for dirigent proteins or pinoresinol/lariciresinol reductases or for a base sequence sufficiently complementary to at least a portion of dirigent protein or pinoresinol/lariciresinol reductase DNA or RNA to enable hybridization therewith. In yet other aspects, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence encoding dirigent protein or pinoresinol/lariciresinol reductase. Thus, systems and methods are provided for the recombinant expression of dirigent proteins and/or pinoresinol/lariciresinol reductases.
Main sequence models for massive zero-metal stars
NASA Technical Reports Server (NTRS)
Cary, N.
1974-01-01
Zero-age main-sequence models for stars of 20, 10, 5, and 2 solar masses with no heavy elements are constructed for three different possible primordial helium abundances: Y=0.00, Y=0.23, and Y=0.30. The latter two values of Y bracket the range of primordial helium abundances cited by Wagoner. With the exception of the two 20 solar mass models that contain helium, these models are found to be self-consistent in the sense that the formation of carbon through the triple-alpha process during pre-main-sequence contraction is not sufficient to bring the CN cycle into competition with the proton-proton chain on the ZAMS. The zero-metal models of the present study have higher surface and central temperatures, higher central densities, smaller radii, and smaller convective cores than do the population I models with the same masses.
Space power system scheduling using an expert system
NASA Technical Reports Server (NTRS)
Bahrami, K. A.; Biefeld, E.; Costello, L.; Klein, J. W.
1986-01-01
A most pressing problem in space exploration is timely spacecraft power system sequence generation, which requires the scheduling of a set of loads given a set of resource constraints. This is particularly important after an anomaly or failure. This paper discusses the power scheduling problem and how the software program, Plan-It, can be used as a consultant for scheduling power system activities. Modeling of power activities, human interface, and two of the many strategies used by Plan-It are discussed. Preliminary results showing the development of a conflict-free sequence from an initial sequence with conflicts are presented. They show that a 4-day schedule can be generated in a matter of a few minutes, which provides sufficient time in many cases to aid the crew in the replanning of loads and generation use following a failure or anomaly.
Rescaled earthquake recurrence time statistics: application to microrepeaters
NASA Astrophysics Data System (ADS)
Goltz, Christian; Turcotte, Donald L.; Abaimov, Sergey G.; Nadeau, Robert M.; Uchida, Naoki; Matsuzawa, Toru
2009-01-01
Slip on major faults primarily occurs during 'characteristic' earthquakes. The recurrence statistics of characteristic earthquakes play an important role in seismic hazard assessment. A major problem in determining applicable statistics is the short sequences of characteristic earthquakes that are available worldwide. In this paper, we introduce a rescaling technique in which sequences can be superimposed to establish larger numbers of data points. We consider the Weibull and log-normal distributions; in both cases we rescale the data using means and standard deviations. We test our approach utilizing sequences of microrepeaters, micro-earthquakes which recur in the same location on a fault. It seems plausible to regard these earthquakes as a miniature version of the classic characteristic earthquakes. Microrepeaters are much more frequent than major earthquakes, leading to longer sequences for analysis. In this paper, we present results for the analysis of recurrence times for several microrepeater sequences from Parkfield, CA as well as NE Japan. We find that, once the respective sequence can be considered to be of sufficient stationarity, the statistics can be well fitted by either a Weibull or a log-normal distribution. We clearly demonstrate this fact by our technique of rescaled combination. We conclude that the recurrence statistics of the microrepeater sequences we consider are similar to the recurrence statistics of characteristic earthquakes on major faults.
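The idea of superimposing rescaled sequences can be sketched as follows. The abstract states only that means and standard deviations are used, so the z-score form below is an assumption, and the function names and toy recurrence times are hypothetical.

```python
from statistics import mean, stdev


def rescale(recurrence_times):
    """Rescale one sequence with its own mean and sample standard deviation."""
    m = mean(recurrence_times)
    s = stdev(recurrence_times)
    return [(t - m) / s for t in recurrence_times]


def superimpose(sequences):
    """Pool several individually rescaled sequences into one larger sample."""
    pooled = []
    for seq in sequences:
        pooled.extend(rescale(seq))
    return sorted(pooled)


# Two toy sequences with very different time scales collapse onto one sample.
pooled = superimpose([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]])
```

The pooled sample is what a Weibull or log-normal distribution would then be fitted to; the fitting step is not shown here.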
A Pan-HIV Strategy for Complete Genome Sequencing
Yamaguchi, Julie; Alessandri-Gradt, Elodie; Tell, Robert W.; Brennan, Catherine A.
2015-01-01
Molecular surveillance is essential to monitor HIV diversity and track emerging strains. We have developed a universal library preparation method (HIV-SMART [i.e., switching mechanism at 5′ end of RNA transcript]) for next-generation sequencing that harnesses the specificity of HIV-directed priming to enable full genome characterization of all HIV-1 groups (M, N, O, and P) and HIV-2. Broad application of the HIV-SMART approach was demonstrated using a panel of diverse cell-cultured virus isolates. HIV-1 non-subtype B-infected clinical specimens from Cameroon were then used to optimize the protocol to sequence directly from plasma. When multiplexing 8 or more libraries per MiSeq run, full genome coverage at a median ∼2,000× depth was routinely obtained for either sample type. The method reproducibly generated the same consensus sequence, consistently identified viral sequence heterogeneity present in specimens, and at viral loads of ≤4.5 log copies/ml yielded sufficient coverage to permit strain classification. HIV-SMART provides an unparalleled opportunity to identify diverse HIV strains in patient specimens and to determine phylogenetic classification based on the entire viral genome. Easily adapted to sequence any RNA virus, this technology illustrates the utility of next-generation sequencing (NGS) for viral characterization and surveillance. PMID:26699702
Impact of sequencing depth and read length on single cell RNA sequencing data of T cells.
Rizzetto, Simone; Eltahla, Auda A; Lin, Peijie; Bull, Rowena; Lloyd, Andrew R; Ho, Joshua W K; Venturi, Vanessa; Luciani, Fabio
2017-10-06
Single cell RNA sequencing (scRNA-seq) provides great potential in measuring the gene expression profiles of heterogeneous cell populations. In immunology, scRNA-seq allowed the characterisation of transcript sequence diversity of functionally relevant T cell subsets, and the identification of the full length T cell receptor (TCRαβ), which defines the specificity against cognate antigens. Several factors, e.g. RNA library capture, cell quality, and sequencing output, affect the quality of scRNA-seq data. We studied the effects of read length and sequencing depth on the quality of gene expression profiles, cell type identification, and TCRαβ reconstruction, utilising 1,305 single cells from 8 publicly available scRNA-seq datasets, and simulation-based analyses. Gene expression was characterised by an increased number of unique genes identified with short read lengths (<50 bp), but these featured higher technical variability compared to profiles from longer reads. Successful TCRαβ reconstruction was achieved for 6 datasets (81% - 100%) with at least 0.25 million paired-end (PE) reads of length >50 bp, while it failed for datasets with <30 bp reads. Sufficient read length and sequencing depth can control technical noise to enable accurate identification of TCRαβ and gene expression profiles from scRNA-seq data of T cells.
BiRen: predicting enhancers with a deep-learning-based model using the DNA sequence alone.
Yang, Bite; Liu, Feng; Ren, Chao; Ouyang, Zhangyi; Xie, Ziwei; Bo, Xiaochen; Shu, Wenjie
2017-07-01
Enhancer elements are noncoding stretches of DNA that play key roles in controlling gene expression programmes. Despite major efforts to develop accurate enhancer prediction methods, identifying enhancer sequences continues to be a challenge in the annotation of mammalian genomes. One of the major issues is the lack of large, sufficiently comprehensive and experimentally validated enhancers for humans or other species. Thus, the development of computational methods based on limited experimentally validated enhancers and deciphering the transcriptional regulatory code encoded in the enhancer sequences is urgent. We present a deep-learning-based hybrid architecture, BiRen, which predicts enhancers using the DNA sequence alone. Our results demonstrate that BiRen can learn common enhancer patterns directly from the DNA sequence and exhibits superior accuracy, robustness and generalizability in enhancer prediction relative to other state-of-the-art enhancer predictors based on sequence characteristics. Our BiRen will enable researchers to acquire a deeper understanding of the regulatory code of enhancer sequences. Availability: The BiRen method can be freely accessed at https://github.com/wenjiegroup/BiRen. Contact: shuwj@bmi.ac.cn or boxc@bmi.ac.cn. Supplementary information: Supplementary data are available at Bioinformatics online.
Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen).
Rambaut, Andrew; Lam, Tommy T; Max Carvalho, Luiz; Pybus, Oliver G
2016-01-01
Gene sequences sampled at different points in time can be used to infer molecular phylogenies on a natural timescale of months or years, provided that the sequences in question undergo measurable amounts of evolutionary change between sampling times. Data sets with this property are termed heterochronous and have become increasingly common in several fields of biology, most notably the molecular epidemiology of rapidly evolving viruses. Here we introduce the cross-platform software tool, TempEst (formerly known as Path-O-Gen), for the visualization and analysis of temporally sampled sequence data. Given a molecular phylogeny and the dates of sampling for each sequence, TempEst uses an interactive regression approach to explore the association between genetic divergence through time and sampling dates. TempEst can be used to (1) assess whether there is sufficient temporal signal in the data to proceed with phylogenetic molecular clock analysis, and (2) identify sequences whose genetic divergence and sampling date are incongruent. Examination of the latter can help identify data quality problems, including errors in data annotation, sample contamination, sequence recombination, or alignment error. We recommend that all users of the molecular clock models implemented in BEAST first check their data using TempEst prior to analysis.
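The regression of genetic divergence on sampling date can be sketched with ordinary least squares. This is only an illustration of the underlying idea and does not reproduce TempEst itself; the function name and the toy data are hypothetical, and the root-to-tip distances are assumed to have been measured on an already rooted tree.

```python
def root_to_tip_regression(dates, divergences):
    """Least-squares fit of root-to-tip divergence against sampling date.

    Returns (slope, x_intercept, r_squared, residuals). The slope is a crude
    rate estimate and the x-intercept a crude date for the root.
    """
    n = len(dates)
    if n < 3 or n != len(divergences):
        raise ValueError("need at least three dated sequences of equal count")
    mean_x = sum(dates) / n
    mean_y = sum(divergences) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in divergences)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, divergences))
    if sxx == 0 or syy == 0 or sxy == 0:
        raise ValueError("no temporal signal can be estimated from these data")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    # Large absolute residuals flag sequences whose date and divergence disagree.
    residuals = [y - (intercept + slope * x) for x, y in zip(dates, divergences)]
    return slope, -intercept / slope, r_squared, residuals


# Toy data: divergence grows by 0.001 substitutions per site per year.
slope, root_date, r2, residuals = root_to_tip_regression(
    [2000.0, 2002.0, 2004.0, 2006.0], [0.010, 0.012, 0.014, 0.016]
)
```

A low r-squared or a negative slope would suggest too little temporal signal for a molecular clock analysis, and a single large residual would point to a sequence worth checking for annotation or contamination problems.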
1997-11-01
Information dominance may be defined as superiority in the generation, manipulation, and use of information sufficient to afford its possessors... information dominance at the strategic level: knowing oneself and one’s enemy; and, at best, inducing them to see things as one does.
46 CFR 503.57 - Mandatory review for declassification.
Code of Federal Regulations, 2010 CFR
2010-10-01
... 503.57 Shipping FEDERAL MARITIME COMMISSION GENERAL AND ADMINISTRATIVE PROVISIONS PUBLIC INFORMATION Information Security Program § 503.57 Mandatory review for declassification. (a) Information originally... describes the documents or material containing the information with sufficient specificity to enable the...
Genomic information as a behavioral health intervention: can it work?
Bloss, Cinnamon S; Madlensky, Lisa; Schork, Nicholas J; Topol, Eric J
2011-01-01
Individuals can now obtain their personal genomic information via direct-to-consumer genetic testing, but what, if any, impact will this have on their lifestyle and health? A recent longitudinal cohort study of individuals who underwent consumer genome scanning found minimal impacts of testing on risk-reducing lifestyle behaviors, such as diet and exercise. These results raise an important question: is personal genomic information likely to beneficially impact public health through motivation of lifestyle behavioral change? In this article, we review the literature on lifestyle behavioral change in response to genetic testing for common disease susceptibility variants. We find that only a few studies have been carried out, and that those that have been done have yielded little evidence to suggest that the mere provision of genetic information alone results in widespread changes in lifestyle health behaviors. We suggest that further study of this issue is needed, in particular studies that examine response to multiplex testing for multiple genetic markers and conditions. This will be critical as we anticipate the wide availability of whole-genome sequencing and more comprehensive phenotyping of individuals. We also note that while simple communication of genomic information and disease susceptibility may be sufficient to catalyze lifestyle changes in some highly motivated groups of individuals, for others, additional strategies may be required to prompt changes, including more sophisticated means of risk communication (e.g., in the context of social norm feedback) either alone or in combination with other promising interventions (e.g., real-time wireless health monitoring devices). PMID:22199991
Malouli, Daniel; Howell, Grant L; Legasse, Alfred W; Kahl, Christoph; Axthelm, Michael K; Hansen, Scott G; Früh, Klaus
2014-09-01
Multiple novel simian adenoviruses have been isolated over the past years and their potential to cross the species barrier and infect the human population is an ever present threat. Here we describe the isolation and full genome sequencing of a novel simian adenovirus (SAdV) isolated from the urine of two independent, never co-housed, late stage simian immunodeficiency virus (SIV)-infected rhesus macaques. The viral genome sequences revealed a novel type with a unique genome length, GC content, E3 region and DNA polymerase amino acid sequence that is sufficiently distinct from all currently known human- or simian adenovirus species to warrant classifying these isolates as a novel species of simian adenovirus. This new species, termed Simian mastadenovirus D (SAdV-D), displays the standard genome organization for the genus Mastadenovirus containing only one copy of the fiber gene which sets it apart from the old world monkey adenovirus species HAdV-G, SAdV-B and SAdV-C.
Zheng, H; Ye, C; Segura, M; Gottschalk, M; Xu, J
2008-01-01
Streptococcus suis serotype 2 sequence type 7 strains emerged in 1996 and caused a streptococcal toxic shock-like syndrome in 1998 and 2005 in China. Evidence indicated that the virulence of S. suis sequence type 7 had increased, but the mechanism was unknown. The sequence type 7 strain SC84, isolated from a patient with streptococcal toxic shock-like syndrome during the Sichuan outbreak, and the sequence type 1 strain 31533, a typical highly pathogenic strain isolated from a diseased pig, were used in comparative studies. In this study we show that the mechanisms underlying cytokine production differed between the two types of strains. The S. suis sequence type 7 strain SC84 possesses a stronger capacity to stimulate T cells, naive T cells and peripheral blood mononuclear cell proliferation than does S. suis sequence type 1 strain 31533. The T cell response to both strains was dependent upon the presence of antigen-presenting cells. Histo-incompatible antigen-presenting cells were sufficient to provide the accessory signals to naive T cells stimulated by the two strains, indicating that both sequence type 7 and 1 strains possess mitogens; however, the mitogenic effect was different. Therefore, we propose that the difference in the mitogenic effect of sequence type 7 strain SC84 compared with the sequence type 1 strain 31533 of S. suis may be associated with the clinical, epidemiological and microbiological differences, where the ST 7 strains have a larger mitogenic effect. PMID:18803762
Sequence information gain based motif analysis.
Maynou, Joan; Pairó, Erola; Marco, Santiago; Perera, Alexandre
2015-11-09
The detection of regulatory regions in candidate sequences is essential for the understanding of the regulation of a particular gene and the mechanisms involved. This paper proposes a novel methodology based on information theoretic metrics for finding regulatory sequences in promoter regions. This methodology (SIGMA) has been tested on genomic sequence data for Homo sapiens and Mus musculus. SIGMA has been compared with different publicly available alternatives for motif detection, such as MEME/MAST, Biostrings (Bioconductor package), MotifRegressor, and previous work such as Qresiduals projections or information theoretic based detectors. Comparative results, in the form of Receiver Operating Characteristic curves, show that, in 70% of the studied Transcription Factor Binding Sites, the SIGMA detector performs better and behaves more robustly than the methods compared, while having a similar computational time. The performance of SIGMA can be explained by its parametric simplicity in the modelling of the non-linear co-variability in the binding motif positions. Sequence Information Gain based Motif Analysis is a generalisation of a non-linear model of the cis-regulatory sequences detection based on Information Theory. This generalisation allows us to detect transcription factor binding sites with maximum performance disregarding the covariability observed in the positions of the training set of sequences. SIGMA is freely available to the public at http://b2slab.upc.edu.
Report: EPA Needs to Improve Oversight of Its Information Technology Projects
Report #2005-P-00023, September 14, 2005. EPA’s Office of Environmental Information (OEI) did not sufficiently oversee information technology projects to ensure they met planned budgets and schedules.
SOPRA: Scaffolding algorithm for paired reads via statistical optimization.
Dayarian, Adel; Michael, Todd P; Sengupta, Anirvan M
2010-06-24
High throughput sequencing (HTS) platforms produce gigabases of short read (<100 bp) data per run. While these short reads are adequate for resequencing applications, de novo assembly of moderate size genomes from such reads remains a significant challenge. These limitations could be partially overcome by utilizing mate pair technology, which provides pairs of short reads separated by a known distance along the genome. We have developed SOPRA, a tool designed to exploit the mate pair/paired-end information for assembly of short reads. The main focus of the algorithm is selecting a sufficiently large subset of simultaneously satisfiable mate pair constraints to achieve a balance between the size and the quality of the output scaffolds. Scaffold assembly is presented as an optimization problem for variables associated with vertices and with edges of the contig connectivity graph. Vertices of this graph are individual contigs with edges drawn between contigs connected by mate pairs. Similar graph problems have been invoked in the context of shotgun sequencing and scaffold building for previous generation of sequencing projects. However, given the error-prone nature of HTS data and the fundamental limitations from the shortness of the reads, the ad hoc greedy algorithms used in the earlier studies are likely to lead to poor quality results in the current context. SOPRA circumvents this problem by treating all the constraints on equal footing for solving the optimization problem, the solution itself indicating the problematic constraints (chimeric/repetitive contigs, etc.) to be removed. The process of solving and removing of constraints is iterated till one reaches a core set of consistent constraints. For SOLiD sequencer data, SOPRA uses a dynamic programming approach to robustly translate the color-space assembly to base-space. For assessing the quality of an assembly, we report the no-match/mismatch error rate as well as the rates of various rearrangement errors. 
Applying SOPRA to real data from bacterial genomes, we were able to assemble contigs into scaffolds of significant length (N50 up to 200 Kb) with very few errors introduced in the process. In general, the methodology presented here will allow better scaffold assemblies of any type of mate pair sequencing data.
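For readers unfamiliar with the N50 figure quoted for the scaffolds, the sketch below computes it under the usual definition: the largest length L such that sequences of length at least L together contain at least half of the total assembled bases. The function name and the toy lengths are hypothetical and are unrelated to the SOPRA code base.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("cannot compute N50 of an empty assembly")


# Toy assembly of 300 bases in total: 100 + 80 already exceeds half of it.
toy_n50 = n50([100, 80, 60, 40, 20])
```

N50 describes contiguity only, which is why the abstract reports error rates alongside it.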
NASA Astrophysics Data System (ADS)
Bucklin, Ann; Ortman, Brian D.; Jennings, Robert M.; Nigro, Lisa M.; Sweetman, Christopher J.; Copley, Nancy J.; Sutton, Tracey; Wiebe, Peter H.
2010-12-01
Species diversity of the metazoan holozooplankton assemblage of the Sargasso Sea, Northwest Atlantic Ocean, was examined through coordinated morphological taxonomic identification of species and DNA sequencing of a ~650 base-pair region of mitochondrial cytochrome oxidase I (mtCOI) as a DNA barcode (i.e., short sequence for species recognition and discrimination). Zooplankton collections were made from the surface to 5,000 meters during April, 2006 on the R/V R.H. Brown. Samples were examined by a ship-board team of morphological taxonomists; DNA barcoding was carried out in both ship-board and land-based DNA sequencing laboratories. DNA barcodes were determined for a total of 297 individuals of 175 holozooplankton species in four phyla: Cnidaria (Hydromedusae, 4 species; Siphonophora, 47); Arthropoda (Amphipoda, 10; Copepoda, 34; Decapoda, 9; Euphausiacea, 10; Mysidacea, 1; Ostracoda, 27); Mollusca (Cephalopoda, 8; Heteropoda, 6; Pteropoda, 15); and Chaetognatha (4). Thirty species of fish (Teleostei) were also barcoded. For all seven zooplankton groups for which sufficient data were available, Kimura-2-Parameter genetic distances were significantly lower between individuals of the same species (mean=0.0114; S.D. 0.0117) than between individuals of different species within the same group (mean=0.3166; S.D. 0.0378). This difference, known as the barcode gap, ensures that mtCOI sequences are reliable characters for species identification for the oceanic holozooplankton assemblage. In addition, DNA barcodes allow recognition of new or undescribed species, reveal cryptic species within known taxa, and inform phylogeographic and population genetic studies of geographic variation. The growing database of "gold standard" DNA barcodes serves as a Rosetta Stone for marine zooplankton, providing the key for decoding species diversity by linking species names, morphology, and DNA sequence variation.
In light of the pivotal position of zooplankton in ocean food webs, their usefulness as rapid responders to environmental change, and the increasing scarcity of taxonomists, the use of DNA barcodes is an important and useful approach for rapid analysis of species diversity and distribution in the pelagic community.
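The within-species versus between-species comparison described above rests on Kimura-2-Parameter distances. A minimal sketch for two aligned sequences follows, assuming the standard formula d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitions and transversions. The function name and any sequences passed to it are hypothetical, and skipping gaps and ambiguous bases is a simplification.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura-2-Parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent (saturated).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# One transition (A -> G) in ten compared sites.
d = k2p_distance("ACGTACGTAC", "GCGTACGTAC")
```

A barcode gap then corresponds to pairwise distances within a species being clearly smaller than distances between species of the same group.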
Rodrigues, Jorge L. M.; Serres, Margrethe H.; Tiedje, James M.
2011-01-01
The use of comparative genomics for the study of different microbiological species has increased substantially as sequence technologies become more affordable. However, efforts to fully link a genotype to its phenotype remain limited to the development of one mutant at a time. In this study, we provided a high-throughput alternative to this limiting step by coupling comparative genomics to the use of phenotype arrays for five sequenced Shewanella strains. Positive phenotypes were obtained for 441 nutrients (C, N, P, and S sources), with N-based compounds being the most utilized for all strains. Many genes and pathways predicted by genome analyses were confirmed with the comparative phenotype assay, and three degradation pathways believed to be missing in Shewanella were confirmed as missing. A number of previously unknown gene products were predicted to be parts of pathways or to have a function, expanding the number of gene targets for future genetic analyses. Ecologically, the comparative high-throughput phenotype analysis provided insights into niche specialization among the five different strains. For example, Shewanella amazonensis strain SB2B, isolated from the Amazon River delta, was capable of utilizing 60 C compounds, whereas Shewanella sp. strain W3-18-1, isolated from deep marine sediment, utilized only 25 of them. In spite of the large number of nutrient sources yielding positive results, our study indicated that except for the N sources, they were not sufficiently informative to predict growth phenotypes from increasing evolutionary distances. Our results indicate the importance of phenotypic evaluation for confirming genome predictions. This strategy will accelerate the functional discovery of genes and provide an ecological framework for microbial genome sequencing projects. PMID:21642407
Nouripour-Sisakht, Sadegh; Ahmadi, Bahram; Makimura, Koichi; Hoog, Sybren de; Umeda, Yoshiko; Alshahni, Mohamed Mahdi; Mirhendi, Hossein
2017-04-01
We aimed to evaluate the resolving power of the translation elongation factor (TEF)-1α gene for phylogenetic analysis of Aspergillus species. Sequences of 526 bp representing the coding region of the TEF-1α gene were used for the assessment of levels of intra- and inter-specific nucleotide polymorphism in 33 species of Aspergillus, including 57 reference, clinical and environmental strains. Analysis of TEF-1α sequences indicated a mean similarity of 92.6 % between the species, with inter-species diversity ranging from 0 to 70 nucleotides. The species with the closest resemblance were A. candidus/A. carneus, and A. flavus/A. oryzae/A. ochraceus, with 100 and 99.8 % identification, respectively. These species are phylogenetically very close and the TEF-1α gene appears not to have sufficient discriminatory power to differentiate them. Meanwhile, intra-species differences were found within strains of A. clavatus, A. clavatonanicus, A. candidus, A. fumigatus, A. terreus, A. alliaceus, A. flavus, Eurotium amstelodami and E. chevalieri. The tree topology with strongly supported clades (≥70 % bootstrap values) was almost compatible with the phylogeny inferred from analysis of the DNA sequences of the beta tubulin gene (BT2). However, the backbone of the tree exhibited low bootstrap values, and inter-species correlations were not obvious in some clades; for example, tree topologies based on BT2 and TEF-1α genes were incompatible for some species, such as A. deflectus, A. janus and A. penicillioides. The gene was not phylogenetically more informative than other known molecular markers. It will be necessary to test other genes or larger genomic regions to better understand the taxonomy of this important group of fungi.
Analysis of Spatio-Temporal Traffic Patterns Based on Pedestrian Trajectories
NASA Astrophysics Data System (ADS)
Busch, S.; Schindler, T.; Klinger, T.; Brenner, C.
2016-06-01
For driver assistance and autonomous driving systems, it is essential to predict the behaviour of other traffic participants. Usually, standard filter approaches are used to this end; however, in many cases, these are not sufficient. For example, pedestrians are able to change their speed or direction instantly. Also, there may not be enough observation data to determine the state of an object reliably, e.g. in case of occlusions. In those cases, it is very useful if a prior model exists, which suggests certain outcomes. For example, it is useful to know that pedestrians usually cross the road at a certain location and at certain times. This information can then be stored in a map, which can be used as a prior in scene analysis or, in practical terms, to reduce the speed of a vehicle in advance in order to minimize critical situations. In this paper, we present an approach to derive such a spatio-temporal map automatically from the observed behaviour of traffic participants in everyday traffic situations. In our experiments, we use one stationary camera to observe a complex junction, where cars, public transportation and pedestrians interact. We concentrate on the pedestrians' trajectories to map traffic patterns. In the first step, we extract trajectory segments from the video data. These segments are then clustered in order to derive a spatial model of the scene, in terms of a spatially embedded graph. In the second step, we analyse the temporal patterns of pedestrian movement on this graph. To evaluate our approach, we used a 4-hour video sequence. We show that we are able to derive traffic light sequences as well as the timetables of nearby public transportation.
Di Pierro, Michele; Cheng, Ryan R; Lieberman Aiden, Erez; Wolynes, Peter G; Onuchic, José N
2017-11-14
Inside the cell nucleus, genomes fold into organized structures that are characteristic of cell type. Here, we show that this chromatin architecture can be predicted de novo using epigenetic data derived from chromatin immunoprecipitation-sequencing (ChIP-Seq). We exploit the idea that chromosomes encode a 1D sequence of chromatin structural types. Interactions between these chromatin types determine the 3D structural ensemble of chromosomes through a process similar to phase separation. First, a neural network is used to infer the relation between the epigenetic marks present at a locus, as assayed by ChIP-Seq, and the genomic compartment in which those loci reside, as measured by DNA-DNA proximity ligation (Hi-C). Next, types inferred from this neural network are used as an input to an energy landscape model for chromatin organization [Minimal Chromatin Model (MiChroM)] to generate an ensemble of 3D chromosome conformations at a resolution of 50 kilobases (kb). After training the model, dubbed Maximum Entropy Genomic Annotation from Biomarkers Associated to Structural Ensembles (MEGABASE), on odd-numbered chromosomes, we predict the sequences of chromatin types and the subsequent 3D conformational ensembles for the even chromosomes. We validate these structural ensembles by using ChIP-Seq tracks alone to predict Hi-C maps, as well as distances measured using 3D fluorescence in situ hybridization (FISH) experiments. Both sets of experiments support the hypothesis of phase separation being the driving process behind compartmentalization. These findings strongly suggest that epigenetic marking patterns encode sufficient information to determine the global architecture of chromosomes and that de novo structure prediction for whole genomes may be increasingly possible. Copyright © 2017 the Author(s). Published by PNAS.
Konop, Christopher J; Knickelbine, Jennifer J; Sygulla, Molly S; Wruck, Colin D; Vestling, Martha M; Stretton, Antony O W
2015-12-01
Neuromodulators have become an increasingly important component of functional circuits, dramatically changing the properties of both neurons and synapses to affect behavior. To explore the role of neuropeptides in Ascaris suum behavior, we devised an improved method for cleanly dissecting single motorneuronal cell bodies from the many other cell processes and hypodermal tissue in the ventral nerve cord. We determined their peptide content using matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry (MS). The reduced complexity of the peptide mixture greatly aided the detection of peptides; peptide levels were sufficient to permit sequencing by tandem MS from single cells. Inhibitory motorneurons, known to be GABAergic, contain a novel neuropeptide, As-NLP-22 (SLASGRWGLRPamide). From this sequence and information from the A. suum expressed sequence tag (EST) database, we cloned the transcript (As-nlp-22) and synthesized a riboprobe for in situ hybridization, which labeled the inhibitory motorneurons; this validates the integrity of the dissection method, showing that the peptides detected originate from the cells themselves and not from adhering processes from other cells (e.g., synaptic terminals). Synthetic As-NLP-22 has potent inhibitory activity on acetylcholine-induced muscle contraction as well as on basal muscle tone. Both of these effects are dose-dependent: the inhibitory effect on ACh contraction has an IC50 of 8.3 × 10⁻⁹ M. When injected into whole worms, As-NLP-22 produces a dose-dependent inhibition of locomotory movements and, at higher levels, complete paralysis. These experiments demonstrate the utility of MALDI TOF/TOF MS in identifying novel neuromodulators at the single-cell level.
NASA Astrophysics Data System (ADS)
Konop, Christopher J.; Knickelbine, Jennifer J.; Sygulla, Molly S.; Wruck, Colin D.; Vestling, Martha M.; Stretton, Antony O. W.
2015-12-01
Neuromodulators have become an increasingly important component of functional circuits, dramatically changing the properties of both neurons and synapses to affect behavior. To explore the role of neuropeptides in Ascaris suum behavior, we devised an improved method for cleanly dissecting single motorneuronal cell bodies from the many other cell processes and hypodermal tissue in the ventral nerve cord. We determined their peptide content using matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry (MS). The reduced complexity of the peptide mixture greatly aided the detection of peptides; peptide levels were sufficient to permit sequencing by tandem MS from single cells. Inhibitory motorneurons, known to be GABAergic, contain a novel neuropeptide, As-NLP-22 (SLASGRWGLRPamide). From this sequence and information from the A. suum expressed sequence tag (EST) database, we cloned the transcript (As-nlp-22) and synthesized a riboprobe for in situ hybridization, which labeled the inhibitory motorneurons; this validates the integrity of the dissection method, showing that the peptides detected originate from the cells themselves and not from adhering processes from other cells (e.g., synaptic terminals). Synthetic As-NLP-22 has potent inhibitory activity on acetylcholine-induced muscle contraction as well as on basal muscle tone. Both of these effects are dose-dependent: the inhibitory effect on ACh contraction has an IC50 of 8.3 × 10⁻⁹ M. When injected into whole worms, As-NLP-22 produces a dose-dependent inhibition of locomotory movements and, at higher levels, complete paralysis. These experiments demonstrate the utility of MALDI TOF/TOF MS in identifying novel neuromodulators at the single-cell level.
Chu, Chien-Hsin; Chang, Lung-Chun; Hsu, Hong-Ming; Wei, Shu-Yi; Liu, Hsing-Wei; Lee, Yu; Kuo, Chung-Chi; Indra, Dharmu; Chen, Chinpan; Ong, Shiou-Jeng; Tai, Jung-Hsiang
2011-01-01
Nuclear proteins usually contain specific peptide sequences, referred to as nuclear localization signals (NLSs), for nuclear import. These signals remain unexplored in the protozoan pathogen, Trichomonas vaginalis. The nuclear import of a Myb2 transcription factor was studied here using immunodetection of a hemagglutinin-tagged Myb2 overexpressed in the parasite. The tagged Myb2 was localized to the nucleus as punctate signals. With mutations of its polybasic sequences, 48KKQK51 and 61KR62, Myb2 was localized to the nucleus, but the signal was diffusive. When fused to a C-terminal non-nuclear protein, the Myb2 sequence spanning amino acid (aa) residues 48 to 143, which is embedded within the R2R3 DNA-binding domain (aa 40 to 156), was essential and sufficient for efficient nuclear import of a bacterial tetracycline repressor (TetR), and yet the transport efficiency was reduced with an additional fusion of a firefly luciferase to TetR, while classical NLSs from the simian virus 40 T-antigen had no function in this assay system. Myb2 nuclear import and DNA-binding activity were substantially perturbed with mutation of a conserved isoleucine (I74) in helix 2 to proline that altered secondary structure and ternary folding of the R2R3 domain. Disruption of DNA-binding activity alone by point mutation of a lysine residue, K51, preceding the structural domain had little effect on Myb2 nuclear localization, suggesting that nuclear translocation of Myb2, which requires an ordered structural domain, is independent of its DNA binding activity. These findings provide useful information for testing whether myriad Mybs in the parasite use a common module to regulate nuclear import. PMID:22021237
30 CFR 784.22 - Geologic information.
Code of Federal Regulations, 2011 CFR
2011-07-01
... 30 Mineral Resources 3 2011-07-01 2011-07-01 false Geologic information. 784.22 Section 784.22... Geologic information. (a) General. Each application shall include geologic information in sufficient detail...; and (4) Preparing the subsidence control plan under § 784.20. (b) Geologic information shall include...
30 CFR 784.22 - Geologic information.
Code of Federal Regulations, 2012 CFR
2012-07-01
... 30 Mineral Resources 3 2012-07-01 2012-07-01 false Geologic information. 784.22 Section 784.22... Geologic information. (a) General. Each application shall include geologic information in sufficient detail...; and (4) Preparing the subsidence control plan under § 784.20. (b) Geologic information shall include...
Bryner, J.S.
1961-07-01
The growth of thorium bismuthide particles, which are formed when thorium is suspended in liquid bismuth, is inhibited when the liquid metal suspension is being flowed through a reactor and through a heat exchanger in sequence. The method involves the addition of as little as 1 part by weight of tellurium to 100 parts of thorium. This addition is sufficient to inhibit particle growth and agglomeration.
Cao, Youfang; Wang, Lianjie; Xu, Kexue; Kou, Chunhai; Zhang, Yulei; Wei, Guifang; He, Junjian; Wang, Yunfang; Zhao, Liping
2005-07-26
A new algorithm for assessing similarity between primer and template has been developed based on the hypothesis that annealing of primer to template is an information transfer process. Primer sequence is converted to a vector of the full potential hydrogen numbers (3 for G or C, 2 for A or T), while template sequence is converted to a vector of the actual hydrogen bond numbers formed after primer annealing. The former is considered as source information and the latter destination information. An information coefficient is calculated as a measure for fidelity of this information transfer process and thus a measure of similarity between primer and potential annealing site on template. Successful prediction of PCR products from whole genomic sequences with a computer program based on the algorithm demonstrated the potential of this new algorithm in areas like in silico PCR and gene finding.
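The vector encoding described in this abstract can be illustrated with a minimal sketch. The abstract does not give the formula for the information coefficient, so the ratio computed by `bond_fraction` below is only a hypothetical stand-in, and the rule that a mismatched position forms zero hydrogen bonds is likewise an assumption made for illustration. All function names are invented here.

```python
# Hypothetical sketch of the primer/template hydrogen-bond encoding.
# The scoring rule is an assumption, not the published information coefficient.

H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}          # potential bonds per base
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def primer_vector(primer):
    """Source information: full potential hydrogen-bond number of each primer base."""
    return [H_BONDS[base] for base in primer.upper()]


def template_vector(primer, site):
    """Destination information: bonds actually formed at each position.

    `site` is the template segment written so that site[i] pairs with primer[i].
    Assumption: a mismatched position contributes 0 bonds.
    """
    if len(primer) != len(site):
        raise ValueError("primer and annealing site must have equal length")
    return [
        H_BONDS[p] if COMPLEMENT[p] == s else 0
        for p, s in zip(primer.upper(), site.upper())
    ]


def bond_fraction(primer, site):
    """Hypothetical similarity score: formed bonds divided by potential bonds."""
    return sum(template_vector(primer, site)) / sum(primer_vector(primer))


if __name__ == "__main__":
    print(primer_vector("GATC"))            # potential bonds per position
    print(template_vector("GATC", "CTAA"))  # last position is a mismatch
    print(bond_fraction("GATC", "CTAA"))
```

A simple ratio is used only because it keeps the source and destination vectors visible; the published coefficient may weight positions differently.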
Private information alone can trigger trapping of ant colonies in local feeding optima.
Czaczkes, Tomer J; Salmane, Anete K; Klampfleuthner, Felicia A M; Heinze, Jürgen
2016-03-01
Ant colonies are famous for using trail pheromones to make collective decisions. Trail pheromone systems are characterised by positive feedback, which results in rapid collective decision making. However, in an iconic experiment, ants were shown to become 'trapped' in exploiting a poor food source, if it was discovered earlier. This has conventionally been explained by the established pheromone trail becoming too strong for new trails to compete. However, many social insects have a well-developed memory, and private information often overrules conflicting social information. Thus, route memory could also explain this collective 'trapping' effect. Here, we disentangled the effects of social and private information in two 'trapping' experiments: one in which ants were presented with a good and a poor food source, and one in which ants were presented with a long and a short path to the same food source. We found that private information is sufficient to trigger trapping in selecting the poorer of two food sources, and may be sufficient to cause it altogether. Memories did not trigger trapping in the shortest path experiment, probably because sufficiently detailed memories did not form. The fact that collective decisions can be triggered by private information alone may require other collective patterns previously attributed solely to social information use to be reconsidered. © 2016. Published by The Company of Biologists Ltd.
Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor; Essex, M
2015-05-01
To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice.
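The sliding-window step in this abstract can be sketched as follows. The abstract gives the window sizes (1,000 bp and 2,000 bp) but not the step between windows or the alignment length, so the values used in the usage example are hypothetical and are not meant to reproduce the reported window counts. The function names are invented for illustration.

```python
# Minimal sketch of sliding-window extraction from an aligned sequence set.
# Step size and alignment length below are hypothetical illustration values.

def sliding_windows(seq_len, window, step):
    """Return (start, end) coordinates of all full windows, 0-based, end-exclusive."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [(s, s + window) for s in range(0, seq_len - window + 1, step)]


def slice_alignment(alignment, start, end):
    """Cut the same column range out of every aligned sequence."""
    return {name: seq[start:end] for name, seq in alignment.items()}


if __name__ == "__main__":
    # Hypothetical 9,000-column alignment, 1,000 bp windows, 500 bp step.
    coords = sliding_windows(9000, 1000, 500)
    print(len(coords), coords[0], coords[-1])
```

Each window's sub-alignment would then be passed to whatever tree-building and cluster-detection tools the analysis uses; that part is outside this sketch.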
Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor
2015-01-01
To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice. PMID:25560745
Read clouds uncover variation in complex regions of the human genome
Bishara, Alex; Liu, Yuling; Weng, Ziming; Kashef-Haghighi, Dorna; Newburger, Daniel E.; West, Robert; Sidow, Arend; Batzoglou, Serafim
2015-01-01
Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Core to the problem is the lack of a sequencing technology that produces reads with sufficient length and accuracy to enable unique mapping. Here, we present a novel methodology of using read clouds, obtained by accurate short-read sequencing of DNA derived from long fragment libraries, to confidently align short reads within repeat regions and enable accurate variant discovery. Our novel algorithm, Random Field Aligner (RFA), captures the relationships among the short reads governed by the long read process via a Markov Random Field. We utilized a modified version of the Illumina TruSeq synthetic long-read protocol, which yielded shallow-sequenced read clouds. We test RFA through extensive simulations and apply it to discover variants on the NA12878 human sample, for which shallow TruSeq read cloud sequencing data are available, and on an invasive breast carcinoma genome that we sequenced using the same method. We demonstrate that RFA facilitates accurate recovery of variation in 155 Mb of the human genome, including 94% of 67 Mb of segmental duplication sequence and 96% of 11 Mb of transcribed sequence, that are currently hidden from short-read technologies. PMID:26286554
Mismer, D.; Rubin, G. M.
1989-01-01
We have analyzed the cis-acting regulatory sequences of the Rh1 (ninaE) gene in Drosophila melanogaster by P-element-mediated germline transformation of indicator genes transcribed from mutant ninaE promoter sequences. We have previously shown that a 200-bp region extending from -120 to +67 relative to the transcription start site is sufficient to obtain eye-specific expression from the ninaE promoter. In the present study, 22 different 4-13-bp sequences in the -120/+67 promoter region were altered by oligonucleotide-directed mutagenesis. Several of these sequences were found to be required for proper promoter function; two of these are conserved in the promoter of the homologous gene isolated from the related species Drosophila virilis. Alteration of a conserved 9-bp sequence results in aberrant, low level expression in the body. Alteration of a separate 11-bp sequence, found in the promoter regions of several photoreceptor-specific genes of Drosophila, results in an approximately 15-fold reduction in promoter efficiency but without apparent alteration of tissue-specificity. A protein factor capable of interacting with this 11-bp sequence has been detected by DNaseI footprinting in embryonic nuclear extracts. Finally, we have further characterized two separable enhancer sequences previously shown to be required for normal levels of expression from this promoter. PMID:2521839
Distribution of genotype network sizes in sequence-to-structure genotype-phenotype maps.
Manrubia, Susanna; Cuesta, José A
2017-04-01
An essential quantity to ensure evolvability of populations is the navigability of the genotype space. Navigability, understood as the ease with which alternative phenotypes are reached, relies on the existence of sufficiently large and mutually attainable genotype networks. The size of genotype networks (e.g. the number of RNA sequences folding into a particular secondary structure or the number of DNA sequences coding for the same protein structure) is astronomically large in all functional molecules investigated: an exhaustive experimental or computational study of all RNA folds or all protein structures becomes impossible even for moderately long sequences. Here, we analytically derive the distribution of genotype network sizes for a hierarchy of models which successively incorporate features of increasingly realistic sequence-to-structure genotype-phenotype maps. The main feature of these models relies on the characterization of each phenotype through a prototypical sequence whose sites admit a variable fraction of letters of the alphabet. Our models interpolate between two limit distributions: a power-law distribution, when the ordering of sites in the prototypical sequence is strongly constrained, and a lognormal distribution, as suggested for RNA, when different orderings of the same set of sites yield different phenotypes. Our main result is the qualitative and quantitative identification of those features of sequence-to-structure maps that lead to different distributions of genotype network sizes. © 2017 The Author(s).
Wei, Lei; Wang, Jianmin; Lampert, Erika; Schlanger, Simon; DePriest, Adam D.; Hu, Qiang; Gomez, Eduardo Cortes; Murakam, Mitsuko; Glenn, Sean T.; Conroy, Jeffrey; Morrison, Carl; Azabdaftari, Gissou; Mohler, James L.; Liu, Song; Heemers, Hannelore V.
2018-01-01
Background Next-generation sequencing is revealing genomic heterogeneity in localized prostate cancer (CaP). Incomplete sampling of CaP multiclonality has limited the implications for molecular subtyping, stratification, and systemic treatment. Objective To determine the impact of genomic and transcriptomic diversity within and among intraprostatic CaP foci on CaP molecular taxonomy, predictors of progression, and actionable therapeutic targets. Design, setting, and participants Four consecutive patients with clinically localized National Comprehensive Cancer Network intermediate- or high-risk CaP who did not receive neoadjuvant therapy underwent radical prostatectomy at Roswell Park Cancer Institute in June–July 2014. Presurgical information on CaP content and a customized tissue procurement procedure were used to isolate nonmicroscopic and noncontiguous CaP foci in radical prostatectomy specimens. Three cores were obtained from the index lesion and one core from smaller lesions. RNA and DNA were extracted simultaneously from 26 cores with ≥90% CaP content and analyzed using whole-exome sequencing, single-nucleotide polymorphism arrays, and RNA sequencing. Outcome measurements and statistical analysis Somatic mutations, copy number alterations, gene expression, gene fusions, and phylogeny were defined. The impact of genomic alterations on CaP molecular classification, gene sets measured in Oncotype DX, Prolaris, and Decipher assays, and androgen receptor activity among CaP cores was determined. Results and limitations There was considerable variability in genomic alterations among CaP cores, and between RNA- and DNA-based platforms. Heterogeneity was found in molecular grouping of individual CaP foci and the activity of gene sets underlying the assays for risk stratification and androgen receptor activity, and was validated in independent genomic data sets. Determination of the implications for clinical decision-making requires follow-up studies.
Conclusions Genomic make-up varies widely among CaP foci, so care should be taken when making treatment decisions based on a single biopsy or index lesions. Patient summary We examined the molecular composition of individual cancers in a patient’s prostate. We found a lot of genetic diversity among these cancers, and concluded that information from a single cancer biopsy is not sufficient to guide treatment decisions. PMID:27451135
Biological Information Transfer Beyond the Genetic Code: The Sugar Code
NASA Astrophysics Data System (ADS)
Gabius, H.-J.
In the era of genetic engineering, cloning, and genome sequencing the focus of research on the genetic code has received an even further accentuation in the public eye. In attempting, however, to understand intra- and intercellular recognition processes comprehensively, the two biochemical dimensions established by nucleic acids and proteins are not sufficient to satisfactorily explain all molecular events in, for example, cell adhesion or routing. The consideration of further code systems is essential to bridge this gap. A third biochemical alphabet forming code words with an information storage capacity second to no other substance class in rather small units (words, sentences) is established by monosaccharides (letters). As hardware oligosaccharides surpass peptides by more than seven orders of magnitude in the theoretical ability to build isomers, when the total of conceivable hexamers is calculated. In addition to the sequence complexity, the use of magnetic resonance spectroscopy and molecular modeling has been instrumental in discovering that even small glycans can often reside in not only one but several distinct low-energy conformations (keys). Intriguingly, conformers can display notably different capacities to fit snugly into the binding site of nonhomologous receptors (locks). This process, experimentally verified for two classes of lectins, is termed "differential conformer selection." It adds potential for shifts of the conformer equilibrium to modulate ligand properties dynamically and reversibly to the well-known changes in sequence (including anomeric positioning and linkage points) and in pattern of substitution, for example, by sulfation. In the intimate interplay with sugar receptors (lectins, enzymes, and antibodies) the message of coding units of the sugar code is deciphered. Their recognition will trigger postbinding signaling and the intended biological response. 
Knowledge about the driving forces for the molecular rendezvous, i.e., contributions of bidentate or cooperative hydrogen bonds, dispersion forces, stacking, and solvent rearrangement, will enable the design of high-affinity ligands or mimetics thereof. They embody clinical applications reaching from receptor localization in diagnostic pathology to cell type-selective targeting of drugs and inhibition of undesired cell adhesion in bacterial/viral infections, inflammation, or metastasis.
Functional regression method for whole genome eQTL epistasis analysis with sequencing data.
Xu, Kelin; Jin, Li; Xiong, Momiao
2017-05-18
Epistasis plays an essential role in understanding regulation mechanisms and is an essential component of the genetic architecture of gene expression. However, interaction analysis of gene expression remains fundamentally unexplored due to great computational challenges and limited data availability. Due to variation in splicing, transcription start sites, polyadenylation sites, post-transcriptional RNA editing across the entire gene, and transcription rates of the cells, RNA-seq measurements generate large expression variability and collectively create the observed position-level read count curves. A single number for measuring gene expression, which is widely used for microarray-measured gene expression analysis, is highly unlikely to sufficiently account for large expression variation across the gene. Simultaneously analyzing epistatic architecture using the RNA-seq and whole genome sequencing (WGS) data poses enormous challenges. We develop a nonlinear functional regression model (FRGM) with functional responses, where the position-level read counts within a gene are taken as a function of genomic position, and functional predictors, where genotype profiles are viewed as a function of genomic position, for epistasis analysis with RNA-seq data. Instead of testing the interaction of all possible pairs of SNPs, the FRGM takes a gene as the basic unit for epistasis analysis: it tests for the interaction of all possible pairs of genes and uses all the information that can be accessed to collectively test interaction between all possible pairs of SNPs within two genome regions. By large-scale simulations, we demonstrate that the proposed FRGM for epistasis analysis can achieve the correct type 1 error and has higher power to detect the interactions between genes than the existing methods. The proposed methods are applied to the RNA-seq and WGS data from the 1000 Genomes Project.
The numbers of pairs of significantly interacting genes after Bonferroni correction identified using FRGM, RPKM and DESeq were 16,2361, 260 and 51, respectively, from the 350 European samples. The proposed FRGM for epistasis analysis of RNA-seq can capture isoform and position-level information and will have a broad application. Both simulations and real data analysis highlight the potential for the FRGM to be a good choice of the epistatic analysis with sequencing data.
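The Bonferroni correction mentioned in this abstract can be illustrated with a short sketch. Because the model tests every pair of genes, the number of tests grows quadratically with the number of genes. The gene count, the significance level of 0.05, and the p-values below are hypothetical illustration values and are not taken from the study; the function names are also invented here.

```python
# Minimal sketch of a Bonferroni threshold for all-pairs gene interaction tests.
# Gene count, alpha, and p-values are hypothetical illustration values.
from math import comb


def n_gene_pairs(n_genes):
    """Number of unordered gene pairs, n * (n - 1) / 2."""
    return comb(n_genes, 2)


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold that keeps family-wise error at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant_pairs(p_values, n_tests, alpha=0.05):
    """Keep the gene pairs whose p-value passes the Bonferroni threshold."""
    cutoff = bonferroni_threshold(n_tests, alpha)
    return [pair for pair, p in p_values.items() if p < cutoff]


if __name__ == "__main__":
    n_tests = n_gene_pairs(1000)          # hypothetical 1,000 genes
    print(n_tests, bonferroni_threshold(n_tests))
    toy = {("geneA", "geneB"): 1e-9, ("geneA", "geneC"): 1e-4}
    print(significant_pairs(toy, n_tests))
```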
Bonfiglio, Luca; Virgillito, Alessandra; Magrini, Massimo; Piarulli, Andrea; Bergamasco, Massimo; Barcaro, Umberto; Rossi, Bruno; Salvetti, Ovidio; Carboncini, Maria Chiara
2015-03-01
A series of ERP components, each provided with both a precise timing with respect to stimulation and a specific cortical localization, reflects the temporal succession of processing stages of music information. This makes the musical stimulus potentially usable to probe residual brain functions in non-communicating patients with disorders of consciousness. In an attempt to find a simple stimulation protocol that was suitable for use in a clinical setting, the purpose of this study was to verify whether a minimum-length musical stimulus, provided with a definite music-syntactic connotation, was still able to elicit musical ERPs in a group of eight healthy subjects. The stimulus was composed of the minimum number of chords necessary and sufficient to enable the subject to predict a plausible closure of the sequence (priming) and, at the same time, to provide him/her with the closing chord of the sequence (target), either congruous (probable closing) or not (improbable closing) to the tonal context. The subject's task was to discriminate and recognize the irregular targets. The components that were expected to be elicited, in this experimental situation, were ERAN, N5, P600/LPC. However, in addition to these components, we unexpectedly observed an N400-like component. To determine whether this component was a real N400, we submitted our data to a sLORETA analysis in order to identify its cortical generators. Irregular chords showed higher current densities with respect to regular ones on the right-sided medial and superior temporal gyri, superior and inferior parietal lobules, fusiform and parahippocampal gyri, and on the bilateral posterior cingulate cortex. In particular, the N400-like wave seems to share with the word-primed music-elicited N400 certain generators that are located in cortical areas BA 21/37 and BA 22. This suggests that even chord-primed chord targets can convey extra-musical meanings and that, consequently, they might be useful in assessing residual higher-order information-processing capabilities in non-communicating patients with disorders of consciousness.
Code of Federal Regulations, 2010 CFR
2010-01-01
... the appeal with sufficient facts, information, analysis, and explanation to support the applicant's... Board may request additional information or further supporting arguments from the applicant, the Bank...
30 CFR 780.22 - Geologic information.
Code of Federal Regulations, 2011 CFR
2011-07-01
... 30 Mineral Resources 3 2011-07-01 2011-07-01 false Geologic information. 780.22 Section 780.22... Geologic information. (a) General. Each application shall include geologic information in sufficient detail...) Geologic information shall include, at a minimum the following: (1) A description of the geology of the...
30 CFR 780.22 - Geologic information.
Code of Federal Regulations, 2012 CFR
2012-07-01
... 30 Mineral Resources 3 2012-07-01 2012-07-01 false Geologic information. 780.22 Section 780.22... Geologic information. (a) General. Each application shall include geologic information in sufficient detail...) Geologic information shall include, at a minimum the following: (1) A description of the geology of the...
49 CFR 238.201 - Scope/alternative compliance.
Code of Federal Regulations, 2010 CFR
2010-10-01
... equivalent safety and compliance with this subpart, other than § 238.203, based upon a submission of data and analysis sufficient to support that determination. The petition shall include: (i) The information required..., sufficient to describe the actual construction of the equipment of special design; (iii) Engineering analysis...
Googling DNA sequences on the World Wide Web.
Hajibabaei, Mehrdad; Singer, Gregory A C
2009-11-10
New web-based technologies provide an excellent opportunity for sharing and accessing information and using the web as a platform for interaction and collaboration. Although several specialized tools are available for analyzing DNA sequence information, conventional web-based tools have not been utilized for bioinformatics applications. We have developed a novel algorithm and implemented it for searching species-specific genomic sequences, DNA barcodes, by using popular web-based methods such as Google. We developed an alignment-independent, character-based algorithm based on dividing a sequence library (DNA barcodes) and query sequence into words. The actual search is conducted by conventional search tools such as the freely available Google Desktop Search. We implemented our algorithm in two exemplar packages. We developed pre- and post-processing software to provide customized input and output services, respectively. Our analysis of all publicly available DNA barcode sequences shows a high accuracy as well as rapid results. Our method makes use of conventional web-based technologies for specialized genetic data. It provides a robust and efficient solution for sequence search on the web. The integration of our search method for large-scale sequence libraries such as DNA barcodes provides an excellent web-based tool for accessing this information and linking it to other available categories of information on the web.
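The word-based search idea in this abstract can be illustrated with a small sketch. The abstract does not give the word length, the scoring rule, or how the search engine is driven, so everything below is an assumption: fixed-length overlapping words, an in-memory inverted index standing in for the desktop search tool, and ranking by the number of shared words. The function names and the toy library are hypothetical.

```python
from collections import Counter, defaultdict


def to_words(seq, k=8):
    """Split a sequence into overlapping words of length k (k is an assumed value)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(library, k=8):
    """Inverted index: word -> set of library entry names containing that word."""
    index = defaultdict(set)
    for name, seq in library.items():
        for word in to_words(seq, k):
            index[word].add(name)
    return index


def search(query, index, k=8):
    """Rank library entries by how many distinct query words they contain."""
    hits = Counter()
    for word in set(to_words(query, k)):
        for name in index.get(word, ()):
            hits[name] += 1
    return hits.most_common()


# Hypothetical toy library; real DNA barcodes are several hundred bases long.
library = {
    "species_A": "ACGTACGTTAGCCGATAGCTAGGCTA",
    "species_B": "TTGACCGATAGGCTAACCGGTTAACC",
}
index = build_index(library)
query = "CGTTAGCCGATAGCTAGG"
results = search(query, index)
```

No alignment is computed at any point; an entry is ranked purely by word overlap, which is what makes a general-purpose text index usable for this kind of lookup.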
Rogan, P K; Schneider, T D
1995-01-01
Predicting the effects of nucleotide substitutions in human splice sites has been based on analysis of consensus sequences. We used a graphic representation of sequence conservation and base frequency, the sequence logo, to demonstrate that a change in a splice acceptor of hMSH2 (a gene associated with familial nonpolyposis colon cancer) probably does not reduce splicing efficiency. This confirms a population genetic study that suggested that this substitution is a genetic polymorphism. The information theory-based sequence logo is quantitative and more sensitive than the corresponding splice acceptor consensus sequence for detection of true mutations. Information analysis may potentially be used to distinguish polymorphisms from mutations in other types of transcriptional, translational, or protein-coding motifs.
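As a rough illustration of the quantity a sequence logo displays, the sketch below computes per-position information content for aligned DNA sites as 2 bits minus the Shannon entropy of the base frequencies. The small-sample correction used in published logos is omitted, and the toy alignment is made up for illustration; it is not splice acceptor data from the study.

```python
import math
from collections import Counter


def position_information(column):
    """Information content (bits) of one alignment column of DNA bases.

    Computed as 2 - H, where H is the Shannon entropy of the observed
    base frequencies. No small-sample correction is applied.
    """
    counts = Counter(column)
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 2.0 - entropy


def site_information(aligned_sites):
    """Per-position information for a list of equal-length aligned sites."""
    return [position_information(col) for col in zip(*aligned_sites)]


# Hypothetical toy alignment of four sites.
sites = ["CAGG", "CAGA", "TAGG", "CAGT"]
info = site_information(sites)
```

A fully conserved position scores 2 bits and a position with equal use of all four bases scores 0, so a substitution at a low-information position changes the total site information far less than one at a conserved position.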
42 CFR 485.60 - Condition of participation: Clinical records.
Code of Federal Regulations, 2013 CFR
2013-10-01
... retrieval and compilation of information. (a) Standard: Content. Each clinical record must contain sufficient information to identify the patient clearly and to justify the diagnosis and treatment. Entries in...: Protection of clinical record information. The facility must safeguard clinical record information against...
42 CFR 485.60 - Condition of participation: Clinical records.
Code of Federal Regulations, 2010 CFR
2010-10-01
... retrieval and compilation of information. (a) Standard: Content. Each clinical record must contain sufficient information to identify the patient clearly and to justify the diagnosis and treatment. Entries in...: Protection of clinical record information. The facility must safeguard clinical record information against...
42 CFR 485.60 - Condition of participation: Clinical records.
Code of Federal Regulations, 2014 CFR
2014-10-01
... retrieval and compilation of information. (a) Standard: Content. Each clinical record must contain sufficient information to identify the patient clearly and to justify the diagnosis and treatment. Entries in...: Protection of clinical record information. The facility must safeguard clinical record information against...
42 CFR 485.60 - Condition of participation: Clinical records.
Code of Federal Regulations, 2011 CFR
2011-10-01
... retrieval and compilation of information. (a) Standard: Content. Each clinical record must contain sufficient information to identify the patient clearly and to justify the diagnosis and treatment. Entries in...: Protection of clinical record information. The facility must safeguard clinical record information against...
48 CFR 9.105-1 - Obtaining information.
Code of Federal Regulations, 2013 CFR
2013-10-01
... 48 Federal Acquisition Regulations System 1 2013-10-01 2013-10-01 false Obtaining information. 9... information. (a) Before making a determination of responsibility, the contracting officer shall possess or obtain information sufficient to be satisfied that a prospective contractor currently meets the...
Protein Information Resource: a community resource for expert annotation of protein data
Barker, Winona C.; Garavelli, John S.; Hou, Zhenglin; Huang, Hongzhan; Ledley, Robert S.; McGarvey, Peter B.; Mewes, Hans-Werner; Orcutt, Bruce C.; Pfeiffer, Friedhelm; Tsugita, Akira; Vinayaka, C. R.; Xiao, Chunlin; Yeh, Lai-Su L.; Wu, Cathy
2001-01-01
The Protein Information Resource, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the most comprehensive and expertly annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database. To provide timely and high quality annotation and promote database interoperability, the PIR-International employs rule-based and classification-driven procedures based on controlled vocabulary and standard nomenclature and includes status tags to distinguish experimentally determined from predicted protein features. The database contains about 200 000 non-redundant protein sequences, which are classified into families and superfamilies and their domains and motifs identified. Entries are extensively cross-referenced to other sequence, classification, genome, structure and activity databases. The PIR web site features search engines that use sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. The PIR-International databases and search tools are accessible on the PIR web site at http://pir.georgetown.edu/ and at the MIPS web site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP. PMID:11125041
Seo, Joann; Ivanovich, Jennifer; Goodman, Melody S; Biesecker, Barbara B; Kaphingst, Kimberly A
2017-06-01
We investigated what information women diagnosed with breast cancer at a young age would want to learn when genome sequencing results are returned. We conducted 60 semi-structured interviews with women diagnosed with breast cancer at age 40 or younger. We examined what specific information participants would want to learn across result types and for each type of result, as well as how much information they would want. Genome sequencing was not offered to participants as part of the study. Two coders independently coded interview transcripts; analysis was conducted using NVivo10. Across result types, participants wanted to learn about health implications, risk and prevalence in quantitative terms, causes of variants, and causes of diseases. Participants wanted to learn actionable information for variants affecting risk of preventable or treatable disease, medication response, and carrier status. The amount of desired information differed for variants affecting risk of unpreventable or untreatable disease, with uncertain significance, and not health-related. Women diagnosed with breast cancer at a young age recognize the value of genome sequencing results in identifying potential causes and effective treatments and expressed interest in using the information to help relatives and to further understand their other health risks. Our findings can inform the development of effective feedback strategies for genome sequencing that meet patients' information needs and preferences.
The Malarial Host-Targeting Signal Is Conserved in the Irish Potato Famine Pathogen
Liolios, Konstantinos; Win, Joe; Kanneganti, Thirumala-Devi; Young, Carolyn; Kamoun, Sophien; Haldar, Kasturi
2006-01-01
Animal and plant eukaryotic pathogens, such as the human malaria parasite Plasmodium falciparum and the potato late blight agent Phytophthora infestans, are widely divergent eukaryotic microbes. Yet they both produce secretory virulence and pathogenic proteins that alter host cell functions. In P. falciparum, export of parasite proteins to the host erythrocyte is mediated by leader sequences shown to contain a host-targeting (HT) motif centered on an RxLx (E, D, or Q) core: this motif appears to signify a major pathogenic export pathway with hundreds of putative effectors. Here we show that a secretory protein of P. infestans, which is perceived by plant disease resistance proteins and induces hypersensitive plant cell death, contains a leader sequence that is equivalent to the Plasmodium HT-leader in its ability to export fusion of green fluorescent protein (GFP) from the P. falciparum parasite to the host erythrocyte. This export is dependent on an RxLR sequence conserved in P. infestans leaders, as well as in leaders of all ten secretory oomycete proteins shown to function inside plant cells. The RxLR motif is also detected in hundreds of secretory proteins of P. infestans, Phytophthora sojae, and Phytophthora ramorum and has high value in predicting host-targeted leaders. A consensus motif further reveals E/D residues enriched within ~25 amino acids downstream of the RxLR, which are also needed for export. Together the data suggest that in these plant pathogenic oomycetes, a consensus HT motif may reside in an extended sequence of ~25–30 amino acids, rather than in a short linear sequence. Evidence is presented that although the consensus is much shorter in P. falciparum, information sufficient for vacuolar export is contained in a region of ~30 amino acids, which includes sequences flanking the HT core. Finally, positional conservation between Phytophthora RxLR and P. falciparum RxLx (E, D, Q) is consistent with the idea that the context of their presentation is constrained. These studies provide the first evidence to our knowledge that eukaryotic microbes share equivalent pathogenic HT signals and thus conserved mechanisms to access host cells across plant and animal kingdoms that may present unique targets for prophylaxis across divergent pathogens. PMID:16733545
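The two motifs named in this abstract can be written as simple patterns. The sketch below is only an illustration of motif scanning, not the authors' prediction method: the regular expressions are read directly from "RxLx (E, D, or Q)" and "RxLR", the 25-residue downstream window is taken from the approximate figure in the abstract, and the example sequence is invented.

```python
import re

# Patterns written from the abstract's description; "." stands for any residue "x".
PLASMODIUM_HT = re.compile(r"R.L.[EDQ]")
OOMYCETE_RXLR = re.compile(r"R.LR")


def find_motifs(protein, pattern):
    """Return (start, matched_text) for every occurrence, overlapping ones included."""
    lookahead = re.compile(f"(?=({pattern.pattern}))")
    return [(m.start(), m.group(1)) for m in lookahead.finditer(protein)]


def acidic_downstream(protein, motif_end, window=25):
    """Count E and D residues in the window following a motif (window size assumed)."""
    segment = protein[motif_end:motif_end + window]
    return sum(segment.count(aa) for aa in "ED")


# Hypothetical sequence made up for illustration; not a real effector.
seq = "MKLSAVLAARSLRTTDEEAGKDEERALVQ"
rxlr_hits = find_motifs(seq, OOMYCETE_RXLR)
ht_hits = find_motifs(seq, PLASMODIUM_HT)
```

A short pattern such as `R.LR` matches by chance fairly often, which is consistent with the abstract's point that the functional signal appears to extend over a longer region including downstream acidic residues.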
Serial data correlator/code translator
NASA Technical Reports Server (NTRS)
Morgan, L. E. (Inventor)
1982-01-01
A system is described for analyzing asynchronous signals containing bits of information to ensure the validity of those signals, by sampling each bit of information a plurality of times and feeding the sampled pieces of bits of information into a sequence controller. The sequence controller has a plurality of maps or programs through which the sampled pieces of bits are stepped so as to identify the particular bit of information and determine the validity and phase of the bit. The step in which the sequence controller is clocked is controlled by a storage register. A data decoder decodes the information fed out of the storage register and feeds such information to shift registers for storage.
75 FR 39035 - Housing Choice Voucher (HCV) Family Self-Sufficiency (FSS) Program
Federal Register 2010, 2011, 2012, 2013, 2014
2010-07-07
...) Family Self-Sufficiency (FSS) Program AGENCY: Office of the Chief Information Officer, HUD. ACTION... Department is soliciting public comments on the subject proposal. The FSS program, which was established in... coordinate the use of public housing assistance and assistance under the Section 8 rental certificate and...
Owens, John
2009-01-01
Technological advances in the acquisition of DNA and protein sequence information and the resulting onrush of data can quickly overwhelm the scientist unprepared for the volume of information that must be evaluated and carefully dissected to discover its significance. Few laboratories have the luxury of dedicated personnel to organize, analyze, or consistently record a mix of arriving sequence data. A methodology based on a modern relational-database manager is presented that is both a natural storage vessel for antibody sequence information and a conduit for organizing and exploring sequence data and accompanying annotation text. The expertise necessary to implement such a plan is equal to that required by electronic word processors or spreadsheet applications. Antibody sequence projects maintained as independent databases are selectively unified by the relational-database manager into larger database families that contribute to local analyses, reports, interactive HTML pages, or exported to facilities dedicated to sophisticated sequence analysis techniques. Database files are transposable among current versions of Microsoft, Macintosh, and UNIX operating systems.
Worley, K C; Wiese, B A; Smith, R F
1995-09-01
BEAUTY (BLAST enhanced alignment utility) is an enhanced version of the NCBI's BLAST data base search tool that facilitates identification of the functions of matched sequences. We have created new data bases of conserved regions and functional domains for protein sequences in NCBI's Entrez data base, and BEAUTY allows this information to be incorporated directly into BLAST search results. A Conserved Regions Data Base, containing the locations of conserved regions within Entrez protein sequences, was constructed by (1) clustering the entire data base into families, (2) aligning each family using our PIMA multiple sequence alignment program, and (3) scanning the multiple alignments to locate the conserved regions within each aligned sequence. A separate Annotated Domains Data Base was constructed by extracting the locations of all annotated domains and sites from sequences represented in the Entrez, PROSITE, BLOCKS, and PRINTS data bases. BEAUTY performs a BLAST search of those Entrez sequences with conserved regions and/or annotated domains. BEAUTY then uses the information from the Conserved Regions and Annotated Domains data bases to generate, for each matched sequence, a schematic display that allows one to directly compare the relative locations of (1) the conserved regions, (2) annotated domains and sites, and (3) the locally aligned regions matched in the BLAST search. In addition, BEAUTY search results include World-Wide Web hypertext links to a number of external data bases that provide a variety of additional types of information on the function of matched sequences. This convenient integration of protein families, conserved regions, annotated domains, alignment displays, and World-Wide Web resources greatly enhances the biological informativeness of sequence similarity searches. 
BEAUTY searches can be performed remotely on our system using the "BCM Search Launcher" World-Wide Web pages (URL: http://gc.bcm.tmc.edu:8088/search-launcher/launcher.html).
Valenzuela-González, Fabiola; Martínez-Porchas, Marcel; Villalpando-Canchola, Enrique; Vargas-Albores, Francisco
2016-03-01
Ultrafast-metagenomic sequence classification using exact alignments (Kraken) is a novel approach to classify 16S rDNA sequences. The classifier is based on mapping short sequences to the lowest ancestor and performing alignments to form subtrees with specific weights in each taxon node. This study aimed to evaluate the classification performance of Kraken with long 16S rDNA random environmental sequences produced by cloning and then Sanger sequenced. A total of 480 clones were isolated and expanded, and 264 of these clones formed contigs (1352 ± 153 bp). The same sequences were analyzed using the Ribosomal Database Project (RDP) classifier. Deeper classification performance was achieved by Kraken than by the RDP: 73% of the contigs were classified up to the species or variety levels, whereas 67% of these contigs were classified no further than the genus level by the RDP. The results also demonstrated that unassembled sequences analyzed by Kraken provide similar or even deeper information. Moreover, sequences that did not form contigs, which are usually discarded by other programs, provided meaningful information when analyzed by Kraken. Finally, it appears that the assembly step for Sanger sequences can be eliminated when using Kraken. Kraken accumulates the information of both sequence senses, providing additional elements for the classification. In conclusion, the results demonstrate that Kraken is an excellent choice for use in the taxonomic assignment of sequences obtained by Sanger sequencing or based on third generation sequencing, of which the main goal is to generate larger sequences. Copyright © 2016 Elsevier B.V. All rights reserved.
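The weighted-subtree idea mentioned in this abstract can be sketched in a few lines. This is a simplified illustration and not Kraken's actual code or interface: the taxonomy, the k-mer table, the word length of 4, and the function names are all hypothetical, and tie handling between sibling taxa is reduced to "first seen wins".

```python
from collections import Counter

# Hypothetical toy taxonomy (child -> parent) and k-mer table (k-mer -> taxon).
PARENT = {
    "root": None,
    "Vibrio": "root",
    "V. harveyi": "Vibrio",
    "V. cholerae": "Vibrio",
}
KMER_TO_TAXON = {
    "ACGT": "Vibrio",
    "CGTA": "V. harveyi",
    "GTAC": "V. harveyi",
    "TTTT": "V. cholerae",
}


def path_to_root(taxon):
    """List a taxon and all of its ancestors up to the root."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path


def classify(read, k=4):
    """Assign a read to the hit taxon whose path to the root has the most k-mer hits."""
    hits = Counter()
    for i in range(len(read) - k + 1):
        taxon = KMER_TO_TAXON.get(read[i:i + k])
        if taxon is not None:
            hits[taxon] += 1
    if not hits:
        return None  # no k-mer of the read is in the table
    return max(hits, key=lambda t: sum(hits[a] for a in path_to_root(t)))


assigned = classify("ACGTAC")
```

Because hits at an ancestor also count toward every descendant's path, a read with one genus-level k-mer and two species-level k-mers is still placed at the species, which is one way such a scheme can reach deeper ranks than a classifier that stops at the genus.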
Identifying the Critical Time Period for Information Extraction when Recognizing Sequences of Play
ERIC Educational Resources Information Center
North, Jamie S.; Williams, A. Mark
2008-01-01
The authors attempted to determine the critical time period for information extraction when recognizing play sequences in soccer. Although efforts have been made to identify the perceptual information underpinning such decisions, no researchers have attempted to determine "when" this information may be extracted from the display. The authors…
Cloud-based adaptive exon prediction for DNA analysis.
Putluri, Srinivasareddy; Zia Ur Rahman, Md; Fathima, Shaik Yasmeen
2018-02-01
Cloud computing offers significant research and economic benefits to healthcare organisations. Cloud services provide a safe place for storing and managing large amounts of sensitive data. Under the conventional flow of gene information, gene sequence laboratories send out raw and inferred information via the Internet to several sequence libraries. DNA sequencing storage costs will be minimised by use of a cloud service. In this study, the authors put forward a novel genomic informatics system using Amazon Cloud Services, where genomic sequence information is stored and accessed for processing. True identification of exon regions in a DNA sequence is a key task in bioinformatics, which helps in disease identification and drug design. The three-base periodicity property of exons forms the basis of all exon identification techniques. Adaptive signal processing techniques are found to be promising in comparison with several other methods. Several adaptive exon predictors (AEPs) are developed using variable normalised least mean square and its maximum normalised variants to reduce computational complexity. Finally, performance evaluation of the various AEPs is done based on measures such as sensitivity, specificity and precision using various standard genomic datasets taken from the National Center for Biotechnology Information genomic sequence database.
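The three-base periodicity property that this abstract builds on can be shown with a short sketch. Note that this is not the adaptive (normalised least mean square) predictor the authors describe; it is the simpler classical check of spectral power at frequency 1/3 over binary base-indicator sequences, given here only to make the property concrete. The function name and the toy sequences are hypothetical.

```python
import cmath


def period3_power(dna):
    """Power at frequency 1/3, summed over the four binary base-indicator sequences."""
    total = 0.0
    for base in "ACGT":
        # Sum the complex exponential over positions where this base occurs.
        component = sum(
            cmath.exp(-2j * cmath.pi * n / 3)
            for n, symbol in enumerate(dna.upper())
            if symbol == base
        )
        total += abs(component) ** 2
    return total / len(dna)


# Hypothetical toy sequences: a strict codon-like repeat versus a period-4 repeat.
periodic = "ATG" * 20
non_periodic = "ACGT" * 15

strong = period3_power(periodic)
weak = period3_power(non_periodic)
```

In practice this measure is evaluated in a sliding window along a genomic sequence, and windows with high period-3 power are treated as candidate coding regions; an adaptive filter is one alternative way to track the same periodic component.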
Identifying functionally informative evolutionary sequence profiles.
Gil, Nelson; Fiser, Andras
2018-04-15
Multiple sequence alignments (MSAs) can provide essential input to many bioinformatics applications, including protein structure prediction and functional annotation. However, the optimal selection of sequences to obtain biologically informative MSAs for such purposes is poorly explored, and has traditionally been performed manually. We present Selection of Alignment by Maximal Mutual Information (SAMMI), an automated, sequence-based approach to objectively select an optimal MSA from a large set of alternatives sampled from a general sequence database search. The hypothesis of this approach is that the mutual information among MSA columns will be maximal for those MSAs that contain the most diverse set possible of the most structurally and functionally homogeneous protein sequences. SAMMI was tested to select MSAs for functional site residue prediction by analysis of conservation patterns on a set of 435 proteins obtained from protein-ligand (peptides, nucleic acids and small substrates) and protein-protein interaction databases. Availability and implementation: A freely accessible program, including source code, implementing SAMMI is available at https://github.com/nelsongil92/SAMMI.git. Contact: andras.fiser@einstein.yu.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
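The quantity at the centre of this abstract, mutual information among alignment columns, can be computed as in the sketch below. How SAMMI aggregates, normalises, or corrects this value is not stated in the abstract, so the plain mean over all column pairs used here is an assumption, as are the function names and the toy alignment.

```python
import math
from collections import Counter
from itertools import combinations


def column_mutual_information(col_a, col_b):
    """Mutual information (bits) between two alignment columns of equal length."""
    n = len(col_a)
    count_a, count_b = Counter(col_a), Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts.
    return sum(
        (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )


def mean_pairwise_mi(msa):
    """Average mutual information over all pairs of columns (assumed aggregate)."""
    columns = list(zip(*msa))
    pairs = list(combinations(columns, 2))
    return sum(column_mutual_information(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment: columns 1 and 2 co-vary, column 3 is invariant.
msa = ["AKL", "AKL", "GRL", "GRL"]
score = mean_pairwise_mi(msa)
```

A score like this could be computed for each candidate alignment so that the highest-scoring one is kept; invariant columns contribute nothing, so an alignment of near-identical sequences scores low even though it is homogeneous.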
High density FTA plates serve as efficient long-term sample storage for HLA genotyping.
Lange, V; Arndt, K; Schwarzelt, C; Boehme, I; Giani, A S; Schmidt, A H; Ehninger, G; Wassmuth, R
2014-02-01
Storage of dried blood spots (DBS) on high-density FTA® plates could constitute an appealing alternative to frozen storage. However, it remains controversial whether DBS are suitable for high-resolution sequencing of human leukocyte antigen (HLA) alleles. Therefore, we extracted DNA from DBS that had been stored for up to 4 years, using six different methods. We identified those extraction methods that recovered sufficient high-quality DNA for reliable high-resolution HLA sequencing. Further, we confirmed that frozen whole blood samples that had been stored for several years can be transferred to filter paper without compromising HLA genotyping upon extraction. In conclusion, DNA derived from high-density FTA® plates is suitable for high-resolution HLA sequencing, provided that appropriate extraction protocols are employed. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Recombinant Pinoresinol-Lariciresinol Reductase, Recombinant Dirigent Protein And Methods Of Use
Lewis, Norman G.; Davin, Laurence B.; Dinkova-Kostova, Albena T.; Fujita, Masayuki; Gang, David R.; Sarkanen, Simo; Ford, Joshua D.
2003-10-21
Dirigent proteins and pinoresinol/lariciresinol reductases have been isolated, together with cDNAs encoding dirigent proteins and pinoresinol/lariciresinol reductases. Accordingly, isolated DNA sequences are provided from source species Forsythia intermedia, Thuja plicata, Tsuga heterophylla, Eucommia ulmoides, Linum usitatissimum, and Schisandra chinensis, which code for the expression of dirigent proteins and pinoresinol/lariciresinol reductases. In other aspects, replicable recombinant cloning vehicles are provided which code for dirigent proteins or pinoresinol/lariciresinol reductases or for a base sequence sufficiently complementary to at least a portion of dirigent protein or pinoresinol/lariciresinol reductase DNA or RNA to enable hybridization therewith. In yet other aspects, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence encoding dirigent protein or pinoresinol/lariciresinol reductase. Thus, systems and methods are provided for the recombinant expression of dirigent proteins and/or pinoresinol/lariciresinol reductases.